STAT 29000

Project 3 Solutions

Loading the data

bigDF <- read.csv("/data/public/dataexpo2009/allyears.csv")

Question 1

1a. The proportion of data that is missing from DepTime is

sum(is.na(bigDF$DepTime))/length(bigDF$DepTime)
## [1] 0.0186355

or, expressed as a percentage (i.e., multiplied by 100)

100*sum(is.na(bigDF$DepTime))/length(bigDF$DepTime)
## [1] 1.86355

The proportion of data that is missing from ArrTime is

sum(is.na(bigDF$ArrTime))/length(bigDF$ArrTime)
## [1] 0.02092102

or, expressed as a percentage (i.e., multiplied by 100)

100*sum(is.na(bigDF$ArrTime))/length(bigDF$ArrTime)
## [1] 2.092102
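An equivalent way to compute the same quantity is to take the mean of the logical vector returned by is.na, since TRUE counts as 1 and FALSE as 0. The sketch below uses a small made-up vector (dep_time is a hypothetical stand-in for bigDF$DepTime), so the numbers are illustrative only.

```r
# Toy vector standing in for bigDF$DepTime; the values are made up.
dep_time <- c(1359, NA, 745, NA, 2210, 930, 1105, 615)

# Proportion missing: the mean of a logical vector is the share of TRUEs.
prop_missing <- mean(is.na(dep_time))

# The same quantity expressed as a percentage.
pct_missing <- 100 * prop_missing

print(prop_missing)
print(pct_missing)
```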

1b. The result of converting “1359” to POSIXlt is:

strptime("1359","%H%M")
## [1] "2015-09-22 13:59:00 EDT"

1c. The result of converting “1360” to POSIXlt is an NA:

strptime("1360","%H%M")
## [1] NA

1d. The vectors, each converted to POSIXlt format, are

myconvertedDepTime <- strptime(sprintf("%04d",bigDF$DepTime),"%H%M")
myconvertedCRSDepTime <- strptime(sprintf("%04d",bigDF$CRSDepTime),"%H%M")
myconvertedArrTime <- strptime(sprintf("%04d",bigDF$ArrTime),"%H%M")
myconvertedCRSArrTime <- strptime(sprintf("%04d",bigDF$CRSArrTime),"%H%M")

There are millions of erroneous entries in the DepTime and ArrTime vectors, although the exact count depends on what you treat as erroneous. For instance, here are the numbers of NA values in the converted vectors:

sum(is.na(myconvertedDepTime))
## [1] 2305590
sum(is.na(myconvertedCRSDepTime))
## [1] 0
sum(is.na(myconvertedArrTime))
## [1] 2604126
sum(is.na(myconvertedCRSArrTime))
## [1] 1
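To see which kinds of entries end up as NA after the conversion, here is a minimal sketch on four made-up values (times is a hypothetical stand-in for a column such as bigDF$DepTime). It pads each value to four digits, parses it, and flags the failures. The tz = "UTC" argument is an addition that is not in the code above; it only keeps the toy example independent of the local time zone.

```r
# Made-up HHMM values: a valid early time, a valid afternoon time,
# an impossible minute field (60), and a missing value.
times <- c(5L, 1359L, 1360L, NA)

# Pad to four characters so that 5 becomes "0005" before parsing.
padded <- sprintf("%04d", times)

# Parse as hours and minutes; unparseable strings become NA.
parsed <- strptime(padded, "%H%M", tz = "UTC")

# TRUE marks the entries that could not be converted.
bad <- is.na(parsed)
print(data.frame(times, padded, bad))
```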

Question 2

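2a. First we make a vector of the delayed flights, i.e., those flights with delay 0 or greater. We would not have to make a separate vector for this, but it makes the solution a little easier to read. Note that the line below selects on bigDF$DepTime (the departure time in HHMM form) rather than bigDF$DepDelay, so the counts that follow are departure times grouped into 5-unit bins; replacing DepTime with DepDelay gives the delay counts that the question asks for.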

v <- bigDF$DepTime[bigDF$DepTime >= 0]

The number of flights that are delayed, split according to 5 minute intervals, is:

mydelays <- tapply( v, cut(v, breaks=seq(from=0,to=max(v,na.rm=T),by=5), include.lowest=T), length )
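The binning step can be sketched on a toy vector of delays in minutes (delay is a made-up stand-in for bigDF$DepDelay). This is a variation rather than the code above: it drops NA values explicitly, rounds the last break up to a multiple of 5 so that the largest value always falls inside a bin, and uses table, which reports empty intervals as 0 instead of NA.

```r
# Made-up delays in minutes, including a missing value and an early departure.
delay <- c(0, 3, 5, 7, 12, 12, 14, NA, -4)

# Keep only the non-missing delays that are 0 or greater.
late <- delay[!is.na(delay) & delay >= 0]

# Breaks every 5 minutes, with the last break rounded up to cover max(late).
breaks <- seq(from = 0, to = 5 * ceiling(max(late) / 5), by = 5)

# Count the flights in each interval; include.lowest keeps the 0s in [0,5].
counts <- table(cut(late, breaks = breaks, include.lowest = TRUE))
print(counts)
```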

The first several groups of delays are:

head(mydelays)
##   [0,5]  (5,10] (10,15] (15,20] (20,25] (25,30] 
##   45162   35692   33958   31458   32361   38726

We probably do not want to look at all of the intervals, because many of them are empty (tapply reports these as NA). Here are the ones that contain at least one flight.

mydelays[!is.na(mydelays)]
##       [0,5]      (5,10]     (10,15]     (15,20]     (20,25]     (25,30] 
##       45162       35692       33958       31458       32361       38726 
##     (30,35]     (35,40]     (40,45]     (45,50]     (50,55]     (55,60] 
##       35513       38825       37430       36444       33906       23849 
##     (90,95]    (95,100]   (100,105]   (105,110]   (110,115]   (115,120] 
##           1       11162       23083       25344       25439       21214 
##   (120,125]   (125,130]   (130,135]   (135,140]   (140,145]   (145,150] 
##       20346       23274       18178       19550       19708       17385 
##   (150,155]   (155,160]   (195,200]   (200,205]   (205,210]   (210,215] 
##       15432       10344        3920        8219        7336        6017 
##   (215,220]   (220,225]   (225,230]   (230,235]   (235,240]   (240,245] 
##        5602        5015        4152        3171        3062        3229 
##   (245,250]   (250,255]   (255,260]   (295,300]   (300,305]   (305,310] 
##        3077        2637        1849         879        1933        1610 
##   (310,315]   (315,320]   (320,325]   (325,330]   (330,335]   (335,340] 
##        1646        1443        1985        1657        1469         639 
##   (340,345]   (345,350]   (350,355]   (355,360]   (395,400]   (400,405] 
##         679         606         743         746         366         593 
##   (405,410]   (410,415]   (415,420]   (420,425]   (425,430]   (430,435] 
##         804         645         670         576         925        1178 
##   (435,440]   (440,445]   (445,450]   (450,455]   (455,460]   (495,500] 
##        1134        1094        2495        7213       10520        5780 
##   (500,505]   (505,510]   (510,515]   (515,520]   (520,525]   (525,530] 
##       11111       11388       14432       19313       34435       69493 
##   (530,535]   (535,540]   (540,545]   (545,550]   (550,555]   (555,560] 
##       46594       60383       77003      105728      281017      460778 
##   (595,600]   (600,605]   (605,610]   (610,615]   (615,620]   (620,625] 
##      263508      350448      330492      396719      402073      488800 
##   (625,630]   (630,635]   (635,640]   (640,645]   (645,650]   (650,655] 
##      862358      538897      575575      668085      604264      707732 
##   (655,660]   (695,700]   (700,705]   (705,710]   (710,715]   (715,720] 
##      908703      679028      811447      661072      628139      535039 
##   (720,725]   (725,730]   (730,735]   (735,740]   (740,745]   (745,750] 
##      541686      758218      533787      559252      616433      553844 
##   (750,755]   (755,760]   (795,800]   (800,805]   (805,810]   (810,815] 
##      616234      609425      390119      665123      629096      656795 
##   (815,820]   (820,825]   (825,830]   (830,835]   (835,840]   (840,845] 
##      616957      645464      844236      655225      661620      689035 
##   (845,850]   (850,855]   (855,860]   (895,900]   (900,905]   (905,910] 
##      654662      716797      616401      329965      681731      646929 
##   (910,915]   (915,920]   (920,925]   (925,930]   (930,935]   (935,940] 
##      646899      623621      642518      785225      608135      583982 
##   (940,945]   (945,950]   (950,955]   (955,960]  (995,1000] (1000,1005] 
##      578918      542487      581879      496359      275793      613779 
## (1005,1010] (1010,1015] (1015,1020] (1020,1025] (1025,1030] (1030,1035] 
##      608524      608299      563555      576352      689018      545739 
## (1035,1040] (1040,1045] (1045,1050] (1050,1055] (1055,1060] (1095,1100] 
##      529503      549436      546609      578595      485600      289665 
## (1100,1105] (1105,1110] (1110,1115] (1115,1120] (1120,1125] (1125,1130] 
##      611771      607694      594353      574279      597214      722039 
## (1130,1135] (1135,1140] (1140,1145] (1145,1150] (1150,1155] (1155,1160] 
##      578649      585213      606791      592520      637269      531730 
## (1195,1200] (1200,1205] (1205,1210] (1210,1215] (1215,1220] (1220,1225] 
##      302615      651244      629277      640046      613385      617720 
## (1225,1230] (1230,1235] (1235,1240] (1240,1245] (1245,1250] (1250,1255] 
##      738858      607799      597697      636986      610006      632403 
## (1255,1260] (1295,1300] (1300,1305] (1305,1310] (1310,1315] (1315,1320] 
##      528199      283868      642784      638104      646244      636382 
## (1320,1325] (1325,1330] (1330,1335] (1335,1340] (1340,1345] (1345,1350] 
##      667613      828319      675873      643530      644663      580877 
## (1350,1355] (1355,1360] (1395,1400] (1400,1405] (1405,1410] (1410,1415] 
##      581048      459950      245878      529679      529087      554148 
## (1415,1420] (1420,1425] (1425,1430] (1430,1435] (1435,1440] (1440,1445] 
##      545921      555650      684341      587677      580118      614886 
## (1445,1450] (1450,1455] (1455,1460] (1495,1500] (1500,1505] (1505,1510] 
##      591119      591758      490188      273661      595868      590097 
## (1510,1515] (1515,1520] (1520,1525] (1525,1530] (1530,1535] (1535,1540] 
##      623228      592186      595176      711995      589718      577419 
## (1540,1545] (1545,1550] (1550,1555] (1555,1560] (1595,1600] (1600,1605] 
##      590961      560364      589540      504619      289877      620243 
## (1605,1610] (1610,1615] (1615,1620] (1620,1625] (1625,1630] (1630,1635] 
##      593041      598956      557670      530119      659929      545425 
## (1635,1640] (1640,1645] (1645,1650] (1650,1655] (1655,1660] (1695,1700] 
##      537169      585914      604387      645083      539875      327602 
## (1700,1705] (1705,1710] (1710,1715] (1715,1720] (1720,1725] (1725,1730] 
##      672042      664300      679292      667667      660792      782753 
## (1730,1735] (1735,1740] (1740,1745] (1745,1750] (1750,1755] (1755,1760] 
##      643873      623424      644666      618115      640933      504270 
## (1795,1800] (1800,1805] (1805,1810] (1810,1815] (1815,1820] (1820,1825] 
##      273923      605355      571921      566007      527379      526037 
## (1825,1830] (1830,1835] (1835,1840] (1840,1845] (1845,1850] (1850,1855] 
##      653881      577178      615526      656588      665645      718720 
## (1855,1860] (1895,1900] (1900,1905] (1905,1910] (1910,1915] (1915,1920] 
##      545721      271150      645130      616266      586852      544865 
## (1920,1925] (1925,1930] (1930,1935] (1935,1940] (1940,1945] (1945,1950] 
##      532702      622040      516769      497851      489482      473694 
## (1950,1955] (1955,1960] (1995,2000] (2000,2005] (2005,2010] (2010,2015] 
##      510245      391008      218815      489689      489559      504960 
## (2015,2020] (2020,2025] (2025,2030] (2030,2035] (2035,2040] (2040,2045] 
##      482916      483360      571979      476211      475557      487680 
## (2045,2050] (2050,2055] (2055,2060] (2095,2100] (2100,2105] (2105,2110] 
##      458289      450375      322611      181716      394658      383534 
## (2110,2115] (2115,2120] (2120,2125] (2125,2130] (2130,2135] (2135,2140] 
##      381735      345875      328127      370371      314078      309522 
## (2140,2145] (2145,2150] (2150,2155] (2155,2160] (2195,2200] (2200,2205] 
##      315878      298176      301226      225888      111068      251059 
## (2205,2210] (2210,2215] (2215,2220] (2220,2225] (2225,2230] (2230,2235] 
##      237944      224988      205610      194076      201475      153090 
## (2235,2240] (2240,2245] (2245,2250] (2250,2255] (2255,2260] (2295,2300] 
##      139609      141124      135640      127837       97582       44029 
## (2300,2305] (2305,2310] (2310,2315] (2315,2320] (2320,2325] (2325,2330] 
##      102983       98095       92429       76974       75423       85586 
## (2330,2335] (2335,2340] (2340,2345] (2345,2350] (2350,2355] (2355,2360] 
##       68294       70465       79380       75791       76787       52708 
## (2395,2400] (2400,2405] (2405,2410] (2410,2415] (2415,2420] (2420,2425] 
##       12228         397         389         304         294         205 
## (2425,2430] (2430,2435] (2435,2440] (2440,2445] (2445,2450] (2450,2455] 
##         239         191         168         143         140         114 
## (2455,2460] (2495,2500] (2500,2505] (2505,2510] (2510,2515] (2515,2520] 
##          47          55          81          91          80          61 
## (2520,2525] (2525,2530] (2530,2535] (2535,2540] (2540,2545] (2545,2550] 
##          44          51          42          47          35          33 
## (2550,2555] (2555,2560] (2565,2570] (2595,2600] (2600,2605] (2605,2610] 
##          34          13           1          17           8          15 
## (2610,2615] (2615,2620] (2620,2625] (2625,2630] (2630,2635] (2635,2640] 
##          15          10          12          24          10           8 
## (2640,2645] (2645,2650] (2650,2655] (2655,2660] (2665,2670] (2695,2700] 
##           5           6           2           1           1           2 
## (2700,2705] (2705,2710] (2715,2720] (2720,2725] (2725,2730] (2730,2735] 
##           1           5           1           3           1           1 
## (2745,2750] (2750,2755] (2800,2805] (2925,2930] 
##           1           2           1           1

Now we convert to proportions, i.e., we divide by the total number of flights altogether. (Multiply by 100 to get percentages.)

mydelays[!is.na(mydelays)]/length(bigDF$DepTime)
##        [0,5]       (5,10]      (10,15]      (15,20]      (20,25] 
## 3.655807e-04 2.889222e-04 2.748857e-04 2.546485e-04 2.619582e-04 
##      (25,30]      (30,35]      (35,40]      (40,45]      (45,50] 
## 3.134821e-04 2.874733e-04 3.142835e-04 3.029911e-04 2.950096e-04 
##      (50,55]      (55,60]      (90,95]     (95,100]    (100,105] 
## 2.744648e-04 1.930546e-04 8.094874e-09 9.035498e-05 1.868540e-04 
##    (105,110]    (110,115]    (115,120]    (120,125]    (125,130] 
## 2.051565e-04 2.059255e-04 1.717247e-04 1.646983e-04 1.884001e-04 
##    (130,135]    (135,140]    (140,145]    (145,150]    (150,155] 
## 1.471486e-04 1.582548e-04 1.595338e-04 1.407294e-04 1.249201e-04 
##    (155,160]    (195,200]    (200,205]    (205,210]    (210,215] 
## 8.373338e-05 3.173191e-05 6.653177e-05 5.938400e-05 4.870686e-05 
##    (215,220]    (220,225]    (225,230]    (230,235]    (235,240] 
## 4.534748e-05 4.059579e-05 3.360992e-05 2.566885e-05 2.478650e-05 
##    (240,245]    (245,250]    (250,255]    (255,260]    (295,300] 
## 2.613835e-05 2.490793e-05 2.134618e-05 1.496742e-05 7.115394e-06 
##    (300,305]    (305,310]    (310,315]    (315,320]    (320,325] 
## 1.564739e-05 1.303275e-05 1.332416e-05 1.168090e-05 1.606832e-05 
##    (325,330]    (330,335]    (335,340]    (340,345]    (345,350] 
## 1.341321e-05 1.189137e-05 5.172624e-06 5.496419e-06 4.905494e-06 
##    (350,355]    (355,360]    (395,400]    (400,405]    (405,410] 
## 6.014491e-06 6.038776e-06 2.962724e-06 4.800260e-06 6.508279e-06 
##    (410,415]    (415,420]    (420,425]    (425,430]    (430,435] 
## 5.221194e-06 5.423566e-06 4.662647e-06 7.487758e-06 9.535761e-06 
##    (435,440]    (440,445]    (445,450]    (450,455]    (455,460] 
## 9.179587e-06 8.855792e-06 2.019671e-05 5.838833e-05 8.515807e-05 
##    (495,500]    (500,505]    (505,510]    (510,515]    (515,520] 
## 4.678837e-05 8.994214e-05 9.218442e-05 1.168252e-04 1.563363e-04 
##    (520,525]    (525,530]    (530,535]    (535,540]    (540,545] 
## 2.787470e-04 5.625371e-04 3.771726e-04 4.887928e-04 6.233296e-04 
##    (545,550]    (550,555]    (555,560]    (595,600]    (600,605] 
## 8.558548e-04 2.274797e-03 3.729940e-03 2.133064e-03 2.836832e-03 
##    (605,610]    (610,615]    (615,620]    (620,625]    (625,630] 
## 2.675291e-03 3.211390e-03 3.254730e-03 3.956774e-03 6.980679e-03 
##    (630,635]    (635,640]    (640,645]    (645,650]    (650,655] 
## 4.362303e-03 4.659207e-03 5.408064e-03 4.891441e-03 5.729001e-03 
##    (655,660]    (695,700]    (700,705]    (705,710]    (710,715] 
## 7.355836e-03 5.496646e-03 6.568561e-03 5.351294e-03 5.084706e-03 
##    (715,720]    (720,725]    (725,730]    (730,735]    (735,740] 
## 4.331073e-03 4.384880e-03 6.137679e-03 4.320938e-03 4.527074e-03 
##    (740,745]    (745,750]    (750,755]    (755,760]    (795,800] 
## 4.989947e-03 4.483297e-03 4.988337e-03 4.933219e-03 3.157964e-03 
##    (800,805]    (805,810]    (810,815]    (815,820]    (820,825] 
## 5.384087e-03 5.092453e-03 5.316673e-03 4.994189e-03 5.224950e-03 
##    (825,830]    (830,835]    (835,840]    (840,845]    (845,850] 
## 6.833984e-03 5.303964e-03 5.355730e-03 5.577651e-03 5.299406e-03 
##    (850,855]    (855,860]    (895,900]    (900,905]    (905,910] 
## 5.802381e-03 4.989688e-03 2.671025e-03 5.518526e-03 5.236809e-03 
##    (910,915]    (915,920]    (920,925]    (925,930]    (930,935] 
## 5.236566e-03 5.048133e-03 5.201102e-03 6.356297e-03 4.922776e-03 
##    (935,940]    (940,945]    (945,950]    (950,955]    (955,960] 
## 4.727261e-03 4.686268e-03 4.391364e-03 4.710237e-03 4.017964e-03 
##   (995,1000]  (1000,1005]  (1005,1010]  (1010,1015]  (1015,1020] 
## 2.232510e-03 4.968464e-03 4.925925e-03 4.924104e-03 4.561907e-03 
##  (1020,1025]  (1025,1030]  (1030,1035]  (1035,1040]  (1040,1045] 
## 4.665497e-03 5.577514e-03 4.417688e-03 4.286260e-03 4.447615e-03 
##  (1045,1050]  (1050,1055]  (1055,1060]  (1095,1100]  (1100,1105] 
## 4.424731e-03 4.683654e-03 3.930871e-03 2.344802e-03 4.952209e-03 
##  (1105,1110]  (1110,1115]  (1115,1120]  (1120,1125]  (1125,1130] 
## 4.919206e-03 4.811213e-03 4.648716e-03 4.834372e-03 5.844815e-03 
##  (1130,1135]  (1135,1140]  (1140,1145]  (1145,1150]  (1150,1155] 
## 4.684091e-03 4.737225e-03 4.911897e-03 4.796375e-03 5.158612e-03 
##  (1155,1160]  (1195,1200]  (1200,1205]  (1205,1210]  (1210,1215] 
## 4.304287e-03 2.449630e-03 5.271738e-03 5.093918e-03 5.181092e-03 
##  (1215,1220]  (1220,1225]  (1225,1230]  (1230,1235]  (1235,1240] 
## 4.965274e-03 5.000366e-03 5.980962e-03 4.920056e-03 4.838282e-03 
##  (1240,1245]  (1245,1250]  (1250,1255]  (1255,1260]  (1295,1300] 
## 5.156321e-03 4.937922e-03 5.119223e-03 4.275704e-03 2.297876e-03 
##  (1300,1305]  (1305,1310]  (1310,1315]  (1315,1320]  (1320,1325] 
## 5.203255e-03 5.165371e-03 5.231264e-03 5.151432e-03 5.404243e-03 
##  (1325,1330]  (1330,1335]  (1335,1340]  (1340,1345]  (1345,1350] 
## 6.705138e-03 5.471107e-03 5.209294e-03 5.218466e-03 4.702126e-03 
##  (1350,1355]  (1355,1360]  (1395,1400]  (1400,1405]  (1405,1410] 
## 4.703510e-03 3.723237e-03 1.990351e-03 4.287685e-03 4.282893e-03 
##  (1410,1415]  (1415,1420]  (1420,1425]  (1425,1430]  (1430,1435] 
## 4.485758e-03 4.419162e-03 4.497917e-03 5.539654e-03 4.757171e-03 
##  (1435,1440]  (1440,1445]  (1445,1450]  (1450,1455]  (1455,1460] 
## 4.695982e-03 4.977425e-03 4.785034e-03 4.790206e-03 3.968010e-03 
##  (1495,1500]  (1500,1505]  (1505,1510]  (1510,1515]  (1515,1520] 
## 2.215251e-03 4.823476e-03 4.776761e-03 5.044952e-03 4.793671e-03 
##  (1520,1525]  (1525,1530]  (1530,1535]  (1535,1540]  (1540,1545] 
## 4.817875e-03 5.763510e-03 4.773693e-03 4.674134e-03 4.783755e-03 
##  (1545,1550]  (1550,1555]  (1555,1560]  (1595,1600]  (1600,1605] 
## 4.536076e-03 4.772252e-03 4.084827e-03 2.346518e-03 5.020789e-03 
##  (1605,1610]  (1610,1615]  (1615,1620]  (1620,1625]  (1625,1630] 
## 4.800592e-03 4.848473e-03 4.514268e-03 4.291246e-03 5.342042e-03 
##  (1630,1635]  (1635,1640]  (1640,1645]  (1645,1650]  (1650,1655] 
## 4.415147e-03 4.348315e-03 4.742900e-03 4.892437e-03 5.221866e-03 
##  (1655,1660]  (1695,1700]  (1700,1705]  (1705,1710]  (1710,1715] 
## 4.370220e-03 2.651897e-03 5.440095e-03 5.377425e-03 5.498783e-03 
##  (1715,1720]  (1720,1725]  (1725,1730]  (1730,1735]  (1735,1740] 
## 5.404680e-03 5.349028e-03 6.336287e-03 5.212071e-03 5.046539e-03 
##  (1740,1745]  (1745,1750]  (1750,1755]  (1755,1760]  (1795,1800] 
## 5.218490e-03 5.003563e-03 5.188272e-03 4.082002e-03 2.217372e-03 
##  (1800,1805]  (1805,1810]  (1810,1815]  (1815,1820]  (1820,1825] 
## 4.900272e-03 4.629628e-03 4.581755e-03 4.269067e-03 4.258203e-03 
##  (1825,1830]  (1830,1835]  (1835,1840]  (1840,1845]  (1845,1850] 
## 5.293084e-03 4.672183e-03 4.982605e-03 5.314997e-03 5.388312e-03 
##  (1850,1855]  (1855,1860]  (1895,1900]  (1900,1905]  (1905,1910] 
## 5.817948e-03 4.417543e-03 2.194925e-03 5.222246e-03 4.988596e-03 
##  (1910,1915]  (1915,1920]  (1920,1925]  (1925,1930]  (1930,1935] 
## 4.750493e-03 4.410613e-03 4.312156e-03 5.035335e-03 4.183180e-03 
##  (1935,1940]  (1940,1945]  (1945,1950]  (1950,1955]  (1955,1960] 
## 4.030041e-03 3.962295e-03 3.834493e-03 4.130369e-03 3.165160e-03 
##  (1995,2000]  (2000,2005]  (2005,2010]  (2010,2015]  (2015,2020] 
## 1.771280e-03 3.963971e-03 3.962918e-03 4.087588e-03 3.909144e-03 
##  (2020,2025]  (2025,2030]  (2030,2035]  (2035,2040]  (2040,2045] 
## 3.912738e-03 4.630098e-03 3.854868e-03 3.849574e-03 3.947708e-03 
##  (2045,2050]  (2050,2055]  (2055,2060]  (2095,2100]  (2100,2105] 
## 3.709792e-03 3.645729e-03 2.611495e-03 1.470968e-03 3.194707e-03 
##  (2105,2110]  (2110,2115]  (2115,2120]  (2120,2125]  (2125,2130] 
## 3.104659e-03 3.090097e-03 2.799815e-03 2.656147e-03 2.998107e-03 
##  (2130,2135]  (2135,2140]  (2140,2145]  (2145,2150]  (2150,2155] 
## 2.542422e-03 2.505542e-03 2.556993e-03 2.413697e-03 2.438386e-03 
##  (2155,2160]  (2195,2200]  (2200,2205]  (2205,2210]  (2210,2215] 
## 1.828535e-03 8.990815e-04 2.032291e-03 1.926127e-03 1.821249e-03 
##  (2215,2220]  (2220,2225]  (2225,2230]  (2230,2235]  (2235,2240] 
## 1.664387e-03 1.571021e-03 1.630915e-03 1.239244e-03 1.130117e-03 
##  (2240,2245]  (2245,2250]  (2250,2255]  (2255,2260]  (2295,2300] 
## 1.142381e-03 1.097989e-03 1.034824e-03 7.899140e-04 3.564092e-04 
##  (2300,2305]  (2305,2310]  (2310,2315]  (2315,2320]  (2320,2325] 
## 8.336344e-04 7.940667e-04 7.482011e-04 6.230948e-04 6.105397e-04 
##  (2325,2330]  (2330,2335]  (2335,2340]  (2340,2345]  (2345,2350] 
## 6.928079e-04 5.528313e-04 5.704053e-04 6.425711e-04 6.135186e-04 
##  (2350,2355]  (2355,2360]  (2395,2400]  (2400,2405]  (2405,2410] 
## 6.215811e-04 4.266646e-04 9.898412e-05 3.213665e-06 3.148906e-06 
##  (2410,2415]  (2415,2420]  (2420,2425]  (2425,2430]  (2430,2435] 
## 2.460842e-06 2.379893e-06 1.659449e-06 1.934675e-06 1.546121e-06 
##  (2435,2440]  (2440,2445]  (2445,2450]  (2450,2455]  (2455,2460] 
## 1.359939e-06 1.157567e-06 1.133282e-06 9.228156e-07 3.804591e-07 
##  (2495,2500]  (2500,2505]  (2505,2510]  (2510,2515]  (2515,2520] 
## 4.452181e-07 6.556848e-07 7.366335e-07 6.475899e-07 4.937873e-07 
##  (2520,2525]  (2525,2530]  (2530,2535]  (2535,2540]  (2540,2545] 
## 3.561745e-07 4.128386e-07 3.399847e-07 3.804591e-07 2.833206e-07 
##  (2545,2550]  (2550,2555]  (2555,2560]  (2565,2570]  (2595,2600] 
## 2.671308e-07 2.752257e-07 1.052334e-07 8.094874e-09 1.376129e-07 
##  (2600,2605]  (2605,2610]  (2610,2615]  (2615,2620]  (2620,2625] 
## 6.475899e-08 1.214231e-07 1.214231e-07 8.094874e-08 9.713849e-08 
##  (2625,2630]  (2630,2635]  (2635,2640]  (2640,2645]  (2645,2650] 
## 1.942770e-07 8.094874e-08 6.475899e-08 4.047437e-08 4.856924e-08 
##  (2650,2655]  (2655,2660]  (2665,2670]  (2695,2700]  (2700,2705] 
## 1.618975e-08 8.094874e-09 8.094874e-09 1.618975e-08 8.094874e-09 
##  (2705,2710]  (2715,2720]  (2720,2725]  (2725,2730]  (2730,2735] 
## 4.047437e-08 8.094874e-09 2.428462e-08 8.094874e-09 8.094874e-09 
##  (2745,2750]  (2750,2755]  (2800,2805]  (2925,2930] 
## 8.094874e-09 1.618975e-08 8.094874e-09 8.094874e-09

From this point of view, the delays do not look so bad.

2b. The delay, plotted in boxplots, according to the day of the week, is given as follows. (As I suggested, I am only plotting every 1000th value, so that my plot renders in a reasonable time in R.)

boxplot(bigDF$DepDelay[seq(1,length(bigDF$DepDelay),by=1000)]
        ~ bigDF$DayOfWeek[seq(1,length(bigDF$DayOfWeek),by=1000)],
        xlab="Day of Week", ylab="Delay Time (in Minutes)",
        names=c("Mon","Tue","Wed","Thu","Fri","Sat","Sun"))
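The subsampling can also be written with a single index vector that is reused for both columns, which makes it harder for the two subsets to fall out of step. The sketch below is an assumption-laden toy: delay and day are made-up stand-ins for bigDF$DepDelay and bigDF$DayOfWeek, and plot = FALSE is used so that the statistics are returned without drawing anything.

```r
# Toy stand-ins for bigDF$DepDelay and bigDF$DayOfWeek (made-up values).
n <- 21000
delay <- (seq_len(n) %% 60) - 10
day <- rep(1:7, length.out = n)

# Build the subsample index once and reuse it for both columns.
idx <- seq(1, n, by = 1000)
sub <- data.frame(delay = delay[idx], day = day[idx])

# plot = FALSE returns the boxplot statistics without opening a plot.
b <- boxplot(delay ~ day, data = sub, plot = FALSE)

# Number of sampled flights in each day-of-week group.
print(b$n)
```

Dropping plot = FALSE and adding the xlab, ylab, and names arguments would draw the labelled figure from the same subsample.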

Question 3

3a. We first extract the departure delays that are bigger than 0, along with the corresponding months and years.

mydelays <- bigDF$DepDelay[bigDF$DepDelay > 0]
mymonths <- bigDF$Month[bigDF$DepDelay > 0]
myyears <- bigDF$Year[bigDF$DepDelay > 0]

Then we make the required table by splitting the delays up, according to the years and months, and taking the length of each group.

tapply(mydelays, list(myyears,mymonths), length)
##           1      2      3      4      5      6      7      8      9     10
## 1987     NA     NA     NA     NA     NA     NA     NA     NA     NA 175568
## 1988 198610 177939 187141 159216 164107 165596 174844 175591 138322 162211
## 1989 178161 181324 204720 157890 170654 201395 187426 203535 154504 173312
## 1990 192521 184949 205043 174695 174125 188768 186310 210335 160184 182707
## 1991 203500 160719 185433 178541 177146 178767 184046 191689 146535 174082
## 1992 178973 168341 193367 163937 174097 207596 217399 220465 171697 171109
## 1993 202742 185820 214420 185850 167812 189905 187657 193809 162196 181722
## 1994 215791 184811 195139 181870 170538 201658 222409 207187 170182 190085
## 1995 239816 199741 225147 205346 207197 229587 219451 226650 170373 208243
## 1996 242239 217413 226970 195384 202024 227273 222170 228420 177621 204564
## 1997 230529 189105 212869 189348 178435 208492 205474 205627 141879 172938
## 1998 183744 169230 202308 183799 190386 223474 203528 195213 141190 163813
## 1999 203215 153464 184895 181376 176281 209395 217703 195486 149983 175296
## 2000 175940 172394 192111 187660 195957 238751 229929 229822 164704 189149
## 2001 203813 191111 217148 187364 175053 212220 210760 213701 127047 166568
## 2002 150187 130155 176699 142233 143028 171039 174338 156760  99776 128181
## 2003 151238 158369 152156 125699 136551 163497 183491 178979 113916 131409
## 2004 198818 183658 183273 170114 191604 238074 237670 215667 147508 193951
## 2005 229809 184920 226883 169221 178327 236724 268988 240410 165541 186778
## 2006 197789 198371 235207 212412 218097 263900 281457 254405 209985 248878
## 2007 255777 259288 276261 249097 241699 307986 307864 298530 195615 231129
## 2008 247948 252765 271969 220864 220614 271014 253632 231349 147061 162531
##          11     12
## 1987 177218 218858
## 1988 175123 189137
## 1989 176805 213745
## 1990 173768 218597
## 1991 167768 203388
## 1992 185240 224848
## 1993 188021 208364
## 1994 213819 240057
## 1995 218360 252739
## 1996 188032 253297
## 1997 176547 216099
## 1998 148623 210366
## 1999 164648 189462
## 2000 207426 245640
## 2001 152375 177983
## 2002 117199 165138
## 2003 157157 206743
## 2004 197560 254786
## 2005 193399 256861
## 2006 230224 274930
## 2007 217557 304011
## 2008 157278 263949
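The two-way tapply can be illustrated on a handful of made-up rows (the toy data frame is hypothetical and only mimics the Year, Month, and DepDelay columns). The sketch also filters out NA delays explicitly, which is a small variation on the code above. Year and month combinations with no positive delay appear as NA, just like the early months of 1987 in the table.

```r
# Made-up flights: year, month, and departure delay in minutes.
toy <- data.frame(Year     = c(2007, 2007, 2007, 2008, 2008, 2008),
                  Month    = c(1, 1, 2, 1, 2, 2),
                  DepDelay = c(5, -3, 10, NA, 7, 20))

# Keep the rows with a known, strictly positive delay.
keep <- !is.na(toy$DepDelay) & toy$DepDelay > 0

# Rows of the result are years, columns are months, cells are counts.
counts <- tapply(toy$DepDelay[keep],
                 list(toy$Year[keep], toy$Month[keep]),
                 length)
print(counts)
```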

3b. We first extract the carriers for the flights with positive DepDelays.

mycarriers <- bigDF$UniqueCarrier[bigDF$DepDelay > 0]

Then we find the 5 carriers with the most delays, using table:

sort(table(mycarriers),decreasing=TRUE)[1:5]
## mycarriers
##      DL      US      WN      AA      UA 
## 8064705 6771312 6264617 6064229 6005036

Question 4

4a. We read in the data to a data frame called airports.

airports <- read.csv("http://stat-computing.org/dataexpo/2009/airports.csv")

4b. First we count the flights into and out of each airport, and then we use the IATA codes from the airports data frame as an index into these tables, to select the airports that we want. I am using airports$iata as a character vector, to ensure that the counts stay in the right order.

airports$freq <- table(bigDF$Dest)[as.character(airports$iata)] + table(bigDF$Origin)[as.character(airports$iata)]

If you want to check and see that the results look promising, for instance, you could check the lines for Los Angeles and Chicago:

airports[airports$iata %in% c("LAX","ORD") , ]
##      iata                      airport        city state country      lat
## 2040  LAX    Los Angeles International Los Angeles    CA     USA 33.94254
## 2532  ORD Chicago O'Hare International     Chicago    IL     USA 41.97960
##            long     freq
## 2040 -118.40807  8175942
## 2532  -87.90446 13235477
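One thing to watch with name-based indexing into a table is that a code missing from either table yields NA, and NA plus a number is NA. The sketch below is a variation, not the approach used above: it counts arrivals and departures together by converting the codes to a factor whose levels are the airport list, so that absent airports get 0. All values are made up, and "XYZ" is a hypothetical code with no flights.

```r
# Made-up destination and origin codes for six flights.
dest   <- c("LAX", "ORD", "LAX", "SFO", "ORD", "LAX")
origin <- c("ORD", "LAX", "SFO", "LAX", "LAX", "IND")

# Airport list to report on; "XYZ" is hypothetical and never appears above.
iata <- c("IND", "LAX", "ORD", "XYZ")

# Name-based indexing: IND never appears as a destination, so it becomes NA.
by_name <- table(dest)[iata] + table(origin)[iata]

# Factor-based counting: every code in iata gets a count, possibly 0.
total <- as.vector(table(factor(c(dest, origin), levels = iata)))
names(total) <- iata
print(total)
```

Whether NA or 0 is the better answer for an airport with no recorded flights is a judgment call; the factor version simply makes the choice explicit.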

4c. We paste together the departure to arrival paths.

v <- paste(bigDF$Origin, "to", bigDF$Dest, sep="")

Then we tabulate the results, sort them, and select the most popular 5 of them.

sort(table(v),decreasing=TRUE)[1:5]
## v
## SFOtoLAX LAXtoSFO LAXtoLAS LAStoLAX PHXtoLAX 
##   338472   336938   292125   286328   279716

4d. We again use the vector of departure to arrival paths above. Then we split the paths according to the year, and within each year, we tabulate the results and take the largest value within the year. Then we get the name of this largest value:

tapply( v, bigDF$Year, function(x) names(sort(table(x),decreasing=TRUE)[1]) )
##       1987       1988       1989       1990       1991       1992 
## "SFOtoLAX" "LAXtoSFO" "LAXtoSFO" "SFOtoLAX" "SFOtoLAX" "LAXtoSFO" 
##       1993       1994       1995       1996       1997       1998 
## "LAXtoSFO" "SFOtoLAX" "LAXtoLAS" "LAXtoLAS" "LAXtoSFO" "LAXtoLAS" 
##       1999       2000       2001       2002       2003       2004 
## "LAXtoSFO" "LAXtoLAS" "LAXtoLAS" "LAXtoLAS" "SANtoLAX" "SANtoLAX" 
##       2005       2006       2007       2008 
## "SANtoLAX" "SANtoLAX" "OGGtoHNL" "SFOtoLAX"
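The split, tabulate, and pick-the-top pattern can be checked by hand on a few made-up paths (route and year are hypothetical stand-ins for v and bigDF$Year). The counts are chosen so that each year has a clear winner; with ties, which path is reported depends on the sort order.

```r
# Made-up paths and the years in which they were flown.
year  <- c(2007, 2007, 2007, 2008, 2008, 2008, 2008)
route <- c("SFOtoLAX", "SFOtoLAX", "LAXtoSFO",
           "LAXtoLAS", "LAXtoLAS", "SFOtoLAX", "LAXtoLAS")

# Within each year: tabulate the paths, sort by count, keep the top name.
top <- tapply(route, year,
              function(x) names(sort(table(x), decreasing = TRUE))[1])
print(top)
```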

Question 5

5a. We read in the data to a data frame called planes.

planes <- read.csv("http://stat-computing.org/dataexpo/2009/plane-data.csv")

5b. First we find the number of miles for each TailNum, and then we use the manufacturer data from the planes data frame as an index into this table, to select the TailNums that we want.

mymiles <- tapply(bigDF$Distance, bigDF$TailNum, sum, na.rm=TRUE)

Then we break these miles into groups, according to the manufacturer, and we sum the miles in each group (i.e., within each manufacturer). Finally we extract the top 10 of these manufacturers, according to the most miles flown altogether.

mymanufacturersmiles <- tapply(as.numeric(mymiles[as.character(planes$tailnum)]),planes$manufacturer,sum, na.rm=TRUE)
sort(mymanufacturersmiles,decreasing=TRUE)[1:10]
##                        BOEING              AIRBUS INDUSTRIE 
##                   17601082510                    4773100539 
##             MCDONNELL DOUGLAS                               
##                    3627696769                    3523114053 
##                BOMBARDIER INC                       EMBRAER 
##                    2333926590                    2237565666 
## MCDONNELL DOUGLAS AIRCRAFT CO                        AIRBUS 
##                    1418210205                    1388680768 
##                      CANADAIR                       DOUGLAS 
##                     353418163                     333387468
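The two-stage aggregation (miles per tail number, then miles per manufacturer) can be sketched on toy data. The flights and planes_toy data frames below are made up and only mimic the relevant columns. Tail number "N9" flies but has no manufacturer record, and "N4" has a record but never flies, which shows how unmatched entries are handled.

```r
# Made-up flights: tail number and distance flown (one distance is missing).
flights <- data.frame(TailNum  = c("N1", "N1", "N2", "N3", "N3", "N9"),
                      Distance = c(100, 200, 50, NA, 400, 70))

# Made-up plane registry: tail number and manufacturer.
planes_toy <- data.frame(tailnum      = c("N1", "N2", "N3", "N4"),
                         manufacturer = c("BOEING", "AIRBUS", "BOEING", "EMBRAER"))

# Stage 1: total miles for each tail number.
miles_by_tail <- tapply(flights$Distance, flights$TailNum, sum, na.rm = TRUE)

# Stage 2: look up each registered plane's miles (NA if it never flew),
# then sum within manufacturer, ignoring the NAs.
miles_by_maker <- tapply(as.numeric(miles_by_tail[as.character(planes_toy$tailnum)]),
                         planes_toy$manufacturer, sum, na.rm = TRUE)

ranked <- sort(miles_by_maker, decreasing = TRUE)
print(ranked)
```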

5c. The planes that flew over 10000 miles in 2008 have these TailNum’s:

mymiles2008 <- tapply(bigDF$Distance[bigDF$Year == 2008],
                      as.character(bigDF$TailNum)[bigDF$Year == 2008], sum, na.rm=TRUE)
mylong2008miles <- mymiles2008[mymiles2008 > 10000]

The number of such planes is:

length(mylong2008miles)
## [1] 5333

In fact, this is almost all of the planes that flew in 2008, because the total number of planes from 2008 is:

length(mymiles2008)
## [1] 5374

Now we make a vector of the plane issue dates, with the tailnums as the names of the vector, so that we can extract the elements of this vector that appear in mylong2008miles.

mydates <- as.character(planes$issue_date)
names(mydates) <- as.character(planes$tailnum)

The oldest such plane was issued on this date:

min(strptime(mydates[names(mylong2008miles)], "%m/%d/%Y"),na.rm=TRUE)
## [1] "1976-01-09 EST"
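The named-vector lookup can be sketched on toy data as follows. All tail numbers, mileages, and dates are made up, and the "None" entry is a hypothetical placeholder for an unparseable date. The sketch swaps in as.Date for strptime, which avoids time zones since only the calendar date matters here.

```r
# Made-up issue dates, named by tail number; one entry is not a date.
issue <- c(N1 = "01/09/1976", N2 = "12/31/1999", N3 = "None", N4 = "06/15/1985")

# Made-up miles flown in one year, named by tail number.
miles <- c(N1 = 25000, N2 = 9000, N3 = 40000, N4 = 12000)

# Planes that flew more than 10000 miles.
long_haul <- miles[miles > 10000]

# Look up their issue dates by name; unparseable strings become NA.
dates <- as.Date(issue[names(long_haul)], format = "%m/%d/%Y")

# Earliest issue date among those planes, ignoring the NA.
oldest <- min(dates, na.rm = TRUE)
print(oldest)
```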

5d. We again use the number of miles for each TailNum, from 5b above, stored in mymiles. Then we break these miles into groups, according to the “type”, and we sum the miles in each group (i.e., within each “type”). Finally we sort these “types”, according to the most miles flown altogether.

mytypesmiles <- tapply(as.numeric(mymiles[as.character(planes$tailnum)]),planes$type,sum, na.rm=TRUE)
sort(mytypesmiles,decreasing=TRUE)
##         Corporation                              Individual 
##         34483365326          3523114053           280267907 
## Foreign Corporation            Co-Owner         Partnership 
##            73042583            55112463            15614504

Question 6

6a. The number of airports per state is:

table(airports$state)
## 
##  AK  AL  AR  AS  AZ  CA  CO  CQ  CT  DC  DE  FL  GA  GU  HI  IA  ID  IL 
## 263  73  74   3  59 205  49   4  15   1   5 100  97   1  16  78  37  88 
##  IN  KS  KY  LA  MA  MD  ME  MI  MN  MO  MS  MT  NC  ND  NE  NH  NJ  NM 
##  65  78  50  55  30  18  34  94  89  74  72  71  72  52  73  14  35  51 
##  NV  NY  OH  OK  OR  PA  PR  RI  SC  SD  TN  TX  UT  VA  VI  VT  WA  WI 
##  32  97 100 102  57  71  11   6  52  57  70 209  35  47   5  13  65  84 
##  WV  WY 
##  24  32

6b. First we get the iata codes from Indiana airports.

indiana <- airports$iata[airports$state == "IN"]

Then we go into the bigDF, and look at the flights for which the airport code is in Indiana. There are actually only 4 such airports! These are also listed on Wikipedia’s list http://en.wikipedia.org/wiki/List_of_airports_in_Indiana

namely, they are IND, FWA, SBN, EVV:

sort(table(bigDF$Dest)[as.character(indiana)],decreasing=TRUE)[1:5]
## 
##    IND    FWA    SBN    EVV   <NA> 
## 821734  78590  65110  48571     NA

6c. The airports in the Midwest are:

midwest <- airports$iata[airports$state %in% c("IL", "IN", "MI", "OH", "WI")]

Then we use the departure-to-arrival paths from vector v, created back in question 4c. We limit attention to those paths with Origin and Destination in the midwest. Then we make a table of the relevant paths, sort the table, and extract the top 5 paths.

sort(table(v[(bigDF$Origin %in% midwest) & (bigDF$Dest %in% midwest)]),decreasing=TRUE)[1:5]
## 
## ORDtoDTW DTWtoORD CLEtoORD ORDtoCLE DTWtoMDW 
##   159849   156932   109296   109211   103333
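
A minimal sketch of the same filtering on made-up flights follows. It assumes the path labels were built by pasting the origin and destination together with "to" in between, which is what names such as ORDtoDTW suggest; the toy data frame and the two-airport "Midwest" are invented for illustration.

```r
# Hypothetical flights; the real data live in bigDF
toy_flights <- data.frame(
  Origin = c("ORD", "ORD", "DTW", "ORD", "LAX"),
  Dest   = c("DTW", "DTW", "ORD", "LAX", "ORD"),
  stringsAsFactors = FALSE
)

# Assumed construction of the path labels (inferred from names like "ORDtoDTW")
toy_paths <- paste(toy_flights$Origin, "to", toy_flights$Dest, sep = "")

# Hypothetical set of Midwest airport codes
toy_midwest <- c("ORD", "DTW")

# Keep a path only when both endpoints are in the Midwest
both_in <- (toy_flights$Origin %in% toy_midwest) & (toy_flights$Dest %in% toy_midwest)
toy_top <- sort(table(toy_paths[both_in]), decreasing = TRUE)
print(toy_top)
```

The logical vector both_in has one entry per flight, so it lines up with the path vector only if both were built from the same rows in the same order.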

Question 7

We proceed as in 4c, but we save the results:

myresults <- sort(table(v),decreasing=TRUE)[1:5]

Then we use mapply to print the results:

mapply(paste, "The number ", 1:5, " departure-to-arrival path in the USA is ", names(myresults),
             " with ", myresults, " flights altogether.", sep="", USE.NAMES=FALSE)
## [1] "The number 1 departure-to-arrival path in the USA is SFOtoLAX with 338472 flights altogether."
## [2] "The number 2 departure-to-arrival path in the USA is LAXtoSFO with 336938 flights altogether."
## [3] "The number 3 departure-to-arrival path in the USA is LAXtoLAS with 292125 flights altogether."
## [4] "The number 4 departure-to-arrival path in the USA is LAStoLAX with 286328 flights altogether."
## [5] "The number 5 departure-to-arrival path in the USA is PHXtoLAX with 279716 flights altogether."
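
Since paste is itself vectorized, the same strings can also be produced without mapply. The sketch below uses invented path names and counts in place of myresults, and checks that the two approaches agree.

```r
# Hypothetical counts standing in for myresults
toy_results <- c(AAAtoBBB = 30L, BBBtoAAA = 20L)

# The mapply approach, as in the solution above
via_mapply <- mapply(paste, "The number ", seq_along(toy_results),
                     " path is ", names(toy_results),
                     " with ", toy_results, " flights.",
                     sep = "", USE.NAMES = FALSE)

# paste0 recycles the fixed strings across the vector arguments
via_paste0 <- paste0("The number ", seq_along(toy_results),
                     " path is ", names(toy_results),
                     " with ", toy_results, " flights.")

print(via_paste0)
print(all(via_mapply == via_paste0))
```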

Question 8

8a. We group the origin airports according to the airlines. Within each airline, we tabulate the origins, sort the counts in decreasing order, and take the name of the first (most frequent) airport.

tapply(bigDF$Origin, bigDF$UniqueCarrier, function(x) names(sort(table(x),decreasing=TRUE)[1]))
##     9E     AA     AQ     AS     B6     CO     DH     DL     EA     EV 
##  "DTW"  "DFW"  "HNL"  "SEA"  "JFK"  "IAH"  "IAD"  "ATL"  "ATL"  "ATL" 
##     F9     FL     HA     HP ML (1)     MQ     NW     OH     OO PA (1) 
##  "DEN"  "ATL"  "HNL"  "PHX"  "MDW"  "DFW"  "DTW"  "CVG"  "SLC"  "MIA" 
##     PI     PS     TW     TZ     UA     US     WN     XE     YV 
##  "CLT"  "LAX"  "STL"  "MDW"  "ORD"  "CLT"  "PHX"  "IAH"  "PHX"
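
The "most frequent value within each group" idea is easier to follow on a few rows. The carriers and airports below are a made-up toy sample, and most_common is a helper name introduced only for this sketch.

```r
# Hypothetical flights: carrier AA leaves DFW twice and ORD once
toy <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "DL", "DL"),
  Origin        = c("DFW", "DFW", "ORD", "ATL", "ATL"),
  stringsAsFactors = FALSE
)

# Helper (hypothetical name): the label of the largest count in a table
most_common <- function(x) names(sort(table(x), decreasing = TRUE)[1])

# Apply the helper to the origins of each carrier
toy_hubs <- tapply(toy$Origin, toy$UniqueCarrier, most_common)
print(toy_hubs)
```

The same helper works unchanged for 8b through 8d; only the two vectors passed to tapply change.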

8b. Same thing, but with destinations.

tapply(bigDF$Dest, bigDF$UniqueCarrier, function(x) names(sort(table(x),decreasing=TRUE)[1]))
##     9E     AA     AQ     AS     B6     CO     DH     DL     EA     EV 
##  "DTW"  "DFW"  "HNL"  "SEA"  "JFK"  "IAH"  "IAD"  "ATL"  "ATL"  "ATL" 
##     F9     FL     HA     HP ML (1)     MQ     NW     OH     OO PA (1) 
##  "DEN"  "ATL"  "HNL"  "PHX"  "MDW"  "DFW"  "DTW"  "CVG"  "SLC"  "MIA" 
##     PI     PS     TW     TZ     UA     US     WN     XE     YV 
##  "CLT"  "LAX"  "STL"  "MDW"  "ORD"  "CLT"  "PHX"  "IAH"  "PHX"

8c. Same thing as 8a, but interchanging the roles of the origin and the airline: for each origin, we find the airline with the most flights.

tapply(bigDF$UniqueCarrier, bigDF$Origin, function(x) names(sort(table(x),decreasing=TRUE)[1]))
##  ABE  ABI  ABQ  ABY  ACK  ACT  ACV  ACY  ADK  ADQ  AEX  AGS  AKN  ALB  ALO 
## "US" "MQ" "WN" "EV" "XE" "MQ" "OO" "US" "AS" "AS" "EV" "DL" "AS" "US" "9E" 
##  AMA  ANC  ANI  APF  ASE  ATL  ATW  AUS  AVL  AVP  AZO  BDL  BET  BFF  BFI 
## "WN" "AS" "AS" "EV" "OO" "DL" "OO" "WN" "US" "US" "NW" "US" "AS" "OO" "CO" 
##  BFL  BGM  BGR  BHM  BIL  BIS  BJI  BLI  BMI  BNA  BOI  BOS  BPT  BQK  BQN 
## "OO" "US" "MQ" "WN" "NW" "NW" "9E" "US" "MQ" "WN" "WN" "US" "XE" "EV" "B6" 
##  BRO  BRW  BTM  BTR  BTV  BUF  BUR  BWI  BZN  CAE  CAK  CCR  CDC  CDV  CEC 
## "XE" "AS" "OO" "DL" "US" "US" "WN" "WN" "DL" "DL" "FL" "US" "OO" "AS" "OO" 
##  CHA  CHO  CHS  CIC  CID  CKB  CLD  CLE  CLL  CLT  CMH  CMI  CMX  COD  COS 
## "DL" "EV" "DL" "OO" "TW" "OH" "OO" "CO" "MQ" "US" "US" "MQ" "9E" "OO" "UA" 
##  CPR  CRP  CRW  CSG  CVG  CWA  CYS  DAB  DAL  DAY  DBQ  DCA  DEN  DET  DFW 
## "OO" "WN" "US" "EV" "DL" "OO" "OO" "DL" "WN" "US" "MQ" "US" "UA" "WN" "AA" 
##  DHN  DLG  DLH  DRO  DSM  DTW  DUT  EAU  EFD  EGE  EKO  ELM  ELP  ERI  EUG 
## "EV" "AS" "NW" "YV" "UA" "NW" "AS" "NW" "XE" "AA" "OO" "US" "WN" "US" "UA" 
##  EVV  EWN  EWR  EYW  FAI  FAR  FAT  FAY  FCA  FLG  FLL  FLO  FMN  FNT  FOE 
## "MQ" "EV" "CO" "EV" "AS" "NW" "OO" "US" "DL" "HP" "DL" "EV" "OO" "NW" "UA" 
##  FSD  FSM  FWA  GCC  GCN  GEG  GFK  GGG  GJT  GLH  GNV  GPT  GRB  GRK  GRR 
## "NW" "MQ" "MQ" "YV" "HP" "WN" "NW" "MQ" "OO" "9E" "EV" "EV" "NW" "MQ" "NW" 
##  GSO  GSP  GST  GTF  GTR  GUC  GUM  HDN  HHH  HKY  HLN  HNL  HOU  HPN  HRL 
## "US" "DL" "AS" "NW" "EV" "YV" "CO" "AA" "EV" "EV" "DL" "HA" "WN" "UA" "WN" 
##  HSV  HTS  HVN  IAD  IAH  ICT  IDA  ILE  ILG  ILM  IND  INL  IPL  ISO  ISP 
## "DL" "US" "UA" "UA" "CO" "UA" "OO" "MQ" "EV" "US" "US" "9E" "OO" "PI" "WN" 
##  ITH  ITO  IYK  JAC  JAN  JAX  JFK  JNU  KOA  KSM  KTN  LAN  LAS  LAW  LAX 
## "US" "HA" "OO" "DL" "DL" "US" "B6" "AS" "HA" "AS" "AS" "NW" "WN" "MQ" "UA" 
##  LBB  LCH  LEX  LFT  LGA  LGB  LIH  LIT  LMT  LNK  LNY  LRD  LSE  LWB  LWS 
## "WN" "XE" "DL" "XE" "US" "B6" "HA" "WN" "OO" "UA" "HA" "MQ" "NW" "EV" "OO" 
##  LYH  MAF  MAZ  MBS  MCI  MCN  MCO  MDT  MDW  MEI  MEM  MFE  MFR  MGM  MHT 
## "EV" "WN" "MQ" "NW" "WN" "EV" "DL" "US" "WN" "EV" "NW" "CO" "OO" "DL" "WN" 
##  MIA  MIB  MKC  MKE  MKG  MKK  MLB  MLI  MLU  MOB  MOD  MOT  MQT  MRY  MSN 
## "AA" "NW" "OO" "NW" "OO" "HA" "DL" "TW" "DL" "DL" "OO" "NW" "MQ" "OO" "NW" 
##  MSO  MSP  MSY  MTH  MTJ  MYR  OAJ  OAK  OGD  OGG  OKC  OMA  OME  ONT  ORD 
## "OO" "NW" "WN" "EV" "OO" "US" "US" "WN" "OO" "HA" "WN" "UA" "AS" "WN" "UA" 
##  ORF  ORH  OTH  OTZ  OXR  PBI  PDX  PFN  PHF  PHL  PHX  PIA  PIE  PIH  PIR 
## "US" "US" "OO" "AS" "OO" "DL" "AS" "EV" "FL" "US" "HP" "MQ" "TZ" "OO" "9E" 
##  PIT  PLN  PMD  PNS  PSC  PSE  PSG  PSP  PUB  PVD  PVU  PWM  RAP  RDD  RDM 
## "US" "9E" "HP" "DL" "DL" "B6" "AS" "OO" "HP" "US" "EV" "DL" "NW" "OO" "OO" 
##  RDR  RDU  RFD  RHI  RIC  RKS  RNO  ROA  ROC  ROP  ROR  ROW  RST  RSW  SAN 
## "NW" "AA" "OO" "9E" "US" "YV" "WN" "US" "US" "CO" "CO" "MQ" "NW" "DL" "WN" 
##  SAT  SAV  SBA  SBN  SBP  SCC  SCE  SCK  SDF  SEA  SFO  SGF  SGU  SHV  SIT 
## "WN" "DL" "OO" "NW" "OO" "AS" "OH" "US" "WN" "AS" "UA" "MQ" "OO" "DL" "AS" 
##  SJC  SJT  SJU  SLC  SLE  SMF  SMX  SNA  SOP  SPI  SPN  SPS  SRQ  STL  STT 
## "WN" "MQ" "AA" "DL" "OO" "WN" "OO" "AA" "EV" "OO" "CO" "MQ" "DL" "TW" "AA" 
##  STX  SUN  SUX  SWF  SYR  TEX  TLH  TOL  TPA  TRI  TTN  TUL  TUP  TUS  TVC 
## "AA" "OO" "UA" "AA" "US" "YV" "DL" "US" "US" "US" "OH" "WN" "EV" "AA" "NW" 
##  TVL  TWF  TXK  TYR  TYS  UCA  VCT  VIS  VLD  VPS  WRG  WYS  XNA  YAK  YAP 
## "AA" "OO" "MQ" "MQ" "DL" "US" "OO" "OO" "EV" "NW" "AS" "OO" "MQ" "AS" "CO" 
##  YKM  YUM 
## "US" "OO"

8d. Same thing as 8b, but interchanging the roles of the destination and the airline: for each destination, we find the airline with the most flights.

tapply(bigDF$UniqueCarrier, bigDF$Dest, function(x) names(sort(table(x),decreasing=TRUE)[1]))
##  ABE  ABI  ABQ  ABY  ACK  ACT  ACV  ACY  ADK  ADQ  AEX  AGS  AKN  ALB  ALO 
## "US" "MQ" "WN" "EV" "XE" "MQ" "OO" "US" "AS" "AS" "EV" "DL" "AS" "US" "9E" 
##  AMA  ANC  ANI  APF  ASE  ATL  ATW  AUS  AVL  AVP  AZO  BDL  BET  BFF  BFI 
## "WN" "AS" "AS" "EV" "OO" "DL" "OO" "WN" "US" "US" "NW" "US" "AS" "OO" "AS" 
##  BFL  BGM  BGR  BHM  BIL  BIS  BJI  BLI  BMI  BNA  BOI  BOS  BPT  BQK  BQN 
## "OO" "US" "MQ" "WN" "NW" "NW" "9E" "US" "MQ" "WN" "WN" "US" "XE" "EV" "B6" 
##  BRO  BRW  BTM  BTR  BTV  BUF  BUR  BWI  BZN  CAE  CAK  CBM  CCR  CDC  CDV 
## "XE" "AS" "OO" "DL" "US" "US" "WN" "WN" "DL" "DL" "FL" "EV" "US" "OO" "AS" 
##  CEC  CHA  CHO  CHS  CIC  CID  CKB  CLD  CLE  CLL  CLT  CMH  CMI  CMX  COD 
## "OO" "DL" "EV" "DL" "OO" "TW" "OH" "OO" "CO" "MQ" "US" "US" "MQ" "9E" "OO" 
##  COS  CPR  CRP  CRW  CSG  CVG  CWA  CYS  DAB  DAL  DAY  DBQ  DCA  DEN  DET 
## "UA" "OO" "WN" "US" "EV" "DL" "OO" "OO" "DL" "WN" "US" "MQ" "US" "UA" "WN" 
##  DFW  DHN  DLG  DLH  DRO  DSM  DTW  DUT  EAU  EFD  EGE  EKO  ELM  ELP  ERI 
## "AA" "EV" "AS" "NW" "YV" "UA" "NW" "AS" "NW" "XE" "AA" "OO" "US" "WN" "US" 
##  EUG  EVV  EWN  EWR  EYW  FAI  FAR  FAT  FAY  FCA  FLG  FLL  FLO  FMN  FNT 
## "UA" "MQ" "EV" "CO" "EV" "AS" "NW" "OO" "US" "DL" "HP" "DL" "EV" "OO" "NW" 
##  FOE  FSD  FSM  FWA  GCC  GCN  GEG  GFK  GGG  GJT  GLH  GNV  GPT  GRB  GRK 
## "UA" "NW" "MQ" "MQ" "YV" "HP" "WN" "NW" "MQ" "OO" "9E" "EV" "EV" "NW" "MQ" 
##  GRR  GSO  GSP  GST  GTF  GTR  GUC  GUM  HDN  HHH  HKY  HLN  HNL  HOU  HPN 
## "NW" "US" "DL" "AS" "NW" "EV" "YV" "CO" "AA" "EV" "EV" "DL" "HA" "WN" "UA" 
##  HRL  HSV  HTS  HVN  IAD  IAH  ICT  IDA  ILE  ILG  ILM  IND  INL  IPL  ISO 
## "WN" "DL" "US" "UA" "UA" "CO" "UA" "OO" "MQ" "EV" "US" "US" "9E" "OO" "PI" 
##  ISP  ITH  ITO  IYK  JAC  JAN  JAX  JFK  JNU  KOA  KSM  KTN  LAN  LAR  LAS 
## "WN" "US" "HA" "OO" "DL" "DL" "US" "B6" "AS" "HA" "AS" "AS" "NW" "OO" "WN" 
##  LAW  LAX  LBB  LBF  LCH  LEX  LFT  LGA  LGB  LIH  LIT  LMT  LNK  LNY  LRD 
## "MQ" "UA" "WN" "OO" "XE" "DL" "XE" "US" "B6" "HA" "WN" "OO" "UA" "HA" "MQ" 
##  LSE  LWB  LWS  LYH  MAF  MAZ  MBS  MCI  MCN  MCO  MDT  MDW  MEI  MEM  MFE 
## "NW" "EV" "OO" "EV" "WN" "MQ" "NW" "WN" "EV" "DL" "US" "WN" "EV" "NW" "CO" 
##  MFR  MGM  MHT  MIA  MIB  MKC  MKE  MKG  MKK  MLB  MLI  MLU  MOB  MOD  MOT 
## "OO" "DL" "WN" "AA" "NW" "AA" "NW" "OO" "HA" "DL" "TW" "DL" "DL" "OO" "NW" 
##  MQT  MRY  MSN  MSO  MSP  MSY  MTH  MTJ  MYR  OAJ  OAK  OGD  OGG  OKC  OMA 
## "MQ" "OO" "NW" "OO" "NW" "WN" "EV" "OO" "US" "US" "WN" "OO" "HA" "WN" "UA" 
##  OME  ONT  ORD  ORF  ORH  OTH  OTZ  OXR  PBI  PDX  PFN  PHF  PHL  PHX  PIA 
## "AS" "WN" "UA" "US" "US" "OO" "AS" "OO" "DL" "AS" "EV" "FL" "US" "HP" "MQ" 
##  PIE  PIH  PIR  PIT  PLN  PMD  PNS  PSC  PSE  PSG  PSP  PUB  PVD  PVU  PWM 
## "TZ" "OO" "9E" "US" "9E" "HP" "DL" "DL" "B6" "AS" "OO" "HP" "US" "OO" "DL" 
##  RAP  RCA  RDD  RDM  RDR  RDU  RFD  RHI  RIC  RKS  RNO  ROA  ROC  ROP  ROR 
## "NW" "OO" "OO" "OO" "NW" "AA" "OO" "9E" "US" "YV" "WN" "US" "US" "CO" "CO" 
##  ROW  RST  RSW  SAN  SAT  SAV  SBA  SBN  SBP  SCC  SCE  SCK  SDF  SEA  SFO 
## "MQ" "NW" "DL" "WN" "WN" "DL" "OO" "NW" "OO" "AS" "OH" "US" "WN" "AS" "UA" 
##  SGF  SGU  SHV  SIT  SJC  SJT  SJU  SKA  SLC  SLE  SMF  SMX  SNA  SOP  SPI 
## "MQ" "OO" "DL" "AS" "WN" "MQ" "AA" "OO" "DL" "OO" "WN" "OO" "AA" "EV" "OO" 
##  SPN  SPS  SRQ  STL  STT  STX  SUN  SUX  SWF  SYR  TEX  TLH  TOL  TPA  TRI 
## "CO" "MQ" "DL" "TW" "AA" "AA" "OO" "UA" "AA" "US" "YV" "DL" "US" "US" "US" 
##  TTN  TUL  TUP  TUS  TVC  TVL  TWF  TXK  TYR  TYS  UCA  VCT  VIS  VLD  VPS 
## "OH" "WN" "EV" "AA" "NW" "AA" "OO" "MQ" "MQ" "DL" "US" "OO" "OO" "EV" "NW" 
##  WRG  WYS  XNA  YAK  YAP  YKM  YUM 
## "AS" "OO" "MQ" "AS" "CO" "US" "OO"

Question 9

9a. We look at the departure delays, split into groups according to how long the flight will be, and we take an average within each group. We discover that the flights of at most 500 miles (the interval (0,500] includes 500) are delayed only 7.42 minutes, on average.

tapply(bigDF$DepDelay, cut(bigDF$Distance, breaks=seq(from=0,to=max(bigDF$Distance,na.rm=TRUE),by=500)), mean, na.rm=TRUE)
##         (0,500]     (500,1e+03] (1e+03,1.5e+03] (1.5e+03,2e+03] 
##        7.422427        8.621311        8.945828        9.405297 
## (2e+03,2.5e+03] (2.5e+03,3e+03] (3e+03,3.5e+03] (3.5e+03,4e+03] 
##        9.423052        8.891673       11.272760        8.145777 
## (4e+03,4.5e+03] 
##        9.727186
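
One detail of this construction is worth checking on a small example: seq stops at the last multiple of 500 that does not exceed the maximum, so any distance beyond that last break is assigned NA by cut and is left out of the averages. In the output above the last interval ends at 4500, so flights longer than 4500 miles, if there are any, would be excluded. The distances and delays below are invented, and extending the breaks is only one possible adjustment.

```r
# Hypothetical distances and delays
toy_dist  <- c(100, 500, 501, 1200, 1300)
toy_delay <- c(5, 7, 9, 11, 20)

# Same construction as above: breaks are 0, 500, 1000 because max is 1300
breaks_as_above <- seq(from = 0, to = max(toy_dist), by = 500)
print(breaks_as_above)

# Distances beyond the last break fall outside every interval
groups <- cut(toy_dist, breaks = breaks_as_above)
print(sum(is.na(groups)))

# Possible adjustment: extend the breaks to the next multiple of 500
breaks_extended <- seq(from = 0, to = 500 * ceiling(max(toy_dist) / 500), by = 500)
toy_means <- tapply(toy_delay, cut(toy_dist, breaks = breaks_extended), mean)
print(toy_means)
```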

9b. Similar idea, but we now split according to the scheduled departure time. We see that the flights scheduled to depart at or before 6 AM (the interval [0,600] includes 600) are delayed only 4.21 minutes, on average.

tapply(bigDF$DepDelay, cut(bigDF$CRSDepTime, breaks=seq(from=0,to=2400,by=600), include.lowest=TRUE), mean, na.rm=TRUE)
##           [0,600]     (600,1.2e+03] (1.2e+03,1.8e+03] (1.8e+03,2.4e+03] 
##          4.213356          4.437826          9.635142         12.798243
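
The include.lowest=TRUE argument is what closes the first interval on the left. A short sketch with invented times shows the difference: without it, a scheduled time of 0 is not assigned to any group.

```r
# Hypothetical scheduled departure times in HHMM form
toy_times <- c(0, 600, 601, 2359)
toy_breaks <- seq(from = 0, to = 2400, by = 600)

# Default: intervals are open on the left, so 0 is not in (0,600]
without_lowest <- cut(toy_times, breaks = toy_breaks)
print(is.na(without_lowest))

# With include.lowest=TRUE the first interval becomes [0,600]
with_lowest <- cut(toy_times, breaks = toy_breaks, include.lowest = TRUE)
print(as.character(with_lowest))
```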

Question 10

10a. To build the function, we find the number of flights with Origin city that matches parameter 1 (origincity), and that simultaneously have Destination city that matches parameter 2 (destcity). We use sum to count the number of such flights.

numflightsfunc <- function(origincity, destcity) {
  return(sum((bigDF$Origin == origincity) & (bigDF$Dest == destcity), na.rm=TRUE))
}

10b. Now we test this function with IND as Origin and ORD as destination.

numflightsfunc("IND", "ORD")
## [1] 80498
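
The function above reads bigDF from the global environment. For checking the logic on a small case, a variant that takes the data frame as an argument can be handy. The name numflights_in and the df parameter are introduced here only for illustration, and the toy flights are invented.

```r
# Variant of numflightsfunc with the data frame passed in (df is an added parameter)
numflights_in <- function(df, origincity, destcity) {
  # A missing Origin or Dest yields NA in the comparison; na.rm=TRUE skips it
  sum((df$Origin == origincity) & (df$Dest == destcity), na.rm = TRUE)
}

# Hypothetical flights, including one with a missing origin
toy_flights <- data.frame(
  Origin = c("IND", "IND", "ORD", NA),
  Dest   = c("ORD", "ORD", "IND", "ORD"),
  stringsAsFactors = FALSE
)

toy_count <- numflights_in(toy_flights, "IND", "ORD")
print(toy_count)
```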

10c. We extract the destinations that are in the given list of cities. Then we tabulate the results, sort them, and print the largest.

mostpopfunc <- function(cities) {
  return( sort(table(bigDF$Dest[bigDF$Dest %in% cities]),decreasing=TRUE)[1] )
}

10d. Of these three cities, ORD is the most popular destination.

mostpopfunc( c("JFK", "ORD", "LAX") )
##     ORD 
## 6638035