##################
### Solutions: ###
##################
# The solutions are more simple if we first
# change our directory to the place where the data is stored.
cd /data/public/dataexpo2009
# 1a. There are 347 distinct airport codes that appear in the Origin column
cut -d, -f17 allyears.csv | sort | uniq | grep -v "Origin" | wc -l
# 1b. There are 352 distinct airport codes that appear in the Destination column
cut -d, -f18 allyears.csv | sort | uniq | grep -v "Dest" | wc -l
# 1c. In fact, there are 352 distinct airport codes that appear in the origin and destination columns altogether.
cut -d, -f17 allyears.csv | grep -v "Origin" >/home/mdw/origins.txt
cut -d, -f18 allyears.csv | grep -v "Dest" >/home/mdw/destinations.txt
cat /home/mdw/origins.txt /home/mdw/destinations.txt | sort | uniq -c | wc -l
# You can double-check your solution to 1a by typing:
sort /home/mdw/origins.txt | uniq -c | wc -l
# and you can double-check your solution to 1b by typing:
sort /home/mdw/destinations.txt | uniq -c | wc -l
# After question 1c is over, we need to be sure to delete these temporary files!
rm /home/mdw/origins.txt
rm /home/mdw/destinations.txt
# 2a. There are 1646578 IND flights altogether.
cut -d, -f17,18 allyears.csv | grep IND | wc -l
# 2b. Of course we get the same result:
grep IND allyears.csv | wc -l
# 3. First we can save the tailnums to a file, for convenience:
cut -d, -f11 allyears.csv >/home/mdw/tailnums.txt
# 3abc. We can answer questions 3a, 3b, 3c by just classifying the
# various kinds of tailnums that appear in the file:
sort /home/mdw/tailnums.txt | uniq -c | sort -n | tail
# 3a. There are 37245646 lines equal to NA
# 3b. There are 351056 lines equal to 0
# 3c. There are 55349 lines equal to 000000
# 3d. There are 756609 flights that have NKNO in the tailnum:
grep NKNO /home/mdw/tailnums.txt | wc -l
# 3e. There are 139774 flights that have (only) zero or more whitespace characters,
but no other content.
grep '^[:space:]*$' /home/mdw/tailnums.txt | wc -l
# or we could get this same answer by going back to our solution for 3abc,
# and noticing this same answer from that solution.
# 4. The 10 planes that took the most flights overall can be found by:
cut -d, -f11 allyears.csv | sort | uniq -c | sort -n | tail -n15
# We get the following:
34216 N522
34230 N521
34253 N523
34254 N513
34275 N524
34344 N514
34394 N527
34474 N525
34519 N526
34526 N528
55349 000000
139774
351056 0
572299 UNKNOW
37245646 NA
# 5a. The airplane with tailnum N528 made 34526 flights altogether.
grep ,N528, allyears.csv | wc -l
# 5b. This airplane made 14 flights on a single day.
# This happened three times altogether.
grep ,N528, allyears.csv | cut -d, -f1,2,3 | sort | uniq -c | sort -n | tail
# 5c. One of those 14 flights in a single day was made on Nov 3, 1995:
grep ,N528, allyears.csv | cut -d, -f1,2,3 | grep -w 1995,11,3 | wc -l
# 6. We can print them all, if we leave the tail off the command below.
# For convenience, here we just print the 10 most popular.
cut -d, -f17,9 allyears.csv | grep -v Origin | sort | uniq -c | sort -nr | head
# We get the following:
3884756 DL,ATL
3312135 AA,DFW
2726727 UA,ORD
2176716 NW,DTW
2120503 NW,MSP
2073554 AA,ORD
2008069 US,CLT
1814823 CO,IAH
1809174 UA,DEN
1681793 US,PIT
# 7. To get the flights per year, we can just do the following:
cut -d, -f1 allyears.csv | grep -v Year | sort | uniq -c | sort -n
# We get these results
1311826 1987
5041200 1989
5070501 1993
5076925 1991
5092157 1992
5180048 1994
5202096 1988
5270893 1990
5271359 2002
5327435 1995
5351983 1996
5384721 1998
5411843 1997
5527884 1999
5683047 2000
5967780 2001
6488540 2003
7009728 2008
7129270 2004
7140596 2005
7141922 2006
7453215 2007