Project 1

1. Use the airline data stored in this directory:

/depot/statclass/data/dataexpo2009

In the year 2005, find:

1a. the number of flights that occurred on each day of the year, and

1b. the day of the year on which the most flights occurred.

2. Again considering the year 2005, did United or Delta have more flights?

3. Consider the June 2017 taxi cab data, which is located in this folder:

/depot/statclass/data/taxi2018

What is the distribution of the number of passengers in the taxi cab rides? In other words, make a list of the number of rides that have 1 passenger; that have 2 passengers; etc.
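
One common way to build such a list is to extract a single column, sort it, and count the repeated values. The sketch below runs on a small made-up file, not the real taxi data; the file name, the header, and the position of the passenger column are all assumptions made for illustration, so check the header of the actual file before adapting it.

```shell
# Toy stand-in for a ride file; the layout (passenger count in field 3) is hypothetical.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'vendor,pickup,passenger_count' \
  '1,2017-06-01 00:05:10,1' \
  '2,2017-06-01 00:07:42,2' \
  '1,2017-06-01 00:09:03,1' \
  '2,2017-06-02 13:15:00,1' > "$tmpdir/toy_rides.csv"

# Skip the header line, keep field 3, sort numerically, then count each distinct value.
passenger_counts=$(tail -n +2 "$tmpdir/toy_rides.csv" | cut -d, -f3 | sort -n | uniq -c)
echo "$passenger_counts"
```

Each output line gives a count followed by the value that was counted. The sort step matters because uniq only merges lines that are adjacent.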

4. Revisit question 1a, but using all of the data about all of the flights, from 1987 to 2008. (Note: It is not necessary to run 22 separate commands for this purpose.)
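
As a hint about the note above: a shell wildcard can hand many files to a single pipeline. The sketch below uses two tiny made-up files; their names, the four-column layout, and the header text are assumptions for illustration only, and the real files may be named and laid out differently.

```shell
# Two toy yearly files with a hypothetical, shortened layout.
tmpdir=$(mktemp -d)
printf '%s\n' 'Year,Month,DayofMonth,Origin' \
  '1987,10,14,IND' '1987,10,14,ORD' '1987,10,15,IND' > "$tmpdir/1987.csv"
printf '%s\n' 'Year,Month,DayofMonth,Origin' \
  '1988,1,2,IND' '1988,1,2,IND' > "$tmpdir/1988.csv"

# The wildcard expands to every matching file, so one pipeline covers all years.
# grep -v drops the header line that each file repeats.
flights_per_day=$(cat "$tmpdir"/*.csv | grep -v '^Year' | cut -d, -f1-3 | sort | uniq -c)
echo "$flights_per_day"
```

Because every file carries its own header, the header has to be filtered out once the files are concatenated; otherwise it would be counted as if it were a day.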

5a. Give a distribution of the number of flights in the ASA Data Expo 2009 according to the day of the week.
5b. Which day of the week is the most popular for travel?
5c. Which day of the week is least popular for travel?

6a. Create a text file called INDtoORD.txt that contains the data about every flight from Indianapolis (IND) to Chicago O'Hare (ORD).
6b. Zip this data into a compressed file called INDtoORD.zip.

7a. Identify the 10 airports in the ASA Data Expo 2009 that are the busiest, according to the number of departures (i.e., according to serving as the origin airport for flights).
7b. Use the grep command (with multiple patterns) to store the complete data (all 29 columns) about these 10 airports in a file called popularairportdata.txt.
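
For reference, grep accepts several patterns when each one is given with its own -e option. The sketch below shows the mechanism on a made-up five-column file; the file names, the airport codes chosen, and the layout (origin in field 4, destination last) are assumptions for illustration, not the real data.

```shell
# Toy flight file with a hypothetical layout: year,month,day,origin,destination.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '2005,1,3,IND,ORD' \
  '2005,1,3,ATL,DFW' \
  '2005,1,4,LAX,SFO' \
  '2005,1,4,ORD,IND' > "$tmpdir/toy_flights.csv"

# Each -e adds one pattern; a line is kept if it matches any of them.
# The surrounding commas keep the match from landing in the last field of this toy layout.
grep -e ',ATL,' -e ',LAX,' "$tmpdir/toy_flights.csv" > "$tmpdir/selected.txt"
selected=$(cat "$tmpdir/selected.txt")
echo "$selected"
```

Note that grep matches a pattern anywhere in the line, so in a file with more columns the same code could match in a column other than the one intended.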

8. How many distinct airports are represented in the ASA Data Expo?

9a. Revisit question 3, but using all of the data about all of the taxi cab rides, from 2009 to 2017. (Note: It is not necessary to run dozens of separate commands for this purpose.)
9b. Do you notice anything unusual about this data?

10a. Find the number of taxi cab rides per day in June 2017. (Use the date on which each ride departs, in case the trip lasts past midnight.) Hint: You might need to use *two* cut commands, since you will need to extract the day from the timestamp. The exact time of departure is given, but the hours, minutes, and seconds are not needed for this question and should be discarded.
10b. Same question, but use all of the data about all of the taxi cab rides, from 2009 to 2017.
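
To illustrate the hint in 10a, the sketch below chains two cut commands with different delimiters: the first isolates the timestamp field, and the second keeps only the part before the space. It runs on a made-up file; the file name, the timestamp format, and the position of the timestamp (field 2) are assumptions for illustration and should be checked against the real data.

```shell
# Toy ride file with a hypothetical layout: vendor,departure timestamp,passengers.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '1,2017-06-01 00:05:10,1' \
  '2,2017-06-01 23:59:59,2' \
  '1,2017-06-02 00:00:01,1' > "$tmpdir/toy_rides.csv"

# First cut: split on commas and keep the timestamp.
# Second cut: split on the space and keep the date, dropping hours, minutes, and seconds.
rides_per_day=$(cut -d, -f2 "$tmpdir/toy_rides.csv" | cut -d' ' -f1 | sort | uniq -c)
echo "$rides_per_day"
```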