Project 2

1a. What was the average arrival delay (in minutes) for flights in 2005?

1b. What was the average departure delay (in minutes) for flights in 2005?

1cd. Now revise your solutions to 1ab so that they cover both types of delay across the full data set (all years).
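A minimal sketch for 1a through 1d. It assumes the yearly airline files follow the ASA Data Expo 2009 layout (for example `2005.csv`, with a header row), where ArrDelay is field 15, DepDelay is field 16, and missing values are written as `NA`. The function name `avg_delays` is hypothetical; check the header line of your own files before trusting the field numbers.

```shell
# avg_delays FILE...: average arrival and departure delay (minutes) over all
# rows of the given airline CSV files.
# ASSUMPTION: ArrDelay = field 15, DepDelay = field 16, missing values are NA.
avg_delays() {
  awk -F, '
    FNR == 1 { next }                                 # skip the header line of every file
    $15 != "NA" && $15 != "" { arr += $15; narr++ }   # arrival delays that are present
    $16 != "NA" && $16 != "" { dep += $16; ndep++ }   # departure delays that are present
    END {
      if (narr) printf "average arrival delay: %.4f\n", arr / narr
      if (ndep) printf "average departure delay: %.4f\n", dep / ndep
    }' "$@"
}
```

For 1a and 1b this would be run as `avg_delays 2005.csv`; for 1c and 1d, pass every yearly file at once (for example `avg_delays *.csv` inside the data directory). The two averages keep separate counters because a flight can have one delay recorded and the other missing.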


2. Revise your solutions to 1abcd to include only flights that took place on weekends.
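One way to add the weekend filter, sketched under the same assumed Data Expo layout: DayOfWeek is taken to be field 4, coded 1 = Monday through 7 = Sunday, so weekends are the values 6 and 7. The function name is hypothetical.

```shell
# avg_weekend_delays FILE...: average arrival and departure delay (minutes),
# counting only Saturday and Sunday flights.
# ASSUMPTION: DayOfWeek = field 4 (6 = Saturday, 7 = Sunday),
# ArrDelay = field 15, DepDelay = field 16, missing values are NA.
avg_weekend_delays() {
  awk -F, '
    FNR == 1 { next }                                 # skip the header line of every file
    $4 != 6 && $4 != 7 { next }                       # keep weekend flights only
    $15 != "NA" && $15 != "" { arr += $15; narr++ }
    $16 != "NA" && $16 != "" { dep += $16; ndep++ }
    END {
      if (narr) printf "average arrival delay: %.4f\n", arr / narr
      if (ndep) printf "average departure delay: %.4f\n", dep / ndep
    }' "$@"
}
```

As before, pass `2005.csv` alone for the 2005 versions and all yearly files for the all-years versions.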


3a. What is the average departure delay on flights from IND to ORD?

3b. Double-check your work by analyzing the file created in question 6ab of Project 1.
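A sketch for 3a, assuming the Data Expo layout with DepDelay in field 16, Origin in field 17, and Dest in field 18. It prints the average followed by the number of flights it averaged, which makes the 3b cross-check easier. If the Project 1 file still has the original header and column layout, the same function should give the same average on it; if that file was saved without a header, the `FNR == 1` rule would wrongly drop its first flight.

```shell
# ind_ord_dep_delay FILE...: average departure delay (minutes) of IND -> ORD
# flights, followed by the number of flights averaged.
# ASSUMPTION: DepDelay = field 16, Origin = field 17, Dest = field 18.
ind_ord_dep_delay() {
  awk -F, '
    FNR == 1 { next }                                 # skip the header line of every file
    $17 == "IND" && $18 == "ORD" && $16 != "NA" && $16 != "" { sum += $16; n++ }
    END { if (n) printf "%.4f %d\n", sum / n, n }' "$@"
}
```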


4a. What is the average distance of a taxi cab ride in New York City in June 2017?

4b. Now revise your solution to 4a to account for the full set of data, across all years.
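A sketch for 4a and 4b. Because the column order and header capitalization of the taxi files may differ from year to year (an assumption worth checking), this version looks up the column named `trip_distance` in each file's header instead of hard-coding a field number. The function name and the header name are assumptions; the file names in the usage line are hypothetical.

```shell
# avg_taxi_distance FILE...: average trip distance over all rides in the given
# taxi CSV files, followed by the number of rides averaged.
# ASSUMPTION: each file has a header row with a column named trip_distance
# (case, spaces, quotes, and carriage returns are ignored when matching).
avg_taxi_distance() {
  awk -F, '
    FNR == 1 {                            # new file: locate the distance column
      col = 0
      for (i = 1; i <= NF; i++) {
        h = tolower($i)
        gsub(/[ "\r]/, "", h)
        if (h == "trip_distance") col = i
      }
      next
    }
    col && $col != "" { sum += $col; n++ }  # skips blank lines and files lacking the column
    END { if (n) printf "%.4f %d\n", sum / n, n }' "$@"
}
```

For example, `avg_taxi_distance yellow_tripdata_2017-06.csv` for 4a, and `avg_taxi_distance yellow_tripdata_*.csv` for 4b.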

Hint: On problems that will take a very long time to run, like 4b, you can use the following method:

nohup allofyourusualcommandstuffgoeshere >~/myoutputfile.txt &

The "nohup" keeps the program running, even if you log out.
The ampersand runs the command in the background, so you can keep typing in the meantime.
The file called myoutputfile.txt will be saved in your home directory.
(The tilde stands for your home directory. You can choose another location if you prefer, of course.)
When you start a command running like this, the shell prints its process ID (PID). For instance, your process ID might be 13788 (bash shows this as [1] 13788, where [1] is the job number).
You can stop that process at any point during its execution by typing, for instance, kill 13788
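A small demonstration of the hint, with `sleep 300` standing in for a real long-running command. In a script, the special parameter `$!` holds the process ID of the most recent background command, so it does not have to be copied by hand. The `2>&1` is an addition to the hint: it sends error messages into the same output file.

```shell
# Run a long job so that it survives logging out (nohup) and runs in the
# background (&). "sleep 300" is a placeholder for your real awk command.
nohup sleep 300 > "$HOME/myoutputfile.txt" 2>&1 &
pid=$!                                    # process ID of the job just started
echo "started process $pid"
# ...later, if you need to stop it early:
kill "$pid"
```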


5. For each taxi cab ride, compute the percentage of the ride's fare that is dedicated to the tolls. What is this percentage, on average?
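A sketch for question 5, assuming the 2017 yellow-taxi layout in which tolls_amount is field 15, fare_amount is field 11, and total_amount is field 17. Reading "the ride's fare" as the total amount charged is itself an assumption; to divide by the metered fare instead, change `den=17` to `den=11`. Rides whose denominator is zero or negative are skipped to avoid dividing by zero.

```shell
# toll_pct FILE...: average, over rides, of 100 * tolls / fare, followed by the
# number of rides averaged.
# ASSUMPTION: 2017 yellow-taxi layout, tolls_amount = field 15,
# denominator column den = 17 (total_amount); use den=11 for fare_amount.
toll_pct() {
  awk -F, -v den=17 '
    FNR == 1 || NF < 17 { next }          # skip header and blank lines
    $den > 0 { pct += 100 * $15 / $den; n++ }
    END { if (n) printf "%.4f %d\n", pct / n, n }' "$@"
}
```

Note that this averages the per-ride percentages, as the question asks, which is generally not the same as total tolls divided by total fares.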


6. Consider customers who pay for their taxi cabs with credit card versus with cash. Does this distinction affect the distance traveled?
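A sketch for question 6, assuming the 2017 yellow-taxi layout (trip_distance in field 5, payment_type in field 10) and the TLC coding in which payment_type 1 is credit card and 2 is cash. It prints the mean distance for each group; whether the difference is meaningful is left to your analysis.

```shell
# distance_by_payment FILE...: mean trip distance for credit-card rides
# (payment_type 1) and cash rides (payment_type 2).
# ASSUMPTION: 2017 yellow-taxi layout, trip_distance = field 5,
# payment_type = field 10.
# Output (tab-separated): label, mean distance in miles, number of rides.
distance_by_payment() {
  awk -F, '
    FNR == 1 || NF < 17 { next }          # skip header and blank lines
    $10 == 1 || $10 == 2 { sum[$10] += $5; n[$10]++ }
    END {
      if (n[1]) printf "credit card\t%.4f\t%d\n", sum[1] / n[1], n[1]
      if (n[2]) printf "cash\t%.4f\t%d\n", sum[2] / n[2], n[2]
    }' "$@"
}
```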


7a. Use the method from the end of the notes to add up the total number of miles flown by each airline in 2005.

7b. Now revise your solution to 7a to account for the full set of data, across all years.
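A sketch of questions 7 and 8, assuming the method from the notes is an awk associative array keyed by one column. It also assumes the Data Expo layout: UniqueCarrier in field 9, TailNum in field 11, Distance in field 19. The hypothetical helper `miles_by` takes the key column as its first argument, so the same code serves both questions.

```shell
# miles_by FIELD FILE...: total miles flown, grouped by the value of column
# FIELD. Output order is arbitrary; pipe through sort.
# ASSUMPTION: Distance = field 19, UniqueCarrier = field 9, TailNum = field 11.
miles_by() {
  key=$1; shift
  awk -F, -v key="$key" '
    FNR == 1 { next }                           # skip the header line of every file
    $19 != "NA" && $19 != "" { miles[$key] += $19 }
    END { for (k in miles) print k, miles[k] }' "$@"
}
```

For example, `miles_by 9 2005.csv | sort -k2,2nr` lists airlines by miles flown in 2005, and `miles_by 11 *.csv` groups by tail number across all yearly files instead.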


8. Repeat question 7ab, but use the tail number (which is unique to each airplane) instead of the airline.


9. Revisit question 4, breaking the results down according to the number of passengers. Here is a basic outline of how to do this:

9a. Use the method from the end of the notes to add up the total distance across all taxi rides, according to the number of passengers.

9b. Now augment the previous awk program to also include the total number of taxi rides, according to the number of passengers.

9c. Now add another feature: at the end of the awk program, for each number of passengers, divide the total distance by the number of taxi rides with that same number of passengers. This gives the average distance per taxi ride according to the number of passengers, which you can print at the end of the awk program.
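The three steps of question 9 can be sketched as one awk program with two associative arrays keyed by the passenger count. This assumes the 2017 yellow-taxi layout (passenger_count in field 4, trip_distance in field 5); the function name is hypothetical.

```shell
# distance_by_passengers FILE...: for each passenger count, print the count,
# the number of rides, and the average distance per ride (tab-separated).
# ASSUMPTION: 2017 yellow-taxi layout, passenger_count = field 4,
# trip_distance = field 5. Output order is arbitrary; pipe through sort.
distance_by_passengers() {
  awk -F, '
    FNR == 1 || NF < 17 { next }          # skip header and blank lines
    {
      dist[$4] += $5                      # 9a: total distance per passenger count
      rides[$4]++                         # 9b: number of rides per passenger count
    }
    END {                                 # 9c: average = total distance / rides
      for (p in dist) printf "%s\t%d\t%.4f\n", p, rides[p], dist[p] / rides[p]
    }' "$@"
}
```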


10a. Find the number of taxi cab rides on each day in June 2017.

Hint: You can use two delimiters like this:
awk -F[,\ ]
(the backslash escapes the space, so the shell passes [, ] to awk as a single argument instead of splitting it at the space; quoting it as awk -F'[, ]' works too)
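A sketch for 10a using the quoted form `-F'[, ]'`, which is equivalent to `-F[,\ ]`. It assumes the 2017 yellow-taxi layout, where the pickup timestamp (for example `2017-06-01 00:31:03`) is the second comma-separated field; once spaces also split fields, `$2` is the pickup date and `$3` the pickup time. Restricting to dates that start with `2017-06-` is an extra assumption, meant to drop any stray rows dated outside June.

```shell
# rides_per_day FILE...: number of rides per pickup date in June 2017.
# ASSUMPTION: 2017 yellow-taxi layout with the pickup timestamp in the second
# comma-separated field. Output order is arbitrary; pipe through sort.
rides_per_day() {
  awk -F'[, ]' '
    FNR == 1 || NF < 17 { next }          # skip header and blank lines
    $2 ~ /^2017-06-/ { rides[$2]++ }      # count by pickup date, June 2017 only
    END { for (d in rides) print d, rides[d] }' "$@"
}
```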

10b. Does your answer to 10a agree with your answer to 10a from the previous problem set?