Project 3



Solve the following questions in R.

Put your results in RMarkdown when you are finished.

Submit your code in a .Rmd file, and also include either an HTML or a PDF file that displays the output.

As always, please comment on your method of solution, and display your solutions.


Use R to revisit these questions. They can each be accomplished with 1 line of code.

1. As in Project 1, question 2: In the year 2005, did United or Delta have more flights?

2. As in Project 2, question 2a: Restricting attention to weekends (only), what was the average arrival delay (in minutes) for flights in 2005?

3. As in Project 1, question 3: In June 2017, what is the distribution of the number of passengers in the taxi cab rides?

4. As in Project 2, question 4a: What is the average distance of a taxi cab ride in New York City in June 2017?

5. Use the tapply function to find the following:
5a. The average number of passengers (on taxi cab rides) for each day of June 2017.
5b. The average distance (on taxi cab rides) for each day of June 2017.
5c. The average distance (on airplane flights) for each day of 2005.
5d. The average arrival delay (on airplane flights) for each day of 2005.
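
As a reminder of how tapply works, here is a minimal sketch on a small made-up data frame. The data frame `rides` and its column names (`day`, `passengers`, `distance`) are hypothetical stand-ins, not the names used in the real taxi or airline files, so adapt them to your own data.

```r
# Toy stand-in for ride data; these column names are hypothetical.
rides <- data.frame(
  day        = c(1, 1, 2, 2, 2, 3),
  passengers = c(1, 3, 2, 2, 5, 1),
  distance   = c(0.8, 2.4, 1.0, 3.5, 1.5, 6.2)
)

# tapply(values, groups, function):
# split the values by group, then apply the function to each piece.
avg_passengers <- tapply(rides$passengers, rides$day, mean)
avg_distance   <- tapply(rides$distance, rides$day, mean)

print(avg_passengers)
print(avg_distance)
```

The result is a vector with one entry per distinct value of the grouping variable, named by that value. The same pattern applies to each part of question 5 once the grouping column is a day.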



In these questions, we use our UNIX knowledge to help extract some of the data, because we do not want to import all of the data into R. We want to start combining some of our knowledge about various tools.

6. Use UNIX to make a new file that contains the departure delays of the flights from IND to ORD. What is the average departure delay on flights from IND to ORD? Double-check your work by analyzing the file created in question 6ab in Project 1 and by comparing to your awk solution in question 3 in Project 2.

7a. Use UNIX to make a new file that contains the distances of every taxi cab ride in the New York City yellow cab files (across all months and years). It should have 1 distance per line.

7b. What are the mean and standard deviation of the distances of these taxi cab rides?
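
Once UNIX has produced a file with one value per line (as in questions 6 and 7a), R only needs to read that single column. The sketch below assumes such a file; here a temporary file with made-up values stands in for it, and the variable names are hypothetical. It also assumes missing values appear as NA, which may or may not match your extracted file.

```r
# Hypothetical stand-in for the file produced in UNIX: one value per line.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1.2", "0.5", "NA", "3.1", "2.0"), tmp)

# scan() reads a plain column of numbers into a numeric vector;
# entries equal to "NA" become missing values.
values <- scan(tmp, what = numeric(), na.strings = "NA", quiet = TRUE)

# na.rm = TRUE drops the missing values before computing each summary.
avg    <- mean(values, na.rm = TRUE)
spread <- sd(values, na.rm = TRUE)

print(avg)
print(spread)
```

Reading a one-column file this way avoids importing the full data set into R, which is the point of doing the extraction in UNIX first.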

8a. Use UNIX to extract the information about how many flights (across all years) occur with each airline (i.e., with each UniqueCarrier). You can tabulate these results in UNIX or in R.

8b. Make a dotchart in R that displays this data.
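
For reference, a dotchart takes a numeric vector and a set of labels. The sketch below uses made-up counts for four carrier codes; the numbers are purely illustrative and are not results from the airline data.

```r
# Made-up flight counts, for illustration only (not real results).
counts <- c(AA = 120, DL = 95, UA = 80, WN = 150)

# Sorting first makes the chart easier to read.
sorted <- sort(counts)

# dotchart() draws one labeled dot per entry of the vector.
dotchart(sorted,
         labels = names(sorted),
         main   = "Flights per carrier (toy data)",
         xlab   = "Number of flights")
```

If you tabulate in R instead of UNIX, converting the result to a named numeric vector before plotting keeps the labels attached to the counts.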



9. Use the data about the airports available from the supplemental data of the ASA DataExpo 2009 to make a map of the contiguous portion of the United States, displaying all of the airports.

10. Make an analogous map of Indiana with its airports.
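
One possible approach to questions 9 and 10 is sketched below, under several assumptions: that the maps package is installed, and that the airport data has columns for state, latitude, and longitude. The data frame here is a toy stand-in with approximate coordinates and hypothetical column names (`iata`, `state`, `lat`, `long`), so check the names in the real file before adapting it.

```r
# Toy stand-in for the airport data; coordinates are approximate
# and the column names are assumptions.
airports <- data.frame(
  iata  = c("IND", "LAF", "ORD", "ANC", "HNL"),
  state = c("IN", "IN", "IL", "AK", "HI"),
  lat   = c(39.72, 40.41, 41.98, 61.17, 21.32),
  long  = c(-86.29, -86.94, -87.90, -150.00, -157.92),
  stringsAsFactors = FALSE
)

# state.abb is a built-in vector of the 50 state abbreviations.
# Dropping AK and HI leaves the contiguous states
# (note that state.abb does not include DC or the territories).
lower48    <- setdiff(state.abb, c("AK", "HI"))
contiguous <- airports[airports$state %in% lower48, ]
indiana    <- airports[airports$state %in% "IN", ]

# Draw the maps only if the maps package is available.
if (requireNamespace("maps", quietly = TRUE)) {
  maps::map("state")
  points(contiguous$long, contiguous$lat, pch = 20, col = "red")

  maps::map("state", regions = "indiana")
  points(indiana$long, indiana$lat, pch = 20, col = "red")
}
```

Using %in% rather than == for the filters keeps rows with a missing state from turning into NA rows in the result.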