Project 5


Please submit your project in RMarkdown.

1. Read the selection from The Elements of Graphing Data by William Cleveland, and the selection from Creating More Effective Graphs by Naomi Robbins.

Also read the classic article "How to Display Data Badly" by Howard Wainer: http://www.jstor.org.ezproxy.lib.purdue.edu/stable/2683253

2. Find 6 visualizations from the Information Is Beautiful website (http://www.informationisbeautiful.net/) that do a bad job of portraying data, according to the best practices in the selections from question 1. Write 1/3 of a page (for each such visualization) about what is done poorly.

3. Identify 3 excellent visualizations of data from the site in question 2. Write 1/3 of a page (for each such visualization) about what is done well.

4. The Gapminder comparison of life expectancy and income per person, plotted from 1800 to 2018, is eye-catching but does not necessarily follow the best practices for data visualization. (See www.gapminder.org/tools/.) Write half a page about things that this visualization does poorly, and half a page about how you would redesign it if you were to re-create this website yourself.

5. Consider the poster winner "Congestion in the Sky", from the 2009 Data Expo: http://stat-computing.org/dataexpo/2009/posters/ Describe at least 3 significant ways that this poster could be improved. For each of these 3 ways, write 1/3 of a page of constructive criticism, specifying what could be improved and how that aspect of the visualization could be done better.

6. Choose a different poster from the 2009 Data Expo, and construct an analysis similar to the one in question 5, i.e., give constructive criticism of at least 3 significant ways that this poster could be improved, with a 1/3-page writeup for each.

7. Which of the posters in the Data Expo 2009 do you think should be the winner? Why? (It is OK if you choose the poster that actually won, or any of the other posters.) Thoroughly justify your answer, using the techniques of effective data visualization, with an explanation that is at least 1 page long altogether.

8., 9., 10., 11., 12. Imagine that you are going to enter the Data Expo 2009. Your team does not need to assemble a poster, but you should design 5 plots that explore the airline data set. Follow the visualization best practices from the Cleveland and Robbins texts.

Be sure to include some text explaining your graphs as well, as if you were working to assemble a poster.
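
As a rough starting point only, here is a minimal sketch of one such plot: a Cleveland-style dot plot of median departure delay by carrier, drawn with base R. The data frame below is synthetic (the values are made up for illustration), and the column names UniqueCarrier and DepDelay are assumed to match the airline data set, so check them against the files you actually load.

```r
# Synthetic stand-in for the airline data; every value here is made up.
# Column names (UniqueCarrier, DepDelay) are assumed to match the real files.
set.seed(1)
flights <- data.frame(
  UniqueCarrier = rep(c("AA", "DL", "UA", "WN"), each = 50),
  DepDelay = round(rexp(200, rate = 1 / 15) - 5)
)

# Median departure delay per carrier, as a plain named numeric vector,
# sorted so that the dots appear in increasing order on the plot.
med_delay <- sort(vapply(split(flights$DepDelay, flights$UniqueCarrier),
                         median, numeric(1)))

# Dot plot: values are compared by position along a single common scale.
dotchart(med_delay,
         xlab = "Median departure delay (minutes)",
         main = "Departure delay by carrier (synthetic data)",
         pch = 19)
```

Sorting the carriers by the plotted value, rather than alphabetically, and labeling the axis with its units are two small choices in the spirit of the readings; your own plots should justify similar choices in the accompanying text.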

There are many R resources for exploring the graphing tools that are available in R. For instance, see:

https://www.r-graph-gallery.com/

and many other similar websites.