STAT 29000
Project 1
due Wednesday, September 3, at 9:30 AM


Put all of your solutions into one text file for your group. Name the file and store it as follows:
p01g1.R (project 1, group 1), stored in the folder /proj/gpproj14/p01g1/
   Group 1 consists of: kidd6,ffranci,lyoke,boydp,fu82
p01g2.R (project 1, group 2), stored in the folder /proj/gpproj14/p01g2/
   Group 2 consists of: enorlin,gallaghp,vincentc,philliw,rcrutkow
p01g3.R (project 1, group 3), stored in the folder /proj/gpproj14/p01g3/
   Group 3 consists of: john1209,zhan1460,malek,cringwa,avorhies
p01g4.R (project 1, group 4), stored in the folder /proj/gpproj14/p01g4/
   Group 4 consists of: cdesanti,reno0,marti748,omalleyb,peter188

1. During which pair of consecutive years did the level of Lake Huron rise the most? Use the built-in LakeHuron data set. (For example, from 1875 to 1876, Lake Huron rose 1.48 feet.) It might help to use the diff function.
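A minimal sketch of how diff() behaves, on hypothetical toy values rather than the LakeHuron data (the names `years`, `level`, `changes`, and `k` are illustrative only). diff() returns the change between each consecutive pair, so the k-th change runs from years[k] to years[k + 1].

```r
# Hypothetical toy data: a water level recorded in four consecutive years
years <- 2000:2003
level <- c(10, 12, 11, 15)

# diff() gives level[k + 1] - level[k] for each consecutive pair
changes <- diff(level)        # 2, -1, 4

# Position of the largest rise; that rise runs from years[k] to years[k + 1]
k <- which.max(changes)
c(from = years[k], to = years[k + 1])
```

Because LakeHuron is a time series (ts) object, time(LakeHuron) returns the years that go with its values.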

2a. What is the average duration of an eruption in the geyser dataset in the MASS library?
2b. What were the 10 longest durations?
2c. How many durations were 3 minutes or longer?
(You do not need to install the MASS library; it is installed already. You do, however, need to load the MASS library.)
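After library(MASS), the eruption durations should be available as geyser$duration (this assumes the column is named duration; names(geyser) will confirm it). A minimal sketch of the three kinds of summary on a hypothetical toy vector, not the geyser data:

```r
# Hypothetical toy durations (in minutes), standing in for a real data column
durations <- c(4.2, 1.8, 3.0, 2.5, 4.9)

mean(durations)                            # average duration
sort(durations, decreasing = TRUE)[1:3]    # the 3 longest durations
sum(durations >= 3)                        # how many are 3 minutes or longer
```

The comparison durations >= 3 produces a logical vector, and sum() counts its TRUE values.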

3a. Which car(s) in the mtcars data set had the highest gas mileage?
3b. Which car(s) had the highest horsepower?
3c. Which car(s) had the shortest (i.e., fastest) 1/4 mile time?
3d. How many cars had a manual transmission?
3e. How many cars had a manual transmission and six cylinders?
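A minimal sketch on a hypothetical four-car data frame, `toy_cars`, laid out like mtcars: car names as row names, and am coded 0 = automatic, 1 = manual, as described on the mtcars help page. Comparing against max() rather than calling which.max() returns every tied car, which is why the questions ask for "car(s)"; the same pattern with min() handles the shortest 1/4 mile time.

```r
# Hypothetical toy data frame in the mtcars layout
toy_cars <- data.frame(
  mpg = c(21, 33.9, 33.9, 15),
  cyl = c(6, 4, 4, 8),
  am  = c(1, 1, 0, 0),          # 0 = automatic, 1 = manual (as in mtcars)
  row.names = c("A", "B", "C", "D")
)

# All cars tied for the highest mileage
rownames(toy_cars)[toy_cars$mpg == max(toy_cars$mpg)]   # "B" "C"

# Counting rows that satisfy one condition, then two conditions at once
sum(toy_cars$am == 1)                          # 2
sum(toy_cars$am == 1 & toy_cars$cyl == 6)      # 1
```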

4a. Which states are (strictly) larger in population than Indiana but (strictly) smaller in population than Pennsylvania, according to the data in the state data set?
Hint: You can get the state populations using state.x77[,"Population"].
4b. Which states are (strictly) larger in land area than Indiana but (strictly) smaller in land area than Pennsylvania, according to the data in the state data set, as listed in state.x77[,"Area"]?
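state.x77 is a matrix whose row names are the state names, so a column such as state.x77[,"Population"] is a named numeric vector. A minimal sketch of a strict "between" filter on a hypothetical named vector (`pop` and its names are made up):

```r
# Hypothetical named vector in the same layout as state.x77[, "Population"]
pop <- c(Alpha = 500, Beta = 800, Gamma = 650, Delta = 900, Epsilon = 700)

# Strict inequalities on both sides exclude Gamma and Delta themselves
in_range <- pop > pop["Gamma"] & pop < pop["Delta"]
names(pop)[in_range]            # "Beta" "Epsilon"
```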

5. If Z is a standard normal random variable, we know that Z has average 0 and variance 1. Use simulation in R to estimate:
5a. the average of |Z|, and
5b. the variance of |Z|.
Here, |Z| is the absolute value of Z.
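A minimal Monte Carlo sketch of the simulation idea, illustrated with Z^2 instead of |Z| because its exact answers are known: Z^2 has a chi-squared distribution with 1 degree of freedom, so its average is 1 and its variance is 2. The seed and the sample size of one million are arbitrary choices.

```r
set.seed(29000)          # fixed seed so the run is reproducible
z <- rnorm(1e6)          # one million standard normal draws

# Sample statistics of the transformed draws estimate the true values
mean(z^2)                # should be close to 1
var(z^2)                 # should be close to 2
```

Larger samples give estimates closer to the true values, at the cost of more computation.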

6. Write a function called: countas that takes a vector of words and returns the number of words that contain at least one a. For instance, countas( c("ate", "hello", "duolingo", "pat", "aa") ) should return the value 3. Hint: It might help to use the grep function.
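A short illustration of grep() on a toy vector, searching for the letter "o" (the vector `words` is made up). By default grep() returns the positions of the matching elements; with value = TRUE it returns the matching elements themselves.

```r
words <- c("dog", "cat", "moon", "sky")

grep("o", words)                  # positions of words containing "o": 1 3
grep("o", words, value = TRUE)    # the matching words: "dog" "moon"
length(grep("o", words))          # how many words match: 2
```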

7a. Write a function called: firstthree that returns the position of the first occurrence of 3 in a vector. For instance, firstthree( c(-2.5,3,3,0.001,22,5,7,19,3,17) ) should return the value 2.
7b. Write a function called: thirdthree that returns the position of the third occurrence of 3 in a vector. For instance, thirdthree( c(-2.5,3,3,0.001,22,5,7,19,3,17) ) should return the value 9.
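A short illustration of which(), which returns the positions where a logical condition is TRUE, on a toy vector searching for 5 (the vector `v` is made up). Indexing past the last match gives NA, which is worth keeping in mind when the value occurs fewer times than requested.

```r
v <- c(5, 7, 5, 1, 5)

which(v == 5)        # all positions holding a 5: 1 3 5
which(v == 5)[1]     # position of the first 5: 1
which(v == 5)[2]     # position of the second 5: 3
which(v == 5)[4]     # there is no fourth 5, so this is NA
```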

8. Write a function called: topfive that returns the five most common values in a vector, along with the count for each of those 5 values.
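A minimal sketch of the counting step, using table() to tally each distinct value and sort() to put the most frequent first, shown on a toy vector and keeping the top 2 rather than the top 5 (the names `x`, `counts`, and `sorted` are illustrative). How ties among equally common values should be handled is left open by the problem.

```r
x <- c("b", "a", "b", "c", "b", "a")

counts <- table(x)                           # frequency of each distinct value
sorted <- sort(counts, decreasing = TRUE)    # most common value first
head(sorted, 2)                              # top 2 values with their counts
```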

9a. Euler's number e is 2.718281828459...
It can be defined by the series
1 + 1/1 + 1/(1*2) + 1/(1*2*3) + 1/(1*2*3*4) + 1/(1*2*3*4*5) + ...
Find a good way to calculate this in R, with few keystrokes.
If you subtract 2.718281828459 from your estimate, you should get something very small, e.g., roughly 4.5 * 10^{-14}.
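A sketch of the vectorized, few-keystroke style, illustrated on the geometric series 1 + 1/2 + 1/4 + ... (whose infinite sum is exactly 2) rather than on the series for e: build the whole vector of terms in one expression, then sum() it. cumsum() shows every partial sum at once, which is handy for watching convergence.

```r
k <- 0:50
sum(1 / 2^k) - 2                  # essentially 0

# All partial sums at once; the first five are shown
head(cumsum(1 / 2^k), 5)          # 1, 1.5, 1.75, 1.875, 1.9375
```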

9b. Find a good way to approximate the value of Pi, using only the fact that
Pi^2 / 6 = 1/1^2 + 1/2^2 + 1/3^2 + 1/4^2 + 1/5^2 + 1/6^2 + 1/7^2 + ....
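Solving the identity for Pi, and bounding the tail with a standard integral comparison (added here as a worked step), indicates what to expect from a partial sum of the first N terms:

```latex
\pi = \sqrt{6 \sum_{n=1}^{\infty} \frac{1}{n^2}},
\qquad
0 < \sum_{n=N+1}^{\infty} \frac{1}{n^2} < \int_{N}^{\infty} \frac{dx}{x^2} = \frac{1}{N}.
```

So stopping after N terms always underestimates Pi^2/6, by less than 1/N.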

10a. The triangular numbers are:
1, 3, 6, 10, 15, 21, 28, 36, 45, 55, ...
See: http://oeis.org/A000217
Find an efficient way to compute, in R, the first 100 such numbers. Does your method extend to the first 1000 such numbers too?
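cumsum() computes every running total of a vector in a single call, with no loop. A short illustration on a different sequence (the vector `odds` is made up): the running totals of the odd numbers are the perfect squares.

```r
odds <- seq(1, 19, by = 2)    # 1, 3, 5, ..., 19
cumsum(odds)                  # 1, 4, 9, ..., 100, the first 10 squares
```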

10b. The tetrahedral numbers are:
1, 4, 10, 20, 35, 56, 84, 120, 165, 220, ...
See: http://oeis.org/A000292
Find an efficient way to compute, in R, the first 100 such numbers. Does your method extend to the first 1000 such numbers too?
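diff() undoes cumsum(), so it is a convenient way to check a computed sequence or to spot its structure. Applied to the tetrahedral numbers listed above, the successive differences are the triangular numbers from 10a, starting from 3 (the vector name `tetra` is illustrative).

```r
tetra <- c(1, 4, 10, 20, 35, 56, 84, 120, 165, 220)   # terms listed above
diff(tetra)     # 3, 6, 10, 15, 21, 28, 36, 45, 55
```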