Assignments for the class will be submitted via GitHub. The class GitHub page is https://github.com/Purdue-STAT290. Each student should have access to ten different project repositories, one for each project group they belong to. To upload an assignment submission, click 'Upload File' on the project page. You can upload either by drag and drop or through the file manager. Add a commit message before uploading (e.g. "Added problem solutions") and then click 'Commit changes' to upload the file.
All group members will have access to the same GitHub repository for the project. Groups only need to submit one solutions file for each project. However, I highly recommend that everyone work through the problems by themselves and keep their own solutions file. It is not OK to copy-and-paste code from the web. If a website is helpful to you, that is OK as long as you document the source of help, and as long as you still create your own solution and make your own group comments about the code. Groups should not divide up problems to do individually. Everyone should do every single problem in the problem sets assigned. Comprehending the material in this class fully is critical, so do not force one person in the group to do the entire project by themselves. Help each other!
Each project will be out of 10 points.
Project deadlines are subject to change if the majority of groups have not finished the project. Once a project deadline has passed, however, I will dock 10 percent per day from groups that have not submitted their project solutions.
For projects 1 and 2, the assignments will be submitted as text documents (.txt). Do NOT submit a Microsoft Word file. Text documents can be created on the Scholar cluster using one of several text editors. The easiest of these to use is gedit, but nano, emacs, and vi (my favorite) are also options. To open gedit, either find it in the Applications menu or simply type gedit in the terminal. A sample of how I want the homework to be formatted in the document is shown below.
2a. Use the ls command to learn which of the resulting .csv files is biggest in terms of bytes. How many bytes does this largest file have?

# lists the contents of the working directory including file size, file permissions, etc
ls -l

702878193

The largest file was 2007.csv with 702878193 bytes.

2b. Use the wc command to learn which of the resulting .csv files has the most lines. How many lines does this largest file have?

# counts number of lines in all .csv files
wc -l *.csv | sort -n

7453216 2007.csv

The most lines are in 2007.csv with 7453216 lines.
Please copy the question exactly from the problem set, and add solutions with comments (explain what the code is doing in the comments). Paste the output of the command into the document below the solution (if more than 10 lines, please just give me either the first or last 10 lines to make the solutions more readable). Write a short sentence explaining the output in the context of the problem.
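One possible way to trim long output to the first or last 10 lines, as requested above, is to pipe the command through head or tail. The sketch below is only an illustration: the temporary directory and the file names (file01.csv, file02.csv, and so on) are made up so that the commands can run anywhere, and they are not the class data files.

```shell
# Sketch only: build 15 small throwaway .csv files in a temporary directory.
# The directory and file names are hypothetical, not the class data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for i in $(seq 1 15); do
  # fileNN.csv contains NN lines
  seq 1 "$i" > "file$(printf '%02d' "$i").csv"
done

# counts lines in all .csv files, sorts numerically, keeps only the last 10 lines
last_ten=$(wc -l *.csv | sort -n | tail -n 10)
echo "$last_ten"

# same pipeline, but keeps only the first 10 lines
first_ten=$(wc -l *.csv | sort -n | head -n 10)
echo "$first_ten"
```

In a real solution you would run the pipeline in the directory that holds the project's files and paste the trimmed output under the command. Note that when wc is given several files it also prints a total line, which sort -n places at the end.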
The projects later in the semester (after the first few weeks) will be written in RMarkdown. A similar format will be expected on those as well. I'll send out an email after the class goes over RMarkdown with a sample of what a project in RMarkdown should look like.