Here are a handful of examples of regular expressions. The best way to learn them in earnest is to just read some documentation about regular expressions and then try them!

v <- c("me", "you", "mark", "laura", "kale", "emma", "err", "eat", "queue", "kangaroo", "kangarooooo", "kangarooooooooo")
v[grep("m", v)]
v[grep("me", v)]
v[grep("a", v)]
v[grep("a", v)]
v[grep("k", v)]
v[grep("^k", v)]
v[grep("e", v)]
v[grep("a$", v)]
v[grep("o$", v)]
v[grep("o", v)]
v[grep("o{2}", v)]
v[grep("o{3}", v)]
v[grep("o{2,5}", v)]
v[grep("q(ue){1}", v)]
v[grep("q(ue){2}", v)]
v[grep("q(ue){3}", v)]
v[grep("e(m|r)", v)]
v[grep("e[mr]", v)]
v[grep("e(ma|rr)", v)]
v[grep("([a-z])\\1", v)]

When asking about names in the questions below, we assume that you are using the names from the election data, available here:

/depot/statclass/data/election2018/itcont.txt

and we are assuming that you are using unique names from column 8, i.e., that you have already removed duplicates of any names of the donors.

Here is a summary of regular expressions:

https://medium.com/factory-mind/regex-tutorial-a-simple-cheatsheet-by-examples-649dc1c3f285

You are welcome to use any source or reference for regular expressions that you like.

Please note that we need to use double backslash for back-references, when writing them in R. In general, in R, when writing a backslash in a regular expression, a double backslash is usually needed.

1. For each of the members of your team, find the number of donors who have your name, embedded somewhere in the donor's name (not necessarily as the first or last name).

2a. How many donors have a consecutive repeated letter in their name?

2b. How many donors have a consecutive repeated vowel in their name?

2c. How many donors have a consecutive repeated consonant in their name?

3. Which name(s) has/have the longest sequence of alphanumeric characters?

4a. How many of the names have no commas? One comma? Two commas? Etc.?

4b. Make a plot of the number of names with each of the counts above.

4c. The plot from 4b is a mouse-and-elephants plot, i.e., it shows small and large numbers on the same plot. So recreate the plot using the logarithms of the counts. This is a common technique for illustrating very large and very small numbers on the same plot.

5abc. Same question as 4, but using chunks of whitespace, instead of commas. For instance, the name STERN, MARY  YERBY has two chunks of whitespace, namely, one chunk of whitespace between STERN, and MARY and another chunk of whitespace between MARY and YERBY (even though it is a double space, it is still just one chunk of whitespace).

6. Find the number of people with a name that contains the *word* "SMITH". For example, "SMITH" and "ELLIS-SMITH" are ok, but "SMITHY" is not. \b is useful! Also note that, in order to use \b, you will need to do \\b, just as we used another escape character for \1, resulting in \\1. For example, \\bWARD\\b will match all of the full names containing the *word* "WARD" :D

7. Question from Project 10, Team 2

8. Question from Project 10, Team 3

9. Find the number of names that contain the letter A followed by either BB or RN.

10. How many people have names (first or last) that end in Z?

11a. How many people have Dr. Ward's name or part of his name, for instance, WARD, MARK D as part of the name?

OR:

11b. How many people have both a Z and an X in their name?

12. Question from Project 10, Team 7
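
The notes above on double backslashes, word boundaries, and counting commas or whitespace chunks can be sketched on a tiny example. This is only an illustration under stated assumptions: the vector `donors` is a made-up sample, not the election data, and `count_matches` is a hypothetical helper name introduced here, not something from the project.

```r
# Hypothetical sample of donor names (NOT taken from the election data).
donors <- c("SMITH, JOHN", "ELLIS-SMITH, ANN", "SMITHY, BOB",
            "STERN, MARY  YERBY", "WARD, MARK D", "AARON, LEE")

# "\\b" in R source is the regex \b (a word boundary), so this matches
# the *word* SMITH: "SMITH" and "ELLIS-SMITH" qualify, "SMITHY" does not.
has_smith <- grepl("\\bSMITH\\b", donors)
sum(has_smith)

# "\\1" in R source is the back-reference \1: the same capital letter
# appearing twice in a row (e.g. the LL in ELLIS, the AA in AARON).
has_double <- grepl("([A-Z])\\1", donors)
donors[has_double]

# Hypothetical helper: number of non-overlapping matches of `pattern`
# in each element of `x` (0 when there is no match).
count_matches <- function(pattern, x) {
  lengths(regmatches(x, gregexpr(pattern, x)))
}

# Commas per name, and chunks of whitespace per name. The "+" makes a
# run of several spaces count as a single chunk.
n_commas <- count_matches(",", donors)
n_space_chunks <- count_matches("[[:space:]]+", donors)

# Tabulate how many names have each count.
table(n_commas)
table(n_space_chunks)
```

On the real data, one possible way to approach the plots would be to pass such a table to `barplot()`, and to pass `log10()` of the table for the logarithmic version; that is one option among several, not the required method.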