Research Thrusts (2014-15): Statistics Living-Learning Community

Research Thrust A: Atmospheric/Earth Science
  1. (Dr. Michael Baldwin) My research group is focused on one of the most important challenges in the atmospheric sciences: improving the understanding and prediction of high-impact weather events. These weather events (e.g., tornadoes, droughts, flooding precipitation) affect public safety as well as many sectors of the economy, such as agriculture, energy, water resources, and the insurance industry; their costs can be severe and wide-ranging. My group has focused on both the short-term prediction problem and the longer-term challenge of understanding the effects of global climate change on high-impact weather systems. More specifically, we have made substantial contributions to the understanding of weather systems through the development and application of automated analysis procedures that identify and analyze such systems in meteorological data. The algorithms developed within my research group have allowed for the rapid analysis of massive data sets, such as multi-decade downscaled climate simulations and evaluation of high-resolution forecasts covering periods of multiple years. My research group recently developed a prototype, real-time prediction system to directly measure the characteristics of precipitating weather systems in high-resolution model forecasts. Through the application of image processing techniques, weather features are automatically identified, characterized, classified, and tracked over time; we apply appropriate statistical models, and we evaluate the resulting predictions.

  2. (Dr. Sonia Lasher-Trapp) I am very excited to have the opportunity to introduce students in Statistics to real research problems in Atmospheric Science, where we make extensive use of statistical concepts and tools to evaluate, for example, trends, variability, and correlations in very large data sets. The data sets used in my research most often include time series of airborne observations of cloud and precipitation development (using a variety of instruments mounted on the aircraft) and/or 3D radar scans of cumulus congestus clouds (the precursors to thunderstorms), and output from high-resolution 3D numerical simulations of these clouds and the precipitation processes occurring within them. When studying clouds and precipitation, we are always struggling with issues of data representativeness, missing data, limited sampling, etc., and the few statisticians working in our field have made significant advances using different statistical models.

  3. (Dr. Robert J. Trapp) We make extensive use of statistical concepts and tools to evaluate trends, variability, and correlations within large data sets. In particular, I use time series of tornado and severe-storm occurrences, Doppler weather radar observations of tornadoes and tornadic storms, and output of numerical model simulations of such storms. In our research on severe weather, we constantly struggle with issues of data representativeness, biases, sampling, etc. Statistical models have helped resolve some of these issues, but fresh minds with a strong statistical background are needed to develop further models and otherwise help advance the science.

  4. (Dr. Wen-wen Tung) Our laboratory specializes in studying the dynamical predictability of Earth and atmospheric systems and related phenomena on a variety of temporal and spatial scales. Some of our datasets are drawn directly from the United States Geological Survey database. For example, recently we have found a particular interest in time series data related to the flows of rivers. For many rivers, especially those in long-developed locations, we have access to reliable records of daily river flow measurements spanning well over 100 years. These complete and relatively lengthy records are an excellent starting point for analysis. Multiscale analysis of geophysical time series is one of our lab's specialties.

  5. (Dr. Hao Zhang) Many projects are possible using Climate Data Online from the U.S. National Climatic Data Center (NCDC), which provides daily, monthly, and annual precipitation and temperature data at thousands of weather stations. Students can build an understanding of time series analysis, spatial correlation and interpolation, extreme value theory, etc. They can practice model fitting and forecasting. They will learn skills to manage data, e.g., editing, merging, and splitting data sets. Students will also be introduced to new topics not taught in undergraduate classes, e.g., spatial interpolation and extreme value theory.

  6. (Dr. Frederi Viens) Students can work with Viens and with former Ph.D. student L. Barboza (now at the University of Costa Rica) and other Ph.D. students and colleagues at Purdue and elsewhere, to quantify temperature changes, including uncertainty evaluation, over the last 1,000 years regionally and globally. Viens's group's current research draws on global data; a possible new specific focus could be parts of Africa, because that is where climate change will have the biggest impact on populations, and where some of the most effective solutions may reside. Viens recently served as a Franklin Fellow (2010--2011) for the Africa Bureau at the U.S. Department of State, where he advised U.S. diplomacy on environmental challenges facing sub-Saharan Africa. His background is in probability theory and stochastic processes; he works on theoretical topics in stochastic analysis and applications to mathematical finance, mathematical statistics, and environmental modeling.

Research Thrust B: Biostatistics
  1. (Dr. Ruben Claudio Aguilar) My cell biology laboratory is particularly interested in basic cellular mechanisms, with an emphasis on vesicle trafficking (e.g., intracellular protein transport). We produce enormous data sets daily from our morphometric analysis of microscopy-generated cell images. The analysis of these (and similar) result collections will be valuable to the students and useful to us. We expect that, following an initial training, the students will be able to propose and discuss the advantages and disadvantages of different analytical approaches and to actively participate in the experimental design. In the past, I have successfully recruited undergraduate students from the biology courses I teach. In addition, I participate in the NSF-backed Louis Stokes Alliance for Minority Participation (LSAMP) program and the Purdue Summer Research Opportunity Program (SROP). In our lab, undergraduate students receive scientific training and are presented with the opportunity of pursuing independent research sub-projects. In addition, our undergraduates participate in lab meetings (where they are encouraged to contribute and ask questions), and they are trained in the good practices of scientific presentation. Indeed, our students have been very successful in their research endeavors; they have earned multiple awards for poster presentations at undergraduate research events and several paper authorships.

  2. (Dr. Hyonho Chun) Recent advances in high-throughput sequencing technology produce massive data for revealing DNA sequence composition, finding transcription factor binding, and quantifying gene expression levels. The data are 2--3 GB per assay; with multiple assays (replicates), this is truly ``Big Data''. A sequencing machine reveals the bases of millions of short segments of DNA or RNA in a massively parallel way. The resulting reads need to be mapped back to the genome. This can be done with many freely available software tools such as Bowtie and SOAP. One needs to summarize the mapping results, called the pile-up step, to see whether there is a base pair change in DNA (SNP discovery), whether transcription factor binding occurs (peak calling), and to measure how genes are expressed (transcribed). Afterwards, one can perform statistical analysis. Since the sequencing techniques are new, most analyses are based on very simple statistical methods, and should be understandable to undergraduates with appropriate guidance and discussion. The students will benefit from working with Chun and Ward on Condor for parallel computational analysis.

  3. (Dr. Laszlo Csonka) We have two potential projects that could involve sophomore students. One of these involves comparing the rates of evolution of "meaningless" non-coding sequences and gene-coding sequences in Escherichia coli, Salmonella enterica, and other closely related Enterobacteriaceae.

  4. (Dr. Laszlo Csonka) The second one consists of investigation of the conservation of gene order (synteny) in distant species of bacteria. Both of these projects require analyses of very large DNA sequence data sets, and therefore would be ideal for computer-savvy statistics majors. It will be a great learning experience for them to be exposed to the data, vocabulary, and way of thinking of biologists.

  5. (Dr. Rebecca Doerge) Trainees will study an epigenetic modification called DNA methylation, which plays a role in cellular differentiation and cancer development. ``Next-Generation Sequencing'' (NGS) technologies yield discrete count data, at single-base resolution, across the entire genome. With sodium bisulfite treatment (which causes changes to the DNA based on individual cytosine methylation status), NGS can be used to investigate DNA methylation. Students can perform Fisher's exact test for differences in methylation levels at every genomic cytosine. Using start/stop locations, students can essentially test every gene for differences in methylation levels. The dichotomy between cytosine-level and gene-level testing allows students to experience statistical issues such as data quality, variability, and multiple testing in large-data applications.

  6. (Dr. George Moore) Our research and collaborations involve veterinary medical and veterinary public health data generated from Purdue's Veterinary Teaching Hospital, large veterinary practices, or commercial veterinary diagnostic laboratories. Projects for student involvement will include practical applications of medical dataset structure, handling missing patient data, appropriate statistical methods, and presentation of data/findings for veterinary clinical audiences and publication.

  7. (Dr. Doraiswami Ramkrishna) In my research group, we have been developing mathematical models to describe metabolism since the 1980s. In doing so, we have developed our own theory to describe the metabolic behavior of cells. Our main goal is to compare our model predictions with high-throughput bioinformatic data that represent intricate intracellular processes on a genomic level. A variety of technologies can provide the needed quantity of data, including microarrays, RNA-seq, and protein mass spectrometry. The overall goal of this project is the validation of a metabolic theory by means of extracting patterns from data. Looking for trends in the differential expression of genes in volumes of omic data--and comparing them with model predictions--presents the opportunity for the authentication of this model at the genome level. Approaches for analyzing high-throughput bioinformatic data are diverse and extend to data mining, Bayesian statistics, and Markov Chain Monte Carlo analysis.

  8. (Dr. Jun Xie) Students will learn about statistical methods for large-scale genomic data analysis. Nowadays whole-genome genetic information is commonly available in disease studies and clinical trials. For example, genome-wide association studies analyze a large number of common genetic variants, i.e., single nucleotide polymorphisms (SNPs), in individuals to examine whether any genetic variants are associated with a disease. Another example is pharmacogenomics research, which uses whole-genome information to predict individuals' drug response. Students will learn about modern statistical methodology developed for these types of big data, including multiple testing rules, variable selection, and dimension reduction methods. Students can gain hands-on experience through statistical analysis of specific data sets from the databases of the National Center for Biotechnology Information (NCBI) at the National Institutes of Health (NIH).
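
As a rough illustration of the site-by-site testing and multiple-testing issues mentioned in this thrust, the following Python sketch runs a two-sided Fisher's exact test at each cytosine and then applies a Benjamini-Hochberg adjustment. The read counts, the function names, and the choice of the Benjamini-Hochberg procedure are assumptions made for illustration only; they are not taken from any of the projects above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = table_probability(a)
    smallest = max(0, col1 - row2)
    largest = min(row1, col1)
    # Add up every table that is no more likely than the observed one.
    p_value = sum(
        p
        for p in (table_probability(x) for x in range(smallest, largest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjustment stays monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical read counts per cytosine:
# (methylated in A, unmethylated in A, methylated in B, unmethylated in B)
site_counts = [(18, 2, 5, 15), (10, 10, 9, 11), (3, 17, 14, 6), (12, 8, 11, 9)]

raw_p = [fisher_exact_two_sided(*counts) for counts in site_counts]
adjusted_p = benjamini_hochberg(raw_p)
for counts, p, q in zip(site_counts, raw_p, adjusted_p):
    print(counts, f"p={p:.4g}", f"BH-adjusted={q:.4g}")
```

With millions of cytosines rather than four, the same loop is what makes the multiple-testing adjustment matter: the raw p-values alone would flag many sites by chance.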

Research Thrust C: Healthcare Engineering and Healthcare/Biomedical Analytics
  1. (Dr. Ulrike Dydak) Dydak's area of expertise is in Magnetic Resonance Imaging and Magnetic Resonance Spectroscopy. She also maintains a research lab at the Indiana Institute for Biomedical Imaging Sciences (IIBIS) at the Indiana University School of Medicine. She is currently working with colleagues in Biostatistics, Neurology, Toxicology and Psychiatry, designing and implementing clinical MRI/MRS studies. For instance, at present, they are working on finding significant effects in datasets that contain measurements of metabolite concentrations from different brain regions, and they correlate those measurements with biological measures, diagnostic groups, levels of environmental exposure and other measures.

  2. (Dr. Haslyn Hunte) As Assistant Director of the Center on Poverty and Health Inequities (COPHI) at Purdue University, I work on reducing poverty-related inequities through partnerships with local communities. We study trends and problems such as insufficient access to food, barriers to treatment and health care inequalities, and inequities in policies, especially with regard to the poorer segments of populations. We believe that students benefit from seeing the full scope of the data analysis and policy research that we work on. For example, I believe that some of the sophomore participants will appreciate (and perhaps even relate to) the variables I study as a part of a funded project on the health care safety net population. The specific aims of the research project are to provide insights into strategies that will 1) improve cost-effective healthcare delivery and 2) reduce disparities in health outcomes for vulnerable populations by establishing new methods for the planning and operation of the safety net system. The research objectives are: 1) Identify and map the locations where individuals live and where they receive care within the core safety net provider system. 2) Determine whether bypass behavior is exhibited by patients for each episode of care they seek. 3) Determine the association between sociodemographic variables and care-seeking behavior. To achieve our objectives, we will utilize a data set with 69 million patient encounters over a five-year period from the Indianapolis Metropolitan Statistical Area.

  3. (Dr. Haslyn Hunte) I am also engaged in more traditional social epidemiology research that would also provide opportunities for one or more mentees. Using several large datasets, I am interested in the following research questions: 1) What is the association between experiences of interpersonal discrimination and health behaviors and health outcomes? 2) To what extent, if any, does racial/ethnic discrimination explain any of the observed racial/ethnic disparities in health-related outcomes like tobacco and alcohol use/abuse, obesity, and elevated blood pressure? 3) What is the relationship between positive psychological functioning and psychological challenges such as discrimination, and how do they interact to produce the absence or presence of poor health? 4) Does the heterogeneity within the US Black population explain any of the observed Black-White disparities in health outcomes?

  4. (Dr. Nan Kong) Feature Selection in Efficient and Effective Analysis of Photoacoustic Imaging Data for Plaque Vulnerability Characterization in Acute Cardiovascular Syndrome: Of all pathological features, a lipid-rich core, a thin fibrous cap, and infiltration of inflammatory cells are considered the three major hallmarks of acute cardiovascular syndrome. Current imaging modalities have limited abilities to characterize plaque vulnerability. In this project, we will analyze the PA microscopy spectral data collected from a phantom made of clusters of cholesterol ester (CE) and cholesterol crystal (CC). For the spectral data analysis, an important aspect is to identify and select features with which we can efficiently and effectively distinguish CEs and CCs. The funded student is expected to work closely with Dr. Kong and periodically participate in Dr. Ji-xin Cheng's intravascular photoacoustic research team meeting when data analysis is discussed.

  5. (Dr. Nan Kong) Fall Detection using 3D Accelerometer Data: Falls are a common problem for the elderly, often resulting in hospitalization. Despite extensive preventive efforts, falls continue to be a major source of morbidity and mortality among the elderly. Real-time detection of falls may enable rapid medical assistance, thus increasing the sense of security of the elderly and reducing some of the negative consequences of falls. In this project, we will analyze 3D accelerometer data collected on simulated falls performed by healthy volunteers. The objective of the project is to develop fall detection algorithms and conduct comparative studies. The funded student is expected to work closely with Dr. Kong and periodically meet with Dr. Babak Zaire, Professor of Electrical and Computer Engineering, and Dr. Shirley Rietdyk, Professor of Health and Kinesiology.

  6. (Dr. Mark Lawley) The first project involves a large data set with 30 years of patient data on outpatient appointments, emergency department usage, hospitalizations, and laboratory results. The students would need to first understand the nuances of working with de-identified medical data, confidentiality, HIPAA guidelines, etc. The intended outcome of the project would be a set of models for predicting the cost and health impacts of no-show behavior (failing to attend a scheduled medical appointment) for chronically ill patients. Our past work has shown a strong correlation between no-show behavior and increased use of hospital resources, but we need additional work to better explore and understand these relationships. Because we are all users of the U.S. healthcare system, this is an important problem to which the students can easily relate. Further, it introduces them to a number of important statistical techniques in a practical, concrete way in a context that they can appreciate.

  7. (Dr. Mark Lawley) Another project involves diabetes. Students should, once again, relate to the context of this problem since they will almost certainly have relatives or close acquaintances afflicted by diabetes. The management of diabetes requires a careful balancing act. Patients that have chronically high glucose levels (hyperglycemia) risk long term damage to the kidneys, heart, eyes, and feet. On the other hand, over-control of glucose levels can lead to short term glucose levels that are far too low (hypoglycemia), which can cause dizziness, incoherence, fainting, and other problems. We are in the process of obtaining a large data set on diabetic patient glucose levels which we will use to study this problem of short and long term risk balancing. The students could help with time series analysis and learn about some of the simulation and optimization techniques we intend to use in the work.

  8. (Dr. Laura Prouty Sands) My training in multivariate modeling and psychometric analysis of survey instruments, combined with my 25 years of health outcomes research, gives me the content expertise to effectively mentor undergraduates interested in learning how to analyze and interpret databases relevant to healthcare practice and policy. My research is focused on determining optimal care pathways for vulnerable older adults. Currently I am funded by two NIH grants. The first assesses risks for post-operative cognitive decline among older surgical patients. My role on that project is to develop the methods for detecting post-operative cognitive decline and to supervise analyses of project data.

  9. (Dr. Laura Prouty Sands) The second project is directed toward determining health outcomes of unmet need for disabilities among older adults using survey and Medicare claims data. I have mentored eight Ph.D. students from the Department of Statistics, as well as two undergraduate Statistics students. I have access to a wide range of datasets related to healthcare practice and policy, e.g., the Inter-university Consortium for Political and Social Research (ICPSR), as well as data from the Centers for Medicare & Medicaid Services (CMS) and the Centers for Disease Control and Prevention (CDC).

  10. (Dr. Cleveland Shields) Every semester, we have 4 to 6 undergraduate students working in our Relationships and Healthcare Lab, which I co-direct with Dr. Melissa Franks. Thus, we have considerable experience integrating undergraduates in research projects. We can involve students in analyzing data on three projects. First, students can work on data analysis for a project that Dr. Franks and I are conducting on hospital readmission of patients with diabetes, examining discharge planning and family involvement.

  11. (Dr. Cleveland Shields) Second, colleagues and I in the Regenstrief Center for Healthcare Engineering (RCHE) are conducting a study of health services utilization to identify geographic locations producing high utilization and costs, using longitudinal data from medical records from St. Vincent Hospital Systems in the Indianapolis area. Students could help with the design and conduct the analysis for this project.

  12. (Dr. Cleveland Shields) Finally, I am conducting a field experiment examining physician-patient communication. We will be gathering 240 audio recordings of interactions between physicians and actors who will be portraying a patient role. Dr. Sharon Christ serves as the statistician on this project. We will be conducting psychometric analyses to understand the underlying constructs in the measurement of communication in these medical encounters. This presents a context that is surprising and is likely to pique students' interests.

  13. (Dr. Lingsong Zhang) Zhang is working with Lawley (Biomedical Engineering) and Sands (Nursing) on statistical modeling of patient ``no-show''. They are collaborating with Alliance of Chicago, using scheduling data and electronic medical records from 7 clinics over 3 years. The focus is on diabetic patients, who visit regularly. They want to involve students in using scheduling history and demographic factors (payer class, income, education, age) to predict no-show probability and uncertainty.

  14. (Dr. Lingsong Zhang) Zhang is also working with S. Witz and K. Musselman (from the Regenstrief Center for Healthcare Engineering, RCHE), H. Wan from Purdue Industrial Engineering, and J. Castro from University of South Florida, on analysis of hospital readmission characteristics and prediction. RCHE is working with BayCare Health System (Tampa, FL) and St. Vincent Health (Indianapolis, IN), using multiple-year discharge data and patient characteristics, to identify (1) readmission probability upon discharge, (2) clinical/demographic factors associated with readmission, and (3) performance comparisons of hospital readmissions.

Research Thrust D: Probability; Theoretical Statistics; Image Processing; Financial Modeling
  1. (Dr. Guang Cheng) Students will learn resampling using bootstrap methods. The bootstrap is widely applicable for inference in massive data; however, the bootstrap is computationally demanding. Kleiner et al. introduce the Bag of Little Bootstraps (BLB): a robust, computationally efficient means of assessing the quality of estimators; it combines the results of bootstrapping multiple small subsets and is well suited to parallel computing architectures. G. Cheng proposes to investigate with students whether applying the m-out-of-n bootstrap or subsampling within each BLB subset will overcome the inconsistency of the bootstrap. Longer term, he wants to study BLB in the setting of M-estimation with a student, e.g., its consistency, asymptotics, and computational efficiency in dealing with massive data.

  2. (Dr. Raghu Pasupathy) Motivated by contexts such as air quality measurement using cheap sensors, energy monitoring through smart meters in large buildings, and tracking stock tickers on mobile devices, we ask: Is existing statistical and simulation methodology adequate for online "big data" contexts? How should methods for estimating traditional statistical measures, e.g., quantiles and conditional value-at-risk, adapt to the online context? Are there low-storage, fast-compute versions of function estimators, e.g., kernel densities and stochastic kriging, that are just as accurate as existing estimators? Students will help to construct and analyze O-estimators --- a new class of estimators characterized by (provably) minimal storage and update complexities, and having convergence rates matching those of analogous traditional statistical objects.

  3. (Dr. Ilya Pollak) An important area of image processing that I work in is segmentation, i.e., developing computer algorithms for automated detection of object boundaries in images. This is a critical image analysis step in problems arising in many areas, such as biomedical imaging, computer vision, and microscopy of materials. The analysis of such algorithms requires statistical comparisons of the algorithms' outputs on a large image database with ground-truth segmentations. Constructing ground-truth segmentations and writing basic utilities for such comparisons (in R or in Matlab) would be a great sophomore research project.

  4. (Dr. Ilya Pollak) In the area of quantitative finance, there are a number of recent papers on the analysis of so-called technical indicators. A very interesting sophomore research project would be to read one of these papers, implement (again, in R or Matlab) several indicators described therein and conduct statistical analysis of forecasting performance of these indicators on real market data.

  5. (Dr. Xiao Wang) Wang is currently working with Professors Chuanhai Liu and Lingsong Zhang on projects related to statistical computing, spatial statistics and image analysis. Dr. Wang has ongoing work with one undergraduate student, Yuxi Yang, on statistical modeling of ozone, nitric oxide and nitrogen dioxide levels in California. He wants to include more undergraduate students in this project.

  6. (Dr. Mark Daniel Ward) Students in Ward's research group will analyze asymptotic properties of randomly-generated sequences and trees using probabilistic generating functions, and simulations in R, as well as some Maple, for solving recurrences and deriving asymptotics. Undergraduates can also work with Ward on stochastic leader election algorithms.
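
To give a feel for the Bag of Little Bootstraps idea described in the first project of this thrust, here is a minimal Python sketch that estimates the standard error of a sample mean. It is a simplified illustration under stated assumptions, not the procedure used by Cheng's group: the subset size n**gamma with gamma = 0.6, the number of subsets and resamples, the simulated Gaussian data, and all function names are hypothetical choices made for this example.

```python
import random
import statistics
from collections import Counter


def weighted_mean(values, weights):
    """Mean of values where each value counts weights[i] times."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def blb_standard_error(data, estimator, gamma=0.6, n_subsets=5,
                       n_resamples=50, seed=0):
    """Bag of Little Bootstraps estimate of an estimator's standard error."""
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))  # size of each small subset
    subset_errors = []
    for _ in range(n_subsets):
        # Draw a small subset without replacement.
        subset = rng.sample(data, b)
        estimates = []
        for _ in range(n_resamples):
            # Resample n points from the b subset points; only the
            # multiplicity of each subset point needs to be stored.
            counts = Counter(rng.choices(range(b), k=n))
            weights = [counts.get(i, 0) for i in range(b)]
            estimates.append(estimator(subset, weights))
        # Quality assessment within this subset: spread of the estimates.
        subset_errors.append(statistics.stdev(estimates))
    # Average the per-subset assessments.
    return statistics.fmean(subset_errors)


# Simulated data (an assumption for illustration): 2,000 standard normal draws.
data_rng = random.Random(42)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(2000)]

blb_se = blb_standard_error(sample, weighted_mean)
textbook_se = statistics.stdev(sample) / len(sample) ** 0.5
print(f"BLB standard error:      {blb_se:.4f}")
print(f"Textbook standard error: {textbook_se:.4f}")
```

Each subset is processed independently of the others, which is why the outer loop is a natural candidate for parallel execution.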

Research Thrust E: Human Development and Family Studies
  1. (Dr. Edward Bartlett) Bartlett's research area is sensory neurophysiology. The research focus is to dissect the neural circuits involved in the neural coding of sound features across the lifespan, from early development through adulthood and age-related decline. Neural data are obtained from recordings of single neurons and neural populations in response to speech-like and simple sounds. In addition, realistic computational models of single neurons or small groups of neurons are constructed to understand the data.

  2. (Dr. Sharon Christ) Christ can guide students on analysis of large survey data collected from people. One sample is representative of the children involved with Child Protective Services in the U.S. and the other is representative of all non-institutionalized adults in the U.S. These data are longitudinal and involve complex sampling such as clustering (non-independence) and unequal selection probabilities. As a result, trainees will learn how to apply probability weighted estimation and variance estimates that are robust to clustering. In addition, these surveys suffer from missing data and measurement errors due to the self-reported nature of the data collection. Trainees will learn modeling approaches used to avoid biases due to missing data and measurement error. The statistical analysis will be applied to the study of the effects of maltreatment on adolescent development and the effects of work and working conditions on health and well-being of youth and young adults.

  3. (Dr. Sharon Christ) Christ will work with Weber-Fox and her students on the sample of children observed in her audiology lab. In this study, they will work on modeling changes in stuttering over time, and what characteristics are correlated with persistent versus desisted stuttering.

  4. (Dr. Sharon Christ) For another study, students can use the National Health Interview Survey (NHIS) to evaluate how occupations are related to smoking, alcohol use, exercise, asthma, heart disease, etc. NHIS is the national data set used to survey the U.S. adult population with respect to health.

  5. (Dr. Lisa Goffman) I study specific language impairment (SLI) in children. Children with SLI show cognitive abilities within normal levels, but significantly impaired language abilities. Although their cognition is typical, it has recently been found that these children also often show impairments in their gross and fine motor skills. Because children with SLI are at risk for long-term social and academic difficulties, there is a critical need to understand the factors underlying their language and motor deficits and to develop efficacious approaches to treatment. In my NIH funded research, we are presently conducting a longitudinal study of children with SLI to better understand how their language and related motor skills change from the preschool to the school age years. We include standard language and motor measures as well as direct recordings of speech and limb movement. Our goal is to better understand how language and motor domains develop in these children, and how they change over the critical early school years.

  6. (Dr. Christine Weber-Fox) My work is in neural systems for language processing in typical development and in those with communication disorders such as stuttering or language impairment. I also have clinical experience working in both hospital (outpatient, inpatient, and acute care) and school settings. The motivation to study language processing and its connections to stuttering is apropos for sophomore students, who can readily understand why this topic is important. (The recent success, for instance, of The King's Speech demonstrates that this is a topic of broad concern and interest.) My work also focuses on how neural subsystems may differ in speakers with different language experiences and communication skills, as brain processes for language vary even across individuals with 'normal' language abilities. The types of data to be analyzed in our research group include behavioral/clinical measures from children, such as cognitive test scores, including nonverbal IQ and working memory, as well as detailed measures of their speech and language performance. In addition, our data set includes physiological measures of brain activity (Event-related Brain Potentials, ERPs). As the student becomes familiar with the research goals and methods, the expected outcome is for a statistics sophomore to help manage and analyze large data sets that cross domains (behavioral, electrophysiological) and span 4 to 9 years of age longitudinally.

  7. (Dr. Ellen Wells) The Deep Green and Healthy Homes project was sponsored by the nonprofit Environmental Health Watch in Cleveland, Ohio (PI: Stuart Greenberg). It compares two standards of energy efficiency renovations in low-to-moderate income housing in Cleveland. Six homes were renovated to standard energy efficiency recommendations (~50% energy savings); six additional homes were renovated to a stricter standard (~75-90% energy savings) and included mechanical ventilation systems to help preserve air quality. Homes were monitored just after renovation and for ~1 year following renovation using new indoor air quality technology. Home visits took place every three months to carry out visual inspections, take indoor air quality measurements with field instruments, and collect data from participants via questionnaire. The new indoor air quality monitoring technology incorporates low-cost gas, temperature, and relative humidity sensors into a single platform that wirelessly transmits data from the field site to our servers twice per minute. The sensors measure six parameters: temperature, relative humidity, CO, CO2, NOx, and total VOCs. We developed calibration equations that incorporate data from all sensors within the unit, because each sensor also responds, somewhat more weakly, to gases similar in structure to its target gas. For most homes we collected more than 1 million rows of data. Potential projects using these data include: further methodological analysis of calibration/data transmission from the new technology; correlation of data from the new technology with data from standard field instruments; comparison of the two renovation types with regard to air quality or energy use; description of continuous data patterns from remote monitors on a daily, weekly, or other time scale; description of data before/during/after an event that would affect air quality (e.g., ventilation system not working, dispersal of an enormous amount of mothballs); and analysis of how occupant behavior may affect energy use/indoor air quality.
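     As a rough illustration of the data volume and of the daily-scale description mentioned above, the following Python sketch computes how many rows a monitor reporting twice per minute would log in a year, then averages readings by calendar day. The function names and the synthetic readings are hypothetical and are not part of the project's actual software or data; the sketch also assumes no transmission gaps.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

SAMPLES_PER_MINUTE = 2  # "twice per minute", as stated in the project description


def expected_rows(days, samples_per_minute=SAMPLES_PER_MINUTE):
    """Rows one monitor would log over `days`, assuming no transmission gaps."""
    return samples_per_minute * 60 * 24 * days


def daily_means(readings):
    """Average (timestamp, value) pairs by calendar day."""
    by_day = defaultdict(list)
    for timestamp, value in readings:
        by_day[timestamp.date()].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Synthetic example (hypothetical values): two days of readings, one every 30 s.
start = datetime(2014, 1, 1)
readings = [
    (start + timedelta(seconds=30 * i), 400.0 + (i % 2))
    for i in range(expected_rows(2))
]

print(expected_rows(365))     # rows per monitor for one year of monitoring
print(daily_means(readings))  # one mean value per calendar day
```

     Under these assumptions a single year of monitoring yields 2 x 60 x 24 x 365 = 1,051,200 rows, which is consistent with the "more than 1 million rows" reported for most homes. The same grouping idea extends to weekly or hourly summaries by changing the grouping key.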

Research Thrust F: Statistical Consulting Service
  1. (Dr. Bruce Craig and Ms. Ce-Ce Furtner) The SCS has over 200 research consultations/year, serving clients from every College in the University. Any Purdue faculty member, staff member, or student can receive statistical consulting and advice free of charge. Consultants help with proposal preparation, design of studies, data import/export, data analysis, and interpretation and presentation of results. For funding reasons, the SCS employs only grad students, but Director Craig is willing to involve undergraduates from this MCTP project. C. Furtner (Manager), a former UG academic advisor, knows what is feasible for undergraduate students. Consulting will lead UGs to apply for graduate study in Applied Statistics, boost communication skills, and sometimes result in papers with clients. Undergraduate students will attend meetings led by grad student consultants and will help with the data analysis. By listening at meetings, UGs will get an early, tangible understanding of how modeling, time series, design of experiments, etc., are used in practice.

Research Thrust G: Coastal Margin Observation & Prediction
  1. (Dr. Tawnya Peterson and Dr. António Baptista) (Please note that the summer component of this thrust is in Portland, Oregon, and would require summer travel.) The NSF Science and Technology Center for Coastal Margin Observation & Prediction (CMOP) is dedicated to the study of estuaries as bioreactors that deliver unique ecosystem services, including the filtering of land inputs into the ocean. We use the Columbia River estuary as our long-term testbed, and we support our research through continuous high-resolution observations and simulations of a vast array of multi-disciplinary variables. Diverse opportunities for statistical analysis of the data are available for undergraduate students, in connection with understanding physical and ecological processes, assessing and controlling the quality of observations, and assessing and improving computational models. These opportunities are available during the summer or, by special arrangement, throughout the year. Because of the inter-institutional and inter-disciplinary nature of CMOP, students can work with leading scientists at three universities: Oregon Health & Science University, Oregon State University, and the University of Washington.

    This material is based upon work supported by the National Science Foundation under Grant No. 1246818. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.