Research Thrusts (2017-18): Statistics Living-Learning Community

Research Thrust A: Atmospheric/Earth Science
  1. (Dr. Michael Baldwin) My research group is focused on one of the most important challenges in the atmospheric sciences: improving the understanding and prediction of high-impact weather events. These weather events (e.g., tornadoes, droughts, flooding precipitation) affect public safety as well as many sectors of the economy, such as agriculture, energy, water resources, and the insurance industry; their costs can be severe and wide-ranging. My group has focused on both the short-term prediction problem and the longer-term challenge of understanding the effects of global climate change on high-impact weather systems. More specifically, we have made substantial contributions to the understanding of weather systems through the development and application of automated analysis procedures that identify and analyze such systems in meteorological data. The algorithms developed within my research group have allowed for the rapid analysis of massive data sets, such as multi-decade downscaled climate simulations, and for the evaluation of high-resolution forecasts covering periods of multiple years. My research group recently developed a prototype real-time prediction system to directly measure the characteristics of precipitating weather systems in high-resolution model forecasts. Through the application of image processing techniques, weather features are automatically identified, characterized, classified, and tracked over time; we apply appropriate statistical models, and we evaluate the resulting predictions.

  2. (Drs. Julie Elliott and Lucy Flesch) The Geophysics group uses geodetic data to answer questions relating to the movement of tectonic plates, slip along continental faults, generation of earthquakes and mountain growth, and movement of glaciers. With new satellites in orbit, the volume of geodetic observations of the Earth from space is allowing for unprecedented exploration of changes in the Earth's surface over short time scales. Recent significant increases in the amount of freely available geodetic data require the development of data processing and tracking algorithms to generate detailed time series of surface motions.

  3. (Dr. Wen-wen Tung) The Earth System Dynamic Predictability Laboratory studies the dynamics and the predictability of phenomena in the Earth systems on a variety of temporal and spatial scales. We draw on data readily available in the public domain or from our own physical model simulations, and we apply or develop methods to perform data analysis. The main thrust of our research is to ask significant and domain-relevant questions. Our work does not end at the analysis results; we train our team, as opportunities arise, to assess and interpret the results at levels that can be disseminated to stakeholders in the public or private sectors. Students interested in applying to work with us can consider one of the following beginning questions: 1. Does urban air pollution affect landfalling typhoons or hurricanes? How and how much? 2. When an atmospheric river (a very cool phenomenon, check it out!) transits into precipitating weather systems over the US, does the associated phase change of water manifest in the Sun-Earth radiation energy budget? How, and what are the consequences? 3. Has the changing climate been manifested in global biomes, and how? What are the implications?

  4. (Dr. Yutian Wu) Our research group aims at understanding the dynamical processes in the large-scale circulation of the atmosphere and how these processes respond to anthropogenic climate change. One current research project focuses on the fastest warming region on the globe: the Arctic. We are particularly interested in questions such as: What processes cause Arctic warming? How does Arctic warming affect the weather and climate in North America? Are we going to suffer more extreme weather events in the future? The project will be of both scientific and societal importance for better understanding and predicting the future climate in North America. The project will use both observational datasets and state-of-the-art global climate model simulations. Analysis techniques such as time series analysis, spectral decomposition, and maximum covariance analysis will be utilized.

Research Thrust B: Biostatistics
  1. (Dr. Ruben Claudio Aguilar) My cell biology laboratory is particularly interested in basic cellular mechanisms, with emphasis on vesicle trafficking (e.g., intracellular protein transport). We produce enormous data sets daily from our morphometric analysis of microscopy-generated cell images. The analysis of these (and similar) result collections will be valuable to the students and useful to us. We expect that, following initial training, the students will be able to propose and discuss the advantages and disadvantages of different analytical approaches and to actively participate in the experimental design. In the past, I have successfully recruited undergraduate students from the biology courses I teach. In addition, I participate in the NSF-backed Louis Stokes Alliance for Minority Participation (LSAMP) program and the Purdue Summer Research Opportunity Program (SROP). In our lab, undergraduate students receive scientific training and are presented with the opportunity to pursue independent research sub-projects. In addition, our undergraduates participate in lab meetings (where they are encouraged to contribute and ask questions), and they are trained in good practices of scientific presentation. Indeed, our students have been very successful in their research endeavors; they have earned multiple awards for poster presentations at undergraduate research events and several paper authorships.

  2. (Dr. Laszlo Csonka) We have two potential projects that could involve sophomore students. One of these involves comparing the rates of evolution of "meaningless" non-coding sequences and gene-coding sequences in Escherichia coli, Salmonella enterica, and other closely related Enterobacteriaceae.

  3. (Dr. Laszlo Csonka) The second one consists of investigation of the conservation of gene order (synteny) in distant species of bacteria. Both of these projects require analyses of very large DNA sequence data sets, and therefore would be ideal for computer-savvy statistics majors. It will be a great learning experience for them to be exposed to the data, vocabulary, and way of thinking of biologists.

  4. (Dr. George Moore) Our research and collaborations involve veterinary medical and veterinary public health data generated from Purdue's Veterinary Teaching Hospital, large veterinary practices, or commercial veterinary diagnostic laboratories. Projects for student involvement will include practical applications of medical dataset structure, handling missing patient data, appropriate statistical methods, and presentation of data/findings for veterinary clinical audiences and publication.

  5. (Dr. Doraiswami Ramkrishna) In my research group, we have been developing mathematical models to describe metabolism since the 1980s. In doing so, we have developed our own theory to describe the metabolic behavior of cells. Our main goal is to compare our model predictions with high-throughput bioinformatic data that represent intricate intracellular processes on a genomic level. A variety of technologies can provide the needed quantity of data, including microarrays, RNA-seq, and protein mass spectrometry. The overall goal of this project is the validation of a metabolic theory by means of extracting patterns from data. Looking for trends in the differential expression of genes in volumes of omic data, and comparing them with model predictions, presents the opportunity for the authentication of this model at the genome level. Approaches for analyzing high-throughput bioinformatic data are diverse and extend to data mining, Bayesian statistics, and Markov Chain Monte Carlo analysis.

  6. (Dr. Maria Sepulveda) Water quality has a huge influence on fish physiology. Marine fish of course are healthier when raised in high salinity (30 ppt) water. However, it is cheaper and easier to raise marine fish in low salinity (< 5 ppt) conditions. This is important because we are relying more and more on hatchery raised fish for our consumption since most marine fish stocks have been depleted. We raised Florida Pompano, a marine fish, under low and high salinity conditions and noticed that some of the fish raised under low salinity conditions did very well while others got sick and died. We collected tissues responsible for osmoregulation (kidneys, liver, gills and gastrointestinal tract) from healthy and sick fish and conducted Next Generation Sequencing to determine differentially expressed genes in these two groups of fish. Specific objectives of this project include: 1) establish transcriptome libraries for gill, liver, kidney and gastrointestinal tract of Florida pompano reared in high and low salinities; 2) identify gene transcripts for osmoregulatory genes, key metabolic enzymes and stress response; 3) compare gene transcript abundance between Florida pompano reared in high and low salinities; and 4) discover unique sequences that may play key roles in the adaptability of marine fish to low salinity.

  7. (Dr. Lyudmila Slipchenko) We are developing a new polarizable force field, BioEFP, for modeling processes in biology, biomedicine and materials. Potential applications of BioEFP are in drug design, cancer research, bioimaging and photovoltaics. BioEFP is based on ideas derived from quantum mechanics and does not contain parameters fitted to experiment. Instead, parameters are obtained from electronic structure calculations on chemical fragments. The accuracy of the BioEFP force field is superior to the accuracy of common classical force fields. One of the main shortcomings of BioEFP is that the parameters are not readily available but have to be computed a priori. To overcome this obstacle, we propose to create an online repository of pre-computed fragment parameters and develop a similarity search algorithm that would assign each fragment of a biological or materials macromolecule to a pre-defined fragment. As a longer-term goal, we propose to connect a high performance computing (HPC) cluster to a web interface so that missing parameters could be computed on the fly. We expect the fragment database will contain several thousand chemically unique fragments; the amount of data associated with each fragment ranges from several Kb to several Mb.

  8. (Dr. Jun Xie) Students will learn about statistical methods for large-scale genomic data analysis. Nowadays whole-genome genetic information is commonly available in disease studies and clinical trials. For example, genome-wide association studies analyze a large number of common genetic variants, i.e., single nucleotide polymorphisms (SNPs), in individuals to examine whether any genetic variants are associated with a disease. Another example is pharmacogenomics research, which uses whole-genome information to predict individuals' drug response. Students will learn about modern statistical methodology developed for these types of big data, including multiple testing rules, variable selection and dimension reduction methods. Students can gain hands-on experience through statistical analysis of specific data sets from the databases of the National Center for Biotechnology Information (NCBI) at the National Institutes of Health (NIH).
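
As an illustration of what a multiple testing rule looks like in practice, here is a minimal sketch of the Benjamini-Hochberg step-up procedure, one widely used rule for controlling the false discovery rate. The project description does not say which rules will be studied, so the choice of procedure, the function name, and the toy p-values are all assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Benjamini-Hochberg step-up rule: sort the m p-values, find the largest
    rank k with p_(k) <= alpha * k / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for index in order[:cutoff_rank]:
        rejected[index] = True
    return rejected


# Toy p-values (made up), standing in for per-SNP association tests.
toy_p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(toy_p_values, alpha=0.05)
print(decisions)
```

In a genome-wide study the list would hold one p-value per variant, so the same rule would be applied to hundreds of thousands of tests rather than ten.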

Research Thrust C: Healthcare Engineering and Healthcare/Biomedical Analytics
  1. (Dr. Azza Ahmed) Ahmed's research is focused on developing and testing interventions that support and improve breastfeeding outcomes among vulnerable populations, specifically preterm infants and low-income mother/infant dyads. She has been collaborating with the Indiana WIC program to study breastfeeding outcomes among late preterm and early term infants in a longitudinal study. She designed LACTOR, an interactive web-based breastfeeding monitoring system. She just finalized a randomized controlled trial to test the effect of LACTOR on breastfeeding outcomes with a large online dataset. Dr. Ahmed is also collaborating with Purdue Animal Sciences, Purdue Statistics, and Eskenazi Health in a longitudinal study to test the effect of sleep quality during pregnancy on breastfeeding outcomes. She is also collecting data on peripartum depression, night eating habits and obesity.

  2. (Dr. Ulrike Dydak) Dydak's area of expertise is in Magnetic Resonance Imaging and Magnetic Resonance Spectroscopy. She also maintains a research lab at the Indiana Institute for Biomedical Imaging Sciences (IIBIS) at the Indiana University School of Medicine. She is currently working with colleagues in Biostatistics, Neurology, Toxicology and Psychiatry, designing and implementing clinical MRI/MRS studies. For instance, at present, they are working on finding significant effects in datasets that contain measurements of metabolite concentrations from different brain regions, and they correlate those measurements with biological measures, diagnostic groups, levels of environmental exposure and other measures.

  3. (Dr. Nan Kong) Acoustic Data Analysis for Characterizing Pet Dogs' Behaviors: Behavior problems shown by pet dogs are considered to reflect their suboptimal physiological condition and social environment; however, the etiology of each behavior problem has yet to be established because the clinical population is likely heterogeneous. Our laboratory has studied behavioral responses as well as physiological variables associated with canine behavior problems. The current project will focus on vocal sounds of dogs with behavior problems to investigate whether there are distinct acoustic patterns that can help us infer different underlying emotional motivations in affected dogs. The funded student will use sound editing software, like Adobe Audition, to extract features from the voice files, and perform multivariate analysis, e.g., discriminant function analysis, and univariate analysis, e.g., ANOVA. The funded student will work closely with Dr. Kong and Dr. Ogata, Assistant Professor of Animal Behaviors from the College of Veterinary Medicine. The student will be assisted by Miss Carolina Vivas-Valencia, a PhD student from Dr. Kong's research lab.

  4. (Dr. Nan Kong) Glucometer Data Analysis for Understanding the Impact of Activities and Interventions on Diabetes Management: A chronic disease is a medical condition that can last for a long period of time and can progressively cause disability, and even death. Many chronic diseases are caused by, or exacerbated by, multiple environmental features or behavioral factors, such as tobacco use, diets high in fat, and physical inactivity. Type-II diabetes mellitus is one such chronic disease, and we have little understanding of how environmental and behavioral variables influence it at the individual level. A group of Indiana University School of Medicine researchers have engaged diabetes patients in a human subject study in which the patients' glucose data are continuously recorded together with the corresponding activity ("eating", "sitting", etc.) information entered by the experimental subjects. The funded student will perform wavelet-based feature extraction on continuous glucose monitoring data and develop various classifiers to predict hypoglycemia events (extremely low glucose level). The funded student will work closely with Dr. Kong, and will have the chance to attend regular meetings with a cross-disciplinary group of researchers at the Indiana University Center for Aging Research (i.e., a biweekly phone meeting and a biweekly face-to-face meeting). The student will be assisted by Miss Carolina Vivas-Valencia, a PhD student from Dr. Kong's research lab.

  5. (Dr. Cleveland Shields) Every semester, we have 4 to 6 undergraduate students working in our Relationships and Healthcare Lab, which I co-direct with Dr. Melissa Franks. Thus, we have considerable experience integrating undergraduates into research projects. We can involve students in analyzing data on three projects. First, students can work on data analysis for a project that Dr. Franks and I are conducting on hospital readmission of patients with diabetes, examining discharge planning and family involvement.

  6. (Dr. Cleveland Shields) Second, colleagues and I in the Regenstrief Center for Healthcare Engineering (RCHE) are conducting a study of health services utilization to identify geographic locations producing high utilization and costs, using longitudinal data from medical records from St. Vincent Hospital Systems in the Indianapolis area. Students could help design and conduct the analysis for this project.

  7. (Dr. Cleveland Shields) Finally, I am conducting a field experiment examining physician-patient communication. We will be gathering 240 audio recordings of interactions between physicians and actors who will be portraying a patient role. Dr. Sharon Christ serves as the statistician on this project. We will be conducting psychometric analyses to understand the underlying constructs in the measurement of communication in these medical encounters. This presents a context that is surprising and is likely to pique students' interests.

  8. (Dr. Lingsong Zhang) Zhang is working with Lawley (Texas A&M) and Sands (Virginia Tech) on statistical modeling of patient "no-show". They are collaborating with Alliance of Chicago, using scheduling data and electronic medical records from 7 clinics over 3 years. The focus is on diabetic patients, who visit regularly. They want to involve students in using scheduling history and demographic factors (payer class, income, education, age) to predict no-show probability and uncertainty.

  9. (Dr. Lingsong Zhang) Zhang is also working with S. Witz and K. Musselman (from the Regenstrief Center for Healthcare Engineering, RCHE), H. Wan from Purdue Industrial Engineering, and J. Castro from University of South Florida, on analysis of hospital readmission characteristics and prediction. RCHE is working with BayCare Health System (Tampa, FL) and St. Vincent Health (Indianapolis, IN), using multiple-year discharge data and patient characteristics, to identify (1) readmission probability upon discharge, (2) clinical/demographic factors associated with readmission, and (3) performance comparisons of hospital readmissions.

Research Thrust D: Probability and Theoretical Statistics
  1. (Dr. Guang Cheng) Students will learn resampling using bootstrap methods. The bootstrap is widely applicable for inference in massive data; however, it is computationally demanding. Kleiner et al. introduce the Bag of Little Bootstraps (BLB): a robust, computationally efficient means of assessing the quality of estimators; it combines the results of bootstrapping multiple small subsets on parallel computing architectures. G. Cheng proposes to investigate with students whether applying the m out of n bootstrap or subsampling within each BLB subset will overcome the inconsistency of the bootstrap. Longer term: he wants to study BLB under the settings of M-estimation with a student, e.g., its consistency, asymptotics and computational efficiency in dealing with massive data.
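
The basic BLB idea can be sketched in a few lines. The following is a minimal, single-machine illustration written for this description, not the reference implementation of Kleiner et al.; the function name, the subset-size exponent of 0.6, the number of subsets and resamples, and the simulated data are all assumptions. Each small subset of size b is resampled up to the full sample size n, a standard error is computed per subset, and the per-subset results are averaged.

```python
import random
import statistics


def blb_standard_error(data, estimator, n_subsets=5, subset_exponent=0.6,
                       n_resamples=50, seed=0):
    """Bag of Little Bootstraps estimate of the standard error of `estimator`."""
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** subset_exponent))  # small subset size, b = n^gamma
    per_subset_errors = []
    for _ in range(n_subsets):
        subset = rng.sample(data, b)  # subset drawn without replacement
        estimates = []
        for _ in range(n_resamples):
            # Resample n points from only b distinct values. A production
            # version would store multinomial counts instead of n points.
            resample = rng.choices(subset, k=n)
            estimates.append(estimator(resample))
        per_subset_errors.append(statistics.stdev(estimates))
    # Average the quality assessments across subsets.
    return statistics.mean(per_subset_errors)


# Simulated data (illustrative only): 2000 standard normal draws.
data_rng = random.Random(1)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(2000)]
se = blb_standard_error(sample, statistics.fmean)
print(f"BLB standard error of the mean: {se:.4f}")
```

For the sample mean of 2000 unit-variance observations, the textbook standard error is about 1/sqrt(2000), roughly 0.022, which gives a rough sanity check on the printed value. The loop over subsets is the part that would be distributed across workers on a parallel architecture.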

  2. (Dr. Raghu Pasupathy) Motivated by contexts such as air quality measurement using cheap sensors, energy monitoring through smart meters in large buildings, and tracking stock tickers on mobile devices, we ask: Is existing statistical and simulation methodology adequate for online "big data" contexts? How should methods for estimating traditional statistical measures, e.g., quantiles and conditional value-at-risk, adapt to the online context? Are there low-storage, fast-compute versions of function estimators, e.g., kernel densities and stochastic kriging, that are just as accurate as existing estimators? Students will help to construct and analyze O-estimators, a new class of estimators characterized by (provably) minimal storage and update complexities, and having convergence rates matching those of analogous traditional statistical objects.
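
To make the phrase "low-storage, fast-compute" concrete, here is a minimal sketch of a classic online estimator: Welford's running mean and variance, which stores three numbers and does constant work per observation. This is not an O-estimator (that class is defined by the project and not specified here); the class name and the toy readings are assumptions chosen only to show what a constant-storage update looks like.

```python
class RunningMoments:
    """Welford's online algorithm: mean and variance in O(1) storage."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance of everything seen so far (NaN until 2 points)."""
        if self.count < 2:
            return float("nan")
        return self._m2 / (self.count - 1)


# Toy sensor readings (made up); in the online setting they arrive one at a
# time and are never stored.
stream = RunningMoments()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stream.update(reading)
print(stream.mean, stream.variance)
```

Quantiles and conditional value-at-risk are harder than moments because no fixed-size exact summary exists for them, which is part of why the questions above are open.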

  3. (Dr. Jianxi Su) Current regulatory frameworks require enhanced techniques for measuring and managing extremal risks of financial enterprises. In particular, this involves analyses of standalone risks and dependence structures among them. Su's research aims to develop analytically tractable and practically interpretable quantitative risk management tools to analyze the dependencies among actuarial/financial risks. In this project, students will adopt a novel class of full-range tail dependence copulas to model large volumes of financial data.

  4. (Dr. Xiao Wang) Wang is currently working on projects related to statistical computing, spatial statistics and image analysis. Specifically, Dr. Wang is developing deep learning methods for neuroimaging data. For example, one of the studies is to use the predictive value of ultra-high dimensional imaging data and/or other scalar predictors (e.g., cognitive score) for clinical outcomes including diagnostic status and the response to treatment in the study of neurodegenerative and neuropsychiatric diseases, such as Alzheimer's disease (AD). The growing public threat of AD has raised the urgency to discover and validate prognostic biomarkers that may identify subjects at greatest risk for future cognitive decline and accelerate the testing of preventive strategies. In this regard, prior studies of subjects at risk for AD have examined the utility of various individual biomarkers, such as cognitive tests, fluid markers, imaging measurements, and some individual genetic markers (e.g., ApoE4 gene), to capture the heterogeneity and multifactorial complexity of AD (reviewed in Weiner et al. 2012). He wants to include more undergraduate students in this project.

  5. (Dr. Mark Daniel Ward) Students in Ward's research group will analyze asymptotic properties of randomly-generated sequences and trees using probabilistic generating functions, and simulations in R, as well as some Maple, for solving recurrences and deriving asymptotics. Undergraduates can also work with Ward on stochastic leader election algorithms or on data-driven problems in game theory.

Research Thrust E: Human Development and Family Studies
  1. (Dr. Edward Bartlett) Bartlett's research area is sensory neurophysiology. The research focus is to dissect the neural circuits involved in the neural coding of sound features across the lifespan, from early development through adulthood and age-related decline. Neural data are obtained from recordings of single neurons and neural populations in response to speech-like and simple sounds. In addition, realistic computational models of single neurons or small groups of neurons are constructed to understand the data.

  2. (Dr. Elliot Friedman) Some of our work involves the use of data from large, nationally representative survey-based studies, and a perpetual issue with such studies is missing data. In some cases these data are missing randomly (e.g. people skipped a question by accident), and in some cases it may not be random (e.g. possible reluctance to answer questions about income or more sensitive topics). Students will have the opportunity to look for patterns of missing data to determine whether they are random or systematic. They will also devise appropriate strategies for imputing missing values in order to increase the power and reliability of analyses based on these data.
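
A first look at whether missingness is random or systematic can be as simple as comparing missing-data rates across levels of a variable that is fully observed. The sketch below assumes survey records stored as dictionaries, with None marking a skipped question; the field names, the function name, and the toy responses are hypothetical.

```python
from collections import defaultdict


def missing_rate_by_group(records, target, group):
    """Fraction of records with a missing `target`, within each level of `group`."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for record in records:
        level = record[group]
        totals[level] += 1
        if record.get(target) is None:  # None marks a skipped question
            missing[level] += 1
    return {level: missing[level] / totals[level] for level in totals}


# Toy, made-up responses for illustration only.
responses = [
    {"age_group": "18-39", "income": 42000},
    {"age_group": "18-39", "income": 51000},
    {"age_group": "18-39", "income": None},
    {"age_group": "18-39", "income": 38000},
    {"age_group": "65+", "income": None},
    {"age_group": "65+", "income": None},
    {"age_group": "65+", "income": 29000},
    {"age_group": "65+", "income": None},
]
rates = missing_rate_by_group(responses, target="income", group="age_group")
print(rates)
```

A large gap between groups would suggest that missingness depends on an observed variable, which matters for the choice of imputation strategy; with real survey data, a formal test and much larger samples would be needed before drawing that conclusion.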

Research Thrust F: Statistical Consulting Service
  1. (Dr. Bruce Craig and Ms. Ce-Ce Furtner) The SCS has over 200 research consultations/year, serving clients from every College in the University. Any Purdue faculty member, staff member, or student can be a client, for free, to receive statistical consulting and advice. Consultants help with proposal preparation, design of studies, data import/export, data analysis, and interpretation and presentation of results. For funding reasons, the SCS only employs grad students, but Director Craig is willing to involve undergraduates from this MCTP project. C. Furtner (Manager), a former UG academic advisor, knows what is feasible for undergraduate students. Consulting will encourage UGs to apply for graduate study in Applied Statistics, boost communication skills, and sometimes result in papers with clients. Undergraduate students will attend meetings led by grad student consultants and will help with the data analysis. By listening at meetings, UGs will get an early, tangible understanding of how modeling, time series, design of experiments, etc., are used in practice.

Research Thrust G: Coastal Margin Observation & Prediction
  1. (Dr. Tawnya Peterson and Dr. António Baptista) (Please note that the summer component of this thrust is in Portland, Oregon, and would require summer travel.) The NSF Science and Technology Center for Coastal Margin Observation & Prediction (CMOP) is dedicated to the study of estuaries as bioreactors that deliver unique ecosystem services, including the filtering of land inputs into the ocean. We use the Columbia River estuary as our long-term testbed, and we support our research through continuous high-resolution observations and simulations of a vast array of multi-disciplinary variables. Diverse opportunities for statistical analysis of the data are available for undergraduate students, in association with understanding physical and ecological processes, assessment and control of the quality of observations, and assessment and improvement of computational models. These opportunities are available during the summer or, by special arrangement, throughout the year. Because of the inter-institutional and inter-disciplinary nature of CMOP, students can work with leading scientists at three universities: Oregon Health & Science University, Oregon State University and University of Washington.

Research Thrust H: Saving Nature with Statistics
  1. (Dr. Songlin Fei) Forests provide a wide variety of vital services such as timber and clean water, but they are challenged by the changing climate. Our lab strives to understand the impact of climate change on forests and the resulting impact on future climate. We use continent-wide, long-term (1980-present) data collected by the US Forest Service to address questions such as: Are trees migrating and at what rates? How are recruitment and growth of trees affected? What are the consequences of climate-induced species composition changes? Students can learn how to explore and analyze large data sets in a spatial and temporal context.

  2. (Dr. Songlin Fei) Invasion of exotic plant species has caused serious ecological degradation and economic losses. Our lab is working to build predictive models to understand regional invasion patterns and processes that will advance the discipline of invasion ecology and assist in effective management policy and control practices to combat invasive species. We are interested in questions including: (1) Why are certain exotics more invasive? (2) Why are certain ecosystems more prone to invasion? (3) What are the main factors facilitating invasion? Students interested in this topic can use continent-wide invasion databases (or subsets of them) to explore these or related questions. Students will learn how to manage large datasets and practice model fitting, multivariate analyses, spatial analysis, etc.

  3. (Dr. Songlin Fei) Hellbenders, a gigantic, aquatic salamander species found in North America, are declining throughout their range. In Indiana, hellbenders are now confined to a single watershed. In an effort to aid hellbender conservation and management in the state, we are developing local habitat models for hellbenders. This project will involve using classification techniques on a large volume of sonar data to develop a substrate map of the study river. The student will work with a graduate student and a pre-collected data set to come up with novel statistical classification techniques to map river bottom substrate; the resulting substrate classes will then be used as predictor variables within hellbender habitat models.

  4. (Dr. Rob Swihart) Wildlife populations and communities are subjected to human influence in innumerable ways, including activities (e.g., hunting) that have direct effects and others (e.g., timber harvest, agriculture) for which effects may occur primarily due to changes in the availability or quality of habitat. My group seeks to understand how wild vertebrates respond to human activities, as this knowledge can be important to minimizing adverse influences. We have conducted work in the Upper Wabash Ecosystem Project and the Hardwood Ecosystem Experiment, which has resulted in large data sets on dozens of wild species (mostly mammals, but also birds, amphibians, and reptiles) and associated covariates for habitat and landscape features. These data are used to address questions such as: (1) How does the intensity of human disturbance affect population abundance and species composition? (2) What makes some species more sensitive than others to human disturbance? (3) What role does spatial scale play in determining wildlife responses? Students can learn how to conduct exploratory analyses and test competing hypotheses using general and generalized linear models.

  5. (Dr. Rob Swihart) Successful management and conservation of wildlife depends on understanding the factors (e.g., extreme droughts, disease epidemics, or habitat change) that drive variation in survival and reproduction. Unfortunately, a factor's importance often appears only infrequently or slowly, which requires long-term data sets. Wild mammals are difficult to study, so long-term data sets are rare. For species of game mammals, long-term data sets from unexploited populations are even rarer, despite the fact that an understanding of population dynamics in the absence of hunting is essential. I have inherited a data set collected by students at the Purdue Wildlife Area on a non-hunted population of eastern cottontail rabbits that spans 33 years. These data will be used to ask questions such as: (1) How does climate change influence density and survival of cottontails? (2) Have changes in the plant community over time had measurable impacts on the cottontail population? (3) What influence has the increase in abundance of coyotes, an important predator, had on cottontail numbers? Students can learn how to conduct exploratory analyses and test competing hypotheses using general and generalized linear models.

  6. (Dr. Bryan Pijanowski) Work in our Center for Global Soundscapes focusses on the use of long-term soundscape recordings to assess the health of ecosystems around the world. Recently featured in Science as a new area of big data research, soundscape ecology has mushroomed into one of the fastest growing ecological sciences, using advanced sensor and sensor network arrays that combine acoustic information and 3D landscape profiles from LiDAR (light detection and ranging) with companion time-lapse photography/4K video imaging to characterize the dynamics of a variety of ecosystems around the world. Large-scale soundscape and remote sensing databases exist for exotic places like Borneo (paleotropics), Costa Rica (neotropics), Sonoran Desert (Arizona), Midwestern temperate forests (Indiana, Chicago and Wisconsin), estuaries (Maine) and the subarctic (Alaska). Students could work on any of the following projects mentored by both a graduate student and postdoc: (1) analyze over 70 TB of soundscape data from different ecosystems comparing the spatial-temporal dynamics of these systems; (2) develop multi-media web components for use in citizen science and K-12 learning of sound, ecology, mathematics and technology (enhancing our site at; (3) develop new soundscape ecological metrics that quantify diversity of sounds in files using principles of entropy; and (4) develop new techniques for data mining and pattern recognition using novel statistical tools.
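
As a small illustration of how entropy can quantify diversity of sound, the sketch below computes a normalized Shannon entropy over the acoustic energy in a set of frequency bands. It is not one of the Center's metrics, which are not specified here; the function name, the band energies, and the normalization to the range 0 to 1 are assumptions made for illustration.

```python
import math


def shannon_entropy(weights, normalize=True):
    """Shannon entropy (base 2) of non-negative weights, e.g. energy per band.

    With normalize=True the result is divided by log2(number of bands), so 1.0
    means energy is spread evenly and 0.0 means it sits in a single band.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must contain some positive energy")
    probabilities = [w / total for w in weights if w > 0]
    entropy = -sum(p * math.log2(p) for p in probabilities)
    if normalize and len(weights) > 1:
        entropy /= math.log2(len(weights))
    return entropy


# Made-up band energies for two hypothetical recordings.
even_spread = [1.0, 1.0, 1.0, 1.0]   # sound energy in every band
single_band = [4.0, 0.0, 0.0, 0.0]   # one dominant source
print(shannon_entropy(even_spread), shannon_entropy(single_band))
```

In practice the band energies would come from a spectrogram of each recording, and the same calculation could be applied over time bins instead of frequency bands.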

  7. (Dr. Patrick Zollner) White-nose Syndrome (WNS), a disease caused by a novel fungal pathogen, has devastated bat populations in the eastern and midwestern United States. The effects of WNS have led to the listing of the northern long-eared bat (Myotis septentrionalis) as a federally threatened species. There are large gaps in our understanding of northern long-eared bat habitat use and conservation, and in light of WNS it is increasingly important to understand these relationships in order to protect this once common species. This project will begin by using presence-only occupancy modelling to estimate how landscape-scale environmental variables are associated with northern long-eared bat roosting and foraging habitat from historical locations collected prior to the outbreak of WNS. We will then use a combination of acoustic detectors, bat capture, and radio-telemetry tracking of captured bats during the summers of 2017 and 2018 to determine where these bats remain following their dramatic population declines. Another important comparison will be between habitat used by these bats in fragmented landscapes of northern Indiana and similar data from more contiguous forests of southern Indiana. A student helping with this project will have the opportunity to assist with collecting data on bats in the field as well as to help with analyses focused on comparing models of habitat used by these bats in different circumstances. This student's own project could develop from the above ideas or related side projects such as studying the effectiveness of acoustic lures at increasing the probability of capturing these bats.

  8. (Dr. Patrick Zollner) White-nose syndrome, a disease caused by an invasive fungal species new to North America, has caused the death of more than 90% of the individuals of several species of cave-hibernating bats throughout the Midwestern US. Our lab is collecting and analyzing acoustic monitoring data on the occurrence of these now threatened and endangered bat species to use in modelling their summer habitat needs. The acoustic bat detectors we are using record large volumes of bat echolocation calls, and we have access to such data from several regions of Indiana both before and after the arrival of the destructive fungus. A student working on this project would use these acoustic records to evaluate and validate the suitability of models we have developed from similar but independent observations. The improved habitat models resulting from this work will have important applications in the conservation of these bat species as well as the management of Indiana's forests.

  9. (Dr. Michael Saunders) What effects does timber harvesting have on forest ecosystems? - The Hardwood Ecosystem Experiment (HEE) in southern Indiana investigates the influence of forest management on various plant and animal communities within oak-dominated ecosystems. The HEE maintains a large geospatial database with repeat inventories of trees and shrubs, terrestrial vertebrates (see also "topic 4" from Dr. Swihart), and insects (see also "topic 13" from Dr. Holland). Initially, this project would evaluate the effects of forest harvesting on tree and shrub communities, but could be extended to work on other taxonomic groups. There would be opportunities for travel to the sites and to present the results at regional conferences.

  10. (Dr. Michael Saunders) How do trees grow wood? - Production ecology has been generally well studied in conifer tree species, but not in hardwood tree species. Theoretical relationships developed in conifer-dominated forest stands may or may not apply to our Indiana hardwood-dominated forests. This project will use a vast database of tree heights, diameters, and stem taper to model how walnut plantations grow. We will investigate relationships between the amount of leaves a tree displays and the amount of wood that tree produces each year. We will also study how manipulations of leaf area through pruning affect growth. There will be opportunities for the work to be extended to American chestnut, oak and other hardwood species.

  11. (Dr. Tomas Höök) Research in our lab focuses on aquatic ecology and, in particular, the dynamics of the Laurentian Great Lakes. Because each of the Great Lakes is quite large, they are governed by almost oceanic-scale physical processes and characterized by high spatial variability of physical features and biotic factors. Spatial description of variable biotic factors (e.g., fish distributions, growth rates) can contribute to hypothesis development of processes structuring biotic dynamics, while spatially comparing such biotic factors with other physical, chemical or biotic variables can evaluate hypotheses and potentially help identify mechanistic linkages. The objective of this research would be to (a) describe spatial patterns of fish distributions and growth in Lake Michigan and (b) relate these patterns to physical (e.g., satellite-derived surface temperature and water clarity) and biotic (e.g., chlorophyll concentrations, zooplankton densities) factors.

  12. (Dr. Tomas Höök) Northern Indiana contains ~450 natural lakes that provide diverse services, from boating to swimming to fishing to flood control. However, these services are often at odds with land-use practices and human activities. In particular, nutrient runoff (primarily phosphorus) from row crop agriculture to surface waters contributes to eutrophication, including harmful algal blooms, hypoxia, and local extinctions of sensitive species. We have access to a wealth of data from GIS databases and state and university monitoring programs related to fish community composition, water quality, lake morphometrics and land use on the land draining into Indiana's natural lakes. The objective of this research would be to quantitatively model linkages among these different types of variables; for example, evaluating how agricultural practices on lands draining into glacial lakes influence water quality, habitat conditions and resulting fish biodiversity within these lakes.

  13. (Dr. Jeffrey D. Holland) Students in my laboratory study how the pattern of land use and habitat in different landscapes influences ecological processes involving insects such as individual dispersal and exotic species invasion, ecosystem services (e.g., pollination, predation of pests, decomposition), and maintenance of biodiversity. We simultaneously study the impact of local habitat and human activities and the larger scale landscape context. To study the landscape-insect link, we make use of extensive field surveys of insects and habitat combined with satellite and aerial data, geographical information systems, and spatial and multivariate statistics. A sample of projects students could become involved in includes: examining the impact of silvicultural regimes on the functional diversity of forest beetles, spatial analysis of aquatic insect communities, simulation modeling of insect movement, and statistical approaches to optical character recognition for capturing specimen data.

  14. (Dr. Esteban Fernandez-Juricic) Birds and airplanes collide regularly at airports across the world. These bird strikes are a source of mortality for many species of conservation concern as well as a safety and financial concern for the airline industry. In the US, the Federal Aviation Administration has compiled a database of bird strikes since 1990. We use this large database to answer key questions to better understand the environmental conditions that enhance the occurrence of bird strikes: (1) Are airports close to biodiversity hot-spots more likely to have a higher frequency of bird strikes involving species of conservation concern? (2) Does landscape composition around airports influence the probability/frequency of bird strikes? (3) Does habitat structure within airports influence the probability of bird strikes? (4) What is the role of regional and local bird densities in affecting bird strike frequency? (5) Do the color, speed, and shape of commercial airplanes influence the probability of bird strikes? Students will learn how to manage large databases and use general and generalized linear mixed models as well as multivariate statistics. The answers to these questions have widespread management implications for reducing the frequency of bird strikes.

  15. (Dr. Esteban Fernandez-Juricic) Bird feeders are used by multiple species of birds throughout North America. However, bird feeders are not necessarily built to attract birds (for instance, to hold seeds some use Plexiglas, which blocks the ultra-violet portion of the spectrum that many bird species use to find food visually). We have developed a behavioral assay to test, under aviary conditions, novel bird feeders (different shapes, colors, etc.) designed with the avian visual system in mind, which is very different from our own. Through these assays, we have collected data to determine the combination of features that would increase the chances of bird visitation and seed consumption. Students will learn how to run these behavioral assays and use general and generalized linear mixed models to analyze the data. The results of this project have implications for increasing bird diversity in urbanized landscapes.

Research Thrust I: More Applications
  1. (Drs. Dennis Buckmaster and James Krogmeier) With access to soil, weather, crop yield, and topography data, students can pursue spatial interpolation and relationships between layers of crop and climatological data. Pursuit of important contextual data can help improve efficiency of production which is good for food, fuel, feed, and fiber prices as well as beneficial to the environment. Agriculture is a bit late to the big data boom, and potential impact is great. Opportunities to improve methods for analyzing geospatial agricultural data abound.

  2. (Dr. Dennis Buckmaster) The USDA National Agricultural Statistics Service offers data regarding production and economics of agricultural commodities. Exploration of functional relationships between gross production and regional climatological data might lend insight regarding key management decisions and possibly the effects of climate variability. Creative thinking and model development with data from different sources will be the challenge. If we can gain access to reasonably large quantities of field-specific data, validation with data sets representing smaller areas would be the goal.

  3. (Dr. Vetria Byrd) Big Data and Uncertainty Visualization: This project has applications in visualization, uncertainty and Big Data. Students learn the importance of visualization and the role it plays in discovery and scholarship, and explore real-world datasets in an environment capable of viewing and manipulating Big Data sets. The project will allow for exploration of and training in uncertainty visualization. Students will work with publicly available data, including climate datasets from the National Oceanic and Atmospheric Administration website and data from health data websites, the US Census Bureau, and the United States Department of Agriculture, to create data visualizations that will aid in decision making.

  4. (Dr. Vetria Byrd) Making Sense of Big Data at Multiple Levels of Abstraction: This project will explore the role of visualization in providing multiple levels of abstraction. Students will explore the development of a novel user interface technology allowing users to visualize plans of ~500 activities for a single day's operation at multiple levels of abstraction. The goal is to create visual interfaces that help users track performed activities, identify constraint violations, visualize contingencies and suggest new plans.

  5. (Dr. Andreas Jung) Research projects for undergraduate students are in the context of the analysis of the data recorded with the Compact Muon Solenoid (CMS) detector at the LHC, with the goal of measuring the top quark coupling to the Higgs boson. This coupling strongly influences the answer to the question of whether the Universe is in a stable, meta-stable or unstable vacuum state. Together with precision measurements of top quark properties, the results can pinpoint contributions from beyond the Standard Model, which particle physicists have been seeking for decades. The data analysis relies heavily on methods that include topological multivariate analysis, regularized matrix unfolding and complementary profiling techniques. There is also the opportunity to contribute to the data analysis of silicon detector prototypes that are tested for their thermal performance. These thermal tests are carried out under the same conditions as the real CMS detector and use the existing capabilities of the Purdue Silicon Detector Laboratory in the Physics building.

  6. (Drs. James Krogmeier and Darcy Bullock) We have access to high-resolution traffic data regarding anonymized trips of individual vehicles on Indiana state highways. We would like to explore various ways to visualize and analyze these data in combination with traffic signal controller state information, road geometry, weather, road construction, and incident data.

  7. (Dr. Yung-Hsiang Lu) Have you noticed the cameras at traffic intersections? Would you like to see the data from these traffic cameras? What information can you obtain from the traffic cameras? Imagine that you can see 10,000 traffic cameras from New York, Chicago, Boston, London, Paris, Atlanta, etc. What can you do with the data? Can you develop computer programs to count the number of vehicles passing through these intersections? Can you count the number of people crossing the streets? Can you discover the latest fashion trend based on people's clothing? Can your programs count accurately in different weather, day and night? Would your programs be able to discover different driving habits in different cities? A research group at Purdue University has created the world's largest camera network, called Continuous Analysis of Many CAMeras (CAM2). CAM2 is capable of analyzing vast amounts of real-time data from network cameras worldwide. CAM2 provides versatile data that can be challenging for any existing machine learning solution. If you want to develop computer programs that can understand the world through thousands of network cameras, you would find this project challenging and exciting.

This material is based upon work supported by the National Science Foundation under Grant No. 1246818. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.