The overarching goal of this class is to empower students with the ability to perform sophisticated analysis of ecological (or, really, any) data. There are two basic, interrelated skills we will be developing simultaneously: statistical concepts and techniques, and R programming. These go hand in hand: R facilitates not only the implementation but also the understanding of statistical concepts. The ability to summarize and visualize data in creative ways using R is an additional, vital step in any analysis.
The course covers a lot of material in a short amount of time, so it will necessarily be presented in a highly applied but somewhat superficial way. Because the class is small, we can treat it as an intensive data analysis workshop, and will attempt to tailor it to the specific needs of the students.
Class time will be split roughly in half between lectures and in-class exercises and labs. Students attending in class should bring laptops with R and RStudio installed.
There will be 3 homework assignments and one final project on a topic of the student’s choice, ideally a focus of the student’s own research. The project will be presented in a 15-minute presentation in a mini-symposium format on the last day of class.
The most important material for this course is the R program itself. Additionally, we highly recommend using RStudio, a user-friendly integrated development environment (IDE) for R.
R is available at: http://cran.r-project.org/
RStudio is available at: http://www.rstudio.com/
The material in the first quarter should be largely self-contained - every lecture and lab will be made available to the students on the course website.
There is no required text for this course, in part because we will be touching on a half-dozen topics that, alone, merit entire courses and textbooks. Relevant outside texts, references and on-line resources will be introduced as needed.
The following is the approximate curriculum for the course. Depending on the abilities and interests of the students, the pacing and emphasis of the material covered is open to modification as the course proceeds.
Day 1. Data: Types of data. Summary statistics. Loading and working with data. Visualizing raw data. Visualizing summaries. Multi-dimensional visualizations.
Day 2. Probability Models: Distributions - discrete and continuous. Properties: expectations, variances, support, moments. Simulating. Visualizing.
Day 3. Intro to Inference: Principles of Hypothesis Tests. Confidence Intervals. Numerical inference.
Day 4. The Linear Model: Linear regression. Ordinary least squares. Derivations. Factor analysis. ANOVA/ANCOVA. Matrix formulations.
Day 1. Advanced regression: Generalized linear models. Mixed models. Generalized additive mixed models.
Day 2. Dependent data: Time series. Spatial correlations. Generalized least squares.
Day 3. Likelihoods: The likelihood concept. Maximum likelihood estimation. Optimization. Likelihood ratio tests. Fisher information and confidence intervals.
Day 4. More likelihoods: Practice with optimizing and fitting models.
Day 5. Markov chains: Stochastic models. Markov chains.
Day 1. Prediction I: The prediction paradigm. Nearest-neighbors. Clustering.
Day 2. Prediction II: Random forests. Boosted regression. Neural networks.
Day 3. Bayesian modeling I: Introduction to Bayes theory and Bayesian inference. Priors, posterior, “updating”. Implications, debates. Analytical solutions. Numerical integration.
Day 4. Bayesian modeling II: Bayesian inference using MCMC. Convergence, thinning, replication. Fitting a model with Stan.
15-minute presentations by all students of their final projects.