## November 08, 2010

### 36-402, Advanced Data Analysis, Spring 2011 (Course Announcement)

This is the undergraduate "advanced data analysis", not to be confused with the graduate projects course I'm teaching right now. Actually, they used to be much more similar, but due to the uncanny growth of the undergraduate major, I will have seventy or so students in 402, and having all of them do projects is more than we can cope with. (My inner economist says that the statistics department should leave the curriculum alone and just keep raising the threshold for passing our classes until the demand for being a statistics major balances the supply of faculty energy, as per Parkinson's "The Short List, or Principles of Selection", but fortunately no one listens to my inner economist.) So, as last year, about a dozen students will do projects in 36-490, and everyone will learn about methods.

36-402, Advanced Data Analysis, Spring 2011
Description: This course concentrates on methods for the analysis of data, building on the theory and application of the linear model from 36-401. Real-world examples will be drawn from a variety of fields.
Prerequisites: 36-401 (modern regression), or an equivalent class, with my permission.
Topics: tentative, and grouped by theme; presentation order will vary.
- Model evaluation: statistical inference, prediction, and scientific inference; in-sample and out-of-sample errors, generalization and over-fitting, cross-validation; evaluating by simulating; bootstrap; information criteria and their limits; mis-specification checks
- Yet more regression: regression = estimating the conditional expectation function; lightning review of ordinary least squares linear regression and what it is really doing; analysis of variance; limits of linear OLS; extensions: weighted least squares, basis functions; ridge regression and lasso
- Smoothing: kernel smoothing, including local polynomial regression; splines; additive models; classification and regression trees; kernel density estimation
- GAMs: linear classifiers; logistic regression; generalized linear models; generalized additive models
- Latent variables and structured data: principal components; factor analysis and latent variables; graphical models in general; latent cluster/mixture models; random effects; hierarchical models
- Causality: graphical causal models; estimating causal effects; discovering causal structure
Time and place: 10:30--11:50 Tuesdays and Thursdays in Porter Hall 100
Textbook: Julian Faraway, Extending the Linear Model with R (Chapman & Hall/CRC Press, 2006, ISBN 978-1-58488-424-8) will be required. (Faraway's page on the book, with help and errata.) There may be other optional books.
Mechanics: nearly weekly problem sets (mostly analyzing data sets, a little programming), due on Tuesdays; a mid-term exam; a final exam.
Computing: You will be expected, and in some assignments required, to use the R programming language. All assignments will need a computer. Let me know at once if this will be a problem.
Office hours: Monday 2--4 pm in Baker Hall 229C, or by appointment.

Update, 15 November: The class webpage will be here. Also: this is the same class as 36-608; graduate students should register under the latter number.

Posted by crshalizi at November 08, 2010 16:20

Three-Toed Sloth: Hosted, but not endorsed, by the Center for the Study of Complex Systems