October 21, 2013
Simulation III: Monte Carlo and Markov Chain Monte Carlo (Introduction to Statistical Computing)
Lecture 16: The Monte Carlo principle for numerical integrals: write your integral as an expectation, take a sample. Examples. Importance sampling: draw from a distribution other than the one you really want, then weight the sample values. Markov chain Monte Carlo for sampling from a distribution we do not completely know: the Metropolis algorithm. Gibbs sampling. Bayesian inference via MCMC.
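Not the class code, but a minimal sketch of the first and last of these ideas in R; the integrand exp(-x^2), the Beta(1/2, 1) proposal, and the un-normalized density exp(-|x|^3) are all invented for illustration:

```r
set.seed(16)
n <- 1e5
# Monte Carlo principle: integral of exp(-x^2) over [0,1] = E[exp(-U^2)], U ~ Unif(0,1)
u <- runif(n)
mean(exp(-u^2))                  # about 0.7468
# Importance sampling: draw from Beta(1/2, 1) instead, which piles up
# samples near 0 where the integrand is largest, then re-weight
x <- rbeta(n, 1/2, 1)
mean(exp(-x^2) * dunif(x) / dbeta(x, 1/2, 1))
# Metropolis: sample from a density proportional to exp(-|x|^3) without
# ever finding its normalizing constant, via a Gaussian random-walk proposal
metropolis <- function(log.target, x0, n.steps, scale = 1) {
  x <- numeric(n.steps)
  x[1] <- x0
  for (t in 2:n.steps) {
    proposal <- x[t - 1] + rnorm(1, sd = scale)
    if (log(runif(1)) < log.target(proposal) - log.target(x[t - 1])) {
      x[t] <- proposal
    } else {
      x[t] <- x[t - 1]
    }
  }
  return(x)
}
draws <- metropolis(function(x) -abs(x)^3, x0 = 0, n.steps = 1e4)
```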
Posted by crshalizi at October 21, 2013 13:58 | permanent link
Simulation II: Markov Chains (Introduction to Statistical Computing)
Lecture 15: Combining multiple dependent random variables in a simulation; ordering the simulation to do the easy parts first. Markov chains as a particular example of doing the easy parts first. The Markov property. How to write a Markov chain simulator. Verifying that the simulator works by looking at conditional distributions. Variations on Markov models: hidden Markov models, interacting processes, continuous time, chains with complete connections. Asymptotics of Markov chains via linear algebra; the law of large numbers (ergodic theorem) for Markov chains: we can approximate expectations as soon as we can simulate.
Readings: Handouts on Markov chains and Monte Carlo
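A minimal simulator, with an invented two-state transition matrix (the handouts' examples differ):

```r
# Simulate n steps of a Markov chain on states 1..k from transition matrix P
rmarkovchain <- function(n, P, x0) {
  k <- nrow(P)
  x <- numeric(n)
  x[1] <- x0
  for (t in 2:n) {
    # the easy part first: given the current state, the next state is
    # just one draw from the corresponding row of P
    x[t] <- sample(1:k, size = 1, prob = P[x[t - 1], ])
  }
  return(x)
}
P <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = 2, byrow = TRUE)
# Ergodic theorem in action: the time average approaches the invariant
# probability of state 1, here 2/3
mean(rmarkovchain(1e4, P, x0 = 1) == 1)
```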
Posted by crshalizi at October 21, 2013 13:57 | permanent link
Simulation I: Generating Random Variables (Introduction to Statistical Computing)
Lecture 14: Why simulate? Generating random variables as the first step. The built-in R commands: rnorm, runif, etc.; sample. Some uses of sampling: permutation tests; bootstrap standard errors and confidence intervals. Transforming uniformly-distributed random variables into other distributions: the quantile trick; the rejection method; illustration of the rejection method. Understanding pseudo-random number generators: irrational rotations; the Arnold cat map as a toy example of an unstable dynamical system; illustrations of the Arnold cat map. Controlling the random number seed.
Readings: Matloff, chapter 8; The R Cookbook, chapter 8
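By way of illustration (the target density here is my own toy example, not one from the lecture):

```r
# The quantile trick in one line: these draws are Exponential(2)
draws.exp <- qexp(runif(1e4), rate = 2)

# The rejection method: target f(x) = (3/4)(1 - x^2) on [-1, 1], proposal
# Unif(-1, 1) with density g = 1/2; f <= M*g with M = 3/2, so the
# acceptance probability f(x)/(M*g(x)) simplifies to 1 - x^2
rtarget <- function(n) {
  draws <- numeric(0)
  while (length(draws) < n) {
    x <- runif(n, min = -1, max = 1)
    draws <- c(draws, x[runif(n) < 1 - x^2])
  }
  return(draws[1:n])
}
hist(rtarget(1e4), breaks = 50, freq = FALSE)  # traces out the parabola f
```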
Posted by crshalizi at October 21, 2013 13:56 | permanent link
"Significance Tests for Adaptive Modelling" (Today at the Statistics Seminar)
Attention conservation notice: Late notice of a very technical presentation about theoretical statistics in a city you don't live in.
As always, the talk is free and open to the public.
Posted by crshalizi at October 21, 2013 13:55 | permanent link
October 11, 2013
Midterm Exam (Introduction to Statistical Computing)
Midterm Exam: eight questions about thirteen lines of code.
Posted by crshalizi at October 11, 2013 17:48 | permanent link
Split, Apply, Combine: Using plyr (Introduction to Statistical Computing)
Lecture 13, Split/apply/combine II: using plyr. Abstracting the split/apply/combine pattern: using a single function to appropriately split up the input, apply the function, and combine the results, depending on the type of input and output data. Syntax details. Examples: standardizing measurements for regularly-sampled spatial data; standardizing measurements for irregularly-sampled spatial data; more fun with strikes and left-wing politics. Limitations of the split/apply/combine pattern.
Shorter lecture 13: The lecturer is a gushing Hadley Wickham fanboy.
(This week's lectures are ripped off from slides by Vince Vu, with permission.)
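A sketch of what the abstraction buys us, with an invented data frame standing in for the lecture's spatial and strikes data:

```r
library(plyr)
# Hypothetical measurements, fifty readings at each of three sites
d <- data.frame(site = rep(c("a", "b", "c"), each = 50),
                y = rnorm(150, mean = rep(c(1, 5, 9), each = 50)))
# ddply: split a data frame by site, apply the function, combine the
# results back into a data frame, all in one call
ddply(d, "site", summarise, mean.y = mean(y), sd.y = sd(y))
# Standardizing within each site: transform is the applied function
d.std <- ddply(d, "site", transform, y.std = (y - mean(y)) / sd(y))
```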
Posted by crshalizi at October 11, 2013 17:47 | permanent link
Split, Apply, Combine: Using Basic R (Introduction to Statistical Computing)
Lecture 12: Design patterns and their benefits: clarity on what is to be done, flexibility about how to do it, ease of adapting others' solutions. The split/apply/combine pattern: divide big structured data sets up into smaller, related parts; apply the same analysis to each part independently; combine the results of the analyses. Trivial example: row and column means. Further examples. Iteration as a verbose, painful and clumsy implementation of split/apply/combine. Tools for split/apply/combine in basic R: the apply function for arrays, lapply for lists, mapply, etc.; split. Detailed example with a complicated data set: the relation between strikes and parliamentary politics.
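Schematically, with made-up data in place of the strikes example:

```r
# split: divide the measurements into pieces, one per group;
# sapply: apply mean to each piece, combining the results into a vector
d <- data.frame(group = rep(c("a", "b"), each = 100), y = rnorm(200))
sapply(split(d$y, d$group), mean)
tapply(d$y, d$group, mean)      # tapply fuses the split and apply steps
# apply for arrays: row means of a matrix (cf. rowMeans)
m <- matrix(rnorm(20), nrow = 4)
apply(m, 1, mean)
```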
Posted by crshalizi at October 11, 2013 17:46 | permanent link
Homework: I Made You a Likelihood Function, But I Ate It (Introduction to Statistical Computing)
In which we continue to practice using functions as arguments and as return values, while learning something about the standard error of maximum likelihood estimates, and about the modularity of methods like the jack-knife.
Posted by crshalizi at October 11, 2013 17:45 | permanent link
Lab: I Can Has Likelihood Surface? (Introduction to Statistical Computing)
In which we practice passing functions as arguments to other functions, by way of an introduction to likelihood and its maximization; and, incidentally, work more with plotting in R.
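For a taste, here is a sketch (an invented Gaussian example, not the lab's) of handing a log-likelihood function to optim:

```r
loglike <- function(theta, x) {
  # theta = (mean, log sd); logging the sd keeps it positive during the search
  sum(dnorm(x, mean = theta[1], sd = exp(theta[2]), log = TRUE))
}
x <- rnorm(100, mean = 3, sd = 2)          # hypothetical data
fit <- optim(par = c(0, 0), fn = loglike, x = x,
             control = list(fnscale = -1))  # fnscale = -1 means maximize
c(mean = fit$par[1], sd = exp(fit$par[2]))  # close to 3 and 2
```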
Posted by crshalizi at October 11, 2013 17:44 | permanent link
Abstraction and Refactoring (Introduction to Statistical Computing)
Lecture 11: Abstraction as a way to make programming more friendly to human beings. Refactoring as a form of abstraction. The rectification of names. Consolidation of related values into objects. Extracting common operations. Defining general operations. Extended example with the jackknife.
Reading: sections 14.1--14.3 in Matloff.
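A sketch of the jackknife in this refactored spirit, with the generic leave-one-out machinery separated from the particular estimator (my illustration, not the lecture's code):

```r
# Jackknife standard error of an arbitrary estimator applied to a vector
jackknife <- function(estimator, x) {
  n <- length(x)
  leave.one.out <- sapply(1:n, function(i) estimator(x[-i]))
  # var() divides by n-1, so this matches the usual jackknife formula
  sqrt(((n - 1)^2 / n) * var(leave.one.out))
}
x <- rexp(50, rate = 2)   # hypothetical data
jackknife(mean, x)        # jackknife SE of the sample mean
jackknife(median, x)      # same machinery, different estimator
```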
Posted by crshalizi at October 11, 2013 17:43 | permanent link
Simple Optimization (Introduction to Statistical Computing)
Reading: recipes 13.1 and 13.2 in The R Cookbook; chapters I.1, II.1 and II.2 in Red Plenty
Posted by crshalizi at October 11, 2013 17:42 | permanent link
Homework: Dimensions of Anomaly (Introduction to Statistical Computing)
In which we continue to practice the arts of debugging and testing, while learning about making our code more general, handling awkward special cases, and pondering what it means to say that an observation is an outlier.
Posted by crshalizi at October 11, 2013 17:41 | permanent link
Lab: Testing Our Way to Outliers (Introduction to Statistical Computing)
In which we use Tukey's rule for identifying outliers as an excuse to learn about debugging and testing.
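Tukey's rule, as usually stated, flags points more than 1.5 interquartile ranges beyond the quartiles; a sketch, with an invented test case in the lab's spirit:

```r
tukey.outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75))
  fence <- 1.5 * (q[2] - q[1])
  (x < q[1] - fence) | (x > q[2] + fence)
}
x <- c(rnorm(100), 8)       # hypothetical data with one planted outlier
which(tukey.outliers(x))    # should include the planted point
# A tiny test: the 11th point had better be flagged
stopifnot(tukey.outliers(c(rep(0, 10), 100))[11])
```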
Posted by crshalizi at October 11, 2013 17:40 | permanent link
Functions as Objects (Introduction to Statistical Computing)
Lecture 10: Functions in R are objects, just like everything else, and so can be both arguments to and return values of functions, with no special machinery required. Examples from math (especially calculus) of functions with other functions as arguments. Some R syntax relating to functions. Examples with curve. Using sapply to extend functions of single numbers to functions of vectors; its combination with curve. We write functions with lower-level functions as arguments to abstract out a common pattern of operations. Example: calculating a gradient. Numerical gradients by first differences, done two different ways. (Limitations of taking derivatives by first differences.) Incorporating this as a part of a larger algorithm, such as gradient descent. Using adapters, like wrapper functions and anonymous functions, to fit different functions together. Examples from math (especially calculus) of operators, which turn one function into another. The importance of scoping when using functions as return values. Example: creating a linear predictor. Example: implementing the gradient operator (two different ways). Example: writing surface, as a two-dimensional analog to the standard curve. The use of eval and substitute to control when and in what context an expression is evaluated. Three increasingly refined versions of surface, employing eval.
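As a taste of the operator idea, a sketch (mine, not the lecture's version) of a function which takes a function f as its argument and returns another function, the numerical gradient of f by one-sided first differences:

```r
gradient <- function(f, h = 1e-6) {
  # f and h are captured by scoping in the returned function
  function(x) {
    sapply(seq_along(x), function(i) {
      x.plus <- x
      x.plus[i] <- x[i] + h
      (f(x.plus) - f(x)) / h
    })
  }
}
f <- function(x) sum(x^2)   # illustrative function; the true gradient is 2x
g <- gradient(f)
g(c(1, -2, 3))              # approximately c(2, -4, 6)
```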
Posted by crshalizi at October 11, 2013 17:39 | permanent link
Triple Header (Next Week at the Statistics / Machine Learning Seminars)
Attention conservation notice: Only relevant if you (1) really care about statistics, and (2) will be in Pittsburgh on Monday.
Through a fortuitous concourse of calendars, we will have three outstanding talks on Monday, 14 October 2013. In chronological order:
As always, the talks are free and open to the public.
Posted by crshalizi at October 11, 2013 17:27 | permanent link