Notebooks

Cross-Validation

24 Oct 2012 08:18

One of the most brilliantly simple and compelling ideas in all of statistics: to estimate how well your model will do on new data, take your data set and divide it into two parts at random. Fit the model to one part and then evaluate its predictions on the other; average over several such random splits into training and testing sets.
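Here is a minimal Python sketch of that recipe. It assumes squared-error loss and, purely for illustration, a straight-line least-squares model. The function names (`fit_line`, `predict_line`, `random_split_cv`) and the defaults of 20 splits with half the data used for training are hypothetical choices, not part of the original note. Repeatedly re-splitting at random like this is sometimes called Monte Carlo, or repeated random sub-sampling, cross-validation.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def predict_line(params, x):
    """Evaluate the fitted line at x."""
    a, b = params
    return a + b * x


def random_split_cv(xs, ys, fit, predict, n_splits=20, train_frac=0.5, seed=0):
    """Estimate out-of-sample mean squared error by repeated random splits.

    For each split: shuffle the indices, fit on the first train_frac of them,
    and average the squared prediction errors on the rest. The returned value
    is the mean of those per-split test errors.
    """
    n = len(xs)
    n_train = int(round(train_frac * n))
    if n_train < 2 or n_train >= n:
        raise ValueError("need at least 2 training points and 1 test point")
    rng = random.Random(seed)  # hypothetical fixed seed, for reproducibility
    idx = list(range(n))
    split_errors = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        params = fit([xs[i] for i in train], [ys[i] for i in train])
        test_sq = [(ys[i] - predict(params, xs[i])) ** 2 for i in test]
        split_errors.append(statistics.fmean(test_sq))
    return statistics.fmean(split_errors)
```

On exactly linear data the estimated error is essentially zero. With Gaussian noise of standard deviation 0.5 added, the estimate should land near, and on average a little above, the noise variance of 0.25, since the fitted line carries some estimation error of its own.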

As a method of model selection; as (not quite the same thing) a means of estimating the generalization error of a statistical model; relations to bootstrapping. How best to cross-validate time series? Spatial models? Networks? Other kinds of structured data? Relation to "stability" in learning theory.
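Used for model selection, the idea is to compute a cross-validated error for each candidate model and keep the one with the smallest. The sketch below assumes one-dimensional k-nearest-neighbour regression, with the number of neighbours `k` as the thing being selected, and squared-error loss. It also swaps in a different splitting scheme from the random halves described above: 5-fold cross-validation, in which the shuffled data are dealt into five folds and every point is held out exactly once. All function names and the sine-plus-noise toy data are hypothetical.

```python
import math
import random
import statistics


def knn_predict(train_x, train_y, x, k):
    """Predict y at x as the mean y of the k training points nearest to x (1-D)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return statistics.fmean([train_y[i] for i in nearest])


def kfold_cv_error(xs, ys, k, n_folds=5, seed=0):
    """Mean squared error of k-NN regression, estimated by n_folds-fold CV.

    Indices are shuffled once and dealt into n_folds folds; each fold is
    predicted by a model trained on the remaining folds.
    """
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    sq_errors = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for i in test:
            pred = knn_predict(train_x, train_y, xs[i], k)
            sq_errors.append((ys[i] - pred) ** 2)
    return statistics.fmean(sq_errors)


def select_k(xs, ys, candidates, n_folds=5, seed=0):
    """Score each candidate k by CV error; return (best k, {k: score})."""
    # Reusing one seed gives every candidate the same folds (a paired comparison).
    scores = {k: kfold_cv_error(xs, ys, k, n_folds, seed) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores


def make_demo_data(n=200, noise=0.3, seed=42):
    """Hypothetical toy data: y = sin(x) + Gaussian noise, x uniform on [0, 2*pi]."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0, noise) for x in xs]
    return xs, ys


xs, ys = make_demo_data()
best_k, cv_scores = select_k(xs, ys, [1, 3, 10, 30, 100])
print("chosen k:", best_k)
```

Very small `k` overfits the noise and very large `k` smooths away the curve, so the cross-validated error should favour something in between. Note that the winning candidate's cross-validated score is an optimistically biased estimate of its own generalization error, because it was chosen for having the lowest score; this is one concrete way in which model selection and error estimation are not quite the same thing.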


Notebooks:     Hosted, but not endorsed, by the Center for the Study of Complex Systems