The "curse of dimensionality" limits the usefulness of fully non-parametric regression in problems with many variables: bias remains under control, but variance grows rapidly with dimensionality. Parametric models do not have this problem, but have bias and do not let us discover anything about the true function. Structured or constrained non-parametric regression compromises, by adding some bias so as to reduce variance. Additive models are an example, where each input variable has a "partial response function", which add together to get the total regression function; the partial response functions are unconstrained. This generalizes linear models but still evades the curse of dimensionality. Fitting additive models is done iteratively, starting with some initial guess about each partial response function and then doing one-dimensional smoothing, so that the guesses correct each other until a self-consistent solution is reached. Examples in R using the California house-price data. Conclusion: there are no statistical reasons to prefer linear models to additive models, hardly any scientific reasons, and increasingly few computational ones; the continued thoughtless use of linear regression is a scandal.
Reading: Notes, chapter 8; Faraway, chapter 12
Posted by crshalizi at February 09, 2012 10:30 | permanent link