### Splines (Advanced Data Analysis from an Elementary Point of View)

Kernel regression controls the amount of smoothing indirectly, through the
bandwidth; why not control the irregularity of the smoothed curve directly? The
spline smoothing problem is a penalized least squares problem: minimize the mean
squared error, plus a penalty term proportional to the integrated squared second
derivative of the function (its average squared curvature). The solution is
always a continuous piecewise cubic polynomial with continuous first and second
derivatives: a natural cubic spline with knots at the observed x values.
Altering the strength of the penalty moves along a bias-variance trade-off, from
the ordinary least squares straight line at one extreme (infinite penalty) to
pure interpolation at the other (zero penalty). Changing the strength of the
penalty is equivalent to minimizing the mean squared error under a constraint
on the average curvature. To ensure consistency, the penalty (or constraint)
should weaken as the data grows; in practice its strength is selected by
cross-validation. An example with the data, including confidence bands. Writing
splines in terms of basis functions, so that fitting becomes least squares on
transformations of the data plus a regularization term. A brief look at splines
in multiple dimensions. Splines versus kernel regression.
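To make the penalized least squares problem concrete, here is a minimal NumPy sketch. It assumes sorted, distinct x values and the objective mean((y - g)**2) + lam * (integral of g''²), where g is the vector of fitted values at the data points. The curvature penalty is written as a quadratic form g @ K @ g using Green and Silverman's Q R^{-1} Q^T construction for natural cubic splines. All function names, the lambda grid, the simulated data, and the plug-in standard-error band are illustrative assumptions, not code from the notes (which may build their bands differently). The sketch shows both extremes of the trade-off (lam near 0 interpolates, very large lam gives the OLS line) and selects lam by leave-one-out cross-validation, using the shortcut that holds for any linear smoother y_hat = S @ y.

```python
import numpy as np


def penalty_matrix(x):
    """K such that g @ K @ g is the integral of the squared second derivative
    of the natural cubic spline interpolating values g at knots x.
    Assumes x is sorted with distinct entries and len(x) >= 3."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.diff(x)                       # knot spacings, length n - 1
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for c in range(n - 2):               # column c <-> interior knot c + 1
        Q[c, c] = 1.0 / h[c]
        Q[c + 1, c] = -1.0 / h[c] - 1.0 / h[c + 1]
        Q[c + 2, c] = 1.0 / h[c + 1]
        R[c, c] = (h[c] + h[c + 1]) / 3.0
        if c + 1 < n - 2:
            R[c, c + 1] = R[c + 1, c] = h[c + 1] / 6.0
    return Q @ np.linalg.solve(R, Q.T)   # K = Q R^{-1} Q^T


def smoother_matrix(x, lam):
    """Hat matrix S: S @ y minimizes mean((y - g)**2) + lam * g @ K @ g.
    Setting the gradient to zero gives (I + n * lam * K) g = y."""
    n = len(x)
    return np.linalg.inv(np.eye(n) + n * lam * penalty_matrix(x))


def loocv_score(x, y, lam):
    """Leave-one-out CV via the linear-smoother shortcut (needs lam > 0)."""
    S = smoother_matrix(x, lam)
    resid = (y - S @ y) / (1.0 - np.diag(S))
    return np.mean(resid ** 2)


def variability_band(x, y, lam, z=2.0):
    """Pointwise fit +/- z standard errors. Treats the fit as unbiased, so it
    ignores smoothing bias; estimates noise variance as RSS / (n - trace(S))."""
    S = smoother_matrix(x, lam)
    fit = S @ y
    sigma2 = np.sum((y - fit) ** 2) / (len(y) - np.trace(S))
    se = np.sqrt(sigma2 * np.sum(S ** 2, axis=1))   # diag(S S^T)
    return fit - z * se, fit, fit + z * se


# Simulated data (evenly spaced design for simplicity), purely illustrative.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=50)

lams = 10.0 ** np.arange(-8, 0)          # candidate penalty strengths
scores = [loocv_score(x, y, lam) for lam in lams]
best = lams[int(np.argmin(scores))]
lower, fit, upper = variability_band(x, y, best)
```

Because K vanishes on straight lines, the unpenalized part of the fit is exactly the linear model, which is why a huge penalty collapses the fit to OLS. The dense inverse costs O(n³) and is only meant to be readable; production implementations such as R's `smooth.spline` exploit banded structure instead.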

*Reading*: Notes, chapter 7; Faraway, section 11.2.


Posted by crshalizi at February 07, 2012 10:30