Notebooks

Regression, especially Nonparametric Regression

02 Sep 2014 09:49

"Regression", in statistical jargon, is the problem of guessing the average level of some quantitative response variable from various predictor variables.
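To make "guessing the average level" concrete, here is a minimal Python sketch. It assumes numpy and uses made-up simulated data. It estimates E[Y | X = x] by averaging the response within equal-width bins of the predictor (a crude "regressogram"), and compares the result to the true conditional mean. The helper name `binned_mean_regression` and the bin count are illustrative choices, not a recommended method.

```python
import numpy as np

def binned_mean_regression(x, y, n_bins=10):
    """Estimate E[Y | X = x] by averaging y within equal-width bins of x.

    A deliberately crude nonparametric estimator, used only to
    illustrate what a regression function is.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # np.digitize returns 1..n_bins+1; shift and clip so x.max() lands in the last bin
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    means = np.array([y[idx == b].mean() if np.any(idx == b) else np.nan
                      for b in range(n_bins)])
    return centers, means

# Simulated data: the true regression function is E[Y | X = x] = sin(x)
rng = np.random.default_rng(0)
x = rng.uniform(0, 2 * np.pi, size=2000)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

centers, means = binned_mean_regression(x, y, n_bins=20)
max_err = np.nanmax(np.abs(means - np.sin(centers)))
```

No linear form is assumed anywhere. The estimate simply tracks the average of y near each x. Its error comes from noise within each bin and from averaging over the bin's width.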

Linear regression is perhaps the single most common quantitative tool in economics, sociology, and many other fields; it's certainly the most common use of statistics. (Analysis of variance, arguably more common in psychology and biology, is a disguised form of regression.) While linear regression deserves a place in statistics, that place should be nowhere near as large and prominent as it currently is. There are very few situations where we actually have scientific support for linear models. Fortunately, very flexible nonlinear regression methods now exist; from the user's point of view they are just as easy to use as linear regression, and at least as insightful. (Regression trees and additive models, in particular, are just as interpretable.) At the very least, if you do have a particular functional form in mind for the regression, linear or otherwise, you should use a nonparametric regression to test the adequacy of that form.
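One way to carry out such an adequacy check is sketched below in Python with numpy. This is a sketch under stated assumptions, not a definitive procedure. The statistic is the gap in in-sample mean squared error between a linear fit and a Gaussian-kernel (Nadaraya-Watson) smoother. Its null distribution comes from a parametric bootstrap: new responses are simulated from the fitted linear model with Gaussian noise. The fixed bandwidth (0.3), the Gaussian noise, the number of bootstrap draws, and all function names are illustrative assumptions. In practice the bandwidth would be chosen by something like cross-validation.

```python
import numpy as np

def nw_smoother(x_train, y_train, x_eval, bandwidth):
    # Nadaraya-Watson kernel regression with a Gaussian kernel
    d = (x_eval[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * d ** 2)
    return (w @ y_train) / w.sum(axis=1)

def mse_gap(x, y, bandwidth):
    # In-sample MSE of a straight-line fit minus that of the kernel smoother
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    mse_lin = np.mean((y - X @ beta) ** 2)
    mse_np = np.mean((y - nw_smoother(x, y, x, bandwidth)) ** 2)
    return mse_lin - mse_np, X, beta, mse_lin

def linearity_test(x, y, bandwidth=0.3, n_boot=200, seed=0):
    """Parametric-bootstrap p-value for H0: E[Y | X] is linear in X (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    observed, X, beta, mse_lin = mse_gap(x, y, bandwidth)
    fitted = X @ beta
    sigma = np.sqrt(mse_lin)
    null_stats = np.empty(n_boot)
    for b in range(n_boot):
        # Simulate from the fitted linear model; Gaussian noise is an assumption
        y_sim = fitted + rng.normal(scale=sigma, size=y.size)
        null_stats[b] = mse_gap(x, y_sim, bandwidth)[0]
    # Fraction of simulated gaps at least as large as the observed one
    return (1 + np.sum(null_stats >= observed)) / (1 + n_boot)

# Usage on simulated data: one curved truth, one genuinely linear truth
rng = np.random.default_rng(42)
x = rng.uniform(-2, 2, size=200)
y_curved = x ** 2 + rng.normal(scale=0.5, size=x.size)
y_linear = 1 + 2 * x + rng.normal(scale=0.5, size=x.size)
p_curved = linearity_test(x, y_curved)
p_linear = linearity_test(x, y_linear)
```

If the linear form were adequate, the flexible fit would beat it only by the small amount that comes from fitting noise. A large gap, relative to what the fitted linear model itself produces, is evidence against that form. The same recipe applies to any parametric form, not just straight lines.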

From a technical point of view, the main drawback of modern regression methods is that their extra flexibility comes at the price of less "efficiency": estimates converge more slowly, so you have less precision for the same amount of data. There are some situations where you'd prefer to have more precise estimates from a bad model than less precise estimates from a model which doesn't make systematic errors, but I don't think that's what most users of linear regression are choosing to do; they're just taught to type lm rather than gam. In this day and age, though, I don't understand why not.
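The efficiency trade-off can be made quantitative with the standard bias-variance calculation for a kernel smoother. The setting is assumed rather than taken from the text: a twice-differentiable regression function \mu, d predictors, bandwidth h, and an interior point x. The constants c_1, c_2 depend on \mu, the noise level, the kernel, and the design density. b(x) denotes the systematic error of a misspecified parametric model.

```latex
% Parametric model: fast convergence, but only to the best approximation within the model
\mathbb{E}\big[(\hat\mu(x)-\mu(x))^2\big] \approx b(x)^2 + O(n^{-1}),
\qquad b(x) = 0 \text{ iff the model is correctly specified}

% Kernel smoother: squared bias grows with h, variance shrinks with n h^d
\mathbb{E}\big[(\hat\mu(x)-\mu(x))^2\big] \approx c_1 h^4 + \frac{c_2}{n h^d}

% Setting the derivative in h to zero: 4 c_1 h^3 = d\, c_2 / (n h^{d+1})
h^\ast = \Big(\frac{d\, c_2}{4 c_1 n}\Big)^{1/(4+d)} \propto n^{-1/(4+d)}
\quad\Longrightarrow\quad
\mathbb{E}\big[(\hat\mu(x)-\mu(x))^2\big] = O\big(n^{-4/(4+d)}\big)
```

For example, with d = 1, cutting the mean squared error by a factor of 10 takes about 10 times as much data for a correct parametric model, but about 10^{5/4}, roughly 18 times as much, for the kernel smoother. With d = 4 it takes about 100 times as much. The misspecified parametric model, by contrast, never gets below b(x)^2 however much data it sees.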

(Of course, for the statistician, a lot of the more flexible regression methods look more or less like linear regression in some disguised form, because fundamentally all linear regression does is projection. So it's not crazy to make it a foundational topic for statisticians. We should not, however, give the rest of the world the impression that the hat matrix is the source of all knowledge.)
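The family resemblance can be seen directly in a small numpy sketch on simulated data. Ordinary least squares produces fitted values H y, where H = X (X^T X)^{-1} X^T is the hat matrix. A Gaussian-kernel smoother also produces fitted values W y for a fixed weight matrix W, so both are "linear smoothers". Only H is an orthogonal projection, though. The kernel and bandwidth below are illustrative assumptions, not tuned choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = np.sort(rng.uniform(0, 1, size=n))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)

# Ordinary least squares: fitted values are H @ y, with H = X (X^T X)^{-1} X^T
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.solve(X.T @ X, X.T)
fitted_ols = H @ y

# Gaussian-kernel (Nadaraya-Watson) smoother: fitted values are W @ y,
# where each row of W holds normalized kernel weights
bandwidth = 0.05  # illustrative, not tuned
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
W = K / K.sum(axis=1, keepdims=True)
fitted_kernel = W @ y

# H is an orthogonal projection: symmetric and idempotent
print(np.allclose(H, H.T), np.allclose(H @ H, H))  # True True
# W is linear in y too, but neither symmetric nor idempotent
print(np.allclose(W, W.T), np.allclose(W @ W, W))  # False False
# The trace acts as "effective degrees of freedom": 2 for a straight line
print(np.trace(H), np.trace(W))
```

Because both fits are linear in y, much of the machinery built around the hat matrix carries over, for example using the trace as effective degrees of freedom. The kernel smoother, however, is not a projection onto a fixed set of functions.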

The use of regression, linear or otherwise, for causal inference, rather than prediction, is a different, and far more sordid, story.

See also: Computational Statistics; Data Mining; Learning Theory; Model Selection; Neural Nets; Social Science Methodology; What Is the Right Null Model for Linear Regression?


Notebooks: Hosted, but not endorsed, by the Center for the Study of Complex Systems