## Regression, especially Nonparametric Regression

*19 Sep 2013 10:04*

"Regression", in statistical jargon, is the problem of guessing the average level of some quantitative response variable from various predictor variables.

Linear regression is perhaps the single most common quantitative tool in
economics, sociology, and many other fields; it's certainly the most common use
of statistics. (Analysis of variance, arguably
more common in psychology and biology, is a disguised form of regression.)
While linear regression deserves *a* place in statistics, that place
should be nowhere near as large and prominent as it currently is. There are
very few situations where we actually have *scientific* support for
linear models. Fortunately, very flexible nonlinear regression methods now
exist, and from the user's point of view are just as easy as linear regression,
and at least as insightful. (Regression trees and additive models, in
particular, are just as interpretable.) At the very least, if you *do*
have a particular functional form in mind for the regression, linear or
otherwise, you should use a non-parametric regression to test the adequacy of
that form.
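Here is a minimal Python sketch of one such check. It assumes a single predictor. A Gaussian-kernel (Nadaraya-Watson) smoother with a fixed bandwidth serves as the nonparametric fit. It asks whether the smoother's in-sample improvement over the straight line is bigger than the improvement it gets on data simulated from the fitted line, using a residual bootstrap. The function names, the bandwidth of 0.1, and the simulated data are illustrative choices, not a standard routine.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


def kernel_fit(xs, ys, bandwidth):
    """Nadaraya-Watson (Gaussian kernel) fitted values at each x_i."""
    fitted = []
    for x0 in xs:
        w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
        fitted.append(sum(wi * y for wi, y in zip(w, ys)) / sum(w))
    return fitted


def mse_gap(xs, ys, bandwidth):
    """In-sample MSE of the straight line minus in-sample MSE of the smoother."""
    a, b = fit_line(xs, ys)
    n = len(xs)
    mse_line = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / n
    smooth = kernel_fit(xs, ys, bandwidth)
    mse_smooth = sum((y - s) ** 2 for y, s in zip(ys, smooth)) / n
    return mse_line - mse_smooth


def linearity_test(xs, ys, bandwidth=0.1, n_boot=99, seed=0):
    """Residual-bootstrap p-value for H0: E[Y | X = x] is linear in x."""
    rng = random.Random(seed)
    observed = mse_gap(xs, ys, bandwidth)
    a, b = fit_line(xs, ys)
    residuals = [y - a - b * x for x, y in zip(xs, ys)]
    exceed = 0
    for _ in range(n_boot):
        # Simulate from the fitted line, so the null hypothesis holds by construction.
        noise = rng.choices(residuals, k=len(xs))
        fake_ys = [a + b * x + e for x, e in zip(xs, noise)]
        if mse_gap(xs, fake_ys, bandwidth) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [i / 99 for i in range(100)]
    straight = [1 + 2 * x + rng.gauss(0, 0.2) for x in xs]
    curved = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.2) for x in xs]
    print("straight:", linearity_test(xs, straight))
    print("curved:  ", linearity_test(xs, curved))
```

When the line is adequate, the smoother's advantage on the real data should resemble its advantage on the simulated data, so the p-value should not be systematically small. With the sine curve, the observed gap should dwarf every simulated one. In real use the bandwidth would be chosen by cross-validation rather than fixed.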

From a technical point of view, the main drawback of modern regression
methods is that their extra flexibility comes at the price of less "efficiency"
— estimates converge more slowly, so you have less precision for the same
amount of data. There are some situations where you'd prefer to have more
precise estimates from a bad model than less precise estimates from a model
which doesn't make systematic errors, but I don't think that's what most users
of linear regression are choosing to do; they're just taught to type `lm`
rather than `gam`. In this day and age, though, I don't understand why not.
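Textbook assumptions put rough numbers on "converge more slowly." On one side, take a correctly specified parametric model. On the other, take a regression function with two continuous derivatives and d continuous predictors. Both are illustrative assumptions. Under them, the best achievable mean squared errors scale as shown below, and the extra data needed to halve the error follows directly:

```latex
\text{parametric: } \mathrm{MSE}(n) \propto n^{-1},
\qquad
\text{nonparametric: } \mathrm{MSE}(n) \propto n^{-4/(4+d)}

\text{to halve the MSE, multiply } n \text{ by } k:
\qquad
k^{-1} = \tfrac{1}{2} \;\Rightarrow\; k = 2,
\qquad
k^{-4/(4+d)} = \tfrac{1}{2} \;\Rightarrow\; k = 2^{(4+d)/4}

d = 1:\; k = 2^{5/4} \approx 2.38,
\qquad
d = 4:\; k = 2^{2} = 4
```

The loss of precision is modest with one predictor and grows with the number of predictors. This is one reason additive models, which only ever estimate one-dimensional functions, are attractive.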

(Of course, for the statistician, a lot of the more flexible regression
methods look more or less like linear regression in some disguised form,
because fundamentally all linear regression does is projection. So it's not crazy to make it
a foundational topic *for statisticians*. We should not, however, give
the rest of the world the impression that the hat matrix is the source of all
knowledge.)
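Here is a small illustration of the "disguised form" point. It assumes a single predictor, so the hat matrix has a closed form. Ordinary least squares and a kernel smoother both produce fitted values as a fixed matrix times y; such methods are called linear smoothers, the subject of the Buja, Hastie and Tibshirani paper listed below. Only the least-squares matrix is a projection, though. The sketch uses plain Python lists, and every helper name is hypothetical.

```python
import math


def ols_hat_matrix(xs):
    """Hat matrix H of simple linear regression (intercept + one slope).

    With one predictor, H[i][j] = 1/n + (x_i - xbar)(x_j - xbar) / Sxx,
    and the fitted values are H @ y.
    """
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [[1 / n + (xi - xbar) * (xj - xbar) / sxx for xj in xs] for xi in xs]


def kernel_smoother_matrix(xs, bandwidth):
    """Weight matrix W of a Gaussian-kernel Nadaraya-Watson smoother.

    Row i holds the weight each y_j gets in the estimate at x_i, so the
    fitted values are again W @ y; each row sums to one.
    """
    rows = []
    for x0 in xs:
        w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
        total = sum(w)
        rows.append([wi / total for wi in w])
    return rows


def matmul(a, b):
    """Plain-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(a, v):
    """Plain-list matrix-vector product a @ v."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    xs = [i / 19 for i in range(20)]
    ys = [math.sin(3 * x) for x in xs]
    H = ols_hat_matrix(xs)
    W = kernel_smoother_matrix(xs, bandwidth=0.1)
    # Both fits are "matrix times y".
    print(matvec(H, ys)[:3], matvec(W, ys)[:3])
    # OLS: H @ H == H (a projection) and trace(H) == 2, the number of coefficients.
    print(max_abs_diff(matmul(H, H), H), trace(H))
    # Kernel smoother: linear in y but not a projection; trace(W) is one
    # common definition of its effective degrees of freedom.
    print(max_abs_diff(matmul(W, W), W), trace(W))
```

Because both fits have the form "matrix times y", much of the hat-matrix machinery carries over to the smoother, for example counting degrees of freedom by a trace. Where it breaks down is exactly where the smoother stops being a projection.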

The use of regression, linear or otherwise, for causal inference, rather than prediction, is a different, and far more sordid, story.

See also: Computational Statistics; Data Mining; Learning Theory; Model Selection; Neural Nets; Social Science Methodology; What Is the Right Null Model for Linear Regression?

- Recommended, more general:
- Richard A. Berk
- Julian J. Faraway
- Linear Models with R
- Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models

- Trevor Hastie and Robert Tibshirani and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction [This is a corner-stone book, but is about much, much more than just regression.]
- Jeffrey S. Racine, "Nonparametric Econometrics: A Primer", Foundations and Trends in Econometrics **3**(2008): 1--88 [Good primer of nonparametric techniques for regression, density estimation and hypothesis testing; next to no economic content (except for examples). PDF reprint]
- Larry Wasserman
- All of Statistics
- All of Nonparametric Statistics
- Notes for 36-707, Regression Analysis

- Weisberg, Applied Linear Regression

- Recommended, more specialized:
- Azadeh Alimadad and Matias Salibian-Barrera, "An Outlier-Robust Fit for Generalized Additive Models with Applications to Disease Outbreak Detection", Journal of the American Statistical Association **106**(2011): 719--731
- Norman H. Anderson and James Shanteau, "Weak inference with linear models", Psychological Bulletin **84**(1977): 1155--1170 [A demonstration of why you should not rely on R^2 to back up your claims]
- Michael H. Birnbaum, "The Devil Rides Again: Correlation as an Index of Fit", Psychological Bulletin **79**(1973): 239--242
- Peter Bühlmann, M. Kalisch and M. H. Maathuis, "Variable selection in high-dimensional linear models: partially faithful distributions and the PC-simple algorithm", Biometrika **97**(2010): 261--278
- Peter Bühlmann and Sara van de Geer, Statistics for High-Dimensional Data: Methods, Theory and Applications [State-of-the-art (2011) compendium of what's known about using high-dimensional regression, especially but not just the Lasso.]
- Andreas Buja, Trevor Hastie and Robert Tibshirani, "Linear smoothers and additive models", Annals of Statistics **17**(1989): 453--510 [A classic additive models paper. The discussions and reply fill pp. 510--555.]
- Raymond J. Carroll, Aurore Delaigle, and Peter Hall, "Nonparametric Prediction in Measurement Error Models", Journal of the American Statistical Association **104**(2009): 993--1003
- Kevin A. Clarke, "The Phantom Menace: Omitted Variables Bias in Econometric Research" [PDF. Or: Kitchen-sink regressions considered harmful. Including extra variables in your linear regression may or may not reduce the bias in your estimate of any particular coefficients of interest, depending on the correlations between the added variables, the predictors of interest, the response, and omitted relevant variables. Adding more variables always increases the variance of your estimates.]
- William H. DuMouchel and Greg J. Duncan, "Using Sample Survey Weights in Multiple Regression Analysis of Stratified Samples", Proceedings of the Survey Research Methods Section, American Statistical Association (1981), pp. 629--637 [PDF reprint; presumably very similar to "Using Sample Survey Weights to Compare Various Linear Regression Models", Journal of the American Statistical Association **78**(1983): 535--543, but I have not looked at the latter]
- Andrew Gelman and Iain Pardoe, "Average predictive comparisons for models with nonlinearity, interactions, and variance components", Sociological Methodology forthcoming (2007) [PDF preprint, Gelman's comments]
- Lee-Ad Gottlieb, Aryeh Kontorovich, Robert Krauthgamer, "Efficient Regression in Metric Spaces via Approximate Lipschitz Extension", arxiv:1111.4470
- Berthold R. Haag, "Non-parametric Regression Tests Using Dimension Reduction Techniques", Scandinavian Journal of Statistics **35**(2008): 719--738
- Peter Hall, "On Bootstrap Confidence Intervals in Nonparametric Regression", Annals of Statistics **20**(1992): 695--711
- Peter Hall and Joel Horowitz, "A simple bootstrap method for constructing nonparametric confidence bands for functions", Annals of Statistics **41**(2013): 1892--1921
- Jeffrey D. Hart, Nonparametric Smoothing and Lack-of-Fit Tests
- M. Kohler, A. Krzyzak and D. Schafer, "Application of structural risk minimization to multivariate smoothing spline regression estimates", Bernoulli **8**(2002): 475--490
- John Lafferty and Larry Wasserman [To be honest, I haven't checked to see how different these two papers actually are...]
- "Rodeo: Sparse Nonparametric Regression in High Dimensions", math.ST/0506342 ["We present a method for simultaneously performing bandwidth selection and variable selection in nonparametric regression."]
- "Rodeo: Sparse, greedy nonparametric regression", Annals of Statistics **36**(2008): 27--63, arxiv:0803.1709

- Lukas Meier, Sara van de Geer and Peter Bühlmann, "High-Dimensional Additive Modeling", Annals of Statistics **37**(2009): 3779--3821, arxiv:0806.4115
- Garvesh Raskutti, Martin J. Wainwright, and Bin Yu, "Early stopping and non-parametric regression: An optimal and data-dependent stopping rule", arxiv:1306.3574
- Pradeep Ravikumar, John Lafferty, Han Liu, Larry Wasserman, "Sparse Additive Models", arxiv:0711.4555 [a.k.a. "SpAM"]
- Sara van de Geer, Empirical Process Theory in M-Estimation
- Grace Wahba, Spline Models for Observational Data
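The Anderson and Shanteau and the Birnbaum entries above turn on a point that is easy to check numerically. A straight line fit to a relationship that is exactly, and visibly, curved can still earn a high R^2. Below is a minimal Python sketch; the noiseless quadratic and the grid of x values are invented for illustration.

```python
def r_squared_of_line(xs, ys):
    """R^2 and residuals from the least-squares fit of y = a + b*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((y - ybar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot, residuals


if __name__ == "__main__":
    xs = [i / 100 for i in range(101)]
    ys = [x ** 2 for x in xs]  # exactly quadratic, no noise at all
    r2, res = r_squared_of_line(xs, ys)
    print(round(r2, 3))
    # The misfit is systematic: positive residuals at both ends, negative in the middle.
    print(res[0] > 0, res[50] < 0, res[100] > 0)
```

Here R^2 comes out at about 0.94 with no noise at all. Meanwhile the residuals are positive at both ends and negative in the middle. The residual pattern, not R^2, is what exposes the wrong functional form.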

- Recommended, historical:
- Mordecai Ezekiel, "A Method of Handling Curvilinear Correlation for Any Number of Variables", Journal of the American Statistical Association **19**(1924): 431--453 [JSTOR. The oldest publication I have found introducing and advocating additive, as opposed to linear, regression models. The estimation procedure Ezekiel uses is even *almost* the same as modern "back-fitting", but will get into trouble with correlated input variables.]
- Erich L. Lehmann, "On the history and use of some standard statistical models", pp. 114--126 in Deborah Nolan and Terry Speed (eds.), Probability and Statistics: Essays in Honor of David A. Freedman
- E. T. Whittaker, "On a New Method of Graduation", Proceedings of the Edinburgh Mathematical Society **41**(1922): 63--75 [Introduces splines, complete with the Bayesian derivation (if you are into that sort of thing), though without the name.]
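Back-fitting, mentioned in the Ezekiel entry, fits in a few lines. The sketch below is the generic modern algorithm, not a reconstruction of Ezekiel's 1924 procedure. It assumes a Gaussian-kernel smoother with a fixed bandwidth; any linear smoother could be substituted. The function names are hypothetical. As the annotation warns, correlated inputs make back-fitting harder, so the simulated predictors here are independent.

```python
import math
import random


def kernel_smooth(xs, rs, bandwidth):
    """Gaussian-kernel (Nadaraya-Watson) smooth of rs against xs, at each xs[i]."""
    out = []
    for x0 in xs:
        w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
        out.append(sum(wi * r for wi, r in zip(w, rs)) / sum(w))
    return out


def backfit(columns, ys, bandwidth=0.05, n_iter=10):
    """Fit y ~ alpha + f_1(x_1) + ... + f_p(x_p) by backfitting.

    Each sweep smooths the partial residuals y - alpha - sum_{k != j} f_k
    against x_j, then re-centres f_j to mean zero so alpha stays identified.
    """
    n, p = len(ys), len(columns)
    alpha = sum(ys) / n
    fs = [[0.0] * n for _ in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            partial = [ys[i] - alpha - sum(fs[k][i] for k in range(p) if k != j)
                       for i in range(n)]
            fj = kernel_smooth(columns[j], partial, bandwidth)
            mean_fj = sum(fj) / n
            fs[j] = [v - mean_fj for v in fj]
    fitted = [alpha + sum(fs[j][i] for j in range(p)) for i in range(n)]
    return alpha, fs, fitted


if __name__ == "__main__":
    rng = random.Random(1)
    n = 200
    x1 = [rng.random() for _ in range(n)]
    x2 = [rng.random() for _ in range(n)]  # independent of x1
    ys = [4 * (a - 0.5) ** 2 + math.sin(2 * math.pi * b) + rng.gauss(0, 0.1)
          for a, b in zip(x1, x2)]
    alpha, fs, fitted = backfit([x1, x2], ys)
    ybar = sum(ys) / n
    r2 = 1 - (sum((y - f) ** 2 for y, f in zip(ys, fitted))
              / sum((y - ybar) ** 2 for y in ys))
    print(round(alpha, 3), round(r2, 3))
```

Each component is estimated by a one-dimensional smoother, which is what keeps additive models as easy to read as a linear model's coefficients: plot each f_j against its x_j.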

- Modesty forbids me to recommend:
- My Lecture notes on data mining or advanced data analysis

- To read:
- Elena Andreou and Bas J. M. Werker, "An Alternative Asymptotic Analysis of Residual-Based Statistics", Review of Economics and Statistics **94**(2012): 88--99
- Sylvain Arlot, "Choosing a penalty for model selection in heteroscedastic regression", arxiv:0812.3141
- Sylvain Arlot and Pascal Massart, "Data-driven Calibration of Penalties for Least-Squares Regression", Journal of Machine Learning Research **10**(2009): 245--279
- Anil Aswani, Peter Bickel, and Claire Tomlin, "Regression on manifolds: Estimation of the exterior derivative", Annals of Statistics **39**(2011): 48--81
- Jean-Baptiste Aubin, Samuela Leoni-Aubin, "A Simple Misspecification Test for Regression Models", arxiv:1003.2294
- Jean-Yves Audibert and Olivier Catoni, "Robust linear least squares regression", Annals of Statistics **39**(2011): 2766--2794
- Alexandre Belloni, Victor Chernozhukov, "High Dimensional Sparse Econometric Models: An Introduction", arxiv:1106.5242
- Peter J. Bickel, Bo Li, "Local polynomial regression on unknown manifolds", pp. 177--186 in Regina Liu, William Strawderman and Cun-Hui Zhang (eds.), Complex Datasets and Inverse Problems: Tomography, Networks and Beyond (2007) ["'naive' multivariate local polynomial regression can adapt to local smooth lower dimensional structure in the sense that it achieves the optimal convergence rate for nonparametric estimation of regression functions ... when the predictor variables live on or close to a lower dimensional manifold"]
- Gilles Blanchard, Nicole Kraemer, "Kernel Conjugate Gradient is Universally Consistent", arxiv:0902.4380 ["approximate solutions are constructed by projections onto a nested set of data-dependent subspaces"]
- Borowiak, Model Discrimination for Nonlinear Regression Models
- Adrian W. Bowman and Adelchi Azzalini, Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations
- Lawrence D. Brown, T. Tony Cai, and Harrison H. Zhou, "Nonparametric regression in exponential families", Annals of Statistics **38**(2010): 2005--2046
- Lawrence D. Brown and Mark G. Low, "Asymptotic Equivalence of Nonparametric Regression and White Noise", Annals of Statistics **24**(1996): 2384--2398 [JSTOR]
- Peter Bühlmann, "Statistical significance in high-dimensional linear models", arxiv:1202.1377 [Not sure if this goes beyond what's in Bühlmann and van de Geer]
- Benoit Cadre and Qian Dong, "Dimension reduction for regression estimation with nearest neighbor method", Electronic Journal of Statistics **4**(2010): 436--460
- T. Tony Cai, "Minimax and Adaptive Inference in Nonparametric Function Estimation", Statistical Science **27**(2012): 31--50, arxiv:1203.4911
- T. Tony Cai, Harrison H. Zhou, "Asymptotic equivalence and adaptive estimation for robust nonparametric regression", Annals of Statistics **37**(2009): 3204--3235 = arxiv:0909.0343
- Andrew V. Carter, "Asymptotic approximation of nonparametric regression experiments with unknown variances", Annals of Statistics **35**(2007): 1644--1673, arxiv:0710.3647
- Ming-Yen Cheng, Hau-tieng Wu, "Local Linear Regression on Manifolds and its Geometric Interpretation", arxiv:1201.0327
- Christophe Chesneau, Jalal Fadili, Bertrand Maillot, "Adaptive estimation of an additive regression function from weakly dependent data", arxiv:1111.3994
- Andreas Christmann and Robert Hable, "Support vector machines for additive models: consistency and robustness", arxiv:1007.4062
- Laëtitia Comminges, Arnak Dalalyan, "Tight conditions for consistent variable selection in high dimensional nonparametric regression", arxiv:1102.3616
- R. Dennis Cook, Liliana Forzani, and Adam J. Rothman, "Estimating sufficient reductions of the predictors in abundant high-dimensional regressions", Annals of Statistics **40**(2012): 353--384
- Arnak Dalalyan and Alexandre B. Tsybakov, "Sparse Regression Learning by Aggregation and Langevin Monte-Carlo", arxiv:0903.1223
- Robert Davies, Christopher Withers, and Saralees Nadarajah, "Confidence intervals in a regression with both linear and non-linear terms", Electronic Journal of Statistics **5**(2011): 603--618
- Kris De Brabanter, Jos De Brabanter, Johan A. K. Suykens and Bart De Moor, "Kernel Regression in the Presence of Correlated Errors", Journal of Machine Learning Research **12**(2011): 1955--1976
- Michiel Debruyne, Mia Hubert, Johan A.K. Suykens, "Model Selection in Kernel Based Regression using the Influence Function", Journal of Machine Learning Research **9**(2008): 2377--2400
- Aurore Delaigle, Peter Hall, Hans-Georg Müller, "Accelerated convergence for nonparametric regression with coarsened predictors", Annals of Statistics **35**(2007): 2639--2653, arxiv:0803.3017
- Wei Dou, David Pollard, Harrison H. Zhou, "Functional regression for general exponential families", arxiv:1001.3742
- Elise Dusseldorp, Claudio Conversano, and Bart Jan Van Os, "Combining an Additive and Tree-Based Regression Model Simultaneously: STIMA", Journal of Computational and Graphical Statistics (2010) forthcoming
- Sam Efromovich
- Nonparametric Curve Estimation
- "Conditional density estimation in a regression setting", Annals of Statistics **35**(2007): 2504--2535, arxiv:0803.2984

- P. P. B. Eggermont, V. N. LaRiccia, "Uniform error bounds for smoothing splines", arxiv:math/0612776
- P. P. B. Eggermont and V. N. LaRiccia, Maximum Penalized Likelihood Estimation, vol. II: Regression [Enthusiastic review in JASA (**104**(2010): 1628), appears self-contained]
- Jianqing Fan, Shaojun Guo and Ning Hao, "Variance estimation using refitted cross-validation in ultrahigh dimensional regression", Journal of the Royal Statistical Society B **74**(2012): 37--65
- Andrew Gelman and Jennifer Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models
- Christopher R. Genovese and Larry Wasserman
- "Confidence sets for nonparametric wavelet regression", math.ST/0505632 = Annals of Statistics **33**(2005): 698--729
- "Adaptive Confidence Bands", math.ST/0701513

- Jose M. Gonzalez-Barrios and Silvia Ruiz-Velasco, "Regression analysis and dependence", Metrika **61**(2005): 73--87
- Marvin H. J. Gruber, Regression Estimators: A Comparative Study
- Chong Gu, Smoothing Spline ANOVA Models
- Haijie Gu, John Lafferty, "Sequential Nonparametric Regression", arxiv:1206.6408
- Hong Gu, Toby Kenney, Mu Zhu, "Partial Generalized Additive Models: An Information-Theoretic Approach for Dealing With Concurvity and Selecting Variables", Journal of Computational and Graphical Statistics **19**(2010): 531--551
- Benjamin Guedj, Pierre Alquier, "PAC-Bayesian Estimation and Prediction in Sparse Additive Models", arxiv:1208.1211
- Emmanuel Guerre and Pascal Lavergne, "Data-driven rate-optimal specification testing in regression models", math.ST/0505640 = Annals of Statistics **33**(2005): 840--870
- Laszlo Gyorfi et al., A Distribution-Free Theory of Nonparametric Regression
- Robert Hable, "Asymptotic Confidence Sets for General Nonparametric Regression and Classification by Regularized Kernel Methods", arxiv:1203.4354
- P. Richard Hahn, Sayan Mukherjee, Carlos Carvalho, "Predictor-dependent shrinkage for linear regression via partial factor modeling", arxiv:1011.3725
- Peter Hall, "On Bootstrap Confidence Intervals in Nonparametric Regression", Annals of Statistics **20**(1992): 695--711
- Peter Hall, Joel L. Horowitz, "Nonparametric methods for inference in the presence of instrumental variables", Annals of Statistics **33**(2005): 2904--2929, arxiv:math/0603130
- Bruce E. Hansen
- "Uniform Convergence Rates for Kernel Estimation with Dependent Data", Econometric Theory **24**(2008): 726--748 [abstract with link to free PDF]
- Econometrics

- Wolfgang Härdle, Applied Nonparametric Regression
- Wolfgang Härdle, Marlene Müller, Stefan Sperlich and Axel Werwatz, Nonparametric and Semiparametric Models: An Introduction
- Jeffrey D. Hart, "Smoothing-inspired lack-of-fit tests based on ranks", arxiv:0805.2285
- Elad Hazan, Tomer Koren, "Linear Regression with Limited Observation", arxiv:1206.4678
- Mohamed Hebiri and Sara A. Van De Geer, "The Smooth-Lasso and other $\ell_1+\ell_2$-penalized methods", arxiv:1003.4885
- Nancy Heckman, "The theory and application of penalized methods or Reproducing Kernel Hilbert Spaces made easy", arxiv:1111.1915
- Tim Hesterberg, Nam Hee Choi, Lukas Meier, Chris Fraley, "Least angle and $\ell_1$ penalized regression: A review", Statistics Surveys **2**(2008): 61--93, arxiv:0802.0964
- Jacob Hinkle, Prasanna Muralidharan, P. Thomas Fletcher, Sarang Joshi, "Polynomial Regression on Riemannian Manifolds", arxiv:1201.2395
- Giles Hooker and Saharon Rosset, "Prediction-based regularization using data augmented regression", Statistics and Computing **22**(2011): 237--249
- Joel L. Horowitz, Enno Mammen, "Rate-optimal estimation for a general class of nonparametric regression models with unknown link functions", Annals of Statistics **35**(2007): 2589--2619, arxiv:0803.2999
- Jian Huang, Joel L. Horowitz, and Fengrong Wei, "Variable selection in nonparametric additive models", Annals of Statistics **38**(2010): 2282--2313
- Salvatore Ingrassia, Simona C. Minotti, Giorgio Vittadini, "Local statistical modeling by cluster-weighted" [sic], arxiv:0911.2634 [Revisiting Gershenfeld et al.'s "cluster-weighted modeling" from a more properly statistical perspective]
- Sameer M. Jalnapurkar, "Learning a regression function via Tikhonov regularization", math.ST/0509420
- Jiancheng Jiang, Yingying Fan and Jianqing Fan, "Estimation in additive models with highly or nonhighly correlated covariates", Annals of Statistics **38**(2010): 1403--1432, arxiv:1010.0320
- Bo Kai, Runze Li and Hui Zou, "Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression", Journal of the Royal Statistical Society B **72**(2010): 49--69
- Gerard Kerkyacharian, Mathilde Mougeot, Dominique Picard, Karine Tribouley, "Learning Out of Leaders", arxiv:1001.1919
- Estate V. Khmaladze, Hira L. Koul, "Goodness-of-fit problem for errors in nonparametric regression: Distribution free approach", Annals of Statistics **37**(2009): 3165--3185 = arxiv:0909.0170
- Hoyt Koepke, Mikhail Bilenko, "Fast Prediction of New Feature Utility", arxiv:1206.4680
- Michael R. Kosorok, Introduction to Empirical Processes and Semiparametric Inference [partial PDF preprint]
- Nicole Kraemer, Anne-Laure Boulesteix, Gerhard Tutz, "Penalized Partial Least Squares Based on B-Splines Transformations", math.ST/0608576
- Tatyana Krivobokova, Thomas Kneib, and Gerda Claeskens, "Simultaneous Confidence Bands for Penalized Spline Estimators", Journal of the American Statistical Association **105**(2010): 852--863
- Arne Kovac, Andrew D.A.C. Smith, "Regression on a Graph", Journal of Computational and Graphical Statistics **20**(2011): 432--447, arxiv:0911.1928
- Rafal Kulik and Cornelia Wichelhaus, "Nonparametric conditional variance and error density estimation in regression models with dependent errors and predictors", Electronic Journal of Statistics **5**(2011): 856--898
- Randy C. S. Lai, Hsin-Cheng Huang, and Thomas C. M. Lee, "Fixed and random effects selection in nonparametric additive mixed models", Electronic Journal of Statistics **6**(2012): 810--842
- Hannes Leeb, "Evaluation and selection of models for out-of-sample prediction when the sample size is small relative to the complexity of the data-generating process", Bernoulli **14**(2008): 661--690, arxiv:0802.3364
- Qi Li and Jeffrey Scott Racine, Nonparametric Econometrics: Theory and Practice
- Yehua Li and Tailen Hsing, "Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data", Annals of Statistics **38**(2010): 3321--3351
- Heng Lian, "Convergence of Nonparametric Functional Regression Estimates with Functional Responses", arxiv:1111.6230
- Han Liu, Xi Chen, John Lafferty and Larry Wasserman, "Graph-Valued Regression", NIPS 23 (2010) [PDF], arxiv:1006.3972
- Oliver Linton and Zhijie Xiao, "A Nonparametric Regression Estimator That Adapts To Error Distribution of Unknown Form", Econometric Theory **23**(2007): 371--413
- Po-Ling Loh, Martin J. Wainwright, "High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity", Annals of Statistics **40**(2012): 1637--1664, arxiv:1109.3714
- Djamal Louani, Sidi Mohamed Ould Maouloud, "Large Deviation Results for the Nonparametric Regression Function Estimator on Functional Data", arxiv:1111.5989
- Enno Mammen, Christoph Rothe, and Melanie Schienle, "Nonparametric regression with nonparametrically generated covariates", Annals of Statistics **40**(2012): 1132--1170
- Andreas Mayr, Nora Fenske, Benjamin Hofner, Thomas Kneib, Matthias Schmid, "Generalized additive models for location, scale and shape for high dimensional data: A flexible approach based on boosting", Journal of the Royal Statistical Society C forthcoming
- Charles E. McCulloch, John M. Neuhaus, "Misspecifying the Shape of a Random Effects Distribution: Why Getting It Wrong May Not Matter", Statistical Science **26**(2011): 388--402, arxiv:1201.1980
- Hugh Miller and Peter Hall, "Local polynomial regression and variable selection", arxiv:1006.3342
- Jessica Minnier, Lu Tian and Tianxi Cai, "A Perturbation Method for Inference on Regularized Regression Estimates", Journal of the American Statistical Association **106**(2011): 1371--1382
- Abdelkader Mokkadem, Mariane Pelletier, Yousri Slaoui, "Revisiting Révész's stochastic approximation method for the estimation of a regression function", arxiv:0812.3973
- Ursula U. Müller and Ingrid Van Keilegom, "Efficient parameter estimation in regression with missing responses", Electronic Journal of Statistics **6**(2012): 1200--1219
- Andriy Norets, "Approximation of conditional densities by smooth mixtures of regressions", Annals of Statistics **38**(2010): 1733--1766, arxiv:1010.0581
- Juhyun Park and Burkhardt Seifert, "Local additive estimation", Journal of the Royal Statistical Society B **72**(2010): 171--191, arxiv:0806.0612
- Philippe Rigollet, "Maximum likelihood aggregation and misspecified generalized linear models", arxiv:0911.2919
- Cynthia Rudin, "Stability Analysis for Regularized Least Squares Regression", cs.LG/0502016
- George A. F. Seber and C. J. Wild, Nonlinear Regression
- David Shilane, Richard H. Liang and Sandrine Dudoit, "Loss-Based Estimation with Evolutionary Algorithms and Cross-Validation", UC Berkeley Biostatistics Working Paper 227 [Abstract, PDF]
- Jeffrey S. Simonoff, Smoothing Methods in Statistics
- Emre Soyer and Robin M. Hogarth, "The illusion of predictability: How regression statistics mislead experts" [PDF preprint]
- Aris Spanos, "Revisiting the Omitted Variables Argument: Substantive vs. Statistical Adequacy" [PDF preprint]
- Ingo Steinwart and Andreas Christmann, Support Vector Machines
- Curtis B. Storlie, Howard D. Bondell, and Brian J. Reich, "A Locally Adaptive Penalty for Estimation of Functions With Varying Roughness", Journal of Computational and Graphical Statistics (2010): forthcoming
- Liangjun Su and Aman Ullah, "Local polynomial estimation of nonparametric simultaneous equations models", Journal of Econometrics **144**(2008): 193--218
- Ryan J. Tibshirani, "The Lasso Problem and Uniqueness", arxiv:1206.0313
- Jo-Anne Ting, Aaron D'Souza, Sethu Vijayakumar and Stefan Schaal, "Efficient Learning and Feature Selection in High-Dimensional Regression", Neural Computation **22**(2010): 831--886
- Daniell Toth and John L. Eltinge, "Building Consistent Regression Trees From Complex Sample Data", Journal of the American Statistical Association **106**(2011): 1626--1636
- Minh-Ngoc Tran, David Nott, Chenlei Leng, "The Predictive Lasso", arxiv:1009.2302
- Gerhard Tutz, Regression for Categorical Data
- Gerhard Tutz and Sebastian Petry, "Nonparametric estimation of the link function including variable selection", Statistics and Computing **22**(2011): 545--561
- Gerhard Tutz and Jan Ulbricht, "Penalized regression with correlation-based penalty", Statistics and Computing **19**(2008): 239--253
- Samuel Vaiter, Mohammad Golbabaee, Jalal Fadili, Gabriel Peyré, "Model Selection with Piecewise Regular Gauges", arxiv:1307.2342
- Sara van de Geer, Johannes Lederer, "The Lasso, correlated design, and improved oracle inequalities", arxiv:1107.0189
- Daniela M. Witten and Robert Tibshirani, "Covariance-regularized regression and classification for high dimensional problems", Journal of the Royal Statistical Society B **71**(2009): 615--636
- Simon N. Wood, "Fast stable direct fitting and smoothness selection for Generalized Additive Models", arxiv:0709.3906
- Yingcun Xia, "A Note on the backfitting estimation of additive models", arxiv:0903.3470
- Takuma Yoshida, Kanta Naito, "Asymptotics for penalized splines in generalized additive models", arxiv:1208.3920
- Lan Xue, Annie Qu, Jianhui Zhou, "Consistent Model Selection for Marginal Generalized Additive Model for Correlated Data", Journal of the American Statistical Association forthcoming
- Kyusang Yu, Byeong U. Park, Enno Mammen, "Smooth backfitting in generalized additive models", Annals of Statistics **36**(2008): 228--260, arxiv:0803.1922
- Adriano Zanin Zambom, Michael Akritas, "Nonparametric Model Checking and Variable Selection", arxiv:1205.6761
- Hao Helen Zhang, Guang Cheng and Yufeng Liu, "Linear or Nonlinear? Automatic Structure Discovery for Partially Linear Models", Journal of the American Statistical Association **106**(2011): 1099--1112 [Presumably they have a reason for not just using an additive model with an extra strong curvature penalty in each univariate smoother.]
- Peng Zhao and Bin Yu, "On Model Selection Consistency of Lasso", Journal of Machine Learning Research **7**(2006): 2541--2563
- Hongtu Zhu, Joseph G. Ibrahim, Sikyum Lee, Heping Zhang, "Perturbation selection and influence measures in local influence analysis", Annals of Statistics **35**(2007): 2565--2588, arxiv:0803.2986