Regularized regression #

lasso2 solves the elastic net problem

\[\frac{1}{N} \sum_{i=1}^N (y_i - x_i'\beta)^2 + \frac{\lambda}{N} \alpha ||\Psi\beta||_1 + \frac{\lambda}{2N}(1-\alpha)||\Psi\beta||_2^2\]

where

  • \(\sum_{i=1}^N (y_i - x_i'\beta)^2\) is the residual sum of squares (RSS),
  • \(\beta\) is a \(p\)-dimensional parameter vector,
  • \(\lambda\) is the overall penalty level, which controls the general degree of penalization,
  • \(\alpha\) is the elastic net parameter, which determines the relative contribution of \(\ell_1\) (lasso-type) to \(\ell_2\) (ridge-type) penalization. \(\alpha=1\) corresponds to the lasso; \(\alpha=0\) is ridge regression,
  • \(\Psi\) is a \(p\) by \(p\) diagonal matrix of predictor-specific penalty loadings,
  • \(N\) is the number of observations.
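For illustration, \(\alpha\) is set in lasso2 via the alpha() option. A minimal sketch, using the prostate cancer dataset referenced in the lassopack help files (the dataset, regressor list and penalty mixes here are illustrative, not a recommendation):

```
* load the prostate cancer dataset used in the lassopack help files
insheet using https://web.stanford.edu/~hastie/ElemStatLearn/datasets/prostate.data, clear tab

* lasso: alpha=1 (the default)
lasso2 lpsa lcavol lweight age lbph svi lcp gleason pgg45, alpha(1)

* ridge regression: alpha=0
lasso2 lpsa lcavol lweight age lbph svi lcp gleason pgg45, alpha(0)

* elastic net: alpha=0.5
lasso2 lpsa lcavol lweight age lbph svi lcp gleason pgg45, alpha(0.5)
```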

In addition, lasso2 estimates the square-root lasso (sqrt-lasso) estimator, which is defined as the solution to the following objective function:

\[\sqrt{\frac{1}{N} \sum_{i=1}^N (y_i - x_i'\beta)^2} + \frac{\lambda}{N} ||\Psi\beta||_1\]
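The sqrt-lasso is requested via the sqrt option of lasso2; a minimal sketch, continuing with the illustrative dataset from above:

```
* square-root lasso via the sqrt option
lasso2 lpsa lcavol lweight age lbph svi lcp gleason pgg45, sqrt
```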

lasso2 implements the elastic net and sqrt-lasso using coordinate descent algorithms. The algorithm (then referred to as “shooting”) was first proposed by Fu (1998) for the lasso, and by Van der Kooij (2007) for the elastic net. Belloni et al. (2011) implement coordinate descent for the sqrt-lasso, and have kindly provided MATLAB code.
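To give a flavor of the algorithm, consider the elastic net objective above under the simplifying assumptions \(\Psi = I\) and standardized predictors with \(\sum_{i} x_{ij}^2 = N\) (assumptions made here for the sake of the sketch). Coordinate descent cycles through the coefficients \(j = 1, \dots, p\), updating each by a soft-thresholding step while holding the others fixed:

\[\hat\beta_j \leftarrow \frac{\mathcal{S}\left(\frac{2}{N}\sum_{i=1}^N x_{ij}\big(y_i - \sum_{k\neq j} x_{ik}\hat\beta_k\big),\; \frac{\lambda\alpha}{N}\right)}{2 + \frac{\lambda(1-\alpha)}{N}}\]

where \(\mathcal{S}(z, \gamma) = \mathrm{sign}(z)\max(|z| - \gamma, 0)\) is the soft-thresholding operator. The cycling continues until the coefficient updates fall below a convergence tolerance.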

Penalized regression methods, such as the elastic net and the sqrt-lasso, rely on tuning parameters that control the degree and type of penalization. The estimation methods implemented in lasso2 use two tuning parameters: \(\lambda\) and \(\alpha\).

How to select lambda #

lassopack offers three approaches for selecting the “optimal” \(\lambda\) and \(\alpha\) values, which are implemented in cvlasso, rlasso and lasso2, respectively (see the examples after this list).

  1. Cross-validation: The penalty level \(\lambda\) may be chosen by cross-validation in order to optimize out-of-sample prediction performance. \(K\)-fold cross-validation and rolling cross-validation (for panel and time-series data) are implemented in cvlasso. cvlasso also supports cross-validation across \(\alpha\).
  2. Theory-driven: Theoretically justified and feasible penalty levels and loadings are available for the lasso and sqrt-lasso via the separate command rlasso. The penalization is chosen to dominate the noise of the data-generating process (represented by the score vector), which allows derivation of theoretical results with regard to consistent prediction and parameter estimation. Since the error variance is in practice unknown, Belloni et al. (2012) introduce the rigorous (or feasible) lasso that relies on an iterative algorithm for estimating the optimal penalization and is valid in the presence of non-Gaussian and heteroskedastic errors. Belloni et al. (2016) extend the framework to the panel data setting. In the case of the sqrt-lasso under homoskedasticity, the optimal penalty level is independent of the unknown error variance, leading to a practical advantage and better performance in finite samples (see Belloni et al., 2011, 2014).
  3. Information criteria: \(\lambda\) can also be selected using information criteria. lasso2 calculates four information criteria: the Akaike Information Criterion (AIC; Akaike, 1974), the Bayesian Information Criterion (BIC; Schwarz, 1978), the Extended Bayesian Information Criterion (EBIC; Chen & Chen, 2008) and the corrected AIC (AICc; Sugiura, 1978; Hurvich & Tsai, 1989).
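A minimal sketch of the three approaches, continuing with the illustrative prostate dataset from above; the option names are as documented in the lassopack help files, while the fold count and seed are arbitrary choices for the example:

```
* 1. cross-validation: 10-fold CV over the lambda grid, then
*    re-estimate at the lambda minimizing the estimated prediction error
cvlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, nfolds(10) seed(123)
cvlasso, lopt

* 2. theory-driven penalty: rigorous lasso with
*    heteroskedasticity-robust penalty loadings
rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, robust

* 3. information criteria: estimate over the lambda grid and
*    select the model minimizing the EBIC
lasso2 lpsa lcavol lweight age lbph svi lcp gleason pgg45, lic(ebic)
```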