Regularized regression #
lasso2 solves the elastic net problem
\[\min_{\beta} \; \frac{1}{N} \sum_{i=1}^N (y_i - x_i'\beta)^2 + \frac{\lambda}{N} \alpha \|\Psi\beta\|_1 + \frac{\lambda}{2N} (1-\alpha) \|\Psi\beta\|_2^2,\]
where
- \(\sum_{i=1}^N (y_i - x_i'\beta)^2\) is the residual sum of squares (RSS),
- \(\beta\) is a \(p\)-dimensional parameter vector,
- \(\lambda\) is the overall penalty level, which controls the general degree of penalization,
- \(\alpha\) is the elastic net parameter, which determines the relative contribution of \(\ell_1\) (lasso-type) to \(\ell_2\) (ridge-type) penalization; \(\alpha=1\) corresponds to the lasso, and \(\alpha=0\) to ridge regression,
- \(\Psi\) is a \(p\) by \(p\) diagonal matrix of predictor-specific penalty loadings,
- \(N\) is the number of observations.
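For illustration, a minimal call might look as follows (the variables y and x1-x20 are hypothetical; lambda() sets the penalty level and alpha() the elastic net parameter):

```stata
* Elastic net with alpha = 0.5 at a single penalty level (hypothetical data)
. lasso2 y x1-x20, alpha(0.5) lambda(100)

* Without lambda(), lasso2 traces out the coefficient path over a lambda grid
. lasso2 y x1-x20, alpha(0.5)
```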
In addition, lasso2 estimates the square-root lasso (sqrt-lasso) estimator, which is defined as the solution to the following objective function:
\[\min_{\beta} \; \sqrt{\frac{1}{N} \sum_{i=1}^N (y_i - x_i'\beta)^2} + \frac{\lambda}{N} \|\Psi\beta\|_1.\]
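The sqrt-lasso is requested with the sqrt option (again with hypothetical variable names):

```stata
* Square-root lasso over a grid of lambda values (hypothetical data)
. lasso2 y x1-x20, sqrt
```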
lasso2 implements the elastic net and sqrt-lasso using coordinate descent algorithms. The algorithm (then referred to as “shooting”) was first proposed by Fu (1998) for the lasso, and by Van der Kooij (2007) for the elastic net. Belloni et al. (2011) implemented coordinate descent for the sqrt-lasso, and have kindly provided Matlab code.
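To sketch the idea for the elastic net objective above (a standard derivation; the exact implementation details in lasso2 may differ, e.g. with respect to standardization): holding all coefficients except \(\beta_j\) fixed, the coordinate-wise minimizer has the closed-form soft-thresholding solution
\[\beta_j \leftarrow \frac{S\!\left(\frac{2}{N}\sum_{i=1}^N x_{ij}\, r_i^{(j)},\; \frac{\lambda}{N}\alpha\psi_j\right)}{\frac{2}{N}\sum_{i=1}^N x_{ij}^2 + \frac{\lambda}{N}(1-\alpha)\psi_j^2}, \qquad S(z,\gamma) = \operatorname{sign}(z)\max(|z|-\gamma, 0),\]
where \(r_i^{(j)} = y_i - \sum_{k \neq j} x_{ik}\beta_k\) is the partial residual and \(\psi_j\) is the \(j\)-th diagonal element of \(\Psi\). The algorithm cycles through \(j = 1, \dots, p\) until the coefficient vector converges.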
Penalized regression methods, such as the elastic net and the sqrt-lasso, rely on tuning parameters that control the degree and type of penalization. The estimation methods implemented in lasso2 use two tuning parameters: \(\lambda\) and \(\alpha\).
How to select lambda #
lassopack offers three approaches for selecting the “optimal” \(\lambda\) and \(\alpha\) values, which are implemented in cvlasso, rlasso and lasso2, respectively.
- Cross-validation: The penalty level \(\lambda\) may be chosen by cross-validation in order to optimize out-of-sample prediction performance. \(K\)-fold cross-validation and rolling cross-validation (for panel and time-series data) are implemented in cvlasso. cvlasso also supports cross-validation across \(\alpha\) (see the examples after this list).
- Theory-driven: Theoretically justified and feasible penalty levels and loadings are available for the lasso and sqrt-lasso via the separate command rlasso. The penalization is chosen to dominate the noise of the data-generating process (represented by the score vector), which allows derivation of theoretical results with regard to consistent prediction and parameter estimation. Since the error variance is in practice unknown, Belloni et al. (2012) introduce the rigorous (or feasible) lasso, which relies on an iterative algorithm for estimating the optimal penalization and is valid in the presence of non-Gaussian and heteroskedastic errors. Belloni et al. (2016) extend the framework to the panel data setting. In the case of the sqrt-lasso under homoskedasticity, the optimal penalty level is independent of the unknown error variance, leading to a practical advantage and better performance in finite samples (see Belloni et al., 2011, 2014).
- Information criteria: \(\lambda\) can also be selected using information criteria. lasso2 calculates four information criteria: the Akaike information criterion (AIC; Akaike, 1974), the Bayesian information criterion (BIC; Schwarz, 1978), the extended Bayesian information criterion (EBIC; Chen & Chen, 2008) and the corrected AIC (AICc; Sugiura, 1978, and Hurvich, 1989).
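As a minimal sketch of the three approaches (variable names are hypothetical; see the help files of the respective commands for the full syntax and defaults):

```stata
* Cross-validation: 10-fold CV; lopt estimates at the loss-minimizing lambda
. cvlasso y x1-x20, nfolds(10) lopt

* Theory-driven: rigorous lasso with heteroskedasticity-robust penalty loadings
. rlasso y x1-x20, robust

* Information criteria: select lambda by EBIC along the coefficient path
. lasso2 y x1-x20, lic(ebic)
```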