Theory-driven penalty #
rlasso provides routines for estimating the coefficients of a lasso or square-root lasso regression with data-dependent, theory-driven penalization. The number of regressors, \(p\), may be large and possibly greater than the number of observations, \(N\). rlasso implements a version of the lasso that allows for heteroskedastic and clustered errors; see Belloni et al. (2012, 2016).
We start again with the prostate cancer data for demonstration.
. clear
. insheet using https://web.stanford.edu/~hastie/ElemStatLearn/datasets/prostate.data, tab
Homoskedastic lasso #
The optimal penalization depends on whether the errors are homoskedastic, heteroskedastic or cluster-dependent.
Similar to regress, rlasso assumes homoskedasticity by default.
Under homoskedasticity, the optimal penalty level is given by
\(\lambda=2c \sigma \sqrt{N} \Phi^{-1}(1-\gamma/(2p)),\)
which guarantees that the “rigorous” lasso is well-behaved. The unobserved \(\sigma\) is estimated using an iterative algorithm.
To run the lasso with theory-driven penalization, type:
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45
---------------------------------------------------
Selected | Lasso Post-est OLS
------------------+--------------------------------
lcavol | 0.4400059 0.5258519
lweight | 0.2385063 0.6617699
svi | 0.3024128 0.6656665
_cons |* 0.9533782 -0.7771568
---------------------------------------------------
*Not penalized
e(lambda) returns \(\lambda\), and e(lambda0) stores \(\lambda_0=\lambda/\hat{\sigma}\), i.e., the penalty level excluding the standard deviation of the error.
. di e(lambda)
44.984163
. di e(lambda0)
64.923165
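The value of e(lambda0) displayed above can be checked by hand. The following is a minimal sketch, not part of rlasso itself. It assumes the homoskedastic penalty takes the form \(\lambda_0=2c\sqrt{N}\Phi^{-1}(1-\gamma/(2p))\), that the prostate data have \(N=97\) observations and \(p=8\) penalized regressors, and that the defaults are \(c=1.1\) and \(\gamma=0.1/\log(N)\); check help rlasso to confirm these defaults for your installed version.

```stata
* Hand computation of lambda0 for the homoskedastic lasso.
* Assumed inputs (not read from rlasso): N = 97, p = 8,
* c = 1.1 and gamma = 0.1/log(N).
local N = 97
local p = 8
local c = 1.1
local gamma = 0.1/ln(`N')

* lambda0 = 2*c*sqrt(N)*invnormal(1 - gamma/(2p))
* invnormal() is Stata's inverse standard normal CDF.
di 2*`c'*sqrt(`N')*invnormal(1 - `gamma'/(2*`p'))
```

If the assumptions hold, the displayed number should be close to the e(lambda0) reported above. Local macros are used instead of scalars because a scalar named p could be confused with an abbreviation of the variable pgg45.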
Heteroskedastic lasso #
To allow for heteroskedasticity, we specify the robust option.
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, robust
---------------------------------------------------
Selected | Lasso Post-est OLS
------------------+--------------------------------
lcavol | 0.4518205 0.5258519
lweight | 0.2047086 0.6617699
svi | 0.1995573 0.6656665
_cons |* 1.0823460 -0.7771568
---------------------------------------------------
*Not penalized
The names of selected predictors are stored in e(selected) (without constant) and e(selected0) (with constant):
. di e(selected0)
lcavol lweight svi _cons
. di e(selected)
lcavol lweight svi
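Because e(selected) is a plain list of variable names, it can be passed on to other commands. As a rough sketch, the post-estimation OLS column can be reproduced by running regress on the selected variables; the local macro name sel is only illustrative.

```stata
* Refit OLS on the predictors selected by the rigorous lasso.
rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, robust

* Copy the selected variable names (without the constant) into a local macro.
local sel `e(selected)'

* Plain OLS on the selected variables; regress adds the constant itself.
regress lpsa `sel'
```

The coefficients from regress should agree with the Post-est OLS column of the rlasso output, since that column is OLS on the selected variables.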
Square-root lasso #
With the sqrt-lasso of Belloni et al. (2011, 2014), the default penalty level is
\(\lambda=c \sqrt{N} \Phi^{-1}(1-\gamma/(2p)).\)
Note that this differs from the standard lasso penalty by a factor of 2. More importantly, the optimal penalty level of the square-root lasso does not depend on \(\sigma\), which is a practical advantage because \(\sigma\) does not need to be estimated.
The square-root lasso is available through the sqrt option.
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, sqrt
---------------------------------------------------
Selected | Sqrt-lasso Post-est OLS
------------------+--------------------------------
lcavol | 0.4293894 0.5258519
lweight | 0.1861616 0.6617699
svi | 0.2574895 0.6656665
_cons |* 1.1673922 -0.7771568
---------------------------------------------------
*Not penalized
In this example, lasso and square-root lasso select the same variables. Thus the post-estimation OLS estimator, which is OLS using the variables selected, is the same in both cases.
The estimated penalty level is:
. di e(lambda)
32.461583
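The factor of 2 between the two penalty levels can be checked directly from the stored results. This is a small sketch using only e(lambda0) and e(lambda) as described above; the macro name lam0_lasso is illustrative.

```stata
* Standard (homoskedastic) lasso: keep the penalty level net of sigma.
rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45
local lam0_lasso = e(lambda0)

* Square-root lasso: its penalty level does not involve sigma.
rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, sqrt

* Ratio of the two penalty levels.
di `lam0_lasso'/e(lambda)
```

With the values reported on this page (64.923165 and 32.461583), the ratio is approximately 2.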
The square-root lasso also allows for heteroskedastic errors:
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, sqrt robust
---------------------------------------------------
Selected | Sqrt-lasso Post-est OLS
------------------+--------------------------------
lcavol | 0.4402037 0.5258519
lweight | 0.1329878 0.6617699
svi | 0.1264166 0.6656665
_cons |* 1.3741342 -0.7771568
---------------------------------------------------
*Not penalized
Cluster-dependent errors #
Both rigorous lasso and rigorous square-root lasso allow for within-panel correlation (based on Belloni et al., 2016, JBES). The fe option applies the within-transformation and cluster() specifies the cluster variable.
NB: The two regressions below take a few minutes to run, and you might need to increase the maximum matsize using set matsize.
In this example, we interact the variables grade and age using Stata’s factor variable notation (see help factor variables).
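To get a sense of how many regressors the interaction generates, the factor-variable term can be expanded before estimation. The sketch below uses Stata's fvexpand command, which returns the expanded list in r(varlist); the macro names terms and k are illustrative, and the count includes base levels, so it may differ from the number of regressors that rlasso actually penalizes.

```stata
* Load the panel data used in this section.
webuse nlswork, clear

* Expand the interaction into its individual grade-by-age terms.
fvexpand i.grade#i.age
local terms `r(varlist)'

* Count the expanded terms.
local k : word count `terms'
di "Number of grade-by-age interaction terms: `k'"
```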
. webuse nlswork, clear
. xtset idcode
. rlasso ln_w i.grade#i.age ttl_exp tenure not_smsa south, ///
fe cluster(idcode)
---------------------------------------------------
Selected | Lasso Post-est OLS
------------------+--------------------------------
grade#age |
12 18 | -0.1226071 -0.2087164
12 19 | -0.0481608 -0.1109979
12 20 | -0.0088640 -0.0627530
|
ttl_exp | 0.0206773 0.0226526
tenure | 0.0107726 0.0123681
not_smsa | -0.0305386 -0.0957148
---------------------------------------------------
The results of cluster lasso and cluster square-root lasso are again similar:
. rlasso ln_w i.grade#i.age ttl_exp tenure not_smsa south, ///
sqrt fe cluster(idcode)
---------------------------------------------------
Selected | Sqrt-lasso Post-est OLS
------------------+--------------------------------
grade#age |
12 18 | -0.1223057 -0.2087164
12 19 | -0.0479408 -0.1109979
12 20 | -0.0086753 -0.0627530
|
ttl_exp | 0.0206704 0.0226526
tenure | 0.0107671 0.0123681
not_smsa | -0.0303104 -0.0957148
---------------------------------------------------
More #
More information can be found in the help file:
help rlasso