----------------------------------------------------------------------------------------------------------------------------------
help rlasso lassopack v1.4.2
----------------------------------------------------------------------------------------------------------------------------------
Title
rlasso -- Program for lasso and sqrt-lasso estimation with data-driven penalization
Syntax
rlasso depvar regressors [weight] [if exp] [in range] [ , sqrt partial(varlist) pnotpen(varlist) psolver(string)
norecover noconstant fe noftools robust cluster(varlist) bw(int) kernel(string) center xdependent numsim(int)
prestd tolopt(real) tolpsi(real) tolzero(real) maxiter(int) maxpsiiter(int) maxabsx lassopsi corrnumber(int)
lalternative gamma(real) maq c(real) c0(real) supscore ssgamma(real) ssnumsim(int) testonly seed(real) displayall
postall ols verbose vverbose dots ]
Note: the fe option will take advantage of the ftools package (if installed) for the fixed-effects transform; the speed
gains using this package can be large. See help ftools or click on ssc install ftools to install.
General options Description
----------------------------------------------------------------------------------------------------------------------------
sqrt use sqrt-lasso (default is standard lasso)
noconstant suppress constant from regression (cannot be used with aweights or pweights)
fe fixed-effects model (requires data to be xtset)
noftools do not use FTOOLS package for fixed-effects transform (slower; rarely used)
partial(varlist) variables partialled-out prior to lasso estimation, including the constant (if present); to
partial-out just the constant, specify partial(_cons)
pnotpen(varlist) variables not penalized by lasso
psolver(string) override default solver used for partialling out (one of: qr, qrxx, lu, luxx, svd, svdxx, chol;
default=qrxx)
norecover suppress recovery of partialled-out variables after estimation
robust lasso penalty loadings account for heteroskedasticity
cluster(varlist) lasso penalty loadings account for clustering; both standard (1-way) and 2-way clustering supported
bw(int) lasso penalty loadings account for autocorrelation (AC) using bandwidth int; use with robust to
account for both heteroskedasticity and autocorrelation (HAC)
kernel(string) kernel used for HAC/AC penalty loadings (one of: bartlett, truncated, parzen, thann, thamm, daniell,
tent, qs; default=bartlett)
center center moments in heteroskedastic and cluster-robust loadings
lassopsi use lasso or sqrt-lasso residuals to obtain penalty loadings (psi) (default is post-lasso)
corrnumber(int) number of high-correlation regressors used to obtain initial residuals; default=5; if =0, then depvar
is used in place of residuals
prestd standardize data prior to estimation (default is standardize during estimation via penalty loadings)
seed(real) set Stata's random number seed prior to xdep and supscore simulations (default=leave state unchanged)
Lambda Description
----------------------------------------------------------------------------------------------------------------------------
xdependent penalty level is estimated depending on X
numsim(int) number of simulations used for the X-dependent case (default=5000)
lalternative alternative (less sharp) lambda0 = 2c*sqrt(N)*sqrt(2*log(2*p/gamma)) (sqrt-lasso = replace 2c with c)
gamma(real) "gamma" in lambda0 function (default = 0.1/log(N); cluster-lasso = 0.1/log(N_clust))
maq (HAC/AC with truncated kernel only) "gamma" in lambda0 function = 0.1/log(N/(bw+1)); mimics
cluster-robust
c(real) "c" in lambda0 function (default = 1.1)
c0(real) (rarely used) "c" in lambda0 function in first iteration only when iterating to obtain penalty
loadings (default = 1.1)
Optimization Description
----------------------------------------------------------------------------------------------------------------------------
tolopt(real) tolerance for lasso shooting algorithm (default=1e-10)
tolpsi(real) tolerance for penalty loadings algorithm (default=1e-4)
tolzero(real) minimum below which coeffs are rounded down to zero (default=1e-4)
maxiter(int) maximum number of iterations for the lasso shooting algorithm (default=10k)
maxpsiiter(int) maximum number of lasso-based iterations for penalty loadings (psi) algorithm (default=2)
maxabsx (sqrt-lasso only) use max(abs(x_ij)) as initial penalty loadings as per Belloni et al. (2014)
Sup-score test Description
----------------------------------------------------------------------------------------------------------------------------
supscore report sup-score test of statistical significance
testonly report only sup-score test; do not estimate lasso regression
ssgamma(real) test level for conservative critical value for the sup-score test (default = 0.05, i.e., 5%
significance level)
ssnumsim(int) number of simulations for sup-score test multiplier bootstrap (default=500; 0 => do not simulate)
Display and post Description
----------------------------------------------------------------------------------------------------------------------------
displayall display full coefficient vectors including unselected variables (default: display only selected,
unpenalized and partialled-out)
postall post full coefficient vector including unselected variables in e(b) (default: e(b) has only selected,
unpenalized and partialled-out)
ols post OLS coefs using lasso-selected variables in e(b) (default is lasso coefs)
verbose show additional output
vverbose show even more output
dots show dots corresponding to repetitions in simulations (xdep and supscore)
----------------------------------------------------------------------------------------------------------------------------
Postestimation:
predict [type] newvar [if] [in] [ , xb u e ue xbu resid lasso noisily ols ]
Note: the u, e, ue and xbu options are available only after fixed-effects estimation; see below.
Options Description
----------------------------------------------------------------------------------------------------------------------------
xb generate fitted values (default)
resid generate residuals
e generate overall error component e(it). Only after fe.
ue generate combined residuals, i.e., u(i) + e(it). Only after fe.
xbu prediction including fixed effect, i.e., a + xb + u(i). Only after fe.
u fixed effect, i.e., u(i). Only after fe.
noisily displays beta used for prediction.
lasso use lasso coefficients for prediction (default is posted e(b) matrix)
ols use OLS coefficients based on lasso-selected variables for prediction (default is posted e(b) matrix)
----------------------------------------------------------------------------------------------------------------------------
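For illustration, a minimal sketch using the prostate-cancer data from the examples below (yhat and ehat are arbitrary
new variable names, not part of the dataset):
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45
fitted values and residuals based on the posted coefficient vector e(b)
. predict double yhat, xb
. predict double ehat, resid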
Replay:
rlasso [ , displayall ]
Options Description
----------------------------------------------------------------------------------------------------------------------------
displayall display full coefficient vectors including unselected variables (default: display only selected,
unpenalized and partialled-out)
----------------------------------------------------------------------------------------------------------------------------
rlasso may be used with time-series or panel data, in which case the data must be tsset or xtset first; see help tsset or
xtset.
aweights and pweights are supported; see help weights. pweights is equivalent to aweights + robust.
All varlists may contain time-series operators or factor variables; see help varlist.
Contents
Description
Estimation methods
Penalty loadings
Sup-score test of joint significance
Computational notes
Miscellaneous
Version notes
Examples of usage
Saved results
References
Website
Installation
Acknowledgements
Citation of lassopack
Description
rlasso is a routine for estimating the coefficients of a lasso or square-root lasso (sqrt-lasso) regression where the lasso
penalization is data-dependent and where the number of regressors p may be large and possibly greater than the number of
observations. The lasso (Least Absolute Shrinkage and Selection Operator, Tibshirani 1996) is a regression method that uses
regularization and the L1 norm. rlasso implements a version of the lasso that allows for heteroskedastic and clustered
errors; see Belloni et al. (2012, 2013, 2014, 2016). For an overview of rlasso and the theory behind it, see Ahrens et al.
(2020).
The default estimator implemented by rlasso is the lasso. An alternative that does not involve estimating the error
variance is the square-root-lasso (sqrt-lasso) of Belloni et al. (2011, 2014), available with the sqrt option.
The lasso and sqrt-lasso estimators achieve sparse solutions: of the full set of p predictors, typically most will have
coefficients set to zero and only s<<p will be non-zero. The "post-lasso" estimator is OLS applied to the variables with
non-zero lasso or sqrt-lasso coefficients, i.e., OLS using the variables selected by the lasso or sqrt-lasso. The
lasso/sqrt-lasso and post-lasso coefficients are stored in e(beta) and e(betaOLS), respectively. By default, rlasso posts
the lasso or sqrt-lasso coefficients in e(b). To post in e(b) the OLS coefficients based on lasso- or sqrt-lasso-selected
variables, use the ols option.
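As a minimal illustration (using the prostate-cancer data from the examples below), the different coefficient vectors
can be inspected and posted as follows:
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45
lasso coefficients of the selected variables
. mat list e(beta)
post-lasso OLS coefficients of the selected variables
. mat list e(betaOLS)
re-estimate with the ols option so that e(b) holds the post-lasso OLS coefficients
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, ols
. mat list e(b)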
Estimation methods
rlasso solves the following problem
min 1/N RSS + lambda/N*||Psi*beta||_1,
where
RSS = sum(y(i)-x(i)'beta)^2 denotes the residual sum of squares,
beta is a p-dimensional parameter vector,
lambda is the overall penalty level,
||.||_1 denotes the L1-norm, i.e., sum_i(abs(a[i]));
Psi is a p by p diagonal matrix of predictor-specific penalty loadings. Note that rlasso treats Psi as a row vector.
N is the number of observations.
If the option sqrt is specified, rlasso estimates the sqrt-lasso estimator, which is defined as the solution to:
min sqrt(1/N*RSS) + lambda/N*||Psi*beta||_1.
Note: the above lambda differs from the definition used in parts of the lasso and elastic net literature; see for example
the R package glmnet by Friedman et al. (2010). The objective functions here follow the format of Belloni et al. (2011,
2012). Specifically, lambda(r)=2*N*lambda(GN) where lambda(r) is the penalty level used by rlasso and lambda(GN) is the
penalty level used by glmnet.
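For example, given the relationship above, the glmnet-scale penalty implied by an rlasso estimate can be recovered from
the saved results by dividing by 2N (a minimal sketch using the prostate-cancer data from the examples below):
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45
. di e(lambda)/(2*e(N))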
rlasso obtains the solutions to the lasso and sqrt-lasso using coordinate descent algorithms. The algorithm was first proposed
by Fu (1998) for the lasso (then referred to as "shooting"). For further details of how the lasso and sqrt-lasso solutions
are obtained, see lasso2.
rlasso first estimates the lasso penalty level and then uses the coordinate descent algorithm to obtain the lasso
coefficients. For the homoskedastic case, a single penalty level lambda is applied; in the heteroskedastic and cluster
cases, the penalty loadings vary across regressors. The methods are discussed in detail in Belloni et al. (2012, 2013,
2014, 2016) and are described only briefly here. For a detailed discussion of an R implementation of rlasso, see Spindler
et al. (2016).
For compatibility with the wider lasso literature, the documentation here uses "lambda" to refer to the penalty level that,
combined with the possibly regressor-specific penalty loadings, is used with the estimation algorithm to obtain the lasso
coefficients. "lambda0" refers to the component of the overall lasso penalty level that does not depend on the error
variance. Note that this terminology differs from that in the R implementation of rlasso by Spindler et al. (2016).
The default lambda0 for the lasso is 2c*sqrt(N)*invnormal(1-gamma/(2p)), where p is the number of penalized regressors and c
and gamma are constants with default values of 1.1 and 0.1/log(N), respectively. In the cluster-lasso (Belloni et al. 2016)
the default gamma is 0.1/log(N_clust), where N_clust is the number of clusters (saved in e(N_clust)). The default lambda0s
for the sqrt-lasso are the same except replace 2c with c. The constant c>1.0 is a slack parameter; gamma controls the
confidence level. The alternative formula lambda0 = 2c*sqrt(N)*sqrt(2*log(2p/gamma)) is available with the
lalternative option. The constants c and gamma can be set using the c(real) and gamma(real) options. The xdependent
option is another alternative that
implements an "X-dependent" penalty level lambda0; see Belloni and Chernozhukov (2011) and Belloni et al. (2013) for
discussion.
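As a check, the default lambda0 can be reconstructed by hand from the saved results e(c), e(gamma) and e(p) (with the
defaults c=1.1 and gamma=0.1/log(N)); the two displayed values should agree (prostate-cancer data from the examples
below):
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45
. di e(lambda0)
. di 2*e(c)*sqrt(e(N))*invnormal(1-e(gamma)/(2*e(p)))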
The default lambda for the lasso in the i.i.d. case is lambda0*rmse, where rmse is an estimate of the standard
deviation of the error. The sqrt-lasso differs from the standard lasso in that the penalty term lambda is pivotal in the
homoskedastic case and does not depend on the error variance. The default for the sqrt-lasso in the i.i.d. case is
lambda=lambda0=c*sqrt(N)*invnormal(1-gamma/(2*p)) (note the absence of the factor of "2" vs. the lasso lambda).
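The relationship lambda=lambda0*rmse can likewise be inspected from the saved results (prostate-cancer data from the
examples below); note that the rmse estimate entering lambda may be based on lasso or post-lasso residuals, depending
on the lassopsi setting, so both products are shown:
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45
. di e(lambda)
. di e(lambda0)*e(rmse)
. di e(lambda0)*e(rmseOLS)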
Penalty loadings
As is standard in the lasso literature, regressors are standardized to have unit variance. By default, standardization is
achieved by incorporating the standard deviations of the regressors into the penalty loadings. In the default homoskedastic
case, the penalty loadings are the vector of standard deviations of the regressors. The standardized penalty loadings
are the penalty loadings divided by the SDs of the regressors. In the homoskedastic case the standardized penalty
loadings are a vector of 1s. rlasso saves the vector of penalty loadings, the vector of standardized penalty loadings,
and the vector of SDs of the regressors X in e(.) matrices.
Penalty loadings are constructed after the partialling-out of unpenalized regressors and/or the FE (fixed-effects)
transformation, if applicable. An alternative to partialling-out unpenalized regressors with the partial(varlist) option is
to give them penalty loadings of zero with the pnotpen(varlist) option. By the Frisch-Waugh-Lovell Theorem for the lasso
(Yamada 2017), the estimated lasso coefficients are the same in theory (but see below) whether the unpenalized regressors
are partialled-out or given zero penalty loadings, so long as the same penalty loadings are used for the penalized
regressors in both cases. Note that the calculation of the penalty loadings in both the partial(.) and pnotpen(.) cases
involves adjustments for the partialled-out variables. This is different from the lasso2 handling of unpenalized variables
specified in the lasso2 option notpen(.), where no such adjustment of the penalty loadings is made (and is why the two
no-penalization options are named differently).
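This equivalence can be checked directly; for example, with the prostate-cancer data from the examples below, the two
coefficient vectors should coincide up to numerical precision (treating lcavol as unpenalized is purely illustrative):
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, partial(lcavol)
. mat list e(beta)
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, pnotpen(lcavol)
. mat list e(beta)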
Regressor-specific penalty loadings for the heteroskedastic and clustered cases are derived following the methods described
in Belloni et al. (2012, 2013, 2014, 2015, 2016). The penalty loadings for the heteroskedastic-robust case have elements of
the form sqrt[avg(x^2e^2)]/sqrt[avg(e^2)] where x is a (demeaned) regressor, e is the residual, and sqrt[avg(e^2)] is the
root mean squared error; the standardized penalty loadings have elements sqrt[avg(x^2e^2)]/(sqrt[avg(x^2)]sqrt[avg(e^2)])
where the sqrt[avg(x^2)] in the denominator is SD(x), the standard deviation of x. This corresponds to the presentation of
penalty loadings in Belloni et al. (2014; see Algorithm 1 but note that in their presentation, the predictors x are assumed
already to be standardized). NB: in the presentation we use here, the penalty loadings for the lasso and sqrt-lasso are the
same; what differs is the overall penalty term lambda.
The cluster-robust case is similar to the heteroskedastic case except that numerator sqrt[avg(x^2e^2)] in the
heteroskedastic case is replaced by sqrt[avg(u_i^2)], where (using the notation of the Stata manual's discussion of the
_robust command) u_i is the sum of x_ij*e_ij over the j members of cluster i; see Belloni et al. (2016). Again in the
presentation used here, the cluster-lasso and cluster-sqrt-lasso penalty loadings are the same. The unit vector is again
the benchmark for the standardized penalty loadings. NB: also following _robust, the denominator used in avg(u_i^2) is
(N_clust-1).
cluster(varname1 varname2) implements two-way cluster-robust penalty loadings (Cameron et al. 2011; Thompson 2011).
"Two-way cluster-robust" means the penalty loadings accommodate arbitrary within-group correlation in two distinct
non-nested categories defined by varname1 and varname2. Note that the asymptotic justification for the two-way
cluster-robust approach requires both dimensions to be "large" (go off to infinity).
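For example, with the Angrist-Krueger data used in the examples below, one might cluster on state of birth and year of
birth (an illustrative pairing, not taken from the original study):
. rlasso edu ibn.qob, cluster(pob yob)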
Autocorrelation-consistent (AC) and heteroskedastic and autocorrelation-consistent (HAC) penalty loadings can be obtained by
using the bw(int) option on its own (AC) or in combination with the robust option (HAC), where int specifies the bandwidth;
see Chernozhukov et al. (2018, 2020) and Ahrens et al. (2020). Syntax and usage follow that used by ivreg2; see the ivreg2
help file for details. The default is to use the Bartlett kernel; this can be changed using the kernel option. The full
list of kernels available is (abbreviations in parentheses): Bartlett (bar); Truncated (tru); Parzen (par); Tukey-Hanning
(thann); Tukey-Hamming (thamm); Daniell (dan); Tent (ten); and Quadratic-Spectral (qua or qs). AC and HAC penalty loadings
can also be used for (large T) panel data; this requires the dataset to be xtset.
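A hypothetical sketch for a large-T panel (the variables id, year, y and x1-x20 are placeholders, not from any dataset
used in this help file):
. xtset id year
. rlasso y x1-x20, bw(4) robust kernel(parzen)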
Note that for some kernels it is possible in finite samples to obtain negative variances and hence undefined penalty
loadings; the same is true of two-way cluster-robust loadings. Intuitively, this arises because the covariance term in a calculation
like var+var-2cov is "too big". When this happens, rlasso issues a warning and (arbitrarily) replaces 2cov with cov.
The center option centers the x_ij*e_ij terms (or in the cluster-lasso case, the u_i terms) prior to calculating the penalty
loadings.
Sup-score test of joint significance
rlasso with the supscore option reports a test of the null hypothesis H0: beta_1 = ... = beta_p = 0, i.e., a test of the
joint significance of the regressors (or, alternatively, a test that H0: s=0; of the full set of p regressors, none is in
the true model). The test follows Chernozhukov et al. (2013, Appendix M); see also Belloni et al. (2012, 2013). (The
variables are assumed to be rescaled to be centered and with unit variance.)
If the null hypothesis is correct and the rest of the model is well-specified (including the assumption that the regressors
are orthogonal to the disturbance e), then E(e*x_j) = E((y-beta_0)*x_j) = 0, j=1...p where beta_0 is the intercept. The
sup-score statistic is S=sqrt(N)*max_j(abs(avg((y-b_0)*x_j))/(sqrt(avg(((y-b_0)*x_j)^2)))), where: (a) the numerator
abs(avg((y-b_0)*x_j)) is the absolute value of the average score for regressor x_j and b_0 is the sample mean of y; (b) the
denominator sqrt(avg(((y-b_0)*x_j)^2)) is the sample standard deviation of the score; (c) the statistic is sqrt(N) times the
maximum across the p regressors of the ratio of (a) to (b).
The p-value for the sup-score test is obtained by a multiplier bootstrap procedure simulating the statistic W, defined as
W=sqrt(N)*max_j(abs(avg((y-b_0)*x_j*u))/(sqrt(avg(((y-b_0)*x_j)^2)))) where u is an iid standard normal variate independent
of the data. The ssnumsim(int) option controls the number of simulated draws (default=500); ssnumsim(0) requests that the
sup-score statistic is reported without a simulation-based p-value. rlasso also reports a conservative critical value
(asymptotic bound) as per Belloni et al. (2012, 2013), defined as c*invnormal(1-gamma/(2p)); the test level gamma used
in this critical value can be set by the option ssgamma(real) (default = 0.05).
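For illustration, with the prostate-cancer data from the examples below:
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, supscore ssnumsim(1000)
sup-score statistic, simulated p-value and conservative critical value
. di e(supscore)
. di e(supscore_p)
. di e(supscore_cv)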
Computational notes
A computational alternative to the default of standardizing "on the fly" (i.e., incorporating the standardization into the
lasso penalty loadings) is to standardize all variables to have unit variance prior to computing the lasso coefficients.
This can be done using the prestd option. The results are equivalent in theory. The prestd option can lead to improved
numerical precision or more stable results in the case of difficult problems; the cost is (a typically small) computation
time required to standardize the data.
Either the partial(varlist) option or the pnotpen(varlist) option can be used for variables that should not be penalized by
the lasso. The options are equivalent in theory (see above), but numerical results can differ in practice because of the
different calculation methods used. Partialling-out variables can lead to improved numerical precision or more stable
results in the case of difficult problems vs. specifying the variables as unpenalized, but may be slower in terms of
computation time.
Both the partial(varlist) and pnotpen(varlist) options use least squares. This is implemented in Mata using one of Mata's
solvers. In cases where the variables to be partialled out are collinear or nearly so, different solvers may generate
different results. Users may wish to check the stability of their results in such cases. The psolver(.) option can be used
to specify the Mata solver used. The default behavior of rlasso to solve AX=B for X is to use the QR decomposition applied
to (A'A) and (A'B), i.e., qrsolve((A'A),(A'B)), abbreviated qrxx. Available options are qr, qrxx, lu, luxx, svd, svdxx,
where, e.g., svd indicates using svsolve(A,B) and svdxx indicates using svsolve((A'A),(A'B)). rlasso will warn if collinear
variables are dropped when partialling out.
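For instance, the AJR partialling-out example from below can be re-run with the SVD-based solver as a stability check
(results should be essentially unchanged if the problem is well-conditioned):
. rlasso logpgp95 lat_abst edes1975 avelf temp* humid* steplow-oilres, partial(lat_abst) psolver(svd)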
By default the constant (if present) is not penalized if there are no regressors being partialled out; this is equivalent to
mean-centering prior to estimation. The exception to this is if aweights or pweights are specified, in which case the
constant is partialled-out. The partial(varlist) option will automatically also partial out the constant (if present); to
partial out just the constant, specify partial(_cons). The within transformation implemented by the fe option automatically
mean-centers the data; the noconstant option is redundant in this case and may not be specified with this option.
The prestd and pnotpen(varlist) vs. partial(varlist) options can be used as simple checks for numerical stability by
comparing results that should be equivalent in theory. If the results differ, the values of the minimized objective
functions (e(pmse) or e(prmse)) can be compared.
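A minimal sketch of such a check, using the prostate-cancer data from the examples below (treating lcavol as
unpenalized is purely illustrative); the two penalized MSEs should agree closely:
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, pnotpen(lcavol)
. di e(pmse)
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, partial(lcavol) prestd
. di e(pmse)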
The fe fixed-effects option is equivalent to (but computationally faster and more accurate than) specifying unpenalized
panel-specific dummies. The fixed-effects ("within") transformation also removes the constant as well as the fixed effects.
The panel variable used by the fe option is the panel variable set by xtset. To use weights with fixed effects, the ftools
package must be installed.
Miscellaneous
By default rlasso reports only the set of selected variables and their lasso and post-lasso coefficients; the omitted
coefficients are not reported in the regression output. The postall and displayall options allow the full coefficient
vector (with coefficients of unselected variables set to zero) to be either posted in e(b) or displayed as output.
rlasso, like the lasso in general, accommodates possibly perfectly-collinear sets of regressors. Stata's factor variables
are supported by rlasso (as well as by lasso2). Users therefore have the option of specifying as regressors one or more
complete sets of factor variables or interactions with no base levels using the ibn prefix. This can be interpreted as
allowing rlasso to choose the members of the base category.
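For instance, with the prostate-cancer data from the examples below, the integer-valued gleason score could enter as a
full set of indicators with no base level, leaving it to the lasso to choose the base category (illustrative only):
. rlasso lpsa lcavol lweight age lbph svi lcp ibn.gleason pgg45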
The choice of whether to use partial(varlist) or pnotpen(varlist) will depend on the circumstances faced by the user. The
partial(varlist) option can be helpful in dealing with data that have scaling problems or collinearity issues; in these
cases it can be more accurate and/or achieve convergence faster than the pnotpen(varlist) option. The pnotpen(varlist)
option will sometimes be faster because it avoids using the pre-estimation transformation employed by partial(varlist). The
two options can be used simultaneously (but not for the same variables).
The treatment of standardization, penalization and partialling-out in rlasso differs from that of lasso2. In the rlasso
treatment, standardization incorporates the partialling-out of regressors listed in the pnotpen(varlist) list as well as
those in the partial(varlist) list. This is in order to maintain the equivalence of the lasso estimator irrespective of
which option is used for unpenalized variables (see the discussion of the Frisch-Waugh-Lovell Theorem for the lasso above).
In the lasso2 treatment, standardization takes place after the partialling-out of only the regressors listed in the
notpen(varlist) option. In other words, rlasso adjusts the penalty loadings for any unpenalized variables; lasso2 does not.
For further details, see lasso2.
The initial overhead for fixed-effects estimation and/or partialling out and/or pre-estimation standardization (creating
temporary variables and then transforming the data) can be noticeable for large datasets. For problems that involve looping
over data, users may wish to first transform the data by hand.
If a small number of correlations is set using the corrnumber(int) option, users may want to increase the number of penalty
loadings iterations from the default of 2 to something higher using the maxpsiiter(int) option.
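For example (prostate-cancer data from the examples below; the specific values are illustrative):
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, corrnumber(0) maxpsiiter(10)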
The sup-score p-value is obtained by simulation, which can be time-consuming for large datasets. To skip this and use only
the conservative (asymptotic bound) critical value, set the number of simulations to zero with the ssnumsim(0) option.
Version notes
Detailed version notes can be found inside the ado files rlasso.ado and lassoutils.ado. Noteworthy changes appear below.
In versions of lassoutils prior to 1.1.01 (8 Nov 2018), the very first iteration to obtain penalty loadings set the constant
c=0.55. This was dropped in version 1.1.01, and the constant c is unchanged in all iterations. To replicate the previous
behavior of rlasso, use the c0(real) option. For example, with the default value of c=1.1, to replicate the earlier
behavior use c0(0.55).
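For example, to replicate pre-1.1.01 results obtained with the default c=1.1 (robust loadings are shown because c0
matters only when the penalty loadings are obtained iteratively; prostate-cancer data from the examples below):
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, robust c0(0.55)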
In versions of lassoutils prior to 1.1.01 (8 Nov 2018), the sup-score test statistic S was N*max_j rather than sqrt(N)*max_j
as in Chernozhukov et al. (2013), and similarly for the simulated statistic W.
Examples using prostate cancer data from Hastie et al. (2009)
Load prostate cancer data.
. clear
. insheet using https://web.stanford.edu/~hastie/ElemStatLearn/datasets/prostate.data, tab
Estimate lasso using data-driven lambda penalty; default homoskedasticity case.
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45
Use square-root lasso instead.
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, sqrt
Illustrate relationships between lambda, lambda0 and penalty loadings:
Basic usage: homoskedastic case, lasso
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45
lambda=lambda0*SD is lasso penalty; incorporates the estimate of the error variance
default lambda0 is 2c*sqrt(N)*invnormal(1-gamma/(2*p))
. di e(lambda)
. di e(lambda0)
In the homoskedastic case, penalty loadings are the vector of SDs of penalized regressors
. mat list e(ePsi)
...and the standardized penalty loadings are a vector of 1s.
. mat list e(sPsi)
Heteroskedastic case, lasso
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, robust
lambda and lambda0 are the same as for the homoskedastic case
. di e(lambda)
. di e(lambda0)
Penalty loadings account for heteroskedasticity as well as incorporating SD(x)
. mat list e(ePsi)
...and the standardized penalty loadings are not a vector of 1s.
. mat list e(sPsi)
Homoskedastic case, sqrt-lasso
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, sqrt
with the sqrt-lasso, the default lambda=lambda0=c*sqrt(N)*invnormal(1-gamma/(2*p));
note the difference by a factor of 2 vs. the standard lasso lambda0
. di e(lambda)
. di e(lambda0)
rlasso vs. lasso2 (if installed)
. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45
lambda=lambda0*SD is lasso penalty; incorporates the estimate of the error variance
default lambda0 is 2c*sqrt(N)*invnormal(1-gamma/(2*p))
. di %8.5f e(lambda)
Replicate rlasso estimates using rlasso lambda and lasso2
. lasso2 lpsa lcavol lweight age lbph svi lcp gleason pgg45, lambda(44.34953)
Examples using data from Acemoglu-Johnson-Robinson (2001)
Load and reorder AJR data for Table 6 and Table 8 (datasets need to be in current directory).
. clear
. (click to download maketable6.zip from economics.mit.edu)
. unzipfile maketable6
. (click to download maketable8.zip from economics.mit.edu)
. unzipfile maketable8
. use maketable6
. merge 1:1 shortnam using maketable8
. keep if baseco==1
. order shortnam logpgp95 avexpr lat_abst logem4 edes1975 avelf, first
. order indtime euro1900 democ1 cons1 democ00a cons00a, last
Alternatively, load AJR data from our website (no manual download required):
. clear
. use https://statalasso.github.io/dta/AJR.dta
Basic usage:
. rlasso logpgp95 lat_abst edes1975 avelf temp* humid* steplow-oilres
Heteroskedastic-robust penalty loadings:
. rlasso logpgp95 lat_abst edes1975 avelf temp* humid* steplow-oilres, robust
Partialling-out vs. non-penalization:
. rlasso logpgp95 lat_abst edes1975 avelf temp* humid* steplow-oilres, partial(lat_abst)
. rlasso logpgp95 lat_abst edes1975 avelf temp* humid* steplow-oilres, pnotpen(lat_abst)
Request sup-score test (H0: all betas=0):
. rlasso logpgp95 lat_abst edes1975 avelf temp* humid* steplow-oilres, supscore
Examples using data from Angrist-Krueger (1991)
Load AK data and rename variables (dataset needs to be in current directory). NB: this is a large dataset (330k
observations) and estimations may take some time to run on some installations.
. clear
. (click to download asciiqob.zip from economics.mit.edu)
. unzipfile asciiqob.zip
. infix lnwage 1-9 edu 10-20 yob 21-31 qob 32-42 pob 43-53 using asciiqob.txt
Alternatively, get data from our website source (no unzipping needed):
. use https://statalasso.github.io/dta/AK91.dta
xtset data by place of birth (state):
. xtset pob
State (place of birth) fixed effects; regressors are year of birth, quarter of birth and QOBxYOB.
. rlasso edu i.yob##i.qob, fe
As above, but with the state dummies included explicitly as penalized regressors and with all categories (no base
category) for all factor variables. Note that the (unpenalized) constant is reported.
. rlasso edu ibn.yob##ibn.qob ibn.pob
State fixed effects; regressors are YOB, QOB and QOBxYOB; cluster on state.
. rlasso edu i.yob##i.qob, fe cluster(pob)
Example using data from Belloni et al. (2015)
Load dataset on eminent domain (available at journal website).
. clear
. import excel using CSExampleData.xlsx, first
Settings used in Belloni et al. (2015) - results as in text discussion (p=147):
. rlasso NumProCase Z* BA BL DF, robust lalt corrnum(0) maxpsiiter(100) c0(0.55)
. di e(p)
Settings used in Belloni et al. (2015) - results as in journal replication file (p=144):
. rlasso NumProCase Z*, robust lalt corrnum(0) maxpsiiter(100) c0(0.55)
. di e(p)
Examples illustrating AC/HAC penalty loadings
. use http://fmwww.bc.edu/ec-p/data/wooldridge/phillips.dta
. tsset year, yearly
Autocorrelation-consistent (AC) penalty loadings; bandwidth=3; default kernel is Bartlett.
. rlasso cinf L(0/10).unem, bw(3)
Heteroskedastic- and autocorrelation-consistent (HAC) penalty loadings; bandwidth=5; kernel is quadratic-spectral.
. rlasso cinf L(0/10).unem, bw(5) rob kernel(qs)
Saved results
rlasso saves the following in e():
scalars
e(N) sample size
e(N_clust) number of clusters in cluster-robust estimation; in the case of 2-way cluster-robust,
e(N_clust)=min(e(N_clust1),e(N_clust2))
e(N_g) number of groups in fixed-effects model
e(p) number of penalized regressors in model
e(s) number of selected regressors
e(s0) number of selected and unpenalized regressors including constant (if present)
e(lambda0) penalty level excluding rmse (default = 2c*sqrt(N)*invnormal(1-gamma/(2*p)))
e(lambda) lasso: penalty level including rmse (=lambda0*rmse); sqrt-lasso: lambda=lambda0
e(slambda) standardized lambda; equiv to lambda used on standardized data; lasso: slambda=lambda/SD(depvar);
sqrt-lasso: slambda=lambda0
e(c) parameter in penalty level lambda
e(gamma) parameter in penalty level lambda
e(niter) number of iterations for shooting algorithm
e(maxiter) max number of iterations for shooting algorithm
e(npsiiter) number of iterations for loadings algorithm
e(maxpsiiter) max iterations for loadings algorithm
e(r2) R-sq for lasso estimation
e(rmse) rmse using lasso residuals
e(rmseOLS) rmse using post-lasso residuals
e(pmse) minimized objective function (penalized mse, standard lasso only)
e(prmse) minimized objective function (penalized rmse, sqrt-lasso only)
e(cons) =1 if constant in model, =0 otherwise
e(fe) =1 if fixed-effects model, =0 otherwise
e(center) =1 if moments have been centered
e(bw) (HAC/AC only) bandwidth used
e(supscore) sup-score statistic
e(supscore_p) sup-score p-value
e(supscore_cv) sup-score critical value (asymptotic bound)
macros
e(cmd) rlasso
e(cmdline) command line
e(depvar) name of dependent variable
e(varX) all regressors
e(varXmodel) penalized regressors
e(pnotpen) unpenalized regressors
e(partial) partialled-out regressors
e(selected) selected and penalized regressors
e(selected0) all selected regressors including unpenalized and constant (if present)
e(method) lasso or sqrt-lasso
e(estimator) lasso, sqrt-lasso or post-lasso ols posted in e(b)
e(robust) heteroskedastic-robust penalty loadings
e(clustvar) variable defining clusters for cluster-robust penalty loadings; if two-way clustering is used, the
variables are in e(clustvar1) and e(clustvar2)
e(kernel) (HAC/AC only) kernel used
e(ivar) variable defining groups for fixed-effects model
matrices
e(b) posted coefficient vector
e(beta) lasso or sqrt-lasso coefficient vector
e(betaOLS) post-lasso coefficient vector
e(betaAll) full lasso or sqrt-lasso coefficient vector including omitted, factor base variables, etc.
e(betaAllOLS) full post-lasso coefficient vector including omitted, factor base variables, etc.
e(ePsi) estimated penalty loadings
e(sPsi) standardized penalty loadings (vector of 1s in homoskedastic case)
functions
e(sample) estimation sample
References
Acemoglu, D., Johnson, S. and Robinson, J.A. 2001. The colonial origins of comparative development: An empirical
investigation. American Economic Review, 91(5):1369-1401. https://economics.mit.edu/files/4123
Ahrens, A., Aitken, C., Ditzen, J., Ersoy, E., Kohns, D. and M.E. Schaffer. 2020. A Theory-based Lasso for Time-Series
Data. Invited paper for the International Conference of Econometrics of Vietnam, January 2020. Forthcoming in Studies
in Computational Intelligence (Springer).
Ahrens, A., Hansen, C.B. and M.E. Schaffer. 2020. lassopack: model selection and prediction with regularized regression in
Stata. The Stata Journal, 20(1):176-235. https://journals.sagepub.com/doi/abs/10.1177/1536867X20909697. Working paper
version: https://arxiv.org/abs/1901.05397.
Angrist, J. and Krueger, A. 1991. Does compulsory school attendance affect schooling and earnings? Quarterly Journal of
Economics 106(4):979-1014. http://www.jstor.org/stable/2937954
Belloni, A. and Chernozhukov, V. 2011. High-dimensional sparse econometric models: An introduction. In Alquier, P.,
Gautier E., and Stoltz, G. (eds.), Inverse problems and high-dimensional estimation. Lecture notes in statistics, vol.
203. Springer, Berlin, Heidelberg. https://arxiv.org/pdf/1106.5242.pdf
Belloni, A., Chernozhukov, V. and Wang, L. 2011. Square-root lasso: Pivotal recovery of sparse signals via conic
programming. Biometrika 98(4):791-806. https://doi.org/10.1093/biomet/asr043
Belloni, A., Chen, D., Chernozhukov, V. and Hansen, C. 2012. Sparse models and methods for optimal instruments with an
application to eminent domain. Econometrica 80(6):2369-2429.
http://onlinelibrary.wiley.com/doi/10.3982/ECTA9626/abstract
Belloni, A., Chernozhukov, V. and Hansen, C. 2013. Inference for high-dimensional sparse econometric models. In Advances
in Economics and Econometrics: 10th World Congress, Vol. 3: Econometrics, Cambridge University Press: Cambridge,
245-295. http://arxiv.org/abs/1201.0220
Belloni, A., Chernozhukov, V. and Hansen, C. 2014. Inference on treatment effects after selection among high-dimensional
controls. Review of Economic Studies 81:608-650. https://doi.org/10.1093/restud/rdt044
Belloni, A., Chernozhukov, V. and Hansen, C. 2015. High-dimensional methods and inference on structural and treatment
effects. Journal of Economic Perspectives 28(2):29-50. http://www.aeaweb.org/articles.php?doi=10.1257/jep.28.2.29
Belloni, A., Chernozhukov, V., Hansen, C. and Kozbur, D. 2016. Inference in high dimensional panel models with an
application to gun control. Journal of Business and Economic Statistics 34(4):590-605.
http://amstat.tandfonline.com/doi/full/10.1080/07350015.2015.1102733
Belloni, A., Chernozhukov, V. and Wang, L. 2014. Pivotal estimation via square-root-lasso in nonparametric regression.
Annals of Statistics 42(2):757-788. https://doi.org/10.1214/14-AOS1204
Cameron, A.C., Gelbach, J.B. and D.L. Miller. 2011. Robust Inference with Multiway Clustering. Journal of Business &
Economic Statistics 29(2):238-249. https://www.jstor.org/stable/25800796. Working paper version: NBER Technical
Working Paper 327, http://www.nber.org/papers/t0327.
Chernozhukov, V., Chetverikov, D. and Kato, K. 2013. Gaussian approximations and multiplier bootstrap for maxima of sums
of high-dimensional random vectors. Annals of Statistics 41(6):2786-2819.
https://projecteuclid.org/euclid.aos/1387313390
Chernozhukov, V., Hardle, W.K., Huang, C. and W. Wang. 2018 (rev 2020). LASSO-driven inference in time and space. Working
paper. https://arxiv.org/abs/1806.05081
Correia, S. 2016. FTOOLS: Stata module to provide alternatives to common Stata commands optimized for large datasets.
https://ideas.repec.org/c/boc/bocode/s458213.html
Friedman, J., Hastie, T. and Tibshirani, R. 2010. Regularization Paths for Generalized Linear Models via Coordinate
Descent. Journal of Statistical Software 33(1):1-22. https://doi.org/10.18637/jss.v033.i01
Fu, W.J. 1998. Penalized regressions: The bridge versus the lasso. Journal of Computational and Graphical Statistics
7(3):397-416. http://www.tandfonline.com/doi/abs/10.1080/10618600.1998.10474784
Hastie, T., Tibshirani, R. and Friedman, J. 2009. The elements of statistical learning (2nd ed.). New York:
Springer-Verlag. https://web.stanford.edu/~hastie/ElemStatLearn/
Spindler, M., Chernozhukov, V. and Hansen, C. 2016. High-dimensional metrics. https://cran.r-project.org/package=hdm.
Thompson, S.B. 2011. Simple formulas for standard errors that cluster by both firm and time. Journal of Financial
Economics 99(1):1-10. Working paper version: http://ssrn.com/abstract=914002.
Tibshirani, R. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B
(Methodological) 58(1):267-288. https://doi.org/10.2307/2346178
Yamada, H. 2017. The Frisch-Waugh-Lovell Theorem for the lasso and the ridge regression. Communications in Statistics -
Theory and Methods 46(21):10897-10902. http://dx.doi.org/10.1080/03610926.2016.1252403
Website
Please check our website https://statalasso.github.io/ for more information.
Installation
rlasso is part of the lassopack package. To get the latest stable version of lassopack from our website, check the
installation instructions at https://statalasso.github.io/installation/. We update the stable website version more
frequently than the SSC version. Earlier versions of lassopack are also available from the website.
To verify that lassopack is correctly installed, click on or type whichpkg lassopack (which requires whichpkg to be
installed; ssc install whichpkg).
Acknowledgements
Thanks to Alexandre Belloni for providing Matlab code for the square-root-lasso and to Sergio Correia for advice on the use
of the FTOOLS package.
Citation of lassopack
rlasso is not an official Stata command. It is a free contribution to the research community, like a paper. Please cite it
as such:
Ahrens, A., Hansen, C.B., Schaffer, M.E. 2018 (updated 2020). LASSOPACK: Stata module for lasso, square-root lasso, elastic
net, ridge, adaptive lasso estimation and cross-validation http://ideas.repec.org/c/boc/bocode/s458458.html
Ahrens, A., Hansen, C.B. and M.E. Schaffer. 2020. lassopack: model selection and prediction with regularized regression in
Stata. The Stata Journal, 20(1):176-235. https://journals.sagepub.com/doi/abs/10.1177/1536867X20909697. Working paper
version: https://arxiv.org/abs/1901.05397.
Authors
Achim Ahrens, Public Policy Group, ETH Zurich, Switzerland
achim.ahrens@gess.ethz.ch
Christian B. Hansen, University of Chicago, USA
Christian.Hansen@chicagobooth.edu
Mark E. Schaffer, Heriot-Watt University, UK
m.e.schaffer@hw.ac.uk
Also see
Help: lasso2, cvlasso, lassologit, pdslasso, ivlasso (if installed)