Help file: rlasso

---------------------------------------------------------------------------------------------------------
help rlasso                                                                                lassopack v1.2
---------------------------------------------------------------------------------------------------------

Title

    rlasso --  Program for lasso and sqrt-lasso estimation with data-driven penalization

Syntax

        rlasso depvar regressors [weight] [if exp] [in range] [ , sqrt partial(varlist)
              pnotpen(varlist) noconstant fe noftools robust cluster(var) center xdependent numsim(int)
              prestd tolopt(real) tolpsi(real) tolzero(real) maxiter(int) maxpsiiter(int) maxabsx
              lassopsi corrnumber(int) lambda0(real) lalternative gamma(real) c(real) supscore
              ssgamma(real) ssnumsim(int) testonly seed(real) displayall postall ols verbose vverbose
              dots ]

        Note: the fe option will take advantage of the ftools package (if installed) for the
              fixed-effects transform; the speed gains using this package can be large.  See help
              ftools or click on ssc install ftools to install.

    General options       Description
    ---------------------------------------------------------------------------------------------------
    sqrt                   use sqrt-lasso (default is standard lasso)
    noconstant             suppress constant from regression (cannot be used with aweights or pweights)
    fe                     fixed-effects model (requires data to be xtset)
    noftools               do not use FTOOLS package for fixed-effects transform (slower; rarely used)
    partial(varlist)       variables partialled-out prior to lasso estimation, including the constant
                            (if present); to partial-out just the constant, specify partial(_cons)
    pnotpen(varlist)       variables not penalized by lasso
    robust                 lasso penalty loadings account for heteroskedasticity
    cluster(var)           lasso penalty loadings account for clustering on variable var
    center                 center moments in heteroskedastic and cluster-robust loadings
    lassopsi               use lasso or sqrt-lasso residuals to obtain penalty loadings (psi) (default
                            is post-lasso)
    corrnumber(int)        number of high-correlation regressors used to obtain initial residuals;
                            default=5; if =0, then depvar is used in place of residuals
    prestd                 standardize data prior to estimation (default is standardize during
                            estimation via penalty loadings)
    seed(real)             set Stata's random number seed prior to xdep and supscore simulations
                            (default=leave state unchanged)

    Lambda                Description
    ---------------------------------------------------------------------------------------------------
    xdependent             penalty level is estimated depending on X
    numsim(int)            number of simulations used for the X-dependent case (default=5000)
    lambda0(real)          user-specified lambda0; overrides the default lambda0 =
                            2c*sqrt(N)*invnormal(1-gamma/(2*p)) (sqrt-lasso default = replace 2c with
                            c)
    lalternative           alternative (less sharp) lambda0 = 2c*sqrt(N)*sqrt(2*log(2*p/gamma))
                            (sqrt-lasso = replace 2c with c)
    gamma(real)            "gamma" in lambda0 function (default = 0.1/log(N); cluster-lasso =
                            0.1/log(N_clust))
    c(real)                "c" in lambda0 function (default = 1.1)

    Optimization          Description
    ---------------------------------------------------------------------------------------------------
    tolopt(real)           tolerance for lasso shooting algorithm (default=1e-10)
    tolpsi(real)           tolerance for penalty loadings algorithm (default=1e-4)
    tolzero(real)          minimum below which coeffs are rounded down to zero (default=1e-4)
    maxiter(int)           maximum number of iterations for the lasso shooting algorithm (default=10k)
    maxpsiiter(int)        maximum number of lasso-based iterations for penalty loadings (psi)
                            algorithm (default=2)
    maxabsx                (sqrt-lasso only) use max(abs(x_ij)) as initial penalty loadings as per
                            Belloni et al. (2014)

    Sup-score test        Description
    ---------------------------------------------------------------------------------------------------
    supscore               report sup-score test of statistical significance
    testonly               report only sup-score test; do not estimate lasso regression
    ssgamma(real)          test level for conservative critical value for the sup-score test (default =
                            0.05, i.e., 5% significance level)
    ssnumsim(int)          number of simulations for sup-score test multiplier bootstrap (default=500;
                            0 => do not simulate)

    Display and post      Description
    ---------------------------------------------------------------------------------------------------
    displayall             display full coefficient vectors including unselected variables (default:
                            display only selected, unpenalized and partialled-out)
    postall                post full coefficient vector including unselected variables in e(b)
                            (default: e(b) has only selected, unpenalized and partialled-out)
    ols                    post OLS coefs using lasso-selected variables in e(b) (default is lasso
                            coefs)
    verbose                show additional output
    vverbose               show even more output
    dots                   show dots corresponding to repetitions in simulations (xdep and supscore)
    ---------------------------------------------------------------------------------------------------

    Postestimation:

        predict [type] newvar [if] [in] [ , xb resid lasso ols ]

    predict is not currently supported after fixed-effects estimation.

    Options               Description
    ---------------------------------------------------------------------------------------------------
    xb                     generate fitted values (default)
    resid                  generate residuals
    lasso                  use lasso coefficients for prediction (default is posted e(b) matrix)
    ols                    use OLS coefficients based on lasso-selected variables for prediction
                            (default is posted e(b) matrix)
    ---------------------------------------------------------------------------------------------------
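
    For example (an illustrative sketch using Stata's auto dataset; the variable choices are
    arbitrary and lassopack must be installed):

        . sysuse auto, clear
        . rlasso price mpg headroom trunk weight length foreign
        . predict double yhat, xb
        . predict double ehat, resid
        . predict double yhatols, ols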

    Replay:

        rlasso [ , displayall ]

    Options               Description
    ---------------------------------------------------------------------------------------------------
    displayall             display full coefficient vectors including unselected variables (default:
                            display only selected, unpenalized and partialled-out)
    ---------------------------------------------------------------------------------------------------

    rlasso may be used with time-series or panel data, in which case the data must be tsset or xtset
    first; see help tsset or xtset.

    aweights and pweights are supported; see help weights.  pweights is equivalent to aweights +
    robust.

    All varlists may contain time-series operators or factor variables; see help varlist.


Contents

    Description
    Estimation methods
    Penalty loadings
    Sup-score test of joint significance
    Computational notes
    Miscellaneous
    Examples of usage
    Saved results
    References
    Website
    Installation
    Acknowledgements
    Citation of rlasso
    Authors


Description

    rlasso is a routine for estimating the coefficients of a lasso or square-root lasso (sqrt-lasso)
    regression where the lasso penalization is data-dependent and where the number of regressors p may
    be large and possibly greater than the number of observations.  The lasso (Least Absolute Shrinkage
    and Selection Operator, Tibshirani 1996) is a regression method that uses regularization and the L1
    norm.  rlasso implements a version of the lasso that allows for heteroskedastic and clustered
    errors; see Belloni et al. (2012, 2013, 2014, 2016).

    The default estimator implemented by rlasso is the lasso.  An alternative that does not involve
    estimating the error variance is the square-root-lasso (sqrt-lasso) of Belloni et al. (2011, 2014),
    available with the sqrt option.

    The lasso and sqrt-lasso estimators achieve sparse solutions:  of the full set of p predictors,
    typically most will have coefficients set to zero and only s<<p will be non-zero.  The "post-lasso"
    estimator is OLS applied to the variables with non-zero lasso or sqrt-lasso coefficients, i.e., OLS
    using the variables selected by the lasso or sqrt-lasso.  The lasso/sqrt-lasso and post-lasso
    coefficients are stored in e(beta) and e(betaOLS), respectively.  By default, rlasso posts the
    lasso or sqrt-lasso coefficients in e(b).  To post in e(b) the OLS coefficients based on lasso- or
    sqrt-lasso-selected variables, use the ols option.
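
    For example, using Stata's auto dataset (variable choices illustrative only):

        . sysuse auto, clear
        . rlasso price mpg headroom trunk weight length foreign
        . mat list e(beta)
        . mat list e(betaOLS)

    With the ols option, the posted vector e(b) holds the post-lasso OLS coefficients instead:

        . rlasso price mpg headroom trunk weight length foreign, ols
        . mat list e(b)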

Estimation methods

    rlasso solves the following problem

        min 1/N RSS + lambda/N*||Psi*beta||_1, 
        
    where

    RSS        = sum(y(i)-x(i)'beta)^2, the residual sum of squares;
    beta       is a p-dimensional parameter vector;
    lambda     is the overall penalty level;
    ||.||_1    denotes the L1-norm, i.e., sum_j(abs(a[j]));
    Psi        is a p by p diagonal matrix of predictor-specific penalty loadings (note that rlasso
                treats Psi as a row vector);
    N          is the number of observations.

    If the option sqrt is specified, rlasso estimates the sqrt-lasso estimator, which is defined as the
    solution to:

        min sqrt(1/N*RSS) + lambda/N*||Psi*beta||_1. 

    Note: the above lambda differs from the definition used in parts of the lasso and elastic net
    literature; see for example the R package glmnet by Friedman et al. (2010).  The objective
    functions here follow the format of Belloni et al. (2011, 2012).  Specifically,
    lambda(r)=2*N*lambda(GN) where lambda(r) is the penalty level used by rlasso and lambda(GN) is the
    penalty level used by glmnet.
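
    For example, to express an estimated penalty on the glmnet scale (an illustrative sketch using
    Stata's auto dataset; the conversion applies to the standard lasso):

        . sysuse auto, clear
        . rlasso price mpg headroom trunk weight length foreign
        . di e(lambda)/(2*e(N))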

    rlasso obtains the solutions to the lasso and sqrt-lasso using coordinate descent algorithms.  The
    algorithm was first proposed by Fu (1998) for the lasso (then referred to as "shooting").  For
    further details of how the lasso and sqrt-lasso solutions are obtained, see lasso2.

    rlasso first estimates the lasso penalty level and then uses the coordinate descent algorithm to
    obtain the lasso coefficients.  For the homoskedastic case, a single penalty level lambda is
    applied; in the heteroskedastic and cluster cases, the penalty loadings vary across regressors.
    The methods are discussed in detail in Belloni et al. (2012, 2013, 2014, 2016) and are described
    only briefly here.  For a detailed discussion of an R implementation of rlasso, see Spindler et al.
    (2016).

    For compatibility with the wider lasso literature, the documentation here uses "lambda" to refer to
    the penalty level that, combined with the possibly regressor-specific penalty loadings, is used
    with the estimation algorithm to obtain the lasso coefficients.  "lambda0" refers to the component
    of the overall lasso penalty level that does not depend on the error variance.  Note that this
    terminology differs from that in the R implementation of rlasso by Spindler et al. (2016).

    The default lambda0 for the lasso is 2c*sqrt(N)*invnormal(1-gamma/(2*p)), where p is the number of
    penalized regressors and c and gamma are constants with default values of 1.1 and 0.1/log(N),
    respectively.  In the cluster-lasso (Belloni et al. 2016) the default gamma is 0.1/log(N_clust),
    where N_clust is the number of clusters (saved in e(N_clust)).  The default lambda0s for the
    sqrt-lasso are the same except replace 2c with c.  The constant c>1.0 is a slack parameter; gamma
    controls the confidence level.  The alternative formula lambda0 = 2c*sqrt(N)*sqrt(2*log(2*p/gamma))
    is available with the lalternative option.  The constants c and gamma can be set using the c(real)
    and gamma(real) options.  The xdependent option is another alternative that implements an
    "X-dependent"
    penalty level lambda0; see Belloni and Chernozhukov (2011) and Belloni et al. (2013) for
    discussion.

    The default lambda for the lasso in the i.i.d. case is lambda0*rmse, where rmse is an estimate of
    the standard deviation of the error.  The sqrt-lasso differs from the standard lasso in
    that the penalty term lambda is pivotal in the homoskedastic case and does not depend on the error
    variance.  The default for the sqrt-lasso in the i.i.d. case is
    lambda=lambda0=c*sqrt(N)*invnormal(1-gamma/(2*p)) (note the absence of the factor of "2" vs. the
    lasso lambda).
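
    The defaults can be verified by hand from the saved results (an illustrative check using Stata's
    auto dataset and the standard homoskedastic lasso):

        . sysuse auto, clear
        . rlasso price mpg headroom trunk weight length foreign
        . di 2*e(c)*sqrt(e(N))*invnormal(1-e(gamma)/(2*e(p)))
        . di e(lambda0)
        . di e(lambda0)*e(rmse)
        . di e(lambda)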

Penalty loadings

    As is standard in the lasso literature, regressors are standardized to have unit variance.  By
    default, standardization is achieved by incorporating the standard deviations of the regressors
    into the penalty loadings.  In the default homoskedastic case, the penalty loadings are the vector
    of standard deviations of the regressors.  The standardized penalty loadings are the penalty
    loadings normalized by the SDs of the regressors; in the homoskedastic case the standardized
    penalty loadings are a vector of 1s.  rlasso saves the vector of penalty loadings, the vector of
    standardized penalty loadings, and the vector of SDs of the regressors X in e() matrices.

    Penalty loadings are constructed after the partialling-out of unpenalized regressors and/or the FE
    (fixed-effects) transformation, if applicable.  An alternative to partialling-out unpenalized
    regressors with the partial(varlist) option is to give them penalty loadings of zero with the
    pnotpen(varlist) option.  By the Frisch-Waugh-Lovell Theorem for the lasso (Yamada 2017), the
    estimated lasso coefficients are the same in theory (but see below) whether the unpenalized
    regressors are partialled-out or given zero penalty loadings, so long as the same penalty loadings
    are used for the penalized regressors in both cases.  Note that the calculation of the penalty
    loadings in both the partial(.) and pnotpen(.) cases involves adjustments for the partialled-out
    variables.  This is different from the lasso2 handling of unpenalized variables specified in the
    lasso2 option notpen(.), where no such adjustment of the penalty loadings is made (and is why the
    two no-penalization options are named differently).

    Regressor-specific penalty loadings for the heteroskedastic and clustered cases are derived
    following the methods described in Belloni et al. (2012, 2013, 2014, 2015, 2016).  The penalty
    loadings for the heteroskedastic-robust case have elements of the form
    sqrt[avg(x^2e^2)]/sqrt[avg(e^2)] where x is a (demeaned) regressor, e is the residual, and
    sqrt[avg(e^2)] is the root mean squared error; the standardized penalty loadings have elements
    sqrt[avg(x^2e^2)]/(sqrt[avg(x^2)]sqrt[avg(e^2)]) where the sqrt[avg(x^2)] in the denominator is
    SD(x), the standard deviation of x.  This corresponds to the presentation of penalty loadings in
    Belloni et al. (2014; see Algorithm 1 but note that in their presentation, the predictors x are
    assumed already to be standardized).  NB: in the presentation we use here, the penalty loadings for
    the lasso and sqrt-lasso are the same; what differs is the overall penalty term lambda.
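
    The heteroskedastic-robust loading for a single regressor can be approximated by hand (an
    illustrative sketch using Stata's auto dataset; the default loadings are based on post-lasso
    residuals, approximated here via predict with the ols option, so agreement is close but need not
    be exact):

        . sysuse auto, clear
        . rlasso price mpg headroom trunk, robust
        . predict double ehat, resid ols
        . qui sum mpg if e(sample)
        . gen double xd = mpg - r(mean)
        . gen double xe2 = (xd*ehat)^2
        . gen double e2 = ehat^2
        . qui sum xe2 if e(sample)
        . scalar psi_mpg = sqrt(r(mean))
        . qui sum e2 if e(sample)
        . di psi_mpg/sqrt(r(mean))
        . mat list e(ePsi)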

    The cluster-robust case is similar to the heteroskedastic case except that numerator
    sqrt[avg(x^2e^2)] in the heteroskedastic case is replaced by sqrt[avg(u_i^2)], where (using the
    notation of the Stata manual's discussion of the _robust command) u_i is the sum of x_ij*e_ij over
    the j members of cluster i; see Belloni et al. (2016).  Again in the presentation used here, the
    cluster-lasso and cluster-sqrt-lasso penalty loadings are the same.  The unit vector is again the
    benchmark for the standardized penalty loadings.  NB: also following _robust, the denominator used
    in avg(u_i^2) is (N_clust-1).

    The center option centers the x_ij*e_ij terms (or, in the cluster-lasso case, the u_i terms) prior to
    calculating the penalty loadings.

Sup-score test of joint significance

    rlasso with the supscore option reports a test of the null hypothesis H0: beta_1 = ... = beta_p =
    0, i.e., a test of the joint significance of the regressors (or, equivalently, a test of H0: s=0:
    of the full set of p regressors, none is in the true model).  The test follows Chernozhukov et
    al. (2013, Appendix M); see also Belloni et al. (2012, 2013).  (The variables are assumed to be
    rescaled to be centered and with unit variance.)

    If the null hypothesis is correct and the rest of the model is well-specified (including the
    assumption that the regressors are orthogonal to the disturbance e), then E(e*x_j) =
    E((y-beta_0)*x_j) = 0, j=1...p where beta_0 is the intercept.  The sup-score statistic is
    S=sqrt(N)*max_j(abs(avg((y-b_0)*x_j))/(sqrt(avg(((y-b_0)*x_j)^2)))), where:  (a) the numerator
    abs(avg((y-b_0)*x_j)) is the absolute value of the average score for regressor x_j, where b_0 is
    the sample mean of y; (b) the denominator sqrt(avg(((y-b_0)*x_j)^2)) is the sample standard deviation
    of the score; (c) the statistic is sqrt(N) times the maximum across the p regressors of the ratio
    of (a) to (b).
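
    The score ratio for a single regressor can be computed directly (an illustrative sketch using
    Stata's auto dataset; repeating the calculation for each regressor and taking the maximum of the
    displayed values reproduces e(supscore), up to internal standardization conventions):

        . sysuse auto, clear
        . rlasso price mpg headroom trunk, supscore
        . di e(supscore)
        . qui sum price
        . gen double yc = price - r(mean)
        . qui sum mpg
        . gen double sc = yc*(mpg-r(mean))/r(sd)
        . gen double sc2 = sc^2
        . qui sum sc
        . scalar num = abs(r(mean))
        . qui sum sc2
        . di sqrt(r(N))*num/sqrt(r(mean))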

    The p-value for the sup-score test is obtained by a multiplier bootstrap procedure simulating the
    statistic W, defined as W=sqrt(N)*max_j(abs(avg((y-b_0)*x_j*u))/(sqrt(avg(((y-b_0)*x_j)^2)))) where
    u is an iid standard normal variate independent of the data.  The ssnumsim(int) option controls the
    number of simulated draws (default=500); ssnumsim(0) requests that the sup-score statistic be
    reported without a simulation-based p-value.  rlasso also reports a conservative critical value
    (asymptotic bound) as per Belloni et al. (2012, 2013), defined as c*invnormal(1-ssgamma/(2*p));
    the test level ssgamma can be set by the option ssgamma(real) (default = 0.05).  NB: in versions
    of rlasso prior to 1.0.09,
    the test statistic S was N*max_j rather than sqrt(N)*max_j as in Chernozhukov et al. (2013), and
    similarly for the simulated statistic W.

Computational notes

    A computational alternative to the default of standardizing "on the fly" (i.e., incorporating the
    standardization into the lasso penalty loadings) is to standardize all variables to have unit
    variance prior to computing the lasso coefficients.  This can be done using the prestd option.  The
    results are equivalent in theory.  The prestd option can lead to improved numerical precision or
    more stable results in the case of difficult problems; the cost is the (typically small) additional
    computation time required to standardize the data.

    Either the partial(varlist) option or the pnotpen(varlist) option can be used for variables that
    should not be penalized by the lasso.  The options are equivalent in theory (see above), but
    numerical results can differ in practice because of the different calculation methods used.
    Partialling-out variables can lead to improved numerical precision or more stable results in the
    case of difficult problems vs. specifying the variables as unpenalized, but may be slower in terms
    of computation time.

    By default the constant (if present) is not penalized if there are no regressors being partialled
    out; this is equivalent to mean-centering prior to estimation.  The exception to this is if
    aweights or pweights are specified, in which case the constant is partialled-out.  The
    partial(varlist) option will automatically also partial out the constant (if present); to partial
    out just the constant, specify partial(_cons).  The within transformation implemented by the fe
    option automatically mean-centers the data; the noconstant option is redundant in this case and
    may not be specified together with fe.

    The prestd and pnotpen(varlist) vs. partial(varlist) options can be used as simple checks for
    numerical stability by comparing results that should be equivalent in theory.  If the results
    differ, the values of the minimized objective functions (e(pmse) or e(prmse)) can be compared.
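
    For example (an illustrative sketch using Stata's auto dataset):

        . sysuse auto, clear
        . rlasso price mpg headroom trunk weight length foreign
        . di e(pmse)
        . rlasso price mpg headroom trunk weight length foreign, prestd
        . di e(pmse)
        . rlasso price mpg headroom trunk weight length foreign, partial(foreign)
        . di e(pmse)
        . rlasso price mpg headroom trunk weight length foreign, pnotpen(foreign)
        . di e(pmse)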

    The fe fixed-effects option is equivalent to (but computationally faster and more accurate than)
    specifying unpenalized panel-specific dummies.  The fixed-effects ("within") transformation also
    removes the constant as well as the fixed effects.  The panel variable used by the fe option is the
    panel variable set by xtset.  To use weights with fixed effects, the ftools package must be
    installed.
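
    The equivalence can be checked on a small panel (an illustrative sketch using the Grunfeld data;
    the two estimations below should agree up to numerical precision):

        . webuse grunfeld, clear
        . xtset company year
        . rlasso invest mvalue kstock, fe
        . rlasso invest mvalue kstock i.company, partial(i.company)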

Miscellaneous

    By default rlasso reports only the set of selected variables and their lasso and post-lasso
    coefficients; the omitted coefficients are not reported in the regression output.  The postall and
    displayall options allow the full coefficient vector (with coefficients of unselected variables set
    to zero) to be either posted in e(b) or displayed as output.

    rlasso, like the lasso in general, accommodates possibly perfectly-collinear sets of regressors.
    Stata's factor variables are supported by rlasso (as well as by lasso2).  Users therefore have the
    option of specifying as regressors one or more complete sets of factor variables or interactions
    with no base levels using the ibn prefix.  This can be interpreted as allowing rlasso to choose the
    members of the base category.
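
    For example, rep78 enters below as a full set of five category dummies with no base level,
    leaving the lasso to choose which categories matter (an illustrative sketch; rep78 has missing
    values, which shrinks the estimation sample):

        . sysuse auto, clear
        . rlasso price ibn.rep78 mpg weight turn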

    The choice of whether to use partial(varlist) or pnotpen(varlist) will depend on the circumstances
    faced by the user.  The partial(varlist) option can be helpful in dealing with data that have
    scaling problems or collinearity issues; in these cases it can be more accurate and/or achieve
    convergence faster than the pnotpen(varlist) option.  The pnotpen(varlist) option will sometimes be
    faster because it avoids using the pre-estimation transformation employed by partial(varlist).  The
    two options can be used simultaneously (but not for the same variables).

    The treatment of standardization, penalization and partialling-out in rlasso differs from that of
    lasso2.  In the rlasso treatment, standardization incorporates the partialling-out of regressors
    listed in the pnotpen(varlist) list as well as those in the partial(varlist) list.  This is in
    order to maintain the equivalence of the lasso estimator irrespective of which option is used for
    unpenalized variables (see the discussion of the Frisch-Waugh-Lovell Theorem for the lasso above).
    In the lasso2 treatment, standardization takes place after the partialling-out of only the
    regressors listed in the notpen(varlist) option.  In other words, rlasso adjusts the penalty
    loadings for any unpenalized variables; lasso2 does not.  For further details, see lasso2.

    The initial overhead for fixed-effects estimation and/or partialling out and/or pre-estimation
    standardization (creating temporary variables and then transforming the data) can be noticeable for
    large datasets.  For problems that involve looping over data, users may wish to first transform the
    data by hand.

    If a small value is set for the corrnumber(int) option (the number of high-correlation regressors
    used to obtain the initial residuals), users may want to increase the number of penalty loadings
    iterations from the default of 2 to something higher using the maxpsiiter(int) option.

    The sup-score p-value is obtained by simulation, which can be time-consuming for large datasets.
    To skip this and use only the conservative (asymptotic bound) critical value, set the number of
    simulations to zero with the ssnumsim(0) option.

Examples using prostate cancer data from Hastie et al. (2009)

    Load prostate cancer data.
        . clear
        . insheet using https://web.stanford.edu/~hastie/ElemStatLearn/datasets/prostate.data, tab

    Estimate lasso using data-driven lambda penalty; default homoskedasticity case.
        . rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45

    Use square-root lasso instead.
        . rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, sqrt

    Illustrate relationships between lambda, lambda0 and penalty loadings:

    Basic usage: homoskedastic case, lasso
        . rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45
    lambda=lambda0*rmse is the lasso penalty; it incorporates the estimated standard deviation of the error
    default lambda0 is 2c*sqrt(N)*invnormal(1-gamma/(2*p))
        . di e(lambda)
        . di e(lambda0)
    In the homoskedastic case, penalty loadings are the vector of SDs of penalized regressors
        . mat list e(ePsi)
    ...and the standardized penalty loadings are a vector of 1s.
        . mat list e(sPsi)

    Heteroskedastic case, lasso
        . rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, robust
    lambda and lambda0 are the same as for the homoskedastic case
        . di e(lambda)
        . di e(lambda0)
    Penalty loadings account for heteroskedasticity as well as incorporating SD(x)
        . mat list e(ePsi)
    ...and the standardized penalty loadings are not a vector of 1s.
        . mat list e(sPsi)

    Homoskedastic case, sqrt-lasso
        . rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, sqrt
    with the sqrt-lasso, the default lambda=lambda0=c*sqrt(N)*invnormal(1-gamma/(2*p));
    note the difference by a factor of 2 vs. the standard lasso lambda0
        . di e(lambda)
        . di e(lambda0)

    rlasso vs. lasso2 (if installed)
        . rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45
    lambda=lambda0*rmse is the lasso penalty; it incorporates the estimated standard deviation of the error
    default lambda0 is 2c*sqrt(N)*invnormal(1-gamma/(2*p))
        . di %8.5f e(lambda)
    Replicate rlasso estimates using rlasso lambda and lasso2
        . lasso2 lpsa lcavol lweight age lbph svi lcp gleason pgg45, lambda(44.34953)

Examples using data from Acemoglu-Johnson-Robinson (2001)

    Load and reorder AJR data for Table 6 and Table 8 (datasets need to be in current directory).
        . clear
        . (click to download maketable6.zip from economics.mit.edu)
        . unzipfile maketable6
        . (click to download maketable8.zip from economics.mit.edu)
        . unzipfile maketable8
        . use maketable6
        . merge 1:1 shortnam using maketable8
        . keep if baseco==1
        . order shortnam logpgp95 avexpr lat_abst logem4 edes1975 avelf, first
        . order indtime euro1900 democ1 cons1 democ00a cons00a, last

    Alternatively, load AJR data from our website (no manual download required):
        . clear
        . use https://statalasso.github.io/dta/AJR.dta

    Basic usage:
        . rlasso logpgp95 lat_abst edes1975 avelf temp* humid* steplow-oilres

    Heteroskedastic-robust penalty loadings:
        . rlasso logpgp95 lat_abst edes1975 avelf temp* humid* steplow-oilres, robust

    Partialling-out vs. non-penalization:
        . rlasso logpgp95 lat_abst edes1975 avelf temp* humid* steplow-oilres, partial(lat_abst)
        . rlasso logpgp95 lat_abst edes1975 avelf temp* humid* steplow-oilres, pnotpen(lat_abst)

    Request sup-score test (H0: all betas=0):
        . rlasso logpgp95 lat_abst edes1975 avelf temp* humid* steplow-oilres, supscore

Examples using data from Angrist-Krueger (1991)

    Load AK data and rename variables (dataset needs to be in current directory).  NB: this is a large
    dataset (330k observations) and estimations may take some time to run on some installations.
        . clear
        . (click to download asciiqob.zip from economics.mit.edu)
        . unzipfile asciiqob.zip
        . infix lnwage 1-9 edu 10-20 yob 21-31 qob 32-42 pob 43-53 using asciiqob.txt

    Alternatively, get data from our website source (no unzipping needed):
        . use https://statalasso.github.io/dta/AK91.dta

    xtset data by place of birth (state):
        . xtset pob

    State (place of birth) fixed effects; regressors are year of birth, quarter of birth and QOBxYOB.
        . rlasso edu i.yob##i.qob, fe

    As above, but with explicit penalized state dummies and all categories (no base category) for all
    factor variables.
    Note that the (unpenalized) constant is reported.
        . rlasso edu ibn.yob##ibn.qob ibn.pob

    State fixed effects; regressors are YOB, QOB and QOBxYOB; cluster on state.
        . rlasso edu i.yob##i.qob, fe cluster(pob)

Example using data from Belloni et al. (2015)

    Load dataset on eminent domain (available at journal website).
        . clear
        . import excel using CSExampleData.xlsx, first

    Settings used in Belloni et al. (2015) - results as in text discussion (p=147):
        . rlasso NumProCase Z* BA BL DF, robust lalt corrnum(0) maxpsiiter(100)
        . di e(p)

    Settings used in Belloni et al. (2015) - results as in journal replication file (p=144):
        . rlasso NumProCase Z*, robust lalt corrnum(0) maxpsiiter(100)
        . di e(p)


Saved results

    rlasso saves the following in e():

    scalars       
      e(N)               sample size
      e(N_clust)         number of clusters in cluster-robust estimation
      e(N_g)             number of groups in fixed-effects model
      e(p)               number of penalized regressors in model
      e(s)               number of selected regressors
      e(s0)              number of selected and unpenalized regressors including constant (if present)
      e(lambda0)         penalty level excluding rmse (default = 2c*sqrt(N)*invnormal(1-gamma/(2*p)))
      e(lambda)          lasso: penalty level including rmse (=lambda0*rmse); sqrt-lasso:
                           lambda=lambda0
      e(slambda)         standardized lambda; equiv to lambda used on standardized data; lasso:
                           slambda=lambda/SD(depvar); sqrt-lasso: slambda=lambda0
      e(c)               parameter in penalty level lambda
      e(gamma)           parameter in penalty level lambda
      e(niter)           number of iterations for shooting algorithm
      e(maxiter)         max number of iterations for shooting algorithm
      e(npsiiter)        number of iterations for loadings algorithm
      e(maxpsiiter)      max iterations for loadings algorithm
      e(rmse)            rmse using lasso residuals
      e(rmseOLS)         rmse using post-lasso residuals
      e(pmse)            minimized objective function (penalized mse, standard lasso only)
      e(prmse)           minimized objective function (penalized rmse, sqrt-lasso only)
      e(cons)            =1 if constant in model, =0 otherwise
      e(fe)              =1 if fixed-effects model, =0 otherwise
      e(center)          =1 if moments have been centered
      e(supscore)        sup-score statistic
      e(supscore_p)      sup-score p-value
      e(supscore_cv)     sup-score critical value (asymptotic bound)

    macros        
      e(cmd)             rlasso
      e(depvar)          name of dependent variable
      e(varX)            all regressors
      e(varXmodel)       penalized regressors
      e(pnotpen)         unpenalized regressors
      e(partial)         partialled-out regressors
      e(selected)        selected and penalized regressors
      e(selected0)       all selected regressors including unpenalized and constant (if present)
      e(method)          lasso or sqrt-lasso
      e(estimator)       lasso, sqrt-lasso or post-lasso ols posted in e(b)
      e(robust)          heteroskedastic-robust penalty loadings
      e(clustvar)        variable defining clusters for cluster-robust penalty loadings
      e(ivar)            variable defining groups for fixed-effects model

    matrices      
      e(b)               posted coefficient vector
      e(beta)            lasso or sqrt-lasso coefficient vector
      e(betaOLS)         post-lasso coefficient vector
      e(betaAll)         full lasso or sqrt-lasso coefficient vector including omitted, factor base
                           variables, etc.
      e(betaAllOLS)      full post-lasso coefficient vector including omitted, factor base variables,
                           etc.
      e(ePsi)            estimated penalty loadings
      e(sPsi)            standardized penalty loadings (vector of 1s in the homoskedastic case)

    functions     
      e(sample)          estimation sample


References

    Acemoglu, D., Johnson, S. and Robinson, J.A. 2001.  The colonial origins of comparative
        development: An empirical investigation.  American Economic Review, 91(5):1369-1401.
        https://economics.mit.edu/files/4123

    Angrist, J. and Krueger, A. 1991.  Does compulsory school attendance affect schooling and earnings?
        Quarterly Journal of Economics 106(4):979-1014.  http://www.jstor.org/stable/2937954

    Belloni, A. and Chernozhukov, V. 2011.  High-dimensional sparse econometric models: An
        introduction.  In Alquier, P., Gautier E., and Stoltz, G. (eds.), Inverse problems and
        high-dimensional estimation.  Lecture notes in statistics, vol. 203.  Springer, Berlin,
        Heidelberg.  https://arxiv.org/pdf/1106.5242.pdf

    Belloni, A., Chernozhukov, V. and Wang, L. 2011.  Square-root lasso: Pivotal recovery of sparse
        signals via conic programming.  Biometrika 98:791-806.  https://doi.org/10.1093/biomet/asr043

    Belloni, A., Chen, D., Chernozhukov, V. and Hansen, C. 2012.  Sparse models and methods for optimal
        instruments with an application to eminent domain.  Econometrica 80(6):2369-2429.  
        http://onlinelibrary.wiley.com/doi/10.3982/ECTA9626/abstract

    Belloni, A., Chernozhukov, V. and Hansen, C. 2013.  Inference for high-dimensional sparse
        econometric models.  In Advances in Economics and Econometrics: 10th World Congress, Vol. 3:
        Econometrics, Cambridge University Press: Cambridge, 245-295.  http://arxiv.org/abs/1201.0220

    Belloni, A., Chernozhukov, V. and Hansen, C. 2014.  Inference on treatment effects after selection
        among high-dimensional controls.  Review of Economic Studies 81:608-650.  
        https://doi.org/10.1093/restud/rdt044

    Belloni, A., Chernozhukov, V. and Hansen, C. 2015.  High-dimensional methods and inference on
        structural and treatment effects.  Journal of Economic Perspectives 28(2):29-50.  
        http://www.aeaweb.org/articles.php?doi=10.1257/jep.28.2.29

    Belloni, A., Chernozhukov, V., Hansen, C. and Kozbur, D. 2016.  Inference in high dimensional panel
        models with an application to gun control.  Journal of Business and Economic Statistics
        34(4):590-605.  http://amstat.tandfonline.com/doi/full/10.1080/07350015.2015.1102733

    Belloni, A., Chernozhukov, V. and Wang, L. 2014.  Pivotal estimation via square-root-lasso in
        nonparametric regression.  Annals of Statistics 42(2):757-788.  
        https://doi.org/10.1214/14-AOS1204

    Chernozhukov, V., Chetverikov, D. and Kato, K. 2013.  Gaussian approximations and multiplier
        bootstrap for maxima of sums of high-dimensional random vectors.  Annals of Statistics
        41(6):2786-2819.  https://projecteuclid.org/euclid.aos/1387313390

    Correia, S. 2016.  FTOOLS: Stata module to provide alternatives to common Stata commands optimized
        for large datasets.  https://ideas.repec.org/c/boc/bocode/s458213.html

    Friedman, J., Hastie, T. and Tibshirani, R. 2010.  Regularization paths for generalized linear
        models via coordinate descent.  Journal of Statistical Software 33(1):1-22.  
        https://doi.org/10.18637/jss.v033.i01

    Fu, W.J.  1998.  Penalized regressions: The bridge versus the lasso.  Journal of Computational and
        Graphical Statistics 7(3):397-416.  
        http://www.tandfonline.com/doi/abs/10.1080/10618600.1998.10474784

    Hastie, T., Tibshirani, R. and Friedman, J. 2009.  The elements of statistical learning (2nd ed.).
        New York: Springer-Verlag.  https://web.stanford.edu/~hastie/ElemStatLearn/

    Spindler, M., Chernozhukov, V. and Hansen, C. 2016.  High-dimensional metrics.
        https://cran.r-project.org/package=hdm.

    Tibshirani, R. 1996.  Regression shrinkage and selection via the lasso.  Journal of the Royal
        Statistical Society. Series B (Methodological) 58(1):267-288.  https://doi.org/10.2307/2346178

    Yamada, H. 2017.  The Frisch-Waugh-Lovell Theorem for the lasso and the ridge regression.
        Communications in Statistics - Theory and Methods 46(21):10897-10902.  
        http://dx.doi.org/10.1080/03610926.2016.1252403

Website

    Please check our website https://statalasso.github.io/ for more information.

Installation

    To get the latest stable version of lassopack from our website, check the installation instructions
    at https://statalasso.github.io/installation/.  We update the stable website version more
    frequently than the SSC version.

    To verify that lassopack is correctly installed, click on or type whichpkg lassopack (which
    requires whichpkg to be installed; ssc install whichpkg).

Acknowledgements

    Thanks to Alexandre Belloni for providing Matlab code for the square-root-lasso and to Sergio
    Correia for advice on the use of the FTOOLS package.


Citation of rlasso

    rlasso is not an official Stata command. It is a free contribution to the research community, like
    a paper. Please cite it as such:

    Ahrens, A., Hansen, C.B., Schaffer, M.E. 2018.  rlasso: Program for lasso and sqrt-lasso estimation
        with data-driven penalization.  http://ideas.repec.org/c/boc/bocode/s458458.html


Authors

        Achim Ahrens, Economic and Social Research Institute, Ireland
        achim.ahrens@esri.ie
        
        Christian B. Hansen, University of Chicago, USA
        Christian.Hansen@chicagobooth.edu

        Mark E Schaffer, Heriot-Watt University, UK
        m.e.schaffer@hw.ac.uk


Also see

       Help:  lasso2, cvlasso, pdslasso, ivlasso (if installed)