Help file
help pdslasso, help ivlasso                                                                                          pdslasso v1.3


    pdslasso and ivlasso --
                 Programs for post-selection and post-regularization OLS or IV estimation and inference


        pdslasso depvar regressors (hd_controls) [weight] [if exp] [in range] [ , partial(varlist) pnotpen(varlist)
              psolver(string) aset(varlist) post(method) robust cluster(varlist) bw(int) kernel(string) fe noftools
              rlasso[(name)] sqrt noisily loptions(options) olsoptions(options) noconstant ]

        ivlasso depvar regressors [(hd_controls)] (endog=instruments) [if exp] [in range] [ , partial(varlist) pnotpen(varlist)
              psolver(string) aset(varlist) post(method) robust cluster(varlist) bw(int) kernel(string) fe noftools
              rlasso[(name)] sqrt noisily loptions(options) ivoptions(options) first idstats sscset ssgamma(real)
              ssgridmin(real) ssgridmax(real) ssgridpoints(integer 100) ssgridmat(name) noconstant ]

        Note: pdslasso requires rlasso and ivreg2 to be installed; ivlasso also requires ranktest.  See help rlasso, help ivreg2
              and help ranktest, or type ssc install lassopack or ssc install ranktest to install.

        Note: the fe option will take advantage of the ftools package (if installed) for the fixed-effects transform; the speed
              gains using this package can be large.  See help ftools, or type ssc install ftools to install.

        Note: ivlasso also supports the simpler pdslasso syntax.

    Options               Description
    partial(varlist)       controls and instruments to be partialled-out prior to lasso estimation
    pnotpen(varlist)       controls and instruments always included, not penalized by lasso
    aset(varlist)          controls and instruments in amelioration set, always included in post-lasso
    post(method)           pds, lasso or plasso; which estimation results are to be posted in e(b) and e(V)
    robust                 heteroskedastic-robust VCE; lasso penalty loadings account for heteroskedasticity
    cluster(varlist)       cluster-robust VCE; lasso penalty loadings account for clustering; both standard (1-way) and 2-way
                            clustering supported
    bw(int)                HAC/AC VCE; lasso penalty loadings account for autocorrelation (AC) using bandwidth int; use with
                            robust to account for both heteroskedasticity and autocorrelation (HAC)
    kernel(string)         kernel used for HAC/AC penalty loadings (one of: bartlett, truncated, parzen, thann, thamm, daniell,
                            tent, qs; default=bartlett)
    fe                     fixed-effects model (requires data to be xtset)
    noftools               do not use FTOOLS package for fixed-effects transform (slower; rarely used)
    rlasso[(name)]         store and display intermediate lasso and post-lasso results from rlasso with optional prefix name (if
                            just rlasso is specified the default prefix is _ivlasso_ or _pdslasso_)
    sqrt                   use sqrt-lasso instead of standard lasso
    noisily                display step-by-step intermediate rlasso estimation results
    loptions(options)      lasso options specific to rlasso estimation; see help rlasso
    olsoptions(options)    (pdslasso only) options specific to PDS OLS estimation of structural equation
    ivoptions(options)     (ivlasso only) options specific to PDS OLS or IV estimation of structural equation
    first                  (ivlasso only) display and store first-stage results for 2SLS
    idstats                (ivlasso only) request weak-identification statistics for 2SLS
    noconstant             suppress constant from regression (cannot be used with aweights or pweights)
    psolver(string)        override default solver used for partialling out (one of: qr, qrxx, lu, luxx, svd, svdxx, chol;
                            default=qrxx)

    Sup-score test        Description
    (ivlasso only)        
    sscset                 request sup-score weak-identification-robust confidence set
    ssgamma(real)          significance level for sup-score weak-identification-robust tests and confidence intervals
                            (default=0.05, 5%)
    ssgridmin(real)        minimum value for grid search for sup-score weak-identification-robust confidence intervals
                            (default=grid centered at OLS estimate)
    ssgridmax(real)        maximum value for grid search for sup-score weak-identification-robust confidence intervals
                            (default=grid centered at OLS estimate)
    ssgridpoints(integer)  number of points in grid search for sup-score weak-identification-robust confidence intervals
                            (default=100)
    ssgridmat(name)        user-supplied Stata r x k matrix of r jointly hypothesized values for the k endogenous regressors to
                            be tested using the sup-score test
    ssomitgrid(name)       suppress display of sup-score test results with user-supplied grid
    ssmethod(name)         "abound" (default) = use conservative critical value (asymptotic bound)
                            c*sqrt(N)*invnormal(1-gamma/(2p)); "simulate" = simulate distribution to obtain p-values for
                            sup-score test; "select" = reject if rlasso selects any instruments


        predict [type] newvar [if] [in] [, resid xb ]

    pdslasso and ivlasso may be used with time-series or panel data, in which case the data must be tsset or xtset first; see
    help tsset or xtset.

    aweights and pweights are supported; see help weights.  pweights is equivalent to aweights + robust.

    All varlists may contain time-series operators or factor variables; see help varlist.


    Computational notes
    Examples of usage
    Saved results
    Citation of pdslasso and ivlasso


    pdslasso and ivlasso are routines for estimating structural parameters in linear models with many controls and/or
    instruments.  The routines use methods for estimating sparse high-dimensional models, specifically the lasso (Least Absolute
    Shrinkage and Selection Operator, Tibshirani 1996) and the square-root-lasso (Belloni et al. 2011, 2014).

    pdslasso is used for the case where a researcher has an outcome variable y, a structural or causal variable of interest d,
    and a large set of potential control variables x1, x2, x3, ....  The usage in this case is:

        pdslasso y d (x1 x2 x3 ...)

    pdslasso accepts multiple causal variables, e.g.:

        pdslasso y d1 d2 (x1 x2 x3 ...)

    Important: The high-dimensional controls must be included within the parentheses (...).  If this is not done, they are
    treated as causal rather than as controls.

    The problem the researcher faces is that the "right" set of controls is not known.  In traditional practice, this presents
    her with a difficult choice:  use too few controls, or the wrong ones, and omitted variable bias will be present; use too
    many, and the model will suffer from overfitting.  The methods implemented in pdslasso address this problem by selecting
    enough controls to address the former problem but not so many as to introduce the latter.

    ivlasso is used for the case where a researcher has an endogenous causal variable of interest e, and a large set of
    potential instruments z1, z2, z3, ....

    The usage in this case is:

        ivlasso y (e = z1 z2 z3 ...)

    ivlasso accepts multiple causal variables, e.g.:

        ivlasso y (e1 e2 = z1 z2 z3 ...)

    ivlasso also allows combinations of exogenous and endogenous causal variables (d, e) and high-dimensional controls and
    instruments (x, z), e.g.:

        ivlasso y d (x1 x2 x3 ...) (e = z1 z2 z3 ...)

    Two approaches are implemented in pdslasso and ivlasso:

          1. The "post-double-selection" (PDS) methodology of Belloni et al. (2012, 2013, 2014, 2015, 2016), denoted "PDS
               methodology" below.

          2. The "post-regularization" (or "double-orthogonalization") methodology of Chernozhukov, Hansen and Spindler (2015),
               denoted "CHS methodology" below.

    The implementation of these methods in pdslasso and ivlasso uses the separate Stata program rlasso, which provides lasso and
    sqrt-lasso estimation with data-driven penalization; see rlasso for details.  For an overview of rlasso and the theory
    behind it, see Ahrens et al. (2020).

    The PDS methodology uses the lasso estimator to select the controls.  Specifically, the lasso is used twice:  (1) estimate a
    lasso regression with y as the dependent variable and the control variables x1, x2, x3, ... as regressors; (2) estimate a
    lasso regression with d as the dependent variable and again the control variables x1, x2, x3, ... as regressors.  The lasso
    estimator achieves a sparse solution, i.e., most coefficients are set to zero.  The final choice of control variables to
    include in the OLS regression of y on d is the union of the controls selected in steps (1) and (2), hence the name
    "post-double selection" for the methodology.  The PDS methodology can be employed to select instruments as well as controls
    in instrumental variables estimation.
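
    As a rough illustration only, the two selection steps and the final OLS regression can be mimicked by hand with rlasso.
    The variable names y, d and x1-x30 are hypothetical, and the sketch assumes that rlasso leaves the names of the selected
    regressors in e(selected).  It is not a substitute for pdslasso, which also handles the penalty loadings, the
    partialling-out and the VCE.

        . rlasso y x1-x30
        . local sel_y `e(selected)'
        . rlasso d x1-x30
        . local sel_d `e(selected)'
        . local pds_controls : list sel_y | sel_d
        . regress y d `pds_controls'

    The macro list operator | forms the union of the two selected sets, so a control chosen in either step enters the final
    regression exactly once.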

    The CHS methodology is closely related.  Instead of using the lasso-selected controls and instruments in a
    post-regularization OLS or IV estimation, the selected variables are used to construct orthogonalized versions of the
    dependent variable, the exogenous and/or endogenous causal variables of interest and to construct optimal instruments from
    the lasso-selected IVs.  The orthogonalized versions are based either on the lasso or post-lasso estimated coefficients; the
    post-lasso is OLS applied to lasso-selected variables.  See Chernozhukov et al. (2015) for details.

    The set of variables selected by the lasso and used in the OLS post-lasso estimation and in the PDS structural estimation
    can be augmented by variables that were penalized but not selected by the lasso.  The penalized variables that are used in
    this way to augment the post-lasso and PDS estimations are called the "amelioration set" and can be specified with the
    aset(varlist) option.  This option affects only the CHS post-lasso-based and PDS estimations; the CHS lasso-based
    orthogonalized variables are unaffected.  See Chernozhukov et al. (2014) for details.

    pdslasso and ivlasso report the PDS-based and the two (lasso and post-lasso) CHS-based estimations.  If the sqrt option is
    specified, instead of the lasso the sqrt-lasso estimator is used; see rlasso for further details and references.

    If the IV model is weakly identified (the instruments are only weakly correlated with the endogenous regressors), Belloni et
    al. (2012, 2013) suggest using weak-identification-robust hypothesis tests and confidence sets based on the Chernozhukov et al.
    (2013) sup-score test.  The intuition behind the sup-score test is similar to that of the Anderson-Rubin (1949) test.
    Consider the simplest case (a single endogenous regressor d and no exogenous regressors or controls) where the null
    hypothesis is that the coefficient on d is H0:beta=b0.  If the null is true, then the structural residual is simply
    e=y-b0*d.  Under the additional assumption that the instruments are valid (orthogonal to the true disturbance), they should
    be uncorrelated with e.

    The sup-score tests reported by ivlasso are in effect high-dimensional versions of the Anderson-Rubin test.  The test is
    implemented in rlasso; see help rlasso for details.  Specifically, ivlasso reports sup-score tests of statistical
    significance of the instruments where the dependent variable is e=y-b0*d, the instruments are regressors, and b0 is a
    hypothesized value of the coefficient on d; a large test statistic indicates rejection of the null H0:beta=b0.  The default
    is to use a conservative (asymptotic bound) critical value as suggested by Belloni et al. (2012, 2013) (option
    ssmethod(abound)).  Alternative methods are to use p-values obtained by simulation via a multiplier bootstrap (option
    ssmethod(simulate)), or to estimate a lasso regression with the instruments as regressors, and if (no) instruments are
    selected we (fail to) reject the null H0:beta=b0 at the gamma significance level (option ssmethod(select)).
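
    To give a sense of the magnitude of the default (asymptotic bound) critical value, the formula given for ssmethod(abound),
    c*sqrt(N)*invnormal(1-gamma/(2p)), can be evaluated directly.  The numbers below are illustrative assumptions rather than
    values documented here: c=1.1, N=64 observations, p=20 instruments and gamma=0.05.  The scalar names are arbitrary.

        . scalar cval  = 1.1
        . scalar nobs  = 64
        . scalar pinst = 20
        . scalar gam   = 0.05
        . display cval*sqrt(nobs)*invnormal(1-gam/(2*pinst))

    Because gamma is divided by 2p, the critical value grows only slowly as instruments are added, which is why the bound is
    conservative but still usable with many instruments.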

    A 100*(1-gamma)% sup-score-based confidence set can be constructed by a grid search over the range of hypothesized values of
    beta.  ivlasso reports the result of the sup-score test of the null H0:beta=0 with the idstats option, and in addition, for
    the single endogenous regressor case only, reports sup-score confidence sets with the sscset option.  For the
    multiple-endogenous regressor case, sets of jointly hypothesized values for the components of beta can be tested using the
    ssgridmat(name) option.  The matrix provided in the option should be an r x k Stata matrix, where each row contains a set of
    values that together specify a null hypothesis for the coefficients of the k endogenous regressors.  This option allows the
    user to specify a grid search in multiple dimensions.
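
    A minimal sketch of a two-dimensional grid, assuming hypothetical variables y, e1, e2 and z1-z50.  Each row of the 4 x 2
    matrix G is one joint null hypothesis for the coefficients on (e1, e2); the matrix name and the grid values are arbitrary.

        . matrix G = (0, 0 \ 0, 0.5 \ 0.5, 0 \ 0.5, 0.5)
        . matrix list G
        . ivlasso y (e1 e2 = z1-z50), ssgridmat(G)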

Computational notes

    The various options available for the underlying calls to rlasso can be controlled via the option loptions(rlasso option
    list).  The rlasso option center, to center moments in heteroskedastic and cluster-robust loadings, will be a
    commonly-employed option.  This can be specified by lopt(center).

    Another rlasso option that may often be used is to "pre-standardize" the data to have unit variance prior to computing the
    lasso coefficients with the prestd option.  This is a computational alternative to the rlasso default of standardizing "on
    the fly" (i.e., incorporating the standardization into the lasso penalty loadings).  This is specified by lopt(prestd).  The
    results are equivalent in theory.  The prestd option can lead to improved numerical precision or more stable results in the
    case of difficult problems; the cost is the (typically small) computation time required to standardize the data.
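
    As a sketch of how these two rlasso options might be passed through, using the AJR variables from the examples section
    below (the data must be loaded first; the choice of a heteroskedastic-robust specification is only for illustration):

        . pdslasso logpgp95 avexpr (lat_abst edes1975 avelf temp* humid* steplow-oilres), rob lopt(center)
        . pdslasso logpgp95 avexpr (lat_abst edes1975 avelf temp* humid* steplow-oilres), rob lopt(prestd)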

    rlasso implements a version of the lasso with data-dependent penalization and, for the heteroskedastic and clustered cases,
    regressor-specific penalty loadings; see rlasso for details.  Note that specification of robust or cluster(.) as options to
    pdslasso or ivlasso automatically implies the use of robust or cluster-robust lasso penalty loadings.  Penalty loadings and
    VCE type can be separately controlled via the olsoptions(.) (for pdslasso) or ivoptions(.) (for ivlasso) vs. loptions(rlasso
    option list); for example, olsoptions(cluster(clustvar)) + loptions(robust) would use heteroskedastic-robust penalty
    loadings for the lasso estimations and a cluster-robust covariance estimator for the PDS and CHS estimations of the
    structural equation.
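
    The two cases can be compared side by side.  The variable names y, d, x1-x30 and clustvar are hypothetical.  The first
    command uses cluster-robust penalty loadings and a cluster-robust VCE; the second uses heteroskedastic-robust penalty
    loadings with a cluster-robust VCE, as described above.

        . pdslasso y d (x1-x30), cluster(clustvar)
        . pdslasso y d (x1-x30), olsoptions(cluster(clustvar)) loptions(robust)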

    Either the partial(varlist) option or the pnotpen(varlist) option can be used for variables that should not be penalized by
    the lasso.  By the Frisch-Waugh-Lovell Theorem for the lasso (Yamada 2017), the estimated lasso coefficients are the same in
    theory whether the unpenalized regressors are partialled-out or given zero penalty loadings, so long as the same penalty
    loadings are used for the penalized regressors in both cases.  Although the options are equivalent in theory, numerical
    results can differ in practice because of the different calculation methods used; see rlasso for further details.  The
    constant, if present, is always unpenalized or partialled-out.  By default the constant (if present) is not penalized if there
    are no regressors being partialled out; this is equivalent to mean-centering prior to estimation.  The exception to this is
    if aweights or pweights are specified, in which case the constant is partialled-out.  The partial(varlist) option always
    partials out the constant (if present) along with the variables specified in varlist; to partial out just the constant,
    specify partial(_cons).  Partialling-out of controls is done by ivlasso; partialling-out of instruments is done in the lasso
    estimation by rlasso.

    Partialling-out is implemented in Mata using one of Mata's solvers.  In cases where the variables to be partialled out are
    collinear or nearly so, different solvers may generate different results.  Users may wish to check the stability of their
    results in such cases.  The psolver(.) option can be used to specify the Mata solver used.  The default behavior for solving
    AX=B for X is to use the QR decomposition applied to (A'A) and (A'B), i.e., qrsolve((A'A),(A'B)), abbreviated qrxx.
    Available options are qr, qrxx, lu, luxx, svd, svdxx, where, e.g., svd indicates using svsolve(A,B) and svdxx indicates
    using svsolve((A'A),(A'B)).  pdslasso/ivlasso will warn if collinear variables are dropped when partialling out.
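
    One possible way to check stability across solvers is to re-estimate with a different psolver(.) setting and compare the
    saved PDS coefficient vectors.  The sketch below uses the AJR variables from the examples section; the matrix names are
    arbitrary, and mreldif() is Stata's matrix relative-difference function.

        . pdslasso logpgp95 avexpr (lat_abst edes1975 avelf temp* humid* steplow-oilres), partial(lat_abst)
        . matrix b_qrxx = e(beta_pds)
        . pdslasso logpgp95 avexpr (lat_abst edes1975 avelf temp* humid* steplow-oilres), partial(lat_abst) psolver(svd)
        . matrix b_svd = e(beta_pds)
        . display mreldif(b_qrxx, b_svd)

    A relative difference close to zero suggests the partialling-out step is not sensitive to the choice of solver.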

    The lasso and sqrt-lasso estimations are obtained via numerical methods (coordinate descent).  Results can be unstable for
    difficult problems (e.g., if the scaling of variables covers a wide range of magnitudes).  Using variables that are all
    measured on a similar scale will help (as usual).  Partialling-out variables is usually preferable to specifying them as
    unpenalized.  See rlasso for discussion of the various options for controlling the numerical methods used.

    The sup-score-based tests reported by ivlasso come in three versions:  (a) using lasso-orthogonalized variables, where the
    variables have first been orthogonalized with respect to the high-dimensional controls using the lasso; (b) using
    post-lasso-orthogonalized variables; (c) using the variables without any orthogonalization.  The orthogonalizations use the
    same lasso settings as in the main estimation.  After orthogonalization, e~ = y~ - b0*d~ is constructed (where a tilde
    indicates an orthogonalized variable), and then the sup-score test is conducted using e~ and the instruments.  Versions (a)
    and (b) are not reported if there are no high-dimensional controls.  Version (c) is available if there are high-dimensional
    controls but only if the ssmethod(select) option is used.  The sup-score-based tests are not available if the specification
    also includes either exogenous causal regressors or unpenalized instruments.

    For large datasets, obtaining the p-value for the sup-score test by simulation (multiplier bootstrap, ssmethod(simulate)
    option) can be time-consuming.  In such cases, using the default method of a conservative (asymptotic bound) critical value
    (ssmethod(abound) option) will be much faster.

    The grid search to construct the sup-score confidence set can be controlled by the ssgridmin, ssgridmax and ssgridpoints
    options.  If these options are not specified by the user, a 100-point grid centered on the OLS estimator is used.

    The fe fixed-effects option is equivalent to (but computationally faster and more accurate than) specifying unpenalized
    panel-specific dummies.  The fixed-effects ("within") transformation removes the constant as well as the fixed effects.
    The panel variable used by the fe option is the panel variable set by xtset.

    rlasso, like the lasso in general, accommodates possibly perfectly-collinear sets of regressors.  Stata's factor variables
    are supported by rlasso.  Users therefore have the option of specifying as high-dimensional controls or instruments one or
    more complete sets of factor variables or interactions with no base levels using the ibn prefix.  This can be interpreted as
    allowing the lasso to choose the members of the base category.

    For a detailed discussion of an R implementation of this methodology, see Spindler et al. (2016).

Examples using data from Acemoglu-Johnson-Robinson (2001)

    Load and reorder AJR data for Table 6 and Table 8 (datasets need to be in current directory).
        . clear
        . unzipfile maketable6
        . unzipfile maketable8
        . use maketable6
        . merge 1:1 shortnam using maketable8
        . keep if baseco==1
        . order shortnam logpgp95 avexpr lat_abst logem4 edes1975 avelf, first
        . order indtime euro1900 democ1 cons1 democ00a cons00a, last

    Alternatively, load AJR data from our website (no manual download required):
        . clear
        . use

    Examples with exogenous regressors:

    Replicate OLS results in Panel C, col. 9.
        . reg logpgp95 avexpr lat_abst edes1975 avelf temp* humid* steplow-oilres

    Basic usage: select from high-dim controls.
        . pdslasso logpgp95 avexpr (lat_abst edes1975 avelf temp* humid* steplow-oilres)

    As above, heteroskedastic-robust.
        . pdslasso logpgp95 avexpr (lat_abst edes1975 avelf temp* humid* steplow-oilres), rob

    Specify that latitude is an unpenalized control to be partialled out.
        . pdslasso logpgp95 avexpr (lat_abst edes1975 avelf temp* humid* steplow-oilres), partial(lat_abst)

    Specify that latitude is an unpenalized control using the pnotpen option (equivalent).
        . pdslasso logpgp95 avexpr (lat_abst edes1975 avelf temp* humid* steplow-oilres), pnotpen(lat_abst)

    Specify that latitude is in the amelioration set.
        . pdslasso logpgp95 avexpr (lat_abst edes1975 avelf temp* humid* steplow-oilres), aset(lat_abst)

    Example with endogenous regressor, high-dimensional controls and low-dimensional instrument:

    Replicate IV results in Panels A & B, col. 9.
        . ivreg logpgp95 (avexpr=logem4) lat_abst edes1975 avelf temp* humid* steplow-oilres, first

    Select controls; specify that logem4 is an unpenalized instrument to be partialled out.
        . ivlasso logpgp95 (avexpr=logem4) (lat_abst edes1975 avelf temp* humid* steplow-oilres), partial(logem4)

    Example with endogenous regressor and high-dimensional instruments and controls:

    Select controls and instruments; specify that logem4 is an unpenalized instrument and lat_abst is an unpenalized control;
    request weak identification stats and first-stage results.
        . ivlasso logpgp95 (lat_abst edes1975 avelf temp* humid* steplow-oilres) (avexpr=logem4 euro1900-cons00a),
            partial(logem4 lat_abst) idstats first

    Replay first-stage estimation. (Can also use est restore to make this the current estimation results.)
        . est replay _ivlasso_avexpr

    Select controls and instruments; specify that lat_abst is an unpenalized control; request weak identification stats and
    sup-score confidence sets.
        . ivlasso logpgp95 (lat_abst edes1975 avelf temp* humid* steplow-oilres) (avexpr=logem4 euro1900-cons00a),
            partial(lat_abst) idstats sscset

    As above but heteroskedastic-robust and use grid options to control grid search and test level; also set seed in rlasso
    options to make multiplier-bootstrap p-values replicable.
        . ivlasso logpgp95 (lat_abst edes1975 avelf temp* humid* steplow-oilres) (avexpr=logem4 euro1900-cons00a),
            partial(lat_abst) rob idstats sscset ssgridmin(0) ssgridmax(2) ssgamma(0.1) lopt(seed(1))

Examples using data from Angrist-Krueger (1991)

    Load AK data and rename variables (dataset needs to be in current directory).  NB: this is a large dataset (330k
    observations) and estimations may take some time to run on some installations.
        . clear
        . unzipfile
        . infix lnwage 1-9 edu 10-20 yob 21-31 qob 32-42 pob 43-53 using asciiqob.txt

    Alternative source (no unzipping needed):
        . use

    xtset data by place of birth (state):
        . xtset pob

    Table VII (1930-39) col 2. Year and state of birth = yob & pob.
        . ivregress 2sls lnwage i.pob i.yob (edu=i.qob i.yob#i.qob i.pob#i.qob)

    Fixed effects; select year controls and IVs; IVs are QOB and QOBxYOB.
        . ivlasso lnwage (i.yob) (edu=i.qob i.yob#i.qob), fe

    Fixed effects; select year controls and IVs; IVs are QOB, QOBxYOB, QOBxSOB.
        . ivlasso lnwage (i.yob) (edu=i.qob i.yob#i.qob i.pob#i.qob), fe

    All dummies & interactions incl. base levels.
        . ivlasso lnwage (i.yob) (edu=ibn.qob ibn.yob#ibn.qob ibn.pob#ibn.qob), fe

Example using data from Belloni et al. (2015)

    Load dataset on eminent domain (available at journal website).
        . clear
        . import excel using, first

    Settings used in Belloni et al. (2015) - results as in journal replication file (not text)
    (Includes use of undocumented rlasso option c0(real) to control initial penalty loadings.)
    Store rlasso intermediate results for replay later.
        . ivlasso CSIndex (NumProCase = Z*), nocons robust rlasso lopt(lalt corrnum(0) maxpsiiter(100) c0(0.55))
        . estimates replay _ivlasso_step5_NumProCase

Saved results

    ivlasso saves the following in e():

      e(N)               sample size
      e(xhighdim_ct)     number of all high-dimensional controls
      e(zhighdim_ct)     number of all high-dimensional instruments
      e(N_clust)         number of clusters in cluster-robust estimation; in the case of 2-way cluster-robust,
      e(N_g)             number of groups in fixed-effects model
      e(bw)              (HAC/AC only) bandwidth used
      e(ss_gamma)        significance level in sup-score tests and CIs
      e(ss_level)        test level in % in sup-score tests and CIs (=100*(1-gamma))
      e(ss_gridmin)      min grid point in sup-score CI
      e(ss_gridmax)      max grid point in sup-score CI
      e(ss_gridpoints)   number of grid points in sup-score CI

      e(cmd)             pdslasso or ivlasso
      e(depvar)          name of dependent variable
      e(dexog)           name(s) of exogenous structural variable(s)
      e(dendog)          name(s) of endogenous structural variable(s)
      e(xhighdim)        names of high-dimensional control variables
      e(zhighdim)        names of high-dimensional instruments
      e(method)          lasso or sqrt-lasso
      e(kernel)          (HAC/AC only) kernel used
      e(ss_null)         result of sup-score test (reject/fail to reject)
      e(ss_null_l)       result of lasso-orthogonalized sup-score test (reject/fail to reject)
      e(ss_null_pl)      result of post-lasso-orthogonalized sup-score test (reject/fail to reject)
      e(ss_cset)         confidence interval for sup-score test
      e(ss_cset_l)       confidence interval for lasso-orthogonalized sup-score test
      e(ss_cset_pl)      confidence interval for post-lasso-orthogonalized sup-score test
      e(ss_method)       simulate, abound or select

      e(b)               posted coefficient vector
      e(V)               posted variance-covariance matrix
      e(beta_pds)        PDS coefficient vector
      e(V_pds)           PDS variance-covariance matrix
      e(beta_lasso)      CHS lasso-based coefficient vector
      e(V_lasso)         CHS lasso-based variance-covariance matrix
      e(beta_plasso)     CHS post-lasso-based coefficient vector
      e(V_plasso)        CHS post-lasso-based variance-covariance matrix
      e(ss_citable)      sup-score test results used to construct confidence sets
      e(ss_gridmat)      sup-score test results using user-specified grid
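
    As a brief sketch of how the saved results might be inspected, the commands below post the CHS post-lasso results with
    post(plasso) and then list the posted and PDS coefficient vectors.  The AJR variables from the examples section are
    assumed to be in memory.

        . pdslasso logpgp95 avexpr (lat_abst edes1975 avelf temp* humid* steplow-oilres), post(plasso)
        . matrix list e(b)
        . matrix list e(beta_plasso)
        . matrix list e(beta_pds)
        . display e(N)
        . display "`e(xhighdim)'"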



    Ahrens, A., Hansen, C.B. and M.E. Schaffer. 2020.  lassopack: model selection and prediction with regularized regression in
        Stata.  The Stata Journal, 20(1):176-235.

    Anderson, T. W. and Rubin, H. 1949.  Estimation of the Parameters of a Single Equation in a Complete System of Stochastic
        Equations.  Annals of Mathematical Statistics 20:46-63.

    Angrist, J. and Krueger, A. 1991.  Does compulsory school attendance affect schooling and earnings?  Quarterly Journal of
        Economics 106(4):979-1014.

    Belloni, A., Chernozhukov, V. and Wang, L. 2011.  Square-root lasso: Pivotal recovery of sparse signals via conic
        programming.  Biometrika 98:791-806.

    Belloni, A., Chen, D., Chernozhukov, V. and Hansen, C. 2012.  Sparse models and methods for optimal instruments with an
        application to eminent domain.  Econometrica 80(6):2369-2429. 

    Belloni, A., Chernozhukov, V. and Hansen, C. 2013.  Inference for high-dimensional sparse econometric models.  In Advances
        in Economics and Econometrics: 10th World Congress, Vol. 3: Econometrics, Cambridge University Press: Cambridge,

    Belloni, A., Chernozhukov, V. and Hansen, C. 2014.  Inference on treatment effects after selection among high-dimensional
        controls.  Review of Economic Studies 81:608-650.

    Belloni, A., Chernozhukov, V. and Hansen, C. 2015.  High-dimensional methods and inference on structural and treatment
        effects.  Journal of Economic Perspectives 28(2):29-50.

    Belloni, A., Chernozhukov, V., Hansen, C. and Kozbur, D. 2016.  Inference in High Dimensional Panel Models with an
        Application to Gun Control.  Journal of Business and Economic Statistics 34(4):590-605. 

    Belloni, A., Chernozhukov, V. and Wang, L. 2014.  Pivotal estimation via square-root-lasso in nonparametric regression.
        Annals of Statistics 42(2):757-788.

    Chernozhukov, V., Chetverikov, D. and Kato, K. 2013.  Gaussian approximations and multiplier bootstrap for maxima of sums of
        high-dimensional random vectors.  Annals of Statistics 41(6):2786-2819.

    Chernozhukov, V., Hansen, C. and Spindler, M. 2015.  Post-selection and post-regularization inference in linear models with
        many controls and instruments.  American Economic Review: Papers & Proceedings 105(5):486-490. 

    Correia, S. 2016.  FTOOLS: Stata module to provide alternatives to common Stata commands optimized for large datasets. 

    Spindler, M., Chernozhukov, V. and Hansen, C. 2016.  High-dimensional metrics.

    Tibshirani, R. 1996.  Regression Shrinkage and Selection via the Lasso.  Journal of the Royal Statistical Society. Series B
        (Methodological) 58(1):267-288.

    Yamada, H. 2017.  The Frisch-Waugh-Lovell Theorem for the lasso and the ridge regression.  Communications in Statistics -
        Theory and Methods 46(21):10897-10902.


    Please check our website for more information.


    pdslasso/ivlasso require installation of the lassopack package.  To get the latest stable versions of lassopack and
    pdslasso/ivlasso from our website, check the installation instructions at  We
    update the website versions more frequently than the SSC version.  Earlier versions of these programs are also available
    from the website.

    To verify that pdslasso is correctly installed, type whichpkg pdslasso (which requires whichpkg to be installed;
    ssc install whichpkg).


    Thanks to Sergio Correia for advice on the use of the FTOOLS package.

Citation of pdslasso and ivlasso

    pdslasso and ivlasso are not official Stata commands.  They are free contributions to the research community, like a paper.
    Please cite them as such:

    Ahrens, A., Hansen, C.B., Schaffer, M.E. 2018 (updated 2020).  pdslasso and ivlasso: Programs for post-selection and
        post-regularization OLS or IV estimation and inference.


        Achim Ahrens, Public Policy Group, ETH Zurich, Switzerland
        Christian B. Hansen, University of Chicago, USA

        Mark E. Schaffer, Heriot-Watt University, UK

Also see

       Help:  rlasso, lasso2, cvlasso (if installed)