help pdslasso, help ivlasso                                                                                      pdslasso v1.3
----------------------------------------------------------------------------------------------------------------------------------

Title

    pdslasso and ivlasso -- Programs for post-selection and post-regularization OLS or IV estimation and inference

Syntax

    pdslasso depvar regressors (hd_controls) [weight] [if exp] [in range] [, partial(varlist) pnotpen(varlist)
        psolver(string) aset(varlist) post(method) robust cluster(varlist) bw(int) kernel(string) fe noftools
        rlasso[(name)] sqrt noisily loptions(options) olsoptions(options) noconstant]

    ivlasso depvar regressors [(hd_controls)] (endog=instruments) [if exp] [in range] [, partial(varlist)
        pnotpen(varlist) psolver(string) aset(varlist) post(method) robust cluster(varlist) bw(int) kernel(string)
        fe noftools rlasso[(name)] sqrt noisily loptions(options) ivoptions(options) first idstats sscset
        ssgamma(real) ssgridmin(real) ssgridmax(real) ssgridpoints(integer 100) ssgridmat(name) noconstant]

Note: pdslasso requires rlasso and ivreg2 to be installed; ivlasso also requires ranktest. See help rlasso, help ivreg2 and help ranktest, or install via ssc install lassopack and ssc install ranktest.

Note: the fe option will take advantage of the ftools package (if installed) for the fixed-effects transform; the speed gains using this package can be large. See help ftools or install via ssc install ftools.
Note: ivlasso also supports the simpler pdslasso syntax.

Options                     Description
----------------------------------------------------------------------------------------------------------------------------------
partial(varlist)            controls and instruments to be partialled out prior to lasso estimation
pnotpen(varlist)            controls and instruments always included, not penalized by the lasso
aset(varlist)               controls and instruments in the amelioration set, always included in the post-lasso
post(method)                pds, lasso or plasso; which estimation results are posted in e(b) and e(V)
robust                      heteroskedastic-robust VCE; lasso penalty loadings account for heteroskedasticity
cluster(varlist)            cluster-robust VCE; lasso penalty loadings account for clustering; both standard (1-way)
                            and 2-way clustering supported
bw(int)                     HAC/AC VCE; lasso penalty loadings account for autocorrelation (AC) using bandwidth int;
                            use with robust to account for both heteroskedasticity and autocorrelation (HAC)
kernel(string)              kernel used for HAC/AC penalty loadings (one of: bartlett, truncated, parzen, thann,
                            thamm, daniell, tent, qs; default=bartlett)
fe                          fixed-effects model (requires data to be xtset)
noftools                    do not use the ftools package for the fixed-effects transform (slower; rarely used)
rlasso[(name)]              store and display intermediate lasso and post-lasso results from rlasso with optional
                            prefix name (if just rlasso is specified, the default prefix is _ivlasso_ or _pdslasso_)
sqrt                        use the sqrt-lasso instead of the standard lasso
noisily                     display step-by-step intermediate rlasso estimation results
loptions(options)           lasso options specific to rlasso estimation; see help rlasso
olsoptions(options)         (pdslasso only) options specific to PDS OLS estimation of the structural equation
ivoptions(options)          (ivlasso only) options specific to PDS OLS or IV estimation of the structural equation
first                       (ivlasso only) display and store first-stage results for 2SLS
idstats                     (ivlasso only) request weak-identification statistics for 2SLS
noconstant                  suppress the constant from the regression (cannot be used with aweights or pweights)
psolver(string)             override the default solver used for partialling out (one of: qr, qrxx, lu, luxx, svd,
                            svdxx, chol; default=qrxx)
----------------------------------------------------------------------------------------------------------------------------------

Sup-score test              Description (ivlasso only)
----------------------------------------------------------------------------------------------------------------------------------
sscset                      request a sup-score weak-identification-robust confidence set
ssgamma(real)               significance level for sup-score weak-identification-robust tests and confidence
                            intervals (default=0.05, 5%)
ssgridmin(real)             minimum value for the grid search for sup-score weak-identification-robust confidence
                            intervals (default=grid centered at the OLS estimate)
ssgridmax(real)             maximum value for the grid search for sup-score weak-identification-robust confidence
                            intervals (default=grid centered at the OLS estimate)
ssgridpoints(integer)       number of points in the grid search for sup-score weak-identification-robust confidence
                            intervals (default=100)
ssgridmat(name)             user-supplied r x k Stata matrix of r jointly hypothesized values for the k endogenous
                            regressors to be tested using the sup-score test
ssomitgrid(name)            suppress display of sup-score test results with a user-supplied grid
ssmethod(name)              "abound" (default) = use the conservative critical value (asymptotic bound)
                            c*sqrt(N)*invnormal(1-gamma/(2p)); "simulate" = simulate the distribution to obtain
                            p-values for the sup-score test; "select" = reject if rlasso selects any instruments
----------------------------------------------------------------------------------------------------------------------------------

Postestimation:

    predict [type] newvar [if] [in] [, resid xb]

pdslasso and ivlasso may be used with time-series or panel data, in which case the data must be tsset or xtset first; see help tsset or help xtset.

aweights and pweights are supported; see help weights. pweights is equivalent to aweights + robust.
All varlists may contain time-series operators or factor variables; see help varlist.

Contents

    Description
    Computational notes
    Examples of usage
    Saved results
    References
    Website
    Installation
    Acknowledgements
    Citation of pdslasso and ivlasso

Description

pdslasso and ivlasso are routines for estimating structural parameters in linear models with many controls and/or instruments. The routines use methods for estimating sparse high-dimensional models, specifically the lasso (Least Absolute Shrinkage and Selection Operator, Tibshirani 1996) and the square-root-lasso (Belloni et al. 2011, 2014).

pdslasso is used for the case where a researcher has an outcome variable y, a structural or causal variable of interest d, and a large set of potential control variables x1, x2, x3, .... The usage in this case is:

    pdslasso y d (x1 x2 x3 ...)

pdslasso accepts multiple causal variables, e.g.:

    pdslasso y d1 d2 (x1 x2 x3 ...)

Important: the high-dimensional controls must be included within the parentheses (...). If this is not done, they are treated as causal variables rather than as controls.

The problem the researcher faces is that the "right" set of controls is not known. In traditional practice, this presents her with a difficult choice: use too few controls, or the wrong ones, and omitted variable bias will be present; use too many, and the model will suffer from overfitting. The methods implemented in pdslasso address this problem by selecting enough controls to address the former problem but not so many as to introduce the latter.

ivlasso is used for the case where a researcher has an endogenous causal variable of interest e, and a large set of potential instruments z1, z2, z3, .... The usage in this case is:

    ivlasso y (e = z1 z2 z3 ...)

ivlasso accepts multiple causal variables, e.g.:

    ivlasso y (e1 e2 = z1 z2 z3 ...)

ivlasso also allows combinations of exogenous and endogenous causal variables (d, e) and high-dimensional controls and instruments (x, z), e.g.:

    ivlasso y d (x1 x2 x3 ...)
    (e = z1 z2 z3 ...)

Two approaches are implemented in pdslasso and ivlasso:

1. The "post-double-selection" (PDS) methodology of Belloni et al. (2012, 2013, 2014, 2015, 2016), denoted the "PDS methodology" below.

2. The "post-regularization" (or "double-orthogonalization") methodology of Chernozhukov, Hansen and Spindler (2015), denoted the "CHS methodology" below.

The implementation of these methods in pdslasso and ivlasso uses the separate Stata program rlasso, which provides lasso and sqrt-lasso estimation with data-driven penalization; see rlasso for details. For an overview of rlasso and the theory behind it, see Ahrens et al. (2020).

The PDS methodology uses the lasso estimator to select the controls. Specifically, the lasso is used twice: (1) estimate a lasso regression with y as the dependent variable and the control variables x1, x2, x3, ... as regressors; (2) estimate a lasso regression with d as the dependent variable and again the control variables x1, x2, x3, ... as regressors. The lasso estimator achieves a sparse solution, i.e., most coefficients are set to zero. The final choice of control variables to include in the OLS regression of y on d is the union of the controls selected in steps (1) and (2), hence the name "post-double selection" for the methodology. The PDS methodology can be employed to select instruments as well as controls in instrumental variables estimation.

The CHS methodology is closely related. Instead of using the lasso-selected controls and instruments in a post-regularization OLS or IV estimation, the selected variables are used to construct orthogonalized versions of the dependent variable and the exogenous and/or endogenous causal variables of interest, and to construct optimal instruments from the lasso-selected IVs. The orthogonalized versions are based either on the lasso or on the post-lasso estimated coefficients; the post-lasso is OLS applied to the lasso-selected variables. See Chernozhukov et al. (2015) for details.
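The double-selection steps just described can also be carried out by hand using rlasso directly. A minimal sketch, using hypothetical variables y, d and controls x1-x100 (rlasso returns the selected variables in e(selected); the union of the two selected sets is then used in OLS):

    . rlasso y x1-x100
    . local sel1 `e(selected)'
    . rlasso d x1-x100
    . local sel2 `e(selected)'
    . local union : list sel1 | sel2
    . regress y d `union', robust

pdslasso automates these steps and in addition reports the CHS lasso- and post-lasso-based estimates.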
The set of variables selected by the lasso and used in the OLS post-lasso estimation and in the PDS structural estimation can be augmented by variables that were penalized but not selected by the lasso. The penalized variables used in this way to augment the post-lasso and PDS estimations are called the "amelioration set" and can be specified with the aset(varlist) option. This option affects only the CHS post-lasso-based and PDS estimations; the CHS lasso-based orthogonalized variables are unaffected. See Chernozhukov et al. (2014) for details.

pdslasso and ivlasso report the PDS-based and the two (lasso and post-lasso) CHS-based estimations.

If the sqrt option is specified, the sqrt-lasso estimator is used instead of the lasso; see rlasso for further details and references.

If the IV model is weakly identified (the instruments are only weakly correlated with the endogenous regressors), Belloni et al. (2012, 2013) suggest using weak-identification-robust hypothesis tests and confidence sets based on the Chernozhukov et al. (2013) sup-score test. The intuition behind the sup-score test is similar to that of the Anderson-Rubin (1949) test. Consider the simplest case (a single endogenous regressor d and no exogenous regressors or controls) where the null hypothesis is that the coefficient on d is H0: beta=b0. If the null is true, then the structural residual is simply e=y-b0*d. Under the additional assumption that the instruments are valid (orthogonal to the true disturbance), they should be uncorrelated with e. The sup-score tests reported by ivlasso are in effect high-dimensional versions of the Anderson-Rubin test. The test is implemented in rlasso; see help rlasso for details. Specifically, ivlasso reports sup-score tests of the statistical significance of the instruments where the dependent variable is e=y-b0*d, the instruments are the regressors, and b0 is a hypothesized value of the coefficient on d; a large test statistic indicates rejection of the null H0: beta=b0.
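With more than one endogenous regressor, jointly hypothesized coefficient values can be tested via a user-supplied grid passed to the ssgridmat(name) option. A minimal sketch, using hypothetical endogenous regressors e1 and e2, instruments z1-z50 and controls x1-x100, where each row of G specifies one joint null hypothesis:

    . matrix G = (0, 0 \ 0.5, 0.5 \ 1, 0)
    . ivlasso y (x1-x100) (e1 e2 = z1-z50), ssgridmat(G)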
The default is to use a conservative (asymptotic bound) critical value as suggested by Belloni et al. (2012, 2013) (option ssmethod(abound)). Alternative methods are to use p-values obtained by simulation via a multiplier bootstrap (option ssmethod(simulate)), or to estimate a lasso regression with the instruments as regressors: if (no) instruments are selected, we (fail to) reject the null H0: beta=b0 at the gamma significance level (option ssmethod(select)).

A 100*(1-gamma)% sup-score-based confidence set can be constructed by a grid search over the range of hypothesized values of beta. ivlasso reports the result of the sup-score test of the null H0: beta=0 with the idstats option and, in addition, for the single-endogenous-regressor case only, reports sup-score confidence sets with the sscset option.

For the multiple-endogenous-regressor case, sets of jointly hypothesized values for the components of beta can be tested using the ssgridmat(name) option. The matrix provided in the option should be an r x k Stata matrix, where each row contains a set of values that together specify a null hypothesis for the coefficients of the k endogenous regressors. This option allows the user to specify a grid search in multiple dimensions.

Computational notes

The various options available for the underlying calls to rlasso can be controlled via the option loptions(rlasso optionlist). The rlasso option center, to center moments in heteroskedastic and cluster-robust loadings, will be a commonly-employed option. This can be specified by lopt(center).

Another rlasso option that may often be used is to "pre-standardize" the data to have unit variance prior to computing the lasso coefficients with the prestd option. This is a computational alternative to the rlasso default of standardizing "on the fly" (i.e., incorporating the standardization into the lasso penalty loadings). This is specified by lopt(prestd). The results are equivalent in theory.
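For instance, centered moments and pre-standardization can be requested together in one loptions() (abbreviation lopt()) call; a sketch with hypothetical variables y, d and controls x1-x100:

    . pdslasso y d (x1-x100), robust lopt(center prestd)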
The prestd option can lead to improved numerical precision or more stable results in the case of difficult problems; the cost is the (typically small) computation time required to standardize.

rlasso implements a version of the lasso with data-dependent penalization and, for the heteroskedastic and clustered cases, regressor-specific penalty loadings; see rlasso for details. Note that specifying robust or cluster(.) as options to pdslasso or ivlasso automatically implies the use of robust or cluster-robust lasso penalty loadings. Penalty loadings and VCE type can be separately controlled via olsoptions(.) (for pdslasso) or ivoptions(.) (for ivlasso) vs. loptions(rlasso option list); for example, olsoptions(cluster(clustvar)) + loptions(robust) would use heteroskedastic-robust penalty loadings for the lasso estimations and a cluster-robust covariance estimator for the PDS and CHS estimations of the structural equation.

Either the partial(varlist) option or the pnotpen(varlist) option can be used for variables that should not be penalized by the lasso. By the Frisch-Waugh-Lovell Theorem for the lasso (Yamada 2017), the estimated lasso coefficients are the same in theory whether the unpenalized regressors are partialled out or given zero penalty loadings, so long as the same penalty loadings are used for the penalized regressors in both cases. Although the options are equivalent in theory, numerical results can differ in practice because of the different calculation methods used; see rlasso for further details.

By default the constant (if present) is not penalized if there are no regressors being partialled out; this is equivalent to mean-centering prior to estimation. The exception is if aweights or pweights are specified, in which case the constant is partialled out.
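To illustrate the separate control of penalty loadings and VCE type described above, the following sketch requests heteroskedastic-robust loadings for the lasso steps but cluster-robust inference for the structural equation (y, d, x1-x100 and clustvar are hypothetical names):

    . pdslasso y d (x1-x100), olsoptions(cluster(clustvar)) loptions(robust)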
The partial(varlist) option always partials out the constant (if present) along with the variables specified in varlist; to partial out just the constant, specify partial(_cons). Partialling-out of controls is done by ivlasso; partialling-out of instruments is done in the lasso estimation by rlasso.

Partialling-out is implemented in Mata using one of Mata's solvers. In cases where the variables to be partialled out are collinear or nearly so, different solvers may generate different results. Users may wish to check the stability of their results in such cases. The psolver(.) option can be used to specify the Mata solver used. The default behavior for solving AX=B for X is to use the QR decomposition applied to (A'A) and (A'B), i.e., qrsolve((A'A),(A'B)), abbreviated qrxx. Available options are qr, qrxx, lu, luxx, svd, svdxx, chol, where, e.g., svd indicates using svsolve(A,B) and svdxx indicates using svsolve((A'A),(A'B)). pdslasso/ivlasso will warn if collinear variables are dropped when partialling out.

The lasso and sqrt-lasso estimations are obtained via numerical methods (coordinate descent). Results can be unstable for difficult problems (e.g., if the scaling of the variables covers a wide range of magnitudes). Using variables that are all measured on a similar scale will help (as usual). Partialling-out variables is usually preferable to specifying them as unpenalized. See rlasso for discussion of the various options for controlling the numerical methods used.

The sup-score-based tests reported by ivlasso come in three versions: (a) using lasso-orthogonalized variables, where the variables have first been orthogonalized with respect to the high-dimensional controls using the lasso; (b) using post-lasso-orthogonalized variables; (c) using the variables without any orthogonalization. The orthogonalizations use the same lasso settings as in the main estimation.
After orthogonalization, e~ = y~ - b0*d~ is constructed (where a tilde indicates an orthogonalized variable), and then the sup-score test is conducted using e~ and the instruments. Versions (a) and (b) are not reported if there are no high-dimensional controls. Version (c) is available if there are high-dimensional controls, but only if the ssmethod(select) option is used. The sup-score-based tests are not available if the specification also includes either exogenous causal regressors or unpenalized instruments.

For large datasets, obtaining the p-value for the sup-score test by simulation (multiplier bootstrap, ssmethod(simulate) option) can be time-consuming. In such cases, using the default method of a conservative (asymptotic bound) critical value (ssmethod(abound) option) will be much faster.

The grid search to construct the sup-score confidence set can be controlled by the ssgridmin, ssgridmax and ssgridpoints options. If these options are not specified by the user, a 100-point grid centered on the OLS estimate is used.

The fe fixed-effects option is equivalent to (but computationally faster and more accurate than) specifying unpenalized panel-specific dummies. The fixed-effects ("within") transformation also removes the constant along with the fixed effects. The panel variable used by the fe option is the panel variable set by xtset.

rlasso, like the lasso in general, accommodates possibly perfectly-collinear sets of regressors. Stata's factor variables are supported by rlasso. Users therefore have the option of specifying as high-dimensional controls or instruments one or more complete sets of factor variables or interactions with no base levels using the ibn prefix. This can be interpreted as allowing the lasso to choose the members of the base category. For a detailed discussion of an R implementation of this methodology, see Spindler et al. (2016).

Examples using data from Acemoglu-Johnson-Robinson (2001)

Load and reorder the AJR data for Table 6 and Table 8 (datasets need to be in the current directory).

    . clear
    . (click to download maketable6.zip from economics.mit.edu)
    . unzipfile maketable6
    . (click to download maketable8.zip from economics.mit.edu)
    . unzipfile maketable8
    . use maketable6
    . merge 1:1 shortnam using maketable8
    . keep if baseco==1
    . order shortnam logpgp95 avexpr lat_abst logem4 edes1975 avelf, first
    . order indtime euro1900 democ1 cons1 democ00a cons00a, last

Alternatively, load the AJR data from our website (no manual download required):

    . clear
    . use https://statalasso.github.io/dta/AJR.dta

Examples with exogenous regressors:

Replicate the OLS results in Panel C, col. 9.

    . reg logpgp95 avexpr lat_abst edes1975 avelf temp* humid* steplow-oilres

Basic usage: select from high-dimensional controls.

    . pdslasso logpgp95 avexpr (lat_abst edes1975 avelf temp* humid* steplow-oilres)

As above, heteroskedastic-robust.

    . pdslasso logpgp95 avexpr (lat_abst edes1975 avelf temp* humid* steplow-oilres), rob

Specify that latitude is an unpenalized control to be partialled out.

    . pdslasso logpgp95 avexpr (lat_abst edes1975 avelf temp* humid* steplow-oilres), partial(lat_abst)

Specify that latitude is an unpenalized control using the pnotpen option (equivalent).

    . pdslasso logpgp95 avexpr (lat_abst edes1975 avelf temp* humid* steplow-oilres), pnotpen(lat_abst)

Specify that latitude is in the amelioration set.

    . pdslasso logpgp95 avexpr (lat_abst edes1975 avelf temp* humid* steplow-oilres), aset(lat_abst)

Example with an endogenous regressor, high-dimensional controls and a low-dimensional instrument:

Replicate the IV results in Panels A & B, col. 9.

    . ivreg logpgp95 (avexpr=logem4) lat_abst edes1975 avelf temp* humid* steplow-oilres, first

Select controls; specify that logem4 is an unpenalized instrument to be partialled out.
    . ivlasso logpgp95 (avexpr=logem4) (lat_abst edes1975 avelf temp* humid* steplow-oilres), partial(logem4)

Example with an endogenous regressor and high-dimensional instruments and controls:

Select controls and instruments; specify that logem4 is an unpenalized instrument and lat_abst is an unpenalized control; request weak-identification stats and first-stage results.

    . ivlasso logpgp95 (lat_abst edes1975 avelf temp* humid* steplow-oilres) (avexpr=logem4 euro1900-cons00a), partial(logem4 lat_abst) idstats first

Replay the first-stage estimation. (Can also use est restore to make this the current estimation results.)

    . est replay _ivlasso_avexpr

Select controls and instruments; specify that lat_abst is an unpenalized control; request weak-identification stats and sup-score confidence sets.

    . ivlasso logpgp95 (lat_abst edes1975 avelf temp* humid* steplow-oilres) (avexpr=logem4 euro1900-cons00a), partial(lat_abst) idstats sscset

As above but heteroskedastic-robust, using the grid options to control the grid search and test level; also set the seed in the rlasso options to make the multiplier-bootstrap p-values replicable.

    . ivlasso logpgp95 (lat_abst edes1975 avelf temp* humid* steplow-oilres) (avexpr=logem4 euro1900-cons00a), partial(lat_abst) rob idstats sscset ssgridmin(0) ssgridmax(2) ssgamma(0.1) lopt(seed(1))

Examples using data from Angrist-Krueger (1991)

Load the AK data and rename variables (the dataset needs to be in the current directory). NB: this is a large dataset (330k observations) and estimations may take some time to run on some installations.

    . clear
    . (click to download asciiqob.zip from economics.mit.edu)
    . unzipfile asciiqob.zip
    . infix lnwage 1-9 edu 10-20 yob 21-31 qob 32-42 pob 43-53 using asciiqob.txt

Alternative source (no unzipping needed):

    . use https://statalasso.github.io/dta/AK91.dta

xtset the data by place of birth (state):

    . xtset pob

Table VII (1930-39) col 2. Year and state of birth = yob & pob.
    . ivregress 2sls lnwage i.pob i.yob (edu=i.qob i.yob#i.qob i.pob#i.qob)

Fixed effects; select year controls and IVs; IVs are QOB and QOBxYOB.

    . ivlasso lnwage (i.yob) (edu=i.qob i.yob#i.qob), fe

Fixed effects; select year controls and IVs; IVs are QOB, QOBxYOB, QOBxSOB.

    . ivlasso lnwage (i.yob) (edu=i.qob i.yob#i.qob i.pob#i.qob), fe

All dummies & interactions incl. base levels.

    . ivlasso lnwage (i.yob) (edu=ibn.qob ibn.yob#ibn.qob ibn.pob#ibn.qob), fe

Example using data from Belloni et al. (2015)

Load the dataset on eminent domain (available at the journal website).

    . clear
    . import excel using https://statalasso.github.io/dta/CSExampleData.xlsx, first

Settings used in Belloni et al. (2015) - results as in the journal replication file (not the text). (Includes use of the undocumented rlasso option c0(real) to control the initial penalty loadings.) Store the rlasso intermediate results for replay later.

    . ivlasso CSIndex (NumProCase = Z*), nocons robust rlasso lopt(lalt corrnum(0) maxpsiiter(100) c0(0.55))
    . estimates replay _ivlasso_step5_NumProCase

Saved results

ivlasso saves the following in e():

scalars
    e(N)                sample size
    e(xhighdim_ct)      number of all high-dimensional controls
    e(zhighdim_ct)      number of all high-dimensional instruments
    e(N_clust)          number of clusters in cluster-robust estimation; in the case of 2-way cluster-robust,
                        e(N_clust)=min(e(N_clust1),e(N_clust2))
    e(N_g)              number of groups in the fixed-effects model
    e(bw)               (HAC/AC only) bandwidth used
    e(ss_gamma)         significance level in sup-score tests and CIs
    e(ss_level)         test level in % in sup-score tests and CIs (=100*(1-gamma))
    e(ss_gridmin)       min grid point in sup-score CI
    e(ss_gridmax)       max grid point in sup-score CI
    e(ss_gridpoints)    number of grid points in sup-score CI

macros
    e(cmd)              pdslasso or ivlasso
    e(depvar)           name of dependent variable
    e(dexog)            name(s) of exogenous structural variable(s)
    e(dendog)           name(s) of endogenous structural variable(s)
    e(xhighdim)         names of high-dimensional control variables
    e(zhighdim)         names of high-dimensional instruments
    e(method)           lasso or sqrt-lasso
    e(kernel)           (HAC/AC only) kernel used
    e(ss_null)          result of sup-score test (reject/fail to reject)
    e(ss_null_l)        result of lasso-orthogonalized sup-score test (reject/fail to reject)
    e(ss_null_pl)       result of post-lasso-orthogonalized sup-score test (reject/fail to reject)
    e(ss_cset)          confidence interval for sup-score test
    e(ss_cset_l)        confidence interval for lasso-orthogonalized sup-score test
    e(ss_cset_pl)       confidence interval for post-lasso-orthogonalized sup-score test
    e(ss_method)        simulate, abound or select

matrices
    e(b)                posted coefficient vector
    e(V)                posted variance-covariance matrix
    e(beta_pds)         PDS coefficient vector
    e(V_pds)            PDS variance-covariance matrix
    e(beta_lasso)       CHS lasso-based coefficient vector
    e(V_lasso)          CHS lasso-based variance-covariance matrix
    e(beta_plasso)      CHS post-lasso-based coefficient vector
    e(V_plasso)         CHS post-lasso-based variance-covariance matrix
    e(ss_citable)       sup-score test results used to construct confidence sets
    e(ss_gridmat)       sup-score test results using the user-specified grid

functions
    e(sample)

References

Ahrens, A., Hansen, C.B. and Schaffer, M.E. 2020. lassopack: model selection and prediction with regularized regression in Stata. The Stata Journal 20(1):176-235. https://journals.sagepub.com/doi/abs/10.1177/1536867X20909697. Working paper version: https://arxiv.org/abs/1901.05397.

Anderson, T.W. and Rubin, H. 1949. Estimation of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics 20:46-63. https://projecteuclid.org/euclid.aoms/1177730090

Angrist, J. and Krueger, A. 1991. Does compulsory school attendance affect schooling and earnings? Quarterly Journal of Economics 106(4):979-1014. http://www.jstor.org/stable/2937954

Belloni, A., Chernozhukov, V. and Wang, L. 2011. Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98:791-806. https://doi.org/10.1093/biomet/asr043

Belloni, A., Chen, D., Chernozhukov, V. and Hansen, C. 2012.
Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80(6):2369-2429. http://onlinelibrary.wiley.com/doi/10.3982/ECTA9626/abstract

Belloni, A., Chernozhukov, V. and Hansen, C. 2013. Inference for high-dimensional sparse econometric models. In Advances in Economics and Econometrics: 10th World Congress, Vol. 3: Econometrics, Cambridge University Press: Cambridge, 245-295. http://arxiv.org/abs/1201.0220

Belloni, A., Chernozhukov, V. and Hansen, C. 2014. Inference on treatment effects after selection among high-dimensional controls. Review of Economic Studies 81:608-650. https://doi.org/10.1093/restud/rdt044

Belloni, A., Chernozhukov, V. and Hansen, C. 2015. High-dimensional methods and inference on structural and treatment effects. Journal of Economic Perspectives 28(2):29-50. http://www.aeaweb.org/articles.php?doi=10.1257/jep.28.2.29

Belloni, A., Chernozhukov, V., Hansen, C. and Kozbur, D. 2016. Inference in high dimensional panel models with an application to gun control. Journal of Business and Economic Statistics 34(4):590-605. http://amstat.tandfonline.com/doi/full/10.1080/07350015.2015.1102733

Belloni, A., Chernozhukov, V. and Wang, L. 2014. Pivotal estimation via square-root-lasso in nonparametric regression. Annals of Statistics 42(2):757-788. https://doi.org/10.1214/14-AOS1204

Chernozhukov, V., Chetverikov, D. and Kato, K. 2013. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Annals of Statistics 41(6):2786-2819. https://projecteuclid.org/euclid.aos/1387313390

Chernozhukov, V., Hansen, C. and Spindler, M. 2015. Post-selection and post-regularization inference in linear models with many controls and instruments. American Economic Review: Papers & Proceedings 105(5):486-490. http://www.aeaweb.org/articles.php?doi=10.1257/aer.p20151022

Correia, S. 2016. FTOOLS: Stata module to provide alternatives to common Stata commands optimized for large datasets.
https://ideas.repec.org/c/boc/bocode/s458213.html

Spindler, M., Chernozhukov, V. and Hansen, C. 2016. High-dimensional metrics. https://cran.r-project.org/package=hdm.

Tibshirani, R. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological) 58(1):267-288. https://doi.org/10.2307/2346178

Yamada, H. 2017. The Frisch-Waugh-Lovell Theorem for the lasso and the ridge regression. Communications in Statistics - Theory and Methods 46(21):10897-10902. http://dx.doi.org/10.1080/03610926.2016.1252403

Website

Please check our website https://statalasso.github.io/ for more information.

Installation

pdslasso/ivlasso require installation of the lassopack package. To get the latest stable versions of lassopack and pdslasso/ivlasso from our website, check the installation instructions at https://statalasso.github.io/installation/. We update the website versions more frequently than the SSC versions. Earlier versions of these programs are also available from the website.

To verify that pdslasso is correctly installed, type

    . whichpkg pdslasso

(which requires whichpkg to be installed; ssc install whichpkg).

Acknowledgements

Thanks to Sergio Correia for advice on the use of the FTOOLS package.

Citation of pdslasso and ivlasso

pdslasso and ivlasso are not official Stata commands. They are free contributions to the research community, like a paper. Please cite them as such:

Ahrens, A., Hansen, C.B., Schaffer, M.E. 2018 (updated 2020). pdslasso and ivlasso: Programs for post-selection and post-regularization OLS or IV estimation and inference. http://ideas.repec.org/c/boc/bocode/s458459.html

Authors

Achim Ahrens, Public Policy Group, ETH Zurich, Switzerland
achim.ahrens@gess.ethz.ch

Christian B. Hansen, University of Chicago, USA
Christian.Hansen@chicagobooth.edu

Mark E. Schaffer, Heriot-Watt University, UK
m.e.schaffer@hw.ac.uk

Help: also see rlasso, lasso2 and cvlasso (if installed).
