----------------------------------------------------------------------
help ddml                                                         v1.2
----------------------------------------------------------------------

Title

    ddml -- Stata package for Double Debiased Machine Learning

    ddml implements algorithms for causal inference aided by supervised
    machine learning, as proposed in Double/debiased machine learning
    for treatment and structural parameters (Econometrics Journal,
    2018). Five different models are supported, allowing for binary or
    continuous treatment variables and endogeneity, high-dimensional
    controls and/or instrumental variables. ddml supports a variety of
    ML programs, including but not limited to lassopack and pystacked.

    The package includes the wrapper program qddml, which uses a
    simplified one-line syntax but offers less flexibility. qddml
    relies on crossfit, which can be used as a standalone program.
    Please see the examples provided at the end of this help file.

Syntax

    Estimation with ddml proceeds in four steps.

    Step 1. Initialize ddml and select the model:

        ddml init model [if] [in] [ , mname(name) kfolds(integer)
            fcluster(varname) foldvar(varlist) reps(integer) norandom
            tabfold vars(varlist)]

    where model is one of partial, iv, interactive, fiv or
    interactiveiv; see the model descriptions below.

    Step 2. Add supervised ML programs for estimating conditional
    expectations:

        ddml eq [ , mname(name) vname(varname) learner(varname)
            vtype(string) predopt(string)] : command depvar vars
            [ , cmdopt]

    where, depending on the model chosen in Step 1, eq is one of
    E[Y|X], E[Y|D,X], E[Y|X,Z], E[D|X], E[D|X,Z], E[Z|X]. command is a
    supported supervised ML program (e.g. pystacked or cvlasso); see
    supported programs.

    Note: Options before ":" and after the first comma refer to ddml.
    Options that come after the final comma refer to the estimation
    command.

    Step 3. Cross-fitting:

        ddml crossfit [ , mname(name) shortstack]

    This step implements the cross-fitting algorithm. Each learner is
    fitted iteratively on the training folds, and out-of-sample
    predicted values are obtained.

    Step 4. Estimate causal effects:

        ddml estimate [ , mname(name) robust cluster(varname)
            vce(type) atet ateu trim(real)]

    ddml estimate returns treatment effect estimates for all
    combinations of learners added in Step 2.

    Optional. Report/post selected results:

        ddml estimate [ , mname(name) spec(integer or string)
            rep(integer or string) allcombos notable replay]

    Auxiliary sub-programs:

    Download the latest ddml from GitHub:

        ddml update

    Report information about the ddml model:

        ddml desc [ , mname(name) learners crossfit estimates sample
            all]

    Export results in csv format:

        ddml export [using filename , mname(name)]

    Retrieve information from ddml:

        ddml extract [object_name , mname(name) show(display_item)
            ename(name) vname(varname) stata keys key1(string)
            key2(string) key3(string) subkey1(string) subkey2(string)]

    display_item can be mse, n or pystacked. ddml stores many internal
    results in associative arrays. These can be retrieved using the
    different key options; see ddml extract for details.

    Drop the ddml estimation mname and all associated variables:

        ddml drop mname

    Report overlap plots (interactive and interactiveiv models only):

        ddml overlap [ , mname(name) replist(numlist) pslist(namelist)
            n(integer) kernel(name) name(name[, replace])
            title(string) subtitle(string) lopt0(string)
            lopt1(string)]

    One overlap (line) plot of propensity scores is reported for each
    treatment variable learner; by default, propensity scores for all
    cross-fit samples are plotted. Overlap plots for the treatment
    variables are combined using graph combine.

Options

    Init options           Description
    ------------------------------------------------------------------
    mname(name)            name of the DDML model.
                           Allows running multiple DDML models
                           simultaneously. Defaults to m0.
    kfolds(integer)        number of cross-fitting folds. The default
                           is 5.
    fcluster(varname)      cluster identifiers for cluster
                           randomization of random folds.
    foldvar(varlist)       integer variables with user-specified
                           cross-fitting folds (one per cross-fitting
                           repetition).
    norandom               use observations in existing order instead
                           of randomizing before splitting into folds;
                           if there are multiple resamples, applies to
                           the first resample only; ignored if
                           user-defined fold variables are provided in
                           foldvar(varlist).
    reps(integer)          number of cross-fitting repetitions, i.e.,
                           how often the cross-fitting procedure is
                           repeated on randomly generated folds.
    tabfold                prints a table with the frequency of
                           observations by fold.
    ------------------------------------------------------------------

    Equation options       Description
    ------------------------------------------------------------------
    mname(name)            name of the DDML model. Defaults to m0.
    vname(varname)         name of the dependent variable in the
                           reduced form estimation. This is usually
                           inferred from the command line but is
                           mandatory for the fiv model.
    learner(varname)       optional name of the variable to be created.
    vtype(string)          optional variable type of the variable to
                           be created. Defaults to double. none can be
                           used to leave the type field blank
                           (required when using ddml with rforest).
    predopt(string)        predict option to be used to get predicted
                           values. Typical values are xb or pr.
                           Default is blank.
    ------------------------------------------------------------------

    Cross-fitting          Description
    ------------------------------------------------------------------
    mname(name)            name of the DDML model. Defaults to m0.
    shortstack             asks for short-stacking to be used.
                           Short-stacking runs constrained
                           non-negative least squares on the
                           cross-fitted predicted values to obtain a
                           weighted average of the base learners.
    ------------------------------------------------------------------

    Estimation             Description
    ------------------------------------------------------------------
    mname(name)            name of the DDML model. Defaults to m0.
    spec(integer/string)   select specification. This can be the
                           specification number, mse for the
                           minimum-MSE specification (the default), or
                           ss for short-stacking.
    rep(integer/string)    select resampling iteration. This can be
                           the cross-fit repetition number, mn for
                           mean aggregation, or md for median
                           aggregation (the default).
    robust                 report SEs that are robust to the presence
                           of arbitrary heteroskedasticity.
    cluster(varname)       select cluster-robust variance-covariance
                           estimator.
    vce(type)              select variance-covariance estimator, e.g.
                           vce(hc3) or vce(cluster id).
    noconstant             suppress constant term (partial, iv, fiv
                           models only). Since the residualized
                           outcome and treatment may not be exactly
                           mean-zero in finite samples, ddml includes
                           the constant by default in the estimation
                           stage of partially linear models.
    showconstant           display constant term in the summary
                           estimation output table (partial, iv, fiv
                           models only).
    atet                   report average treatment effect of the
                           treated (default is ATE).
    ateu                   report average treatment effect of the
                           untreated (default is ATE).
    trim(real)             trimming of propensity scores for the
                           interactive and interactive IV models. The
                           default is 0.01 (that is, values below 0.01
                           and above 0.99 are set to 0.01 and 0.99,
                           respectively).
    allcombos              estimate all possible specifications.
                           By default, only the min-MSE (or
                           short-stacking) specification is estimated
                           and displayed.
    replay                 used in combination with spec() and rep()
                           to display and return estimation results.
    ------------------------------------------------------------------

    Auxiliary              Description
    ------------------------------------------------------------------
    mname(name)            name of the DDML model. Defaults to m0.
    replist(numlist)       (overlap plots) list of cross-fitting
                           resamples to plot. Defaults to all.
    pslist(namelist)       (overlap plots) varnames of propensity
                           scores to plot (excluding the resample
                           number). Defaults to all.
    n(integer)             (overlap plots) see teffects overlap.
    kernel(name)           (overlap plots) see teffects overlap.
    name(name)             (overlap plots) see graph combine.
    title(string)          (overlap plots) see graph combine.
    subtitle(string)       (overlap plots) see graph combine.
    lopt0(string)          (overlap plots) options for the line plot
                           of the untreated; default is solid/navy;
                           see line.
    lopt1(string)          (overlap plots) options for the line plot
                           of the treated; default is short dash/dark
                           orange; see line.
    ------------------------------------------------------------------

Models

    This section provides an overview of the supported models.
    Throughout we use Y to denote the outcome variable, X to denote
    confounders, Z to denote instrumental variable(s), and D to denote
    the treatment variable(s) of interest.

    Partially linear model [partial]

        Y = a.D + g(X) + U
        D = m(X) + V

    where the aim is to estimate a while controlling for X. To this
    end, we estimate the conditional expectations E[Y|X] and E[D|X]
    using a supervised machine learner.

    Interactive model [interactive]

        Y = g(X,D) + U
        D = m(X) + V

    which relaxes the assumption that X and D are separable. D is a
    binary treatment variable. We estimate the conditional
    expectations E[D|X], as well as E[Y|X,D=0] and E[Y|X,D=1] (jointly
    added using ddml E[Y|X,D]).

    Partially linear IV model [iv]

        Y = a.D + g(X) + U
        Z = m(X) + V

    where the aim is to estimate a. We estimate the conditional
    expectations E[Y|X], E[D|X] and E[Z|X] using a supervised machine
    learner.

    Interactive IV model [interactiveiv]

        Y = g(Z,X) + U
        D = h(Z,X) + V
        Z = m(X) + E

    where the aim is to estimate the local average treatment effect.
    We estimate, using a supervised machine learner, the following
    conditional expectations: E[Y|X,Z=0] and E[Y|X,Z=1] (jointly added
    using ddml E[Y|X,Z]); E[D|X,Z=0] and E[D|X,Z=1] (jointly added
    using ddml E[D|X,Z]); and E[Z|X].

    Flexible partially linear IV model [fiv]

        Y = a.D + g(X) + U
        D = m(Z) + g(X) + V

    where the estimand of interest is a. We estimate the conditional
    expectations E[Y|X], D^ := E[D|Z,X] and E[D^|X] using a supervised
    machine learner. The instrument is then formed as D^ - E^[D^|X],
    where E^[D^|X] denotes the estimate of E[D^|X].

    Note: "{D}" is a placeholder that is used because the last step
    (estimation of E[D^|X]) uses the fitted values from estimating
    E[D|X,Z]. Please see the examples section below.

Compatible programs

    ddml is compatible with a large set of user-written Stata
    commands. It has been tested with

    - lassopack for regularized regression (see lasso2, cvlasso,
      rlasso);
    - the pystacked package (see pystacked; note that pystacked
      requires Stata 16);
    - rforest by Zou & Schonlau (note that rforest requires the option
      vtype(none));
    - svmachines by Guenther & Schonlau.

    Beyond these, ddml is compatible with any Stata program that

    - uses the standard "reg y x" syntax,
    - supports if-conditions,
    - and comes with a predict post-estimation program.

Examples

    Below we demonstrate the use of ddml for each of the five
    supported models. Note that the estimation models are chosen for
    demonstration purposes only and are kept simple so that you can
    run the code quickly.

    Partially linear model I.

    Preparation: we load the data, define global macros and set the
    seed.
        . use https://github.com/aahrens1/ddml/raw/master/data/sipp1991.dta, clear
        . global Y net_tfa
        . global D e401
        . global X tw age inc fsize educ db marr twoearn pira hown
        . set seed 42

    We next initialize the ddml estimation and select the model.
    partial refers to the partially linear model. The model will be
    stored in a Mata object with the default name "m0" unless
    otherwise specified using the mname(name) option. Note that we set
    the number of random folds to 2 so that the model runs quickly.
    The default is kfolds(5). We recommend using at least 5-10 folds,
    and more if your sample size is small. We also recommend
    re-running the model multiple times on different random folds; see
    the reps(integer) option.

        . ddml init partial, kfolds(2)

    We add supervised machine learners for estimating the conditional
    expectation E[Y|X]. We first add simple linear regression.

        . ddml E[Y|X]: reg $Y $X

    We can add more than one learner per reduced form equation. Here,
    we add a random forest learner. We do this using pystacked. In the
    next example we show how to use pystacked to stack multiple
    learners, but here we use it to implement a single learner.

        . ddml E[Y|X]: pystacked $Y $X, type(reg) method(rf)

    We do the same for the conditional expectation E[D|X].

        . ddml E[D|X]: reg $D $X
        . ddml E[D|X]: pystacked $D $X, type(reg) method(rf)

    Optionally, you can check whether the learners have been added
    correctly.

        . ddml desc

    Cross-fitting. The learners are iteratively fitted on the training
    data. This step may take a while.

        . ddml crossfit

    Finally, we estimate the coefficients of interest. Since we added
    two learners for each of our two reduced form equations, there are
    four possible specifications. By default, the result shown
    corresponds to the specification with the lowest out-of-sample
    MSPE:

        . ddml estimate, robust

    To estimate all four specifications, we use the allcombos option:

        . ddml estimate, robust allcombos

    After having estimated all specifications, we can retrieve
    specific results. Here we use the specification relying on OLS for
    estimating both E[Y|X] and E[D|X]:

        . ddml estimate, robust spec(1) replay

    You could manually retrieve the same point estimate by typing:

        . reg Y1_reg D1_reg, robust

    or graphically:

        . twoway (scatter Y1_reg D1_reg) (lfit Y1_reg D1_reg)

    where Y1_reg and D1_reg are the orthogonalized versions of net_tfa
    and e401.

    To describe the ddml model setup or results in detail, use ddml
    describe with the relevant option (sample, learners, crossfit,
    estimates), or describe them all with the all option:

        . ddml describe, all

    Partially linear model II. Stacking regression using pystacked.

    Stacking regression is a simple and powerful method for combining
    predictions from multiple learners. It is available in Stata via
    the pystacked package. Below is an example with the partially
    linear model, but stacking can be used with any model supported by
    ddml.

    Preparation: use the data and globals as above. We use the name m1
    for this new estimation to distinguish it from the previous
    example, which uses the default name m0. This allows keeping
    multiple estimations in memory for comparison. We also specify 5
    resamplings.

        . set seed 42
        . ddml init partial, kfolds(2) reps(5) mname(m1)

    Add supervised machine learners for estimating conditional
    expectations. The first learner in the stacked ensemble is OLS. We
    also use cross-validated lasso, ridge and two random forests with
    different settings, which we save in the following macros:

        . global rflow max_features(5) min_samples_leaf(1) max_samples(.7)
        . global rfhigh max_features(5) min_samples_leaf(10) max_samples(.7)

    In each step, we add the mname(m1) option to ensure that the
    learners are not added to the m0 model, which is still in memory.
    We also specify the names of the variables containing the
    estimated conditional expectations using the learner(varname)
    option. This avoids overwriting the variables created for the m0
    model under default naming.
        . ddml E[Y|X], mname(m1) learner(Y_m1): pystacked $Y $X ||
            method(ols) || method(lassocv) || method(ridgecv) ||
            method(rf) opt($rflow) || method(rf) opt($rfhigh), type(reg)
        . ddml E[D|X], mname(m1) learner(D_m1): pystacked $D $X ||
            method(ols) || method(lassocv) || method(ridgecv) ||
            method(rf) opt($rflow) || method(rf) opt($rfhigh), type(reg)

    Note: Options before ":" and after the first comma refer to ddml.
    Options that come after the final comma refer to the estimation
    command. Make sure not to confuse the two types of options.

    Check whether the learners were correctly added:

        . ddml desc, mname(m1) learners

    Cross-fitting and estimation.

        . ddml crossfit, mname(m1)
        . ddml estimate, mname(m1) robust

    Examine the stacking weights and MSEs reported by pystacked.

        . ddml extract, mname(m1) show(pystacked)
        . ddml extract, mname(m1) show(mse)

    We can compare the effects with the first ddml model (if you have
    run the first example above).

        . ddml estimate, mname(m0) replay

    Partially linear model III. Multiple treatments.

    We can also run the partially linear model with multiple
    treatments. In this simple example, we estimate the effect of both
    401(k) eligibility (e401) and education (educ). Note that we
    remove educ from the set of controls.

        . use https://github.com/aahrens1/ddml/raw/master/data/sipp1991.dta, clear
        . global Y net_tfa
        . global D1 e401
        . global D2 educ
        . global X tw age inc fsize db marr twoearn pira hown
        . set seed 42

    Initialize the model.

        . ddml init partial, kfolds(2)

    Add learners. Note that we add learners with both $D1 and $D2 as
    the dependent variable.

        . ddml E[Y|X]: reg $Y $X
        . ddml E[Y|X]: pystacked $Y $X, type(reg) method(rf)
        . ddml E[D|X]: reg $D1 $X
        . ddml E[D|X]: pystacked $D1 $X, type(reg) method(rf)
        . ddml E[D|X]: reg $D2 $X
        . ddml E[D|X]: pystacked $D2 $X, type(reg) method(rf)

    Cross-fitting.

        . ddml crossfit

    Estimation.

        . ddml estimate, robust

    Partially linear IV model.

    Preparation: we load the data, define global macros and set the
    seed.

        . use https://statalasso.github.io/dta/AJR.dta, clear
        . global Y logpgp95
        . global D avexpr
        . global Z logem4
        . global X lat_abst edes1975 avelf temp* humid* steplow-oilres
        . set seed 42

    Initialize the model. Since the data set is very small, we
    consider 30 cross-fitting folds.

        . ddml init iv, kfolds(30)

    The partially linear IV model has three conditional expectations:
    E[Y|X], E[D|X] and E[Z|X]. For each reduced form equation, we add
    two learners: regress and rforest. We need to add the option
    vtype(none) for rforest to work with ddml, since rforest's predict
    command does not support variable types.

        . ddml E[Y|X]: reg $Y $X
        . ddml E[Y|X], vtype(none): rforest $Y $X, type(reg)
        . ddml E[D|X]: reg $D $X
        . ddml E[D|X], vtype(none): rforest $D $X, type(reg)
        . ddml E[Z|X]: reg $Z $X
        . ddml E[Z|X], vtype(none): rforest $Z $X, type(reg)

    Cross-fitting and estimation. We use the shortstack option to
    combine the base learners. Short-stacking is a computationally
    cheaper alternative to stacking. Whereas stacking relies on
    cross-validated predicted values to obtain the relative weights
    for the base learners, short-stacking uses the cross-fitted
    predicted values.

        . ddml crossfit, shortstack
        . ddml estimate, robust

    If you are curious about what ddml does in the background:

        . ddml estimate, allcombos spec(8) rep(1) robust
        . ivreg Y2_rf (D2_rf = Z2_rf), robust

    Interactive model--ATE and ATET estimation.

    Preparation: we load the data, define global macros and set the
    seed.

        . webuse cattaneo2, clear
        . global Y bweight
        . global D mbsmoke
        . global X prenatal1 mmarried fbaby mage medu
        . set seed 42

    We use 5 folds and 5 resamplings; that is, we estimate the model 5
    times using randomly chosen folds.

        . ddml init interactive, kfolds(5) reps(5)

    We need to estimate the conditional expectations E[Y|X,D=0],
    E[Y|X,D=1] and E[D|X]. The first two conditional expectations are
    added jointly.
    We consider two supervised learners: linear regression and
    gradient boosted trees, stacked using pystacked. Note that we use
    gradient boosted regression trees for E[Y|X,D], but gradient
    boosted classification trees for E[D|X].

        . ddml E[Y|X,D]: pystacked $Y $X, type(reg) methods(ols gradboost)
        . ddml E[D|X]: pystacked $D $X, type(class) methods(logit gradboost)

    Cross-fitting:

        . ddml crossfit

    In the final estimation step, we can estimate the average
    treatment effect (the default), the average treatment effect of
    the treated (atet), or the average treatment effect of the
    untreated (ateu).

        . ddml estimate
        . ddml estimate, atet

    Recall that we have specified 5 resampling iterations (reps(5)).
    By default, the median over the minimum-MSE specifications per
    resampling iteration is shown. At the bottom, a table of summary
    statistics over resampling iterations is shown.

    To estimate using the same two base learners but with
    short-stacking instead of stacking, we enter the learners
    separately and use the shortstack option:

        . set seed 42
        . ddml init interactive, kfolds(5) reps(5)
        . ddml E[Y|X,D]: reg $Y $X
        . ddml E[Y|X,D]: pystacked $Y $X, type(reg) method(gradboost)
        . ddml E[D|X]: logit $D $X
        . ddml E[D|X]: pystacked $D $X, type(class) method(gradboost)
        . ddml crossfit, shortstack
        . ddml estimate

    Interactive IV model--LATE estimation.

    Preparation: we load the data, define global macros and set the
    seed.

        . use http://fmwww.bc.edu/repec/bocode/j/jtpa.dta, clear
        . global Y earnings
        . global D training
        . global Z assignmt
        . global X sex age married black hispanic
        . set seed 42

    We initialize the model.

        . ddml init interactiveiv, kfolds(5)

    We use stacking (implemented in pystacked) with two base learners
    for each reduced form equation.

        . ddml E[Y|X,Z]: pystacked $Y c.($X)##c.($X), type(reg) m(ols lassocv)
        . ddml E[D|X,Z]: pystacked $D c.($X)##c.($X), type(class) m(logit lassocv)
        . ddml E[Z|X]: pystacked $Z c.($X)##c.($X), type(class) m(logit lassocv)

    Cross-fitting and estimation.

        . ddml crossfit
        . ddml estimate, robust

    To short-stack instead of stack:

        . set seed 42
        . ddml init interactiveiv, kfolds(5)
        . ddml E[Y|X,Z]: reg $Y $X
        . ddml E[Y|X,Z]: pystacked $Y c.($X)##c.($X), type(reg) m(lassocv)
        . ddml E[D|X,Z]: logit $D $X
        . ddml E[D|X,Z]: pystacked $D c.($X)##c.($X), type(class) m(lassocv)
        . ddml E[Z|X]: logit $Z $X
        . ddml E[Z|X]: pystacked $Z c.($X)##c.($X), type(class) m(lassocv)

    Cross-fitting and estimation.

        . ddml crossfit, shortstack
        . ddml estimate, robust

    Flexible partially linear IV model.

    Preparation: we load the data, define global macros and set the
    seed.

        . use https://github.com/aahrens1/ddml/raw/master/data/BLP.dta, clear
        . global Y share
        . global D price
        . global X hpwt air mpd space
        . global Z sum*
        . set seed 42

    We initialize the model.

        . ddml init fiv

    We add learners for E[Y|X] in the usual way.

        . ddml E[Y|X]: reg $Y $X
        . ddml E[Y|X]: pystacked $Y $X, type(reg)

    There are some peculiarities to bear in mind when adding learners
    for E[D|Z,X] and E[D|X]. The reason is that the estimation of
    E[D|X] depends on the estimation of E[D|X,Z]: we first obtain the
    fitted values D^ = E[D|X,Z] and then fit these against X to
    estimate E[D^|X]. When adding learners for E[D|Z,X], we need to
    provide a name for each learner using learner(name).

        . ddml E[D|Z,X], learner(Dhat_reg): reg $D $X $Z
        . ddml E[D|Z,X], learner(Dhat_pystacked): pystacked $D $X $Z, type(reg)

    When adding learners for E[D|X], we explicitly refer to the
    learner from the previous step (e.g., learner(Dhat_reg)) and also
    provide the name of the treatment variable (vname($D)). Finally,
    we use the placeholder {D} in place of the dependent variable.

        . ddml E[D|X], learner(Dhat_reg) vname($D): reg {D} $X
        . ddml E[D|X], learner(Dhat_pystacked) vname($D): pystacked {D} $X, type(reg)

    That's it. Now we can move to cross-fitting and estimation.

        . ddml crossfit
        . ddml estimate, robust

    If you are curious about what ddml does in the background:
        . ddml estimate, allcombos spec(8) rep(1) robust
        . gen Dtilde = $D - Dhat_pystacked_h_1
        . gen Zopt = Dhat_pystacked_1 - Dhat_pystacked_h_1
        . ivreg Y2_pystacked_1 (Dtilde=Zopt), robust

References

    Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen,
    C., Newey, W. and Robins, J. (2018), Double/debiased machine
    learning for treatment and structural parameters. The Econometrics
    Journal, 21: C1-C68. https://doi.org/10.1111/ectj.12097

Installation

    To get the latest stable version of ddml from our website, check
    the installation instructions at
    https://statalasso.github.io/installation/. We update the stable
    website version more frequently than the SSC version.

    To verify that ddml is correctly installed, click on or type

        . whichpkg ddml

    (which requires whichpkg to be installed; ssc install whichpkg).

Authors

    Achim Ahrens, Public Policy Group, ETH Zurich, Switzerland
    achim.ahrens@gess.ethz.ch

    Christian B. Hansen, University of Chicago, USA
    Christian.Hansen@chicagobooth.edu

    Mark E Schaffer, Heriot-Watt University, UK
    m.e.schaffer@hw.ac.uk

    Thomas Wiemann, University of Chicago, USA
    wiemann@uchicago.edu

Also see

    Help (if installed): lasso2, cvlasso, rlasso, ivlasso, pdslasso,
    pystacked.
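Postscript: the wrapper program qddml, mentioned in the description
    above, condenses the four-step workflow into a single line. As a
    rough sketch only, using the data and globals from the first
    partially linear example (the exact qddml option names and syntax
    may differ across versions; see help qddml before relying on
    this):

        . qddml $Y $D ($X), model(partial) kfolds(2)

    This should produce an estimate comparable to running ddml init,
    ddml eq, ddml crossfit and ddml estimate separately, at the cost
    of less flexibility in choosing and tuning the learners.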
