--------------------------------------------------------------------------------
help ddml                                                                   v1.2
--------------------------------------------------------------------------------

Title

    qddml -- Stata program for Double Debiased Machine Learning

    ddml implements algorithms for causal inference aided by supervised machine
    learning as proposed in "Double/debiased machine learning for treatment and
    structural parameters" (Econometrics Journal, 2018). Five different models
    are supported, allowing for binary or continuous treatment variables and
    endogeneity, high-dimensional controls and/or instrumental variables. ddml
    supports a variety of different ML programs, including but not limited to
    lassopack and pystacked.

    qddml is a wrapper program for ddml. It provides a convenient one-line
    syntax with almost the full flexibility of ddml. The main restriction of
    qddml is that it can be used with only one machine learning program at a
    time, while ddml allows for multiple learners per reduced form equation.

    qddml uses stacking regression (pystacked) as the default machine learning
    program.

    qddml relies on crossfit, which can be used as a standalone program.

Syntax

    qddml depvar regressors [(hd_controls)] [(endog=instruments)] [if exp]
        [in range] , model(name) [cmd(string) cmdopt(string) mname(string)
        noreg ...]

    Since qddml uses pystacked by default, it requires Stata 16 or higher,
    Python 3.x and at least scikit-learn 0.24. See this help file, this Stata
    blog entry and this YouTube video for how to set up Python on your system.
    In short, install Python 3.x (we recommend Anaconda) and set the
    appropriate Python path using python set exec. If you don't have Stata
    16+, you can still use qddml with programs that don't rely on Python,
    e.g., using the option cmd(rlasso).
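The Python prerequisites described above can be checked from within Stata before running qddml with its default learner. The following is a minimal sketch using Stata's built-in python commands (Stata 16 or higher); the path passed to python set exec is a hypothetical placeholder and must be replaced with the location of your own Python executable.

```stata
* Show which Python installation (if any) Stata is currently linked to
python query

* List the Python installations Stata can find on this machine
python search

* Point Stata at a specific installation.
* NOTE: "/path/to/python" is a hypothetical placeholder, not a real path.
python set exec "/path/to/python", permanently

* Check that scikit-learn can be found by the linked Python
python which sklearn

* Print the scikit-learn version (0.24 or later is needed by pystacked)
python:
import sklearn
print(sklearn.__version__)
end
```

If python which sklearn reports that the module cannot be found, scikit-learn likely needs to be installed into the Python environment that Stata is linked to.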
    Please check the examples provided at the end of the help file.

Options

    General               Description
    ----------------------------------------------------------------------------
    model(name)           the model to be estimated; allows for partial,
                          interactive, iv, fiv, late. See here for an overview.
    mname(string)         name of the DDML model. Allows multiple DDML models to
                          be run simultaneously. Defaults to m0.
    kfolds(integer)       number of cross-fitting folds. The default is 5.
    fcluster(varname)     cluster identifiers for cluster randomization of
                          random folds.
    foldvar(varname)      integer variable with user-specified cross-fitting
                          folds.
    reps(integer)         number of re-sampling iterations, i.e., how often the
                          cross-fitting procedure is repeated on randomly
                          generated folds.
    shortstack            asks for short-stacking to be used. Short-stacking
                          runs constrained non-negative least squares on the
                          cross-fitted predicted values to obtain a weighted
                          average of several base learners.
    robust                report SEs that are robust to the presence of
                          arbitrary heteroskedasticity.
    vce(type)             select variance-covariance estimator; see here.
    cluster(varname)      select cluster-robust variance-covariance estimator.
    noreg                 do not add regress as an additional learner.

    Learners              Description
    ----------------------------------------------------------------------------
    cmd(string)           ML program used for estimating conditional
                          expectations. Defaults to pystacked. See here for
                          other supported programs.
    ycmd(string)          ML program used for estimating the conditional
                          expectations of the outcome Y. Defaults to
                          cmd(string).
    dcmd(string)          ML program used for estimating the conditional
                          expectations of the treatment variable(s) D. Defaults
                          to cmd(string).
    zcmd(string)          ML program used for estimating conditional
                          expectations of instrumental variable(s) Z. Defaults
                          to cmd(string).
    *cmdopt(string)       options that are passed on to the ML program.
                          The asterisk * can be replaced with either nothing
                          (setting the default for all reduced form equations),
                          y (setting the default for the conditional expectation
                          of Y), d (setting the default for D) or z (setting the
                          default for Z).
    *vtype(string)        variable type of the variable to be created. Defaults
                          to double. none can be used to leave the type field
                          blank (this is required when using ddml with rforest).
                          The asterisk * can be replaced with either nothing
                          (setting the default for all reduced form equations),
                          y (setting the default for the conditional expectation
                          of Y), d (setting the default for D) or z (setting the
                          default for Z).
    *predopt(string)      predict option to be used to get predicted values.
                          Typical values could be xb or pr. Default is blank.
                          The asterisk * can be replaced with either nothing
                          (setting the default for all reduced form equations),
                          y (setting the default for the conditional expectation
                          of Y), d (setting the default for D) or z (setting the
                          default for Z).

    Output                Description
    ----------------------------------------------------------------------------
    verbose               show detailed output
    vverbose              show even more output

Models

    See here.

Compatible programs

    See here.

Examples

    Below we demonstrate the use of qddml for each of the 5 models supported.
    Note that estimation models are chosen for demonstration purposes only and
    kept simple to allow you to run the code quickly.

    Please also see the examples in the ddml help file.

    Partially linear model.

    Preparations: we load the data, define global macros and set the seed.

        . use https://github.com/aahrens1/ddml/raw/master/data/sipp1991.dta, clear
        . global Y net_tfa
        . global D e401
        . global X tw age inc fsize educ db marr twoearn pira hown
        . set seed 42

    The option model(partial) selects the partially linear model and kfolds(2)
    selects two cross-fitting folds. We use the options cmd() and cmdopt() to
    select random forest for estimating the conditional expectations.
    Note that we set the number of random folds to 2 so that the model runs
    quickly. The default is kfolds(5). We recommend using at least 5-10 folds,
    and even more if your sample size is small.

    We also recommend re-running the model multiple times on different random
    folds; see option reps(integer).

        . qddml $Y $D ($X), kfolds(2) model(partial) cmd(pystacked) cmdopt(type(reg) method(rf))

    Partially linear IV model.

    Preparations: we load the data, define global macros and set the seed.

        . use https://statalasso.github.io/dta/AJR.dta, clear
        . global Y logpgp95
        . global D avexpr
        . global Z logem4
        . global X lat_abst edes1975 avelf temp* humid* steplow-oilres
        . set seed 42

    Since the data set is very small, we consider 30 cross-fitting folds. We
    need to add the option vtype(none) for rforest to work with ddml since
    rforest's predict command doesn't support variable types.

        . qddml $Y ($X) ($D=$Z), kfolds(30) model(iv) cmd(rforest) cmdopt(type(reg)) vtype(none) robust

    Interactive model--ATE and ATET estimation.

    Preparations: we load the data, define global macros and set the seed.

        . webuse cattaneo2, clear
        . global Y bweight
        . global D mbsmoke
        . global X mage prenatal1 mmarried fbaby mage medu
        . set seed 42

    Note that we use gradient boosted regression trees for E[Y|X,D] (see
    ycmdopt()), but gradient boosted classification trees for E[D|X] (see
    dcmdopt()).

        . qddml $Y $D ($X), kfolds(5) reps(5) model(interactive) cmd(pystacked) ycmdopt(type(reg) method(gradboost)) dcmdopt(type(class) method(gradboost))

    qddml reports the ATE by default. The option atet returns the ATET
    estimate.

    If we want to retrieve the ATET estimate after estimation, we can simply
    use ddml estimate.

        . ddml estimate, atet

    Interactive IV model--LATE estimation.

    Preparations: we load the data, define global macros and set the seed.

        . use http://fmwww.bc.edu/repec/bocode/j/jtpa.dta, clear
        . global Y earnings
        . global D training
        . global Z assignmt
        . global X sex age married black hispanic
        . set seed 42

        . qddml $Y (c.($X)# #c($X)) ($D=$Z), kfolds(5) model(interactiveiv) cmd(pystacked) ycmdopt(type(reg) m(lassocv)) dcmdopt(type(class) m(lassocv)) zcmdopt(type(class) m(lassocv))

    Flexible Partially Linear IV model.

    Preparations: we load the data, define global macros and set the seed.

        . use https://github.com/aahrens1/ddml/raw/master/data/BLP.dta, clear
        . global Y share
        . global D price
        . global X hpwt air mpd space
        . global Z sum*
        . set seed 42

    The syntax is the same as in the Partially Linear IV model, but we now
    estimate the optimal instrument flexibly.

        . qddml $Y ($X) ($D=$Z), model(fiv)

References

    Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C.,
    Newey, W. and Robins, J. (2018), Double/debiased machine learning for
    treatment and structural parameters. The Econometrics Journal, 21: C1-C68.
    https://doi.org/10.1111/ectj.12097

Installation

    To get the latest stable version of ddml from our website, check the
    installation instructions at https://statalasso.github.io/installation/.
    We update the stable website version more frequently than the SSC version.

    To verify that ddml is correctly installed, click on or type whichpkg ddml
    (which requires whichpkg to be installed; ssc install whichpkg).

Authors

    Achim Ahrens, Public Policy Group, ETH Zurich, Switzerland
    achim.ahrens@gess.ethz.ch

    Christian B. Hansen, University of Chicago, USA
    Christian.Hansen@chicagobooth.edu

    Mark E Schaffer, Heriot-Watt University, UK
    m.e.schaffer@hw.ac.uk

    Thomas Wiemann, University of Chicago, USA
    wiemann@uchicago.edu

Also see (if installed)

    Help: lasso2, cvlasso, rlasso, ivlasso, pdslasso, pystacked.
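Because qddml is a wrapper for ddml, the partially linear model from the Examples section can also be written out step by step. The sketch below assumes the ddml subcommands (init, the E[Y|X] and E[D|X] equation declarations, crossfit and estimate) behave as described in the ddml help file. It is meant only to illustrate roughly what the one-line call does and is not guaranteed to reproduce it exactly. In particular, qddml adds regress as an additional learner unless noreg is specified, so the regress lines here are an assumption about how that default translates.

```stata
* Load the data and define the same macros as in the qddml example
use https://github.com/aahrens1/ddml/raw/master/data/sipp1991.dta, clear
global Y net_tfa
global D e401
global X tw age inc fsize educ db marr twoearn pira hown
set seed 42

* Step 1: initialise a partially linear model with 2 cross-fitting folds
ddml init partial, kfolds(2)

* Step 2: declare learners for E[Y|X] and E[D|X] (random forest via pystacked)
ddml E[Y|X]: pystacked $Y $X, type(reg) method(rf)
ddml E[D|X]: pystacked $D $X, type(reg) method(rf)

* Assumed counterpart of qddml's default extra learner (omit to mimic noreg)
ddml E[Y|X]: regress $Y $X
ddml E[D|X]: regress $D $X

* Step 3: cross-fit the conditional expectations
ddml crossfit

* Step 4: estimate the coefficient on the treatment variable
ddml estimate
```

Writing the steps out this way is mainly useful when more than one learner per reduced form equation is wanted, which is the flexibility that qddml trades away for its one-line syntax.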

**qddml help file**