Interactive Model #
Preparations: we load the data, define global macros and set the seed.
. webuse cattaneo2, clear
(Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138–154)
. global Y bweight
. global D mbsmoke
. global X mage prenatal1 mmarried fbaby medu
. set seed 42
Step 1: Initialization #
We use 5 folds and 5 resamplings; that is, we estimate the model 5 times, each time using a different random split of the sample into folds.
. ddml init interactive, kfolds(5) reps(5)
Step 2: Adding learners #
We need to estimate the conditional expectations \(E[Y|X,D=0]\), \(E[Y|X,D=1]\) and \(E[D|X]\). The first two conditional expectations are added jointly.
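For reference, the interactive model behind these conditional expectations can be written as follows (a sketch following the notation of Chernozhukov et al. (2018); the ddml documentation may use slightly different symbols):

\[ Y = g(D, X) + u, \qquad E[u|X,D] = 0 \]
\[ D = m(X) + v, \qquad E[v|X] = 0 \]

where \(g_0(X) = E[Y|X,D=0]\) and \(g_1(X) = E[Y|X,D=1]\) are the two outcome equations and \(m(X) = E[D|X]\) is the propensity score.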
We consider two supervised learners for each conditional expectation: a parametric regression (OLS for the outcome, logit for the binary treatment) and gradient-boosted trees (implemented in pystacked). Note that we use gradient-boosted regression trees for \(E[Y|X,D]\), but gradient-boosted classification trees for \(E[D|X]\).
. ddml E[Y|X,D]: reg $Y $X
Learner Y1_reg added successfully.
. ddml E[Y|X,D]: pystacked $Y $X, type(reg) method(gradboost)
Learner Y2_pystacked added successfully.
. ddml E[D|X]: logit $D $X
Learner D1_logit added successfully.
. ddml E[D|X]: pystacked $D $X, type(class) method(gradboost)
Learner D2_pystacked added successfully.
Step 3: Cross-fitting #
. ddml crossfit
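Under the hood, cross-fitting splits the sample into kfolds(5) folds, fits each learner on all folds but one, and predicts on the held-out fold, so every observation receives an out-of-sample prediction; with reps(5) the whole procedure is repeated on 5 independent fold splits. A minimal Python sketch of the idea, using scikit-learn in place of Stata's learners (all variable names here are illustrative, not part of ddml):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 200
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=n)

def crossfit_predict(X, y, n_splits=5, seed=0):
    """Out-of-fold predictions: each unit is predicted by a model
    that never saw it during fitting."""
    yhat = np.empty(len(y))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(X):
        model = LinearRegression().fit(X[train_idx], y[train_idx])
        yhat[test_idx] = model.predict(X[test_idx])
    return yhat

# reps(5): repeat cross-fitting on 5 independent random fold splits
yhats = [crossfit_predict(X, y, n_splits=5, seed=r) for r in range(5)]
```

Because predictions are always out-of-fold, overfitting in the learners does not bias the downstream treatment-effect estimate.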
Step 4: Estimation #
In the final estimation step, we can estimate either the average treatment effect (the default) or the average treatment effect on the treated (option atet; output not shown).
. ddml estimate
DDML estimation results (ATE):
spec  r    Y0 learner    Y1 learner     D learner          b         SE
 opt  1  Y2_pystacked  Y2_pystacked  D2_pystacked   -219.583   (26.027)
 opt  2  Y2_pystacked  Y2_pystacked  D2_pystacked   -220.967   (25.586)
 opt  3  Y2_pystacked  Y2_pystacked  D2_pystacked   -227.103   (26.256)
 opt  4  Y2_pystacked  Y2_pystacked  D2_pystacked   -221.207   (25.830)
 opt  5  Y2_pystacked  Y2_pystacked  D2_pystacked   -224.497   (25.840)
opt = minimum MSE specification for that resample.
Mean/med.  Y0 learner    Y1 learner     D learner          b         SE
mse mn      [min-mse]         [mse]         [mse]   -222.672   (26.045)
mse md      [min-mse]         [mse]         [mse]   -221.207   (26.049)
Median over 5 min-mse specifications (ATE)
E[y|X,D=0] = Y2_pystacked                          Number of obs = 4642
E[y|X,D=1] = Y2_pystacked
E[D|X]     = D2_pystacked
------------------------------------------------------------------------------
| Robust
bweight | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
mbsmoke | -221.2069 26.0486 -8.49 0.000 -272.2612 -170.1526
------------------------------------------------------------------------------
Summary over 5 resamples:
D eqn mean min p25 p50 p75 max
mbsmoke -222.6716 -227.1032 -224.4971 -221.2069 -220.9675 -219.5831
. qui ddml estimate, atet
Recall that we specified 5 resampling iterations (reps(5)). By default, the median over the minimum-MSE specifications of the resampling iterations is shown. At the bottom, a table of summary statistics over resampling iterations is shown.
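These point estimates are obtained from the cross-fitted predictions via the doubly robust (AIPW) score of Chernozhukov et al. (2018). Writing \(\hat g_d(X_i)\) for the cross-fitted estimate of \(E[Y|X,D=d]\) and \(\hat m(X_i)\) for the estimated propensity score, the ATE estimator is (a sketch of the standard formula, omitting practical details such as the trimming of extreme propensity scores):

\[ \hat\theta_{ATE} = \frac{1}{n} \sum_{i=1}^{n} \left[ \hat g_1(X_i) - \hat g_0(X_i) + \frac{D_i\,(Y_i - \hat g_1(X_i))}{\hat m(X_i)} - \frac{(1-D_i)\,(Y_i - \hat g_0(X_i))}{1 - \hat m(X_i)} \right] \]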
Short-stacking #
To estimate using the same two base learners but with short-stacking instead of stacking, we would enter the learners separately and use the shortstack option:
. set seed 42
. ddml init interactive, kfolds(5) reps(5)
. ddml E[Y|X,D]: reg $Y $X
. ddml E[Y|X,D]: pystacked $Y $X, type(reg) method(gradboost)
. ddml E[D|X]: logit $D $X
. ddml E[D|X]: pystacked $D $X, type(class) method(gradboost)
. ddml crossfit, shortstack
. ddml estimate
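Short-stacking pools the out-of-fold predictions of all base learners over the full sample and combines them with a single constrained (non-negative) least-squares regression, rather than stacking separately within each cross-fitting fold. A minimal Python sketch of the weight computation, assuming the out-of-fold prediction columns are already in hand (the nnls-plus-normalization step mirrors the usual constrained least-squares combination; ddml's exact implementation may differ in details):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(42)
n = 500
y = rng.normal(size=n)

# Columns = out-of-fold predictions from each base learner
# (simulated here: learner 1 is informative, learner 2 is pure noise).
Z = np.column_stack([y + 0.3 * rng.normal(size=n),
                     rng.normal(size=n)])

w, _ = nnls(Z, y)          # non-negative least-squares weights
w = w / w.sum()            # normalize to a convex combination
y_shortstack = Z @ w       # short-stacked ensemble prediction
```

Because the combination is estimated once on the pooled out-of-fold predictions, short-stacking is considerably cheaper than re-estimating stacking weights inside every fold, at the cost of a slightly less flexible ensemble.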