Interactive Model #

Preparations: we load the data, define global macros and set the seed.

. webuse cattaneo2, clear
(Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138–154)
. global Y bweight
. global D mbsmoke
. global X mage prenatal1 mmarried fbaby medu
. set seed 42

Step 1: Initialization #

We use 5 folds and 5 resamplings; that is, we estimate the model 5 times, each time with a new random split of the sample into 5 folds.

. ddml init interactive, kfolds(5) reps(5)
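Conceptually, the kfolds(5) reps(5) options amount to drawing five independent random 5-fold splits of the sample. A minimal Python sketch of that idea (not ddml's internal code; make_folds is a hypothetical helper):

```python
import random

def make_folds(n, kfolds, reps, seed=42):
    """Return `reps` fold assignments, each mapping observations
    0..n-1 to a fold label 1..kfolds at random."""
    rng = random.Random(seed)
    splits = []
    for _ in range(reps):
        ids = list(range(n))
        rng.shuffle(ids)                     # fresh random split per resample
        fold_of = [0] * n
        for pos, i in enumerate(ids):
            fold_of[i] = pos % kfolds + 1    # near-equal fold sizes
        splits.append(fold_of)
    return splits

splits = make_folds(n=4642, kfolds=5, reps=5)
```

Each of the 5 resampling iterations then runs the full cross-fitting and estimation cycle on its own split.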

Step 2: Adding learners #

We need to estimate the conditional expectations \(E[Y|X,D=0]\), \(E[Y|X,D=1]\) and \(E[D|X]\). The first two conditional expectations are added jointly.

We consider two supervised learners for each conditional expectation: a parametric regression (linear for the outcome, logistic for the treatment) and gradient-boosted trees implemented in pystacked. Note that we use gradient-boosted regression trees for E[Y|X,D], but gradient-boosted classification trees for E[D|X].

. ddml E[Y|X,D]: reg $Y $X
Learner Y1_reg added successfully.
. ddml E[Y|X,D]: pystacked $Y $X, type(reg) method(gradboost)
Learner Y2_pystacked added successfully.
. ddml E[D|X]: logit $D $X
Learner D1_logit added successfully.
. ddml E[D|X]: pystacked $D $X, type(class) method(gradboost)
Learner D2_pystacked added successfully.

Step 3: Cross-fitting #

. ddml crossfit
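Cross-fitting ensures that every fitted value is out-of-sample: for each fold, the learners are fit on the remaining folds and predict only the held-out fold. An illustrative Python sketch of the loop (not ddml's internals; the "learner" here is just a training-sample mean standing in for reg/pystacked):

```python
def crossfit(y, fold_of, kfolds):
    """Out-of-fold fitted values: for fold k, fit on all other folds,
    predict on fold k."""
    yhat = [None] * len(y)
    for k in range(1, kfolds + 1):
        train = [y[i] for i in range(len(y)) if fold_of[i] != k]
        fit = sum(train) / len(train)        # "estimate" on training folds
        for i in range(len(y)):
            if fold_of[i] == k:
                yhat[i] = fit                # predict on the held-out fold
    return yhat

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fold_of = [1, 2, 3, 1, 2, 3]
print(crossfit(y, fold_of, 3))  # → [4.0, 3.5, 3.0, 4.0, 3.5, 3.0]
```

ddml does this for every learner added in Step 2 and stores the cross-fitted predictions for use in Step 4.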

Step 4: Estimation #

In the final estimation step, we can estimate either the average treatment effect (the default) or the average treatment effect on the treated (atet; output not shown).

. ddml estimate

DDML estimation results (ATE):
spec  r    Y0 learner    Y1 learner     D learner         b        SE
 opt  1  Y2_pystacked  Y2_pystacked  D2_pystacked  -219.583  (26.027)
 opt  2  Y2_pystacked  Y2_pystacked  D2_pystacked  -220.967  (25.586)
 opt  3  Y2_pystacked  Y2_pystacked  D2_pystacked  -227.103  (26.256)
 opt  4  Y2_pystacked  Y2_pystacked  D2_pystacked  -221.207  (25.830)
 opt  5  Y2_pystacked  Y2_pystacked  D2_pystacked  -224.497  (25.840)
opt = minimum MSE specification for that resample.

Mean/med.  Y0 learner    Y1 learner     D learner         b        SE
 mse mn     [min-mse]         [mse]         [mse]  -222.672  (26.045)
 mse md     [min-mse]         [mse]         [mse]  -221.207  (26.049)

Median over 5 min-mse specifications (ATE)
E[y|X,D=0]   = Y2_pystacked                        Number of obs   =      4642
E[y|X,D=1]   = Y2_pystacked
E[D|X]       = D2_pystacked
------------------------------------------------------------------------------
             |               Robust
     bweight | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     mbsmoke |  -221.2069    26.0486    -8.49   0.000    -272.2612   -170.1526
------------------------------------------------------------------------------

Summary over 5 resamples:
       D eqn      mean       min       p25       p50       p75       max
     mbsmoke   -222.6716 -227.1032 -224.4971 -221.2069 -220.9675 -219.5831

. qui ddml estimate, atet

Recall that we have specified 5 resampling iterations (reps(5)). By default, the median over the minimum-MSE specifications per resampling iteration is reported. At the bottom, a table of summary statistics over the resampling iterations is shown.
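The ATE itself is the sample average of the doubly robust (AIPW) score built from the cross-fitted predictions. A hedged Python sketch of that average, where g0hat, g1hat and mhat are hypothetical stand-ins for the cross-fitted estimates of E[Y|X,D=0], E[Y|X,D=1] and E[D|X]:

```python
def aipw_ate(y, d, g0hat, g1hat, mhat):
    """Average of the AIPW score:
    g1(X) - g0(X) + D(Y - g1(X))/m(X) - (1-D)(Y - g0(X))/(1 - m(X))."""
    n = len(y)
    psi = [
        g1hat[i] - g0hat[i]
        + d[i] * (y[i] - g1hat[i]) / mhat[i]             # treated correction
        - (1 - d[i]) * (y[i] - g0hat[i]) / (1 - mhat[i]) # control correction
        for i in range(n)
    ]
    return sum(psi) / n

# With perfect outcome fits and a constant effect of 2, the score is 2 for
# every observation:
print(aipw_ate([3.0, 1.0], [1, 0], [1.0, 1.0], [3.0, 3.0], [0.5, 0.5]))  # → 2.0
```

The correction terms reweight the residuals by the (inverse) propensity score, which is what makes the estimator doubly robust.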

Short-stacking #

To estimate using the same two base learners but with short-stacking instead of stacking, we enter the learners separately and add the shortstack option to ddml crossfit:

. set seed 42
. ddml init interactive, kfolds(5) reps(5)
. ddml E[Y|X,D]: reg $Y $X
. ddml E[Y|X,D]: pystacked $Y $X, type(reg) method(gradboost)
. ddml E[D|X]: logit $D $X
. ddml E[D|X]: pystacked $D $X, type(class) method(gradboost)
. ddml crossfit, shortstack
. ddml estimate
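Short-stacking combines the base learners' cross-fitted predictions by constrained least squares on the full sample. With two learners this reduces to choosing a single convex weight; the Python sketch below (shortstack_weight is a hypothetical illustration, not ddml code) approximates the least-squares solution by grid search over that weight:

```python
def shortstack_weight(y, pred_a, pred_b, grid=101):
    """Weight w in [0, 1] on pred_a (1 - w on pred_b) that minimizes
    the mean squared error of the combined out-of-fold prediction."""
    best_w, best_mse = 0.0, float("inf")
    for j in range(grid):
        w = j / (grid - 1)
        mse = sum((y[i] - (w * pred_a[i] + (1 - w) * pred_b[i])) ** 2
                  for i in range(len(y))) / len(y)
        if mse < best_mse:
            best_w, best_mse = w, mse
    return best_w

# If learner A is exactly right and learner B is biased upward by 1,
# all weight goes to A:
y = [1.0, 2.0, 3.0]
print(shortstack_weight(y, [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # → 1.0
```

Because the weights are fit once on the pooled out-of-fold predictions rather than separately within each fold, short-stacking is considerably cheaper than standard stacking.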