Partially Linear IV

Partially Linear IV Model #

Preparations #

We load the data, define global macros and set the seed.

. use https://statalasso.github.io/dta/AJR.dta, clear
. global Y logpgp95
. global D avexpr
. global Z logem4
. global X lat_abst edes1975 avelf temp* humid* steplow-oilres
. set seed 42
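
Since $X mixes explicit variable names with wildcard and range abbreviations (temp*, humid*, steplow-oilres), it can be worth confirming which controls they expand to before estimating, for example with:

. describe $Y $D $Z $X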

Step 1: Initialization #

Since the data set is very small (64 observations), we use 30 cross-fitting folds, so that each learner is trained on as large a share of the sample as possible.

. ddml init iv, kfolds(30)
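
As an aside (not used in the rest of this example), our reading of the ddml help file is that ddml init also accepts a reps() option, which averages results over several cross-fitting splits and can reduce sensitivity to the random fold assignment in small samples:

. * alternative initialization, not run here; reps() per our reading of the help file
. ddml init iv, kfolds(30) reps(5)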

Step 2: Adding learners #

The partially linear IV model has three conditional expectations: \(E[Y|X]\), \(E[D|X]\) and \(E[Z|X]\). For each conditional expectation, we add two learners: linear regression (regress) and random forest (rforest).
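
For reference, here is a sketch of the partially linear IV model behind these reduced forms (our notation; it follows the standard DDML formulation rather than anything printed by ddml itself):

\[ Y = \theta_0 D + g_0(X) + \varepsilon, \qquad E[\varepsilon|X,Z] = 0. \]

After cross-fitting the three conditional expectations, \(\theta_0\) is estimated by instrumenting the residualized treatment with the residualized instrument:

\[ \hat{\theta} = \frac{\sum_i \big(Z_i - \widehat{E}[Z|X_i]\big)\big(Y_i - \widehat{E}[Y|X_i]\big)}{\sum_i \big(Z_i - \widehat{E}[Z|X_i]\big)\big(D_i - \widehat{E}[D|X_i]\big)}. \]

This is exactly the IV regression run by hand in the Manual estimation section below.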

We need to add the option vtype(none) for rforest to work with ddml since rforest’s predict command doesn’t support variable types.

. ddml E[Y|X]: reg $Y $X
Learner Y1_reg added successfully.
. ddml E[Y|X], vtype(none): rforest $Y $X, type(reg)
Learner Y2_rforest added successfully.
. ddml E[D|X]: reg $D $X
Learner D1_reg added successfully.
. ddml E[D|X], vtype(none): rforest $D $X, type(reg)
Learner D2_rforest added successfully.
. ddml E[Z|X]: reg $Z $X
Learner Z1_reg added successfully.
. ddml E[Z|X], vtype(none): rforest $Z $X, type(reg)
Learner Z2_rforest added successfully.
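
Before cross-fitting, the model setup can be checked with ddml's describe subcommand, which (as we understand the package) lists the equations and the learners attached to each:

. ddml describe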

Step 3/4: Cross-fitting and estimation #

We use the shortstack option to combine the base learners. Short-stacking is a computationally cheaper alternative to stacking. Whereas stacking relies on cross-validated predicted values to obtain the relative weights for the base learners, short-stacking uses the cross-fitted predicted values.
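
As a rough illustration of the idea only (this is not ddml's internal code, and yhat_reg and yhat_rf are hypothetical names for the cross-fitted predictions of \(E[Y|X]\) from the two learners), the short-stacking weights for the outcome equation roughly amount to a regression of the outcome on the learners' cross-fitted predictions, which ddml by default additionally constrains to be non-negative (as we understand the package):

. * illustration only; yhat_reg and yhat_rf are hypothetical variable names
. reg $Y yhat_reg yhat_rf, nocons
. predict double yhat_ss, xb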

. qui ddml crossfit, shortstack

. ddml estimate, robust

DDML estimation results:
spec  r     Y learner     D learner         b        SE     Z learner
 opt  1    Y2_rforest    D2_rforest     0.772  ( 0.207)              
  ss  1  [shortstack]          [ss]     0.716  ( 0.196)          [ss]
opt = minimum MSE specification for that resample.

Shortstack DDML model
y-E[y|X]  = logpgp95_ss_1                          Number of obs   =        64
D-E[D|X,Z]= avexpr_ss_1
Z-E[Z|X]  = logem4_ss_1
------------------------------------------------------------------------------
             |               Robust
    logpgp95 | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      avexpr |   .7158468   .1958356     3.66   0.000     .3320162    1.099677
       _cons |  -.0308525   .0914993    -0.34   0.736    -.2101878    .1484828
------------------------------------------------------------------------------

Manual estimation #

If you are curious about what ddml does in the background, you can list all learner combinations and then replicate the selected specification manually:

. ddml estimate, allcombos spec(8) rep(1) robust

DDML estimation results:
spec  r     Y learner     D learner         b        SE     Z learner
   1  1        Y1_reg        D1_reg     0.378  ( 0.125)        Z1_reg
   2  1        Y1_reg        D1_reg    -0.187  ( 1.573)    Z2_rforest
   3  1        Y1_reg    D2_rforest     2.413  ( 3.594)        Z1_reg
   4  1        Y1_reg    D2_rforest     0.083  ( 0.475)    Z2_rforest
   5  1    Y2_rforest        D1_reg     0.123  ( 0.207)        Z1_reg
   6  1    Y2_rforest        D1_reg    -1.749  ( 4.690)    Z2_rforest
   7  1    Y2_rforest    D2_rforest     0.783  ( 0.504)        Z1_reg
*  8  1    Y2_rforest    D2_rforest     0.772  ( 0.207)    Z2_rforest
  ss  1  [shortstack]          [ss]     0.716  ( 0.196)          [ss]
* = minimum MSE specification for that resample.

Min MSE DDML model, specification 8
y-E[y|X]  = Y2_rforest_1                           Number of obs   =        64
D-E[D|X,Z]= D2_rforest_1
Z-E[Z|X]  = Z2_rforest_1
------------------------------------------------------------------------------
             |               Robust
    logpgp95 | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      avexpr |    .772314   .2068282     3.73   0.000     .3669382     1.17769
       _cons |  -.0119092   .1009289    -0.12   0.906    -.2097263    .1859079
------------------------------------------------------------------------------

The reported coefficient is simply an IV regression of the residualized outcome on the residualized treatment, with the residualized instrument; ddml stores these residuals as Y2_rforest_1, D2_rforest_1 and Z2_rforest_1 (abbreviated below). As the output shows, the point estimate and standard error match exactly; the confidence interval differs slightly only because ivreg uses the t rather than the normal distribution.

. ivreg Y2_rf (D2_rf = Z2_rf), robust

Instrumental variables 2SLS regression          Number of obs     =         64
                                                F(1, 62)          =      13.94
                                                Prob > F          =     0.0004
                                                R-squared         =          .
                                                Root MSE          =     .80209

------------------------------------------------------------------------------
             |               Robust
Y2_rforest_1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
D2_rforest_1 |    .772314   .2068282     3.73   0.000     .3588703    1.185758
       _cons |  -.0119092   .1009289    -0.12   0.906    -.2136633    .1898448
------------------------------------------------------------------------------
Instrumented: D2_rforest_1
 Instruments: Z2_rforest_1
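
Analogously, the short-stacked estimate from Step 3/4 should be reproducible by hand. Per the header of the shortstack output above, the residualized variables are stored as logpgp95_ss_1, avexpr_ss_1 and logem4_ss_1, so (assuming the naming carries over as printed) the following IV regression should return the same point estimate of roughly 0.716:

. ivreg logpgp95_ss_1 (avexpr_ss_1 = logem4_ss_1), robust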