Partially Linear Model
Preparations
We load the data, define global macros and set the seed.
. use https://github.com/aahrens1/ddml/raw/master/data/sipp1991.dta, clear
. global Y net_tfa
. global D e401
. global X tw age inc fsize educ db marr twoearn pira hown
. set seed 42
Step 1: Initialize DDML model
We next initialize the ddml estimation and select the model. Here, partial refers to the partially linear model. The model will be stored in a Mata object with the default name “m0” unless otherwise specified using the mname(name) option.
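For reference, the partially linear model is usually written as follows. The symbols \(\theta_0\), \(g_0\) and \(U\) are notation assumed here for illustration and do not appear elsewhere in this example:

\[
Y = \theta_0 D + g_0(X) + U, \qquad E[U \mid D, X] = 0,
\]

where \(Y\) is the outcome (net_tfa), \(D\) is the treatment (e401), \(X\) collects the controls in $X, and \(\theta_0\) is the coefficient of interest. The unknown function \(g_0\) is not estimated directly; instead, the two conditional expectations \(E[Y|X]\) and \(E[D|X]\) are estimated with machine learners in the steps that follow.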
Number of folds
Note that we set the number of random folds to 2 so that the model runs quickly. The default is kfolds(5). We recommend considering at least 5-10 folds, and even more if your sample size is small.
. ddml init partial, kfolds(2)
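As a minimal sketch of the two options mentioned above, the following initialization uses five folds and stores the model under an explicit name. The name m1 is hypothetical and chosen only for illustration; the rest of this example continues with the default model m0 and 2 folds.

```stata
* Sketch only: 5 cross-fitting folds, model stored in a Mata object named m1
* (m1 is a hypothetical name; the default would be m0)
ddml init partial, kfolds(5) mname(m1)
```

If a non-default name is used, subsequent ddml commands presumably need to refer to the same model name.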
Step 2: Add machine learners
We add supervised machine learners for estimating the conditional expectation \(E[Y|X]\). We first add simple linear regression.
. ddml E[Y|X]: reg $Y $X
Learner Y1_reg added successfully.
We can add more than one learner per reduced form equation. Here, we also add a random forest learner implemented in pystacked. (In the next example we show how to use pystacked to stack multiple learners, but here we use it to implement a single learner.)
. ddml E[Y|X]: pystacked $Y $X, type(reg) method(rf)
Learner Y2_pystacked added successfully.
We do the same for the conditional expectation E[D|X].
. ddml E[D|X]: reg $D $X
Learner D1_reg added successfully.
. ddml E[D|X]: pystacked $D $X, type(reg) method(rf)
Learner D2_pystacked added successfully.
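Further learners can be added in the same way. The sketch below adds a gradient boosting learner to each equation, assuming the gradboost method of pystacked is available in your installation. These two lines are not part of the run shown in this example, so the output reported below does not include them.

```stata
* Hypothetical additional learners (not used in the results shown below);
* learners must be added before the cross-fitting step
ddml E[Y|X]: pystacked $Y $X, type(reg) method(gradboost)
ddml E[D|X]: pystacked $D $X, type(reg) method(gradboost)
```

With three learners per equation there would be 3 x 3 = 9 possible specifications instead of four.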
Optionally, you can check if the learners have been added correctly.
. ddml desc
Model: partial, crossfit folds k=2, resamples r=1
Dependent variable (Y): net_tfa
net_tfa learners: Y1_reg Y2_pystacked
D equations (1): e401
e401 learners: D1_reg D2_pystacked
Step 3: Cross-fitting
The learners are iteratively fitted on the training data. This step may take a while.
. ddml crossfit
Cross-fitting E[Y|X] equation: net_tfa
Cross-fitting fold 1 2 ...completed cross-fitting
Cross-fitting E[D|X] equation: e401
Cross-fitting fold 1 2 ...completed cross-fitting
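To make the idea of cross-fitting concrete, the following is a hand-rolled sketch with two folds and OLS as the only learner. It is not ddml's internal code: all variable names ending in _m are hypothetical, the random fold split differs from the one ddml draws, and the resulting estimate will therefore not match the ddml output exactly.

```stata
* Manual 2-fold cross-fitting with OLS learners (illustrative sketch only)
preserve
generate double u_m = runiform()
xtile fold_m = u_m, nquantiles(2)        // two roughly equal random folds
generate double yhat_m = .
generate double dhat_m = .
forvalues k = 1/2 {
    * fit on the other fold, predict for the held-out fold k
    quietly regress $Y $X if fold_m != `k'
    quietly predict double tmp_m if fold_m == `k', xb
    quietly replace yhat_m = tmp_m if fold_m == `k'
    drop tmp_m
    quietly regress $D $X if fold_m != `k'
    quietly predict double tmp_m if fold_m == `k', xb
    quietly replace dhat_m = tmp_m if fold_m == `k'
    drop tmp_m
}
* residualize outcome and treatment, then run the final-stage regression
generate double yres_m = $Y - yhat_m
generate double dres_m = $D - dhat_m
regress yres_m dres_m, robust
restore
```

The key point is that each observation's prediction comes from a model fitted without that observation's fold, so the residuals are out-of-sample.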
Step 4: Estimation
Finally, we obtain estimates of the coefficients of interest. Since we added two learners for each of our two reduced form equations, there are four possible specifications. By default, the result shown corresponds to the specification with the lowest out-of-sample MSPE:
. ddml estimate, robust
DDML estimation results:
spec r Y learner D learner b SE
1 1 Y1_reg D1_reg 5397.308(1130.901)
2 1 Y1_reg D2_pystacked 6707.514 (880.374)
* 3 1 Y2_pystacked D1_reg 7044.822(1127.173)
4 1 Y2_pystacked D2_pystacked 6991.835 (755.805)
* = minimum MSE specification for that resample.
Min MSE DDML model, specification 3
y-E[y|X] = Y2_pystacked_1 Number of obs = 9915
D-E[D|X,Z]= D1_reg_1
------------------------------------------------------------------------------
| Robust
net_tfa | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
e401 | 7044.822 1127.173 6.25 0.000 4835.603 9254.042
------------------------------------------------------------------------------
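The selection criterion can be written out as follows. This is a sketch of the usual definition of the out-of-sample mean squared prediction error; the notation is assumed here, and ddml's exact computation is not shown in this example. For learner \(j\) of \(E[Y|X]\),

\[
\mathrm{MSPE}_j = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{\ell}_j^{(-k_i)}(X_i) \right)^2 ,
\]

where \(k_i\) is the fold containing observation \(i\) and \(\hat{\ell}_j^{(-k_i)}\) is learner \(j\) fitted on all folds except \(k_i\). The analogous quantity is computed for each learner of \(E[D|X]\). In the output above, the starred specification combines Y2_pystacked for the outcome equation with D1_reg for the treatment equation.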
To estimate all four specifications, we use the allcombos option:
. ddml estimate, robust allcombos
DDML estimation results:
spec r Y learner D learner b SE
1 1 Y1_reg D1_reg 5397.208(1130.776)
2 1 Y1_reg D2_pystacked 6705.740 (878.656)
* 3 1 Y2_pystacked D1_reg 7044.518(1126.896)
4 1 Y2_pystacked D2_pystacked 6979.699 (753.471)
* = minimum MSE specification for that resample.
Min MSE DDML model
y-E[y|X] = Y2_pystacked_1 Number of obs = 9915
D-E[D|X,Z]= D1_reg_1
------------------------------------------------------------------------------
| Robust
net_tfa | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
e401 | 7044.518 1126.896 6.25 0.000 4835.843 9253.193
_cons | -317.8379 352.8666 -0.90 0.368 -1009.444 373.768
------------------------------------------------------------------------------
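As a quick check on how the columns of the table relate to each other, the z statistic and the 95% confidence interval can be recomputed from the reported coefficient and standard error, assuming a normal approximation. The scalar names below are hypothetical, and the numbers are copied by hand from the table above.

```stata
* Recompute z and the 95% interval from the reported b and SE for e401
scalar b_hat  = 7044.518
scalar se_hat = 1126.896
display "z     = " b_hat/se_hat
display "lower = " b_hat - invnormal(0.975)*se_hat
display "upper = " b_hat + invnormal(0.975)*se_hat
```

Up to rounding, the displayed values should agree with the z, lower bound and upper bound in the e401 row.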
After having estimated all specifications, we can retrieve specific results. Here we use the specification relying on OLS for estimating both E[Y|X] and E[D|X]:
. ddml estimate, robust spec(1) replay
DDML estimation results:
spec r Y learner D learner b SE
opt 1 Y2_pystacked D1_reg 7044.518(1126.896)
opt = minimum MSE specification for that resample.
DDML model, specification 1
y-E[y|X] = Y1_reg_1 Number of obs = 9915
D-E[D|X,Z]= D1_reg_1
------------------------------------------------------------------------------
| Robust
net_tfa | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
e401 | 5397.208 1130.776 4.77 0.000 3180.928 7613.488
_cons | -104.854 397.9023 -0.26 0.792 -884.728 675.0201
------------------------------------------------------------------------------
Inclusion of the constant
Since the residualized outcome and treatment may not be exactly mean-zero in finite samples, ddml includes the constant by default in the estimation stage of partially linear models. Asymptotically, the intercept is not required. Earlier versions of ddml (before 1.2) did not include the constant.
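The role of the constant can be seen from the final-stage estimator. Writing \(\tilde{Y}_i\) and \(\tilde{D}_i\) for the residualized outcome and treatment (notation assumed here for illustration), the slope without a constant is

\[
\hat{\theta} = \frac{\sum_{i=1}^{n} \tilde{D}_i \tilde{Y}_i}{\sum_{i=1}^{n} \tilde{D}_i^{2}} ,
\]

whereas with a constant it is

\[
\hat{\theta} = \frac{\sum_{i=1}^{n} (\tilde{D}_i - \bar{\tilde{D}})(\tilde{Y}_i - \bar{\tilde{Y}})}{\sum_{i=1}^{n} (\tilde{D}_i - \bar{\tilde{D}})^{2}} ,
\]

where \(\bar{\tilde{D}}\) and \(\bar{\tilde{Y}}\) are the sample means of the residuals. The two expressions coincide when both residual means are exactly zero, which is why they can differ slightly in finite samples.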
You could manually retrieve the same point estimate by typing:
. reg Y1_reg D1_reg, robust
Linear regression Number of obs = 9,915
F(1, 9914) = 22.78
Prob > F = 0.0000
R-squared = 0.0037
Root MSE = 39626
------------------------------------------------------------------------------
| Robust
Y1_reg_1 | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
D1_reg_1 | 5397.308 1130.901 4.77 0.000 3180.512 7614.105
------------------------------------------------------------------------------
or graphically:
. twoway (scatter Y1_reg D1_reg) (lfit Y1_reg D1_reg)
where Y1_reg and D1_reg are the orthogonalized versions of net_tfa and e401.
To describe the ddml model setup or results in detail, you can use ddml describe with the relevant option (sample, learners, crossfit, estimates), or just describe them all with the all option:
. ddml describe, all