Partial Linear Model (PLM)

## Partially Linear Model

### Preparations

We load the data, define global macros and set the seed.

. use https://github.com/aahrens1/ddml/raw/master/data/sipp1991.dta, clear
. global Y net_tfa
. global D e401
. global X tw age inc fsize educ db marr twoearn pira hown
. set seed 42
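Before initializing the model, it can help to confirm that the macros expand to the intended variables. The following is a minimal sketch that assumes the data and macros defined above are in memory; it uses only standard Stata commands (display, describe, summarize).

```stata
* Show the variable lists the global macros expand to
display "Y: $Y"
display "D: $D"
display "X: $X"

* Inspect storage types and summary statistics of the model variables
describe $Y $D $X
summarize $Y $D $X
```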


### Step 1: Initialize DDML model

Next, we initialize the ddml estimation and select the model; partial refers to the partially linear model. The model is stored in a Mata object with the default name “m0” unless another name is specified with the mname(name) option.
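For reference, the partially linear model is usually written as follows. The symbols $$\theta$$, $$g$$, $$m$$, $$\ell$$, $$U$$ and $$V$$ are not defined on this page and are introduced here only for illustration:

$$
\begin{aligned}
Y &= \theta D + g(X) + U, & E[U \mid D, X] &= 0,\\
D &= m(X) + V, & E[V \mid X] &= 0.
\end{aligned}
$$

Writing $$\ell(X) = E[Y|X]$$ and noting that $$m(X) = E[D|X]$$, the model implies

$$
Y - \ell(X) = \theta \, \bigl(D - m(X)\bigr) + U,
$$

which is why the steps on this page estimate the two conditional expectations and then regress the residualized outcome on the residualized treatment.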

Number of folds
Note that we set the number of random folds to 2 so that the model runs quickly. The default is kfolds(5). We recommend using at least 5-10 folds, and more if your sample size is small.
. ddml init partial, kfolds(2)


### Step 2: Add machine learners

We add supervised machine learners for estimating the conditional expectation $$E[Y|X]$$. We first add a simple linear regression.

. ddml E[Y|X]: reg $Y $X


We can add more than one learner per reduced form equation. Here, we also add a random forest learner implemented in pystacked. (In the next example we show how to use pystacked to stack multiple learners, but here we use it to implement a single learner.)

. ddml E[Y|X]: pystacked $Y $X, type(reg) method(rf)


We do the same for the conditional expectation E[D|X].

. ddml E[D|X]: reg $D $X

. ddml E[D|X]: pystacked $D $X, type(reg) method(rf)


Optionally, you can check if the learners have been added correctly.

. ddml desc

Model:                  partial, crossfit folds k=2, resamples r=1
Dependent variable (Y): net_tfa
 net_tfa learners:      Y1_reg Y2_pystacked
D equations (1):        e401
 e401 learners:         D1_reg D2_pystacked


### Step 3: Cross-fitting

The learners are fitted by cross-fitting: for each fold, every learner is trained on the remaining folds and used to predict the held-out fold. This step may take a while.

. ddml crossfit
Cross-fitting E[Y|X] equation: net_tfa
Cross-fitting fold 1 2 ...completed cross-fitting
Cross-fitting E[D|X] equation: e401
Cross-fitting fold 1 2 ...completed cross-fitting
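To make the idea of cross-fitting concrete, the following is a minimal sketch that reproduces the procedure by hand with OLS as the only learner. It is an illustration under stated assumptions, not ddml's internal implementation: the variable names fold, yhat, dhat, yres and dres are hypothetical, and because the fold assignment differs from the one ddml draws, the estimate will not match the tables on this page exactly.

```stata
* Manual 2-fold cross-fitting with OLS learners (illustration only;
* ddml handles fold assignment and prediction internally)
use https://github.com/aahrens1/ddml/raw/master/data/sipp1991.dta, clear
global X tw age inc fsize educ db marr twoearn pira hown
set seed 42

* Randomly assign each observation to one of two folds
generate byte fold = 1 + (runiform() > 0.5)

* Placeholders for the out-of-sample predictions of E[Y|X] and E[D|X]
generate double yhat = .
generate double dhat = .

forvalues k = 1/2 {
    * Fit E[Y|X] on the other fold, predict on fold k
    quietly regress net_tfa $X if fold != `k'
    quietly predict double tmp if fold == `k', xb
    quietly replace yhat = tmp if fold == `k'
    drop tmp

    * Fit E[D|X] on the other fold, predict on fold k
    quietly regress e401 $X if fold != `k'
    quietly predict double tmp if fold == `k', xb
    quietly replace dhat = tmp if fold == `k'
    drop tmp
}

* Residualize outcome and treatment, then regress one on the other
generate double yres = net_tfa - yhat
generate double dres = e401 - dhat
regress yres dres, robust
```

Running this sketch replaces the data in memory, so it is best tried in a separate Stata session rather than in the middle of the ddml workflow.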


### Step 4: Estimation

Finally, we obtain estimates of the coefficient of interest. Since we added two learners for each of our two reduced form equations, there are four possible specifications. By default, the result shown corresponds to the specification with the lowest out-of-sample mean squared prediction error (MSPE), which the output labels as the minimum MSE specification:

. ddml estimate, robust

DDML estimation results:
spec  r     Y learner     D learner         b        SE
   1  1        Y1_reg        D1_reg  5397.308(1130.901)
   2  1        Y1_reg  D2_pystacked  6707.514 (880.374)
*  3  1  Y2_pystacked        D1_reg  7044.822(1127.173)
   4  1  Y2_pystacked  D2_pystacked  6991.835 (755.805)
* = minimum MSE specification for that resample.

Min MSE DDML model, specification 3
y-E[y|X]  = Y2_pystacked_1                         Number of obs   =      9915
D-E[D|X,Z]= D1_reg_1
------------------------------------------------------------------------------
             |               Robust
     net_tfa | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        e401 |   7044.822   1127.173     6.25   0.000     4835.603    9254.042
------------------------------------------------------------------------------
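As a rough guide to how the starred specification is chosen, the out-of-sample mean squared prediction error of a learner $$j$$ for $$E[Y|X]$$ can be written as below. The notation is introduced here for illustration and is an assumption about, not a quotation of, ddml's internal computation:

$$
\text{MSPE}_j = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{\ell}_j^{(-k(i))}(X_i) \right)^2
$$

Here $$k(i)$$ denotes the fold containing observation $$i$$, and $$\hat{\ell}_j^{(-k(i))}$$ is learner $$j$$ fitted without that fold. The same measure applies to the learners for $$E[D|X]$$, with $$D_i$$ in place of $$Y_i$$.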


To estimate all four specifications, we use the allcombos option:

. ddml estimate, robust allcombos

DDML estimation results:
spec  r     Y learner     D learner         b        SE
   1  1        Y1_reg        D1_reg  5397.208(1130.776)
   2  1        Y1_reg  D2_pystacked  6705.740 (878.656)
*  3  1  Y2_pystacked        D1_reg  7044.518(1126.896)
   4  1  Y2_pystacked  D2_pystacked  6979.699 (753.471)
* = minimum MSE specification for that resample.

Min MSE DDML model
y-E[y|X]  = Y2_pystacked_1                         Number of obs   =      9915
D-E[D|X,Z]= D1_reg_1
------------------------------------------------------------------------------
             |               Robust
     net_tfa | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        e401 |   7044.518   1126.896     6.25   0.000     4835.843    9253.193
       _cons |  -317.8379   352.8666    -0.90   0.368    -1009.444     373.768
------------------------------------------------------------------------------


After estimating all specifications, we can retrieve specific results. Here we use the specification that relies on OLS for estimating both E[Y|X] and E[D|X]:

. ddml estimate, robust spec(1) replay

DDML estimation results:
spec  r     Y learner     D learner         b        SE
 opt  1  Y2_pystacked        D1_reg  7044.518(1126.896)
opt = minimum MSE specification for that resample.

DDML model, specification 1
y-E[y|X]  = Y1_reg_1                               Number of obs   =      9915
D-E[D|X,Z]= D1_reg_1
------------------------------------------------------------------------------
             |               Robust
     net_tfa | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        e401 |   5397.208   1130.776     4.77   0.000     3180.928    7613.488
       _cons |   -104.854   397.9023    -0.26   0.792     -884.728    675.0201
------------------------------------------------------------------------------

Inclusion of the constant
Since the residualized outcome and treatment may not be exactly mean-zero in finite samples, ddml includes the constant by default in the estimation stage of partially linear models. Asymptotically, the intercept is not required. Earlier versions of ddml (before 1.2) did not include the constant.

You could manually retrieve the same point estimate by typing:

. reg Y1_reg D1_reg, robust

Linear regression                               Number of obs     =      9,915
                                                F(1, 9914)        =      22.78
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0037
                                                Root MSE          =      39626

------------------------------------------------------------------------------
             |               Robust
    Y1_reg_1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    D1_reg_1 |   5397.308   1130.901     4.77   0.000     3180.512    7614.105
------------------------------------------------------------------------------


or graphically:

. twoway (scatter Y1_reg D1_reg) (lfit Y1_reg D1_reg)


where Y1_reg and D1_reg abbreviate the stored variables Y1_reg_1 and D1_reg_1, the orthogonalized versions of net_tfa and e401.
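The constant matters when comparing a manual regression with the ddml tables. The following sketch assumes the cross-fitted variables Y1_reg_1 and D1_reg_1 from the session above are still in memory; noconstant is a standard regress option. The degrees of freedom in F(1, 9914) and the absence of a _cons row in the manual regression output above are what one would expect from a regression without a constant.

```stata
* With a constant (the regress default), matching the behaviour that
* ddml 1.2 and later use by default for the partially linear model
regress Y1_reg_1 D1_reg_1, robust

* Without a constant, matching the behaviour of earlier ddml versions
regress Y1_reg_1 D1_reg_1, robust noconstant
```

Comparing the two coefficients on D1_reg_1 shows how much the finite-sample mean of the residuals affects the point estimate.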

To describe the ddml model setup or results in detail, you can use ddml describe with the relevant option (sample, learners, crossfit, estimates), or just describe them all with the all option:

. ddml describe, all
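To view only part of this information, the individual options named above can be passed one at a time. This is a minimal sketch that assumes the model from the steps above is still in memory:

```stata
* Describe only the learners added to each equation
ddml describe, learners

* Describe only the cross-fitting results
ddml describe, crossfit
```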