Partially linear model with Stacking

Stacking regression is a simple and powerful method for combining predictions from multiple learners. It is available in Stata via the pystacked package. The example below uses the partially linear model, but stacking can be used with any model supported by ddml.
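To see what stacking does before it is embedded in ddml, pystacked can also be run on its own. The following is a minimal sketch, assuming pystacked is installed and that the globals $Y and $X from the earlier example are already defined; the choice of two base learners is purely illustrative.

```stata
. * Illustrative only: stack OLS and cross-validated lasso to predict $Y from $X
. pystacked $Y $X || method(ols) || method(lassocv), type(reg)
```

The output should report the weight that the stacked ensemble assigns to each base learner, which gives a first impression of which learners predict the outcome well.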

Step 1: Initialization

Preparation: use the data and globals as above. Use the name m1 for this new estimation, to distinguish it from the previous example that uses the default name m0. This enables having multiple estimations available for comparison. Also specify 5 cross-fitting repetitions.

. set seed 42
. ddml init partial, kfolds(2) reps(5) mname(m1)

Cross-fitting repetitions
The results of DDML depend on the exact cross-fit fold split. We recommend re-running the (final) model multiple times on different random folds; see the reps(integer) option.

Step 2: Add learners

Add supervised machine learners for estimating conditional expectations. The first learner in the stacked ensemble is OLS. We also use cross-validated lasso, cross-validated ridge and two random forests with different settings. The random forest settings are saved in the following global macros:

. global rflow max_features(5) min_samples_leaf(1) max_samples(.7)
. global rfhigh max_features(5) min_samples_leaf(10) max_samples(.7)
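
Global macros are expanded textually, so $rflow and $rfhigh are simply replaced by their contents wherever they appear in a later command. A quick way to confirm what will be passed to the random forest learners is to display them:

```stata
. * Show the option strings stored in the two globals
. display "$rflow"
max_features(5) min_samples_leaf(1) max_samples(.7)

. display "$rfhigh"
max_features(5) min_samples_leaf(10) max_samples(.7)
```

The two settings differ only in min_samples_leaf (1 versus 10), so the second forest grows shallower, more heavily regularized trees.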

In each step, we add the mname(m1) option to ensure that the learners are not added to the m0 model, which is still in memory. We also use the learner(varname) option to name the variables that will hold the estimated conditional expectations. This avoids overwriting the variables created for the m0 model under the default naming.

. ddml E[Y|X], mname(m1) learner(Y_m1): pystacked $Y $X            || ///
>                                method(ols)                       || ///
>                                method(lassocv)                   || ///
>                                method(ridgecv)                   || ///
>                                method(rf) opt($rflow)            || ///
>                                method(rf) opt($rfhigh), type(reg)
Learner Y_m1 added successfully.

. ddml E[D|X], mname(m1) learner(D_m1): pystacked $D $X            || ///
>                                method(ols)                       || ///
>                                method(lassocv)                   || ///
>                                method(ridgecv)                   || ///
>                                method(rf) opt($rflow)            || ///
>                                method(rf) opt($rfhigh), type(reg)
Learner D_m1 added successfully.

Options
Note: Options placed between the first comma and the “:” belong to ddml. Options that come after the final comma belong to the estimation command (here, pystacked). Take care not to confuse the two types of options.

Check that the learners were added correctly (output omitted):

. ddml desc, mname(m1) learners

Step 3/4: Cross-fitting and estimation

. qui ddml crossfit, mname(m1)

. ddml estimate, mname(m1) robust

DDML estimation results:
spec  r     Y learner     D learner         b        SE
 opt  1          Y_m1          D_m1  7362.283 (937.426)
 opt  2          Y_m1          D_m1  6958.283 (899.946)
 opt  3          Y_m1          D_m1  6531.201 (872.895)
 opt  4          Y_m1          D_m1  6532.662 (952.414)
 opt  5          Y_m1          D_m1  6672.368 (981.239)
opt = minimum MSE specification for that resample.

Mean/med.   Y learner     D learner         b        SE
 mse mn     [min-mse]         [mse]  6811.360 (973.863)
 mse md     [min-mse]         [mse]  6672.368 (962.606)

Median over min-mse specifications
y-E[y|X]  = Y_m1                                   Number of obs   =      9915
D-E[D|X,Z]= D_m1
------------------------------------------------------------------------------
             |               Robust
     net_tfa | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        e401 |   6672.368   962.6062     6.93   0.000     4785.695    8559.042
------------------------------------------------------------------------------

Summary over 5 resamples:
       D eqn      mean       min       p25       p50       p75       max
        e401   6811.3596 6531.2007 6532.6626 6672.3682 6958.2832 7362.2832
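
The aggregated rows can be checked by hand from the five per-resample results. The do-file fragment below is a sketch that assumes the median aggregation rule of Chernozhukov et al. (2018): the point estimate is the median of the coefficients, and the standard error is the median of sqrt(SE_r^2 + (b_r - b_med)^2) across resamples. The names b, se, adj, b_mean and b_med are made up for this illustration and are not created by ddml. The inputs are the rounded values printed above, so agreement is only up to rounding.

```stata
* Sketch: reproduce the "mse md" row from the per-resample estimates.
* All variable and scalar names here are illustrative.
preserve                        // keep the estimation data safe
clear
input double(b se)
7362.283 937.426
6958.283 899.946
6531.201 872.895
6532.662 952.414
6672.368 981.239
end

* Mean and median of the five coefficients
quietly summarize b, detail
scalar b_mean = r(mean)
scalar b_med  = r(p50)
display "mean b    = " %9.3f scalar(b_mean)
display "median b  = " %9.3f scalar(b_med)

* Median-aggregated SE: inflate each variance by the squared
* deviation from the median coefficient, then take the median
generate double adj = sqrt(se^2 + (b - scalar(b_med))^2)
quietly summarize adj, detail
display "median SE = " %9.3f r(p50)
restore                         // bring back the original data
```

Run on the rounded inputs, this should give a mean coefficient of about 6811.36, a median coefficient of 6672.368 and a median standard error of about 962.61, in line with the "mse md" row. The sketch does not attempt to reproduce the standard error in the "mse mn" row, which is aggregated by a different rule.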

Examine the learner weights used by pystacked (not shown):

. ddml extract, mname(m1) show(pystacked)
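
Since the earlier estimation is still stored under the name m0, its results can be re-displayed next to those of m1. A minimal sketch, assuming m0 was cross-fitted in the previous example and has not been dropped from memory:

```stata
. * Re-display the results of the earlier model for comparison with m1
. ddml estimate, mname(m0) robust
```

Comparing the two sets of estimates indicates how sensitive the coefficient on the treatment variable is to the choice of learners.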