Regression

Stacking regression #

First, load the Boston housing data and randomly split it into training and test samples:

. insheet using ///
     https://statalasso.github.io/dta/housing.csv,  ///
     clear comma
. set seed 789
. gen train=runiform()
. replace train=train>.75

We now consider a more complicated pystacked application with five base learners: linear regression, two versions of the lasso with AIC-chosen penalty, random forest, and gradient boosting:

. pystacked medv crim-lstat if train,                              ///
      type(regress)                                                ///
      methods(ols lassoic lassoic rf gradboost)                    ///
      pipe1(poly2) pipe3(poly2) cmdopt5(learning_rate(0.01)        ///
      n_estimators(1000))

Stacking weights:
---------------------------------------
  Method         |      Weight
-----------------+---------------------
  ols            |      0.0139419
  lassoic        |      0.0000000
  lassoic        |      0.3959649
  rf             |      0.5900932
  gradboost      |      0.0000000

In this example, we use the lasso twice: once with and once without the poly2 pipeline. Nothing prevents us from including the same base learner multiple times; this lets us compare and combine different sets of options.

Note the numbering of the pipe*() and cmdopt*() options: we apply the poly2 pipeline to the first and third methods (ols and the second lassoic). We also change the default learning rate and number of estimators for gradient boosting (the fifth method).

The weights determine how much each method contributes to the final stacking prediction. lassoic without poly2 receives a weight of 0, while lassoic with poly2 gets a positive weight.
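
In generic notation (a sketch of the idea rather than pystacked's internal code), the stacking prediction is simply the weighted combination of the base learners' predictions,

$$\hat{y}_i^{\text{stacking}} = \sum_{j=1}^{5} \hat{w}_j\, \hat{y}_i^{(j)},$$

where $\hat{y}_i^{(j)}$ is the prediction of base learner $j$ for observation $i$ and $\hat{w}_j$ is the corresponding weight from the table above.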

You can verify that options are being passed on to scikit-learn correctly using, e.g., di e(pyopt1) after estimation.

Alternative Syntax #

The above syntax becomes, admittedly, a bit difficult to read, especially with many methods and many options. We offer an alternative syntax for easier use with many base learners:

. pystacked medv crim-lstat                                                 || ///
                   m(ols) pipe(poly2)                                       || ///
                   m(lassoic)                                               || ///
                   m(lassoic) pipe(poly2)                                   || ///
                   m(rf)                                                    || ///
                   m(gradboost) opt(learning_rate(0.01) n_estimators(1000)) || ///
                   if train, type(regress)

Stacking weights:
---------------------------------------
  Method         |      Weight
-----------------+---------------------
  ols            |      0.0000000
  lassoic        |      0.0000000
  lassoic        |      0.3696124
  rf             |      0.6303876
  gradboost      |      0.0000000

Changing the final estimator #

The default final learner of pystacked is non-negative least squares (NNLS), where the stacking weights are constrained to be non-negative and to sum to one.
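
Concretely, the final learner solves a constrained least-squares problem of the form (a sketch in standard stacking notation, not necessarily pystacked's exact implementation)

$$\min_{w_1,\dots,w_J}\ \sum_i \Big( y_i - \sum_{j=1}^{J} w_j\, \tilde{y}_i^{(j)} \Big)^2 \quad \text{s.t.}\quad w_j \ge 0,\ \ \sum_{j=1}^{J} w_j = 1,$$

where $\tilde{y}_i^{(j)}$ denotes the cross-validated (out-of-fold) prediction of base learner $j$.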

Here, we switch to the “singlebest” approach, which selects the base learner with the lowest cross-validated RMSPE.

. pystacked medv crim-lstat                                                 || ///
                   m(ols) pipe(poly2)                                       || ///
                   m(lassoic)                                               || ///
                   m(lassoic) pipe(poly2)                                   || ///
                   m(rf)                                                    || ///
                   m(gradboost) opt(learning_rate(0.01) n_estimators(1000)) ///
                   if train, type(regress) finalest(singlebest)

Stacking weights:
---------------------------------------
  Method         |      Weight
-----------------+---------------------
  ols            |      0.0000000
  lassoic        |      0.0000000
  lassoic        |      0.0000000
  rf             |      0.0000000
  gradboost      |      1.0000000

Predictions #

In addition to the stacking predicted values, we can also get the predicted values of each base learner using the basexb option (formerly called transform):

. predict double yh, xb
. predict double ybase, basexb
. list yh ybase* if _n <= 5

     +-----------------------------------------------------------------------+
     |        yh      ybase1      ybase2      ybase3      ybase4      ybase5 |
     |-----------------------------------------------------------------------|
  1. | 24.726908   22.301384    30.27872   25.451438      25.721   24.726908 |
  2. | 21.993344   22.596584   25.133314   23.908178      21.356   21.993344 |
  3. | 32.628795   29.987554   31.027162   33.898635   32.968001   32.628795 |
  4. | 34.027947   30.977113   29.428614    32.00817   33.098001   34.027947 |
  5. | 35.215694   34.034215    28.85696   32.311613   35.109001   35.215694 |
     +-----------------------------------------------------------------------+

Notice that the stacking predicted values are equal to the gradient boosting predicted values, since gradient boosting was selected as the single best learner.
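
A quick way to check this in the data, using only built-in Stata functions (the tolerance is arbitrary, and ybase5 corresponds to gradient boosting because it was listed fifth):

. assert reldif(yh, ybase5) < 1e-6 if !missing(yh, ybase5)   // silent if the two columns agree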

Plotting #

pystacked also comes with plotting features. The graph option creates a scatter plot of predicted vs. observed values for stacking and for each base learner. There is no need to re-run the stacking estimation: pystacked with the graph option can be used as a post-estimation command:

. pystacked, graph holdout

Here, we show the out-of-sample predicted values. To see the in-sample predicted values, simply omit the holdout option, as shown below. Note that the holdout option won’t work if the estimation was run on the whole sample, since in that case no observations were held back for evaluation.
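
For example, the in-sample counterpart of the plot above is simply:

. pystacked, graph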

Root mean squared prediction error (RMSPE) #

The table option allows us to compare the stacking weights with three types of RMSPE: the in-sample RMSPE of the learners fit on the whole training sample, the RMSPE of the cross-validated predictions, and the out-of-sample (holdout) RMSPE. As with the graph option, we can use table as a post-estimation command:

. pystacked, table holdout
Number of holdout observations:  361

RMSPE: In-Sample, CV, Holdout
-----------------------------------------------------------------
  Method         | Weight   In-Sample        CV         Holdout
-----------------+-----------------------------------------------
  STACKING       |    .        0.637        4.547        3.664
  ols            | 0.000       1.308       18.660       11.529
  lassoic        | 0.000       4.501        5.179        5.218
  lassoic        | 0.000       1.798        4.913        4.964
  rf             | 0.000       1.484        4.599        3.932
  gradboost      | 1.000       0.637        4.547        3.664