Stacking regression
First, load the Boston housing data and randomly split the data into training and test samples:
. insheet using ///
        https://statalasso.github.io/dta/housing.csv, ///
        clear comma
. set seed 789
. gen train=runiform()
. replace train=train>.75
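As an optional sanity check (not part of the original example), you can tabulate the training indicator; since train is 1 whenever runiform() exceeds .75, roughly a quarter of the observations should fall into the training sample:
. tab train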
We now consider a more complicated pystacked application with 5 base learners: linear regression, two versions of lasso with AIC-chosen penalty, random forest, and gradient boosting:
. pystacked medv crim-lstat if train, ///
        type(regress) ///
        methods(ols lassoic lassoic rf gradboost) ///
        pipe1(poly2) pipe3(poly2) ///
        cmdopt5(learning_rate(0.01) n_estimators(1000))
Stacking weights:
---------------------------------------
  Method         |      Weight
-----------------+---------------------
  ols            |      0.0139419
  lassoic        |      0.0000000
  lassoic        |      0.3959649
  rf             |      0.5900932
  gradboost      |      0.0000000
In this example, we use the lasso twice: once with and once without the poly2 pipeline. Nothing keeps us from using base learners multiple times; this way we can compare and combine different sets of options.
Note the numbering of the pipe*() and cmdopt*() options: we apply the poly2 pipe to the first and third methods (ols and lassoic). We also change the default learning rate and number of estimators for gradient boosting (the 5th estimator).
The weights determine how much each method contributes to the final stacking prediction. lassoic without poly2 receives a weight of 0, while lassoic with poly2 gets a positive weight.
You can verify that options are being passed on to scikit-learn correctly using, e.g., di e(pyopt1) after estimation.
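For example, to inspect what was handed over for specific learners (this assumes the e(pyopt*) macros are numbered in the same order as the learners, analogous to pipe*() and cmdopt*(); ereturn list displays everything pystacked stores after estimation):
. ereturn list
. di e(pyopt1)   // options passed for the first learner (ols)
. di e(pyopt5)   // options passed for the fifth learner (gradboost)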
Alternative Syntax
The above syntax becomes, admittedly, a bit difficult to read, especially with many methods and many options. We offer an alternative syntax for easier use with many base learners:
. pystacked medv crim-lstat || ///
        m(ols) pipe(poly2) || ///
        m(lassoic) || ///
        m(lassoic) pipe(poly2) || ///
        m(rf) || ///
        m(gradboost) opt(learning_rate(0.01) n_estimators(1000)) || ///
        if train, type(regress)
Stacking weights:
---------------------------------------
  Method         |      Weight
-----------------+---------------------
  ols            |      0.0000000
  lassoic        |      0.0000000
  lassoic        |      0.3696124
  rf             |      0.6303876
  gradboost      |      0.0000000
Changing the final estimator
The default final learner of pystacked is non-negative least squares (NNLS), where the weights are constrained to be non-negative and to sum to 1.
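Concretely, writing \( \hat{y}_i^{(k)} \) for the cross-validated prediction of base learner \( k \) for observation \( i \), the final learner solves, roughly, the constrained least-squares problem described above (a sketch, not necessarily pystacked's exact implementation):

\[
\min_{w_1,\dots,w_K}\; \sum_i \Big( y_i - \sum_{k=1}^K w_k \, \hat{y}_i^{(k)} \Big)^2
\quad \text{subject to} \quad w_k \ge 0, \;\; \sum_{k=1}^K w_k = 1 .
\]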
Here, we switch to the “singlebest” approach, which selects the single base learner with the lowest RMSPE.
. pystacked medv crim-lstat || ///
        m(ols) pipe(poly2) || ///
        m(lassoic) || ///
        m(lassoic) pipe(poly2) || ///
        m(rf) || ///
        m(gradboost) opt(learning_rate(0.01) n_estimators(1000)) || ///
        if train, type(regress) finalest(singlebest)
Stacking weights:
---------------------------------------
  Method         |      Weight
-----------------+---------------------
  ols            |      0.0000000
  lassoic        |      0.0000000
  lassoic        |      0.0000000
  rf             |      0.0000000
  gradboost      |      1.0000000
Predictions
In addition to the stacking predicted values, we can also get the predicted values of each base learner using the basexb option (formerly called transform):
. predict double yh, xb
. predict double ybase, basexb
. list yh ybase* if _n <= 5
     +-----------------------------------------------------------------------+
     |        yh      ybase1      ybase2      ybase3      ybase4      ybase5 |
     |-----------------------------------------------------------------------|
  1. | 24.726908   22.301384    30.27872   25.451438      25.721   24.726908 |
  2. | 21.993344   22.596584   25.133314   23.908178      21.356   21.993344 |
  3. | 32.628795   29.987554   31.027162   33.898635   32.968001   32.628795 |
  4. | 34.027947   30.977113   29.428614    32.00817   33.098001   34.027947 |
  5. | 35.215694   34.034215    28.85696   32.311613   35.109001   35.215694 |
     +-----------------------------------------------------------------------+
Notice that the stacking predicted values are equal to the gradient boosting predicted values, since boosting was identified as the best learner.
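Since all the weight went to the fifth learner, we can confirm this equality directly with standard Stata commands (a quick check using the variables created by the predict calls above):
. gen double diff = yh - ybase5
. summarize diff
The mean, minimum, and maximum of diff should all be (numerically) zero.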
Plotting
pystacked also comes with plotting features. The graph option creates a scatter plot of predicted vs observed values for stacking and each base learner. There is no need to re-run the stacking estimation: you can also use pystacked with graph as a post-estimation command:
. pystacked, graph holdout
Here, we show the out-of-sample predicted values. To see the in-sample predicted values, simply omit the holdout option. Note that the holdout option won't work if the estimation was run on the whole sample.
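For example, after the estimation on the training sample above, the in-sample version of the same plot is obtained with:
. pystacked, graph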
Root mean squared prediction error (RMSPE)
The table option allows us to compare the stacking weights with three types of RMSPE: the in-sample RMSPE of the learners fit on the whole training sample, the RMSPE of the cross-validated predictions, and the out-of-sample RMSPE. As with the graph option, we can use table as a post-estimation command:
. pystacked, table holdout
Number of holdout observations: 361
RMSPE: In-Sample, CV, Holdout
------------------------------------------------------------------
  Method         |      Weight   In-Sample          CV     Holdout
-----------------+------------------------------------------------
  STACKING       |           .       0.637       4.547       3.664
  ols            |       0.000       1.308      18.660      11.529
  lassoic        |       0.000       4.501       5.179       5.218
  lassoic        |       0.000       1.798       4.913       4.964
  rf             |       0.000       1.484       4.599       3.932
  gradboost      |       1.000       0.637       4.547       3.664
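As a cross-check, the stacking holdout RMSPE can be recomputed by hand from the predicted values generated earlier (a sketch; it assumes yh from the predict call above is still in memory and that e(sample) still marks the training observations used in the last pystacked estimation):
. gen double sqerr = (medv - yh)^2
. quietly summarize sqerr if !e(sample)
. di "holdout RMSPE (stacking): " sqrt(r(mean))
This should approximately reproduce the STACKING entry in the Holdout column.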