Parallelization
pystacked can be run in parallel, even without a StataMP license. It can be parallelized either at the level of the base learners or at the stacking level (to speed up the cross-validation process). Example 1 below uses no parallelization (the default). Example 2 parallelizes the random forest base learner by passing n_jobs(-1) inside cmdopt1(). Example 3 parallelizes at the stacking level with the top-level option njobs(-1).
. insheet using ///
> https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data, ///
> clear comma
. set seed 42
. gen uni=runiform()
. sort uni
. timer on 1
. pystacked v58 v1-v57, type(class) methods(rf gradboost nnet) ///
> cmdopt1(n_estimators(1000))
Stacking weights:
---------------------------------------
  Method         |      Weight
-----------------+---------------------
  rf             |      0.3698014
  gradboost      |      0.5437376
  nnet           |      0.0864610
. timer off 1
. timer on 2
. pystacked v58 v1-v57, type(class) methods(rf gradboost nnet) ///
> cmdopt1(n_jobs(-1) n_estimators(1000))
Stacking weights:
---------------------------------------
  Method         |      Weight
-----------------+---------------------
  rf             |      0.3293277
  gradboost      |      0.5661072
  nnet           |      0.1045651
. timer off 2
. timer on 3
. pystacked v58 v1-v57, type(class) methods(rf gradboost nnet) ///
> cmdopt1(n_estimators(1000)) njobs(-1)
Stacking weights:
---------------------------------------
  Method         |      Weight
-----------------+---------------------
  rf             |      0.3514905
  gradboost      |      0.5024690
  nnet           |      0.1460405
. timer off 3
. timer list
1: 196.95 / 1 = 196.9510
2: 30.01 / 1 = 30.0140
3: 83.05 / 1 = 83.0450
Which approach is faster depends on the choice and number of base learners and on the number of folds. In this example, parallelizing the random forest is the fastest approach because its 1,000 trees can be fit independently of one another.
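To put the timer output into perspective, the short Python sketch below computes the speedup of each run relative to the run without parallelization. The elapsed times are copied from the `timer list` output above. The dictionary labels and the helper `speedups` are illustrative names only and are not part of pystacked.

```python
# Elapsed seconds copied from the `timer list` output above.
# The labels are illustrative; they are not pystacked identifiers.
timings = {
    "no parallelization": 196.951,
    "rf base learner (n_jobs in cmdopt1)": 30.014,
    "stacking level (njobs)": 83.045,
}


def speedups(timings, baseline):
    """Return baseline_time / run_time for every entry in `timings`."""
    base = timings[baseline]
    return {name: base / secs for name, secs in timings.items()}


result = speedups(timings, "no parallelization")
for name, factor in result.items():
    print(f"{name}: {factor:.2f}x")
```

On these timings, parallelizing the random forest is roughly 6.6 times faster than the serial run, and parallelizing at the stacking level is roughly 2.4 times faster. The ratios are specific to this dataset, machine, and choice of learners.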
n_jobs(-1) uses all available cores. If you don't want to use all CPUs, you can use, for example, n_jobs(4) to request 4 cores; see also the scikit-learn documentation. n_jobs(-2) uses all cores but one.
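The convention for negative values can be sketched in a few lines of Python. The helper `effective_workers` below is hypothetical: it is not part of pystacked or scikit-learn. It assumes the rule used by joblib, the library scikit-learn relies on for parallelism, in which a negative `n_jobs` resolves to `n_cpus + 1 + n_jobs` workers, with a floor of one.

```python
import os


def effective_workers(n_jobs, n_cpus=None):
    """Hypothetical helper: number of workers implied by an n_jobs value.

    Assumes the joblib convention: a negative n_jobs means
    n_cpus + 1 + n_jobs workers, but never fewer than one.
    """
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1  # cpu_count() may return None
    if n_jobs == 0:
        raise ValueError("n_jobs must not be 0")
    if n_jobs < 0:
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs  # positive values are taken as given


# On a machine with 8 cores:
for n_jobs in (-1, -2, 4):
    print(n_jobs, "->", effective_workers(n_jobs, n_cpus=8))
```

On an 8-core machine this gives 8 workers for -1, 7 for -2, and 4 for 4, matching the description above.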
You can change the backend used for parallelization using backend(); the default is 'loky' under Linux/macOS and 'threading' under Windows. See here for more information.