Parallelization

pystacked can be run in parallel, even without a Stata/MP license.

pystacked can be parallelized at the level of the individual base learners or at the stacking level, which speeds up the cross-validation step. Example 1 below uses no parallelization (the default), Example 2 parallelizes the random forest base learner, and Example 3 parallelizes at the stacking level.

. insheet using ///
>     https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data, ///
>     clear comma

. set seed 42
. gen uni=runiform()
. sort uni

. timer on 1
. pystacked v58 v1-v57, type(class) methods(rf gradboost nnet) ///
>             cmdopt1(n_estimators(1000))

Stacking weights:
---------------------------------------
  Method         |      Weight
-----------------+---------------------
  rf             |      0.3698014
  gradboost      |      0.5437376
  nnet           |      0.0864610

. timer off 1

. timer on 2
. pystacked v58 v1-v57, type(class) methods(rf gradboost nnet) ///
>             cmdopt1(n_jobs(-1) n_estimators(1000))

Stacking weights:
---------------------------------------
  Method         |      Weight
-----------------+---------------------
  rf             |      0.3293277
  gradboost      |      0.5661072
  nnet           |      0.1045651

. timer off 2

. timer on 3
. pystacked v58 v1-v57, type(class) methods(rf gradboost nnet) ///
>             cmdopt1(n_estimators(1000)) njobs(-1)

Stacking weights:
---------------------------------------
  Method         |      Weight
-----------------+---------------------
  rf             |      0.3514905
  gradboost      |      0.5024690
  nnet           |      0.1460405

. timer off 3

. timer list
   1:    196.95 /        1 =     196.9510
   2:     30.01 /        1 =      30.0140
   3:     83.05 /        1 =      83.0450

Which approach is faster depends on the choice and number of base learners and on the number of cross-validation folds. In this example, parallelizing the random forest is the fastest approach, since the forest's many trees are fit independently.

n_jobs(-1) uses all available cores, and n_jobs(-2) uses all cores except one. If you do not want to use all CPUs, you can request a specific number, for example n_jobs(4) for 4 cores; see also the scikit-learn documentation.
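
For instance, here is a sketch of the Example 2 call restricted to 4 cores; only the n_jobs value changes:

. pystacked v58 v1-v57, type(class) methods(rf gradboost nnet) ///
>             cmdopt1(n_jobs(4) n_estimators(1000))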

You can change the backend used for parallelization with the backend() option; the default is 'loky' under Linux/macOS and 'threading' under Windows. See the joblib documentation for more information on the available backends.
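
For example, to request the 'threading' backend while parallelizing at the stacking level (a sketch; we assume here that backend() takes the joblib backend name directly as its argument):

. pystacked v58 v1-v57, type(class) methods(rf gradboost nnet) ///
>             cmdopt1(n_estimators(1000)) njobs(-1) backend(threading)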