Rigorous lasso

rlasso provides routines for estimating the coefficients of a lasso or square-root lasso regression with data-dependent, theory-driven penalization. The number of regressors, , may be large and possibly greater than the number of observations, . rlasso implements a version of the lasso that allows for heteroskedastic and clustered errors; see Belloni et al. (2012, 2016).

We start again with the prostate cancer data for demonstration.

. clear
. insheet using
    https://web.stanford.edu/~hastie/ElemStatLearn/datasets/prostate.data, tab

Homoskedastic lasso

The optimal penalization depends on whether the errors are homoskedastic, heteroskedastic or cluster-dependent.

Similar to regress, rlasso assumes homoskedasticity by default. Under homoskedasticity, the optimal penalty level is given by

which guarantees that the “rigorous” lasso is well-behaved. The unobserved is estimated using an iterative algorithm.

To run the lasso with theory-driven penalization, type:

. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45

---------------------------------------------------
         Selected |           Lasso   Post-est OLS
------------------+--------------------------------
           lcavol |       0.4400059      0.5258519
          lweight |       0.2385063      0.6617699
              svi |       0.3024128      0.6656665
            _cons |*      0.9533782     -0.7771568
---------------------------------------------------
*Not penalized

e(lambda) returns , and e(lambda0) stores , i.e., the penalty level excluding the standard deviation of the error.

. di e(lambda)
44.984163

. di e(lambda0)
64.923165

Heteroskedastic lasso

To allow for heteroskedasticity, we specify the robust option.

. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, robust

---------------------------------------------------
         Selected |           Lasso   Post-est OLS
------------------+--------------------------------
           lcavol |       0.4518205      0.5258519
          lweight |       0.2047086      0.6617699
              svi |       0.1995573      0.6656665
            _cons |*      1.0823460     -0.7771568
---------------------------------------------------
*Not penalized

The names of selected predictors are stored in e(selected) (without constant) and e(selected0) (with constant):

. di e(selected0)
lcavol lweight svi _cons

. di e(selected)
lcavol lweight svi

Square-root lasso

With the sqrt-lasso of Belloni et al. (2011, 2014), the default penalty level is

.

Note the difference by a factor of 2 compared to the standard lasso. More importantly, the optimal penalty level of the square-root lasso is independent of , leading to a practical advantage.

The square-root lasso is available through the sqrt option.

. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, sqrt

---------------------------------------------------
	 Selected |      Sqrt-lasso   Post-est OLS
------------------+--------------------------------
	   lcavol |       0.4293894      0.5258519
	  lweight |       0.1861616      0.6617699
              svi |       0.2574895      0.6656665
            _cons |*      1.1673922     -0.7771568
---------------------------------------------------
*Not penalized

In this example, lasso and square-root lasso select the same variables. Thus the post-estimation OLS estimator, which is OLS using the variables selected, is the same in both cases.

The estimated penalty level is:

. di e(lambda)
32.461583

The square-root lasso also allows for heteroskedastic errors:

. rlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, sqrt robust

---------------------------------------------------
	 Selected |      Sqrt-lasso   Post-est OLS
------------------+--------------------------------
	   lcavol |       0.4402037      0.5258519
	  lweight |       0.1329878      0.6617699
              svi |       0.1264166      0.6656665
            _cons |*      1.3741342     -0.7771568
---------------------------------------------------
*Not penalized

Cluster-dependent errors

Both rigorous lasso and rigorous square-root lasso allow for within-panel correlation (based on Belloni et al., 2016, JBES). The fe option applies the within-transformation and cluster() specifies the cluster variable.

NB: The two regressions below take a few minutes to run, and you might need to increase the maximum matsize using set matsize.

In this example, we interact the variable grade and age using Stata’s factor variable notation (see help factor variables).

. webuse nlswork
. xtset idcode
. rlasso ln_w i.grade#i.age ttl_exp tenure not_smsa south, ///
	      fe cluster(idcode)

---------------------------------------------------
         Selected |           Lasso   Post-est OLS
------------------+--------------------------------
        grade#age |
           12 18  |      -0.1226071     -0.2087164
           12 19  |      -0.0481608     -0.1109979
           12 20  |      -0.0088640     -0.0627530
                  |
          ttl_exp |       0.0206773      0.0226526
           tenure |       0.0107726      0.0123681
         not_smsa |      -0.0305386     -0.0957148
---------------------------------------------------

The results of cluster lasso and cluster square-root lasso are again similar:

. rlasso ln_w i.grade#i.age ttl_exp tenure not_smsa south, ///
              sqrt fe cluster(idcode)

         Selected |      Sqrt-lasso   Post-est OLS
------------------+--------------------------------
        grade#age |
           12 18  |      -0.1223057     -0.2087164
           12 19  |      -0.0479408     -0.1109979
           12 20  |      -0.0086753     -0.0627530
                  |
          ttl_exp |       0.0206704      0.0226526
           tenure |       0.0107671      0.0123681
         not_smsa |      -0.0303104     -0.0957148
---------------------------------------------------

More

More information can be found in the help file:

help rlasso