# Help file: cvlasso

--------------------------------------------------------------------------------------------
help cvlasso                                                                  lassopack v1.2
--------------------------------------------------------------------------------------------

Title

    cvlasso -- Program for cross-validation using lasso, square-root lasso, elastic net,
               adaptive lasso and post-OLS estimators

Syntax

Full syntax

    cvlasso depvar regressors [if exp] [in range] [, alpha(numlist) alphacount(int) sqrt
        adaptive adaloadings(string) adatheta(real) ols lambda(numlist) lcount(integer)
        lminratio(real) lmax(real) lopt lse notpen(varlist) partial(varlist)
        ploadings(string) unitloadings prestd fe noftools noconstant tolopt(real)
        tolzero(real) maxiter(int) nfolds(int) foldvar(varname) savefoldvar(varname)
        rolling h(int) origin(int) fixedwindow seed(real) plotcv plotopt(string)
        saveest(string)]

Note: the fe option will take advantage of the ftools package (if installed) for the fixed-effects transform; the speed gains using this package can be large. See help ftools; to install, type ssc install ftools.

    Estimators                  Description
    --------------------------------------------------------------------------------------------
    alpha(numlist)              a scalar elastic net parameter or an ascending list of elastic
                                net parameters. If the number of alpha values is larger than 1,
                                cross-validation is conducted over alpha (and lambda). The
                                default is alpha=1, which corresponds to the lasso estimator.
                                The elastic net parameter controls the degree of L1-norm
                                (lasso-type) to L2-norm (ridge-type) penalization. Each alpha
                                value must be in the interval [0,1].
    alphacount(int)             number of alpha values used for cross-validation across alpha.
                                By default, cross-validation is only conducted across lambda,
                                but not over alpha. Ignored if alpha() is specified.
    sqrt                        square-root lasso estimator.
    adaptive                    adaptive lasso estimator. The penalty loading for predictor j
                                is set to 1/abs(beta0(j))^theta where beta0(j) is the OLS
                                estimate or univariate OLS estimate if p>n. Theta is the
                                adaptive exponent, and can be controlled using the
                                adatheta(real) option.
    adaloadings(string)         alternative initial estimates, beta0, used for calculating
                                adaptive loadings. For example, this could be the vector e(b)
                                from an initial lasso2 estimation. The elements of the vector
                                are raised to the power -theta (note the minus). See adaptive
                                option.
    adatheta(real)              exponent for calculating adaptive penalty loadings. See
                                adaptive option. Default=1.
    ols                         post-estimation OLS. Note that cross-validation using OLS will
                                in most cases lead to no unique optimal lambda (since MSPE is
                                a step function over lambda).
    --------------------------------------------------------------------------------------------
    See the overview of estimation methods in the lasso2 help file.

    Lambda(s)                   Description
    --------------------------------------------------------------------------------------------
    lambda(numlist)             a scalar lambda value or list of descending lambda values.
                                Each lambda value must be greater than 0. If not specified,
                                the default list is used which is given by
                                exp(rangen(log(lmax),log(lminratio*lmax),lcount))
                                (see mf_range).
    lcount(integer) †           number of lambda values for which the solution is obtained.
                                Default is 100.
    lminratio(real) †           ratio of minimum to maximum lambda. lminratio must be between
                                0 and 1. Default is 1/1000.
    lmax(real) †                maximum lambda value. Default is 2*max(X'y), and max(X'y) in
                                the case of the square-root lasso (where X is the
                                pre-standardized regressor matrix and y is the vector of the
                                response variable).
    lopt                        after cross-validation, estimate model with lambda that
                                minimized the mean-squared prediction error
    lse                         after cross-validation, estimate model with largest lambda
                                that is within one standard deviation from lopt
    --------------------------------------------------------------------------------------------
    † Not applicable if lambda() is specified.

    Loadings & standardization  Description
    --------------------------------------------------------------------------------------------
    notpen(varlist)             sets penalty loadings to zero for predictors in varlist.
                                Unpenalized predictors are always included in the model.
    partial(varlist)            variables in varlist are partialled out prior to estimation.
    ploadings(matrix)           a row-vector of penalty loadings; overrides the default
                                standardization loadings (in the case of the lasso,
                                =sqrt(avg(x^2))). The size of the vector should equal the
                                number of predictors (excluding partialled out variables and
                                excluding the constant).
    unitloadings                penalty loadings set to a vector of ones; overrides the
                                default standardization loadings (in the case of the lasso,
                                =sqrt(avg(x^2))).
    prestd                      dependent variable and predictors are standardized prior to
                                estimation rather than standardized "on the fly" using penalty
                                loadings. By default the coefficient estimates are
                                un-standardized (i.e., returned in original units).
    --------------------------------------------------------------------------------------------
    See discussion of standardization in the lasso2 help file. Also see Section Data
    transformations in cross-validation below.

    FE & constant               Description
    --------------------------------------------------------------------------------------------
    fe                          within-transformation is applied prior to estimation. Requires
                                data to be xtset.
    noftools                    do not use FTOOLS package for fixed-effects transform (slower;
                                rarely used)
    noconstant                  suppress constant from estimation. Default behaviour is to
                                partial the constant out (i.e., to center the regressors).
    --------------------------------------------------------------------------------------------

    Optimization                Description
    --------------------------------------------------------------------------------------------
    tolopt(real)                tolerance for lasso shooting algorithm (default=1e-10)
    tolzero(real)               minimum below which coeffs are rounded down to zero
                                (default=1e-4)
    maxiter(int)                maximum number of iterations for the lasso shooting algorithm
                                (default=10,000)
    --------------------------------------------------------------------------------------------

    Fold variable options       Description
    --------------------------------------------------------------------------------------------
    nfolds(integer)             the number of folds used for K-fold cross-validation. Default
                                is 10.
    foldvar(varname)            user-specified variable with fold IDs, ranging from 1 to
                                #folds. If not specified, fold IDs are randomly generated such
                                that each fold is of approximately equal size.
    savefoldvar(varname)        saves the fold ID variable. Not supported in combination with
                                rolling.
    rolling                     uses rolling h-step ahead cross-validation. Requires the data
                                to be tsset.
    h(integer) ‡                changes the forecasting horizon. Default is 1.
    origin(integer) ‡           controls the number of observations in the first training
                                dataset.
    fixedwindow ‡               ensures that the size of the training dataset is always the
                                same.
    seed(real)                  set seed for the generation of a random fold variable. Only
                                relevant if fold variable is randomly generated.
    --------------------------------------------------------------------------------------------
    ‡ Only applicable with rolling option.

    Plotting options            Description
    --------------------------------------------------------------------------------------------
    plotcv                      plots the estimated mean-squared prediction error as a
                                function of ln(lambda)
    plotopt(string)             overwrites the default plotting options. All options are
                                passed on to line.
    --------------------------------------------------------------------------------------------

    Display options             Description
    --------------------------------------------------------------------------------------------
    omitgrid                    suppresses the display of mean-squared prediction errors
    --------------------------------------------------------------------------------------------

    Store lasso2 results        Description
    --------------------------------------------------------------------------------------------
    saveest(string)             saves lasso2 results from each step of the cross-validation in
                                string1, ..., stringK where K is the number of folds.
                                Intermediate results can be restored using estimates restore.
    --------------------------------------------------------------------------------------------

cvlasso may be used with time-series or panel data, in which case the data must be tsset or xtset first; see help tsset or xtset.

All varlists may contain time-series operators or factor variables; see help varlist.
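
The default lambda list described under Lambda(s) above, exp(rangen(log(lmax),log(lminratio*lmax),lcount)), is a grid that is equally spaced on the log scale and runs from lmax down to lminratio*lmax. The following is a minimal sketch of that formula, written in Python rather than Mata purely for illustration. The function name default_lambda_grid and the example value lmax=50 are hypothetical and are not part of lassopack; the handling of lcount=1 is likewise an assumption of this sketch.

```python
import math


def default_lambda_grid(lmax, lminratio=1 / 1000, lcount=100):
    """Illustrative re-implementation (an assumption, not lassopack code) of
    exp(rangen(log(lmax), log(lminratio*lmax), lcount)):
    a descending grid, equally spaced in logs."""
    if lmax <= 0:
        raise ValueError("lmax must be greater than 0")
    if not 0 < lminratio < 1:
        raise ValueError("lminratio must be between 0 and 1")
    if lcount < 2:
        # Assumption of this sketch: a single-point grid is just lmax.
        return [lmax]
    log_hi = math.log(lmax)
    log_lo = math.log(lminratio * lmax)
    # lcount points means lcount-1 equal steps from log_hi down to log_lo.
    step = (log_lo - log_hi) / (lcount - 1)
    return [math.exp(log_hi + i * step) for i in range(lcount)]


# Hypothetical lmax; with lminratio=1/1000 and 4 points each step divides by 10.
grid = default_lambda_grid(lmax=50.0, lminratio=0.001, lcount=4)
print([round(value, 6) for value in grid])  # roughly 50, 5, 0.5, 0.05
```

Because the spacing is in logs, consecutive lambda values differ by a constant ratio, so the grid is denser in absolute terms near the small end of the range.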
Replay syntax

    cvlasso [, lopt lse postresults plotcv(method) plotopt(string)]

    Replay options              Description
    --------------------------------------------------------------------------------------------
    lopt                        show estimation results using the model corresponding to
                                lambda=e(lopt)
    lse                         show estimation results using the model corresponding to
                                lambda=e(lse)
    postresults                 post lasso2 estimation results (to be used in combination
                                with lse or lopt)
    plotcv(method)              see plotting options above
    plotopt(string)             see plotting options above
    --------------------------------------------------------------------------------------------

Postestimation:

    predict [type] newvar [if] [in] [, xb residuals lopt lse noisily]

    Predict options             Description
    --------------------------------------------------------------------------------------------
    xb                          compute predicted values (the default)
    residuals                   compute residuals
    lopt                        use lambda that minimized the mean-squared prediction error
    lse                         use the largest lambda that is within one standard deviation
                                from lopt
    noisily                     show estimation output if re-estimation required.
    --------------------------------------------------------------------------------------------

Contents

    Description
    Partitioning of folds
    Data transformations in cross-validation
    Examples of usage
        --General demonstration
        --Rolling cross-validation with time-series data
        --Rolling cross-validation with panel data
    Saved results
    References
    Website
    Installation
    Acknowledgements
    Citation of lassopack

Description

cvlasso implements K-fold cross-validation and h-step ahead rolling cross-validation for the following estimators: lasso, square-root lasso, adaptive lasso, ridge regression, elastic net. See lasso2 for more information about these estimators.

The purpose of cross-validation is to assess the out-of-sample prediction performance of the estimator.

The steps for K-fold cross-validation over lambda can be summarized as follows:

1. Split the data into K groups, referred to as folds, of approximately equal size. Let n(k) denote the number of observations in the kth data partition with k=1,...,K.

2. The first fold is treated as the validation dataset and the remaining K-1 parts constitute the training dataset. The model is fit to the training data for a given value of lambda. The resulting estimate is denoted as betahat(1,lambda). The mean-squared prediction error for group 1 is computed as

        MSPE(1,lambda)=1/n(1)*sum([y(i) - x(i)'betahat(1,lambda)]^2)

   for all i in the first data partition. The procedure is repeated for k=2,...,K. Thus, MSPE(2,lambda), ..., MSPE(K,lambda) are calculated.

3. The K-fold cross-validation estimate of the MSPE, which serves as a measure of prediction performance, is

        CV(lambda)=1/K*sum(MSPE(k,lambda)).

4. Steps 2 and 3 are repeated for a range of lambda values.

h-step ahead rolling cross-validation proceeds in a similar way, except that the partitioning of training and validation takes account of the time-series structure. Specifically, the training window is iteratively extended (or moved forward) by one step. See below for more details.

Partitioning of folds

cvlasso supports K-fold cross-validation and cross-validation using rolling h-step ahead forecasts. K-fold cross-validation is the standard approach and relies on a fold ID variable. Rolling h-step ahead cross-validation is applicable with time-series data, or panels with large time dimension.

K-fold cross-validation

The fold ID variable marks the observations which are used as validation data. For example, a fold ID variable (with three folds) could have the following structure:

    +------------------+
    | fold    y     x  |
    |------------------|
    |    3   y1    x1  |
    |    2   y2    x2  |
    |    1   y3    x3  |
    |    3   y4    x4  |
    |    1   y5    x5  |
    |    2   y6    x6  |
    +------------------+

It is instructive to illustrate the cross-validation process implied by the above fold ID variable. Let T denote a training observation and V denote a validation point. The division of folds can be summarized as follows:

              Step
            1  2  3
         +-         -+
       1 |  T  T  V  |
       2 |  T  V  T  |
       3 |  V  T  T  |
     i 4 |  T  T  V  |
       5 |  V  T  T  |
       6 |  T  V  T  |
         +-         -+

In the first step, the 3rd and 5th observation are in the validation dataset and remaining data constitute the training dataset. In the second step, the validation dataset includes the 2nd and 6th observation, etc.

By default, the fold ID variable is randomly generated such that each fold is of approximately equal size. The default number of folds is equal to 10, but can be changed using the nfolds() option.

Rolling h-step ahead cross-validation

To allow for time-series data, cvlasso supports cross-validation using rolling h-step forecasts (option rolling); see Hyndman, 2016. To use rolling cross-validation, the data must be tsset or xtset. The options h() and origin() control the forecasting horizon and the starting point of the rolling forecast, respectively.

The following matrix illustrates the division between training and validation data over the course of the cross-validation for the case of 1-step ahead forecasting (the default when rolling is specified).

                 Step
            1  2  3  4  5
         +-               -+
       1 |  T  T  T  T  T  |
       2 |  T  T  T  T  T  |
       3 |  T  T  T  T  T  |
     t 4 |  V  T  T  T  T  |
       5 |  .  V  T  T  T  |
       6 |  .  .  V  T  T  |
       7 |  .  .  .  V  T  |
       8 |  .  .  .  .  V  |
         +-               -+

In the first iteration (illustrated in the first column), the first three observations are in the training dataset, which corresponds to origin(3).

The option h() controls the forecasting horizon used for cross-validation (the default is 1). If h(2) is specified, which corresponds to 2-step ahead forecasting, the structure changes to:

                 Step
            1  2  3  4  5
         +-               -+
       1 |  T  T  T  T  T  |
       2 |  T  T  T  T  T  |
       3 |  T  T  T  T  T  |
       4 |  .  T  T  T  T  |
     t 5 |  V  .  T  T  T  |
       6 |  .  V  .  T  T  |
       7 |  .  .  V  .  T  |
       8 |  .  .  .  V  .  |
       9 |  .  .  .  .  V  |
         +-               -+

The fixedwindow option ensures that the size of the training dataset is always the same. In this example (using h(1)), each step uses three data points for training:

                 Step
            1  2  3  4  5
         +-               -+
       1 |  T  .  .  .  .  |
       2 |  T  T  .  .  .  |
       3 |  T  T  T  .  .  |
     t 4 |  V  T  T  T  .  |
       5 |  .  V  T  T  T  |
       6 |  .  .  V  T  T  |
       7 |  .  .  .  V  T  |
       8 |  .  .  .  .  V  |
         +-               -+

Data transformations in cross-validation

An important principle in cross-validation is that the training dataset should not contain information from the validation dataset. This mimics the real-world situation where out-of-sample predictions are made not knowing what the true response is. The principle applies not only to individual observations (the training and validation data do not overlap) but also to data transformations. Specifically, data transformations applied to the training data should not use information from the validation data or full dataset. In particular, standardization using the full sample violates this principle.

cvlasso implements this principle for all data transformations supported by lasso2: data standardization, fixed effects and partialling-out. In most applications using the estimators supported by cvlasso, predictors are standardized to have mean zero and unit variance. The above principle means that the standardization applied to the training data is based only on observations in the training data; further, the standardization transformation applied to the validation data will also be based only on the means and variances of the observations in the training data. The same applies to the fixed effects transformation: the group means used to implement the within transformation to both the training data and the validation data are calculated using only the training data. Similarly, the projection coefficients used to "partial out" variables are estimated using only the training data and are applied to both the training dataset and the validation dataset.

General introduction using K-fold cross-validation

Dataset

The dataset is available through Hastie et al. (2015) on the authors' website.
The following variables are included in the dataset of 97 men:

    Predictors
      lcavol      log(cancer volume)
      lweight     log(prostate weight)
      age         patient age
      lbph        log(benign prostatic hyperplasia amount)
      svi         seminal vesicle invasion
      lcp         log(capsular penetration)
      gleason     Gleason score
      pgg45       percentage Gleason scores 4 or 5

    Outcome
      lpsa        log(prostate specific antigen)

Load prostate cancer data.

    . insheet using https://web.stanford.edu/~hastie/ElemStatLearn/datasets/prostate.data, clear tab

General demonstration

10-fold cross-validation across lambda. The lambda value that minimizes the mean-squared prediction error is indicated by an asterisk (*). A hat (^) marks the largest lambda at which the MSPE is within one standard error of the minimal MSPE. The former is returned in e(lopt), the latter in e(lse). We use seed(123) throughout this demonstration for replicability of folds.

    . cvlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, seed(123)
    . di e(lopt)
    . di e(lse)

Estimate the full model

Estimate the full model with either e(lopt) or e(lse). cvlasso internally calls lasso2 with lambda=lopt or lse, respectively.

    . cvlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, lopt seed(123)
    . cvlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, lse seed(123)

The same as above can be achieved using the replay syntax.

    . cvlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, seed(123)
    . cvlasso, lopt
    . cvlasso, lse

If postresults is specified, cvlasso posts the lasso2 estimation results.

    . cvlasso, lopt postres
    . ereturn list

Cross-validation over lambda and alpha

alpha() can be a scalar or list of elastic net parameters. Each alpha value must lie in the interval [0,1]. If alpha() is a list longer than 1, cvlasso cross-validates over lambda and alpha. The table at the end of the output indicates the alpha value that minimizes the empirical MSPE.

    . cvlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, alpha(0 0.1 0.5 1) lc(10) seed(123)

Alternatively, the alphacount() option can be used to control the number of alpha values used for cross-validation.

    . cvlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, alphac(3) lc(10) seed(123)

Plotting

We can plot the estimated mean-squared prediction error over lambda. Note that the plotting feature is not supported if we cross-validate over alpha.

    . cvlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, seed(123) plotcv

Prediction

The predict postestimation command returns predicted values and residuals for lambda=e(lopt) or lambda=e(lse).

    . cvlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, seed(123)
    . cap drop xbhat1
    . predict double xbhat1, lopt

    . cvlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, seed(123)
    . cap drop xbhat2
    . predict double xbhat2, lse

Store intermediate steps

cvlasso calls lasso2 internally. To see intermediate estimation results, we can use the saveest(string) option.

    . cvlasso lpsa lcavol lweight age lbph svi lcp gleason pgg45, seed(123) nfolds(3) saveest(step)
    . estimates dir
    . estimates restore step1
    . estimates replay step1

Time-series example using rolling h-step ahead cross-validation

Load airline passenger data.

    . webuse air2, clear

There are 144 observations in the sample. origin() controls the sample range used for training and validation. In this example, origin(130) implies that data up to and including t=130 are used for training in the first iteration. Data points t=131 to 144 are successively used for validation. The notation `a-b(v)' indicates that data a to b are used for estimation (training), and data point v is used for forecasting (validation). Note that the training dataset starts with t=13 since 12 lags are used as predictors.

    . cvlasso air L(1/12).air, rolling origin(130)

The optimal model includes lags 1, 11 and 12.

    . cvlasso, lopt

The option h() controls the forecasting horizon (default=1).

    . cvlasso air L(1/12).air, rolling origin(130) h(2)

In the above examples, the size of the training dataset increases by one data point each step. To keep the size of the training dataset fixed, specify fixedwindow.

    . cvlasso air L(1/12).air, rolling origin(130) fixedwindow

Cross-validation over alpha with alpha={0, 0.1, 0.5, 1}.

    . cvlasso air L(1/12).air, rolling origin(130) alpha(0 0.1 0.5 1)

Plot mean-squared prediction errors against ln(lambda).

    . cvlasso air L(1/12).air, rolling origin(130)
    . cvlasso, plotcv

Panel data example using rolling h-step ahead cross-validation

Rolling cross-validation can also be applied to panel data. For demonstration, load Grunfeld data.

    . webuse grunfeld, clear

Apply 1-step ahead cross-validation.

    . cvlasso mvalue L(1/10).mvalue, rolling origin(1950)

The model selected by cross-validation:

    . cvlasso, lopt

Same as above with fixed size of training data.

    . cvlasso mvalue L(1/10).mvalue, rolling origin(1950) fixedwindow

Saved results

cvlasso saves the following in e():

    scalars
      e(N)              sample size
      e(nfolds)         number of folds
      e(lmax)           largest lambda
      e(lmin)           smallest lambda
      e(lcount)         number of lambdas
      e(sqrt)           =1 if sqrt-lasso, 0 otherwise
      e(adaptive)       =1 if adaptive loadings are used, 0 otherwise
      e(ols)            =1 if post-estimation OLS, 0 otherwise
      e(partial_ct)     number of partialled out predictors
      e(notpen_ct)      number of not penalized predictors
      e(prestd)         =1 if pre-standardized, 0 otherwise
      e(nalpha)         number of alphas
      e(h)              forecasting horizon for rolling forecasts (only returned if rolling is specified)
      e(origin)         number of observations in first training dataset (only returned if rolling is specified)
      e(lopt)           optimal lambda (may be missing if no unique minimum MSPE)
      e(lse)            lambda se (may be missing if no unique minimum MSPE)
      e(mspemin)        minimum MSPE

    macros
      e(cmd)            cvlasso
      e(method)         indicates which estimator is used (e.g. lasso, elastic net)
      e(cvmethod)       indicates whether K-fold or rolling cross-validation is used
      e(varXmodel)      predictors (excluding partialled-out variables)
      e(varX)           predictors
      e(partial)        partialled out predictors
      e(notpen)         not penalized predictors
      e(depvar)         dependent variable

    matrices
      e(lambdamat)      column vector of lambda values

    functions
      e(sample)         estimation sample

In addition, if cvlasso cross-validates over alpha and lambda:

    scalars
      e(alphamin)       optimal alpha, i.e., the alpha that minimizes the empirical MSPE

    macros
      e(alphalist)      list of alpha values

    matrices
      e(mspeminmat)     minimum MSPE for each alpha

In addition, if cvlasso cross-validates over lambda only:

    scalars
      e(alpha)          elastic net parameter

    matrices
      e(mspe)           matrix of MSPEs for each fold and lambda where each column corresponds to one lambda value and each row corresponds to one fold.
      e(mmspe)          column vector of MSPEs for each lambda
      e(cvsd)           column vector of standard deviations of MSPE (for each lambda)
      e(cvupper)        column vector equal to MSPE + 1 standard deviation
      e(cvlower)        column vector equal to MSPE - 1 standard deviation

References

Correia, S. 2016. FTOOLS: Stata module to provide alternatives to common Stata commands optimized for large datasets. https://ideas.repec.org/c/boc/bocode/s458213.html

Hyndman, Rob J. (2016). Cross-validation for time series. Hyndsight blog, 5 December 2016. https://robjhyndman.com/hyndsight/tscv/

See lasso2 for further references.

Website

Please check our website https://statalasso.github.io/ for more information.

Installation

To get the latest stable version of lassopack from our website, check the installation instructions at https://statalasso.github.io/installation/. We update the stable website version more frequently than the SSC version.

To verify that lassopack is correctly installed, type whichpkg lassopack (which requires whichpkg to be installed; ssc install whichpkg).

Acknowledgements

Thanks to Sergio Correia for advice on the use of the FTOOLS package.

Citation of cvlasso

cvlasso is not an official Stata command.
It is a free contribution to the research community, like a paper. Please cite it as such:

    Ahrens, A., Hansen, C.B., Schaffer, M.E. 2018. cvlasso: Program for cross-validation using lasso, square-root lasso, elastic net, adaptive lasso and post-OLS estimators. http://ideas.repec.org/c/boc/bocode/s458458.html

Authors

    Achim Ahrens, Economic and Social Research Institute, Ireland
    achim.ahrens@esri.ie

    Christian B. Hansen, University of Chicago, USA
    Christian.Hansen@chicagobooth.edu

    Mark E Schaffer, Heriot-Watt University, UK
    m.e.schaffer@hw.ac.uk

Also see

    Help: lasso2, rlasso (if installed)