# Help file: lassologit

-------------------------------------------------------------------------------
help lassologit, help cvlassologit, help rlassologit          package v0.1 (first release)
-------------------------------------------------------------------------------

## Title

- lassologit   -- Main program for regularized logistic regression
- cvlassologit -- Program for K-fold cross-validation with logistic regression
- rlassologit  -- Program for regularized logistic regression with rigorous penalization

## Syntax

Full syntax:

    lassologit depvar regressors [if exp] [in range] [, postlogit noconstant
        lambda(numlist) lcount(integer) lminratio(real) lmax(real) lambdan
        lic(string) ebicxi(real) postresults notpen(varlist) spsi(matrix)
        nostd stdcoef holdout(varname) lossmeasure(string) tolopt(real)
        tolzero(real) maxiter(int) quadprec noseqrule plotpath(method)
        plotvar(varlist) plotopt(string) plotlabel long verbose ic(string)
        noprogressbar]

    cvlassologit depvar regressors [if exp] [in range] [, postlogit noconstant
        lambda(numlist) lcount(integer) lminratio(real) lmax(real) lambdan
        lopt lse postresults notpen(varlist) spsi(matrix) nostd tolopt(real)
        tolzero(real) maxiter(int) quadprec noseqrule nfolds(integer)
        foldvar(varname) savefoldvar(newvarname) seed(integer) stratified
        storeest(string) lossmeasure(string) plotcv plotopt(string) long
        verbose tabfold]

    rlassologit depvar regressors [if exp] [in range] [, postlogit noconstant
        gamma(real) c(real) holdout(varname) lossmeasure(string) tolopt(real)
        tolzero(real) maxiter(int) quadprec noseqrule verbose]

## Options

### Estimators

- postlogit: use post-estimation logit. lassologit: if lambda is a list, post-estimation logit results are displayed and returned in e(betas).
  If lambda is a scalar (or rlassologit is used), post-estimation logit results are always displayed, and this option controls whether the standard or the post-estimation logit results are stored in e(b). cvlassologit: post-estimation logit is used for cross-validation.
- noconstant: suppress the constant from estimation (not recommended).

### Lambda(s)

- lambda(numlist): a scalar lambda value or list of descending lambda values. Each lambda value must be greater than 0. If not specified, the default list is used, which is given by exp(rangen(log(lmax), log(lminratio*lmax), lcount)) (see mf_range).
- lcount(integer)†: number of lambda values for which the solution is obtained. Default is 50.
- lminratio(real)†: ratio of minimum to maximum lambda. lminratio must be between 0 and 1. Default is 1/1000.
- lmax(real)†: maximum lambda value.
- lambdan: uses lambda := lambda/N in the objective function. This makes lambda comparable with glmnet (Friedman, Hastie & Tibshirani, 2010).
- lic(string): lassologit: after the first lassologit estimation using the list of lambdas, estimate the model corresponding to the minimum information criterion. 'aic', 'bic', 'aicc', and 'ebic' (the default) are allowed. Note the lower case spelling. See Information criteria for the definition of each information criterion.
- ebicxi(real): lassologit: controls the xi parameter of the EBIC. xi needs to lie in the [0,1] interval. xi=0 is equivalent to the BIC. The default choice is xi = 1 - log(n)/(2*log(p)).
- lopt: cvlassologit: after cross-validation, estimate the model with the lambda that minimizes the estimated (cross-validated) loss.
- lse: cvlassologit: after cross-validation, estimate the model with the largest lambda that is within one standard error of lopt.
- postresults: used in combination with lic(), lse or lopt. Stores the estimation results of the selected model in e().
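The default lambda list described above is log-spaced between lmax and lminratio*lmax. As a cross-check outside Stata, here is a minimal Python sketch (the function name and the lmax value are ours, for illustration only):

```python
import math

def default_lambda_grid(lmax, lminratio=1/1000, lcount=50):
    """Mimic exp(rangen(log(lmax), log(lminratio*lmax), lcount)):
    lcount values, evenly spaced on the log scale, descending from
    lmax down to lminratio*lmax."""
    hi, lo = math.log(lmax), math.log(lminratio * lmax)
    step = (lo - hi) / (lcount - 1)
    return [math.exp(hi + i * step) for i in range(lcount)]

grid = default_lambda_grid(lmax=100)
# grid starts at 100, ends at 0.1, and decreases geometrically
```

Each successive value is the previous one multiplied by a constant factor, which is why lasso paths are customarily computed on such geometric grids.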
The above options are only applicable to lassologit and cvlassologit. † Not applicable if lambda(numlist) is specified.

### Rigorous lambda (rlassologit only)

- gamma(real): specifies the significance level gamma for the rigorous lambda. The default is 0.05/max((p*log(n), n)).
- c(real): specifies the slack parameter c for the rigorous lambda (default = 1.1).

### Loadings & standardization

- notpen(varlist): sets penalty loadings to zero for the predictors in varlist. Unpenalized predictors are always included in the model.
- spsi(matrix): a row vector of penalty loadings (in standard units); overrides the default, which is a vector of ones. The size of the vector should equal the number of predictors (excluding partialled-out variables and excluding the constant).
- nostd: do not standardize the predictors. The default is to standardize predictors to have unit variance.
- stdcoef: return coefficient estimates in standardized units. The default is to return coefficients in original units.

### Optimization

- tolopt(real): tolerance for the lasso shooting algorithm (default = 1e-10).
- tolzero(real): minimum below which coefficients are rounded down to zero (default = 1e-4).
- maxiter(int): maximum number of iterations for the lasso shooting algorithm (default = 10,000).
- quadprec: use mf_quadcross instead of mf_cross in the shooting algorithm.
  This slows down the program considerably but leads to (in our experience minor) gains in precision. It also disables the sequential strong rule; see next.
- noseqrule: disables the sequential strong rule, which discards some predictors before running the shooting algorithm (see Section 5 in Tibshirani et al., 2012). The sequential rule leads to speed gains. NB: the sequential rule is automatically disabled if the intercept is omitted.

### Cross-validation

- nfolds(integer): the number of folds used for K-fold cross-validation. Default is 5.
- foldvar(varname): user-specified variable with fold IDs, ranging from 1 to #folds. If not specified, fold IDs are randomly generated such that each fold is of approximately equal size.
- savefoldvar(newvarname): saves the fold ID variable.
- seed(real): set the seed for the generation of a random fold variable. Only relevant if the fold variable is randomly generated.
- stratified: observations are divided into folds such that the number of successes/failures is approximately the same across folds. Recommended especially if the share of successes is close to 0 or 1.
- storeest(string): saves the lassologit results from each step of the cross-validation in string1, ..., stringK, where K is the number of folds. Intermediate results can be restored using estimates restore.
- holdout(varname): defines a holdout sample. lassologit and rlassologit only. varname should be a binary variable where 1 indicates that observations are excluded from the estimation. The estimated loss is returned in e(loss).
- lossmeasure(string): loss measure used for cross-validation or for the holdout sample. "deviance" and "class" (miss-classification error) are supported. Deviance is the default.
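The two loss measures can be written down compactly. A hedged Python sketch of deviance and miss-classification error follows (the function names and the 0.5 classification cutoff are our assumptions for illustration, not part of the package):

```python
import math

def deviance(y0, p0):
    """-2 times the Bernoulli log-likelihood of the validation data,
    where y0 holds the observed 0/1 outcomes and p0 the predicted
    probabilities."""
    return -2 * sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                    for y, p in zip(y0, p0))

def class_error(y0, p0, cutoff=0.5):
    """Share of wrongly classified cases (miss-classification error)."""
    return sum((p > cutoff) != bool(y) for y, p in zip(y0, p0)) / len(y0)

y0 = [1, 0, 1, 1]               # validation outcomes
p0 = [0.9, 0.2, 0.4, 0.8]       # predicted probabilities
dev = deviance(y0, p0)          # penalizes confident wrong predictions heavily
err = class_error(y0, p0)       # here one of four cases is misclassified
```

Deviance is continuous in the predicted probabilities, whereas the classification error only counts threshold crossings; this is why deviance is usually the more stable criterion for tuning lambda.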
Unless noted otherwise, the above options are only applicable to cvlassologit.

### Plotting (lassologit)

- plotpath(method): plots the coefficient path as a function of the L1-norm (norm), lambda (lambda) or the log of lambda (lnlambda).
- plotvar(varlist): list of variables to be included in the plot.
- plotopt(string): additional plotting options passed on to line. For example, use plotopt(legend(off)) to turn off the legend.
- plotlabel: displays variable labels in the graph.

Note: plotting with lassologit is not available if lambda is a scalar value.

### Plotting (cvlassologit)

- plotcv: plots the estimated cross-validated loss as a function of lambda.
- plotopt(string): additional plotting options passed on to line. For example, use plotopt(legend(off)) to turn off the legend.

### Display options

- long†: show long output; applicable to lassologit and cvlassologit.
- verbose: show additional output.
- tabfold: cvlassologit: show a frequency table of the fold variable.
- ic(string)†: controls which information criterion is shown in the output. 'aic', 'bic', 'aicc', and 'ebic' (the default) are allowed. Note the lower case spelling.
  See Information criteria for the definition of each information criterion.
- noprogressbar: lassologit: do not show the progress bar.

## Replay syntax

lassologit and cvlassologit support replay syntax. The replay syntax can be used to retrieve estimation results for the models selected by information criteria (using the lic() option) or the model selected by cross-validation (using lse or lopt).

    lassologit [, plotpath(method) plotvar(varlist) plotopt(string) plotlabel
        long postresults lic(string) ic(string)]

    cvlassologit [, plotpath(method) plotvar(varlist) plotopt(string) plotlabel
        long postresults lic(string) ic(string)]

## Prediction

    predict [type] newvar [if] [in] [, xb pr class postlogit lse lopt
        lic(string) noisily]

Predict options:

- xb: compute predicted values (the default).
- pr: predicted probabilities.
- class: predicted class (either 1 or 0).
- postlogit: use post-logit (the default is to use e(b)).
- lic(string): after lassologit: selects which information criterion to use for prediction.
- lopt: after cvlassologit: use the lambda that minimizes the estimated (cross-validated) loss.
- lse: after cvlassologit: use the largest lambda that is within one standard error of lopt.
- noisily: show estimation output if re-estimation is required.

## Notes

All varlists may contain time-series operators or factor variables; see help varlist.

## Contents

- Description
- Coordinate descent algorithm
- Penalization level
- Cross-validation
- Information criteria
- Rigorous penalization
- Technical notes
- Example using Spam data (Data set; Introduction; Information criteria; Cross-validation; Rigorous penalization; Prediction; Holdout option; Plotting with lassologit; Plotting with cvlassologit)
- Saved results
- References
- Website
- Installation
- Acknowledgements
- Citation of lassologit

## Description

lassologit implements logistic lasso regression. The logistic lasso maximizes the penalized log-likelihood:

    max 1/N sum_i { y(i)*log p(x(i)) + (1-y(i))*log(1-p(x(i))) } - lambda*||Psi*beta||[1]

where:

- y(i) is a binary response that is either 1 or 0;
- beta is a p-dimensional parameter vector;
- x(i) is a p-dimensional vector of predictors for observation i;
- p(x(i)) is the probability that y(i) takes the value 1 given x(i); p(x(i)) = exp(x(i)'beta) / (1 + exp(x(i)'beta));
- lambda is the overall penalty level;
- ||.||[1] denotes the L(1) vector norm;
- Psi is a p-by-p diagonal matrix of predictor-specific penalty loadings (note that lassologit treats Psi as a row vector);
- N is the number of observations.

## Coordinate descent algorithm

lassologit uses the coordinate descent algorithm for the logistic lasso described in Friedman et al. (2010), Section 3.

## Penalization level: choice of lambda

Penalized regression methods rely on tuning parameters that control the degree and type of penalization. The logistic lasso relies on the tuning parameter lambda, which determines the level of penalization. Three approaches for selecting the "optimal" lambda value are implemented in lassologit, cvlassologit and rlassologit:

1. The penalty level may be chosen by cross-validation in order to optimize out-of-sample prediction performance. K-fold cross-validation is implemented in cvlassologit.
2. Theoretically justified and feasible penalty levels and loadings are available for the logistic lasso via rlassologit.
3. Lambda can also be selected using information criteria. lassologit calculates four information criteria: the Akaike Information Criterion (AIC; Akaike, 1974), the Bayesian Information Criterion (BIC; Schwarz, 1978), the Extended Bayesian Information Criterion (EBIC; Chen & Chen, 2008) and the corrected AIC (AICc; Sugiura, 1978; Hurvich & Tsai, 1989).

## K-fold cross-validation

cvlassologit implements K-fold cross-validation.
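To fix ideas, the fold construction at the heart of K-fold cross-validation can be sketched in a few lines. A Python illustration of randomly assigned fold IDs of approximately equal size (the function is ours, not the package's internal implementation):

```python
import random

def make_folds(n, k, seed=123):
    """Return a list of n fold IDs in 1..k, randomly assigned so that
    every fold has approximately equal size (sizes differ by at most 1)."""
    ids = [(i % k) + 1 for i in range(n)]   # balanced fold counts
    random.Random(seed).shuffle(ids)        # random assignment to observations
    return ids

folds = make_folds(n=4601, k=5)
```

Each observation receives exactly one fold ID; in each cross-validation step, the observations carrying one ID form the validation sample and the rest form the training sample.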
The purpose of cross-validation is to assess the out-of-sample prediction (classification) performance.

### Cross-validation procedure

K-fold cross-validation divides the data randomly (or based on the user-specified foldvar(varname)) into K folds, i.e., data partitions of approximately equal size. In each step, one fold is left out of the estimation (training) sample and used for validation. The prediction (classification) performance is assessed based on loss measures. cvlassologit offers two loss measures: deviance and miss-classification error (defined below). For more information, see cvlasso (for the linear case).

### Stratified cross-validation

Simple K-fold cross-validation might fail with randomly generated folds, or produce misleading results, if the share of successes (y=1) or failures (y=0) is low. The stratified option ensures that the number of successes/failures is approximately the same across folds. The tabfold option can be useful in this context; it asks cvlassologit to show the frequency distribution of successes/failures across folds.

### Loss measures

The prediction performance is assessed based on two loss measures: deviance and miss-classification. Deviance is the default and is defined as:

    Deviance = -2 * { y0 :* log(p0) :+ (1 :- y0) :* log(1 :- p0) }

where y0 is the response in the validation data and p0 are the predicted probabilities. The miss-classification error is the average number of wrongly classified cases, and can be specified using lossmeasure(class).

## Information criteria

The information criteria supported by lassologit are the Akaike information criterion (AIC; Akaike, 1974), the Bayesian information criterion (BIC; Schwarz, 1978), the corrected AIC (AICc; Sugiura, 1978; Hurvich & Tsai, 1989), and the Extended BIC (EBIC; Chen & Chen, 2008).
These are given by (omitting dependence on lambda and alpha):

    AIC  = -2*LL + 2*df
    BIC  = -2*LL + df*log(N)
    AICc = AIC + (2*df*(df+1))/(N-df-1)
    EBIC = BIC + 2*xi*df*log(p)

where LL is the log-likelihood and df(lambda, alpha) is the effective degrees of freedom, which is a measure of model complexity. df is approximated by the number of predictors selected. By default, lassologit displays the EBIC in the output, but all four information criteria are stored in e(aic), e(bic), e(ebic) and e(aicc). See the help file of lasso2 for more information.

## Rigorous penalization

The theory-driven ("rigorous") penalty level used by rlassologit is:

    lambda = c/2 * sqrt(N) * Phi^(-1)(1-gamma)

where c is a slack parameter (default = 1.1), Phi(.) is the standard normal CDF and gamma is the significance level. The default for gamma is 0.05/max((p*log(n), n)). The approach requires the predictors to be standardized such that mean(x(i)^2)=1. The penalty level is motivated by self-normalized moderate deviation theory, and is aimed at overruling the noise associated with the data-generating process. See Belloni, Chernozhukov & Wei (2016).

## Technical notes

### Standardization

lassologit centers and standardizes the predictors before estimation. The coefficient estimates are returned in the original scale. If the stdcoef option is used, coefficients are returned in standardized units. nostd can be used to estimate with predictors in the original scale.

### Constant

The constant is not penalized by default; thus, the constant is always included in the model. To omit the constant, use noconstant (not recommended).

## Example using Spam data

### Data set

For demonstration we consider the Spambase Data Set from the Machine Learning Repository. The data includes 4,601 observations and 57 variables. The aim is to predict whether an email is spam (i.e., unsolicited commercial e-mail) or not. Each observation corresponds to one email.

Predictors:

- v1-v48: percentage of words in the e-mail that match a specific word, i.e.
  100 * (number of times the word appears in the e-mail) divided by the total number of words in the e-mail. To see which word each predictor corresponds to, see the link below.
- v49-v54: percentage of characters in the e-mail that match a specific character, i.e. 100 * (number of times the character appears in the e-mail) divided by the total number of characters in the e-mail. To see which character each predictor corresponds to, see the link below.
- v55: average length of uninterrupted sequences of capital letters.
- v56: length of the longest uninterrupted sequence of capital letters.
- v57: total number of capital letters in the e-mail.

Outcome:

- v58: denotes whether the e-mail was considered spam (1) or not (0).

For more information about the data, see https://archive.ics.uci.edu/ml/datasets/spambase.

Load the spam data:

    . insheet using https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data, clear comma

### Introduction to lassologit

The basic syntax for lassologit is to specify the dependent variable followed by a list of predictors:

    . lassologit v58 v1-v57

The output of lassologit shows the penalty levels (lambda), the number of predictors included (s), the L1-norm, one information criterion (EBIC by default), McFadden's pseudo-R-squared, and which predictors are included in or removed from the model. By default, one line per knot is shown. Knots are points at which predictors enter or leave the model. By specifying long, an extended output with one row for each lambda is shown:

    . lassologit, long

To obtain the logistic lasso estimate for a scalar lambda or a list of lambdas, the lambda(numlist) option can be used. For example:

    . lassologit v58 v1-v57, lambda(40 20)
    . ereturn list

And for one lambda:

    . lassologit v58 v1-v57, lambda(40)
    . ereturn list

Note that the output and the objects stored in e() depend on whether lambda is only one value or a list of more than one value.

### Information criteria

To estimate the model selected by one of the information criteria, use the lic() option:

    . lassologit v58 v1-v57
    . lassologit, lic(ebic)
    . lassologit, lic(aicc)

In the above example, we use the replay syntax, which works similarly to a post-estimation command. The same can also be achieved in one line:

    . lassologit v58 v1-v57, lic(ebic)

When lic() is used, lassologit reports the logistic lasso estimates and the post-logit estimates (from applying logit estimation to the model selected by the logistic lasso) for the value of lambda selected by the specified information criterion. Note that lic() does not change the estimation results in memory. The advantage is that lic() can be used multiple times to compare results without the need to re-estimate the model. To store the model selected by one of the information criteria, use postresults:

    . lassologit, lic(ebic) postresults

### Cross-validation with cvlassologit

cvlassologit implements K-fold cross-validation, where the data is by default randomly partitioned. Here, we use K=3 and seed(123) to set the seed for reproducibility. (Be patient, this takes a minute.)

    . cvlassologit v58 v1-v57, nfolds(3) seed(123)

The output shows the prediction performance measured by deviance for each lambda value. To estimate the model selected by cross-validation, we can specify lopt or lse using the replay syntax:

    . cvlassologit, lopt
    . cvlassologit, lse

The data is by default randomly partitioned into K folds. The tabfold option asks cvlassologit to show the frequency distribution of successes (1) and failures (0) across folds:

    . cvlassologit v58 v1-v57, nfolds(3) seed(123) tabfold

In small samples, we might end up with a low number of successes or failures in some folds. The stratified option can help with this: it ensures that the number of successes (1) and failures (0) is approximately the same across folds:

    . cvlassologit v58 v1-v57, nfolds(3) seed(123) tabfold stratified

As with lassologit, we can use the long option for extended output:

    . cvlassologit, long

### Rigorous penalization with rlassologit

Lastly, we consider the logistic lasso with rigorous penalization:

    . rlassologit v58 v1-v57

rlassologit displays the logistic lasso solution and the post-logit solution. The rigorous lambda is returned in e(lambda) and is equal to 79.207801:

    . di e(lambda)

We get the same result when specifying the rigorous lambda manually using the lambda() option of lassologit:

    . lassologit v58 v1-v57, lambda(79.207801)

### Prediction

After selecting a model, we can use predict to obtain predicted probabilities or linear predictions. First, we select a model using lic() in combination with postresults as above:

    . lassologit v58 v1-v57
    . lassologit, lic(ebic) postresults

Then, we use predict:

    . predict double phat, pr
    . predict double xbhat, xb

pr saves the predicted probability of success and xb saves the linear predicted values. Note that the use of postresults is required: without postresults, the results of the estimation with the selected penalty level are not stored.

The approach for cvlassologit is very similar:

    . cvlassologit v58 v1-v57
    . cvlassologit, lopt postresults
    . predict double phat, pr

In the case of rlassologit, we don't need to select a specific penalty level and we also don't need to specify postresults:

    . rlassologit v58 v1-v57
    . predict double phat, pr

### Holdout option: assessing prediction accuracy with holdout()

We can leave one partition of the data out of the estimation sample and check the accuracy of prediction using the holdout(varname) option. We first define a binary holdout variable:

    . gen myholdout = (_n>4500)

There are 4,601 observations in the sample, and we exclude observations 4,501 to 4,601 from the estimation. The holdout variable should be set to 1 for all observations that we want to use for assessing classification accuracy:

    . lassologit v58 v1-v57, holdout(myholdout)
    . mat list e(loss)
    . rlassologit v58 v1-v57, holdout(myholdout)
    . mat list e(loss)

The loss measure is returned in e(loss).
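Returning briefly to the rigorous penalty used above: the closed-form expression from the Rigorous penalization section can be evaluated directly. A hedged Python sketch follows (illustration only; the function name is ours, and the result need not match e(lambda) exactly, since rlassologit's internal scaling conventions may differ):

```python
import math
from statistics import NormalDist

def rigorous_lambda(N, p, c=1.1, gamma=None):
    """lambda = c/2 * sqrt(N) * Phi^(-1)(1 - gamma), with the default
    significance level gamma = 0.05 / max(p*log(N), N) from the help text."""
    if gamma is None:
        gamma = 0.05 / max(p * math.log(N), N)
    return c / 2 * math.sqrt(N) * NormalDist().inv_cdf(1 - gamma)

# Spam-data dimensions: N = 4,601 observations, p = 57 predictors
lam = rigorous_lambda(N=4601, p=57)
```

Note that the formula depends on the data only through N and p: the rigorous lambda grows with the sample size and (through gamma) with the number of candidate predictors.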
As with cross-validation, deviance is used by default; lossmeasure(class) will return the average number of miss-classifications.

### Plotting with lassologit

lassologit supports plotting of the coefficient path over lambda. Here, we create the plot using the replay syntax, but the same can be achieved in one line:

    . lassologit v58 v1-v57
    . lassologit, plotpath(lambda) plotvar(v1-v5) plotlabel plotopt(legend(off))

In the above example, we use the following settings:

- plotpath(lambda): plots estimates against lambda.
- plotvar(v1-v5): restricts the set of variables plotted to v1-v5 (to avoid cluttering the graph).
- plotlabel: puts variable labels next to the lines.
- plotopt(legend(off)): turns the legend off.

### Plotting with cvlassologit

The plotcv option creates a graph of the estimated loss as a function of lambda:

    . cvlassologit v58 v1-v57, nfolds(3) seed(123)
    . cvlassologit v58 v1-v57, plotcv

The vertical solid red line indicates the value of lambda that minimizes the loss function. The dashed red line corresponds to the largest lambda for which the estimated loss is within one standard error of the minimum loss.

## Saved results

### lassologit with a single lambda, and rlassologit

Scalars:

- e(N): sample size
- e(cons): =1 if constant is present, 0 otherwise
- e(p): number of predictors excluding the intercept
- e(std): =1 if predictors are standardized
- e(lcount): number of lambda values
- e(ll0): log-likelihood of the null model
- e(total_success): number of successes
- e(total_trials): number of trials
- e(N_holdout): observations in the holdout sample
- e(lmax): largest lambda value
- e(lmin): smallest lambda value
- e(lambda): penalty level
- e(ll): log-likelihood
- e(shat): number of selected regressors
- e(shat0): number of selected and unpenalized regressors, including the constant (if present)
- e(tss): total sum of squares
- e(aic): minimum AIC
- e(bic): minimum BIC
- e(aicc): minimum AICc
- e(ebic): minimum EBIC

Macros:

- e(cmd): command name
- e(depvar): name of the dependent variable
- e(varX): all predictors
- e(varXmodel): penalized predictors
- e(selected): selected predictors
- e(selected0): selected predictors including the constant
Matrices:

- e(b): posted coefficient vector; by default used for prediction
- e(beta_post): post-logit coefficient vector
- e(beta_dense): logistic lasso coefficient vector without zeros
- e(beta_post_dense): post-logit coefficient vector without zeros
- e(beta_std): logistic lasso coefficient vector in standard units
- e(beta_std_post): post-logit coefficient vector in standard units
- e(beta): logistic lasso coefficient vector
- e(sdvec): vector of standard deviations of the predictors
- e(sPsi): penalty loadings in standard units
- e(Psi): = e(sPsi):*e(sdvec)
- e(loss): estimated loss if holdout() is used

### lassologit with multiple lambdas

Scalars:

- e(N): sample size
- e(cons): =1 if constant is present, 0 otherwise
- e(p): number of predictors excluding the intercept
- e(std): =1 if predictors are standardized
- e(lcount): number of lambda values
- e(ll0): log-likelihood of the null model
- e(total_success): number of successes
- e(total_trials): number of trials
- e(N_holdout): observations in the holdout sample
- e(aicmin): minimum AIC
- e(bicmin): minimum BIC
- e(aiccmin): minimum AICc
- e(ebicmin): minimum EBIC
- e(aicid): lambda ID of minimum AIC
- e(bicid): lambda ID of minimum BIC
- e(aiccid): lambda ID of minimum AICc
- e(ebicid): lambda ID of minimum EBIC
- e(aiclambda): lambda corresponding to minimum AIC
- e(biclambda): lambda corresponding to minimum BIC
- e(aicclambda): lambda corresponding to minimum AICc
- e(ebiclambda): lambda corresponding to minimum EBIC
- e(loss): estimated loss if holdout() is used

Macros:

- e(cmd): command name
- e(depvar): name of the dependent variable
- e(varX): all predictors
- e(varXmodel): penalized predictors

Matrices:

- e(betas): posted coefficient matrix
- e(betas_std): posted coefficient matrix in standard units
- e(lambdas): vector of lambdas
- e(aic): vector of AIC values
- e(aicc): vector of AICc values
- e(bic): vector of BIC values
- e(ebic): vector of EBIC values
- e(ll): vector of log-likelihood values
- e(l1norm): vector of L1-norms
- e(shat): number of included predictors
- e(shat0): number of included predictors including the intercept
- e(sdvec): vector of standard deviations of the predictors
- e(sPsi): penalty loadings in standard units
- e(Psi): = e(sPsi):*e(sdvec)

### cvlassologit

Scalars:

- e(N): number of observations
- e(lunique): number of unique lambda values
- e(lambdan): =1 if the lambdan option is used
- e(mlossmin): minimum mean cross-validated loss
- e(lmin): smallest lambda used for CV
- e(lmax): largest lambda used for CV
- e(lse): largest lambda within one standard error of the minimum loss
- e(lopt): lambda that minimizes the mean cross-validated loss
- e(lseid): lambda ID corresponding to e(lse)
- e(loptid): lambda ID corresponding to e(lopt)
- e(nfolds): number of folds

Macros:

- e(cmd): command name
- e(depvar): name of the dependent variable
- e(varX): all predictors
- e(lossmeasure): loss measure (deviance or class)

Matrices:

- e(lambdas): vector of lambda values used for cross-validation
- e(mloss): mean cross-validated loss
- e(loss): cross-validated loss for each fold; a matrix of size nfolds x lcount
- e(cvsd): estimate of the standard error of the mean cross-validated loss
- e(cvlower): = e(mloss) - e(cvsd)
- e(cvupper): = e(mloss) + e(cvsd)

### Estimation sample (always returned)

Functions:

- e(sample): estimation sample

## References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723. https://doi.org/10.1109/TAC.1974.1100705

Belloni, A., Chernozhukov, V., & Wei, Y. (2016). Post-Selection Inference for Generalized Linear Models With Many Controls. Journal of Business & Economic Statistics, 34(4), 606-619. https://doi.org/10.1080/07350015.2016.1166116

Belloni, A., Chernozhukov, V., Fernández-Val, I., & Hansen, C. (2017). Program Evaluation and Causal Inference With High-Dimensional Data. Econometrica, 85(1), 233-298. https://doi.org/10.3982/ECTA12723

Fu, W. J. (1998). Penalized Regressions: The Bridge Versus the Lasso. Journal of Computational and Graphical Statistics, 7(3), 397-416. https://doi.org/10.2307/1390712

Friedman, J., Hastie, T., Höfling, H., & Tibshirani, R. (2007). Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2), 302-332. https://doi.org/10.1214/07-AOAS131

Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22. https://doi.org/10.18637/jss.v033.i01

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.). New York: Springer. https://web.stanford.edu/~hastie/ElemStatLearn/

Hurvich, C. M., & Tsai, C.-L. (1989). Regression and time series model selection in small samples. Biometrika, 76(2), 297-307. https://doi.org/10.1093/biomet/76.2.297

Schwarz, G. (1978). Estimating the Dimension of a Model. The Annals of Statistics, 6(2), 461-464. https://doi.org/10.1214/aos/1176344136

Sugiura, N. (1978). Further analysts of the data by Akaike's information criterion and the finite corrections. Communications in Statistics - Theory and Methods, 7(1), 13-26. https://doi.org/10.1080/03610927808827599

Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267-288. https://doi.org/10.2307/2346178

Tibshirani, R., Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J., & Tibshirani, R. J. (2012). Strong rules for discarding predictors in lasso-type problems. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 74(2), 245-266. http://www.jstor.org/stable/41430939

Van der Kooij, A. (2007). Prediction Accuracy and Stability of Regression with Optimal Scaling Transformations. Ph.D. thesis, Department of Data Theory, University of Leiden. http://hdl.handle.net/1887/12096

## Website

Please check our website https://statalasso.github.io/ for more information.

## Installation

To get the latest stable version of lassologit from our website, check the installation instructions at https://statalasso.github.io/installation/. We update the stable website version more frequently than the SSC version. To verify that lassologit is correctly installed, type

    . whichpkg lassologit

(this requires whichpkg to be installed; ssc install whichpkg).

## Citation of lassologit

lassologit is not an official Stata command.
It is a free contribution to the research community, like a paper. Please cite it as such:

Ahrens, A., Hansen, C.B., & Schaffer, M.E. (2019). lassologit: Stata module for logistic lasso regression. http://ideas.repec.org/c/boc/bocode/XXXXX.html

## Authors

Achim Ahrens, Economic and Social Research Institute, Ireland. achim.ahrens@esri.ie

Christian B. Hansen, University of Chicago, USA. Christian.Hansen@chicagobooth.edu

Mark E. Schaffer, Heriot-Watt University, UK. m.e.schaffer@hw.ac.uk

## Also see

Help: lasso2, cvlasso, rlasso, ivlasso, pdslasso (if installed).