Help file: lassologit

-------------------------------------------------------------------------------------------------------------
help lassologit                                                                       lassologit package v0.1
help rlassologit                                                                                first release
help cvlassologit
-------------------------------------------------------------------------------------------------------------

Title

    lassologit -- Main program for regularized logistic regression

    cvlassologit -- Program for K-fold cross-validation with logistic regression

    rlassologit -- Program for regularized logistic regression with rigorous penalization


Syntax

    Full syntax

        lassologit depvar regressors [if exp] [in range] [, postlogit noconstant lambda(numlist)
              lcount(integer) lminratio(real) lmax(real) lambdan lic(string) ebicxi(real) postresults
              notpen(varlist) spsi(matrix) nostd stdcoef holdout(varname) lossmeasure(string) tolopt(real)
              tolzero(real) maxiter(int) quadprec noseqrule plotpath(method) plotvar(varlist)
              plotopt(string) plotlabel long verbose ic(string) noprogressbar]

        cvlassologit depvar regressors [if exp] [in range] [, postlogit noconstant lambda(numlist)
              lcount(integer) lminratio(real) lmax(real) lambdan lopt lse postresults notpen(varlist)
              spsi(matrix) nostd tolopt(real) tolzero(real) maxiter(int) quadprec noseqrule nfolds(integer)
              foldvar(varname) savefoldvar(new varname) seed(integer) stratified storeest(string)
              lossmeasure(string) plotcv plotopt(string) long verbose tabfold]

        rlassologit depvar regressors [if exp] [in range] [, postlogit noconstant gamma(real) c(real)
              holdout(varname) lossmeasure(string) tolopt(real) tolzero(real) maxiter(int) quadprec
              noseqrule verbose]


    Options

    Estimators            Description
    -------------------------------------------------------------------------------------------------------
    postlogit              use post-estimation logit.  lassologit: If lambda is a list, post-estimation
                            logit results are displayed and returned in e(betas).  If lambda is a scalar
                            (or rlassologit is used), post-estimation logit results are always displayed,
                            and this option controls whether the regular or the post-estimation logit
                            coefficients are stored in e(b).  cvlassologit: post-estimation logit is used
                            for cross-validation.
    noconstant             suppress constant from estimation (not recommended).
    -------------------------------------------------------------------------------------------------------

    Lambda(s)             Description
    -------------------------------------------------------------------------------------------------------
    lambda(numlist)        a scalar lambda value or list of descending lambda values. Each lambda value
                            must be greater than 0.  If not specified, the default list is used which is
                            given by exp(rangen(log(lmax),log(lminratio*lmax),lcount)) (see mf_range).
    lcount(integer)†       number of lambda values for which the solution is obtained. Default is 50.
    lminratio(real)†       ratio of minimum to maximum lambda. lminratio must be between 0 and 1. Default
                            is 1/1000.
    lmax(real)†            maximum lambda value.
    lambdan                uses lambda:=lambda/N in the objective function.  This makes lambda comparable
                            with glmnet (Friedman, Hastie & Tibshirani, 2010).
    lic(string)            lassologit: after first lassologit estimation using list of lambdas, estimate
                            model corresponding to minimum information criterion.  'aic', 'bic', 'aicc',
                            and 'ebic' (the default) are allowed.  Note the lower case spelling.  See
                            Information criteria for the definition of each information criterion.
    ebicxi(real)           lassologit: controls the xi parameter of the EBIC.  xi needs to lie in the [0,1]
                            interval.  xi=0 is equivalent to the BIC.  The default choice is
                            xi=1-log(n)/(2*log(p)).
    lopt                   cvlassologit: after cross-validation, estimate model with the lambda that
                            minimizes the mean cross-validated loss
    lse                    cvlassologit: after cross-validation, estimate model with the largest lambda
                            for which the loss is within one standard error of the minimum
    postresults            used in combination with lic(), lse or lopt.  Stores estimation results of the
                            model selected by information criterion or cross-validation in e().
    -------------------------------------------------------------------------------------------------------
    The above options are only applicable for lassologit and cvlassologit.  † Not applicable if
    lambda(numlist) is specified.
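
    For illustration, a custom descending lambda list, a coarser default grid, and selection by BIC might
    be requested as follows (a sketch with hypothetical variables y and x1-x20):

        . lassologit y x1-x20, lambda(100 50 25 10)
        . lassologit y x1-x20, lcount(20) lminratio(0.01)
        . lassologit y x1-x20, lic(bic) postresults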

    Rigorous lambda       Description
    -------------------------------------------------------------------------------------------------------
    gamma(real)            specifies the significance level gamma for the rigorous lambda. The default is
                            0.05/max((p*log(n),n)).
    c(real)                specifies the slack parameter c for the rigorous lambda (default = 1.1)
    -------------------------------------------------------------------------------------------------------
    The above options are only applicable for rlassologit.
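
    For example, a more conservative significance level and a larger slack parameter might be requested as
    follows (a sketch with hypothetical variables y and x1-x20):

        . rlassologit y x1-x20, gamma(0.01) c(1.25)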

    Loadings & standardization Description
    -------------------------------------------------------------------------------------------------------
    notpen(varlist)        sets penalty loadings to zero for predictors in varlist.  Unpenalized predictors
                            are always included in the model.
    spsi(matrix)           a row-vector of penalty loadings (in standard units); overrides the default
                            which is a vector of ones.  The size of the vector should equal the number of
                            predictors (excluding partialled out variables and excluding the constant).
    nostd                  do not standardize the predictors. Default is to standardize predictors to have
                            unit variance.
    stdcoef                return coefficient estimates in standardized units.  Default is to return
                            coefficients in original units.
    -------------------------------------------------------------------------------------------------------
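
    For illustration, the following sketch (hypothetical variables y and x1-x20) first leaves x1 and x2
    unpenalized, and then doubles the penalty loading on x1 via a user-supplied loading vector:

        . lassologit y x1-x20, notpen(x1 x2)
        . mat psi = J(1, 20, 1)
        . mat psi[1, 1] = 2
        . lassologit y x1-x20, spsi(psi)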

    Optimization          Description
    -------------------------------------------------------------------------------------------------------
    tolopt(real)           tolerance for lasso shooting algorithm (default=1e-10)
    tolzero(real)          minimum below which coeffs are rounded down to zero (default=1e-4)
    maxiter(int)           maximum number of iterations for the lasso shooting algorithm (default=10,000)
    quadprec               use mf_quadcross instead of mf_cross in the shooting algorithm.  This slows
                            down the program (considerably) but can lead to (in our experience minor)
                            gains in precision.  It also disables the sequential strong rule (see
                            noseqrule below).
    noseqrule              disables use of sequential strong rule, which discards some predictors before
                            running the shooting algorithm (see Section 5 in Tibshirani et al., 2012).  The
                            sequential rule leads to speed gains.  NB: sequential rule is automatically
                            disabled if intercept is omitted.
    -------------------------------------------------------------------------------------------------------

    Cross-validation      Description
    -------------------------------------------------------------------------------------------------------
    nfolds(integer)        the number of folds used for K-fold cross-validation. Default is 5.
    foldvar(varname)       user-specified variable with fold IDs, ranging from 1 to #folds.  If not
                            specified, fold IDs are randomly generated such that each fold is of
                            approximately equal size.
    savefoldvar(varname)   saves the fold ID variable.
    seed(integer)          set seed for the generation of a random fold variable. Only relevant if fold
                            variable is randomly generated.
    stratified             observations are divided into folds such that number of successes / failures is
                            approximately the same across folds.  Recommended especially if share of
                            successes is close to 0 or 1.
    storeest(string)       saves lassologit results from each step of the cross-validation in string1, ...,
                            stringK where K is the number of folds.  Intermediate results can be restored
                            using estimates restore.
    holdout(varname)       defines a holdout sample. lassologit and rlassologit only.  varname should be a
                            binary variable where 1 indicates that observations are excluded from the
                            estimation.  Estimated loss is returned in e(loss).
    lossmeasure(string)    loss measure used for cross-validation or for the holdout sample.  "deviance"
                            and "class" (misclassification error) are supported.  Deviance is the default.
    -------------------------------------------------------------------------------------------------------
    The above options are only applicable for cvlassologit, with two exceptions:  holdout(varname) applies
    to lassologit and rlassologit, and lossmeasure(string) applies to all three commands.
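
    For example, stratified 10-fold cross-validation with a stored and re-used fold variable might look as
    follows (hypothetical variables y and x1-x20; the fold variable name fid is arbitrary):

        . cvlassologit y x1-x20, nfolds(10) seed(123) stratified savefoldvar(fid)
        . cvlassologit y x1-x20, foldvar(fid)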

    Plotting lassologit Description
    -------------------------------------------------------------------------------------------------------
    plotpath(method)       plots the coefficients path as a function of the L1-norm (norm), lambda (lambda)
                            or the log of lambda (lnlambda)
    plotvar(varlist)       list of variables to be included in the plot
    plotopt(string)        additional plotting options passed on to line.  For example, use
                            plotopt(legend(off)) to turn off the legend.
    plotlabel              displays variable labels in graph.
    -------------------------------------------------------------------------------------------------------
    Note: Plotting with lassologit is not available if lambda is a scalar value.

    Plotting cvlassologit Description
    -------------------------------------------------------------------------------------------------------
    plotcv                 plots the estimated mean cross-validated loss as a function of lambda
    plotopt(string)        additional plotting options passed on to line.  For example, use
                            plotopt(legend(off)) to turn off the legend.
    -------------------------------------------------------------------------------------------------------

    Display options       Description
    -------------------------------------------------------------------------------------------------------
    long†                  show long output, applicable for lassologit and cvlassologit.
    verbose                show additional output
    tabfold                cvlassologit: show frequency table of fold variable
    ic(string)†            controls which information criterion is shown in the output.  'aic', 'bic',
                            'aicc', and 'ebic' (the default) are allowed.  Note the lower case spelling.
                            See Information criteria for the definition of each information criterion.
    noprogressbar          lassologit: do not show the progress bar
    -------------------------------------------------------------------------------------------------------


    Replay syntax

    lassologit and cvlassologit support replay syntax.  The replay syntax can be used to retrieve
    estimation results for the models selected by information criteria (using the lic() option) or the
    model selected by cross-validation (using lse or lopt).

        lassologit [, plotpath(method) plotvar(varlist) plotopt(string) plotlabel long postresults
              lic(string) ic(string)]

        cvlassologit [, plotcv plotopt(string) long postresults lopt lse]


    Prediction

        predict [type] newvar [if] [in] [, xb pr class postlogit lse lopt lic(string) noisily]

    Predict options       Description
    -------------------------------------------------------------------------------------------------------
    xb                     compute linear predicted values (the default)
    pr                     predicted probabilities
    class                  predicted class (either 1 or 0)
    postlogit              use post-logit coefficients (default is to use e(b))
    lic(string)            after lassologit: selects which information criterion to use for prediction.
    lopt                   after cvlassologit: use lambda that minimizes the mean-squared prediction error
    lse                    after cvlassologit: use largest lambda that is within one standard deviation
                            from lopt
    noisily                show estimation output if re-estimation required
    -------------------------------------------------------------------------------------------------------


    Notes

    All varlists may contain time-series operators or factor variables; see help varlist.


Contents

    Description
    Coordinate descent algorithm
    Penalization level
    Cross-validation
    Information criteria
    Rigorous penalization
    Technical notes
    Example using Spam data
    --Data set
    --Introduction
    --Information criteria
    --Cross-validation
    --Rigorous penalization
    --Prediction
    --Holdout option
    --Plotting with lassologit
    --Plotting with cvlassologit
    Saved results
    References
    Website
    Installation
    Acknowledgements
    Citation of lassologit


Description

    lassologit implements logistic lasso regression.  The logistic lasso maximizes the penalized log
    likelihood:

        max  1/N sum_i { y(i) * log p(x(i)) + (1-y(i)) * log(1-p(x(i))) }
                                        - lambda * ||Psi*beta||[1], 
        
    where

    y(i)       is a binary response that is either 1 or 0,
    beta       is a p-dimensional parameter vector,
    x(i)       is a p-dimensional vector of predictors for observation i,
    p(x(i))    is the probability that y(i) takes the value 1 given x(i); p(x(i)) = exp(x(i)'beta) / (1 +
                exp(x(i)'beta)),
    lambda     is the overall penalty level,
    ||.||[1]   denotes the L(1) vector norm,
    Psi        is a p by p diagonal matrix of predictor-specific penalty loadings. Note that lassologit
                treats Psi as a row vector.
    N          is the number of observations.
        
    lassologit uses coordinate descent algorithms for the logistic lasso as described in Friedman, Hastie
    & Tibshirani (2010), Section 3.
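
    To convey the idea of the algorithm, the following Mata sketch implements one simple variant:  an
    outer iteratively reweighted least squares (IRLS) step followed by coordinate-wise soft-thresholding.
    It is not the package's actual implementation; it assumes no intercept, unit penalty loadings,
    standardized predictors, and a glmnet-style 1/N scaling of the penalty:

        . mata:
        : real colvector cdlogit(real colvector y, real matrix X, real scalar lambda)
        : {
        :     real scalar N, p, j, it, rho
        :     real colvector beta, eta, mu, w, z, r
        :     N = rows(X); p = cols(X)
        :     beta = J(p, 1, 0)
        :     for (it = 1; it <= 100; it++) {             // outer IRLS loop
        :         eta = X * beta
        :         mu  = invlogit(eta)                     // fitted probabilities
        :         w   = mu :* (1 :- mu)                   // IRLS weights
        :         z   = eta + (y :- mu) :/ w              // working response
        :         for (j = 1; j <= p; j++) {              // cycle over coordinates
        :             r   = z - X * beta + X[., j] * beta[j]
        :             rho = cross(X[., j], w, r) / N
        :             beta[j] = sign(rho) * max((abs(rho) - lambda, 0)) /
        :                       (cross(X[., j], w, X[., j]) / N)
        :         }
        :     }
        :     return(beta)
        : }
        : end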


Penalization level: choice of lambda

    Penalized regression methods rely on tuning parameters that control the degree and type of
    penalization.  The logistic lasso relies on the tuning parameter lambda, which determines the level of
    penalization.  Three approaches for selecting the "optimal" lambda value are implemented in
    lassologit, cvlassologit and rlassologit:

    (1) The penalty level may be chosen by cross-validation in order to optimize out-of-sample prediction
    performance.  K-fold cross-validation is implemented in cvlassologit.

    (2) Theoretically justified and feasible penalty levels and loadings are available for the logistic
    lasso via rlassologit.

    (3) Lambda can also be selected using information criteria.  lassologit calculates four information
    criteria:  Akaike Information Criterion (AIC; Akaike, 1974), Bayesian Information Criterion (BIC;
    Schwarz, 1978), Extended Bayesian information criterion (EBIC; Chen & Chen, 2008) and the corrected AIC
    (AICc; Sugiura, 1978, and Hurvich, 1989).
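
    Schematically, the three approaches map onto the three commands as follows (a sketch with hypothetical
    variables y and x1-x20; each command is discussed in detail below):

        . cvlassologit y x1-x20, nfolds(5) seed(42) lopt
        . rlassologit y x1-x20
        . lassologit y x1-x20, lic(ebic)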


K-fold cross-validation

    cvlassologit implements K-fold cross-validation.  The purpose of cross-validation is to assess the
    out-of-sample prediction (classification) performance.

    Cross-validation procedure

    K-fold cross-validation divides the data randomly (or based on the user-specified foldvar(varname))
    into K folds, i.e., data partitions of approximately equal size.  In each step, one fold is left out
    of the estimation (training) sample and used for validation.  The prediction (classification)
    performance is assessed based on loss measures.  cvlassologit offers two loss measures:  deviance and
    misclassification error (defined below).  For more information, see cvlasso (for the linear case).

    Stratified cross-validation

    Simple K-fold cross-validation might fail with randomly generated folds, or produce misleading results,
    if the share of successes (y=1) or failures (y=0) is low.  The stratified option ensures that the
    number of successes/failures is approximately the same across folds.  The tabfold option can be useful
    in this context; it asks cvlassologit to show the frequency distribution of successes/failures across
    folds.

    Loss measures

    The prediction performance is assessed based on two loss measures:  deviance and misclassification.
    Deviance is the default and is defined as:

        Deviance = -2 * {y0 :* log(p0) :+ (1:-y0):*log(1:-p0)} 

    where y0 is the response in the validation data and p0 are the predicted probabilities.

    The misclassification error is the share of wrongly classified cases, and can be requested using
    lossmeasure(class).
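
    To make the two loss measures concrete, the following Mata sketch evaluates both on a small
    hypothetical validation sample (the 0.5 classification threshold is an assumption of the sketch):

        . mata:
        :     y0 = (1, 0, 1, 1)'                // validation responses
        :     p0 = (.9, .2, .6, .4)'            // predicted probabilities
        :     mean(-2 :* (y0 :* log(p0) :+ (1:-y0) :* log(1:-p0)))  // mean deviance
        :     mean(y0 :!= (p0 :> .5))           // misclassification error
        : end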


Information criteria
 
    The information criteria supported by lassologit are the Akaike information criterion (AIC, Akaike,
    1974), the Bayesian information criterion (BIC, Schwarz, 1978), the corrected AIC (Sugiura, 1978;
    Hurvich, 1989), and the Extended BIC (Chen & Chen, 2008).  These are given by (omitting the dependence
    on lambda):

        AIC     = -2*LL + 2*df
        BIC     = -2*LL + df*log(N) 
        AICc    = AIC + (2*df(df+1))/(N-df-1)
        EBIC    = BIC + 2*xi*df*log(p)

    where LL is the log-likelihood and df is the effective degrees of freedom, which is a measure of model
    complexity.  df is approximated by the number of predictors selected.

    By default, lassologit displays EBIC in the output, but all four information criteria are stored in
    e(aic), e(bic), e(ebic) and e(aicc).  See help file of lasso2 for more information.
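
    As a worked illustration with hypothetical numbers (LL=-100, df=5, N=1000, p=57), the four criteria
    can be computed by hand:

        . scalar LL = -100
        . scalar df = 5
        . scalar N  = 1000
        . scalar p  = 57
        . scalar xi = 1 - log(N)/(2*log(p))
        . di "AIC  = " -2*LL + 2*df
        . di "BIC  = " -2*LL + df*log(N)
        . di "AICc = " -2*LL + 2*df + (2*df*(df+1))/(N-df-1)
        . di "EBIC = " -2*LL + df*log(N) + 2*xi*df*log(p)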


Rigorous penalization

    The theory-driven ("rigorous") penalty level used by rlassologit is:

        lambda = c/2 sqrt(N) Phi^(-1)(1-gamma)

    where c is a slack parameter (default = 1.1), Phi(.) is the standard normal CDF and gamma is the
    significance level.  The default for gamma is 0.05/max((p*log(n),n)).  The approach requires the
    predictors to be standardized such that mean(x(i)^2)=1.  The penalty level is motivated by
    self-normalized moderate deviation theory, and is aimed at dominating the noise associated with the
    data-generating process.  See Belloni, Chernozhukov & Wei (2016).
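
    For concreteness, the formula can be evaluated directly in Stata (a sketch with hypothetical N and p;
    the command computes this internally):

        . scalar N = 1000
        . scalar p = 50
        . scalar slack = 1.1
        . scalar gamma = 0.05/max(p*log(N), N)
        . scalar lambda = slack/2 * sqrt(N) * invnormal(1 - gamma)
        . di lambda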

Technical notes

    Standardization

    lassologit centers and standardizes the predictors before estimation.  The coefficient estimates are
    returned in original scale. If the stdcoef option is used, coefficients are returned in standardized
    units.  nostd can be used to estimate with predictors in original scale.
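
    For example (hypothetical variables y and x1-x20):

        . lassologit y x1-x20, stdcoef
        . lassologit y x1-x20, nostd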

    Constant

    The constant is not penalized by default.  Thus, the constant is always included in the model.  To omit
    the constant, use noconstant (not recommended).


Example using Spam data

    Data set

    For demonstration we consider the Spambase Data Set from the Machine Learning Repository.  The data
    includes 4,601 observations and 57 variables.  The aim is to predict whether an email is spam (i.e.,
    unsolicited commercial e-mail) or not.  Each observation corresponds to one email.

    Predictors    
      v1-v48    percentage of words in the e-mail that match a specific word, i.e. 100 * (number of times
                  the word appears in the e-mail) divided by total number of words in e-mail.  To see which
                  word each predictor corresponds to, see link below.
      v49-v54   percentage of characters in the e-mail that match a specific character, i.e. 100 * (number
                  of times the character appears in the e-mail) divided by total number of characters in
                  e-mail.  To see which character each predictor corresponds to, see link below.
      v55       average length of uninterrupted sequences of capital letters
      v56       length of longest uninterrupted sequence of capital letters
      v57       total number of capital letters in the e-mail

    Outcome       
      v58       denotes whether the e-mail was considered spam (1) or not (0).
 
    For more information about the data see https://archive.ics.uci.edu/ml/datasets/spambase.

    Load spam data.
        . insheet using https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data,
            clear comma

    Introduction to lassologit

    The basic syntax for lassologit is to specify the dependent variable followed by a list of predictors:

        . lassologit v58 v1-v57

    The output of lassologit shows the penalty levels (lambda), the number of predictors included (s), the
    L1-Norm, one information criterion (EBIC by default), McFadden's Pseudo-R-squared and which predictors
    are included/removed from the model.

    By default, one line per knot is shown. Knots are points at which predictors enter or leave the model.
    By specifying long, an extended output with one row for each lambda is shown.

        . lassologit, long

    To obtain the logistic lasso estimate for a scalar lambda or a list of lambdas, the lambda(numlist)
    option can be used.  For example:

        . lassologit v58 v1-v57, lambda(40 20)
        . ereturn list

    And for one lambda:

        . lassologit v58 v1-v57, lambda(40)
        . ereturn list

    Note that output and the objects stored in e() depend on whether lambda is only one value or a list of
    more than one value.

    Information criteria

    To estimate the model selected by one of the information criteria, use the lic() option:

        . lassologit v58 v1-v57
        . lassologit, lic(ebic)
        . lassologit, lic(aicc)

    In the above example, we use the replay syntax that works similar to a post-estimation command.  The
    same can also be achieved in one line:

        . lassologit v58 v1-v57, lic(ebic)

    When lic() is used, lassologit reports the logistic lasso estimates and the post-logit estimates (from
    applying logit estimation to the model selected by the logistic lasso) for the value of lambda
    selected by the specified information criterion.

    Note that lic() does not change the estimation results in memory.  The advantage is that lic() can be
    used multiple times to compare results without having to re-estimate the model.

    To store the model selected by one of the information criteria, use postresults:

        . lassologit, lic(ebic) postresults

    Cross-validation with cvlassologit

    cvlassologit implements K-fold cross-validation where the data is by default randomly partitioned.

    Here, we use K=3 and seed(123) to set the seed for reproducibility.  (Be patient, this takes a minute.)

        . cvlassologit v58 v1-v57, nfolds(3) seed(123)

    The output shows the prediction performance measured by deviance for each lambda value.  To estimate
    the model selected by cross-validation we can specify lopt or lse using the replay syntax.

        . cvlassologit, lopt
        . cvlassologit, lse

    The data is by default randomly partitioned into K folds.  The tabfold option asks cvlassologit to
    show the frequency distribution of successes (1) and failures (0) across folds.

        . cvlassologit v58 v1-v57, nfolds(3) seed(123) tabfold

    In small samples, we might end up with a low number of successes or failures in some folds.  The
    stratified option can help with this:  it ensures that the number of successes (1) and failures (0) is
    approximately the same across folds:

        . cvlassologit v58 v1-v57, nfolds(3) seed(123) tabfold stratified

    As with lassologit, we can use the long option for an extended output.

        . cvlassologit, long

    Rigorous penalization with rlassologit

    Lastly, we consider the logistic lasso with rigorous penalization:

        . rlassologit v58 v1-v57

    rlassologit displays the logistic lasso solution and the post-logit solution.

    The rigorous lambda is returned in e(lambda) and is equal to 79.207801.

        . di e(lambda)

    We get the same result when specifying the rigorous lambda manually using the lambda() option of
    lassologit:

        . lassologit v58 v1-v57, lambda(79.207801)

    Prediction

    After selecting a model, we can use predict to obtain predicted probabilities or linear predictions.

    First, we select a model using lic() in combination with postresults as above:

        . lassologit v58 v1-v57
        . lassologit, lic(ebic) postresults

    Then, we use predict:

        . predict double phat, pr
        . predict double xbhat, xb

    pr saves the predicted probability of success and xb saves the linear predicted values.

    Note that the use of postresults is required.  Without postresults the results of the estimation with
    the selected penalty level are not stored.

    The approach for cvlassologit is very similar:

        . cvlassologit v58 v1-v57
        . cvlassologit, lopt postresults
        . predict double phat, pr

    In the case of rlassologit, we don't need to select a specific penalty level and we also don't need to
    specify postresults.

        . rlassologit v58 v1-v57
        . predict double phat, pr

    Assessing prediction accuracy with holdout()

    We can leave one partition of the data out of the estimation sample and check the accuracy of
    prediction using the holdout(varname) option.

    We first define a binary holdout variable:

        . gen myholdout = (_n>4500)

    There are 4,601 observations in the sample, and we exclude observations 4,501 to 4,601 from the
    estimation.  The holdout variable should be set to 1 for all observations that we want to use for
    assessing classification accuracy.

        . lassologit v58 v1-v57, holdout(myholdout)
        . mat list e(loss)

        . rlassologit v58 v1-v57, holdout(myholdout)
        . mat list e(loss)

    The loss measure is returned in e(loss).  As with cross-validation, deviance is used by default.
    lossmeasure(class) will return the misclassification error.
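
    For example, to assess misclassification error instead of deviance on the holdout sample:

        . lassologit v58 v1-v57, holdout(myholdout) lossmeasure(class)
        . mat list e(loss)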

    Plotting with lassologit

    lassologit supports plotting of the coefficient path over lambda.  Here, we create the plot using the
    replay syntax, but the same can be achieved in one line:

        . lassologit v58 v1-v57
        . lassologit, plotpath(lambda) plotvar(v1-v5) plotlabel plotopt(legend(off))

    In the above example, we use the following settings:  plotpath(lambda) plots estimates against lambda.
    plotvar(v1-v5) restricts the set of variables plotted to v1-v5 (to avoid that the graph is too
    cluttered).  plotlabel puts variable labels next to the lines.  plotopt(legend(off)) turns the legend
    off.

    Plotting with cvlassologit

    The plotcv option creates a graph of the estimated loss as a function of lambda:

        . cvlassologit v58 v1-v57, nfolds(3) seed(123)
        . cvlassologit v58 v1-v57, plotcv

    The vertical solid red line indicates the value of lambda that minimizes the loss function.  The
    dashed red line corresponds to the largest lambda for which the estimated loss is within one standard
    error of the minimum loss.


Saved results

    lassologit with single lambda and rlassologit

    scalars       
      e(N)               sample size
      e(cons)            =1 if constant is present, 0 otherwise
      e(p)               number of predictors excluding intercept
      e(std)             =1 if predictors are standardized
      e(lcount)          number of lambda values
      e(ll0)             log-likelihood of null model
      e(total_success)   number of successes
      e(total_trials)    number of trials
      e(N_holdout)       observations in holdout sample
      e(lmax)            largest lambda value
      e(lmin)            smallest lambda value
      e(lambda)          penalty level
      e(ll)              log-likelihood
      e(shat)            number of selected regressors
      e(shat0)           number of selected and unpenalized regressors including constant (if present)
      e(tss)             total sum of squares
      e(aic)             minimum AIC
      e(bic)             minimum BIC
      e(aicc)            minimum AICc
      e(ebic)            minimum EBIC

    macros        
      e(cmd)             command name
      e(depvar)          name of dependent variable
      e(varX)            all predictors
      e(varXmodel)       penalized predictors
      e(selected)        selected predictors
      e(selected0)       selected predictors including constant

    matrices      
      e(b)               posted coefficient vector. By default used for prediction.
      e(beta_post)       post-logit coefficient vector
      e(beta_dense)      logistic lasso coefficient vector without zeros
      e(beta_post_dense) post-logit coefficient vector without zeros
      e(beta_std)        logistic lasso coefficient vector in standard units
      e(beta_std_post)   post-logit coefficient vector in standard units
      e(beta)            logistic lasso coefficient vector
      e(sdvec)           vector of standard deviations of the predictors
      e(sPsi)            penalty loadings in standard units
      e(Psi)             = e(sPsi) :* e(sdvec)
      e(loss)            estimated loss if holdout() is used

    lassologit with multiple lambdas

    scalars       
      e(N)               sample size
      e(cons)            =1 if constant is present, 0 otherwise
      e(p)               number of predictors excluding intercept
      e(std)             =1 if predictors are standardized
      e(lcount)          number of lambda values
      e(ll0)             log-likelihood of null model
      e(total_success)   number of successes
      e(total_trials)    number of trials
      e(N_holdout)       observations in holdout sample
      e(aicmin)          minimum AIC
      e(bicmin)          minimum BIC
      e(aiccmin)         minimum AICc
      e(ebicmin)         minimum EBIC
      e(aicid)           lambda ID of minimum AIC
      e(bicid)           lambda ID of minimum BIC
      e(aiccid)          lambda ID of minimum AICc
      e(ebicid)          lambda ID of minimum EBIC
      e(aiclambda)       lambda corresponding to minimum AIC
      e(biclambda)       lambda corresponding to minimum BIC
      e(aicclambda)      lambda corresponding to minimum AICc
      e(ebiclambda)      lambda corresponding to minimum EBIC
      e(loss)            estimated loss if holdout() is used

    macros        
      e(cmd)             command name
      e(depvar)          name of dependent variable
      e(varX)            all predictors
      e(varXmodel)       penalized predictors

    matrices      
      e(betas)           posted coefficient matrix
      e(betas_std)       posted coefficient matrix in standard units
      e(lambdas)         vector of lambdas
      e(aic)             vector of AIC values
      e(aicc)            vector of AICc values
      e(bic)             vector of BIC values
      e(ebic)            vector of EBIC values
      e(ll)              vector of log-likelihood values
      e(l1norm)          vector of L1-norm
      e(shat)            number of included predictors
      e(shat0)           number of included predictors including intercept
      e(sdvec)           vector of standard deviations of the predictors
      e(sPsi)            penalty loadings in standard units
      e(Psi)             = e(sPsi) :* e(sdvec)


    cvlassologit

    scalars       
      e(N)               number of observations
      e(lunique)         number of unique lambda values
      e(lambdan)         =1 if lambdan option is used
      e(mlossmin)        minimum of the mean cross-validated loss
      e(lmin)            smallest lambda used for CV
      e(lmax)            largest lambda used for CV
      e(lse)             largest lambda for which loss is within one standard error of the minimum
      e(lopt)            lambda that minimizes the mean cross-validated loss
      e(lseid)           lambda ID corresponding to e(lse)
      e(loptid)          lambda ID corresponding to e(lopt)
      e(nfolds)          number of folds

    macros        
      e(cmd)             command name
      e(depvar)          name of dependent variable
      e(varX)            all predictors
      e(lossmeasure)     loss measure (deviance or class)

    matrices      
      e(lambdas)         vector of lambda values used for cross-validation
      e(mloss)           mean cross-validated loss
      e(loss)            cross-validated loss for each fold; a matrix of size nfolds x lcount
      e(cvsd)            estimate of standard error of mean cross-validated loss
      e(cvlower)         = e(mloss) - e(cvsd)
      e(cvupper)         = e(mloss) + e(cvsd)

    Estimation sample (always returned)

    functions     
      e(sample)          estimation sample


References

    Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic
        Control, 19(6), 716–723.  https://doi.org/10.1109/TAC.1974.1100705

    Belloni, A., Chernozhukov, V., & Wei, Y. (2016). Post-Selection Inference for Generalized Linear Models
        With Many Controls.  Journal of Business & Economic Statistics, 34(4), 606–619.  
        https://doi.org/10.1080/07350015.2016.1166116

    Belloni, A., Chernozhukov, V., Fernández-Val, I., & Hansen, C. (2017).  Program Evaluation and Causal
        Inference With High-Dimensional Data.  Econometrica, 85(1), 233–298.  
        https://doi.org/10.3982/ECTA12723

    Chen, J., & Chen, Z. (2008). Extended Bayesian information criteria for model selection with large
        model spaces. Biometrika, 95(3), 759–771.  https://doi.org/10.1093/biomet/asn034

    Fu, W. J. (1998). Penalized Regressions: The Bridge Versus the Lasso. Journal of Computational and
        Graphical Statistics 7(3), 397–416.  https://doi.org/10.2307/1390712

    Friedman, J., Hastie, T., Höfling, H., & Tibshirani, R. (2007). Pathwise coordinate optimization. The
        Annals of Applied Statistics 1(2), 302–332.  https://doi.org/10.1214/07-AOAS131

    Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models
        via Coordinate Descent. Journal of Statistical Software 33(1), 1–22.  
        https://doi.org/10.18637/jss.v033.i01

    Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.). New
        York: Springer-Verlag.  https://web.stanford.edu/~hastie/ElemStatLearn/

    Hurvich, C. M., & Tsai, C.-L. (1989). Regression and time series model selection in small samples.
        Biometrika, 76(2), 297–307.  http://doi.org/10.1093/biomet/76.2.297

    Schwarz, G. (1978). Estimating the Dimension of a Model. The Annals of Statistics, 6(2), 461–464.  
        https://doi.org/10.1214/aos/1176344136

    Sugiura, N. (1978). Further analysts of the data by Akaike's information criterion and the finite
        corrections. Communications in Statistics - Theory and Methods, 7(1), 13–26.  
        http://doi.org/10.1080/03610927808827599

    Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal
        Statistical Society. Series B (Methodological) 58(1), 267–288.  https://doi.org/10.2307/2346178

    Tibshirani, R., Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J., & Tibshirani, R. J. (2012).
        Strong rules for discarding predictors in lasso-type problems.  Journal of the Royal Statistical
        Society. Series B (Statistical Methodology), 74(2), 245–266.  http://www.jstor.org/stable/41430939

    Van der Kooij, A. (2007). Prediction Accuracy and Stability of Regression with Optimal Scaling
        Transformations. Ph.D. thesis, Department of Data Theory, University of Leiden.  
        http://hdl.handle.net/1887/12096


Website

    Please check our website https://statalasso.github.io/ for more information.


Installation

    To get the latest stable version of lassologit from our website, check the installation instructions at
    https://statalasso.github.io/installation/.  We update the stable website version more frequently than
    the SSC version.

    To verify that lassologit is correctly installed, click on or type whichpkg lassologit (which requires 
    whichpkg to be installed; ssc install whichpkg).

Citation of lassologit

    lassologit is not an official Stata command. It is a free contribution to the research community, like
    a paper. Please cite it as such:

    Ahrens, A., Hansen, C.B., Schaffer, M.E. 2019.  lassologit: Stata module for logistic lasso regression.
        http://ideas.repec.org/c/boc/bocode/XXXXX.html


Authors

        Achim Ahrens, Economic and Social Research Institute, Ireland
        achim.ahrens@esri.ie
        
        Christian B. Hansen, University of Chicago, USA
        Christian.Hansen@chicagobooth.edu

        Mark E Schaffer, Heriot-Watt University, UK
        m.e.schaffer@hw.ac.uk
        

Also see

       Help: lasso2, cvlasso, rlasso, ivlasso, pdslasso (if installed).