Introduction

Welcome to the Stata ML Page

On this website, we introduce packages for machine learning in Stata. The packages provide tools for prediction, model selection and causal inference.

  • The package lassopack implements lasso (Tibshirani 1996), square-root lasso (Belloni et al. 2011), elastic net (Zou & Hastie 2005), ridge regression (Hoerl & Kennard 1970), adaptive lasso (Zou 2006) and post-estimation OLS. lassopack also supports logistic lasso.

  • pdslasso offers methods to facilitate causal inference in structural models. The package allows the user to select control variables and/or instruments from a large set of candidate variables, in settings where the researcher is interested in estimating the causal effect of one or more (possibly endogenous) variables of interest.

  • pystacked implements stacking regression (Wolpert 1992) via scikit-learn’s sklearn.ensemble.StackingRegressor and sklearn.ensemble.StackingClassifier. Stacking combines the predictions of multiple supervised machine learners (the “base learners”) into a final prediction, with the aim of improving predictive performance.

  • ddml implements Double/Debiased Machine Learning (DDML) for Stata. Five different estimators are supported, allowing for flexible estimation of causal effects of endogenous variables in settings with unknown functional forms and/or many exogenous variables. ddml is compatible with many existing supervised machine learning programs in Stata.
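To illustrate what selecting controls from a large set can look like, here is a minimal Python sketch of post-double-selection, one approach of the kind pdslasso implements. It uses scikit-learn rather than Stata, so it does not show pdslasso's syntax or its default penalty: we choose the lasso penalty by cross-validation (`LassoCV`), which is an assumption made for brevity. The simulated data and the helper `lasso_selected` are hypothetical. The idea is to run one lasso of the outcome on the candidate controls and one lasso of the causal variable on the same controls. The effect is then estimated by OLS using the union of the selected controls.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

# Simulated data (hypothetical): 100 candidate controls, only a few matter.
rng = np.random.default_rng(1)
n, p = 500, 100
theta_true = 1.0
X = rng.normal(size=(n, p))
D = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)           # causal variable
Y = theta_true * D + X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=n)  # outcome


def lasso_selected(target, X):
    """Hypothetical helper: indices of controls with non-zero lasso coefficients.

    The penalty is chosen by 5-fold cross-validation (an assumption for this sketch).
    """
    fit = LassoCV(cv=5, random_state=0).fit(X, target)
    return set(np.flatnonzero(fit.coef_))


# Step 1 and 2: select controls that predict the outcome and the causal variable.
selected = sorted(lasso_selected(Y, X) | lasso_selected(D, X))

# Step 3: OLS of Y on D plus the union of selected controls.
Z = np.column_stack([D, X[:, selected]])
ols = LinearRegression().fit(Z, Y)
theta_hat = ols.coef_[0]  # coefficient on D
print(f"selected controls: {len(selected)}, theta_hat = {theta_hat:.3f}")
```

Selecting on both equations is what protects against omitting a control that is only weakly related to the outcome but strongly related to the causal variable.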
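Because pystacked builds on scikit-learn, the stacking idea can be sketched directly in Python with `sklearn.ensemble.StackingRegressor`. This is not pystacked's Stata syntax. The choice of base learners (cross-validated lasso and a random forest), the non-negative linear final learner, and the simulated data are our own assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Simulated regression data (illustrative only).
X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base learners whose predictions will be combined.
base_learners = [
    ("lasso", LassoCV(cv=5, random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
]

# Final learner: linear regression with non-negative weights (our assumption,
# not necessarily pystacked's default), fit on 5-fold out-of-fold predictions.
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=LinearRegression(positive=True),
    cv=5,
)
stack.fit(X_train, y_train)

y_hat = stack.predict(X_test)
mse = mean_squared_error(y_test, y_hat)
print(f"Test MSE of stacked predictor: {mse:.2f}")
print("Stacking weights (lasso, rf):", stack.final_estimator_.coef_)
```

The final learner is fit on out-of-fold predictions of the base learners, so the stacking weights are not tuned on predictions the base learners made for their own training data; the base learners are then refit on the full training sample.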
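The core DDML idea can be sketched for the partially linear model, Y = θD + g(X) + ε, one of the settings DDML covers: predict Y and D from X with a machine learner using cross-fitting, then regress the residualized outcome on the residualized treatment. The Python sketch below is not ddml's code or syntax; the random forest learner, the 5-fold split, the simulated data and the helper `cross_fit_residuals` are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

# Simulated data (hypothetical) with nonlinear confounding through X.
rng = np.random.default_rng(42)
n, p = 2000, 10
theta_true = 0.5
X = rng.normal(size=(n, p))
m_X = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2   # E[D | X]
g_X = np.cos(X[:, 0]) + X[:, 2]              # direct effect of X on Y
D = m_X + rng.normal(size=n)
Y = theta_true * D + g_X + rng.normal(size=n)


def cross_fit_residuals(target, X, n_folds=5, seed=0):
    """Hypothetical helper: out-of-fold residuals target - E[target | X].

    Each observation is predicted by a learner trained on the other folds.
    """
    resid = np.empty_like(target)
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train_idx, test_idx in folds.split(X):
        learner = RandomForestRegressor(
            n_estimators=100, min_samples_leaf=5, random_state=seed
        )
        learner.fit(X[train_idx], target[train_idx])
        resid[test_idx] = target[test_idx] - learner.predict(X[test_idx])
    return resid


u_hat = cross_fit_residuals(Y, X)  # Y minus estimated E[Y | X]
v_hat = cross_fit_residuals(D, X)  # D minus estimated E[D | X]

# Residual-on-residual regression (no intercept) gives the estimate of theta.
theta_hat = np.sum(v_hat * u_hat) / np.sum(v_hat ** 2)

# Heteroskedasticity-robust standard error of that slope.
eps_hat = u_hat - theta_hat * v_hat
se = np.sqrt(np.sum(v_hat ** 2 * eps_hat ** 2)) / np.sum(v_hat ** 2)
print(f"theta_hat = {theta_hat:.3f} (SE {se:.3f}), true value {theta_true}")
```

Cross-fitting ensures that each residual comes from a learner that never saw that observation, which limits the bias that overfitting the nuisance functions would otherwise introduce into the estimate of θ.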