Many instruments
Belloni et al. (2012, Econometrica) consider the model
\[\begin{aligned} y_i &= \alpha d_i + \varepsilon_i \\ d_i &= z_i'\delta + u_i \end{aligned}\]
where \(y_i\) is the dependent variable, \(d_i\) is an endogenous regressor and \(z_i\) is a \(p_z\)-dimensional vector of instruments. \(p_z\) is allowed to be large and may even exceed the sample size; we refer to \(z_i\) as high-dimensional. The interest lies in estimating the causal effect of the endogenous variable \(d_i\) on the outcome variable \(y_i\).
The choice and specification of instruments is crucial for the estimation of \(\alpha\). However, it is often not clear a priori how to select or specify instruments. The situation of many instruments can arise because there are simply many instruments available and/or because we need to consider a large number of transformations of elementary variables to approximate the complex relationship between the endogenous regressor \(d_i\) and the instruments \(z_i\).
Belloni et al. suggest applying the lasso with theory-driven penalization to the equation \(d_i = z_i'\delta + u_i\). Under the assumption of (approximate) sparsity, the rigorous lasso (or square-root lasso) can be applied to select appropriate instruments and to predict \(d_i\). \(\hat{d}_i=z_i'\hat\delta\) is then used as an estimate of the optimal instrument, where \(\hat\delta\) is either the lasso, square-root lasso, post-lasso or post-square-root-lasso estimator. Instrument selection using the lasso and square-root lasso is implemented in ivlasso.
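The idea of using \(\hat{d}_i\) as an instrument can be illustrated with a small simulation. The following is only a minimal sketch in pure Python, not the ivlasso implementation: it uses a plain lasso with a fixed, hand-picked penalty `lam` (an assumption, in place of the theory-driven penalty loadings), simulated data with made-up coefficients, and helper names (`soft_threshold`, `lasso`) that are hypothetical.

```python
import random


def soft_threshold(value, threshold):
    """Soft-thresholding operator used in lasso coordinate descent."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso(X, y, lam, sweeps=50):
    """Minimise (1/2n)*RSS + lam*sum|b_j| by cyclic coordinate descent.

    X is a list of rows; no intercept is fitted (data assumed mean zero).
    """
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    norms = [sum(v * v for v in col) / n for col in cols]
    beta = [0.0] * p
    resid = list(y)
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            # Covariance of column j with the partial residual (b_j * x_j added back).
            rho = sum(c * r for c, r in zip(col, resid)) / n + norms[j] * beta[j]
            new = soft_threshold(rho, lam) / norms[j]
            if new != beta[j]:
                step = new - beta[j]
                resid = [r - step * c for r, c in zip(resid, col)]
                beta[j] = new
    return beta


# Simulated data; every number here is an illustrative assumption.
rng = random.Random(0)
n, p_z, alpha = 500, 20, 1.0
Z = [[rng.gauss(0, 1) for _ in range(p_z)] for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
# eps is correlated with u, which makes d endogenous in the outcome equation.
eps = [0.8 * ui + 0.6 * rng.gauss(0, 1) for ui in u]
d = [row[0] + row[1] + ui for row, ui in zip(Z, u)]  # only z_0 and z_1 are relevant
y = [alpha * di + ei for di, ei in zip(d, eps)]

# First stage: lasso of d on all instruments, then the fitted values d_hat.
delta_hat = lasso(Z, d, lam=0.3)
selected = [j for j, b in enumerate(delta_hat) if b != 0.0]
d_hat = [sum(b * z for b, z in zip(delta_hat, row)) for row in Z]

# IV estimate using d_hat as the instrument, and naive OLS for comparison.
alpha_iv = sum(h * yi for h, yi in zip(d_hat, y)) / sum(h * di for h, di in zip(d_hat, d))
alpha_ols = sum(di * yi for di, yi in zip(d, y)) / sum(di * di for di in d)

print("selected instruments:", selected)
print("IV estimate:", round(alpha_iv, 3), "OLS estimate:", round(alpha_ols, 3))
```

In this design the OLS estimate is pulled away from the true value by the correlation between \(\varepsilon_i\) and \(u_i\), whereas the estimate based on \(\hat{d}_i\) should lie close to it. The lasso coefficients are used directly here; refitting by OLS on the selected instruments would give the post-lasso variant.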
Many controls
Next, we consider the case where \(d_i\) is exogenous, but there are many control variables:
\[y_i = \alpha d_i + x_i'\beta + \varepsilon_i\]
In this setting, we allow the \(p_x\)-dimensional vector of controls \(x_i\) to be high-dimensional. The problem the researcher faces is that the “right” set of controls is not known. In traditional practice, this presents her with a difficult choice: use too few controls, or the wrong ones, and omitted variable bias will be present; use too many, and the model will suffer from overfitting.
The post-double-selection (PDS) methodology introduced in Belloni, Chernozhukov and Hansen (2014) uses the lasso estimator to select the controls. Specifically, the lasso is used twice:

1. estimate a lasso regression with \(y_i\) as the dependent variable and the control variables \(x_i\) as regressors;
2. estimate a lasso regression with \(d_i\) as the dependent variable and again the control variables \(x_i\) as regressors.

The lasso estimator achieves a sparse solution, i.e., most coefficients are set to zero. The final choice of control variables to include in the OLS regression of \(y_i\) on \(d_i\) is the union of the controls selected in steps 1 and 2, hence the name post-double-selection for the methodology.
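The two selection steps and the final OLS regression can be sketched as follows. This is a minimal illustration in pure Python under stated assumptions, not the pdslasso implementation: the penalty `lam` is fixed by hand rather than chosen by the theory-driven rule, the data are simulated with made-up coefficients, and all function names (`lasso`, `ols`, `post_double_selection`) are hypothetical.

```python
import random


def soft_threshold(value, threshold):
    """Soft-thresholding operator used in lasso coordinate descent."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso(X, y, lam, sweeps=50):
    """Minimise (1/2n)*RSS + lam*sum|b_j| by cyclic coordinate descent (no intercept)."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    norms = [sum(v * v for v in col) / n for col in cols]
    beta = [0.0] * p
    resid = list(y)
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            # Covariance of column j with the partial residual (b_j * x_j added back).
            rho = sum(c * r for c, r in zip(col, resid)) / n + norms[j] * beta[j]
            new = soft_threshold(rho, lam) / norms[j]
            if new != beta[j]:
                step = new - beta[j]
                resid = [r - step * c for r, c in zip(resid, col)]
                beta[j] = new
    return beta


def ols(X, y):
    """OLS coefficients via the normal equations and Gauss-Jordan elimination."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [v - f * w for v, w in zip(A[r], A[c])]
    return [A[r][k] / A[r][r] for r in range(k)]


def post_double_selection(X, d, y, lam=0.3):
    """Return the OLS coefficient on d after double selection, plus the selected controls."""
    sel_y = {j for j, b in enumerate(lasso(X, y, lam)) if b != 0.0}  # step 1
    sel_d = {j for j, b in enumerate(lasso(X, d, lam)) if b != 0.0}  # step 2
    selected = sorted(sel_y | sel_d)                                 # union of both sets
    design = [[di, 1.0] + [row[j] for j in selected] for di, row in zip(d, X)]
    return ols(design, y)[0], selected


# Simulated data; every number here is an illustrative assumption.
rng = random.Random(0)
n, p_x, alpha = 500, 10, 0.5
X = [[rng.gauss(0, 1) for _ in range(p_x)] for _ in range(n)]
d = [row[0] + row[2] + rng.gauss(0, 1) for row in X]        # x_0 and x_2 drive d
y = [alpha * di + row[0] + row[1] + rng.gauss(0, 1)         # x_0 and x_1 drive y
     for di, row in zip(d, X)]

alpha_hat, selected = post_double_selection(X, d, y)
print("selected controls:", selected)
print("alpha_hat:", round(alpha_hat, 3))
```

Taking the union matters because a control that is only weakly related to \(y_i\) but strongly related to \(d_i\) could be dropped by the first lasso alone, which would reintroduce omitted variable bias in the final regression.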
The post-regularization or CHS methodology is closely related. Instead of using the lasso-selected controls in a post-regularization OLS estimation, the selected variables are used to construct orthogonalized versions of the dependent variable and the exogenous causal variables of interest. The orthogonalized versions are based either on the lasso or post-lasso estimated coefficients; the post-lasso is OLS applied to lasso-selected variables. See Chernozhukov, Hansen & Spindler (2015) for details.
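For a single exogenous \(d_i\), the orthogonalization can be written out as a short sketch. The symbols \(\hat\pi_y\) and \(\hat\pi_d\) are notation introduced here for illustration (they do not appear in the text above) and denote the lasso or post-lasso coefficients from regressing \(y_i\) and \(d_i\), respectively, on the controls \(x_i\):

\[\tilde{y}_i = y_i - x_i'\hat\pi_y, \qquad \tilde{d}_i = d_i - x_i'\hat\pi_d, \qquad \hat\alpha = \frac{\sum_i \tilde{d}_i \tilde{y}_i}{\sum_i \tilde{d}_i^2}\]

That is, \(\hat\alpha\) is the OLS coefficient from regressing the orthogonalized outcome \(\tilde{y}_i\) on the orthogonalized regressor \(\tilde{d}_i\).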
The post-double-selection and post-regularization approaches for many controls are implemented in pdslasso.
Many controls and many instruments
Chernozhukov, Hansen & Spindler (2015) also consider the case where we have both many instruments and many controls:
\[\begin{aligned} y_i &= \alpha d_i + x_i'\beta + \varepsilon_i \\ d_i &= x_i'\gamma + z_i'\delta + u_i \end{aligned}\]
where \(p_x\gg N\) and/or \(p_z\gg N\) are allowed. The above model can be estimated using ivlasso, which allows for low- and/or high-dimensional sets of instruments.
To summarise, ivlasso and pdslasso implement methods for:
- endogenous and/or exogenous regressors,
- low- and high-dimensional instruments,
- low- and high-dimensional control variables.