Install Stata/Python

Setting up Stata’s Python integration

pystacked requires Stata 16 or higher, Python 3.8 or higher, and scikit-learn 0.24 or higher. If you want to use ddml, you should also install Python.
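If you want to verify these minimum versions from Python itself, the sketch below does so using only the standard library. The helper names (`version_tuple`, `meets_minimum`, `check_requirements`) are hypothetical. The simple parser assumes plain numeric release strings such as `1.3.2` and drops suffixes like `rc1`.

```python
import sys
from importlib import metadata  # available from Python 3.8


def version_tuple(version: str) -> tuple:
    """Turn a release string like '1.3.2' into (1, 3, 2).

    Naive parser: stops at the first component that is not a plain integer,
    so '1.4.0rc1' becomes (1, 4).
    """
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(version: str, minimum: tuple) -> bool:
    """True if `version` is at least `minimum`, e.g. meets_minimum('1.3.2', (0, 24))."""
    return version_tuple(version) >= minimum


def check_requirements() -> dict:
    """Report whether the running interpreter meets the pystacked minimums."""
    python_ok = sys.version_info[:2] >= (3, 8)
    try:
        sklearn_version = metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        sklearn_version = None  # scikit-learn is not installed
    sklearn_ok = sklearn_version is not None and meets_minimum(sklearn_version, (0, 24))
    return {"python_ok": python_ok, "scikit-learn": sklearn_version, "sklearn_ok": sklearn_ok}


if __name__ == "__main__":
    print(check_requirements())
```

Run this with the interpreter you plan to link to Stata. It checks Python and scikit-learn only; the Stata version requirement has to be checked in Stata.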

StataCorp provides detailed instructions on how to set up Stata’s Python integration in three blog entries: Link 1, Link 2 and Link 3.

Below, we briefly outline the steps.

1. Python installation

You have (at least) three options:

  1. For many, the easiest option is to install Anaconda, which is available here. Anaconda is a Python distribution that comes with many commonly used packages (including scikit-learn), a package manager and an editor.
  2. Alternatively, you can download and install plain Python from here.
  3. If you already have a recent Python version installed on your system, you could set up a separate Python environment for Stata. This is optional, but might be useful if you want to use different libraries in different projects (see instructions here).
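If you are unsure which kind of installation a given interpreter belongs to, you can run the sketch below with that interpreter (for example, from a terminal). The checks are heuristics of our own, not an official test. A `venv` environment is recognized because `sys.prefix` differs from `sys.base_prefix`, and an Anaconda/conda installation because its prefix contains a `conda-meta` folder. The function name `installation_kind` is hypothetical.

```python
import sys
from pathlib import Path


def installation_kind(prefix: str = sys.prefix, base_prefix: str = sys.base_prefix) -> str:
    """Heuristically classify a Python installation as 'conda', 'venv' or 'plain'."""
    if (Path(prefix) / "conda-meta").is_dir():
        # Conda base installs and conda environments both keep a conda-meta folder
        return "conda"
    if prefix != base_prefix:
        # venv/virtualenv point sys.prefix at the environment folder
        return "venv"
    return "plain"


if __name__ == "__main__":
    print(sys.executable, "->", installation_kind())
```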

2. Set up Python integration

Once you have installed Python, you need to tell Stata where to find the Python installation.

You can search for Python installations on your system using

. python search

Note that multiple Python installations might show up (e.g., macOS ships with an older Python version) and that Stata does not always find every Python installation on your system.
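Because `python search` can miss installations, a complementary check is to list the Python executables on your `PATH` from a terminal. This is a minimal sketch under our own assumptions: it only looks for files named `python`, `python3` or `python3.<minor>` (optionally with `.exe`) and is not how Stata searches. The name `find_python_executables` is hypothetical.

```python
import os
import re
from pathlib import Path

# Matches python, python3 and python3.X, optionally with .exe (Windows)
NAME_PATTERN = re.compile(r"^python(3(\.\d+)?)?(\.exe)?$", re.IGNORECASE)


def find_python_executables(path_value: str = os.environ.get("PATH", "")) -> list:
    """List executables on PATH whose file names look like a Python interpreter."""
    found = []
    for folder in path_value.split(os.pathsep):
        if not folder or not os.path.isdir(folder):
            continue
        try:
            entries = sorted(Path(folder).iterdir())
        except OSError:  # unreadable folder: skip it
            continue
        for entry in entries:
            if NAME_PATTERN.match(entry.name) and entry.is_file() and os.access(entry, os.X_OK):
                found.append(str(entry))
    return found


if __name__ == "__main__":
    for exe in find_python_executables():
        print(exe)
```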

To link Stata to a particular Python installation use:

python set exec <pyexecutable>, permanently

where <pyexecutable> could be, for example, /usr/local/bin/python3, C:\Program Files\Python38\python.exe or C:\Users\<user>\AppData\Local\Programs\Python\Python38\python.exe, depending on your OS and where you installed Python.
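If you are not sure what to use for `<pyexecutable>`, you can ask the interpreter itself. Start the Python you want Stata to use from a terminal and print `sys.executable`. The helper below (its name, `stata_set_exec_command`, is hypothetical) also formats the matching Stata command, quoting the path so that folders with spaces work.

```python
import sys


def stata_set_exec_command(executable: str = sys.executable) -> str:
    """Build the Stata command that links Stata to `executable`."""
    # Quotes protect paths that contain spaces, e.g. C:\Program Files\...
    return f'python set exec "{executable}", permanently'


if __name__ == "__main__":
    print(sys.executable)
    print(stata_set_exec_command())
```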

Type python query to check that the installation was correctly linked:

. python query
------------------------------------------------------------------ 
    Python Settings
      set python_exec      /usr/bin/python3
      set python_userpath  

    Python system information
      initialized          no
      version              3.8.9
      architecture         64-bit
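To cross-check the version and architecture that `python query` reports, you can ask the interpreter directly, for instance inside a `python ... end` session in Stata. A minimal sketch follows. Inferring 32- vs 64-bit from the size of a C pointer (`struct.calcsize("P")`) is our own choice of method, and `interpreter_info` is a hypothetical name.

```python
import struct
import sys


def interpreter_info() -> dict:
    """Summarise the running interpreter's version and architecture."""
    return {
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # A C pointer is 4 bytes on 32-bit and 8 bytes on 64-bit builds
        "architecture": f"{struct.calcsize('P') * 8}-bit",
        "meets_3_8": sys.version_info[:2] >= (3, 8),
    }


for key, value in interpreter_info().items():
    print(f"{key:14} {value}")
```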

You can start Python within Stata by typing python, and return to Stata by typing end:

. python
----------------------------------------------- python (type end to exit) ----------
>>> print('hello')
hello
>>> end
------------------------------------------------------------------------------------

3. Managing packages

pystacked requires scikit-learn (abbreviated sklearn). You can check from within Stata whether sklearn is installed:

. python which sklearn
<module 'sklearn' from '/Users/<username>/Library/Python/3.8/lib/python/site-packages/sklearn/__init__.py'>
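`python which` reports where a module would be loaded from. From Python, a comparable check (a sketch, not Stata's implementation) uses `importlib.util.find_spec`, which locates a top-level module without importing it and returns `None` if the module is not installed. The helper name `module_location` is hypothetical.

```python
import importlib.util
from typing import Optional


def module_location(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None if it is missing."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Packages report their __init__.py; namespace packages have origin None
    return spec.origin


print(module_location("sklearn") or "sklearn not found: check the linked Python or install scikit-learn")
```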

If Stata doesn’t find sklearn, this is either because you have linked Stata to the wrong Python installation or because you still need to install sklearn.

If you use Anaconda, sklearn is automatically included and you can update scikit-learn through your Anaconda Python distribution (see here) or using conda install in the terminal.

If you do not use Anaconda, you can install and update packages using pip. For example, you can install sklearn by typing <Python path> -m pip install -U scikit-learn into the terminal, or directly in Stata:

. shell <Python path> -m pip install -U scikit-learn

where <Python path> refers to the Python installation that you want to use with Stata. If you just want to use your default Python installation, you can replace <Python path> with python3 (on macOS) or py (on Windows).
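The same `<Python path> -m pip` pattern can be driven from Python. Using `sys.executable` as the first argument makes pip install into the interpreter that runs the script, so run it from a terminal with the Python you linked to Stata (the path shown by `python query`). This is a sketch; the helper names are hypothetical, and installing requires network access.

```python
import subprocess
import sys


def pip_install_command(package: str = "scikit-learn", upgrade: bool = True) -> list:
    """Build `<this python> -m pip install [-U] <package>` as an argument list."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("-U")
    cmd.append(package)
    return cmd


def pip_install(package: str = "scikit-learn") -> None:
    """Run pip for the current interpreter; raises CalledProcessError on failure."""
    subprocess.run(pip_install_command(package), check=True)


if __name__ == "__main__":
    print(" ".join(pip_install_command()))  # show the command without running it
```

Passing an argument list (rather than one shell string) avoids quoting problems when the interpreter path contains spaces.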

4. Check that it works

To test that Stata’s Python integration works on your system, run the following test code in Stata:

clear all 
use http://www.stata-press.com/data/r16/iris

python:
from sfi import Data
import numpy as np
from sklearn.svm import SVC

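# Use the sfi Data class to pull data from Stata variables into Python
X = np.array(Data.get("seplen sepwid petlen petwid"))
# ravel() flattens y to the 1-D array that scikit-learn expects
y = np.array(Data.get("iris")).ravel()

# Use the data to train a C-Support Vector Classifier
svc_clf = SVC(gamma='auto')
svc_clf.fit(X, y)

# Print the training accuracy as a visible sign that everything ran
print("Training accuracy:", svc_clf.score(X, y))
end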
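The `sfi` module used in the test code is provided by Stata and is normally importable only from Stata's embedded Python, so the test fails with `ModuleNotFoundError` if you run it in a standalone interpreter. A small guard makes that failure explicit; this is a sketch, and the function name `running_inside_stata` is hypothetical.

```python
import importlib.util


def running_inside_stata() -> bool:
    """True if Stata's sfi module can be found, i.e. we are most likely inside Stata."""
    return importlib.util.find_spec("sfi") is not None


if not running_inside_stata():
    print("sfi not found: run this code inside a Stata python: ... end block")
```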

To test that pystacked works on your system, run the following test code in Stata:

clear all
use https://statalasso.github.io/dta/cal_housing.dta, clear
set seed 42
gen train=runiform()
replace train=train<.75
set seed 42
pystacked medh longi-medi if train 
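The first lines of this test create a random training indicator: `runiform()` draws a uniform random number for each observation, and `train<.75` turns it into 1 for roughly 75% of observations and 0 otherwise. The same logic is sketched in Python below for illustration only. Python's random number generator differs from Stata's, so the same seed will not select the same observations; `train_indicator` is a hypothetical name.

```python
import random


def train_indicator(n_obs: int, share: float = 0.75, seed: int = 42) -> list:
    """Return a 0/1 list flagging roughly `share` of observations as training data."""
    rng = random.Random(seed)  # seeded generator, analogous to `set seed 42`
    # Analogous to: gen train=runiform() ; replace train=train<.75
    return [int(rng.random() < share) for _ in range(n_obs)]


flags = train_indicator(10_000)
print(sum(flags) / len(flags))  # close to 0.75; the exact value depends on the seed
```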

Optional: Use Stata with Python environments

You can also use Stata with Python environments. This is useful if you want to keep separate sets of packages (or package versions) for different projects, or to base different environments on different Python installations on your system. A full guide on how Python environments work is available here. Below is a step-by-step guide.

MacOS:

  1. Close Stata and create a folder where you would like to save your Python environments. I use the folder /Users/myname/python_envs.
  2. Open the Terminal and navigate to that folder. In my example, this would be cd /Users/myname/python_envs.
  3. Set up the Python environment: python3 -m venv myenv where myenv can be replaced by whatever name you want to use.
  4. Activate the environment: source myenv/bin/activate
  5. Install sklearn: python3 -m pip install scikit-learn
  6. Deactivate the environment: deactivate
  7. Open Stata and type python set exec "/Users/myname/python_envs/myenv/bin/python3", perm

Windows:

  1. Close Stata and create a folder where you would like to save your Python environments. I use the folder C:\Users\myname\python_envs.
  2. Open the Command Prompt and navigate to that folder. In my example, this would be cd C:\Users\myname\python_envs.
  3. Set up the Python environment: py -m venv myenv where myenv can be replaced by whatever name you want to use.
  4. Activate the environment: myenv\Scripts\activate
  5. Install sklearn: py -m pip install scikit-learn
  6. Deactivate the environment: deactivate
  7. Open Stata and type python set exec "C:\Users\myname\python_envs\myenv\Scripts\python.exe", perm
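On either platform, steps 3 to 7 can also be scripted with Python's standard-library `venv` module. The sketch below is an alternative we suggest, not part of the guide above. It creates the environment, locates its interpreter (`bin/python3` on macOS, `Scripts\python.exe` on Windows), optionally installs scikit-learn with that interpreter, and returns the `python set exec` line to paste into Stata. The folder `~/python_envs/myenv` and the function names are placeholders. `with_pip=True` assumes your Python includes `ensurepip`, and installing scikit-learn needs network access.

```python
import os
import subprocess
import venv
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Interpreter inside a venv: Scripts/python.exe on Windows, bin/python3 elsewhere."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python3"


def setup_env(env_dir: Path, install_sklearn: bool = True) -> str:
    """Create the venv, optionally install scikit-learn, and return the Stata command."""
    venv.create(env_dir, with_pip=install_sklearn)
    python = env_python(env_dir)
    if install_sklearn:
        # Equivalent to activating the environment and running `-m pip install scikit-learn`
        subprocess.run([str(python), "-m", "pip", "install", "scikit-learn"], check=True)
    return f'python set exec "{python}", perm'


if __name__ == "__main__":
    print(setup_env(Path.home() / "python_envs" / "myenv"))
```

Calling the environment's interpreter by its full path has the same effect as activating the environment first, which is why the script does not need an explicit activate/deactivate step.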