Setting up Stata’s Python integration
pystacked requires Stata 16 or higher, Python 3.8 or higher and scikit-learn 0.24 or higher. You should also install Python if you want to use ddml.
StataCorp provides detailed instructions on how to set up Stata’s Python integration in three blog entries: Link 1, Link 2 and Link 3.
Below, we briefly outline the steps.
1. Python installation
You have (at least) three options:
- For many, the easiest way is to install Anaconda, which is available here. Anaconda is a Python distribution that comes with the most important packages, a package manager and an editor.
- Alternatively, you can download and install plain Python from here.
- If you already have a recent Python version installed on your system, you could set up a separate Python environment for Stata. This is optional, but might be useful if you want to use different libraries in different projects (see instructions here).
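Whichever option you choose, you can confirm that the interpreter meets the Python 3.8 requirement before linking it to Stata. A minimal sketch, run with the Python you just installed; the helper name python_is_recent_enough is our own and not part of any library:

```python
import sys

# pystacked requires Python 3.8 or higher.
MIN_PYTHON = (3, 8)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) part of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


print(sys.version)
if python_is_recent_enough():
    print("This Python is recent enough for pystacked.")
else:
    print("This Python is too old for pystacked; please install 3.8 or higher.")
```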
2. Set up Python integration
Once you have installed Python, you need to tell Stata where to find the Python installation.
You can search for Python installations on your system using
. python search
Note that multiple Python installations might show up (e.g., macOS ships with an old Python version) and that Stata will not always find all Python installations on your system.
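As a rough cross-check outside Stata, the sketch below asks the system PATH which executables the names python3, python and py resolve to. It only uses shutil.which from the standard library; the candidate names and the helper pythons_on_path are assumptions for illustration. Like python search, it can miss installations (anything not on the PATH), and on Windows py is the Python launcher rather than an interpreter, so its path is not what you would pass to Stata.

```python
import shutil

# Candidate command names; these are assumptions, adjust them for your system.
CANDIDATES = ("python3", "python", "py")


def pythons_on_path(names=CANDIDATES):
    """Map each candidate name to the executable that the PATH resolves it to."""
    found = {}
    for name in names:
        path = shutil.which(name)  # None if the name is not found on the PATH
        if path is not None:
            found[name] = path
    return found


for name, path in pythons_on_path().items():
    print(f"{name}: {path}")
```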
To link Stata to a particular Python installation use:
python set exec <pyexecutable> , permanently
where <pyexecutable> could be, for example, /usr/local/bin/python3, C:\Program Files\Python38\python.exe or C:\Users\<user>\AppData\Local\Programs\Python\Python38\python.exe, depending on your OS and where you installed Python.
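If you are unsure which path to use, one option is to ask the Python installation itself: start it from a terminal and print sys.executable, which holds the absolute path of the running interpreter. The sketch below uses a hypothetical helper, set_exec_command, to print the matching Stata command:

```python
import sys


def set_exec_command(exec_path):
    """Build the Stata command that links Stata to the given interpreter."""
    return f'python set exec "{exec_path}", permanently'


# sys.executable is the absolute path of the interpreter running this script.
# It can be empty or None if Python cannot determine its own path.
if sys.executable:
    print(set_exec_command(sys.executable))
else:
    print("Could not determine the path of this interpreter.")
```

Run it with the Python you want Stata to use (for example, python3 print_exec.py, where the file name is arbitrary) and paste the printed line into Stata.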
Type python query to check that the installation was correctly linked:
. python query
------------------------------------------------------------------
Python Settings
set python_exec /usr/bin/python3
set python_userpath
Python system information
initialized no
version 3.8.9
architecture 64-bit
You can start Python within Stata just by typing python and go back to the Stata environment using end:
. python
----------------------------------------------- python (type end to exit) ----------
>>> print('hello')
hello
>>> end
------------------------------------------------------------------------------------
3. Managing packages
pystacked requires scikit-learn (abbreviated sklearn). You can check from within Stata whether sklearn is installed:
. python which sklearn
<module 'sklearn' from '/Users/<username>/Library/Python/3.8/lib/python/site-packages/sklearn/__init__.py'>
If Stata doesn’t find sklearn, this is either because you have linked Stata to the wrong Python installation or because you still need to install sklearn.
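python which only tells you where sklearn lives, not whether it meets the 0.24 requirement. One way to check the version is the standard library's importlib.metadata (available from Python 3.8). The sketch below can be run in a terminal with the linked Python; because it only uses the standard library, it should also work when pasted between python and end in Stata. The helpers major_minor and package_status are hypothetical names for illustration:

```python
import re
from importlib.metadata import PackageNotFoundError, version


def major_minor(ver):
    """Extract (major, minor) from a version string such as '1.3.2' or '1.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return int(match.group(1)), int(match.group(2))


def package_status(package="scikit-learn", minimum=(0, 24)):
    """Report whether `package` is installed for this Python and at least `minimum`."""
    try:
        installed = version(package)  # looks up the installed distribution's version
    except PackageNotFoundError:
        return f"{package} is not installed for this Python"
    needed = ".".join(str(part) for part in minimum)
    if major_minor(installed) < minimum:
        return f"{package} {installed} is older than {needed}; please update it"
    return f"{package} {installed} found (>= {needed})"


print(package_status())
```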
If you use Anaconda, sklearn is automatically included and you can update scikit-learn through your Anaconda Python distribution (see here) or using conda install in the terminal.
If you do not use Anaconda, you can install and update packages using pip. For example, you can install sklearn by typing <Python path> -m pip install -U scikit-learn into the terminal, or directly in Stata:
. shell <Python path> -m pip install -U scikit-learn
where <Python path> refers to the Python installation that you want to use with Stata. If you just want to use your default Python installation, you can also replace <Python path> with python3 (on macOS) or py (on Windows).
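If you are unsure which interpreter a plain python3 or py refers to, the pip call can be tied to a specific interpreter explicitly. A minimal sketch, assuming it is run from a terminal with the Python you linked to Stata (or with python= set to the path shown by python query); pip_install_command and install_or_upgrade are hypothetical helpers that simply build and run <Python path> -m pip install -U <package>:

```python
import subprocess
import sys


def pip_install_command(package, python=sys.executable):
    """Build `<python> -m pip install -U <package>` as an argument list."""
    return [python, "-m", "pip", "install", "-U", package]


def install_or_upgrade(package, python=sys.executable):
    """Run pip for the chosen interpreter; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package, python))
```

For example, install_or_upgrade("scikit-learn") installs or upgrades scikit-learn for the interpreter running the script. Passing an argument list to subprocess (rather than a shell string) avoids quoting problems with paths that contain spaces, such as C:\Program Files\Python38\python.exe.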
4. Check that it works
To test that Stata’s Python integration works on your system, run the following test code in Stata:
clear all
use http://www.stata-press.com/data/r16/iris
python:
from sfi import Data
import numpy as np
from sklearn.svm import SVC
# Use the sfi Data class to pull data from Stata variables into Python
X = np.array(Data.get("seplen sepwid petlen petwid"))
y = np.array(Data.get("iris"))
# Use the data to train C-Support Vector Classifier
svc_clf = SVC(gamma='auto')
svc_clf.fit(X, y)
end
To test that pystacked works on your system, run the following test code in Stata:
clear all
use https://statalasso.github.io/dta/cal_housing.dta, clear
set seed 42
gen train=runiform()
replace train=train<.75
set seed 42
pystacked medh longi-medi if train
Optional: Use Stata with Python environments
You can also use Stata with Python environments. This can be useful if you want to work with multiple versions of Python on your system. A full guide on how Python environments work is available here. Below is a step-by-step guide.
macOS:
- Close Stata and create a folder where you would like to save your Python environments. I use the folder /Users/myname/python_envs.
- Open the Terminal and navigate to that folder. In my example, this would be
  cd /Users/myname/python_envs
- Set up the Python environment:
  python3 -m venv myenv
  where myenv can be replaced by whatever name you want to use.
- Activate the environment:
  source myenv/bin/activate
- Install sklearn:
  python3 -m pip install scikit-learn
- Deactivate the environment:
  deactivate
- Open Stata and type
  python set exec "/Users/myname/python_envs/myenv/bin/python3", perm
Windows:
- Close Stata and create a folder where you would like to save your Python environments. I use the folder C:\Users\myname\python_envs.
- Open the Command Prompt and navigate to that folder. In my example, this would be
  cd C:\Users\myname\python_envs
- Set up the Python environment:
  py -m venv myenv
  where myenv can be replaced by whatever name you want to use.
- Activate the environment:
  myenv\Scripts\activate
- Install sklearn:
  py -m pip install scikit-learn
- Deactivate the environment:
  deactivate
- Open Stata and type
  python set exec "C:\Users\myname\python_envs\myenv\Scripts\python.exe", perm
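The main difference between the macOS and Windows steps is where venv places the interpreter: in bin/ on macOS and Linux, and in Scripts\ on Windows, in both cases inside the environment folder (here myenv). As a small sketch (venv_python is a hypothetical helper, and the folder names are the example ones used above), the path to pass to python set exec can be derived from the environment folder:

```python
from pathlib import PurePosixPath, PureWindowsPath


def venv_python(env_dir, windows):
    """Return the interpreter path inside a venv created with `python -m venv`."""
    if windows:
        # Windows venvs keep their executables in Scripts\ and name the interpreter python.exe.
        return PureWindowsPath(env_dir, "Scripts", "python.exe")
    # macOS/Linux venvs keep their executables in bin/; python3 is one of the names created there.
    return PurePosixPath(env_dir, "bin", "python3")


mac = venv_python("/Users/myname/python_envs/myenv", windows=False)
win = venv_python(r"C:\Users\myname\python_envs\myenv", windows=True)
print(f'python set exec "{mac}", perm')
print(f'python set exec "{win}", perm')
```

Using the pure path classes means the sketch builds either style of path on any operating system without touching the file system.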