Python

We will be using Python 3 extensively in this course (and in machine learning). This document is a reference sheet for useful libraries and tools that you might use.

Tools

This section lists useful Python tools: external programs (not Python libraries, although many, if not all, are written in Python) that help you use Python.

Virtualenv

virtualenv is used to create virtual Python installations. This is useful to isolate dependencies between projects (e.g. project A needs version 1 of a library while project B needs version 2, and so a system-wide installation cannot be used). This is also useful for allowing local installation of Python libraries without administrator privileges.

Python 3 comes with the venv module which allows you to create virtual environments. To create a new virtual environment, run

cd your/project/directory
python3 -m venv venv

This will create a directory called venv (feel free to change the name) which is a virtual Python installation. After creating it (and every time you open a new shell and want to work with the project) running

cd your/project/directory
. venv/bin/activate

will activate the virtual environment. Running the deactivate command will deactivate the environment. It is easy to verify that the environment is in use: ordinarily, the command which python3 will print the system interpreter (typically /usr/bin/python3), but while the virtual environment is active it will print the path to your/project/directory/venv/bin/python3 instead.
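The same check can be made from inside Python. The sketch below assumes the environment was created with the venv module, in which case sys.prefix points at the environment while sys.base_prefix points at the installation it was created from. The helper name in_virtualenv is ours, not part of any library.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix is the environment directory and
    # sys.base_prefix is the underlying Python installation.
    # Outside a venv the two values are identical.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("virtual environment active:", in_virtualenv())
```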

While nominally you might want to create a unique virtual environment for each project in this class, it likely suffices to create a single virtual environment at the root of a directory that will contain all work for the semester since we do not expect to be installing conflicting software versions.
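Environments can also be created from a script rather than from the shell, which may be convenient when setting up several course directories. The following is a minimal sketch using the standard library's venv.create function; the temporary directory is only there so the example cleans up after itself, and with_pip=False is an assumption made to keep the example fast (the command-line form installs pip by default).

```python
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "venv"

    # Rough equivalent of "python3 -m venv venv", minus the pip bootstrap.
    venv.create(str(env_dir), with_pip=False)

    # Every venv contains a pyvenv.cfg file describing the environment.
    created = (env_dir / "pyvenv.cfg").is_file()
    print("environment created:", created)
```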

A virtualenv can also be used to allow your own personal Python modules to be used together easily. Suppose you had the following directory structure:

/home/user/project-1/README
/home/user/project-1/proj1/__init__.py
/home/user/project-1/proj1/proj1.py
/home/user/project-2/make_plots.py

Suppose you want to use the proj1 module inside make_plots.py. If you simply wrote import proj1, Python wouldn’t know where to find that module, and the import would fail. You could of course make the entire directory structure into a single Python project, but that is a bad idea unless it really is supposed to be one project. Instead, we can use a virtualenv to add the project-1/ path to the list of paths Python searches to discover modules.

First, create the virtual environment at the root of the two projects.

cd /home/user
python3 -m venv --system-site-packages venv

This will create the /home/user/venv directory, and it will allow this virtual environment to use the packages that are installed in the default system installation (we do this because installing matplotlib, numpy, scipy, and other libraries in each virtualenv is time-consuming and tedious). We now will create a .pth file containing extra paths to add to the module search path. The contents of this file (for this example) should be:

/home/user/project-1

The name of the file is unimportant, as long as it ends in .pth. It should be located in the venv/lib/python3.4/site-packages directory (replace python3.4 with the Python version your virtual environment uses). Our directory structure now looks like:

/home/user/project-1/README
/home/user/project-1/proj1/__init__.py
/home/user/project-1/proj1/proj1.py
/home/user/project-2/make_plots.py
/home/user/venv/lib/python3.4/site-packages/extrapaths.pth

After activating the virtualenv via . venv/bin/activate, the module proj1 can be imported anywhere.
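To see the .pth mechanism in isolation, the sketch below recreates the idea inside a temporary directory. It uses site.addsitedir, which processes .pth files the same way Python does for site-packages at startup. The package name demo_pth_pkg and the directory names are hypothetical and exist only for this demonstration.

```python
import importlib
import site
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # A stand-in for /home/user/project-1 containing one small package.
    project = root / "project-1"
    package = project / "demo_pth_pkg"
    package.mkdir(parents=True)
    (package / "__init__.py").write_text("GREETING = 'hello from demo_pth_pkg'\n")

    # A stand-in for the venv's site-packages directory with a .pth file
    # whose single line is the path we want added to the search path.
    site_dir = root / "site-packages"
    site_dir.mkdir()
    (site_dir / "extrapaths.pth").write_text(str(project) + "\n")

    # Process the directory as Python would process site-packages.
    site.addsitedir(str(site_dir))
    path_was_added = str(project) in sys.path

    # The package is now importable even though it lives elsewhere.
    importlib.invalidate_caches()
    demo = importlib.import_module("demo_pth_pkg")
    greeting = demo.GREETING
    print(path_was_added, greeting)
```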

pip

pip is a tool that installs Python packages from PyPI. When you need to use a Python library that does not come with the default installation of Python, you likely want to use pip to install it. For example, to install the matplotlib plotting library you can run

pip install matplotlib

If you run this command while your virtual environment is active it will install locally; otherwise you might need administrator privileges to run this command.
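After installing, it can be useful to confirm that the active interpreter actually sees the library. The sketch below uses importlib.util.find_spec from the standard library; the helper name is_installed is ours. Note that it takes the top-level import name, which can differ from the name given to pip (for example, the scikit-learn package is imported as sklearn).

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by the import system."""
    # find_spec returns None when no such top-level module exists.
    return importlib.util.find_spec(module_name) is not None


for name in ("json", "matplotlib"):
    print(name, "available:", is_installed(name))
```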

IPython

IPython is an improved Python read-eval-print loop (REPL) that offers tab completion, easy access to documentation, and access to the system shell without leaving Python. The IPython Notebook (now developed as the Jupyter Notebook) is a web-browser-based interactive Python shell that is excellent for exploratory data analysis and experimenting with plots. Note that the IPython shell can be used exactly like the standard python shell, so there is no real learning curve to get started.

Python libraries

This section lists useful Python libraries for common tasks. All of these libraries can be installed into your virtual environment using pip.

Matplotlib

A full-featured plotting library for Python. The gallery contains many examples of what can be done, along with the code used to generate each example. An excellent way to learn the library is to scan the gallery for a feature you want (such as a legend or text annotation) and check the sample code to see how it was created.

Numpy

A numerical library for Python. Its primary selling point is its (very) fast array, matrix, and linear algebra code. Internally, Numpy is implemented in compiled code and hands much of its linear algebra to highly optimized libraries such as BLAS and LAPACK, so linear-algebra-heavy Python code that uses Numpy can be nearly as fast as code written in a compiled language. If there is a linear algebra function you need, there is a good chance Numpy has implemented it.
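As a small illustration of the linear algebra routines mentioned above, the sketch below solves a 2x2 linear system with numpy.linalg.solve. The matrix and right-hand side are made-up values chosen so the answer is easy to check by hand, and the example assumes Numpy has been installed (for instance with pip).

```python
import numpy as np

# Solve A x = b for x, where
#   3*x0 + 1*x1 = 9
#   1*x0 + 2*x1 = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)

# The residual A x - b should be (numerically) zero.
residual = A @ x - b
print("solution:", x)
print("residual:", residual)
```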

SciPy

SciPy is an extension to Numpy with many more scientific computing libraries. If Numpy doesn’t have the function you’re looking for, SciPy might.

Scikit-Learn

Scikit-learn is a collection of machine learning algorithms for Python, with routines for classification, regression, clustering, and more.

requests

The requests module allows you to make HTTP requests. It is very easy to use; from their documentation,

>>> import requests
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
>>> r.text
'{"type":"User"...'
>>> r.json()
{'private_gists': 419, 'total_private_repos': 77, ...}

flask

Flask is a framework for creating web services in Python. Their documentation is a great place to get started.

pytest

A module for writing boilerplate-free unit tests in Python. pytest auto-discovers files matching test_*.py or *_test.py, and within those files it runs every function whose name begins with test_ as a test. If you are familiar with the unittest module, pytest also understands unittest tests and files.
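As a sketch of what such a test file might look like, the example below shows the contents of a hypothetical file named test_stats.py. The mean function is a made-up function under test; in a real project it would normally live in its own module and be imported by the test file.

```python
# Contents of a hypothetical file named test_stats.py


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# pytest collects each function whose name starts with "test_" and
# reports a failure if any plain assert statement inside it fails.
def test_mean_of_integers():
    assert mean([1, 2, 3]) == 2


def test_mean_of_single_value():
    assert mean([5]) == 5
```

Running the pytest command in the directory containing this file would collect and run both test functions.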
