Python¶
We will be using Python 3 extensively in this course (and in machine learning). This document is a reference sheet for useful libraries and tools that you might use.
Tools¶
This section lists useful Python tools: external programs (not Python libraries, although many, if not all, are written in Python) that help you work with Python.
Virtualenv¶
virtualenv is used to create virtual Python installations. This is useful for isolating dependencies between projects (e.g. project A needs version 1 of a library while project B needs version 2, so a single system-wide installation cannot serve both). It also allows Python libraries to be installed locally without administrator privileges.
Python 3 comes with the venv module, which allows you to create virtual environments. To create a new virtual environment, run
cd your/project/directory
python3 -m venv venv
This will create a directory called venv (feel free to change the name) which contains a virtual Python installation. After creating it (and every time you open a new shell and want to work with the project), running
cd your/project/directory
. venv/bin/activate
will activate the virtual environment. Running the deactivate command will deactivate the environment. It is easy to verify that the environment is being used: ordinarily, the command which python3 will print a system path such as /usr/bin/python3, but when the virtual environment is active it will print your/project/directory/venv/bin/python3 instead.
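The same check can be made from inside Python. The following is a minimal sketch, assuming the environment was created with the venv module; the function name in_virtual_environment is made up for this illustration.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the underlying Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("in a virtual environment:", in_virtual_environment())
```

Running this with the environment active should report the interpreter inside venv/bin; running it after deactivate should report the system interpreter.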
While you might nominally want a separate virtual environment for each project in this class, it likely suffices to create a single virtual environment at the root of a directory that will contain all work for the semester, since we do not expect to install conflicting software versions.
A virtualenv can also be used to allow your own personal Python modules to be used together easily. Suppose you had the following directory structure:
/home/user/project-1/README
/home/user/project-1/proj1/__init__.py
/home/user/project-1/proj1/proj1.py
/home/user/project-2/make_plots.py
Suppose you want to use the proj1 module inside make_plots.py; if you simply wrote import proj1, Python wouldn’t know where to find that module, and the import would fail. You could of course make the entire directory structure into a single Python project, but that is a bad idea unless it really is supposed to be one project. Instead, we can use a virtualenv to add the project-1/ path to the list of paths Python searches to discover modules.
First, create the virtual environment at the root of the two projects.
cd /home/user
python3 -m venv --system-site-packages venv
This will create the /home/user/venv directory, and it will allow this virtual environment to use the packages that are installed in the default system installation (we do this because installing matplotlib, numpy, scipy, and other libraries in each virtualenv is time-consuming and tedious). Next, create a .pth file containing extra paths to add to the module search path. The contents of this file (for this example) should be:
/home/user/project-1
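The name of the file is unimportant, as long as it ends in .pth. It should be located in the venv/lib/python3.4/site-packages directory (the python3.4 component matches the Python version that created the environment, so it will differ for other versions). Our directory structure now looks like: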
/home/user/project-1/README
/home/user/project-1/proj1/__init__.py
/home/user/project-1/proj1/proj1.py
/home/user/project-2/make_plots.py
/home/user/venv/lib/python3.4/site-packages/extrapaths.pth
After activating the virtualenv via . venv/bin/activate, the module proj1 can be imported from anywhere.
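The .pth file can also be created from Python, which avoids having to guess the version-specific directory name. This is only a sketch: the helper write_pth is hypothetical, and the file name extrapaths.pth simply mirrors the example above.

```python
import sysconfig
from pathlib import Path


def write_pth(site_packages, extra_paths, name="extrapaths.pth"):
    """Write one search path per line to <site_packages>/<name>."""
    pth_file = Path(site_packages) / name
    pth_file.write_text("".join(f"{path}\n" for path in extra_paths))
    return pth_file


if __name__ == "__main__":
    # With the virtualenv active, "purelib" is its site-packages directory.
    print("site-packages:", sysconfig.get_paths()["purelib"])
    # To create the file from the example above, one could then call:
    # write_pth(sysconfig.get_paths()["purelib"], ["/home/user/project-1"])
```

Python reads .pth files at interpreter startup, so a newly added path takes effect in the next Python session.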
pip¶
pip is a tool that installs Python packages from PyPI (the Python Package Index). When you need to use a Python library that does not come with the default installation of Python, you likely want to use pip to install it. For example, to install the matplotlib plotting library you can run
pip install matplotlib
If you run this command while your virtual environment is active, the package is installed into that environment; otherwise you might need administrator privileges to run it.
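To confirm that a package is visible to the current interpreter, something like the following sketch can help. It assumes Python 3.8 or newer (for importlib.metadata); the helper name installed_version is made up for this illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string if matplotlib is installed, otherwise None.
    print("matplotlib:", installed_version("matplotlib"))
```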
IPython¶
IPython is an improved Python read-eval-print loop (REPL) that offers tab completion, easy access to documentation, and access to the system shell without leaving Python. The IPython Notebook (now developed as the Jupyter Notebook) is a web-browser-based interactive Python shell that is excellent for exploratory data analysis and experimenting with plots. Note that the IPython shell can be used exactly like the standard python shell, so there is no real learning curve to get started.
Python libraries¶
This section lists useful Python libraries for common tasks. All of these libraries can be installed into your virtual environment using pip.
Matplotlib¶
A full-featured plotting library for Python. The gallery contains many examples of what can be done, along with the code used to generate each example. An excellent way to learn the library is to scan the gallery for a feature you want (such as a legend or text annotation) and check the sample code to see how it was created.
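As a starting point, here is a minimal sketch of a line plot with a legend and a text annotation, the two features mentioned above. The data, labels, and output file name are made up for illustration, and the Agg backend is selected only so the script runs without a display.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render to files only; no display window is needed
import matplotlib.pyplot as plt

xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

fig, ax = plt.subplots()
ax.plot(xs, ys, marker="o", label="y = x^2")
# Point an arrow at the last data point and label it.
ax.annotate("largest value", xy=(4, 16), xytext=(1.5, 14),
            arrowprops={"arrowstyle": "->"})
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# Hypothetical output location: a PNG in the system temporary directory.
out = Path(tempfile.gettempdir()) / "example_plot.png"
fig.savefig(out)
```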
Numpy¶
A numerical library for Python. Its primary selling point is its (very) fast array, matrix, and linear algebra code. (Internally, Numpy's core is compiled code, and its linear algebra routines call highly optimized libraries such as BLAS and LAPACK, so linear-algebra-heavy Python code that uses Numpy might be nearly as fast as code written in any other language.) If there is a linear algebra function you need, there is a good chance Numpy has implemented it.
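For instance, solving a small linear system takes only a few lines. This is an illustrative sketch with made-up numbers, not an example taken from the course.

```python
import numpy as np

# Coefficient matrix and right-hand side of the system A @ x = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)  # solve for x without forming an explicit inverse
print(x)

# Verify the solution by substituting it back into the system.
print(np.allclose(A @ x, b))
```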
SciPy¶
SciPy builds on Numpy and adds many more scientific computing routines (optimization, integration, statistics, signal processing, and more). If Numpy doesn’t have the function you’re looking for, SciPy might.
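As one illustration of what SciPy adds, the sketch below minimizes a one-dimensional function with scipy.optimize. The function f is made up for this example; its minimum is at x = 2.

```python
from scipy import optimize


def f(x):
    """A simple parabola whose minimum value 1.0 is reached at x = 2."""
    return (x - 2.0) ** 2 + 1.0


# minimize_scalar searches for a local minimum of a function of one variable.
result = optimize.minimize_scalar(f)
print(result.x, result.fun)
```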
Scikit-Learn¶
Scikit-learn is a collection of machine learning algorithms for Python, with routines for classification, regression, clustering, and more.
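Most scikit-learn estimators follow the same fit-then-predict pattern. The sketch below trains a logistic regression classifier on a tiny made-up dataset; the data and the choice of model are assumptions for illustration only.

```python
from sklearn.linear_model import LogisticRegression

# One feature per sample; small values belong to class 0, large to class 1.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

model = LogisticRegression()
model.fit(X, y)  # learn the decision boundary from the labeled samples

# Classify two new points, one on each side of the boundary.
predictions = model.predict([[0.5], [2.5]])
print(predictions)
```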
requests¶
The requests module allows you to make HTTP requests. It is very easy to use; from its documentation,
>>> import requests
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
>>> r.text
'{"type":"User"...'
>>> r.json()
{'private_gists': 419, 'total_private_repos': 77, ...}
flask¶
Flask is a framework for creating web services in Python. Their documentation is a great place to get started.
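A minimal Flask service might look like the sketch below. The /ping route and its JSON payload are made up for illustration; the built-in test client is used so the example runs without starting a server.

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/ping")
def ping():
    """Respond to GET /ping with a small JSON document."""
    return jsonify(status="ok")


if __name__ == "__main__":
    # Flask's built-in test client exercises the route without a server.
    response = app.test_client().get("/ping")
    print(response.status_code, response.get_json())
```

Calling app.run() instead of using the test client would start Flask's development server.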
pytest¶
A module for writing boilerplate-free unit tests in Python. pytest auto-discovers files matching test_*.py or *_test.py and, within them, runs every function whose name begins with test_ as a test. If you are familiar with the unittest module, pytest also understands unittest tests and files.
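A test file can be as small as the sketch below. The file name test_mean.py and the mean function are hypothetical; in a real project the function under test would normally be imported from your own module.

```python
# Contents of a hypothetical file named test_mean.py


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def test_mean_of_integers():
    # pytest reports a failure if the plain assert statement is false.
    assert mean([1, 2, 3]) == 2


def test_mean_of_single_value():
    assert mean([4.5]) == 4.5
```

Running pytest in the directory containing this file should collect and run both test functions.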