Python
======
We will be using Python 3 extensively in this course (and in machine
learning). This document is a reference sheet for useful libraries and
tools that you might use.
Tools
-----
This section lists useful Python tools: external programs (not Python
libraries, although most, if not all, are written in Python) that help
you work with Python.
.. _virtualenv:
Virtualenv
~~~~~~~~~~
``virtualenv`` is used to create virtual Python installations. This is
useful to isolate dependencies between projects (e.g. project A needs
version 1 of a library while project B needs version 2, and so a
system-wide installation cannot be used). This is also useful for
allowing local installation of Python libraries without administrator
privileges.
Python 3 comes with the ``venv`` module, which allows you to create
virtual environments. To create a new virtual environment, run:

.. code-block:: bash

   cd your/project/directory
   python3 -m venv venv

This will create a directory called ``venv`` (feel free to change the
name) which is a virtual Python installation. After creating it (and
every time you open a new shell and want to work with the project),
running

.. code-block:: bash

   cd your/project/directory
   . venv/bin/activate

will activate the virtual environment. Running the ``deactivate``
command will deactivate the environment. It is easy to verify that the
environment is in use: ordinarily, the command ``which python3``
prints a system path such as ``/usr/bin/python3``, but while the
virtual environment is active it prints
``your/project/directory/venv/bin/python3`` instead.
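
The same check can be made from inside Python. The following is a
minimal sketch, assuming Python 3.3 or later (where
``sys.base_prefix`` exists); the helper name ``in_virtualenv`` is our
own and is not part of any library.

.. code-block:: python

   import sys


   def in_virtualenv():
       """Return True when the interpreter is running inside a venv."""
       # Inside a venv, sys.prefix points at the environment directory
       # while sys.base_prefix still points at the base installation.
       return sys.prefix != sys.base_prefix


   if __name__ == "__main__":
       print("prefix:     ", sys.prefix)
       print("base prefix:", sys.base_prefix)
       print("in a virtual environment:", in_virtualenv())

Running this script before and after activating the environment should
show the two prefixes diverging once the environment is active.
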
While nominally you might want to create a unique virtual environment
for each project in this class, it likely suffices to create a single
virtual environment at the root of a directory that will contain all
work for the semester since we do not expect to be installing
conflicting software versions.
A virtualenv can also make your own Python modules importable from
several projects. Suppose you have the following directory
structure::

   /home/user/project-1/README
   /home/user/project-1/proj1/__init__.py
   /home/user/project-1/proj1/proj1.py
   /home/user/project-2/make_plots.py

Suppose you want to use the ``proj1`` module inside ``make_plots.py``.
If you simply wrote ``import proj1``, Python would not know where to
find that module and the import would fail. You could of course make
the entire directory structure into a single Python project, but that
is a bad idea unless it really is supposed to be one project. Instead,
we can use a virtualenv to add the ``project-1/`` path to the list of
paths Python searches for modules.

First, create the virtual environment in the directory that contains
both projects.

.. code-block:: bash

   cd /home/user
   python3 -m venv --system-site-packages venv

This will create the ``/home/user/venv`` directory, and it will allow
this virtual environment to use the packages that are installed in the
default system installation (we do this because installing
``matplotlib``, ``numpy``, ``scipy``, and other libraries in each
virtualenv is time-consuming and tedious). Next, create a ``.pth``
file containing the extra paths to add to the module search path. For
this example, the file should contain::

   /home/user/project-1

The name of the file is unimportant, as long as it ends in
``.pth``. It should be located in the environment's ``site-packages``
directory, here ``venv/lib/python3.4/site-packages`` (the
``python3.4`` component matches the Python version used to create the
environment, so it may differ on your machine). Our directory
structure now looks like::

   /home/user/project-1/README
   /home/user/project-1/proj1/__init__.py
   /home/user/project-1/proj1/proj1.py
   /home/user/project-2/make_plots.py
   /home/user/venv/lib/python3.4/site-packages/extrapaths.pth

After activating the virtualenv via ``. venv/bin/activate``, the
module ``proj1`` can be imported from anywhere.
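
The ``.pth`` mechanism can be tried out without touching a real
environment. The sketch below is only an illustration: it builds a
throwaway directory layout, and the package name ``demo_proj1``, the
function name, and the temporary paths are all made up for the demo.
It relies on the standard library's ``site.addsitedir``, which adds a
directory to ``sys.path`` and processes any ``.pth`` files found in it.

.. code-block:: python

   import importlib
   import site
   import sys
   import tempfile
   from pathlib import Path


   def demo_pth_import():
       """Import a package made visible only through a ``.pth`` file."""
       saved_path = list(sys.path)
       with tempfile.TemporaryDirectory() as tmp:
           root = Path(tmp)

           # Stand-in for /home/user/project-1 and the package inside it.
           project = root / "project-1"
           package = project / "demo_proj1"
           package.mkdir(parents=True)
           (package / "__init__.py").write_text("ANSWER = 42\n")

           # Stand-in for the environment's site-packages directory.
           site_dir = root / "site-packages"
           site_dir.mkdir()
           (site_dir / "extrapaths.pth").write_text(str(project) + "\n")

           try:
               # Adds site_dir to sys.path and processes its .pth files.
               site.addsitedir(str(site_dir))
               importlib.invalidate_caches()
               module = importlib.import_module("demo_proj1")
               return module.ANSWER
           finally:
               # Undo the changes so the demo leaves no trace behind.
               sys.path[:] = saved_path
               sys.modules.pop("demo_proj1", None)


   if __name__ == "__main__":
       print(demo_pth_import())

In a real virtual environment the same processing happens automatically
at interpreter start-up, so no call to ``site.addsitedir`` is needed.
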
.. _pip:
``pip``
~~~~~~~
``pip`` is a tool that installs Python packages from PyPI_. When you
need a Python library that does not come with the default
installation of Python, you will likely want to use ``pip`` to install
it. For example, to install the ``matplotlib`` plotting library, run:

.. code-block:: bash

   pip install matplotlib

If you run this command while your virtual environment is active, the
package is installed into that environment; otherwise you might need
administrator privileges to run it.

.. _PyPI: https://pypi.python.org/pypi/
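
To confirm from Python that a package is visible to the current
interpreter, one option is the standard library's
``importlib.metadata`` module. This is a minimal sketch that assumes
Python 3.8 or later; the helper name ``installed_version`` is
hypothetical.

.. code-block:: python

   from importlib.metadata import PackageNotFoundError, version


   def installed_version(distribution):
       """Return the installed version string, or None if it is missing."""
       try:
           return version(distribution)
       except PackageNotFoundError:
           return None


   if __name__ == "__main__":
       for name in ("matplotlib", "numpy"):
           print(name, installed_version(name))

Running the script with and without the virtual environment active is
a quick way to see where a package was actually installed.
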
IPython
~~~~~~~
IPython is an improved Python read-eval-print loop (REPL) that offers
tab completion, easy access to documentation, and access to the system
shell without leaving Python. The IPython Notebook (now developed as
the Jupyter Notebook) is a web-browser-based interactive Python shell
that is excellent for exploratory data analysis and experimenting with
plots. The IPython shell can be used exactly like the standard
``python`` shell, so there is essentially no learning curve to get
started.
Python libraries
----------------
This section lists useful Python libraries for common tasks. All of
these libraries can be installed into your virtual environment using
:ref:`pip`.
Matplotlib
~~~~~~~~~~
Matplotlib is a full-featured plotting library for Python. The
gallery_ contains many examples of what can be done, along with the
code used to generate each example. An excellent way to learn the
library is to scan the gallery for a feature you want (such as a
legend or text annotation) and check the sample code to see how it was
created.
.. _gallery: http://matplotlib.org/gallery.html
Useful links:
- `Homepage `_
- `Example gallery <http://matplotlib.org/gallery.html>`_
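
As a small taste of the two features mentioned above, the following
sketch draws a line with a legend and a text annotation. The data and
the function name ``make_demo_figure`` are invented for illustration,
and the ``Agg`` backend is selected only so the script also runs on
machines without a display.

.. code-block:: python

   import matplotlib

   matplotlib.use("Agg")  # render off-screen so no display is needed
   import matplotlib.pyplot as plt


   def make_demo_figure():
       """Plot y = x**2 with a legend and one annotated point."""
       xs = [0, 1, 2, 3, 4]
       squares = [x ** 2 for x in xs]

       fig, ax = plt.subplots()
       ax.plot(xs, squares, marker="o", label="x squared")
       ax.set_xlabel("x")
       ax.set_ylabel("y")
       ax.legend()
       # Point an arrow at the last data point.
       ax.annotate("largest value", xy=(4, 16), xytext=(1.5, 14),
                   arrowprops={"arrowstyle": "->"})
       return fig, ax


   if __name__ == "__main__":
       fig, ax = make_demo_figure()
       fig.savefig("demo_plot.png")
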
Numpy
~~~~~
Numpy is a numerical library for Python. Its primary selling point is
its (very) fast array, matrix, and linear algebra code. Internally,
Numpy's core is written in C and its linear algebra routines call
highly optimized BLAS and LAPACK libraries, so linear-algebra-heavy
Python code that uses Numpy can be nearly as fast as code written in a
compiled language. If there is a linear algebra function you need,
there is a good chance Numpy has implemented it.
Useful links:
- `Homepage `_
- `Installation `_
- `API Reference `_
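
For instance, solving a small linear system takes only a few lines.
The matrix and right-hand side below are made-up values chosen so the
answer is easy to check by hand; the function name ``solve_demo`` is
hypothetical.

.. code-block:: python

   import numpy as np


   def solve_demo():
       """Solve 3x + y = 9 and x + 2y = 8 for x and y."""
       A = np.array([[3.0, 1.0],
                     [1.0, 2.0]])
       b = np.array([9.0, 8.0])
       # np.linalg.solve is preferred over explicitly inverting A.
       return np.linalg.solve(A, b)


   if __name__ == "__main__":
       print(solve_demo())

Substituting back confirms the solution: with x = 2 and y = 3, both
equations hold.
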
SciPy
~~~~~
SciPy builds on Numpy and adds many more scientific computing
routines, such as optimization, integration, interpolation, and
statistics. If Numpy doesn't have the function you're looking for,
SciPy might.
Useful links:
- `Homepage `_
- `Installation `_
- `API Reference `_
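
As one example of a routine that Numpy itself does not provide, the
sketch below numerically integrates a function with
``scipy.integrate.quad``. The integrand and the function name
``integrate_square`` are chosen purely for illustration.

.. code-block:: python

   from scipy import integrate


   def integrate_square():
       """Integrate x**2 from 0 to 1; the exact answer is 1/3."""
       # quad returns the estimate and an estimate of the absolute error.
       value, error_estimate = integrate.quad(lambda x: x ** 2, 0.0, 1.0)
       return value, error_estimate


   if __name__ == "__main__":
       value, error_estimate = integrate_square()
       print(value, error_estimate)
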
Scikit-Learn
~~~~~~~~~~~~
Scikit-learn is a collection of machine learning algorithms for
Python, with routines for classification, regression, clustering, and
more.

Useful links:

- `Homepage `_
- `Quick start `_
- `Full documentation `_
requests
~~~~~~~~
The ``requests`` module allows you to make HTTP requests. It is very
easy to use; from their `documentation
`_,

.. code-block:: python

   >>> import requests
   >>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
   >>> r.status_code
   200
   >>> r.headers['content-type']
   'application/json; charset=utf8'
   >>> r.encoding
   'utf-8'
   >>> r.text
   '{"type":"User"...'
   >>> r.json()
   {'private_gists': 419, 'total_private_repos': 77, ...}

Useful links:
- `Documentation `_
flask
~~~~~
Flask is a framework for creating web services in Python. Their
`documentation `_ is a great place to get
started.
Useful links:
- `Documentation `_
.. _pytest:
pytest
~~~~~~
A framework for writing boilerplate-free unit tests in Python.
``pytest`` auto-discovers files matching ``test_*.py`` or
``*_test.py`` and, within them, runs every function whose name begins
with ``test_`` (as well as ``test``-prefixed methods of classes whose
names begin with ``Test``). If you are familiar with the ``unittest``
module, ``pytest`` can also run ``unittest``-style test cases.
Useful links:
- `Getting started
`_
- `Full documentation `_
- `Setup and tear-down methods
`_
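
A test file can be as small as the sketch below. The file name
``test_mean.py`` and the ``mean`` function are hypothetical; in a real
project the function under test would normally be imported from your
own module rather than defined next to the tests.

.. code-block:: python

   # Contents of a hypothetical file named test_mean.py


   def mean(values):
       """Return the arithmetic mean of a non-empty sequence."""
       return sum(values) / len(values)


   def test_mean_of_integers():
       # A plain assert is all pytest needs; no test class is required.
       assert mean([1, 2, 3]) == 2


   def test_mean_of_single_value():
       assert mean([5]) == 5

Running ``pytest`` in the directory containing this file should collect
and run both test functions.
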