Making Python Fast

Python is an expressive language that is well-suited for many of the problems we typically solve. However, sometimes Python is just too slow. This page discusses some ways to speed up your Python code.

Note: Using the correct data structures and algorithms is far more important than what language you use! Moving from an \(O(n^2)\) implementation to a \(O(n)\) implementation is a much larger speedup than you can get by switching languages. Check your algorithms and data structures first.

The following sections are roughly ordered by ease-of-use, starting with faster libraries and ending with foreign language bindings.

Faster libraries: Numpy

The easiest way to speed up your code is to properly utilize the numpy library. numpy offers high-dimensional arrays and matrices that are fast. The library implements its core operations and data structures in C or Fortran, avoiding the overhead of Python.

As an example, let’s compute matrix powers. Specifically, we compute \(A^{16}\) where \(A\) is a \(100 \times 100\) matrix. Our plain Python solution takes 11.77 seconds to run, while using Numpy to perform the multiplications and generate the matrices takes 0.0097 seconds to run. Additionally, if we use the Numpy function power instead, we cut the runtime to 0.00065 seconds.

We could, of course, speed up our Python solution by more carefully constructing the matrix. However even using Numpy to create the matrices doesn’t get us all the way to (or even close to) the Numpy solution.

As an additional benefit, starting in Python 3.5 matrix multiplication in libraries such as Numpy can be written as

A @ B @ C

using the new @ binary operator.

Writing some code in C(++)

Sometimes Numpy (or other libraries) don’t have exactly what you need, but your code is still slow. If this is the case you could consider writing the most time-consuming routines in C++, and using Python for everything else.

Brief note on Python

So far we’ve just referred to Python as Python, but ‘Python’ is actually just the name of the language and does not necessarily imply a specific implementation. The standard reference implementation is called cpython and is written in C. There are other implementations such as IronPython (for the .NET platform), Jython (for the Java Virtual Machine) and PyPy (a just-in-time compiler for Python written in Python; surprisingly fast). We will be focusing on cpython, which is what you are using unless you have made a conscious choice not to. In the following sections ‘Python’ will refer to cpython.

First-class extensions

Python supports extensions to the language that directly add new functions to Python. However writing these extensions is non-trivial; you must use the Python header files and adhere to a strict interface as well as deal with Python’s memory model. This would be the approach you’d want to take if you were publishing a library that you want many people to use and to be fully integrated with Python.

Using the ctypes module

Fortunately there is a very simple alternative to writing a first-class Python extension, which is using the ctypes module. The ctypes module lets you call any function defined in a shared library (DLL on Windows). The upside is that you don’t need to change your C++ code at all. The downside is that your Python code needs to wrangle with C’s type system to call the functions.

As an example, consider the following C++ code.

Head file:

#ifndef C_ADD_HH
#define C_ADD_HH

class Point
  Point(double x, double y);
  double x;
  double y;

namespace Example
  extern "C" auto add(double x,  double y) -> double;
  extern "C" auto add_point(const Point &a, const Point &b) -> Point;


Source file:

#include "c_add.hh"

Point::Point(double x, double y)
  : x(x)
  , y(y)

auto Example::add(double x, double y) -> double
  return x + y;

auto Example::add_point(const Point &a, const Point &b) -> Point
  return Point(a.x + b.x, a.y + b.y);

We can compile a shared library called via

g++ -o c_add.os -c -fPIC -Wall -Werror -std=c++11
g++ -o -shared c_add.os

We can use the following Python code to call into the shared library.

from ctypes import CDLL, c_double, Structure, byref

# We load the shared library
lib_example = CDLL('./')

# The 'add' function returns a double
lib_example.add.restype = c_double

# We can call the add function by passing in double values. The return
# type is a c_double, but can be coerced to a standard Python float
# again.
x = lib_example.add(c_double(2.1), c_double(3.4))
y = float(x)

# The add_point function needs a C structure to be defined in
# Python. We mimic the structure here.
class Point(Structure):
    _fields_ = [('x', c_double),
                ('y', c_double)]

# Having defined our Point structure, we can set the return type.
lib_example.add_point.restype = Point

# We construct two Points to add
a = Point(1.0, 2.0)
b = Point(3.1, 4.1)

# We call the add_point function with our Points. Since add_point
# takes values by const reference, we have to use the byref function.
c = lib_example.add_point(byref(a), byref(b))
print(c.x, c.y)

When run, this Python file outputs

$ python3
4.1 6.1