Functional Programming

Functional programming is a programming paradigm that aims to structure programs in a manner similar to their mathematical definitions. For example, recursion is generally preferred to looping, and variables are often immutable declarations rather than actual mutating values. Functions are first-class objects, meaning that functions can be passed to other functions as arguments.

As with all languages there are many variations of functional programming languages. We often talk about static versus dynamic, or strongly typed versus weakly typed programming languages; with functional programming languages, we often talk about purely functional versus impure and lazy versus eager languages.

Many languages support programming in a functional style. Python, for example, has fairly good support for functional style. However, We will focus on a lazy, purely functional programming language called Haskell.

Eager languages

An eager language is one where expressions are evaluated fully when they are encountered. Almost all traditional languages are eager, including C, C++, Python, Java, or Matlab.

Lazy languages

Lazy languages, on the other hand, do not fully evaluate an expression until it is required. This potentially allows for more efficient code because only expressions that are actually needed are evaluated. For example, consider the following Python code

def foo(a, b):
    x = expensive(a)
    y = other_expensive(b)

    if a < b:
        return x
        return y

and its direct Haskell equivalent

foo a b | a < b     = x
        | otherwise = y
    x = expensive a
    y = otherExpensive b

In the Python code we will evaluate both the expensive and otherExpensive function, while in the Haskell code at most one will evaluated. Note that we say ‘at most one’ because, depending on what the code calling foo actually does with the result, the return value might never need to be fully evaluated!

This example might not be entirely convincing, since the Python code can be easily refactored to avoid computing more than one of the functions. However, there is another benefit of laziness, which is that it allows us to deal with infinite data structures. Suppose that we are looking for the first three square numbers to satisfy some predicate p (which we know exists). In Python, we might write

def search():
    i = 1

    els = []

    while len(els) < 3:
        if p(i * i):
            els.append(i * i)

        i += 1

    return els

In Haskell, we can just write

search = take 3 (filter p (map (^ 2) [1..])

Although we have the expression map (^ 2) [1..] which appears to square the positive integers, we only evaluate that infinite list in a lazy fashion. In particular, when search is evaluated, take asks for (up to) 3 elements. filter then needs to start looking through elements of the list, and so map starts applying the squaring operation to the elements.

Many non-functional languages have support for lazy evaluation. In fact, Python makes extensive use of lazy evaluation with its generators. The above code could be re-written as

def squares():
    i = 1
    while True:
        yield i * i
        i += 1

def search():
    matches = filter(p, squares())
    for i in range(3):
    return els

which could operate in a similar fashion.

Purely functional languages

A purely functional programming language is one in which there is no mutation. Variable declarations are in fact simply names assigned to some value which can be obtained by evaluating an expression (which in turn contains only pure expressions. Functions are functions in the mathematical sense, in that they map a set of inputs to a set of outputs, and do not perform any side effects.

An immediate consequence of this is that traditional looping constructs are not possible, since the iterator variable in a for-loop or the conditional in a while-loop cannot change. To compute the the $itext{th}$ Fibonacci number in Python, we might write

def fib(i):
    a = 0
    b = 1
    j = 1
    while j < i:
        (a, b) = (b, a + b)
        j += 1

    return b

In Haskell, a direct translation of this code would be

fib i = fib' 1 0 1
    fib' j a b | j < i     = fib' (j + 1) b (a + b)
               | otherwise = b

Note that no variables are ever mutated; rather, we simply recursively call functions with new arguments.


Haskell is a purely functional lazy programming language. One of it’s greatest strengths is it’s hierarchy of abstractions which allow us to succinctly express common ideas.

Implicit Recursion

Explicit recursion is ugly and almost as error-prone as manipulating indices directly. Consider defining the functions sum' and product' that compute the cumulative sum and product of a list of numbers, respectively. We could write

sum' []     = 0
sum' (x:xs) = x + sum' xs

product' []     = 1
product' (x:xs) = x * product' xs

But we’re really expressing a fold, which is a function used to reduce a list-like structure with a binary operation into some summary value. We can re-write the above code as

sum'     xs = foldr (+) 0 xs
product' xs = foldr (*) 1 xs

The symmetry between these functions is (perhaps) more apparent with these definitions.

Another slightly more interesting example is in defining the (infinite) Fibonacci sequence, where we could write

fibonacci :: [Integer]
fibonacci = 1 : 1 : f 1 1
    f a b = (a + b) : f b (a + b)

This definition follows naturally from the mathematical definition, but what we’re really doing is combining two sequences with a binary operation. The two sequences are the Fibonacci sequence itself and the tail of the Fibonacci sequence, where the tail of a list is everything but the head of the list, and the binary operation is addition. We can instead write

fibonacci :: [Integer]
fibonacci = 1 : 1 : zipWith (+) fibonacci (tail fibonacci)

Idiomatic Haskell code rarely relies on explicit recursion, instead relying on these higher order functions.


The next generalization we consider are functors. Functors are data structures that can be ‘mapped over’. For example, consider the Maybe data structure that represents an optional value.

data Maybe a = Just a
             | Nothing

A Maybe a value can either contain a value of type a, or nothing. If we have a Maybe a value, it would be nice to be able to apply functions from a -> b to get a Maybe b value, where in the presence of a Nothing value we do the natural thing and return a Nothing. To be concrete, suppose we have implemented each of

coolFunction :: Int -> Double
readInt :: Maybe Integer

How can we combine these two functions to get a Maybe Double? We could write

case readInt of
  Nothing -> Nothing
  Just x  -> Just (coolFunction x)

With functors, we can instead write

fmap coolFunction readInt

Some more examples follow.

>>> fmap (+ 1) (Just 1)
Just 2
>>> fmap (+ 1) Nothing
>>> fmap ("Hello " ++) (Just "Patrick")
Just "Hello Patrick"
>>> fmap ("Hello " ++) Nothing

Many data structures are functors.

>>> fmap (^ 2) [1, 2, 5]
[1, 4, 25]
>>> fmap (^ 2) []
>>> fmap (* pi) ("Any type here", 2)
("Any type here", 6.283185307179586)

You can, of course, define functor instances for your data structures.


Applicative functors generalize functors. In particular, with functors we were able to add a regular number to a Maybe Int, but we can’t (easily) add two Maybe Int values to get a Maybe Int value.

>>> (+) <$> Just 1 <*> Just 3
Just 4
>>> (+) <$> Just 1 <*> Nothing
>>> (+) <$> Nothing <*> Just 3
>>> (++) <$> Just "Hello " <*> Just "Patrick"
Just "Hello Patrick"
>>> (++) <$> Just "Hello " <*> Nothing

This is quite powerful. Imagine you are trying to solve a linear program described by the matrix \(A\) and vectors \(b\) and \(c\) where \(A\), \(b\), and \(c\) are read in from a user-provided file. Since the user might make a mistake in typing the files, to read these files we have to return a Maybe value to represent the possibility of failure. We can only run our solver if none of \(A\), \(b\), or \(c\) return Nothing. To make this concrete, suppose we have implemented each of

readA :: Maybe Matrix
readB :: Maybe Vector
readC :: Maybe Vector
simplex :: Matrix -> Vector -> Vector -> Solution

How can we chain these functions together to get a Maybe Solution? If we were to do this without applicative functors, we would need to write something like

solve :: Maybe Solution
solve = case readA of
  Nothing -> Nothing
  Just a  -> case readB of
    Nothing -> Nothing
    Just b  -> case readC of
      Nothing -> Nothing
      Just c  -> Just (simplex a b c)

This is extremely tedious. Instead, we can simply write

solve = simplex <$> readA <*> readB <*> readC

There are more exotic instances of the applicative typeclass. For example, lists implement the applicative typeclass by creating the Cartesian product of possible outputs.

>>> (*) <$> [1, 2, 3] <*> [1, 2, 3]

Hypothesis testing

A great example of the power of Haskell is the QuickCheck library. The idea of this library is to write propositions that map an input to a boolean, where returning true indicates a successful test and returning false indicates a failed test. What differentiates this from standard unit testing is that we only specify the types of the input, and not the input itself. The library randomly generates many valid inputs and tests the propositions on each one, looking for a counter example.

Consider the following (incorrect) sorting algorithm.

badSort []     = []
badSort (x:xs) = badSort lt ++ [x] ++ badSort gt
    lt = filter (< x) xs
    gt = filter (> x) xs

We can find the error by defining

import Data.List
import Test.QuickCheck

testBadSort :: (Ord a) => [a] -> Bool
testBadSort xs = badSort xs == sort xs

which simply compares the output of our sorting algorithm to one which is (presumably) correct. We can then run

>>> quickCheck testBadSort
*** Failed! Falsifiable (after 5 tests and 1 shrink):

This tells us that it found a counter example to our proposition, namely [(), ()], where () is pronounced ‘unit’. It is a singleton value that compares equal with itself. If we run our sort on this input, we can see the problem.

>>> badSort [(), ()]

More generally, we can see

>>> badSort [1, 1, 1, 1]

The problem is our algorithm was assuming that elements were unique. The fix is simple:

goodSort []     = []
goodSort (x:xs) = goodSort lt ++ [x] ++ goodSort gt
    lt = filter (<= x) xs
    gt = filter (>  x) xs