Testing your code

Suppose you collaborate on writing code: how do you make sure none of your collaborators break your code? The answer is to introduce quality controls, by testing the code frequently. To make sure this happens, and happens consistently, the testing should be automated so that it runs every time code is committed to your repository.


As in other parts of this book, the tools and examples in this chapter use Python. The principles being introduced are very general, however, and the vast majority of mature programming languages have testing toolkits available, or even built in.

Static analysis

The simplest level of quality control, which most large coding projects expect (or sometimes demand), is that your code follows a given “style guide”. Much as different publishers have views on whether an article should (or should not) use the serial (so-called “Oxford”) comma, project maintainers often have strong feelings about how the code in their codebase is formatted. This can partly be for performance reasons, but is mostly to make the code easier to read.

While it’s less important to do this if you’re working alone on your own code, it can still be helpful to keep to some conventions (such as capitalising the names of classes, in the case of Python). A popular style guide for Python is PEP8, which aims to make your code as readable as possible.

Enforcing (or just checking) that code follows a given set of guidelines can be done with tools which perform “static analysis” on the code. These read and parse the code, but don’t actually run it. They then check it against a style guide and against the grammar rules of the language, and attempt to identify problems.
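To get a feel for what “parse but don’t run” means, here’s a toy checker. It isn’t how Pylint or flake8 are implemented; it just uses Python’s built-in ast module to turn source code into a syntax tree, then flags function names which aren’t snake_case. The function name check_function_names and the simplified naming pattern are my own, for illustration only.

```python
import ast
import re

# A deliberately simple pattern for snake_case names, e.g. hello_world.
SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")


def check_function_names(source):
    """Return (line, name) pairs for function names which aren't snake_case.

    The source is only parsed into a syntax tree; it is never executed.
    """
    tree = ast.parse(source)
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if not SNAKE_CASE.match(node.name):
                problems.append((node.lineno, node.name))
    return problems


# The body refers to a name which doesn't exist, but that never matters,
# because the code is only parsed, not run.
code = '''
def HelloWorld(planet=None):
    print(undefined_name)
'''

print(check_function_names(code))  # [(2, 'HelloWorld')]
```

Real linters apply hundreds of checks like this one, plus checks on the raw text (spacing, line length) which a syntax tree doesn’t capture.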

Two of the most popular analysis tools for Python are Pylint and flake8. These have overlapping feature sets, but running both can sometimes be helpful.

To get to grips with what these do, let’s take two code examples: the first, goodstyle.py, has good style, and the second, badstyle.py, has bad style.

goodstyle.py:

"""
This file contains an example of code written in a way which follows PEP8
guidelines.
"""


def hello_world(planet=None):
    """Print 'Hello world'.
    If another planet is specified then the function will greet that planet.
    """

    if not planet:
        world = "world"
    else:
        world = planet

    print(f"Hello {world}")

badstyle.py:

# An example of code which doesn't follow the PEP8 guidelines.
def HelloWorld(planet = None):



      if not planet:
          world= "world"
      else:
          world=planet

      print(f"Hello {world}")


Pylint tries to identify a series of stylistic errors in your code, and then provides a score at the end.

The code in goodstyle.py does pretty well:

$ pylint goodstyle.py

Your code has been rated at 10.00/10

But we score a 0/10 for our badstyle.py code…

$ pylint badstyle.py

badstyle.py:5:0: C0303: Trailing whitespace (trailing-whitespace)
badstyle.py:10:0: C0303: Trailing whitespace (trailing-whitespace)
badstyle.py:1:0: C0114: Missing module docstring (missing-module-docstring)
badstyle.py:2:0: C0103: Function name "HelloWorld" doesn't conform to snake_case naming style (invalid-name)
badstyle.py:2:0: C0116: Missing function or method docstring (missing-function-docstring)

Your code has been rated at 0.00/10

Each line of the output tells us the file, line and column in which a problem arises (the first warning is for line 5, for example), followed by a message code and a short description of how the code breaks the style guide.


flake8 works in a similar way, although it doesn’t give a score. Running it on the good code produces no output, as no problems are found.

$ flake8 goodstyle.py

For the badly styled code we get a series of errors and warnings.

$ flake8 badstyle.py

badstyle.py:2:3: E999 IndentationError: unexpected indent
badstyle.py:2:4: E111 indentation is not a multiple of four
badstyle.py:2:4: E113 unexpected indentation
badstyle.py:2:25: E251 unexpected spaces around keyword / parameter equals
badstyle.py:2:27: E251 unexpected spaces around keyword / parameter equals
badstyle.py:5:1: W293 blank line contains whitespace
badstyle.py:6:7: E303 too many blank lines (3)
badstyle.py:6:7: E111 indentation is not a multiple of four
badstyle.py:7:11: E111 indentation is not a multiple of four
badstyle.py:7:16: E225 missing whitespace around operator
badstyle.py:8:7: E111 indentation is not a multiple of four
badstyle.py:9:11: E111 indentation is not a multiple of four
badstyle.py:9:16: E225 missing whitespace around operator
badstyle.py:11:7: E111 indentation is not a multiple of four

You can see that, compared with pylint, there’s a lot of output, and much of it relates to spacing and line breaks. I’ll pull out the “E111 indentation is not a multiple of four” lines for special mention. These complain that the code is indented by a number of spaces which isn’t a multiple of four, whereas PEP8 asks for four spaces per indentation level. Python itself will accept other widths, and even tabs (though spaces are preferred by most style guides, and flake8 reports tabs with a separate warning, W191). However, Python 3 won’t let you mix tabs and spaces in a way which makes the indentation ambiguous.
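As a quick demonstration of that last point, the snippet below asks Python to compile (but not run) a block in which one line is indented with a tab and the next with eight spaces. The variable names are arbitrary.

```python
# One line of the block is indented with a tab, the next with eight spaces.
# Whether they line up depends on how wide a tab is taken to be, so Python 3
# refuses to guess and raises TabError (a subclass of IndentationError).
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, "<mixed-indentation>", "exec")
    error = None
except TabError as exc:
    error = exc

print(type(error).__name__)  # TabError
```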

Unit testing

While static analysis can reveal problems with the syntax or the style of your code, most bugs won’t show up until you’ve actually executed the code. There are a few approaches to testing your code for these sorts of bugs. The simplest is called “unit testing”, where individual units of code are run and their behaviour is examined. In Python a unit is likely to be a function or a class method.

Units are tested in isolation from each other. This allows problems to be identified quickly, but isn’t a realistic representation of how the code is actually used; I’ll cover methods for testing that later in this chapter.

There are a number of tools in Python to help you create tests. I’ll use the unittest library here, because it’s part of Python’s standard library and so should be present in any Python installation. Other tools, such as pytest, make some things simpler, however.

Let’s write a function which we want to test; this is one which checks if a given input is prime.

def is_prime(number):
    """Test if a number is prime. Return True if prime, and False if not."""
    for i in range(number):
        if number % 2 == 0:
            return False
        elif (num % i) == 0:
            return False
    return number > 1  # no factors found: prime, provided it's bigger than 1

Now, this is a very simple function, which just searches for factors of a number. If it can’t find any it returns True, since that number is prime.

We want to test this; there are some things we might regard as “edge cases” which we should look out for, and we should check that it actually identifies a prime number correctly. Here are some things I can think of which we might want to check are handled correctly:

  • Is 2 identified as prime?

  • Is 1 identified as not prime?

  • What happens when 0 or a negative number is provided?

  • What happens if the input isn’t an integer?

Before we even start writing tests we can see that we actually need to put some thought into answering some of these, especially the last two. We need to define a behaviour in both cases, since it’s likely that most approaches we’ll take to writing an algorithm won’t cover these. This highlights one of the reasons that writing tests is important: they can act like a design specification for your code. In fact, practitioners of test-driven development would suggest that you should write the tests before you write any code.

Let’s look at an example test.

import unittest
from prime import is_prime

class PrimeTests(unittest.TestCase):
    """Test prime number checking."""

    def test_number_two(self):
        """Test that two is prime."""
        self.assertTrue(is_prime(2))

So, what’s going on here? Fundamentally this file is just a Python script, with some boilerplate added to help the unittest tool do its job. First, it imports both unittest and our function (which lives in a file called prime.py). The tool groups tests together into classes, with each individual test defined by a method on that class, and the class needs to inherit from unittest.TestCase.


Don’t worry if you’re not too clear on what’s going on here with things like class inheritance; it just gives our testing class access to a number of additional tools from the library. For now you can just modify these samples, and as you get more confident you can consult the documentation on writing more complex tests.

The code which defines the test is contained in the test_number_two method. All it does is call is_prime(2) and check that the result is True. The assertTrue method is one of the methods which unittest provides to help us; if the value it is given isn’t true, the test fails.
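assertTrue is only one of these methods. As an illustration, here are a few other commonly used ones: assertEqual, assertFalse, assertAlmostEqual and assertRaises. The halve function is hypothetical, written just for this example.

```python
import unittest


def halve(value):
    """Hypothetical function, used only to demonstrate the assertions."""
    if not isinstance(value, (int, float)):
        raise TypeError("value must be a number")
    return value / 2


class AssertionExamples(unittest.TestCase):
    """A few of the assertion methods which unittest.TestCase provides."""

    def test_equal(self):
        # assertEqual fails unless the two values compare equal.
        self.assertEqual(halve(4), 2)

    def test_false(self):
        # assertFalse fails if the value is truthy; halve(0) gives 0.0.
        self.assertFalse(halve(0))

    def test_almost_equal(self):
        # 0.1 + 0.2 isn't exactly 0.3 in floating point, so assertEqual
        # would fail here; assertAlmostEqual rounds to 7 places by default.
        self.assertAlmostEqual(halve(0.1 + 0.2), 0.15)

    def test_raises(self):
        # assertRaises fails unless the code in the block raises TypeError.
        with self.assertRaises(TypeError):
            halve("four")
```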

Let’s run the test. I’ve saved the script above in a file called test_prime.py. To run the test I call Python with the additional option -m unittest, followed by the name of the file:

$ python -m unittest test_prime.py

FAIL: test_number_two (test_prime.PrimeTests)
Test that two is prime.
Traceback (most recent call last):
File "test_prime.py", line 9, in test_number_two
AssertionError: False is not true

Ran 1 test in 0.000s

FAILED (failures=1)

Well, you can see that we didn’t exactly pass with flying colours… Looking back at the code we can see that it will reject all even numbers as non-prime; we need to handle 2 differently.

Adding a special case for 2 at the start of the function is one way to do this:

def is_prime(number):
    """Test if a number is prime. Return True if prime, and False if not."""
    if (number == 2):
        return True
    for i in range(number):
        if number % 2 == 0:
            return False
        elif (num % i) == 0:
            return False
    return number > 1  # no factors found: prime, provided it's bigger than 1

Now our test gives us:

$ python -m unittest test_prime.py

Ran 1 test in 0.000s

OK

Which is good! We’ve fixed a problem in our code! Let’s go on to write tests for the remaining conditions I identified earlier.

import unittest
from prime import is_prime

class PrimeTests(unittest.TestCase):
    """Test prime number checking."""

    def test_number_two(self):
        """Test that two is prime."""
        self.assertTrue(is_prime(2))

    def test_number_one(self):
        """Test that one is not prime."""
        self.assertFalse(is_prime(1))

    def test_number_zero(self):
        """Test that zero is not prime."""
        self.assertFalse(is_prime(0))

    def test_number_negative(self):
        """Test that negative numbers are not prime."""
        self.assertFalse(is_prime(-7))

    def test_actual_prime(self):
        """Test that a known prime is prime."""
        self.assertTrue(is_prime(7))

Running this we find that there are still problems:

E.E..

ERROR: test_actual_prime (test_prime.PrimeTests)
Test that a known prime is prime.
Traceback (most recent call last):
  File "/home/daniel/academic-notes/notes-software/examples/test_prime.py", line 25, in test_actual_prime
  File "/home/daniel/academic-notes/notes-software/examples/prime.py", line 8, in is_prime
    elif (num % i) == 0:
NameError: name 'num' is not defined

ERROR: test_number_one (test_prime.PrimeTests)
Test that one is not prime.
Traceback (most recent call last):
  File "/home/daniel/academic-notes/notes-software/examples/test_prime.py", line 13, in test_number_one
  File "/home/daniel/academic-notes/notes-software/examples/prime.py", line 8, in is_prime
    elif (num % i) == 0:
NameError: name 'num' is not defined

Ran 5 tests in 0.001s

FAILED (errors=2)

The first line here, E.E.., reveals that two of the tests had errors (E) while the other three passed (.); the errors were in the first and third tests to be run. Then there are individual messages which reveal why those tests couldn’t finish: there’s a bug in the code, since num isn’t defined. If we fix this (changing num to number) and run the tests again, we find another error.

$ python -m unittest test_prime.py

ERROR: test_actual_prime (test_prime.PrimeTests)
Test that a known prime is prime.
Traceback (most recent call last):
  File "/home/daniel/academic-notes/notes-software/examples/test_prime.py", line 25, in test_actual_prime
  File "/home/daniel/academic-notes/notes-software/examples/prime.py", line 8, in is_prime
    elif (number % i) == 0:
ZeroDivisionError: integer division or modulo by zero

ERROR: test_number_one (test_prime.PrimeTests)
Test that one is not prime.
Traceback (most recent call last):
  File "/home/daniel/academic-notes/notes-software/examples/test_prime.py", line 13, in test_number_one
  File "/home/daniel/academic-notes/notes-software/examples/prime.py", line 8, in is_prime
    elif (number % i) == 0:
ZeroDivisionError: integer division or modulo by zero

Ran 5 tests in 0.001s

FAILED (errors=2)

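This one’s a result of using range(number). The range starts counting from 0, so for an odd number the very first pass through the loop tries to calculate number % 0. Changing this to range(2, number) gives…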

Ran 5 tests in 0.000s

OK

We’ve passed our tests!

I’ve still not included a test for non-integer input, but that one, in classical academic style, is left as an exercise to the reader.

As your codebase evolves you’re likely to end up with a large amount of code which needs testing. A good practice here is to keep all your test scripts in a directory called tests within your project repository. It can also be a good idea to separate the unit tests into their own tests/unit directory.

unittest can then be used to run all the tests in that directory using the command

$ python -m unittest discover -s tests/unit

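For this to work, the script names need to match the pattern test*.py (for example test_foo.py), and they must also be valid Python module names, so a file called test-foo.py won’t be found. The test classes in those scripts need to inherit from unittest.TestCase (naming them something like TestSpam is a common convention, but unittest doesn’t require it), and the individual test methods need names starting with test, such as test_eggs.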
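A quick way to check these rules is the self-contained experiment below. It writes the same small test module into a temporary directory twice, as test_spam.py and as test-ham.py (both names made up for the example), and then asks unittest.TestLoader().discover() to collect the tests, which is the same mechanism python -m unittest discover uses. Only one test should be found: the hyphenated file is skipped, the helper method isn’t treated as a test, and the class name SpamTests doesn’t need to start with Test.

```python
import tempfile
import textwrap
import unittest
from pathlib import Path

# A small test module which will be saved under two different file names.
TEST_MODULE = textwrap.dedent(
    """
    import unittest


    class SpamTests(unittest.TestCase):
        def test_eggs(self):
            self.assertEqual(1 + 1, 2)

        def helper(self):
            # Not collected: the name doesn't start with "test".
            pass
    """
)

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "test_spam.py").write_text(TEST_MODULE)
    # A hyphen isn't valid in a module name, so discovery skips this file.
    Path(tmp, "test-ham.py").write_text(TEST_MODULE)

    suite = unittest.TestLoader().discover(start_dir=tmp)
    found = suite.countTestCases()

print(found)  # 1
```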


Unit testing is most useful when all of the code in the codebase is checked by your test suite. Code coverage is a measure of the fraction of lines of the code which are run by a given test suite.

A tool called coverage.py can be used to measure this for the unit tests in this section. You can install it by running

$ pip install coverage

and then changing the way that you call your tests to

$ coverage run -m unittest test_prime.py

This will gather the information which is needed to measure the coverage, and a report can be made by running

$ coverage report -m
Name            Stmts   Miss  Cover   Missing
prime.py            9      2    78%   7, 9
test_prime.py      13      0   100%
TOTAL              22      2    91%

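So we can see that lines 7 and 9 of prime.py are never run by our test suite as it currently stands. These are the two return False statements inside the loop, so none of our tests checks a number bigger than 2 which actually has a factor.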
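Tests using an even and an odd composite number should reach both of those lines. Here’s a sketch of two such tests; the test names and the choice of 4 and 9 are my own. So that the block runs on its own it includes a copy of is_prime with all three fixes applied, and it assumes the function ends with return number > 1, so that 1, 0 and negative numbers (for which the loop never runs) come out as not prime.

```python
import unittest


def is_prime(number):
    """Test if a number is prime. Return True if prime, and False if not."""
    if number == 2:
        return True
    for i in range(2, number):
        if number % 2 == 0:
            return False  # reached by even numbers bigger than 2
        elif number % i == 0:
            return False  # reached by odd numbers which have a factor
    return number > 1  # assumed ending: no factor found, so prime if above 1


class CompositeTests(unittest.TestCase):
    """Hypothetical extra tests aimed at the lines coverage reported as missed."""

    def test_even_composite(self):
        """Test that an even number other than two is not prime."""
        self.assertFalse(is_prime(4))

    def test_odd_composite(self):
        """Test that an odd number with a factor is not prime."""
        self.assertFalse(is_prime(9))
```

Added to test_prime.py (importing is_prime from prime.py rather than defining it), tests like these should take prime.py to full coverage with this version of the function.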

End-to-end testing

Testing that individual functions produce the results we expect is a powerful technique, but it doesn’t reflect the way that code actually runs. End-to-end (or integration) testing is used to check that several units of code interact in the way that we expect.

Writing integration tests doesn’t differ much from writing unit tests, but they need to include all the components of the workflow. This is why keeping them separate from the unit tests can be sensible: integration tests might take minutes or hours to run, and running them every time you make a change would be time consuming. Instead, by keeping them in, for example, a tests/integration directory, you can run them separately with

$ python -m unittest discover -s tests/integration
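As a sketch of the difference, suppose (hypothetically) a project had one function which reads numbers from a text file and another which counts how many of them are prime. An integration test runs them together, from an input file to the final answer, rather than checking each one separately. All the function names here are made up for the example, and is_prime is a compact stand-in (plain trial division up to the square root) rather than the version developed above.

```python
import math
import tempfile
import unittest
from pathlib import Path


def is_prime(number):
    """Stand-in for is_prime: trial division up to the square root."""
    if number < 2:
        return False
    return all(number % i for i in range(2, math.isqrt(number) + 1))


def read_numbers(path):
    """Hypothetical unit 1: read one integer from each non-blank line."""
    lines = Path(path).read_text().splitlines()
    return [int(line) for line in lines if line.strip()]


def count_primes(numbers):
    """Hypothetical unit 2: count how many of the numbers are prime."""
    return sum(1 for number in numbers if is_prime(number))


def run_workflow(path):
    """Hypothetical workflow chaining the two units together."""
    return count_primes(read_numbers(path))


class WorkflowIntegrationTests(unittest.TestCase):
    """Run the whole workflow, from an input file to the final answer."""

    def test_counts_primes_in_file(self):
        with tempfile.TemporaryDirectory() as tmp:
            data_file = Path(tmp, "numbers.txt")
            # Includes a blank line, to check the reading step copes with it.
            data_file.write_text("2\n4\n7\n\n9\n11\n")
            # 2, 7 and 11 are prime.
            self.assertEqual(run_workflow(data_file), 3)
```

Saved as, say, tests/integration/test_workflow.py, it would be picked up by the discover command above.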

Regression testing

When you add new features or tidy up (refactor) code, you want to ensure that it continues to provide the same results as older versions of the code.

This is where regression testing comes into play. How you go about writing this type of test will depend on how your code works, and also on whether you have external requirements, such as the code needing to be able to reproduce reviewed results.

For example, if you have a set of reviewed results, you could write a test script which attempts to reproduce them using the same configuration, and checks that the same results are still produced after the code is updated.
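A minimal sketch of that idea follows. The analysis function, its configuration and the “reviewed” numbers are all invented for the example; in a real project the reviewed results would more likely be stored in a file kept under version control. The test reruns the analysis with the reviewed configuration and compares each quantity with a small tolerance, since refactoring can legitimately change floating-point results in the last few digits.

```python
import math
import unittest


def analysis(values, scale):
    """Hypothetical analysis whose reviewed output we want to preserve."""
    return {
        "mean": scale * sum(values) / len(values),
        "maximum": scale * max(values),
    }


# Hypothetical reviewed results and the configuration which produced them.
REVIEWED_CONFIG = {"values": [1.0, 2.5, 4.0], "scale": 2.0}
REVIEWED_RESULTS = {"mean": 5.0, "maximum": 8.0}


class RegressionTests(unittest.TestCase):
    """Check that the current code still reproduces the reviewed results."""

    def test_reproduces_reviewed_results(self):
        current = analysis(**REVIEWED_CONFIG)
        for key, expected in REVIEWED_RESULTS.items():
            # subTest reports each quantity separately if several differ.
            with self.subTest(quantity=key):
                # A small relative tolerance allows for harmless floating-point
                # changes, e.g. if a refactor reorders a sum.
                self.assertTrue(
                    math.isclose(current[key], expected, rel_tol=1e-9),
                    f"{key}: got {current[key]}, reviewed value was {expected}",
                )
```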

