Beginner’s guide to Unit Testing

This post is for those who have heard about Unit Testing but have never done it. What is it about? Why do it? Is it complicated? How do I start? Hopefully I can answer all these questions for people who are considering whether they should invest time in it.

The two types of Testing

Testing, without the “Unit” in front, means that we check our software before releasing it to the public or to a customer. Bugs are inherent to coding: the more code you write, the higher the chance that it contains bugs.

Releasing buggy software damages your reputation and often harms users through data loss, corruption, unwanted access, etc.

Developers usually test new features while coding them, to make sure they look and perform as expected. But this type of testing is so flawed that it usually doesn’t even count as “testing”. Developers tend to overlook edge cases; they test only the minimum necessary to get the job done and nothing more.

To avoid this, the usual first step is to add a team of testers who verify every single feature manually. This is called QA (Quality Assurance). It is very expensive, as the team of testers is sometimes as large as the team of developers. The same parts of the application have to be retested every time something changes that could even remotely affect them.

This is what we call “Manual Testing”. Expensive, but simple to deploy.

To reduce the load on manual testing and make it cheaper, we add automated testing on top. An automated test is anything that validates some part of the software using scripts. This doesn’t remove the need for a QA team, but it reduces their load significantly.

Perfect automated testing could completely replace a QA team, but this is unlikely in practice. Because QA teams are expensive, they are sometimes replaced by other means of testing, like canarying and/or A/B testing.

Manual testing

  • Expensive (lots of human resources)
  • Easy to deploy (just deploy a testing server and/or app and you’re good to go)
  • Good at catching complex patterns
  • Able to catch design or usability flaws
  • Bad at catching convoluted bugs reliably

Automated testing

  • Cheap once deployed (just pay for extra servers that will do tons of testing continuously)
  • Hard and expensive to deploy (takes a lot of time to create good automated tests that do work)
  • Good at re-catching known failure scenarios, even if convoluted
  • Unable to find or judge design or usability flaws
  • Hard to set up for complex patterns or interactions

What is Unit Testing and why does it exist

Complete testing solutions are not only hard and complex to deploy (as they involve emulating users), they are also costly. Sure, compared to manual testing they are quite cheap. But most of the time a full automated test run takes too long, from hours to days.

Having to wait days for the result is a big deal. We’d like continuous results that show whether a change introduced a bug, ideally even before we submit it. With full automated testing this is impossible (or really hard), since it usually involves a full deployment, a server and a lot of CPU time to perform all the tests.

Unit tests solve this problem by testing just one small thing at a time (one unit, hence the name). They don’t test interactions between different systems (they even avoid or emulate database behavior), and they’re so simple that when first presented they look stupid and useless.

In Unit Testing we test one function at a time or, with objects, one method or operation at a time.

Integration tests fall in the middle: they test the correct inter-operation between two or three systems at a time, so they cover more of the program’s real behavior.

Usually unit tests and integration tests are mixed together. Depending on the system or the framework they might be in the same folders or separated. For the purposes of this article, we can treat them as more or less the same thing.

How Unit Testing works

Imagine you have a function in Python that returns the greatest of two numbers:

def greatest(a, b):
    # Return the larger of the two values; when they are equal, return b.
    if a > b:
        return a
    else:
        return b

We could write the following tests:

import unittest

# Assumes greatest() is defined in, or imported into, this module.

class TestGreatest(unittest.TestCase):
    def test_great1(self):
        # a < b
        self.assertEqual(greatest(1, 6), 6)

    def test_great2(self):
        # a > b
        self.assertEqual(greatest(6, 1), 6)

    def test_great3(self):
        # a == b
        self.assertEqual(greatest(2, 2), 2)

Even if it looks stupid, these tests make sure that the three main cases for our function are covered: a > b, a < b, a == b.

Why is this useful?

Functions and objects should be understood as black boxes that work under a contract. What this means is:

  • They have a regime for correct operation. Just as a car engine is expected to work between 800 RPM and 6000 RPM, functions are expected to work when certain criteria are met.
  • Within the operating regime, a correct output is guaranteed.
  • Hence, we can forget about the internals of the function and just rely on its contract. We see it as a black box.

If every function, object and component of our system is guaranteed to work, then no bugs will arise from their internals, only from the interactions between them.

For the interactions, we can do integration tests, or mathematically prove that a function never calls another function outside of that function’s contract.

Having done this, our application must be bug-free. Well, not quite, but you get the point.

Because Unit Tests can demonstrate correct behavior for the cases they cover, they are useful tools to catch errors in the earliest stages of development.

Because they’re so small and contained, they are quick to set up and run, so this can be done before submitting the code. For the same reason, they can be heavily parallelized; if you have a good CPU and a bit of RAM, you could run more than 10 in parallel.

As they test the contracts, they not only catch the error but also pinpoint the reason. The test itself is a simple demonstrative example, so when it goes wrong there are not many options to consider. Analyzing a unit test failure is much easier than analyzing a failure in the real application.

But the major reason we do unit testing is that it gives a strong foundation for refactoring and adding new features. It also helps when merging big changes, if that’s something that you do in Git or Mercurial.

When we add a new, complex feature, it’s not uncommon to break existing code. Unit tests will catch this very early and prevent this common type of error.

Coverage metrics

One of the most important things in unit testing is knowing your coverage. This is basically how much of the code was executed during the unit tests. If our tests don’t execute a function, we know that function is completely untested. This tells us where we need to spend testing effort.

It is usually measured as the percentage of lines executed by the tests, out of the total lines of code in the application.

What percentage of coverage should we aim for?

  • ~90%: Excellent coverage. Just keep an eye on it so it doesn’t fall back.
  • ~80%: Good coverage. This is the usual target for most teams.
  • ~70%: Average coverage. This ensures the application is mostly tested and you should see benefits from it.
  • ~60%: Minimal coverage. This is low and will rarely catch any bugs. Just ensures that basic stuff does work.
  • <60%: Not enough to see benefits from it. Getting to 50% is easy, but from there it gets more complicated.

Where to start

Each programming language and framework has its own stack for doing unit tests. Often there are even many options to choose from. I would advise starting with the best supported, most basic one. In Python this would be the built-in “unittest” module.

Don’t go overboard. Start simple, test a bit where it’s easy, and leave the complex stuff for later. If it looks hard, move on to the next function or file.

Testing the complex parts will become easier as you get used to it and as you start adding the boilerplate needed to set up those cases. This will get reused (or partially reused) in later tests.

Unit testing takes time away from development, so be sure to account for it. Expect between 30% and 100% more time, depending on what needs to be tested and how much coverage you want.

Types of unit test

A common question here is: how much should we cover in the test? Do we need to test every single possible scenario or value? I have come across several types:

The “it runs” test

Basically you just run the function, it doesn’t fail, that’s good enough.

def test_great1(self):
    # FIXME: Complete this test later
    greatest(1, 6)

This is simply not enough. The main problem with doing this is that coverage will show the function as tested and you’ll most probably never come back to fix the test. If you do this, remember to add a comment you can search for later to remind you that it’s not done.

When to do this? When the unit under test is simply too complex to prepare a proper test and predict the result. Proving that a fairly complex piece of code runs and does not error out can be useful.

But, as said before, this is not enough. Avoid it if possible and, if not, try not to forget to fix it later. Sometimes there are time constraints that prevent us from doing proper testing. Speak to your manager if this happens, especially if these tests are accumulating in the codebase.

It might be a good strategy in a few scenarios, for example if you know that this piece of code is going to be scrapped soon anyway. It might make more sense to do manual testing in those cases because it is a one-off.

The “happy path” test

Basically a “happy path” is a single scenario that is expected to work. This is the minimal amount that is considered sufficient for a unit test. It should check the result.

def test_great1(self):
    self.assertEqual(greatest(1, 6), 6)

When to do this? When implementing the first test, this is the only thing that we really need to do. If at least one happy path is tested, that is already considered good enough coverage to start from. You can implement better tests later; focus on having more functions tested first.

The “table” test

A test that runs a suite of inputs from a table. The table is just an array of inputs and expected outputs, and the test performs the same logic and checks on every entry.

When to do this? Whenever we want to implement “proper testing”. We implement a set of scenarios so we have good coverage that doesn’t require later additions. Also it’s quite easy to add new tests by expanding the table.
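A table test for the greatest function from earlier could be sketched as follows. The extra rows and the use of subTest (part of the standard unittest module) are choices made for this sketch, not the only way to do it.

```python
import unittest


def greatest(a, b):
    """Return the greater of two numbers (same function as earlier in the post)."""
    if a > b:
        return a
    return b


class TestGreatestTable(unittest.TestCase):
    # Each row is (a, b, expected). Adding a scenario means adding a row.
    CASES = [
        (1, 6, 6),
        (6, 1, 6),
        (2, 2, 2),
        (-3, -7, -3),
        (0, -1, 0),
    ]

    def test_table(self):
        for a, b, expected in self.CASES:
            # subTest reports each failing row separately instead of
            # stopping at the first failure.
            with self.subTest(a=a, b=b):
                self.assertEqual(greatest(a, b), expected)
```

Saved in a test file, this can be run with "python -m unittest" like any other test.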

The “expected failures” test suite

These tests ensure that the unit fails in predictable ways. They cover the failure cases and ensure that the code bails out when expected. In programming, failing properly should be as important as working properly. If a function encounters a scenario outside its contract it should bail out.

When to do this? When we need to make sure the unit performs exactly as expected, and we want to start relying on the contract of the function. This is usually done when the coverage gets above 80%.
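As a minimal sketch, assume a hypothetical percentage function whose contract requires a positive total and a part between 0 and that total. The function, its name and its error type are invented for illustration; the point is that each way of breaking the contract gets its own test.

```python
import unittest


def percentage(part, total):
    """Hypothetical unit: requires total > 0 and 0 <= part <= total."""
    if total <= 0:
        raise ValueError("total must be positive")
    if part < 0 or part > total:
        raise ValueError("part must be between 0 and total")
    return 100.0 * part / total


class TestPercentageFailures(unittest.TestCase):
    def test_zero_total_is_rejected(self):
        # assertRaises passes only if the block raises ValueError.
        with self.assertRaises(ValueError):
            percentage(1, 0)

    def test_negative_part_is_rejected(self):
        with self.assertRaises(ValueError):
            percentage(-1, 10)

    def test_part_above_total_is_rejected(self):
        with self.assertRaises(ValueError):
            percentage(11, 10)
```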

The “all paths” test suite

These tests check more than one path through the code, using the coverage report to ensure that every condition is tested both for true and for false, and that loops are tested both with several iterations and with none.

When to do this? When we want to reach near 100% coverage, for critical functions that are used widely or whose failure can be disastrous for the application.
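To make this concrete, here is a sketch with a hypothetical sum_positive function (invented for this example) that has one loop and one condition. The tests are named after the path they exercise.

```python
import unittest


def sum_positive(values):
    """Hypothetical unit: add up only the positive numbers."""
    total = 0
    for value in values:  # loop: may run zero, one or many times
        if value > 0:     # condition: may be true or false
            total += value
    return total


class TestSumPositiveAllPaths(unittest.TestCase):
    def test_loop_not_entered(self):
        self.assertEqual(sum_positive([]), 0)

    def test_condition_true(self):
        self.assertEqual(sum_positive([5]), 5)

    def test_condition_false(self):
        self.assertEqual(sum_positive([-5]), 0)

    def test_several_iterations_mixed(self):
        # Both branches of the condition are taken across iterations.
        self.assertEqual(sum_positive([1, -2, 3, 0]), 4)
```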

The “fuzzy” test

In some cases we can pass a bunch of randomly generated inputs to surface errors that we didn’t think about. This is not usually done in unit tests, as those tests will pass sometimes and fail other times. They’re meant for other stages of testing.

When to do this? Never. Maybe just locally to try to discover problems; but this is never meant to stay in the sources. It can bite you later with random failures.
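For that kind of local exploration, a throwaway script could look like the sketch below. It assumes Python’s built-in max is a trustworthy reference to compare against; the fuzz_greatest helper, the input range and the optional seed are all made up for this example. Passing a seed makes a failing run repeatable while you investigate.

```python
import random


def greatest(a, b):
    """Return the greater of two numbers (same function as earlier in the post)."""
    if a > b:
        return a
    return b


def fuzz_greatest(iterations=1000, seed=None):
    """Return the (a, b, got) cases where greatest disagrees with max."""
    rng = random.Random(seed)  # seed=None gives a different run each time
    failures = []
    for _ in range(iterations):
        a = rng.randint(-1000, 1000)
        b = rng.randint(-1000, 1000)
        got = greatest(a, b)
        if got != max(a, b):
            failures.append((a, b, got))
    return failures
```

Any case it finds is better turned into a fixed row in a table test than left as a random check in the sources.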

Mocking

Unit testing sometimes requires complicated setups that are either costly or simply not possible. As we want to test one component, we want to abstract away its dependencies. That’s where mocks come in.

Mocking is a way to emulate dependencies, for example databases or other libraries.

Imagine your function communicates via a serial port with another device. If run as it is, it will try to perform this operation, and thus it will fail unless the expected device is found. If it works, it might affect production. So we can mock those calls in a way that emulates the behavior, and test the function without having the real device attached.
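A minimal sketch of the serial port case, using unittest.mock from the standard library. The read_temperature function, the "TEMP?" command and the reply format are invented for illustration, and the port object is only assumed to offer write and readline methods.

```python
import unittest
from unittest import mock


def read_temperature(port):
    """Hypothetical unit: ask a device for its temperature.

    `port` is assumed to offer write(bytes) and readline() -> bytes.
    """
    port.write(b"TEMP?\n")
    reply = port.readline()
    return float(reply.decode("ascii").strip())


class TestReadTemperature(unittest.TestCase):
    def test_parses_device_reply(self):
        # The mock stands in for the real port; no device is needed.
        fake_port = mock.Mock()
        fake_port.readline.return_value = b"21.5\n"

        self.assertEqual(read_temperature(fake_port), 21.5)
        # The mock also records how it was used.
        fake_port.write.assert_called_once_with(b"TEMP?\n")
```

Passing the port in as an argument is a design choice of this sketch: it lets the test hand over a fake without patching anything.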

Myths and fallacies of Unit testing

Unit Testing is really cool, but as with everything, it can be abused. Be sure to know what not to expect from Unit Testing.

It will not prevent all bugs. Having 100% code coverage doesn’t guarantee a program to be bug-free.
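A small sketch of how this can happen, using a hypothetical average function invented for this example: the single test executes every line, so line coverage is 100%, yet an empty list still crashes it.

```python
import unittest


def average(values):
    """Hypothetical unit: every line here is executed by the test below."""
    return sum(values) / len(values)


class TestAverage(unittest.TestCase):
    def test_average(self):
        self.assertEqual(average([2, 4]), 3)


def crashes_on_empty_list():
    """Show the bug that the fully covering test never touches."""
    try:
        average([])
    except ZeroDivisionError:
        return True
    return False
```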

It doesn’t replace manual testing or other means of testing. Design or usability flaws usually require humans. Systems are far more complicated than what Unit Testing can verify. You still need to run test servers and have other layers of testing.

Coverage percentages might give a false sense of security.

Bugs can be in tests themselves. If something is improperly designed, it is likely that the flaw also pollutes the tests, making them invalid.

Developers tend to fix test failures by amending the test. This means that the bug remains in the code but now your test passes.