Unit Tests

I now take unit tests for granted and can no longer program effectively without them. Code without unit tests seems unfinished. But it took me a while to reach this position.

You can't simply order developers to begin writing effective tests. They have to believe in them first.

My motivation could be summarized as "make it hard for others to break your code."

§    What is a unit test?

Here's one definition I've seen: Unit tests are written by developers for the benefit of developers. In contrast, system tests are written by testers, to automate their requirements as representatives of users. This distinction is useful as far as it goes.

A unit test is usually written from the inside out, starting with the smallest possible units of code. A unit test guarantees the integrity of some self-sufficient unit of code, usually a class, or perhaps just a single function. A unit test is often invasive, looking inside a class, and asserting that its state remains valid as it is exercised. A user cannot write such a test. A system test, on the other hand, is written from the outside in, starting with the interfaces visible to the user.

A unit test increases the coarseness of bugs, eliminating the fine-grained bugs that are buried too deeply for a user to analyze.

In addition to unit tests and system tests, we could also distinguish integration tests. These test middle-level interaction between different independent subsystems, or interactions between classes from different sources. These are less common, harder to write, and perhaps less essential. Yet a few of these tests can go a long way, especially when the subsystems do not have unit tests of their own.

You may also have regression tests that compare new results against past results. These tests have their place, but they are closer to system tests. You still need tests of your logic and design. Regression tests require more maintenance and are more likely to be discarded. Data may change, but the consistency of your design should not.

In the end, I have not found it necessary for the test harness or framework to distinguish these different kinds of tests. All are invoked in the same way, on the same occasions. All are run frequently and should never break.

§    Ground rules

There aren't many essential rules for managing unit tests. Any test is better than none. But a test must be run to be useful.

Tests should be an integral part of any build. Tests should run immediately after compilation, and if a test fails, the build has failed. Anyone downloading our code and building it should run the tests by default. We can even make it possible for a client to run all the tests, to look for bugs that depend on the environment.
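For example, a build might invoke a top-level runner like this sketch, which uses the jUnit framework described later. FooTest and BarTest are hypothetical test classes; the only essential point is the non-zero exit code, which lets any build script treat a failed test as a failed build.

    import junit.framework.TestResult;
    import junit.framework.TestSuite;
    import junit.textui.TestRunner;

    public class BuildTests {
        public static void main(String[] args) {
            TestSuite suite = new TestSuite("build tests");
            suite.addTestSuite(FooTest.class);  // hypothetical test classes
            suite.addTestSuite(BarTest.class);
            TestResult result = TestRunner.run(suite);
            if (!result.wasSuccessful()) {
                System.exit(1);  // a failed test is a failed build
            }
        }
    }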

Of course, a failed test isn't really as bad as a failed compilation. A compilation error prevents others from compiling their code, while a broken unit test affects only those using the tested code. But what if other developers are depending on that functionality right now? They still won't be able to trust any of their new work. The compilation might as well be broken. Kent Beck points out that a broken test means you still have an unknown amount of work to do. You might need a minute to fix the problem, or days. You might have a fundamental flaw in your design.

My projects for the past few years have striven for continuous integration. Small frequent changes are checked into the repository all day long, and builds occur almost continuously. As soon as anyone decides the code is better than it was, the change is committed. We frequently update our personal working copies with all changes made by others. Some of us do repeated clean builds on our machines every day. This way we guarantee that all our changes are compatible with all others.

Here are my preferred rules: run the tests with every build; treat a broken test as a broken build; and fix or revert the breaking change immediately.

I think most developers understand these rules. If they have written tests for their code, they don't enjoy seeing them broken, even for a while. It is sometimes easy to blame the test. If the coder has changed the behavior of the code, then the tests also need to be modified to redefine the expected behavior. It may be discouraging at times to see how often the tests break. But we realize how much worse it would be if so many bugs went unnoticed.

§    Testing in a framework

First of all, a test framework should not get in your way. A test framework should make it easy to add and run tests, not become a burden in itself.

Unit tests are expected to be relatively fast, but a slow test is better than no test. My last project took 10 minutes to compile, and over an hour to run all the tests. It is useful to have tests in a hierarchy, so that you can test only those packages you are currently modifying.

A test framework usually runs tests in-process, but not necessarily. Each test is run inside a sandbox with a known environment. A single failed test does not prevent other tests from being run. After running all tests, the framework can also check for leaked resources: unfreed memory, unterminated threads, and open files.

A test asserts the correctness of code. A failure causes some error condition that cannot be ignored. You can call an assert method defined by your test framework if you like. You should also be able to use the assert defined by your language, or to throw an uncaught exception. A test should never require a human being to examine some output and approve.
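Here is a sketch of a test that needs no framework at all; the reverse method is defined here only to give the test something to check.

    public class ReverseTest {
        public static void main(String[] args) {
            String result = reverse("abc");
            // The language's own assert (enable with java -ea):
            assert result.equals("cba") : "reverse returned " + result;
            // Or an uncaught exception, which works even without -ea:
            if (!result.equals("cba")) {
                throw new RuntimeException("reverse returned " + result);
            }
        }
        private static String reverse(String s) {
            return new StringBuffer(s).reverse().toString();
        }
    }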

Tests run silently unless they fail. Do not write to standard output unless you have an error to report. You might obscure an error message from some unknown library. A silent test is a happy test. If you have a logging framework, then you can selectively turn off informative messages while the tests run.
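For example, with java.util.logging you might silence everything below a warning while the tests run; other logging frameworks have equivalents.

    import java.util.logging.Level;
    import java.util.logging.Logger;

    public class QuietLogs {
        public static void silence() {
            // Only warnings and errors will reach the console.
            Logger.getLogger("").setLevel(Level.WARNING);
        }
    }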

Some Java programmers put their test code in a class's main method. Some prefer separate classes in separate packages. Some prefer to extend classes from the jUnit framework. In my builds, any of these classes can be added to a list of similar tests, depending on how they should be invoked. The developers do not have to write any complicated hooks or understand a complicated framework.

If your C++ programmers prefer separate executables for each test, then they should be able to add the names of those executables to a list. If the exit code is non-zero, then the test has failed.

Above all, don't let the awkwardness of the test framework become an excuse to avoid writing tests. Be tolerant and support as many styles of tests as you can.

§    xUnit

One very popular framework is the xUnit set of standard interfaces. There is jUnit for Java and cppUnit for C++. This framework essentially makes it easier to build a tree of tests. A developer can easily select and run a single branch of the tests. I like this framework as an outermost integrator of tests. Individual tests can remain ignorant of the framework if you provide simple wrappers and bridges to it.
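As a sketch, a tree of suites might be assembled like this, with each package contributing a suite() method of its own. The package names are hypothetical.

    import junit.framework.Test;
    import junit.framework.TestSuite;

    public class AllTests {
        public static Test suite() {
            TestSuite suite = new TestSuite("everything");
            // Each branch can also be run on its own.
            suite.addTest(com.example.io.AllTests.suite());
            suite.addTest(com.example.gui.AllTests.suite());
            return suite;
        }
    }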

The xUnit framework seems simplest for languages that allow runtime type inspection. In Java, your test class derives from a single TestCase superclass. You add an arbitrary number of methods whose names begin with the letters "test". You add the name of the jUnit test class to a list and you are done. The framework loads each class, inspects it, and creates a fresh instance for each "test" method it runs. If all tests share a certain setup, you can define a "setUp" or "tearDown" method. (I rarely find these necessary.) The cppUnit framework looks heavier than jUnit and not quite as functional. But C++ can still support runtime inspection of symbol names in a platform-dependent way.
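Here is a minimal jUnit test class along those lines. It tests java.util.Stack purely for illustration.

    import java.util.EmptyStackException;
    import java.util.Stack;
    import junit.framework.TestCase;

    public class StackTest extends TestCase {
        private Stack stack;

        protected void setUp() {
            stack = new Stack();  // a fresh instance runs each test method
        }

        public void testPushThenPop() {
            stack.push("x");
            assertEquals("x", stack.pop());
            assertTrue(stack.isEmpty());
        }

        public void testPopOnEmptyStackThrows() {
            try {
                stack.pop();
                fail("expected EmptyStackException");
            } catch (EmptyStackException expected) {
            }
        }
    }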

The first version of the jUnit framework was just a few hundred lines of code. It isn't hard to write a framework of your own, but there may not be any reason to. Just keep it simple.

Kent Beck and Erich Gamma give an excellent example of jUnit testing in "Test Infected: Programmers Love Writing Tests": http://members.pingnet.ch/gamma/junit.htm

§    Organizing tests

Many begin by arguing about where tests should go in the code tree, and how the files should be named. A consistent naming convention can make it possible for the framework to discover and run all new tests, without a separate list. Otherwise, I don't think naming matters much. I personally prefer the test code to be as close as possible to the implementation. If the tests are short, they can go into the same file as the implementation. If the tests are longer they can go into a separate file in the same directory. If the tests entail new dependencies that should not be required by the implementation, then you may want to put the tests in a subdirectory called "test." Don't let the tests get so far from the implementation that they are overlooked by a developer who modifies the code.

§    Getting started

Even if you do not currently run tests as a part of your build, you probably have some tests already checked into your repository, written for someone's personal use. Start with those. Your framework should make it easy to add tests that were not written for the framework. It seems to help if one developer has the responsibility for trying to integrate existing tests and making it easier for others to add new tests.

You'll have a large amount of existing code without tests. You'll never have time to add tests for everything, but you probably won't need to. Your first priority should be to protect code with many dependencies. Test things you are worried about breaking. There is no need to test a set/get combination if it only saves and returns a single member variable.

Add tests any time you modify code, yours or someone else's. If you aren't sure what the code does, make a guess, and assert you are correct. If you are correct, check it into the permanent tests. You'll know when your assumption changes. You will be able to modify the code without breaking that existing behavior.
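For example, if you are unsure how someone's path-splitting routine treats a leading slash, write down your guess and assert it. Here splitPath is a stand-in for the unfamiliar code; in real life it would live in the code you are modifying.

    import junit.framework.TestCase;

    public class SplitPathTest extends TestCase {
        public void testDropsLeadingSlash() {
            String[] parts = splitPath("/usr/local/bin");
            assertEquals(3, parts.length);  // my guess: no empty first element
            assertEquals("usr", parts[0]);
        }

        // Stand-in for the routine you are pinning down.
        private static String[] splitPath(String path) {
            return path.substring(1).split("/");
        }
    }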

§    Try writing tests first

The best tests are those that are written when the code is written. Tests help you improve your design and your APIs. Try writing some tests first, before you write the implementation. Think like a user of your class, and make your API convenient for that user. What inputs will be most convenient? How should steps be combined or broken apart? Make tests that use your class in a typical way. Then decide how your code should respond to bad data, such as parameters out of range or input of zero length. Add tests that assert the correct behavior. Write tests that ensure errors generate the correct exceptions, by catching the expected exception and complaining if it is not thrown. When all your tests finally pass, you are done. You won't be distracted by features that you don't need. Your APIs will reflect how they are used rather than how they are implemented.
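As a sketch, here is what test-first might look like for a hypothetical Range class. The tests are written against an API that does not exist yet, which forces the API to be convenient before it is implemented.

    import junit.framework.TestCase;

    public class RangeTest extends TestCase {
        public void testContainsEndpoints() {
            Range r = new Range(1, 10);
            assertTrue(r.contains(1));
            assertTrue(r.contains(10));
            assertFalse(r.contains(11));
        }

        public void testBackwardRangeThrows() {
            try {
                new Range(10, 1);
                fail("expected IllegalArgumentException");
            } catch (IllegalArgumentException expected) {
            }
        }
    }

    // The implementation, written afterward, only has to make the
    // tests pass.
    class Range {
        private final int lo, hi;
        Range(int lo, int hi) {
            if (lo > hi) throw new IllegalArgumentException("backward range");
            this.lo = lo;
            this.hi = hi;
        }
        boolean contains(int n) {
            return lo <= n && n <= hi;
        }
    }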

See Martin Fowler's book on "Refactoring" for great examples of how testing improves design: http://martinfowler.com/ http://www.refactoring.com/

§    Unit tests for GUI's

GUI's are indeed a special problem, but with a very satisfying solution.

Everyone has seen products that record mouse movements and button events. The resulting tests are very tedious to maintain. They often depend on specific pixel locations or a particular layout. At best they depend on the existence of specific components, without specifying a layout. If you ever replace a radio box with a pulldown, the tests will still break.

Instead, test the responses to GUI events, not the way the events are generated. To do so, you will almost be forced to improve the separation of your GUI code from your logic and underlying model. This is both a pro and con. Your code will improve, but you can't add unit tests to existing code until it does improve.

Put your non-trivial functionality behind an API that does as much as possible without involving GUI code. If the user pushes a button, you can call a method that is used by no other component but that button. The method may contain only a few lines of code, but that's okay. Someday, you may want to call it for a different event, or with a different kind of GUI. Call these high-level methods from your unit tests as if they had been called from the GUI, in response to a typical series of events. Confirm that the state changes as expected.
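Here is a sketch of that separation. Document and its methods are hypothetical; the point is that the test calls exactly the method the Save button's listener would call.

    import junit.framework.TestCase;

    public class DocumentTest extends TestCase {
        public void testSaveClearsModifiedFlag() {
            Document doc = new Document();
            doc.setText("hello");
            assertTrue(doc.isModified());
            doc.save();  // what the button's event handler calls
            assertFalse(doc.isModified());
        }
    }

    // A minimal hypothetical model, to make the sketch self-contained.
    class Document {
        private String text = "";
        private boolean modified = false;
        void setText(String t) { text = t; modified = true; }
        void save() { /* write text somewhere */ modified = false; }
        boolean isModified() { return modified; }
    }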

You won't be testing that your GUI calls the correct methods, but that is not where most of your bugs will hide. Remember, your unit tests are meant to increase the coarseness of your bugs. You're putting the remaining bugs at a high level where a user can understand and test them.

You might consider making your GUI scriptable. I worked on one product with a GUI in a separate process from servers that did all the important work. GUI events caused TCL commands to be sent through a socket to the server. We could record these commands and save them to a script. The script was easily executed as a part of the tests, and also easily modified by hand.

See this example for how it feels to develop a unit test for a GUI: http://www.xp123.com/xplor/xp0001/

§    Why should developers like tests?

Tests define the problem you are solving. Tests limit the scope of what your code is expected to do. If you haven't tested something, you aren't promising that it will work. Tests provide documentation that never grows stale. You have example code of exactly how your code can and should be used. Tests are a warning to other developers that this much functionality must be guaranteed and protected. Tests protect you against code rot and give you the courage to make further changes, knowing that what you have already done will stay done.

Recently I noticed that I was consistently spending more time writing my tests than my implementations. Many might think that this makes me less efficient than before. Yet I am spending much less time overall on implementation, especially if I count the reduction in time for stabilization and bug fixing.

§    How do you measure progress?

The first time your team adds unit tests systematically to the build, you will be unable to quantify how much it will help you meet your schedule. If you practice shipping your code in shorter iterations, you should be able to measure an improvement. When you agree on the features to be added over the next couple of weeks, measure the amount of time you spend stabilizing those features once the functionality is considered "code complete." This part of each cycle essentially detects bugs that were inadequately tested before. If stabilization does not shorten with each iteration, then tests are not yet helping. If you ever see bugs recurring after they were fixed once, then clearly unit tests are being neglected.

§    Peer pressure

Some of your developers will resist for a long time, and some will resist forever. Ordering them to write unit tests will just result in resentment and worthless tests. Peer pressure is much more effective. Eventually one of these guys will break someone else's test. It will no longer be possible for them to argue that the tests don't help. A bug was discovered almost immediately after it was introduced. Your enthusiastic coders should offer to show others how to write a test. They should be able to demonstrate how easy it is. If they help fix someone else's bug, they can leave behind a little test to ensure that the bug stays fixed. It will also serve as an example and a reminder.

Bill Harlan, 2004


Here is a presentation I gave in 2011 to encourage developers to write more tests: [ 2011tests.pdf ] .

