Code complexity

The complexity of code greatly affects the efficiency of a software project. It is surprisingly easy for code to become so complex that it can no longer support any significant enhancement, without first reducing that complexity.

Computer scientists have been measuring code complexity for decades. A widely used and intuitive measure is cyclomatic complexity, introduced in 1976.

Cyclomatic complexity simply measures the number of paths through code. In how many different orders can the commands in a program be executed, depending on the state of the system? If a program is a graph of possible operations, then complexity measures the number of different paths through that graph. Every conditional statement, every if, causes a new branch in the graph. In the worst case, one additional branch can double the total number of paths.

The study of complexity encouraged the movement toward "structured programming," to reduce the number of ways in which code substructures could interact. The goto statement was forbidden.

The idea that blocks of code could be understood and verified independently lead later to object-oriented programming and functional programming. These were more sophisticated ways of isolating blocks of code and minimizing the number of ways they could interact with the rest of the system.

In the early stages of a project, complexity is low, no matter how carelessly code may be thrown together. Functionality increases geometrically for a while, and schedules become optimistic. The more functionality you have, the more easily you can add new functionality, in some rough proportion.

It is not difficult to persuade ourselves, however, that a system has a maximum capacity for complexity. At some point, the number of different ways code can interact eventually becomes too complicated for the human brain to comprehend. With exponential growth, we can reach this point sooner rather than later.

The more paths you have through your code, the harder it is to test, even automatically. Some paths may be very difficult to traverse. You may have dead code and not even realize it.

If no effort is made to restrain code complexity, the velocity of a project will slow and eventually reach an inflection point. Accelerating progress will turn to deceleration. Functionality will continue to grow, but only asymptotically approaching a maximum capacity for the system.

I believe this curve of functionality versus time is the same as for any bounded exponential growth -- the logistic function, also known as the S-curve or sigmoid function.

At least, this limit on functionality and complexity applies to any single indivisible unit of code. When we decouple subsystems from each other, we reduce the amount of code that must be understood at once. If subsystems have well defined roles, then we can minimize the ways in which they interact with other subsystems.

If we use abstractions correctly, then we need only understand how the abstractions interact, not the specifics of each implementation. We again have code we can grasp at once, at both large and small scales.

One of the most insidious sources of complexity is the special case : an algorithm has a sensible general strategy, but certain situations require special behavior. Sometimes the special cases are unavoidable. For example, we cannot divide by zero. Sometimes special cases are added as an afterthought to fix oversights in the original design. Bug fixing is a tempting time to introduce special cases, since they seem to affect only local behavior; yet, they necessarily increase the total number of paths through the code. Every programmer encountering a special case must pause and think through the implications.

Another source of complexity is unnecessary generality.

For a project to survive the bounded growth of complexity, the programmers must eventually consider refactoring. Otherwise, the system will eventually lose its ability to hold new functionality and remain stable: [ Defense_of_refactoring.html ]

Bill Harlan, July 2009

Return to parent directory.