The complexity of code greatly affects the efficiency of a
software project. It is surprisingly easy for code to become
so complex that it can no longer support any significant
enhancement, without first reducing that complexity.
Computer scientists have been measuring code complexity for
decades. A widely used and intuitive measure is
cyclomatic complexity, introduced in 1976.
Cyclomatic complexity simply measures the number of paths
through code. In how many different orders can the commands in
a program be executed, depending on the state of the system?
If a program is a graph of possible operations, then complexity
measures the number of different paths through that graph.
Every conditional statement, every ``if'', causes a new branch
in the graph. In the worst case, one additional branch can
double the total number of paths.
The study of complexity encouraged the movement toward
"structured programming," to reduce the number of ways in which
code substructures could interact. The ``goto'' statement was
forbidden.
The idea that blocks of code could be understood and verified
independently lead later to object-oriented programming and
functional programming. These were more sophisticated ways of
isolating blocks of code and minimizing the number of ways they
could interact with the rest of the system.
In the early stages of a project, complexity is low, no matter
how carelessly code may be thrown together. Functionality
increases geometrically for a while, and schedules become
optimistic. The more functionality you have, the more easily
you can add new functionality, in some rough proportion.
It is not difficult to persuade ourselves, however, that a
system has a maximum capacity for complexity. At some point,
the number of different ways code can interact eventually
becomes too complicated for the human brain to comprehend.
With exponential growth, we can reach this point sooner rather
than later.
The more paths you have through your code, the harder it is to
test, even automatically. Some paths may be very difficult to
traverse. You may have dead code and not even realize it.
If no effort is made to restrain code complexity, the velocity
of a project will slow and eventually reach an inflection
point. Accelerating progress will turn to deceleration.
Functionality will continue to grow, but only asymptotically
approaching a maximum capacity for the system.
I believe this curve of functionality versus time is the same
as for any bounded exponential growth -- the logistic function, also known as
the S-curve or sigmoid function. (See a graph.)
At least, this limit on functionality and complexity applies to
any single indivisible unit of code. When we decouple
subsystems from each other, we reduce the amount of code that
must be understood at once. If subsystems have well defined
roles, then we can minimize the ways in which they interact
with other subsystems.
If we use abstractions correctly, then we need only understand
how the abstractions interact, not the specifics of each
implementation. We again have code we can grasp at once, at
both large and small scales.
One of the most insidious sources of complexity is the *
special case *: an algorithm has a sensible general
strategy, but certain situations require special behavior.
Sometimes the special cases are unavoidable. For example, we
cannot divide by zero. Sometimes special cases are added as an
afterthought to fix oversights in the original design. Bug
fixing is a tempting time to introduce special cases, since
they seem to affect only local behavior; yet, they necessarily
increase the total number of paths through the code. Every
programmer encountering a special case must pause and think
through the implications.
Another source of complexity is
unnecessary generality.
For a project to survive the bounded growth of complexity, the
programmers must eventually consider refactoring. Otherwise,
the system will eventually lose its ability to hold new
functionality and remain stable:
[[Defense_of_refactoring.html]]
Bill Harlan, July 2009