Defense of refactoring

§ Definition

Here is one definition of code refactoring taken from Wikipedia:

Code refactoring is the process of changing a computer program's internal structure without modifying its external functional behavior or existing functionality, in order to improve internal non-functional properties of the software, for example to improve code readability, to simplify code structure, to change code to adhere to a given programming paradigm, to improve maintainability, to improve performance, or to improve extensibility.

Martin Fowler's influential book "Refactoring" stresses that refactoring proceeds by small testable changes, even when the ultimate goal is a major rewrite. [ http://www.refactoring.com ]
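
As a minimal illustration (a hypothetical example, not taken from Fowler's book), the Python sketch below performs one small refactoring step, extracting a helper function, and pairs it with a check that the old and new versions return the same results. The names `invoice_total_before`, `invoice_total_after`, and `subtotal` are invented for this sketch.

```python
# Hypothetical example of one small, behavior-preserving refactoring step
# ("extract function"), verified by comparing old and new behavior.

def invoice_total_before(items, tax_rate):
    # Original version: summing and taxing tangled together in one loop.
    total = 0.0
    for _name, price, qty in items:
        total += price * qty
    total += total * tax_rate
    return round(total, 2)


def subtotal(items):
    # Extracted helper: one identifiable role that can be tested on its own.
    return sum(price * qty for _name, price, qty in items)


def invoice_total_after(items, tax_rate):
    # Refactored version: same external behavior, clearer structure.
    # The arithmetic is kept in the original order on purpose.
    s = subtotal(items)
    return round(s + s * tax_rate, 2)


if __name__ == "__main__":
    items = [("widget", 2.50, 4), ("gadget", 10.00, 1)]
    # The refactoring is only accepted if both versions agree.
    assert invoice_total_before(items, 0.07) == invoice_total_after(items, 0.07)
    print("behavior preserved")
```

Rewriting `s + s * tax_rate` as `s * (1 + tax_rate)` looks equivalent but can change low-order bits of a floating-point result, so the sketch keeps the operations in their original order; a small step is easier to check than a large one.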

§ Objections

It is not always easy to justify such an effort to a program manager, product manager, or product architect. Since they see no change in functionality from refactoring alone, they anticipate only a downside. They fear instability introduced by mistakes. They see effort diverted from urgent new features. In short, they do not trust their developers very much: in their view, the developers are just having fun with a "science project" or indulging some new programming fad, and need more adult supervision than this.

There is another source of instability and inefficiency that is less visible to outsiders: dangerous code. The system is hard to understand and easy to break. If it has insufficient unit tests, then no one may even be sure what it is supposed to do.

§ Crisis

A common crisis occurs when a monolithic application has evolved into a framework. Every object seems bound to every other. No one can write unit tests without testing the entire system at once. No one can foresee all the implications of any change. Frankly, no one is clever enough to program safely in such an environment. New functionality is likely to copy old functionality by cutting and pasting fragments of similar code from elsewhere. New state and metadata will be attached to whatever objects seem to be near at hand, muddying the dependencies even more. At some point, code complexity reaches a maximum sustainable capacity. There are so many possible states and paths through the code that any modification introduces more bugs than it removes. There are even metrics of complexity to predict when code has become unmaintainable. [ Code_complexity.html ] We cannot add more complexity to such code before some is removed.

§ Response

Refactoring a monolith usually aims to isolate identifiable subsystems and decouple unnecessary dependencies. If a unit of code, say one class or one package, has an identifiable role, then we can define a clear protocol for how that unit interacts with other objects. Services and clients can hide behind abstract interfaces, so we can imitate (mock) their behavior without actually creating them. We can now write unit tests for each role and make sure that any future modifications preserve the contract. We now have a unit of code that can be understood and documented by a single person in a reasonable amount of time. [ Unit_Tests.html ]
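
A minimal sketch of this idea in Python, using only the standard `abc` and `unittest.mock` modules: a client class depends on an abstract interface rather than a concrete service, so a unit test can substitute a mock and never construct the real service. The names `PriceService` and `Cart` are hypothetical and chosen only for illustration.

```python
from abc import ABC, abstractmethod
from unittest.mock import Mock


class PriceService(ABC):
    """Hypothetical abstract interface: the only contract clients may rely on."""

    @abstractmethod
    def price_of(self, sku: str) -> float:
        ...


class Cart:
    """Client unit: depends on the PriceService interface, not on an implementation."""

    def __init__(self, prices: PriceService):
        self._prices = prices
        self._skus: list = []

    def add(self, sku: str) -> None:
        self._skus.append(sku)

    def total(self) -> float:
        return sum(self._prices.price_of(sku) for sku in self._skus)


def test_cart_total():
    # The mock imitates the service; no real pricing system is created.
    prices = Mock(spec=PriceService)
    prices.price_of.side_effect = lambda sku: {"apple": 0.5, "pear": 0.75}[sku]
    cart = Cart(prices)
    for sku in ("apple", "pear", "apple"):
        cart.add(sku)
    assert cart.total() == 1.75
    # The test also checks that Cart used the interface as expected.
    prices.price_of.assert_any_call("pear")
```

A hand-written fake class implementing `PriceService` would work just as well; `Mock(spec=...)` is used here because it rejects attribute names that the interface does not define, which keeps the test honest about the contract.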

§ Risks

There is risk in this sort of refactoring because early steps can only be verified with larger-scale system tests. The behavior of the original unit of code may have been inconsistent, depending on the client, perhaps as a workaround for, or patch of, earlier bugs. Such band-aids have to be removed eventually. Latent bugs may reappear until they are approached more systematically.

The danger in a large project is that it is easier to add complexity than to remove it. Ten programmers, or groups, may simplify code only to see one new arrival cancel out their effort. An undisciplined coder can exploit the temporary improvement to hack in a new feature quickly. It is possible to keep complexity low enough for all groups to be productive, but only if all understand the cost of poor code. An organization that rewards only immediate visible progress will find itself with code that never ever stabilizes.

Bill Harlan, June 2009
