Tunnel vision. Every engineer suffers from it at one time or another. A programmer gets so lost in the details that they miss the forest for the trees. Instead of clear, concise, well-organized code getting written, little things fall through the cracks. Two functionally identical behaviors get implemented. Code creeps. Behaviors get jumbled together. All in the name of delivering the feature that was requested. It happens, and even stellar engineers may not be aware of it for months. At the time, the code made sense, and thus it was assumed to be good enough. The code is reviewed, merged into the mainline, and time passes.
Eventually, a new feature is requested and you, my lucky friend, are tasked with the implementation. You peruse the code, and your head spins. What does ‘variable’ do? Wait, didn’t I read this exact code segment above? Man, the feature I need to implement would be easy if I could leverage this portion that sits in the body of these two other functions. So, what do you do? These are all hints that the code could be refactored: rewritten so that its behavior stays identical while its maintainability and extensibility improve.
When Should I Refactor Code?
Poor clarity, code duplication, and limited extensibility are all facets of technical debt and point to a need for refactorization. However, they should not by themselves dictate that a refactor occurs. Instead, time is the deciding factor. As with any job, an engineer has many demands on their time, and reinventing the wheel may not be the best use of it. An engineer must consider whether the effort required to clear the technical debt is worth the cost in time, not only in present implementation time but also in the time that will be lost in the future if the code is left as it is. Did the ramp-up time for learning the code seem unreasonable? Were there things that took longer to understand than they would have if the code had been written more clearly? If so, these are good candidates for refactorization.
Code Refactorization Considerations
Determining whether code needs to be refactored is the easy part. Deconstructing and rebuilding the code in an effective manner is where the cost of a refactor is incurred. Refactoring can seem daunting. Unclear variable names, magic numbers, and poor use of global variables are issues that will need to be addressed, but put them on the back burner initially.
Before beginning the nitty-gritty of refactorization, take a moment to evaluate whether you have unit tests in place that cover the currently implemented functionality. Unit tests are a powerful development tool: they clearly define scope, keep an engineer designing towards the spec, and reduce the chance of missing functionality. Beyond their use in initial development, they give an engineer two important safety nets during refactorization. Firstly, they are a good way to refresh or learn what the code was intended to do. Secondly, and more pertinently, they provide test cases that ensure no functionality is lost in refactorization. If no unit tests exist, it is well worth the time to create them before diving into refactorization.
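As a minimal sketch of that idea, the test below pins down what a piece of existing code does today so a later rewrite can be checked against it. The function raw_to_percent and the 10-bit input range are hypothetical stand-ins, not taken from any particular code base.

```c
#include <assert.h>

/* Hypothetical legacy function slated for refactoring: converts a raw
 * 10-bit reading (0..1023) to a percentage, clamping out-of-range input. */
int raw_to_percent(int raw)
{
    if (raw < 0) {
        raw = 0;
    }
    if (raw > 1023) {
        raw = 1023;
    }
    return (raw * 100) / 1023;
}

/* Records the current behavior, including the edge cases, so that a
 * refactored version can be held to exactly the same expectations. */
void test_raw_to_percent(void)
{
    assert(raw_to_percent(0) == 0);
    assert(raw_to_percent(1023) == 100);
    assert(raw_to_percent(512) == 50);    /* 51200 / 1023 truncates to 50 */
    assert(raw_to_percent(-5) == 0);      /* below range clamps to 0 */
    assert(raw_to_percent(2000) == 100);  /* above range clamps to 100 */
}
```

Calling test_raw_to_percent() before and after each refactoring step gives a quick signal that behavior has not drifted. A real project would more likely use a test framework, but plain assert is enough to show the principle.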
When beginning a refactorization, as with writing new code, consider the design first. Does the functionality in a file or module make sense being there? Often, design choice is the most systemic issue contributing to the need for a refactor. If there is an opportunity to compartmentalize functionality into logical containers and to create accessor functions for the variables those containers own, it is usually in an engineer’s best interest to do so. This clarifies what specific variables do, with the added benefit of better defining their scope. If two counters are defined with similar names but for different purposes, moving each into its corresponding module eliminates the chance of confusing their usage. Furthermore, providing these variables with accessor functions makes clear which module is modifying them. Compartmentalization coupled with accessor functions can also eliminate global variables from modules that do not rely on them, which can reduce namespace pollution and the size of the initialized data read from the program file.
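One way this might look in C is sketched below. The module name uart_rx and the counter it owns are assumptions made up for illustration; in a real project the prototypes would live in a header and the definitions in the matching source file.

```c
#include <stdint.h>

/* uart_rx.c (hypothetical module): owns the received-byte counter.
 * 'static' limits the variable to this file, so no other module can
 * read or modify it except through the accessors below. */
static uint32_t rx_byte_count;

/* Clears the counter, for example at the start of a new session. */
void uart_rx_count_reset(void)
{
    rx_byte_count = 0;
}

/* Adds the number of bytes just received to the running total. */
void uart_rx_count_add(uint32_t bytes)
{
    rx_byte_count += bytes;
}

/* Returns the running total without exposing the variable itself. */
uint32_t uart_rx_count_get(void)
{
    return rx_byte_count;
}
```

A second counter with a similar purpose elsewhere in the program would get its own module and its own prefixed accessors, so the name at each call site says which counter is being touched.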
Compartmentalization can also help with stack considerations. A poorly designed program may have stack corruption issues that generate difficult-to-diagnose defects as the stack pointer goes off into the weeds. Compartmentalizing code into independent threads (OS or interrupt contexts) can localize such defects, making it easier to pinpoint the offending code.
Another design consideration is the principle of generalization. Code creep can occur because the same functionality is, unintentionally or for expediency, recreated multiple times throughout a project. For example, if iteration over an array of user names occurs in several places just to return the index of a given name, consider moving this behavior into a helper function. Besides reducing code size, since redundant code takes up code space, this has the added benefit of eliminating opportunities for bugs to propagate: instead of several copies of the logic to get wrong, only one exists. Of course, the trade-off is that stack space is required to make the function calls if the compiler does not inline them. As with any manual work, using the eyeball test to find and remove repetitive code can easily miss the less obvious offenders. To avoid this pitfall, consider using tools such as Simian or PMD’s Copy/Paste Detector.
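A sketch of such a helper for the user-name example might look like the following. The function name, its signature, and the choice of -1 as the "not found" value are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Returns the index of 'name' within 'names', or -1 if it is absent.
 * Every caller that used to loop over the array calls this instead. */
int find_user_index(const char *const names[], size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(names[i], name) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

If the comparison rule ever changes, say to ignore case, the change is made in this one function rather than hunted down in every copy of the loop.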
As for those unclear variable names, magic numbers, and excess global variables, there is a very real chance that applying the design principles of compartmentalization and generalization will correct many of them along the way, provided you keep the principle of clarity in mind as well. As you retool and rewrite the code, consider whether a term or number is ambiguous or ill-defined. Take our previous example, an array of user names: a variable named ‘i’ will not have the same clarity as one named ‘nameLocation’. While that example is basic enough that a variable ‘i’ in a for loop is easy to understand at a glance, the need for clarity grows quickly with complexity.
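The same thinking applies to magic numbers. The sketch below assumes a hypothetical transfer routine whose original condition read something like "if (t > 5000 || n >= 3)"; the constant names, their values, and the function are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Named constants replace the bare 5000 and 3 (hypothetical values). */
#define RESPONSE_TIMEOUT_MS 5000u  /* longest wait for a reply */
#define MAX_RETRY_COUNT     3u     /* attempts allowed before giving up */

/* Descriptive parameter names replace 't' and 'n', so the condition
 * reads as a statement of intent rather than a puzzle. */
bool should_abort_transfer(uint32_t elapsed_ms, uint32_t retry_count)
{
    return elapsed_ms > RESPONSE_TIMEOUT_MS || retry_count >= MAX_RETRY_COUNT;
}
```

The behavior is unchanged from the original condition, but a reader no longer has to guess what 5000 and 3 were meant to represent.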
As you’ve probably noticed, the principles for refactoring are the basics on which all solid coding is built. There is no magic bullet. Organization, design, and generalization are the guiding principles for both successful initial implementation and refactorization. Reinforcing these habits will allow an engineer to grow and to develop more extensible and maintainable code, which in turn helps other engineers understand a given code base more quickly and efficiently.