Update: See the good discussion on Hacker News and Reddit /r/programming.

I have been relatively busy lately unpacking all my things after my cross-country move, and some of my other hobbies have been getting more of my time. Don't worry, though: Cakelisp and GameLib development continues.

This month, I'm writing a short article about an approach I take to difficult programming tasks, which I call "surgical programming".

The key difference between junior and mid-level software engineers

In my career, I think one of the most important skills I had to develop was learning how to read code.

My less-experienced self would frequently reach for documentation, or read only function signatures. I would call functions other people had written without reading them myself to confirm they did what they claimed.

Now, I rarely read online documentation. The code is the ground truth, and additional context can be gleaned from the version control logs for the file. I try to read many more functions in their entirety before using them. When I have a question about some functionality, I try to read the code before asking the original developers for help.

I think reading code is a skill that you can practice, but it definitely takes discipline. If you respect the programmer who wrote the code, it gets a bit easier.

John Carmack recommends¹ stepping through code, starting from a major entry point, to understand what's going on:

An exercise that I try to do every once in a while is to "step a frame" in the game, starting at some major point like common->Frame(), game->Frame(), or renderer->EndFrame(), and step into every function to try and walk the complete code coverage. This usually gets rather depressing long before you get to the end of the frame. Awareness of all the code that is actually executing is important, and it is too easy to have very large blocks of code that you just always skip over while debugging, even though they have performance and stability implications.

This is also a good way to force yourself to read the code: the instruction pointer acts as a virtual bookmark, so you can go statement by statement rather than having to find the best place to start among the myriad files in a codebase.

Surgical programming

Some tasks require you to comprehend a large amount of code or a complex system before you can discover and implement the correct modification. Surgical programming is a way to approach these tasks systematically.

It relates to reading code because it essentially divides hard problems into two phases: a pre-op (reading) phase and an operation (writing) phase.

Pre-op

The pre-operation is the first phase in approaching a difficult task. The goal of pre-op is to understand the system and the task, answer any questions you have, and define a clear implementation sequence for the operation.

Importantly, the pre-op puts you on the hook: you don't get to write code until you've read enough to complete the operation plan.

When I'm embarking on a difficult task or hairy investigation, I make notes² under a "Pre-op" heading where I list everything I encounter while reading that is relevant to the operation.

I also think of things I don't know and add them as to-dos in the pre-op. I can't start writing code until I've read enough to have good answers to the to-dos. They can be questions like "how did they handle X?" or "what do I need to modify to get Y?". They can also be things like "what do designers mean by Z?", where I have to talk to concerned parties to gain more context and requirements.

It feels good to call it a pre-op because it sounds cooler, and it feels like you're still making progress and spending your time wisely. It could also be called the "research phase", but in my opinion that sounds much more boring.

When trying to understand complex systems, you may need to insert logging, visualizations, or other instrumentation to help illustrate the system's behavior. This is appropriate to do in pre-op, and will likely help with future investigation, so it should be kept in the code (perhaps behind boolean toggles or #if clauses, if necessary).
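As a minimal sketch of what that can look like in C++, the logging below is compiled out entirely when the macro is 0 and, when compiled in, only prints while a runtime toggle is set. The `Inventory` type, `AddItems` function, `ENABLE_INVENTORY_DEBUG` macro, and `g_logInventoryChanges` flag are all hypothetical names made up for illustration, not taken from any real codebase.

```cpp
#include <cstdio>

// Hypothetical compile-time switch: define as 0 to strip the
// instrumentation out of the build entirely.
#ifndef ENABLE_INVENTORY_DEBUG
#define ENABLE_INVENTORY_DEBUG 1
#endif

// Hypothetical runtime toggle: flip it from a debugger or a console
// command when you want to watch the system's behavior.
static bool g_logInventoryChanges = false;

// Hypothetical stand-in for the system being investigated.
struct Inventory
{
	int itemCount = 0;
};

void AddItems(Inventory& inventory, int count)
{
#if ENABLE_INVENTORY_DEBUG
	// Pre-op instrumentation, left in the code for future investigations.
	if (g_logInventoryChanges)
	{
		std::fprintf(stderr, "AddItems: itemCount %d -> %d\n", inventory.itemCount,
		             inventory.itemCount + count);
	}
#endif
	inventory.itemCount += count;
}
```

Leaving the runtime toggle off by default keeps normal runs quiet, while the `#if` lets you remove the cost from builds where you don't want it at all.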

The key with this technique is not starting "work" on the task until you are sure you know what to do. In my career I remember making false starts where I would write a bunch of code, only to find halfway through implementing it that the approach wouldn't work. In almost every case the problem was a lack of understanding of the existing code. The pre-op helps reduce the chance of false starts, because you deliberately seek out the blind spots in your understanding of a system and illuminate them.

Once all your questions are answered and you feel you have a good understanding of the situation, you can write out an operation plan. This is a step-by-step outline of the things you need to do in order to make the modification correctly. As you read code, it is useful to take note of the functions and whatnot that will be relevant to the operation. That way, you can jump straight to the definitions, signatures, etc. that need to be modified.

Operation

Once you have answered all the questions in the pre-op and have an operation plan, you can proceed with the operation.

It feels good to write the code now, because you can just blaze through it. You're no longer "feeling around" while simultaneously fighting compiler errors, which is what happens when you write code with only a half-baked understanding of the system you're changing.

The key during the operation is to notice when you still trip up. Could that have been handled in the pre-op instead? Did you start the operation before you were ready? If you find yourself consistently forgetting the same things, write them down and turn them into a pre-op checklist for next time.

The final part of the operation is validation. You should step through your code in a debugger the first time you run it, checking all of your assumptions and confirming the data is modified as you intend. This is a great way to cut down on iteration time, because you catch mistakes immediately instead of skipping straight to testing, getting your hopes up, and then hunting for the cause of a failure. Off-by-one errors, inverted conditionals, and error-handling mistakes are usually very obvious when stepping through code, but difficult to spot when only testing.

Exploratory programming

Some problems require several different implementation attempts. This is sometimes called exploratory programming. The surgical approach considers this style of development part of the pre-op, because it's about gaining more understanding before writing the operation code that eventually gets committed.

These types of problems don't fit as well into the surgical method, and that's okay. What matters most is recognizing when you are flailing due to a lack of understanding versus exploring in order to gain more insight. The goal of flailing is to complete the task³, whereas the goal of exploring is to learn new things about the system.

Debrief

Once you have completed a task, it can be beneficial to analyze it at the meta level:

How long it took to complete the task
What things during pre-op made understanding the system difficult (software architecture, etc.)
Why the task was required. If it was due to a bug, why did the bug occur? Could that type of bug be automatically or systematically prevented?
How you can improve future operations

Conclusion

Surgical programming provides a deliberate structure for approaching complex problems. It becomes automatic after you've done it for a while, but I hope that having explicit pre-op and operation phases will make hard problems easier to solve, especially for more junior programmers (or for anyone tackling very complex problems).

¹ John Carmack on Inlined Code (archive.org)
² I make all my notes in Org-mode, which is simply unmatched in its suitability for complex note-taking. In this case, the nesting and folding of headings helps manage the complexity. You can also insert direct links to relevant files and lines of code. I also copy-paste code snippets into my notes for easier reference.
³ In The Pragmatic Programmer, they call this Programming by Coincidence: if your flailing ends up working, it's only by chance, not because of a deliberate, systematic approach.