As software developers, we write code. Code has bugs in it. And when they’re found, we need to debug the code and fix the bugs. I tend to be the person in my projects that gets assigned the hard bugs, the ones that aren’t consistently reproducible, the ones that rely on complex interactions between components, because I’m pretty good at figuring such things out. I feel like my approach is a mindset that is not so common, a mindset drawing from the empirical sciences, focusing on systematic exploration and looking at things in multiple ways, so here are some things I do when debugging the difficult issues.
Explore the Negative Space
A bug is reported. The first task is always to reproduce it, to find under which conditions it happens. Sometimes, this is enough for a fix. Something that was supposed to happen doesn’t, the connection from the initial conditions to the missing behavior is clear, and the fix is to handle a state that had been overlooked.
Not all bugs are this simple. When there is no clear omission in state handling, no easily conceivable combination of states that could lead to the bug, it gets difficult to isolate the actual cause. What I do in such cases, as many of us would, is to form plausible hypotheses on the cause. What I don’t see many people doing is the next step: I try to refute those hypotheses. I look not only for cases where the bug happens but also for cases where, according to the hypothesis, it shouldn’t happen. By varying the conditions slightly I can get a very precise understanding of which specific conditions are needed for the bug to happen, and this is often enough to figure out the actual cause.
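As a rough illustration of trying to refute a hypothesis, here is a minimal Python sketch. Everything in it is hypothetical: has_bug stands in for actually running the reproduction steps, and the conditions (locale, cache, item count) are invented for the example. The point is only that the hypothesis is checked in both directions, where it predicts the bug and where it predicts no bug.

```python
from itertools import product


def has_bug(locale: str, cache_enabled: bool, item_count: int) -> bool:
    """Hypothetical stand-in for 'run the repro steps and observe the result'.

    In this toy setup the real cause is an empty list combined with an
    enabled cache, which the hypothesis below does not know about.
    """
    return cache_enabled and item_count == 0


def hypothesis(locale: str, cache_enabled: bool, item_count: int) -> bool:
    """Current guess: the bug happens whenever the list is empty."""
    return item_count == 0


def find_refutations(predict, observe, conditions):
    """Return every condition where prediction and observation disagree."""
    return [c for c in conditions if predict(*c) != observe(*c)]


# Vary each condition slightly, including values where the hypothesis
# says the bug should NOT appear.
conditions = list(product(["en", "fi"], [True, False], [0, 1, 5]))
refutations = find_refutations(hypothesis, has_bug, conditions)

for locale, cache_enabled, item_count in refutations:
    print(f"hypothesis refuted: locale={locale} "
          f"cache={cache_enabled} items={item_count}")
```

In this toy run the hypothesis fails exactly when the cache is disabled, which points at the cache as the missing ingredient. A search that only tried empty lists with the default (cached) setup would have confirmed the wrong hypothesis every time.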
The Wason rule discovery test is a psychological experiment where a rule needs to be discovered by asking yes-no questions, and Wason’s interpretation of the results nicely illustrates the difficulty of adopting the refutation mindset. Typically in this experiment, people form a hypothesis of what the rule is and then ask only questions whose answers they expect to confirm the hypothesis, never ones that could refute it. This was also the result I got when trying out the test with a group of programmers. It can be difficult to switch your thinking from hypothesis confirmation to refutation, but when it works, it can save a lot of time.
Know the Data Flow
You cannot fix a bug unless you understand the system well enough to know where it comes from. And (almost) none of us can keep a large, complicated program in our heads in its entirety, with enough recall to reason about its behavior from the code alone. So, to borrow another tool from the sciences, we model the system: we create a simplification of reality that can be comprehended.
As they say, all models are wrong but some models are useful. One model I have found to be particularly useful is to think of the program in terms of its data flow. Where does the data come from, which part processes which data, where does the data go? A surprisingly large part of the kinds of programs I work with can be reduced to transforming data from one form into another, and just knowing the transformations that happen to, say, data coming in from the backend for display on the screen is helpful in pinpointing why a specific wrong value is displayed.
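To make the data flow model concrete, here is a small Python sketch under invented assumptions: the three stage functions and the price example are hypothetical, and one stage contains a deliberate bug. Recording the value after every transformation shows at which step it first goes wrong.

```python
from typing import Any, Callable, List, Tuple

Stage = Callable[[Any], Any]


def parse_response(raw: dict) -> dict:
    """Backend payload -> internal record (price kept in cents)."""
    return {"price_cents": int(raw["price"])}


def to_display_units(record: dict) -> dict:
    """Cents -> euros. Deliberate bug for the illustration: divides by 10."""
    return {"price": record["price_cents"] / 10}


def format_for_screen(record: dict) -> str:
    """Internal record -> string shown to the user."""
    return f"{record['price']:.2f} EUR"


def trace(stages: List[Stage], data: Any) -> List[Tuple[str, Any]]:
    """Run the data through each stage, recording the value after every step."""
    snapshots = [("input", data)]
    for stage in stages:
        data = stage(data)
        snapshots.append((stage.__name__, data))
    return snapshots


snapshots = trace(
    [parse_response, to_display_units, format_for_screen],
    {"price": "1999"},
)
for name, value in snapshots:
    print(f"{name:>18}: {value!r}")
```

The screen shows 199.90 EUR instead of 19.99 EUR. Reading the snapshots from top to bottom, the value is still correct after parse_response and wrong after to_display_units, so that is the transformation to inspect.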
Of course, with some programs, this data flow model isn’t much simpler than the whole program… When the program data exists in shared global variables, modified unpredictably from any part of the program, there is no real “flow” of data, and this model breaks down. To be able to have a useful simplified model, the program needs architecture, modularity, separation of concerns, all that jazz.
Simplify the System
A scientist may have many models for the same phenomenon. One model could be a simple one that is not as accurate as it could be but is still useful for certain practical applications, while another could describe reality better but be too complex to be as useful in practice. For instance, learning genetics often begins with the simple model of eye color inheritance, but in reality it’s a bit more complex than that.
This can work for programs as well. A model of the complete program behavior sits at the complex end: it is the most accurate one, but also the hardest to work with. Sometimes, though, we don’t need the full complexity to reveal a bug and can get by with something simpler. In practice, this means removing pieces of code that don’t have anything to do with the bug, or more often, commenting them out. And sometimes, commenting out some code that shouldn’t have anything to do with the bug makes the bug go away, revealing how much complexity is needed to trigger the bug.
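The same idea can be sketched as a loop that switches off one piece at a time and checks whether the bug still reproduces. This is a minimal Python sketch with hypothetical component names, and bug_reproduces stands in for rebuilding and rerunning the program with only some parts enabled. It also assumes the bug reproduces deterministically, which the hard bugs often do not.

```python
from typing import Callable, FrozenSet, List


def bug_reproduces(enabled: FrozenSet[str]) -> bool:
    """Hypothetical stand-in for running the program with these parts enabled.

    In this toy setup the bug needs both animation and autosave.
    """
    return {"animation", "autosave"} <= enabled


def minimize(components: List[str],
             reproduces: Callable[[FrozenSet[str]], bool]) -> List[str]:
    """Drop components one at a time, keeping only those the bug needs."""
    needed = list(components)
    for component in components:
        trial = [c for c in needed if c != component]
        if reproduces(frozenset(trial)):
            # The bug survives without this component, so leave it out.
            needed = trial
    return needed


components = ["logging", "animation", "analytics", "autosave", "spellcheck"]
minimal = minimize(components, bug_reproduces)
print(minimal)
```

Here the loop ends with animation and autosave, the two pieces whose interaction is worth reading closely. If a component that "shouldn’t matter" survives the loop, that surprise is itself the finding.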
Programmers Think the Same Way
You or your team is not always the one at fault for a bug. Third-party libraries that you might want to use, even the platform libraries, can and will be buggy too. But while it may not be your fault, it is still your responsibility that your program is not buggy. An empirical scientist, in the face of nature behaving unexpectedly, has a huge number of possibilities of what could be happening, and careful experiment design is needed to isolate and test different ones.
As programmers, we can take a shortcut. A library we use was consciously designed and implemented by another programmer. And programmers are people, usually people with similar educational backgrounds and mindsets, which means that one programmer is quite likely to think in very similar ways to another programmer.
To me, this has been a useful strategy to figure out what kind of workaround might be appropriate. I try to imagine how I would personally implement the functionality of the library and how my specific situation could be some corner case that wasn’t completely thought through. After this, I see how I would work around my imagined implementation and change the program to match. This works quite well.
Make Complicated Mistakes
Brian Kernighan wrote: “Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?” While this is useful advice, it can also be seen as the way of stagnation. “Cleverness” is not a static quantity handed to us once in a fixed, never-changing amount; it is something we can develop. And the way to develop it is to push ourselves to the limits of our abilities, challenging ourselves constantly.
Sure, if I write very clever code, I may not understand enough to debug it. But that’s only now. If I keep pushing myself, improving my cleverness and my understanding, then maybe in 3 months or 6 months I will have improved enough to understand the situation and debug the issue, usually by figuring out a simpler way of accomplishing what I wanted in the first place. But the initial push for something clever, something at the limits of my abilities, is absolutely necessary to reach this state. So, go ahead and be clever, as long as you’re pushing yourself to increased cleverness.