In the Introduction I describe how the cognitive effects of stress - particularly as studied in neuroscience labs and military contexts - degrade the very faculties we need to be good at programming. I also propose that since the neurochemical effects of stress - particularly the release of dopamine and norepinephrine - are so similar to the effects of addictive drugs, we might expect whole groups to become addicted to maintaining a locally comfortable level of psychosocial stress. Because the members of the group don’t realize they are stuck in an addictive loop, and are experiencing both cognitive impairment and gratification of their addictive need, they have little chance of gaining insight. Instead they will tend to rationalize their stress-raising behaviours as inevitable or desirable, without being able to say why.

No part of this is terribly radical on its own, but taken together it certainly is. For example, it implies that to really understand what is going on in a workaday engineering situation, we have to look outside the context and think about the whole culture in an unusual way. Like the old puzzle about connecting nine dots with four connected, unbroken straight lines:



We have to “think outside the box” - but we have to really expand our contextual awareness and not just say we are doing so. The trouble is, we need access to the very faculties that stress takes away to be able to think juxtapositionally and tell the difference!

So in this post I’m going to look at a real programming task, and talk about how people often think about such tasks. I’ll look at the opportunities to do a good and interesting job that we miss all too often, how our perception of work can change to become a mockery of itself and how process can amplify this change. I’ll also discuss how this change distorts our appreciation of our tools, and touch (just a little) on the profound social implications outside engineering work.

The Task

I recently had to apply a bug fix across a large codebase (around 1.1 MLOC in the relevant part) implementing some complex business logic for several large telecoms customers. Some colleagues had successfully tracked down a problem which cropped up rarely, during an even rarer failover event in an underlying Sybase database. (A failover happens when one copy of a database becomes unavailable for any reason, so Sybase transparently switches to a hot backup.) Like so many others, the problem was rooted in the object-relational impedance mismatch. To summarize, the object and relational models don’t play well together. In this case we had some classes to handle the database operations and nicely encapsulate their state. The biggest lump of state was the result sets that some operations could return. Errors were handled by C++ exceptions, but when an exception was thrown, it could leave a result set waiting in the database connection encapsulated by the classes. It was those waiting result sets which blocked the failovers and prevented them from completing as expected. The solution was to make sure that every operation - and every catch - purged any result set that might be waiting. Once understood, the fix was easily stated. The problem came simply from the number of cases which had to be considered.

Fortunately the code was well-layered. Database operations were rigorously contained within a specific set of classes, clearly separated from higher level business logic. Even better, they were all wrapped in SQLAPI++, which provides a (somewhat) platform-independent collection of classes to wrap database operations. So every possible cause of problems could be found by looking for uses of the SQLAPI++ method SACommand::Execute(). As we first understood the fix, I had to make sure that each such call was embedded in a pattern like this:

try
{
   myCommand.Execute();

   if(myCommand.isResultSet())
   {
      while(myCommand.FetchNext())
      {
         // Do stuff with the rows
      }
   }
}
catch(SAException &e)
{
   // Purge any result set left waiting on the connection
   if(myCommand.isResultSet())
   {
      while(myCommand.FetchNext())
      {
         // Discard the row
      }
   }
}

Of course, it was important to get it right in every case. The edit would involve checking many cases, each with its own variations, spread over (I later determined) 133 different source files. It had to be done without introducing more errors, including source code corruption caused by simple finger trouble while doing such a large task!

Perceiving The Task

Just as I was getting interested in the challenge of this source manipulation problem, someone commented that it was boring. For a moment I was amazed - it really hadn’t occurred to me to see it as a boring problem rather than a difficult one! I realized that the atypical perception was mine, and thinking that observation through led to some interesting realizations.

Firstly, I’ve been doing this kind of thing for about 30 years now. I know that there’s no magic fix. No matter what technologies we use to help us work at convenient levels of abstraction and semi-automate the tasks, if we are going to own and maintain large and sophisticated programs, we will always come up against this kind of problem. Dealing with such problems effectively is an intellectual challenge for grownups.

Secondly, I’ve never been seduced by the rationalization of laziness that makes people speak of “mere code”. This is the idea that highly intelligent people are supposed to occupy themselves with grander things, and find “mere code” to be beneath them. I’ve always been suspicious of this. Code is the product. It doesn’t matter if your code undergoes mechanical translation, perhaps from some IDL or other to C, thence to assembly language and finally to opcodes. In that case the IDL is your product. Dismissing it as “mere code” and wandering off to do something else is like a cook who sneers at “mere food” and prefers to criticize the decor of the kitchen instead. It’s thoroughly misguided. Coping with large codebases requires skill, experience and a challenging mixture of rigour and the ability to hold temporary, partial hypotheses - while keeping track of which is which! It’s difficult, learned on the job, and cannot be performed in knee-jerk fashion after skimming “Codebases for Dummies” in a lunch break. The simple fact is that shops doing real, grownup work need at least a proportion of people who have mastered these skills.

Thirdly, I’m well aware of the danger of boredom, and recognize it as one of the things which must be managed in order to do a good job - not just accepted as an unavoidable misery of life, leading to certain and inevitable failure. Effective attention can be maintained in two regimes, with a danger area in the middle:



It’s easy to maintain careful attention on an interesting task, and it’s also easy to perform a strictly mechanical task (like lawn mowing or ironing), particularly with an mp3 player available. The problem is the tedious, repetitive tasks which require just enough attention that we cannot disengage, yet offer nothing to keep us engaged. So faced with a task which initially looks like it sits on the peak of the boredom curve, I know that the way to do it well is to decompose it into two tasks which sit at the ends. That means investigating and thinking about it until I can structure the remaining parts in a very mechanical way. Then I can pick some matching music and get a rhythm going with my fingers!

Performing The Task

It was pretty easy to find all instances of the string Execute in the source tree and examine them. I quickly found there were more instances of Execute than just calls to the method of class SACommand. On the other hand, my exhaustive examination of all the Executes quickly showed that all the ones that mattered declared an SACommand in the same file. (I admit I’m simplifying here to keep the posting on topic - I have a point to make.)

It was then very easy to constrain my scans to the relevant files, by using a little shell one-liner:

$ vi `find . -print | while read fname
> do
> if test -f "$fname"
> then
> if grep SACommand "$fname" | grep Execute > /dev/null
> then
> echo "$fname"
> fi
> fi
> done`

If you don’t know shell, that says:

Edit a list of files comprising:
   List all files below the current working directory. For each file:
      If it’s a regular file (not a directory or anything else):
         If the file contains SACommand and Execute:
            Include it.

The thing is, if we are willing to use little shell one-liners like that, we can very easily customize searches of any complexity we need, as we need them. That’s why many of us stick with the “old fashioned” command-line tools, even in this era of eye-candy-encrusted Integrated Development Environments. Those tools can do some very clever things with a click of the mouse (and tools like Eclipse are very clever), but they are limited to the functionality imagined by their designers, rather than what the user needs at any moment. The teaching that a Jedi needs only his light saber contains a deep truth, even though it has come in for some ridicule in recent times:



(The two twerps in that video sound like they come from near my own home town. It’s not Nottingham, but maybe Mansfield. English English accents really are that fine-grained.)

You probably see where I’m going with this: I reckon surrendering to the inevitability of boredom (and so making errors) is part of a pattern which includes becoming dependent on (and so constrained by) the functionality available in prerolled tools. It’s about lowered aspirations, increased passivity and reactiveness. These perceptual and behavioural shifts are in keeping with the effects of stress, as I discuss on the page Neuroscience.

While I was writing this posting, I had an idea and tried a small experiment. I used the same one-liner, on the same codebase, but on Apple’s Mac OS X instead of Sun’s Solaris. It gagged, complaining:

VIM - Vi IMproved 6.2 (2003 Jun 1, compiled Mar  1 2006 20:29:11)
Too many edit arguments: "-"
More info with: "vim -h"

I guess that’s why Sun can still sell stuff to grownups!

By the time I’d finished my exhaustive scan I’d divided all the use cases into the simple cases as above, plus a couple of variant patterns where it was appropriate to put the result-set-clearing snippet elsewhere in an inheritance hierarchy containing the Execute. I got my corrected snippet ready on the clipboard (I was using a Windows XP box running PuTTY as a terminal), and I thought I was ready…
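To give a flavour of those variant cases, the idea was to hoist the purge into one place in the hierarchy rather than repeat it at every call site. Here is a minimal sketch of the shape; the class and method names are hypothetical, not the real codebase’s:

#include <SQLAPI.h>   // SQLAPI++ main header: SACommand, SAException

// Hypothetical base class for a family of database operations.
class DbOperation
{
protected:
   SACommand myCommand;

   // Drain any result set left waiting on the connection.
   // Derived classes call this after Execute() and from their catch handlers.
   void PurgeResultSet()
   {
      if(myCommand.isResultSet())
      {
         while(myCommand.FetchNext())
         {
            // Discard the row
         }
      }
   }
};

In the real codebase the right home for such a helper depended on where the Execute sat in each hierarchy, which is exactly why those cases needed individual judgement rather than the clipboard.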

Gotcha!

I quickly discovered a problem. There were some formatting problems which had crept into the codebase over time. The indentation wasn’t very rigorous, and even more annoyingly, there was a mixture of spaces and tabs used to create it. This didn’t matter when editing a few lines, but the unpredictable editor behaviour when scaling to many lines pushed the job back into the error-prone part of the boredom curve. I had to delay my edit and prepare the source files first.

That was a two-stage process. First I fed all the relevant files through bcpp, the classic C++ beautifier. (Another little shell one-liner accomplished that.) Then I went through all the files to neaten them up. The beautifier is good, but it can’t make aesthetic judgements about stuff that should be lined up to improve readability - like this:

buffer << "Name: "    << name
       << " Quest: "  << quest
       << " Colour: " << colour;

Once those problems were out of the way the noise level was lowered, and I was able to crystallize my awareness of another problem that my subconscious had been screeching about without my being able to name it.

If the Execute() / isResultSet() / FetchNext() sequence threw an SAException, I still had to purge the result set. But doing that involved further calls to isResultSet() and FetchNext(), which could themselves throw SAExceptions - and probably would if any part of the database connection was confused! It was important to wrap the purge within the catch{} in its own try{} block and catch these SAExceptions too, or the uncaught exception could cause the server to crash ungracefully!
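Spelled out, the revised pattern looked roughly like this - the earlier sketch again, but with the purge inside the catch wrapped in its own try block:

try
{
   myCommand.Execute();

   if(myCommand.isResultSet())
   {
      while(myCommand.FetchNext())
      {
         // Do stuff with the rows
      }
   }
}
catch(SAException &e)
{
   // The purge itself can throw if the connection is confused,
   // so it gets its own try/catch to stop a second SAException escaping.
   try
   {
      if(myCommand.isResultSet())
      {
         while(myCommand.FetchNext())
         {
            // Discard the row
         }
      }
   }
   catch(SAException &)
   {
      // Nothing more can usefully be done with this connection here
   }
}

The inner catch deliberately swallows the second exception: by that point the original SAException already describes the failure, and the only remaining job is to leave the connection clean so the failover can proceed.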

This demonstrates the importance of reducing the noise level to the point where we can hear the voice of our own experience - and why Paul Graham says most commercial programmers never reach the state of awareness where they can do good work.

So far the task had taken about 5 working days - and I hadn’t made a single keystroke that actually implemented the edit I needed to perform. But then I did the edit, as an uninterrupted mechanical process which took less than a day - and I got it right.

The Code Review

How do I know I got it right? After all, the problem is usually not so much being correct, as it is being able to prove we are correct. In this case the biggest potential source of error was the manual straightening up process after the automated beautification. There were plenty of opportunities for mis-edits - unintended modifications - in there. Usually the CVS diff command will find the edits, but in this case the beautification made the diff too big to handle.

Fortunately I’d pulled a second copy of the source tree before I started. (CVS diff compares against the version the local copies may have diverged from, but pulling a separate copy of the right version of everything after the fact would have been tricky.) The CVS diff could at least give me a list of the files I’d touched, if I used grep to find all the lines that started with the string RCS, and edited them to produce a simple list of files.

I reckoned it would be much easier if I could consolidate all the files into a single pair of directories - before and after. That would be possible if the file names were all different (ignoring the directory names). I could check that easily by using each line of the file list as an argument of basename and doing a unique sort of the list:

$ cat CvsDiffFileList.txt | while read fname
> do
> basename "$fname"
> done > BaseNames.txt
$ wc -l BaseNames.txt
     133 BaseNames.txt
$ sort -u BaseNames.txt > UniqueNames.txt
$ wc -l UniqueNames.txt
     132 UniqueNames.txt

Oh dear. There was a name collision. Let’s find out about it:

$ diff BaseNames.txt UniqueNames.txt
< main.cpp
$ grep main CvsDiffFileList.txt
deep/in/the/tree/test/main.cpp
some/other/dir/test/main.cpp

OK. The name collision was just a pair of mains for running test stubs. I didn’t care about them, so I deleted them from the file list, and knew I could copy the before and after versions of all the touched files into a pair of directories. So I did that with another little one-liner, taking the before versions from the second, untouched copy of the tree and the after versions from my edited working copy:

$ mkdir Before
$ mkdir After
$ cat CvsDiffFileList.txt | while read fname
> do
> cp pristine/src/"$fname" Before   # the untouched copy pulled at the start (directory name illustrative)
> cp src/"$fname" After             # the edited working tree
> done

Then I removed all the tabs and spaces from further copies of both sets - StrippedBefore and StrippedAfter. That removed any differences caused by changing tabs to spaces, by the indentation, or by if(x) versus if ( x ) style, so the remaining differences would be my substantive edits or finger trouble:

$ cd StrippedBefore
$ for fname in *
> do
> cat "$fname" | tr -d " " | tr -d "\t" > fred
> mv fred "$fname"
> done

The result was 15,000 lines of diff - much better than the 100,000 I started with, and mainly consisting of blank lines and easily recognizable patterns of intentional edits. I then looked through the diff to make sure it was plausible, checking for unmatched business-logic-type lines, before asking two colleagues to do likewise. A much simpler process also enabled us to review the intentional changes by looking for the Executes, as I had done originally.

Process Degradation

There’s a pattern to what I did, and we’ve all learned about it ad nauseam. It’s W. Edwards Deming’s Plan-Do-Check-Act Cycle. However, there’s a major difference between what I just described and the way “process” is normally represented. The difference is ownership.

I believe in the Deming Cycle. It works. Through reflection we become aware of what we are trying to achieve. We ask how we can do this as reliably and efficiently as possible. We form a plan, and when we do what the plan indicates we ask if it’s working or not, and correct as necessary. We find a way to check our work as an integral part of the cycle (like I pulled that second copy before I started), and when we’ve finished we act on what we’ve learned. In this case I realized the whole process formed a wonderful model of something deeper that I wanted to discuss.

Unfortunately the Deming Cycle is rarely applied as I did in the tale above. There, I was responsible for each step, I corrected my process as necessary, and I even discovered that I had to recurse, performing a sub-process within my outer Do stage.

Imagine how such a process would degrade under the cognitive distortions due to stress, described on the page Implications for Software Engineers. People under stress:

  • Screen out peripheral stimuli.
  • Make decisions based on heuristics.
  • Suffer from performance rigidity or narrow thinking.
  • Lose their ability to analyze complicated situations and manipulate information.

I was analyzing (reasonably) complicated situations, and using my shell like a kind of mental exoskeleton to manipulate information all the time! If I couldn’t do that, how could I know what to do? I’d have to become a robot myself, and execute a program that I’d got from somewhere else. If that’s not making decisions based on heuristics I don’t know what is! Of course, the motions I’d be going through wouldn’t be very effective, but I wouldn’t be able to compare my performance with a gold standard that I might imagine for myself based on the question, “How do I know?”, because of my narrow thinking. I wouldn’t even notice my mistakes, because I’d be screening out peripheral stimuli.

I’d be lost. Instead of being able to rely on my own capacity to operate a Deming Cycle and move forward with confidence, I’d be “piggy in the middle” - responsible for my failures, yet ill-equipped to avoid them. It would all be very stressful. I would obviously be keen to embrace a culture which says that results don’t matter, only compliance. Then when the failure occurred, I could pass the buck. Of course, I would always have to be on the lookout for others trying to pass their bucks to me, and I would have to be clever in my evasiveness and avoidance of responsibility.

The processes I could use would not be ones I cooked up myself, always following the meta-process of the Deming Cycle. Instead they would be behavioural prescriptions collected into shelfware, and micropoliced in a petty and point-missing stressful fashion, as my non-compliances were smelled out. All the stress would lock me into a cognitively reduced state, and if I got hooked on the stress chemistry, the raised dopamine and norepinephrine, the whole business would start to develop a self-evident rightness. If I suffered this decline in the company of others, we’d develop language and a local mythology to celebrate our dysfunctional state!

All this, of course, is exactly what we see as Quality programs degrade into reactive pettiness and individuals lose their problem-solving abilities. Deming’s big thing was devolving process improvement to the people nearest to the work, but in the degraded version process is handed down from the organization. We have an entire belief system predicated on the notion that:

The quality of a system is highly influenced by the quality of the process used to acquire, develop, and maintain it.

Rather than the skills, imagination and dedication of the people applying it!

So we must always be very careful of process. When applied with knowledge and understanding, it brings the growth of awareness, efficiency and personal fulfillment that some practitioners of the Deming method report. But with a supervening cultural tendency to stress addiction always in play, it can easily degrade into the exact opposite of what Deming intended: a spiral of decline where individuals become progressively less capable and empowered, while the process becomes more externalized and disconnected from the work at hand.

Ontological Drift

After a while, groups suffering from a spiral of psychosocial stress start to behave in very peculiar ways. Unable to trust their own good sense even to select a useful script, yet unable to proceed without one, they resort to grabbing any old script and following it, no matter how silly the result. The deeper they get into trouble, the worse their problems become.

This way of looking at things provides a simple explanation for some very odd things. Consider the hilarious BBC story of the man who was told to remove his shirt at Heathrow because it bore a cartoon image of a Transformer which has a stylized gun for an arm - Gun T-shirt ‘was a security risk’:


A man wearing a T-shirt depicting a cartoon character holding a gun was stopped from boarding a flight by the security at Heathrow’s Terminal 5.

Brad Jayakody, from Bayswater, central London, said he was “stumped” at the objection to his Transformers T-shirt.

Mr Jayakody said he had to change before boarding as security officers objected to the gun, held by the cartoon character.

Airport operator BAA said it was investigating the incident.

Mr Jayakody said the incident happened a few weeks ago, when he was challenged by an official during a pre-flight security check.

“He says, ‘we won’t be able to let you through because your T-shirt has got a gun on it’,” Mr Jayakody said.

“I was like, ‘What are you talking about?’.

“[The official’s] supervisor comes over and goes ’sorry we can’t let you through and you’ve a gun on your T-shirt’,” he said.

Mr Jayakody said he had to strip and change his T-shirt there before he was allowed to board his flight.

“I was just looking for someone with a bit of common sense,” he said.

That’s stress-addiction-induced ontological drift for you. Of course, it isn’t so funny when whole cultures go that way. “Mr. K, the committee will see you now…”