This is an update to a previous post reporting some very interesting results from the Embedded Figures Test, now that nearly 500 results are in. A specific signature has emerged, which (if it remains under more controlled conditions) would enable a clear comparison with existing tests of cognitive flexibility.

To recap: The Neuroscience page describes lab tests which show how cognitive flexibility is adversely affected by even slight stress, and the Implications for Software Engineers page discusses the need for exactly this kind of flexibility if we are to juxtapose multiple considerations in our minds and be good at programming - or any other kind of creative work.

The problems with most tests of cognitive flexibility are that they are costly to administer and not repeatable per subject. Once a subject has seen a given test it won't be a puzzle again, so we can't apply it once, alter the conditions and try it again. To apply this stuff in industrial contexts, some way to objectively identify high-value initiatives quickly and efficiently would be really good.

The EFT resembles some known tests of cognitive flexibility, but unlike them it is repeatable per individual and can be automated. So if it does vary with stress then it could be very useful. Of course, the experiment on this blog is only indicative - because the self-assessment of stress is quite simplistic and the test conditions are not closely controlled, it can't produce definitive answers. For one thing, if the mouse and screen that people are using are very different, that could easily distort the results. But if it looks promising, it might give professional psychologists an incentive to look at it in more controlled conditions. Hopefully the large number of results now collected will average out things like the responsiveness of mice.

Within these limits, the results so far look quite promising - and a specific signature has emerged. As before, you can download the raw data eft11oct2009.txt and the analysis program: EftAnalyzer.py. On my Mac at least, I simply downloaded and installed wxPython, and the program ran from the command line. Then use File/Open… to select the data file. The graphs here are screenshots from Mac OS X. You can also grab the data, program, and copies of all 24 graphs I'm currently drawing in one download: eft11oct2009-all.tar.gz.

There are 499 contributions in the database. 438 of those pass validation as sensible. Comparing with the numbers in the previous post it's striking how the larger sample has scaled in proportion. It's still a sample strongly biased towards male geeks - 391 males and 47 females, 270 geekly occupations. While there's no evidence that people who feel nauseous when bored have EFT results different from those who don't, I'm amazed that 56% of respondents experience the nausea effect! This may be something worth looking into further. (I asked about it because nausea is a side effect reported by people taking dopamine-raising drugs for Parkinson's disease, dopamine is raised by stress, and I suspect that groups become habituated to raising dopamine by subjecting themselves to miserable, boring meetings.) Here are the numbers:



Most respondents scored between 2000 and 3999 milliseconds per figure, but there is a “long tail”:
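The banding behind this histogram can be sketched in a few lines of Python. This is a minimal sketch assuming fixed-width bands read off the graph (2000 ms wide); it is not taken from EftAnalyzer.py:

```python
from collections import Counter

def band_counts(times_ms, width=2000):
    """Count respondents in fixed-width bands of milliseconds per figure.

    With the default width, a time of 2500 ms falls in the 2000-3999 band,
    keyed here by the band's lower edge.
    """
    return Counter((t // width) * width for t in times_ms)
```

For example, `band_counts([1500, 2500, 3999, 4000])` puts two respondents in the 2000 band and one each in the 0 and 4000 bands.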



There's also a good spread of self-assessed occupational stress, slightly biased towards the unstressed side of the survey questions. (The positive or negative wording of the questions is mixed to discourage people from just clicking down a vertical line. A strong unstressed response gets 2 "chill points", and a strong stressed response gets -2 "chill points". Weaker responses get 1 or -1 "chill points".):
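The chill-point scoring just described can be sketched as a small Python function. The argument names are mine for illustration; EftAnalyzer.py may encode responses differently:

```python
def chill_points(strength, unstressed):
    """Score one survey response.

    strength: "strong" or "weak".
    unstressed: True if the answer (after un-reversing any negatively
    worded question) indicates an unstressed state.

    Strong responses score +/-2, weak responses +/-1; unstressed
    answers are positive, stressed answers negative.
    """
    points = 2 if strength == "strong" else 1
    return points if unstressed else -points
```

A respondent's overall score is then just the sum of `chill_points` over all their answers.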



It would be nice to have a wider spread of respondent ages, but there are many in their 20s, 30s and 40s, over 20 in their 50s and a few in their 60s, so the data does represent a good spread:



An important graph in this analysis plots the average age of people in each band of self-reported stress. There’s no correlation at all, which is important because later we’ll see correlations between age and results, and stress and results. If (say) older people were reporting more stress, then we couldn’t talk about both effects - either age or stress might be the important factor:



The relationship between age and score on the test is very interesting. Overall, there is a small correlation with time taken to see the figures tending to increase as the respondents’ age increases:



The surprise comes when we look in more detail at the best, central and worst scores within each age band. The best performing third take the same time whatever their age. If anything, the best performers actually get a bit better at it as they age (although there are few data points in the older groups, and we might imagine that senior citizens who are reading blogs and doing cognitive tests are the kind of people who keep themselves alert):



Age makes no difference at all to the performance of the centrally performing group in each age range (there are fewer data points because the first remainder after dividing each age band into three is assigned to the “best”, the second remainder is assigned to the “worst”, and so the “central” group ends up the smallest):



Now compare with the worst performing third. The worst performers show a clear increase in time taken to see the figures as they age, which is not found in the best and central performers. This is part of what I'm calling the "signature" - if existing tests of cognitive flexibility show a similar drop in performance with age amongst the worst performing third only, then that's a good argument that the simple test tracks the more complex and less repeatable ones:



A similar effect occurs in the variation of response time with self-reported stress - and this is the meat of the experiment. An important difference is that with age it is the worst performers who get much worse, while with stress it is the best performers who perform less well:



The central third spread out at around the same point (-5 chill points, or mild stress), but they also spread out if they report a very unstressful environment (10 chill points or more):



Meanwhile the worst performing third in each stress band is all over the place. Perhaps some people are just really bad at this, and the amount of stress they are under doesn’t make any difference at all:



So this is the other half of the "signature". The test shows a reduction in performance amongst the best performing third under stress. Expressed predictively, if we were to administer the test to a work group and look at the numbers associated with the best performing third, we could predict that they are under occupational stress if they were mostly over 2.5 seconds.

There are a couple of other interesting graphs. A few respondents reported scores before and after stress reducing exercises, and indeed the scores do seem to be improved - except for the person who did 8 stress reduction exercises, which perhaps is quite stressful in itself!
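The predictive rule of thumb above - isolate the best performing third and flag stress if most of them are over 2.5 seconds - can be sketched as a small predicate. The function name and threshold parameter are mine; the 2500 ms figure is the rule of thumb from the text, not a calibrated cutoff:

```python
def looks_stressed(times_ms, threshold_ms=2500):
    """Crude group-level stress predictor from EFT times.

    Takes all respondents' average times per figure (in milliseconds),
    isolates the fastest (best performing) third, and reports stress if
    most of that group is slower than the threshold.
    """
    times_ms = sorted(times_ms)
    best_third = times_ms[:max(1, len(times_ms) // 3)]
    slow = sum(1 for t in best_third if t > threshold_ms)
    return slow > len(best_third) / 2
```

So a group whose best third clusters around 2.6-2.7 seconds would be flagged, while one whose best third sits nearer 2 seconds would not.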



There was a surprise (at least to me) in the results for people who reported use of psychotropic drugs. Nothing makes much difference except the peak for alcohol users, which is shifted to the right compared with everyone else and with the overall frequency distribution above. There's also a bit of a rightward shift for marijuana users, but it isn't as pronounced. Can it really be that alcohol use has a general effect on a person's cognitive performance, irrespective of whether they could pass a breathalyzer test at that precise moment?



So the next thing to do is see if I can interest professional psychologists in the EFT as a repeatable and low cost alternative to existing tests of cognitive flexibility. If the EFT signature (the worst performing third getting worse with age, and the best performing third getting worse with stress) is also found in existing tests, then it might be quite easy to move this kind of work into an industrial, field context in a quantifiable way, and then start looking at things like fault report counts and schedule reliability as EFT results change.