I’m looking at Haskell as a production language, because for the foreseeable future we can expect processors to get more cores rather than more GHz, and this means we have to be able to parallelize our programs much more easily than we’ve done up to now. It was the sight of a Teraflops the size and shape of a pizza that set me off, but since I’ve been playing, Intel has announced a 6-core Xeon. Great! I’ll be able to have 5 idle cores instead of 1!

I don’t see the practicalities of production programming as either equivalent or orthogonal to the computer science virtues of a language - instead they are “the next question”. I don’t even believe the production practicalities are necessarily inherent to the properties of a language, although obviously designers like Thompson, Ritchie and Stroustrup have achieved great things by being production programmers themselves, and bearing the practicalities in mind. I do think we tend to forget how much our habits and conventional practices bring order to chaos, at least until we move to a new language and have to rebuild that cultural strength-in-depth. That’s why I decided the thing to do is dive in and see what happens.

I’m a big fan of wxWidgets, a cross-platform GUI toolkit which is a pleasure to program, produces native look and feel on Windows, Linux and Mac OS X, and has a liberal licence permitting use in commercial code while being completely free itself. So I was delighted to discover wxHaskell, which wraps wxWidgets for Haskell. This gave me the possibility of translating the EftAnalyzer.cpp program (for analyzing the EFT results) into Haskell. You can see the output of the program in the posts More EFT Results and This Is Your EFT On Drugs. The original EftAnalyzer.cpp isn’t anything special - it was really just a lash-up for doing one job, but it’s complex enough to make a non-trivial comparison.

This post records my first thoughts, and offers a C++ to Haskell translation which (hopefully) will be of practical help to others who want to get into Haskell.

Code and Data

The C++ and Haskell code, plus some test data are available. In case I enhance EftAnalyzer.cpp in the future, I copied it to EftExperiment.cpp before I started. The equivalent Haskell program is EftExperiment.hs. A suitable data file, exactly as I downloaded it from the WordPress MySql database behind this blog, is eft27jan2008.txt.

The Newbie’s Lament

As I described in post Haskell Needs A Four Calendar Cafe, there is a barrier to entering the Haskell world caused by a lack of practical examples of simple, working programs. By working programs I mean ones which do some basic error handling and demonstrate the basic practical programming steps. I spent several days trying to open and read a file, and also had grave difficulty seeing how to step through an input file which described a single record using multiple text lines, before Cale Gibbard on the haskell-cafe mailing list gave me a wonderfully clear and concise couple of hints. Here are links to Cale’s postings describing how to open a file with error checking and merge records. I used both techniques described by Cale.
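
To give a flavour of the kind of example I was missing, here is a minimal sketch in the spirit of Cale’s file-opening hint - the names are my own invention, not his code, and not what EftExperiment.hs actually uses:

import Control.Exception (IOException, try)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

-- Read a whole file, or report on stderr and stop if it can't be opened.
readInputFile :: FilePath -> IO String
readInputFile path = do
    result <- try (readFile path) :: IO (Either IOException String)
    case result of
        Left err -> do
            hPutStrLn stderr ("Cannot read " ++ path ++ ": " ++ show err)
            exitFailure
        Right text -> return text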

Where does this barrier come from? Perhaps in part it’s because Haskell has been a small community of functional enthusiasts until recently, with newbies arriving gradually and perhaps having physical proximity to old hands so that the basics don’t tend to get spelled out. And the problem does seem to involve basics rather than more sophisticated stuff. Once I was over those initial hurdles I didn’t need any more help. Google and grep were all I needed.

There also seems to be a noise problem. This may have been caused by people who haven’t encountered the functional paradigm before, trying to get their heads round it by writing not-very-helpful on-line tutorials, which unfortunately confuse more than they illuminate. That’s why I haven’t written one myself, but instead simply link to a commented Haskell program and a C++ equivalent, with the same names for the same things where possible! For example, it is certainly true that in functional programming recursion takes the place of iteration in imperative languages, and we can find many examples of this in the dreaded tutorials. But as Cale explained,

Using explicit recursion is generally a means of last resort.
Higher order functions on data structures are our control structures.

Where in C++ we might say:

for(int x = 0; x < 10; x++) func(x);

in Haskell we would say:

map func (take 10 [0..])
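
The same goes for filters and folds. For instance (an illustrative fragment, nothing to do with EftExperiment.hs), summing the squares of the even numbers below 100 needs no explicit loop at all:

-- sum, filter and map are the control structure; there is no explicit loop.
sumEvenSquares :: Int
sumEvenSquares = sum (map (^ 2) (filter even [0..99]))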

Because we don’t see that kind of thing, we can’t play “monkey see, monkey do” until we start to get it. Now there are an awful lot of teams out there that have a mix of some theory, some practice and some enthusiasm, but not in the depth we might once have found - more like the crew of the Black Pearl after a summer of JavaSchool than the Knights of the Lambda Calculus.

I don’t think I’m the only person who has noticed this barrier to entry. If Haskell is going to triumph, the barrier needs to be lowered. Perhaps the forthcoming Real World Haskell book will help. Ohh arrgh me hearties.

Culture, Community and Scholarship

On the other hand, there doesn’t seem to be anything deliberate about the barrier. On the contrary, the Haskell community seems to be particularly friendly, helpful and knowledgeable. There are a lot of things about Haskell which are interesting and exciting, and the mood of the culture is one of them. It reminds me of the Unix User Groups of the early 1980s. UNIX culture was - and is - hugely important: a bottom-up community with strength in depth (despite the Death Star vs. Berzerkely Great Schism, ps -ef vs. ps -aux) that corporately sponsored cultures have never been able to duplicate. I’m thinking of JavaLand as much as anything .NET. See Eric S. Raymond’s The Art of UNIX Programming for a celebration.

There’s certainly something there for newbies to be proud of joining.

Indentation

I’m still having problems with the active whitespace, indentation thing. Again, I’m not the only person to have this problem, as this message indicates. I was stunned to discover that one compiler error was coming from a comment that wasn’t indented correctly with respect to the function it was within! This may be a part of the barrier to entry, an effect of most newbies to date having good access to old hands, and picking up an oral tradition that isn’t recorded.

An interesting comparison occurred to me as I was adding and removing whitespace, often at random in my desperation. If someone took a file of Java or C++ laid out in rigorous K&R or Allman brace style, removed the braces and gave it to me, I could probably put the braces back. So why is Haskell indentation so darned hard? The existence of an oral tradition can explain why I can’t find a simple description in words of what GHC actually does, but why does it do complicated things in the first place? Certainly “maintain the same level of indentation” is not a complete theory.
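
For what it’s worth, the rough rule - not the full algorithm, which really does have fiddly corner cases - seems to be: the first token after where, let, do or of fixes a column; later lines starting in that column begin a new item, lines indented further continue the previous item, and lines indented less end the block. An invented example:

-- 'spread' fixes the column for the rest of the where block.
plotHeight :: Double -> Double -> Double
plotHeight top bottom = spread + margin
  where
    spread = top - bottom     -- starts the block and sets the column
    margin = 0.1 * spread     -- same column: a second binding in the block
    -- indent 'margin' further and it becomes a continuation of the 'spread'
    -- line; indent it less and the where block has ended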

Program Length and Width

Functional languages including Haskell are famously more terse than imperative languages, and indeed in my statistical processing this is true. Yet the C++ version is 1551 lines and the Haskell version 1296 lines. This is not because the Haskell version is more heavily commented: with the comments removed from both, the C++ version was 1152 lines and the Haskell 956. The reason for the similarity is that I’ve got a lot of marching legions of code - phalanxes of lines that do the same thing for many text controls, strings to match, windows and so on. These have to be mentioned the same number of times in both programs, so the programs end up a similar length. Marching legions are common in production commercial code, so we can expect total line counts to be similar in such use cases, even if xmonad implements an X window manager in 500 lines of code.

That said, the syntax of Haskell makes the marching legions more parade ground than is possible in C++, with less boilerplate in the way, which means that the complexity is still significantly reduced, and we are more likely to be able to spot the bugs from 6 feet away. Pattern matching arguments are a huge bonus, eliminating much horrid nesting with tricksey curly braces scattered everywhere. I really like this and I’m sure it will be of enormous benefit in production shops.
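
As an illustration (hypothetical code, not lifted from EftExperiment.hs), what would be a ladder of ifs and braces in C++ becomes a flat set of equations:

-- Each shape of input gets its own equation; no nesting required.
describeScores :: [Double] -> String
describeScores []     = "no scores"
describeScores [x]    = "one score: " ++ show x
describeScores (x:xs) = show (1 + length xs) ++ " scores, first " ++ show x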

I’ve finally been obliged to abandon my decades-long love of the 80-character line. As displays have gone graphical and gained in resolution and physical width, and as printers have disappeared from my working life, I’ve always stuck to the 80-character rule. Perhaps it’s because of my failure to understand Haskell indentation, but I find my code flying off to the right all too often. I increased my limit to 100 characters but it still looks like a broadside from the aforesaid Black Pearl, flying rightwards and then dropping. This doesn’t impact readability, because layout does convey useful information when parsing by eyeball, but it still feels sinful. Maybe I’m just being silly about this. GNU a2ps(1) has a style sheet for Haskell, so I can always pretty-print it in landscape if I must.

Functional Paradigm

While I had an initial problem with idioms, shifting to the functional paradigm was not a problem for me. I’ve played with Scheme before, but not for many years, and I enjoy using m4(1) for generating documentation, which also might have helped a little but is hardly an exhaustive functional primer. In particular I was struck by how easy it was to convert the inheritance based C++ program to function calls. Unless a person has never used anything except an OO/imperative language I don’t see why they should have a problem - and everyone has to write shell scripts and stuff don’t they? They do in production shops anyway!

The trickiest bit in the translation was interesting in itself. On some of the graphs I look at the maximum value to be plotted on the Y axis and select a suitable scale and distance between the marks from a list of options that I reckon are aesthetically pleasing. In the C++ version I do a simple search and just return silently if I can’t find an option that’s big enough. Haskell does not allow such fudging. The syntax is fudge-unfriendly. So in the Haskell version I had to think carefully about what I really wanted. Which is good.
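
Something like this sketch captures the difference (the names and the list of options are invented, not the ones in my program): the Maybe in the type forces the caller to decide what happens when no scale is big enough, where the C++ version could quietly return.

import Data.List (find)

-- The first "pleasing" scale at least as big as the largest Y value, if any.
pickYScale :: Double -> Maybe Double
pickYScale maxY = find (>= maxY) pleasingScales
  where pleasingScales = [1, 2, 5, 10, 20, 50, 100, 200, 500]

The caller then has to pattern match on Just and Nothing, which is exactly the careful thinking the C++ version let me dodge.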

The higher level of Haskell actually made the statistical processing simpler. When I calculated the standard deviations for some graphs I used C++ arrays, but later I found that in other calculations arrays wouldn’t do and I had to use STL vectors instead. It was then a hassle to go back and convert the arrays to vectors, so I never got around to it, and that prevented me from easily refactoring to remove redundancy. In Haskell a list is a list, so I never got into a situation with two representations of the same concept. The use of higher order map also simplified the statistical calculations - I could dispense with all the outer loops.
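
As a sketch of the shape (not the actual code), the per-graph standard deviations reduce to a map over a list of lists:

mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Population standard deviation of one list of scores.
stdDev :: [Double] -> Double
stdDev xs = sqrt (mean [(x - m) ^ 2 | x <- xs])
  where m = mean xs

-- One map replaces what was an outer loop in the C++ version.
allDeviations :: [[Double]] -> [Double]
allDeviations = map stdDev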

There may be a problem with the perception of productivity in programming shops. There are an awful lot of cases where people don’t really get to the essence of the problems they must solve (because they can’t access juxtapositional thinking, as I describe in the Introduction), and to avoid worrying about it they busy themselves with displacement activities instead. In Haskell the need for busy-work is clearly reduced, which is very good for real productivity, but may cause problems for organizations which emphasize looking busy. Hmmm… Perhaps accessing juxtapositional thinking is a precondition for functional programming. If so, the development of multi-core processors is going to have even more interesting effects than we might otherwise suppose.

Laziness

Laziness both fascinates and disturbs me. To date I’ve been really bad at writing tail recursions correctly and find that with sufficient inputs or processing I easily get stack overflows. I understand the benefits for simplification of algorithms and potential for optimizations - that’s the fascinating bit. I’m not troubled by relinquishing power over operations to the runtime - after all that’s something I always try to maximize when using SQL. The more I can do in one query, inside the engine, the better. If I have to slosh result sets around, filtering them client side, I know I’m squandering cycles like crazy.

The difference is that when I give some SQL to the engine I know the optimizer will do its best for me. I’m not in danger of writing SQL which will become pathological without realizing it. In practice I suppose I can detect pathological cases by using lots of test data, such that I’m confident I’ll get a stack overflow if I’ve got it wrong, but I still have to correct the problem. And that’s something I still find hard, even with trivial examples. I might also find that I’ve designed an algorithm which overall isn’t tail recursive, and only make this discovery late on, wasting a lot of work.
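
The textbook example of the trap - an illustration, not something from my code - is an accumulating fold: foldl is tail recursive, but the lazy accumulator piles up unevaluated thunks which overflow the stack when they are finally forced, while foldl' forces the accumulator at each step and runs in constant space.

import Data.List (foldl')

-- Tail recursive, but builds a chain of unevaluated (+) thunks on a big list.
totalLazy :: [Double] -> Double
totalLazy = foldl (+) 0

-- Strict in the accumulator, so the running total stays evaluated as we go.
totalStrict :: [Double] -> Double
totalStrict = foldl' (+) 0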

Are there working practices that stop us growing the stack when we intend to tail recurse¹, like the golden rule for avoiding deadlock, which says we must always acquire locks in a given order, even if it means relinquishing some already held locks, only to re-acquire them in due course? Is it just practice? The barrier to entry I discuss above is an issue, but I’m sure it’s fixable. I’m not so certain that avoiding pathological cases involving laziness is as easy.

1) I know I know. Good English would be “tail recur”. Let’s say it’s like “computer program” and “theatre programme”. Which itself has the additional provision that if you can’t spell “theatre” you aren’t expected to be able to spell “programme” either :-)

Code Organization

Java code organization irritates me. A file for every class. Programming which requires IDE support. The horrors that can result. I once saw nearly 2 million lines of code in one great, entangled wodge. No UML, no categorization, nothing. I had to use genetic annealing to even propose a division of classes into coherent natural categories which we could aspire to move towards. Eclipse is an amazing thing, itself a testament to what’s possible in Java, but I could not help but be aware that without it, the team would never have been able to get so deeply into trouble.

What does good code organization look like in Haskell? I don’t know. I’ve not yet seen any obvious conventional practices, so perhaps they are yet to evolve.

I have been playing with it, and in my first non-trivial program I discovered a few things which seem to be good ideas. Pure computation involves small functions, usually < 10 lines. Several of these with close coupling can be gathered together, with a commentary describing the group at the top. Building from lower level or early (perhaps input parsing) towards higher level or late (processing or output) is good. Do-ey, sequential stuff is better commented in-line.
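
A toy sketch of that shape (all names hypothetical), from low-level parsing and merging at the top of the file down to the do-ey main at the bottom:

module Main where

-- Input parsing and record merging come first: small pure functions, grouped
-- together with a comment describing the group. A "record" here is
-- (hypothetically) a blank-line separated run of text lines, as in the dump.
mergeRecords :: [String] -> [[String]]
mergeRecords = filter (not . null) . foldr step [[]]
  where
    step "" acc   = [] : acc         -- blank line: start a fresh record
    step l (r:rs) = (l : r) : rs     -- otherwise extend the current record
    step l []     = [[l]]            -- not reached, given the initial [[]]

-- Higher level processing next: one short pure function per idea.
recordCount :: [String] -> Int
recordCount = length . mergeRecords

-- The do-ey, sequential stuff comes last, commented in-line.
main :: IO ()
main = do
    text <- getContents                              -- the raw dump on stdin
    putStrLn (show (recordCount (lines text)) ++ " records found")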

Buglessness

They say that if it compiles, it will run correctly. It’s nearly true! I’m amazed. One bug I had involved the positioning of the “Chill Points” calibration on the “Stressors By Average Score” graph. I had one line specifying [-3..], where another line assumed it said [0..] and subtracted 3. So the calibration was shifted leftwards past the origin. That wasn’t hard to find.

Such buglessness will remove a huge source of indeterminism in production environments where the work of many teams is co-ordinated by schedules.

Awesome.

Fun

Yes. It’s fun. At least, it was once I got going. Until then, it was an immensely frustrating experience and I’d have given up if I wasn’t conducting an experiment to see how accessible and useful Haskell really is. Because of my objective, frustration was a result in itself so I was able to keep my morale up enough to make progress. Had I been experimenting on the weekend on general principles, which is how we usually do these things, I’m sure I’d have given up in annoyance and frustration.

Functional programming is harder work because there’s less blathering, more focus on what needs to be done, and the language itself discourages fudging. Anyone who enjoys programming will find themselves enjoying Haskell, except for the qualification given above. Anyone who’s been using Java will probably realize they haven’t had fun for quite some time. (People who enjoy Snoopy talking dirty have a different definition of %#@*&^% fun.)

Initials

It’s too early to have Conclusions, so I can only have Initials. Haskell will probably be a super production language, offering significant improvements in robustness, schedule predictability, readability (and so maintainability) and recruitment over the imperative languages available today. Plus a better chance of the natural parallelization predicted by theory, although I’ve not got far enough to try that yet.

However, the barrier to entry issue in particular means that today, Haskell will be a hard sell on most production sites. Real examples - toys but with all the indentation, error handling, input and output and so on in place and commented - would go a long way towards solving this problem.

Having got past the barrier enough to write a non-trivial program, I’m certainly going to press on.

Cautionary Tale: The BBC B ULA

As I’ve been looking at this stuff, a worry has been nagging away at the back of my mind. It’s not about Haskell, but about those multi-core processors. They all incorporate switching logic to stop the processors when they are not in use. This is particularly emphasized for the amazing Intel 80 core Teraflops processor, and it’s phrased in terms of energy saving. The thing is, if there are 80 processors going on one chip (even if it’s a pizza sized chip) it’s going to get very hot. Perhaps the motivation for the power switching is more to do with controlling heat than saving power, which begs the question: How much of the time do their designers assume these chips will be working? If we beat the parallelization problem, will we cook the chips?

It’s happened before. 25 years ago Acorn made the BBC B, a 6502-based 8-bit machine. The video was handled by a ULA which was made (I think) by Ferranti. The ULA designers had assumed that no-one would ever use more than half the gates in the ULA, with the other half being blown away to create the logic that was required. The clever Acorn people had managed to use 90% of the gates in the ULA, squeezing astonishing performance out of it. As a result of all the switching, early releases of the machine tended to overheat and the video (which went through a UHF modulator and worked a normal TV) went snowy. It’s incredible to think of it today, but to diagnose the problem we used a handy product found on all electronics benches: an aerosol can which contained CFC propellant and nothing else. A quick squirt would cool the chip, the video would lose the snow, and we’d have confirmed the problem!

Let’s hope the clever Intel people don’t make the same mistake as the Ferranti people did all those years ago. If you build the gates, we will find a way to use them!