# Introduction

I recently wrote a detailed series of blog posts on benchmark for the D language (involving a test I called the Botch test).

The test involved implementing generic polynomial rings (with bignum entries). I listed numerous requirements, such as being able to distinguish a polynomial in “x” from a polynomial in “y”, being able to coerce elements from $R = \mathbb{Z}[x][y]$ into $S = \mathbb{Z}[x][y][z]$, etc., and pointed to two implementations of generic rings (in C) to benchmark against.

I received numerous comments and decided to write a follow up, dealing in turn with the issues people raised and then show one way that the essential difficulties can be worked around, albeit with rather a lot of work.

The timings below will make the D people much happier I think!

# Replies to comments

Thanks to everyone who made comments. I really appreciate them and learned numerous technical bits and pieces about D that I was not aware of.

Here are replies to some individual comments.

- “Compiler switches”. For me at least, dmd -O -release -inline -noboundscheck made little difference to the original benchmark.
- “The benchmark taxes the garbage collector.” Well, of course it does! From the D website: “Garbage collected programs are often faster.”
- “Use structs for the polynomial type.” Well, this is possible, but not straightforward. As the code is generic, all the rings involved need to behave basically the same way. And because I was unable to find a way to implement the bignums using structs instead of classes, this made it difficult to use structs for the polynomials. It wouldn’t be impossible, it’s just going to make the code much messier.
- “+= would help.” Yes, of course it would. Again, the whole point of the benchmark was to do something generic. Most bottlenecks in real world applications are not soluble by using E += D instead of E = E + D. I know full well this is difficult to make fast, which is precisely why I chose it! Many times in real life you really need to add polynomials and assign the result elsewhere. (But see below.) If you think I am being unfair to D, try doing this in a dynamic language and see just how bad things get!
- “Haskell would pass the BOTCH test hands down.” Really? Show me the code!. It seems like someone figured that you could represent polynomials in Haskell with algebraic notation, and that it has a REPL (two of the points) and that therefore it must pass the whole test. But will it raise a type error if the user tries to add polynomials in different variables? Does coercion work between all the different rings? What is the performance like compared to C? What about the performance of the REPL? And just how much work went into that Haskell polynomial library? Has the writer effectively written a parser inside that library to make it work (I don’t know, I didn’t check)? I really like Haskell, but seriously, claims of this nature need proof. (Ah, someone has tried the benchmark in Haskell and it takes 43s on a 2.4GHz Intel machine. Feel free to try and improve on this. )
- “Nimrod would nearly pass the BOTCH test.” Great. I believe you. But many languages do nearly pass. I do concede that Nimrod’s term rewriting macros are totally sweet!
- “Most modern programming languages fail the test.” Isn’t that kinda the point!?
- “The operator syntax used in the code is deprecated.” I fixed this in the bignum2.d file, but I was unable to get the new syntax to work for the polynomial module. The new syntax was marginally faster (modulo timing discrepancies on my system).

# How can the D code be made faster without “cheating”?

I think some of the people who believed I had made “implementation errors” didn’t understand what I was aiming to do with the benchmark (thought I did apparently also make implementation errors related to my incomplete knowledge of the language).

The object of the exercise was not to optimise E = E + D, but to determine how fast/slow the code would be on **generic** arithmetic.

However, there is one very big deficit which can be overcome with a lot of work. It occurs in both the bignum module and the polynomial module. This is the issue of creating a new object every time an assignment is completed.

I don’t care what anyone thinks. The **reality** is that this is a hard problem to fix in generic code. It can be fixed with an arithmetic expression parser, i.e. an external DSL. But that is cheating as you then lose the rest of the power of your implementation language, being restricted to however much of a language you implement in your external DSL.

Another solution is a preparser. The Sage interactive console uses this trick to get around some issues with Python that mathematicians have, such as ** instead of ^ for exponentiation, and a (very) few additional things.

However, it is also possible to fix this problem using an expression parser within the syntax of the language itself if you have sufficiently powerful templates and the language is statically typed, as D is.

# Class based expression parser

The basic idea of a type based expression parser is that each syntactical element of an arithmetic expression is given a class corresponding to it. Instead of computing the value of the expression immediately and then creating a copy of the value to assign to the result, one constructs an expression tree using these classes and delays evaluation of the expression until the assignment actually happens.

This trick is called *lazy evaluation*. It is extremely hard (for me) to implement, as one is essentially pushing every element of the arithmetic expression syntax up into the language of types. This explores some really annoying areas of the compiler’s ability to handle template classes or equivalent.

However, the advantage of doing this is that much of the cost can become compile time cost, as all the information about the structure of the expression syntax is stored in types, not values. Essentially the only thing that happens at run time is the evaluation of the expression itself.

# The implementation

I managed to implement a template class based expression parser for the bignum module that I wrote. See the file bignum2.d to see how this is done. (I didn’t include the accompanying polynomial.d which needed significant modifications, simply because my code is now too messy. I leave it as an exercise for the reader to make the appropriate modifications to the old polynomial.d.)

Of course, this new bignum.d makes very little difference on its own. Every ring needs an expression parser, and it was all far too much work for me to write one in D for the polynomial rings too!

However, I simulated doing this by using the much vaunted +=. In theory, this allows elision of the additional allocation and copy of polynomials made on assignment, precisely as a type based expression parser would. However it is important to note that **the expression parser would work just as quickly for E = E + D**, if I implemented it for polynomial rings, i.e. the E += D is only to **simulate** having an expression parser for the polynomial rings. I only **actually** (partially) implemented an expression parser for the bignum module.

# New timings

Here are the new times given this approach on my 2.2GHz AMD Mangy Cours Opteron server:

- Sympy : 560s (pure Python)
- Sage : 21s (uses Cython)
- flint : 2.5s
- Pari/GP : 10s
- dmd -O : 4.7s
- ldc2 -O2 : 6.1s
- gdc-4.6.3 : (still) failed to compile

This time I disabled the bounds checking for both dmd and ldc2 and used the additional switches suggested for dmd.

Clearly this is much closer to the speed of the C code, and could possibly be improved further with some additional (hard) work on my part.

# Conclusion

I should emphasise that these are simulated times (by switching to E += D throughout the polynomial part). I am yet to write an expression parser for polynomial rings, and I am not even convinced I would be successful. There seem to be real issues at present with pattern matching of types in D which I simply don’t understand well enough.

I should also add that D still fails the BOTCH test, as there is no fast REPL (as far as I know) and that I had to really work far too hard to implement this expression parser. I said at the beginning of my post, “one does not prefer to implement a parser for everything one wishes to do”.

Overall, I think D is an amazing language. I’m looking forward to some of the noted issues being resolved in future. D is an incredibly sophisticated language, and it is really a tribute to its authors that it can do so much. It does *nearly* pass the BOTCH test!

The Haskell benchmark is here: https://gist.github.com/dimpase/4992965

On a machine with a 2.4GHz Intel chip it runs in 43 sec.

Certainly, this is a straightforward (well, I did make the loop tail-recursive by hand, and I needed to throw in normalization to make evaluation of ‘+’ strict) using off-the-shelf implementation from http://hackage.haskell.org/package/HaskellForMaths-0.4.1

(which appears to be an one-person effort to cover much more than just polynomials).

Very nice. That machine is slightly faster than mine. Anyway, let’s call it 10x slower than D as implemented.