GMP Bignums in D

GMP bindings

To get started, we first need bignum support. This will allow us to work with integers that are bigger than 32/64 bits.

The D library includes a std.bigint module, but we prefer to link in GMP, since that is faster for large operands.

After a short search I found some GMP bindings for D on GitHub. The file gmp.d is possibly autogenerated from gmp.h by some script, so we’ll make use of that. It allows us to access all of the functions in GMP via the C Foreign Function Interface in D. In particular, we’ll make use of the __mpz_struct data type and the mpz_blah functions for bignum integers.

A few small changes were necessary to gmp.d. I have a 64 bit machine, so __GMP_BITS_PER_MP_LIMB and GMP_LIMB_BITS need to be set to 64. Also, int and uint throughout have to be replaced with long and ulong (D’s 64 bit integer types). I also stripped anything that was not datatypes or function prototypes.

One problem with this header is that it assumes the global prefix symbol is a double underscore, an assumption that it not warranted on many platforms. Thus, for example, mpz_add is aliased to the hard coded __gmpz_add library symbol. However, D’s static compile time directives are probably powerful enough to work around this so that the code can be made portable. In my case, no change was required.

Simple bignum program

We can now write the following simple program (gmpex.d), which compiles with the line:

dmd gmpex.d /usr/local/lib/libmpir.a

assuming the GMP library has been installed in /usr/local

import gmp;
import std.stdio;
import std.string;

class ZZ
{
   __mpz_struct z;

   this() { mpz_init(&z); }

   ~this() { mpz_clear(&z); }

   this(int val) { mpz_init_set_si(&z, val); }

   override string toString()
   {
      char[] s = new char[mpz_sizeinbase(&z, 10)
                 + (z._mp_size <= 0)];
      mpz_get_str(s.ptr, 10, &z);
      return format("%s", s);
   }
}

void main()
{
   auto a = new ZZ(23);
   auto b = new ZZ(31);
   auto c = new ZZ();

   mpz_add(&c.z, &a.z, &b.z);

   writeln(c);
}

Classes vs structs

By default, D implements classes as structs with reference semantics. Nevertheless, the compiler wants the parameters of the mpz_add function to be of type __mpz_struct * (the same thing as an mpz_t in a C program) rather than ZZ, thus we have to pass references to the z field of the relevant ZZ’s to the mpz_add function.

The ZZ class is quite easy to understand. We have a couple of constructors, one which takes an int and gives us a ZZ whose contents are that int. The other takes no arguments and simply gives us ZZ containing an initialised mpz_t. These call the mpz_init_set_si and mpz_init functions respective.

We have a destructor which calls the mpz_clear function.

Finally, in order to allow us to print mpz_t’s we overload the toString method. It creates an array of characters of the correct size by computing the size of the bignum in base 10 (decimal) and adding 1 extra character for a sign or zero. The function fills the array using the mpz_get_str function and then converts it to a D string type using the format function.

Interestingly, the std.bigint module in D uses structs rather than classes to implement the bignums. This is because D uses value semantics for structs rather than reference semantics. This makes assignment very fast, but slows down operations like m++.

It would be an interesting experiment to compare the two different approaches. One problem with this is the need to call mpz_clear to clean up any memory allocated internally by GMP for an mpz_t. This can’t happen every time a ZZ goes out of scope, as its contents (including the internal _mp_d pointer) may have been copied to a new struct. Calling mpz_clear would free the contents of the mpz_t making the copied _mp_d pointer invalid.

We could simply not call mpz_clear ever and allow a memory leak. In fact, I tried this approach just for science, and the times reported at the end of this set of blog posts were halved. But this is hardly a fair comparison.

To really work with structs, we’d essentially be back to the C way of doing things, where every function that uses an mpz_t (or any other ring object for that matter) is responsible for cleaning up after use, including the user.

Another approach would be to reimplement the mpz_t interface of GMP in D, calling the low level mpn functions, and bringing the internal pointers of the mpz_t struct under control of the D garbage collector. This is quite a bit more work than we wanted to do for this blog, however, and it assumes that using the garbage collector for the internals of the mpz_t’s would actually be faster than the current approach, which may be an unwarranted assumption.

Yet another approach would be to set Boehm’s GC as the default memory allocator for GMP. Some languages take this approach and it is a good middle ground.

Despite the slowness of the class vs struct approach, the class wrapper of the GMP package is convenient and fairly lightweight compared to some other languages, essentially being no worse than allocating mpz_t’s on the heap in C or perhaps no worse than a standard C++ class wrapper of GMP.

‹‹‹ Previous Post Next Post ›››

Leave a Reply