Jenkins, unsigned integers and VB.NET

Fotunately I had my hair cut short recently, cause otherwise I’d have pulled them all out yesterday! 🙁

For a project I’m working on, I need to use the so-called “Jenkins’s one-at-a-time” hash. The algorithm looks like this, in C code:

uint32_t jenkins_one_at_a_time_hash(char *key, size_t len)
  uint32_t hash, i;
  for(hash = i = 0; i < len; ++i)
    hash += key[i];
    hash += (hash << 10);
    hash ^= (hash >> 6);
  hash += (hash << 3);
  hash ^= (hash >> 11);
  hash += (hash << 15);
  return hash;

No big deal. Right? Until you have to rewrite this to VB.Net. Here's what I started with, and when things got weird:

Public Function Jenkins_one_at_a_time_hash(ByVal data() As Byte) As UInt32

  Dim hash As UInt32 = 0

  For Each b As Byte In data
   hash = hash + b
   hash = hash + (hash << 10)
   hash = hash Xor (hash >> 6)
  hash = hash + (hash << 3)
  hash = hash Xor (hash >> 11)
  hash = hash + (hash << 15)

  Return hash

End Function

While looking at the original code, I knew that with all the bit-shifting going on, there would definitely be overflows; this algorithm has overflow written all over it. So I anticipated some small bumps, but not real big ones...

OK. I don't care about bits falling out of the bucket, as long as this doesn't result in exceptions. How wrong could I be...nearly every line gave me exceptions. But, surprisingly, those exceptions weren't thrown by the bit-shifting code, but by intermediate results that exceeded the UInt32's maximum value, for example:

3281982255 + 2085403648 = 5367385903; yep, that's 33 bits.

OK. Nevermind; I'll use a UInt64 and only keep the 32 LSBs. That should do the trick. No! The VB.Net OverlowException told me 'Arithmetic operation resulted in an overflow.' Yeah I know, with 32 bits it does, but not with 64!
What's going on here ?. Lets isolate the problem a bit...

Dim a As UInt32 = 3281982255 ' 11000011100111110001001100101111
Dim b As UInt32 = 2085403648 ' 01111100010011001011110000000000
Dim result As UInt64 = a + b

Line 3 also throws the exception I mentioned above. But why?? The result of a+b is 5367385903, which should perfectly fit in 64 bits. This is becoming really weird ...
Changing the result type to Decimal, with a largest possible value of +/-79228162514264337593543950335 didn't help either. Same Exception.

This code works though:

Dim a As UInt64 = 3281982255
Dim b As UInt64 = 2085403648
Dim result As UInt64 = a + b

But why should I have to this; it's stupid - using 64 bits to store 32 bit values; huh? Since I don't work with VB.Net that much, I started looking for compiler directives to ignore overflows in this hash routine. The only thing I could find was this:

Yeah, right, remove integer overflow checks... at project level. Who invented that? Seems I can't use compiler directives in the code at all, and I sure don't want to remove all the overflow checks from the project of course. I did a little more testing with the Intxx type variables and saw some strange things happening:

Dim a As UInt32 = 1073741824
Dim b As UInt32 = 1073741824
Dim result1 As UInt64 = (2 * a) + (2 * b)
Dim result2 As UInt64 = a + a + b + b
Dim result3 As UInt64 = (a + a) + (b + b)
Dim result4 As UInt64 = 5 * a

Line 3 does not result in an Overflow Exception, but line 4 does. Line 5 does too, but line 6 doesn't. It looks like only additions can trigger those overflows? What is this??

Finally I found out that the UInt32 and UInt64 types are not CLS-compliant and that CLS-compliant alternative for UInt32 is Int64. Again, I don't want to use 64 bits. I only need 32 of them!

I can't be the only one having this trouble, so I started searching what this CLS compliance meant. From what I've read about this, I know now that .NET doesn't treat UInt32 values as such internally, which results in the MSB getting back its role as sign bit, which in turn results in a sign bit being shifted out during the arithmetic which eventually results in an Exception...
So .NET doesn't natively support unsigned data types! OK..

After multiple hours of trying all kinds of stupid things to get the right results without exceptions, I ended up with this code:

Public Function Jenkins_one_at_a_time_hash(ByVal data() As Byte) As UInt32
  Dim hash As UInt32 = 0

  For Each b As Byte In data
    hash = ((hash + b) And &HFFFFFFFFL)
    hash = (hash + ((hash << 10) And &HFFFFFFFFL)) And &HFFFFFFFFL
    hash = hash Xor ((hash >> 6) And &HFFFFFFFFL)


  hash = (hash + ((hash << 3) And &HFFFFFFFFL)) And &HFFFFFFFFL
  hash = hash Xor ((hash >> 11) And &HFFFFFFFFL)
  hash = (hash + ((hash << 15) And &HFFFFFFFFL)) And &HFFFFFFFFL

  Return hash

End Function

So actually, it comes down to 'manually' preventing overflows wherever they can occur; even in intermediate results; then why did it take me so long to figure this out???

Hurrying back to Delphi again, yuck!

Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *