Conntrack hash function comparisons II

The comparison of the candidate hash functions for conntrack is based on Patrick Schaaf's cttest program. All the results below can be reproduced by running 'mkstat.pl' from the package.

The results of the first conntrack comparison can be found here.

The key of the hash functions for conntrack consists of the network parameters identifying a connection:

In the mathetamical formulae below I use the following notation:

Unfortunately it's not quite trivial how to decide which hash function is the best:

In order to get more data, I widened the artifical test used in the previous conntrack hash function comparison. This time the following input datasets were used: 50000 connections (100000 entry to hash), each "connecting to" one fixed IP address at a given port. The clients are connecting from 50000 different ports. In the different datasets the number of clients (and thus the number of different source IP addresses) is varied:
	50000, 45000, 40000, ... 15000,
	10000, 9000, 8000, ... 2000,
	1000, 900, 800, ... 200,
	100, 99, 98, ... 1
That means 126 input datasets for every hash, at every hash size.

All hashes were tested at the following odd hash sizes:

	8191 8209
        16383 16411
        32767 32771
        65535 65537
        131071 131101
        262143 262147
The best hashes (abcd64 and the jenkins ones) which work for even hash sizes as well were tested additionally with the following hash sizes:
	8192 16384 32768 65536 131072 262144

Hashes original, rt, crc32, abcd64

The original hash, compared to other hashes.

The chi-square tests show clearly that there are real problems with the original hash.

Hashes rt, rt_ab, rt_ab2, rt_new, abcd64

The rt hashes were derived from the IP routing code by Joakim Axelsson and Martin Josefsson.

All of them can produce unsatisfactory results.

Hashes crc32, rusty, abcd64

The crc32 hash use the crc32 function of zlib, the implementation is by Patrick. The rusty hash is based on Rusty's include/linux/hash.h from Linux 2.5.

The crc32 hash is quite good, alas there are serious problems with the rusty hash.

Hashes guilmette, pearson, abcd64

The guilmette hash was suggested by Ron Guilmette on comp.compilers, 13 Dec 1990 06:25:10. The pearson hash is the famous one :-). The original version of both hashes generate hash keys between 0-255, I created the tested variants which produce unsigned long hash keys.

There are problems with the guilmette hash but the pearson hash is quite good.

Hashes abcd, abcd-1, abcdef, abcd_xor, abcd64

The abcd and abcdef hashes were proposed by Don Cohen and Rich Scroeppel, the latter coded by Joakim Axelsson. The abcd_xor hash is by Joakim Axelsson and Martin Josefsson. The abcd-1 variant (formerly abcdx) is the same as the abcd but with different coefficients. The abcd64 hash uses 64bit random numbers and works for even hash sizes as well.

Formerly the abcd hash was my favourite one. However the chi-square tests show that the abcd and abcd-1 hashes are not perfect. The abcd64 hash seems to be OK.

Hashes abcd-1, abcd-2, abcd-3, abcd-4

Another run to examine the abcd hash with different, worse and worse coefficients:

abcd-1 0x898092BB
0x1CE6487C
0xF219B787
0x677F6D40
10001001100000001001001010111011
00011100111001100100100001111100
11110010000110011011011110000111
01100111011111110110110101000000
abcd-2 0x0E3A9C80
0xE1338378
0xB4CC63E7
0x5BC57C1F
00001110001110101001110010000000
11100001001100111000001101111000
10110100110011000110001111100111
01011011110001010111110000011111
abcd-3 0x0000173C
0xF94471C8
0x1EBFC863
0xE7FBAE97
00000000000000000001011100111100
11111001010001000111000111001000
00011110101111111100100001100011
11100111111110111010111010010111
abcd-4 0x0000006A
0x03FD9B60
0xFDCFFE95
0xFE32659F
00000000000000000000000001101010
00000011111111011001101101100000
11111101110011111111111010010101
11111110001100100110010110011111

Unfortunately nothing much could be gained from the results except the suscepted one: there are naturally better and worse coefficients.

A closer look at the abcd hashes revealed, that there is a flawn in these hashes: an attacker could easily fill one hash bucket, if he can guess or find out the hash size. This is due to the algorithm behind the abcd hashes:

	(coef0 * srcip + coef1 * sport 
       + coef2 * dstip + coef3 * dport + proto) % hash_size
By keeping tree input parameters fixed and using the following source IP addresses srcip = const + n * hash_size the hash function becomes
	(CONST0 + coef0 * n * hash_size) % hash_size
which gives a constant hash value for any n. Rehashing does not stop the attack, it just moves the filling bucket.

The abcd64 variant fixes the problem by using 64-bit arithmetic and mixing the upper and lower 32-bit parts of the partial results.

Hashes jenkins, jenkins2, jenkins2b

The jenkins hash is by Bob Jenkins. The jenkins2 and jenkins2b variants are hashes optimized for conntrack by me.

They are overall very good hashes.

Hashes abcd64, abcd64b, abcd64c, abcd64d

Comparison of different variants of the abcd64 hash: abcd64 has 64-bit random coefficients while the b, c and d variants have 32-bit ones. abcd64b and abcd64c differs only in coding, an attempt to gain more speed (in vain). The abcd64d hash has the "best" coefficients in the sense that the four random coefficients were chosen so that they have 16 one's and 16 zeros.

The chi-square graphs show that the abcd64 hashes are not perfect for hash sizes of power of two.

Hashes abcd64d, jenkins2b, jenkins64

This is the comparison of the best hashes found so far. (The jenkins64 hash is the 64-bit variant of the jenkins hash.) Again, these look to be the best hashes, but the abcd64 hash has troubles with even hash sizes.

The jenkins hashes work reliably in every case.

Bit tests

There were a separated testing of the hashes, which probed wether every input bit affects every output bit half the time.

Only the jenkins hashes passed the bit test.

Speed comparisons

Finally the speed comparison of the hashes from the fastest to the slowest:

It is clear, that (taking into account the speed) the best hashes are the jenkins (jenkins2b) and the abcd64 (abcd64d) ones. There are pros and cons for both:

In my opinion is we should adopt the jenkins2b hash.

Any suggestion for improvements are welcomed.

Jozsef Kadlecsik