Again, we must wait to see the final parameters and optimized implementations. It seems likely they'll pick
n=144 and k=5, which means they need to find collisions on 16 bits at a time. That is just a lookup in a 64K entry table;
I don't expect to see any real sorting there. It's less clear how the actual items are represented, and whether they're moved themselves or just by pointer.
I believe you have the constants wrong. n=144, k=5 means that they're finding collisions
on n/(k+1) = n/6 = 24 bits at a time, repeated 4 times, and then finding final collisions
on the remaining 144-(4*24) = 48 bits, each time from a set of approximately 2^25
candidate hashes.
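(For anyone following along, here's that arithmetic as a tiny Python sketch; the 2^(n/(k+1)+1) list-size expression is just how I read the construction, not something spelled out in this thread.)

    # Back-of-envelope check of the parameter arithmetic above.
    n, k = 144, 5

    collision_bits = n // (k + 1)             # 144 / 6 = 24 bits per round
    rounds = k - 1                            # 4 rounds of 24-bit collisions
    final_bits = n - rounds * collision_bits  # 144 - 4*24 = 48 bits at the end
    list_size = 2 ** (collision_bits + 1)     # ~2^25 candidate hashes per round

    print(collision_bits, rounds, final_bits, list_size)
    # -> 24 4 48 33554432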
You're right; I somehow managed to mess up a trivial calculation. Thanks for the correction.
It's quite intriguing algorithmically. I think the original paper might be a little too glib about the constants, though.
Which constants are you referring to?
Btw, nice to see you back after a 10-month hiatus...
Thanks! I'm not really back - still enjoying the heck out of my sabbatical, but I got curious about this
one this weekend.
The constants: I think they're thinking about the design a bit wrong and should be able to get the
peak tuple length down to 150 bits, or about 600MB in total, plus perhaps another ~50MB of ancillary
data structures, without incurring more than a constant number of additional hash calculations.
But I could be wrong - I'm just handwaving in my head about how I'd solve it and haven't
tried writing it down carefully. That wouldn't substantially change their overall memory-use
estimate, though.
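For concreteness, the kind of back-of-envelope I have in mind (assuming ~2^25 tuples at a peak length of 150 bits each, which is my own guess at the shape of it, carrying over the 2^25 from above):

    # Rough sanity check on the "150 bits, or about 600MB" figure.
    tuples = 2 ** 25          # assumed tuple count, as in the 2^25 above
    bits_per_tuple = 150      # assumed peak tuple length

    peak_bytes = tuples * bits_per_tuple / 8
    print(peak_bytes / 1e6)   # ~629 MB, i.e. roughly the 600MB ballpark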
I think they slightly overcount (again, small things - maybe 20%) the number of rows that
need to be sorted in main memory, because they're ignoring some of the accesses that can
hit in L3 cache. As above, though, it's a minor difference.
As others already noted, their analysis ignores HBM / stacked DRAM, which, unlike last year,
is now distinctly real (though still expensive).
None of that is particularly important from an algorithmic standpoint.