Hello folks, this thread appears to be the place with the highest concentration of Equihash optimization gurus. May I trouble you with a stupid question?
> You need a fair bit of space to hold the index-lists...
Indeed you do.
I'm trying to correlate that with the zcashd implementation, since it's the only available working cross-checker (too bad it is written for speed instead of readability). The only part that has me confused is the need to eliminate entries whose index sets contain repeated indices after EVERY one of the k collision steps. There is no mention of this step in the algorithm given in the Equihash paper.
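To make sure I'm even reading the algorithm correctly, here is roughly how I picture a single collision round with that per-round check in place. This is just a toy sketch of my mental model, not zcashd's actual code or data structures; the row layout, hash width, and collision width are all made up for illustration:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Row {
    uint64_t xored;                 // running XOR of the hashes X_i
    std::vector<uint32_t> indices;  // which i's have been folded into this row
};

// True if merging the two index sets would produce a repeated index.
static bool HasDuplicateIndex(const Row& a, const Row& b) {
    std::vector<uint32_t> merged = a.indices;
    merged.insert(merged.end(), b.indices.begin(), b.indices.end());
    std::sort(merged.begin(), merged.end());
    return std::adjacent_find(merged.begin(), merged.end()) != merged.end();
}

// One round: group rows that agree on the low `bits` bits of `xored`,
// emit the XOR of every pair in a group, and (optionally) drop pairs
// whose combined index set contains a repeat.
std::vector<Row> CollisionRound(std::vector<Row> rows, int bits, bool dropDupes) {
    const uint64_t mask = (1ULL << bits) - 1;
    std::sort(rows.begin(), rows.end(), [mask](const Row& a, const Row& b) {
        return (a.xored & mask) < (b.xored & mask);
    });

    std::vector<Row> next;
    for (size_t i = 0; i < rows.size();) {
        size_t j = i;
        while (j < rows.size() && (rows[j].xored & mask) == (rows[i].xored & mask)) ++j;
        // rows[i..j) all collide on the current column; pair them up.
        for (size_t a = i; a < j; ++a) {
            for (size_t b = a + 1; b < j; ++b) {
                if (dropDupes && HasDuplicateIndex(rows[a], rows[b]))
                    continue;  // <-- the per-round check I'm asking about
                Row merged;
                merged.xored = rows[a].xored ^ rows[b].xored;
                merged.indices = rows[a].indices;
                merged.indices.insert(merged.indices.end(),
                                      rows[b].indices.begin(), rows[b].indices.end());
                next.push_back(std::move(merged));
            }
        }
        i = j;
    }
    return next;
}
```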
I understand why these can't be part of the final solution (the X_i's must use distinct i's). But why not wait until the end to drop these superfluous solutions? Why check for duplicates k times? I did some quick back-of-the-envelope calculations and I'm having a hard time believing that leaving these entries in the table would significantly increase memory consumption. If you had a duplicate X_a in the second level of the collision, it would have to be a result of (X_a*X_b) colliding with (X_a*X_c) on the first 40 bits (where X_a collides with each of X_b and X_c on the first 20 bits). If this is true then
((X_a*X_b)*(X_a*X_c)) = 00000000000000000000 00000000000000000000
and therefore
(X_a*X_a*X_b*X_c) = 00000000000000000000 00000000000000000000
so
(X_b*X_c) = 00000000000000000000 00000000000000000000
and moreover
(X_a*X_b) = 00000000000000000000 ....................
(X_a*X_c) = 00000000000000000000 ....................
In general it seems like repeated indices arise in the Equihash-constrained variant of Wagner's algorithm as a result of an "overcollision" -- when two rows of the table collide not just on the current column but on the next one as well. When this happens, the members of the overcollided set will generate one spurious entry in the next round for every entry they collide with in the current round. It seems like that would happen on average twice per column. Is that really enough to bloat the memory usage in a noticeable way for k=9? Surely for something like k=100 we'd be talking about serious combinatorial explosion, but for k=9 there just aren't that many steps compared to the size of the table.
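For concreteness, here is the back-of-the-envelope arithmetic behind that "twice per column" guess, under the idealized assumption that the XORed hash bits behave like uniform random bits, and using the Zcash parameters n=200, k=9 (so 20 collision bits per column and roughly 2^21 rows per round):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // Zcash parameters: n = 200, k = 9 => 20 collision bits per column,
    // and the table holds roughly 2^21 rows at every round.
    const double rows = std::pow(2.0, 21);
    const double collisionBits = 20.0;

    // About `rows` pairs survive each round (the table size stays roughly
    // constant in expectation); each surviving pair also agrees on the NEXT
    // 20-bit column with probability 2^-20 -- an "overcollision".
    const double overcollisionsPerRound = rows * std::pow(2.0, -collisionBits);

    std::printf("expected overcollisions per round: ~%.1f\n", overcollisionsPerRound);
    // ~2 per round, so on the order of 20 across all k = 9 rounds -- which is
    // why I doubt the spurious entries they spawn can matter next to the
    // ~2 million rows already in the table.
    return 0;
}
```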
(edit: changed k=10 to k=9)