In that post I showed a comparison between the reference and TMTO implementations at memory parity, which showed a factor 70 slowdown and a solution finding rate reduced to 1-1/e of the reference's. The reduced rate comes from sampling a fraction 1/42 of all vertices as starting points for the depth-21 breadth-first search: the probability that at least one of the 42 cycle vertices is sampled is 1-(1-1/42)^42 ≈ 1-1/e.
After discounting for the lower solution finding rate, the TMTO implementation was effectively 111 times slower.
I then tried another approach, which turns out to be superior. Assume that the graph we're working on has a unique 42-cycle, which we hope to find by sampling a small fraction of the nodes and exploring their local neighbourhoods with a depth-limited breadth-first search (BFS).
Instead of doing a depth-21 BFS, which (roughly speaking) requires starting from a node on the 42-cycle itself, we can do a much deeper search, say depth 42. Then we find the 42-cycle as long as we start from any node at distance at most 21 from the cycle, which is much more likely.
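The deeper search can be sketched as an ordinary depth-limited BFS. This is a hypothetical illustration over a generic adjacency-list graph, not the actual tomato_miner.h code; with `maxdepth` 42, the returned set contains the whole 42-cycle whenever the start lies within distance 21 of any cycle node:

```cpp
#include <cstdint>
#include <queue>
#include <unordered_map>
#include <vector>

// Hypothetical adjacency-list graph, for illustration only.
using Graph = std::unordered_map<uint32_t, std::vector<uint32_t>>;

// Depth-limited BFS: collect every node within maxdepth of start,
// mapping each node to its distance from the start.
std::unordered_map<uint32_t, int> bfs(const Graph& g, uint32_t start,
                                      int maxdepth) {
  std::unordered_map<uint32_t, int> depth{{start, 0}};
  std::queue<uint32_t> q;
  q.push(start);
  while (!q.empty()) {
    uint32_t u = q.front(); q.pop();
    if (depth[u] == maxdepth) continue;  // depth limit reached
    auto it = g.find(u);
    if (it == g.end()) continue;
    for (uint32_t v : it->second)
      if (!depth.count(v)) {             // first visit fixes the distance
        depth[v] = depth[u] + 1;
        q.push(v);
      }
  }
  return depth;
}
```

For example, on a 6-cycle with one pendant node attached, a BFS of depth 4 from the pendant node already covers the entire cycle, while a BFS of depth 3 misses the farthest cycle node.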
Apart from the doubled BFS depth, there is another difference. In the former approach, we only care about nodes on the cycle, and can eliminate any that are known to be off it, such as nodes with only one incident edge. In the latter approach, we cannot eliminate these. So the BFS tree will be much larger relative to the starting set, not only due to the doubled depth, but also due to less elimination.
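The elimination available to the former approach amounts to repeatedly trimming degree-1 nodes, since no such node can lie on a cycle. A minimal sketch of my own (not the miner's code), using a plain adjacency matrix for brevity:

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch: repeatedly strip nodes with a single incident
// edge; such nodes cannot lie on any cycle. Returns the number of
// nodes eliminated from the search.
size_t trim_leaves(std::vector<std::vector<int>>& adj) {
  size_t n = adj.size(), removed = 0;
  bool changed = true;
  while (changed) {
    changed = false;
    for (size_t u = 0; u < n; u++) {
      int deg = 0, nbr = -1;
      for (size_t v = 0; v < n; v++)
        if (adj[u][v]) { deg++; nbr = (int)v; }
      if (deg == 1) {                    // degree-1 node: off any cycle
        adj[u][nbr] = adj[nbr][u] = 0;   // drop its only edge
        removed++;
        changed = true;
      }
    }
  }
  return removed;
}
```

On a triangle with a two-edge tail attached, trimming removes both tail nodes and leaves the triangle intact, which is exactly the pruning the depth-42 variant has to forgo.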
To keep space usage down, we thus need to shrink the starting sets even more.
I ran some experiments which indicate that the increased likelihood of finding the cycle more than makes up for having smaller starting sets.
The updated tomato_miner.h can now use k times less memory at a (discounted) slowdown of roughly 25*k. Further details will appear in the whitepaper in a week or two.
With this progress on the time-memory trade-off, the partial bounty payout, and the relaxation of the slowdown, I'm reducing the TMTO bounty from $1000 to $500.