oh please.
Cute idea and it's going to make the GPU fun, but this is a bogus claim.
Process roughly 4096 keys at a time in 16 different workgroups, each working on one specific algorithm, and shuffle the keys between workgroups as they move from round to round. It requires a little more attention to global memory use, because the keys are probably too big to fit in local memory, but there's nothing fundamentally hard about this coin.
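To put a rough number on the "too big to fit in local" remark, here is a back-of-the-envelope check. The 64-byte key size (one 512-bit hash state per key) and the 32 KiB local memory budget are assumptions for illustration, not figures from the post; check your own device.

```python
# Rough memory estimate for one batch of keys.
KEYS_IN_FLIGHT = 4096          # batch size mentioned in the post
KEY_BYTES = 64                 # assumption: one 512-bit state per key
LOCAL_MEM_BYTES = 32 * 1024    # assumption: local memory available to a workgroup

total_bytes = KEYS_IN_FLIGHT * KEY_BYTES
fits_in_local = total_bytes <= LOCAL_MEM_BYTES

print(f"batch needs {total_bytes // 1024} KiB, fits in local memory: {fits_in_local}")
```

Under those assumptions the batch needs 256 KiB, so the keys have to live in global memory and local memory is only good for staging.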
Sorry, we don't get it, could you elaborate? Especially on how you are going to synchronize these workgroups, and what do you mean by shuffling the keys between them? Shuffling as in moving them around in global memory? Or perhaps draw the general idea, a picture would make it clearer.
1. Have 16 regions of global memory, one per hash function to compute. Each region can store, say, N keys.
2. Define a function, put_keys_in_bins(), that uses local memory to group a workgroup's keys by their low-order 4 bits.
The result of calling put_keys_in_bins() is that all 256 of the keys computed by a workgroup are placed into the appropriate regions of global memory.
3. Compute a lot of sha512 hashes for a lot of keys and call put_keys_in_bins().
4. Then run the rounds:
for (i = 0; i < 16; i++) {
    for (algo = 0; algo < 16; algo++) {
        invoke hash_algo on bin[algo]
        put_keys_in_bins() for all of those keys again
    }
}
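For anyone who wants to see the data flow before writing a kernel, the steps above can be simulated on the CPU. This is only a sketch under stated assumptions: hash_algo() here is a hypothetical stand-in (SHA-512 salted with the algorithm index) because the 16 real hash functions are not listed in this thread, the "low-order 4 bits" are taken from the last byte of each key, and each round writes into a fresh set of bins so a key is not hashed twice in the same round. On a GPU the bins would be regions of global memory and each inner step would be a kernel launch.

```python
import hashlib

NUM_ALGOS = 16   # one bin (region of global memory) per hash function
NUM_ROUNDS = 16  # outer loop count from the pseudocode above


def hash_algo(algo, key):
    # Hypothetical stand-in for the algo-th hash function (assumption).
    return hashlib.sha512(bytes([algo]) + key).digest()


def put_keys_in_bins(keys, bins):
    # Group keys by their low-order 4 bits.
    # Assumption: those bits are the low nibble of the last byte.
    for key in keys:
        bins[key[-1] & 0x0F].append(key)


def run_rounds(seeds):
    # Step 3: initial SHA-512 over every seed, then the first binning pass.
    bins = [[] for _ in range(NUM_ALGOS)]
    put_keys_in_bins([hashlib.sha512(s).digest() for s in seeds], bins)

    for _ in range(NUM_ROUNDS):
        # Write into a second set of bins so a freshly hashed key is not
        # picked up again by a later algo within the same round.
        next_bins = [[] for _ in range(NUM_ALGOS)]
        for algo in range(NUM_ALGOS):
            hashed = [hash_algo(algo, k) for k in bins[algo]]
            put_keys_in_bins(hashed, next_bins)
        bins = next_bins
    return bins


if __name__ == "__main__":
    seeds = [i.to_bytes(4, "little") for i in range(4096)]
    final_bins = run_rounds(seeds)
    print([len(b) for b in final_bins])
```

The bin sizes it prints will not be perfectly even, which is why each region needs some slack beyond 1/16 of the batch.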
I'll stop doing your homework for you at this point. There are plenty of GPU devs who could do this. The choice of Echo makes it a little problematic for C&C to destroy this coin in a few days, but if anyone would like to place bets.... :-)
OK, we get it now. Looks promising, we will try to apply this optimization in a GPU miner.