Hi!
Being annoyed by the fact that all altcoins are at least mineable on a GPU (since my GPU sucks and my FPGA is quite low-end), I thought about creating an altcoin that will never perform well on a GPU, FPGA or ASIC. So I thought: "What is it that CPUs can do much better than GPUs?" and the answer of course is "Conditional branching!".
So my idea is basically this:
Many coins already chain multiple hash algorithms together, but they do so in an order that is known beforehand. This means it's no problem for a GPU and even on an FPGA/ASIC, you can just chain it. Instead, I want to make the next hash algorithm to use dependent on the output of the previous hash function. This can be done by giving each hash function a number and determining the number of the next hash function to use using (hash % num_hashfn).
So why is this hard for a GPU/FPGA/ASIC? Let's start with the GPU:
GPUs are good at processing lots of data using the same operations, but as soon as you need to branch, their performance is laughable. Since the next algorithm to use is different for each block you try (except for the first hash function to use, as that should be dependent on something like the last hash of the last block in the block chain), you are not using the same operations on a lot of data - you are using different operations on a lot of data and need to look at the output of the last hash to determine which code to execute next. GPUs suck at this, so much that it's commonly recommended to do 10 calculations at once and check at the end which one you need and then throw away the other 9.
Ok, so this is hard on GPUs. But why is it hard for FPGAs/ASICs? Well, there are two options you have to implement this on an FPGA/ASIC. Let's start with option A: Option A is to use a state machine and do the hash depending on the state. At the end, you set the state to the next hash function to use. This requires a flip flop to store the state and a flip flop needs one cycle to change. So you can only do one hash for each cycle, which means you would need to have a really high clock rate. Affordable FPGAs and ASICs usually have quite low clock rates and thus, this would significantly slow them down.
Option B is to use a lot of mux and demux. The output of (hash % num_hashfn) can be given as input to a number of demuxes which then give the wires to the correct hash function module and at the end of the circuit use a mux to select the output from the right hashing functions. However, that would make the circuit extremely complex, making it not fit many times on an FPGA and thus only allow very limited parallelism. The same is true for an ASIC, which would need an extremely large die.
As you can see, this algorithm works quite good on a CPU which has many GHz and is good at branching (well, compared to the others, that is), but is slow on a GPU and either slow or very expensive on an FPGA/ASIC.
So what do people think of this? Is this something people are interested in and I should implement? Any feedback on the design?
// Edit: Possible names for this would be BranchCoin or JumpCoin
.
I think combining various principles employed by hashing algorithms that stayed CPU-only for a decent amount of time combined with heavy usage of conditional branching would suit your purpose well. I read that Primecoin's use of integers is doing extremely well at being CPU-only, but I haven't really reviewed the actual algorithm to verify that. Protoshares' Momentum algorithm looked interesting as well. Consecutive rounds of hashing with different algorithms and large memory requirements were also effective for some time.
A quick implementation would be something like:
Hash1(input + nonce1) -> a
Hash2(a) -> b
if(nonce1.ToInt) is even { Hash3(input + a + b) -> c } else { Hash4(input + a + b) -> c }
Hash4(Hash3(Hash2(Hash1(c.ToInt.ToString)))) -> d
d must be lower than current diff
Hash1(input + nonce2) -> e
Hash2(e) -> f
if(nonce.ToInt) is even { Hash3(input + e + f) -> g } else { Hash4(input + e + f) -> g }
Hash4(Hash3(Hash2(Hash1(g.ToInt.ToString)))) -> h
h must be lower than current diff
next hash based on (block number ^ 2) % 4
Hash(d + h) must be lower than current diff
Or something...
That's prolly a bad algorithm now that I look at it. But something like that, with more branches and more memory requirements.