HW testing shows that I can get about 2.5 MH/s (for the sha256/skein algo) out of a Cyclone IV E115. Not impressive. That's only burning about 1 W though, so that's 2.5 MH/s/W, which is kinda "cool" if you're gonna make a large rig of dozens of FPGAs.
Ouch. It's rather common that FPGAs are slower than GPUs but more power efficient, but this is a bit too far from practical.
Today's cryptocurrency landscape doesn't appear to have much room left for FPGAs. The algos are a whole generation newer (and more complex) than they were in the Bitcoin days, and those damned GPUs are REALLY REALLY fast now! Even though GPUs are not running algo steps in parallel (like an FPGA can), they're running their core at some insane GHz speed that more than makes up for it.
GPUs run at about 1 GHz, which is way less than current CPUs. They just have so many cores/threads running in parallel. They also have more/faster memory than common FPGA boards. (Though I wonder if the SRAM of DE2-115 could be put to good use...)
If I really work at timing issues I can get logic steps in the FPGA to run up to 100MHz. That's roughly the theoretical max for how fast a single fully unrolled instance of an algo machine can run in this FPGA. The SHA3 generation of algos though are so much more complex than SHA2 that you'd never fit an entire fully unrolled instance in a single IC.
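To put rough numbers on that, here is a back-of-envelope sketch. It assumes a fully unrolled, pipelined core finishes one hash per clock cycle, and that the 2.5 MH/s figure was measured at 100 MHz (neither is stated above, so treat both as assumptions). The function names are made up for illustration.

```python
def hashrate_mhs(clock_mhz, cycles_per_hash=1, instances=1):
    """Estimate throughput in MH/s.

    Assumption: each instance finishes one hash every `cycles_per_hash`
    clock cycles. A fully unrolled, pipelined core would have
    cycles_per_hash == 1.
    """
    if cycles_per_hash <= 0 or instances < 1:
        raise ValueError("cycles_per_hash must be > 0 and instances >= 1")
    return clock_mhz * instances / cycles_per_hash


def implied_cycles_per_hash(clock_mhz, measured_mhs):
    """Work backwards from a measured rate to clock cycles spent per hash."""
    if measured_mhs <= 0:
        raise ValueError("measured_mhs must be > 0")
    return clock_mhz / measured_mhs


# Ceiling for one fully unrolled instance at 100 MHz.
ceiling = hashrate_mhs(100)

# If the 2.5 MH/s design were clocked at 100 MHz (assumption),
# this is how many cycles each hash would be taking.
cycles = implied_cycles_per_hash(100, 2.5)

print(f"ceiling: {ceiling} MH/s, implied cycles per hash: {cycles}")
```

Under those assumptions the gap between 2.5 MH/s and the ceiling is the cost of not being able to unroll the whole thing.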
When I hacked together my first FPGA code, I didn't really know/care about timing (and I'm not much wiser today). The suggested speed limit was about 70 MHz, but the Spartan 3E didn't have the flexibility of the Cyclone IV clocking, so I just built 50 and 100 MHz versions, and they were both fine. Sure, you'll get the occasional error, but the overall efficiency is better with a little overclocking. Ztex boards/miners make great use of this idea by adjusting the clock dynamically based on error rate.
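The trade-off can be sketched like this. It is not how the Ztex firmware actually works, just the general idea: count only the hashes that come out right, and pick the clock where that number peaks. The error rates below are invented for illustration, not measurements.

```python
def effective_rate(clock_mhz, error_rate):
    """Useful throughput: raw rate scaled by the fraction of correct results."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    return clock_mhz * (1.0 - error_rate)


def best_clock(measurements):
    """Pick the clock (MHz) with the highest effective rate.

    `measurements` maps clock in MHz to the observed error rate at that clock.
    """
    if not measurements:
        raise ValueError("need at least one measurement")
    return max(measurements, key=lambda f: effective_rate(f, measurements[f]))


# Hypothetical error rates: clean at the suggested limit, a few errors
# with mild overclocking, falling apart when pushed too far.
samples = {70: 0.0, 100: 0.02, 120: 0.30}

for f in sorted(samples):
    print(f, "MHz ->", round(effective_rate(f, samples[f]), 1), "effective")
print("best clock:", best_clock(samples), "MHz")
```

With these made-up numbers, 100 MHz wins even with a 2% error rate, which matches the "a little overclocking is worth it" experience. A real miner would re-measure the error rate continuously and step the clock up or down.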
Offtopic: Keccak is supposed to be really hardware-friendly, so it could be fine even without full unrolling. Opencores has an implementation, but their login system is having some issues...