Bitcoin Forum
Author Topic: FPGA mining - massively parallel looped vs unrolled  (Read 1579 times)
smellymoo (OP)
January 05, 2013, 07:05:45 PM
 #1

I started a thread in the newbie section, but I would rather discuss it in its rightful place.

Before I start: I have no desire to debate ASICs, as that is not the point of the question. I slightly doubt anything will transpire, and even if it does, I would like to talk about FPGAs and not theoretical ASICs.

Now, I am interested in making an array of FPGA chips that mine coins, and everywhere I look people are unrolling and pipelining to achieve 1 or 2 bitcoin mining (2x SHA) cores. What I am wondering is what happens if you go the other way: make a SHA core as small as possible (looped and hand-designed), then repeat it many times in cheaper FPGA chips to make a massively parallel setup instead.

The way I see it, it's a trade-off: the speed of one complete bitcoin hash vs. the number of logic blocks used, capped by the maximum number of logic blocks on the FPGA chip.

A simple example: 2 SHA cores linked up to do 1 bitcoin hash is similar in speed (I think) to 2x one SHA core doing 1 bitcoin hash in 2 steps. The only difference is that the latter can fit on a chip with half as many logic blocks. Or am I wrong?
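To put rough numbers on that example, here is an idealised sketch in Python. It assumes a bitcoin hash costs 128 SHA-256 rounds per nonce (two 64-round compressions), that a core with r round stages in hardware finishes r/128 hashes per clock when kept full, and that looping costs nothing extra. The function names and the "round budget" unit are made up for illustration; they are not taken from any real design.

```python
# Idealised model: no control logic, no mux overhead, same clock for every design.
ROUNDS_PER_SHA256 = 64
SHA256_PASSES = 2  # a bitcoin hash is SHA-256 applied twice
TOTAL_ROUNDS = ROUNDS_PER_SHA256 * SHA256_PASSES  # 128 rounds per nonce


def hashes_per_clock(rounds_in_hardware: int) -> float:
    """Hashes finished per clock by one core with this many round stages in logic."""
    return rounds_in_hardware / TOTAL_ROUNDS


def chip_throughput(round_budget: int, rounds_per_core: int, clock_hz: float) -> float:
    """Hashes per second for a chip that has room for `round_budget` round stages."""
    cores = round_budget // rounds_per_core
    return cores * hashes_per_clock(rounds_per_core) * clock_hz


if __name__ == "__main__":
    clock = 100e6  # hypothetical 100 MHz clock
    big_unrolled = chip_throughput(128, 128, clock)    # one fully unrolled core
    two_half_chips = 2 * chip_throughput(64, 64, clock)  # one SHA core, 2 steps, on 2 small chips
    sea_of_loops = chip_throughput(128, 1, clock)      # 128 single-round looped cores
    print(big_unrolled, two_half_chips, sea_of_loops)
```

In this model all three layouts give the same hashes per second for the same total amount of logic, so the idealised answer is "yes, it is similar". Any real difference has to come from what the model leaves out: control and mux overhead, achievable clock speed, and chip price.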

Any real numbers to argue for or against what I propose?
pieppiep
January 05, 2013, 07:26:42 PM
 #2

Three Spartan LX-150s vs. fifty Spartan LX-9s?
In terms of logic it is 3 * 150 = 450 and 50 * 9 = 450, so the same?
Except that 50 Spartan LX-9s are probably a lot more expensive than 3 Spartan LX-150s.
phr33
January 05, 2013, 07:27:21 PM
 #3

The fully unrolled version is the best use of the FPGA resources. When you start folding the design, you need to add control logic and large muxes that select the combinatorial path for the current round. In FPGAs everything is implemented in LUTs, so all that muxing will steal LUTs from the combinatorial logic for the actual algorithm.

You will reduce the number of registers needed, but that's not the scarcest resource. At least not with any normal FPGA fabric...
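A rough way to see the effect of that muxing, as a sketch only: charge each folded core a per-round LUT overhead for muxes and control, and compare hashes per clock per LUT. The LUT figures below are placeholders picked for illustration, not measurements from any real bitstream, and the function names are hypothetical.

```python
TOTAL_ROUNDS = 128  # two 64-round SHA-256 compressions per bitcoin hash


def luts_per_core(rounds_in_hardware: int, luts_per_round: int, mux_luts_per_round: int) -> int:
    """LUTs used by one core; folded cores pay extra for muxes and control logic."""
    folded = rounds_in_hardware < TOTAL_ROUNDS
    overhead = mux_luts_per_round * rounds_in_hardware if folded else 0
    return rounds_in_hardware * luts_per_round + overhead


def throughput_per_lut(rounds_in_hardware: int, luts_per_round: int, mux_luts_per_round: int) -> float:
    """Hashes per clock per LUT, assuming the core is kept full every cycle."""
    hashes_per_clock = rounds_in_hardware / TOTAL_ROUNDS
    return hashes_per_clock / luts_per_core(rounds_in_hardware, luts_per_round, mux_luts_per_round)


if __name__ == "__main__":
    LUTS_PER_ROUND = 1000  # placeholder, not a measured value
    MUX_LUTS = 200         # placeholder, not a measured value
    unrolled = throughput_per_lut(128, LUTS_PER_ROUND, MUX_LUTS)
    looped = throughput_per_lut(1, LUTS_PER_ROUND, MUX_LUTS)
    print(f"looped / unrolled efficiency: {looped / unrolled:.3f}")
```

With zero mux overhead the two come out equal. Any nonzero overhead makes the folded core worse per LUT, by the ratio luts_per_round / (luts_per_round + mux_luts_per_round). The model ignores registers and clock speed.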


EDIT:

And if we take FPGA pricing into consideration, I think you would still go for the fully unrolled design. I think the FPGA with the most LUTs per dollar will fit such a design...

My BTC input: 1GAtPwoTGPQ35y9QugJueum5GzaEzLYjiQ
My GPG ID: B0CCFD4A
2112
January 05, 2013, 07:37:34 PM
 #4

any real numbers to argue for or against what I propose?
Check out the posts by bitfury. He was/is offering such a design for sale.

https://bitcointalk.org/index.php?topic=83332.0

http://www.bitfury.org/bitfury110.html

http://www.bitfury.org/xc6slx150.html  <- the planahead porn is here

Briefly: almost everyone is offering unrolled designs because the rolled sea-of-hashers designs hit some worst cases in the synthesis/place/route toolchains: the tools either fail to converge or converge to shamefully bad implementations. Any practical implementation would have to use some sort of workaround for the toolchain's lack of convergence.

I believe that the most recent/fastest bitstreams from ngzhang are also closed-source because of the effort he had to expend to successfully implement them. The default Xilinx ISE wasn't cutting it anymore.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0