Bitcoin Forum
November 14, 2024, 01:49:48 AM *
Author Topic: Algorithmically placed FPGA miner: 255MH/s/chip, supports all known boards  (Read 119439 times)
sadpandatech
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500



View Profile
May 22, 2012, 06:02:55 PM
Last edit: May 22, 2012, 08:01:57 PM by sadpandatech
 #241


ur math, check it. that would be 468.75MH/s assuming it is linear to the MHZ. And assuming they could cool it enough and the chip could handle the juice to keep error rate low.

? (300MH/s / 240MHz) * 300MHz = 375MH/s

aye, my math check it. i did  (375MH/s / 240)* 300Mhz for the fail

edit; fixed my broken quote..
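
The corrected scaling is easy to check in a few lines of Python. This is only a sketch of the arithmetic in the posts above: it assumes, as the posts do, that hash rate scales linearly with clock frequency, and the function name is made up for illustration.

```python
def scale_hashrate(base_mhs: float, base_mhz: float, target_mhz: float) -> float:
    """Linearly scale a hash rate (MH/s) from one clock (MHz) to another."""
    return base_mhs / base_mhz * target_mhz

# Correct: start from the 300 MH/s figure at 240 MHz.
correct = scale_hashrate(300, 240, 300)

# The slip: the already-scaled 375 MH/s was used as the base instead.
slip = scale_hashrate(375, 240, 300)

print(f"correct: {correct} MH/s, slip: {slip} MH/s")
```

Whether the chip actually stays linear at 300 MHz (cooling, error rate) is a separate question that this arithmetic says nothing about.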

If you're not excited by the idea of being an early adopter 'now', then you should come back in three or four years and either tell us "Told you it'd never work!" or join what should, by then, be a much more stable and easier-to-use system.
- GA

It is being worked on by smart people.  -DamienBlack
BTCurious
Hero Member
*****
Offline Offline

Activity: 714
Merit: 504


^SEM img of Si wafer edge, scanned 2012-3-12.


View Profile
May 22, 2012, 07:12:37 PM
 #242


ur math, check it. that would be 468.75MH/s assuming it is linear to the MHZ. And assuming they could cool it enough and the chip could handle the juice to keep error rate low.

? (300MH/s / 240MHz) * 300MHz = 375MH/s

aye, my math check it. i did  (375MH/s / 240)* 300Mhz for the fail
:) But yeah, they only state that it "worked".

antirack
Hero Member
*****
Offline Offline

Activity: 489
Merit: 500

Immersionist


View Profile
May 23, 2012, 12:44:34 AM
 #243

on the personal front, i suggest stop doing any effort on these pipelined architecture.

wha chew talkin' bout, Willis?

It sounds to me that he is saying several Spartan 6 each doing a small part of the work in parallel would do a better job than the same number of Spartan 6 each doing their own thing.

rjk
Sr. Member
****
Offline Offline

Activity: 448
Merit: 250


1ngldh


View Profile
May 23, 2012, 12:48:57 AM
 #244

on the personal front, i suggest stop doing any effort on these pipelined architecture.

wha chew talkin' bout, Willis?

It sounds to me that he is saying several Spartan 6 each doing a small part of the work in parallel would do a better job than the same number of Spartan 6 each doing their own thing.


I think what he means is that partially rolled hashers would be faster than fully unrolled hashers, on this architecture. It coincides with the info from http://bitfury.org/

Mining Rig Extraordinaire - the Trenton BPX6806 18-slot PCIe backplane [PICS] Dead project is dead, all hail the coming of the mighty ASIC!
kano
Legendary
*
Offline Offline

Activity: 4620
Merit: 1851


Linux since 1997 RedHat 4


View Profile
May 23, 2012, 01:53:26 AM
 #245

Well, the device itself (GPU, FPGA) does 2 SHA-256 passes of 64 rounds each.
However, there is a VERY simple optimisation that removes 8 of those rounds
(4 at the beginning of the 1st and 4 at the end of the 2nd), which is done by GPUs (and most FPGAs?).
That is a 6.25% gain (8 of 128 rounds) that is not available with this at all,
so you also need to subtract 6.25% from any gain.
Maybe that is what he is referring to?
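
For reference, the 6.25% figure follows directly from the round counts given in this post. A minimal sketch, taking the "4 + 4 skipped rounds" claim at face value (the constant names are just for illustration):

```python
ROUNDS_PER_PASS = 64      # rounds in one SHA-256 pass
PASSES = 2                # Bitcoin hashes the header twice
SKIPPED_ROUNDS = 4 + 4    # 4 at the start of pass 1, 4 at the end of pass 2 (as claimed above)

total_rounds = ROUNDS_PER_PASS * PASSES
saving = SKIPPED_ROUNDS / total_rounds

print(f"{SKIPPED_ROUNDS} of {total_rounds} rounds skipped = {saving:.2%}")
```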

Pool: https://kano.is - low 0.5% fee PPLNS 3 Days - Most reliable Solo with ONLY 0.5% fee   Bitcointalk thread: Forum
Discord support invite at https://kano.is/ Majority developer of the ckpool code - k for kano
The ONLY active original developer of cgminer. Original master git: https://github.com/kanoi/cgminer
TheSeven
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500


FPGA Mining LLC


View Profile WWW
May 23, 2012, 03:48:06 AM
 #246

Well, the device itself (GPU, FPGA) does 2 sha256 rounds of 64 in length.
However, there is a VERY simple optimisation of that to remove 8 rounds
(4 at the beginning of the 1st and 4 at the end of the 2nd) that is done by GPUs (and most? FPGAs?)
(i.e. a 6.25% gain) that is not available with this at all.
(so you also need to subtract 6.25% from any gain)
Maybe that is what he is referring to?

That is already "subtracted" from the results, and apparently both the MH/s and MH/J are still better for the rolled version. This is most likely due to the Spartan6's awful long distance routing fabric, which means that keeping things very close to each other pays off (which is one reason why 85 small, 64-clocks-per-hash cores together are faster than just three 2-clocks-per-hash cores, you can just clock them at much higher frequencies, and you can utilize more area on the chip).
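
The core counts in this post can be turned into a rough hashes-per-clock comparison. This is a back-of-the-envelope sketch using only the numbers quoted above (85 cores at 64 clocks per hash, 3 cores at 2 clocks per hash); it ignores error rates and any control overhead, and the function name is hypothetical.

```python
def hashes_per_clock(cores: int, clocks_per_hash: int) -> float:
    """Aggregate hashes completed per clock cycle across all cores."""
    return cores / clocks_per_hash

rolled = hashes_per_clock(85, 64)    # many small rolled cores
unrolled = hashes_per_clock(3, 2)    # three 2-clocks-per-hash cores

# How much faster the rolled cores must be clocked just to break even.
breakeven_clock_ratio = unrolled / rolled

print(f"rolled:   {rolled:.3f} hashes/clock")
print(f"unrolled: {unrolled:.3f} hashes/clock")
print(f"rolled design needs a {breakeven_clock_ratio:.3f}x clock to match")
```

So per clock the rolled design does less work; the claimed advantage has to come from running it at a clock more than about 13% higher.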

My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
DiabloD3
Legendary
*
Offline Offline

Activity: 1162
Merit: 1000


DiabloMiner author


View Profile WWW
May 23, 2012, 03:02:30 PM
 #247

Well, the device itself (GPU, FPGA) does 2 sha256 rounds of 64 in length.
However, there is a VERY simple optimisation of that to remove 8 rounds
(4 at the beginning of the 1st and 4 at the end of the 2nd) that is done by GPUs (and most? FPGAs?)
(i.e. a 6.25% gain) that is not available with this at all.
(so you also need to subtract 6.25% from any gain)
Maybe that is what he is referring to?

That is already "subtracted" from the results, and apparently both the MH/s and MH/J are still better for the rolled version. This is most likely due to the Spartan6's awful long distance routing fabric, which means that keeping things very close to each other pays off (which is one reason why 85 small, 64-clocks-per-hash cores together are faster than just three 2-clocks-per-hash cores, you can just clock them at much higher frequencies, and you can utilize more area on the chip).

That's an interesting hack. That's exactly the same reason why GPUs unroll the entire thing: so that values held in registers stay in registers instead of being pushed back to local or global RAM.

2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1073



View Profile
May 23, 2012, 11:18:43 PM
 #248

This is most likely due to the Spartan6's awful long distance routing fabric,
This isn't Spartan's fault. This is a property of any modern FPGA: most of the delay and energy loss occurs in the routing fabric. So the easiest way to speed up the design is to minimize the demand on routing resources.

I was always perplexed why everyone here was focusing on unrolling the combinatorial logic. After gaining some experience with the currently available EDA tool suites for FPGA it became obvious: they make the place and route of repetitive designs very difficult.

The "sea of tight hashers" approach will probably be also beneficial for the future ASIC designs, although not by such a wide margin.

Does anyone know if bitfury's design stores the SHA-256 constants in BRAMs or has them spread over through the SLICEs?

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
Inspector 2211
Sr. Member
****
Offline Offline

Activity: 448
Merit: 250



View Profile
May 23, 2012, 11:34:05 PM
 #249

This is most likely due to the Spartan6's awful long distance routing fabric,
This isn't Spartan's fault. This is a property of any modern FPGA: most of the delay and energy loss occurs in the routing fabric. So the easiest way to speed up the design is to minimize the demand on routing resources.

I was always perplexed why everyone here was focusing on unrolling the combinatorial logic. After gaining some experience with the currently available EDA tool suites for FPGA it became obvious: they make the place and route of repetitive designs very difficult.

The "sea of tight hashers" approach will probably be also beneficial for the future ASIC designs, although not by such a wide margin.

Does anyone know if bitfury's design stores the SHA-256 constants in BRAMs or has them spread over through the SLICEs?

In a completely unrolled design, there are no long lines.
The start vector is fed in on the left side, then the calculations percolate down to the right, and at the right a "matching" circuit determines if a "golden nonce" was found. There is no feedback from the right side to the left side.
Thus, while I do think that Bitfury's approach is EASIER (as one only has to worry about a few hundred wires and their associated delays, and not tens of thousands), I fail to see why it is inherently faster. I don't think it is inherently faster.
Maybe the Xilinx router goofs up wires that would be short and local and sends them the long way, like a crooked cab driver taking an out-of-town tourist. But, to reiterate, a fully unrolled miner does not involve a feedback from the right side to the left side.
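
As a software analogue of the feed-forward flow described above (start vector in, double hash, "matching" check at the end), here is a minimal Python sketch. It is not any poster's design: the 76-byte prefix plus 4-byte little-endian nonce layout and the function names are simplifying assumptions, and the target used in the usage line is deliberately easy.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as Bitcoin does for block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def find_golden_nonce(header_prefix: bytes, target: int, max_nonce: int = 1 << 20):
    """Try nonces in order; return the first whose hash is below target, else None."""
    for nonce in range(max_nonce):
        digest = double_sha256(header_prefix + struct.pack("<I", nonce))
        # The "matching" step: the digest is compared as a little-endian integer.
        if int.from_bytes(digest, "little") < target:
            return nonce
    return None

# Usage with a toy prefix and a very easy target (top 8 bits zero).
print(find_golden_nonce(bytes(76), 1 << 248))
```

Nothing in the loop feeds a result back into an earlier step, which is the property the post is pointing at.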

TheSeven
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500


FPGA Mining LLC


View Profile WWW
May 23, 2012, 11:39:23 PM
 #250

This is most likely due to the Spartan6's awful long distance routing fabric,
This isn't Spartan's fault. This is a property of any modern FPGA: most of the delay and energy loss occurs in the routing fabric.

That one has a particularly bad routing fabric though. Virtex, Kintex or even Artix are all much better.

And as pointed out above already, most of your other claims don't really apply here. Especially for ASICs, I think a pipelined design is likely to perform better for several reasons. The only downside that I can think of right now is that a sea-of-small-cores approach has much better damage containment properties, thus increasing yield.

DiabloD3
Legendary
*
Offline Offline

Activity: 1162
Merit: 1000


DiabloMiner author


View Profile WWW
May 23, 2012, 11:45:07 PM
 #251

This is most likely due to the Spartan6's awful long distance routing fabric,
This isn't Spartan's fault. This is a property of any modern FPGA: most of the delay and energy loss occurs in the routing fabric. So the easiest way to speed up the design is to minimize the demand on routing resources.

I was always perplexed why everyone here was focusing on unrolling the combinatorial logic. After gaining some experience with the currently available EDA tool suites for FPGA it became obvious: they make the place and route of repetitive designs very difficult.

The "sea of tight hashers" approach will probably be also beneficial for the future ASIC designs, although not by such a wide margin.

Does anyone know if bitfury's design stores the SHA-256 constants in BRAMs or has them spread over through the SLICEs?

In a completely unrolled design, there are no long lines.
The start vector is fed in on the left side, then the calculations percolate down to the right, and at the right a "matching" circuit determines if a "golden nonce" was found. There is no feedback from the right side to the left side.
Thus, while I do think that Bitfury's approach is EASIER (as one only has to worry about a few hundred wires and their associated delays, and not tens of thousands), I fail to see why it is inherently faster. I don't think it is inherently faster.
Maybe the Xilinx router goofs up wires that would be short and local and sends them the long way, like a crooked cab driver taking an out-of-town tourist. But, to reiterate, a fully unrolled miner does not involve a feedback from the right side to the left side.

There's a small difference, though. There technically is enough room to fit 2 full hashes on a Spartan 6, but due to how the leftover space is arranged, it probably will never fit (so eldentyrell fit 1 and a half). However, a shitload of tiny rolled engines would easily fit into weirdly shaped unused space. I think someone did the math and said they're almost at the equiv of 2 full hashes.

bitfury
Sr. Member
****
Offline Offline

Activity: 266
Merit: 251


View Profile
May 23, 2012, 11:53:36 PM
 #252

This is most likely due to the Spartan6's awful long distance routing fabric,
This isn't Spartan's fault. This is a property of any modern FPGA: most of the delay and energy loss occurs in the routing fabric. So the easiest way to speed up the design is to minimize the demand on routing resources.

I was always perplexed why everyone here was focusing on unrolling the combinatorial logic. After gaining some experience with the currently available EDA tool suites for FPGA it became obvious: they make the place and route of repetitive designs very difficult.

The "sea of tight hashers" approach will probably be also beneficial for the future ASIC designs, although not by such a wide margin.

Does anyone know if bitfury's design stores the SHA-256 constants in BRAMs or has them spread over through the SLICEs?

You have all the clues... Switch on your head and just guess from the data you have: the screenshot from PlanAhead, and I certify that it is the correct one... Try placing some BRAM and watch your timings... Why would you ask, then?

With the routing fabric it is the same... Open FPGA Editor and start placing routes manually; understand how QUAD, DOUBLE and SINGLE routes work within Spartan, what a switch-to-switch hop costs, what a switch-to-logic entry costs, etc. It is interesting, believe me :-) The biggest pity, however, is that the P&R tool is far from ideal: the fewer routing resources are left, the worse the design it produces. In SHA-256 the round expander kills routing, as taking w[0], w[1] and w[9] requires a lot of routing, because you are basically pulling data from N rounds behind... so you basically put in either SRL or BRAM to do that... near end of game... However, if you work really hard on it, Spartan has barely enough resources just to route these parallel rounds, provided you find the right placement scheme to make better use of the vertical and horizontal interconnect. Also, the interconnect works in one direction only, so if the rounds are placed in a smart way, you'll get more efficient use of routing resources (i.e. A,B  <---> C,D while A --> C and B <--- D are interconnected and placed into the same regions).
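
For readers who do not have the "round expander" in their head: it is the SHA-256 message schedule from FIPS 180-4. A minimal Python sketch follows (function names are illustrative). Each new word depends on words 16, 15, 7 and 2 positions back, which in a 16-word window numbered w0..w15 are exactly w0, w1, w9 and w14.

```python
MASK32 = 0xFFFFFFFF

def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def sigma0(x: int) -> int:
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def sigma1(x: int) -> int:
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def expand(block_words):
    """Expand the 16 words of one 512-bit block into the 64-word schedule."""
    w = list(block_words)
    for t in range(16, 64):
        # Taps reach 16, 15, 7 and 2 words back: w0, w1, w9, w14 of the window.
        w.append((w[t - 16] + sigma0(w[t - 15]) + w[t - 7] + sigma1(w[t - 2])) & MASK32)
    return w

schedule = expand([1] + [0] * 15)
print([hex(v) for v in schedule[16:20]])
```

In a rolled core these taps are reads from a small shift register or memory; in an unrolled pipeline every one of them is a 32-bit bus that has to be carried across several stages, which is where the routing pressure comes from.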

So I really respect the author's work of fitting 1.5 parallel rounds into Spartan 6; it is tough and very nice work. And Spartan is probably showing its bad temper in error rates. With rolled rounds you only get single-round failures; with unrolled rounds, if some part of the chip fails more frequently than the rest, you get higher performance degradation. In my experience during debug runs, it starts to degrade from the central slices out to the peripheral ones as you raise clocks. It would be interesting indeed to see whether the design's performance actually matches the performance that the tools display.

Finally, I would say that implementing an FPGA design is mostly about placement and routing... Do not even start trying it if you are not prepared to waste weeks figuring all of these things out, or else use only simple designs, where your clocks are about 2-3 times lower than the chip's maximums... designs @ 50 - 100 MHz would be easy....
DiabloD3
Legendary
*
Offline Offline

Activity: 1162
Merit: 1000


DiabloMiner author


View Profile WWW
May 24, 2012, 12:01:38 AM
 #253

Finally, I would say that implementing an FPGA design is mostly about placement and routing... Do not even start trying it if you are not prepared to waste weeks figuring all of these things out, or else use only simple designs, where your clocks are about 2-3 times lower than the chip's maximums... designs @ 50 - 100 MHz would be easy....

I completely agree. I currently have the most optimized OpenCL kernel for GPUs out there, and the most recent version took me 2 weeks of 6-8 hour a day fiddling to get it done, after 1+ year of working on previous versions.

FPGA design is about 2-3 times harder.

Inspector 2211
Sr. Member
****
Offline Offline

Activity: 448
Merit: 250



View Profile
May 24, 2012, 12:08:05 AM
 #254

In SHA-256 the round expander kills routing, as taking w[0], w[1] and w[9] requires a lot of routing, because you are basically pulling data from N rounds behind...

Oh yeah, I totally forgot about that.
Now you got me almost convinced that such a sea of small blocks is the better way to do it.
Live and learn...

bitfury
Sr. Member
****
Offline Offline

Activity: 266
Merit: 251


View Profile
May 24, 2012, 12:13:56 AM
 #255

In a completely unrolled design, there are no long lines.
The start vector is fed in on the left side, then the calculations percolate down to the right, and at the right a "matching" circuit determines if a "golden nonce" was found. There is no feedback from the right side to the left side.
Thus, while I do think that Bitfury's approach is EASIER (as one only has to worry about a few hundred wires and their associated delays, and not tens of thousands), I fail to see why it is inherently faster. I don't think it is inherently faster.
Maybe the Xilinx router goofs up wires that would be short and local and sends them the long way, like a crooked cab driver taking an out-of-town tourist. But, to reiterate, a fully unrolled miner does not involve a feedback from the right side to the left side.

TheSeven said it correctly: Spartan routing resources are ugly, with no handy BENTQUADs etc... plus 50% of the slices being SLICEX adds to the problems. With Artix my highest expectation is 2x Spartan... but I am afraid to make such predictions, because I've heard that on 28 nm chips there are even more problems with power distribution... I do not want to cause trouble again, like having an estimate of 500 MH/s per chip, then a target of 400 MH/s, and finishing with 300 MH/s.

About "there are no long lines": I've already commented, but I will try to draw exactly where the epic fail for the parallel expander is....

say computing w0+w1 and feeding to w9:

                                        ---+---------------------------------
                                   ---+---------------------------------
                              ---+--------------------------------
                          ---+-------------------------------
                     ---+------------------------------
                ---+-----------------------------
           ---+----------------------------
      ---+----------------------------
 ---+---------------------------
w0 w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 w11 w12 w13 w14 w15 w16

How many wires? The biggest cross-section just for that? 9x32 bits :-)
The same happens when pushing w9 to w16... and w14 to w16...
I am too lazy to calculate it, but it is near a 512-bit cross-section...

And in Spartan-6 it is difficult to pass more than a 256-bit cross-section the long way within a height of 8 slices
(there are 32 QUAD routes per switch, so 256 bits would use up the QUAD routes in the horizontal case for a height of 8 slices).

Then what will happen: it will fall back to DOUBLE routes and spread wide outside your round expander area,
slowing down the interconnect for other parts of the design....
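
The numbers in the drawing can be put side by side. This sketch only restates the figures from this post (9 stages spanned, 32-bit words, 32 QUAD routes per switch, 8 slices of height); they are not checked against Xilinx documentation, and the assumption of one switch per slice row is simply what the post's own arithmetic implies.

```python
WORD_BITS = 32
STAGES_SPANNED = 9             # the w0+w1 term travels until it meets w9 (per the drawing)
QUAD_ROUTES_PER_SWITCH = 32    # figure quoted in the post
REGION_HEIGHT_SLICES = 8       # height of the expander region, per the post

demand_bits = STAGES_SPANNED * WORD_BITS
capacity_bits = QUAD_ROUTES_PER_SWITCH * REGION_HEIGHT_SLICES
overflow_bits = demand_bits - capacity_bits

print(f"demand {demand_bits} bits vs QUAD capacity {capacity_bits} bits "
      f"-> {overflow_bits} bits spill onto slower routes")
```

That is for the w0+w1 term alone; the w9 and w14 taps add to the same congestion.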

I started with that :-( Plus it is a question how this design would survive the reality that SHA-256 is a VERY TOUGH TEST for bit error rates: even small, infrequent errors are amplified by the avalanche effect through the rounds.
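
The avalanche effect is easy to see in software. The sketch below flips a single input bit of an all-zero 80-byte dummy header and counts how many of the 256 output bits change. It is only an analogy: a hardware fault would flip a bit of internal state mid-round rather than an input bit, but the diffusion through the remaining rounds behaves the same way.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")

header = bytearray(80)          # dummy all-zero header, for illustration only
good = double_sha256(bytes(header))

header[0] ^= 0x01               # simulate a single-bit error
bad = double_sha256(bytes(header))

flipped = bit_difference(good, bad)
print(f"{flipped} of 256 output bits differ")
```

Roughly half the output bits change, so a miner with even an occasional bit error produces results that are simply wrong, not "slightly off".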

With the unrolled rounds themselves, however, it is true: no problem there, it works like a charm... An unrolled design is also more compact than a rolled one... and a rolled design within 240 slices is very difficult... even 248 would be easier, as in 240 I had to fight for each register and reuse parts of the logic to do other things... In my design the rounds only look similar; in reality there are 3 kinds of rounds with special cases, and they are different.

PS. You answered before I had written my post... Anyway, I think this will be helpful for those who try parallel rounds... With ASICs it will make the same mess, BTW :) lots of wires for the round expander :) + lots of clock problems.

PPS. So getting a quick and dense parallel design is a tough task; that's why I respect this work!

2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1073



View Profile
May 24, 2012, 01:16:49 AM
 #256

You have all the clues... Switch on your head and just guess from the data you have: the screenshot from PlanAhead, and I certify that it is the correct one... Try placing some BRAM and watch your timings... Why would you ask, then?
I'm asking because I'm not fully up-to-speed on possible space-time tradeoffs on the current Xilinx platforms. When I worked on them professionally we had the information about the routing and bitstream format available directly from Xilinx (maybe under NDA, I'm not sure, it was years ago).

I also remember the comments from a poster who implemented the bitcoin hashers on Virtex-6; the quick-and-dirty solution there was to use DSP48s for some fraction of the adders in the SHA-256 mixing steps.

In theory at least it should be possible to fill every BRAM with multiple copies of the constants and use those constants at least in those hashing cells that are close to the BRAMs. As far as I understand your design, you currently have just one class/macro of hashing cell, but have plans to implement another class/macro to fill out the space that currently remains unused.

Overall, I'll venture to guess that the ultimate Spartan-6 bitstream will use the sea-of-hashers concept and the hashers will be a heterogeneous mixture: close-to-DSP48, close-to-BRAM and far-from-DSP-and-BRAM. I occasionally talk to my friends who do digital design and they always mention "don't leave any FPGA resource unused, even at the expense of partially mangling the original algorithm".

I guess the ultimate way to express all the above is that the design-space tradeoffs form a multidimensional space of clock-freq*number-of-gates*time-to-market.

DiabloD3
Legendary
*
Offline Offline

Activity: 1162
Merit: 1000


DiabloMiner author


View Profile WWW
May 24, 2012, 01:27:10 AM
 #257

You have all the clues... Switch on your head and just guess from the data you have: the screenshot from PlanAhead, and I certify that it is the correct one... Try placing some BRAM and watch your timings... Why would you ask, then?
I'm asking because I'm not fully up-to-speed on possible space-time tradeoffs on the current Xilinx platforms. When I worked on them professionally we had the information about the routing and bitstream format available directly from Xilinx (maybe under NDA, I'm not sure, it was years ago).

I also remember the comments from a poster who implemented the bitcoin hashers on Virtex-6; the quick-and-dirty solution there was to use DSP48s for some fraction of the adders in the SHA-256 mixing steps.

In theory at least it should be possible to fill every BRAM with multiple copies of the constants and use those constants at least in those hashing cells that are close to the BRAMs. As far as I understand your design, you currently have just one class/macro of hashing cell, but have plans to implement another class/macro to fill out the space that currently remains unused.

Overall, I'll venture to guess that the ultimate Spartan-6 bitstream will use the sea-of-hashers concept and the hashers will be a heterogeneous mixture: close-to-DSP48, close-to-BRAM and far-from-DSP-and-BRAM. I occasionally talk to my friends who do digital design and they always mention "don't leave any FPGA resource unused, even at the expense of partially mangling the original algorithm".

I guess the ultimate way to express all the above is that the design-space tradeoffs form a multidimensional space of clock-freq*number-of-gates*time-to-market.

That's pretty much my analysis of this too. Everything that can lead to faster hashing is on the table, no matter how insane or ugly.

TheSeven
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500


FPGA Mining LLC


View Profile WWW
May 24, 2012, 08:12:36 AM
 #258

There's a small difference, though. There technically is enough room to fit 2 full hashes on a Spartan 6, but due to how the leftover space is arranged, it probably will never fit (so eldentyrell fit 1 and a half). However, a shitload of tiny rolled engines would easily fit into weirdly shaped unused space. I think someone did the math and said they're almost at the equiv of 2 full hashes.

Not quite. Due to additional overhead of each core, it is only equivalent to ~1.3 fully unrolled cores hashes-per-clock wise. What bumps this to >1.5 times the total hashing speed is the higher speed those little cores can run at.

TheSeven
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500


FPGA Mining LLC


View Profile WWW
May 24, 2012, 08:24:32 AM
 #259

With ASICs it will make the same mess, BTW :) lots of wires for the round expander :) + lots of clock problems.

With (fully custom) ASICs, however, you can just match your exact routing needs with wires, which should take care of the routing problems.
I'm certainly not an expert in that area, but I'd expect the overhead of intermediate result storage (in a rolled design) to outweigh the routing overhead (in an unrolled design).
As I stated above already, a rolled design might still be useful to increase yield by containing defects within smaller functional units.

DiabloD3
Legendary
*
Offline Offline

Activity: 1162
Merit: 1000


DiabloMiner author


View Profile WWW
May 24, 2012, 03:01:26 PM
 #260

With ASICs it will make the same mess, BTW :) lots of wires for the round expander :) + lots of clock problems.

With (fully custom) ASICs, however, you can just match your exact routing needs with wires, which should take care of the routing problems.
I'm certainly not an expert in that area, but I'd expect the overhead of intermediate result storage (in a rolled design) to outweigh the routing overhead (in an unrolled design).
As I stated above already, a rolled design might still be useful to increase yield by containing defects within smaller functional units.

That's only if you get a real ASIC. An SASIC still screws you the same way, since it's just a hardwired version of the FPGA.
