Bitcoin Forum
Author Topic: BFL Single and BFL mini-rig seems to have inferior performance  (Read 6689 times)
bitfury (OP)
Sr. Member
****
Offline Offline

Activity: 266
Merit: 251


View Profile
July 01, 2012, 08:42:13 AM
 #1

Hello All!

I've based some of my research on this topic on the previous work of ngzhang,
where he identified the chip: https://bitcointalk.org/index.php?topic=79825.0

Unfortunately I do not know the exact speed grade of the devices used.

In Quartus I used my "prototype" code, the same code I used for the HardCopy IV evaluation
and for other Stratix and Cyclone V devices (a reminder: for Cyclone V
it is possible to get about 320 Mh/s per chip @ 160 MHz @ approx. 6 W).

The same code gave a highest clock of 220 MHz on the EP3SL150F780C4 and 250 MHz on the
EP3SL150F780C3. It is a fully unrolled round calculation.
The clock is taken from the "Slow 1100mV 85C Model Fmax Summary", so with some overvolting
it would run a bit faster (probably about 10%).

Fitter report:

Fitter status: Successful - Sun Jul 01 10:22:07 2012
Quartus II 64-Bit Version: 11.1 Build 173 11/01/2011 SJ Full Version
Revision Name: ALAdder
Top-level Entity Name: sha_s4_test
Family: Stratix III
Device: EP3SL150F780C4
Logic utilization: 86%
  Combinational ALUTs: 86,417 / 113,600 (76%)
  Memory ALUTs: 0 / 56,800 (0%)
  Dedicated logic registers: 85,360 / 113,600 (75%)
Total registers: 85360
Total pins: 7/488 (1%)
Total block memory bits: 198,080 / 5,630,976 (4%)

As you can see, one of the improvements for a Stratix / Cyclone design is to use RAMs. I use them through the altsyncram primitive, as
it lets me implement shift registers with read_during_write_mode_mixed_ports => "DONT_CARE", which
is important because otherwise the memory will be slower (consider this a HINT to BFL: do not use altshift_taps or automated synthesis of shift registers).

PowerPlay estimates about 26214.26 mW and an average toggle rate of 249.704 million transitions / sec (for the 250 MHz clock setup).

So what does this mean for the BFL Single device:

1) Without any overdriving, a device with C4 chips could deliver 220*4 = 880 Mh/s and with C3 chips 250*4 = 1000 Mh/s;
2) Power consumption would be (let's assume that PowerPlay lied and it is 30 W @ 250 MHz @ 1.1 V) ~ 50 W for the C4 chip and 65 W for the C3 chip;
3) If the chip is overdriven to 1.2 V, the C4 chip would deliver about 960 Mh/s and the C3 chip about 1090 Mh/s, with power consumption of about 60 W and 76 W respectively.
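
The hash-rate figures in point 1 are simple arithmetic. A minimal sketch in Python, assuming the factor of 4 comes from two FPGAs per Single, each holding two fully unrolled SHA-256 pipelines that finish one hash per clock (the function name and parameters are illustrative, not anything from BFL or Quartus):

```python
def single_hashrate_mhs(clock_mhz, chips=2, pipelines_per_chip=2):
    """Estimated Mh/s for one device: one hash per pipeline per clock cycle.

    chips and pipelines_per_chip are assumptions used to explain the
    factor of 4 in the post; they are not measured values.
    """
    return clock_mhz * chips * pipelines_per_chip


# Fmax values reported by the fitter for each speed grade.
for grade, fmax_mhz in (("C4", 220), ("C3", 250)):
    print(grade, single_hashrate_mhs(fmax_mhz), "Mh/s")
```

The power figures in points 2 and 3 are left out on purpose, as they include a guessed margin on top of the PowerPlay estimate.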

But since they already consume about 80 W, I conclude that the C3 chip is used, but that the top-level logic and round math are inferior and suboptimal. Basically, you can get _lower_ performance just by doing operations in the wrong order.

I've already tried to contact BFL by PM regarding my development and future ASIC deployments, but got no answer. Maybe this topic will add some heat.

But I have a few questions here:
1. What exactly is the chip speed grade?

2. What voltage is used (this can probably be measured by many owners of BFL Singles)? Is it the standard 1.1 V or something like 1.2 V?

It would be nice if BFL made a full disclosure here about their previous products, as their ASIC initiative seems to make them obsolete already.

Also, to BFL: if you use the same top-level for your ASIC development, don't you think you may end up with a product that is "obsolete on arrival"? This is not "custom IC cell design", this is just math, and not a complex part of it: the test file "sha_s4_test.vhd" is actually pretty small, mostly RTL-style code that does not use low-level primitives. I ran the fitter and synthesis without optimization settings. But if you can't deliver the best in top-level optimization, why would I believe that you would in low-level, where things are more complex and you'd likely have to do full-wave simulations of your custom cells and still have several re-spins? Or in layout, because layout done by automatic tools could destroy all the harvested performance.

(To DiabloD3 and those who think that custom IC is always that difficult: NO. I've studied this more. If you don't try to harvest performance, and build a reliable cell without caring much about performance, it is _likely_ that your cell will work. You may even implement a "portable" cell that works on different fabs, but it would be quite inefficient. Custom IC may actually be even cheaper, because basically it is the same as a PCB but on silicon: if you design to accept a wide tolerance in your transistors, you get good manufacturability and good portability but poor performance. Surprisingly, tools for custom IC design without extensive modelling are actually cheaper. For example www.tannereda.com is a pretty nice tool to go from schematic to chip layout; you can even get an evaluation there for free and try to lay out several transistors yourself, testing their performance in SPICE. That would, however, likely be far from the specs you get from silicon.)

I would be sorry if you have worked on the ASIC for a long time and already have a layout, because it seems you'll have to redo it if my estimates about your top-level are right.

Regards,
BitFury.

PS. To BFL fans: please do not turn BFL into a religion :-) There's a mining speculation subforum for exactly that purpose.

PPS. As you see, _only_ 75% of the chip is used. In the extra space, serial hashers approx. 2 times bigger could be fitted in addition, as a design using automated placement would fit into 90% of the chip, leaving about 10%. In that 10% it is possible to place about 8 serial hashers running at the same or a faster (LIKELY FASTER) clock. Each hasher would output an additional 3.5 Mh/s, so +25-28 Mh/s per chip and +50-56 Mh/s per BFL Single. That sets the best theoretical output at 1140 Mh/s per BFL Single.
Dargo
Legendary
*
Offline Offline

Activity: 1820
Merit: 1000


View Profile
July 01, 2012, 01:51:46 PM
 #2

Well, this is a pretty thinly veiled piece of BFL-bashing.
BlackPrapor
Hero Member
*****
Offline Offline

Activity: 628
Merit: 504



View Profile WWW
July 01, 2012, 02:11:46 PM
 #3

Maybe he realized that there is no way to compete with BFL and wants to get a job there, by showing how 'smart' he is. Otherwise, he'd just make something better himself Smiley

There is no place like 127.0.0.1
In blockchain we trust
bitfury (OP)
Sr. Member
****
Offline Offline

Activity: 266
Merit: 251


View Profile
July 01, 2012, 02:48:56 PM
 #4

Maybe he realized that there is no way to compete with BFL and wants to get a job there, by showing how 'smart' he is. Otherwise, he'd just make something better himself Smiley

Well, I don't need a job with them, but I may of course sell them a solution. Right now they have more than enough money to simply buy some know-how about SHA-256. And looking at their hash rates I clearly see that they need it :-)

As for the Spartan solution: it is already the fastest, and with yohan's board prices combined with my bitstream, the price would be $0.53 / Mh/s for the FPGA (1200 Mh/s for $640). This already beats BFL prices. If my per-Spartan licensing were applied ($25 per chip), it would be 1200 Mh/s for $740, or $0.616 / Mh/s, which again beats BFL. But yohan prefers 840 Mh/s :-) Meanwhile, our own capabilities do not allow us to deploy solutions quickly and cheaply.
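
The two price points can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name is made up for illustration, and reading the $100 difference between the totals as 4 chips at $25 each is an assumption, not something stated in the thread.

```python
def usd_per_mhs(total_price_usd, hashrate_mhs):
    """Price per Mh/s for a board at a given total price and hash rate."""
    return total_price_usd / hashrate_mhs


bundled = usd_per_mhs(640, 1200)    # board plus bitstream
licensed = usd_per_mhs(740, 1200)   # board plus $25-per-chip licensing
print(f"bundled:  ${bundled:.4f} / Mh/s")
print(f"licensed: ${licensed:.4f} / Mh/s")
```

The $0.53 and $0.616 figures quoted above are these same ratios cut off after two and three decimals.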

For an ASIC solution it is tougher, because that sets the stakes higher. And I don't want to end up competing with a phantom, as happened with the 20 W / 1050 Mh/s Single... Trying like crazy to get 500 Mh/s from a single Spartan :-) And then seeing that there was no magic, just some marketing fraud. I think they did it intentionally as well, to make others spend time trying to compete with something they can't :-) And then simply say "oops", it is 80 W and not 20 W :-) Sorry, this is something I won't forget :-) Quite happy that mini-rig was done differently. :-)
kakobrekla
Hero Member
*****
Offline Offline

Activity: 714
Merit: 500


Psi laju, karavani prolaze.


View Profile
July 01, 2012, 03:10:41 PM
 #5

Maybe he realized that there is no way to compete with BFL and wants to get a job there, by showing how 'smart' he is. Otherwise, he'd just make something better himself Smiley
Quite happy that mini-rig was done differently. :-)

Well, it was cut to half to get it together.

bitfury (OP)
Sr. Member
****
Offline Offline

Activity: 266
Merit: 251


View Profile
July 01, 2012, 03:41:30 PM
 #6

Maybe he realized that there is no way to compete with BFL and wants to get a job there, by showing how 'smart' he is. Otherwise, he'd just make something better himself Smiley
Quite happy that mini-rig was done differently. :-)

Well, it was cut to half to get it together.

Well. wrong point. To clarify it further - look @ transactions of BFL:

http://blockchain.info/address/1JuZT3sBomuzcFjQvVTLdXM97U6wCvazJR

Today there are 7'500 BTC left (of a total 112k BTC turnover).

I would gladly accept that 7'500 BTC and sell them know-how that would make their solution faster by 10% to 20%.

This deal is effective for next 24 hours, otherwise terms may be reviewed.

What does this give? If they would otherwise deliver something in the range of 30 Th/s to 50 Th/s, they could instead deliver 35 Th/s to 55 Th/s. That effectively increases the gain for all their customers and, if they wish, for all buyers of mini-rigs and Singles.

If they wish, in return for their payment I could make a full disclosure here (if they don't fear other ASIC and FPGA developers). But I won't do it on my own, as this is a small head-start edge over their tech. Of course this is not an exclusive transfer of IP and does not impose any condition that it won't be used in any other ASIC or FPGA solution. An exclusive deal will not be possible, as I would have to build into its price the potential revenues forecast for the next 5 years, since it would be quite crazy to compete with this solution.

ice_chill
Sr. Member
****
Offline Offline

Activity: 336
Merit: 250


View Profile
July 01, 2012, 04:46:20 PM
 #7

Interesting post, but FPGA is already dead, based on the fact that people are canceling their FPGA orders and converting them to ASIC. 6 months ago this post would be much more interesting, when FPGA was in the headlines.
Raoul Duke
aka psy
Legendary
*
Offline Offline

Activity: 1372
Merit: 1002



View Profile
July 01, 2012, 05:01:50 PM
 #8


Well. wrong point. To clarify it further - look @ transactions of BFL:

http://blockchain.info/address/1JuZT3sBomuzcFjQvVTLdXM97U6wCvazJR

Today there are 7'500 BTC left (of a total 112k BTC turnover).


That address belongs to bit-pay, not to BFL, so, you can't say for sure if that payment was for them or not.
Also, the only transaction to that address today was of 3.500 BTC, not 7.500 like you say.
Look at the first transaction that ever went to it, 1 year ago http://blockchain.info/tx-index/1018671/3f53678ae6f8d51f004705aae9dff75c62e924fe34fb26c04c869f861fade274

You don't seem as smart as you say you are, dude. Roll Eyes
nedbert9
Sr. Member
****
Offline Offline

Activity: 252
Merit: 250

Inactive


View Profile
July 01, 2012, 05:43:06 PM
 #9


Well. wrong point. To clarify it further - look @ transactions of BFL:

http://blockchain.info/address/1JuZT3sBomuzcFjQvVTLdXM97U6wCvazJR

Today there are 7'500 BTC left (of a total 112k BTC turnover).


That address belongs to bit-pay, not to BFL, so, you can't say for sure if that payment was for them or not.
Also, the only transaction to that address today was of 3.500 BTC, not 7.500 like you say.
Look at the first transaction that ever went to it, 1 year ago http://blockchain.info/tx-index/1018671/3f53678ae6f8d51f004705aae9dff75c62e924fe34fb26c04c869f861fade274

You don't seem as smart as you say you are, dude. Roll Eyes


I think the point was the huge spike in tx activity on 6/23 forward.  Plausible.

As for the last part.  Could you build the rack, cooling, PCB's and bitstreams they did?  No?  Ok, then.
bitfury (OP)
Sr. Member
****
Offline Offline

Activity: 266
Merit: 251


View Profile
July 01, 2012, 06:02:56 PM
 #10

Interesting post, but FPGA is already dead, based on the fact that people are canceling their FPGA orders and converting them to ASIC. 6 months ago this post would be much more interesting, when FPGA was in the headlines.

But this is not about fpga, but about math in rounds (top-level vhd or verilog design RTL or primitives - that does not matter). So it applies to ASICs as well as to FPGAs... If they had it - would you think they would intentionally cripple their BFL single and mini-rig ?
Raoul Duke
aka psy
Legendary
*
Offline Offline

Activity: 1372
Merit: 1002



View Profile
July 01, 2012, 06:10:11 PM
 #11


Well. wrong point. To clarify it further - look @ transactions of BFL:

http://blockchain.info/address/1JuZT3sBomuzcFjQvVTLdXM97U6wCvazJR

Today there are 7'500 BTC left (of a total 112k BTC turnover).


That address belongs to bit-pay, not to BFL, so, you can't say for sure if that payment was for them or not.
Also, the only transaction to that address today was of 3.500 BTC, not 7.500 like you say.
Look at the first transaction that ever went to it, 1 year ago http://blockchain.info/tx-index/1018671/3f53678ae6f8d51f004705aae9dff75c62e924fe34fb26c04c869f861fade274

You don't seem as smart as you say you are, dude. Roll Eyes


I think the point was the huge spike in tx activity on 6/23 forward.  Plausible.


What's plausible is that it's a Bit-pay consolidation address. Just look at the exact sums that get sent to the address.
Bit-pay converts the USD price to BTC each time a payment is made, so those exact amounts are impossible. Also, they use a different address for each transaction.
What you're seeing there is one of bit-pay's cold storage addresses. Of course some of those coins are from BFL sales, but that's not a BFL address.
jothan
Full Member
***
Offline Offline

Activity: 184
Merit: 100


Feel the coffee, be the coffee.


View Profile
July 01, 2012, 06:23:19 PM
 #12

With a desktop power supply powering my 6 singles and a Dreamplug, I get about 50 Watts per single at the wall.

It must be the default power adapter that eats up a whole 30 Watts to itself.

Bitcoin: the only currency you can store directly into your brain.

What this planet needs is a good 0.0005 BTC US nickel.
bitfury (OP)
Sr. Member
****
Offline Offline

Activity: 266
Merit: 251


View Profile
July 01, 2012, 06:37:16 PM
 #13

With a desktop power supply powering my 6 singles and a Dreamplug, I get about 50 Watts per single at the wall.

It must be the default power adapter that eats up a whole 30 Watts to itself.

Thanks for the report, but could you measure with a multimeter the voltage and current consumed by the board? That is really interesting. And possibly the core voltage?

Thanks again!!!
nedbert9
Sr. Member
****
Offline Offline

Activity: 252
Merit: 250

Inactive


View Profile
July 01, 2012, 07:00:43 PM
 #14


Well. wrong point. To clarify it further - look @ transactions of BFL:

http://blockchain.info/address/1JuZT3sBomuzcFjQvVTLdXM97U6wCvazJR

Today there are 7'500 BTC left (of a total 112k BTC turnover).


That address belongs to bit-pay, not to BFL, so, you can't say for sure if that payment was for them or not.
Also, the only transaction to that address today was of 3.500 BTC, not 7.500 like you say.
Look at the first transaction that ever went to it, 1 year ago http://blockchain.info/tx-index/1018671/3f53678ae6f8d51f004705aae9dff75c62e924fe34fb26c04c869f861fade274

You don't seem as smart as you say you are, dude. Roll Eyes


I think the point was the huge spike in tx activity on 6/23 forward.  Plausible.


What's plausible is that it's a Bit-pay consolidation address. Just look at the exact sums that get sent to the address.
Bit-pay converts the USD price to BTC each time a payment is made, so those exact amounts are impossible. Also, they use a different address for each transaction.
What you're seeing there is one of bit-pay's cold storage addresses. Of course some of those coins are from BFL sales, but that's not a BFL address.


Agreed.  I don't know what bit-pay's volume is, but I can't imagine that the volume from 6/23 forward is coincidental.
hm
Member
**
Offline Offline

Activity: 107
Merit: 10


View Profile
July 01, 2012, 07:19:01 PM
 #15

IIUC, with EP3SL150F780C4/C3 the clock rate (MHz) to hash rate (Mh/s) ratio seems to be the same (different power consumption).
Will the ratio be more than 1Mh/s per 1MHz with newer generation FPGA, or will they instead just have more MHz?

sorry, it seems I don't yet understand enough to ask the right questions...
bitfury (OP)
Sr. Member
****
Offline Offline

Activity: 266
Merit: 251


View Profile
July 01, 2012, 07:34:03 PM
 #16

IIUC, with EP3SL150F780C4/C3 the clock rate (MHz) to hash rate (Mh/s) ratio seems to be the same (different power consumption).
Will the ratio be more than 1Mh/s per 1MHz with newer generation FPGA, or will they instead just have more MHz?

sorry, it seems I don't yet understand enough to ask the right questions...

It depends on the architecture. For the Spartan sea-of-hashers, for example, it is not. Mh/s depends on how many small cores are placed inside, so the ratio can be fractional.

For example, in the Spartan design 61 clock cycles are used for calculation and 4 cycles for loading a new job, 65 clocks in total. With 82 cores you get a hash rate of <clock> * 82 / 65 Mh/s.

For the Altera solution I've put 2 unrolled SHA-256 cores into the chip, so the output is 2 * <clock> Mh/s.
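
The two formulas above can be written out as a short Python sketch. The function names and the sample clocks are hypothetical and chosen only for illustration; the core count and cycle counts are the ones given in this post.

```python
def sea_of_hashers_mhs(clock_mhz, cores=82, calc_cycles=61, load_cycles=4):
    """Spartan-style design: each small core finishes one hash every
    (calc_cycles + load_cycles) clocks, so the Mh/s-per-MHz ratio is
    cores / 65 and need not be a whole number."""
    return clock_mhz * cores / (calc_cycles + load_cycles)


def unrolled_mhs(clock_mhz, pipelines=2):
    """Fully unrolled design: each pipeline finishes one hash per clock."""
    return clock_mhz * pipelines


# Sample clocks below are illustrative assumptions, not measured values.
print(sea_of_hashers_mhs(200))   # Mh/s for a sea-of-hashers chip at 200 MHz
print(unrolled_mhs(250))         # Mh/s for two unrolled pipelines at 250 MHz
```

So the ratio is 82/65 (about 1.26) Mh/s per MHz for the sea-of-hashers layout and exactly 2 Mh/s per MHz for the two-pipeline unrolled layout.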
this time
Newbie
*
Offline Offline

Activity: 55
Merit: 0


View Profile
July 01, 2012, 08:16:41 PM
 #17

Maybe he realized that there is no way to compete with BFL and wants to get a job there, by showing how 'smart' he is. Otherwise, he'd just make something better himself Smiley


As for the Spartan solution: it is already the fastest, and with yohan's board prices combined with my bitstream, the price would be $0.53 / Mh/s for the FPGA (1200 Mh/s for $640). This already beats BFL prices. If my per-Spartan licensing were applied ($25 per chip), it would be 1200 Mh/s for $740, or $0.616 / Mh/s, which again beats BFL. But yohan prefers 840 Mh/s :-) Meanwhile, our own capabilities do not allow us to deploy solutions quickly and cheaply.


If you can do this, why not use the tricone model? Publish a bitstream for cairnsmore that extracts a commission and go straight to the end user. Every single person would use it. You could direct 200mh/s? to yourself and make your goal of $25/chip in no time. There are hundreds of these ready to go.
GernMiester
Sr. Member
****
Offline Offline

Activity: 285
Merit: 250


View Profile
July 01, 2012, 08:36:36 PM
 #18

Do you own the hardware? If you don't have at least 2 of each for testing go shit in your hat.
What's your product's performance? NO SIMULATIONS either.

bitfury (OP)
Sr. Member
****
Offline Offline

Activity: 266
Merit: 251


View Profile
July 01, 2012, 08:46:41 PM
 #19

Maybe he realized that there is no way to compete with BFL and wants to get a job there, by showing how 'smart' he is. Otherwise, he'd just make something better himself Smiley


As for the Spartan solution: it is already the fastest, and with yohan's board prices combined with my bitstream, the price would be $0.53 / Mh/s for the FPGA (1200 Mh/s for $640). This already beats BFL prices. If my per-Spartan licensing were applied ($25 per chip), it would be 1200 Mh/s for $740, or $0.616 / Mh/s, which again beats BFL. But yohan prefers 840 Mh/s :-) Meanwhile, our own capabilities do not allow us to deploy solutions quickly and cheaply.


If you can do this, why not use the tricone model? Publish a bitstream for cairnsmore that extracts a commission and go straight to the end user. Every single person would use it. You could direct 200mh/s? to yourself and make your goal of $25/chip in no time. There are hundreds of these ready to go.

With the tricone model an unrolled round design is necessary, because otherwise it is nearly impossible to protect the design. For me this means starting another design from scratch for the unrolled round. I already had ideas for how to fit 2 full unrolled rounds there at a clock of approx. 150-160 MHz (300 - 320 Mh/s), plus a small number of small hashers (4-5), each giving approx. 4 Mh/s. So the total is about 316 Mh/s to 340 Mh/s, which is more than the sea of hashers, with a power envelope of only 8 W instead of 12 W. This design would also beat the Stratix approach of the BFL mini-rig, as Spartans would consume about 820 W from the plug for 25 Gh/s, compared to 1.25 kW for the mini-rig. And it will work on all boards as well. But this is due to a solution I found for the round expander, which needs wide buses with a dual-clock design.
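
The 316 to 340 Mh/s range follows directly from the numbers in this paragraph. A minimal Python sketch, with a made-up function name, assuming each unrolled round pipeline yields one hash per clock and each small hasher adds a flat 4 Mh/s:

```python
def hybrid_mhs(clock_mhz, small_hashers, unrolled=2, small_hasher_mhs=4):
    """Unrolled pipelines (one hash per clock each) plus a few small
    serial hashers filling the leftover area."""
    return unrolled * clock_mhz + small_hashers * small_hasher_mhs


low = hybrid_mhs(150, small_hashers=4)    # lower bound of the estimate
high = hybrid_mhs(160, small_hashers=5)   # upper bound of the estimate
print(low, high)
```

The plug-power comparison (820 W against 1.25 kW) is not reproduced here, since it depends on supply and board losses that the post does not break down.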

Plus another big part of work is testing for compatibility with every board around - I need these boards.

On the hardware question: about 655 Spartans are installed and mining with the current bitstream (one full rack and one 5/6 filled rack).
So yes, verified in hardware of course, not in software. The BFL estimate is based only on simulations though, so it may fail due to power problems... but that is not likely, as these conditions do not push the chips to their limits at all, compared with 12 W on a Spartan device.
hm
Member
**
Offline Offline

Activity: 107
Merit: 10


View Profile
July 01, 2012, 10:28:08 PM
 #20

IIUC, with EP3SL150F780C4/C3 the clock rate (MHz) to hash rate (Mh/s) ratio seems to be the same (different power consumption).
Will the ratio be more than 1Mh/s per 1MHz with newer generation FPGA, or will they instead just have more MHz?

sorry, it seems I don't yet understand enough to ask the right questions...

It depends on the architecture. For the Spartan sea-of-hashers, for example, it is not. Mh/s depends on how many small cores are placed inside, so the ratio can be fractional.

For example, in the Spartan design 61 clock cycles are used for calculation and 4 cycles for loading a new job, 65 clocks in total. With 82 cores you get a hash rate of <clock> * 82 / 65 Mh/s.

For the Altera solution I've put 2 unrolled SHA-256 cores into the chip, so the output is 2 * <clock> Mh/s.

thank you for the insight.