Bitcoin Forum
December 05, 2016, 08:27:40 AM *
Topic: An estimate of FPGA performance (Read 48715 times)
ArtForz
Sr. Member
December 23, 2010, 10:13:27 AM
 #21

300MHz? On a Spartan3? (rolls eyes)
Oh, and bitcoin hash is TWO rounds of sha256.
I just synthesized it, 60MHz max for one core on a -5 speed grade S3E-500.
So NOT
300MHz / 80 clocks/hash * 3 cores = 11MHps
instead (assuming we can lose overhead and just have to do a mid-add and a compare)
60MHz / 130 clocks/hash * 3 cores = 1.4MHps

at $20/chip that's 0.07MH/$, or about 25x worse than an HD5970...
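The arithmetic above is clock ÷ clocks-per-hash × cores for MH/s, then MH/s ÷ chip price for MH/$. A minimal Python sketch that reproduces it from the post's own figures (the function names are illustrative, not from any existing tool):

```python
def mhash_per_sec(clock_mhz: float, clocks_per_hash: float, cores: int) -> float:
    """Hash rate in MH/s for `cores` identical cores, each finishing
    one hash every `clocks_per_hash` clocks."""
    return clock_mhz / clocks_per_hash * cores


def mhash_per_dollar(mhps: float, price_usd: float) -> float:
    """Cost efficiency in MH/s per dollar of hardware."""
    return mhps / price_usd


optimistic = mhash_per_sec(300, 80, 3)  # the original 300 MHz guess
realistic = mhash_per_sec(60, 130, 3)   # 60 MHz as synthesized on an S3E-500
print(f"optimistic: {optimistic:.2f} MH/s")                               # 11.25
print(f"realistic:  {realistic:.2f} MH/s")                                # 1.38
print(f"at $20/chip: {mhash_per_dollar(realistic, 20):.3f} MH/s per $")   # 0.069
```

The post rounds 1.38 MH/s up to 1.4 before dividing by $20, which gives the quoted 0.07 MH/$.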

and as for "GPU needs mainboard": an FPGA needs a PCB, VRMs, config memory, some kind of host connection, ...

So yeah, pull a few crazy numbers out of your ass and FPGAs look decent.

bitcoin: 1Fb77Xq5ePFER8GtKRn2KDbDTVpJKfKmpz
i0coin: jNdvyvd6v6gV3kVJLD7HsB5ZwHyHwAkfdw
mimarob
Member
December 23, 2010, 08:33:45 PM
 #22

60 MHz

Hmm, yes, I discovered that myself today... really...

The numbers were thought of as the maximum possible, with all the luck you can have.

Unfortunately 11 MHash/sec is not too impressive either...

But then one has to remember that this is not the final implementation; it isn't even runnable as it is.

Please correct me if I'm wrong, but I thought the maximum clock inside the Spartan was about 300 MHz?

Also I'm confused about the hash definition, do we define the bitcoin hash as two regular hashes?

Another thing that really puzzles me is the nonce: will it always be at offset 12 and never be more than 32 bits?

It's amazing that we have so many knowledgeable people on this board.


bitcoin address: 15swBLKathoPyX94HgptYXSSqf7SUGhG4z
lfm
Full Member
December 25, 2010, 02:40:13 AM
 #23


Quote
Also I'm confused about the hash definition, do we define the bitcoin hash as two regular hashes?

Yes, Satoshi defined it that way in the original implementation: sha256(sha256(block header)).
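A minimal Python sketch of that definition using the standard `hashlib` module. As a sanity check it hashes the well-known genesis block header; the digest is printed byte-reversed because that is how block hashes are conventionally displayed. The helper name `bitcoin_hash` is my own:

```python
import hashlib


def bitcoin_hash(header: bytes) -> bytes:
    """sha256(sha256(header)) of an 80-byte block header."""
    if len(header) != 80:
        raise ValueError("a block header is exactly 80 bytes")
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()


# Genesis block header: version, prev hash, merkle root, time, bits, nonce
# (all fields little-endian, as serialized by the client).
GENESIS_HEADER = bytes.fromhex(
    "01000000"
    + "00" * 32
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"
    + "29ab5f49"
    + "ffff001d"
    + "1dac2b7c"
)

digest = bitcoin_hash(GENESIS_HEADER)
print(digest[::-1].hex())
# 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
```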

Quote
Another thing that really puzzles me is the nonce: will it always be at offset 12 and never be more than 32 bits?


Well, yes, it is at offset 12 within the second 64-byte chunk of the first hash, i.e. offset 76 of the 80-byte block header.

Yes, it will always be 32 bits.
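To see where offset 76 comes from, here is a sketch that serializes the six header fields with Python's `struct` module (little-endian, no padding), using the genesis block's values. `pack_header` is a hypothetical helper. Because the nonce is packed as an unsigned 32-bit field, `struct.pack` raises `struct.error` for any value above 2**32 - 1:

```python
import struct

# version, prev_hash, merkle_root, time, bits, nonce; "<" = little-endian, no padding
HEADER_FORMAT = "<I32s32sIII"
NONCE_OFFSET = 4 + 32 + 32 + 4 + 4  # = 76
SHA256_BLOCK = 64                   # SHA-256 consumes 64-byte (512-bit) chunks


def pack_header(version, prev_hash, merkle_root, timestamp, bits, nonce):
    """Hypothetical helper: serialize the six header fields into 80 bytes."""
    return struct.pack(HEADER_FORMAT, version, prev_hash, merkle_root,
                       timestamp, bits, nonce)


# Genesis block values
header = pack_header(
    1,
    bytes(32),
    bytes.fromhex("3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"),
    1231006505,
    0x1D00FFFF,
    2083236893,
)

chunk, offset = divmod(NONCE_OFFSET, SHA256_BLOCK)
print(len(header), chunk, offset)                          # 80 1 12
print(struct.unpack_from("<I", header, NONCE_OFFSET)[0])   # 2083236893
```

`chunk == 1, offset == 12` is the "offset 12 in the second part of the first hash" from the post.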
mike_la_jolla
Newbie
December 27, 2010, 07:48:52 PM
 #24

mike_la_jolla checking in here to clarify some FPGA questions.

- DNDPB_S327:  http://www.dinigroup.com/new/DNDPB_S327.html
List price is $19,680 for quantity 1.

- This is probably a much better choice:  DNBFC_S12_PCIe: http://www.dinigroup.com/new/DNBFC_S12_PCIe.html
List price for quantity 1 is $8,950.  We sell thousands of these to do (spooky) things.  We can fit 12 in a single chassis.

- 300 MHz is probably not achievable for Spartan-6 or Cyclone 3.  With some effort by an expert, assume you can get to 200 MHz or so.  Don't bother with the 'C' to FPGA methodologies.  You'll need someone who is well versed in VHDL/Verilog.  Also, you generally can't get to 100% utilization without breaking the tools.

- Any FPGA solution will require a host.  The DNDPB_S327 connects via Ethernet, so it has low data throughput.  The DNBFC_S12_PCIe is GEN1/GEN2 PCIe, so its bandwidth is much higher.

- Those of you that think you can do a custom ASIC are nuts.  The expense and effort of an ASIC would cost millions ($USD).  The Genomic search market isn't even large enough to support a custom ASIC.

- If this is a pure code breaking application, you are probably better off with FPGAs than GPUs, but it is very easy to gang together a few Xboxes.  FPGAs are harder to come by.
bitcoin2
Jr. Member
December 27, 2010, 08:44:55 PM
 #25

Quote
mike_la_jolla checking in here to clarify some FPGA questions.

- Those of you that think you can do a custom ASIC are nuts.  The expense and effort of an ASIC would cost millions ($USD).  The Genomic search market isn't even large enough to support a custom ASIC.

The EFF built Deep Crack for less than $250,000, and I thought they made it with custom ASIC DES chips (called Deep Crack or AWT-4500): http://en.wikipedia.org/wiki/Deep_crack.

How can you help? A full implementation would be great :) I would give you 150 BTC for a miner implementation (VHDL or Verilog) on Spartan-6. Maybe there are other users who would donate. You should put your Bitcoin address in your signature.
adulau
Newbie
December 27, 2010, 10:35:39 PM
 #26


Quote
- If this is a pure code breaking application, you are probably better off with FPGAs than GPUs, but it is very easy to gang together a few Xboxes.  FPGAs are harder to come by.

Looking at the price of the FPGA plus the design of the custom board,
why not go for one (or more) Nvidia Tesla C2050/C2070 boards?
(The price is around 3,000 USD per board for 448 GPU cores.)

http://www.nvidia.com/docs/IO/43395/BD-04983-001_v04.pdf
http://www.nvidia.com/object/product_tesla_C2050_C2070_us.html

jgarzik
Legendary
December 28, 2010, 12:09:23 AM
 #27

NVIDIA is geared towards floating point, while bitcoin's SHA256 algorithm wants integer math.

ATI GPUs are better at this.

Jeff Garzik, bitcoin core dev team and BitPay engineer; opinions are my own, not my employer.
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
adulau
Newbie
December 28, 2010, 02:03:02 PM
 #28

Quote
NVIDIA is geared towards floating point, while bitcoin's SHA256 algorithm wants integer math.

ATI GPUs are better at this.

You may be right; I don't know the instruction sets of each GPU brand/type well.

The instructions usually used in SHA-256 implementations (IMHO in all the SHA-2 implementations, as they use the same
scheme and only the word size differs) are the bitwise operators (AND, OR, NOT and XOR) on 32-bit words,
the right shift instruction, and also the rotate right/left instructions.
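For reference, here are the SHA-256 helper functions defined in the FIPS 180-4 standard, sketched in Python (the function names are my own). They use only AND, XOR, NOT, right shifts and rotates on 32-bit words; `rotr` is written as two shifts plus an OR, which is roughly what a rotate costs on hardware without a native rotate instruction:

```python
MASK32 = 0xFFFFFFFF  # keep every result within a 32-bit word


def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits (0 < n < 32): two shifts and an OR."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def ch(e: int, f: int, g: int) -> int:
    """'Choose': take f's bit where e is 1, g's bit where e is 0."""
    return ((e & f) ^ (~e & g)) & MASK32


def maj(a: int, b: int, c: int) -> int:
    """'Majority': each output bit is the majority vote of a, b and c."""
    return (a & b) ^ (a & c) ^ (b & c)


def big_sigma0(a: int) -> int:    # applied to A in every round
    return rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)


def big_sigma1(e: int) -> int:    # applied to E in every round
    return rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)


def small_sigma0(x: int) -> int:  # message schedule; note the plain right shift
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def small_sigma1(x: int) -> int:  # message schedule
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)
```

The round itself also needs 32-bit modular additions, which are not in this list of bitwise operations.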

A comparison of the cycles required for all these instructions per platform type (FPGA, GPU, Cell-like or other
SIMD) could be useful. I don't know if someone in the forum has already made one, along with a rough
estimate of the cost per technology.

On the other hand, building something for SHA-2 that can be reused for other projects
relying on SHA-2 is not a waste of time/money.

If you or someone else build something in that scope, I will be willing to invest some time
and money in the project.




mike_la_jolla
Newbie
December 28, 2010, 05:53:05 PM
 #29

Quote
The EFF built Deep Crack for less than $250,000, and I thought they made it with custom ASIC DES chips (called Deep Crack or AWT-4500): http://en.wikipedia.org/wiki/Deep_crack.
That appears to have been 1998.  You might be able to do it for a few hundred thousand dollars, but you would start with FPGAs and then hardwire.
ArtForz
Sr. Member
December 28, 2010, 06:25:22 PM
 #30

The real issue on FPGA isn't the logic ops (cheap) or the rotates (pretty much free), but the 32-bit adds.
A_out = H + s0 + s1 + maj + ch + K + W
-> at least a 3-level adder tree: ((H + s0) + (s1 + maj)) + ((ch + K) + W)
Carry chain delay in a single 32-bit adder on a -3 speed grade Spartan6 is ~2ns; three of them in series is ~6ns, so without ANY routing delays we're already limited to ~166MHz.
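The adder-tree and timing argument can be sketched in Python. The operand values are arbitrary placeholders, and the ~2 ns carry-chain figure is the post's estimate for a -3 Spartan-6 with routing ignored; the pairing order matches ((H + s0) + (s1 + maj)) + ((ch + K) + W):

```python
import math
from typing import List, Tuple

MASK32 = 0xFFFFFFFF
ADDER_DELAY_NS = 2.0  # post's estimate: one 32-bit carry chain, -3 Spartan-6, no routing


def adder_tree(terms: List[int]) -> Tuple[int, int]:
    """Add 32-bit terms pairwise, one tree level at a time.

    Returns (sum mod 2**32, number of adder levels)."""
    levels = 0
    while len(terms) > 1:
        nxt = [(terms[i] + terms[i + 1]) & MASK32 for i in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:  # an odd operand passes straight to the next level
            nxt.append(terms[-1])
        terms = nxt
        levels += 1
    return terms[0], levels


# H, s0, s1, maj, ch, K, W (arbitrary placeholder values)
operands = [0x5BE0CD19, 0x12345678, 0x9ABCDEF0, 0x0F0F0F0F,
            0xF0F0F0F0, 0x428A2F98, 0xDEADBEEF]
total, levels = adder_tree(operands)
min_levels = math.ceil(math.log2(len(operands)))  # lower bound for 2-input adders
fmax_mhz = 1000 / (levels * ADDER_DELAY_NS)
print(levels, min_levels, int(fmax_mhz))  # 3 3 166
```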
Real-world you're lucky to get 80MHz out of a non-pipelined round on a -3 S6
Pipelining a round to 2 or 3 stages helps, but increases FF usage a LOT (you have to carry 256 bits of A..H, 512 bits of W[0..15] and the initial A..H for the final add around).
2-stage gives ~140MHz on a -3, 3-stage ~180MHz
= a 2-stage pipelined sha256 round is ~1k FFs, 3-stage pipelined ~1.5k FFs
XC6SLX150 has something like 160k FFs available, and the synthesis tools pretty much throw speed out the window once you go >70% FF utilization.
so realistically you MIGHT be able to fit 64 2-pipelined rounds of sha256 on a LX150, 2 clocks/bitcoinhash @ 140MHz -> 70Mh/s
or maybe with lots of luck and sacrificing a chicken to the place and route gods 48 rounds 3-stage @ 180MHz -> 68Mh/s
= 70Mh/s on a -3 speed grade XC6SLX150, 20%-30% less on a -2 speed grade.
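A quick check of those throughput figures, assuming (as the post's "2 clocks/bitcoinhash" implies) that the first 64-byte header chunk is precomputed as a midstate, so each bitcoin hash costs two 64-round compressions. The flip-flop check only covers the FF budget quoted above, not LUTs or routing; the names are illustrative:

```python
ROUNDS_PER_BITCOIN_HASH = 2 * 64  # two SHA-256 compressions per hash (midstate assumed)
FFS_AVAILABLE = 160_000           # "something like 160k FFs" on an XC6SLX150


def unrolled_mhps(rounds_in_silicon: int, clock_mhz: float) -> float:
    """MH/s for a fully pipelined unit that accepts a new input every clock."""
    clocks_per_hash = ROUNDS_PER_BITCOIN_HASH / rounds_in_silicon
    return clock_mhz / clocks_per_hash


def ff_utilization(rounds_in_silicon: int, ffs_per_round: int) -> float:
    """Fraction of the chip's flip-flops used by the round logic alone."""
    return rounds_in_silicon * ffs_per_round / FFS_AVAILABLE


for rounds, ffs_per_round, clock in [(64, 1_000, 140), (48, 1_500, 180)]:
    print(f"{rounds} rounds @ {clock} MHz: {unrolled_mhps(rounds, clock):.1f} MH/s, "
          f"FF use {ff_utilization(rounds, ffs_per_round):.0%}")

print(f"12 chips: {12 * unrolled_mhps(64, 140):.0f} MH/s")
```

The 12-chip total (840 MH/s) is the "MAYBE 850Mh/s" figure for the 12-FPGA board.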
so 9 grand for MAYBE 850Mh/s... a $500 HD5970 can get >550Mh/s stock, well >600Mh/s OCed at stock voltage even on a "bad" card.

okay, let's be REALLY generous: assume we can magically get 1.2Gh/s out of 12 LX150-2s and they consume NO POWER AT ALL.
So how long does it take at 600W for 2 5970s and $0.10/kWh to make up that $8k price difference? 0.6kW @ $0.10/kWh = $1.44/day ... about 15 years.
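The payback estimate is a short calculation; a sketch with the post's assumptions (600 W for two 5970s, $0.10/kWh, a price difference of about $8k, and FPGAs drawing zero power):

```python
def daily_power_cost(kw: float, usd_per_kwh: float) -> float:
    """Electricity cost per day for a constant load."""
    return kw * 24 * usd_per_kwh


def breakeven_years(price_gap_usd: float, kw: float, usd_per_kwh: float) -> float:
    """Years of the GPUs' power bill needed to equal the up-front price gap,
    generously assuming the FPGA boards draw no power at all."""
    return price_gap_usd / daily_power_cost(kw, usd_per_kwh) / 365


print(f"${daily_power_cost(0.6, 0.10):.2f}/day")            # $1.44/day
print(f"{breakeven_years(8_000, 0.6, 0.10):.1f} years")     # 15.2 years
```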

mimarob
Member
December 29, 2010, 10:54:41 PM
 #31

Wow, those are some really impressive calculations.

$9k... too bad xmas is over for this time :-)

Smart idea to pipeline the adders; does it mean you spend more flip-flops but not more gates?

I was thinking of getting an old-fashioned xc3s500 for a reasonable price. At 1k-1.5k flip-flops each, maybe it would be possible to fit one of these 64
pipelined SHA round modules into one chip?

So, if I'm lucky I could get it running at 60-70 MHz, meaning a full SHA would take about 1 µs, and that would give me 0.5 MHash/sec, right?
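A sketch of that estimate for a single iterative core, assuming it reuses one round module 64 times per SHA-256 pass and that a bitcoin hash needs two passes. `clocks_per_round` is a hypothetical parameter: 1 reproduces the 1 µs figure, while a 2-stage pipelined round reused by a single hash (each round depends on the previous one) would need 2 clocks per round, halving the rate unless two hashes are interleaved:

```python
def iterative_core(clock_mhz: float, clocks_per_round: int = 1,
                   rounds: int = 64, passes_per_hash: int = 2):
    """Return (µs per SHA-256 pass, MH/s) for one core reusing a single round module."""
    pass_us = rounds * clocks_per_round / clock_mhz
    mhps = 1 / (pass_us * passes_per_hash)
    return pass_us, mhps


print(iterative_core(64))                      # (1.0, 0.5)
print(iterative_core(64, clocks_per_round=2))  # (2.0, 0.25)
```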

It's almost as fast as my old computer, which runs at 0.7 MHash/s :-)

Jason
Member
December 30, 2010, 05:19:37 PM
 #32

I have an old Altera DE2-70 board I picked up for $300 (academic price) 1.5 years ago (Cyclone II).  Looks like the current model is a DE2-115 based on the Cyclone III FPGA.

I just did a bit of research on existing SHA-256 implementations for FPGAs, and I see that several companies sell high performance FPGA implementations (e.g. http://www.cast-inc.com/ip-cores/encryption/sha-256/index.html).  Taking the Cast implementation as an example:

"The processing of one 512-bit block is performed in 66 clock cycles and the bit-rate achieved is 7.75Mbps / MHz on the input of the SHA256 core."

Taking a clock rate of 132MHz as a reasonably conservative number for my older Cyclone II (Cast claims up to 280MHz on high performance FPGAs), this comes out to 2M blocks/s (2Mhps) per core.  Cast's implementation uses around 2,531 LEs on the Cyclone.  My older DE2-70 board contains about 68,000 LEs.

Adding 10% overhead for communication/synchronization/etc, it should be possible to put 24 SHA-256 processors on my DE2-70.  That should allow up to 48Mhps peak processing rate (>80Mhps for the DE2-115 which can also be clocked faster).
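The per-core and core-count arithmetic, sketched in Python with the post's figures (66 clocks per 512-bit block, 2,531 LEs per core, 68,000 LEs, 10% overhead). Like the post, it counts 512-bit blocks per second; under a two-compressions-per-bitcoin-hash (midstate) assumption, the same cores would deliver roughly half as many bitcoin hashes:

```python
import math

CLOCKS_PER_BLOCK = 66  # Cast core: one 512-bit block per 66 clocks
LES_PER_CORE = 2_531   # Cast core size on Cyclone (post's figure)


def blocks_per_sec_m(clock_mhz: float) -> float:
    """Millions of 512-bit blocks per second for one core."""
    return clock_mhz / CLOCKS_PER_BLOCK


def cores_that_fit(total_les: int, overhead: float = 0.10) -> int:
    """Whole cores that fit after reserving `overhead` of the device for glue logic."""
    return math.floor(total_les * (1 - overhead) / LES_PER_CORE)


per_core = blocks_per_sec_m(132)
cores = cores_that_fit(68_000)
print(f"{512 / CLOCKS_PER_BLOCK:.2f} Mbit/s per MHz")        # 7.76 Mbit/s per MHz
print(f"{per_core:.1f}M blocks/s per core, {cores} cores")   # 2.0M blocks/s per core, 24 cores
print(f"{cores * per_core:.0f}M blocks/s total")             # 48M blocks/s total
```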

Another question:  How much communications bandwidth is needed at these speeds, and can it fit on a 100baseT channel?  Certainly not if we want the host to transfer all of the candidates to be hashed onto the FPGA (48M * 512 bits ≈ 24.6Gbps, well above even gigabit Ethernet speeds).  Is there another approach that can overcome this limitation?  I think so...
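The link-bandwidth figure is candidates per second times bits per candidate. A sketch assuming 48M candidates/s; it shows both 512-bit and 256-bit candidates, since either one is far beyond 100baseT (0.1 Gbps) or gigabit Ethernet:

```python
def link_gbps(candidates_per_sec: float, bits_per_candidate: int) -> float:
    """Host-to-FPGA bandwidth needed to ship every candidate, in Gbit/s."""
    return candidates_per_sec * bits_per_candidate / 1e9


RATE = 48e6  # candidates per second (the 48 Mhps estimate)
for bits in (512, 256):
    need = link_gbps(RATE, bits)
    print(f"{bits}-bit candidates: {need:.1f} Gbps = "
          f"{need / 0.1:.0f}x 100baseT, {need / 1.0:.0f}x gigabit")
```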

FPGAs have room for a dedicated CPU as well as a lot of logic, depending on what level of functionality you need in the CPU.  There are a lot of free and powerful CPU cores available on opencores.org, but it will be hard to beat the Nios II architecture if you are using Altera FPGAs.

A 32-bit Nios II/f CPU core is capable of 140 MIPS of performance (at 125MHz) and uses 1600 LEs on the Cyclone II.  Is this sufficient to keep 24 high-speed SHA-256 blocks from stalling?  Not even close.  In fact, it would probably not even be able to keep even one SHA-256 block from stalling.  Back to the drawing board...
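One way to quantify "not even close" is the CPU's instruction budget per SHA-256 block. A rough sketch using the post's figures (140 MIPS, 2M blocks/s per core), assuming the CPU must touch every block it feeds and that handing over a 512-bit block takes at least 16 32-bit word writes. With 24 cores the budget is under 3 instructions per block; with one core it is 70, before counting any candidate generation or control logic:

```python
MIN_WRITES_PER_BLOCK = 512 // 32  # assumption: 16 word writes to hand over one 512-bit block


def instructions_per_block(cpu_mips: float, blocks_per_sec_m: float) -> float:
    """Instructions the CPU can spend on each SHA-256 block it must supply."""
    return cpu_mips / blocks_per_sec_m


for n_cores in (24, 1):
    budget = instructions_per_block(140, 2.0 * n_cores)
    print(f"{n_cores:2d} core(s): {budget:5.1f} instructions/block "
          f"(word writes alone need {MIN_WRITES_PER_BLOCK})")
```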

It looks like a better approach would be to implement the search logic directly in gates on the FPGA, and have it fill one or more 256-bit-wide queue(s) which would be drawn on by the SHA-256 processing blocks.  A single NIOS II CPU still makes sense for collecting the results and communicating the results back to the host CPU (TCP/IP stack), as well as to load the search logic starting and ending values.
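A software model of that architecture, sketched in Python: the CPU only hands each search unit a start and end nonce, and each unit counts through its range, hashing and comparing against the target without further host traffic. The names `split_nonce_range` and `search` are hypothetical; in hardware each `search` would be a counter feeding a SHA-256 pipeline, not a loop. The comparison treats the digest as a little-endian 256-bit integer, as the bitcoin client does:

```python
import hashlib
import struct
from typing import List, Optional, Tuple

NONCE_SPACE = 2**32


def split_nonce_range(n_units: int) -> List[Tuple[int, int]]:
    """Start/end nonce pairs the CPU would load into each search unit."""
    step = -(-NONCE_SPACE // n_units)  # ceiling division
    return [(i * step, min((i + 1) * step, NONCE_SPACE)) for i in range(n_units)]


def search(header76: bytes, start: int, end: int, target: int) -> Optional[int]:
    """Model of one search unit: try nonces in [start, end) and return the first
    whose double SHA-256 (as a little-endian integer) is <= target, else None."""
    for nonce in range(start, end):
        digest = hashlib.sha256(
            hashlib.sha256(header76 + struct.pack("<I", nonce)).digest()
        ).digest()
        if int.from_bytes(digest, "little") <= target:
            return nonce
    return None


# Usage: the genesis header without its nonce, and the target for bits = 0x1d00ffff.
GENESIS_76 = bytes.fromhex(
    "01000000" + "00" * 32
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"
    + "29ab5f49" + "ffff001d"
)
TARGET = 0xFFFF * 2 ** (8 * (0x1D - 3))
print(split_nonce_range(24)[:2])
print(search(GENESIS_76, 2083236893, 2083236894, TARGET))  # 2083236893
```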

Anyway, my back-of-the-envelope calculations seem to confirm almost everything ArtForz says in post #30.  It looks like the ATI 5970s are the right choice if your goal is to crunch bitcoins.

OTOH, if you want an excuse to learn how to program FPGAs, you will certainly be able to run circles around a state-of-the-art hex-core i7 CPU with a pretty modest FPGA -- but at considerable effort.

Jason

BM-2D7sazxZugpTgqm3M2MCi5C1t8Du8BN11f
mimarob
Member
December 30, 2010, 08:21:32 PM
 #33

FPGA in my case is mainly for fun, but I won't refuse to try a CUDA/OpenCL graphics card either. I'm using about 600 watts on average to keep a building frost-free at the moment...

As I've read in a few threads here, using GPUs isn't totally problem-free either, or is it?


bitcoin2
Jr. Member
December 30, 2010, 08:51:28 PM
 #34

Quote
FPGA in my case is mainly for fun, but I won't refuse to try a CUDA/OpenCL graphics card either. I'm using about 600 watts on average to keep a building frost-free at the moment...

As I've read in a few threads here, using GPUs isn't totally problem-free either, or is it?

One HD5970 needs 300 watts. Put one computer with two HD5970s in your building and you have 600 watts. I don't know if the Windows driver supports two HD5970s at the same time, but Linux should. Of course you need an Internet connection in your building. You need the standard bitcoin client and m0mchil's (or puddinpops') miner: http://bitcointalk.org/index.php?topic=1334.0;all
WSDN
Member
January 01, 2011, 02:12:01 AM
 #35

What's the best OS to run the bitcoin client? NetBSD or OpenBSD?

Bitcoin in spanish http://bitcoins.com.ar/
bitcoin2
Jr. Member
January 01, 2011, 03:22:21 AM
 #36

Quote
What's the best OS to run the bitcoin client? NetBSD or OpenBSD?
The bitcoin client is OS-independent, but the OpenCL driver for mining on ATI Radeon runs only under Windows and Linux (for the Radeon 5970, Linux is recommended because you can't disable CrossFire under Windows). I don't know if there are NetBSD or OpenBSD drivers from ATI. You could take Debian or Ubuntu and install the driver from ATI.
WSDN
Member
January 01, 2011, 05:17:57 AM
 #37

Quote
I don't know if there are NetBSD or OpenBSD drivers from ATI.

As I remember, OpenBSD may not have these drivers, but NetBSD is perfect: it has the most up-to-date drivers too and is bleeding-edge technology.

lucky
Jr. Member
January 01, 2011, 07:35:42 PM
 #38

Quote
I don't know if there are NetBSD or OpenBSD drivers from ATI.

As I remember, OpenBSD may not have these drivers, but NetBSD is perfect: it has the most up-to-date drivers too and is bleeding-edge technology.


It depends on CUDA/OpenCL support in the proprietary ATI/Nvidia drivers.

These are not available on OpenBSD or NetBSD.
WSDN
Member
January 01, 2011, 08:02:06 PM
 #39

Quote
These are not available on OpenBSD or NetBSD.

Yes, that's true, friend! =)

ttul
Member
March 26, 2011, 03:53:06 AM
 #40

Say hypothetically that some mystery vendor releases a new chip capable of mining at 100x the power efficiency of existing cards, for 2x the price of a 5970. Would this mining hardware sell well? How many of you would buy such a magic box?

I understand that the difficulty would adjust to neutralize the increased power introduced by the new technology; however, that difficulty increase would also render the old technology irrelevant and would sort of force everyone to upgrade.
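For reference, the bitcoin client retargets every 2016 blocks, scaling the target by the time those blocks actually took relative to two weeks, with the change clamped to a factor of 4 per period. A simplified sketch (it omits the proof-of-work limit and the compact "bits" encoding; `retarget` is a hypothetical name). The clamp means a sudden 100x jump in hash rate would take several retarget periods to absorb:

```python
TARGET_TIMESPAN = 14 * 24 * 60 * 60  # 2016 blocks x 10 minutes = two weeks, in seconds


def retarget(old_target: int, actual_timespan: int) -> int:
    """Simplified retarget: scale the target by actual/expected time for the last
    2016 blocks, clamping the change to a factor of 4 either way.
    (Omits the proof-of-work limit and the compact 'bits' encoding.)"""
    clamped = min(max(actual_timespan, TARGET_TIMESPAN // 4), TARGET_TIMESPAN * 4)
    return old_target * clamped // TARGET_TIMESPAN


target = 1_000_000  # placeholder; a smaller target means a higher difficulty
print(retarget(target, TARGET_TIMESPAN // 2))    # blocks came 2x fast -> 500000
print(retarget(target, TARGET_TIMESPAN // 100))  # 100x hash rate -> clamped to 250000
```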

GPUs pretty much wiped out CPU mining last year. I wonder whether, with another step up in performance, current-generation GPUs could similarly be completely sidestepped.

Thoughts appreciated...