Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: mimarob on December 19, 2010, 03:39:46 PM



Title: An estimate of fpga performance
Post by: mimarob on December 19, 2010, 03:39:46 PM
Hello!

I converted the sha256 core available on opencores.org into VHDL, just to see what to expect.

I got an xc3s500E about 1/3 full on the logic gates, so with a shoehorn one could maybe fit 3 cores into one of these. That does not include the communication with the host, nor the "less than" compare that checks whether the result is below the current threshold.

(I used an antique ISE (8.something), so maybe newer versions will give better results.)

Assuming a maximum clock of 300 MHz and about 80 cycles to read in, process and output the result (8 + 64 + 8):

You'd get 3 cores * 300 MHz / 80 cycles ~= 11 Mhash/s

This device can be had on a nice DIP socket (GOP modules) at about 60 EUR.

If you want to run this simulation (only tried on Linux), you need the ghdl and gtkwave packages (and probably some more stuff that I forgot).

The tar in the attachment contains the synthesizable file sha256.vhd, the testbench test_sha256.vhd and a simple Makefile.



Title: Re: An estimate of fpga performance
Post by: grondilu on December 19, 2010, 04:10:39 PM
OK, but in order to mine, don't you need a full VHDL implementation of the miner code?
Because if you must communicate between your PC and your FPGA, this might slow it down quite a lot.

Also, I don't understand: have you just done simulations, or have you tried it on the actual device?


Title: Re: An estimate of fpga performance
Post by: bytemaster on December 19, 2010, 05:45:54 PM
How does this compare to a GPU?   


Title: Re: An estimate of fpga performance
Post by: jgarzik on December 19, 2010, 05:46:17 PM
Because if you must communicate between your PC and your FPGA, this might slow it down quite a lot.

As long as the FPGA performs millions of hashes for each "call" (host sends work to FPGA), host<->FPGA communication cost is small.
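A rough sketch of the numbers behind this, assuming the FPGA iterates the 32-bit nonce on-chip so that the host only sends one 80-byte block header per 2^32 nonces (result messages, being rare, are ignored). The function names are hypothetical, chosen for illustration.

```python
NONCE_SPACE = 2 ** 32   # the header nonce is a 32-bit field
HEADER_BITS = 80 * 8    # assumption: one 80-byte header per work unit


def seconds_per_work_unit(hashrate_hps: float) -> float:
    """Time one device needs to try every nonce for a single header."""
    return NONCE_SPACE / hashrate_hps


def host_to_device_bps(hashrate_hps: float) -> float:
    """Average host->device traffic when only headers are sent."""
    return HEADER_BITS / seconds_per_work_unit(hashrate_hps)


if __name__ == "__main__":
    # Hash rates mentioned in this thread: the 11 Mhash/s estimate and a 5970.
    for rate in (11e6, 570e6):
        print(f"{rate / 1e6:5.0f} Mhash/s: new work every "
              f"{seconds_per_work_unit(rate):6.1f} s, "
              f"about {host_to_device_bps(rate):.1f} bit/s from the host")
```

Even at GPU speeds a fresh header is needed only every few seconds, so under these assumptions a slow serial link would be enough.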


Title: Re: An estimate of fpga performance
Post by: bitcoin2 on December 19, 2010, 06:16:21 PM
Nice  :)
A full implementation would be great!


Title: Re: An estimate of fpga performance
Post by: bitcoin2 on December 19, 2010, 06:28:26 PM
How does this compare to a GPU?   


One AMD Radeon 5970 (570 Mhash/s) = ~ 50 * xc3s500E (11 Mhash/s). But with FPGAs the Mhash/W should be better than with GPUs.


Title: Re: An estimate of fpga performance
Post by: jgarzik on December 19, 2010, 06:42:25 PM
One AMD Radeon 5970 (570 Mhash/s) = ~ 50 * xc3s500E (11 Mhash/s). But with FPGAs the Mhash/W should be better than with GPUs.

I'm not sure what Mhash/W is.  But, GPUs are ASIC so they begin with a significant advantage over FPGAs.


Title: Re: An estimate of fpga performance
Post by: bitcoin2 on December 19, 2010, 07:06:52 PM
One AMD Radeon 5970 (570 Mhash/s) = ~ 50 * xc3s500E (11 Mhash/s). But with FPGAs the Mhash/W should be better than with GPUs.

I'm not sure what Mhash/W is.  But, GPUs are ASIC so they begin with a significant advantage over FPGAs.

Mhash/watt. FPGAs should have better power efficiency than GPUs. An ASIC (application-specific integrated circuit) for mining only (like Deep Crack for DES) would be the best option, but it is very expensive to develop (maybe 300,000 USD?).


Title: Re: An estimate of fpga performance
Post by: GeorgeH on December 19, 2010, 08:21:03 PM
By my rough calculations, the highest-end Virtex 5 could hit 40-50 Mh/s. At $3000+ for a PCIe dev board, it is far more cost-effective to buy ATI video cards.

Edit:

Of course, if one were to connect a bunch of these things in parallel, they could make a big dent, for example:
http://www.sciengines.com/copacobana/


Title: Re: An estimate of fpga performance
Post by: mimarob on December 20, 2010, 05:23:58 AM
hello again!

Just to clarify, I did run the program under simulation only, but I also compiled the module into the Xilinx synthesis tool (ISE) just to see how much space it would take in the chip. (I don't even own a spartan fpga :-)

A full implementation... well, I'm just trying to understand the criteria for a found block, not being an expert in cryptography. This will also need to be in hardware, I think, so the FPGA only reports back when it has found something.

I just checked the calculator (http://www.alloscomp.com/bitcoin/calculator.php).
What is the relationship between the "difficulty factor" and the "hash target"? Why do we use two concepts?

I also looked into the bitcoin code, but it seems that the routine I'm trying to accelerate (ScanHash_CryptoPP) only checks for a certain number of zeroes and then returns.

Where is the code that checks if you've found a block? I guess it would only be a simple less-than compare in the hardware.

The code would also need to contain some UART comms or similar. I thought of broadcasting the request to all devices and then daisy-chaining the results back, so that the "winning" device could break the chain and report back to the host computer.



Title: Re: An estimate of fpga performance
Post by: jib on December 20, 2010, 05:41:40 AM
Difficulty ≈ (2^224)/target. They're just two representations of the same thing. To check if you've found a block, you check if the hash is less than the target.
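A minimal Python sketch of the conversion. It assumes the reference client's convention that difficulty 1 corresponds to the target 0xFFFF * 2^208 (compact form 0x1d00ffff), which is why 2^224/target is a close approximation rather than an exact identity. The helper names are illustrative, not taken from the bitcoin source.

```python
# Target for difficulty 1: 0x00000000FFFF0000...0000 (256 bits), i.e. 0xFFFF * 2**208.
DIFF1_TARGET = 0xFFFF << 208


def target_from_bits(bits: int) -> int:
    """Decode the 32-bit compact 'bits' field of a block header.

    Sketch only: the sign bit (0x00800000) is ignored because valid
    targets are never negative.
    """
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def difficulty_from_target(target: int) -> float:
    return DIFF1_TARGET / target


def approx_difficulty(target: int) -> float:
    """The shorthand 2**224 / target; about 0.0015% higher than the exact value."""
    return 2 ** 224 / target


def target_from_difficulty(difficulty: float) -> int:
    return int(DIFF1_TARGET / difficulty)
```

A higher difficulty simply means a proportionally smaller target, which is all the hardware ever needs to see.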


Title: Re: An estimate of fpga performance
Post by: jgarzik on December 20, 2010, 06:07:17 AM
I also checked in to the bitcoin code, but it seems that the routine I'm trying to accelerate (ScanHash_CryptoPP) is only checking for a certain number of zeroes and then returning.

Correct.  The scanner performs a fast-path check, and then a more exhaustive check if the fast-path check exits the scanner loop.


Quote
Where is the code that checks if you've found a block? I guess it would only be a simple less-than compare in the hardware.

See CheckWork().  It is a less-than compare, on an unsigned 256-bit little-endian integer.
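A short sketch of that comparison in software, assuming the hash is the double SHA-256 of the 80-byte header (CheckWork() remains the authoritative version). The reference client rejects a hash only when it is greater than the target, so this sketch accepts equality. Because every valid target is below 2^224, a hardware pre-filter could first check that the most significant 32 bits (the last four digest bytes, little-endian) are zero. The helper names are hypothetical.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Bitcoin's hash: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def top_word_is_zero(digest: bytes) -> bool:
    """Cheap pre-filter: the most significant 32 bits of the little-endian
    256-bit value are the last four digest bytes."""
    return digest[28:32] == b"\x00\x00\x00\x00"


def meets_target(header: bytes, target: int) -> bool:
    """Full check: digest as an unsigned 256-bit little-endian integer,
    rejected only if it is greater than the target."""
    digest = sha256d(header)
    return int.from_bytes(digest, "little") <= target
```

In hardware the pre-filter is a 32-input NOR, and only the rare survivors need the full 256-bit compare.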


Title: Re: An estimate of fpga performance
Post by: romkyns on December 20, 2010, 05:00:24 PM
http://www.dinigroup.com/product/data/DNDPB_S327/images/board_front6.jpg

Drool...

By my own estimates, this thing could generate a block every few hours at the current difficulty. I doubt it would cost less than $25k-$50k though...

(source: http://www.dinigroup.com/new/products.html)


Title: Re: An estimate of fpga performance
Post by: mimarob on December 21, 2010, 02:33:44 PM
Yeah, that seems about right. That Altera board contains 12 times as many 4-input LUTs as the xc3s500 Spartan in the GOP module. 12 x 27 = 324 times the 11 MHash/s in my calcs => entering 3564000 khash/s in the calculator gives you 4 hours per block. Counting 2000 blocks per year at 50 BTC each, you get 100000 BTC, or $25k a year, assuming a moderate difficulty increase.
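The arithmetic behind this estimate, as a sketch: on average a block takes difficulty * 2^32 hashes, and at the time each block paid 50 BTC. The difficulty is left as a parameter because the post does not state which value the calculator used.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
BLOCK_REWARD_BTC = 50  # block subsidy at the time of this thread


def expected_seconds_per_block(difficulty: float, hashrate_hps: float) -> float:
    """Mean time to find a block: difficulty * 2**32 hashes on average."""
    return difficulty * 2 ** 32 / hashrate_hps


def yearly_btc(seconds_per_block: float) -> float:
    """Expected BTC per year at a constant block-finding rate."""
    return SECONDS_PER_YEAR / seconds_per_block * BLOCK_REWARD_BTC


# The post's board estimate: 12 * 27 = 324 times the 11 Mhash/s figure.
BOARD_HASHRATE = 12 * 27 * 11e6  # hash/s, i.e. 3564000 khash/s
```

At exactly 4 hours per block, yearly_btc(4 * 3600) corresponds to 2190 blocks, or 109500 BTC; the post rounds down to 2000 blocks.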

So I guess the graphics cards beat the crap out of the FPGAs. But what about power consumption? Also, the graphics cards need a motherboard, host CPU, etc.

I wonder how far you could optimize the gate count.

Putting a few hundred of these DIP form-factor boards together would also give you a priceless 80's feeling :-)

http://shop.trenz-electronic.de/catalog/product_info.php?products_id=81


Title: Re: An estimate of fpga performance
Post by: bitcoin2 on December 21, 2010, 06:49:11 PM
ArtForz has developed sha256 ASICs and had them (100 pieces) manufactured for about $500/engine. These ASICs beat the 5970 on hash/W by a factor of 6 but lose to the 5970 on hash/$ by about a factor of 3, he said. They are not exactly real standard-cell ASICs but a "metal-layer defined ASIC, basically FPGA without the FP part" (source: #bitcoin-dev).


Title: Re: An estimate of fpga performance
Post by: MoonShadow on December 22, 2010, 05:45:16 PM
ArtForz has developed sha256 ASICs and had them (100 pieces) manufactured for about $500/engine. These ASICs beat the 5970 on hash/W by a factor of 6 but lose to the 5970 on hash/$ by about a factor of 3, he said. They are not exactly real standard-cell ASICs but a "metal-layer defined ASIC, basically FPGA without the FP part" (source: #bitcoin-dev).

What kind of ASIC is it?  Is this a custom PCI card?  Would higher production volumes improve the price point?  I'm interested in this, as a purpose made PCI card would be as big a boon as buying an expensive GPU.


Title: Re: An estimate of fpga performance
Post by: bitcoin2 on December 22, 2010, 07:27:09 PM
ArtForz has developed sha256 ASICs and had them (100 pieces) manufactured for about $500/engine. These ASICs beat the 5970 on hash/W by a factor of 6 but lose to the 5970 on hash/$ by about a factor of 3, he said. They are not exactly real standard-cell ASICs but a "metal-layer defined ASIC, basically FPGA without the FP part" (source: #bitcoin-dev).

What kind of ASIC is it?  Is this a custom PCI card?  Would higher production volumes improve the price point?  I'm interested in this, as a purpose made PCI card would be as big a boon as buying an expensive GPU.

ArtForz expects them to arrive in February:
https://stuff.caurea.org/irssi/freenode/%23bitcoin-dev/2010/12/%23bitcoin-dev-2010-12-20.log : 18:36

Maybe this VHDL code is the first step toward developing an ASIC. I don't believe that ArtForz will give us his code. If we pooled our money, maybe we could afford to have a real ASIC manufactured.


Title: Re: An estimate of fpga performance
Post by: Cdecker on December 22, 2010, 08:28:00 PM
Just for reference again the logs for that moment: http://veritas.maximilianeum.ch/bitcoin/irc/logs/2010/12/20#l2461


Title: Re: An estimate of fpga performance
Post by: GeorgeH on December 23, 2010, 05:57:55 AM
Just for reference again the logs for that moment: http://veritas.maximilianeum.ch/bitcoin/irc/logs/2010/12/20#l2461

Thanks, that was a good read.


Title: Re: An estimate of fpga performance
Post by: grondilu on December 23, 2010, 06:19:14 AM

If some people created a bitcoin-dedicated ASIC, I'd be amazed.  It would be a strong indicator of how involved some people are in the bitcoin project.


Title: Re: An estimate of fpga performance
Post by: ArtForz on December 23, 2010, 10:13:27 AM
300MHz? on a Spartan3? ::)
Oh, and the bitcoin hash is TWO passes of sha256 (sha256 applied twice).
I just synthesized it, 60MHz max for one core on a -5 speed grade S3E-500.
So NOT
300MHz / 80 clocks/hash * 3 cores = 11MHps
instead (assuming we can lose overhead and just have to do a mid-add and a compare)
60MHz / 130 clocks/hash * 3 cores = 1.4MHps
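Both estimates use the same formula, throughput = clock / clocks-per-hash * cores. A sketch reproducing the numbers in this thread (the $20 chip price and ~130 clocks per bitcoin hash are ArtForz's figures):

```python
def mhash_per_s(clock_mhz: float, clocks_per_hash: float, cores: int) -> float:
    """Aggregate throughput of `cores` independent iterative cores."""
    return clock_mhz / clocks_per_hash * cores


optimistic = mhash_per_s(300, 80, 3)  # first post: 300 MHz, 80 clocks per single SHA-256
realistic = mhash_per_s(60, 130, 3)   # synthesized S3E-500: 60 MHz, ~130 clocks per bitcoin hash

if __name__ == "__main__":
    print(f"optimistic: {optimistic:.2f} Mhash/s")
    print(f"realistic:  {realistic:.2f} Mhash/s, "
          f"{realistic / 20:.3f} Mhash/s per $ at $20/chip")
```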

at $20/chip that's 0.07MH/$ or about 25x worse than a HD5970...

and for "GPU needs mainboard".. FPGA needs PCB, VRMs, config memory, some kind of host connection, ...

So yeah, pull a few crazy numbers out of your ass and FPGAs look decent.


Title: Re: An estimate of fpga performance
Post by: mimarob on December 23, 2010, 08:33:45 PM
60 MHz

Hmm, yes, I discovered that myself today... really...

The numbers were thought of as the maximum possible, with all the luck you can have.

Unfortunately, 11 MHash/sec is not too impressive either...

But then one has to remember that this is not the final implementation; it isn't even runnable as it is.

Please correct me if I'm wrong but I thought that the maximum clock inside the spartan was about 300 MHz?

Also I'm confused about the hash definition, do we define the bitcoin hash as two regular hashes?

Another thing that really puzzles me is the nonce: will it always be at offset 12 and never be more than 32 bits?

It's amazing that we have so many knowledgeable people on this board.



Title: Re: An estimate of fpga performance
Post by: lfm on December 25, 2010, 02:40:13 AM

Also I'm confused about the hash definition, do we define the bitcoin hash as two regular hashes?

Satoshi defined it in the original implementation, yes. sha256(sha256(block header))

Quote
Another thing that really puzzles me is the nonce: will it always be at offset 12 and never be more than 32 bits?


Well, yes, it is at offset 12 of the second 64-byte chunk of the first hash: offset 76 of the 80-byte block header.

Yes it will always be 32 bits.
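A sketch of the 80-byte header layout and of why the nonce's position matters for hardware: the first 64-byte chunk does not depend on the nonce, so its SHA-256 state (the "midstate") can be computed once per work unit, and only the second chunk plus the outer SHA-256 are repeated per nonce. Field order follows the reference client (version, previous hash, merkle root, time, bits, nonce, integers little-endian); the function names are hypothetical.

```python
import hashlib
import struct

NONCE_OFFSET = 76  # = 64 + 12: byte 12 of the second 64-byte chunk


def build_header(version: int, prev_hash: bytes, merkle_root: bytes,
                 timestamp: int, bits: int, nonce: int) -> bytes:
    """Serialize an 80-byte block header (hashes in internal byte order)."""
    header = (struct.pack("<I", version) + prev_hash + merkle_root
              + struct.pack("<III", timestamp, bits, nonce))
    if len(header) != 80:
        raise ValueError("prev_hash and merkle_root must be 32 bytes each")
    return header


def scan_nonces(header: bytes, start: int, count: int, target: int):
    """Try nonces start..start+count-1; return the first one whose
    double SHA-256 (as a little-endian integer) is <= target, else None."""
    midstate = hashlib.sha256(header[:64])  # first chunk: hashed once
    tail = header[64:NONCE_OFFSET]          # last 12 bytes before the nonce
    for nonce in range(start, start + count):
        inner = midstate.copy()             # reuse the precomputed state
        inner.update(tail + struct.pack("<I", nonce))
        digest = hashlib.sha256(inner.digest()).digest()
        if int.from_bytes(digest, "little") <= target:
            return nonce
    return None
```

A useful sanity check is to rebuild the genesis block header from its published fields and confirm that scanning around its nonce finds it.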


Title: FPGA Expert here ... How can I help?
Post by: mike_la_jolla on December 27, 2010, 07:48:52 PM
mike_la_jolla checking in here to clarify some FPGA questions.

- DNDPB_S327:  http://www.dinigroup.com/new/DNDPB_S327.html
List price is $19,680 for quantity 1.

- This is probably a much better choice:  DNBFC_S12_PCIe: http://www.dinigroup.com/new/DNBFC_S12_PCIe.html
List price for quantity 1 is $8,950.  We sell thousands of these to do (spooky) things.  We can fit 12 in a single chassis.

- 300 MHz is probably not achievable for Spartan-6 or Cyclone 3.  With some effort by an expert, assume you can get to 200 MHz or so.  Don't bother with the 'C' to FPGA methodologies.  You'll need someone who is well versed in VHDL/Verilog.  Also, you generally can't get to 100% utilization without breaking the tools.

- Any FPGA solution will require a host.  The DNDPB_S327 connects via Ethernet, so it has low data throughput.  The DNBFC_S12_PCIe is GEN1/GEN2 PCIe, so the bandwidth is much higher.

- Those of you that think you can do a custom ASIC are nuts.  The expense and effort of an ASIC would cost millions ($USD).  The Genomic search market isn't even large enough to support a custom ASIC.

- If this is a pure code breaking application, you are probably better off with FPGAs than GPUs, but it is very easy to gang together a few Xboxes.  FPGAs are harder to come by.


Title: Re: FPGA Expert here ... How can I help?
Post by: bitcoin2 on December 27, 2010, 08:44:55 PM
mike_la_jolla checking in here to clarify some FPGA questions.

- Those of you that think you can do a custom ASIC are nuts.  The expense and effort of an ASIC would cost millions ($USD).  The Genomic search market isn't even large enough to support a custom ASIC.

The EFF built Deep Crack for less than $250,000, and I thought it was built with custom ASIC DES chips (called Deep Crack or AWT-4500): http://en.wikipedia.org/wiki/Deep_crack.

How can you help? A full implementation would be great  :) I would give you 150 BTC for a miner implementation (VHDL or Verilog) on Spartan-6. Maybe there are other users who would donate. You should put your Bitcoin address in your signature.


Title: Re: FPGA Expert here ... How can I help?
Post by: adulau on December 27, 2010, 10:35:39 PM

- If this is a pure code breaking application, you are probably better off with FPGAs than GPUs, but it is very easy to gang together a few Xboxes.  FPGAs are harder to come by.

Looking at the price of the FPGA plus the design of a custom board,
why not go for one (or more) Nvidia Tesla C2050/C2070 boards?
(The price is around 3,000 USD per board, for 448 GPU cores.)

http://www.nvidia.com/docs/IO/43395/BD-04983-001_v04.pdf
http://www.nvidia.com/object/product_tesla_C2050_C2070_us.html



Title: Re: An estimate of fpga performance
Post by: jgarzik on December 28, 2010, 12:09:23 AM
NVIDIA is geared towards floating point, while bitcoin's SHA256 algorithm wants integer math.

ATI GPUs are better at this.


Title: Re: An estimate of fpga performance
Post by: adulau on December 28, 2010, 02:03:02 PM
NVIDIA is geared towards floating point, while bitcoin's SHA256 algorithm wants integer math.

ATI GPUs are better at this.

You may be right; I don't know the instruction sets of each GPU brand/type well.

The instructions usually used in SHA-256 implementations (IMHO the same holds for all the SHA-2 variants,
as they use the same scheme with a different word size) are the bitwise operators (AND, OR, NOT and XOR)
on 32-bit words, the right shift, and the rotate right/left instructions.
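To make that list concrete, here is a small pure-Python SHA-256 (a readable sketch, not a fast one). Every step of the compression function is a 32-bit AND, XOR, NOT, shift, rotate, or addition mod 2^32. The round constants and initial hash values are derived from the cube and square roots of the first primes, as in the SHA-256 definition, rather than typed in as tables; hashlib is used only for a self-check.

```python
import hashlib  # used only to cross-check the result
import struct
from math import isqrt

MASK = 0xFFFFFFFF  # keep every result to 32 bits


def first_primes(n):
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def icbrt(x):
    """Integer cube root: largest r with r**3 <= x (Newton's method)."""
    r = 1 << ((x.bit_length() + 2) // 3)  # start from an overestimate
    while True:
        s = (2 * r + x // (r * r)) // 3
        if s >= r:
            break
        r = s
    while r ** 3 > x:
        r -= 1
    while (r + 1) ** 3 <= x:
        r += 1
    return r


PRIMES = first_primes(64)
K = [icbrt(p << 96) & MASK for p in PRIMES]       # frac(cbrt(p)) * 2**32
H0 = [isqrt(p << 64) & MASK for p in PRIMES[:8]]  # frac(sqrt(p)) * 2**32


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """One SHA-256 compression (64 rounds) of a 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):  # message schedule: rotates, shifts, XOR, adds
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & MASK & g)
        t1 = (h + big_s1 + ch + K[t] + w[t]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(message: bytes) -> bytes:
    bit_len = len(message) * 8
    padding = b"\x80" + b"\x00" * ((55 - len(message)) % 64)
    data = message + padding + struct.pack(">Q", bit_len)
    state = H0[:]
    for i in range(0, len(data), 64):
        state = compress(state, data[i:i + 64])
    return struct.pack(">8I", *state)


if __name__ == "__main__":
    for msg in (b"", b"abc", b"a" * 1000):
        assert sha256(msg) == hashlib.sha256(msg).digest()
    print("matches hashlib")
```

Note that OR appears only inside the software rotate; in hardware a rotate is just wiring, which leaves the 32-bit additions as the expensive part.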

A comparison of the cycles required for these instructions on each platform type (FPGA, GPU, Cell-like or other
SIMD) could be useful, along with a rough estimate of the cost per technology. I don't know if anyone on the
forum has already done this.

On the other hand, building something for SHA-2 that can be reused for other projects
relying on SHA-2 is not a waste of time/money.

If you or someone else builds something in that scope, I will be willing to invest some time
and money in the project.






Title: Re: FPGA Expert here ... How can I help?
Post by: mike_la_jolla on December 28, 2010, 05:53:05 PM
The EFF built Deep Crack for less than $250,000, and I thought it was built with custom ASIC DES chips (called Deep Crack or AWT-4500): http://en.wikipedia.org/wiki/Deep_crack.
That appears to have been in 1998.  You might be able to do it for a few hundred thousand dollars, but you would start with FPGAs and then hardwire.


Title: Re: An estimate of fpga performance
Post by: ArtForz on December 28, 2010, 06:25:22 PM
The real issue on FPGA isn't the logic ops(cheap) or the rotates(pretty much free), but the 32-bit adds.
A_out = H + s0 + s1 + maj + ch + K + W
-> at least 3 level adder tree ((H + s0) + (s1 + maj)) + ((ch + K) + W)
Carry chain delay in a single 32-bit adder on a -3 speed grade Spartan6 is ~2ns, so with 3 adders in series (~6ns) and without ANY routing delays we're already limited to 166MHz.
Real-world you're lucky to get 80MHz out of a non-pipelined round on a -3 S6
Pipelining a round to 2 or 3 stages helps, but increases FF usage a LOT (you have to carry 256 bits of A..H, 512 bits of W[0..15] and the initial A..H for the final add around).
2-stage gives ~140MHz on a -3, 3-stage ~180MHz
= a 2-stage pipelined sha256 round is ~1k FFs, 3-stage pipelined ~1.5k FFs
XC6SLX150 has something like 160k FFs available, and the synthesis tools pretty much throw speed out the window once you go >70% FF utilization.
so realistically you MIGHT be able to fit 64 2-pipelined rounds of sha256 on a LX150, 2 clocks/bitcoinhash @ 140MHz -> 70Mh/s
or maybe with lots of luck and sacrificing a chicken to the place and route gods 48 rounds 3-stage @ 180MHz -> 68Mh/s
= 70Mh/s on a -3 speed grade XC6SLX150, 20%-30% less on a -2 speed grade.
so 9 grand for MAYBE 850Mh/s... a $500 HD5970 can get >550Mh/s stock, well >600Mh/s OCed at stock voltage even on a "bad" card.
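A sketch checking two steps of this estimate. First, the 7-operand sum for the new A value can be split into a 3-level tree of 2-input adders with the same result mod 2^32, and three ~2 ns carry chains in series bound the clock at about 166 MHz before routing. Second, an unrolled design with R physical rounds completes R/128 bitcoin hashes per clock, assuming 128 SHA-256 rounds per hash once the first chunk's midstate is precomputed (which matches "2 clocks/bitcoinhash" for 64 rounds). The names are illustrative; the delay and clock figures are ArtForz's estimates.

```python
import random

MASK = 0xFFFFFFFF


def new_a_chain(h, s0, s1, maj, ch, k, w):
    """Reference: the seven-operand sum, reduced mod 2**32."""
    return (h + s0 + s1 + maj + ch + k + w) & MASK


def new_a_tree(h, s0, s1, maj, ch, k, w):
    """Same sum as ((H + s0) + (s1 + maj)) + ((ch + K) + W): 3 adder levels."""
    level1 = ((h + s0) & MASK, (s1 + maj) & MASK, (ch + k) & MASK)
    level2 = ((level1[0] + level1[1]) & MASK, (level1[2] + w) & MASK)
    return (level2[0] + level2[1]) & MASK


def fmax_mhz(adder_levels: int, adder_delay_ns: float) -> float:
    """Upper bound on the clock with adders in series, ignoring routing."""
    return 1000.0 / (adder_levels * adder_delay_ns)


def unrolled_mhps(physical_rounds: int, clock_mhz: float,
                  rounds_per_hash: int = 128) -> float:
    """Throughput of a fully pipelined unrolled design that is reused until
    rounds_per_hash rounds are done (pipeline depth does not change this)."""
    return clock_mhz * physical_rounds / rounds_per_hash


if __name__ == "__main__":
    for _ in range(1000):
        ops = [random.getrandbits(32) for _ in range(7)]
        assert new_a_tree(*ops) == new_a_chain(*ops)
    print(f"fmax bound: {fmax_mhz(3, 2.0):.1f} MHz")
    print(f"64 rounds, 2-stage @140 MHz: {unrolled_mhps(64, 140):.1f} Mh/s")
    print(f"48 rounds, 3-stage @180 MHz: {unrolled_mhps(48, 180):.1f} Mh/s")
    print(f"12 boards x 70 Mh/s: {12 * 70} Mh/s")
```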

okay, let's be REALLY generous, assume we can magically get 1.2Gh/s out of 12 150-2s and they consume NO POWER AT ALL.
So how long does it take at 600W for 2 5970s and $0.10/kWh to make up that $8k price difference? 0.6kW @ $0.10 kWh = $1.44/day ... about 15 years.
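The payback arithmetic in that last step, as a sketch under the same deliberately generous assumptions (FPGAs drawing no power at all, two 5970s at 600 W, $0.10/kWh, an $8k price difference):

```python
def daily_energy_cost(power_kw: float, usd_per_kwh: float) -> float:
    """Electricity cost of running continuously for one day."""
    return power_kw * 24 * usd_per_kwh


def payback_years(price_difference_usd: float, power_kw: float,
                  usd_per_kwh: float) -> float:
    """Years of GPU electricity needed to equal the extra up-front FPGA cost."""
    return price_difference_usd / daily_energy_cost(power_kw, usd_per_kwh) / 365


if __name__ == "__main__":
    print(f"${daily_energy_cost(0.6, 0.10):.2f}/day, "
          f"{payback_years(8000, 0.6, 0.10):.1f} years")
```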


Title: Re: An estimate of fpga performance
Post by: mimarob on December 29, 2010, 10:54:41 PM
Wow, that's some really impressive calculations.

$9k... too bad xmas is over for this time :-)

Smart idea to pipeline the adders. Does it mean you spend more flip-flops but not more gates?

I was thinking of getting an old-fashioned xc3s500 for a reasonable price. At 1k-1.5k flip-flops each, maybe it would be possible to fit one of these 64 pipelined SHA round modules into one chip?

So, if I'm lucky I could get it running at 60-70 MHz meaning a full sha would take about 1us and that would give me 0.5 MHash/sec, right?

It's almost as fast as my old computer, which runs at 0.7 MHash/s :-)


Title: Re: An estimate of fpga performance
Post by: Jason on December 30, 2010, 05:19:37 PM
I have an old Altera DE2-70 board I picked up for $300 (academic price) 1.5 years ago (Cyclone II).  Looks like the current model is a DE2-115 based on the Cyclone III FPGA.

I just did a bit of research on existing SHA-256 implementations for FPGAs, and I see that several companies sell high performance FPGA implementations (e.g. http://www.cast-inc.com/ip-cores/encryption/sha-256/index.html).  Taking the Cast implementation as an example:

"The processing of one 512-bit block is performed in 66 clock cycles and the bit-rate achieved is 7.75Mbps / MHz on the input of the SHA256 core."

Taking a clock rate of 132MHz as a reasonably conservative number for my older Cyclone II (Cast claims up to 280MHz on high performance FPGAs), this comes out to 2 Mhps per SHA-256 core.  Cast's implementation uses around 2,531 LEs on the Cyclone.  My older DE2-70 board contains about 68,000 LEs.

Adding 10% overhead for communication/synchronization/etc, it should be possible to put 24 SHA-256 processors on my DE2-70.  That should allow up to 48Mhps peak processing rate (>80Mhps for the DE2-115 which can also be clocked faster).
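The same sizing arithmetic as a sketch. One caveat: a 66-cycle core processes one 512-bit SHA-256 block, and a bitcoin hash (double SHA-256 of an 80-byte header) needs three such block operations, or two if the first chunk's midstate is precomputed, so the bitcoin hash rate is lower than the block rate. The two-blocks-per-hash divisor below is that midstate assumption.

```python
import math


def cores_that_fit(total_les: int, les_per_core: int, overhead: float = 0.10) -> int:
    """Whole cores that fit after reserving `overhead` for I/O and control."""
    return math.floor(total_les * (1 - overhead) / les_per_core)


def block_rate_mhz(clock_mhz: float, cycles_per_block: int) -> float:
    """Millions of 512-bit blocks per second for one core."""
    return clock_mhz / cycles_per_block


cores = cores_that_fit(68_000, 2_531)     # DE2-70, Cast core on Cyclone II
blocks = cores * block_rate_mhz(132, 66)  # M SHA-256 block operations per second
bitcoin_mhps = blocks / 2                 # assumption: midstate precomputed
```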

Another question: how much communications bandwidth is needed at these speeds, and can it fit on a 100BASE-T channel?  Certainly not if we want the host to transfer all of the candidates to be hashed onto the FPGA (48M * 512 bits ≈ 24.6Gbps, well above even gigabit Ethernet speeds).  Is there another approach that can overcome this limitation?  I think so...
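The naive-streaming figure as a sketch, assuming each candidate is a full 512-bit block sent from the host and ignoring protocol overhead:

```python
LINK_BPS = {"100BASE-T": 100e6, "Gigabit Ethernet": 1e9}


def naive_stream_bps(candidates_per_s: float, bits_per_candidate: int = 512) -> float:
    """Host->FPGA bandwidth if every candidate block is shipped from the host."""
    return candidates_per_s * bits_per_candidate


if __name__ == "__main__":
    need = naive_stream_bps(48e6)
    print(f"needed: {need / 1e9:.1f} Gbit/s")
    for name, capacity in LINK_BPS.items():
        print(f"  {name}: {need / capacity:.0f}x over capacity")
```

Generating the candidates on the FPGA itself, as suggested next, removes this stream entirely.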

FPGAs have room for a dedicated CPU as well as a lot of logic, depending on what level of functionality you need in the CPU.  There are a lot of free and powerful CPU cores available on opencores.org, but it will be hard to beat the Nios II architecture if you are using Altera FPGAs.

A 32-bit Nios II/f CPU core is capable of 140 MIPS of performance (at 125MHz) and uses 1600 LEs on the Cyclone II.  Is this sufficient to keep 24 high-speed SHA-256 blocks from stalling?  Not even close.  In fact, it probably could not keep even one SHA-256 block from stalling.  Back to the drawing board...

It looks like a better approach would be to implement the search logic directly in gates on the FPGA, and have it fill one or more 256-bit-wide queue(s) which would be drawn on by the SHA-256 processing blocks.  A single NIOS II CPU still makes sense for collecting the results and communicating the results back to the host CPU (TCP/IP stack), as well as to load the search logic starting and ending values.

Anyway, my back-of-the-envelope calculations seem to confirm almost everything ArtForz said in his post above.  It looks like the ATI 5970s are the right choice if your goal is to crunch bitcoins.

OTOH, if you want an excuse to learn how to program FPGAs, you will certainly be able to run circles around a state-of-the-art hex-core i7 CPU with a pretty modest FPGA, but at considerable effort.

Jason


Title: Re: An estimate of fpga performance
Post by: mimarob on December 30, 2010, 08:21:32 PM
The FPGA in my case is mainly for fun, but I won't refuse to try a CUDA/OpenCL graphics card either. I'm using about 600 watts on average to keep a building frost-free at the moment.

As I read in a few threads here, using GPUs isn't totally problem-free either, is it?



Title: Re: An estimate of fpga performance
Post by: bitcoin2 on December 30, 2010, 08:51:28 PM
The FPGA in my case is mainly for fun, but I won't refuse to try a CUDA/OpenCL graphics card either. I'm using about 600 watts on average to keep a building frost-free at the moment.

As I read in a few threads here, using GPUs isn't totally problem-free either, is it?

One HD5970 needs about 300 watts. Put one computer with 2 HD5970s in your building and you have 600 watts. I don't know if the Windows driver supports 2 HD5970s at the same time, but Linux should. Of course, you need an Internet connection in your building. You need the standard bitcoin client and the m0mchil (or puddinpops) miner: http://bitcointalk.org/index.php?topic=1334.0;all


Title: Re: An estimate of fpga performance
Post by: WSDN on January 01, 2011, 02:12:01 AM
What's the best OS to run the bitcoin client? NetBSD or OpenBSD?


Title: Re: An estimate of fpga performance
Post by: bitcoin2 on January 01, 2011, 03:22:21 AM
What's the best OS to run the bitcoin client? NetBSD or OpenBSD?
The bitcoin client is OS-independent, but the OpenCL driver for mining on ATI Radeon cards runs only under Windows and Linux (for the Radeon 5970, Linux is recommended because you can't disable CrossFire under Windows). I don't know if there are NetBSD or OpenBSD drivers from ATI. You could use Debian or Ubuntu and install the driver from ATI.


Title: Re: An estimate of fpga performance
Post by: WSDN on January 01, 2011, 05:17:57 AM
I don't know if there are NetBSD or OpenBSD drivers from ATI.

I remember that OpenBSD may not have these drivers, but NetBSD is great: it has the most up-to-date drivers too and is bleeding-edge technology.


Title: Re: An estimate of fpga performance
Post by: lucky on January 01, 2011, 07:35:42 PM
I don't know if there are NetBSD or OpenBSD drivers from ATI.

I remember that OpenBSD may not have these drivers, but NetBSD is great: it has the most up-to-date drivers too and is bleeding-edge technology.


It depends on CUDA/OpenCL support in the proprietary ATI/Nvidia drivers.

These are not available on OpenBSD or NetBSD.


Title: Re: An estimate of fpga performance
Post by: WSDN on January 01, 2011, 08:02:06 PM
Quote
These are not available on OpenBSD or NetBSD.

Yes, that's true, friend! =)


Title: Re: An estimate of fpga performance
Post by: ttul on March 26, 2011, 03:53:06 AM
Say hypothetically that some mystery vendor releases a new chip capable of mining at 100x the power efficiency of existing cards, for 2x the price of a 5970. Would this mining hardware sell well? How many of you would buy such a magic box?

I understand that the difficulty would adjust to neutralize the increased power introduced by the new technology; however, that difficulty increase would also render the old technology irrelevant and would sort of force everyone to upgrade.

GPUs pretty much wiped out CPU mining last year. I wonder if there was another step up in performance if current generation GPUs could similarly be completely side-stepped.

Thoughts appreciated...


Title: Re: An estimate of fpga performance
Post by: fpgaminer on March 26, 2011, 10:49:03 AM
 :o Wow, this is such a coincidence! I was just browsing the forums tonight, and stumbled upon this thread. I finally registered an account just to post in this thread.

I've been working on an FPGA miner for the past few weeks! It's fully working*, currently running on my desk in front of me and generating up some tasty shares 8) I'll give an overview of my work:

Current Performance
Device: Altera Cyclone 3 C120 Dev Kit
Performance: 70Mhash/s
Power: 2.26W
Efficiency: 30.9 Mhash/W


It's written in Verilog, all crafted painstakingly by hand. There are two alternative designs. One is a serial design composed of many SHA256 cores running in parallel, each core computing a hash in 64 cycles (2 cores needed for the full hash). Each full core (2 half cores) consumes about 2800 LEs. The second design (currently running in front of me) is a pipelined version with one LOOOOONNNNGGGG chain of hashing stages running in parallel. That design computes 1 full hash every clock cycle. It runs at a maximum of 70MHz right now. Actually, I haven't tried pushing it to its limit, so it may very well run much faster. I'm hoping for 100MHz.

These are my results after off-and-on work for a few weeks. I've actually put most of my efforts into the serial design, because the pipelined design takes at least an hour to synthesize each time. The serial design can currently fit 42 full cores into the C120, each running at 90MHz and computing a full hash every 64 cycles. That's about 59Mhash/s.

The latest revision of the pipelined design consumes 90,000 LEs, so it's pretty big. I'm working to cram it into <64,000LEs so I can get two of them in one C120 chip, and push their clock to 100MHz, giving me a whopping 200Mhash/s.
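The throughput figures for the two architectures, as a sketch: the serial design finishes one full hash per core every 64 cycles, while the pipelined one finishes a hash every clock. The two-pipelines-at-100-MHz case is the stated goal, not a measured result.

```python
def serial_mhps(cores: int, clock_mhz: float, cycles_per_hash: int = 64) -> float:
    """Many iterative cores in parallel, each busy for cycles_per_hash clocks."""
    return cores * clock_mhz / cycles_per_hash


def pipelined_mhps(pipelines: int, clock_mhz: float) -> float:
    """Fully unrolled pipelines each complete one hash per clock."""
    return pipelines * clock_mhz


if __name__ == "__main__":
    print(f"serial, 42 cores @ 90 MHz: {serial_mhps(42, 90):.1f} Mhash/s")
    print(f"pipelined, 1 @ 70 MHz:     {pipelined_mhps(1, 70):.0f} Mhash/s")
    print(f"goal, 2 @ 100 MHz:         {pipelined_mhps(2, 100):.0f} Mhash/s")
```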

I haven't used the on-board power meter before, but if I'm reading it correctly the FPGA is currently using 2.26 Watts. That ... seems really low, but Altera's website verifies that that's actually above average for a C120, so I guess it's accurate. That's 31 Mhash/W, which is 1200% more efficient than the most efficient GPU listed on the Wiki. (https://en.bitcoin.it/wiki/Mining_hardware_comparison) So efficient, it's basically free. Poor guy runs terribly hot though. I need to go put a fan on him...

The only downside is that this board in particular, the C120, costs $1000. The same design will easily fit into the DE2-115 board (from Terasic), which only costs $600. I have one of those too, so I'll test on him later. You're not likely to pay off that $600 quickly, though, so I guess it isn't economical yet. A reduced version may run in the DE0-Nano board, which is $80, but obviously it won't have the same performance (about 25%).

All my efforts are put into optimizing every last bit of the design, so we'll see how far I push the poor FPGA. It already out-performs my GTX 285 card, so I'm happy  ;D and at a fraction of the power cost.

And I'm only getting started  8) Who wants to front the money to buy me a Stratix board and move this into Hardcopy?  :P

* By fully working, I really do mean it. It's happily submitting hashes to a pool. I was quite thrilled when my little baby submitted his first share  :D


Title: Re: An estimate of fpga performance
Post by: deadlizard on March 26, 2011, 11:16:51 AM
this is relevant to my interests

don't mind me, just monitoring this thread


Title: Re: An estimate of fpga performance
Post by: fpgaminer on March 26, 2011, 11:25:45 AM
If you guys are interested in my work, let me know, and I'll continue to post updates and such. Otherwise, I guess I'll just toil away in silence.

And a quick note:
The current design uses my PC to fetch work, and push it to the FPGA, as well as check for "Golden Tickets" (my funny internal name for valid nonces) and submit them when found. There's room in the pipelined design to put in a NIOS microprocessor. This could potentially use the ethernet port on the dev kit to do all the fetching and submitting. That way it'd be totally automated, and headless.  8)


Title: Re: An estimate of fpga performance
Post by: LMGTFY on March 26, 2011, 11:35:45 AM
If you guys are interested in my work, let me know, and I'll continue to post updates and such. Otherwise, I guess I'll just toil away in silence.

And a quick note:
The current design uses my PC to fetch work, and push it to the FPGA, as well as check for "Golden Tickets" (my funny internal name for valid nonces) and submit them when found. There's room in the pipelined design to put in a NIOS microprocessor. This could potentially use the ethernet port on the dev kit to do all the fetching and submitting. That way it'd be totally automated, and headless.  8)
Oh, I think there'll definitely be interest. My initial thoughts are that Mhash/W is superb, about 15 times better than a 5970. Mhash/s is still quite low, given the cost, but a "quad DE0-Nano board" version would be particularly interesting - $320 (compared with $400 or more for a second-hand 5970). Professional miners who are concerned about on-going electricity costs more than they are about fixed, up-front costs might very well be interested.


Title: Re: An estimate of fpga performance
Post by: MoonShadow on March 26, 2011, 04:00:36 PM

At the same time your results do seem rather better than what I thought could be had on both hash/watt and hash/$. I'd say if you manage to perfect your design, order ASIC fabrication and turn it into some device for sha256 and bitcoin mining, this would sell well, not only to bitcoiners but to various spooks too.


If someone were to develop such an ASIC, put it on a small SoC, and network a bunch of them in a ribbon, they would make great heat trace cabling for water lines.  Parking garages (which still have to have fire suppression systems) have heat trace wrapped around water lines and mains, which are then insulated over.  These water lines have to be heated continuously any time the outside temperature is below 35 degrees, so that cold spots don't freeze and bust the water lines.

I considered making a Linux cluster like this about 10 years ago, but never did anything with the idea.  These might sell well in high latitudes.


Title: Re: An estimate of fpga performance
Post by: nphard on March 26, 2011, 04:33:22 PM
Current Performance
Device: Altera Cyclone 3 C120 Dev Kit
Performance: 70Mhash/s
Power: 2.26W
Efficiency: 30.9 Mhash/W

Those are some surprisingly good numbers. Just think what could be done with something like this. (http://www.chrec.org/facilities.html)
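A quick way to sanity-check the efficiency figure: a sketch (helper names are my own) that divides hash rate by power. 70 / 2.26 is about 30.97, so the quoted 30.9 Mhash/W looks like the ratio truncated rather than rounded. The post does not say whether 2.26W is chip power or wall power, so treat the result accordingly.

```python
def mhash_per_watt(mhash_per_s: float, watts: float) -> float:
    """Efficiency: hash rate divided by power draw."""
    return mhash_per_s / watts

def nanojoules_per_hash(mhash_per_s: float, watts: float) -> float:
    """Energy spent per hash: watts / (hashes per second), expressed in nJ."""
    return watts / (mhash_per_s * 1e6) * 1e9

if __name__ == "__main__":
    # Cyclone III figures quoted above: 70 Mhash/s at 2.26 W
    print(f"{mhash_per_watt(70.0, 2.26):.2f} Mhash/W")
    print(f"{nanojoules_per_hash(70.0, 2.26):.1f} nJ/hash")
```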


Title: Re: An estimate of fpga performance
Post by: ArtForz on March 26, 2011, 05:56:19 PM
Good?
I've gotten 70Mh/s with a Spartan6 LX 150-3, $180 @ 1ea.
he gets the same from a CycloneIII 120-C8, $380 @ 1ea.
and expects about the same from a CycloneIV-E 115-C8, $310 @ 1ea.
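Put as cost per unit of hash rate, the chip-only prices above work out like this. A small sketch with my own helper name; it ignores the board, power supply and everything else a working miner needs.

```python
def dollars_per_mhash(price_usd: float, mhash_per_s: float) -> float:
    """Up-front chip cost per unit of hash rate (board, PSU etc. not included)."""
    return price_usd / mhash_per_s

# (price in USD at qty 1, hash rate in Mh/s) as quoted in the post above
CHIPS = {
    "Spartan6 LX150-3": (180.0, 70.0),
    "CycloneIII 120-C8": (380.0, 70.0),
    "CycloneIV-E 115-C8 (expected)": (310.0, 70.0),
}

if __name__ == "__main__":
    for name, (price, rate) in CHIPS.items():
        print(f"{name}: ${dollars_per_mhash(price, rate):.2f} per Mh/s")
```

That is roughly $2.57, $5.43 and $4.43 per Mh/s respectively, which is the gap the post is pointing at.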


Title: Re: An estimate of fpga performance
Post by: Jered Kenna (TradeHill) on April 04, 2011, 03:16:36 PM
posting to follow


Title: Re: An estimate of fpga performance
Post by: fpgaminer on April 07, 2011, 06:15:11 AM
Well, I've been occasionally poking and prodding my design. The pipelined version is clocking at 80MHz now, and down to 80K LEs (64K being the goal, down from 90K). Not huge progress, but I figured I'd keep the thread alive.
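The clock-to-hash-rate relationship behind these numbers fits in a one-line model. A sketch, assuming the pipelined design is fully unrolled so that a new nonce enters every clock (one hash per cycle). The iterative case uses the thread's opening estimate of 3 cores at 300 MHz and roughly 80 cycles per hash.

```python
def hash_rate_mhs(clock_mhz: float, cycles_per_hash: float, cores: int = 1) -> float:
    """Throughput in MH/s = cores * clock (MHz) / clock cycles spent per hash."""
    return cores * clock_mhz / cycles_per_hash

if __name__ == "__main__":
    # Fully unrolled pipeline: one result per clock, so MHz maps directly to MH/s.
    print(hash_rate_mhs(80.0, 1))             # 80.0
    # Iterative cores: each hash occupies a core for ~80 cycles.
    print(hash_rate_mhs(300.0, 80, cores=3))  # 11.25
```

This is why the pipelined approach trades area (LEs) for throughput: every round gets its own logic instead of reusing one round 64 times.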


Title: Re: An estimate of fpga performance
Post by: eMansipater on April 07, 2011, 07:41:01 AM
I'm very interested to see what happens with this too.  Do FPGAs follow a similar tech curve to GPUs?


Title: Re: An estimate of fpga performance
Post by: Adeq on April 07, 2011, 12:30:45 PM
What model of FPGA are you currently using?
You could send it to production as an ASIC and maybe get 200MHz+.


Title: Re: An estimate of fpga performance
Post by: Jered Kenna (TradeHill) on April 07, 2011, 12:53:48 PM
I could be wrong here, but I assumed there was already demand for an efficient version of this out there.
Wouldn't spy agencies or security firms or whoever want these? Hell, maybe they have 100,000 and just aren't saying anything...

Or are they way more specialized to bitcoin than I realize?


Title: Re: An estimate of fpga performance
Post by: mrb on April 07, 2011, 03:59:32 PM
The NSA does have its own silicon foundry.

Not anymore. They abandoned it because a decent foundry these days costs multiple billions of dollars, which is estimated to be a large fraction of the NSA's classified budget. They now produce chips by buying production capacity from semiconductor companies through the TAPO (http://www.nsa.gov/business/programs/tapo.shtml) program.


Title: Re: An estimate of fpga performance
Post by: fpgaminer on April 07, 2011, 10:32:39 PM
Quote
What model of FPGA are you currently using?
Altera's Cyclone III EP3C120F780, from the Cyclone III FPGA Development Kit (http://www.altera.com/products/devkits/altera/kit-cyc3.html).

The design will also run just fine on a Cyclone IV C115, which is a bit cheaper.


Title: Re: An estimate of fpga performance
Post by: marcus_of_augustus on April 08, 2011, 10:59:36 PM
hmmmmm....


Title: Re: An estimate of fpga performance
Post by: merlyn on April 09, 2011, 02:21:43 PM
posting to activate e-mail notifications


Title: Re: An estimate of fpga performance
Post by: randomguy7 on April 27, 2011, 12:28:33 AM
just watching - ignore me :)


Title: Re: An estimate of fpga performance
Post by: pusle on April 27, 2011, 07:19:01 PM

http://www.achronix.com/products/speedster22i.html

700K Luts @ 1.5GHz

Now we're getting somewhere?  ;D


Title: Re: An estimate of fpga performance
Post by: farmer_boy on April 29, 2011, 01:28:37 PM

http://www.achronix.com/products/speedster22i.html

700K Luts @ 1.5GHz

Now we're getting somewhere?  ;D
At an unknown $ amount. Why can't these people just put a dollar amount next to their product?


Title: Re: An estimate of fpga performance
Post by: mrb on April 29, 2011, 03:38:07 PM
At an unknown $ amount. Why can't these people just put a dollar amount next to their product?

They like to force you to call one of their salespeople, who will price the product according to how much money you have. Market segmentation at work...


Title: Re: An estimate of fpga performance
Post by: FooDSt4mP on April 29, 2011, 10:04:36 PM

http://www.achronix.com/products/speedster22i.html

700K Luts @ 1.5GHz

Now we're getting somewhere?  ;D
At an unknown $ amount. Why can't these people just put a dollar amount next to their product?

Or better yet, a BTC amount ;).


Title: Re: An estimate of fpga performance
Post by: randomguy7 on April 30, 2011, 02:00:17 PM
About how many hashes could a fpga like that do?


Title: Re: An estimate of fpga performance
Post by: pusle on April 30, 2011, 07:00:40 PM

Using fpgaminer's numbers for his CycloneIII-120 board.

90k LEs @ 70MHz = 70Mhash/sec.


Assuming LEs = LUTs and that it could actually run this design at 1.5GHz -> 10.5 Gigahash/sec
Using the Cast IP -> 6 Gigahash/sec


Another FPGA company has come up with space-time reconfiguration @ 1.6GHz:
http://www.tabula.com/technology/technology.php
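The 10.5 Gigahash/sec figure comes from fitting whole copies of the design into the device and multiplying by the clock. A sketch of that estimate under pusle's own assumptions (LE ~ LUT, 90K per core, one hash per clock per core, the design actually closing timing at 1.5GHz); helper names are mine.

```python
def whole_cores(device_luts: int, luts_per_core: int) -> int:
    """Only complete cores produce hashes, so round down."""
    return device_luts // luts_per_core

def scaled_rate_ghs(device_luts: int, luts_per_core: int, clock_ghz: float,
                    hashes_per_clock_per_core: float = 1.0) -> float:
    """Estimated GH/s = whole cores * clock (GHz) * hashes per clock per core."""
    return whole_cores(device_luts, luts_per_core) * clock_ghz * hashes_per_clock_per_core

if __name__ == "__main__":
    print(whole_cores(700_000, 90_000))            # 7
    print(scaled_rate_ghs(700_000, 90_000, 1.5))   # 10.5
```

The rounding down matters: 700K / 90K is about 7.8, but the leftover 0.8 of a core contributes nothing.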


Title: Re: An estimate of fpga performance
Post by: phelix on April 30, 2011, 08:44:27 PM
30.4.2011: 6GHash --> ~200$/day

sweet
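Turning a hash rate into $/day depends on the network difficulty and the exchange rate on the day, neither of which the post gives. A sketch with both left as parameters, assuming the usual approximation that a block takes difficulty * 2^32 hashes on average and the 50 BTC block reward of the time; function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
BLOCK_REWARD_BTC = 50.0          # block subsidy at the time of this thread
HASHES_PER_DIFFICULTY_1 = 2**32  # common approximation of expected hashes per block at difficulty 1

def expected_btc_per_day(hash_rate_hs: float, difficulty: float) -> float:
    """Expected (average) BTC mined per day at a constant hash rate."""
    blocks_per_day = hash_rate_hs * SECONDS_PER_DAY / (difficulty * HASHES_PER_DIFFICULTY_1)
    return blocks_per_day * BLOCK_REWARD_BTC

def expected_usd_per_day(hash_rate_hs: float, difficulty: float, usd_per_btc: float) -> float:
    """Same expectation converted at a given exchange rate."""
    return expected_btc_per_day(hash_rate_hs, difficulty) * usd_per_btc
```

For the case above you would call `expected_usd_per_day(6e9, difficulty, usd_per_btc)` with that day's values plugged in. It is an expectation only; actual payouts vary with luck and pool fees.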


Title: Re: An estimate of fpga performance
Post by: xyzzy on May 01, 2011, 04:42:11 AM
Good?
I've gotten 70Mh/s with a Spartan6 LX 150-3, $180 @ 1ea.
he gets the same from a CycloneIII 120-C8, $380 @ 1ea.
and expects about the same from a CycloneIV-E 115-C8, $310 @ 1ea.


Always so negative :( smells like agenda


Title: Re: An estimate of fpga performance
Post by: Cheeseman on May 01, 2011, 05:00:33 AM
Good?
I've gotten 70Mh/s with a Spartan6 LX 150-3, $180 @ 1ea.
he gets the same from a CycloneIII 120-C8, $380 @ 1ea.
and expects about the same from a CycloneIV-E 115-C8, $310 @ 1ea.


Always so negative :( smells like agenda

Also, those are prices for just the FPGA without the board. The best price I've seen for the FPGA + board is $330 for a CycloneIV dev kit.


Title: Re: An estimate of fpga performance
Post by: ttul on May 10, 2011, 10:51:42 PM

http://www.achronix.com/products/speedster22i.html

700K Luts @ 1.5GHz

Now we're getting somewhere?  ;D
At an unknown $ amount. Why can't these people just put a dollar amount next to their product?

Because they're selling these things in very low volumes and/or haven't yet set up a distributor network. You can count on them being priced at >$10K per unit if there is no price list. Otherwise their sales and marketing engine won't be profitable.


Title: Re: An estimate of fpga performance
Post by: bitcoinBull on May 10, 2011, 11:30:47 PM
Good?
I've gotten 70Mh/s with a Spartan6 LX 150-3, $180 @ 1ea.
he gets the same from a CycloneIII 120-C8, $380 @ 1ea.
and expects about the same from a CycloneIV-E 115-C8, $310 @ 1ea.


Always so negative :( smells like agenda

Also, those are prices for just the FPGA without the board. The best price I've seen for the FPGA + board is $330 for a CycloneIV dev kit.

But, if you were going to mine with FPGAs you wouldn't use a dev board, you'd use multiple boards each with an array of FPGA chips.


That's from the Copacobana (http://www.sciengines.com/products/computers-and-clusters/copacobana-s3-1000.html): Cost-Optimized Parallel COde Breaker.

Don't think I've seen it mentioned here before.

Its successor, the Rivyera, uses the Spartan6 LX150(T): http://www.sciengines.com/products/computers-and-clusters/rivyera-s6-lx150.html

Starts at EUR 20'000 (16 FPGAs).

RIVYERA S6-LX150
FPGA Type: Xilinx Spartan-6 LX150
FPGA count min. 16 to max. 128
Price from EUR 19'900 to EUR 86'900

RIVYERA S3-5000
FPGA Type: Xilinx Spartan-3 5000
FPGA count min. 16 to max. 128
Price from EUR 16'900 to EUR 58'900

RIVYERA V4-SX35
FPGA Type: Xilinx Virtex-4 SX35
FPGA count 128
Price above EUR 1 million





Title: Re: An estimate of fpga performance
Post by: ttul on May 10, 2011, 11:41:28 PM
...
That's from the Copacobana (http://www.sciengines.com/products/computers-and-clusters/copacobana-s3-1000.html): Cost-Optimized Parallel COde Breaker.

Don't think I've seen it mentioned here before.

Its successor uses the Spartan6 LX150(T), the Rivyera: http://www.sciengines.com/products/computers-and-clusters/rivyera-s6-lx150.html

Starts at EUR 20'000 (16 count FPGA).

RIVYERA S6-LX150
FPGA Type: Xilinx Spartan-6 LX150
FPGA count min. 16 to max. 128
Price from EUR 19'900 to EUR 86'900

RIVYERA S3-5000
FPGA Type: Xilinx Spartan-3 5000
FPGA count min. 16 to max. 128
Price from EUR 16'900 to EUR 58'900

RIVYERA V4-SX35
FPGA Type: Xilinx Virtex-4 SX35
FPGA count 128
Price above EUR 1 million

These prices are 10x the $ / MHash/s cost of a 5970 board. But I would imagine vastly more efficient in terms of power consumption.


Title: Re: An estimate of fpga performance
Post by: fpgaminer on May 19, 2011, 12:19:15 AM
It has been a little while since I submitted an update on my progress, so here we go.

Area Improvement: <80K LUTs for 80MH/s
I recently did another round of area optimization on one of my designs. As I suspected, it now successfully fits on a Cyclone3 C80 device. This is the 80MH/s design, so it achieves the theoretical 1MH/s = 1K LUT numbers that I had on the back of my napkin.

The next step is to synthesize for a Cyclone4 C75 device, which might be a very tight fit. The Cyclone 4s are a bit cheaper and use slightly less power. Also, if it does fit into 75K LUTs, then it is likely that two of the same design will fit into a C150. That would achieve a total of 160MHash/s.


New Parts Coming In
I have a Xilinx Spartan-6 LX150T-3 development board coming in soon. My goal here is to achieve 160MHash/s on this single chip. Estimates predict that it will be possible, but may be very difficult. We shall see.

Goals: Achieve 160MHash/s on a Spartan-6 LX150-3, which is a sub-$200 chip. That's $1.25USD per MH/s. The average cost of a complete GPU mining rig is $1 USD per MH/s. This goal would bring me very close to achieving GPU parity, and it most certainly will continue to exceed GPUs in power and temperature performance.

I also ordered an Ethernet module and hope to use it to make the FPGA miner completely independent. Plug and profit!


Power Consumption Measured
I now have a Kill-a-Watt, which measures the amount of electricity drawn "at the wall" by any device. Using this, I measured the "at the wall" power consumption of my Cyclone-4 50MHash/s design. It was 8 watts. Quite impressive, considering that this is for the entire development kit, including the inefficiency of the power supplies.

Also, the Cyclone-4 required no cooling. No fan, no heat sink. It chugged along happily.  :) Unlike my noisy mining rigs...




Title: Re: An estimate of fpga performance
Post by: caston on May 19, 2011, 12:57:16 AM
fpgaminer: where do you order your dev kits from?


Title: Re: An estimate of fpga performance
Post by: aahzmundus on May 19, 2011, 04:16:33 AM
This is awesome.  I am starting to invest in mining equipment but this looks like it may be better.  Too bad I have no ability to do this on my own and I doubt you would release your miner without compensation...

Do you have plans to release it? Should someone start a bounty?


Title: Re: An estimate of fpga performance
Post by: bitdiver on May 19, 2011, 12:22:38 PM
A dev board with an LX150? I was only aware of the Digilent Atlys Spartan-6 with an LX45 for about 140 EUR. Which company do you source it from?
Google found a PCIe card from Enterpoint in the UK with a XC6SLX150T for £480.00 tax excl, a bit much for a devel board.

Having read through the posts here about bitcoin mining with FPGAs, I wonder whether the DSP48A1 slices in the newer Spartan-6 can be put to good use, since they have a nice adder.

Seems I must pull out my old Spartan-3E dev board to have a better look. But that doesn't have DSP slices. Darn.




Title: Re: An estimate of fpga performance
Post by: eturnerx on May 19, 2011, 03:41:39 PM
The Xilinx Spartan-6 LX150T-3 has a PCI-e connector, does that mean it has to be mounted in a computer for comms and power? It's a nice board otherwise - I just don't want to have to buy another computer!


Title: Re: An estimate of fpga performance
Post by: bitdiver on May 19, 2011, 05:54:54 PM

Regarding a devel board with a Xilinx Spartan-6

There is a massive fpga compute board here:
http://www.dinigroup.com/new/DNBFC_S12_PCIe.html
I don't know what it costs though, but I doubt that it'll come at a bargain price.
The 2 GB of RAM per FPGA is not needed for computing SHA-256 hashes.

There is a devel board from Avnet
http://www.em.avnet.com/ctf_shared/evk/df2df2usa/xlx-s6-lx150t-dev-pb122409.pdf
http://www.silica.com/products/highlight/product/xilinxR-spartanR-6-lx150t-development-kit.html
No idea about availability. Price stated in the PDF and website is USD 995,-




Title: Re: An estimate of fpga performance
Post by: fpgaminer on May 19, 2011, 09:24:27 PM
Quote
There is a devel board from Avnet
http://www.em.avnet.com/ctf_shared/evk/df2df2usa/xlx-s6-lx150t-dev-pb122409.pdf
Yes, I got the one from Avnet.

Quote
The Xilinx Spartan-6 LX150T-3 has a PCI-e connector, does that mean it has to be mounted in a computer for comms and power?
It's just a comm link, like the 20-some-odd other interfaces on the bloody thing :P It has its own power supply and can run just fine without a computer.

Quote
Having read through the post concerning bitcoin mining with fpgas here I wonder whether the DSP48A1 slice in the newer Spartan-6 can be put to good use, since it has a nice adder.
Oh hey, I forgot about those! Thank you for reminding me  :D Yeah, I have an old Spartan-3E as well, and it indeed only has multipliers :( But now that I have my LX150 I will certainly give these shiny new DSP48A1 slices a try. I'm skimming the datasheet now, but at first glance it looks like it only handles 18 bits. Two will have to be strung together to do a 32-bit add. That means 90 "free" adds on the LX150. Not much, but anything helps.

Quote
Do you have plans to release it?
Yes
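To illustrate what stringing two narrow adders together means: a Python sketch of a 32-bit modular add built from two adder slices chained through a carry. The 16-bit split is an assumption for illustration (an 18-bit slice would have bits to spare), and the function names are hypothetical.

```python
MASK16 = 0xFFFF

def narrow_add(a: int, b: int, carry_in: int, width: int) -> tuple[int, int]:
    """Model one narrow adder slice: returns (sum bits, carry out)."""
    total = a + b + carry_in
    return total & ((1 << width) - 1), total >> width

def add32_from_two_slices(x: int, y: int) -> int:
    """32-bit add modulo 2**32 built from two chained 16-bit slices."""
    lo, carry = narrow_add(x & MASK16, y & MASK16, 0, 16)
    hi, _ = narrow_add(x >> 16, y >> 16, carry, 16)  # final carry dropped: wraps mod 2**32
    return (hi << 16) | lo
```

The carry between slices is the part that costs time in hardware, which is why a single wide adder is attractive.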


Title: Re: An estimate of fpga performance
Post by: ryepdx on May 19, 2011, 09:35:12 PM
These prices are 10x the $ / MHash/s cost of a 5970 board. But I would imagine vastly more efficient in terms of power consumption.

Um... where are you getting the Mh/s for this thing?


Title: Re: An estimate of fpga performance
Post by: bitdiver on May 19, 2011, 10:41:34 PM

As I understand Xilinx's ug389.pdf datasheet, you appear to be referring only to the pre-adder, which is indeed 18 bits + 18 bits.

However, the post-adder is 48 bits wide. Do have a look at ug389.pdf, especially p. 10, figures 1-1 and 1-2.

You can see the inputs A, B and D, which are 18 bits each. They can be concatenated by setting OPMODE[1:0] to 11. Then set OPMODE[3:2] to 11 to select input C as the other input of the post-adder.
Finally, please see p. 22, table 1-7, and look at the 5th entry from the end. The equation describing the output of the DSP48 is given there as: P = C ± (D:A:B + CIN)

That should do nicely to speed up the adder at the end of each SHA-256 round.

If you could compare your design's performance with and without the DSP48A1 slices and share the resulting specs, that would certainly satisfy my curiosity ;)

Are you aware of articles like these, which suggest ways to improve the implementations? Or did you use the OpenCores core or write your own?
http://ce.et.tudelft.nl/publicationfiles/1194_657_SHA2.pdf
http://ce.et.tudelft.nl/publicationfiles/1194_657_springer-SHA-2.pdf
http://ce.et.tudelft.nl/publicationfiles/1429_657_04560238.pdf
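The post-adder trick can be checked with a small behavioral model. This is my own sketch, not Xilinx code: it assumes the D:A:B concatenation is 12 + 18 + 18 = 48 bits wide (verify against ug389, since the post describes each input as 18 bits) and models only the equation P = C ± (D:A:B + CIN), with no pipeline registers. A 32-bit SHA-256 addition fits because the sum of two 32-bit values never overflows 48 bits, so the low 32 bits are exactly the modulo-2^32 result.

```python
MASK48 = (1 << 48) - 1
MASK32 = (1 << 32) - 1

# Assumed port widths for the D:A:B concatenation (12 + 18 + 18 = 48 bits);
# check the figures in ug389 before relying on these.
D_BITS, A_BITS, B_BITS = 12, 18, 18

def concat_dab(d: int, a: int, b: int) -> int:
    """Form the 48-bit D:A:B operand (D in the top bits, B in the bottom)."""
    return (d << (A_BITS + B_BITS)) | (a << B_BITS) | b

def post_adder(c: int, dab: int, cin: int = 0, subtract: bool = False) -> int:
    """Behavioral model of P = C +/- (D:A:B + CIN), wrapping at 48 bits."""
    operand = dab + cin
    return ((c - operand) if subtract else (c + operand)) & MASK48

def sha_add32(x: int, y: int) -> int:
    """One SHA-256 style modulo-2**32 add mapped onto the 48-bit post-adder."""
    b = y & ((1 << B_BITS) - 1)   # low 18 bits of y on B
    a = y >> B_BITS               # remaining 14 bits of y on A (D unused)
    p = post_adder(c=x, dab=concat_dab(0, a, b))
    return p & MASK32             # keep the low 32 bits of the 48-bit sum
```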


Title: Re: An estimate of fpga performance
Post by: en3r0 on May 25, 2011, 03:01:04 AM
This is really exciting, be sure to keep us posted!


Title: Re: An estimate of fpga performance
Post by: itsagas on May 29, 2011, 01:54:45 AM
Does anyone happen to know how much the Xilinx Spartan-6 LX150T chips cost in bulk (i.e. no board, just the chip)?  Say, buying by the 100s or 1000s to produce one's own boards, so we can judge the economic viability. 


Title: Re: An estimate of fpga performance
Post by: kjj on May 30, 2011, 01:37:54 AM
$300 (http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=122-1726-ND)


Title: Re: An estimate of fpga performance
Post by: itsagas on May 30, 2011, 02:35:12 AM
$300 (http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=122-1726-ND)

Nice, thanks. 



Title: Re: An estimate of fpga performance
Post by: anisoptera on May 30, 2011, 10:19:56 PM
Once this starts rivaling mining rigs....


Title: Re: An estimate of fpga performance
Post by: mike_la_jolla on June 03, 2011, 05:19:22 PM
$300 (http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=122-1726-ND)
Nope.  You can get them from Digikey for $152:
http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=XC6SLX150-2FG484C-ND

The LX150T for $172:
http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=XC6SLX150T-2FGG484C-ND


Title: Re: An estimate of fpga performance
Post by: kjj on June 03, 2011, 06:31:29 PM
Yeah, sorry.  I have the bad habit of limiting my searches on Digikey to parts that are actually in stock.


Title: Re: An estimate of fpga performance
Post by: bitdiver on June 03, 2011, 07:00:02 PM
Yes, but for this application the LX150T is not needed. The T suffix denotes fast serial transceivers, which are great when you want to connect to fast peripherals.

For this application a Spartan-6 LX150 is right. Preferably multiple ones on one PCB.

However, the LX150 is available only in BGA or CSP packages, which you cannot solder yourself. You'll need a reflow oven for that, and maybe a stencil for the solder paste too if it's not a prototype. Also, a BGA package means that you need a multilayer PCB.

What I want to say is that it's certainly not impractical, but you're not going to engineer this over a weekend.


Title: Re: An estimate of fpga performance
Post by: kjj on June 03, 2011, 07:34:21 PM
You probably could whip one up over the weekend.  At least the design.  The PCB fab would take a while.  Bad luck on that too, the next 4 layer dorkbotpdx (http://dorkbotpdx.org/wiki/pcb_order) order is going out on Monday.  So, some time in August if you like their service.  I don't know if sparkfun has a shorter cycle time for 4 layer or not.

If I didn't have to pack this weekend, I could probably bust out a quick and dirty breakout design in FreePCB.  They should already have the footprint, and after that it is just a matter of dragging the pins out to the edges and a quick shot at the autorouter.  No promises on clock skew or noise at high speeds, but good enough to play with.

Soldering would be rough.  I think I could manage it on a stove / hot plate with my SMD rework gun, but most people would be putting a $150+ chip in their oven with either way too much or way too little solder paste.

If you are reading this thread, and you didn't understand any of what I said above, please consider a different approach to mining, or get a demo board, or wait until someone has a tested and working design that they are willing to produce and sell.


Title: Re: An estimate of fpga performance
Post by: marcus_of_augustus on June 04, 2011, 12:12:46 AM
Quote
If you are reading this thread, and you didn't understand any of what I said above, please consider a different approach to mining, or get a demo board, or wait until someone has a tested and working design that they are willing to produce and sell.

You think that will stop them from trying?  :D

Ovens, solder, chips... what could possibly go wrong? This place is like a chemistry set for grown-ups.


Title: Re: An estimate of fpga performance
Post by: mimarob on June 13, 2011, 11:17:56 AM
Maybe one could make an el-cheapo pcb since we have no use for all those bga pins.

If we manage to connect powers, jtag and a few i/o lines that would suffice.

Perhaps one could make a two-layer card and just leave pins not being used?



Title: Re: An estimate of fpga performance
Post by: romkyns on June 13, 2011, 12:09:31 PM
Area Improvement: <80K LUTs for 80MH/s

I managed to fit one SHA256 round, one hash per clock, into about 30k LUTs + 13k registers on a Cyclone, although I never validated this design because my FPGA only has 17k LUTs. So, if I didn't mess up (which I can't really tell...) this would mean 60k LUTs + some interfacing. I verified the core idea behind this in a non-FPGA simulation, and then implemented the idea in Verilog.

Unfortunately the larger dev boards are a bit too expensive for my taste, so this project is on halt. If anyone is willing to loan one to a complete stranger, I'm all up for it :) We could meet first. I live in East of England, pm me if you wish.


Title: Re: An estimate of fpga performance
Post by: Basiley on June 13, 2011, 02:12:55 PM
If someone designed an FPGA-based board for mining rather than for FPGA development (i.e. not an "evaluation board", without plenty of redundant features and without ridiculous pricing) and published the design openly for a nominal BTC fee, that would be cool.
Ordering/using boards and kits targeted at development for BTC mining isn't reasonable.


Title: Re: An estimate of fpga performance
Post by: LeFBI on June 15, 2011, 11:50:17 AM
Maybe one could make an el-cheapo pcb since we have no use for all those bga pins.

If we manage to connect powers, jtag and a few i/o lines that would suffice.
if someone design FPGA-chip-based board, designed for mining, not FPGA-related software development, ie, not "evalution board"[without plenty of redundant features and w/o ridiculous pricing

A stripped-to-the-bone circuit for an FPGA board can look like this:
http://img269.imageshack.us/img269/5028/800pxfpgaconfig.png
source: http://www.mikrocontroller.net/articles/Low_Cost_FPGA_Konfiguration (all in German, though)

long story short:
you can program the fpga/tiny12/eeprom directly via ISP/JTAG. during development you configure the fpga directly via JTAG from your PC.
when you finished development you can write the .bin file to the eeprom and the Tiny12 will take care of programing the fpga when no pc is connected.

this works really well with Spartan-3, and you don't need to invest in an expensive development board for this purpose. Of course, you will additionally need an ethernet core and/or communication lanes between FPGAs if you want to gang them together, etc. As already said, the above circuit is just a cheap, basic configuration circuit for an FPGA board/dev board that doesn't have non-volatile memory.


Title: Re: An estimate of fpga performance
Post by: Basiley on June 15, 2011, 11:58:52 AM
And that's the main reason to put more than one FPGA chip on a board, I guess? I mean in real-world applications.


Title: Re: An estimate of fpga performance
Post by: kjj on June 15, 2011, 06:10:52 PM
You may run into thermal issues if you leave a bunch of BGA balls unconnected.  The chip designers typically assume that the PCB is going to be sinking most of the heat load.

Then again, with the complexity of SHA256, gate propagation problems will probably force us to run the chips slowly enough that heat won't be the limiting factor.


Title: Re: An estimate of fpga performance
Post by: romkyns on June 15, 2011, 06:36:44 PM
You may run into thermal issues if you leave a bunch of BGA balls unconnected.  The chip designers typically assume that the PCB is going to be sinking most of the heat load.

I would have thought that if you are going to attach *any* BGA balls then it is far easier to attach them all, than to leave some unconnected. Unconnected pads on the PCB won't make any difference to the PCB price. While I haven't ever hand-soldered BGAs, having all pads is supposed to make it easier, rather than harder. For example, by pulling the part into proper alignment uniformly as the solder melts and wets the pads.


Title: Re: An estimate of fpga performance
Post by: kjj on June 15, 2011, 07:19:39 PM
Yup.  Someone had suggested doing a minimal connection to avoid having to deal with 4 layer PCBs.  It might work, but there are a number of potential problems.

Soldering a BGA, PLCC or QFP and watching it pull itself into perfect alignment is one of the coolest things a guy can do.  Totally makes you feel like a wizard, commanding the universe with seemingly nothing but your willpower.  On the other hand, when it doesn't work right it'll make you want to murder kittens.


Title: Re: An estimate of fpga performance
Post by: dinox on June 15, 2011, 07:23:35 PM
QFP is possible to solder by hand but BGA is not. You will need a special tool and some experience to solder BGA, or pay someone to do it for you.


Title: Re: An estimate of fpga performance
Post by: fpgaminer on June 16, 2011, 11:28:31 PM
Quote
QFP is possible to solder by hand but BGA is not. You will need a special tool and some experience to solder BGA, or pay someone to do it for you.
People have soldered BGA with blow dryers before  :P Not that that is the best idea, but just sayin'.

Quote
Then again, with the complexity of SHA256, gate propagation problems will probably force us to run the chips slowly enough that heat won't be the limiting factor.
It's not a huge problem, but it's there. The latest design gets 100MH/s (@100MHz) and requires either a lot of air-flow or a heatsink.

Quote
you can program the fpga/tiny12/eeprom directly via ISP/JTAG. during development you configure the fpga directly via JTAG from your PC
I only looked at the circuit image you posted, not the rest of it, so excuse me if I missed something obvious, but why is there an ATtiny on there? FPGAs can program themselves from a flash chip unless I'm mistaken.


Title: Re: An estimate of fpga performance
Post by: BubbleBoy on June 17, 2011, 07:59:06 AM
BGAs are definitely solderable with hot blowers - I've done it a few times with maybe 80% success rate. The hard part is creating the balls on a new chip, you need a special solder paste and a thin mesh that allows only a certain amount of paste on each pad (reballing kit). When heated, the paste turns into solder balls. If the balls are readily formed, it's all fun and games.

Anyway, I'd outsource such a job to shops specialized in prototypes or small series, maybe somewhere in China. It will most likely cost less than the whole hardware and man hours otherwise required.


Title: Re: An estimate of fpga performance
Post by: genewitch on June 17, 2011, 03:40:24 PM
single chip 100Mhash/s?

What about evolving the hardware to do the hashing rather than writing it as straight VHDL?

I had a good idea about using Hadoop clusters to run the fitness tests for the evolutionary algorithm.

For those who have no clue what I am talking about, read the article about the professors who got an FPGA to recognize the difference between two tones with way fewer than 100 gates and no clock.

http://fsweb.olin.edu/~mchang/research/documents/seminar/evolve2k2/evolve.ppt
http://www.cogs.susx.ac.uk/users/adrianth/ade.html

I always had a thought that evolving the circuits would be a way to find really fast ways of "cracking" various hashing algorithms, as well as making really tiny encoders and decoders for various projects.

Anyhow, I enjoyed this thread.


Title: Re: An estimate of fpga performance
Post by: ttul on June 17, 2011, 04:43:20 PM
single chip 100Mhash/s?

What about evolving the hardware to do the hashing rather than writing it as straight VHDL?

I had a good idea about using hadoop clusters to run the fitness tests for the evolutionary algorithm testing.

For those who have no clue what i am talking about, read the article about the professors that got an fpga to recognize the difference between two tones with way less than 100 gates and no CLK.

http://fsweb.olin.edu/~mchang/research/documents/seminar/evolve2k2/evolve.ppt
http://www.cogs.susx.ac.uk/users/adrianth/ade.html

I always had a thought that evolving the circuits would be a way to find really fast ways of "cracking" various hashing algorithms, as well as making really tiny encoders and decoders for various projects.

Anyhow, i enjoyed this thread.

You're on the right track - the synthesis of ASIC or FPGA circuitry from Verilog or VHDL code is very very good these days, but there are ways to make things better particularly when you are building an ASIC.


Title: Re: An estimate of fpga performance
Post by: jon_smark on June 17, 2011, 06:15:57 PM
What about evolving the hardware to do the hashing rather than writing it as straight VHDL?

I had a good idea about using hadoop clusters to run the fitness tests for the evolutionary algorithm testing.

Could you expand more on this idea?  Not all applications are suited for evolutionary approaches, and my guess is that hashing algorithms are definitely not one of them.

For those who have no clue what i am talking about, read the article about the professors that got an fpga to recognize the difference between two tones with way less than 100 gates and no CLK.

http://fsweb.olin.edu/~mchang/research/documents/seminar/evolve2k2/evolve.ppt
http://www.cogs.susx.ac.uk/users/adrianth/ade.html

That kind of application is well suited to evolutionary approaches.

Quote
I always had a thought that evolving the circuits would be a way to find really fast ways of "cracking" various hashing algorithms, as well as making really tiny encoders and decoders for various projects.

There aren't supposed to be any smooth gradients in a cryptographically secure hash, so I don't see how any evolution-based approach could work for cracking them.  What exactly do you have in mind?
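The missing gradient can be seen directly: a sketch using Python's hashlib that flips one input bit at a time and counts how many of SHA-256's 256 output bits change. For a good hash the average sits near 128 (half the bits), so a fitness function scoring how "close" two outputs are gets essentially no signal. The 80-byte all-zero input is an arbitrary stand-in for a block header.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def avalanche_average(message: bytes, trials: int = 256) -> float:
    """Flip one input bit per trial and average how many output bits change."""
    base = hashlib.sha256(message).digest()
    total = 0
    for i in range(trials):
        flipped = bytearray(message)
        byte_index, bit_index = divmod(i % (8 * len(message)), 8)
        flipped[byte_index] ^= 1 << bit_index
        total += bit_difference(base, hashlib.sha256(bytes(flipped)).digest())
    return total / trials

if __name__ == "__main__":
    print(f"average output bits changed: {avalanche_average(bytes(80)):.1f} of 256")
```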


Title: Re: An estimate of fpga performance
Post by: film2240 on July 21, 2011, 03:21:59 PM
I'm very interested in high-performance, low-cost and ultra-low-energy hardware for mining.

The FPGA already meets my needs for low power use and high performance; now all I need is for it to be low cost and easy to get my hands on. Hope it's in the UK soon.

The setup has to be easy as well.

If someone can create something that can rival my Radeon HD 6950 (400MHash/s),then I can finally put those noisy GPU and CPU things to rest.

Even if it doesn't rival my card, I can use several FPGAs together to get that performance anyway.

I worked out it's less than 13 watts with 6 units (each producing 70MHash/s) vs 210 watts for my card.
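A quick check of that arithmetic, assuming the 70MHash/s at 2.26W figures quoted earlier in the thread. Six boards give 420MHash/s at about 13.6W, slightly above 13W. The 2.26W appears to be FPGA power only; fpgaminer's at-the-wall measurement of a whole Cyclone-4 dev kit was 8W, so a real array would draw more.

```python
def array_totals(boards: int, mhs_each: float, watts_each: float) -> tuple[float, float]:
    """Total hash rate (MH/s) and power (W) for identical boards running side by side."""
    return boards * mhs_each, boards * watts_each

def mhs_per_watt(mhs: float, watts: float) -> float:
    """Efficiency: hash rate divided by power draw."""
    return mhs / watts

if __name__ == "__main__":
    fpga_mhs, fpga_w = array_totals(6, 70.0, 2.26)  # six FPGA boards, figures quoted earlier
    gpu_mhs, gpu_w = 400.0, 210.0                    # the Radeon HD 6950 numbers from this post
    print(f"FPGAs: {fpga_mhs:.0f} MH/s, {fpga_w:.2f} W, {mhs_per_watt(fpga_mhs, fpga_w):.1f} MH/W")
    print(f"GPU:   {gpu_mhs:.0f} MH/s, {gpu_w:.0f} W, {mhs_per_watt(gpu_mhs, gpu_w):.1f} MH/W")
```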



Title: Re: An estimate of fpga performance
Post by: Newton on July 21, 2011, 03:42:28 PM
I'm very interested in high performance,low cost and ultra low energy hardware for mining.

lulz