Bitcoin Forum
Author Topic: An estimate of fpga performance  (Read 51417 times)
mimarob (OP)
Full Member
***
Offline Offline

Activity: 354
Merit: 103



View Profile
December 19, 2010, 03:39:46 PM
Merited by ABCbits (3)
 #1

Hello!

I converted the sha256 available in opencores.org into vhdl, just to see what to expect.

I got an xc3s500E about 1/3 full on the logic gates, so with a shoehorn one could maybe fit 3 cores into one of these. That does not include the communications with the host, nor the "less than" compare that checks whether the result is below the current threshold.

 (I used an antique ISE (8.something) so maybe newer versions will compute better).

Assuming a maximum of 300 MHz and about 80 cycles to read in, process and output the result (8 + 64 + 8)

You'd get 3*300/80 ~= 11 Mhash/s
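
As a rough check of the arithmetic above: the estimate is just clock rate divided by cycles per hash, times the number of cores. The sketch below is an editorial illustration in Python; the function name `hash_rate_mhs` is made up here, and the inputs are the optimistic figures from this post, not measured values.

```python
def hash_rate_mhs(clock_mhz: float, cycles_per_hash: int, cores: int) -> float:
    """Throughput in Mhash/s for `cores` identical cores, each needing
    `cycles_per_hash` clock cycles per hash at `clock_mhz` MHz."""
    return clock_mhz / cycles_per_hash * cores


# 8 cycles to read in, 64 to process, 8 to output (figures assumed in the post).
cycles = 8 + 64 + 8
estimate = hash_rate_mhs(clock_mhz=300, cycles_per_hash=cycles, cores=3)
print(f"{estimate:.2f} Mhash/s")
```

The same function can be reused with any other clock rate or cycle count to see how sensitive the estimate is to those two assumptions.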

This device can be had on a nice DIP socket (GOP modules) at about 60 EUR.

If you want to run this simulation (only tried on Linux), you need the ghdl and gtkwave packages (and probably some more stuff that I forgot).

The tar in the attachment contains the synthesizable file sha256.vhd, test_sha256.vhd and a simple Makefile.

grondilu
Legendary
*
Offline Offline

Activity: 1288
Merit: 1076


View Profile
December 19, 2010, 04:10:39 PM
 #2

OK, but in order to mine, don't you need a full VHDL implementation of the miner code?
Because if you must communicate between your PC and your FPGA, this might slow it down quite a lot.

Also, I don't understand: have you just done simulations, or have you tried it on the actual device?

bytemaster
Hero Member
*****
Offline Offline

Activity: 770
Merit: 566

fractally


View Profile WWW
December 19, 2010, 05:45:54 PM
 #3

How does this compare to a GPU?   

https://fractally.com - the next generation of decentralized autonomous organizations (DAOs).
jgarzik
Legendary
*
qt
Offline Offline

Activity: 1596
Merit: 1091


View Profile
December 19, 2010, 05:46:17 PM
 #4

Because if you must communicate between your PC and your FPGA, this might slow it down quite a lot.

As long as the FPGA performs millions of hashes for each "call" (host sends work to FPGA), host<->FPGA communication cost is small.
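
To put a rough number on that, one can compare the time needed to ship a work unit to the FPGA with the time the FPGA spends scanning it. This is only a sketch under stated assumptions: the 80-byte payload, the 115200-baud UART with 10 bits on the wire per byte, and a full 2^32 nonce range per work unit are illustrative choices, not details from the thread.

```python
def transfer_seconds(payload_bytes: int, baud: int, bits_per_byte: int = 10) -> float:
    """Time to push a payload over a UART (start + 8 data + stop bit per byte)."""
    return payload_bytes * bits_per_byte / baud


def scan_seconds(nonces: int, hash_rate_hs: float) -> float:
    """Time for the device to try `nonces` candidate nonces."""
    return nonces / hash_rate_hs


send = transfer_seconds(payload_bytes=80, baud=115_200)  # hypothetical link
scan = scan_seconds(nonces=2**32, hash_rate_hs=11e6)     # the 11 Mhash/s estimate
overhead = send / (send + scan)
print(f"send {send * 1e3:.1f} ms, scan {scan:.0f} s, overhead {overhead:.6%}")
```

Under these assumptions the link is busy for milliseconds while the device works for minutes, so even a slow serial connection would not be the bottleneck.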

Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own.
Visit bloq.com / metronome.io
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
bitcoin2
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile
December 19, 2010, 06:16:21 PM
 #5

Nice :)
A full implementation would be great!
bitcoin2
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile
December 19, 2010, 06:28:26 PM
 #6

How does this compare to a GPU?   


One AMD Radeon 5970 (570 Mhash/s) = ~ 50 * xc3s500E (11 Mhash/s). But with FPGAs the Mhash/W should be better than with GPUs.
jgarzik
Legendary
*
qt
Offline Offline

Activity: 1596
Merit: 1091


View Profile
December 19, 2010, 06:42:25 PM
 #7

One AMD Radeon 5970 (570 Mhash/s) = ~ 50 * xc3s500E (11 Mhash/s). But with FPGAs the Mhash/W should be better than with GPUs.

I'm not sure what Mhash/W is.  But GPUs are ASICs, so they begin with a significant advantage over FPGAs.

Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own.
Visit bloq.com / metronome.io
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
bitcoin2
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile
December 19, 2010, 07:06:52 PM
Last edit: December 19, 2010, 07:43:49 PM by bitcoin2
 #8

One AMD Radeon 5970 (570 Mhash/s) = ~ 50 * xc3s500E (11 Mhash/s). But with FPGAs the Mhash/W should be better than with GPUs.

I'm not sure what Mhash/W is.  But GPUs are ASICs, so they begin with a significant advantage over FPGAs.

Mhash/watt. FPGAs should have better power efficiency than GPUs. An ASIC (application-specific integrated circuit) built only for mining (like Deep Crack for DES) would be the best variant, but it is very expensive to develop (maybe 300,000 USD?).
GeorgeH
Member
**
Offline Offline

Activity: 83
Merit: 10


View Profile
December 19, 2010, 08:21:03 PM
 #9

Under my rough calculations, the highest end Virtex 5 could hit 40-50 mhps. At $3000+ for a PCIE dev board, it is far more cost effective to buy ATI video cards.

Edit:

Of course, if one were to connect a bunch of these things in parallel, they could make a big dent, ie:
http://www.sciengines.com/copacobana/

1DSpPtPTGXTYjkZehPsiAbjkXLkB1jsZ2x
mimarob (OP)
Full Member
***
Offline Offline

Activity: 354
Merit: 103



View Profile
December 20, 2010, 05:23:58 AM
 #10

hello again!

Just to clarify, I did run the program under simulation only, but I also compiled the module into the Xilinx synthesis tool (ISE) just to see how much space it would take in the chip. (I don't even own a spartan fpga :-)

A full implementation.. well I'm just trying to understand the criteria for a found block, not being an expert in cryptography. This will also need to be in hardware, I think, so the fpga only reports back when it has found something.

I just checked in at the calculator (http://www.alloscomp.com/bitcoin/calculator.php).
What is the correlation between the "difficulty factor" and the "hash target"? Why do we use two concepts?

I also checked in to the bitcoin code, but it seems that the routine I'm trying to accelerate (ScanHash_CryptoPP) is only checking for a certain number of zeroes and then returning.

Where is the code that checks if you've found a block? I guess it would only be a simple less-than compare in the hardware.

The code would also need to contain some uart comms or similar, I thought of broadcasting the request to all devices and then daisy-chaining the results back so that the "winning" device could break the chain and report back to the host computer.

jib
Member
**
Offline Offline

Activity: 92
Merit: 10


View Profile
December 20, 2010, 05:41:40 AM
 #11

Difficulty = (2^224)/target, approximately (the exact numerator is the difficulty-1 target, 0xFFFF * 2^208). They're just two representations of the same thing. To check if you've found a block, you check if the hash is less than the target.
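
The relationship can be sketched in a few lines of Python. This is an editorial illustration, not code from the client: the names `DIFF1_TARGET`, `difficulty_from_target` and `target_from_difficulty` are made up here, and the constant is the difficulty-1 target 0xFFFF * 2^208, which is why 2^224 is only a close approximation.

```python
# Difficulty-1 target: 0xFFFF * 2**208, slightly below 2**224.
DIFF1_TARGET = 0xFFFF << 208


def difficulty_from_target(target: int) -> float:
    """Difficulty is the ratio of the difficulty-1 target to the current target."""
    return DIFF1_TARGET / target


def target_from_difficulty(difficulty: float) -> int:
    """Inverse mapping; a higher difficulty means a smaller target."""
    return int(DIFF1_TARGET / difficulty)


ratio = DIFF1_TARGET / 2**224            # how close the 2^224 shortcut is
halved = target_from_difficulty(2.0)     # target for difficulty 2
print(difficulty_from_target(DIFF1_TARGET), ratio, difficulty_from_target(halved))
```

Because the two values are reciprocal, the hardware only ever needs the target; difficulty is a human-friendly way of displaying it.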
jgarzik
Legendary
*
qt
Offline Offline

Activity: 1596
Merit: 1091


View Profile
December 20, 2010, 06:07:17 AM
 #12

I also checked in to the bitcoin code, but it seems that the routine I'm trying to accelerate (ScanHash_CryptoPP) is only checking for a certain number of zeroes and then returning.

Correct.  The scanner performs a fast-path check, and then a more exhaustive check if the fast-path check exits the scanner loop.


Quote
Where is the code that checks if you've found a block? I guess it would only be a simple less-than compare in the hardware.

See CheckWork().  It is a less-than compare, on an unsigned 256-bit little endian integer.
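
A minimal Python sketch of the two checks described above, for illustration only. It is not the client's CheckWork() or ScanHash code: the function names are made up, the 80-byte input is placeholder data, and treating a hash equal to the target as valid is my reading of the client and should be checked against the source.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Bitcoin's block hash: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def fast_path(digest: bytes) -> bool:
    """Cheap pre-check: the top 32 bits of the little-endian value are zero,
    i.e. the last four bytes of the digest are zero."""
    return digest[28:] == b"\x00\x00\x00\x00"


def meets_target(digest: bytes, target: int) -> bool:
    """Full check: compare the digest, read as an unsigned 256-bit
    little-endian integer, against the target."""
    return int.from_bytes(digest, "little") <= target


digest = double_sha256(bytes(80))  # placeholder input, not a real block header
print(fast_path(digest), meets_target(digest, 2**256 - 1), meets_target(digest, 0))
```

In hardware the fast path is a 32-bit zero test and the full check is a 256-bit comparator, so only candidates that pass the cheap test need to be reported to the host.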

Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own.
Visit bloq.com / metronome.io
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
romkyns
Newbie
*
Offline Offline

Activity: 19
Merit: 0


View Profile
December 20, 2010, 05:00:24 PM
 #13

http://www.dinigroup.com/product/data/DNDPB_S327/images/board_front6.jpg

Drool...

By my own estimates, this thing could generate a block every few hours at the current difficulty. I doubt it would cost less than $25k-$50k though...

(source: http://www.dinigroup.com/new/products.html)
mimarob (OP)
Full Member
***
Offline Offline

Activity: 354
Merit: 103



View Profile
December 21, 2010, 02:33:44 PM
 #14

Yeah, that seems about right. That Altera board contains 12 times as many 4-input LUTs as the xc3s500 Spartan on the GOP module. 12 x 27 = 324 times the 11 MHash in my calcs => 3564000 khash/sec; entering that in the calculator gives you 4 hours for a block. Counting 2000 blocks per year you get 100000 BTC, or $25k a year assuming a moderate difficulty increase.
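
For readers who want to redo the calculator step by hand: the expected time to find a block is roughly difficulty * 2^32 / hash rate. The sketch below is an editorial illustration; the difficulty of 12,000 is a hypothetical value chosen only because it reproduces the 4-hour figure, and the 50 BTC reward is the block subsidy of that period.

```python
def expected_seconds_per_block(difficulty: float, hash_rate_hs: float) -> float:
    """Average time to find a block: about difficulty * 2**32 hashes are needed."""
    return difficulty * 2**32 / hash_rate_hs


hash_rate = 324 * 11e6   # 324 times the 11 Mhash/s estimate, in hashes per second
difficulty = 12_000      # hypothetical value, not taken from the thread
hours = expected_seconds_per_block(difficulty, hash_rate) / 3600

blocks_per_year = 2000   # the rounded figure used in the post
btc_per_year = blocks_per_year * 50
print(f"{hours:.1f} h per block, {btc_per_year} BTC per year")
```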

So I guess the graphics cards beat the crap out of the fpga's. But what about power consumption? Also the graphics cards need a motherboard, host cpu etc.

Wonder how far you could optimize the gate count?

Putting a few hundreds of these DIP formfactor boards together would also give you a priceless 80's feeling :-)

http://shop.trenz-electronic.de/catalog/product_info.php?products_id=81
bitcoin2
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile
December 21, 2010, 06:49:11 PM
 #15

ArtForz has developed sha256 ASICs and had 100 of them manufactured for about $500/engine. These ASICs beat the 5970 on hash/W by a factor of 6 but lose to the 5970 on hash/$ by about a factor of 3, he said. These ASICs are not exactly real standard-cell ASICs but "metal-layer defined ASIC, basically FPGA without the FP part" (source: #bitcoin-dev).
MoonShadow
Legendary
*
Offline Offline

Activity: 1708
Merit: 1007



View Profile
December 22, 2010, 05:45:16 PM
 #16

ArtForz has developed sha256 ASICs and had 100 of them manufactured for about $500/engine. These ASICs beat the 5970 on hash/W by a factor of 6 but lose to the 5970 on hash/$ by about a factor of 3, he said. These ASICs are not exactly real standard-cell ASICs but "metal-layer defined ASIC, basically FPGA without the FP part" (source: #bitcoin-dev).

What kind of ASIC is it?  Is this a custom PCI card?  Would higher production volumes improve the price point?  I'm interested in this, as a purpose made PCI card would be as big a boon as buying an expensive GPU.

"The powers of financial capitalism had another far-reaching aim, nothing less than to create a world system of financial control in private hands able to dominate the political system of each country and the economy of the world as a whole. This system was to be controlled in a feudalist fashion by the central banks of the world acting in concert, by secret agreements arrived at in frequent meetings and conferences. The apex of the systems was to be the Bank for International Settlements in Basel, Switzerland, a private bank owned and controlled by the world's central banks which were themselves private corporations. Each central bank...sought to dominate its government by its ability to control Treasury loans, to manipulate foreign exchanges, to influence the level of economic activity in the country, and to influence cooperative politicians by subsequent economic rewards in the business world."

- Carroll Quigley, CFR member, mentor to Bill Clinton, from 'Tragedy And Hope'
bitcoin2
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile
December 22, 2010, 07:27:09 PM
 #17

ArtForz has developed sha256 ASICs and had 100 of them manufactured for about $500/engine. These ASICs beat the 5970 on hash/W by a factor of 6 but lose to the 5970 on hash/$ by about a factor of 3, he said. These ASICs are not exactly real standard-cell ASICs but "metal-layer defined ASIC, basically FPGA without the FP part" (source: #bitcoin-dev).

What kind of ASIC is it?  Is this a custom PCI card?  Would higher production volumes improve the price point?  I'm interested in this, as a purpose made PCI card would be as big a boon as buying an expensive GPU.

ArtForz expects them to arrive in February:
https://stuff.caurea.org/irssi/freenode/%23bitcoin-dev/2010/12/%23bitcoin-dev-2010-12-20.log : 18:36

Maybe this VHDL code is the first step towards developing an ASIC. I don't believe that ArtForz will give us his code. If we pool our money, maybe we would have enough to get a real ASIC manufactured.
Cdecker
Hero Member
*****
Offline Offline

Activity: 489
Merit: 504



View Profile WWW
December 22, 2010, 08:28:00 PM
 #18

Just for reference again the logs for that moment: http://veritas.maximilianeum.ch/bitcoin/irc/logs/2010/12/20#l2461

Want to see what developers are chatting about? http://bitcoinstats.com/irc/bitcoin-dev/logs/
Bitcoin-OTC Rating
GeorgeH
Member
**
Offline Offline

Activity: 83
Merit: 10


View Profile
December 23, 2010, 05:57:55 AM
 #19

Just for reference again the logs for that moment: http://veritas.maximilianeum.ch/bitcoin/irc/logs/2010/12/20#l2461

Thanks, that was a good read.

1DSpPtPTGXTYjkZehPsiAbjkXLkB1jsZ2x
grondilu
Legendary
*
Offline Offline

Activity: 1288
Merit: 1076


View Profile
December 23, 2010, 06:19:14 AM
 #20


If some people created a bitcoin-dedicated ASIC, I'd be amazed.  It would be a strong indicator of how involved some people are in the bitcoin project.

ArtForz
Sr. Member
****
Offline Offline

Activity: 406
Merit: 257


View Profile
December 23, 2010, 10:13:27 AM
 #21

300MHz? on a Spartan3? (rolls eyes)
Oh, and bitcoin hash is TWO rounds of sha256.
I just synthesized it, 60MHz max for one core on a -5 speed grade S3E-500.
So NOT
300MHz / 80 clocks/hash * 3 cores = 11MHps
instead (assuming we can lose overhead and just have to do a mid-add and a compare)
60MHz / 130 clocks/hash * 3 cores = 1.4MHps

at $20/chip that's 0.07MH/$ or about 25x worse than a HD5970...

and for "GPU needs mainboard".. FPGA needs PCB, VRMs, config memory, some kind of host connection, ...

So yeah, pull a few crazy numbers out of your ass and FPGAs look decent.

bitcoin: 1Fb77Xq5ePFER8GtKRn2KDbDTVpJKfKmpz
i0coin: jNdvyvd6v6gV3kVJLD7HsB5ZwHyHwAkfdw
mimarob (OP)
Full Member
***
Offline Offline

Activity: 354
Merit: 103



View Profile
December 23, 2010, 08:33:45 PM
 #22

60 MHz

Hmm, yes, I discovered that myself today... really...

The numbers were thought of as the maximum possible, with all the luck you can have.

Unfortunately 11 MHash/sec is not too impressive either...

But then one has to remember that this is not the final implementation; it isn't even runnable as it is.

Please correct me if I'm wrong, but I thought that the maximum clock inside the Spartan was about 300 MHz?

Also I'm confused about the hash definition, do we define the bitcoin hash as two regular hashes?

Another thing that really puzzles me is the nonce: will it always be at offset 12 and never be more than 32 bits?

It's amazing that we have so many knowledgeable people on this board.

lfm
Full Member
***
Offline Offline

Activity: 196
Merit: 104



View Profile
December 25, 2010, 02:40:13 AM
 #23


Also I'm confused about the hash definition, do we define the bitcoin hash as two regular hashes?

Satoshi defined it in the original implementation, yes. sha256(sha256(block header))

Quote
Another thing that really puzzles me is the nonce: will it always be at offset 12 and never be more than 32 bits?


Well, it is at offset 12 in the second 64-byte chunk of the first hash, yes. That is offset 76 out of 80 in the block header.

Yes it will always be 32 bits.
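
The layout described above can be sketched in Python. This is an editorial illustration: `build_header` and `bitcoin_hash` are names made up here, and the field values are those of the well-known genesis block, used only as test data. Since SHA-256 works on 64-byte chunks, byte 76 of the 80-byte header is byte 12 of the second chunk, and the first chunk does not depend on the nonce.

```python
import hashlib
import struct


def build_header(version, prev_hash, merkle_root, timestamp, bits, nonce) -> bytes:
    """Serialize the 80-byte block header; all integer fields are little-endian."""
    return struct.pack("<I32s32sIII", version, prev_hash, merkle_root,
                       timestamp, bits, nonce)


def bitcoin_hash(header: bytes) -> bytes:
    """sha256(sha256(block header))."""
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()


# Genesis block fields (test data only). Hashes are displayed byte-reversed.
merkle_root = bytes.fromhex(
    "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b")[::-1]
header = build_header(1, bytes(32), merkle_root, 1231006505, 0x1D00FFFF, 2083236893)

NONCE_OFFSET = 76
chunk_index, offset_in_chunk = divmod(NONCE_OFFSET, 64)  # second chunk, byte 12
nonce = struct.unpack_from("<I", header, NONCE_OFFSET)[0]
print(len(header), chunk_index, offset_in_chunk, nonce)
print(bitcoin_hash(header)[::-1].hex())
```

A miner only has to rewrite those four bytes between attempts, which is why the work for the first 64-byte chunk can be done once per work unit.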
mike_la_jolla
Newbie
*
Offline Offline

Activity: 7
Merit: 0


View Profile
December 27, 2010, 07:48:52 PM
 #24

mike_la_jolla checking in here to clarify some FPGA questions.

- DNDPB_S327:  http://www.dinigroup.com/new/DNDPB_S327.html
List price is $19,680 for quantity 1.

- This is probably a much better choice:  DNBFC_S12_PCIe: http://www.dinigroup.com/new/DNBFC_S12_PCIe.html
List price for quantity 1 is $8,950.  We sell thousands of these to do (spooky) things.  We can fit 12 in a single chassis.

- 300 MHz is probably not achievable for Spartan-6 or Cyclone 3.  With some effort by an expert, assume you can get to 200 MHz or so.  Don't bother with the 'C' to FPGA methodologies.  You'll need someone who is well versed in VHDL/Verilog.  Also, you generally can't get to 100% utilization without breaking the tools.

- Any FPGA solution will require a host.  The DNDPB_S327 connects via Ethernet, so it has low data throughput.  The DNBFC_S12_PCIe is GEN1/GEN2 PCIe, so the bandwidth is much higher.

- Those of you that think you can do a custom ASIC are nuts.  The expense and effort of an ASIC would cost millions ($USD).  The Genomic search market isn't even large enough to support a custom ASIC.

- If this is a pure code breaking application, you are probably better off with FPGAs than GPUs, but it is very easy to gang together a few Xboxes.  FPGAs are harder to come by.
bitcoin2
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile
December 27, 2010, 08:44:55 PM
 #25

mike_la_jolla checking in here to clarify some FPGA questions.

- Those of you that think you can do a custom ASIC are nuts.  The expense and effort of an ASIC would cost millions ($USD).  The Genomic search market isn't even large enough to support a custom ASIC.

The EFF built Deep Crack for less than $250,000. I thought they made it with custom ASIC DES chips (called Deep Crack or AWT-4500) http://en.wikipedia.org/wiki/Deep_crack.

How can you help? A full implementation would be great :) I would give you 150 BTC for a miner implementation (VHDL or Verilog) on Spartan-6. Maybe there are other users who would donate. You should put your Bitcoin address in your signature.
adulau
Newbie
*
Offline Offline

Activity: 12
Merit: 0


View Profile
December 27, 2010, 10:35:39 PM
 #26


- If this is a pure code breaking application, you are probably better off with FPGAs than GPUs, but it is very easy to gang together a few Xboxes.  FPGAs are harder to come by.

Looking at the price of the FPGA plus the design of the custom board,
why not go for one (or more) Nvidia Tesla C2050/C2070 boards?
(The price is around 3000 USD per board, for 448 GPU cores.)

http://www.nvidia.com/docs/IO/43395/BD-04983-001_v04.pdf
http://www.nvidia.com/object/product_tesla_C2050_C2070_us.html

jgarzik
Legendary
*
qt
Offline Offline

Activity: 1596
Merit: 1091


View Profile
December 28, 2010, 12:09:23 AM
 #27

NVIDIA is geared towards floating point, while bitcoin's SHA256 algorithm wants integer math.

ATI GPUs are better at this.

Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own.
Visit bloq.com / metronome.io
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
adulau
Newbie
*
Offline Offline

Activity: 12
Merit: 0


View Profile
December 28, 2010, 02:03:02 PM
 #28

NVIDIA is geared towards floating point, while bitcoin's SHA256 algorithm wants integer math.

ATI GPUs are better at this.

You may be right; I don't know the instruction sets of the different GPU brands/types well.

The instructions usually used in SHA-256 implementations (IMHO in all SHA-2 implementations, as they use
the same scheme and only the word size differs) are the bit-wise operators (AND, OR, NOT and XOR)
on 32-bit words, the right shift, and the rotate right/left instructions.
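
To make the instruction mix concrete, here is a compact pure-Python SHA-256, written for this thread as an illustration and checked against `hashlib`. It is a sketch, not an optimized or hardware-oriented implementation. Besides the bit-wise operators, shifts and rotates listed above, each round also needs 32-bit modular additions. The constants are derived from the square and cube roots of the first primes rather than pasted in as a table.

```python
import hashlib
from math import isqrt

MASK = 0xFFFFFFFF


def primes(n):
    """First n primes by trial division (n is tiny here)."""
    out, c = [], 2
    while len(out) < n:
        if all(c % p for p in out):
            out.append(c)
        c += 1
    return out


def icbrt(x):
    """Integer cube root (floor) by binary search."""
    lo, hi = 0, 1 << ((x.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


P = primes(64)
K = [icbrt(p << 96) & MASK for p in P]       # fractional bits of cube roots
H0 = [isqrt(p << 64) & MASK for p in P[:8]]  # fractional bits of square roots


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """One 64-round compression of a 64-byte chunk."""
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for t in range(16, 64):
        x, y = w[t - 15], w[t - 2]
        sig0 = rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)
        sig1 = rotr(y, 17) ^ rotr(y, 19) ^ (y >> 10)
        w.append((w[t - 16] + sig0 + w[t - 7] + sig1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        a_out = (h + s0 + s1 + maj + ch + K[t] + w[t]) & MASK  # seven-term add
        e_out = (d + h + s1 + ch + K[t] + w[t]) & MASK
        a, b, c, d, e, f, g, h = a_out, a, b, c, e_out, e, f, g
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(msg: bytes) -> bytes:
    padded = (msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64)
              + (8 * len(msg)).to_bytes(8, "big"))
    state = H0
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return b"".join(v.to_bytes(4, "big") for v in state)


print(sha256(b"abc") == hashlib.sha256(b"abc").digest())
```

Counting operations per round in `compress` gives a feel for what a cycle-by-cycle comparison across FPGA, GPU and SIMD targets would have to measure.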

A comparison of all cycles required for all the instructions per type FPGA, GPU, Cell-like or other
SIMD could be useful. I don't know if someone in the forum already made this along with a rough
estimation of the cost per technology.

On the other hand, building something for SHA-2 that can be reused for other projects
relying on SHA-2 is not a waste of time/money.

If you or someone else build something in that scope, I will be willing to invest some time
and money in the project.




mike_la_jolla
Newbie
*
Offline Offline

Activity: 7
Merit: 0


View Profile
December 28, 2010, 05:53:05 PM
 #29

The EFF built Deep Crack for less than $250,000. I thought they made it with custom ASIC DES chips (called Deep Crack or AWT-4500) http://en.wikipedia.org/wiki/Deep_crack.
That appears to have been 1998.  You might be able to do it for a few 100's of thousands, but you would start with FPGAs and then hardwire.
ArtForz
Sr. Member
****
Offline Offline

Activity: 406
Merit: 257


View Profile
December 28, 2010, 06:25:22 PM
 #30

The real issue on FPGA isn't the logic ops (cheap) or the rotates (pretty much free), but the 32-bit adds.
A_out = H + s0 + s1 + maj + ch + K + W
-> at least 3 level adder tree ((H + s0) + (s1 + maj)) + ((ch + K) + W)
Carry chain delay in a single 32-bit adder on a -3 speed grade Spartan6 is ~2ns, so without ANY routing delays we're already limited to 166MHz.
Real-world you're lucky to get 80MHz out of a non-pipelined round on a -3 S6
Pipelining a round to 2 or 3 stages helps, but increases FF usage a LOT (you have to carry 256 bits of A..H, 512 bits of W[0..15] and the initial A..H for the final add around).
2-stage gives ~140MHz on a -3, 3-stage ~180MHz
= a 2-stage pipelined sha256 round is ~1k FFs, 3-stage pipelined ~1.5k FFs
XC6SLX150 has something like 160k FFs available, and the synthesis tools pretty much throw speed out the window once you go >70% FF utilization.
so realistically you MIGHT be able to fit 64 2-pipelined rounds of sha256 on a LX150, 2 clocks/bitcoinhash @ 140MHz -> 70Mh/s
or maybe with lots of luck and sacrificing a chicken to the place and route gods 48 rounds 3-stage @ 180MHz -> 68Mh/s
= 70Mh/s on a -3 speed grade XC6SLX150, 20%-30% less on a -2 speed grade.
so 9 grand for MAYBE 850Mh/s... a $500 HD5970 can get >550Mh/s stock, well >600Mh/s OCed at stock voltage even on a "bad" card.

okay, let's be REALLY generous, assume we can magically get 1.2Gh/s out of 12 150-2s and they consume NO POWER AT ALL.
So how long does it take at 600W for 2 5970s and $0.10/kWh to make up that $8k price difference? 0.6kW @ $0.10 kWh = $1.44/day ... about 15 years.
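
The chain of estimates in this post can be reproduced with a few helper functions. This is an editorial sketch: the function names are made up, and the figure of 128 rounds per bitcoin hash (two 64-round compressions, with the first header chunk precomputed) is my reading of the "2 clocks/bitcoinhash" line rather than something stated outright.

```python
def fmax_mhz(adder_delay_ns: float, tree_levels: int) -> float:
    """Upper bound on the clock if one cycle must cover the whole adder tree."""
    return 1000.0 / (adder_delay_ns * tree_levels)


def hash_rate_mhs(clock_mhz: float, rounds_on_chip: int,
                  rounds_per_hash: int = 128) -> float:
    """Throughput of a pipelined design with `rounds_on_chip` round units."""
    return clock_mhz * rounds_on_chip / rounds_per_hash


def payback_years(price_gap_usd: float, power_kw: float, usd_per_kwh: float) -> float:
    """Years of GPU electricity needed to equal the extra FPGA purchase cost."""
    daily_cost = power_kw * 24 * usd_per_kwh
    return price_gap_usd / daily_cost / 365


bound = fmax_mhz(adder_delay_ns=2.0, tree_levels=3)      # ~166 MHz
two_stage = hash_rate_mhs(clock_mhz=140, rounds_on_chip=64)
three_stage = hash_rate_mhs(clock_mhz=180, rounds_on_chip=48)
years = payback_years(price_gap_usd=8000, power_kw=0.6, usd_per_kwh=0.10)
print(f"{bound:.1f} MHz, {two_stage:.1f} and {three_stage:.1f} Mh/s, {years:.1f} years")
```

Changing the electricity price or the price gap shows how far the inputs would have to move before the FPGA board paid for itself.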

bitcoin: 1Fb77Xq5ePFER8GtKRn2KDbDTVpJKfKmpz
i0coin: jNdvyvd6v6gV3kVJLD7HsB5ZwHyHwAkfdw
mimarob (OP)
Full Member
***
Offline Offline

Activity: 354
Merit: 103



View Profile
December 29, 2010, 10:54:41 PM
 #31

Wow, those are some really impressive calculations.

$9k... too bad xmas is over for this time :-)

Smart idea to pipeline the adders, does it mean you spend more flip-flops but not more gates?

I was thinking of getting an old-fashioned xc3s500 for a reasonable price, at 1k-1.5k flip flops maybe it would be possible to fit one out of the 64 of these
pipelined sha modules into one chip?

So, if I'm lucky I could get it running at 60-70 MHz meaning a full sha would take about 1us and that would give me 0.5 MHash/sec, right?

Its almost as fast as my old computer which runs at 0.7 MHash/s :-)
Jason
Member
**
Offline Offline

Activity: 114
Merit: 10


View Profile
December 30, 2010, 05:19:37 PM
 #32

I have an old Altera DE2-70 board I picked up for $300 (academic price) 1.5 years ago (Cyclone II).  Looks like the current model is a DE2-115 based on the Cyclone III FPGA.

I just did a bit of research on existing SHA-256 implementations for FPGAs, and I see that several companies sell high performance FPGA implementations (e.g. http://www.cast-inc.com/ip-cores/encryption/sha-256/index.html).  Taking the Cast implementation as an example:

"The processing of one 512-bit block is performed in 66 clock cycles and the bit-rate achieved is 7.75Mbps / MHz on the input of the SHA256 core."

Taking a clock rate of 132MHz as a reasonably conservative number for my older Cyclone II (Cast claims up to 280MHz on high performance FPGAs), this comes out to 2Mhps per SHA-256 core.  Cast's implementation uses around 2,531 LEs on the Cyclone.  My older DE2-70 board contains about 68,000 LEs.

Adding 10% overhead for communication/synchronization/etc, it should be possible to put 24 SHA-256 processors on my DE2-70.  That should allow up to 48Mhps peak processing rate (>80Mhps for the DE2-115 which can also be clocked faster).

Another question:  How much communications bandwidth is needed at these speeds, and can it fit on a 100baseT channel?  Certainly not if we want the host to transfer all of the candidates to be hashed onto the FPGA (48M * 512 bits = 24.6Gbps -- well above even gigabit ethernet speeds).  Is there another approach that can overcome this limitation?  I think so...
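
The numbers in the last few paragraphs can be recomputed as follows. This is an editorial sketch with made-up function names; the inputs are the figures quoted in this post. Note that 48 million 512-bit inputs per second works out to about 24.6 Gbit/s (12.3 Gbit/s would correspond to 256-bit inputs); either way it is far beyond 100baseT.

```python
def blocks_per_second(clock_hz: float, cycles_per_block: int) -> float:
    """SHA-256 blocks processed per second by one core."""
    return clock_hz / cycles_per_block


def cores_that_fit(total_les: int, les_per_core: int, overhead: float = 0.10) -> int:
    """Cores that fit after reserving a fraction of the logic for glue."""
    return int(total_les * (1 - overhead) // les_per_core)


per_core = blocks_per_second(clock_hz=132e6, cycles_per_block=66)  # 2 M blocks/s
cores = cores_that_fit(total_les=68_000, les_per_core=2_531)
total_rate = cores * per_core
bandwidth_gbps = total_rate * 512 / 1e9  # if the host had to send every input
print(f"{cores} cores, {total_rate / 1e6:.0f} M blocks/s, {bandwidth_gbps:.1f} Gbit/s")
```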

FPGAs have room for a dedicated CPU as well as a lot of logic, depending on what level of functionality you need in the CPU.  There are a lot of free and powerful CPU cores available on opencores.org, but it will be hard to beat the Nios II architecture if you are using Altera FPGAs.

A 32-bit Nios II/f CPU core is capable of 140 MIPS of performance (at 125MHz) and uses 1600 LEs on the Cyclone II.  Is this sufficient to keep 24 high-speed SHA-256 blocks from stalling?  Not even close.  In fact, it would probably not even be able to keep one SHA-256 block from stalling.  Back to the drawing board...

It looks like a better approach would be to implement the search logic directly in gates on the FPGA, and have it fill one or more 256-bit-wide queue(s) which would be drawn on by the SHA-256 processing blocks.  A single NIOS II CPU still makes sense for collecting the results and communicating the results back to the host CPU (TCP/IP stack), as well as to load the search logic starting and ending values.

Anyway, my back-of-the-envelope calculations seem to confirm almost everything ArtForz is saying below.  It looks like the ATI 5970s are the right choice if your goal is to crunch bitcoins.

OTOH, if you want an excuse to learn how to program FPGAs, you will certainly be able to run circles around a state-of-the-art hex-core i7 CPU with a pretty modest FPGA -- but at considerable effort.

Jason

The real issue on FPGA isn't the logic ops (cheap) or the rotates (pretty much free), but the 32-bit adds.
A_out = H + s0 + s1 + maj + ch + K + W
-> at least 3 level adder tree ((H + s0) + (s1 + maj)) + ((ch + K) + W)
Carry chain delay in a single 32-bit adder on a -3 speed grade Spartan6 is ~2ns, so without ANY routing delays we're already limited to 166MHz.
Real-world you're lucky to get 80MHz out of a non-pipelined round on a -3 S6
Pipelining a round to 2 or 3 stages helps, but increases FF usage a LOT (you have to carry 256 bits of A..H, 512 bits of W[0..15] and the initial A..H for the final add around).
2-stage gives ~140MHz on a -3, 3-stage ~180MHz
= a 2-stage pipelined sha256 round is ~1k FFs, 3-stage pipelined ~1.5k FFs
XC6SLX150 has something like 160k FFs available, and the synthesis tools pretty much throw speed out the window once you go >70% FF utilization.
so realistically you MIGHT be able to fit 64 2-pipelined rounds of sha256 on a LX150, 2 clocks/bitcoinhash @ 140MHz -> 70Mh/s
or maybe with lots of luck and sacrificing a chicken to the place and route gods 48 rounds 3-stage @ 180MHz -> 68Mh/s
= 70Mh/s on a -3 speed grade XC6SLX150, 20%-30% less on a -2 speed grade.
so 9 grand for MAYBE 850Mh/s... a $500 HD5970 can get >550Mh/s stock, well >600Mh/s OCed at stock voltage even on a "bad" card.

okay, let's be REALLY generous, assume we can magically get 1.2Gh/s out of 12 150-2s and they consume NO POWER AT ALL.
So how long does it take at 600W for 2 5970s and $0.10/kWh to make up that $8k price difference? 0.6kW @ $0.10 kWh = $1.44/day ... about 15 years.

BM-2D7sazxZugpTgqm3M2MCi5C1t8Du8BN11f
mimarob (OP)
Full Member
***
Offline Offline

Activity: 354
Merit: 103



View Profile
December 30, 2010, 08:21:32 PM
 #33

FPGA in my case is mainly for fun, but I won't refuse to try a CUDA/OpenCL graphics card either. I'm using about 600 Watt on average to keep a building frost-free at the moment...

As I read in a few threads here, the usage of GPUs isn't totally problem-free either, is it?

bitcoin2
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile
December 30, 2010, 08:51:28 PM
 #34

FPGA in my case is mainly for fun, but I won't refuse to try a CUDA/OpenCL graphics card either. I'm using about 600 Watt on average to keep a building frost-free at the moment...

As I read in a few threads here, the usage of GPUs isn't totally problem-free either, is it?

One HD5970 needs 300 Watt. Put one computer with 2 HD5970s in your building and you have 600 Watt. I don't know if the Windows driver supports 2 HD5970s at the same time, but Linux should do this. Of course you need an Internet connection in your building. You need the standard bitcoin client and m0mchil's (or puddinpop's) miner. http://bitcointalk.org/index.php?topic=1334.0;all
WSDN
Sr. Member
****
Offline Offline

Activity: 493
Merit: 250


IDENA.IO - Proof-Of-Person Blockchain


View Profile
January 01, 2011, 02:12:01 AM
 #35

Best os to run bitcoin client? NetBSD? or OpenBSD?

bitcoin2
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile
January 01, 2011, 03:22:21 AM
 #36

Best os to run bitcoin client? NetBSD? or OpenBSD?
The bitcoin client is OS independent, but the OpenCL driver for mining on ATI Radeon runs only under Windows and Linux (for the Radeon 5970, Linux is recommended because you can't disable CrossFire under Windows). I don't know if there are NetBSD or OpenBSD drivers from ATI. You could take Debian or Ubuntu and install the driver from ATI.
WSDN
Sr. Member
****
Offline Offline

Activity: 493
Merit: 250


IDENA.IO - Proof-Of-Person Blockchain


View Profile
January 01, 2011, 05:17:57 AM
 #37

I don't know if there are NetBSD or OpenBSD driver from ATI.

As I recall, OpenBSD may not have these drivers, but NetBSD should be fine: it has the most up-to-date drivers and is bleeding-edge technology.

lucky
Newbie
*
Offline Offline

Activity: 51
Merit: 0


View Profile
January 01, 2011, 07:35:42 PM
 #38

I don't know if there are NetBSD or OpenBSD driver from ATI.

As I remember, OpenBSD may not have these drivers, but NetBSD is perfect: it has the most up-to-date drivers too and is bleeding-edge technology.


It depends on CUDA/OpenCL support in the proprietary ATI/Nvidia drivers.

These are not available on OpenBSD or NetBSD.
WSDN
Sr. Member
****
Offline Offline

Activity: 493
Merit: 250


IDENA.IO - Proof-Of-Person Blockchain


View Profile
January 01, 2011, 08:02:06 PM
 #39

Quote
These are not available on OpenBSD or NetBSD.

Yes, that is true, friend! =)

ttul
Member
**
Offline Offline

Activity: 70
Merit: 10


View Profile
March 26, 2011, 03:53:06 AM
 #40

Say hypothetically that some mystery vendor releases a new chip capable of mining at 100x the power efficiency of existing cards, for 2x the price of a 5970. Would this mining hardware sell well? How many of you would buy such a magic box?

I understand that the difficulty would adjust to neutralize the increased power introduced by the new technology; however, that difficulty increase would also render the old technology irrelevant and would sort of force everyone to upgrade.

GPUs pretty much wiped out CPU mining last year. I wonder whether, with another step up in performance, current-generation GPUs could similarly be side-stepped completely.

Thoughts appreciated...
fpgaminer
Hero Member
*****
Offline Offline

Activity: 560
Merit: 517



View Profile WWW
March 26, 2011, 10:49:03 AM
 #41

 Shocked Wow, this is such a coincidence! I was just browsing the forums tonight, and stumbled upon this thread. I finally registered an account just to post in this thread.

I've been working on an FPGA miner for the past few weeks! It's fully working*, currently running on my desk in front of me and generating some tasty shares Cool I'll give an overview of my work:

Current Performance
Device: Altera Cyclone 3 C120 Dev Kit
Performance: 70Mhash/s
Power: 2.26W
Efficiency: 30.9 Mhash/W


It's written in Verilog, all crafted painstakingly by hand. There are two alternative designs. One is a serial design composed of many SHA256 cores running in parallel, each core computing a hash in 64 cycles (2 cores needed for the full hash). Each full core (2 half cores) consumes about 2800 LEs. The second design (currently running in front of me) is a pipelined version with one LOOOOONNNNGGGG chain of hashing stages running in parallel. That design computes 1 full hash every clock cycle. It runs at a maximum of 70MHz right now. Actually, I haven't tried pushing it to its limit, so it may very well run much faster. I'm hoping for 100MHz.
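
To make the two designs concrete, here is a minimal software sketch in Python (not the Verilog described above) of SHA-256 split into its 64 rounds. In a serial core, one copy of `round_stage` would be reused for 64 clock cycles; in a pipelined design, each round would be its own hardware stage, so a new nonce could enter every clock. All function names are illustrative assumptions, and the round constants are derived from prime cube roots, as the SHA-256 standard defines them, rather than typed in by hand.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def _primes(n):
    """Return the first n primes by trial division (fine for n <= 64)."""
    found, c = [], 2
    while len(found) < n:
        if all(c % p for p in found):
            found.append(c)
        c += 1
    return found


def _icbrt(n):
    """Integer cube root (floor) by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Fractional bits of cube roots (K) and square roots (H0) of the first primes.
K = [_icbrt(p << 96) & MASK for p in _primes(64)]
H0 = [math.isqrt(p << 64) & MASK for p in _primes(8)]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def round_stage(state, k, w):
    """One SHA-256 round: the unit a pipelined design would replicate 64 times."""
    a, b, c, d, e, f, g, h = state
    s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
    ch = (e & f) ^ (~e & g)
    t1 = (h + s1 + ch + k + w) & MASK
    s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
    maj = (a & b) ^ (a & c) ^ (b & c)
    t2 = (s0 + maj) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)


def schedule(block):
    """Expand a 64-byte block into the 64 message words fed to the rounds."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    return w


def compress(h, block):
    """Run all 64 rounds in sequence, as a serial core would over 64 cycles."""
    state = tuple(h)
    for k, wi in zip(K, schedule(block)):
        state = round_stage(state, k, wi)
    return [(x + y) & MASK for x, y in zip(h, state)]


def sha256(msg):
    padded = msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64)
    padded += struct.pack(">Q", 8 * len(msg))
    h = H0
    for i in range(0, len(padded), 64):
        h = compress(h, padded[i:i + 64])
    return struct.pack(">8I", *h)
```

Bitcoin's proof of work applies SHA-256 twice to the block header, which is presumably why two cores are counted per full hash above.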

These are my results after off-and-on work for a few weeks. I've actually put most of my efforts into the serial design, because the pipelined design takes at least an hour to synthesize each time. The serial design can currently fit 42 full cores into the C120, each running at 90MHz and computing a full hash every 64 cycles. That's about 59Mhash/s.

The latest revision of the pipelined design consumes 90,000 LEs, so it's pretty big. I'm working to cram it into <64,000LEs so I can get two of them in one C120 chip, and push their clock to 100MHz, giving me a whopping 200Mhash/s.
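
The throughput figures quoted so far follow from simple arithmetic. The sketch below reproduces them; the function names are made up for illustration, and the only inputs are the core counts and clock rates stated in this post.

```python
def serial_mhash(cores, clock_mhz, cycles_per_hash=64):
    """Serial design: each full core finishes one hash every 64 cycles."""
    return cores * clock_mhz / cycles_per_hash


def pipelined_mhash(pipelines, clock_mhz):
    """Pipelined design: each pipeline finishes one hash every clock cycle."""
    return pipelines * clock_mhz


print(serial_mhash(42, 90))      # 42 cores at 90 MHz, about 59 Mhash/s
print(pipelined_mhash(1, 70))    # one pipeline at 70 MHz
print(pipelined_mhash(2, 100))   # target: two pipelines at 100 MHz
```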

I haven't used the on-board power meter before, but if I'm reading it correctly the FPGA is currently using 2.26 Watts. That ... seems really low, but Altera's website verifies that that's actually above average for a C120, so I guess it's accurate. That's 31 Mhash/W, which is 1200% more efficient than the most efficient GPU listed on the Wiki. So efficient, it's basically free. Poor guy runs terribly hot though. I need to go put a fan on him...

The only downside is that this board in particular, the C120, costs $1000. The same design will easily fit into the DE2-115 board (from Terasic), which only costs $600. I have one of those too, so I'll test on him later. You're not likely to pay off that $600 quickly, though, so I guess it isn't economical yet. A reduced version may run in the DE0-Nano board, which is $80, but obviously it won't have the same performance (about 25%).

All my efforts are put into optimizing every last bit of the design, so we'll see how far I push the poor FPGA. It already out-performs my GTX 285 card, so I'm happy  Grin and at a fraction of the power cost.

And I'm only getting started  Cool Who wants to front the money to buy me a Stratix board and move this into Hardcopy?  Tongue

* By fully working, I really do mean it. It's happily submitting hashes to a pool. I was quite thrilled when my little baby submitted his first share  Cheesy

deadlizard
Member
**
Offline Offline

Activity: 112
Merit: 11



View Profile
March 26, 2011, 11:16:51 AM
 #42

this is relevant to my interests

don't mind me, just monitoring this thread

btc address:1MEyKbVbmMVzVxLdLmt4Zf1SZHFgj56aqg
gpg fingerprint:DD1AB28F8043D0837C86A4CA7D6367953C6FE9DC

fpgaminer
Hero Member
*****
Offline Offline

Activity: 560
Merit: 517



View Profile WWW
March 26, 2011, 11:25:45 AM
 #43

If you guys are interested in my work, let me know, and I'll continue to post updates and such. Otherwise, I guess I'll just toil away in silence.

And a quick note:
The current design uses my PC to fetch work, and push it to the FPGA, as well as check for "Golden Tickets" (my funny internal name for valid nonces) and submit them when found. There's room in the pipelined design to put in a NIOS microprocessor. This could potentially use the ethernet port on the dev kit to do all the fetching and submitting. That way it'd be totally automated, and headless.  Cool
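
As a rough illustration of the host-side check described above, here is a minimal Python sketch. It is not the actual host software: the function names are invented, and it assumes the usual Bitcoin conventions (an 80-byte header whose last 4 bytes are the little-endian nonce, double SHA-256, and the digest read as a little-endian integer that must not exceed the target).

```python
import hashlib
import struct


def block_hash(header76: bytes, nonce: int) -> int:
    """Double SHA-256 of the 76-byte header prefix plus a 32-bit nonce."""
    header = header76 + struct.pack("<I", nonce)
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return int.from_bytes(digest, "little")


def is_golden_ticket(header76: bytes, nonce: int, target: int) -> bool:
    """True if the nonce reported by the FPGA really meets the target."""
    return block_hash(header76, nonce) <= target


def confirm_nonces(header76, candidate_nonces, target):
    """Filter the nonces returned by the hardware before submitting them."""
    return [n for n in candidate_nonces if is_golden_ticket(header76, n, target)]
```

Re-checking on the host is cheap, since the FPGA only reports the rare candidate nonces rather than every hash it computes.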

LMGTFY
Hero Member
*****
Offline Offline

Activity: 644
Merit: 502



View Profile
March 26, 2011, 11:35:45 AM
 #44

If you guys are interested in my work, let me know, and I'll continue to post updates and such. Otherwise, I guess I'll just toil away in silence.

And a quick note:
The current design uses my PC to fetch work, and push it to the FPGA, as well as check for "Golden Tickets" (my funny internal name for valid nonces) and submit them when found. There's room in the pipelined design to put in a NIOS microprocessor. This could potentially use the ethernet port on the dev kit to do all the fetching and submitting. That way it'd be totally automated, and headless.  Cool
Oh, I think there'll definitely be interest. My initial thoughts are that Mhash/W is superb, about 15 times better than a 5970. Mhash/s is still quite low, given the cost, but a "quad DE0-Nano board" version would be particularly interesting - $320 (compared with $400 or more for a second-hand 5970). Professional miners who are concerned about on-going electricity costs more than they are about fixed, up-front costs might very well be interested.

This space intentionally left blank.
MoonShadow
Legendary
*
Offline Offline

Activity: 1708
Merit: 1007



View Profile
March 26, 2011, 04:00:36 PM
 #45


At the same time your results do seem rather better than what I thought could be had on both hash/watt and hash/$. I'd say if you manage to perfect your design, order ASIC fabrication and turn it into some device for sha256 and bitcoin mining, this would sell well, not only to bitcoiners but to various spooks too.


If someone were to develop such an ASIC and put it on a small SOC, and networked a bunch in a ribbon, they would make great heat trace cabling for water lines.  Parking garages (which still have to have fire suppression systems) have heat trace wrapped around water lines and mains, which are then insulated over that.  These water lines have to be heated continuously anytime the outside temp is below 35 degrees, so that cold spots don't freeze & bust the water lines.

I considered making a Linux cluster like this about 10 years ago, but never did anything with the idea.  These might sell well in high latitudes.

"The powers of financial capitalism had another far-reaching aim, nothing less than to create a world system of financial control in private hands able to dominate the political system of each country and the economy of the world as a whole. This system was to be controlled in a feudalist fashion by the central banks of the world acting in concert, by secret agreements arrived at in frequent meetings and conferences. The apex of the systems was to be the Bank for International Settlements in Basel, Switzerland, a private bank owned and controlled by the world's central banks which were themselves private corporations. Each central bank...sought to dominate its government by its ability to control Treasury loans, to manipulate foreign exchanges, to influence the level of economic activity in the country, and to influence cooperative politicians by subsequent economic rewards in the business world."

- Carroll Quigley, CFR member, mentor to Bill Clinton, from 'Tragedy And Hope'
nphard
Member
**
Offline Offline

Activity: 66
Merit: 10


View Profile
March 26, 2011, 04:33:22 PM
 #46

Current Performance
Device: Altera Cyclone 3 C120 Dev Kit
Performance: 70Mhash/s
Power: 2.26W
Efficiency: 30.9 Mhash/W

Those are some surprisingly good numbers. Just think what could be done with something like this.
ArtForz
Sr. Member
****
Offline Offline

Activity: 406
Merit: 257


View Profile
March 26, 2011, 05:56:19 PM
 #47

Good?
I've gotten 70Mh/s with a Spartan6 LX 150-3, $180 @ 1ea.
he gets the same from a CycloneIII 120-C8, $380 @ 1ea.
and expects about the same from a CycloneIV-E 115-C8, $310 @ 1ea.

bitcoin: 1Fb77Xq5ePFER8GtKRn2KDbDTVpJKfKmpz
i0coin: jNdvyvd6v6gV3kVJLD7HsB5ZwHyHwAkfdw
Jered Kenna (TradeHill)
Sr. Member
****
Offline Offline

Activity: 420
Merit: 250



View Profile WWW
April 04, 2011, 03:16:36 PM
 #48

posting to follow

moneyandtech.com
@moneyandtech @jeredkenna
fpgaminer
Hero Member
*****
Offline Offline

Activity: 560
Merit: 517



View Profile WWW
April 07, 2011, 06:15:11 AM
 #49

Well, I've been occasionally poking and prodding my design. The pipelined version is clocking at 80MHz now, and down to 80K LEs (64K being the goal, down from 90K). Not huge progress, but I figured I'd keep the thread alive.

eMansipater
Sr. Member
****
Offline Offline

Activity: 294
Merit: 273



View Profile WWW
April 07, 2011, 07:41:01 AM
 #50

I'm very interested to see what happens with this too.  Do FPGAs follow a similar tech curve to GPUs?

If you found my post helpful, feel free to send a small tip to 1QGukeKbBQbXHtV6LgkQa977LJ3YHXXW8B
Visit the BitCoin Q&A Site to ask questions or share knowledge.
0.009 BTC too confusing?  Use mBTC instead!  Details at www.em-bit.org or visit the project thread to help make Bitcoin prices more human-friendly.
Adeq
Newbie
*
Offline Offline

Activity: 17
Merit: 0



View Profile
April 07, 2011, 12:30:45 PM
Last edit: April 07, 2011, 12:50:48 PM by Adeq
 #51

What model of FPGA are you currently using?
You could send it to production as an ASIC and maybe get 200MHz+
Jered Kenna (TradeHill)
Sr. Member
****
Offline Offline

Activity: 420
Merit: 250



View Profile WWW
April 07, 2011, 12:53:48 PM
 #52

I could be wrong here but I assumed there was already a need for an efficient version of this out there.
Wouldn't spy agencies or security or whoever want these? Hell maybe they have 100,000 and just aren't saying anything though..

Or are they way more specialized to bitcoin than I realize?

moneyandtech.com
@moneyandtech @jeredkenna
mrb
Legendary
*
Offline Offline

Activity: 1512
Merit: 1027


View Profile WWW
April 07, 2011, 03:59:32 PM
 #53

The NSA does have its own silicon foundry.

Not anymore. They abandoned it because a decent foundry these days costs multiple billions of dollars, which is estimated to be a large fraction of the classified budget of the NSA. They now produce chips by buying production capacity from semiconductor companies through the TAPO program.
fpgaminer
Hero Member
*****
Offline Offline

Activity: 560
Merit: 517



View Profile WWW
April 07, 2011, 10:32:39 PM
 #54

Quote
What model of FPGA are you currently using?
Altera's Cyclone III EP3C120F780, from the Cyclone III FPGA Development Kit.

The design will also run just fine on a Cyclone IV C115, which is a bit cheaper.

marcus_of_augustus
Legendary
*
Offline Offline

Activity: 3920
Merit: 2348


Eadem mutata resurgo


View Profile
April 08, 2011, 10:59:36 PM
 #55

hmmmmm....

merlyn
Newbie
*
Offline Offline

Activity: 3
Merit: 0


View Profile
April 09, 2011, 02:21:43 PM
 #56

posting to activate e-mail notifications
randomguy7
Hero Member
*****
Offline Offline

Activity: 527
Merit: 500


View Profile
April 27, 2011, 12:28:33 AM
 #57

just watching - ignore me Smiley
pusle
Member
**
Offline Offline

Activity: 89
Merit: 10


View Profile
April 27, 2011, 07:19:01 PM
 #58


http://www.achronix.com/products/speedster22i.html

700K Luts @ 1.5GHz

Now we're getting somewhere?  Grin
farmer_boy
Newbie
*
Offline Offline

Activity: 56
Merit: 0


View Profile
April 29, 2011, 01:28:37 PM
 #59


http://www.achronix.com/products/speedster22i.html

700K Luts @ 1.5GHz

Now we're getting somewhere?  Grin
At an unknown $ amount. Why can't these people just put a dollar amount next to their product?
mrb
Legendary
*
Offline Offline

Activity: 1512
Merit: 1027


View Profile WWW
April 29, 2011, 03:38:07 PM
 #60

At an unknown $ amount. Why can't these people just put a dollar amount next to their product?

They like to force you to have to call one of their salespersons who will price the product according to how much money you have. Market segmentation at work...
FooDSt4mP
Full Member
***
Offline Offline

Activity: 182
Merit: 100


View Profile
April 29, 2011, 10:04:36 PM
 #61


http://www.achronix.com/products/speedster22i.html

700K Luts @ 1.5GHz

Now we're getting somewhere?  Grin
At an unknown $ amount. Why can't these people just put a dollar amount next to their product?

Or better yet, a BTC amount Wink.

As we slide down the banister of life, this is just another splinter in our ass.
randomguy7
Hero Member
*****
Offline Offline

Activity: 527
Merit: 500


View Profile
April 30, 2011, 02:00:17 PM
 #62

About how many hashes could a fpga like that do?
pusle
Member
**
Offline Offline

Activity: 89
Merit: 10


View Profile
April 30, 2011, 07:00:40 PM
 #63


Using fpgaminer's numbers for his CycloneIII-120 board.

90k LE's @70MHz = 70Mhash/sec.


Assuming LE's = LUT's  and it could actually run this design at 1.5GHz ->  10.5 Gigahash/sec
Using the Cast IP -> 6 Gigahash/sec
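
The 10.5 Gigahash/sec figure can be reproduced as below. This is only a sketch of the arithmetic: it assumes, as stated above, that LEs and LUTs are interchangeable and that the design would really close timing at 1.5GHz, and it assumes only whole 90k pipelines fit. The function name is illustrative.

```python
def scaled_ghash(device_luts, clock_mhz, luts_per_pipeline=90_000):
    """Whole pipelines that fit, each producing one hash per clock."""
    pipelines = device_luts // luts_per_pipeline
    return pipelines * clock_mhz / 1000


print(scaled_ghash(700_000, 1500))  # 7 pipelines at 1.5 GHz
```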


Another FPGA company has come up with space time reconfig @ 1.6GHz:
http://www.tabula.com/technology/technology.php




phelix
Legendary
*
Offline Offline

Activity: 1708
Merit: 1019



View Profile
April 30, 2011, 08:44:27 PM
 #64

30.4.2011: 6GHash --> ~200$/day

sweet
xyzzy
Newbie
*
Offline Offline

Activity: 8
Merit: 0


View Profile
May 01, 2011, 04:42:11 AM
 #65

Good?
I've gotten 70Mh/s with a Spartan6 LX 150-3, $180 @ 1ea.
he gets the same from a CycloneIII 120-C8, $380 @ 1ea.
and expects about the same from a CycloneIV-E 115-C8, $310 @ 1ea.


Always so negative Sad smells like agenda
Cheeseman
Newbie
*
Offline Offline

Activity: 23
Merit: 0


View Profile
May 01, 2011, 05:00:33 AM
 #66

Good?
I've gotten 70Mh/s with a Spartan6 LX 150-3, $180 @ 1ea.
he gets the same from a CycloneIII 120-C8, $380 @ 1ea.
and expects about the same from a CycloneIV-E 115-C8, $310 @ 1ea.


Always so negative Sad smells like agenda

Also, those are prices for just the FPGA without the board. The best price I've seen for the FPGA + board is $330 for a CycloneIV dev kit.
ttul
Member
**
Offline Offline

Activity: 70
Merit: 10


View Profile
May 10, 2011, 10:51:42 PM
 #67


http://www.achronix.com/products/speedster22i.html

700K Luts @ 1.5GHz

Now we're getting somewhere?  Grin
At an unknown $ amount. Why can't these people just put a dollar amount next to their product?

Because they're selling these things in very low volumes and/or haven't yet set up a distributor network. You can count on them being priced at >$10K per unit if there is no price list. Otherwise their sales and marketing engine won't be profitable.
bitcoinBull
Legendary
*
Offline Offline

Activity: 826
Merit: 1001


rippleFanatic


View Profile
May 10, 2011, 11:30:47 PM
 #68

Good?
I've gotten 70Mh/s with a Spartan6 LX 150-3, $180 @ 1ea.
he gets the same from a CycloneIII 120-C8, $380 @ 1ea.
and expects about the same from a CycloneIV-E 115-C8, $310 @ 1ea.


Always so negative Sad smells like agenda

Also, those are prices for just the FPGA without the board. The best price I've seen for the FPGA + board is $330 for a CycloneIV dev kit.

But, if you were going to mine with FPGAs you wouldn't use a dev board, you'd use multiple boards each with an array of FPGA chips.



That's from the Copacobana: Cost-Optimized Parallel COde Breaker.

Don't think I've seen it mentioned here before.

Its successor uses the Spartan6 LX150(T), the Rivyera: http://www.sciengines.com/products/computers-and-clusters/rivyera-s6-lx150.html

Starts at EUR 20'000 (16 count FPGA).

RIVYERA S6-LX150
FPGA Type: Xilinx Spartan-6 LX150
FPGA count min. 16 to max. 128
Price from EUR 19'900 to 86'900

RIVYERA S3-5000
FPGA Type: Xilinx Spartan-3 5000
FPGA count min. 16 to max. 128
Price from EUR 16'900 to EUR 58'900

RIVYERA V4-SX35
FPGA Type: Xilinx Virtex-4 SX35
FPGA count 128
Price above EUR 1 million




College of Bucking Bulls Knowledge
ttul
Member
**
Offline Offline

Activity: 70
Merit: 10


View Profile
May 10, 2011, 11:41:28 PM
 #69

...
That's from the Copacobana: Cost-Optimized Parallel COde Breaker.

Don't think I've seen it mentioned here before.

Its successor uses the Spartan6 LX150(T), the Rivyera: http://www.sciengines.com/products/computers-and-clusters/rivyera-s6-lx150.html

Starts at EUR 20'000 (16 count FPGA).

RIVYERA S6-LX150
FPGA Type: Xilinx Spartan-6 LX150
FPGA count min. 16 to max. 128
Price from EUR 19'900 to 86'900

RIVYERA S3-5000
FPGA Type: Xilinx Spartan-3 5000
FPGA count min. 16 to max. 128
Price from EUR 16'900 to EUR 58'900

RIVYERA V4-SX35
FPGA Type: Xilinx Virtex-4 SX35
FPGA count 128
Price above EUR 1 million

These prices are 10x the $ / MHash/s cost of a 5970 board. But I would imagine vastly more efficient in terms of power consumption.
fpgaminer
Hero Member
*****
Offline Offline

Activity: 560
Merit: 517



View Profile WWW
May 19, 2011, 12:19:15 AM
 #70

It has been a little while since I submitted an update on my progress, so here we go.

Area Improvement: <80K LUTs for 80MH/s
I recently did another round of area optimization on one of my designs. As I suspected, it now successfully fits on a Cyclone3 C80 device. This is the 80MH/s design, so it achieves the theoretical 1MH/s = 1K LUT numbers that I had on the back of my napkin.

The next step is to synthesize for a Cyclone4 C75 device, which might be a very tight fit. The Cyclone 4s are a bit cheaper and use slightly less power. Also, if it does fit into 75K LUTs, then it is likely that two of the same design will fit into a C150. That would achieve a total of 160MHash/s.


New Parts Coming In
I have a Xilinx Spartan-6 LX150T-3 development board coming in soon. My goal here is to achieve 160MHash/s on this single chip. Estimates predict that it will be possible, but may be very difficult. We shall see.

Goals: Achieve 160MHash/s on a Spartan-6 LX150-3, which is a sub-$200 chip. That's $1.25USD per MH/s. The average cost of a complete GPU mining rig is $1 USD per MH/s. This goal would bring me very close to achieving GPU parity, and it most certainly will continue to exceed GPUs in power and temperature performance.
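
The cost comparison above is a single division; the sketch below spells it out. The $200 figure is the upper bound implied by "sub-$200" and covers the chip only, not a board or power supply.

```python
def usd_per_mhash(price_usd, mhash_per_s):
    """Up-front hardware cost per unit of hash rate."""
    return price_usd / mhash_per_s


fpga_goal = usd_per_mhash(200, 160)   # chip price over target hash rate
gpu_rig = 1.0                         # average rig cost quoted above
print(fpga_goal, fpga_goal / gpu_rig)
```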

I also ordered an Ethernet module and hope to use it to make the FPGA miner completely independent. Plug and profit!


Power Consumption Measured
I now have a Kill-a-Watt, which measures the amount of electricity drawn "at the wall" by any device. Using this, I measured the "at the wall" power consumption of my Cyclone-4 50MHash/s design. It was 8 watts. Quite impressive, considering that this covers the entire development kit, including the inefficiency of the power supplies.

Also, the Cyclone-4 required no cooling. No fan, no heat sink. It chugged along happily.  Smiley Unlike my noisy mining rigs...



caston
Hero Member
*****
Offline Offline

Activity: 756
Merit: 500



View Profile WWW
May 19, 2011, 12:57:16 AM
 #71

fpgaminer: where do you order your dev kits from?

bitcoin BTC: 1MikVUu1DauWB33T5diyforbQjTWJ9D4RF
bitcoin cash: 1JdkCGuW4LSgqYiM6QS7zTzAttD9MNAsiK

-updated 3rd December 2017
aahzmundus
Hero Member
*****
Offline Offline

Activity: 644
Merit: 500


Invest & Earn: https://cloudthink.io


View Profile
May 19, 2011, 04:16:33 AM
 #72

This is awesome.  I am starting to invest in mining equipment but this looks like it may be better.  Too bad I have no ability to do this on my own and I doubt you would release your miner without compensation...

Do you have plans to release it? Should someone start a bounty?

bitdiver
Newbie
*
Offline Offline

Activity: 6
Merit: 0


View Profile
May 19, 2011, 12:22:38 PM
Last edit: May 19, 2011, 02:28:43 PM by bitdiver
 #73

A devel board with a LX150 ? I was only aware of the Digilent Atlys Spartan-6 with a LX45 for about 140 EUR. From which company do you source it ?
Google found a PCIe card from Enterpoint in the UK with a XC6SLX150T for £480.00 tax excl, a bit much for a devel board.

Having read through the post concerning bitcoin mining with fpgas here I wonder whether the DSP48A1 slice in the newer Spartan-6 can be put to good use, since it has a nice adder.

Seems that I must pull out my old Spartan 3E devel board to have a better look. But that doesn't have dsp slices. Darn.


eturnerx
Member
**
Offline Offline

Activity: 84
Merit: 10


View Profile
May 19, 2011, 03:41:39 PM
 #74

The Xilinx Spartan-6 LX150T-3 has a PCI-e connector, does that mean it has to be mounted in a computer for comms and power? It's a nice board otherwise - I just don't want to have to buy another computer!
bitdiver
Newbie
*
Offline Offline

Activity: 6
Merit: 0


View Profile
May 19, 2011, 05:54:54 PM
 #75


Regarding a devel board with a Xilinx Spartan-6

There is a massive fpga compute board here:
http://www.dinigroup.com/new/DNBFC_S12_PCIe.html
I don't know what it costs though, but I doubt that it'll come at a bargain price.
The 2 gb ram per fpga is not needed for computing sha256 hashes.

There is a devel board from Avnet
http://www.em.avnet.com/ctf_shared/evk/df2df2usa/xlx-s6-lx150t-dev-pb122409.pdf
http://www.silica.com/products/highlight/product/xilinxR-spartanR-6-lx150t-development-kit.html
No idea about availability. Price stated in the PDF and website is USD 995,-






fpgaminer
Hero Member
*****
Offline Offline

Activity: 560
Merit: 517



View Profile WWW
May 19, 2011, 09:24:27 PM
 #76

Quote
Yes, I got the one from Avnet.

Quote
The Xilinx Spartan-6 LX150T-3 has a PCI-e connector, does that mean it has to be mounted in a computer for comms and power?
It's just a comm link, like the 20 some odd other interfaces on the bloody thing Tongue It has its own power supply and can run just fine without a computer.

Quote
Having read through the post concerning bitcoin mining with fpgas here I wonder whether the DSP48A1 slice in the newer Spartan-6 can be put to good use, since it has a nice adder.
Oh hey, I forgot about those! Thank you for reminding me  Cheesy Yeah, I have an old Spartan-3E as well, and it indeed only has multipliers Sad But now that I have my LX150 I will certainly give these shiny new DSP48A1 slices a try. I'm skimming the datasheet now, but from a first glance it looks like it only handles 18-bits. Two will have to be strung together to achieve 32-bit. That means 90 "free" adds on the LX150. Not much, but anything is helpful.

Quote
Do you have plans to release it?
Yes

ryepdx
Hero Member
*****
Offline Offline

Activity: 714
Merit: 500


View Profile
May 19, 2011, 09:35:12 PM
 #77

These prices are 10x the $ / MHash/s cost of a 5970 board. But I would imagine vastly more efficient in terms of power consumption.

Um... where are you getting the Mh/s for this thing?
bitdiver
Newbie
*
Offline Offline

Activity: 6
Merit: 0


View Profile
May 19, 2011, 10:41:34 PM
 #78


The way I understand Xilinx' ug389.pdf data sheet you appear to just refer to the pre-adder, which is indeed 18 bits + 18 bits.

However the post-adder is 48 bits wide. Do have a look at ug389.pdf, especially p. 10 , figure 1-1 and 1-2.

You see the inputs a, b and d, which are 18 bits each. They can be concatenated with opmode[1:0] set to 11. Then set opmode[3:2] to 11 to select input c as the other input source of the post-adder.
Finally please see p. 22, table 1-7 and have a look at the 5th entry from the end. The eq. describing the output of the dsp48 is given there as: P = C ± (D:A:B + CIN)

That should do nicely to speed up the adder at the end of each sha256 round.
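
A small behavioural model in Python may help show how the post-adder equation above maps onto a 32-bit SHA-256 addition. It is a sketch under assumptions, not taken from the data sheet: in particular, it assumes the concatenation uses only the low 12 bits of D so that D:A:B is 48 bits wide, and the function names are invented.

```python
MASK48 = (1 << 48) - 1
MASK32 = (1 << 32) - 1


def concat_dab(d, a, b):
    """Model D:A:B as a 48-bit word (assumed: 12 bits of D, 18 of A, 18 of B)."""
    return ((d & 0xFFF) << 36) | ((a & 0x3FFFF) << 18) | (b & 0x3FFFF)


def post_adder(c, d, a, b, cin=0, subtract=False):
    """P = C +/- (D:A:B + CIN), truncated to 48 bits."""
    x = (concat_dab(d, a, b) + cin) & MASK48
    p = c - x if subtract else c + x
    return p & MASK48


def add32(x, y):
    """One 32-bit modular add: x goes in on C, y is split across A and B."""
    b = y & 0x3FFFF            # low 18 bits of y
    a = (y >> 18) & 0x3FFFF    # remaining 14 bits of y
    return post_adder(x, 0, a, b) & MASK32


print(hex(add32(0x12345678, 0x9ABCDEF0)))
```

Keeping only the low 32 bits of P gives the modulo-2^32 sum that SHA-256 needs, so under these assumptions a single slice could cover one full-width addition.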

If you could compare your design's performance with and without the use of the dsp48a1 slice and share the resulting specs, then that would certainly satisfy my curiosity Wink

Are you aware of articles like this one, which suggest ways to improve the implementations, or did you use the opencore or write your own ?
http://ce.et.tudelft.nl/publicationfiles/1194_657_SHA2.pdf
http://ce.et.tudelft.nl/publicationfiles/1194_657_springer-SHA-2.pdf
http://ce.et.tudelft.nl/publicationfiles/1429_657_04560238.pdf
en3r0
Newbie
*
Offline Offline

Activity: 13
Merit: 0


View Profile
May 25, 2011, 03:01:04 AM
 #79

This is really exciting, be sure to keep us posted!
itsagas
Newbie
*
Offline Offline

Activity: 59
Merit: 0


View Profile
May 29, 2011, 01:54:45 AM
 #80

Does anyone happen to know how much the Xilinx Spartan-6 LX150T chips cost in bulk (i.e. with no board, just the chip)?  Say buying by the 100s or 1000s to produce one's own boards, so we can see economic viability.
kjj
Legendary
*
Offline Offline

Activity: 1302
Merit: 1024



View Profile
May 30, 2011, 01:37:54 AM
 #81

$300

17Np17BSrpnHCZ2pgtiMNnhjnsWJ2TMqq8
I routinely ignore posters with paid advertising in their sigs.  You should too.
itsagas
Newbie
*
Offline Offline

Activity: 59
Merit: 0


View Profile
May 30, 2011, 02:35:12 AM
 #82


Nice, thanks. 

anisoptera
Member
**
Offline Offline

Activity: 308
Merit: 10



View Profile
May 30, 2011, 10:19:56 PM
 #83

Once this starts rivaling mining rigs....

mike_la_jolla
Newbie
*
Offline Offline

Activity: 7
Merit: 0


View Profile
June 03, 2011, 05:19:22 PM
 #84

Nope.  You can get them from Digikey for $152:
http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=XC6SLX150-2FG484C-ND

The LX150T for $172:
http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=XC6SLX150T-2FGG484C-ND
kjj
Legendary
*
Offline Offline

Activity: 1302
Merit: 1024



View Profile
June 03, 2011, 06:31:29 PM
 #85

Yeah, sorry.  I have the bad habit of limiting my searches on Digikey to parts that are actually in stock.

17Np17BSrpnHCZ2pgtiMNnhjnsWJ2TMqq8
I routinely ignore posters with paid advertising in their sigs.  You should too.
bitdiver
Newbie
*
Offline Offline

Activity: 6
Merit: 0


View Profile
June 03, 2011, 07:00:02 PM
 #86

Yes, but for this application the LX150T is not needed. The T at the end is for a quite fast transceiver which is great when you want to connect to fast periphery.

For this application a Spartan 6 LX150 is right. Preferably multiple ones on one pcb.

However the LX150 is available only in BGA or CSP, which you cannot solder yourself. You'll need an oven for that. And maybe a stencil for the solder paste too if it's not a prototype. Also BGA package means that you need a multilayer pcb.

What I want to say is that it's certainly not impractical, but you're not going to engineer this in a weekend.
kjj
Legendary
*
Offline Offline

Activity: 1302
Merit: 1024



View Profile
June 03, 2011, 07:34:21 PM
 #87

You probably could whip one up over the weekend.  At least the design.  The PCB fab would take a while.  Bad luck on that too, the next 4 layer dorkbotpdx order is going out on Monday.  So, some time in August if you like their service.  I don't know if sparkfun has a shorter cycle time for 4 layer or not.

If I didn't have to pack this weekend, I could probably bust out a quick and dirty breakout design in FreePCB.  They should already have the footprint, and after that it is just a matter of dragging the pins out to the edges and a quick shot at the autorouter.  No promises on clock skew or noise at high speeds, but good enough to play with.

Soldering would be rough.  I think I could manage it on a stove / hot plate with my SMD rework gun, but most people would be putting a $150+ chip in their oven with either way too much or way too little solder paste.

If you are reading this thread, and you didn't understand any of what I said above, please consider a different approach to mining, or get a demo board, or wait until someone has a tested and working design that they are willing to produce and sell.

marcus_of_augustus
Legendary
*
Offline Offline

Activity: 3920
Merit: 2348


Eadem mutata resurgo


View Profile
June 04, 2011, 12:12:46 AM
 #88

Quote
If you are reading this thread, and you didn't understand any of what I said above, please consider a different approach to mining, or get a demo board, or wait until someone has a tested and working design that they are willing to produce and sell.

You think that will stop them trying? :D

Ovens, solder, chips ... what could possibly go wrong? It's like a chemistry set for grown-ups, this place.

mimarob (OP)
Full Member
***
Offline Offline

Activity: 354
Merit: 103



View Profile
June 13, 2011, 11:17:56 AM
 #89

Maybe one could make an el-cheapo pcb since we have no use for all those bga pins.

If we manage to connect power, JTAG and a few I/O lines, that would suffice.

Perhaps one could make a two-layer card and just leave the unused pins unconnected?

romkyns
Newbie
*
Offline Offline

Activity: 19
Merit: 0


View Profile
June 13, 2011, 12:09:31 PM
 #90

Area Improvement: <80K LUTs for 80MH/s

I managed to fit one SHA256 round, one hash per clock, into about 30k LUTs + 13k registers on a Cyclone, although I never validated this design because my FPGA only has 17k LUTs. So, if I didn't mess up (which I can't really tell...) this would mean 60k LUTs + some interfacing. I verified the core idea behind this in a non-FPGA simulation, and then implemented the idea in Verilog.
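
For context on why the estimate doubles from 30k to 60k LUTs: Bitcoin's proof of work applies SHA-256 twice to the 80-byte block header, so a fully pipelined miner needs two SHA-256 pipelines back to back, plus the "less than target" compare. A minimal software sketch of the same computation follows. The all-zero header prefix and the very easy target are made up for illustration; a real header and target look nothing like this.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as Bitcoin does for block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def find_nonce(header_prefix: bytes, target: int, max_nonce: int = 1_000_000):
    """Try nonces until the double hash, read as a little-endian integer, is below target."""
    for nonce in range(max_nonce):
        header = header_prefix + struct.pack("<I", nonce)  # 32-bit little-endian nonce
        value = int.from_bytes(double_sha256(header), "little")
        if value < target:
            return nonce, value
    return None

# Hypothetical 76-byte prefix standing in for version, previous hash,
# merkle root, time and bits. Not a real block header.
PREFIX = bytes(76)
# Toy target: roughly 1 in 256 hashes qualifies (real targets are far smaller).
EASY_TARGET = 1 << 248

result = find_nonce(PREFIX, EASY_TARGET)
```

In hardware the loop body is what gets unrolled: each clock a new nonce enters the first pipeline, and the compare at the end flags any result below the target.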

Unfortunately the larger dev boards are a bit too expensive for my taste, so this project is on hold. If anyone is willing to loan one to a complete stranger, I'm all up for it :) We could meet first. I live in the East of England; PM me if you wish.
Basiley
Newbie
*
Offline Offline

Activity: 42
Merit: 0


View Profile
June 13, 2011, 02:12:55 PM
 #91

If someone designed an FPGA-based board built for mining rather than for FPGA development, i.e. not an "evaluation board" (so without the plenty of redundant features and without the ridiculous pricing), and published the design openly for a nominal BTC fee, that would be cool.
Ordering development-oriented boards/kits for Bitcoin mining isn't reasonable.
LeFBI
Member
**
Offline Offline

Activity: 98
Merit: 10



View Profile
June 15, 2011, 11:50:17 AM
 #92

Quote
Maybe one could make an el-cheapo pcb since we have no use for all those bga pins.

If we manage to connect powers, jtag and a few i/o lines that would suffice.
Quote
if someone design FPGA-chip-based board, designed for mining, not FPGA-related software development, ie, not "evalution board"[without plenty of redundant features and w/o ridiculous pricing

A stripped-to-the-bone circuit for an FPGA board can look like this:

source: http://www.mikrocontroller.net/articles/Low_Cost_FPGA_Konfiguration (all german tho)

Long story short:
You can program the FPGA/Tiny12/EEPROM directly via ISP/JTAG. During development you configure the FPGA directly via JTAG from your PC.
When you have finished development you can write the .bin file to the EEPROM, and the Tiny12 will take care of programming the FPGA when no PC is connected.

This works really well with the Spartan-3, and you don't need to invest in an expensive development board for this purpose. Of course you will additionally need an Ethernet core and/or communication lanes between FPGAs if you want to gang them together, etc. As already said, the above circuit is just a cheap basic circuit for an FPGA board/dev board that doesn't have non-volatile memory.
Basiley
Newbie
*
Offline Offline

Activity: 42
Merit: 0


View Profile
June 15, 2011, 11:58:52 AM
 #93

And that's the main reason to put more than one FPGA on a board, I guess? I mean in real-world applications.
kjj
Legendary
*
Offline Offline

Activity: 1302
Merit: 1024



View Profile
June 15, 2011, 06:10:52 PM
 #94

You may run into thermal issues if you leave a bunch of BGA balls unconnected.  The chip designers typically assume that the PCB is going to be sinking most of the heat load.

Then again, with the complexity of SHA256, gate propagation problems will probably force us to run the chips slowly enough that heat won't be the limiting factor.

romkyns
Newbie
*
Offline Offline

Activity: 19
Merit: 0


View Profile
June 15, 2011, 06:36:44 PM
 #95

Quote
You may run into thermal issues if you leave a bunch of BGA balls unconnected.  The chip designers typically assume that the PCB is going to be sinking most of the heat load.

I would have thought that if you are going to attach *any* BGA balls then it is far easier to attach them all, than to leave some unconnected. Unconnected pads on the PCB won't make any difference to the PCB price. While I haven't ever hand-soldered BGAs, having all pads is supposed to make it easier, rather than harder. For example, by pulling the part into proper alignment uniformly as the solder melts and wets the pads.
kjj
Legendary
*
Offline Offline

Activity: 1302
Merit: 1024



View Profile
June 15, 2011, 07:19:39 PM
 #96

Yup.  Someone had suggested doing a minimal connection to avoid having to deal with 4 layer PCBs.  It might work, but there are a number of potential problems.

Soldering a BGA, PLCC or QFP and watching it pull itself into perfect alignment is one of the coolest things a guy can do.  Totally makes you feel like a wizard, commanding the universe with seemingly nothing but your willpower.  On the other hand, when it doesn't work right it'll make you want to murder kittens.

dinox
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
June 15, 2011, 07:23:35 PM
 #97

QFP is possible to solder by hand but BGA is not. You will need a special tool and some experience to solder BGA, or pay someone to do it for you.

blockchain.info/fb/1dinox - 1Dinox3mFw8yykpAZXFGEKeH4VX1Mzbcxe
Active trader on #bitcoin-otc - See here - Proof that my nick is dinox here
fpgaminer
Hero Member
*****
Offline Offline

Activity: 560
Merit: 517



View Profile WWW
June 16, 2011, 11:28:31 PM
 #98

Quote
QFP is possible to solder by hand but BGA is not. You will need a special tool and some experience to solder BGA, or pay someone to do it for you.
People have soldered BGA with blow dryers before :P Not that that is the best idea, but just sayin'.

Quote
Then again, with the complexity of SHA256, gate propagation problems will probably force us to run the chips slowly enough that heat won't be the limiting factor.
It's not a huge problem, but it's there. The latest design gets 100MH/s (@100MHz) and requires either a lot of air-flow or a heatsink.
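
The 100MH/s @ 100MHz figure is what you get from a fully pipelined core that finishes one hash every clock. As a rough sketch, the same formula also reproduces the iterative estimate from the opening post (3 cores, 300 MHz, about 80 cycles per hash). The function name and parameters below are mine, chosen for illustration only.

```python
def hash_rate_mhs(clock_mhz: float, cycles_per_hash: float, cores: int = 1) -> float:
    """Throughput in MH/s: each core finishes one hash every cycles_per_hash clocks."""
    return cores * clock_mhz / cycles_per_hash

# Fully pipelined design quoted above: one hash per clock at 100 MHz.
pipelined = hash_rate_mhs(clock_mhz=100, cycles_per_hash=1)

# Iterative estimate from the opening post: 3 cores, 300 MHz, ~80 cycles per hash.
iterative = hash_rate_mhs(clock_mhz=300, cycles_per_hash=80, cores=3)

print(f"pipelined: {pipelined:.2f} MH/s, iterative: {iterative:.2f} MH/s")
```

So a pipelined design at a third of the clock still comes out roughly nine times faster than three iterative cores, at the cost of far more logic.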

Quote
you can program the fpga/tiny12/eeprom directly via ISP/JTAG. during development you configure the fpga directly via JTAG from your PC
I only looked at the circuit image you posted, not the rest of it, so excuse me if I missed something obvious, but why is there an ATtiny on there? FPGAs can program themselves from a flash chip unless I'm mistaken.

BubbleBoy
Sr. Member
****
Offline Offline

Activity: 504
Merit: 250



View Profile
June 17, 2011, 07:59:06 AM
 #99

BGAs are definitely solderable with hot blowers - I've done it a few times with maybe 80% success rate. The hard part is creating the balls on a new chip, you need a special solder paste and a thin mesh that allows only a certain amount of paste on each pad (reballing kit). When heated, the paste turns into solder balls. If the balls are readily formed, it's all fun and games.

Anyway, I'd outsource such a job to shops specialized in prototypes or small series, maybe somewhere in China. It will most likely cost less than the whole hardware and man hours otherwise required.

genewitch
Newbie
*
Offline Offline

Activity: 28
Merit: 0


View Profile
June 17, 2011, 03:40:24 PM
 #100

single chip 100Mhash/s?

What about evolving the hardware to do the hashing rather than writing it as straight VHDL?

I had a good idea about using hadoop clusters to run the fitness tests for the evolutionary algorithm testing.

For those who have no clue what i am talking about, read the article about the professors that got an fpga to recognize the difference between two tones with way less than 100 gates and no CLK.

http://fsweb.olin.edu/~mchang/research/documents/seminar/evolve2k2/evolve.ppt
http://www.cogs.susx.ac.uk/users/adrianth/ade.html

I always had a thought that evolving the circuits would be a way to find really fast ways of "cracking" various hashing algorithms, as well as making really tiny encoders and decoders for various projects.

Anyhow, i enjoyed this thread.
ttul
Member
**
Offline Offline

Activity: 70
Merit: 10


View Profile
June 17, 2011, 04:43:20 PM
 #101

Quote
single chip 100Mhash/s?

What about evolving the hardware to do the hashing rather than writing it as straight VHDL?

I had a good idea about using hadoop clusters to run the fitness tests for the evolutionary algorithm testing.

For those who have no clue what i am talking about, read the article about the professors that got an fpga to recognize the difference between two tones with way less than 100 gates and no CLK.

http://fsweb.olin.edu/~mchang/research/documents/seminar/evolve2k2/evolve.ppt
http://www.cogs.susx.ac.uk/users/adrianth/ade.html

I always had a thought that evolving the circuits would be a way to find really fast ways of "cracking" various hashing algorithms, as well as making really tiny encoders and decoders for various projects.

Anyhow, i enjoyed this thread.

You're on the right track - the synthesis of ASIC or FPGA circuitry from Verilog or VHDL code is very very good these days, but there are ways to make things better particularly when you are building an ASIC.
jon_smark
Member
**
Offline Offline

Activity: 90
Merit: 10


View Profile
June 17, 2011, 06:15:57 PM
 #102

Quote
What about evolving the hardware to do the hashing rather than writing it as straight VHDL?

I had a good idea about using hadoop clusters to run the fitness tests for the evolutionary algorithm testing.

Could you expand more on this idea?  Not all applications are suited for evolutionary approaches, and my guess is that hashing algorithms are definitely not one of them.

Quote
For those who have no clue what i am talking about, read the article about the professors that got an fpga to recognize the difference between two tones with way less than 100 gates and no CLK.

http://fsweb.olin.edu/~mchang/research/documents/seminar/evolve2k2/evolve.ppt
http://www.cogs.susx.ac.uk/users/adrianth/ade.html

That kind of application is well suited to evolutionary approaches.

Quote
I always had a thought that evolving the circuits would be a way to find really fast ways of "cracking" various hashing algorithms, as well as making really tiny encoders and decoders for various projects.

There's not supposed to be any smooth gradients in a cryptographically secure hash, so I don't see how any evolution-based approach could work for cracking them.  What exactly do you have in mind?
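
The "no smooth gradient" point can be checked in software with a small avalanche test: flip one input bit and count how many of the 256 output bits change. This is only an illustration, using Python's standard hashlib and an arbitrary sample message of my choosing. For a good hash the average should land near 128, i.e. about half the bits, so a candidate that is "one bit closer" in the input tells you nothing about the output.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def flip_bit(data: bytes, index: int) -> bytes:
    """Return a copy of data with a single bit toggled."""
    out = bytearray(data)
    out[index // 8] ^= 1 << (index % 8)
    return bytes(out)

message = b"An estimate of fpga performance"  # arbitrary sample input
base = hashlib.sha256(message).digest()

# Flip each input bit in turn and measure how many of the 256 output bits change.
distances = [
    bit_difference(base, hashlib.sha256(flip_bit(message, i)).digest())
    for i in range(len(message) * 8)
]
average = sum(distances) / len(distances)
print(f"average changed output bits: {average:.1f} of 256")
```

A fitness function for an evolutionary search needs neighbouring candidates to score similarly, and this is exactly the property the hash is built to destroy.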
film2240
Legendary
*
Offline Offline

Activity: 1022
Merit: 1000


Freelance videographer


View Profile WWW
July 21, 2011, 03:21:59 PM
 #103

I'm very interested in high-performance, low-cost and ultra-low-energy hardware for mining.

The FPGA already meets my needs for low power use and high performance; now all I need is for it to be low cost and easy to get my hands on. Hope it's in the UK soon.

The setup has to be easy as well.

If someone can create something that can rival my Radeon HD 6950 (400 MHash/s), then I can finally put those noisy GPU and CPU things to rest.

Even if it doesn't rival my card, I can use several FPGAs together to get that performance anyway.

I worked out it's less than 13 watts with 6 units (each producing 70 MHash/s) vs. 210 watts for my card.
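
Spelling out that comparison as a quick sketch, taking the figures exactly as quoted in this post (the 13 W total is the poster's own estimate, not a measured value):

```python
def mhash_per_watt(mhash_per_s: float, watts: float) -> float:
    """Mining efficiency in MH/s per watt."""
    return mhash_per_s / watts

# Figures as quoted in the post; none of them are measurements.
fpga_rate = 6 * 70                          # six units at 70 MH/s each
fpga_eff = mhash_per_watt(fpga_rate, 13)    # upper bound of 13 W for all six
gpu_eff = mhash_per_watt(400, 210)          # Radeon HD 6950 as quoted
ratio = fpga_eff / gpu_eff

print(f"FPGA: {fpga_rate} MH/s, {fpga_eff:.1f} MH/s per W")
print(f"GPU:  400 MH/s, {gpu_eff:.1f} MH/s per W")
print(f"ratio: {ratio:.1f}x")
```

If those numbers hold, six units slightly exceed the card's hash rate at roughly 17 times the efficiency.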


[This signature is available for rent.BTC/ETH/LTC or £50 equivalent a month]
Newton
Newbie
*
Offline Offline

Activity: 56
Merit: 0


View Profile
July 21, 2011, 03:42:28 PM
 #104

Quote
I'm very interested in high performance,low cost and ultra low energy hardware for mining.

lulz