Bitcoin Forum
Author Topic: FPGA mining for fun and profit  (Read 67123 times)
cypherf0x (OP)
Newbie
*
Offline Offline

Activity: 28
Merit: 1



View Profile
May 17, 2011, 08:40:38 PM
 #41

Quote
I would like to buy an FPGA and program it with LabVIEW. Is this a good idea?

You can, but LabVIEW plus the LabVIEW FPGA module plus a supported FPGA dev board comes to about $5,700.
fpgaminer
Hero Member
*****
Offline Offline

Activity: 560
Merit: 517



View Profile WWW
May 17, 2011, 08:44:01 PM
 #42

Hello cypherf0x! Good to see a fellow FPGA developer on the forums.

I'm a bit curious about your numbers. Working with an Altera Cyclone3 and Cyclone4, which are the brothers of the Xilinx Spartan series, I've only ever seen 1MHash/s per 1K LUTs. The chips you describe have the equivalent of 75K LUTs (Xilinx has a funky design), which means no more than 75MHash/s. If you are indeed getting >200MHash/s out of a single Spartan 75K, that would be quite wonderful!
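The rule of thumb in this post can be written out as a quick estimate. This is only a sketch: the helper name and constant are made up for illustration, and the 1 MHash/s per 1K LUTs figure is the poster's own observation on Cyclone parts, not a datasheet number.

```python
# Rule of thumb from the post: roughly 1 MHash/s per 1K LUTs (observed, not guaranteed).
MHASH_PER_KLUT = 1.0

def estimate_mhash(luts, mhash_per_klut=MHASH_PER_KLUT):
    """Hypothetical helper: estimated hash rate in MHash/s for a given LUT count."""
    return luts / 1000 * mhash_per_klut

# 75K LUT-equivalents, as in the post
print(estimate_mhash(75_000), "MHash/s")
```

Under that rule, a claimed rate above 200 MHash/s would need either far more LUTs or a much better rate per LUT, which is the point being questioned.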

Anyway, thank you for posting your findings and, even if you don't post anything more about your research, it's always good to have a new, knowledgeable member on the forums.

cypherf0x (OP)
May 17, 2011, 08:49:02 PM
Last edit: May 17, 2011, 09:14:39 PM by cypherf0x
 #43

Quote
Hello cypherf0x! Good to see a fellow FPGA developer on the forums.

I'm a bit curious about your numbers. Working with an Altera Cyclone3 and Cyclone4, which are the brothers of the Xilinx Spartan series, I've only ever seen 1MHash/s per 1K LUTs. The chips you describe have the equivalent of 75K LUTs (Xilinx has a funky design), which means no more than 75MHash/s. If you are indeed getting >200MHash/s out of a single Spartan 75K, that would be quite wonderful!

Anyway, thank you for posting your findings and, even if you don't post anything more about your research, it's always good to have a new, knowledgeable member on the forums.

The chips on the boards have about 100k LUTs: 23k slices with 4 LUTs per slice.
Try adding parallel pipelines.
Chris Acheson
Sr. Member
****
Offline Offline

Activity: 266
Merit: 251


View Profile
May 17, 2011, 09:11:16 PM
 #44

Quote
So someone should release the code and maybe get a bounty?  You can play with maybe all day.  In the end I already have a working prototype and now someone else with more FPGA experience than myself to polish the code.  He develops chips for a living, I develop hardware boards and embedded software so that seems like a pretty reasonable combo for getting something done.

If you're serious about this, you should arrange to have the contributions put in escrow until you actually release something.  Just putting up a black-hole donation address means that no one knows how much has been contributed, and if the total only gets halfway there the whole thing's a loss.

Anyway, the bounty isn't aimed at you specifically, so I'm going to split it off into its own thread.
fpgaminer
May 17, 2011, 09:20:01 PM
 #45

Quote
The chips on the boards have about 100k LUTs  23k slices with 4 LUTs/slice
For which platform? The PICO EX-300 platform? You're probably talking about a platform with Spartan-6s on it, because those do indeed carry four 4-LUTs per slice, with the LX150 totaling ~150k 4-LUTs.


Quote
Try implementing parallel hashing pipelines if your FPGA has the gates for it.
I develop on a C120, and I have different designs, some of which are indeed pipelined. The pipelined designs get 1MHash/s per 1K LUTs.

Perhaps the Xilinx devices can pack far more bang-per-LUT than Altera's for SHA-256 designs? I shall certainly investigate. Again, thank you for sharing your numbers.

kebumaha
Newbie
*
Offline Offline

Activity: 14
Merit: 0


View Profile
May 17, 2011, 09:28:06 PM
 #46

I got only one question. Where the heck can you even buy these things like "PICO EX-300?" All I find are specs and specs. I guess you need to study computer engineering for 10 years just to see one of those?
cypherf0x (OP)
May 17, 2011, 09:29:27 PM
 #47

For anyone skeptical about FPGA abilities: the video below is for MD5 hashing, but it's the same principle.

http://www.youtube.com/watch?v=zEwWvVP_RU0
cypherf0x (OP)
May 17, 2011, 09:31:05 PM
 #48

Quote
I got only one question. Where the heck can you even buy these things like "PICO EX-300?" All I find are specs and specs. I guess you need to study computer engineering for 10 years just to see one of those?

They're expensive enough you have to call the sales office to order them.
keybaud
Full Member
***
Offline Offline

Activity: 120
Merit: 100


View Profile
May 17, 2011, 09:39:32 PM
 #49

In your haste to create faster miners, be careful that you don't destroy that which you seek.

I just realised that Bitcoin's future depends on using an algorithm that is not possible to put in hardware like this. If it is, there will probably only be one mining company left after a while because of the economy of scale.

My understanding is that there is a bigger problem, in that if one person/organisation controls over 50% of the Bitcoin network, then it is effectively compromised and bitcoins will no longer be a viable e-currency.

See this thread: http://forum.bitcoin.org/index.php?topic=8653.0

https://en.bitcoin.it/wiki/Weaknesses#Attacker_has_a_lot_of_computing_power

Attacker has a lot of computing power

An attacker that controls more than 50% of the network's computing power can, for the time that he is in control, exclude and modify the ordering of transactions. This allows him to:
- Reverse transactions that he sends while he's in control
- Prevent some or all transactions from gaining any confirmations
- Prevent some or all other generators from getting any generations

The attacker can't:
- Reverse other people's transactions
- Prevent transactions from being sent at all (they'll show as 0/unconfirmed)
- Change the number of coins generated per block
- Create coins out of thin air
- Send coins that never belonged to him

It's much more difficult to change historical blocks, and it becomes exponentially more difficult the further back you go. As above, changing historical blocks only allows you to exclude and change the ordering of transactions. It's impossible to change blocks created before the last checkpoint.

Since this attack doesn't permit all that much power over the network, it is expected that no one will attempt it. A profit-seeking person will always gain more by just following the rules, and even someone trying to destroy the system will probably find other attacks more attractive. However, if this attack is successfully executed, it will be difficult or impossible to "untangle" the mess created -- any changes the attacker makes might become permanent.
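
The "exponentially more difficult the further back you go" part can be made concrete for an attacker with less than half of the hash power. The sketch below follows the catch-up calculation given in the Bitcoin whitepaper (section 11); the function name is illustrative, q is the attacker's share of hash power and z is how many blocks behind he starts.

```python
import math

def attacker_success_probability(q, z):
    """Chance that an attacker with hash-power share q ever catches up from z blocks behind."""
    p = 1.0 - q                 # honest share of hash power
    if q >= p:
        return 1.0              # with half or more, catching up is eventually certain
    lam = z * (q / p)           # expected attacker progress while honest nodes mine z blocks
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam**k / math.factorial(k)
        total -= poisson * (1.0 - (q / p) ** (z - k))
    return total

# An attacker with 10% of the network, trying to rewrite 1, 2 or 6 blocks
for z in (1, 2, 6):
    print(z, attacker_success_probability(0.10, z))
```

The probability shrinks quickly as z grows for a minority attacker, and jumps to 1 once q reaches 0.5, which is why the 50% threshold matters so much in this discussion.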
cypherf0x (OP)
May 17, 2011, 10:05:07 PM
 #50

If anyone is looking for an inexpensive FPGA to experiment with, try the SPARTAN-6 LX9 MICROBOARD.  I've gotten a lot of messages asking about it; these boards connect over USB and cost less than $100.
fpgaminer
May 17, 2011, 11:58:40 PM
 #51

Without having an actual Spartan-6 LX150 board on hand, I ran my design through ISE quickly. This showed that the LUT consumption is indeed similar to Altera's, so there does not appear to be any area improvement from using a Xilinx device over Altera.

What I do not know, however, is how fast Spartan-6 LUTs operate compared to Altera's, for apples-to-apples speed grades. If they run faster, it would indeed be possible to get more bang for your LUT. I get 80MHz in my design, resulting in 80MHash/s burning 80K LUTs. The Cyclone4-150 or Spartan6 LX150 may fit two full hashing pipelines (128 SHA-256 rounds per full hashing pipeline). This would double their performance, with the Cyclone4-150 achieving 160MHash/s. If the Spartan6 is faster, it could possibly achieve >200MHash/s as you've reported.
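
The numbers in this paragraph follow from one assumption: a fully unrolled 128-round pipeline finishes one Bitcoin hash per clock once it is full. A minimal sketch of that arithmetic, with illustrative function names:

```python
def pipelined_mhash(clock_mhz, full_pipelines):
    """Hash rate in MHash/s, assuming one finished hash per clock per full pipeline."""
    return clock_mhz * full_pipelines

def clock_needed_mhz(target_mhash, full_pipelines):
    """Clock in MHz needed to reach a target rate with a given number of full pipelines."""
    return target_mhash / full_pipelines

print(pipelined_mhash(80, 1))    # one pipeline at 80MHz
print(pipelined_mhash(80, 2))    # two pipelines at the same clock
print(clock_needed_mhz(200, 2))  # clock required for 200MHash/s from two pipelines
```

So with two full pipelines, anything above 200MHash/s would need the design to close timing at more than 100MHz.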

You could get faster speed grades, but those are typically a bit more expensive. I haven't calculated whether a fast speed grade would balance out the cost for its improved hashing speeds.

ArtForz
Sr. Member
****
Offline Offline

Activity: 406
Merit: 257


View Profile
May 18, 2011, 12:18:13 AM
 #52

Quote
A single pipeline is now doing about 133MH/s with the chip around 210MH/s total
Trying to make any sense of this.
a) You have a 120+ stage unrolled pipelined engine at 133MHz. You fit 1.58 of em? what the hell is 0.58 of a engine?
b) You have a single registered round running at 133MHz. one bitcoinhash = double-sha256 takes 128 or so clocks. you fit 200 of those - ~ 208Mh/s.
let's assume B
you need to store at least a..h and W 0..15, that's 24*32 = 768 FFs per engine.
times 200 engines. thats 153600 FFs
a S3-5000 has 66560 FFs... nope
a S6 LX100 has 126576 FFs... still nope
a S6 LX150 has 184304 FFs... 83% utilization just for the storage FFs. far edge of plausible

For adder utilization it gets hilarious, you need at least 8 32-bit adders per round.
Times 200 single-round engines thats 1600 32bit adders...
half of a S6s slices have carry logic, each of those can do 4 bits of a adder, that's a max of 988 32 bit adders on a S6 LX100, 1439 on a LX150... we need 1600... ?!?
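
For anyone who wants to check the budget for case (b), here is the same arithmetic as a short sketch. The FF totals are the ones quoted in this post; the 8 flip-flops per slice used to back out slice counts is an assumption about Spartan-6, and the "half the slices have carry logic, 4 adder bits each" rule is taken from the post as stated.

```python
WORD_BITS = 32
ENGINES = 200          # single-round engines, case (b) in the post

# Storage: a..h (8 words) plus W[0..15] (16 words) per engine
ffs_per_engine = (8 + 16) * WORD_BITS
ffs_needed = ffs_per_engine * ENGINES

# Flip-flop totals as quoted in the post
FF_CAPACITY = {"S3-5000": 66_560, "S6 LX100": 126_576, "S6 LX150": 184_304}

FFS_PER_SLICE = 8      # assumption: Spartan-6 slices hold 8 FFs each

def max_32bit_adders(total_ffs):
    """Upper bound on 32-bit adders for a Spartan-6 part, per the post's carry-logic rule."""
    slices = total_ffs // FFS_PER_SLICE
    carry_slices = slices // 2              # half of the slices have carry logic
    return carry_slices * 4 // WORD_BITS    # 4 adder bits per carry slice

adders_needed = 8 * ENGINES                 # at least 8 adders per round

print("FFs needed:", ffs_needed)
for part, ffs in FF_CAPACITY.items():
    print(part, f"FF utilization {ffs_needed / ffs:.0%}")
for part in ("S6 LX100", "S6 LX150"):
    print(part, "max adders", max_32bit_adders(FF_CAPACITY[part]), "needed", adders_needed)
```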

I have the sneaking suspicion someone didn't realize one bitcoinhash = 2 sha256 blocks...

bitcoin: 1Fb77Xq5ePFER8GtKRn2KDbDTVpJKfKmpz
i0coin: jNdvyvd6v6gV3kVJLD7HsB5ZwHyHwAkfdw
cypherf0x (OP)
May 18, 2011, 12:47:44 AM
 #53

Quote
Without having an actual Spartan-6 LX150 board on hand, I ran my design through ISE quickly. This showed that the LUT consumption is indeed similar to Altera's, so there does not appear to be any area improvement from using a Xilinx device over Altera.

What I do not know, however, is how fast Spartan-6 LUTs operate compared to Altera's, for apples-to-apples speed grades. If they run faster, it would indeed be possible to get more bang for your LUT. I get 80MHz in my design, resulting in 80MHash/s burning 80K LUTs. The Cyclone4-150 or Spartan6 LX150 may fit two full hashing pipelines (128 SHA-256 rounds per full hashing pipeline). This would double their performance, with the Cyclone4-150 achieving 160MHash/s. If the Spartan6 is faster, it could possibly achieve >200MHash/s as you've reported.

You could get faster speed grades, but those are typically a bit more expensive. I haven't calculated whether a fast speed grade would balance out the cost for its improved hashing speeds.

It's actually about 90MH/s per pipeline over time. The speed average jumps around a bit at first but settles over a longer run.
ArtForz
May 18, 2011, 01:02:54 AM
 #54

Okay, so now you're fitting 2 pipelined engines on a LX150.
need 120 rounds, thanks to cheating with W updates etc you can get it down to ~6 32 bit adders per round avg, times 120 ... 720 or so 32 bit adders per engine, 1440 adders.
So *only* a bit over 100% slice utilization of a LX150, just for the adders. Yeah, sure.
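
The adder count for the two-pipeline case works out as follows. This is a sketch using only figures from the thread: 120 rounds and about 6 adders per round from this post, and the 1439-adder ceiling for the LX150 from the earlier post.

```python
ROUNDS_PER_ENGINE = 120     # unrolled rounds per engine, from the post
ADDERS_PER_ROUND = 6        # average 32-bit adders per round after the W-update tricks
ENGINES = 2
LX150_MAX_ADDERS = 1439     # ceiling quoted earlier in the thread

adders_needed = ROUNDS_PER_ENGINE * ADDERS_PER_ROUND * ENGINES
utilization = adders_needed / LX150_MAX_ADDERS

print(adders_needed, "adders needed,", f"{utilization:.1%} of the carry-capable slices")
```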

cypherf0x (OP)
May 18, 2011, 01:26:43 AM
 #55

Quote
A single pipeline is now doing about 133MH/s with the chip around 210MH/s total
Trying to make any sense of this.
a) You have a 120+ stage unrolled pipelined engine at 133MHz. You fit 1.58 of em? what the hell is 0.58 of a engine?
b) You have a single registered round running at 133MHz. one bitcoinhash = double-sha256 takes 128 or so clocks. you fit 200 of those - ~ 208Mh/s.
let's assume B
you need to store at least a..h and W 0..15, that's 24*32 = 768 FFs per engine.
times 200 engines. thats 153600 FFs
a S3-5000 has 66560 FFs... nope
a S6 LX100 has 126576 FFs... still nope
a S6 LX150 has 184304 FFs... 83% utilization just for the storage FFs. far edge of plausible

For adder utilization it gets hilarious, you need at least 8 32-bit adders per round.
Times 200 single-round engines thats 1600 32bit adders...
half of a S6s slices have carry logic, each of those can do 4 bits of a adder, that's a max of 988 32 bit adders on a S6 LX100, 1439 on a LX150... we need 1600... ?!?

I have the sneaking suspicion someone didn't realize one bitcoinhash = 2 sha256 blocks...

I don't know how you got 133MHz out of MH/s.  The 'about' and 'around' mean the values are not exact.  The speed average was a bit high initially.  You're also making design assumptions.  There are highly optimized commercial hashing cores available for FPGAs too.
ArtForz
May 18, 2011, 01:40:35 AM
 #56

Okay, so you fit "around" 1.5 engines on a chip. is it me or doesn't that make any sense at all?
Edit:
Yes, I make assumptions about sha256. it's sha256. the round function including W update needs at least 8 32 bit adders. no amount of "optimizing" changes that.
And those "highly optimized" commercial cores? barely 120MHz on a S6, 65+ clocks/block, and you can maybe fit 70 on a LX150. 65Mh/s wooo...
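
The 65Mh/s figure for the commercial cores can be reproduced with a small helper. It is a sketch that assumes each core works through one SHA-256 block at a time and that one Bitcoin hash costs two blocks, as stated earlier in the thread; the function name is illustrative.

```python
def iterative_mhash(clock_mhz, clocks_per_block, cores, blocks_per_hash=2):
    """Hash rate in MHash/s for iterative cores that each process one block at a time."""
    return clock_mhz / (clocks_per_block * blocks_per_hash) * cores

# Commercial cores as described in the post: 120MHz, 65 clocks per block, 70 cores
print(round(iterative_mhash(120, 65, 70), 1))

# The earlier 200-engine guess: 133MHz, 64 clocks per block (128 clocks per hash)
print(round(iterative_mhash(133, 64, 200), 1))
```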

cypherf0x (OP)
May 18, 2011, 01:50:07 AM
 #57

Quote
Okay, so you fit "around" 1.5 engines on a chip. is it me or doesn't that make any sense at all?

I never said I fit 1.5 engines on a chip.  I apologize if some of the numbers implied that, since they were ballpark estimates based on short runs.

You're free to doubt, it's your time spent.

fpgaminer
May 18, 2011, 02:08:39 AM
 #58

Quote
Okay, so you fit "around" 1.5 engines on a chip. is it me or doesn't that make any sense at all?
You can actually fit 1.5 engines on a chip, assuming an engine is a full 128 rounds of SHA-256 (that's 128 because you need to do it twice to get the final hash that Bitcoin expects). One full engine, at 128 rounds, and a second half-engine, at 64 rounds, with a mux in front to switch between processing new data and finishing old data.
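
To see why a 64-round half-engine counts as half an engine, here is a toy cycle-level model of the mux scheme. It is only a sketch under simple assumptions: one pipeline stage per round, work that leaves after its first 64-round pass is fed straight back in for the second pass, and new work enters only when nothing is waiting to recirculate. No hashing is done, only bookkeeping.

```python
from collections import deque

def simulate_half_engine(cycles, depth=64):
    """Count finished double-SHA-256 hashes from a 64-stage pipeline with a feedback mux."""
    pipeline = deque([None] * depth)   # each slot holds (nonce, pass_number) or None
    completed = 0
    next_nonce = 0
    for _ in range(cycles):
        leaving = pipeline.popleft()
        if leaving is not None and leaving[1] == 1:
            # First pass done: the mux selects the feedback path for the second pass
            entering = (leaving[0], 2)
        else:
            if leaving is not None:
                completed += 1         # second pass done: one full Bitcoin hash
            entering = (next_nonce, 1) # the mux selects new work
            next_nonce += 1
        pipeline.append(entering)
    return completed

cycles = 64 + 128 * 10                 # fill time plus ten full 128-cycle periods
print(simulate_half_engine(cycles), "hashes in", cycles, "cycles")
```

After the initial fill, the model finishes 64 hashes every 128 clocks, i.e. half a hash per clock, against one per clock for a full 128-round engine. At the 80MHz mentioned earlier that would be roughly 80 + 40 = 120MHash/s for the pair, if the assumptions hold.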

I've considered doing that for my C120, which would fit one full engine in 80K LEs, and the half-engine in 40K (if I'm lucky). Or just get my hands on a C150 and try desperately to cram two full engines on it.

mooreaa
Newbie
*
Offline Offline

Activity: 5
Merit: 0


View Profile
May 18, 2011, 02:25:26 AM
 #59

Hey cypherf0x,

I just got into bitcoin and ran across your post here. I have my own startup and we run a small design/assembly service as part of our business. We have the capability to assemble FPBGA/LGA parts on PCBs and I would be really interested in working with you on a low cost Spartan-6 FPGA board. I know I would be willing to put up some of my own cash to fund some initial board revisions, and with a little help from the community we might be able to produce a batch of these at a really compelling price.

Interested?

Aaron
cypherf0x (OP)
May 18, 2011, 04:21:39 AM
 #60

Quote
Hey cypherf0x,

I just got into bitcoin and ran across your post here. I have my own startup and we run a small design/assembly service as part of our business. We have the capability to assemble FPBGA/LGA parts on PCBs and I would be really interested in working with you on a low cost Spartan-6 FPGA board. I know I would be willing to put up some of my own cash to fund some initial board revisions, and with a little help from the community we might be able to produce a batch of these at a really compelling price.

Interested?

Aaron


Yeah, send me a PM with your email.