cypherf0x (OP)
Newbie
Offline
Activity: 28
Merit: 1
|
|
May 17, 2011, 08:40:38 PM |
|
I would like to buy an FPGA and program it with LabVIEW. Is this a good idea?
You can, but LabVIEW + the LabVIEW FPGA module + a supported FPGA dev board comes to about $5700.
|
|
|
|
fpgaminer
|
|
May 17, 2011, 08:44:01 PM |
|
Hello cypherf0x! Good to see a fellow FPGA developer on the forums. I'm a bit curious about your numbers. Working with an Altera Cyclone3 and Cyclone4, which are the brothers of the Xilinx Spartan series, I've only ever seen 1MHash/s per 1K LUTs. The chips you describe have the equivalent of 75K LUTs (Xilinx has a funky design), which means no more than 75MHash/s. If you are indeed getting >200MHash/s out of a single Spartan 75K, that would be quite wonderful! Anyway, thank you for posting your findings and, even if you don't post anything more about your research, it's always good to have a new, knowledgeable member on the forums.
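The rule of thumb in this post can be written out as a quick estimate. This is only a sketch: the function names are made up for illustration, and the default of 1 MHash/s per 1K LUTs is the figure observed above on Cyclone parts, not a measured value for any Spartan device.

```python
def estimate_mhash(luts, mhash_per_klut=1.0):
    """Rough hash rate in MHash/s from a LUT count and a per-1K-LUT rule of thumb."""
    return luts / 1000.0 * mhash_per_klut


def required_mhash_per_klut(target_mhash, luts):
    """Density (MHash/s per 1K LUTs) a design would need to reach a target rate."""
    return target_mhash / (luts / 1000.0)


# 75K LUT-equivalents at the observed 1 MHash/s per 1K LUTs
print(estimate_mhash(75_000))
# Density needed to reach the reported 200 MHash/s on the same LUT budget
print(round(required_mhash_per_klut(200, 75_000), 2))
```

Under these assumptions, 200 MHash/s on 75K LUTs would need a bit more than 2.5 times the density seen on the Cyclone designs.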
|
|
|
|
cypherf0x (OP)
|
|
May 17, 2011, 08:49:02 PM Last edit: May 17, 2011, 09:14:39 PM by cypherf0x |
|
Hello cypherf0x! Good to see a fellow FPGA developer on the forums. I'm a bit curious about your numbers. Working with an Altera Cyclone3 and Cyclone4, which are the brothers of the Xilinx Spartan series, I've only ever seen 1MHash/s per 1K LUTs. The chips you describe have the equivalent of 75K LUTs (Xilinx has a funky design), which means no more than 75MHash/s. If you are indeed getting >200MHash/s out of a single Spartan 75K, that would be quite wonderful! Anyway, thank you for posting your findings and, even if you don't post anything more about your research, it's always good to have a new, knowledgeable member on the forums.
The chips on the boards have about 100k LUTs: 23k slices with 4 LUTs/slice.
Try adding parallel pipelines.
|
|
|
|
Chris Acheson
|
|
May 17, 2011, 09:11:16 PM |
|
So someone should release the code and maybe get a bounty? You can play with maybe all day. In the end I already have a working prototype and now have someone else with more FPGA experience than myself to polish the code. He develops chips for a living, I develop hardware boards and embedded software, so that seems like a pretty reasonable combo for getting something done.
If you're serious about this, you should arrange to have the contributions put in escrow until you actually release something. Just putting up a black-hole donation address means that no one knows how much has been contributed, and if the total only gets halfway there the whole thing's a loss. Anyway, the bounty isn't aimed at you specifically, so I'm going to split it off into its own thread.
|
|
|
|
fpgaminer
|
|
May 17, 2011, 09:20:01 PM |
|
The chips on the boards have about 100k LUTs: 23k slices with 4 LUTs/slice.
For which platform? The PICO EX-300 platform? You're probably talking about a platform with Spartan-6's on it, because those do indeed carry four 4-LUTs per slice, with the LX150 totaling ~150k 4-LUTs.
Try implementing parallel hashing pipelines if your FPGA has the gates for it.
I develop on a C120, and I have different designs, some of which are indeed pipelined. The pipelined designs get 1MHash/s per 1K LUTs. Perhaps the Xilinx devices can pack far more bang-per-LUT than Altera's for SHA-256 designs? I shall certainly investigate. Again, thank you for sharing your numbers.
|
|
|
|
kebumaha
Newbie
Offline
Activity: 14
Merit: 0
|
|
May 17, 2011, 09:28:06 PM |
|
I got only one question. Where the heck can you even buy these things like "PICO EX-300?" All I find are specs and specs. I guess you need to study computer engineering for 10 years just to see one of those?
|
|
|
|
|
cypherf0x (OP)
|
|
May 17, 2011, 09:31:05 PM |
|
I got only one question. Where the heck can you even buy these things like "PICO EX-300?" All I find are specs and specs. I guess you need to study computer engineering for 10 years just to see one of those?
They're expensive enough you have to call the sales office to order them.
|
|
|
|
keybaud
|
|
May 17, 2011, 09:39:32 PM |
|
In your haste to create faster miners, be careful that you don't destroy that which you seek. I just realised that Bitcoin's future depends on using an algorithm that is not possible to put in hardware like this. If it is, there will probably only be one mining company left after a while because of the economy of scale.
My understanding is that there is a bigger problem, in that if one person/organisation controls over 50% of the Bitcoin network, then it is effectively compromised and bitcoins will no longer be a viable e-currency. See this thread: http://forum.bitcoin.org/index.php?topic=8653.0
https://en.bitcoin.it/wiki/Weaknesses#Attacker_has_a_lot_of_computing_power
Attacker has a lot of computing power
An attacker that controls more than 50% of the network's computing power can, for the time that he is in control, exclude and modify the ordering of transactions. This allows him to:
- Reverse transactions that he sends while he's in control
- Prevent some or all transactions from gaining any confirmations
- Prevent some or all other generators from getting any generations
The attacker can't:
- Reverse other people's transactions
- Prevent transactions from being sent at all (they'll show as 0/unconfirmed)
- Change the number of coins generated per block
- Create coins out of thin air
- Send coins that never belonged to him
It's much more difficult to change historical blocks, and it becomes exponentially more difficult the further back you go. As above, changing historical blocks only allows you to exclude and change the ordering of transactions. It's impossible to change blocks created before the last checkpoint.
Since this attack doesn't permit all that much power over the network, it is expected that no one will attempt it. A profit-seeking person will always gain more by just following the rules, and even someone trying to destroy the system will probably find other attacks more attractive. However, if this attack is successfully executed, it will be difficult or impossible to "untangle" the mess created -- any changes the attacker makes might become permanent.
|
|
|
|
cypherf0x (OP)
|
|
May 17, 2011, 10:05:07 PM |
|
If anyone is looking for an inexpensive FPGA to experiment with, try the SPARTAN-6 LX9 MICROBOARD. I've gotten a lot of messages asking about it. These boards connect over USB and cost less than $100.
|
|
|
|
fpgaminer
|
|
May 17, 2011, 11:58:40 PM |
|
Without having an actual Spartan-6 LX150 board on hand, I ran my design through ISE quickly. This showed that the LUT consumption is indeed similar to Altera's, so there does not appear to be any area improvement from using a Xilinx device over Altera.
What I do not know, however, is how fast Spartan-6 LUTs operate compared to Altera's, for apples-to-apples speed grades. If they run faster, it would indeed be possible to get more bang for your LUT. I get 80MHz in my design, resulting in 80MHash/s burning 80K LUTs. The Cyclone4-150 or Spartan6 LX150 may fit two full hashing pipelines (128 SHA-256 rounds per full hashing pipeline). This would double their performance, with the Cyclone4-150 achieving 160MHash/s. If the Spartan6 is faster, it could possibly achieve >200MHash/s as you've reported.
You could get faster speed grades, but those are typically a bit more expensive. I haven't calculated whether a fast speed grade would balance out the cost for its improved hashing speeds.
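The arithmetic in this post can be sketched in a few lines of Python. It assumes, as the 80MHz to 80MHash/s figure implies, that a fully unrolled 128-round pipeline finishes one hash per clock. The function names are made up for illustration, and the clock it computes is just what this model gives, not a measured Spartan-6 speed.

```python
def pipeline_mhash(clock_mhz, pipelines=1):
    """Hash rate in MHash/s, assuming one finished hash per clock per pipeline."""
    return clock_mhz * pipelines


def clock_needed_mhz(target_mhash, pipelines):
    """Clock (MHz) each pipeline must reach to hit a target hash rate."""
    return target_mhash / pipelines


print(pipeline_mhash(80))          # one full pipeline at 80MHz
print(pipeline_mhash(80, 2))       # two full pipelines at the same clock
print(clock_needed_mhz(200, 2))    # clock needed for 200MHash/s with two pipelines
```

In this model, two pipelines would have to close timing at roughly 100MHz or better to pass 200MHash/s, which is where the speed-grade question comes in.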
|
|
|
|
ArtForz
|
|
May 18, 2011, 12:18:13 AM |
|
A single pipeline is now doing about 133MH/s with the chip around 210MH/s total
Trying to make any sense of this.
a) You have a 120+ stage unrolled pipelined engine at 133MHz. You fit 1.58 of em? What the hell is 0.58 of an engine?
b) You have a single registered round running at 133MHz. One bitcoinhash = double-sha256 takes 128 or so clocks. You fit 200 of those: ~208Mh/s.
Let's assume b). You need to store at least a..h and W 0..15, that's 24*32 = 768 FFs per engine. Times 200 engines, that's 153600 FFs.
a S3-5000 has 66560 FFs... nope
a S6 LX100 has 126576 FFs... still nope
a S6 LX150 has 184304 FFs... 83% utilization just for the storage FFs. Far edge of plausible.
For adder utilization it gets hilarious: you need at least 8 32-bit adders per round. Times 200 single-round engines, that's 1600 32-bit adders... Half of a S6's slices have carry logic, and each of those can do 4 bits of an adder. That's a max of 988 32-bit adders on a S6 LX100, 1439 on a LX150... we need 1600... ?!?
I have the sneaking suspicion someone didn't realize one bitcoinhash = 2 sha256 blocks...
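The resource budget in case b) can be re-run as a short Python sketch. The flip-flop counts, the 8 adders per round, and the "half the slices have carry logic, 4 adder bits each" rule all come from the post. The one added assumption, flagged in the code, is 8 flip-flops per Spartan-6 slice, which is used only to turn the FF counts into slice counts. All names are illustrative.

```python
CLOCK_MHZ = 133
CLOCKS_PER_HASH = 128        # two SHA-256 blocks at 64 rounds each
ENGINES = 200
FF_PER_ENGINE = 24 * 32      # a..h plus W[0..15], 32 bits each
ADDERS_PER_ROUND = 8

# Flip-flop counts per device, as quoted in the post
FLIP_FLOPS = {"S3-5000": 66560, "S6 LX100": 126576, "S6 LX150": 184304}


def hash_rate_mhash(engines, clock_mhz, clocks_per_hash):
    """MHash/s for many single-round engines, each needing clocks_per_hash clocks."""
    return engines * clock_mhz / clocks_per_hash


def s6_max_adders(flip_flops):
    """Upper bound on 32-bit adders for a Spartan-6 part, from its FF count."""
    slices = flip_flops // 8          # ASSUMPTION: 8 flip-flops per Spartan-6 slice
    carry_slices = slices // 2        # half of the slices have carry logic
    return carry_slices * 4 // 32     # 4 adder bits per carry slice, 32 bits per adder


ffs_needed = ENGINES * FF_PER_ENGINE
adders_needed = ENGINES * ADDERS_PER_ROUND

print(f"{hash_rate_mhash(ENGINES, CLOCK_MHZ, CLOCKS_PER_HASH):.1f} MH/s")
for name, ffs in FLIP_FLOPS.items():
    print(f"{name}: storage FFs {ffs_needed}/{ffs} = {ffs_needed / ffs:.0%}")
for name in ("S6 LX100", "S6 LX150"):
    print(f"{name}: adders {adders_needed}/{s6_max_adders(FLIP_FLOPS[name])}")
```

With these inputs the sketch reproduces the figures in the post: 153600 storage FFs, and adder ceilings of 988 and 1439 against the 1600 required.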
|
bitcoin: 1Fb77Xq5ePFER8GtKRn2KDbDTVpJKfKmpz i0coin: jNdvyvd6v6gV3kVJLD7HsB5ZwHyHwAkfdw
|
|
|
cypherf0x (OP)
|
|
May 18, 2011, 12:47:44 AM |
|
Without having an actual Spartan-6 LX150 board on hand, I ran my design through ISE quickly. This showed that the LUT consumption is indeed similar to Altera's, so there does not appear to be any area improvement from using a Xilinx device over Altera.
What I do not know, however, is how fast Spartan-6 LUTs operate compared to Altera's, for apples-to-apples speed grades. If they run faster, it would indeed be possible to get more bang for your LUT. I get 80MHz in my design, resulting in 80MHash/s burning 80K LUTs. The Cyclone4-150 or Spartan6 LX150 may fit two full hashing pipelines (128 SHA-256 rounds per full hashing pipeline). This would double their performance, with the Cyclone4-150 achieving 160MHash/s. If the Spartan6 is faster, it could possibly achieve >200MHash/s as you've reported.
You could get faster speed grades, but those are typically a bit more expensive. I haven't calculated whether a fast speed grade would balance out the cost for its improved hashing speeds.
It's actually about 90MH/s per pipeline over time. The speed average jumps around a bit at first but settles over a longer run.
|
|
|
|
ArtForz
|
|
May 18, 2011, 01:02:54 AM |
|
Okay, so now you're fitting 2 pipelined engines on a LX150. You need 120 rounds; thanks to cheating with W updates etc. you can get it down to ~6 32-bit adders per round on average. Times 120, that's 720 or so 32-bit adders per engine, 1440 adders for two. So *only* a bit over 100% slice utilization of a LX150, just for the adders. Yeah, sure.
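The adder count here is easy to check. A minimal sketch, using the ~6 adders per round and 120 rounds from this post and the 1439-adder ceiling for the LX150 quoted earlier in the thread; the names are illustrative.

```python
ENGINES = 2
ROUNDS_PER_ENGINE = 120
ADDERS_PER_ROUND = 6         # average, after the W-update tricks mentioned above
LX150_MAX_ADDERS = 1439      # ceiling quoted earlier in the thread


def adders_required(engines, rounds, adders_per_round):
    """Total 32-bit adders for fully unrolled engines."""
    return engines * rounds * adders_per_round


needed = adders_required(ENGINES, ROUNDS_PER_ENGINE, ADDERS_PER_ROUND)
utilization = needed / LX150_MAX_ADDERS

print(needed, "adders needed,", LX150_MAX_ADDERS, "available")
print(f"utilization of carry-capable logic: {utilization:.2%}")
```

On these numbers the design needs one more 32-bit adder than the chip can offer, before any other logic is placed.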
|
|
|
|
cypherf0x (OP)
|
|
May 18, 2011, 01:26:43 AM |
|
A single pipeline is now doing about 133MH/s with the chip around 210MH/s total
Trying to make any sense of this.
a) You have a 120+ stage unrolled pipelined engine at 133MHz. You fit 1.58 of em? What the hell is 0.58 of an engine?
b) You have a single registered round running at 133MHz. One bitcoinhash = double-sha256 takes 128 or so clocks. You fit 200 of those: ~208Mh/s.
Let's assume b). You need to store at least a..h and W 0..15, that's 24*32 = 768 FFs per engine. Times 200 engines, that's 153600 FFs.
a S3-5000 has 66560 FFs... nope
a S6 LX100 has 126576 FFs... still nope
a S6 LX150 has 184304 FFs... 83% utilization just for the storage FFs. Far edge of plausible.
For adder utilization it gets hilarious: you need at least 8 32-bit adders per round. Times 200 single-round engines, that's 1600 32-bit adders... Half of a S6's slices have carry logic, and each of those can do 4 bits of an adder. That's a max of 988 32-bit adders on a S6 LX100, 1439 on a LX150... we need 1600... ?!?
I have the sneaking suspicion someone didn't realize one bitcoinhash = 2 sha256 blocks...
I don't know where you came up with 133 MHz out of MH/s. Note the 'about' and 'around', meaning the values are not absolute. The speed average was a bit high initially. You're also making design assumptions. There are highly optimized commercial hashing cores available for FPGAs too.
|
|
|
|
ArtForz
|
|
May 18, 2011, 01:40:35 AM |
|
Okay, so you fit "around" 1.5 engines on a chip. Is it me, or doesn't that make any sense at all?
Edit: Yes, I make assumptions about sha256. It's sha256. The round function including W update needs at least 8 32-bit adders. No amount of "optimizing" changes that. And those "highly optimized" commercial cores? Barely 120MHz on a S6, 65+ clocks/block, and you can maybe fit 70 on a LX150. 65Mh/s wooo...
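The 65Mh/s figure for the commercial cores follows from the numbers in the edit. A rough sketch, assuming every core runs independently at the quoted clock and that one Bitcoin hash costs two SHA-256 blocks; the function name is made up for illustration.

```python
def core_farm_mhash(cores, clock_mhz, clocks_per_block, blocks_per_hash=2):
    """MHash/s for a set of iterative SHA-256 cores.

    Each core finishes one block every clocks_per_block clocks, and one
    Bitcoin hash needs blocks_per_hash blocks (double SHA-256).
    """
    mblocks_per_second = cores * clock_mhz / clocks_per_block
    return mblocks_per_second / blocks_per_hash


# 70 cores at 120MHz, 65 clocks per block
print(round(core_farm_mhash(70, 120, 65), 1))
```

Since the post says "barely 120MHz" and "65+ clocks", the result is an upper estimate under these assumptions.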
|
|
|
|
cypherf0x (OP)
|
|
May 18, 2011, 01:50:07 AM |
|
Okay, so you fit "around" 1.5 engines on a chip. is it me or doesn't that make any sense at all?
I never said I fit 1.5 engines on a chip. I apologize if some of the numbers implied that, since they were ballpark estimates based on short runs. You're free to doubt, it's your time spent.
|
|
|
|
fpgaminer
|
|
May 18, 2011, 02:08:39 AM |
|
Okay, so you fit "around" 1.5 engines on a chip. is it me or doesn't that make any sense at all?
You can actually fit 1.5 engines on a chip, assuming an engine is a full 128 rounds of SHA-256 (that's 128 because you need to do it twice to get the final hash that Bitcoin expects). One full engine, at 128 rounds, and a second half-engine, at 64 rounds, with a mux in front to switch between processing new data and finishing old data. I've considered doing that for my C120, which would fit one full engine in 80K LEs, and the half-engine in 40K (if I'm lucky). Or just get my hands on a C150 and try desperately to cram two full engines on it.
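The throughput of the "1.5 engines" layout can be modelled very simply. This sketch assumes each engine yields hashes in proportion to how many of the 128 rounds it has unrolled, so the 64-round half-engine needs two passes per hash. It also assumes both engines run at the 80MHz reported earlier in the thread and ignores any cost from the mux. The names are illustrative.

```python
ROUNDS_PER_BITCOIN_HASH = 128    # two SHA-256 passes at 64 rounds each


def hashes_per_clock(unrolled_rounds):
    """Hashes finished per clock by an engine with this many unrolled rounds."""
    return unrolled_rounds / ROUNDS_PER_BITCOIN_HASH


def chip_mhash(clock_mhz, engines):
    """Total MHash/s for a list of engines, given as unrolled round counts."""
    return clock_mhz * sum(hashes_per_clock(r) for r in engines)


# One full engine plus one half-engine, both clocked at 80MHz (assumed)
print(chip_mhash(80, [128, 64]))
```

Under these assumptions the extra 40K LEs buy half again the hash rate of a single full engine.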
|
|
|
|
mooreaa
Newbie
Offline
Activity: 5
Merit: 0
|
|
May 18, 2011, 02:25:26 AM |
|
Hey cypherf0x,
I just got into bitcoin and ran across your post here. I have my own startup and we run a small design/assembly service as part of our business. We have the capability to assemble FPBGA/LGA parts on PCBs and I would be really interested in working with you on a low-cost Spartan-6 FPGA board. I know I would be willing to put up some of my own cash to fund some initial board revisions, and with a little help from the community we might be able to produce a batch of these at a really compelling price.
Interested?
Aaron
|
|
|
|
cypherf0x (OP)
|
|
May 18, 2011, 04:21:39 AM |
|
Hey cypherf0x,
I just got into bitcoin and ran across your post here. I have my own startup and we run a small design/assembly service as part of our business. We have the capability to assemble FPBGA/LGA parts on PCBs and I would be really interested in working with you on a low-cost Spartan-6 FPGA board. I know I would be willing to put up some of my own cash to fund some initial board revisions, and with a little help from the community we might be able to produce a batch of these at a really compelling price.
Interested?
Aaron
Yeah, send me a PM with your email.
|
|
|
|
|