Bitcoin Forum
June 14, 2025, 02:42:53 AM *
Pages: « 1 2 3 4 5 6 7 8 [9] 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 »
Author Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013)  (Read 432989 times)
tantive
Newbie
Activity: 10
Merit: 0
June 06, 2011, 06:02:03 PM
 #161

another minor issue:
after a share is found (shown in green), it gets uploaded and I get a
"... rejected share ..." message, while the pool I am using (BitClockers) shows the share as valid.
TheSeven
Hero Member
Activity: 504
Merit: 500
FPGA Mining LLC
June 06, 2011, 06:11:07 PM
 #162

Hm, sounds like a bug in the pool's RPC service. It's supposed to return True if it accepts the share. I'll probably have to try that one.
It's working fine with ContinuumPool, Swepool.net, Bitcoins.lc, BTC Guild, Eligius, Slush's pool and DeepBit.

EDIT:
Code:
Found long polling URL for BitClockers: http://pool.bitclockers.com:8332/LP
Mining: BitClockers:cd1aa9fa22321dd0489e32e7090a601bd9735152cf5d64fcdd05b7e7342d741d:112d8c994ded19371a1d932f
Found long polling URL for BTC Guild: http://btcguild.com:8332/LP
Found share: BitClockers:cd1aa9fa22321dd0489e32e7090a601bd9735152cf5d64fcdd05b7e7342d741d:112d8c994ded19371a1d932f:a580871a
BitClockers accepted share a580871a
Seems to work fine for me.
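For readers new to the getwork-era protocol discussed here: a miner submits a found share by calling the pool's JSON-RPC `getwork` method with the solved data as its single parameter, and a pool that accepts the share answers with `"result": true`. The sketch below only builds the request body and interprets a reply; the helper names are hypothetical, no network I/O is done, and the data value is a placeholder rather than real work.

```python
import json


def build_submit_request(data_hex: str, request_id: int = 1) -> str:
    """Build a JSON-RPC body that submits solved work via getwork."""
    # getwork with one parameter means "submit this work";
    # getwork with no parameters means "give me new work".
    return json.dumps({"method": "getwork", "params": [data_hex], "id": request_id})


def share_accepted(response_body: str) -> bool:
    """Return True only if the pool replied result == true with no error."""
    reply = json.loads(response_body)
    if reply.get("error") is not None:
        return False
    # Anything other than a literal JSON true counts as a rejection, so a pool
    # that signals acceptance some other way would be reported as rejecting.
    return reply.get("result") is True


body = build_submit_request("00" * 128)  # placeholder 128-byte data, not a real share
print(json.loads(body)["method"])
print(share_accepted('{"result": true, "error": null, "id": 1}'))
print(share_accepted('{"result": false, "error": null, "id": 1}'))
```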

My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
tantive
Newbie
Activity: 10
Merit: 0
June 06, 2011, 06:17:59 PM
 #163

trying to build a depth:=3 version right now.

slice luts: 54% (53% used as logic)
slice registers: 26%
occupied slices: 66%

estimates after synthesis.
with a targeted 50mhz clock p&r takes forever and finally fails with setup violations.
problem is congestion/routing, not available resources in terms of FFs or LUTs...

if you have the time, then just give it a try for xc6slx45-2csg324 with 50mhz and depth:=3

increasing the frequency is not an option: with depth:=2, the timing performance design goal reports just 55mhz after p&r.
TheSeven
Hero Member
Activity: 504
Merit: 500
FPGA Mining LLC
June 06, 2011, 07:17:07 PM
 #164

Quote
trying to build a depth:=3 version right now.

slice luts: 54% (53% used as logic)
slice registers: 26%
occupied slices: 66%

estimates after synthesis.
Sounds like depth:=4 might be achievable.
Quote
with a targeted 50mhz clock p&r takes forever and finally fails with setup violations.
problem is congestion/routing, not available resources in terms of FFs or LUTs...
Sounds like Spartan6 routing is just crap.
You might want to try depth:=2 and depth:=3 with doubled registers in the pipeline stages to allow for retiming and thus hit higher frequencies, at the expense of a couple of flipflops, which you seem to have plenty of.
Quote
if you have the time, then just give it a try for xc6slx45-2csg324 with 50mhz and depth:=3
No, I'm busy synthesizing an XC6VLX760 design, which will take a while.
Quote
increasing the frequency is not an option: with depth:=2, the timing performance design goal reports just 55mhz after p&r.
This sounds like you might want to try the following:
- Split the sha256 rounds into two pipeline stages, as stated above (retiming)
- Experiment with various design strategies. For some reason "Runtime optimized" seems to yield the best results for this design. If you have the time, try SmartXplorer.
- If all this doesn't work out, run it at 55MHz instead of 50; that should bring it to 3.6MH/s :)
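A rough throughput model helps when weighing these options. It assumes, based on the "complete unit (depth=6)" remark later in the thread, that depth d unrolls 2**d of the 64 SHA-256 rounds, so one hash completes every 2**(6 - d) clock cycles and the hash rate is the clock divided by that count. This depth-to-cycles mapping is an assumption, not taken from the design's source, and it ignores control overhead, so its numbers need not match the 3.6MH/s estimate above.

```python
ROUNDS = 64  # SHA-256 rounds per pass


def cycles_per_hash(depth: int) -> int:
    """Cycles between results if 2**depth rounds are unrolled (assumed mapping)."""
    if depth not in range(7):
        raise ValueError("depth must be 0..6")
    return ROUNDS // 2 ** depth


def hash_rate_mhs(clock_mhz: float, depth: int) -> float:
    """Hash rate in MH/s if one hash completes every cycles_per_hash(depth) clocks."""
    return clock_mhz / cycles_per_hash(depth)


for depth, clock in [(2, 50.0), (2, 55.0), (3, 50.0), (6, 50.0)]:
    print(f"depth={depth} @ {clock:g} MHz: {hash_rate_mhs(clock, depth):.2f} MH/s")
```

Under this model a 50 to 55 MHz bump buys 10%, while each step up in depth halves the cycles per hash, which is why getting depth:=3 through P&R is worth more than a few extra MHz.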

fpgaminer (OP)
Hero Member
Activity: 560
Merit: 517
June 07, 2011, 12:50:08 AM
 #165

Quote
or possibly an Xtreme Data XD-PCIE3000 with three Stratix IVs per card?
What size Stratix IVs are they? I don't know how much better the performance of the Stratix parts is. I'd just guess 1.5x faster, so you'd get ~1.5 MH/s per 1,000 LEs compared to the Cyclone series. Maybe more. Maybe less.


Quote
* DATA is the block header for which a hash must be found. It does contain the unix timestamp. It also contains the current target value, so that's probably where the FPGA learns it (or it doesn't care at all and this is checked on the tcl-side). The nonce is set to 0x00000000.
The FPGA doesn't care, it just returns nonces that make a hash meet the Difficulty 1 target (H == 0). And no, it isn't checked on the tcl side either. All pools currently operate on Difficulty==1. For solo mining, the script will submit the data, bitcoind will check it, and return an error if it wasn't below the target. So, not too much harm done there.
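To make "H == 0" concrete: the candidate hash is SHA-256 applied twice to the 80-byte block header and read as a little-endian 256-bit integer. A Difficulty 1 share, in the sense used here, has its top 32 bits zero (hash < 2**224); the exact target encoded by the compact value 0x1d00ffff, 0xFFFF * 2**208, is very slightly stricter. This Python sketch uses the Bitcoin genesis block header as test data; it illustrates the check, not the FPGA's logic, and the function names are hypothetical.

```python
import hashlib
import struct

DIFF1_TARGET = 0xFFFF << 208  # target encoded by the compact "bits" value 0x1d00ffff


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def hash_as_int(header: bytes) -> int:
    # The digest is read as a little-endian integer, so the most
    # significant bits live in the *last* bytes of the digest.
    return int.from_bytes(double_sha256(header), "little")


def top32_zero(header: bytes) -> bool:
    """The "H == 0" test: the most significant 32 bits of the hash are zero."""
    return hash_as_int(header) >> 224 == 0


def bits_to_target(bits: int) -> int:
    """Decode a compact target: mantissa * 256 ** (exponent - 3)."""
    exponent, mantissa = bits >> 24, bits & 0xFFFFFF
    return mantissa * 256 ** (exponent - 3)


def meets_target(header: bytes) -> bool:
    """Full check against the target in the header's own bits field (bytes 72..75)."""
    bits = struct.unpack_from("<I", header, 72)[0]
    return hash_as_int(header) <= bits_to_target(bits)


# Genesis block header: version, prev hash, merkle root, time, bits, nonce.
GENESIS = bytes.fromhex(
    "01000000" + "00" * 32
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"
    + "29ab5f49" + "ffff001d" + "1dac2b7c"
)
print(double_sha256(GENESIS)[::-1].hex())  # 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
print(top32_zero(GENESIS), meets_target(GENESIS))  # True True
```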

And yes, I've run it solo before, against namecoind :) It found a couple blocks! :D

TheSeven: That is some great work you're doing there! I'm glad someone is making a lot of progress with Xilinx devices and Python mining scripts. I haven't had the chance to push your code into the public repo yet, but it is most certainly on my list of things to do.

As a side note, for reasons I cannot understand, my design passed P&R yesterday, even though it failed the last time I tried it. This was a half-mining core (only one SHA-256 pass), at 50MHz. I haven't tested it or anything, but ... I'm just bewildered why it routed without any problems this time. I must have done something wrong the first time. Hopefully it isn't just messing with my head, and I can finally start mining on my SLX150. I'd also like to start testing the DSP48A1 slices, which are rated for 250MHz operation and will perform 48bit + 48bit addition 8)

TheSeven
Hero Member
Activity: 504
Merit: 500
FPGA Mining LLC
June 07, 2011, 09:05:13 AM
 #166

Quote
And yes, I've run it solo before, against namecoind :) It found a couple blocks! :D
Guess what I tried with mine last night... :)
Quote
TheSeven: That is some great work you're doing there! I'm glad someone is making a lot of progress with Xilinx devices and Python mining scripts. I haven't had the chance to push your code into the public repo yet, but it is most certainly on my list of things to do.
Someone else also ported your code to Xilinx, while keeping it in Verilog. While I personally don't understand how anyone could ever choose Verilog over VHDL, you might like that one better.
Quote
As a side note, for reasons I cannot understand, my design passed P&R yesterday, even though it failed the last time I tried it. This was a half-mining core (only one SHA-256 pass), at 50MHz. I haven't tested it or anything, but ... I'm just bewildered why it routed without any problems this time. I must have done something wrong the first time. Hopefully it isn't just messing with my head, and I can finally start mining on my SLX150. I'd also like to start testing the DSP48A1 slices, which are rated for 250MHz operation and will perform 48bit + 48bit addition 8)
What did you change? There must have been something...

Regarding those DSP slices, I'm not sure they will pay off. 250MHz for the DSP slice alone is not that fast; regular LUT-based adders might be even faster! Also, you couldn't use the full 48bit width, and would have only 2 of them to spend per round. This might reduce LUT utilization a bit, but probably won't help performance.
You'll probably gain more by cutting the pipeline stages into halves, as you seem to have lots of spare flipflops around. Sadly this is not the case on my Virtex5. :( You can usually do that by just adding a second register on the output, the synthesis tools will move over things from the preceding pipeline stage to this unused one. (You'll see this in the synthesis log as "register balancing".) I know that 190MHz are possible this way!
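The payoff of the second-register trick can be estimated with a simple timing model: the clock period is bounded by the slowest register-to-register path plus a fixed register overhead (clock-to-out plus setup). If register balancing splits a stage's logic into k roughly equal parts, the logic delay per stage drops to about 1/k while the overhead stays. The delay figures below are made-up placeholders, not measurements of any Xilinx part, and real retiming never balances perfectly.

```python
def fmax_mhz(logic_delay_ns: float, reg_overhead_ns: float, splits: int = 1) -> float:
    """Upper bound on clock frequency when one stage's logic is split `splits` ways.

    Assumes perfectly even balancing, which real retiming only approximates.
    """
    if splits < 1:
        raise ValueError("splits must be >= 1")
    period_ns = logic_delay_ns / splits + reg_overhead_ns
    return 1000.0 / period_ns


# Hypothetical numbers: 9 ns of adder/logic delay per stage, 1 ns register overhead.
single = fmax_mhz(9.0, 1.0)             # one register per stage
doubled = fmax_mhz(9.0, 1.0, splits=2)  # second register added, logic rebalanced
print(f"{single:.0f} MHz -> {doubled:.0f} MHz")
```

The gain is less than 2x because the register overhead is paid once per stage; each extra stage also costs a bank of flip-flops, which is why the trick suits a part with flip-flops to spare.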

makomk
Hero Member
Activity: 686
Merit: 564
June 07, 2011, 02:42:57 PM
 #167

Quote
I don't know how much better the performance of the Stratix parts is. I'd just guess 1.5x faster, so you'd get ~1.5 MH/s per 1,000 LEs compared to the Cyclone series. Maybe more. Maybe less.

The Stratix IV architecture looks interesting. From the Altera docs on it, it seems you basically have the equivalent of a free full-adder attached to the output of each 4LUT. (Of course, it doesn't actually have 4LUTs as such, instead having 8-input 2-output ALMs that can be configured as 2 4LUTs, a 5LUT and a 3LUT, or a 6LUT.) Not that useful to me, since I neither have one nor the software to synthesise designs for one, but interesting nonetheless and should reduce LE usage compared to other FPGAs.

Quote
You'll probably gain more by cutting the pipeline stages into halves, as you seem to have lots of spare flipflops around. Sadly this is not the case on my Virtex5. :( You can usually do that by just adding a second register on the output, the synthesis tools will move over things from the preceding pipeline stage to this unused one. (You'll see this in the synthesis log as "register balancing".) I know that 190MHz are possible this way!

Ah yes, Xilinx under-equipped the Virtex 5 series with flipflops for some daft reason. Why do I keep getting the impression their FPGAs are designed to look good on paper as much as they are to actually function well?

Quad XC6SLX150 Board: 860 MHash/s or so.
SIGS ABOUT BUTTERFLY LABS ARE PAID ADS
tantive
Newbie
Activity: 10
Merit: 0
June 07, 2011, 09:07:51 PM
 #168

@TheSeven: Just interested, but you fitted a complete unit (depth=6) into a v5lx110t?
Gave it a try, but the default strategy already shows 250% LUTs after synthesis...
TheSeven
Hero Member
Activity: 504
Merit: 500
FPGA Mining LLC
June 07, 2011, 09:19:27 PM
 #169

Quote
@TheSeven: Just interested, but you fitted a complete unit (depth=6) into a v5lx110t?
Exactly, but with no additional registers, just the code that I have uploaded.

Quote
Gave it a try, but the default strategy already shows 250% LUTs after synthesis...
For the Virtex5 LX110?
I'm getting an area constraint ratio of ~200, but it fits at ~96% utilization in the end.

Zalfrin
Sr. Member
Activity: 401
Merit: 250
June 07, 2011, 10:51:17 PM
 #170

Quote
As a side note, for reasons I cannot understand, my design passed P&R yesterday, even though it failed the last time I tried it.

Welcome to the wonderful world of P&R tools. ;) If nothing else changed in the design, it may have just been that the seed the tool started with when running its algorithm changed. I see this a lot. Good timing constraints help guide the tools to better solutions, and the tools are becoming more deterministic, but there's still a random element to them at times. Some vendors are worse than others. *cough*Actel*cough*

I've been trying to run the verilog version of the code through Synplify Pro just to see how it looks for utilization on various devices, but have been having problems getting it to assign the parameter correctly... It seems to be ignoring my assignment and optimizing everything away. I haven't had a chance to try out the parameterized VHDL code yet; I'm a lot more comfortable with VHDL, so even if it doesn't work right out of the gate with Synplify, I should be able to get it working. Verilog has too many idiosyncrasies for my tastes.
TheSeven
Hero Member
Activity: 504
Merit: 500
FPGA Mining LLC
June 07, 2011, 10:56:21 PM
 #171

Make sure you connect all the inputs and outputs to something that can't be optimized away. For fpgaminer's Altera verilog design that would be this virtual wire thing, or for my VHDL design (and also the Xilinx verilog design) it would be the UART, which is itself connected to some I/O pins, so it can't be optimized away.
Oh, and I fully agree with your attitude towards verilog. That's why I translated it to VHDL :)
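Why unconnected designs get "optimized away" can be shown with a toy netlist: synthesis keeps only logic that can influence a top-level output, which amounts to walking backwards from the output pins. The netlist here is a hypothetical dictionary mapping each node to its inputs, not any tool's real data structure.

```python
from collections import deque


def live_nodes(fanin: dict[str, list[str]], outputs: list[str]) -> set[str]:
    """Return every node that drives, directly or indirectly, some output."""
    live: set[str] = set()
    queue = deque(outputs)
    while queue:
        node = queue.popleft()
        if node in live:
            continue
        live.add(node)
        # Visit this node's drivers; primary inputs simply have no entry.
        queue.extend(fanin.get(node, []))
    return live


# Hypothetical miner netlist: hasher -> nonce register -> UART -> TX pin.
netlist = {
    "sha_core": ["data_in"],
    "golden_nonce": ["sha_core"],
    "uart": ["golden_nonce"],
    "tx_pin": ["uart"],
}
all_nodes = set(netlist) | {i for ins in netlist.values() for i in ins}

with_uart = live_nodes(netlist, outputs=["tx_pin"])
without_uart = live_nodes(netlist, outputs=[])  # result never reaches a pin

print(sorted(all_nodes - with_uart))     # nothing is removable
print(sorted(all_nodes - without_uart))  # everything is removable
```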

markm
Legendary
Activity: 3192
Merit: 1168
June 08, 2011, 12:54:23 AM
 #172

All this seems to be about using dev-kit boards. Now that the designs have been tested on devkit boards, what is involved in running them on presumably cheaper, maybe simpler "production" boards (no extra I/O types, just the one you actually want, or whatever other optimisations)?

Are the devkit ones the only ones you can simply plug into a usb port and play?

Although they might seem kind of expensive in upfront cost per MHash, for some people low power usage is not merely a saving on the power bill, but maybe even a case of keeping power usage low enough that landlords or employers or whoever won't see a drastic spike in the power bill and thus decide to stop providing it "free"...

-MarkM-

Browser-launched Crossfire client now online (select CrossCiv server for Galactic  Milieu)
Free website hosting with PHP, MySQL etc: http://hosting.knotwork.com/
fpgaminer (OP)
Hero Member
Activity: 560
Merit: 517
June 08, 2011, 02:41:49 AM
 #173

Quote
If nothing else changed in the design, it may have just been that the seed the tool started with when running its algorithm changed.
Yeah, I know about the random seed, but ... I dunno, that seems too far fetched that the random seed would let it go from completely un-routable, to routing in a few minutes. Then again, I haven't used ISE in a few years, so it may be a lot more temperamental than I remember.

I only did one thing differently. I ran each stage one at a time, instead of telling it to P&R and having it automatically invoke all the necessary steps. ???

Quote
I've been trying to run the verilog version of the code through Synplify Pro
Ughhh. I've got an old version of Synplify Pro that refuses to synthesize the design. First, it didn't like the use of block names ??? and when I last left it, it was optimizing out the entire design for no particular reason. I might install their latest eval version on a different machine and see if that version has better luck.

njloof
Member
Activity: 73
Merit: 10
June 08, 2011, 03:40:10 AM
 #174

Quote
Yeah, I know about the random seed, but ... I dunno, that seems too far fetched that the random seed would let it go from completely un-routable, to routing in a few minutes. Then again, I haven't used ISE in a few years, so it may be a lot more temperamental than I remember.

I don't know about today, but back in the day the Xilinx router used simulated annealing, and if that algorithm gets caught in a local minimum, it gets stuck and never recovers. Minor irrelevant tweaks to the input could sometimes shake things out.
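The seed and local-minimum effects described above can be reproduced with a toy placer: cells on a 1-D row, cost = total wire length, random swaps accepted by the Metropolis rule under a falling temperature. Different seeds can finish at different costs on the same problem. This is a generic simulated annealing sketch with a made-up netlist, not a description of how ISE's placer or router actually works.

```python
import math
import random


def wire_length(order: list[int], nets: list[tuple[int, int]]) -> int:
    """Total distance between connected cells, given their left-to-right order."""
    pos = {cell: i for i, cell in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in nets)


def anneal(n_cells: int, nets: list[tuple[int, int]], seed: int,
           steps: int = 5000, t_start: float = 5.0, t_end: float = 0.01) -> int:
    """Return the best wire length found by simulated annealing with this seed."""
    rng = random.Random(seed)  # plays the role of the tool's starting seed
    order = list(range(n_cells))
    rng.shuffle(order)
    cost = best = wire_length(order, nets)
    for step in range(steps):
        # Geometric cooling schedule from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(n_cells), 2)
        order[i], order[j] = order[j], order[i]
        new_cost = wire_length(order, nets)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost  # accept the move (Metropolis rule)
            best = min(best, cost)
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
    return best


nets = [(i, (i * 7 + 3) % 20) for i in range(20)] + [(i, i + 1) for i in range(19)]
for seed in range(4):
    print(seed, anneal(20, nets, seed))
```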
AnnihilaT
Full Member
Activity: 210
Merit: 100
June 08, 2011, 09:21:31 AM
 #175

What rate would one of these be capable of?
 http://www.dinigroup.com/new/hpc.php

I also just received pricing on this cluster yesterday:
http://www.dinigroup.com/new/DNBFC_S12_12_Cluster.php

about 132K USD :)
TheSeven
Hero Member
Activity: 504
Merit: 500
FPGA Mining LLC
June 08, 2011, 01:00:23 PM
 #176

Quote
What rate would one of these be capable of?
 http://www.dinigroup.com/new/hpc.php

I also just received pricing on this cluster yesterday:
http://www.dinigroup.com/new/DNBFC_S12_12_Cluster.php

about 132K USD :)
This seems to be a rather inexpensive system, actually. It should yield 30GH/s.

nathanrees19
Full Member
Activity: 196
Merit: 100
June 08, 2011, 04:12:13 PM
 #177

Quote
it reports 48C for the junction temperature. I might just keep it at 50 to be safe.
Altera commercial FPGAs are rated for 85C JT.

After doing some digging to get the JTAG device ID (the JTAG debugger in Quartus doesn't seem to think this is worth mentioning), I finally have it running on a DE0-Nano. I started with the plastic cover on, took it off, and found the FPGA quite hot to the touch. With it sitting over a not-so-small PC case fan with the cover off, it has dropped to (what feels like) under 40C.

It has successfully produced a block on the testnet :)
rb2k
Member
Activity: 109
Merit: 10
June 08, 2011, 05:05:17 PM
 #178

Even after reading the whole thread, I'm still not quite sure which board would currently be best for mining. Is the Cyclone IV-based Terasic DE2-115 Development Board still the best choice?
TheSeven
Hero Member
Activity: 504
Merit: 500
FPGA Mining LLC
June 08, 2011, 05:12:54 PM
 #179

Quote
Even after reading the whole thread, I'm still not quite sure which board would currently be best for mining. Is the Cyclone IV-based Terasic DE2-115 Development Board still the best choice?
All suitable development boards I've seen so far aren't cost effective, because they contain lots of peripherals that we don't need.
If you want to go for a cost effective solution, you'll need to build a board yourself.
If you don't need a cost effective solution, choose a board with a huge FPGA (Spartan6 LX150, Virtex5 LX110 or some of the Altera ones, which I don't really know), based on what your secondary application needs. Buying a development board just for mining won't pay off; it could serve as a prototype at best.

kokjo
Legendary
Activity: 1050
Merit: 1000
You are WRONG!
June 08, 2011, 05:15:46 PM
 #180

Quote
Even after reading the whole thread, I'm still not quite sure which board would currently be best for mining. Is the Cyclone IV-based Terasic DE2-115 Development Board still the best choice?
All suitable development boards I've seen so far aren't cost effective, because they contain lots of peripherals that we don't need.
If you want to go for a cost effective solution, you'll need to build a board yourself.
If you don't need a cost effective solution, choose a board with a huge FPGA (Spartan6 LX150, Virtex5 LX110 or some of the Altera ones, which I don't really know), based on what your secondary application needs. Buying a development board just for mining won't pay off; it could serve as a prototype at best.
it would pay off in about 2 months, if the current price and difficulty hold
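A payback claim like this can be checked with the usual expected-reward formula: at difficulty D a block takes on average about D * 2**32 hashes, so a miner doing H hashes/s expects H * 86400 / (D * 2**32) blocks per day, each worth the block reward (50 BTC in 2011). The function names are hypothetical, and the example inputs are placeholders to be replaced with the board's real hash rate, price and power draw and the current difficulty and exchange rate; pool fees and difficulty growth are ignored.

```python
import math

SECONDS_PER_DAY = 86_400


def btc_per_day(hash_rate_hs: float, difficulty: float, block_reward: float = 50.0) -> float:
    """Expected BTC/day: one block takes about difficulty * 2**32 hashes on average."""
    blocks_per_day = hash_rate_hs * SECONDS_PER_DAY / (difficulty * 2**32)
    return blocks_per_day * block_reward


def payback_days(board_cost_usd: float, hash_rate_hs: float, difficulty: float,
                 usd_per_btc: float, power_w: float, usd_per_kwh: float) -> float:
    """Days until mining revenue minus electricity cost repays the board."""
    revenue = btc_per_day(hash_rate_hs, difficulty) * usd_per_btc
    power_cost = power_w * 24 / 1000 * usd_per_kwh
    profit = revenue - power_cost
    if profit <= 0:
        return math.inf  # never pays off at these numbers
    return board_cost_usd / profit


# Placeholder inputs only; substitute real figures.
days = payback_days(board_cost_usd=500, hash_rate_hs=50e6, difficulty=500_000,
                    usd_per_btc=20, power_w=10, usd_per_kwh=0.15)
print(f"{days:.0f} days")
```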

"The whole problem with the world is that fools and fanatics are always so certain of themselves and wiser people so full of doubts." -Bertrand Russell