Bitcoin Forum
Author Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013)  (Read 432940 times)
tantive
June 06, 2011, 05:41:49 PM
 #161

2.6.5 on openSuse 11.3

Changing it to
delta = (endtime - starttime).seconds - 0.0145
fixes the problem.

Congrats for PyFPGAMiner, it is really nice Wink

ATM my Atlys with 50MHz and depth:=2 is giving 3.2MH/s and I'm curious what performance I can reach.
I was thinking about using BlockRAM instead of Slice-FFs to squeeze in more logic and maybe ease the congestion problems of Spartan-6 FPGAs. A first glimpse showed that ISE is complaining about asynchronous reads in the current hw version.
I think it should help a lot to move all pipeline registers to BRAMs.
TheSeven
June 06, 2011, 06:00:01 PM
 #162

Quote
2.6.5 on openSuse 11.3

Changing it to
delta = (endtime - starttime).seconds - 0.0145
fixes the problem.
...and broke the hashrate calculation for everything that's taking more than 60 seconds to measure, so probably everything <0.8MH/s.
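
On the delta calculation: `timedelta.total_seconds()` only exists from Python 2.7 onward, which would explain a failure on 2.6.5 if the original line called it (an assumption, since the thread does not show the original line). The `.seconds` attribute alone holds only the whole-seconds field and drops the days and microseconds fields. A minimal sketch of a 2.6-compatible replacement follows; the helper name `elapsed_seconds` and the sample timestamps are made up for illustration:

```python
from datetime import datetime, timedelta

def elapsed_seconds(starttime, endtime):
    """Fractional seconds between two datetimes, without timedelta.total_seconds()."""
    td = endtime - starttime
    # Combine all three timedelta fields so nothing is truncated.
    return td.days * 86400 + td.seconds + td.microseconds / 1e6

# Hypothetical measurement window of 2 minutes and 3.5 seconds.
start = datetime(2011, 6, 6, 17, 0, 0)
end = start + timedelta(minutes=2, seconds=3, microseconds=500000)
delta = elapsed_seconds(start, end) - 0.0145  # offset copied from the post above
```

With this form the sub-second part of the measurement is kept, so short and long measurement windows go through the same path.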
Quote
Congrats for PyFPGAMiner, it is really nice Wink

ATM my Atlys with 50Mhz and depth:=2 is giving 3.2MH/s and I'm curious what performance I can reach.
I was thinking about using BlockRAM instead of Slice-FFs to squeeze in more logic and maybe ease the congestion problems of spartan 6 fpgas. A first glimpse showed that ISE is complaining about asynchronous reads in the current hw version.
I think it should help a lot to move all pipeline registers to BRAMs.
This is not likely to work out. You can only use one address of every dual-port BRAM, so the pipeline stages alone would use up almost all the BRAMs for the depth=2 version. (Even if it could use the BRAMs 100% efficiently it would need 88 BRAMs for depth=2)
BRAMs are also slower than slice flipflops, and as more signals would need to be routed to/from those centralized memories, congestion might get even worse.

At how much LUT/FF/Slice usage are you? I think you might be better off squeezing another depth=0 miner into it, or trying to increase the clock frequency.

My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
tantive
June 06, 2011, 06:02:03 PM
 #163

Another minor issue:
After a share is found (shown in green) it gets uploaded and I get a
"... rejected share ..." while the pool I am using (bitclockers) shows the share as valid.
TheSeven
June 06, 2011, 06:11:07 PM
 #164

Hm, sounds like a bug in the pool's RPC service. It's supposed to return True if it accepts the share. I'll probably have to try that one.
It's working fine with ContinuumPool, Swepool.net, Bitcoins.lc, BTC Guild, Eligius, Slush's pool and DeepBit.

EDIT:
Code:
Found long polling URL for BitClockers: http://pool.bitclockers.com:8332/LP
Mining: BitClockers:cd1aa9fa22321dd0489e32e7090a601bd9735152cf5d64fcdd05b7e7342d741d:112d8c994ded19371a1d932f
Found long polling URL for BTC Guild: http://btcguild.com:8332/LP
Found share: BitClockers:cd1aa9fa22321dd0489e32e7090a601bd9735152cf5d64fcdd05b7e7342d741d:112d8c994ded19371a1d932f:a580871a
BitClockers accepted share a580871a
Seems to work fine for me.
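
For reference, the check a miner makes on a share submission can be sketched as follows. This assumes the pool answers the submission with a JSON-RPC object whose `result` is the boolean true on acceptance, as described above. The function name `share_accepted` and the sample reply strings are illustrative only and are not taken from PyFPGAMiner:

```python
import json

def share_accepted(response_body):
    """Accept a share only if result is the JSON boolean true and error is null."""
    reply = json.loads(response_body)
    return reply.get("error") is None and reply.get("result") is True

# Hypothetical replies: a well-formed acceptance, and one with a string result.
print(share_accepted('{"result": true, "error": null, "id": 1}'))
print(share_accepted('{"result": "true", "error": null, "id": 1}'))
```

A pool that sends the string "true" or the number 1 instead of a JSON boolean would be reported as rejected by a strict check like this. That is one way the symptom above could arise, but it is a guess, not a diagnosis.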

tantive
June 06, 2011, 06:17:59 PM
 #165

trying to build a depth:=3 version right now.

slice luts: 54% (53% used as logic)
slice registers: 26%
occupied slices: 66%

estimates after synthesis.
with a targeted 50mhz clock p&r takes forever and finally fails with setup violations.
problem is congestion/routing, not available resources in terms of FFs or LUTs...

if you have the time, then just give it a try for xc6slx45-2csg324 with 50mhz and depth:=3

increasing the frequency is not an option, with depth:=2 the timing performance design goal reports just 55mhz  after p&r.
TheSeven
June 06, 2011, 07:17:07 PM
 #166

Quote
trying to build a depth:=3 version right now.

slice luts: 54% (53% used as logic)
slice registers: 26%
occupied slices: 66%

estimates after synthesis.
Sounds like depth:=4 might be achievable
Quote
with a targeted 50mhz clock p&r takes forever and finally fails with setup violations.
problem is congestion/routing, not available resources in terms of FFs or LUTs...
Sounds like Spartan6 routing is just crap.
You might want to try depth:=2 and depth:=3 with doubled registers in the pipeline stages to allow for retiming and thus hitting higher frequencies, at the expense of a couple of flipflops, which you seem to have plenty of.
Quote
if you have the time, then just give it a try for xc6slx45-2csg324 with 50mhz and depth:=3
No, I'm busy synthesizing an XC6VLX760 design, which will take a while.
Quote
increasing the frequency is not an option, with depth:=2 the timing performance design goal reports just 55mhz after p&r.
This sounds like you might want to try the following:
- Split the sha256 rounds into two pipeline stages, as stated above (retiming)
- Experiment with various design strategies. For some reason "Runtime optimized" seems to yield the best results for this design. If you have the time, try SmartXplorer
- If all this doesn't work out, run it at 55MHz instead of 50, should bring it to 3.6MH/s Smiley
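
The last suggestion relies on the hash rate of a fixed design scaling with its clock frequency. A quick sketch of that arithmetic, assuming strict proportionality and taking the 3.2 MH/s at 50 MHz figure reported earlier in the thread; `scaled_hashrate` is an illustrative name:

```python
def scaled_hashrate(measured_mhs, measured_mhz, target_mhz):
    """Scale a measured hash rate to another clock, assuming it is proportional to the clock."""
    return measured_mhs * target_mhz / float(measured_mhz)

# 3.2 MH/s measured at 50 MHz, rescaled to a 55 MHz clock.
estimate = scaled_hashrate(3.2, 50, 55)
print("%.2f MH/s" % estimate)
```

Under that assumption the estimate comes out at about 3.5 MH/s, in the same ballpark as the 3.6 quoted.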

fpgaminer (OP)
June 07, 2011, 12:50:08 AM
 #167

Quote
or possibly an Xtreme Data XD-PCIE3000 with three Stratix IVs per card?
What size Stratix IVs are they? I don't know how much better the performance of the Stratix parts are. I'd just guess 1.5x faster, so you'd get ~1.5 MH/s per 1,000 LEs compared to the Cyclone series. Maybe more. Maybe less.


Quote
* DATA is the block header for which a hash must be found. It does contain the unix timestamp. It also contains the current target value, so that's probably where the FPGA learns it (or it doesn't care at all and this is checked on the tcl-side). The nonce is set to 0x00000000.
The FPGA doesn't care, it just returns nonces that make a hash meet the Difficulty 1 target (H == 0). And no, it isn't checked on the tcl side either. All pools currently operate on Difficulty==1. For solo mining, the script will submit the data, bitcoind will check it, and return an error if it wasn't below the target. So, not too much harm done there.
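
The "H == 0" test can be mirrored in software. The sketch below reads it as "the last 32-bit word of the second SHA-256 digest is zero", which is an interpretation of the post rather than a transcription of the FPGA code; the function names are made up for illustration:

```python
import hashlib

def double_sha256(data):
    """SHA-256 applied twice, as used for Bitcoin block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def meets_difficulty_1(header):
    """True if the last 32-bit word of the double-SHA-256 digest is zero.

    Bitcoin displays block hashes byte-reversed, so these four bytes are
    the eight leading zeros of the familiar hex form.
    """
    if len(header) != 80:
        raise ValueError("block header must be 80 bytes")
    return double_sha256(header)[-4:] == b"\x00\x00\x00\x00"
```

Feeding it the 80-byte header of the Bitcoin genesis block should return True. Note that this is only the 32-zero-bits shortcut; comparing against the full target is left to the pool or to bitcoind, as described above.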

And yes, I've run it solo before, against namecoind  Smiley It found a couple blocks!  Grin

TheSeven: That is some great work you're doing there! I'm glad someone is making a lot of progress with Xilinx devices and Python mining scripts. I haven't had the chance to push your code into the public repo yet, but it is most certainly on my list of things to do.

As a side note, for reasons I cannot understand, my design passed P&R yesterday, even though it failed the last time I tried it. This was a half-mining core (only one SHA-256 pass), at 50MHz. I haven't tested it or anything, but ... I'm just bewildered why it routed without any problems this time. I must have done something wrong the first time. Hopefully it isn't just messing with my head, and I can finally start mining on my SLX150. I'd also like to start testing the DSP48A1 slices, which are rated for 250MHz operation and will perform 48bit + 48bit addition  Cool

TheSeven
June 07, 2011, 09:05:13 AM
 #168

Quote
And yes, I've run it solo before, against namecoind  Smiley It found a couple blocks!  Grin
Guess what I tried with mine last night... Smiley
Quote
TheSeven: That is some great work you're doing there! I'm glad someone is making a lot of progress with Xilinx devices and Python mining scripts. I haven't had the chance to push your code into the public repo yet, but it is most certainly on my list of things to do.
Someone else also ported your code to Xilinx, while keeping it in Verilog. While I personally don't understand how one could ever choose Verilog over VHDL, you might possibly like that one better.
Quote
As a side note, for reasons I cannot understand, my design passed P&R yesterday, even though it failed the last time I tried it. This was a half-mining core (only one SHA-256 pass), at 50MHz. I haven't tested it or anything, but ... I'm just bewildered why it routed without any problems this time. I must have done something wrong the first time. Hopefully it isn't just messing with my head, and I can finally start mining on my SLX150. I'd also like to start testing the DSP48A1 slices, which are rated for 250MHz operation and will perform 48bit + 48bit addition  Cool
What did you change? There must have been something...

Regarding those DSP slices, I'm not sure if they will pay off. 250MHz for the DSP slice alone is not that fast, regular LUT-based adders might be even faster! Also, you couldn't use the 48bit width, and would have only 2 of them to spend per round. This might reduce LUT utilization a bit, but probably won't help performance.
You'll probably gain more by cutting the pipeline stages into halves, as you seem to have lots of spare flipflops around. Sadly this is not the case on my Virtex5. Sad You can usually do that by just adding a second register on the output, the synthesis tools will move over things from the preceding pipeline stage to this unused one. (You'll see this in the synthesis log as "register balancing".) I know that 190MHz are possible this way!

makomk
June 07, 2011, 02:42:57 PM
 #169

Quote
I don't know how much better the performance of the Stratix parts are. I'd just guess 1.5x faster, so you'd get ~1.5 MH/s per 1,000 LEs compared to the Cyclone series. Maybe more. Maybe less.

The Stratix IV architecture looks interesting. From the Altera docs on it, it seems you basically have the equivalent of a free full-adder attached to the output of each 4LUT. (Of course, it doesn't actually have 4LUTs as such, instead having 8-input 2-output ALMs that can be configured as 2 4LUTs, a 5LUT and a 3LUT, or a 6LUT.) Not that useful to me, since I neither have one nor the software to synthesise designs for one, but interesting nonetheless and should reduce LE usage compared to other FPGAs.

Quote
You'll probably gain more by cutting the pipeline stages into halves, as you seem to have lots of spare flipflops around. Sadly this is not the case on my Virtex5. Sad You can usually do that by just adding a second register on the output, the synthesis tools will move over things from the preceding pipeline stage to this unused one. (You'll see this in the synthesis log as "register balancing".) I know that 190MHz are possible this way!

Ah yes, Xilinx under-equipped the Virtex 5 series with flipflops for some daft reason. Why do I keep getting the impression their FPGAs are designed to look good on paper as much as they are to actually function well?

Quad XC6SLX150 Board: 860 MHash/s or so.
SIGS ABOUT BUTTERFLY LABS ARE PAID ADS
tantive
June 07, 2011, 09:07:51 PM
 #170

@TheSeven: Just interested, but you fitted a complete unit (depth=6) into a v5lx110t?
Gave it a try, but the default strategy already shows 250% LUTs after synthesis...
TheSeven
June 07, 2011, 09:19:27 PM
 #171

Quote
@TheSeven: Just interested, but you fitted a complete unit (depth=6) into a v5lx110t?
Exactly, but with no additional registers, just the code that I have uploaded.

Quote
Gave it a try, but the default strategy already shows 250% LUTs after synthesis...
For the Virtex5 LX110?
I'm getting an area constraint ratio of ~200, but it fits at ~96% utilization in the end.

Zalfrin
June 07, 2011, 10:51:17 PM
 #172

Quote
As a side note, for reasons I cannot understand, my design passed P&R yesterday, even though it failed the last time I tried it.

Welcome to the wonderful world of P&R tools. Wink If nothing else changed in the design, it may have just been that the seed the tool started with when running its algorithm changed. I see this a lot. Good timing constraints help guide the tools to better solutions, and the tools are becoming more deterministic, but there's still a random element to them at times. Some vendors are worse than others. *cough*Actel*cough*

I've been trying to run the verilog version of the code through Synplify Pro just to see how it looks for utilization on various devices, but have been having problems getting it to assign the parameter correctly... Seems to be ignoring my assignment and optimizing everything away. Haven't had a chance to try out the parameterized VHDL code yet, I'm a lot more comfortable with VHDL so even if it doesn't work right out of the gate with Synplify, I should be able to get it working. Verilog has too many idiosyncrasies for my tastes.
TheSeven
June 07, 2011, 10:56:21 PM
 #173

Make sure you connect all the inputs and outputs to something that can't be optimized away. For fpgaminer's Altera verilog design that would be this virtual wire thing, or for my VHDL design (and also the Xilinx verilog design) it would be the UART, which is itself connected to some I/O pins, so it can't be optimized away.
Oh, and I fully agree with your attitude towards verilog. That's why I translated it to VHDL Smiley

markm
June 08, 2011, 12:54:23 AM
 #174

All this seems to be about using dev-kit boards. Now that the designs have been tested on devkit boards, what is involved in doing it on presumably cheaper, maybe simpler "production" boards (no extra I/O types, just the one you actually want, or whatever other optimisations)?

Are the devkit ones the only ones you can simply plug into a usb port and play?

Although they might seem kind of expensive in upfront cost per MHash, low power usage is for some people not merely a saving on the power bill but maybe even a case of keeping power usage low enough that landlords or employers or whatever won't see a drastic spike in the power bill and thus decide to no longer provide it "free"...

-MarkM-

fpgaminer (OP)
June 08, 2011, 02:41:49 AM
 #175

Quote
If nothing else changed in the design, it may have just been that the seed the tool started with when running its algorithm changed.
Yeah, I know about the random seed, but ... I dunno, that seems too far fetched that the random seed would let it go from completely un-routable, to routing in a few minutes. Then again, I haven't used ISE in a few years, so it may be a lot more temperamental than I remember.

I only did one thing different. I ran each stage one at a time, instead of telling it to P&R and have it automatically invoke all the necessary steps.  Huh

Quote
I've been trying to run the verilog version of the code through Synplify Pro
Ughhh. I've got an old version of Synplify Pro that refuses to synthesize the design. First, it didn't like the use of block names  Huh and last I left it, it was optimizing out the entire design, for no particular reason. I might install their latest eval version on a different machine and see if that version has better luck.

njloof
June 08, 2011, 03:40:10 AM
 #176

Quote
Yeah, I know about the random seed, but ... I dunno, that seems too far fetched that the random seed would let it go from completely un-routable, to routing in a few minutes. Then again, I haven't used ISE in a few years, so it may be a lot more temperamental than I remember.

I don't know about today, but back in the day the Xilinx router used simulated annealing, and if that algorithm gets caught in a local minimum, it gets stuck and never recovers. Minor irrelevant tweaks to the input could sometimes shake things out.
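
To illustrate the effect described above, here is a toy simulated-annealing placer. It is not the Xilinx algorithm, only a minimal sketch: the cells, the ring netlist `NETS`, the cooling schedule, and all function names are invented for illustration. Running it with different seeds can end at different final costs, which is the seed sensitivity being discussed.

```python
import math
import random

def wirelength(order, nets):
    """Total distance between connected cells for a left-to-right placement."""
    pos = dict((cell, i) for i, cell in enumerate(order))
    return sum(abs(pos[a] - pos[b]) for a, b in nets)

def anneal(nets, n_cells, seed, steps=2000, t_start=5.0, t_end=0.01):
    """Swap two cells per step; accept a worse placement with probability
    exp(-delta / T) while T cools geometrically from t_start to t_end."""
    rng = random.Random(seed)
    order = list(range(n_cells))
    rng.shuffle(order)
    cost = wirelength(order, nets)
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / float(steps))
        i, j = rng.sample(range(n_cells), 2)
        order[i], order[j] = order[j], order[i]
        new_cost = wirelength(order, nets)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost  # keep the swap
        else:
            order[i], order[j] = order[j], order[i]  # undo the swap
    return order, cost

# Hypothetical netlist: 12 cells connected in a ring.
NETS = [(i, (i + 1) % 12) for i in range(12)]

if __name__ == "__main__":
    for seed in range(5):
        _, final_cost = anneal(NETS, 12, seed)
        print("seed %d -> final wirelength %d" % (seed, final_cost))
```

The only source of randomness is the seeded generator, so the same seed always reproduces the same placement, while a different seed explores a different path and may settle somewhere else.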
AnnihilaT
June 08, 2011, 09:21:31 AM
 #177

What rate would one of these be capable of?
 http://www.dinigroup.com/new/hpc.php

I also just received pricing on this cluster yesterday:
http://www.dinigroup.com/new/DNBFC_S12_12_Cluster.php

about 132K USD Smiley
TheSeven
June 08, 2011, 01:00:23 PM
 #178

Quote
What rate would one of these be capable of?
 http://www.dinigroup.com/new/hpc.php

I also just received pricing on this cluster yesterday:
http://www.dinigroup.com/new/DNBFC_S12_12_Cluster.php

about 132K USD Smiley
This seems to be a rather inexpensive system actually. Should yield 30GH/s.

nathanrees19
June 08, 2011, 04:12:13 PM
 #179

Quote
it reports 48C for the junction temperature. I might just keep it at 50 to be safe.
Altera commercial FPGAs are rated for 85C JT.

After doing some digging to get the JTAG device ID (the JTAG debugger in Quartus doesn't seem to think this is worth mentioning) I finally have it running on a DE0-Nano. I started with the plastic cover on, took it off and found the FPGA quite hot to touch. With it sitting over a not-so-small PC case fan with the cover off, it has dropped to (what feels like) under 40C.

It has successfully produced a block on the testnet Smiley
rb2k
June 08, 2011, 05:05:17 PM
 #180

Even after reading the whole thread, I'm still not quite sure which board would currently be the best for mining. Is the Cyclone IV based Terasic DE2-115 Development Board still the best choice?