Bitcoin Forum
Author Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013)  (Read 432989 times)
kingcoin
Sr. Member
****
Offline Offline

Activity: 262
Merit: 250


View Profile
March 15, 2013, 02:25:51 PM
 #641

2) Will the TCL script handle multiple instances of the miner within a single FPGA?

I tried to build an image with two instances of fpgaminer_top but it did not change anything. Seems like the tcl scripts can only handle a single GNON/NONC interface. So I guess the answer is no here as well.
senseless
Hero Member
*****
Offline Offline

Activity: 1118
Merit: 541



View Profile
March 15, 2013, 02:35:35 PM
 #642

2) Will the TCL script handle multiple instances of the miner within a single FPGA?

I tried to build an image with two instances of fpgaminer_top but it did not change anything. Seems like the tcl scripts can only handle a single GNON/NONC interface. So I guess the answer is no here as well.

If a custom driver for cgminer were devised for this firmware, it should be possible to check for multiple instances per FPGA and multiple FPGAs per JTAG chain. As I mentioned before, I'm willing to put up some coinage for a driver bounty. 4x 233 MHz with cgminer sounds nice.










kramble
Sr. Member
****
Offline Offline

Activity: 384
Merit: 250



View Profile WWW
March 15, 2013, 02:47:07 PM
Last edit: March 15, 2013, 06:27:41 PM by kramble
 #643

2) Will the TCL script handle multiple instances of the miner within a single FPGA?

I tried to build an image with two instances of fpgaminer_top but it did not change anything. Seems like the tcl scripts can only handle a single GNON/NONC interface. So I guess the answer is no here as well.

You could just use different ident strings for each instance, e.g. STA1/DAT1/GNO1/NON1 and STA2/DAT2/GNO2/NON2 (they seem to need to be exactly 4 chars, not longer), then duplicate the JTAG driver code to direct the work alternately between them.

As a very long shot, you might even try just running two instances of the mine.tcl script (in separate directories due to dependencies) and changing the idents accordingly. I don't know if this is supported by the quartus_stp driver, but I have had some success with this approach on another project.

[EDIT]
Actually, thinking about this, it's a pretty crude approach. It would be better to rewrite fpgaminer_top so as to apportion work between multiple hashers, and queue the results for transmission back via a single channel. Rather beyond my skills, I'm afraid. I'll also note that the quartus_stp JTAG driver is not entirely reliable. I've had the mine.tcl script drop out with errors on extended runs (this is partly why I went the route of a custom serial interface to my Raspberry Pi host).

[EDIT2]
Then again it would be ridiculously easy just to push the same midstate_buf and data_buf into two or more sha256_transform units and just use different nonce values (eg for four instances just hardwire the top two bits of the nonce and count over a 30 bit range). Then just load golden_nonce based on whichever unit matches. Voila. You'd probably also need to reduce the getwork from every 20 seconds to something a bit faster in order to avoid wrapping and duplicating work. Longpoll would be nice too.
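The nonce split described in EDIT2 can be sketched on the host side to check that the per-core ranges are disjoint and cover the whole 32-bit space. This is only an illustration in Python, not the Verilog from the thread; the names `core_nonce` and `nonce_range` are made up for the example, and it assumes the core index occupies the top bits of the nonce.

```python
NONCE_BITS = 32  # width of the Bitcoin header nonce


def core_nonce(core_id: int, counter: int, core_bits: int = 2) -> int:
    """Nonce tested by one core: the top core_bits are hardwired to core_id,
    the remaining low bits come from a shared counter (which wraps)."""
    low_bits = NONCE_BITS - core_bits
    if not 0 <= core_id < (1 << core_bits):
        raise ValueError("core_id does not fit in core_bits")
    return (core_id << low_bits) | (counter & ((1 << low_bits) - 1))


def nonce_range(core_id: int, core_bits: int = 2) -> tuple:
    """First and last nonce covered by a given core."""
    low_bits = NONCE_BITS - core_bits
    start = core_id << low_bits
    return start, start + (1 << low_bits) - 1


if __name__ == "__main__":
    # Four cores: two hardwired bits, each core counts over a 30-bit range.
    for core in range(4):
        first, last = nonce_range(core)
        print(f"core {core}: {first:#010x} .. {last:#010x}")
```

With four cores each range is a quarter of the nonce space, so the whole space is exhausted four times sooner than with one core at the same clock, which is why the getwork interval matters.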

[EDIT3]
This is roughly what I had in mind https://github.com/kramble/DE0-Nano-BitCoin-Miner/blob/master/Misc/quad_fpgaminer_top.v
... just as a discussion point. I can't vouch for it working and it needs optimization, but I have given it a quick trip through the simulator (just tested one of the cores). I also found LX150_makomk_dualcore in the fpgaminer distribution which may be worth a look.

Anyway, with the rate that ASICs are coming on stream and pushing up the difficulty, I doubt if anyone is going to do any further work on the fpgaminer code.

Mark

Github https://github.com/kramble BLC BkRaMaRkw3NeyzsZ2zUgXsNLogVVkQ1iPV
kingcoin
March 15, 2013, 04:06:50 PM
 #644

It would be better to rewrite fpgaminer_top so as to apportion work between multiple hashers

Yes, a pretty plain indirect addressing unit which can address multiple instances was what I was thinking.

Anyway, with the rate that ASICs are coming on stream and pushing up the difficulty, I doubt if anyone is going to do any further work on the fpgaminer code.

True, even though I haven't really sat down to study the different ASIC offerings which are available yet.
kingcoin
March 16, 2013, 03:53:33 AM
 #645

# ** Error: (vsim-3033) fpgaminer_6_1200mv_85c_slow.vo(759): Instantiation of 'dffeas' failed. The design unit was not found.
#         Region: /test_fpgaminer_top/uut

You're doing a simulation on the netlist after synthesis. This is very slow and is probably not what you want.
fizzisist
Hero Member
*****
Offline Offline

Activity: 720
Merit: 528



View Profile WWW
March 16, 2013, 05:56:55 AM
 #646

Yes, I think that is likely. Just to eliminate this, why not just set up an account on btcguild (it only takes a moment) and try it on there?

The problem appears to be related to my local bitcoind. Now with btcguild.com as upstream I'm at least getting:


Code:
[03/15/2013 14:43:55] 100.00 MH/s (~154.00 MH/s) [Rej: 0/9 (0.00%)]
[03/15/2013 14:43:57] c6268b39 accepted
[03/15/2013 14:43:58] cf4ae4eb accepted
[03/15/2013 14:44:00] 99.98 MH/s (~184.55 MH/s) [Rej: 0/11 (0.00%)]
Thank you for the suggestion.


As I understand it, fpgaminer doesn't care about the target difficulty and always treats it as 1, so it showed lots of rejects when you were solo mining. There's no problem with this; a reject just means that share didn't solve a block. The one to watch for is an accept, because that means you just earned 25 BTC! If the rejects bother you, recode the host software (Tcl?) to request the real target and check that any hash received from the FPGA meets it. The FPGA will always return shares that meet a target of 1, which you can either ignore or count to confirm the FPGA is still hashing properly.
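A host-side filter along those lines might look like the following Python sketch. It is not taken from mine.tcl; `bits_to_target` and `classify_share` are hypothetical names, and it assumes the host already has the block hash as an integer and the real target in the compact "bits" form from the block header.

```python
def bits_to_target(bits: int) -> int:
    """Expand the compact 'bits' field of a block header into a 256-bit target."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


# Difficulty-1 target, which is roughly what the FPGA checks against.
DIFF1_TARGET = bits_to_target(0x1D00FFFF)


def classify_share(hash_value: int, real_target: int) -> str:
    """Decide what to do with a hash returned by the FPGA.

    'block'   : meets the real network target, submit it
    'share'   : only meets difficulty 1, count it as a health check
    'invalid' : does not even meet difficulty 1, the FPGA may be misbehaving
    """
    if hash_value <= real_target:
        return "block"
    if hash_value <= DIFF1_TARGET:
        return "share"
    return "invalid"


if __name__ == "__main__":
    # 0x1a269421 is the bits field that appears in the testbench data in this thread.
    real_target = bits_to_target(0x1A269421)
    print(classify_share(DIFF1_TARGET, real_target))  # only a diff-1 share
```

When solo mining, only the "block" case would be sent to bitcoind; the "share" case can simply be counted to estimate the hash rate.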

kingcoin
March 16, 2013, 01:31:58 PM
 #647

As I understand it, fpgaminer doesn't care about the target difficulty and always treats it as 1, so it showed lots of rejects when you were solo mining.

That makes sense.
senseless
March 18, 2013, 09:20:56 AM
Last edit: March 18, 2013, 03:03:44 PM by senseless
 #648

2) Will the TCL script handle multiple instances of the miner within a single FPGA?

I tried to build an image with two instances of fpgaminer_top but it did not change anything. Seems like the tcl scripts can only handle a single GNON/NONC interface. So I guess the answer is no here as well.

I was able to increase the fmax on my chip for the Stratix IV OrphanGland code from 107 to 221 MHz by removing all of the quasi-pipelining code from the top and the associated files from the project. I now have 3 cores hashing @ 220 MHz (450 MH/s) on my EP4SGX230. It's still nowhere near as efficient as the makomk-mod code. Going to try to implement makomk's hashing core under the OrphanGland top.

Also, on a side note, I talked with Luke-Jr and he said he'd be willing to build a driver (bfgminer already has a JTAG driver) if someone were to send him a board. I was thinking of just sending him some coinage so he can buy a DE0-Nano (which should be sufficient to get communication with the firmware running). He said he would be able to take a look in 1-2 weeks.


kingcoin
March 18, 2013, 11:39:51 AM
 #649

I was able to increase the fmax on my chip for the Stratix IV OrphanGland code from 107 to 221 MHz by removing all of the quasi-pipelining code from the top and the associated files from the project. I now have 3 cores hashing @ 220 MHz (450 MH/s) on my EP4SGX230.

Cool. Do the three cores act like individual workers, or do they work together? Did you ever try to run your design with a Stratix-V target to see what fmax you would get?

senseless
March 18, 2013, 12:05:10 PM
Last edit: March 18, 2013, 12:22:30 PM by senseless
 #650

I was able to increase the fmax on my chip for the Stratix IV OrphanGland code from 107 to 221 MHz by removing all of the quasi-pipelining code from the top and the associated files from the project. I now have 3 cores hashing @ 220 MHz (450 MH/s) on my EP4SGX230.

Cool. Do the three cores act like individual workers, or do they work together? Did you ever try to run your design with a Stratix-V target to see what fmax you would get?



They work together dividing work between N cores instead of each core doing its own work.

It holds the nonce value in a variable(?), rotates it, and hands work to each core in turn. I think it might be a bit more efficient if each core were given a nonce range and did its own internal flipping, but that would require a larger footprint. It's definitely not doing 1 hash per clock per core. I think the cores just get hung up waiting for data. It seems like it's doing 1 hash per 2 clocks (maybe it takes a full clock just to get the data to the core). My hash rate has slipped down to 325-350 MH/s, with good streaks pushing that up to 400+, while the actual hashing rate should be closer to 700 MH/s. The hash rate is based on submitted shares, since the firmware doesn't have any way to report the internal hashing frequency.
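For comparing a share-based rate against the nominal one, a small Python sketch may help. It assumes difficulty-1 shares, each of which takes about 2^32 hashes on average to find; the function names are illustrative and not part of any miner.

```python
HASHES_PER_DIFF1_SHARE = 2 ** 32  # average hashes needed per difficulty-1 share


def estimated_rate_mhs(shares: int, seconds: float) -> float:
    """Hash rate in MH/s inferred from the number of difficulty-1 shares found."""
    return shares * HASHES_PER_DIFF1_SHARE / seconds / 1e6


def nominal_rate_mhs(cores: int, clock_mhz: float, hashes_per_clock: float = 1.0) -> float:
    """Hash rate in MH/s if every core completes hashes_per_clock hashes each cycle."""
    return cores * clock_mhz * hashes_per_clock


if __name__ == "__main__":
    nominal = nominal_rate_mhs(cores=3, clock_mhz=220)
    observed = 350.0  # MH/s, upper end of the share-based figure quoted above
    print(f"nominal {nominal:.0f} MH/s, observed/nominal = {observed / nominal:.2f}")
```

A ratio near 0.5 fits the "1 hash per 2 clocks" suspicion, but it would equally fit duplicated work on the host side, so the estimate alone cannot tell the two apart.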

I've got 2-3 compiles going at any given time atm (takes like 2 hrs to compile). I'll try it out on a Stratix V when I can.




kramble
March 18, 2013, 12:38:06 PM
 #651

Are you still using the original mine.tcl script? This does getwork at a fixed rate (every 20 secs, unless a share is submitted in the meantime), but at 660MH/s (which 3 cores at 220MHz should be doing), the nonce will wrap every 6 seconds, so you end up duplicating work and get a reduced actual throughput. Just a thought.
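The arithmetic behind the 6-second figure, as a short Python sketch. The helper names are made up, and `useful_fraction` is a deliberately crude model: it assumes work is fetched only on the fixed timer and ignores the extra getwork that happens whenever a share is submitted.

```python
def nonce_wrap_seconds(rate_mhs: float, nonce_bits: int = 32) -> float:
    """Seconds needed to exhaust the whole nonce space at a given hash rate."""
    return (1 << nonce_bits) / (rate_mhs * 1e6)


def useful_fraction(rate_mhs: float, getwork_interval_s: float) -> float:
    """Fraction of hashing that is not duplicated if new work only arrives
    every getwork_interval_s seconds."""
    return min(1.0, nonce_wrap_seconds(rate_mhs) / getwork_interval_s)


if __name__ == "__main__":
    rate = 3 * 220  # MH/s, three cores at 220 MHz
    print(f"nonce wraps every {nonce_wrap_seconds(rate):.2f} s")
    for interval in (20, 6, 3):
        print(f"getwork every {interval:>2} s: {useful_fraction(rate, interval):.0%} useful work")
```

Any getwork interval shorter than the wrap time avoids duplicated nonces at this hash rate.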

Mark

senseless
March 18, 2013, 01:03:02 PM
 #652

Are you still using the original mine.tcl script? This does getwork at a fixed rate (every 20 secs, unless a share is submitted in the meantime), but at 660MH/s (which 3 cores at 220MHz should be doing), the nonce will wrap every 6 seconds, so you end up duplicating work and get a reduced actual throughput. Just a thought.

Mark

Thank you, I was even showing duplicate submissions on the pool. I set it to 3 and now I'm pushing a 600 MH/s submission rate!

Now if I can just find 8% more resources so I can add in 1 more core I'll meet BFL single original specs!




kramble
March 18, 2013, 01:17:24 PM
 #653

Now if I can just find 8% more resources so I can add in 1 more core I'll meet BFL single original specs!

If not, you could probably squeeze in an extra half-core, though it would make apportioning work a bit trickier.

Mark

senseless
March 18, 2013, 04:23:43 PM
Last edit: March 18, 2013, 08:32:30 PM by senseless
 #654

Average of 670Mh/s based on my submissions over the last 1240 share submissions. Thanks again, the TCL was definitely the problem.

I started compiling a stratix V with 4 cores to see what the fmax would be.

Fmax was 147 MHz. There were some routing issues and it had to add in some delays to fit 4 cores. There also seemed to be some weird clocking things going on in the Stratix V with some sort of fractional clock. I'm not sure what that was all about; probably some sort of EP5-specific features that would improve performance if the design took advantage of them.






kingcoin
March 19, 2013, 08:02:08 AM
 #655

I would have expected the 28nm Stratix V to be faster. Maybe the new ALM structure does not map that well into the sha logic, at least with the current version of Quartus.
senseless
March 19, 2013, 09:10:40 AM
 #656

I would have expected the 28nm Stratix V to be faster. Maybe the new ALM structure does not map that well into the sha logic, at least with the current version of Quartus.

That was under Quartus 11.1, so not even the latest Quartus. I would also expect it to clock higher, but it seems the Stratix Vs are pretty significantly different from previous generations.

Do you have a supply of Stratix Vs? What sort of pricing are you getting for them?







kingcoin
March 19, 2013, 09:50:11 AM
 #657

No, I don't have a supply of Stratix Vs. I was just curious about their mining performance.
kingcoin
March 23, 2013, 01:56:22 PM
 #658

I was playing with the multicore FPGA design and I get high local rates (as expected), but the estimated rate is low. Is this because the mine.tcl script does not implement long polling?

[03/23/2013 14:52:55] 300.00 MH/s (~54.09 MH/s) [Rej: 0/42 (0.00%)]

kingcoin
March 24, 2013, 05:40:25 AM
 #659

The testbench contains a block of test data for the simulation:

https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner/blob/master/testbenches/test_data.txt
Code:
Valid hash examples:

Midstate: 90f741afb3ab06f1a582c5c85ee7a561912b25a7cd09c060a89b3c2a73a48e22
Data: 000000014cc2c57c7905fd399965282c87fe259e7da366e035dc087a0000141f000000006427b6492f2b052578fb4bc23655ca4e8b9e2b9b69c88041b2ac8c771571d1be4de695931a2694217a33330e000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
NONCE: 32'h0e33337a == 238,236,538


Verilog values:
Data: 512'h000002800000000000000000000000000000000000000000000000000000000000000000000000000000000080000000000000002194261a9395e64dbed17115
Midstate: 256'h228ea4732a3c9ba860c009cda7252b9161a5e75ec8c582a5f106abb3af41f790

The midstate is converted to big endian, but:
  • How was the data extracted/calculated?
  • In the testbench the data is set as a 512-bit value:

Code:
		uut.data_buf = 512'h000002800000000000000000000000000000000000000000000000000000000000000000000000000000000080000000000000002194261a9395e64dbed17115;
  but in the uut it's declared as a 256-bit value:

Code:
	//// Virtual Wire Control
reg [255:0] midstate_buf = 0, data_buf = 0;
    why is this?

  • How can a new set of data for data_buf and midstate_buf be found on the net or extracted from the BDB files?
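On the first question, the values in test_data.txt are consistent with a plain byte reversal: the Verilog midstate is the getwork midstate with its 32 bytes reversed, and the Verilog data is the second 64-byte chunk of the getwork data with its bytes reversed and the nonce word zeroed. The Python sketch below reproduces both from the getwork strings; the function names are illustrative, and the padding is rebuilt from its known pattern (a 0x80 marker, zeros, and the 640-bit message length) rather than pasted in full.

```python
GETWORK_MIDSTATE = "90f741afb3ab06f1a582c5c85ee7a561912b25a7cd09c060a89b3c2a73a48e22"

# First 80 bytes of the getwork "data" field: the block header, in getwork word order.
GETWORK_HEADER = (
    "000000014cc2c57c7905fd399965282c87fe259e7da366e035dc087a0000141f"
    "000000006427b6492f2b052578fb4bc23655ca4e8b9e2b9b69c88041b2ac8c77"
    "1571d1be4de695931a2694217a33330e"
)
# SHA-256 padding for an 80-byte message, as it appears in getwork data.
SHA_PADDING = "00000080" + "00000000" * 10 + "80020000"
GETWORK_DATA = GETWORK_HEADER + SHA_PADDING


def verilog_midstate(midstate_hex: str) -> str:
    """Byte-reverse the getwork midstate to get the 256'h literal."""
    return bytes.fromhex(midstate_hex)[::-1].hex()


def verilog_data(data_hex: str, zero_nonce: bool = True) -> str:
    """Byte-reverse the second 64-byte chunk of getwork data (512'h literal)."""
    chunk = bytes.fromhex(data_hex)[64:128]
    if zero_nonce:
        # Bytes 12..15 of the chunk hold the nonce; the miner fills it in itself.
        chunk = chunk[:12] + b"\x00" * 4 + chunk[16:]
    return chunk[::-1].hex()


def golden_nonce(data_hex: str) -> int:
    """Nonce stored in the getwork data, as the value the FPGA would report."""
    return int.from_bytes(bytes.fromhex(data_hex)[76:80], "little")


if __name__ == "__main__":
    print("Midstate: 256'h" + verilog_midstate(GETWORK_MIDSTATE))
    print("Data:     512'h" + verilog_data(GETWORK_DATA))
    print("Nonce:    32'h%08x" % golden_nonce(GETWORK_DATA))
```

On the width question, a likely explanation is ordinary Verilog truncation: assigning a 512-bit constant to a 256-bit reg keeps only the low 256 bits, and the low 96 bits (the last 24 hex digits, `2194261a9395e64dbed17115`) are the only part that varies between work units, since the rest is fixed padding plus the nonce.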
kingcoin
March 24, 2013, 06:01:53 AM
 #660

As for my last question, it seems like I can use blockchain.info, but I have to convert the timestamp and difficulty myself unless there is a site which dumps the block header in hex.
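Assembling the 80-byte header from the fields a block explorer shows can be sketched in Python as follows. It assumes the explorer gives the hashes in the usual display order, the time as a UTC date, and the difficulty as the compact "bits" value; the helper names are made up. The genesis block is used as a self-check because its field values and hash are well known.

```python
import hashlib
import struct
from datetime import datetime, timezone


def utc_to_unix(year, month, day, hour, minute, second) -> int:
    """Convert a UTC date and time, as shown on a block explorer, to Unix time."""
    return int(datetime(year, month, day, hour, minute, second,
                        tzinfo=timezone.utc).timestamp())


def build_header(version, prev_hash_hex, merkle_root_hex, unix_time, bits, nonce) -> bytes:
    """Serialize a block header: integers little-endian, hashes byte-reversed."""
    return (struct.pack("<I", version)
            + bytes.fromhex(prev_hash_hex)[::-1]
            + bytes.fromhex(merkle_root_hex)[::-1]
            + struct.pack("<III", unix_time, bits, nonce))


def block_hash_hex(header: bytes) -> str:
    """Double SHA-256 of the header, printed in the usual display order."""
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()


if __name__ == "__main__":
    genesis = build_header(
        version=1,
        prev_hash_hex="00" * 32,
        merkle_root_hex="4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
        unix_time=utc_to_unix(2009, 1, 3, 18, 15, 5),
        bits=0x1D00FFFF,
        nonce=2083236893,
    )
    print(genesis.hex())
    print(block_hash_hex(genesis))
```

The resulting hex string is the raw header; turning it into getwork-style data additionally requires swapping the bytes within each 32-bit word and appending the SHA-256 padding.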
