Bitcoin Forum
Author Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013)  (Read 432863 times)
Khertan
Full Member
***
Offline Offline

Activity: 193
Merit: 100


View Profile WWW
May 07, 2013, 09:35:27 PM
 #781

Two loops instead of 3 will improve the design by 33 percent. That's an incredible and awesome boost... which will give my DE0-Nano 10MH/s instead of 6.66MH/s (at 40MHz so as not to fry it). That's amazing!!!


:p Of course, that's more for fun and to learn :)

kramble
Sr. Member
****
Offline Offline

Activity: 384
Merit: 250



View Profile WWW
May 07, 2013, 10:02:09 PM
 #782

Two loops instead of 3 will improve the design by 33 percent. That's an incredible and awesome boost... which will give my DE0-Nano 10MH/s instead of 6.66MH/s (at 40MHz so as not to fry it). That's amazing!!!

I concluded that the power consumption is pretty much proportional to the hash rate. So, for example, 10MHash/sec will consume the same power (and get just as hot) whether running at 50MHz or using half the resources at 100MHz (to a first approximation anyway, as a faster clock should be slightly less efficient).
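
The trade-off above can be put into numbers with a small sketch. It assumes a simple model that is not taken from any of the miner sources: hash rate is the clock frequency divided by the number of clock cycles the design spends per hash, and halving the logic doubles that cycle count. The function name and the cycle counts are illustrative, chosen only to reproduce figures quoted in this thread.

```python
def hash_rate_mhs(clock_mhz, cycles_per_hash):
    """Hash rate in MH/s, assuming one finished hash every `cycles_per_hash` clocks."""
    return clock_mhz / float(cycles_per_hash)

# Two design points with the same throughput (cycle counts are assumptions):
full_design = hash_rate_mhs(50, 5)     # more logic, slower clock
half_design = hash_rate_mhs(100, 10)   # half the logic, twice the clock

# The DE0-Nano figures quoted above fit the same model at 40MHz:
three_loops = hash_rate_mhs(40, 6)     # about 6.67 MH/s
two_loops = hash_rate_mhs(40, 4)       # 10 MH/s

print(full_design, half_design, three_loops, two_loops)
```

If power really does track hash rate to a first approximation, both 10MH/s design points land at roughly the same power draw, which is the point being made above.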

I've gone back to look at makomk's code (he's uploaded something recently to http://www.makomk.com/gitweb/?p=Open-Source-FPGA-Bitcoin-Miner.git;a=tree;h=refs/heads/de0-nano-usb;hb=de0-nano-usb ), so I thought I'd give it a try (swapping out his USB interface for my serial code). It's still compiling after 2 hours (only just failing to route at the last attempt, just one signal short!). It will be interesting to see how fast it will run (assuming it does finish compiling!).

Github https://github.com/kramble BLC BkRaMaRkw3NeyzsZ2zUgXsNLogVVkQ1iPV
fpgaminer (OP)
Hero Member
*****
Offline Offline

Activity: 560
Merit: 517



View Profile WWW
May 08, 2013, 03:50:05 AM
 #783

Quote
I'd like to ask what optimization options I need to use to achieve a >190MHz clock speed. Please help me, thanks very much.
The project won't "just compile" and achieve >190MHz.  Getting timing that high requires using Xilinx's SmartXplorer to brute force it.

Khertan
Full Member
***
Offline Offline

Activity: 193
Merit: 100


View Profile WWW
May 08, 2013, 06:32:04 AM
 #784

Two loops instead of 3 will improve the design by 33 percent. That's an incredible and awesome boost... which will give my DE0-Nano 10MH/s instead of 6.66MH/s (at 40MHz so as not to fry it). That's amazing!!!

I concluded that the power consumption is pretty much proportional to the hash rate. So, for example, 10MHash/sec will consume the same power (and get just as hot) whether running at 50MHz or using half the resources at 100MHz (to a first approximation anyway, as a faster clock should be slightly less efficient).

I've gone back to look at makomk's code (he's uploaded something recently to http://www.makomk.com/gitweb/?p=Open-Source-FPGA-Bitcoin-Miner.git;a=tree;h=refs/heads/de0-nano-usb;hb=de0-nano-usb ), so I thought I'd give it a try (swapping out his USB interface for my serial code). It's still compiling after 2 hours (only just failing to route at the last attempt, just one signal short!). It will be interesting to see how fast it will run (assuming it does finish compiling!).

PowerPlay estimates lower power usage for two loops at 40MHz than for three at 50MHz, and I think you will not be able to get more than 100MHz with two loops.

I use very aggressive fitter settings, with an effort multiplier of 40; that's 2 hours of fitting :)

xbaby
Newbie
*
Offline Offline

Activity: 16
Merit: 0


View Profile
May 08, 2013, 11:12:48 AM
 #785

Quote
I'd like to ask what optimization options I need to use to achieve a >190MHz clock speed. Please help me, thanks very much.
The project won't "just compile" and achieve >190MHz.  Getting timing that high requires using Xilinx's SmartXplorer to brute force it.

Thanks for your hint. I've already tried SmartXplorer with the 7 default built-in strategies, but can't get a result above 160MHz. So, you mean I should use the cost table method to brute force it? Thanks.
kramble
Sr. Member
****
Offline Offline

Activity: 384
Merit: 250



View Profile WWW
May 08, 2013, 01:12:52 PM
Last edit: May 08, 2013, 02:28:50 PM by kramble
 #786

I use very aggressive fitter settings, with an effort multiplier of 40; that's 2 hours of fitting :)

Thanks for the tip, I've been using the default settings so far but I'll give the more aggressive ones a try.

Makomk's code did eventually compile (for a 120MHz clock) and gave an fmax of 123MHz at 85C. This should be giving 30MHash/s, though I'm not convinced I'm seeing that in practice. Possibly the FPGA is running a bit too hot, though I'm not seeing any bad hashes. I'll have to run it a bit longer to be certain.

[EDIT] It's actually working perfectly. I cranked it up to 140MHz and it seems quite stable, pushing out 35MHash/sec! Not bad at all for a DE0-Nano. Cheers makomk :D

Regards
Mark

Github https://github.com/kramble BLC BkRaMaRkw3NeyzsZ2zUgXsNLogVVkQ1iPV
fpgaminer (OP)
Hero Member
*****
Offline Offline

Activity: 560
Merit: 517



View Profile WWW
May 09, 2013, 12:27:46 AM
 #787

Quote
Thanks for your hint. I've already tried SmartXplorer with the 7 default built-in strategies, but can't get a result above 160MHz. So, you mean I should use the cost table method to brute force it? Thanks.
Yup.  For reference, the released bitstreams took days/weeks to compile.
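
The brute-force idea can be sketched as a loop over placer cost tables that keeps the best timing result and stops as soon as timing is met. This is only an outline under stated assumptions: `run_build` is a hypothetical callable that would launch the ISE flow for one cost table and return the worst slack in picoseconds, the 1 to 100 range is assumed rather than confirmed here, and `fake_build` is a stand-in so the sketch runs without any Xilinx tools.

```python
def sweep_cost_tables(run_build, tables, target_slack_ps=0):
    """Try each cost table in turn; return (table, slack_ps) for the best run.

    Stops early once a run meets the timing target.
    """
    best = None
    for table in tables:
        slack_ps = run_build(table)  # hypothetical: one full implementation run
        if best is None or slack_ps > best[1]:
            best = (table, slack_ps)
        if slack_ps >= target_slack_ps:
            break
    return best

def fake_build(table):
    """Stand-in for a real build: pretends slack varies with the cost table."""
    return -500 + 100 * (table % 7)

best_table, best_slack = sweep_cost_tables(fake_build, range(1, 101))
print(best_table, best_slack)
```

With real builds each iteration takes hours, so a sweep like this would normally be spread across several machines, which fits the days-to-weeks compile times mentioned above.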

iidx
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
May 09, 2013, 06:25:56 AM
 #788

Quote
Thanks for your hint. I've already tried SmartXplorer with the 7 default built-in strategies, but can't get a result above 160MHz. So, you mean I should use the cost table method to brute force it? Thanks.
Yup.  For reference, the released bitstreams took days/weeks to compile.

xbaby,

You can also try to floorplan the DSP48s if you want to cut your runtime.  To get the boards I have with V6 130Ts to run at 300 MHz, I had to constrain each of the DSP48s, otherwise there was no chance.  This was based on the original verilog port, but I'm sure the problem with no pre-placement is the same.
AJRGale
Hero Member
*****
Offline Offline

Activity: 767
Merit: 500



View Profile
May 09, 2013, 06:29:38 AM
 #789

I use very aggressive fitter settings, with an effort multiplier of 40; that's 2 hours of fitting :)

Thanks for the tip, I've been using the default settings so far but I'll give the more aggressive ones a try.

Makomk's code did eventually compile (for a 120MHz clock) and gave an fmax of 123MHz at 85C. This should be giving 30MHash/s, though I'm not convinced I'm seeing that in practice. Possibly the FPGA is running a bit too hot, though I'm not seeing any bad hashes. I'll have to run it a bit longer to be certain.

[EDIT] It's actually working perfectly. I cranked it up to 140MHz and it seems quite stable, pushing out 35MHash/sec! Not bad at all for a DE0-Nano. Cheers makomk :D

Regards
Mark

Wow, now I'm in. 35MH/s = $5 a month... at a max of 5W? Now, if I were to replace my current setup, which is pulling 200W, I'd need 100 of these, and that would beat my 190MH/s setup! (35 x 100 = 3500MH/s!!) And that's just the DE0-Nanos!!

Now, where's my $10,000...
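
For what it's worth, the arithmetic in this post can be checked with a few lines. The sketch below uses only the numbers quoted above and treats the 5W per board as the guess it is, not as a measurement; the function and variable names are made up for illustration.

```python
import math

def mhs_per_watt(hash_rate_mhs, power_w):
    """Efficiency in MH/s per watt."""
    return hash_rate_mhs / float(power_w)

nano_efficiency = mhs_per_watt(35, 5)       # DE0-Nano, assuming 5W per board
rig_efficiency = mhs_per_watt(190, 200)     # the existing 190MH/s, 200W setup

boards = 100
farm_rate_mhs = 35 * boards                 # total hash rate of 100 boards
farm_power_w = 5 * boards                   # total power at the assumed 5W each

# Boards needed just to match the existing 190MH/s setup
boards_to_match = int(math.ceil(190 / 35.0))

print(nano_efficiency, rig_efficiency, farm_rate_mhs, farm_power_w, boards_to_match)
```

On these figures, 100 boards would draw about 500W rather than 200W, while matching the existing 190MH/s would only take 6 boards.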
kramble
Sr. Member
****
Offline Offline

Activity: 384
Merit: 250



View Profile WWW
May 09, 2013, 08:40:41 AM
 #790

Wow, now I'm in. 35MH/s = $5 a month... at a max of 5W? Now, if I were to replace my current setup, which is pulling 200W, I'd need 100 of these, and that would beat my 190MH/s setup! (35 x 100 = 3500MH/s!!) And that's just the DE0-Nanos!!

Now, where's my $10,000...

Yes, quite! In the 6 months I've been tinkering I've mined the glorious sum of 0.4BTC. I was sort of hoping to get up to a whole bitcoin eventually (I rather fancied one of those shiny physical coins as a keepsake), but that now seems rather forlorn.

Still, I already had the kit, which (as I explained way back up the thread), I obtained in a fit of enthusiasm for rekindling the electronics hobbyist days of my youth, and all I've invested is my time (of which I have a lot spare at the moment), and a little electricity. It was fun though, so no regrets.

Anyway, this is rather derailing fpgaminer's thread with my chattering, so I'll shut up now.

TTFN
Mark

Github https://github.com/kramble BLC BkRaMaRkw3NeyzsZ2zUgXsNLogVVkQ1iPV
xbaby
Newbie
*
Offline Offline

Activity: 16
Merit: 0


View Profile
May 09, 2013, 01:21:50 PM
 #791

Quote
Thanks for your hint. I've already tried SmartXplorer with the 7 default built-in strategies, but can't get a result above 160MHz. So, you mean I should use the cost table method to brute force it? Thanks.
Yup.  For reference, the released bitstreams took days/weeks to compile.

xbaby,

You can also try to floorplan the DSP48s if you want to cut your runtime.  To get the boards I have with V6 130Ts to run at 300 MHz, I had to constrain each of the DSP48s, otherwise there was no chance.  This was based on the original verilog port, but I'm sure the problem with no pre-placement is the same.

Hi, thanks for your tips. I'm compiling the "X6000_ztex_comm4" project, which doesn't use any DSP48 blocks as far as I know. I also successfully compiled the same project for a V6 130T device (with minor fixes for the MMCM, FIFO, and JTAG core) and achieved at most 300MHz, the same as yours, but with no DSP48s. The compile time for the V6 device is much shorter than for the Spartan-6 LX150. I guess the long-route resources of the Virtex-6 make the difference.

Next, I want to try different implementation options to reach a higher target, such as 350MHz.

BTW, the power estimate given by ISE for the V6 130T @ 300MHz is about 10W. Below is the resource usage:

Code:
Device Utilization Summary:

Slice Logic Utilization:
  Number of Slice Registers:                85,173 out of 160,000   53%
    Number used as Flip Flops:              85,172
    Number used as Latches:                      1
    Number used as Latch-thrus:                  0
    Number used as AND/OR logics:                0
  Number of Slice LUTs:                     57,385 out of  80,000   71%
    Number used as logic:                   34,910 out of  80,000   43%
      Number using O6 output only:          14,978
      Number using O5 output only:             539
      Number using O5 and O6:               19,393
      Number used as ROM:                        0
    Number used as Memory:                   9,759 out of  27,840   35%
      Number used as Dual Port RAM:              0
      Number used as Single Port RAM:            0
      Number used as Shift Register:         9,759
        Number using O6 output only:         9,759
        Number using O5 output only:             0
        Number using O5 and O6:                  0
    Number used exclusively as route-thrus: 12,716
      Number with same-slice register load: 12,452
      Number with same-slice carry load:       264
      Number with other load:                    0

Slice Logic Distribution:
  Number of occupied Slices:                15,859 out of  20,000   79%
  Number of LUT Flip Flop pairs used:       62,383
    Number with an unused Flip Flop:         1,382 out of  62,383    2%
    Number with an unused LUT:               4,998 out of  62,383    8%
    Number of fully used LUT-FF pairs:      56,003 out of  62,383   89%
    Number of slice register sites lost
      to control set restrictions:               0 out of 160,000    0%
iidx
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
May 09, 2013, 05:25:01 PM
 #792

That's really interesting, I am not familiar with the ztex projects (they don't compile in my preferred synthesis tool, synplify pro).  I would not have expected it to run on the V6 with that usage @ 300 MHz without additional pipelining.  Maybe I'll take a look at that project and see what the big difference is.

I used the old Verilog port to get to 300 MHz on mine (adding DSPs and pipelining), but the power usage is around 7W due to the use of the DSPs.

IIDX

iidx
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
May 09, 2013, 05:26:23 PM
 #793

Oh, what speed grade did you use for the V6?  All my boards with 130s and 240s (ml605) are -1, so if you used -3 that could explain the big difference in the quality of the results.
xbaby
Newbie
*
Offline Offline

Activity: 16
Merit: 0


View Profile
May 10, 2013, 01:23:12 AM
 #794

Oh, what speed grade did you use for the V6?  All my boards with 130s and 240s (ml605) are -1, so if you used -3 that could explain the big difference in the quality of the results.

My board has two 130T devices. The speed grade is -2I. So your design with DSP48s really does have some speed and power usage advantage. BTW, can your design be synthesized with XST?
iidx
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
May 10, 2013, 05:38:42 AM
 #795

Possibly, but I used some compiler directives to force SRLs and registers in certain situations so the design would fit.  In XST it infers too many registers or SRLs to fit properly, so some manual instantiation might be required.

I'm still surprised that the Ztex project would hit 300 MHz without extra pipeline stages.  I think that it might be better to add DSPs to that project instead?
xbaby
Newbie
*
Offline Offline

Activity: 16
Merit: 0


View Profile
May 10, 2013, 05:52:28 AM
 #796

The 300MHz result just means it compiled with no timing errors. In the next few days, I'll program it onto the board to see if it really runs perfectly.
senseless
Hero Member
*****
Offline Offline

Activity: 1118
Merit: 541



View Profile
May 11, 2013, 07:11:32 AM
Last edit: May 11, 2013, 07:46:03 AM by senseless
 #797

I've been asked a few times about a mining script for the current KC705 firmware.  I wrote a plugin for Modular Python Bitcoin Miner.  Here's the message I sent to someone about it:

Quote
I uploaded the custom MPBM module, which is compatible with the current KC705 mining code, here:
https://mega.co.nz/#!Oh5HTDRB!C0RLYW4yZN8gbg38FfgLpzmKFcseOql3Xx1i_gXTfdM

You'll want to download a copy of MPBM's testing branch.  Then extract the above archive into
Code:
modules/fpgamining
such that you end up with:

Code:
modules/fpgamining/kc705_uart/__init__.py
modules/fpgamining/kc705_uart/kc705uartworker.py

Once you start MPBM, you can add a KC705 Worker by opening the MPBM web interface (http://127.0.0.1:8832) and clicking the "Workers" button on the left.  On Windows, I ran MPBM under Cygwin, and the "Port" ended up being /dev/com2 for me.  The Baudrate is 115200.

~fpgaminer

I haven't had a chance to clean it up and put it on the repo yet.

Have you tested the code on Windows? I'm having a hell of a time trying to get mining. I tried as best I could, without knowing Python, to get it running, without much success. First I was getting a ton of indentation errors. The PyWin editor was telling me half the code was not indented properly. I think I fixed those successfully; now I'm getting the following errors. Any idea?

I was thinking it was a result of my Python setup in Windows [since another user was able to get it running under Linux on the VC707]. I tried 3.3 and 3.2 with the same errors on both.

Code:
2013-05-11 00:10:32.222 [100] KC705: Traceback (most recent call last):
  File "c:\FPGA Work\Scripts\mpm\modules\fpgamining\kc705_uart\kc705uartworker.py", line 201, in main
    self._sendjob(job)
  File "c:\FPGA Work\Scripts\mpm\modules\fpgamining\kc705_uart\kc705uartworker.py", line 391, in _sendjob
    self.handle.write(job.data[64:76].encode('hex') + job.midstate.encode('hex') + "\n")
AttributeError: 'bytes' object has no attribute 'encode'

2013-05-11 00:10:32.223 [100] KC705: Traceback (most recent call last):
  File "c:\FPGA Work\Scripts\mpm\modules\fpgamining\kc705_uart\kc705uartworker.py", line 323, in _listener
    data_buffer += self.handle.read(9)
TypeError: Can't convert 'bytes' object to str implicitly

When running the code as-is, without changing any of the indentation, I get:

Code:
2013-05-11 00:41:46.872 [300] Core: Could not load module fpgamining.kc705_uart: Traceback (most recent call last):
  File "c:\FPGA Work\Scripts\mpm\core\core.py", line 108, in __init__
    module = getattr(__import__("modules.%s" % maintainer, globals(), locals(), [module], 0), module)
  File "c:\FPGA Work\Scripts\mpm\modules\fpgamining\kc705_uart\__init__.py", line 1, in <module>
    from .kc705uartworker import KC705UARTWorker
  File "c:\FPGA Work\Scripts\mpm\modules\fpgamining\kc705_uart\kc705uartworker.py", line 324
    if '\n' not in data_buffer: continue
                                       ^
TabError: inconsistent use of tabs and spaces in indentation
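
A quick way to find the lines behind a TabError like the one above is to scan the file for tabs in the leading whitespace. This is a minimal sketch; the function name and the sample text are made up for illustration, and it only reports the offending lines rather than rewriting them.

```python
def lines_with_tab_indent(source_text):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(source_text.splitlines(), start=1):
        # Leading whitespace is whatever lstrip() would remove from the left.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            flagged.append(number)
    return flagged

# Small stand-in for a source file: line 3 is indented with a tab.
sample = "def f():\n    x = 1\n\tif x: return x\n    return 0\n"
print(lines_with_tab_indent(sample))
```

To check the real file, read its text with `open(path).read()` and pass it in. Whether each tab should become four or eight spaces depends on how the original author's editor was set up, so the flagged lines are best fixed by hand.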

fpgaminer (OP)
Hero Member
*****
Offline Offline

Activity: 560
Merit: 517



View Profile WWW
May 12, 2013, 08:44:52 AM
 #798

Quote
Have you tested the code on windows? Having a hell of a time trying to get mining.
Weird.  I have tested it on Cygwin, using Python 2.7.  Maybe Python 3 doesn't like the code?

senseless
Hero Member
*****
Offline Offline

Activity: 1118
Merit: 541



View Profile
May 12, 2013, 07:48:52 PM
 #799

Quote
Have you tested the code on windows? Having a hell of a time trying to get mining.
Weird.  I have tested it on Cygwin, using Python 2.7.  Maybe Python 3 doesn't like the code?

Working perfectly under cygwin with py 2.7, thank you.


fizzisist
Hero Member
*****
Offline Offline

Activity: 720
Merit: 525



View Profile WWW
May 13, 2013, 04:37:22 PM
 #800

There are a few tabs mixed in there. That should cause problems on some systems: lines 324, 331, 333, 334, 338. The 'bytes' issues are definitely due to using python3: http://docs.python.org/3.0/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit
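
For the 'bytes' errors, here is a minimal sketch of how the two failing statements from the tracebacks could be written so they work on both Python 2.7 and Python 3. The slice `data[64:76]`, the midstate, and the newline terminator come from the traceback; the helper names are hypothetical, and the rest of kc705uartworker.py is not shown in this thread, so treat this as an outline and not a tested patch.

```python
import binascii

def build_job_line(data, midstate):
    """Hex-encode the job fields as bytes, replacing str.encode('hex')."""
    return binascii.hexlify(data[64:76]) + binascii.hexlify(midstate) + b"\n"

def split_lines(data_buffer, chunk):
    """Append bytes read from the serial port; return (complete lines, remainder)."""
    data_buffer += chunk
    parts = data_buffer.split(b"\n")
    return parts[:-1], parts[-1]

# Dummy job data so the sketch runs without a KC705 attached
data = bytes(bytearray(range(80)))
midstate = b"\x00" * 32
line = build_job_line(data, midstate)
print(len(line))
```

The key change is keeping everything as bytes: the read buffer starts as `b""` and the newline check becomes `b"\n" not in data_buffer`, so no implicit bytes-to-str conversion is needed.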

Should be easy enough to fix that up, though. Throw the code up on github and I'm sure someone will do it. :)
