Bitcoin Forum
December 05, 2016, 02:46:57 AM *
Author Topic: Algorithmically placed FPGA miner: 255MH/s/chip, supports all known boards  (Read 109498 times)
BkkCoins (Hero Member)
January 01, 2012, 06:24:31 AM, #61

Are you seriously running all those instances? I hope not for too long...

rph (Full Member)
January 01, 2012, 07:47:17 PM, #62

Are you seriously running all those instances? I hope not for too long...

They're spot instances; it's about $7/hr to run 25 of them & they're started/stopped on demand.
Definitely worth it in terms of build time reduction.

-rph

Ultra-Low-Cost DIY FPGA Miner: https://bitcointalk.org/index.php?topic=44891
eldentyrell (Donator, Legendary)
January 02, 2012, 08:25:38 PM, #63

New plot. Two rings, 161 MHz. As you can see, I'm getting closer to being able to cram that third ring in there.


The printing press heralded the end of the Dark Ages and made the Enlightenment possible, but it took another three centuries before any country managed to put freedom of the press beyond the reach of legislators.  So it may take a while before cryptocurrencies are free of the AML-NSA-KYC surveillance plague.
eldentyrell (Donator, Legendary)
January 02, 2012, 08:26:36 PM, #64

heh, I'm still working on that..

-rph

Amazon EC2 FTW!


DeepBit (Donator, Hero Member)
January 02, 2012, 08:28:21 PM, #65

New plot. Two rings, 161 MHz. As you can see, I'm getting closer to being able to cram that third ring in there.
Does it mean that you are getting 320 MH/s per chip?

Welcome to my bitcoin mining pool: https://deepbit.net ~ 3600 GH/s, Both payment schemes, instant payout, no invalid blocks !
Coming soon: ICBIT Trading platform
eldentyrell (Donator, Legendary)
January 02, 2012, 08:54:13 PM, #66

New plot. Two rings, 161 MHz. As you can see, I'm getting closer to being able to cram that third ring in there.
Does it mean that you are getting 320 MH/s per chip?

No.

First off, I haven't yet succeeded in cramming in the third ring, so this is still hypothetical.  I want to be very clear about that, although as you can see I'm obviously making major progress in that direction.

Secondly, each ring computes a hash every two clock cycles -- each nonce goes through the ring twice before we know if it is a share or not.  This is because the "sweet spot" in unrolling is 64 stages -- unroll less than that and you can't hardwire the K-values into the LUTs.  Unrolling any more than that adds no advantage, and reduces the "granularity" -- greater chance of being left with lots of empty space but still not quite enough for another ring.

So the calculation is hash_rate = num_rings*clock_rate*0.5.
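For anyone who wants to plug in numbers, here is a minimal sketch of that formula in Python. The function name is made up for illustration; the ring counts and clock rates are the ones mentioned in this thread (two rings at 161 MHz today, a hypothetical third ring, and a hypothetical 200 MHz clock).

```python
def hash_rate_mhs(num_rings: int, clock_mhz: float) -> float:
    """Hash rate in MH/s when each ring finishes one hash every two clock cycles."""
    return num_rings * clock_mhz * 0.5


# Scenarios discussed in the thread: (number of rings, clock in MHz).
for rings, clock in [(2, 161), (3, 161), (3, 200)]:
    print(f"{rings} rings @ {clock} MHz -> {hash_rate_mhs(rings, clock)} MH/s")
```

Note that with two rings the hash rate in MH/s simply equals the clock rate in MHz, which is why two rings at 161 MHz is not 320 MH/s.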


BTCurious (Hero Member)
January 02, 2012, 09:14:33 PM, #67

Would it be possible to have 2 rings which compute once per 2 cycles, like you have now, and one ring that computes once per 4 cycles? I imagine one that computes once per 4 cycles might be smaller, so you may be able to get it on there?

ZedZedNova (Full Member)
January 02, 2012, 09:58:21 PM, #68

Secondly, each ring computes a hash every two clock cycles -- each nonce goes through the ring twice before we know if it is a share or not.

I'm new enough to both Bitcoin and FPGA design (I know some folks who design, but do not design myself) that I'm probably missing something pretty obvious, but is there any benefit of using the first ring to feed the second ring?

Is there any benefit to, or possibility of, moving the blue ring "up" so that the part that jogs up and to the left is in the top left corner? How about rotating the green part such that the part that jogs up and to the left is jogging down and to the right and then located in the lower right corner?

Not knowing the architecture and layout of the target device is driving the second set of questions.

As I said I'm new to FPGA design, but I find it very interesting, and I'm interested in learning. If the questions are "stupid noob" questions, tell me and point me in a direction to go read so I can learn, and I'll go back to lurking. I understand the basic low level components, flip-flops, LUT, logic, etc., but not the FPGA design and layout specifics.

Thanks,

- Zed

eldentyrell (Donator, Legendary)
January 02, 2012, 10:03:49 PM, #69

Would it be possible to have 2 rings which compute once per 2 cycles, like you have now, and one ring that computes once per 4 cycles? I imagine one that computes once per 4 cycles might be smaller, so you may be able to get it on there?

Well, it would be smaller, but significantly larger than half-size.  Remember, if you unroll less than 64 stages, you can't hardwire the K-values.  So the 4-cycles-per-hash ring would be unrolled only 32 stages.  Each stage would have to know to switch K-value on odd and even cycles, which adds logic, and I wouldn't be able to precompute nearly as much stuff.  It would also take a lot of effort to rework the design.  I don't think it's a net win.
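To make the K-value point concrete: SHA-256 uses 64 round constants, one per round, defined as the first 32 bits of the fractional parts of the cube roots of the first 64 primes. The sketch below derives them and lists which constants each pipeline stage would need with a 64-stage versus a 32-stage unroll. The helper names and the stage-to-round mapping for the 32-stage case (stage i handling rounds i and i + 32) are assumptions for illustration, not taken from eldentyrell's design.

```python
def first_primes(count):
    """Return the first `count` primes by simple trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def icbrt(n):
    """Integer cube root: the largest x with x**3 <= n (binary search)."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


# floor(cbrt(p) * 2**32) == icbrt(p * 2**96); masking keeps the fractional 32 bits.
K = [icbrt(p << 96) & 0xFFFFFFFF for p in first_primes(64)]


def constants_per_stage(num_stages):
    """Map each stage to the round constants it must supply (assumed mapping)."""
    return {s: [K[r] for r in range(s, 64, num_stages)] for s in range(num_stages)}


full = constants_per_stage(64)  # one fixed constant per stage: can be hardwired
half = constants_per_stage(32)  # two constants per stage: needs selection logic
print([hex(k) for k in full[0]])
print([hex(k) for k in half[0]])
```

With 64 stages every stage sees exactly one constant, so it can be folded into the logic; with 32 stages each stage has to pick between two, which is the extra logic described above.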

BTCurious (Hero Member)
January 02, 2012, 10:06:18 PM, #70

Would it be possible to have 2 rings which compute once per 2 cycles, like you have now, and one ring that computes once per 4 cycles? I imagine one that computes once per 4 cycles might be smaller, so you may be able to get it on there?

Well, it would be smaller, but significantly larger than half-size.  Remember, if you unroll less than 64 stages, you can't hardwire the K-values.  So the 4-cycles-per-hash ring would be unrolled only 32 stages.  Each stage would have to know to switch K-value on odd and even cycles, which adds logic, and I wouldn't be able to precompute nearly as much stuff.  It would also take a lot of effort to rework the design.  I don't think it's a net win.
Ah, fair enough. It may be something to keep in mind if you really can't cram a third one on there, though, assuming a 4-cycle ring is still smaller than a 2-cycle one. Or go with your earlier idea of putting half of one on there and using it in conjunction with another FPGA, perhaps.

eldentyrell (Donator, Legendary)
January 02, 2012, 10:07:22 PM, #71

is there any benefit of using the first ring to feed the second ring?

Not really.  And it would add more special cases... if I get to three rings, I'd have one ring that expects to feed somebody else, one ring that expects to be fed by somebody else, and one ring that expects to feed itself -- three different designs!  Increased debugging/design effort.

Is there any benefit to, or possibility of, moving the blue ring "up" so that the part that jogs up and to the left is in the top left corner?

That's what I'm working on right now.  You'll notice I left a "divot" in the top row right where that funny chunk of empty black space is (I think that's where Xilinx puts the JTAG and configuration logic, which is why you can't use that area).

How about rotating the green part such that the part that jogs up and to the left is jogging down and to the right and then located in the lower right corner?

Yep, that's the other part I'm working on.

Dexter770221 (Legendary)
January 03, 2012, 10:16:09 PM, #72

You have to be good at chess ;)

Under development Modular UPGRADEABLE Miner (MUM). Looking for investors.
Changing one PCB with screwdriver and you have brand new miner in hand... Plug&Play, scalable from one module to thousands.
ZedZedNova (Full Member)
January 04, 2012, 01:34:14 AM, #73

is there any benefit of using the first ring to feed the second ring?

Not really.  And it would add more special cases... if I get to three rings, I'd have one ring that expects to feed somebody else, one ring that expects to be fed by somebody else, and one ring that expects to feed itself -- three different designs!  Increased debugging/design effort.

OK, makes sense.

I was thinking three identical rings with one input and some selector logic. Each ring would always be fed by the selector, and would always output to the selector. The selector could use:

  • In from External (new share)
  • In from Internal (from another ring, 1st sha256 complete)
  • Out to External (2nd sha256 complete)

The selector would need to know when there is an available ring to route the next share, and whether the share that is being routed has 0, 1, or 2 sha256 operations computed.

But the more I think about this, it really boils down to this: each ring computes the first hash, then feeds itself that result and computes the second hash, which it then reports as complete. So the selector logic would add overhead (delay) and complexity, and provide nothing useful. Right?
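For reference, the "hash, then hash the result" structure is easy to see in software. This is a minimal sketch using Python's hashlib; the function names are made up for illustration, and the header fields are the publicly known values of the Bitcoin genesis block. It only shows the two SHA-256 passes; it says nothing about how the rings schedule them in hardware.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice: the second pass hashes the first pass's digest."""
    first = hashlib.sha256(data).digest()
    return hashlib.sha256(first).digest()


def build_header(version, prev_hash_hex, merkle_root_hex, timestamp, bits, nonce):
    """Serialize an 80-byte block header (integers and hashes are little-endian)."""
    return (
        struct.pack("<I", version)
        + bytes.fromhex(prev_hash_hex)[::-1]
        + bytes.fromhex(merkle_root_hex)[::-1]
        + struct.pack("<III", timestamp, bits, nonce)
    )


# Genesis block fields (well-known public values).
genesis = build_header(
    version=1,
    prev_hash_hex="00" * 32,
    merkle_root_hex="4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    timestamp=1231006505,
    bits=0x1D00FFFF,
    nonce=2083236893,
)

# Block hashes are conventionally displayed byte-reversed.
block_hash = double_sha256(genesis)[::-1].hex()
print(block_hash)
```

A miner varies the nonce field and repeats the double hash until the result is below the target, which shows up as a run of leading zeros in the displayed hash.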


Is there any benefit to, or possibility of, moving the blue ring "up" so that the part that jogs up and to the left is in the top left corner?

That's what I'm working on right now.  You'll notice I left a "divot" in the top row right where that funny chunk of empty black space is (I think that's where Xilinx puts the JTAG and configuration logic, which is why you can't use that area).

Cool. I saw the divot and it makes complete sense.


How about rotating the green part such that the part that jogs up and to the left is jogging down and to the right and then located in the lower right corner?

Yep, that's the other part I'm working on.

Sweet!

So if this works out, running at 200MHz would yield ~300 MH/s, right? 150% of the device's operating frequency.

- Zed

sadpandatech (Hero Member)
January 04, 2012, 01:41:37 AM, #74


So if this works out, running at 200MHz would yield ~300 MH/s, right? 150% of the device's operating frequency.

- Zed

If he can get the rings to run at 200 MHz, sure. Otherwise,

So the calculation is hash_rate = num_rings*clock_rate*0.5.

~241.5 MH/s @ 161 MHz with three rings

A very worthwhile endeavour even at that rate though.

If you're not excited by the idea of being an early adopter 'now', then you should come back in three or four years and either tell us "Told you it'd never work!" or join what should, by then, be a much more stable and easier-to-use system. - GA
It is being worked on by smart people. -DamienBlack
BkkCoins (Hero Member)
January 04, 2012, 05:11:33 AM, #75

I've done my 2-layer board design now. Just waiting before re-checking and sending it off to make a few. Size is 50mm x 50mm (2"x2") and it is modular, so many can plug together in a chain/tree. Wouldn't mind feedback from experts (I'm not one! Just a hobbyist) if they'd like to see the design.

I'm wondering how much spare space is generally left over on the Ztex design and others. I want to add a couple of 8-bit registers and shift the nonce data in/out serially, so I would have to modify a working hash core. I'm just going to embark on the details of this now. D/L and install Xilinx DS.
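As a rough software model of the serial idea, the sketch below shifts a 32-bit nonce out one bit at a time and reassembles it on the other side. Everything here is an assumption for illustration: the function names, the MSB-first bit order, and the 32-bit width. It does not model the 8-bit registers or any particular board's wiring.

```python
def shift_out(value: int, width: int = 32):
    """Yield the bits of `value` one at a time, most significant bit first."""
    for i in range(width - 1, -1, -1):
        yield (value >> i) & 1


def shift_in(bits) -> int:
    """Rebuild an integer from a stream of bits received MSB first."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


nonce = 0x7C2BAC1D  # arbitrary example value
received = shift_in(shift_out(nonce))
print(hex(received))
```

The hardware equivalent is a shift register clocked once per bit, so moving one nonce costs 32 clock cycles on the serial link.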


Enigma81 (Full Member)
January 04, 2012, 05:37:35 AM, #76

I've done my 2-layer board design now. Just waiting before re-checking and sending it off to make a few. Size is 50mm x 50mm (2"x2") and it is modular, so many can plug together in a chain/tree. Wouldn't mind feedback from experts (I'm not one! Just a hobbyist) if they'd like to see the design.

I'm wondering how much spare space is generally left over on the Ztex design and others. I want to add a couple of 8-bit registers and shift the nonce data in/out serially, so I would have to modify a working hash core. I'm just going to embark on the details of this now. D/L and install Xilinx DS.



Take a look at the original fpgaminer code on github - it uses serial communication to communicate the nonce and 'golden hashes'..

VHDL https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner/tree/master/projects/VHDL_Xilinx_Port
Verilog https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner/tree/master/projects/Verilog_Xilinx_Port
Features: * Uses RS232 for communication with PC. * Compatible with ISE and Xilinx devices. * Python scripts act as the controller on the PC.

Enigma
BTCurious (Hero Member)
January 04, 2012, 07:51:54 AM, #77

What limits the clock speed? Is it unreliable performance when it's too high? Can that be solved with higher voltage, like with overclocking?

Enigma81 (Full Member)
January 04, 2012, 08:13:28 AM, #78

Typically, what limits FPGA timing is the routing of the interconnects. An FPGA is configurable, but not infinitely so. There are only so many possible paths from one LUT to the next. When people speak of PAR, that's the Placement and Routing of these interconnects.

Each interconnect introduces some type of delay - there is no such thing as a zero latency interconnect.  There is some path delay, some rise and fall time of the signal, etc.

The design max speed will be limited by the slowest of all the interconnects. If PAR manages to place and route them all with 5ns delay (200MHz), but there is one single connection that has a 20ns delay (50MHz), then the max speed of the entire design will be 50MHz. eldentyrell is manually placing and routing the entire design to try to avoid there being a weak link. Automatic PAR is pretty good, but it isn't perfect. I have no doubt that eldentyrell will be able to out-route the automated PAR, but it's a LOT of work. I can't even imagine the number of hours he has put into this.
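The weak-link arithmetic can be sketched in a few lines of Python. The function name and the list of delays are made up for illustration; the 5 ns and 20 ns figures are the ones from the example above.

```python
def max_clock_mhz(path_delays_ns):
    """Maximum clock in MHz: the slowest path sets the minimum clock period."""
    worst_ns = max(path_delays_ns)
    return 1000.0 / worst_ns  # 1 / (ns) expressed in MHz


all_fast = [5.0, 5.0, 5.0, 5.0]
one_slow = [5.0, 5.0, 5.0, 20.0]
print(max_clock_mhz(all_fast))
print(max_clock_mhz(one_slow))
```

Thousands of paths at 5 ns do not help if a single one needs 20 ns, which is why the worst path is the one worth hand-optimizing.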

For reference, the ztex design is currently limited to about 200MHz, but is using just about the entire chip for one double SHA-256 core.  eldentyrell is up to (I think) about 160MHz, but is using far less of the chip - hopefully leaving room for another single SHA-256 round.  The work he has done is really impressive - I honestly didn't think he would get as far as he has.  He must be an incredibly capable FPGA designer.

Enigma
BkkCoins (Hero Member)
January 04, 2012, 10:27:43 AM, #79

Take a look at the original fpgaminer code on github - it uses serial communication to communicate the nonce and 'golden hashes'..

VHDL https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner/tree/master/projects/VHDL_Xilinx_Port
Verilog https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner/tree/master/projects/Verilog_Xilinx_Port
Features: * Uses RS232 for communication with PC. * Compatible with ISE and Xilinx devices. * Python scripts act as the controller on the PC.

Enigma
Thanks! I'll do that. I only took a cursory look thru the Ztex core to get an idea how it gets data in and out. Is there any noticeable difference in performance/compactness between VHDL and Verilog? I haven't done either for several years and so I have to brush up but I always tended to favour the Verilog as I found it easier to follow and write. So that would be my preference. I was looking today at the interface code and it seems like it'll be easy to alter it to use serial I/O. My worry is about synthesis and placement being sub-optimal afterwards. Anyway, I should probably have my own thread now.

BkkCoins (Hero Member)
January 05, 2012, 06:14:04 AM, #80

Are you seriously running all those instances? I hope not for too long...

They're spot instances; it's about $7/hr to run 25 of them & they're started/stopped on demand.
Definitely worth it in terms of build time reduction.

-rph

How do you split up the job into multiple parts for each instance? I'm just running my first implementation now on my laptop (C2D T5450, 2GB RAM); needless to say, it's quite slow. So far 3 hours and still 14,000 unrouted. I've dug up some docs on using the command line and could probably set myself up on a fast spot instance. I'm just not sure how it can work on multiple. It looks like it's the "place and route" (par) step that really needs the muscle.

Edit: Whoa. I guess I should have expected it slows down as it gets harder to route the end.
