Algorithmically placed FPGA miner: 255MH/s/chip, supports all known boards

BkkCoins

Hero Member

Offline

Activity: 784
Merit: 1009

firstbits:1MinerQ

Re: FPGA Chip Plot Thread

January 01, 2012, 06:24:31 AM

#61

Are you seriously running all those instances? I hope not for too long...

Klondike Design (store) - Klondike OS ASIC Mining Platform - Development Donations Appreciated.

rph

Full Member

Offline

Activity: 176
Merit: 100

Re: FPGA Chip Plot Thread

January 01, 2012, 07:47:17 PM
Last edit: January 01, 2012, 08:14:32 PM by rph

#62

Quote from: BkkCoins on January 01, 2012, 06:24:31 AM

Are you seriously running all those instances? I hope not for too long...

They're spot instances; it's about $7/hr to run 25 of them & they're started/stopped on demand.
Definitely worth it in terms of build time reduction.

-rph

Ultra-Low-Cost DIY FPGA Miner: https://bitcointalk.org/index.php?topic=44891

eldentyrell (OP)

Donator
Legendary

Offline

Activity: 1016
Merit: 1005

felonious vagrancy, personified

Re: FPGA Chip Plot Thread

January 02, 2012, 08:25:38 PM

#63

New plot. Two rings, 161 MHz. As you can see I'm getting closer to being able to cram that third ring in there.

The printing press heralded the end of the Dark Ages and made the Enlightenment possible, but it took another three centuries before any country managed to put freedom of the press beyond the reach of legislators. So it may take a while before cryptocurrencies are free of the AML-NSA-KYC surveillance plague.

eldentyrell (OP)

Donator
Legendary

Offline

Activity: 1016
Merit: 1005

felonious vagrancy, personified

Re: FPGA Chip Plot Thread

January 02, 2012, 08:26:36 PM

#64

Quote from: rph on January 01, 2012, 04:25:17 AM

heh, I'm still working on that..

-rph

Amazon EC2 FTW!

DeepBit

Donator
Hero Member

Offline

Activity: 532
Merit: 501

We have cookies

Re: FPGA Chip Plot Thread

January 02, 2012, 08:28:21 PM

#65

Quote from: eldentyrell on January 02, 2012, 08:25:38 PM

New plot. Two rings, 161 MHz. As you can see I'm getting closer to being able to cram that third ring in there.

Does it mean that you are getting 320 MH/s per chip?

Welcome to my bitcoin mining pool: https://deepbit.net ~ 3600 GH/s, Both payment schemes, instant payout, no invalid blocks !
Coming soon: ICBIT Trading platform

eldentyrell (OP)

Donator
Legendary

Offline

Activity: 1016
Merit: 1005

felonious vagrancy, personified

Re: FPGA Chip Plot Thread

January 02, 2012, 08:54:13 PM

#66

Quote from: DeepBit on January 02, 2012, 08:28:21 PM

Quote from: eldentyrell on January 02, 2012, 08:25:38 PM

New plot. Two rings, 161 MHz. As you can see I'm getting closer to being able to cram that third ring in there.

Does it mean that you are getting 320 MH/s per chip?

No.

First off, I haven't yet succeeded in cramming in the third ring, so this is still hypothetical. I want to be very clear about that, although as you can see I'm obviously making major progress in that direction.

Secondly, each ring computes a hash every two clock cycles -- each nonce goes through the ring twice before we know if it is a share or not. This is because the "sweet spot" in unrolling is 64 stages -- unroll less than that and you can't hardwire the K-values into the LUTs. Unrolling any more than that adds no advantage, and reduces the "granularity" -- greater chance of being left with lots of empty space but still not quite enough for another ring.

So the calculation is hash_rate = num_rings*clock_rate*0.5.
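As a rough sketch of that formula in Python (the function name and the MH/s units are my own choices for illustration, not part of the design):

```python
def hash_rate_mhs(num_rings: int, clock_mhz: float) -> float:
    """Hash rate in MH/s: each ring completes one hash every two clock cycles."""
    return num_rings * clock_mhz * 0.5


# Two rings at 161 MHz, as in the current plot.
print(hash_rate_mhs(2, 161))  # 161.0

# Three rings at 161 MHz: still hypothetical, the third ring does not fit yet.
print(hash_rate_mhs(3, 161))  # 241.5
```

So two rings at 161 MHz come to about 161 MH/s, not 320.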

BTCurious

Hero Member

Offline

Activity: 714
Merit: 504

^SEM img of Si wafer edge, scanned 2012-3-12.

Re: FPGA Chip Plot Thread

January 02, 2012, 09:14:33 PM

#67

Would it be possible to have 2 rings which compute once per 2 cycles, like you have now, and one ring that computes once per 4 cycles? I imagine one that computes once per 4 cycles might be smaller, so you may be able to get it on there?

Bitcoin-OTC rating

ZedZedNova

Sr. Member

Offline

Activity: 475
Merit: 265

Ooh La La, C'est Zoom!

Re: FPGA Chip Plot Thread

January 02, 2012, 09:58:21 PM

#68

Quote from: eldentyrell on January 02, 2012, 08:54:13 PM

Secondly, each ring computes a hash every two clock cycles -- each nonce goes through the ring twice before we know if it is a share or not.

I'm new enough to both Bitcoin and FPGA design (I know some folks who design, but do not design myself) that I'm probably missing something pretty obvious, but is there any benefit to using the first ring to feed the second ring?

Is there any benefit to, or possibility of, moving the blue ring "up" so that the part that jogs up and to the left is in the top left corner? How about rotating the green part such that the part that jogs up and to the left is jogging down and to the right and then located in the lower right corner?

Not knowing the architecture and layout of the target device is driving the second set of questions.

As I said I'm new to FPGA design, but I find it very interesting, and I'm interested in learning. If the questions are "stupid noob" questions, tell me and point me in a direction to go read so I can learn, and I'll go back to lurking. I understand the basic low level components, flip-flops, LUT, logic, etc., but not the FPGA design and layout specifics.

Thanks,

- Zed

No mining at the moment.

eldentyrell (OP)

Donator
Legendary

Offline

Activity: 1016
Merit: 1005

felonious vagrancy, personified

Re: FPGA Chip Plot Thread

January 02, 2012, 10:03:49 PM

#69

Quote from: BTCurious on January 02, 2012, 09:14:33 PM

Well, it would be smaller, but significantly larger than half-size. Remember, if you unroll less than 64 stages, you can't hardwire the K-values. So the 4-cycles-per-hash ring would be unrolled only 32 stages. Each stage would have to know to switch K-value on odd and even cycles, which adds logic, and I wouldn't be able to precompute nearly as much stuff. It would also take a lot of effort to rework the design. I don't think it's a net win.
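To illustrate the K-value point, here is a small Python model. It assumes a partially unrolled ring would simply reuse each stage for several SHA-256 rounds; the function name and indexing scheme are hypothetical, not taken from the actual design.

```python
ROUNDS = 64  # SHA-256 has 64 rounds, each with its own constant K[0..63]


def k_index(stage: int, pass_number: int, stages_unrolled: int) -> int:
    """Which K constant a given stage needs on a given pass around the ring."""
    passes_per_hash = ROUNDS // stages_unrolled
    return (pass_number % passes_per_hash) * stages_unrolled + stage


# Fully unrolled (64 stages): stage 5 always sees K[5], so it can be hardwired.
print(sorted({k_index(5, p, 64) for p in range(4)}))  # [5]

# Half unrolled (32 stages): stage 5 alternates between K[5] and K[37],
# so it needs extra selection logic.
print(sorted({k_index(5, p, 32) for p in range(4)}))  # [5, 37]
```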

BTCurious

Hero Member

Offline

Activity: 714
Merit: 504

^SEM img of Si wafer edge, scanned 2012-3-12.

Re: FPGA Chip Plot Thread

January 02, 2012, 10:06:18 PM

#70

Quote from: eldentyrell on January 02, 2012, 10:03:49 PM

Quote from: BTCurious on January 02, 2012, 09:14:33 PM

Ah, fair enough. It may be something to keep in mind in case you really can't cram a third one on there though, assuming a 4-cycle ring is still smaller than a 2-cycle one. Or go with your earlier idea of putting half of one on there, and then using it in conjunction with another FPGA, perhaps.

Bitcoin-OTC rating

eldentyrell (OP)

Donator
Legendary

Offline

Activity: 1016
Merit: 1005

felonious vagrancy, personified

Re: FPGA Chip Plot Thread

January 02, 2012, 10:07:22 PM

#71

Quote from: ZedZedNova on January 02, 2012, 09:58:21 PM

is there any benefit of using the first ring to feed the second ring?

Not really. And it would add more special cases... if I get to three rings, I'd have one ring that expects to feed somebody else, one ring that expects to be fed by somebody else, and one ring that expects to feed itself -- three different designs! Increased debugging/design effort.

Quote from: ZedZedNova on January 02, 2012, 09:58:21 PM

Is there any benefit to, or possibility of, moving the blue ring "up" so that the part that jogs up and to the left is in the top left corner?

That's what I'm working on right now. You'll notice I left a "divot" in the top row right where that funny chunk of empty black space is (I think that's where Xilinx puts the JTAG and configuration logic, which is why you can't use that area).

Quote from: ZedZedNova on January 02, 2012, 09:58:21 PM

How about rotating the green part such that the part that jogs up and to the left is jogging down and to the right and then located in the lower right corner?

Yep, that's the other part I'm working on.

Dexter770221

Legendary

Offline

Activity: 1029
Merit: 1000

Re: FPGA Chip Plot Thread

January 03, 2012, 10:16:09 PM

#72

You have to be good at chess ;)

Under development Modular UPGRADEABLE Miner (MUM). Looking for investors.
Changing one PCB with screwdriver and you have brand new miner in hand... Plug&Play, scalable from one module to thousands.

ZedZedNova

Sr. Member

Offline

Activity: 475
Merit: 265

Ooh La La, C'est Zoom!

Re: FPGA Chip Plot Thread

January 04, 2012, 01:34:14 AM

#73

Quote from: eldentyrell on January 02, 2012, 10:07:22 PM

Quote from: ZedZedNova on January 02, 2012, 09:58:21 PM

is there any benefit of using the first ring to feed the second ring?

OK, makes sense.

I was thinking three identical rings with one input and some selector logic. Each ring would always be fed by the selector, and would always output to the selector. The selector could use:

In from External (new share)
In from Internal (from another ring, 1st sha256 complete)
Out to External (2nd sha256 complete)

The selector would need to know when there is an available ring to route the next share, and whether the share that is being routed has 0, 1, or 2 sha256 operations computed.

But the more I think about this, it really boils down to each ring computes the first hash, then feeds itself that result and computes the hash, which it then reports as complete. So the selector logic would add overhead (delay) and complexity, and provides nothing useful. Right?
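Here is a rough software analogy in Python of what I mean. It treats one trip around a ring as a single SHA-256 call, which is a simplification of the real hardware, and the selector function is purely hypothetical.

```python
import hashlib


def ring_pass(data: bytes) -> bytes:
    """One trip around a ring, modelled as a single SHA-256 (an assumption)."""
    return hashlib.sha256(data).digest()


def self_fed(work: bytes) -> bytes:
    """A ring computes the first hash, then feeds itself to compute the second."""
    return ring_pass(ring_pass(work))


def via_selector(work: bytes, num_rings: int = 3) -> bytes:
    """Hypothetical selector: route each pass to a different identical ring."""
    rings = [ring_pass] * num_rings
    first = rings[0](work)
    return rings[1 % num_rings](first)


work = b"example work unit"
print(self_fed(work) == via_selector(work))  # True
```

Both routes give the same double hash, so the selector only adds routing without changing the result.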

Quote from: eldentyrell on January 02, 2012, 10:07:22 PM

Quote from: ZedZedNova on January 02, 2012, 09:58:21 PM

Is there any benefit to, or possibility of, moving the blue ring "up" so that the part that jogs up and to the left is in the top left corner?

Cool. I saw the divot and it makes complete sense.

Quote from: eldentyrell on January 02, 2012, 10:07:22 PM

Quote from: ZedZedNova on January 02, 2012, 09:58:21 PM

How about rotating the green part such that the part that jogs up and to the left is jogging down and to the right and then located in the lower right corner?

Yep, that's the other part I'm working on.

Sweet!

So if this works out, running at 200 MHz would yield ~300 MH/s, right? 150% of the device's operating frequency.

- Zed

No mining at the moment.

sadpandatech

Hero Member

Offline

Activity: 504
Merit: 500

Re: FPGA Chip Plot Thread

January 04, 2012, 01:41:37 AM

#74

Quote from: ZedZedNova on January 04, 2012, 01:34:14 AM

So if this works out, running at 200 MHz would yield ~300 MH/s, right? 150% of the device's operating frequency.

- Zed

If he can get the rings to run at 200 MHz, sure. Otherwise,

So the calculation is hash_rate = num_rings*clock_rate*0.5.

~241.5 MH/s @ 161 MHz (with three rings)

A very worthwhile endeavour even at that rate though.

If you're not excited by the idea of being an early adopter 'now', then you should come back in three or four years and either tell us "Told you it'd never work!" or join what should, by then, be a much more stable and easier-to-use system.
- GA

It is being worked on by smart people. -DamienBlack

BkkCoins

Hero Member

Offline

Activity: 784
Merit: 1009

firstbits:1MinerQ

Re: FPGA Chip Plot Thread

January 04, 2012, 05:11:33 AM
Last edit: January 04, 2012, 01:59:14 PM by BkkCoins

#75

I've done my 2 Layer board design now. Just waiting before re-checking and sending it off to make a few. Size is 50mm x 50mm (2"x2") and is modular so many can plug together in a chain/tree. Wouldn't mind feedback from experts (I'm not one! Just a hobbyist) if they'd like to see design.

I'm wondering how much spare space is generally left over on the Ztex design and others. I want to add a couple 8 bit registers and shift the nonce data in/out serially so would have to modify a working hash core. I'm just going to embark on the details of this now. D/L and install Xilinx DS.

Klondike Design (store) - Klondike OS ASIC Mining Platform - Development Donations Appreciated.

Enigma81

Full Member

Offline

Activity: 180
Merit: 100

Re: FPGA Chip Plot Thread

January 04, 2012, 05:37:35 AM

#76

Quote from: BkkCoins on January 04, 2012, 05:11:33 AM

I've done my 2 Layer board design now. Just waiting before re-checking and sending it off to make a few. Size is 50mm x 50mm (2"x2") and is modular so many can plug together in a chain/tree. Wouldn't mind feedback from experts (I'm not one! Just a hobbyist) if they'd like to see design.

I'm wondering how much spare space is generally left over on the Ztex design and others. I want to add a couple 8 bit registers and shift the nonce data in/out serially so would have to modify a working hash core. I'm just going to embark on the details of this now. D/L and install Xilinx DS.

Take a look at the original fpgaminer code on GitHub. It uses serial communication to communicate the nonce and 'golden hashes'.

VHDL https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner/tree/master/projects/VHDL_Xilinx_Port
Verilog https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner/tree/master/projects/Verilog_Xilinx_Port
Features: * Uses RS232 for communication with PC. * Compatible with ISE and Xilinx devices. * Python scripts act as the controller on the PC.

Enigma

BTCurious

Hero Member

Offline

Activity: 714
Merit: 504

^SEM img of Si wafer edge, scanned 2012-3-12.

Re: FPGA Chip Plot Thread

January 04, 2012, 07:51:54 AM

#77

What limits the clock speed? Is it unreliable performance when it's too high? Can that be solved with higher voltage, like with overclocking?

Bitcoin-OTC rating

Enigma81

Full Member

Offline

Activity: 180
Merit: 100

Re: FPGA Chip Plot Thread

January 04, 2012, 08:13:28 AM

#78

Typically, what limits FPGA timing is the routing of the interconnects. An FPGA is configurable, but not infinitely so. There are only so many possible paths from one LUT to the next. When people speak of PAR, that's the Placement and Routing of these interconnects.

Each interconnect introduces some type of delay - there is no such thing as a zero latency interconnect. There is some path delay, some rise and fall time of the signal, etc.

The design's max speed will be limited by the slowest of all the interconnects. If PAR manages to place and route them all with 5 ns delay (200 MHz), but there is one single connection that has a 20 ns delay (50 MHz), then the max speed of the entire design will be 50 MHz. eldentyrell is manually placing and routing the entire design to try to avoid there being a weak link. Automatic PAR is pretty good, but it isn't perfect. I have no doubt that eldentyrell will be able to out-route the automated PAR, but it's a LOT of work. I can't even imagine the number of hours he has put into this.
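The weak-link arithmetic can be sketched in Python as follows. The function name and the list of path delays are made up for illustration; real timing reports contain far more detail.

```python
def max_clock_mhz(path_delays_ns: list[float]) -> float:
    """Max clock in MHz, set by the slowest (longest-delay) path in the design."""
    worst_ns = max(path_delays_ns)
    return 1000.0 / worst_ns  # a period of N ns corresponds to 1000/N MHz


# Every path routed with 5 ns delay: the design can run at 200 MHz.
print(max_clock_mhz([5.0, 5.0, 5.0]))  # 200.0

# One single 20 ns path drags the whole design down to 50 MHz.
print(max_clock_mhz([5.0, 5.0, 20.0]))  # 50.0
```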

For reference, the ztex design is currently limited to about 200MHz, but is using just about the entire chip for one double SHA-256 core. eldentyrell is up to (I think) about 160MHz, but is using far less of the chip - hopefully leaving room for another single SHA-256 round. The work he has done is really impressive - I honestly didn't think he would get as far as he has. He must be an incredibly capable FPGA designer.

Enigma

BkkCoins

Hero Member

Offline

Activity: 784
Merit: 1009

firstbits:1MinerQ

Re: FPGA Chip Plot Thread

January 04, 2012, 10:27:43 AM

#79

Quote from: Enigma81 on January 04, 2012, 05:37:35 AM

Thanks! I'll do that. I only took a cursory look thru the Ztex core to get an idea how it gets data in and out. Is there any noticeable difference in performance/compactness between VHDL and Verilog? I haven't done either for several years and so I have to brush up but I always tended to favour the Verilog as I found it easier to follow and write. So that would be my preference. I was looking today at the interface code and it seems like it'll be easy to alter it to use serial I/O. My worry is about synthesis and placement being sub-optimal afterwards. Anyway, I should probably have my own thread now.

Klondike Design (store) - Klondike OS ASIC Mining Platform - Development Donations Appreciated.

BkkCoins

Hero Member

Offline

Activity: 784
Merit: 1009

firstbits:1MinerQ

Re: FPGA Chip Plot Thread

January 05, 2012, 06:14:04 AM
Last edit: January 05, 2012, 07:33:46 AM by BkkCoins

#80

Quote from: rph on January 01, 2012, 07:47:17 PM

Quote from: BkkCoins on January 01, 2012, 06:24:31 AM

Are you seriously running all those instances? I hope not for too long...

They're spot instances; it's about $7/hr to run 25 of them & they're started/stopped on demand.
Definitely worth it in terms of build time reduction.

-rph

How do you split up the job into multiple parts for each instance? I'm just running my first implementation now on my laptop (C2D T5450, 2GB RAM); needless to say, it's quite slow. So far 3 hours and still 14,000 unrouted. I've dug up some docs on using the command line and could probably set things up to get me onto a fast spot instance. I'm just not sure how it can work on multiple instances. It looks like it's the place and route ("par") step that really needs the muscle.

Edit: Whoa. I guess I should have expected it to slow down as it gets harder to route toward the end.

Klondike Design (store) - Klondike OS ASIC Mining Platform - Development Donations Appreciated.
