Bitcoin Forum
April 30, 2024, 03:57:39 PM *
Author Topic: Algorithmically placed FPGA miner: 255MH/s/chip, supports all known boards  (Read 119415 times)
ElectricMucus
Legendary
*
Offline Offline

Activity: 1666
Merit: 1057


Marketing manager - GO MP


View Profile WWW
November 01, 2011, 08:32:40 PM
Last edit: November 01, 2011, 08:55:08 PM by ElectricMucus
 #21

I'm trying to derive the theory from reading the FPGA threads, and some rudimentary knowledge. I'm curious if I'm close, can anyone let me know if I'm wrong?

FPGAs (Field programmable gate arrays) are collections of logic gates on a chip. The gate types themselves can be changed, and the wiring between them can be changed, so you can make your own high speed chip layout.

Every timing tick, all gates "do their process", and update their outputs, based on what they just had as inputs.

Corner turns: Is this like, all processing is flowing to the right, and then at the end of the chip you need to do some wiring or tricks to continue processing to the left?

Almost,

The logic elements are nothing but very small RAMs, which are referred to as LUTs (lookup tables).
It is the same thing as writing out a logic table, so you can realize, for example, the XORs in SHA-2 with them.

The wiring consists of a number of flip-flops connected to the LUTs, which have feedback wires to themselves and to other LUTs. That is what lets the FPGA be used for useful computation.
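
To make the "LUT is a tiny RAM holding a logic table" idea concrete, here is a minimal Python sketch. It is a software model for intuition only, not how any vendor tool represents a LUT; the helper names `make_lut` and `lut_read` are made up for this example. The three per-bit functions are the XOR, Ch and Maj operations from the SHA-2 definition, each of which fits in a 3-input table.

```python
from itertools import product

def make_lut(func, n_inputs):
    """Precompute the truth table of func, like the configuration bits of a LUT."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=n_inputs)]

def lut_read(table, *bits):
    """Evaluate the LUT: the input bits form the address into the table."""
    addr = 0
    for b in bits:
        addr = (addr << 1) | (b & 1)  # first input is the most significant address bit
    return table[addr]

# Per-bit SHA-2 functions, each stored as an 8-entry table (hypothetical names).
xor3 = make_lut(lambda a, b, c: a ^ b ^ c, 3)
ch = make_lut(lambda e, f, g: (e & f) ^ ((~e) & g), 3)
maj = make_lut(lambda a, b, c: (a & b) ^ (a & c) ^ (b & c), 3)

print(xor3)                  # the table contents are the "program" of the LUT
print(lut_read(ch, 1, 1, 0))
print(lut_read(maj, 1, 0, 1))
```

Changing the function only changes the eight stored bits; the lookup itself stays the same, which is the sense in which the gate type "can be changed".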

All in all pretty wasteful but the high integration makes up for it.
BTCurious
Hero Member
*****
Offline Offline

Activity: 714
Merit: 504


^SEM img of Si wafer edge, scanned 2012-3-12.


View Profile
November 01, 2011, 08:53:56 PM
 #22

Ah right, that makes sense, thanks! :)
And the flipflops, they're synced to the clock signal, I assume then.

Anyway, the LUTs integrated as RAMs makes sense. I'm wondering though how the wiring works, physically. And also what a corner turn is, but I guess that's about it not being easy to switch direction in the wiring, due to implementation details. I'll look it up myself when I have some time though. Unless you feel like explaining :)

ElectricMucus
Legendary
*
Offline Offline

Activity: 1666
Merit: 1057


Marketing manager - GO MP


View Profile WWW
November 01, 2011, 09:07:30 PM
 #23

That was just a general description of PLDs, and it is valid all the way down the line, from SPLDs through CPLDs to FPGAs.

I never really understood the difference, but what I have been able to grasp is that FPGAs have flipflops per LUT. In CPLDs only the immediate neighbours can be addressed directly. In FPGAs you can also address more distant ones.
This is referred to as wide routing, and it takes longer for a signal to propagate through these lines. (Whether those are only hard-wired lines or are buffered somehow, I don't know.)

If you have a LUT at a corner or edge, you need to use those far routings to reach the same number of other LUTs as an inner one can.
The trick usually is to use them for other tasks than the inner resources, like I/O (they usually have access to a pin).
makomk
Hero Member
*****
Offline Offline

Activity: 686
Merit: 564


View Profile
November 01, 2011, 09:39:07 PM
 #24

I never really understood the difference, but what I have been able to grasp is that FPGAs have flipflops per LUT. In CPLDs only the immediate neighbours can be addressed directly. In FPGAs you can also address more distant ones.
Also, Wikipedia claims that CPLDs don't actually use LUTs to implement logic, which makes sense given that they're descended from PALs and those were purely sum-of-products. (The first ones were pretty much just several PALs glued together with some routing logic, from what I can tell.)

This is referred to as wide routing, and it takes longer for a signal to propagate through these lines. (Whether those are only hard-wired lines or are buffered somehow, I don't know.)
All modern ones have active routing that buffers the signal somehow, though the original ones did have hard-wired lines.

Quad XC6SLX150 Board: 860 MHash/s or so.
SIGS ABOUT BUTTERFLY LABS ARE PAID ADS
O_Shovah
Sr. Member
****
Offline Offline

Activity: 410
Merit: 252


Watercooling the world of mining


View Profile
December 02, 2011, 07:45:11 PM
 #25

Hi big chip,

How is your work going ?

Any new results yet ?

I really would like to see the fully routed design finished.
Man is still superior to machine after all ;)

eldentyrell (OP)
Donator
Legendary
*
Offline Offline

Activity: 980
Merit: 1004


felonious vagrancy, personified


View Profile WWW
December 03, 2011, 08:14:49 PM
 #26

How is your work going ?
Any new results yet ?

At 1-hash-per-clock (two rings) I am at 143 MHz (mining right now) and close to 150 MHz (just one route fails timing).

On a lark I tried out the 1.5-hashes-per-clock (three rings) setup.  It works, but very slowly, mostly because I need to do some extra work to leave space between the rings, and I haven't even begun working on that yet.  Once I have the two-ring design where I want it to be I will backport all of those improvements to the three-ring design.

Still lots of work to be done, and lots on my plate.

The printing press heralded the end of the Dark Ages and made the Enlightenment possible, but it took another three centuries before any country managed to put freedom of the press beyond the reach of legislators.  So it may take a while before cryptocurrencies are free of the AML-NSA-KYC surveillance plague.
gmaxwell
Moderator
Legendary
*
Offline Offline

Activity: 4158
Merit: 8382



View Profile WWW
December 04, 2011, 12:39:06 AM
 #27

At 1-hash-per-clock (two rings) I am at 143 MHz (mining right now) and close to 150 MHz (just one route fails timing).
On a lark I tried out the 1.5-hashes-per-clock (three rings) setup.

Your post is worthless without chip-plot-porn. Gimme gimme.

The possibility of 225 MH/s on S6/LX150 sounds quite exciting. I assume the power consumption is fairly low compared to the designs running at higher clock rates?
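
For anyone following the numbers: the 225 MH/s figure appears to be simply clock rate times hashes per clock. A minimal Python sketch of that arithmetic, assuming (from the two-ring and three-ring figures quoted above) that each ring contributes half a hash per clock; the function name is made up for this example.

```python
HASHES_PER_CLOCK_PER_RING = 0.5  # assumption: two rings = 1 hash/clock, three rings = 1.5

def hashrate_mhs(clock_mhz, rings):
    """Hash rate in MH/s for a design with the given clock and number of rings."""
    return clock_mhz * rings * HASHES_PER_CLOCK_PER_RING

print(hashrate_mhs(143, 2))  # two-ring design at the clock it currently mines at
print(hashrate_mhs(150, 3))  # three-ring design, if it reached the 150 MHz target
```

The second line is a projection only; the three-ring design has not reached that clock in this thread.
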
eldentyrell (OP)
Donator
Legendary
*
Offline Offline

Activity: 980
Merit: 1004


felonious vagrancy, personified


View Profile WWW
December 04, 2011, 05:55:00 AM
 #28

At 1-hash-per-clock (two rings) I am at 143 MHz (mining right now) and close to 150 MHz (just one route fails timing).
On a lark I tried out the 1.5-hashes-per-clock (three rings) setup.

Your post is worthless without chip-plot-porn. Gimme gimme.

Ok, fine.



The corner turn on the right hand side wasn't all that difficult.  Unfortunately there's a lot of irregular stuff around the first and last stage (in red), and algorithmically placing that stuff is not feasible.  So instead I've arranged for the top row to gradually "jog" upward, which leaves a triangular "hole" near the first and last stage, and I let Xilinx's tools autoplace the random crap in that hole.  This is what lets me get close to 150 MHz.  Right now the hole is way bigger than it needs to be (in the plot a cell is purple even if it's nearly empty); once I get to my target clock speed I'll start shrinking the hole down to a more reasonable size.

The possibility of 225 MH/s on S6/LX150 sounds quite exciting. I assume the power consumption is fairly low compared to the designs running at higher clock rates?

Well, we'll see; no guarantees yet.  I have not taken any careful power measurements; the last time I checked, my 100 MHz two-ring design pulled 5 W per board, measured across a cluster of 6 boards with a crude Kill A Watt at the wall, so this figure includes inefficiencies introduced by the ATX power supply.

I expect power consumption per-hash-per-second to be similar to any other design with one layer of registers per SHA256-stage, and of course much less than those with two layers of registers per stage.
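
As a rough back-of-the-envelope check on the wall measurement above, here is a small Python sketch. It assumes one FPGA per board (not stated in the post) and that the 100 MHz two-ring design does one hash per clock, i.e. about 100 MH/s per board. Since the 5 W was measured at the wall, the result understates the efficiency of the chip itself.

```python
BOARDS = 6               # cluster size from the post
WATTS_PER_BOARD = 5.0    # wall power per board, includes ATX supply losses
MHS_PER_BOARD = 100.0    # assumption: one chip per board, 1 hash/clock at 100 MHz

cluster_watts = BOARDS * WATTS_PER_BOARD
cluster_mhs = BOARDS * MHS_PER_BOARD
mhs_per_watt = MHS_PER_BOARD / WATTS_PER_BOARD
joules_per_mhash = WATTS_PER_BOARD / MHS_PER_BOARD

print(cluster_watts, cluster_mhs)
print(mhs_per_watt, joules_per_mhash)
```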

gmaxwell
Moderator
Legendary
*
Offline Offline

Activity: 4158
Merit: 8382



View Profile WWW
December 04, 2011, 06:12:45 AM
 #29

The corner turn on the right hand side wasn't all that difficult.  Unfortunately there's a lot of irregular stuff around the first and last stage (in red), and algorithmically placing that stuff is not feasible.  So instead I've arranged for the top row to gradually "jog" upward, which leaves a triangular "hole" near the first and last stage, and I let Xilinx's tools autoplace the random crap in that hole.

Makes sense! ... it sure looks like there is actually room to fit another fully unrolled unit, assuming it was wired the other way so that its jog ended up opposite the lower jog.
gmaxwell
Moderator
Legendary
*
Offline Offline

Activity: 4158
Merit: 8382



View Profile WWW
December 26, 2011, 08:21:25 PM
 #30

I see your user icon has changed.

I for one welcome our new S6-LX150 300MH/s overlord.
O_Shovah
Sr. Member
****
Offline Offline

Activity: 410
Merit: 252


Watercooling the world of mining


View Profile
December 27, 2011, 07:42:37 AM
 #31

So how is everything going ?

Have you had the time to search for some new paths ?

eldentyrell (OP)
Donator
Legendary
*
Offline Offline

Activity: 980
Merit: 1004


felonious vagrancy, personified


View Profile WWW
December 27, 2011, 10:54:44 PM
 #32

I for one welcome our new S6-LX150 300MH/s overlord.

Ah, not so fast yet.

As expected, moving up the frequency ladder turns into a game of whack-a-mole... fix one thing, something else becomes the critical path.  I'm back to fighting the corner turn.

What was not expected was how hard it would be to get control over the routing from Xilinx's tools.  I can get them to route the corner turn by itself, and I can get everything-but-the-corner turn to route, and I can show that the routing resources used are disjoint, but I can't get them both to route at once!

The sad reality is that Xilinx really does not provide any mechanism at all that says to the router "you absolutely must route this wire along this path". There are placer directives that can force placement, but even the "DIRT strings" used to try to force routing can be ignored by PAR under some circumstances, and I'm hitting them.  Ditto for SmartGuide.

Very frustrating.  I know where the wires should go, but I've spent countless hours trying to "trick" Xilinx's tools into doing what I already know how to do.

The very, very, very last resort is to write my own router by scripting fpga_edline.  I know that sounds desperate, but that's what it might come down to.

Enigma81
Full Member
***
Offline Offline

Activity: 180
Merit: 100



View Profile
December 28, 2011, 02:08:23 AM
 #33

Have you considered trying to lay this out in an Altera Cyclone IV?  I'm a Xilinx guy from wayyyy back, and I almost always choose a Xilinx part for my projects.. BUT - I have to admit that Altera has some nice things in the Cyclone IV.

Originally, the FPGAMiner code was based upon the Cyclone IV.  It was pretty much abandoned once the (cheaper) Spartan 6 was able to outpace what it could do.  Development on that chip stopped, most likely because of the cost premium, but I'm not entirely sure that the Cyclone couldn't have reached more MH/$ if development had continued.  It has less exciting LUTs than the 6-input Spartan 6 type, but it has better routing resources.

Enigma
gmaxwell
Moderator
Legendary
*
Offline Offline

Activity: 4158
Merit: 8382



View Profile WWW
December 28, 2011, 05:51:33 AM
 #34

Very frustrating.  I know where the wires should go, but I've spent countless hours trying to "trick" Xilinx's tools into doing what I already know how to do.

The very, very, very last resort is to write my own router by scripting fpga_edline.  I know that sounds desperate, but that's what it might come down to.

On the plus side, if you go that route the result should be more stable— small changes won't cause the darn thing to fail or to achieve drastically worse timing in unrelated areas.
BkkCoins
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1009


firstbits:1MinerQ


View Profile WWW
December 28, 2011, 11:39:45 AM
 #35

Would what you have now fit into an 'SLX75 ? I'm curious because for playing around they are cheaper to experiment with.

I'm just working on a board design that I'm going to home solder (griddle) and the risk in using a 'SLX75 is lower. My goal is to make lego-block miners that have as little overhead cost as possible. So in crazy fashion I'm doing a 2 Layer board with only FPGA +Pwr. They will connect together like scrabble tiles and communicate via each other to only one master controller (possibly RaspberryPi since it has Linux and network on board).

When I get further I'll make a thread and explain my idea more fully. I'd like to do testing with a 'SLX75 or even 'SLX25 as the loss would be lower if I screw up. They both come in the same FBGA484 pkg. I've done FPGA design before but I'm still very non-expert at understanding how the hashing rounds work etc.


eldentyrell (OP)
Donator
Legendary
*
Offline Offline

Activity: 980
Merit: 1004


felonious vagrancy, personified


View Profile WWW
December 28, 2011, 07:34:25 PM
 #36

Have you considered trying to lay this out in an Altera Cyclone IV?

Yes, I have considered it.  Spartans won out for two reasons:

- A quarter of the LUTs on a Spartan can be turned into rather large shift registers (SRL16), and I use these a lot.

- Much higher register density

Each Spartan SLICE has eight registers, but half of them are very difficult to use (and often ignored by the automatic synthesis tools).  One of the reasons my design is so compact is that I made sure to use those registers.  I have 91% register utilization within the occupied slices of my design (obviously lots of slices are unoccupied so the overall utilization is much lower).
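
To illustrate why LUT-based shift registers save so many registers, here is a minimal behavioural sketch in Python of a 16-deep shift register with a selectable output tap, which is how the SRL16 primitive is generally described. It is a software model for intuition only and says nothing about how the design in this thread actually uses them; the class and method names are made up.

```python
class SRL16:
    """Behavioural model of a 16-stage, 1-bit-wide shift register with a selectable tap."""

    def __init__(self):
        self.stages = [0] * 16  # stages[0] holds the most recently shifted-in bit

    def clock(self, d):
        """One clock edge: shift in bit d and drop the oldest bit."""
        self.stages = [d & 1] + self.stages[:-1]

    def q(self, addr):
        """Read tap addr (0..15): the bit shifted in addr edges before the latest one."""
        return self.stages[addr]

srl = SRL16()
for bit in (1, 0, 0, 0):   # shift in a single 1 followed by three 0s
    srl.clock(bit)

print(srl.q(3))  # the 1 has moved to tap 3
print(srl.q(2))  # taps 0..2 hold the zeros shifted in afterwards
```

A delay line like this would otherwise cost one flip-flop per stage, so one LUT used this way stands in for up to 16 registers of pipeline delay.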

So, a lot of designs port nicely from Spartan to Altera because they were wasting half the registers to begin with.  I'm not, so I would pay a steep penalty.

I have, however, looked at a couple of SASIC platforms that I could port to fairly easily.  If an investor were to fall out of the sky, I know which one to go with.

eldentyrell (OP)
Donator
Legendary
*
Offline Offline

Activity: 980
Merit: 1004


felonious vagrancy, personified


View Profile WWW
December 28, 2011, 07:41:02 PM
 #37

Would what you have now fit into an 'SLX75 ?

Nope.  The LX75 looks like the "left half" of an LX150.  Since the carry chains run vertically you're forced to orient your words that way, and therefore the LX75 can only fit half as many stages before you have to do a corner-turn.  The headache I'm dealing with now would be multiplied by three.
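
One possible way to read "multiplied by three", sketched in Python under loudly hypothetical numbers: if a ring is laid out as a serpentine of rows, the number of corner turns is one less than the number of rows, and halving the stages that fit in a row goes from one turn to three. The stage counts below are invented for illustration and are not taken from the actual design.

```python
import math

def corner_turns(stages, stages_per_row):
    """Corner turns needed to fold a chain of stages into rows of limited length."""
    rows = math.ceil(stages / stages_per_row)
    return rows - 1

STAGES = 64                          # hypothetical chain length
LX150_PER_ROW = 32                   # hypothetical: stages fitting across an LX150
LX75_PER_ROW = LX150_PER_ROW // 2    # "half as many stages", per the post

print(corner_turns(STAGES, LX150_PER_ROW))
print(corner_turns(STAGES, LX75_PER_ROW))
```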

I prototyped my PCBs using LX45s since they're the smallest chip in the FG484 package, but never ported the design to that chip.

I'm just working on a board design that I'm going to home solder (griddle) and the risk in using a 'SLX75 is lower.

I strongly urge you to use a toaster oven instead of a griddle.  I started out using a griddle but the yield was awful.  It will take you longer to get the toaster oven working right, but once it's dialed in your consistency will be really good.

My goal is to make lego-block miners that have as little overhead cost as possible. So in crazy fashion I'm doing a 2 Layer board with only FPGA +Pwr.

Distributing power to all of those boards is the hard part.  Putting a DC-DC regulator on every board is expensive.  Distributing high-current 1.2V across the boards requires expensive (beefy) connectors between the boards.  Make sure you think this through.


BkkCoins
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1009


firstbits:1MinerQ


View Profile WWW
December 29, 2011, 03:10:20 AM
Last edit: December 29, 2011, 03:54:08 AM by BkkCoins
 #38

I strongly urge you to use a toaster oven instead of a griddle.  I started out using a griddle but the yield was awful.  It will take you longer to get the toaster oven working right, but once it's dialed in your consistency will be really good.
Thank you. I'll do that. Hard to get here so I may work out another oven alternative.

Edit: I have an SMD rework station that I haven't used in 3 years but I'm not sure hot air from above would be good enough to do it and may stress the chip too much. Have you tried something like that?

Distributing power to all of those boards is the hard part.  Putting a DC-DC regulator on every board is expensive.  Distributing high-current 1.2V across the boards requires expensive (beefy) connectors between the boards.  Make sure you think this through.
I'll be doing the on-board regulator method as running 1.2V isn't feasible. I want these to connect together in an array that can grow to 64 units or more. I'm thinking about 2x 5A regulators (like the AOZ1037 at $1.41) instead of the more typical high-amp regulator seen on other boards so far. (If I use the same one for 3.3V I can buy 3x the quantity to save cost.) I'm not too happy with the 80% efficiency or less though.
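
A rough Python sketch of why a shared 1.2V bus is impractical for an array like this and what the 80% efficiency costs. The 5 W core load per board is a hypothetical round number chosen for illustration, not a measurement; the 64 boards, 1.2V core rail and 80% figure come from the post, and the 12V feed is an assumption.

```python
BOARDS = 64          # array size from the post
CORE_VOLTS = 1.2     # FPGA core rail
CORE_WATTS = 5.0     # hypothetical core load per board
FEED_VOLTS = 12.0    # assumption: boards are fed from a 12V rail
EFFICIENCY = 0.80    # regulator efficiency mentioned in the post

# Option A: one shared 1.2V bus carrying the core current of every board.
bus_amps_shared = BOARDS * CORE_WATTS / CORE_VOLTS

# Option B: 12V distributed, one regulator per board.
input_watts_per_board = CORE_WATTS / EFFICIENCY
bus_amps_12v = BOARDS * input_watts_per_board / FEED_VOLTS
loss_watts_total = BOARDS * (input_watts_per_board - CORE_WATTS)

print(round(bus_amps_shared, 1))   # amps the 1.2V connectors would have to carry
print(round(bus_amps_12v, 1))      # amps on the 12V feed instead
print(round(loss_watts_total, 1))  # heat dissipated in the regulators
```

Under these assumptions the 12V feed carries roughly an eighth of the current, at the price of the regulator losses.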

Enigma81
Full Member
***
Offline Offline

Activity: 180
Merit: 100



View Profile
December 29, 2011, 11:16:54 AM
 #39

I strongly urge you to use a toaster oven instead of a griddle.  I started out using a griddle but the yield was awful.  It will take you longer to get the toaster oven working right, but once it's dialed in your consistency will be really good.
Thank you. I'll do that. Hard to get here so I may work out another oven alternative.

Edit: I have an SMD rework station that I haven't used in 3 years but I'm not sure hot air from above would be good enough to do it and may stress the chip too much. Have you tried something like that?

Distributing power to all of those boards is the hard part.  Putting a DC-DC regulator on every board is expensive.  Distributing high-current 1.2V across the boards requires expensive (beefy) connectors between the boards.  Make sure you think this through.
I'll be doing the on-board regulator method as running 1.2V isn't feasible. I want these to connect together in an array that can grow to 64 units or more. I'm thinking about 2x 5A regulators (like the AOZ1037 at $1.41) instead of the more typical high-amp regulator seen on other boards so far. (If I use the same one for 3.3V I can buy 3x the quantity to save cost.) I'm not too happy with the 80% efficiency or less though.

You'd be much better off using the AOZ1021 / AOZ1025 combination that the ztex boards use - if you're set on going the on-board regulator route.  SMPS in parallel is way more of a PITA than most people assume...

Enigma

P.S. Distributing a communications chain (USB, Serial, JTAG) is also a major consideration.  A JTAG chain is simple for 2 or 3 devices, but 64..  Another PITA - especially TCK.
BkkCoins
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1009


firstbits:1MinerQ


View Profile WWW
December 29, 2011, 01:04:06 PM
Last edit: December 29, 2011, 01:39:09 PM by BkkCoins
 #40

I tried to find the AOZ1025 but they seem to be hard to get. Can't find any stock.
Edit: Found it at Arrow... 3000 qty. only.
(Actually I found a couple of good alternates from IR and Fairchild: higher efficiency (90% under load) and pretty cheap. FAN2108, IR3871. Looking at them now as Digikey has them.)

What happens when you put two AOZ1021 / AOZ1037 in parallel? I thought that would work but then DC-DC regs are new to me. I've only used linear parts before. No longer the plan.

I'll likely drop the 3.3V anyway as an ATX PSU has regulated 3.3 already. Then just drop 12V to 1.2V as that seems to be more efficient and most PSU have more watts available on 12V. So a 20/24 pin adapter with non-standard onboard connector so users don't accidentally plug in a Molex and blow it.

