Bitcoin Forum
December 05, 2016, 10:42:53 AM *
Author Topic: Algorithmically placed FPGA miner: 255MH/s/chip, supports all known boards  (Read 109509 times)
BTC-engineer
Sr. Member
March 09, 2012, 10:24:49 PM
 #121


The design is very easy to forward-port to the Xilinx 7-series parts; I just haven't had a reason to do that yet.  I've even backwards-ported it to older devices, but the effort/reward tradeoff there doesn't usually work out (it did this time only because I got the chips almost-for-free).  It's also possible to port it to most SASIC platforms, but my "are you serious about this" threshold for exploring that is really really high (and only with people based in the USA since there would be contracts involved).

Congratulations also from me for the great progress in your hard work.

Interesting that you think your design could easily be forward-ported to the new Xilinx 28nm FPGAs. This surprises me a little bit, because I always thought your design was so highly Spartan-6 LX150 optimized/specific. How deeply have you already looked into the Artix architecture, and wouldn't you have to do a lot of work just 'filling up' the bigger chip, independently of the slightly different architecture?

I'm playing with the idea of building an FPGA board with Artix FPGAs.
One of the first ones to come out will be the 352K version of the Artix, but it doesn't look like the first chips will be available in less than 6-8 months :-(  
eldentyrell
Donator
Legendary
felonious vagrancy, personified
March 09, 2012, 10:29:17 PM
 #122

I'll even let somebody bring their own board but I have to keep the board afterwards.  I'll probably need a ztex board at some point so when I do the demo we'll probably have somebody who doesn't know me bring a ztex board and I'll buy it from them as part of the demo.
I'm not sure I understand this requirement. Are you somehow burning an irreversible encryption key into the chip first? Is there no way to undo that step?

Large Spartan chips like the 150 have a WRITE-ONLY nonvolatile register that can hold a bitstream decryption key.  There is (supposedly) no way to read the key back from the register; all you can do is hand the device an encrypted bitstream and let it use the key to decrypt+load.

The device also has a unique identity register (DNA).  Unfortunately it is utterly trivial to create a circuit that looks exactly like this unique identity register and then modify an unencrypted design to use that instead of the true DNA register.  So, chip-specific designs must be encrypted.

The printing press heralded the end of the Dark Ages and made the Enlightenment possible, but it took another three centuries before any country managed to put freedom of the press beyond the reach of legislators.  So it may take a while before cryptocurrencies are free of the AML-NSA-KYC surveillance plague.
Inspector 2211
Sr. Member
March 09, 2012, 10:29:48 PM
 #123

Potential bidders for the IP are altera, xilinx, possibly others (like terasic, etc.) and the BTC FPGA community. I know very little about the fpga market but

The topology makes use of a few Xilinx-specific features, so it would require effort to port that.  However, the geometry is very Xilinx-specific.  Porting to Altera is as much work as porting to a SASIC platform like eASIC.

I'd guess that big players (altera,xilinx) wouldn't see BTC mining as a big enough market

Correct.  This is still way below Xilinx's radar.

How do you convince anyone that what you have is legit? You'd have to let them see something under NDA? What if they say "no thanks" and go do it themselves based on what they saw.

When there is a need for me to convince people I will be happy to give live, in-person demos here in NorCal.  I'll even let somebody bring their own board but I have to keep the board afterwards.  I'll probably need a ztex board at some point so when I do the demo we'll probably have somebody who doesn't know me bring a ztex board and I'll buy it from them as part of the demo.

EldenTyrell, I'm here in the South Bay (with a home office in north-east San Jose and a business/mining office in Santa Clara next to Nvidia) and I have a ZTEX board and I can sell it to you for what I paid for it, or $50 less, or whatever we agree on.

In case you put your bitstream up on Kickstarter, I'll also make a low-to-mid 3-figure pledge for early access to a 240 MH/s or better bitstream. (Right now, it's running at 209 MH/s and I'm not really interested in paying for, say, 220 MH/s.)
TheSeven
Hero Member
FPGA Mining LLC
March 09, 2012, 10:30:06 PM
 #124

Very interesting results... I'd like to see a bit more information though:
  • Where is the critical path, and how much could that be optimized? (Can you give a best-case estimate of the physical limits of achievable hashrate?)
  • How many pipeline stages does this design have, per core? Are the sha256 rounds doubly registered?
  • This looks pretty much crammed into the FPGA Smiley
    If you provide this as a hardmacro, is there even sufficient room to easily add a PC interface to it?
  • As the developer of MPBM, and being someone who has done at least a little VHDL design and implemented a miner core, I do understand very well what order of magnitude of effort this is. Especially with this all-broken Xilinx toolchain. However, simple miner software can be written in basically no time (and that's how MPBM started months ago). But if you design something for flexibility like the new MPBM generation or cgminer, it'll take at least 10 times as long. May I ask how much time you have realistically spent on implementing and optimizing this FPGA design and the necessary tools to generate it?
  • Assuming the bitcoin FPGA community (and possibly some board vendors) would want you to optimize this design until you're hitting real roadblocks (300MH/s maybe?), and release everything that's necessary to regenerate and further improve it under an open source license, roughly how much money would we need?

My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
eldentyrell
Donator
Legendary
felonious vagrancy, personified
March 09, 2012, 10:33:54 PM
 #125

Interesting that you think your design could easily be forward-ported to the new Xilinx 28nm FPGAs.

Well, feature size isn't something you can detect using Verilog code...

This surprises me a little bit, because I always thought your design was so highly Spartan-6 LX150 optimized/specific. How deeply have you already looked into the Artix architecture

Xilinx UG474 says that the 7-series slices (both M+L) are identical to the Virtex-6 slice, which is a strict superset of the Spartan-6 slice.  I verified this by looking at the diagram.  Then I opened up each of the Artix devices in fpga_editor to look at the geometry.  That's about the extent of my investigation.   Mostly stuff just switches faster, uses less power, more SLICEL's, and you get more routing -- but the routing is basically undocumented anyways.

I have to say I am baffled by the bizarre shape of the Artix fabric.  One of their devices looks like a rectangle with a chunk hacked out of the right hand side and shoved over.  WTF?

I do need the device to be at least 128 slices wide to get a "zero effort" port.  So, Artix200 or higher.  There's a huge hole in the middle of the Artix200, but (unlike the holes in the Spartan6) you get wires that run "over the top of" whatever circuitry is in the hole.  And there are still more than 128 columns even after leaving out the hole.

If there is enough demand for Artix100 I may be able to re-arrange things to fit the narrower device -- we'll see.  I'm hoping the Artix200 comes out very quickly after the 100; if so it should attract the bitcoin miners (unless something crazy happens it should be cheaper $/LUT than the 100).

Artix, but it doesn't look like the first chips will be available in less than 6-8 months :-(  

Yeah, I hear Xilinx's availability estimates are pretty much worthless.

BTCurious
Hero Member
March 09, 2012, 11:07:45 PM
 #126

*notices the topic title*
Grats on your recent 10MH/s advancement Smiley

kano
Legendary
Linux since 1997 RedHat 4
March 09, 2012, 11:11:24 PM
 #127

[sarcasm]just make sure you don't use free miners like cgminer where many many hundreds of hours have been spent without the requirement of payment[/sarcasm]

Duh.

I wrote my own miner from scratch; it has longpoll and multipool support.  Just ask Luke-Jr, who has graciously suffered through the pool side of the debugging process Smiley

I can tell you from first-hand experience that writing a miner requires about 1% of the effort I put into the HDL design.  That's not an exaggeration; I kept a (very coarse) log of how I spent my time and it really does work out to about 100:1.  I suspect ztex has had a similar experience.

I don't mean any disrespect to the authors of cgminer/mpbm/etc.  They've done a great thing for the bitcoin mining community.  But these things aren't even in the same league in terms of time commitment.
Yeah if you write a total piece of shit miner Tongue

Edit: So you wrote the fully optimised CL code yourself also without taking that from someone else?
And you worked out the 61 + 61 sha256 optimisation yourself also?
(and all the other optimisations in there) for the stream you've done here?

Pool: https://kano.is BTC: 1KanoiBupPiZfkwqB7rfLXAzPnoTshAVmb
CKPool and CGMiner developer, IRC FreeNode #ckpool and #cgminer kanoi
Help keep Bitcoin secure by mining on pools with Stratum, the best protocol to mine Bitcoins with ASIC hardware
eldentyrell
Donator
Legendary
felonious vagrancy, personified
March 09, 2012, 11:51:11 PM
 #128

Edit: So you wrote the fully optimised CL code yourself also without taking that from someone else?
And you worked out the 61 + 61 sha256 optimisation yourself also?
(and all the other optimisations in there) for the stream you've done here?

I think I have created some confusion, and have inadvertently offended you (and others).  Please accept my apologies.

Everything I wrote about "miners" was meant to refer only to the part of the code that runs on the CPU: fetching work from the pool and submitting shares.  I did not mean to imply that writing the OpenCL code that runs on the GPU itself is easy or trivial!  I know that is quite difficult, and no, I have never tried to write GPU hashing code.

Please understand that my response was in the context of what I interpreted (perhaps incorrectly) to be an accusation that any attempt to raise funds for my efforts would somehow be cheating the authors of cgminer/mpbm/etc.  The point I was trying to make is that (1) I am not using any of this software; I wrote my own and (2) if somebody does modify cgminer to act as a front end to my bitstream they won't be using the part of cgminer that was hard to write -- they'll only be using the CPU part.

kakobrekla
Hero Member
The dogs bark, the caravans pass.
March 10, 2012, 12:25:22 AM
 #129


    • Assuming the bitcoin FPGA community (and possibly some board vendors) would want you to optimize this design until you're hitting real roadblocks (300MH/s maybe?), and release everything that's necessary to regenerate and further improve it under an open source license, roughly how much money would we need?


    Has this been overlooked?

    TheSeven
    Hero Member
    FPGA Mining LLC
    March 10, 2012, 12:34:34 AM
     #130

    I think I have created some confusion, and have inadvertently offended you (and others).  Please accept my apologies.

    I didn't feel offended, and I still don't. But I have the impression that the bitcoin community in general is very generous as far as donations are concerned Smiley
    It isn't so much the number of people, but rather the amounts of money some people have to spare...

    Everything I wrote about "miners" was meant to refer only to the part of the code that runs on the CPU: fetching work from the pool and submitting shares.  I did not mean to imply that writing the OpenCL code that runs on the GPU itself is easy or trivial!  I know that is quite difficult, and no, I have never tried to write GPU hashing code.

    Please understand that my response was in the context of what I interpreted (perhaps incorrectly) to be an accusation that any attempt to raise funds for my efforts would somehow be cheating the authors of cgminer/mpbm/etc.  The point I was trying to make is that (1) I am not using any of this software; I wrote my own and (2) if somebody does modify cgminer to act as a front end to my bitstream they won't be using the part of cgminer that was hard to write -- they'll only be using the CPU part.

    You apparently have no idea what kind of effort that is, as much as others have no idea how hard it is to optimize an FPGA design.
    Writing good miner software isn't trivial either (MPBM is approaching 10000 lines of code, and there's no OpenCL involved at all).


    To get back to my original question: Do you think that it might be possible to community fund your effort? I wouldn't put too much hope on the FPGA board vendors here (at the current production volumes those are also people who'll never earn any adequate profits for the time that they've spent designing, testing, fixing and organizing things).
    So if we do some fundraising to pay you semi-adequately, would you agree to completely open source this project?
    And we might need a ballpark number of what you would consider an adequate reward...

    kano
    Legendary
    Linux since 1997 RedHat 4
    March 10, 2012, 12:42:48 AM
     #131

    I did put that in sarcasm brackets for a reason Smiley

    Simply coz you give the impression that it's a "get paid lots or no one will be allowed to ever see it."
    If it's a "I wrote and did it all from scratch without any help from looking at anything anyone else has ever done" then I guess that MAY be justified ...

    If you haven't looked at sha256() optimisations then you are somewhere in the ball-park of 5% slower than it could be.

    The 2 simplest and most effective optimisations are:
    (ignoring the midstate as being the real first sha256())
    The first 3 of 64 stages in the 1st of the double sha256() are only needed to be done once per 2^32 hashes (per full nonce range)
    The last 3.5 stages of the 2nd of the double sha256() are not required since you already know the answer at that point.
    There are quite a few other optimisations of W calculations that are constant over a full nonce range
    Then there are the partial calculations of some of the W that are constant over a full nonce range
    Quite a few parts of the early stages of the 2nd double sha256() are reduced to fixed constants also.

    Edit: some of that may not be FPGA related but some of it certainly also is.
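The first two points in the list above (the midstate, and the first 3 rounds of the second chunk being constant over a nonce range) can be checked in software. The following is a minimal Python sketch, not anyone's actual miner or FPGA code: `prepare` and `first_hash` are hypothetical helper names, the 76-byte header is placeholder data, and the compression function is a plain textbook SHA-256. It relies on the nonce being the last 4 bytes of the 80-byte header, so it lands in message word W[3] of the second 64-byte chunk, and rounds 0-2 only consume W[0..2].

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

def _iroot(x, k):
    """Floor of the integer k-th root of x, by binary search."""
    lo, hi = 0, 1 << (x.bit_length() // k + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo

def _primes(n):
    out, c = [], 2
    while len(out) < n:
        if all(c % p for p in out):
            out.append(c)
        c += 1
    return out

# SHA-256 constants: fractional bits of the cube roots (K) and
# square roots (IV) of the first primes.
_P = _primes(64)
K = [_iroot(p << 96, 3) & MASK for p in _P]
IV = [_iroot(p << 64, 2) & MASK for p in _P[:8]]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def schedule(block):
    """Expand a 64-byte chunk into the 64-word message schedule W."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    return w

def rounds(state, w, start, stop):
    """Run SHA-256 rounds start..stop-1 on the working variables."""
    a, b, c, d, e, f, g, h = state
    for i in range(start, stop):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [a, b, c, d, e, f, g, h]

def prepare(header76):
    """Nonce-independent work, done once per nonce range (hypothetical helper)."""
    w0 = schedule(header76[:64])
    midstate = [(s + o) & MASK for s, o in zip(IV, rounds(IV, w0, 0, 64))]
    tail = header76[64:76]                      # becomes W[0..2] of chunk 2
    after3 = rounds(midstate, list(struct.unpack(">3I", tail)), 0, 3)
    return midstate, tail, after3

def first_hash(prep, nonce_bytes):
    """First SHA-256 of the 80-byte header, skipping the precomputed work."""
    midstate, tail, after3 = prep
    # 80-byte message: 0x80 marker, zero padding, 64-bit length (640 bits).
    block = tail + nonce_bytes + b"\x80" + b"\x00" * 39 + struct.pack(">Q", 640)
    out = rounds(after3, schedule(block), 3, 64)
    return struct.pack(">8I", *[(m + o) & MASK for m, o in zip(midstate, out)])

header76 = bytes(range(76))                     # placeholder header, no nonce
prep = prepare(header76)
for nonce in (0, 1, 0xDEADBEEF):
    nb = struct.pack("<I", nonce)
    assert first_hash(prep, nb) == hashlib.sha256(header76 + nb).digest()
```

The sketch only covers the first hash of the double SHA-256. Dropping the final rounds of the second hash and folding constant W terms, as the list describes, would need further changes that are not shown here.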

    PulsedMedia
    Sr. Member
    March 10, 2012, 02:09:12 AM
     #132

    Really cool work. From what I understand, this already offers around 30% more per cycle? That's simply awesome.
    If I were a miner with any significant scale and investment in FPGAs, I would definitely throw some BTC in your direction, especially if that meant I got unlimited access to the bitstream Smiley


    http://PulsedMedia.com - Semidedicated rTorrent seedboxes
    pieppiep
    Sr. Member
    March 10, 2012, 02:15:24 AM
     #133

    If you put this on Kickstarter or sell it or whatever, how much do you want for it?
    Is it around $500, or more like $2,500, or even $50,000?
    Roughly how many hours did you spend?
    2112
    Legendary
    March 10, 2012, 08:09:18 AM
     #134

     Number of DSP48A1s:                           30 out of     180   16%
    Aha! Interesting. When uncle Moshe (Gavrielov) gives you DSPs, make DSPeade. Wink

    Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
    Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
    BR0KK
    Hero Member
    March 10, 2012, 03:50:06 PM
     #135

    Is there a way to port it to ZTEX or other FPGA boards?

    Inspector 2211
    Sr. Member
    March 10, 2012, 04:19:36 PM
     #136

     Number of DSP48A1s:                           30 out of     180   16%
    Aha! Interesting. When uncle Moshe (Gavrielov) gives you DSPs, make DSPeade. Wink

    Thank you for providing an important puzzle piece on how Dr. Tyrell does it.

    The multiplier in the DSP48-block is not needed in SHA-256, hence what he obviously uses is the 18-bit adder
    BCOUT = B + D.
    He uses 30 DSP blocks, 10 per red / green / blue SHA-256 instance.
    For a 32 bit adder, two 18-bit adders BCOUT=B+D are needed.
    Thus, he can implement five 32-bit adders per SHA instance.

    So, why not just use [slow] 32-bit ripple adders everywhere, and use a few [very fast] DSP adders in some places?

    The answer is, IMHO, that he uses the fast DSP adders only where they feed into longlines.
    Were he to use normal ripple adders where he feeds into longlines, the aggregate delay would limit
    the design to a 5 ns clock cycle.
    Using the fast DSP adders will allow this design, when properly fine-tuned, to march into 4 ns clock cycle
    territory, for approximately 125 MH/s per SHA-256 instance, or approximately 375 MH/s per Spartan6-150.

    BFL Single, watch out below.
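The adder counting and clock arithmetic in this post can be written out as a short Python sketch. It is only a model of the reasoning above, not of the real DSP48A1 wiring: `add18` and its `carry_in` argument are hypothetical, and the factor of two clock cycles per hash is inferred from the post's own numbers (4 ns giving 125 MH/s per instance), not something the design's author has stated.

```python
DSP_BLOCKS_USED = 30      # from the utilisation report quoted above
SHA_INSTANCES = 3         # the red / green / blue instances
ADD18_PER_ADD32 = 2       # two 18-bit adders per 32-bit adder

def add18(b, d, carry_in=0):
    """Model of one 18-bit adder; the result is truncated to 18 bits."""
    return (b + d + carry_in) & 0x3FFFF

def add32_from_two_add18(x, y):
    """Build a 32-bit modular add from two 18-bit adds, 16 bits each."""
    lo = add18(x & 0xFFFF, y & 0xFFFF)           # bit 16 holds the carry out
    hi = add18((x >> 16) & 0xFFFF, (y >> 16) & 0xFFFF, lo >> 16)
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

# 30 DSP blocks / 3 instances / 2 blocks per adder = 5 adders per instance.
adders32_per_instance = DSP_BLOCKS_USED // SHA_INSTANCES // ADD18_PER_ADD32

def mhs_per_chip(clock_ns, cycles_per_hash=2):
    """Chip hashrate in MH/s for a given clock period (assumed model)."""
    mhz = 1000.0 / clock_ns
    return SHA_INSTANCES * mhz / cycles_per_hash

print(adders32_per_instance)                     # 32-bit DSP adders per instance
print(mhs_per_chip(5.0), mhs_per_chip(4.0))      # 5 ns vs 4 ns clock period
```

Under these assumptions a 5 ns cycle corresponds to 300 MH/s per chip and a 4 ns cycle to the 375 MH/s quoted in the post.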
    TheSeven
    Hero Member
    FPGA Mining LLC
    March 10, 2012, 04:48:22 PM
     #137

    BFL Single, watch out below.

    Oh yeah! Grin

    750MH/s on X6500, at $550 bulk that's <0.74$/MH or >1.36MH/$. Wow! This can blow away GPUs! Smiley And probably LargeCoin as well...

    bulanula
    Hero Member
    March 10, 2012, 04:58:50 PM
     #138

    Quote from: Inspector 2211
    BFL Single, watch out below.

    What makes you think this cannot similarly be applied to the single ( even after a hardware modification ) Huh
    jamesg
    VIP
    Legendary
    AKA: gigavps
    March 10, 2012, 05:04:31 PM
     #139

    Quote from: Inspector 2211
    BFL Single, watch out below.

    What makes you think this cannot similarly be applied to the single ( even after a hardware modification ) Huh

    Bulanula,

    Slow down. Please read his post more carefully. He is suggesting that the $/MH is in competition with the BFL Single, and his math is pretty close. I am getting 830 MH/s for $600, or $0.72/MH, which is pretty darn close.
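For reference, the price-per-hashrate comparison in the last few posts works out as follows. This is plain arithmetic on the figures quoted in the thread ($550 for a projected 750 MH/s on the X6500, $600 for 830 MH/s on the BFL Single); the function name is just for illustration.

```python
def dollars_per_mh(price_usd, mhs):
    """Cost efficiency in dollars per MH/s."""
    return price_usd / mhs

x6500 = dollars_per_mh(550, 750)    # projected X6500 figure from the thread
single = dollars_per_mh(600, 830)   # BFL Single figure from the thread

print(f"X6500:      ${x6500:.3f}/MH, {1 / x6500:.2f} MH/$")
print(f"BFL Single: ${single:.3f}/MH, {1 / single:.2f} MH/$")
```

Both come out at roughly $0.72 to $0.73 per MH/s, which is why the two are described as close.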
    Turbor
    Legendary
    BitMinter
    March 10, 2012, 05:12:51 PM
     #140

    What makes you think this cannot similarly be applied to the single ( even after a hardware modification ) Huh

      Grin
