BTC-engineer
|
|
March 09, 2012, 10:24:49 PM |
|
The design is very easy to forward-port to the Xilinx 7-series parts; I just haven't had a reason to do that yet. I've even backwards-ported it to older devices, but the effort/reward tradeoff there doesn't usually work out (it did this time only because I got the chips almost-for-free). It's also possible to port it to most SASIC platforms, but my "are you serious about this" threshold for exploring that is really really high (and only with people based in the USA since there would be contracts involved).
Congratulations from me too on the great progress in your hard work. Interesting that you think your design could easily be forward-ported to the new Xilinx 28nm FPGAs. This surprises me a little bit, because I always thought your design was highly Spartan-6 LX150 optimized/specific. How deeply have you already looked into the Artix architecture, and wouldn't you have to do a lot of work just 'filling up' the bigger chip, independently of the slightly different architecture? I'm playing with the idea of building an FPGA board with Artix FPGAs. One of the first ones to come out will be the 352K version of the Artix, but it doesn't look like the first chips will be available in less than 6-8 months :-(
|
|
|
|
|
eldentyrell (OP)
Donator
Legendary
Offline
Activity: 980
Merit: 1004
felonious vagrancy, personified
|
|
March 09, 2012, 10:29:17 PM |
|
I'll even let somebody bring their own board but I have to keep the board afterwards. I'll probably need a ztex board at some point so when I do the demo we'll probably have somebody who doesn't know me bring a ztex board and I'll buy it from them as part of the demo.
I'm not sure I understand this requirement. Are you somehow burning an irreversible encryption key into the chip first? Is there no way to undo that step?
Large Spartan chips like the 150 have a WRITE-ONLY nonvolatile register that can hold a bitstream decryption key. There is (supposedly) no way to read the key back from the register; all you can do is hand the device an encrypted bitstream and let it use the key to decrypt+load. The device also has a unique identity register (DNA). Unfortunately it is utterly trivial to create a circuit that looks exactly like this unique identity register and then modify an unencrypted design to use that instead of the true DNA register. So, chip-specific designs must be encrypted.
|
The printing press heralded the end of the Dark Ages and made the Enlightenment possible, but it took another three centuries before any country managed to put freedom of the press beyond the reach of legislators. So it may take a while before cryptocurrencies are free of the AML-NSA-KYC surveillance plague.
|
|
|
Inspector 2211
|
|
March 09, 2012, 10:29:48 PM |
|
Potential bidders for the IP are altera, xilinx, possibly others (like terasic, etc.) and the BTC FPGA community. I know very little about the fpga market but
The topology makes use of a few Xilinx-specific features, so it would require effort to port that. However, the geometry is very Xilinx-specific. Porting to Altera is as much work as porting to a SASIC platform like eASIC. I'd guess that big players (altera,xilinx) wouldn't see BTC mining as a big enough market
Correct. This is still way below Xilinx's radar. How do you convince anyone that what you have is legit? You'd have to let them see something under NDA? What if they say "no thanks" and go do it themselves based on what they saw.
When there is a need for me to convince people I will be happy to give live, in-person demos here in NorCal. I'll even let somebody bring their own board but I have to keep the board afterwards. I'll probably need a ztex board at some point so when I do the demo we'll probably have somebody who doesn't know me bring a ztex board and I'll buy it from them as part of the demo.
EldenTyrell, I'm here in the South Bay (with a home office in north-east San Jose and a business/mining office in Santa Clara next to Nvidia) and I have a ZTEX board and I can sell it to you for what I paid for it, or $50 less, or whatever we agree on. In case you put your bitstream up on Kickstarter, I'll also make a low-to-mid 3-figure pledge for early access to a 240 MH/s or better bitstream. (Right now, it's running at 209 MH/s and I'm not really interested in paying for, say, 220 MH/s.)
|
|
|
|
TheSeven
|
|
March 09, 2012, 10:30:06 PM |
|
Very interesting results... I'd like to see a bit more information though:
- Where is the critical path, and how much could that be optimized? (Can you give a best-case estimate of the physical limits of achievable hashrate?)
- How many pipeline stages does this design have, per core? Are the sha256 rounds doubly registered?
- This looks pretty much crammed into the FPGA. If you provide this as a hardmacro, is there even sufficient room to easily add a PC interface to it?
- As the developer of MPBM, and being someone who has done at least a little VHDL design and implemented a miner core, I do understand very well what order of magnitude of effort this is. Especially with this all-broken Xilinx toolchain. However, a simple miner software can be written in basically no time (and that's how MPBM started months ago). But if you design something for flexibility like the new MPBM generation or cgminer, it'll take at least 10 times as long. May I ask how much time you have realistically spent on implementing and optimizing this FPGA design and the necessary tools to generate it?
- Assuming the bitcoin FPGA community (and possibly some board vendors) would want you to optimize this design until you're hitting real roadblocks (300MH/s maybe?), and release everything that's necessary to regenerate and further improve it under an open source license, roughly how much money would we need?
|
My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
|
|
|
eldentyrell (OP)
Donator
Legendary
Offline
Activity: 980
Merit: 1004
felonious vagrancy, personified
|
|
March 09, 2012, 10:33:54 PM Last edit: March 09, 2012, 11:22:37 PM by eldentyrell |
|
Interesting that you think your design could easily be forward-ported to the new Xilinx 28nm FPGAs.
Well, feature size isn't something you can detect using Verilog code...
This surprises me a little bit, because I always thought your design was highly Spartan-6 LX150 optimized/specific. How deeply have you already looked into the Artix architecture
Xilinx UG474 says that the 7-series slices (both M+L) are identical to the Virtex-6 slice, which is a strict superset of the Spartan-6 slice. I verified this by looking at the diagram. Then I opened up each of the Artix devices in fpga_editor to look at the geometry. That's about the extent of my investigation. Mostly stuff just switches faster, uses less power, more SLICEL's, and you get more routing -- but the routing is basically undocumented anyways.
I have to say I am baffled by the bizarre shape of the Artix fabric. One of their devices looks like a rectangle with a chunk hacked out of the right hand side and shoved over. WTF?
I do need the device to be at least 128 slices wide to get a "zero effort" port. So, Artix200 or higher. There's a huge hole in the middle of the Artix200, but (unlike the holes in the Spartan6) you get wires that run "over the top of" whatever circuitry is in the hole. And there are still more than 128 columns even after leaving out the hole.
If there is enough demand for Artix100 I may be able to re-arrange things to fit the narrower device -- we'll see. I'm hoping the Artix200 comes out very quickly after the 100; if so it should attract the bitcoin miners (unless something crazy happens it should be cheaper $/LUT than the 100).
Artix, but it doesn't look like the first chips will be available in less than 6-8 months :-(
Yeah, I hear Xilinx's availability estimates are pretty much worthless.
|
|
|
|
BTCurious
|
|
March 09, 2012, 11:07:45 PM |
|
*notices the topic title* Grats on your recent 10MH/s advancement
|
|
|
|
kano
Legendary
Offline
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
|
|
March 09, 2012, 11:11:24 PM Last edit: March 09, 2012, 11:23:08 PM by kano |
|
[sarcasm]just make sure you don't use free miners like cgminer where many many hundreds of hours have been spent without the requirement of payment[/sarcasm]
Duh. I wrote my own miner from scratch; it has longpoll and multipool support. Just ask Luke-Jr, who has graciously suffered through the pool side of the debugging process. I can tell you from first-hand experience that writing a miner requires about 1% of the effort I put into the HDL design. That's not an exaggeration; I kept a (very coarse) log of how I spent my time and it really does work out to about 100:1. I suspect ztex has had a similar experience. I don't mean any disrespect to the authors of cgminer/mpbm/etc. They've done a great thing for the bitcoin mining community. But these things aren't even in the same league in terms of time commitment.
Yeah, if you write a total piece of shit miner.
Edit: So you wrote the fully optimised CL code yourself also without taking that from someone else? And you worked out the 61 + 61 sha256 optimisation yourself also? (and all the other optimisations in there) for the stream you've done here?
|
|
|
|
eldentyrell (OP)
Donator
Legendary
Offline
Activity: 980
Merit: 1004
felonious vagrancy, personified
|
|
March 09, 2012, 11:51:11 PM |
|
Edit: So you wrote the fully optimised CL code yourself also without taking that from someone else? And you worked out the 61 + 61 sha256 optimisation yourself also? (and all the other optimisations in there) for the stream you've done here?
I think I have created some confusion, and have inadvertently offended you (and others). Please accept my apologies. Everything I wrote about "miners" was meant to refer only to the part of the code that runs on the CPU: fetching work from the pool and submitting shares. I did not mean to imply that writing the OpenCL code that runs on the GPU itself is easy or trivial! I know that is quite difficult, and no, I have never tried to write GPU hashing code. Please understand that my response was in the context of what I interpreted (perhaps incorrectly) to be an accusation that any attempt to raise funds for my efforts would somehow be cheating the authors of cgminer/mpbm/etc. The point I was trying to make is that (1) I am not using any of this software; I wrote my own and (2) if somebody does modify cgminer to act as a front end to my bitstream they won't be using the part of cgminer that was hard to write -- they'll only be using the CPU part.
|
|
|
|
kakobrekla
|
|
March 10, 2012, 12:25:22 AM |
|
- Assuming the bitcoin FPGA community (and possibly some board vendors) would want you to optimize this design until you're hitting real roadblocks (300MH/s maybe?), and release everything that's necessary to regenerate and further improve it under an open source license, roughly how much money would we need?
Has this been overlooked?
|
|
|
|
TheSeven
|
|
March 10, 2012, 12:34:34 AM |
|
I think I have created some confusion, and have inadvertently offended you (and others). Please accept my apologies.
I didn't feel offended, and I still don't. But I have the impression that the bitcoin community in general is very generous as far as donations are concerned. It isn't so much the number of people, but rather the amounts of money some people have to spare...
Everything I wrote about "miners" was meant to refer only to the part of the code that runs on the CPU: fetching work from the pool and submitting shares. I did not mean to imply that writing the OpenCL code that runs on the GPU itself is easy or trivial! I know that is quite difficult, and no, I have never tried to write GPU hashing code.
Please understand that my response was in the context of what I interpreted (perhaps incorrectly) to be an accusation that any attempt to raise funds for my efforts would somehow be cheating the authors of cgminer/mpbm/etc. The point I was trying to make is that (1) I am not using any of this software; I wrote my own and (2) if somebody does modify cgminer to act as a front end to my bitstream they won't be using the part of cgminer that was hard to write -- they'll only be using the CPU part.
You apparently have no idea what kind of effort that is, as much as others have no idea how hard it is to optimize an FPGA design. Writing good miner software isn't trivial either (MPBM is approaching 10000 lines of code, and there's no OpenCL involved at all). To get back to my original question: Do you think that it might be possible to community fund your effort? I wouldn't put too much hope on the FPGA board vendors here (at the current production volumes those are also people who'll never earn any adequate profits for the time that they've spent designing, testing, fixing and organizing things). So if we do some fundraising to pay you semi-adequately, would you agree to completely open source this project? And we might need a ballpark number of what you would consider an adequate reward...
|
|
|
|
kano
Legendary
Offline
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
|
|
March 10, 2012, 12:42:48 AM Last edit: March 10, 2012, 02:17:05 AM by kano |
|
I did put that in sarcasm brackets for a reason. Simply coz you give the impression that it's a "get paid lots or no one will be allowed to ever see it."
If it's a "I wrote and did it all from scratch without any help from looking at anything anyone else has ever done" then I guess that MAY be justified ...
If you haven't looked at sha256() optimisations then you are somewhere in the ball-park of 5% slower than it could be.
The 2 simplest and most effective optimisations are (ignoring the midstate as being the real first sha256()):
- The first 3 of 64 stages in the 1st of the double sha256() only need to be done once per 2^32 hashes (per full nonce range).
- The last 3.5 stages of the 2nd of the double sha256() are not required since you already know the answer at that point.
There are quite a few other optimisations:
- W calculations that are constant over a full nonce range.
- Partial calculations of some of the W that are constant over a full nonce range.
- Quite a few parts of the early stages of the 2nd of the double sha256() are reduced to fixed constants also.
Edit: some of that may not be FPGA related but some of it certainly also is.
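For readers who want to see the "first 3 stages only once per nonce range" point concretely, here is a rough Python sketch. It is not anyone's miner code from this thread. It assumes the standard 80-byte block header layout, where the nonce is the last 4 bytes and so lands in word W[3] of the second 64-byte block. The function names (`precompute`, `double_sha_fast`, `reference`) and the test header bytes are made up for illustration. The second sha256() is left to hashlib, so the "last 3.5 stages" trick is not shown.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def first_primes(n):
    """First n primes by trial division (plenty fast for n = 64)."""
    found, candidate = [], 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def icbrt(x):
    """Floor of the integer cube root, by binary search."""
    lo, hi = 0, 1 << (x.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo

# SHA-256 constants derived from their definition (fractional bits of the
# square / cube roots of the first primes) instead of being typed in by hand.
PRIMES = first_primes(64)
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
K = [icbrt(p << 96) & MASK for p in PRIMES]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def schedule(block):
    """Expand a 64-byte block into the 64-word message schedule W."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    return w

def rounds(state, w, start, stop):
    """Run compression rounds start..stop-1 on the working variables a..h."""
    a, b, c, d, e, f, g, h = state
    for i in range(start, stop):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return (a, b, c, d, e, f, g, h)

# Padding for an 80-byte message: 0x80, zeros, then the bit length (640).
PAD = b"\x80" + b"\x00" * 39 + struct.pack(">Q", 640)

def precompute(header76):
    """Work done once per nonce range: midstate plus rounds 0..2 of block 2."""
    midstate = [(x + y) & MASK for x, y in
                zip(H0, rounds(tuple(H0), schedule(header76[:64]), 0, 64))]
    tail12 = header76[64:76]                 # W[0..2] of block 2, nonce-free
    early = rounds(tuple(midstate), struct.unpack(">3I", tail12), 0, 3)
    return midstate, tail12, early

def double_sha_fast(midstate, tail12, early, nonce):
    """Per-nonce work: resume block 2 at round 3, then hash the digest again."""
    w = schedule(tail12 + struct.pack("<I", nonce) + PAD)
    out = rounds(early, w, 3, 64)
    first = struct.pack(">8I", *[(x + y) & MASK for x, y in zip(midstate, out)])
    return hashlib.sha256(first).digest()

def reference(header76, nonce):
    """Plain double SHA-256 of the full 80-byte header, for comparison."""
    header = header76 + struct.pack("<I", nonce)
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()

header76 = bytes(range(76))                  # made-up bytes, not a real block
mid, tail, early = precompute(header76)
for nonce in (0, 1, 0xDEADBEEF):
    assert double_sha_fast(mid, tail, early, nonce) == reference(header76, nonce)
```

In this sketch the message schedule is still recomputed for every nonce. The W-related savings listed above would go further by also caching the parts of the schedule that do not depend on W[3].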
|
|
|
|
PulsedMedia
|
|
March 10, 2012, 02:09:12 AM |
|
Really cool work. From what I understand, this already offers around 30% more per cycle? That's simply awesome. If I were a miner with any significant scale and investment in FPGAs, I would definitely throw some BTC in your direction, especially if that meant I got unlimited access to the bitstream.
|
|
|
|
pieppiep
|
|
March 10, 2012, 02:15:24 AM |
|
If you put this on Kickstarter or sell it or whatever, how much do you want for it? Is it around $500, or more like $2,500, or even $50,000? Roughly how many hours did you spend?
|
|
|
|
2112
Legendary
Offline
Activity: 2128
Merit: 1073
|
|
March 10, 2012, 08:09:18 AM |
|
Number of DSP48A1s: 30 out of 180 16%
Aha! Interesting. When uncle Moshe (Gavrielov) gives you DSPs, make DSPeade.
|
|
|
|
BR0KK
|
|
March 10, 2012, 03:50:06 PM |
|
Is there a way to port it to ZTEX or other FPGA boards?
|
|
|
|
Inspector 2211
|
|
March 10, 2012, 04:19:36 PM |
|
Number of DSP48A1s: 30 out of 180 16%
Aha! Interesting. When uncle Moshe (Gavrielov) gives you DSPs, make DSPeade.
Thank you for providing an important puzzle piece on how Dr. Tyrell does it. The multiplier in the DSP48 block is not needed in SHA-256, hence what he obviously uses is the 18-bit adder BCOUT = B + D. He uses 30 DSP blocks, 10 per red / green / blue SHA-256 instance. For a 32-bit adder, two 18-bit adders BCOUT = B + D are needed. Thus, he can implement five 32-bit adders per SHA instance.
So, why use [slow] 32-bit ripple adders almost everywhere and [very fast] DSP adders in only a few places? The answer is, IMHO, that he uses the fast DSP adders only where they feed into longlines. Were he to use normal ripple adders where he feeds into longlines, the aggregate delay would limit the design to a 5 ns clock cycle. Using the fast DSP adders will allow this design, when properly fine-tuned, to march into 4 ns clock cycle territory, for approximately 125 MH/s per SHA-256 instance, or approximately 375 MH/s per Spartan6-150.
BFL Single, watch out below.
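The arithmetic in this guess can be sketched in a few lines of Python. This only models the numbers, not the actual design. The names are hypothetical. The "2 clocks per hash" figure is an assumption inferred from the 4 ns to 125 MH/s step and is not stated in the thread. How a carry would really be passed between two DSP48A1 pre-adders is not covered here either.

```python
DSP_USED = 30          # "Number of DSP48A1s: 30 out of 180"
SHA_INSTANCES = 3      # the red / green / blue instances
DSP_PER_ADDER32 = 2    # two 18-bit adders per 32-bit add, as guessed in the post

# 30 / 3 = 10 DSPs per instance, 10 / 2 = five 32-bit adders per instance
adders_per_instance = DSP_USED // SHA_INSTANCES // DSP_PER_ADDER32

def add32_split(a, b):
    """32-bit modular add done as two 16-bit halves with a carry in between.

    Each partial sum stays below 2**18, so it would fit an 18-bit adder.
    """
    lo = (a & 0xFFFF) + (b & 0xFFFF)           # at most 17 bits
    carry = lo >> 16
    hi = (a >> 16) + (b >> 16) + carry         # at most 17 bits
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def mhs_per_instance(clock_ns, clocks_per_hash=2):
    """MH/s for one instance; clocks_per_hash=2 is an inferred assumption."""
    return 1000.0 / clock_ns / clocks_per_hash

per_instance = mhs_per_instance(4)             # 4 ns clock
per_chip = SHA_INSTANCES * per_instance        # three instances per Spartan6-150
print(adders_per_instance, per_instance, per_chip)
```

With the same assumption, a 5 ns clock would give 100 MH/s per instance, so the step from 5 ns to 4 ns is worth about 25% more hashrate per chip.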
|
|
|
|
TheSeven
|
|
March 10, 2012, 04:48:22 PM |
|
BFL Single, watch out below.
Oh yeah! 750MH/s on X6500, at $550 bulk that's <0.74$/MH or >1.36MH/$. Wow! This can blow away GPUs! And probably LargeCoin as well...
|
|
|
|
bulanula
|
|
March 10, 2012, 04:58:50 PM |
|
BFL Single, watch out below.
What makes you think this cannot similarly be applied to the Single (even after a hardware modification)?
|
|
|
|
jamesg
VIP
Legendary
Offline
Activity: 1358
Merit: 1000
AKA: gigavps
|
|
March 10, 2012, 05:04:31 PM |
|
BFL Single, watch out below.
What makes you think this cannot similarly be applied to the single ( even after a hardware modification )
Bulanula, slow down. Please read his post more carefully. He is suggesting that $/MH is in competition with the BFL Single, and his math is pretty close. I am getting 830 MH/s for $600, or $0.72/MH, which is pretty darn close.
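The price-per-hashrate figures in the last few posts are easy to check. A tiny Python sketch follows, using only the numbers quoted in the thread (750 MH/s at $550 for the projected X6500, 830 MH/s at $600 for the BFL Single). The function name is made up.

```python
def dollars_per_mh(price_usd, mh_per_s):
    """Hardware cost per MH/s of hashrate."""
    return price_usd / mh_per_s

x6500 = dollars_per_mh(550, 750)    # projected X6500 figure from the thread
bfl = dollars_per_mh(600, 830)      # BFL Single figure from the thread

print(f"X6500: {x6500:.3f} $/MH, {1 / x6500:.3f} MH/$")
print(f"BFL:   {bfl:.3f} $/MH, {1 / bfl:.3f} MH/$")
```

On these numbers the two come out within about 1.5% of each other.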
|
|
|
|
Turbor
Legendary
Offline
Activity: 1022
Merit: 1000
BitMinter
|
|
March 10, 2012, 05:12:51 PM |
|
|
|
|
|
|