pdki
June 10, 2011, 08:08:35 PM
I think a real ASIC implementation of SHA-256 should easily be able to outrun GPUs by at least a factor of 100, because of better space efficiency and the simplicity of the logic involved.
Consider that:
- you can have these manufactured for ~2M€ and then get 1000s of these chips,
- they will not consume much power,
- they can be put on cheap boards, because no heavy I/O is needed (graphics cards are expensive due to the heavy I/O with RAM).
I am sure this will happen if Bitcoin really establishes itself as a currency and the USD exchange rate stays in the $10 range. If not, this would be a cheap option for aggressors like governments to take over the network, much easier and cheaper than trying to shut it down by law.
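The ~2M€ one-time figure only pays off when it is spread over many chips. A rough sketch of the per-chip cost, assuming a flat one-time (NRE) cost plus a per-chip production cost; `unit_cost` is a placeholder input, not a figure from the post:

```python
def cost_per_chip(nre: float, chips: int, unit_cost: float) -> float:
    """Amortized cost of one chip: the one-time NRE cost spread over the
    production run, plus a (hypothetical) per-chip production cost."""
    if chips <= 0:
        raise ValueError("chips must be positive")
    return nre / chips + unit_cost


if __name__ == "__main__":
    # ~2M€ one-time cost; unit cost left at 0 to isolate the NRE share per chip
    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7} chips: {cost_per_chip(2_000_000, n, 0.0):8.2f} € of NRE per chip")
```

At a few thousand chips the NRE share dominates whatever the silicon itself costs, so volume is what makes the idea work.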
vx609e
June 11, 2011, 03:31:13 PM (last edit: June 11, 2011, 07:41:07 PM by vx609e)
Hi,
IMO, an ASIC implementation is the way to go. We already have decent RTL (those who contributed to this know who they are, and I thank you guys for this). With small modifications to the current RTL, we could easily daisy-chain many "cores" (the easiest implementation given the current state of the project is a token ring over UART; we only need to assign a specific address to each core).
Let's say each manufactured chip would yield 100 MHash/s. We daisy-chain 20 per board (a board with 20 chips on it is not a big deal). That's 2 GHash/s right there. PCB design and manufacturing would be pretty straightforward. I volunteer for that.
The big question: how do we finance an ASIC project? And even more importantly: how do we get it done?
1) Outsource the FPGA2ASIC flow to http://www.icnexus.com.tw/product.php?id=25 (the first company I found; there must be many others). Get chips ASAP and limit the risks. With this forum, I'm sure we could get a small EE team together and do all the Synopsys, BIST, test scan, pad design, routing, etc. crap ourselves, but there are specialists out there that will do it for us, and the chances of success will be much higher with that approach. Being a 100% digital chip (+ regulator and PLL obviously), the project couldn't be easier for these guys (or whatever company gets the contract), not to mention they are already in the business of FPGA2ASIC conversion.
2) Crowdfunding with kickstarter.com: if we can get 500 people to pre-order one 2 GHash/s board at $1000 apiece (a truly good deal IMO), we get a $500k budget to do #1. We need 10,000 chips. I think the budget makes sense if we spend $250k on design, $100k on chips ($10 apiece), $50k for tape-out (might be included in the design cost; we need to check with the contractor), $10k on PCBs and assembly, and the rest for overhead. Once we get a real quote from the contractor, we can adjust the cost per board. All I'm putting here are ballpark figures to show the potential of this approach.
So far in my career all I've done is deal with PCB, FPGA and ASIC designs, and this project seems very realistic to me. But maybe I'm daydreaming; please bring me back to earth if I'm doing so. Feedback, suggestions and comments very welcome.
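The ballpark budget in vx609e's post can be checked with a few lines of arithmetic. This sketch only re-adds the figures given in the post (500 pre-orders at $1000, 20 chips per board at an assumed 100 MHash/s each, $10 per chip); none of them are real contractor quotes:

```python
# Figures from vx609e's post; all are ballpark assumptions, not quotes.
PREORDERS = 500
PRICE_PER_BOARD = 1_000      # USD
CHIPS_PER_BOARD = 20
MHASH_PER_CHIP = 100         # assumed per-chip hash rate
CHIP_PRICE = 10              # USD per chip

budget = PREORDERS * PRICE_PER_BOARD
chips_needed = PREORDERS * CHIPS_PER_BOARD
spend = {
    "design": 250_000,
    "chips": chips_needed * CHIP_PRICE,
    "tape-out": 50_000,
    "PCBs and assembly": 10_000,
}
overhead = budget - sum(spend.values())          # whatever is left over
board_ghash = CHIPS_PER_BOARD * MHASH_PER_CHIP / 1_000

if __name__ == "__main__":
    print(f"budget: ${budget:,}, chips: {chips_needed:,}")
    print(f"left for overhead: ${overhead:,}")
    print(f"per board: {board_ghash} GHash/s, ${PRICE_PER_BOARD / board_ghash:.0f} per GHash/s")
```

If the one-time costs come in above the $250k + $50k assumed here, the $90k left for overhead is the first thing that has to absorb it.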
FlappySocks
June 11, 2011, 06:46:28 PM
Unless someone can come up with a working prototype, I would say outsource it. A company looking for work might do it at a very cost-effective price if we can come up with the basic design, software, and cash. They can make their profit on repeat sales and improved designs.
LazarusLong
June 11, 2011, 07:42:45 PM
I now have a bitfile for the Atlys board (Spartan-6 LX45) with depth:=2 and 50 MHz.
The only problem is that miner.py refuses to communicate over the serial port. It detects the core, but when it starts "Measuring FPGA performance..." it produces a timeout: "Timed out waiting for FPGA to accept work".
@TheSeven: any idea how to debug or solve the problem? Is the miner.py code working for all depths and frequencies?
Quote from TheSeven: You'll need to adjust the pin locations for clk_in, rx and tx in the UCF file, and adjust the clock divider for the serial port for the 50MHz frequency. Replace "10000010001" with "0110110010" and "11000011001" with "01010001011" in uart.vhd. And I should probably publish the new version of my miner; it now supports multiple pools, long polling, etc.
TheSeven, can you give some lines on how to calculate the dividers? Any formula?
makomk
June 11, 2011, 08:37:49 PM
Hmmmm. It looks like Xilinx's synthesis tools aren't very good at inferring the right meaning from various constructs this uses. I'm now seeing if I can convince them to interpret it in more efficient ways, but to be honest it looks like they may just be slightly rubbish. It ought to be possible to cram a full hashing pipeline into an XC6SLX75 in theory, but the current code isn't even getting close.
|
Quad XC6SLX150 Board: 860 MHash/s or so. SIGS ABOUT BUTTERFLY LABS ARE PAID ADS
comboy
June 11, 2011, 08:40:18 PM
I don't feel like doing this since it's not mine, but it would be really cool to put the Xilinx implementation on GitHub too, so that people can fork it for different boards and optimizations and easily contribute improvements with pull requests. If, of course, they want to share their changes.
|
Variance is a bitch!
Bloody Bell
June 11, 2011, 09:10:03 PM
Quote from vx609e: Let's say each manufactured chip would yield 100 MHash/s.
I am pretty sure they can do much more. If a single mid-range FPGA can house an entire pipeline and get 50 MH/s, any ASIC must be able to outperform that by at least a factor of ten.
Quote from vx609e: ...there are specialists out there that will do it for us, and the chances of success will be much higher with that approach.
Considering that the design is very simple, and we don't need to push any limits (as we can simply use more chips instead), the manufacturer's team could probably do it relatively cheaply; it's mostly an automated process anyway.
Quote from vx609e: 2) Crowdfunding with kickstarter.com: if we can get 500 people to pre-order one 2 GHash/s board at $1000 apiece (a truly good deal IMO), we get a $500k budget to do #1. We need 10,000 chips. I think the budget makes sense if we spend $250k on design, $100k on chips ($10 apiece), $50k for tape-out (might be included in the design cost; we need to check with the contractor), $10k on PCBs and assembly, and the rest for overhead. Once we get a real quote from the contractor, we can adjust the cost per board. All I'm putting here are ballpark figures to show the potential of this approach.
I think the one-time costs are higher, unless we go for some structured ASIC, which indeed can be done from a few hundred thousand dollars. But the problem with structured ASICs is very similar to FPGAs: we have to pay for all the unused stuff (memory blocks, hardware multipliers, etc.) that we don't need, which increases the price and lowers performance. And the project's return would still be threatened by others who start making real ASICs.
I am also not sure that hundreds of people would commit the necessary amount. Buying a video card is a much lower risk, as it can be sold anytime and has other uses.
Quote from vx609e: So far in my career all I've done is deal with PCB, FPGA and ASIC designs, and this project seems very realistic to me. But maybe I'm daydreaming; please bring me back to earth if I'm doing so.
I have only worked with FPGAs, but I don't think you are daydreaming.
btw, does anyone know why the "Will fund ASIC board for mining community. Need Hardware devs." topic has been closed?
OrphanedGland
June 11, 2011, 09:27:06 PM
Just coded a fully unrolled SHA-256 in VHDL using two different approaches to maximize clock rate: a simple approach that involves precalculating H + K + W, and a more advanced approach that further pipelines each stage. Initial compiles targeted Cyclone IV using Quartus Web Edition (which sucks), with the simple version achieving 110 MHz and the advanced version 133 MHz. I'll be interested to see the maximum clock rate that can be achieved on Stratix IV.
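The "precalculate H + K + W" trick works because the value entering the h register in round t+1 is just the g register of round t, so h + K[t+1] + W[t+1] can be summed one round early, taking an adder off the round's critical path. Below is a minimal software sketch of that reordering (not OrphanedGland's VHDL, which isn't posted), checked against Python's hashlib; the helper names are illustrative:

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def _primes(n):
    """First n primes by trial division (fine for small n)."""
    found, k = [], 2
    while len(found) < n:
        if all(k % p for p in found):
            found.append(k)
        k += 1
    return found


def _icbrt(n):
    """Integer cube root (floor) via Newton's method from an overestimate."""
    x = 1 << ((n.bit_length() + 2) // 3)
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y


# SHA-256 constants: fractional bits of square roots (IV) and cube roots (K) of primes.
IV = [math.isqrt(p << 64) & MASK for p in _primes(8)]
K = [_icbrt(p << 96) & MASK for p in _primes(64)]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """One SHA-256 compression, with h + K[t] + W[t] formed a round ahead."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    hkw = (h + K[0] + w[0]) & MASK            # precomputed for round 0
    for t in range(64):
        s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (s1 + ch + hkw) & MASK           # h itself is never added here
        s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (s0 + maj) & MASK
        if t < 63:
            # next round's h is this round's g, so its sum can be formed now
            hkw = (g + K[t + 1] + w[t + 1]) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(msg: bytes) -> bytes:
    bit_len = len(msg) * 8
    msg += b"\x80" + b"\x00" * ((55 - len(msg)) % 64) + struct.pack(">Q", bit_len)
    state = IV[:]
    for i in range(0, len(msg), 64):
        state = compress(state, msg[i:i + 64])
    return struct.pack(">8I", *state)


if __name__ == "__main__":
    print(sha256(b"abc").hex() == hashlib.sha256(b"abc").hexdigest())
```

In hardware the same reordering means each pipeline stage only needs a three-input sum for T1 instead of four, which is presumably where the extra clock rate comes from.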
fpgaminer (OP)
June 11, 2011, 11:10:43 PM
Quote from makomk: Hmmmm. It looks like Xilinx's synthesis tools aren't very good at inferring the right meaning from various constructs this uses. I'm now seeing if I can convince them to interpret it in more efficient ways, but to be honest it looks like they may just be slightly rubbish. It ought to be possible to cram a full hashing pipeline into an XC6SLX75 in theory, but the current code isn't even getting close.
Yeah, I'm still fighting with ISE. Using SmartXplorer I got ISE to place and route (P&R) a fully unrolled mining core into my 6SLX150 chip at 50 MHz, with ~50% resource consumption all around, which is good. Two problems: A) results returned by the running core were erratic, and B) the chip ran very hot. Slow progress, but progress nonetheless.
TheSeven
June 12, 2011, 12:13:56 AM (last edit: June 12, 2011, 12:32:48 AM by TheSeven)
Quote from vx609e: Hi, IMO, an ASIC implementation is the way to go. We already have decent RTL (those who contributed to this know who they are, and I thank you guys for this). With small modifications to the current RTL, we could easily daisy-chain many "cores" (the easiest implementation given the current state of the project is a token ring over UART; we only need to assign a specific address to each core).
I fully agree that ASIC is the long-term way to go, but this UART token ring thing seems to be rubbish to me. There are well-suited protocols for this, for example I²C. There are two possibilities:
- Build a PCIe mining accelerator card, with some PCIe to I²C (or whatever) bridge, possibly on a CPLD.
- Slap an ARM SoC and an Ethernet adapter on the board as well and make it run autonomously.
Quote from vx609e: Let's say each manufactured chip would yield 100 MHash/s. We daisy-chain 20 per board (a board with 20 chips on it is not a big deal). That's 2 GHash/s right there. PCB design and manufacturing would be pretty straightforward. I volunteer for that.
Good to know, as I have never dealt with this area before. Could you provide an estimate for the non-ASIC cost? (PCB design, prototyping, manufacturing and assembly, voltage regulators, clock generation, ...)
Quote from vx609e: The big question: how do we finance an ASIC project? And even more importantly: how do we get it done? 1) Outsource the FPGA2ASIC flow to http://www.icnexus.com.tw/product.php?id=25 (the first company I found; there must be many others). Get chips ASAP and limit the risks. With this forum, I'm sure we could get a small EE team together and do all the Synopsys, BIST, test scan, pad design, routing, etc. crap ourselves, but there are specialists out there that will do it for us, and the chances of success will be much higher with that approach. Being a 100% digital chip (+ regulator and PLL obviously), the project couldn't be easier for these guys (or whatever company gets the contract), not to mention they are already in the business of FPGA2ASIC conversion.
I've heard rumors that Altera would be doing their HardCopy process for as low as $150K for 1000 chips, which seems very low to me. No idea whether that's true, though. We might want to request a quote.
Quote from vx609e: Let's say each manufactured chip would yield 100 MHash/s.
Quote from Bloody Bell: I am pretty sure they can do much more. If a single mid-range FPGA can house an entire pipeline and get 50 MH/s, any ASIC must be able to outperform that by at least a factor of ten.
I'd expect the chips to run at 200-300 MHz, and one of my co-workers said that he tried synthesizing my VHDL design for the HardCopy process, and that 20 of those would fit on a single chip. That's 4-6 GH/s per chip.
Quote from Bloody Bell: I am also not sure that hundreds of people would commit the necessary amount. Buying a video card is a much lower risk, as it can be sold anytime and has other uses.
I fully agree on this point; this will probably be the biggest problem, and it sadly wouldn't be an issue for certain governments...
Quote from Bloody Bell: btw, does anyone know why the "Will fund ASIC board for mining community. Need Hardware devs." topic has been closed?
Link to that: http://forum.bitcoin.org/index.php?topic=14910.0
Quote from OrphanedGland: Just coded a fully unrolled SHA-256 in VHDL using two different approaches to maximize clock rate: a simple approach that involves precalculating H + K + W, and a more advanced approach that further pipelines each stage. Initial compiles targeted Cyclone IV using Quartus Web Edition (which sucks), with the simple version achieving 110 MHz and the advanced version 133 MHz. I'll be interested to see the maximum clock rate that can be achieved on Stratix IV.
How big was the 133 MHz design? (How many KLEs?) Could you share this design?
Quote from LazarusLong: I now have a bitfile for the Atlys board (Spartan-6 LX45) with depth:=2 and 50 MHz. The only problem is that miner.py refuses to communicate over the serial port. It detects the core, but when it starts "Measuring FPGA performance..." it produces a timeout: "Timed out waiting for FPGA to accept work". @TheSeven: any idea how to debug or solve the problem? Is the miner.py code working for all depths and frequencies?
Quote from TheSeven: You'll need to adjust the pin locations for clk_in, rx and tx in the UCF file, and adjust the clock divider for the serial port for the 50MHz frequency. Replace "10000010001" with "0110110010" and "11000011001" with "01010001011" in uart.vhd. And I should probably publish the new version of my miner; it now supports multiple pools, long polling, etc.
Quote from LazarusLong: TheSeven, can you give some lines on how to calculate the dividers? Any formula?
The first one is (clock frequency / 115200), the second one is ((clock frequency / 115200) * 1.5).
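TheSeven's formula can be turned into the binary constants directly. A sketch assuming truncation to whole clock cycles and 11-bit constants (the width of the originals, "10000010001" and "11000011001"); at 50 MHz it yields 434 and 651, the values in the post, although the post writes the first one with 10 digits ("0110110010"). Taking 1.5 times the already-truncated bit period is a choice that also reproduces the original constants (1041 and 1561). The 1.5-bit count is presumably how long the receiver waits after the start-bit edge to sample the middle of the first data bit:

```python
def uart_dividers(clock_hz, baud=115200):
    """Clock cycles per UART bit, and per 1.5 bits (the delay from the
    start-bit edge to the middle of the first data bit). Truncation assumed."""
    bit = clock_hz // baud
    return bit, bit * 3 // 2


def to_bits(value, width=11):
    """Zero-padded binary string like the constants in uart.vhd (width assumed)."""
    if value >= 1 << width:
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value, f"0{width}b")


if __name__ == "__main__":
    bit, bit_and_half = uart_dividers(50_000_000)
    print(bit, to_bits(bit))                    # 434 00110110010
    print(bit_and_half, to_bits(bit_and_half))  # 651 01010001011
```

For a different board clock, pass its frequency in Hz; if a result no longer fits in 11 bits, the counter width in the VHDL would have to grow as well.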
|
My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
makomk
June 12, 2011, 03:51:59 PM (last edit: June 12, 2011, 05:35:13 PM by makomk)
(Edit: Altera Quartus II claims FMax = 90.16 MHz on the EP4CE115 for the xilinx-shiftreg branch, with one or two tweaks to the build config that may not be necessary. It took over 3 hours to build and used pretty much all of the FPGA, though, so it's not terribly useful; you would be better off adding another mining core.)
Been messing around some more with fpgaminer's code. Users of largish Altera FPGAs might want to try this branch, which skips the last 3 rounds in the fully-unrolled version and allows optimisations based on the fact that part of the data is constant. Xilinx users can additionally uncomment "`define USE_RAM_FOR_KS" and combine this with teknohog's serial miner, though this may not work too well. (There's also the xilinx-shiftreg branch, which only works for fully-unrolled miners.)
Note that I don't have an actual FPGA to test any of this on, so be sure to double-check the thermal results to make sure you're not going to damage your expensive hardware, and make sure it's actually submitting blocks successfully. Also, these are more size improvements than speed improvements, and most people who could benefit have probably got their own better version already.
Brief explanation: with the original code, which is what you get with USE_RAM_FOR_KS disabled, Xilinx's xst was doing something daft involving shift registers for K[s]. Without the xilinx-shiftreg changes, it also failed to use shift registers for W where it kinda made sense to do so; unfortunately, with the changes, Altera's Quartus tools no longer find the shift registers.
Quote from TheSeven: I fully agree that ASIC is the long-term way to go, but this UART token ring thing seems to be rubbish to me. There are well-suited protocols for this, for example I²C. There are two possibilities:
- Build a PCIe mining accelerator card, with some PCIe to I²C (or whatever) bridge, possibly on a CPLD.
- Slap an ARM SoC and an Ethernet adapter on the board as well and make it run autonomously.
If you're doing HardCopy-style structured ASICs, in theory you could put a fastish 32-bit processor and Ethernet MAC on the ASIC itself. It'd probably only take up a smallish proportion of the chip, and you'd just need boot flash and Ethernet PHY chips externally. Not sure how much sense this would make, though.
Quote from TheSeven: How big was the 133MHz design? (How many KLEs?) Could you share this design?
Unfortunately, OrphanedGland and a lot of the other posters in this thread can't reply to it anymore because they're too new. The forum admins have blocked users with a small number of previous posts from posting anywhere except the Newbie forum. Try this thread.
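Why skipping the last 3 rounds of the fully unrolled pipeline is safe can be shown in software. A sketch, assuming (as in Bitcoin's difficulty-1 share check) that only the final 32-bit word H7 of the second SHA-256 matters: the h register after round 64 is the e register from three rounds earlier, so H7 is already known after 61 rounds. This is a reconstruction for illustration, not makomk's Verilog; `h7_after_61_rounds`, `second_block` and `is_share` are hypothetical names, and the result is checked against hashlib:

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def _primes(n):
    found, k = [], 2
    while len(found) < n:
        if all(k % p for p in found):
            found.append(k)
        k += 1
    return found


def _icbrt(n):
    """Integer cube root (floor) via Newton's method from an overestimate."""
    x = 1 << ((n.bit_length() + 2) // 3)
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y


IV = [math.isqrt(p << 64) & MASK for p in _primes(8)]
K = [_icbrt(p << 96) & MASK for p in _primes(64)]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def h7_after_61_rounds(block):
    """Final word H7 of a one-block SHA-256 from the standard IV,
    computed with only the first 61 of the 64 rounds."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 61):  # W[61..63] are never needed
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = IV
    for t in range(61):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    # The 3 skipped rounds would only shift this e through f and g into h.
    return (IV[7] + e) & MASK


def second_block(first_digest):
    """Pad a 32-byte digest into the single block hashed by the second SHA-256."""
    return first_digest + b"\x80" + b"\x00" * 23 + struct.pack(">Q", 256)


def is_share(header):
    """Difficulty-1 check: the top 32 bits of the little-endian hash, i.e. H7, are zero."""
    return h7_after_61_rounds(second_block(hashlib.sha256(header).digest())) == 0


if __name__ == "__main__":
    header = bytes(range(80))  # arbitrary stand-in for an 80-byte block header
    first = hashlib.sha256(header).digest()
    full = hashlib.sha256(first).digest()
    print(h7_after_61_rounds(second_block(first)) == int.from_bytes(full[28:], "big"))
```

The "part of the data is constant" optimisation is visible here too: in `second_block`, everything after the first 32 bytes is fixed, so the corresponding W words are constants a synthesizer can fold away.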
|
fpgaminer (OP)
June 12, 2011, 07:52:48 PM
June 12th, 2011 - Xilinx and VHDL Ports Added
With many thanks to TheSeven and teknohog, their code has been added to the public repo. TheSeven did a re-implementation in VHDL, with support for Xilinx and ISE. teknohog did a straight port of the Verilog code to support Xilinx and ISE. Both include Python miner control scripts and serial port communication with the FPGA board. I made little to no modification to their code for this first commit.
If you appreciate their hard work on this Open Source project, please send them your thanks and donations!
TheSeven: 14Jc8vWq1mPv7vWnP5VquZZgpLEtzW2vja
teknohog: 1HkL2iLLQe3KJuNCgKPc8ViZs83NJyyQDM
Some notes on the current state of this project
As it stands now, the project makes lots of references to the DE2-115 board and to it being the preferred mining platform. Obviously this isn't the case, nor is it meant to be an advertisement. It was simply the first device supported, and the one that currently has a binary release available. In the near future, I will merge full Xilinx compatibility changes into the main Verilog code and try to steer the project towards supporting many devices and boards in a Plug-and-Mine fashion (like the DE2-115 currently is), or at least Compile-Plug-and-Mine.
Also, the directory structure in the repo is not optimal, but that will improve with time as I settle on a structure that fits the project's many needs (multiple code variations, and multiple device-specific implementations).
I have many other promising patches to merge, including a few of my own, so keep watching this thread!
Pixie
June 12, 2011, 08:20:34 PM
Quote from makomk: If you're doing HardCopy-style structured ASICs, in theory you could put a fastish 32-bit processor and Ethernet MAC on the ASIC itself. It'd probably only take up a smallish proportion of the chip, and you'd just need boot flash and Ethernet PHY chips externally. Not sure how much sense this would make, though.
Depending on expense, I'd look at adding a small XMOS processor to the board. Being in essence a transputer, it was designed to talk to other chips like ASICs and other XMOS parts. The ASIC is left doing its special magic, the XMOS handles everything else (including block submits etc.), and it can easily be connected up into massive rigs as required.
njloof
June 12, 2011, 08:31:42 PM
Quote from fpgaminer: I made little to no modification to their code for this first commit. If you appreciate their hard work on this Open Source project, please send them your thanks and donations!
TheSeven: 14Jc8vWq1mPv7vWnP5VquZZgpLEtzW2vja
teknohog: 1HkL2iLLQe3KJuNCgKPc8ViZs83NJyyQDM
fpgaminer: 1NT4RyJMqtRuDRr6zHdXdKSpmX3SR5he6z
Thanks to the three of you for your work to date! Plonk, plonk, plonk.
makomk
June 12, 2011, 09:12:05 PM (last edit: June 12, 2011, 09:34:58 PM by makomk)
Quote from Pixie: Depending on expense, I'd look at adding a small XMOS processor to the board. Being in essence a transputer, it was designed to talk to other chips like ASICs and other XMOS parts. The ASIC is left doing its special magic, the XMOS handles everything else (including block submits etc.), and it can easily be connected up into massive rigs as required.
They're kinda neat, fairly powerful, and the chips aren't massively expensive either. I actually have an XMOS XK-1 board here flashing an LED at me accusingly. The caveats are that USB is a bit tricky, requiring external components and using up nearly all available I/O ports on that core (so you really need a more expensive dual-core chip for that); the chip has some slightly interesting power sequencing and reset requirements; it has no internal Flash, so you need a separate SPI Flash chip for firmware; and it has no internal driver for a crystal oscillator. (Edit: oh, and you're limited to 64 kilobytes of RAM per core, and your code needs to fit into that too.) An ARM microcontroller with integrated Ethernet MAC and USB might work out better. Of course, there'll be some tricky board design and manufacture anyway for the ASIC, so that may not necessarily be a huge obstacle.
Edit 2: Also, a totally untested 90 MHz / 90 MHash/s bitstream for the DE2-115 is now in a branch on my git repo. PowerPlay estimates 4.4 W of heat, I think? Anyway, don't blame me if you blow up your expensive board. There's a reason I'm not including instructions here.
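Several throughput figures in this thread (the 90 MHz / 90 MHash/s bitstream above, TheSeven's 200-300 MHz × 20 cores estimate for a HardCopy chip) follow from one rule: a fully unrolled pipeline finishes one hash per clock, so hash rate is clock × cores. A small sketch of that rule plus energy efficiency; the `clocks_per_hash` parameter for partially rolled cores is an illustrative assumption, and the 4.4 W figure is PowerPlay's estimate, which TheSeven thinks is far too low:

```python
def hashrate(clock_hz: float, cores: int = 1, clocks_per_hash: int = 1) -> float:
    """Hashes per second: a fully unrolled core (clocks_per_hash=1) finishes
    one nonce per clock; a core that reuses its rounds needs more clocks per hash."""
    return clock_hz * cores / clocks_per_hash


def mhash_per_joule(hashes_per_s: float, watts: float) -> float:
    """Energy efficiency in MHash per joule (same as MHash/s per watt)."""
    return hashes_per_s / watts / 1e6


if __name__ == "__main__":
    de2 = hashrate(90e6)                      # the untested DE2-115 bitstream
    print(f"DE2-115: {de2 / 1e6:.0f} MHash/s, {mhash_per_joule(de2, 4.4):.1f} MHash/J")
    for f in (200e6, 300e6):                  # HardCopy estimate, 20 cores per chip
        print(f"{f / 1e6:.0f} MHz x 20 cores: {hashrate(f, cores=20) / 1e9:.0f} GHash/s per chip")
```

If the real power draw is higher than PowerPlay's estimate, the MHash/J figure drops in proportion.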
|
TheSeven
June 13, 2011, 12:08:37 AM
Quote from makomk: If you're doing HardCopy-style structured ASICs, in theory you could put a fastish 32-bit processor and Ethernet MAC on the ASIC itself. It'd probably only take up a smallish proportion of the chip, and you'd just need boot flash and Ethernet PHY chips externally. Not sure how much sense this would make, though.
Quote from Pixie: Depending on expense, I'd look at adding a small XMOS processor to the board. Being in essence a transputer, it was designed to talk to other chips like ASICs and other XMOS parts. The ASIC is left doing its special magic, the XMOS handles everything else (including block submits etc.), and it can easily be connected up into massive rigs as required.
I don't like this idea too much. I'd keep the ASIC/FPGA interface as simple as possible, but I'd prefer a bus for easy scalability. I²C springs to mind. This way you can keep the ASIC simple and not waste precious space, not waste control processor circuitry when chaining multiple ASICs, and use the same ASIC for both PCIe-based accelerator cards and standalone mining boards. The latter would get a simple ARM SoC with updatable firmware (based on Linux?) and an Ethernet interface.
Quote from makomk: Edit 2: Also, a totally untested 90 MHz / 90 MHash/s bitstream for the DE2-115 is now in a branch on my git repo. PowerPlay estimates 4.4 W of heat, I think? Anyway, don't blame me if you blow up your expensive board. There's a reason I'm not including instructions here.
I'm fairly certain that this estimate is way too low.
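TheSeven's preference for a shared, addressed bus (I²C) over a UART token ring can be illustrated with a toy model: every chip hangs off the same bus with its own address, the controller writes the same work to each address, and each chip gets a disjoint slice of the 32-bit nonce space. This is a pure-Python simulation; `SimChip`, `SimBus`, `dispatch`, the address range and the idea of sending a nonce range per chip are assumptions for illustration, not a real register map or driver:

```python
from dataclasses import dataclass

NONCE_SPACE = 1 << 32  # the block header nonce is a 32-bit field


@dataclass
class SimChip:
    """Hypothetical mining ASIC with a 7-bit bus address (simulation only)."""
    address: int
    work: bytes = b""
    nonce_start: int = 0
    nonce_end: int = 0  # exclusive


class SimBus:
    """Toy shared bus: each write names a target address, and only the chip
    with that address latches the data, as on I2C."""

    def __init__(self, chips):
        self.chips = {}
        for chip in chips:
            # 0x00-0x07 and 0x78-0x7F are reserved in 7-bit I2C addressing
            if not 0x08 <= chip.address <= 0x77:
                raise ValueError(f"{chip.address:#04x} is not a usable 7-bit address")
            if chip.address in self.chips:
                raise ValueError(f"duplicate address {chip.address:#04x}")
            self.chips[chip.address] = chip

    def write(self, address, work, start, end):
        chip = self.chips.get(address)
        if chip is None:
            raise OSError(f"no ACK from address {address:#04x}")
        chip.work, chip.nonce_start, chip.nonce_end = work, start, end


def dispatch(bus, work):
    """Send the same work to every chip, each with a disjoint nonce slice."""
    addrs = sorted(bus.chips)
    n = len(addrs)
    for i, addr in enumerate(addrs):
        bus.write(addr, work, i * NONCE_SPACE // n, (i + 1) * NONCE_SPACE // n)


if __name__ == "__main__":
    bus = SimBus([SimChip(a) for a in range(0x10, 0x10 + 20)])  # 20 chips on one board
    dispatch(bus, bytes(76))  # placeholder: an 80-byte header minus its 4-byte nonce
    for chip in list(bus.chips.values())[:3]:
        print(hex(chip.address), chip.nonce_start, chip.nonce_end)
```

Adding a chip here is just another address on the same bus, and a dead chip only fails its own writes, whereas in a simple token ring a broken link stops everything downstream of it.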
|
Capitan
June 13, 2011, 02:10:42 AM
Maybe a stupid question, but why can't existing crypto hardware acceleration chips be used? Why does a new one have to be built?
saardrimer
June 13, 2011, 02:30:22 AM
Great project(s) and discussion here.
Quote from fpgaminer: Also, the directory structure in the repo is not optimal, but that will improve with time as I settle on a structure that fits the project's many needs (multiple code variations, and multiple device-specific implementations).
I've written a document that proposes a directory structure for FPGA designs that is version-control friendly and scalable: https://www.boldport.com/docs/fpgaproj
Also, my (new) project http://www.boldport.com generates build environments for FPGA projects. It uses Makefiles instead of the IDE GUIs and maintains the structure defined in the document above. I'm happy to help and contribute to any FPGA project that wants to use the structure and/or the "boldport flow". (Any feedback on the structure, and on boldport, would be greatly appreciated!)
cheers,
saar.
http://www.saardrimer.com
eturnerx
June 13, 2011, 05:56:12 AM
Quote from Capitan: Maybe a stupid question, but why can't existing crypto hardware acceleration chips be used? Why does a new one have to be built?
Do you have a suggestion? The only ICs I could find that do SHA-256 are kinda slow compared with the throughput needed for mining. Sure, you could chain a ton of them (they're cheap in lots of 1000), but the control logic surrounding them could get expensive. The custom crypto-smashing machines so far have been FPGA or structured-ASIC based. I think the one that cracked the RC4 challenge was like $250K worth or something.
Rodyland
June 13, 2011, 11:41:17 AM
Quote from Bloody Bell: btw, does anyone know why the "Will fund ASIC board for mining community. Need Hardware devs." topic has been closed?
It was a fat-finger error by a mod; the thread is now unlocked.
|
Beware the weak hands! 1NcL6Mjm4qeiYYi2rpoCtQopPrH4PyKfUC GPG ID: E3AA41E3
|
|