|
rph
|
 |
August 22, 2011, 02:32:23 AM Last edit: August 22, 2011, 03:26:12 AM by rph |
|
You want 3-input adders on 6 series Spartans, not 2-input. And yes, of course you can reduce the critical path to a single adder, but it requires an immense quantity of registers.
On S3E I've had slightly better results with 2-input (reaching 200MHz comfortably in -5). The SRL16s implement multi-stage delays really efficiently. My strategy is to optimize for area until it's clear that I can't possibly fit another engine, then optimize for speed until the device is full. -rph
|
|
|
|
rph
|
 |
August 22, 2011, 02:53:05 AM Last edit: August 22, 2011, 03:07:18 AM by rph |
|
For anyone who has run this design on the LX9 microboard, what sort of hashrate did you get? And how many slices were used (and at what unrolling level?).
200MHz, 5034 FF [44%], 3247 LUT6 [56%], 0 BRAM, 0 DSP48A1, 3.125MH/s in xc6slx9-2. It finishes 1 SHA256(SHA256(x)) every 64 clocks. With a few tricks it could probably fit 2 engines, for 6.25 MH/s total. Not exactly going to beat an ATI GPU, but it's a fun toy. -rph
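The hash rate quoted above follows directly from the clock frequency and the number of clocks per double hash. A minimal sketch of that arithmetic, where the function name and the `engines` parameter are illustrative and not taken from any posted design:

```python
def hashes_per_second(clock_hz, clocks_per_hash, engines=1):
    """Throughput of `engines` identical cores, each finishing one
    SHA256(SHA256(x)) every `clocks_per_hash` clock cycles."""
    return engines * clock_hz / clocks_per_hash


# Figures from the post: 200 MHz clock, one double hash every 64 clocks.
one_engine = hashes_per_second(200e6, 64)
two_engines = hashes_per_second(200e6, 64, engines=2)

print(f"1 engine:  {one_engine / 1e6:.3f} MH/s")
print(f"2 engines: {two_engines / 1e6:.3f} MH/s")
```

This assumes both engines would run at the same 200 MHz, which a fuller device may not sustain.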
|
|
|
|
inh
|
 |
August 23, 2011, 09:17:04 PM |
|
Could improvements on LUT usage (or routing, or both?) be made based on which pins were being used for IO? I'd imagine if the pin was physically far from where the end of the logic was, it would make routing harder. Shorter paths == happier routing? Just a thought.
|
|
|
|
fizzisist
|
 |
August 23, 2011, 09:23:01 PM |
|
I don't know if there's any interest in it, but I created a wiki at fpgamining.com. It might be a good place for gathering documentation about your development here. Feel free to make use of it!
|
|
|
|
|
Keninishna
|
 |
August 26, 2011, 06:06:55 AM |
|
Check cablesaurus.com; they have a pre-order going for FPGA miner boards that do 100-200 MH/s for cheaper than most dev FPGAs.
|
|
|
|
Silverpike
|
 |
August 26, 2011, 07:52:59 AM |
|
Check cablesaurus.com; they have a pre-order going for FPGA miner boards that do 100-200 MH/s for cheaper than most dev FPGAs.
That is very misleading. That is only a pre-order. These boards won't be available for at least a month or so yet, and the price IIRC is around $400.
|
|
|
|
makomk
|
 |
August 26, 2011, 09:39:41 AM |
|
That is very misleading. That is only a pre-order. These boards won't be available for at least a month or so yet, and the price IIRC is around $400.
Yeah, they're not actually that much cheaper than the equivalent full dev boards, and your ability to develop other stuff for them is very constrained. I'd suggest buying a DE0-nano if it wasn't for the fact that they've cheaped out on the power circuitry to the point it's very difficult to get reliable mining at any speed, let alone max the chip out. If they'd used more efficient power regulation circuitry it ought to top out at about 25 MHash/sec, probably even with just USB power, which isn't bad for the price.
|
Quad XC6SLX150 Board: 860 MHash/s or so. SIGS ABOUT BUTTERFLY LABS ARE PAID ADS
|
|
|
newMeat1
|
 |
August 26, 2011, 02:29:05 PM Last edit: August 26, 2011, 02:51:11 PM by newMeat1 |
|
Yeah, they're not actually that much cheaper than the equivalent full dev boards
I take exception to that! Especially if you get the dual FPGA board, it's quite a bit cheaper than any commercial dev board. I mean, you get two Spartan 6 LX150's for the same price as a dev board with just one. Even if you get the single FPGA board, it's about 2/3 the cost of any commercial dev board.
|
|
|
|
makomk
|
 |
August 26, 2011, 06:10:48 PM Last edit: August 26, 2011, 09:04:23 PM by makomk |
|
I take exception to that! Especially if you get the dual FPGA board, it's quite a bit cheaper than any commercial dev board. I mean, you get two Spartan 6 LX150's for the same price as a dev board with just one
I have to admit that the dual FPGA board is looking rather better price-wise.
Even if you get the single FPGA board, it's about 2/3 the cost of any commercial dev board
Presumably it has less flexibility too, though I'm not sure precisely what you're planning...
(Edit: On an unrelated note, #*!$ing Altera and their undocumented altsource_probe JTAG protocol...
Edit 2, hours later: Here, have an incredibly ugly Python hack to speak to FPGAs running fpgaminer's virtual_wire-based code over USB Blaster. It won't bite much.)
|
|
|
|
makomk
|
 |
August 27, 2011, 09:54:09 AM Last edit: August 27, 2011, 11:39:17 AM by makomk |
|
OK, I'm now mining on an Altera FPGA with poclbm. (Well, some heavily hacked-together Python code based on poclbm to be exact.) You can find the code in question on Git here, but it's a bit of a pain to use right now: you have to create a new BSDL directory, obtain the BSDL file for the FPGA you're using and copy it to the directory, edit the source to use the correct directory, and then run it and hope it finds the correct USB Blaster and works. Oh, and it needs UrJTAG installed.
In theory this means that you can now mine using fpgaminer's code for Altera FPGAs that communicates over JTAG without having any Altera software installed. (It also has long polling support, obviously.)
Edit: Now has a bsdl/ directory in the source tree to place the BSDL files in.
|
|
|
|
fpgaminer (OP)
|
 |
August 27, 2011, 06:02:33 PM |
|
My god makomk, how did you work out how to talk to altsource_probes? Debug the protocol? I will certainly be reading over that code today. Fantastic work.
I'll have to see if Altera provides something similar to BSCAN_SPARTAN6, though, which should make the whole thing a lot easier.
Only problem with the UrJTAG Python code is that pexpect is UNIX specific. There's a Windows port called wexpect I have been playing with. Its project on Google Code is inaccessible at the moment, for reasons unknown. The random wexpect.py file I found lying around some dusty corner of the internet works, but has a few bugs I had to work around.
It's a real shame. UrJTAG is a nice program. Perhaps it would be worthwhile to write a SWIG-based wrapper around all of its (apparently undocumented) C API.
|
|
|
|
makomk
|
 |
August 27, 2011, 06:39:47 PM Last edit: August 27, 2011, 07:17:41 PM by makomk |
|
My god makomk, how did you work out how to talk to altsource_probes? Debug the protocol? I will certainly be reading over that code today. Fantastic work.
Via somewhat dubious methods that I'm really hoping won't get me in trouble with Altera legal - you might want to hold off on that. It's actually an incredibly straightforward bit of functionality on top of the rather hairier - but documented - virtual JTAG layer. Thankfully someone else had already written code for talking to UrJTAG and enumerating virtual JTAG nodes, even if they did foolishly believe the documentation (which someone else had fortunately already discovered was wrong).
I'll have to see if Altera provides something similar to BSCAN_SPARTAN6, though, which should make the whole thing a lot easier.
I don't think they do. According to the virtual JTAG documentation, their approach is really complicated and involves duplicating the entire JTAG state machine in LUTs using code that we don't have access to at a low enough level. Edit: The closest equivalent is their virtual JTAG layer; you should be able to do pretty much all of the same things with that as you can with BSCAN_SPARTAN6, but talking to it involves most of the same complications as talking to altsource_probes. At least virtual JTAG is documented, I guess.
Only problem with the UrJTAG Python code is that pexpect is UNIX specific. There's a Windows port called wexpect I have been playing with. Its project on Google Code is inaccessible at the moment, for reasons unknown. The random wexpect.py file I found lying around some dusty corner of the internet works, but has a few bugs I had to work around.
It's a real shame. UrJTAG is a nice program. Perhaps it would be worthwhile to write a SWIG based wrapper around all of its (apparently undocumented) C API.
I didn't notice that issue...
|
|
|
|
inh
|
 |
August 27, 2011, 09:55:31 PM |
|
Mining through JTAG is pretty sweet. Would any of you mind breaking down exactly what data is fed to the FPGA for number crunching? I've seen the wiki that says what goes into a block header hash, but I still have no idea what work a pool/bitcoind instance hands out after a getwork request.
|
|
|
|
Silverpike
|
 |
August 27, 2011, 10:06:41 PM |
|
OK, I'm now mining on an Altera FPGA with poclbm. (Well, some heavily hacked-together Python code based on poclbm to be exact.) You can find the code in question on Git here but it's a bit of a pain to use right now; you have to create a new BSDL directory, obtain the BSDL file for the FPGA you're using and copy it to the directory, edit the source to use the correct directory, and then run it and hope it finds the correct USB Blaster and works. Oh, and it needs UrJTAG installed. In theory this means that you can now mine using fpgaminer's code for Altera FPGAs that communicates over JTAG without having any Altera software installed. (It also has long polling support obviously.) Edit: Now has a bsdl/ directory in the source tree to place the bsdl files in.
Uhh, why? I guess there is some appeal to not needing the FPGA software, but JTAG is ungodly slow. It's not an appropriate means of communication between a mining host and the FPGA at all. I don't think this idea is wise or useful (except to save the step of FPGA programming I guess).
|
|
|
|
fpgaminer (OP)
|
 |
August 27, 2011, 10:12:25 PM |
|
A getwork request returns a Target (256-bits), Hash1 (256-bits), Midstate (256-bits), and Data (1024-bits). A SHA-256 hash needs two pieces of information, a 256-bit starting state, and 512-bits of data. It returns the resulting 256-bit hash. Here are the steps for a Bitcoin Hash:

    data = highest 512-bits of Data
    for nonce in range(0, 2**32):
        4th 32-bit word of data = nonce
        hash = sha256(Midstate, data)
        hash = sha256(Sha256InitialState, Hash1 << 256 | hash)
        if hash <= Target:
            Send a result back to the pool server or bitcoind

Note that Sha256InitialState is a 256-bit number defined by the SHA-256 standard. For FPGA mining, we assume that Hash1 is always the same (which it is), and that Target is always 0x00000000_FFFFFFFF_..._FFFFFFFF (which it is, for pool mining). As you can see in the first line of the pseudo-code above, only the last 512-bits of Data are used for hashing in this algorithm. Also, everything after the 4th 32-bit word of data is always the same. So, after all is said and done, the FPGA only needs the 256-bit Midstate, and 96-bits of Data. It then returns any 32-bit nonces that give us a hash <= 0x00000000_FFFFFFFF_..._FFFFFFFF.
I've seen the wiki that says what goes in to a block header hash
Then I should note that the 1024-bit Data value returned by getwork is actually just the block header, padded to 1024-bits. It's padded using SHA-256's padding algorithm. Check the SHA-256 wiki page if you are interested.
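For anyone who wants to poke at the midstate idea on a PC, here is a rough Python sketch of the steps described above. It is not the FPGA code or any miner's actual implementation. The header is a made-up 80-byte value, all helper names are hypothetical, and it works on raw header bytes, so it ignores the word byte-swapping that real getwork data uses. The little-endian comparison in `meets_target` follows Bitcoin's convention of treating the hash as a little-endian 256-bit number.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def _primes(count):
    """First `count` primes by trial division."""
    found, n = [], 2
    while len(found) < count:
        if all(n % p for p in found):
            found.append(n)
        n += 1
    return found

def _icbrt(n):
    """Floor of the integer cube root, by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# SHA-256 constants: fractional bits of the square roots (initial state)
# and cube roots (round constants) of the first primes.
H0 = [math.isqrt(p << 64) & MASK for p in _primes(8)]
K = [_icbrt(p << 96) & MASK for p in _primes(64)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    """One SHA-256 compression: 256-bit state + 512-bit block -> new state."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def pad(message):
    """SHA-256 padding: 0x80, zero fill, then the 64-bit message length in bits."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + struct.pack(">Q", len(message) * 8)

def double_sha256_from_midstate(midstate, tail_block):
    """Finish the first hash from the midstate, then hash the 32-byte result."""
    first = struct.pack(">8I", *compress(midstate, tail_block))
    return struct.pack(">8I", *compress(H0, pad(first)))

def meets_target(digest, target):
    return int.from_bytes(digest, "little") <= target

header = bytes(range(80))             # made-up header, NOT a real block
padded = pad(header)                  # 128 bytes = 1024 bits
midstate = compress(H0, padded[:64])  # computed once per work unit
POOL_TARGET = (1 << 224) - 1          # 0x00000000_FFFFFFFF_..._FFFFFFFF

def tail_with_nonce(nonce):
    """Second 512-bit block with the nonce in its 4th 32-bit word."""
    return padded[64:76] + struct.pack("<I", nonce) + padded[80:]

for nonce in range(4):                # tiny range, just to show the loop
    digest = double_sha256_from_midstate(midstate, tail_with_nonce(nonce))
    full = hashlib.sha256(hashlib.sha256(
        header[:76] + struct.pack("<I", nonce)).digest()).digest()
    print(nonce, digest == full, meets_target(digest, POOL_TARGET))
```

The loop compares each result against a plain `hashlib` double hash of the full header, so the midstate shortcut can be checked without any hardware. With only four nonces tried, finding one that meets the pool target is extremely unlikely.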
|
|
|
|
makomk
|
 |
August 27, 2011, 11:05:21 PM |
|
Uhh, why? I guess there is some appeal to not needing the FPGA software, but JTAG is ungodly slow. It's not an appropriate means of communication between a mining host and the FPGA at all. I don't think this idea is wise nor useful (except to save the step of FPGA programming I guess).
Because it's what the existing code that fpgaminer wrote was using and it saves trying to hook up an extra cable of some kind for communication, basically. There are already methods of running miners without using the FPGA software if you don't want to use JTAG.
|
|
|
|
inh
|
 |
August 27, 2011, 11:28:22 PM |
|
Uhh, why? I guess there is some appeal to not needing the FPGA software, but JTAG is ungodly slow. It's not an appropriate means of communication between a mining host and the FPGA at all. I don't think this idea is wise nor useful (except to save the step of FPGA programming I guess).
Because it's what the existing code that fpgaminer wrote was using and it saves trying to hook up an extra cable of some kind for communication, basically. There are already methods of running miners without using the FPGA software if you don't want to use JTAG.
I'm no expert, but JTAG is plenty fast enough to deal with a getwork request every 21 seconds or so. There isn't a lot of data. fpgaminer, thank you for the explanation!
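A back-of-the-envelope check of the bandwidth point, as a sketch only. It uses the sizes given earlier in the thread (256-bit Midstate plus 96 bits of Data per work unit) and assumes a hash rate of roughly 200 MH/s, which is what the 21-second figure is consistent with. It counts payload bits only and ignores JTAG framing and polling for results.

```python
NONCE_SPACE = 2 ** 32   # nonces the FPGA can try per work unit
WORK_BITS = 256 + 96    # Midstate plus the 96 bits of Data, per the thread


def seconds_per_work_unit(hashes_per_second):
    """Time to exhaust the 32-bit nonce range at a given hash rate."""
    return NONCE_SPACE / hashes_per_second


def payload_bits_per_second(hashes_per_second):
    """Average host-to-FPGA payload rate needed to keep the miner fed."""
    return WORK_BITS / seconds_per_work_unit(hashes_per_second)


rate = 200e6  # assumed hash rate in hashes per second
print(f"new work every {seconds_per_work_unit(rate):.1f} s")
print(f"payload rate   {payload_bits_per_second(rate):.1f} bits/s")
```

Even at this hash rate the required payload works out to a few tens of bits per second at most, which supports the point that the amount of data is small.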
|
|
|
|
makomk
|
 |
August 28, 2011, 01:02:23 PM |
|
Only problem with the UrJTAG Python code is that pexpect is UNIX specific. There's a Windows port called wexpect I have been playing with. Its project on Google Code is inaccessible at the moment, for reasons unknown. The random wexpect.py file I found lying around some dusty corner of the internet works, but has a few bugs I had to work around.
Apparently there's something called winpexpect; I have no idea how well it works though.
It's a real shame. UrJTAG is a nice program. Perhaps it would be worthwhile to write a SWIG based wrapper around all of its (apparently undocumented) C API.
It looks like I might want to do that anyway; UrJTAG is using excessive CPU for me and I tracked it down to its readline support, which cannot be disabled at runtime.
|
|
|
|
|