|
TheSeven
|
 |
July 03, 2011, 01:19:28 PM |
|
Damn, that's just about the worst case. There are only two means of accessing this FPGA:
- Ethernet (not exactly trivial on the FPGA side, but in theory this board could mine standalone!)
- A built-in USB Blaster (USB-to-JTAG bridge), which cannot easily be accessed from anything other than the Altera tools.
|
My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
|
|
|
lame.duck
Legendary
Offline
Activity: 1270
Merit: 1000
|
 |
July 03, 2011, 01:58:00 PM |
|
Damn, that's just about the worst case. There are only two means of accessing this FPGA: - Ethernet (not exactly trivial on the FPGA side, but in theory this board could mine standalone!) - A built-in USB Blaster (USB-to-JTAG bridge), which cannot easily be accessed from anything other than the Altera tools.
Hm, there is an FPGA communication package on OpenCores that 'speaks' UDP and could be used. Of course there can be some packets missing, since UDP has no provision for packet loss (I would use a sequence number and send each nonce multiple times)... The other way would be the JTAG solution: there is an Advanced Debug System on OpenCores that uses JTAG. As far as I understand the docs, it does not require Altera software, but I think it needs some more effort to get it running. I did not dig into it, as I am a fan of the serial solution. On the other hand, I am still looking into moving the JTAG communication to an ARM system so I could power off my PC at night (given there are no compilation tasks pending).
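For illustration, the sequence-number idea could look roughly like the Python sketch below. This is only an assumption of how such a scheme might work on the host side: the packet layout (a 32-bit sequence number followed by a 32-bit nonce), the repeat count, and all names are hypothetical and are not taken from the OpenCores package or from the miner code.

```python
import struct

# Hypothetical wire format: 32-bit sequence number + 32-bit nonce, network byte order.
PACKET_FORMAT = "!II"
REPEATS = 3  # hypothetical redundancy factor: each result is sent this many times


def encode_result(seq, nonce):
    """Pack one found nonce together with its sequence number."""
    return struct.pack(PACKET_FORMAT, seq, nonce)


def datagrams_for(seq, nonce, repeats=REPEATS):
    """Return the identical datagrams the sender would emit for one result."""
    return [encode_result(seq, nonce)] * repeats


class NonceReceiver:
    """Accepts datagrams in any order and reports each sequence number once."""

    def __init__(self):
        # Grows without bound in this sketch; a real receiver would prune it
        # and would also have to handle sequence number wrap-around.
        self.seen = set()

    def feed(self, datagram):
        """Return the nonce if this sequence number is new, otherwise None."""
        if len(datagram) != struct.calcsize(PACKET_FORMAT):
            return None  # malformed packet, ignore it
        seq, nonce = struct.unpack(PACKET_FORMAT, datagram)
        if seq in self.seen:
            return None  # duplicate caused by the redundant sends
        self.seen.add(seq)
        return nonce
```

With three copies per result, a nonce is only lost if all three datagrams are dropped, and the receiver still hands each nonce to the miner software exactly once.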
|
|
|
|
TheSeven
|
 |
July 03, 2011, 02:44:54 PM |
|
Damn, that's just about the worst case. There are only two means of accessing this FPGA: - Ethernet (not exactly trivial on the FPGA side, but in theory this board could mine standalone!) - A built-in USB Blaster (USB-to-JTAG bridge), which cannot easily be accessed from anything other than the Altera tools.
Hm, there is an FPGA communication package on OpenCores that 'speaks' UDP and could be used. Of course there can be some packets missing, since UDP has no provision for packet loss (I would use a sequence number and send each nonce multiple times)... The other way would be the JTAG solution: there is an Advanced Debug System on OpenCores that uses JTAG. As far as I understand the docs, it does not require Altera software, but I think it needs some more effort to get it running. I did not dig into it, as I am a fan of the serial solution. On the other hand, I am still looking into moving the JTAG communication to an ARM system so I could power off my PC at night (given there are no compilation tasks pending).
The problem with the JTAG solution is that you need to have a way to access the JTAG interface in the first place. And IIRC communicating with those blasters is a PITA. UDP might make more sense, but I've never dealt with that so far. Hooking up some level shifters to one of those gigabit transceivers and driving RS232 on them might be another possibility.
|
|
|
|
vx609e
Newbie
Offline
Activity: 29
Merit: 0
|
 |
July 03, 2011, 11:54:07 PM |
|
UDP might make more sense, but I've never dealt with that so far.
According to this tutorial, it looks like easy stuff: http://www.fpga4fun.com/10BASE-T2.html
|
|
|
|
TheSeven
|
 |
July 04, 2011, 08:27:25 AM |
|
Hm, this assumes that you're directly driving the pins of the ethernet port. On this board, we have an ethernet PHY chip located in between (which should make things easier), and probably have an ethernet MAC block on the FPGA (which should make things easier as well, but we'll first need to figure out how to actually use it). I'm wondering if there is any ethernet FPGA design for this board floating around on the net?
|
|
|
|
|
Dolphin
Newbie
Offline
Activity: 10
Merit: 0
|
 |
July 05, 2011, 07:56:51 AM |
|
Hi there! I've just gotten out of the newbie area, so finally I can post here too. First I want to thank all of you for the great effort you have put into making this FPGA solution for bitcoin mining.
I'm new to bitcoin, but I have already calculated and understood that I won't be able to make much money/bitcoins by buying a handful of GPUs. Electricity costs too much for me and the risk is high that difficulty keeps on growing. So I am looking for alternative/new ways of mining. I'm also an electrical engineer and really interested in FPGA development. But I still have much to learn, so I will take the opportunity to look at this project.
So far I have managed to synthesize the Xilinx VHDL port (I think I have to thank TheSeven for that) for my tiny Avnet Spartan3A board. http://shop.trenz-electronic.de/catalog/product_info.php?products_id=456 The board carries a Xilinx Spartan3A FPGA, 400K gates, speed grade -4. I had to "modify" the board, since the original TI voltage regulators somehow burnt through and I wasn't able to get replacement parts of the same type. So the current voltage regulator for the core voltage is an LM350 (3A version).
I had great problems getting this project up and running. At first ISE wasn't able to map the design onto my Spartan, or the serial communication was not working (was the FPGA working at all?). At that time I think I used the Verilog port for Xilinx. The next problem was to adjust the clock rate to fit my board. I had to reduce the clock rate to 64 MHz. In the end I had problems getting the Python miner running on Windows.
Here are my statistics for this project, for anyone who is interested in them:
- ISE version used: 12.3
- FPGA type: xc3s400a, ft256, -4
- Clock rate: 69.34 MHz (I consider this overclocked, since it definitely massively violates timing constraints... 64 MHz looked way more stable)
- Performance: "Measuring FPGA performance... 2.167228 MH/s"
- pyfpgaminer successfully submitted some shares (so the FPGA seems to calculate correctly)
- Synthesis results:
Logic Utilization | Used | Available | Utilization
Number of Slice Flip Flops | 3,948 | 7,168 | 55%
Number of occupied Slices | 3,435 | 3,584 | 95%
Total Number of 4 input LUTs | 5,477 | 7,168 | 76%
Average Fanout of Non-Clock Nets | 2.56 | |
- Path delay example: 23.088ns (11.918ns logic, 11.170ns route) (51.6% logic, 48.4% route)
- Empirical temperatures:
  - FPGA: almost cold
  - Voltage regulator: ... let's see ... OUCH ... it's hot!
- ESD protection: still intact on FPGA after empirical temperature tests...
Next steps for me:
- Fully understand the design, the SHA256 algorithm and the miner code
- Try to optimize some bits
- Get a damn big and cheap FPGA from somewhere for further testing
- I want those 100+ MHash/s
- Report back with new results
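As a quick sanity check on the figures above, the reported clock rate and hash rate can be divided to see how many clock cycles the design spends per hash. This is only arithmetic on the posted numbers, and it assumes the 2.167228 MH/s measurement was taken at the 69.34 MHz setting, which the post does not state explicitly. The function name is made up for this sketch.

```python
def cycles_per_hash(clock_mhz, rate_mhs):
    """Clock cycles spent per hash, given clock (MHz) and hash rate (MH/s)."""
    return clock_mhz / rate_mhs


# Figures as reported in the post; pairing them is an assumption.
ratio = cycles_per_hash(69.34, 2.167228)
print(f"about {ratio:.1f} clock cycles per hash")
```

The ratio comes out very close to 32, which would be consistent with a partially rolled-up design that needs about 32 cycles per hash, although the post itself does not say which configuration was synthesized.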
|
|
|
|
themike5000
Member

Offline
Activity: 99
Merit: 10
|
 |
July 05, 2011, 12:31:15 PM |
|
Is there any way to run the fpgaminer program on a pc without a quartus license?
|
Vertcoin: VdHjU3L2dcHCR3uQmqpM6mf4LCvp2678wh
|
|
|
lame.duck
|
 |
July 05, 2011, 01:24:38 PM |
|
Is there any way to run the fpgaminer program on a pc without a quartus license?
No, at present you would need at least the Quartus Web Edition. You could use the JTAG access method with Quartus, but this requires some scripting, which is quite complicated to set up if you haven't done that before. The serial solution is much easier to go with.
|
|
|
|
themike5000
|
 |
July 05, 2011, 01:38:11 PM |
|
Is there any way to run the fpgaminer program on a pc without a quartus license?
No, at present you would need at least the Quartus Web Edition. You could use the JTAG access method with Quartus, but this requires some scripting, which is quite complicated to set up if you haven't done that before. The serial solution is much easier to go with.
I downloaded the Web Edition, hoping that I could still get it to work without a license. No dice.
|
|
|
|
LazarusLong
Newbie
Offline
Activity: 16
Merit: 0
|
 |
July 09, 2011, 01:31:20 PM |
|
Hi guys, I have finished the first version of my C port of TheSeven's pyminer, fpga-cminer 0.1: http://pastebin.com/JRPsnpeJ
It is not as advanced yet, but it works on systems without Python. My next step is an I2C interface for multiple boards with small hashing power, as I have plenty of them. b.r. Lazarus
|
|
|
|
JoelKatz
Legendary
Offline
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
|
 |
July 11, 2011, 11:38:26 PM |
|
This may be a dumb question, but ...
Does the FPGA bitcoin miner use an assembly-line approach? That is, while you're doing SHA round 5 on nonce 1, are you doing SHA round 4 on nonce 2, round 3 on nonce 3, round 2 on nonce 4, round 1 on nonce 5 and preparing nonce 6?
|
I am an employee of Ripple. Follow me on Twitter @JoelKatz 1Joe1Katzci1rFcsr9HH7SLuHVnDy2aihZ BM-NBM3FRExVJSJJamV9ccgyWvQfratUHgN
|
|
|
fpgaminer (OP)
|
 |
July 12, 2011, 01:59:26 AM |
|
This may be a dumb question, but ...
Does the FPGA bitcoin miner use an assembly-line approach? That is, while you're doing SHA round 5 on nonce 1, are you doing SHA round 4 on nonce 2, round 3 on nonce 3, round 2 on nonce 4, round 1 on nonce 5 and preparing nonce 6?
Great question! Yes, that is what it does when it is fully unrolled. You can reduce the amount of unrolling which, by your analogy, would be like shrinking the number of workers on the assembly line and making the conveyor belt of this SHA computing factory move slower. Taken to the extreme, there could be just one factory worker, banging away on a single nonce for 128 cycles before moving on to the next nonce. Rolling it up, reducing the number of workers, helps save resource consumption on the FPGA and allows it to fit on a smaller FPGA. But since the conveyor is moving slower, performance is reduced by the same amount.
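To make the assembly-line picture concrete, here is a toy Python model of a fully unrolled pipeline. It is a simplified sketch, not the actual HDL design: each stage stands in for one round, a new nonce enters every clock cycle, and the function and variable names are invented for this illustration.

```python
def simulate_pipeline(stages, cycles):
    """Toy model: one new nonce enters per clock, each stage holds one nonce.

    Returns (finished, in_flight): the nonces that have left the last stage,
    and the nonces currently sitting in stages 0..stages-1.
    """
    pipeline = [None] * stages  # pipeline[i] is the nonce being worked on in stage i
    finished = []
    next_nonce = 0
    for _ in range(cycles):
        # Whatever sat in the last stage during the previous cycle is now done.
        if pipeline[-1] is not None:
            finished.append(pipeline[-1])
        # Everything moves one stage down the line and a fresh nonce enters.
        pipeline = [next_nonce] + pipeline[:-1]
        next_nonce += 1
    return finished, pipeline


finished, in_flight = simulate_pipeline(stages=5, cycles=8)
print("finished:", finished)
print("in flight (stage 0 first):", in_flight)
```

After the initial fill latency of `stages` cycles, one nonce completes per cycle, which is why the stage count affects latency but not throughput when fully unrolled.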
|
|
|
|
newMeat1
|
 |
July 12, 2011, 02:17:08 AM |
|
So is there any overhead loss when you "reduce the number of workers?" Or is it linear? half the workers, half the hash rate?
|
|
|
|
JoelKatz
|
 |
July 12, 2011, 02:59:14 AM |
|
Great question! Yes, that is what it does when it is fully unrolled. You can reduce the amount of unrolling which, by your analogy, would be like shrinking the number of workers on the assembly line and making the conveyor belt of this SHA computing factory move slower. Taken to the extreme, there could be just one factory worker, banging away on a single nonce for 128 cycles before moving on to the next nonce.
Awesome. One more question -- when fully unrolled, how many clock cycles per nonce tested?
|
|
|
|
fpgaminer (OP)
|
 |
July 12, 2011, 05:02:14 AM |
|
So is there any overhead loss when you "reduce the number of workers?" Or is it linear? half the workers, half the hash rate?
There is overhead, because when fully unrolled it can take advantage of several optimizations specific to each round of SHA-256 with respect to Bitcoin. When you roll it up, you lose all those optimizations; each round calculator/worker needs to be generalized. As in the assembly line example, if each worker only does one specific thing, they can be very efficient at that one thing. But if they need to do different things at different times, then they lose that specialization advantage.
Awesome. One more question -- when fully unrolled, how many clock cycles per nonce tested?
One cycle per nonce, fully unrolled. So 100MHz = 100MHash/s. And in case it isn't clear, that is a full Bitcoin Hash; two passes of SHA-256 every clock cycle.
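The relationship between unrolling and hash rate can be written down as a small idealized model. It takes the 128 cycles quoted earlier for a single worker (two SHA-256 passes of 64 rounds each) and assumes the number of workers divides 128 evenly. It deliberately ignores the overhead described above, which shows up as extra logic and a possibly lower achievable clock rather than as extra cycles. The names are hypothetical.

```python
TOTAL_ROUNDS = 128  # two SHA-256 passes of 64 rounds each


def hashes_per_second(clock_hz, workers):
    """Idealized hash rate for a design with `workers` round calculators."""
    if workers < 1 or TOTAL_ROUNDS % workers != 0:
        raise ValueError("workers must evenly divide the total round count")
    cycles_per_nonce = TOTAL_ROUNDS // workers
    return clock_hz / cycles_per_nonce


for workers in (128, 64, 4, 1):
    rate = hashes_per_second(100e6, workers)
    print(f"{workers:3d} workers at 100 MHz: {rate / 1e6:.3f} MH/s")
```

Fully unrolled (128 workers) this gives one hash per clock, and a single worker gives 1/128 of that, so in this simplified model halving the workers halves the hash rate; the real design loses a bit more because of the generalization overhead.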
|
|
|
|
JoelKatz
|
 |
July 12, 2011, 11:48:10 PM |
|
One cycle per nonce, fully unrolled. So 100MHz = 100MHash/s. And in case it isn't clear, that is a full Bitcoin Hash; two passes of SHA-256 every clock cycle.
Wow. That's quite impressive. Now we need to make an ASIC that runs at 1GHz with 12 fully-unrolled miners on it. Then we need to put four of them on a card.
|
|
|
|
kokjo
Legendary
Offline
Activity: 1050
Merit: 1000
You are WRONG!
|
 |
July 13, 2011, 09:48:27 AM |
|
One cycle per nonce, fully unrolled. So 100MHz = 100MHash/s. And in case it isn't clear, that is a full Bitcoin Hash; two passes of SHA-256 every clock cycle.
Wow. That's quite impressive. Now we need to make an ASIC that runs at 1GHz with 12 fully-unrolled miners on it. Then we need to put four of them on a card.
hmm. if we did that there would be a bottleneck. there would be a need to call getwork every 1/3 s. for every chip, right?
|
"The whole problem with the world is that fools and fanatics are always so certain of themselves and wiser people so full of doubts." -Bertrand Russell
|
|
|
JoelKatz
|
 |
July 13, 2011, 09:53:48 AM |
|
hmm. if we did that there would be a bottleneck. there would be a need to call getwork every 1/3 s. for every chip, right?
I load-tested my getwork optimization patches with a script that does 1,000 getwork queries. The script takes about 1/10th of a second. You need one work unit per 2^32 hashes. We can easily generate 10,000 work units a second without even doing any serious optimization. That would sustain 43THash/s on my lowly Core 2 Quad.
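The numbers in this exchange can be checked with a few lines of Python. The sketch below uses only figures from the posts (2^32 hashes per work unit, 12 miners at 1 GHz per chip, 10,000 work units per second); the function names are made up for this example.

```python
NONCES_PER_WORK_UNIT = 2**32  # one getwork covers the full 32-bit nonce range


def seconds_per_work_unit(hash_rate):
    """How long a device hashing at `hash_rate` (H/s) takes to exhaust one work unit."""
    return NONCES_PER_WORK_UNIT / hash_rate


def sustainable_hash_rate(work_units_per_second):
    """Total hash rate (H/s) that a given getwork rate can keep fed."""
    return work_units_per_second * NONCES_PER_WORK_UNIT


chip_rate = 12 * 1e9  # 12 fully unrolled miners at 1 GHz, one hash per clock each
print(f"one chip needs new work every {seconds_per_work_unit(chip_rate):.3f} s")
print(f"10,000 work units/s sustain {sustainable_hash_rate(10_000) / 1e12:.1f} TH/s")
```

This gives roughly 0.36 s per work unit for one such chip, matching the "every 1/3 s" estimate, and about 43 TH/s for 10,000 work units per second.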
|
|
|
|
|