Bitcoin Forum
Author Topic: Nanominer Announcement  (Read 11679 times)
phr33
April 10, 2012, 06:37:50 PM
 #61

I'm still green when it comes to implementing these brute force cores, but I'm picking up.

I was browsing through your code and one thing struck me: you are using the 256-bit "data" input both for setting the internal state of the first SHA round and as the end part of what's hashed in the first round.
In the Icarus and Open-Source-FPGA-Bitcoin-Miner code they have a separate 256-bit init state and 96 bits of "data" that is appended to the nonce.

I can't quite work out what those 96 bits are. Bit 64 to 127 of the header? Reversed or not, it doesn't make much sense to me. It should at least not be the same as the init of the hash round.

My BTC input: 1GAtPwoTGPQ35y9QugJueum5GzaEzLYjiQ
My GPG ID: B0CCFD4A
I HATE TABLES I HATE TABLES I HA(╯°□°)╯︵ ┻━┻ TABLES I HATE TABLES I HATE TABLES
wondermine (OP)
April 10, 2012, 06:57:51 PM
Last edit: April 11, 2012, 11:43:26 AM by wondermine
 #62

Sorry to do the double-post but here are some numbers from the Cyclone V series FPGAs:

5CGXBC7D6F31C7 (Grade 7): 215.84 219.93 MHz

I'm liking this device family already.  As always, more to come.

P.S. This design does not "take up too many IOs"; that claim assumes you don't use some sort of serialization, which is rather absurd.

P.P.S. When compiled for the Stratix III EP3SL100F1152C2, the fmax is reported as 229.52 MHz, if you were wondering.

*These values are sans optimizations... if anyone can tell me how to make Quartus not synthesize away multiple cores, please let me know, and then I can give you some numbers that more likely reflect reality. (Although there's probably a problem with the core.vhd I need to fix... I work way too much and I have an exam later... I need to leave this alone.)
wondermine (OP)
April 10, 2012, 07:00:06 PM
 #63

Quote
I'm still green when it comes to implementing these brute force cores, but I'm picking up.

I was browsing through your code and one thing struck me: you are using the 256-bit "data" input both for setting the internal state of the first SHA round and as the end part of what's hashed in the first round.
In the Icarus and Open-Source-FPGA-Bitcoin-Miner code they have a separate 256-bit init state and 96 bits of "data" that is appended to the nonce.

I can't quite work out what those 96 bits are. Bit 64 to 127 of the header? Reversed or not, it doesn't make much sense to me. It should at least not be the same as the init of the hash round.

Probably belongs in a SHA-256 thread, but I think what you're referring to has to do with precalculated (or precalculable) round values, and the initial value for those.
If there's a problem with the core.vhd, well, that's because it's a work in progress; it'll get worked out. And smaller... :)
lame.duck
April 10, 2012, 07:40:47 PM
 #64

Quote
P.S. This design does not "take up too many IOs"; that claim assumes you don't use some sort of serialization, which is rather absurd.

Well, maybe you could tell me what you would call producing an IP core with 288 IOs for a device that has only 167 (or so), soldered on a board with some SDRAM etc., ending up with 80 usable user IOs? By the way, you could enlighten me as to which microcontroller you plan to use that can write 256 bits at once. There is no bottleneck in using some sort of serialization at all. And even if there were, you could always reduce the bandwidth requirement by implementing roll-n-times in hardware.
phr33
April 10, 2012, 08:10:04 PM
 #65

Quote
P.S. This design does not "take up too many IOs"; that claim assumes you don't use some sort of serialization, which is rather absurd.

Well, maybe you could tell me what you would call producing an IP core with 288 IOs for a device that has only 167 (or so), soldered on a board with some SDRAM etc., ending up with 80 usable user IOs? By the way, you could enlighten me as to which microcontroller you plan to use that can write 256 bits at once. There is no bottleneck in using some sort of serialization at all. And even if there were, you could always reduce the bandwidth requirement by implementing roll-n-times in hardware.


The "control" entity is obviously not meant to be the top of the design. You would accompany it with some kind of interface. Check the open source miner project. There you have both an RS232 interface and one through Altera's "virtual wire".
As for bandwidth, you really don't need any. You just send a bunch of bytes (256-ish) to fire off a decent-sized job :)
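
To put rough numbers on the bandwidth point, here is a small Python sketch. The 44-byte job size (a 32-byte midstate plus 12 bytes of data), the 115200 baud serial link with 10 bits on the wire per byte, and the 16 MH/s hash rate are all assumptions picked for illustration; none of them come from the Nanominer code.

```python
NONCE_SPACE = 2 ** 32  # one job covers every value of the 32-bit nonce

def job_transfer_seconds(job_bytes=44, baud=115200, bits_per_byte=10):
    """Time to push one job over a serial link (assumed 8N1 framing)."""
    return job_bytes * bits_per_byte / baud

def job_runtime_seconds(hashrate_mhs=16.0):
    """Time for the miner to try every nonce of one job."""
    return NONCE_SPACE / (hashrate_mhs * 1e6)

transfer = job_transfer_seconds()
runtime = job_runtime_seconds()
print(f"transfer: {transfer * 1e3:.2f} ms, runtime: {runtime:.0f} s")
print(f"link busy {100 * transfer / runtime:.4f}% of the time")
```

Under these assumptions the link sits idle almost all of the time, which is why a slow serialized interface is not a bottleneck for this kind of job.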

Jason
April 11, 2012, 01:57:30 PM
Last edit: April 11, 2012, 04:07:22 PM by Jason
 #66

I just took a quick look over Nanominer's code to see what he's doing. He appears to be implementing SHA-256 a bit differently from the other approaches I've seen. In particular, he has not unrolled the hash, so it requires 64 clock cycles to complete each hash. This would be analogous to compiling fpgaminer's code with LOG_LOOP2=6. However, Nanominer then appears to be running 10 (configurable) of these cores in parallel with each other. With a clock rate of 200 MHz, this would lead to 200*10/(64*2), or about 16 MH/s by my calculations (since Bitcoin hashes require two SHA-256 hashes each).

Wondermine, I'm not sure how you are coming up with the higher numbers for hash rates.  You would need to fit 50 of these on a 115K LE Cyclone IV along with the associated control circuitry in order to reach the same ballpark as can be achieved with fpgaminer's code with Makomk's modifications.  Do you really expect to be able to fit significantly more than this on this FPGA?
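
The estimate above can be written out as a short Python sketch. The function name and parameter names are only illustrative; the figures (200 MHz, 10 or 50 cores, 64 cycles per SHA-256, two SHA-256 passes per Bitcoin hash) are the ones used in this post.

```python
def estimated_hashrate_mhs(clock_mhz, cores, cycles_per_sha256=64, sha256_per_hash=2):
    """Rough throughput of fully rolled cores: one SHA-256 pass per 64 clocks per core."""
    return clock_mhz * cores / (cycles_per_sha256 * sha256_per_hash)

ten_cores = estimated_hashrate_mhs(200, 10)
fifty_cores = estimated_hashrate_mhs(200, 50)
print(ten_cores)    # 15.625
print(fifty_cores)  # 78.125
```

This ignores control overhead and assumes every core starts a new hash on every 64th clock, so it is an upper bound for the architecture as described.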

Oh, and if you want to preserve logic so that the optimizer does not get rid of it, use the preserve_fanout_free_node option in the assignments editor on the pin(s) you want to preserve and then you should be able to see how much additional optimization the compiler is capable of.

BM-2D7sazxZugpTgqm3M2MCi5C1t8Du8BN11f
phr33
April 12, 2012, 07:12:36 PM
 #67

Quote
I'm still green when it comes to implementing these brute force cores, but I'm picking up.

I was browsing through your code and one thing struck me: you are using the 256-bit "data" input both for setting the internal state of the first SHA round and as the end part of what's hashed in the first round.
In the Icarus and Open-Source-FPGA-Bitcoin-Miner code they have a separate 256-bit init state and 96 bits of "data" that is appended to the nonce.

I can't quite work out what those 96 bits are. Bit 64 to 127 of the header? Reversed or not, it doesn't make much sense to me. It should at least not be the same as the init of the hash round.

Probably belongs in a SHA-256 thread, but I think what you're referring to has to do with precalculated (or precalculable) round values, and the initial value for those.
If there's a problem with the core.vhd, well, that's because it's a work in progress; it'll get worked out. And smaller... :)

I had a second look. I think I was right in the first place. There are 76 'static' bytes in the header before the nonce. The midstate is the internal state of the hash core after hashing the first 64 bytes. The remaining 12 bytes will be paired up with the 4-byte nonce and 48 bytes of padding (the header is 80 bytes, but will be padded up to a multiple of 64 bytes, the SHA-256 block size).

So you really do need both the 32-byte midstate and the 12-byte 'data'.
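
Here is a minimal Python sketch of that split. It assumes a plain big-endian SHA-256 over a placeholder 80-byte header and ignores the byte-swapping conventions that real mining software applies to work data. hashlib does not expose the internal state, so the compression function is written out by hand. All names are illustrative and not taken from any miner's code.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def first_primes(n):
    primes, c = [], 2
    while len(primes) < n:
        if all(c % p for p in primes):
            primes.append(c)
        c += 1
    return primes

def icbrt(n):
    """Integer cube root by binary search (exact, no float rounding)."""
    lo, hi = 0, 1 << 40
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
# Initial state: fractional bits of the square roots of the first 8 primes.
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
# Round constants: fractional bits of the cube roots of the first 64 primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    """One SHA-256 compression: 8-word state plus a 64-byte block gives a new state."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[i] + w[i]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, [a, b, c, d, e, f, g, h])]

header = bytes(range(80))            # placeholder header, not a real block
midstate = compress(H0, header[:64]) # state after the first 64 'static' bytes
data = header[64:76]                 # the remaining 12 static bytes
nonce = header[76:80]                # the 4 bytes the miner iterates over
padding = b"\x80" + b"\x00" * 39 + struct.pack(">Q", 80 * 8)  # 48 bytes
second_block = data + nonce + padding
digest = struct.pack(">8I", *compress(midstate, second_block))
print(digest == hashlib.sha256(header).digest())
```

Only the second block changes as the nonce changes, so the midstate can be computed once per job on the host and shipped to the FPGA together with the 12 data bytes.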

But as you said, that's all in the control logic and I understand it's under development :)
Nice work!

lame.duck
March 21, 2013, 10:03:10 PM
 #68

From the readme file:
Quote
I have also run at 170MHz (SPEED_MHZ=17) using a custom hardwired 1.2 volt core supply which
gave my maximum achived throughput of 28.3 MHash/s. Attempting to run at 180MHz gave bad hash
results, so this is the limit. Note that this draws 1.7amps which is outside the spec of the
DE0-Nano regulators, hence the custom power supply hack.

40 MHash/s is probably not possible, but the 28.3 should be. Another design by makomk is/was running at 27.5 MHz; maybe you could push it a little further using the timing headroom of the chips and/or higher voltages and more cooling effort.
kramble
Sr. Member
****
Offline Offline

Activity: 384
Merit: 250



View Profile WWW
March 31, 2013, 05:52:46 PM
Last edit: March 31, 2013, 06:47:12 PM by kramble
 #69

Quote
40 MHash/s is probably not possible, but the 28.3 should be. Another design by makomk is/was running at 27.5 MHz; maybe you could push it a little further using the timing headroom of the chips and/or higher voltages and more cooling effort.

I agree. I was really pushing it running at that speed (it was a bit of a dare to see if I could match Makomk's results), and you have to be very careful with the power and cooling (it certainly won't work just using the USB supply, and would be foolish to try).

I'm not really that much of an expert with the Quartus software, so by tweaking the compiler settings it may be possible to better this (the fmax for this 170MHz build was around 150MHz, so it should not really have worked at all). One thing to check is the PLL multiply/divide ratios, as it's using 50MHz * 17 / 5, which may not be an optimal way to configure it. I did try running at 180MHz but got no hashes at all, and decided that was probably the best I could manage, so I called it a day at that.

EDIT: I just realized this is a year-old thread, and only peripherally to do with the DE0-Nano. Perhaps Ersch would like to PM wondermine to see if he actually tested it at 40MH/s (he seems to still be active on the board, just about).

Github https://github.com/kramble BLC BkRaMaRkw3NeyzsZ2zUgXsNLogVVkQ1iPV