Bitcoin Forum
Author Topic: FPGA development board "Icarus" - DisContinued/ important announcement  (Read 207282 times)
Glasswalker
March 09, 2012, 07:30:58 PM
 #641

Even if you could do it without horribly bad things happening ;) I doubt you would get any benefit from it.

The reason is that the limiting factor right now isn't the clock speed of the chips but the delay of the critical path. If the critical path takes 5 ns, the maximum attainable clock is 200 MHz. If it takes 10 ns, it's 100 MHz, and so on. By clocking on both the rising and the falling edge, you do work twice per clock, so the critical path has to fit into half a clock period. A path that takes 10 ns then needs a 20 ns clock period, which means the same 5 ns critical path that capped you at 200 MHz before now caps you at 100 MHz (or lower, due to other inefficiencies). So you will AT BEST break even on performance, and more likely make it worse.
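
To put numbers on that argument, here is a minimal Python sketch. The function names are made up for illustration, and it assumes an idealized design with no setup time, clock skew or duty-cycle distortion, which would only make the dual-edge case worse.

```python
def max_clock_mhz(critical_path_ns, edges_per_cycle=1):
    """Highest clock (MHz) at which the critical path still settles in time.

    With one active edge per cycle the path gets the whole period; with two
    active edges it only gets half of it, so the period must be twice as long.
    """
    min_period_ns = critical_path_ns * edges_per_cycle
    return 1000.0 / min_period_ns


def updates_per_us(critical_path_ns, edges_per_cycle=1):
    """Register updates per microsecond: clock in MHz times active edges per cycle."""
    return max_clock_mhz(critical_path_ns, edges_per_cycle) * edges_per_cycle


for path_ns in (5.0, 10.0):
    print(
        f"{path_ns} ns path: single-edge {max_clock_mhz(path_ns, 1):.0f} MHz, "
        f"dual-edge {max_clock_mhz(path_ns, 2):.0f} MHz, "
        f"updates/us {updates_per_us(path_ns, 1):.0f} vs {updates_per_us(path_ns, 2):.0f}"
    )
```

In this idealized model the number of register updates per microsecond is identical in both cases, which is the "at best break even" result.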

BattleDrome: Blockchain based Gladiator Combat for fun and profit!
http://www.battledrome.io/
Inspector 2211
March 09, 2012, 07:34:41 PM
 #642

I would be grateful if someone with good FPGA programming experience answers this question:

Is it possible to make use of both clock edges to improve the mining speed?

For example: replacing always@(posedge CLK) by always@(posedge CLK or negedge CLK)

Zhang has told me that this would lead to a disaster! I am still wondering if it's possible to use a double-edged clock design at a lower MHz ("100->133")!

The top frequency for this fully unrolled and cascaded double SHA-256 is determined by one of the following two constraints:
- Longlines from one stage to the next (as EldenTyrell has pointed out, longlines are used in the middle of both 64-stage SHA blocks, because the FPGA is simply not wide enough)
- The 32-bit-wide ripple-carry adders. I think there are 6 or 7 of them per stage, two of them in series.
  Thus if, say, one of these adders took 2.3 ns (guessing), then two of them in series would take 4.6 ns, and that's that.

(That said: I don't know which of these two constraints is the slowest one, and I'd be glad if someone could point that out and specify both of them in nanoseconds.)

So, just because you clock the flip-flops on both clock edges and thus halve the clock, your mining is not going to get any faster.

Energizer
March 09, 2012, 08:22:32 PM
 #643

Even if you could do it without horribly bad things happening ;) I doubt you would get any benefit from it.

The reason is that the limiting factor right now isn't the clock speed of the chips but the delay of the critical path. If the critical path takes 5 ns, the maximum attainable clock is 200 MHz. If it takes 10 ns, it's 100 MHz, and so on. By clocking on both the rising and the falling edge, you do work twice per clock, so the critical path has to fit into half a clock period. A path that takes 10 ns then needs a 20 ns clock period, which means the same 5 ns critical path that capped you at 200 MHz before now caps you at 100 MHz (or lower, due to other inefficiencies). So you will AT BEST break even on performance, and more likely make it worse.

Thank you for your detailed answer!
TheSeven
March 09, 2012, 09:06:48 PM
 #644

In Verilog, a generate block with a for loop in it is synthesized out into multiple blocks of logic (think of it as a fast way to instantiate chunks of logic multiple times over).

So when he's copying data from the registers in S[i-1] to the current registers, you're right, he's moving it from the previous pipeline stage to the current pipeline stage. But that for loop instantiates as many stages in the pipe as STAGES says (so 64 by default). That's the full 64-stage SHA pipeline. Each individual block within a stage doesn't seem to be split further.

At least that's what I got out of his method by reading the code, and it's how I've built mine ;)

That entirely depends on your definition of the word "stage", which seems to match my definition of the word "round" in this case.
So in your terminology, each stage has a latency of two clock cycles.
In my terminology, a pipeline stage has a latency of one clock cycle by definition, so one "iteration" of that generate loop actually produces two (chained) pipeline stages. Then 64 of those stage pairs are chained together to form a single SHA-256 core, and two of those cores are chained together to form a full bitcoin hasher core. There's usually some more control logic around it which adds a couple of cycles of latency, so we end up at around 260 clock cycles of total latency.
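
The latency figure can be checked with a few lines of Python. This is only arithmetic on the numbers given above; CONTROL_CYCLES is an assumed placeholder for "a couple of cycles", and the 200 MHz clock is simply the bitstream speed discussed in this thread.

```python
ROUNDS_PER_SHA256 = 64  # iterations of the generate loop
STAGES_PER_ROUND = 2    # two chained pipeline stages per iteration
SHA256_CORES = 2        # two SHA-256 cores chained per bitcoin hasher
CONTROL_CYCLES = 4      # assumption: stand-in for "a couple of cycles"

pipeline_cycles = ROUNDS_PER_SHA256 * STAGES_PER_ROUND * SHA256_CORES
total_cycles = pipeline_cycles + CONTROL_CYCLES


def latency_us(cycles, clock_mhz):
    """Time in microseconds for a work item to traverse the pipeline."""
    return cycles / clock_mhz


print(pipeline_cycles, total_cycles)    # 256 260
print(latency_us(total_cycles, 200.0))  # 1.3
```

Note that this is latency only. A fully pipelined core still accepts a new nonce every clock, so throughput is unaffected by the pipeline depth.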

My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
Glasswalker
March 10, 2012, 01:26:31 AM
 #645

Yeah, it could just be a terminology thing ;) lol

So you're saying you see 2 clock cycles within the generate loop? I had misinterpreted that, then; I thought each iteration of the loop was only one clock cycle.

Can you elaborate on which point in the loop is which clock cycle? (Looking at it again I'm still only seeing one clock cycle lol, so either I'm badly misreading or I'm just missing something.)

Thanks!

Energizer
March 10, 2012, 02:07:19 AM
 #646

Today I've uploaded the 200 MHz bitstream to 6 boards. 4 boards got a high percentage of invalid shares, and 2 boards had a normal invalid rate (one board got 0% and the other 0.1%). I have the 6 boards divided into two towers of 3 boards each. I found out that the 2 boards that got 0% and 0.1% invalids are the top ones! (Room temperature: ~22C.)

If anyone is planning to upgrade to the 200 MHz bitstream without side fans, it is better not to stack the boards on top of each other in a tower, and make sure there is enough space between the boards!

I will re-test everything soon with side fans!
TheSeven
March 10, 2012, 02:12:06 AM
 #647

Yeah, it could just be a terminology thing ;) lol

So you're saying you see 2 clock cycles within the generate loop? I had misinterpreted that, then; I thought each iteration of the loop was only one clock cycle.

Can you elaborate on which point in the loop is which clock cycle? (Looking at it again I'm still only seeing one clock cycle lol, so either I'm badly misreading or I'm just missing something.)

Thanks!

Just look at https://bitcointalk.org/index.php?topic=51371.msg792660#msg792660 again.
If you've never done hardware design before, this might be a bit confusing, but everything that's written in that HDL file happens in parallel, not sequentially as it would in most programming languages.
In the loop, the previous state and data are copied to state_buf/data_buf. In parallel, the (old) contents of state_buf/data_buf are used to calculate the next state/data value.
Because of this, it takes two clock cycles for the values from S[i-1].state to propagate (through state_buf) to state. The generate loop basically just duplicates that code 64 times, but has no effect on "execution order", if there even is such a thing in HDL.
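
For readers more used to software, the two-cycle behaviour can be mimicked in Python. This is a rough model, not the linked HDL: round_logic is a hypothetical stand-in for the real SHA-256 round (it just counts rounds), and each stage is reduced to the two registers state_buf and state. The key point is that every register samples the old values before any register is updated, which is what non-blocking assignments do on a clock edge.

```python
def round_logic(value):
    """Hypothetical stand-in for one SHA-256 round: just counts rounds applied."""
    return value + 1


def clock_edge(stages, input_value):
    """Apply one clock edge to every stage at once (non-blocking semantics)."""
    # Sample all old register contents first, then update.
    prev_states = [input_value] + [s["state"] for s in stages[:-1]]
    old_bufs = [s["state_buf"] for s in stages]
    for stage, prev, old_buf in zip(stages, prev_states, old_bufs):
        stage["state_buf"] = prev  # copy from S[i-1].state
        stage["state"] = None if old_buf is None else round_logic(old_buf)


def cycles_until_output(num_rounds):
    """Feed one work item into an empty pipeline and count edges until it exits."""
    stages = [{"state_buf": None, "state": None} for _ in range(num_rounds)]
    cycles = 0
    while stages[-1]["state"] is None:
        cycles += 1
        clock_edge(stages, 0 if cycles == 1 else None)
    return cycles, stages[-1]["state"]


print(cycles_until_output(3))   # (6, 3)
print(cycles_until_output(64))  # (128, 64)
```

Each loop iteration costs two clock edges in this model, so 64 iterations give 128 cycles for one SHA-256 pass.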

TheSeven
March 10, 2012, 02:13:42 AM
 #648

Today I've uploaded the 200 MHz bitstream to 6 boards. 4 boards got a high percentage of invalid shares, and 2 boards had a normal invalid rate (one board got 0% and the other 0.1%). I have the 6 boards divided into two towers of 3 boards each. I found out that the 2 boards that got 0% and 0.1% invalids are the top ones! (Room temperature: ~22C.)

If anyone is planning to upgrade to the 200 MHz bitstream without side fans, it is better not to stack the boards on top of each other in a tower, and make sure there is enough space between the boards!

I will re-test everything soon with side fans!

Roughly how much is "a high percentage"?

Energizer
March 10, 2012, 02:21:37 AM
 #649

Around 7%, 10%, 12%, and 3% within just a few minutes! I then reset those 4 boards to the 190 MHz bitstream and kept the top ones at 200 MHz. The top ones have been mining for more than 12 hours with the invalid rate still at almost 0%!
Energizer
March 10, 2012, 02:28:02 AM
 #650

The first and last ones are running at 200 MHz:

Invalid shares│Current│Average
 (K not zero) │MHash/s│MHash/s
      0 (0.0%)│ 400.03│ 393.83
      3 (0.1%)│ 379.83│ 365.15
      2 (0.1%)│ 379.88│ 380.07
      6 (0.2%)│ 380.18│ 374.18
      1 (0.0%)│ 379.79│ 376.41
      3 (0.1%)│ 399.92│ 399.31
kano
March 10, 2012, 02:35:07 AM
 #651

The ZTEX bitstream allows you to adjust the MHz value via direct USB calls.

Would that also be possible with the Icarus (with a new bitstream)?

My thoughts on getting cgminer to handle the ZTEX (which I failed to get done) were somewhat similar to a part of ZTEX's Java code.
Basically: in the initial run, step the MHz up 1 or 2 at a time until hardware errors appear, and then forever after step it down by 1 every time it gets a hardware error, with some RPC API option to force it back to the initial "stepping up" status, or even to set the value (to step down from) via the RPC API.
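
A rough Python sketch of that stepping idea follows. Everything in it is hypothetical: ClockTuner, its methods and the simulated board are invented for illustration and are not part of cgminer, the ZTEX Java code or any Icarus bitstream. The two extra methods stand in for the RPC options mentioned above.

```python
class ClockTuner:
    """Hypothetical clock controller: ramp up until an error, then only step down."""

    def __init__(self, start_mhz, max_mhz, step_up=2, step_down=1):
        self.mhz = start_mhz
        self.max_mhz = max_mhz
        self.step_up = step_up
        self.step_down = step_down
        self.ramping = True  # the initial "stepping up" phase

    def on_result(self, hardware_error):
        """Update the clock after one checked result and return the new MHz."""
        if hardware_error:
            self.ramping = False           # first error ends the ramp for good
            self.mhz -= self.step_down
        elif self.ramping:
            self.mhz = min(self.mhz + self.step_up, self.max_mhz)
        return self.mhz

    def restart_ramp(self):
        """Stand-in for the RPC option that forces the stepping-up phase again."""
        self.ramping = True

    def set_clock(self, mhz):
        """Stand-in for the RPC option that sets the value to step down from."""
        self.mhz = mhz
        self.ramping = False


def simulate(tuner, fails_above_mhz, results=50):
    """Toy board model: every result is a hardware error above the given clock."""
    for _ in range(results):
        tuner.on_result(tuner.mhz > fails_above_mhz)
    return tuner.mhz


final_mhz = simulate(ClockTuner(start_mhz=180, max_mhz=220), fails_above_mhz=196)
print(final_mhz)  # 196
```

A real board fails probabilistically rather than at a sharp threshold, so in practice the clock would keep drifting down over time. That is what the forced re-ramp option would be for.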

Pool: https://kano.is - low 0.5% fee PPLNS 3 Days - Most reliable Solo with ONLY 0.5% fee   Bitcointalk thread: Forum
Discord support invite at https://kano.is/ Majority developer of the ckpool code - k for kano
The ONLY active original developer of cgminer. Original master git: https://github.com/kanoi/cgminer
TheSeven
March 10, 2012, 02:46:22 AM
 #652

With the current interface, even if it supported that, the error rate resolution would be way too low. IMHO this is something that should be handled by the µC on the ZTEX board, offloading the work from the miner. The Icarus doesn't have a µC, so you have no other option than a software implementation, and that might not work too well with the rather slow serial port...

kano
March 10, 2012, 02:57:37 AM
 #653

Not sure if you meant this, but a Hardware Error is simple: it's when the value returned was no good
(and cgminer keeps track of that too).
So what I meant was that each time a bad share is returned, you simply step the clock down by the smallest possible amount.
As I've mentioned before, I've had zero HW errors since I started mining with my 2 Icarus.

Glasswalker
March 10, 2012, 04:48:37 AM
 #654

Just look at https://bitcointalk.org/index.php?topic=51371.msg792660#msg792660 again.
If you've never done hardware design before, this might be a bit confusing, but everything that's written in that HDL file happens in parallel, not sequentially as it would in most programming languages.
In the loop, the previous state and data are copied to state_buf/data_buf. In parallel, the (old) contents of state_buf/data_buf are used to calculate the next state/data value.
Because of this, it takes two clock cycles for the values from S[i-1].state to propagate (through state_buf) to state. The generate loop basically just duplicates that code 64 times, but has no effect on "execution order", if there even is such a thing in HDL.

Thanks! I've done hardware design before, but only very simple circuits in HDL. I've always been an old-school schematic/block diagram guy; I did some HDL way back, but only simple stuff, and haven't touched it since. So I'm getting back into it now. As I said, I've been writing my own bitcoin mining core, which can hopefully be synthesized for multiple boards and wrapped in whatever PC comms layer we want. But it's slow going lol...

Thanks for pointing that out, I had missed that double-stage assignment. That's what I was looking for and just wasn't seeing. (I do know how blocking versus non-blocking assignments work though lol)

Thanks for taking the time to answer that. :)

Interesting that so far my design doesn't have this extra stage in it. My SHA core is done, and I'm just building the testbenches for it now to validate it, but I believe it runs in 64 clock cycles (probably 65-66 due to initial loading logic, I'll have to double-check). This is purely unoptimized right now; for now I'm just getting a working SHA core, then building a bitcoin core, and finally I'll go back and tune/optimize. Right now I'm at about 50% utilization on an LX75, but the delay on my critical path is high (11 ns), so I'm limited to just under 100 MHz. I'm getting one SHA hash per clock, so if I can get that to 100 MHz initially and can cram 4 of these cores into an LX150, I can get 2 bitcoin hashes per clock at 100 MHz. We'll see how it validates on the testbenches though, and whether I'm able to optimize it (and how well). I'm hoping to open-source this, but want to get it to a working state first (a little embarrassed to release it in its current state lol). Then hopefully the community can optimize it further. I'll probably just release it and put up a donation address or something.
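
As a back-of-the-envelope check of those numbers, here is a small Python sketch. It takes the figures above at face value (one SHA-256 result per core per clock, two SHA-256 passes per bitcoin hash) and ignores midstate tricks, routing limits and control overhead. The function names are made up for illustration.

```python
SHA_PER_BITCOIN_HASH = 2  # bitcoin hashing is double SHA-256


def fmax_mhz(critical_path_ns):
    """Clock ceiling implied by a critical path delay, ignoring setup and skew."""
    return 1000.0 / critical_path_ns


def bitcoin_mhash_per_s(sha_cores, clock_mhz, sha_per_clock_per_core=1.0):
    """Bitcoin hashes per second (in millions) for a set of pipelined SHA cores."""
    return sha_cores * sha_per_clock_per_core * clock_mhz / SHA_PER_BITCOIN_HASH


print(bitcoin_mhash_per_s(sha_cores=4, clock_mhz=100.0))      # 200.0
print(round(fmax_mhz(11.0), 1))                               # 90.9
print(round(bitcoin_mhash_per_s(4, fmax_mhz(11.0)), 1))       # 181.8
```

So four such cores at the 100 MHz target would give about 200 MHash/s, and closer to 180 MHash/s at the clock the current 11 ns critical path allows.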

I know the LX75 is over-constraining and tends to screw up routing so I'm targeting it first as a "stress test", then once I get the design working on that I'll move it to the LX150 and see how it goes.

Also, my design is a "clean room" implementation of SHA256. I have gotten "tips" from a few people on the forums here for optimization methods, though. And I have looked over the ZTex code of course, but frankly I found it hard to read in places. So I figured re-implementing it would be a good learning experience to get my Verilog skills polished up anyway. I wrote it directly from the SHA2 spec without having the ZTex code open. (It's as cleanroom as you can get these days lol.)

Right now I'm running into issues with the Xilinx simulator. It's being bitchy about simulating my code (even though it synthesizes fine), which is why I haven't completed a testbench sim of it yet. I'm also getting a lot of warnings about unconnected nets in synth, but that's because the top-level module (the bitcoin hashing core) isn't done yet, just the lower-level SHA core.

chungenhung
March 10, 2012, 07:17:01 AM
 #655

For those of you that have the boards, do they come with iron rods so you can stack them?
coblee
March 10, 2012, 09:39:38 AM
 #656

Lytro pic of my Icarus miner: https://bitcointalk.org/index.php?topic=68115.0

TheSeven
March 10, 2012, 10:35:50 AM
 #657

Not sure if you meant this, but a Hardware Error is simple: it's when the value returned was no good
(and cgminer keeps track of that too).
So what I meant was that each time a bad share is returned, you simply step the clock down by the smallest possible amount.
As I've mentioned before, I've had zero HW errors since I started mining with my 2 Icarus.
That's true, but ZTEX does more in this area. They don't just check difficulty 1 nonces, but also check the nonce the FPGA is currently working on every couple of milliseconds. That way they can judge a lot better whether an invalid was just bad luck (some single-event glitch) or whether that chip is being constantly overloaded.

kano
March 10, 2012, 10:49:40 AM
 #658

Well, ZTEX may be more complex, but this simple solution would work too (if I could adjust the clock).
Still zero HW errors after 7 days :)
It certainly wouldn't bother me in the least if it was 1 or 2 MHz below the failure point, since I would hazard a guess that the life of the FPGA would not be very good if it was constantly at borderline failure.
(From what Energizer posted, the current setting is obviously further below the limit than 1 or 2 MHz.)

ngzhang: is that a difficult thing to consider? (Being able to adjust the MHz clock speed with a command?)

ngzhang (OP)
March 10, 2012, 11:41:58 AM
 #659

In my plan, a similar feature may be added to the next version of the bitstream: fully automatic frequency adjustment.
ngzhang (OP)
March 10, 2012, 03:34:22 PM
 #660

:D

I repeat: testing is in progress. I will not answer any email until the bulk orders are finished (3/11-13).