Bitcoin Forum
September 16, 2024, 03:09:57 PM *
Pages: « 1 ... 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 [112] 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 ... 181 »
Author Topic: Klondike - 16 chip ASIC Open Source Board - Preliminary  (Read 435356 times)
BkkCoins (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1009


firstbits:1MinerQ


View Profile WWW
July 11, 2013, 08:47:07 AM
 #2221

Sorry if I am going on about this. May I suggest:
Create a tiny interrupt handling routine that gets one byte, stores it in a buffer, increments the store pointer, and returns from the interrupt. Mean and lean!
something like:
Code:
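#include <stdint.h>

#define DATA_BUFFER_LENGTH 32
static volatile uint8_t interruptDataBuffer[DATA_BUFFER_LENGTH];
// Next slot the ISR writes to, and next slot the main loop reads from.
static volatile uint8_t *volatile interruptDataBufferAddress = interruptDataBuffer;
static volatile uint8_t *lastReadBufferAddress = interruptDataBuffer;

uint8_t getDataByteFromInterrupt(void);  // hardware-specific read, defined elsewhere

void handleInterrupt(void)
{
    volatile uint8_t *ix = interruptDataBufferAddress;
    *ix++ = getDataByteFromInterrupt();                  // store the byte, advance
    if (ix == interruptDataBuffer + DATA_BUFFER_LENGTH)
        ix = interruptDataBuffer;                        // wrap to the buffer start
    interruptDataBufferAddress = ix;
}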

And in your main program you can then compare lastReadBufferAddress with interruptDataBufferAddress to see if byte(s) have been received.

I know I am simplifying things, but interrupt handlers should be as short as possible.

I also have a question:
It is possible, although unlikely, that more than one chip has a result at the same time. Do you know from which chip the data is coming ? If so, this should be taken into account in the above pseudo code. If not, could it be the reason for your data issue?

Good luck solving the issue at hand!
I don't want to sound mean but what do you think I do in my handler? I read a byte and store it in a queue. In the past it returned asap, but that just meant it had to respond to 4 separate interrupts in a timely manner. Since the bytes arrive in a non-stop sequential burst, it works better to interrupt on the first and loop to grab all four. Otherwise you have four cases where it needs to respond fast enough via IRQ latency instead of one. Now that I have removed all interrupt code except for the Result handler (even USB is now in polling mode), it's clear that the interrupt response is not the problem. There is nothing to block the interrupt handler and it polls the FIFO for 4 bytes without any chance of being delayed. I never see FIFO overrun errors now, but that hasn't improved the long-term error rate.

My biggest score today was realizing that since the data is inverted I need to initialize the FIFO with 0xFF instead of 0x00 so that any missed bits are shifted as inverted zeros, not ones. Once I added a reset init of 0xFF after every result the error rate dropped way down. But even after all this it's still running around 3-5%. When I scoped the bad result cases I saw that the data appeared shifted by 1 bit, so a single bit was being left over from one result to another, or captured sometime in the space between results (noise?). Resetting to 0xFF after each capture primes the FIFO and ensures that one error bit doesn't create a long string of error results by staying out of sync.

The case of two results happening at once is sufficiently rare that it doesn't matter. Either the ASIC has circuitry to detect busy (which I doubt given how they've gone minimal on anything related to comms) or the collision just nukes both results. The probability of that happening is so low that it has no bearing on error rates in the > 0.1% range. If the ASICs are actually in sync due to being driven by the same clock source (which I doubt) then it is possible that one result has priority over the other and succeeds. Anyway, the probability right now with 4 chips is about 128/(nonce range size), or 1:8388608. (128 because the result clock appears to be hashclock/128, so 1 result takes the time of 128 hashes.)
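The 1:8388608 figure can be reproduced with a few lines of C. This is only a sketch of the arithmetic in the post; the function name `collision_odds` is made up for illustration, and it assumes each chip sweeps an equal share of the 2^32 nonce space.

```c
#include <stdint.h>

/* Hypothetical helper: returns N such that the chance of two results
   overlapping is roughly 1 in N, using the post's estimate of
   hashes_per_result / (per-chip nonce range size). */
static uint64_t collision_odds(unsigned chips, unsigned hashes_per_result)
{
    uint64_t range_per_chip = (1ULL << 32) / chips;  /* equal split assumed */
    return range_per_chip / hashes_per_result;
}
```

With 4 chips and 128 hashes per result, the per-chip range is 2^30, and 2^30 / 128 = 2^23 = 8388608.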

I believe something else is at work here to create the errors. Either PLL noise, instability, or some error on shifting data into the ASIC. Remember that any corruption on data in is going to give error results out when the host tries to verify back to the correct original data. I've spent a lot of time looking at scope traces. The only thing I've definitely been able to detect that way so far is that sometimes a result is captured one bit out of sync, i.e. it had an extra first bit that pushed all the subsequent bits off. But visually that first bit doesn't look different from cases where it captured fine. There is no extra clock bit and it's not out of position. I only know it's one bit off by writing down the scoped bit pattern and comparing it to the result captured. There isn't even noise, and it doesn't happen during a capture but sometime before a capture starts, or after it ends, corrupting the next one. Are my antenna-like wires connecting the red board causing spurious clk edges?

Writing this just gave me an idea.

LaserHorse
Full Member
***
Offline Offline

Activity: 140
Merit: 100



View Profile
July 11, 2013, 09:03:17 AM
 #2222

Writing this just gave me an idea.

objectification == best consultant

PiMiner - control & monitor your miners with Raspberry Pi   •   BTC: 1AV5JekeEVET5u2jTsLDMRsTtagrBnNTBR
BenTuras
Hero Member
*****
Offline Offline

Activity: 826
Merit: 1001



View Profile
July 11, 2013, 09:06:52 AM
 #2223

I don't want to sound mean but ...
Writing this just gave me an idea.
No worries, I have a thick skin and I am glad I could be of indirect help Wink
Works for me too: I explain to someone how well I solved something and then realize ...

I am selling in stock OneStringMiner boards, based on the Bitfury chips. Have a look here: https://bitcointalk.org/index.php?topic=495536.0
Bicknellski
Hero Member
*****
Offline Offline

Activity: 924
Merit: 1000



View Profile
July 11, 2013, 09:38:28 AM
 #2224

Writing this just gave me an idea.

objectification == best consultant

"Teaching often leads to the teacher learning more than the student." - Bicknellski

Dogie trust abuse, spam, bullying, conspiracy posts & insults to forum members. Ask the mods or admins to move Dogie's spam or off topic stalking posts to the link above.
turtle83
Sr. Member
****
Offline Offline

Activity: 322
Merit: 250


Supersonic


View Profile WWW
July 11, 2013, 10:45:33 AM
 #2225

Writing this just gave me an idea.

objectification == best consultant

"Teaching often leads to the teacher learning more than the student." - Bicknellski

http://www.codinghorror.com/blog/2012/03/rubber-duck-problem-solving.html

BkkCoins (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1009


firstbits:1MinerQ


View Profile WWW
July 11, 2013, 11:17:45 AM
Last edit: July 11, 2013, 11:44:22 AM by BkkCoins
 #2226

Well, at first I thought it was going to work. I got around 100 good nonces before a bad one. But it didn't hold up. It's not worse, but only a little better. Although at higher clocks like 350 it works much better than before. It seems the NOR gates condition well enough at 350, but at this speed the IRQ response becomes an issue, and IOC gets me the extra µs needed.

My idea: change the ISR to trigger on IOC for the CLK rather than the UART byte ready (RCIF). And then use a timeout to filter random single bit triggers. This gives me 7 bits extra time to read that first byte, and if noise triggers < 8 bits between results the timeout resets it. And first trials with ktest were super positive. I went all the way up to 390 with manual data and zero errors over a few dozen work units - something never seen before by me.

Alas, in cgminer it did well at first but soon over time averaged out to not much better than before. I tried playing with various timeout counts, and reset methods.

Right now at 350MHz it's 1 for 270, showing that it can do it. But when will it break down and start averaging out? I don't know if I should spend more time on this or assume that ferrite beads on the PLL supply will help a lot, and make new boards. I don't have any beads here to try. I have them on the way along with parts for building 13 more boards. I'll add it to the current changes.

I also added a "noise" count so that every time it detects a <8 bit trigger it counts it. With that I either see many if the timeout is low, or none if the timeout is longer. So that just adds to the confusion and suggests there are no real noise bits.
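The trigger-plus-timeout filter and the noise counter described above could look roughly like the following. This is a hardware-independent sketch, not the firmware from the repository: the struct, the function names and the tick-based timeout are all assumptions made for illustration, and on the real board the two functions would be driven by the IOC interrupt and a timer.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIRST_BYTE_BITS 8  /* a real result clocks in at least 8 bits */

typedef struct {
    uint8_t  bits_seen;    /* clock edges counted since the trigger */
    uint16_t idle_ticks;   /* timer ticks since the last clock edge */
    uint16_t timeout;      /* ticks of silence that end a capture */
    uint32_t noise_count;  /* triggers that ended with fewer than 8 bits */
} capture_t;

/* Call on every result-clock edge (the IOC event). */
static void on_clock_edge(capture_t *c)
{
    if (c->bits_seen < UINT8_MAX)
        c->bits_seen++;
    c->idle_ticks = 0;
}

/* Call on every timer tick. Returns true when a partial capture
   (fewer than 8 bits) timed out and was discarded as noise. */
static bool on_timer_tick(capture_t *c)
{
    if (c->bits_seen == 0)
        return false;                 /* idle, nothing pending */
    if (++c->idle_ticks < c->timeout)
        return false;                 /* still inside the window */
    bool noise = c->bits_seen < FIRST_BYTE_BITS;
    if (noise)
        c->noise_count++;
    c->bits_seen = 0;                 /* re-arm for the next trigger */
    c->idle_ticks = 0;
    return noise;
}
```

In this model a short timeout closes the window sooner, so more triggers can be classified as noise before their remaining bits arrive, which is one way the count could depend on the timeout setting.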

Note to others with boards, e.g. TH: if you can get ferrite beads (as in the updated BOM) then you may want to test with them cut into the AVDD PLL power lines on each chip. I can't say if they'll help but they could make all the difference here, and it would be nice to know for a final board.

update: 350 MHz, 7 for 522... 1.3%
again: 14 for 699... 2%

LaserHorse
Full Member
***
Offline Offline

Activity: 140
Merit: 100



View Profile
July 11, 2013, 11:51:04 AM
 #2227

Right now at 350MHz it's 1 for 270, showing that it can do it. But when will it break down and start averaging out? I don't know if I should spend more time on this or assume that ferrite beads on the PLL supply will help a lot, and make new boards. I don't have any beads here to try. I have them on the way along with parts for building 13 more boards. I'll add it to the current changes.

fwiw - If those beads were your first instinct, then they're most likely the solution. 
I know waiting for resources has a way of driving me down some winding sideroads …

just my 2 bits - thnx for sharing the deets with us!

PiMiner - control & monitor your miners with Raspberry Pi   •   BTC: 1AV5JekeEVET5u2jTsLDMRsTtagrBnNTBR
BkkCoins (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1009


firstbits:1MinerQ


View Profile WWW
July 11, 2013, 11:57:15 AM
Last edit: July 11, 2013, 12:10:54 PM by BkkCoins
 #2228

I pushed my current changes up to Github so others can use them.

Note: in klondike.c, for varying chip counts you need to hard-code the actual chip count and whether one bank or two is populated. With one bank the range size gets doubled; with two banks it shouldn't be. This isn't coded properly yet; later it will be detected during init.

This new code has much better timing values for TMR0 and removes all interrupt services except for result rx. It uses polling instead for USB and TMR0. I'm not sure if this is needed but I'm trying to give the result capture as fast response as possible.

It's working better at 350 than at 300 now. Does that indicate stability issues with interference or resonance?

****
Also, the clock value is now the same as the MHz rate, not double like before. So set 300 for 300 etc., and the default has changed to 256.

Note the Rx edge may be different if a second NOR gate is not used. Rising edge for me but with one gate falling edge, unless you add an inverter to both data and clk (good idea).

marto74
Hero Member
*****
Offline Offline

Activity: 728
Merit: 500



View Profile WWW
July 11, 2013, 03:23:01 PM
 #2229

K1

Hashing @ 300 mhz


Hashing @ 350 mhz


We run it for short periods, 2-3 min, because there is only a heatsink.

http://technobit.eu
tips : 12DNdacCtUZ99qcP74FwchaCPzeDL9Voff
BkkCoins (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1009


firstbits:1MinerQ


View Profile WWW
July 11, 2013, 04:06:47 PM
 #2230

K1

We run it for short periods 2-3 min , because there is only heatsink
Good to see a K1 running. What sort of HW error rate do you get?

I'm exploring the power supply here.
I have a 100kHz 0.5V amplitude signal in 5 ms bursts every 10 ms on my 1.2V core power. It's also showing on the 12V in from the PSU, and on 3.3V. Does anyone have experience with what can cause that kind of pulsation? Looking closely at the 100kHz signal, it's quite substantial spikes with damped oscillations of approx. 100 MHz, 400mV. That's pretty big on a 1.2V supply.

There's also 600kHz pulses from the switching supply but they're much smaller, about 10mV.

These are present whether I enable or disable the hash clock so I don't think it's the hashing doing it. I've tried turning off everything nearby including fluorescent lights, TV, laptop, raspi etc.



Close up of 100kHz spike made up of 100MHz oscillation.


terrahash
Member
**
Offline Offline

Activity: 86
Merit: 10


View Profile
July 11, 2013, 05:44:42 PM
 #2231

As you know we have all the 16 chips populated. This is what I changed in klondike.c

Code:
    Status.ChipCount = 16; // just for testing
   
    // pre-calc nonce range values
    BankSize = Status.ChipCount/2; //(Status.ChipCount+1)/2;
    Status.MaxCount = WORK_TICKS / BankSize;
    NonceRanges[0] = 0;
    for(BYTE x = 1; x < BankSize; x++)
        NonceRanges[x] = NonceRanges[x-1] + BankRanges[BankSize-1];  // single bank, double range size

Now all the nonces are coming back twice. If we change back the last line to
Code:
NonceRanges[x] = NonceRanges[x-1] + 2*BankRanges[BankSize-1];
the nonces come back only once, but take about double the time. What's the correct setting for 16 chips?
BkkCoins (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1009


firstbits:1MinerQ


View Profile WWW
July 11, 2013, 06:18:28 PM
 #2232

As you know we have all the 16 chips populated. This is what I changed in klondike.c

Code:
    Status.ChipCount = 16; // just for testing
    
    // pre-calc nonce range values
    BankSize = Status.ChipCount/2; //(Status.ChipCount+1)/2;
    Status.MaxCount = WORK_TICKS / BankSize;
    NonceRanges[0] = 0;
    for(BYTE x = 1; x < BankSize; x++)
        NonceRanges[x] = NonceRanges[x-1] + BankRanges[BankSize-1];  // single bank, double range size

Now all the nonces are coming back twice. If we change back the last line to
Code:
NonceRanges[x] = NonceRanges[x-1] + 2*BankRanges[BankSize-1];
the nonces come back only once, but take about double the time. What's the correct setting for 16 chips?
Oh yes, I forgot to mention something else. You want BankRanges not doubled but have to uncomment line 49 in the asic.c write code. As below,

    // disable for single bank last_bit0 = last_bit1 = split;

should be,

    last_bit0 = last_bit1 = split;

This causes it to write the high bit 0 for bank 1, and high bit 1 for bank 2, effectively splitting the ranges over both banks.
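As an illustration of what "splitting the ranges over both banks" means numerically, here is a small sketch. It is not the code from asic.c; `start_nonce` and its parameters are hypothetical, and it assumes the bank selects the top bit of the nonce while the chips in a bank divide that half evenly.

```c
#include <stdint.h>

/* Hypothetical helper: first nonce searched by chip `idx` of bank `bank`
   (0 or 1), assuming the bank picks the high bit and the `bank_size`
   chips in a bank split that half of the 32-bit space evenly. */
static uint32_t start_nonce(unsigned bank, unsigned idx, unsigned bank_size)
{
    uint32_t half = 0x80000000u;        /* 2^31 nonces per bank */
    uint32_t step = half / bank_size;   /* nonces covered by one chip */
    return (bank ? half : 0u) + (uint32_t)idx * step;
}
```

With 16 chips (8 per bank) each chip covers 2^28 nonces, which matches the 2^32/16 share used in the timing estimates later in the thread.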

Give that a whirl. I haven't mounted chips in the second bank yet and so this will be first testing of that.

edit: Also, did you notice I pushed another update a bit later today which worked better for error rates.

terrahash
Member
**
Offline Offline

Activity: 86
Merit: 10


View Profile
July 11, 2013, 06:53:13 PM
 #2233

edit: Also, did you notice I pushed another update a bit later today which worked better for error rates.

Yes I pulled the latest updates. I think the error rate has come down.

Oh yes, I forgot to mention something else. You want BankRanges not doubled but have to uncomment line 49 in the asic.c write code. As below,

    // disable for single bank last_bit0 = last_bit1 = split;

should be,

    last_bit0 = last_bit1 = split;

This causes it to write the high bit 0 for bank 1, and high bit 1 for bank 2, effectively splitting the ranges over both banks.

Give that a whirl. I haven't mounted chips in the second bank yet and so this will be first testing of that.

I tried this. I am still getting the nonce back twice.
menace_one
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile
July 11, 2013, 07:20:53 PM
 #2234

@ BKK and TH: nice to see you talking at this high technical level.   Grin

@All: are there any options of watercooling?
User toolhead created a very nice solution for Burnin's board, and Burnin will optionally assemble his boards with toolhead's watercooling block directly.

So it would be very nice if I could use my pump and radiator for all my boards, including my TH's K64.

BkkCoins (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1009


firstbits:1MinerQ


View Profile WWW
July 11, 2013, 07:47:30 PM
 #2235

I tried this. I am still getting the nonce back twice.
Probably it's running twice as long as it should, so the tick count needs to be half.
ie.

Status.MaxCount = WORK_TICKS / BankSize;

needs to be,

Status.MaxCount = WORK_TICKS / BankSize / 2;

(dividing by ChipCount will work unless an odd number of chips is mounted)

The time interval between two nonces should be 2^32/clk/16.
eg. at 128 MHz
2^32 / 128000000 /16 = 2.09 seconds

at 300 MHz, 0.895 seconds.
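The two figures above follow directly from the formula 2^32/clk/16. A minimal sketch of that arithmetic, with a made-up function name and the chip count passed in as a parameter:

```c
/* Hypothetical helper: seconds for one chip to sweep its share of the
   32-bit nonce space, i.e. 2^32 / clk / chips. */
static double sweep_seconds(double clk_hz, unsigned chips)
{
    return 4294967296.0 / clk_hz / (double)chips;
}
```

For 16 chips this gives about 2.097 seconds at 128 MHz and about 0.895 seconds at 300 MHz.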

In ktest if you send "ww" it will do 2 works and check the time between them.
It now sends multi-char cmds as a sequence of cmds.

terrahash
Member
**
Offline Offline

Activity: 86
Merit: 10


View Profile
July 11, 2013, 07:49:15 PM
 #2236

After uncommenting line 49 in asic.c, I have noticed the following:

1. In ktest, the nonces are still coming back twice.
2. In cgminer, the hash rate has gone up to 2.4 GH/s (5s) and 790 MH/s (avg). The avg never went above 200 before.

I have also noticed that a lot of times there is no activity in the log except for the regular USB scanning. This period of lull always follows "Pushing work from pool 0 to hash queue". I am assuming this is because sometimes the nonce never comes back, and you probably have a certain wait period in cgminer, after which it re-sends the work data to the board.
terrahash
Member
**
Offline Offline

Activity: 86
Merit: 10


View Profile
July 11, 2013, 07:54:11 PM
 #2237

Before reducing tick count:

Code:
tg@tg-DP700A3D-DM700A3D-DB701A3D-DP700A7D:~/Desktop/Github/klondike/utils$ ./ktest
Klondike device opened

Version:10, ProductID:K16, Serial#:deadbeef
Cmds [WAISCE.Q]:

ww

State:W, ASICs:16, Slaves:0
WorkQ:0, WorkID:01, Temp:98, Fan:0, ErrCount:0, HashCount:0, MaxCount:2048
Cmds [WAISCE.Q]:

State:W, ASICs:16, Slaves:0
WorkQ:1, WorkID:01, Temp:98, Fan:0, ErrCount:0, HashCount:1, MaxCount:2048
Cmds [WAISCE.Q]:

Nonce Found - WorkID:01, Value:749fcd72 (0.305 secs) , Nonce:b2cc9f74 GOOD
Cmds [WAISCE.Q]:

Nonce Found - WorkID:01, Value:749fcd72 (1.048 secs) , Nonce:b2cc9f74 GOOD
Cmds [WAISCE.Q]:

Nonce Found - WorkID:02, Value:749fcd72 (0.983 secs) , Nonce:b2cc9f74 GOOD
Cmds [WAISCE.Q]:

Nonce Found - WorkID:02, Value:749fcd72 (1.048 secs) , Nonce:b2cc9f74 GOOD
Cmds [WAISCE.Q]:

After reducing tick count:

Code:
tg@tg-DP700A3D-DM700A3D-DB701A3D-DP700A7D:~/Desktop/Github/klondike/utils$ ./ktest
Klondike device opened

Version:10, ProductID:K16, Serial#:deadbeef
Cmds [WAISCE.Q]:
ww

State:W, ASICs:16, Slaves:0
WorkQ:0, WorkID:01, Temp:118, Fan:0, ErrCount:0, HashCount:0, MaxCount:1024
Cmds [WAISCE.Q]:

State:W, ASICs:16, Slaves:0
WorkQ:1, WorkID:01, Temp:118, Fan:0, ErrCount:0, HashCount:1, MaxCount:1024
Cmds [WAISCE.Q]:

Nonce Found - WorkID:01, Value:749fcd72 (0.305 secs) , Nonce:b2cc9f74 GOOD
Cmds [WAISCE.Q]:

Nonce Found - WorkID:02, Value:749fcd72 (1.016 secs) , Nonce:b2cc9f74 GOOD
Cmds [WAISCE.Q]:
BkkCoins (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1009


firstbits:1MinerQ


View Profile WWW
July 11, 2013, 07:56:53 PM
Last edit: July 11, 2013, 08:10:28 PM by BkkCoins
 #2238

I have also noticed that a lot of times there is no activity in the log except for the regular USB scanning. This period of lull always follows "Pushing work from pool 0 to hash queue". I am assuming this is because sometimes the nonce never comes back, and you probably have a certain wait period in cgminer, after which it re-sends the work data to the board.
No. It sends the work to the board only once. Some work has no nonces. About 37% according to kano's stats, and 37% has 1 nonce, and if I recall 17% has 2 , and less often 3 and 4 etc. Back up this thread a ways for actual numbers.

Strangely I think the BFL source code only submits 1 nonce/work because it calls work_completed after the first nonce which removes the work from the queue such that further nonces won't get found. I could be missing something there but that's how I read it.

edit: Kano's numbers were,

1000 results:
374/364/175/64/19/1/2/1/0/0

(that's 0/1/2/3/4/5/6/7/8/9 nonces per work unit)
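Those counts are close to what a Poisson distribution with a mean of one nonce per work unit would predict. That model is an assumption added here for comparison, not something stated in the post, and `expected_units` is a hypothetical name.

```c
/* Hypothetical helper: expected number of work units, out of `total`,
   that yield exactly k nonces, assuming nonce counts are Poisson
   distributed with mean 1, so P(k) = e^-1 / k!. */
static double expected_units(unsigned k, double total)
{
    /* e^-1 from its Taylor series: sum of (-1)^n / n! */
    double inv_e = 0.0, term = 1.0;
    for (unsigned n = 0; n < 20; n++) {
        inv_e += term;
        term = -term / (double)(n + 1);
    }
    double k_factorial = 1.0;
    for (unsigned i = 2; i <= k; i++)
        k_factorial *= (double)i;
    return total * inv_e / k_factorial;
}
```

For 1000 work units the model gives roughly 368/368/184/61/15 for 0 to 4 nonces, against the measured 374/364/175/64/19.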

BkkCoins (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1009


firstbits:1MinerQ


View Profile WWW
July 11, 2013, 08:02:57 PM
Last edit: July 11, 2013, 08:13:05 PM by BkkCoins
 #2239

That's good then.
In the first it's 0.983 + 1.048 = 2.031 seconds
In the second it's just 1.016 seconds.
You have to ignore the first time interval because that's the time from cmd to nonce, which indicates how far into the range it is rather than the full range time.
Add up the times with the same WorkID (but don't use the first work unit, since its first interval is that partial one).

1.016 seconds works out to 2^32 / 16 / 1.016 = 264208125 hashes per second, so ~264 MHz (but probably 260 because I always cut the work short rather than long). Cutting short has no downside but running long gives duplicates.
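The clock estimate is just the sweep-time formula turned around. A short sketch, with a hypothetical function name, assuming 16 chips each covering 2^32/16 nonces:

```c
/* Hypothetical helper: hash clock implied by the time one chip takes
   to sweep its share of the nonce space, i.e. 2^32 / chips / seconds. */
static double estimated_clock_hz(double sweep_secs, unsigned chips)
{
    return 4294967296.0 / (double)chips / sweep_secs;
}
```

Feeding in 1.016 seconds and 16 chips gives about 264.2 MHz. The result is only as good as the assumption that the measured interval covers exactly one full range.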

TomKeddie
Full Member
***
Offline Offline

Activity: 176
Merit: 100


View Profile
July 11, 2013, 08:09:24 PM
 #2240

@All: are there any options of watercooling?
user toolhead created a very nice solution for Burnins Board and Burnin will actually directly assemble (optionally) his boards with toolheads Watercooling block.

so it would be very nice if I could use my pump and radiator for all my boards including my TH's K64.

There is a separate thread on cooling, https://bitcointalk.org/index.php?topic=208381.0.

This thread is for the electronics, firmware and drivers.  BKK isn't delivering a product, he's delivering a design, which is a different point of view from Burnin's.