BkkCoins (OP)
|
|
July 11, 2013, 08:47:07 AM |
|
Sorry if I am going on about this. May I suggest: create a tiny interrupt handling routine that gets one byte, stores it in a buffer, increments the store pointer and returns from the interrupt again. Mean and lean! Something like:
const int dataBufferLength = 32;
byte interruptDataBuffer[dataBufferLength];
byte* interruptDataBufferAddress = interruptDataBuffer; // next write position (interrupt handler)
byte* lastReadBufferAddress = interruptDataBuffer; // next read position (main program)
handleInterrupt:
  reg a = getDataByteFromInterrupt();
  get ix from interruptDataBufferAddress;
  store a at ix;
  increment ix;
  if (ix == interruptDataBuffer + dataBufferLength) ix = interruptDataBuffer; // wrap to the start of the buffer
  store ix at interruptDataBufferAddress;
  returnFromInterrupt;
And in your main program you can then compare lastReadBufferAddress with interruptDataBufferAddress to see if byte(s) have been received. I know I am simplifying things, but interrupt handlers should be as short as possible. I also have a question: it is possible, although unlikely, that more than one chip has a result at the same time. Do you know from which chip the data is coming? If so, this should be taken into account in the above pseudo code. If not, could it be the reason for your data issue? Good luck solving the issue at hand!
I don't want to sound mean but what do you think I do in my handler? I read a byte and store it in a queue. In the past it returned ASAP, but that just meant that it had to respond to 4 separate interrupts in a timely manner. Since the bytes arrive in a burst, sequentially and non-stop, it works better to interrupt on the first and loop to grab all four. Otherwise you have four cases where it needs to respond fast enough via IRQ latency instead of one.
Now that I have removed all interrupt code except for the Result handler (even USB is now in polling mode) it just makes it clear that the interrupt response is not the problem. There is nothing to block the interrupt handler and it polls the FIFO for 4 bytes without any chance of being delayed. I never see FIFO Over Run errors now but it doesn't improve the long term error rate.
My biggest score today was realizing that since the data is inverted I need to initialize the FIFO with 0xFF instead of 0x00 so that any missed bits are shifted as inverted zeros not ones. Once I added a reset init of 0xFF after every result the error rate dropped way down. But even after all this it's still running around 3-5%. When I scoped the bad result cases I saw that the data appeared shifted by 1 bit - so a single bit was being left over from one result to another, or captured sometime in the space between results (noise?). By resetting to 0xFF after each capture it primes the FIFO and ensures that one error bit doesn't create a long string of error results by staying out of sync.
The case of two results happening at once is sufficiently rare that it doesn't matter. Either the ASIC has circuitry to detect busy (which I doubt given how they've gone minimal on anything related to comm.) or the collision just nukes both results. The probability of that happening is so low that it has no bearing on error rates in the > 0.1% range. If the ASICs are actually in sync due to being driven by the same clock source (which I doubt) then it is possible that one result has priority over the other and succeeds. Anyway, the probability right now with 4 chips is about 128/(nonce range size), or 1:8388608. (128 because the result clock appears to be hashclock/128, so 1 result takes the time of 128 hashes).
I believe something else is at work here to create the errors. Either PLL noise, instability, or some error on shifting data into the ASIC. Remember that any corruption on data in is going to give error results out when the host tries to verify back to the correct original data.
I've spent a lot of time looking at scope traces. The only thing I've definitely been able to detect that way so far is that sometimes a result is captured one bit out of sync, i.e. it had an extra first bit that pushed all the subsequent bits off. But visually that first bit doesn't look different than cases where it captured fine. There is no extra clock bit and it's not out of position. I only know it's one bit off by writing down the scoped bit pattern and comparing it to the result captured. There isn't even noise, and it doesn't happen during a capture but sometime before a capture starts, or after it ends, corrupting the next one. Are my antenna-like wires connecting the red board causing spurious clk edges?
Writing this just gave me an idea.
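The collision estimate in the post above can be checked with a few lines of C. This is only a sketch of that arithmetic, assuming the 32-bit nonce space is split evenly across the chips and that a result occupies the output for 128 hash periods, as the post describes; the function names are made up for illustration.

```c
#include <stdint.h>

/* Nonces searched by each chip, assuming the 32-bit nonce space
   is divided evenly between `chips` ASICs (an assumption). */
uint64_t nonce_range_size(unsigned chips)
{
    return (UINT64_C(1) << 32) / chips;
}

/* Returns N for a "1 in N" chance, using the post's approximation
   hashes_per_result / (nonce range size). */
uint64_t collision_one_in(unsigned chips, unsigned hashes_per_result)
{
    return nonce_range_size(chips) / hashes_per_result;
}
```

With 4 chips and 128 hashes per result, `collision_one_in(4, 128)` gives 8388608, the 1:8388608 figure quoted in the post.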
|
|
|
|
LaserHorse
|
|
July 11, 2013, 09:03:17 AM |
|
Writing this just gave me an idea.
objectification == best consultant
|
|
|
|
BenTuras
|
|
July 11, 2013, 09:06:52 AM |
|
I don't want to sound mean but ... Writing this just gave me an idea.
No worries, I have a thick skin and I am glad I could be of indirect help. Works for me too: I explain to someone how well I solved something and then realize ...
|
|
|
|
Bicknellski
|
|
July 11, 2013, 09:38:28 AM |
|
Writing this just gave me an idea.
objectification == best consultant
"Teaching often leads to the teacher learning more than the student." - Bicknellski
|
|
|
|
|
BkkCoins (OP)
|
|
July 11, 2013, 11:17:45 AM Last edit: July 11, 2013, 11:44:22 AM by BkkCoins |
|
Well, at first I thought it was going to work. I got around 100 good nonces before a bad one. But it didn't hold up. It's not worse, but only a little better. Although at higher clocks like 350 it works much better than before. It seems the NOR gates condition the signal well enough at 350, but at this speed the IRQ response becomes an issue, and interrupt-on-change (IOC) gets me the extra µs needed.
My idea: change the ISR to trigger on IOC for the CLK rather than the UART byte ready (RCIF). And then use a timeout to filter random single bit triggers. This gives me 7 bits extra time to read that first byte, and if noise triggers < 8 bits between results the timeout resets it. And first trials with ktest were super positive. I went all the way up to 390 with manual data and zero errors over a few dozen work units - something never seen before by me.
Alas, in cgminer it did well at first but soon over time averaged out to not much better than before. I tried playing with various timeout counts, and reset methods.
Right now at 350MHz it's 1 for 270, showing that it can do it. But when will it break down and start averaging out? I don't know if I should spend more time on this or assume that ferrite beads on the PLL supply will help a lot, and make new boards. I don't have any beads here to try. I have them on the way along with parts for building 13 more boards. I'll add it to the current changes.
I also added a "noise" count so that every time it detects a <8 bit trigger it counts it. And on that I either see many if the timeout is low, or none if the timeout is longer. So that just adds to the confusion and suggests there are no real noise bits.
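The timeout filter and noise counter described above could look roughly like the host-side sketch below. This is not the firmware's code: the struct, the function name and the tick units are all hypothetical, and it only models the idea of discarding a run of clock edges that does not reach 8 bits before a timeout expires.

```c
#include <stdint.h>

/* Hypothetical state for filtering stray clock edges. */
typedef struct {
    uint32_t first_edge; /* tick at which the current run of edges began */
    uint8_t  bits;       /* edges counted in the current run */
    uint32_t noise;      /* runs discarded for timing out short of 8 bits */
} edge_filter;

/* Feed one clock edge seen at time `now` (arbitrary ticks).
   Returns 1 when 8 edges arrive within `timeout` ticks of the first one. */
int edge_filter_step(edge_filter *f, uint32_t now, uint32_t timeout)
{
    if (f->bits > 0 && (uint32_t)(now - f->first_edge) > timeout) {
        f->noise++;      /* stale partial run: count it as noise and drop it */
        f->bits = 0;
    }
    if (f->bits == 0)
        f->first_edge = now;
    if (++f->bits == 8) {
        f->bits = 0;     /* full byte clocked in, ready for the next run */
        return 1;
    }
    return 0;
}
```

In this model a short timeout also discards slow but genuine runs, while a long one lets a stray edge merge into the next result, which is one possible reading of why the noise count swings between "many" and "none" as the timeout changes.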
Note: to others with boards, e.g. TH. If you can get ferrite beads (as in the updated BOM) then you may want to test with them cut into the AVDD PLL power lines on each chip. I can't say if they'll help but they could make all the difference here, and it would be nice to know for a final board.
update: 350 MHz, 7 for 522... 1.3%
again: 14 for 699... 2%
|
|
|
|
LaserHorse
|
|
July 11, 2013, 11:51:04 AM |
|
Right now at 350MHz it's 1 for 270, showing that it can do it. But when will it break down and start averaging out? I don't know if I should spend more time on this or assume that ferrite beads on the PLL supply will help a lot, and make new boards. I don't have any beads here to try. I have them on the way along with parts for building 13 more boards. I'll add it to the current changes.
fwiw - If those beads were your first instinct, then they're most likely the solution. I know waiting for resources has a way of driving me down some winding sideroads … just my 2 bits - thnx for sharing the deets with us!
|
|
|
|
BkkCoins (OP)
|
|
July 11, 2013, 11:57:15 AM Last edit: July 11, 2013, 12:10:54 PM by BkkCoins |
|
I pushed my current changes up to Github so others can use them.
Note: in klondike.c, for varying chip counts you need to hard-code the actual chip count and whether there is one bank or two. With one bank the range size gets doubled. With two banks it shouldn't be. This isn't coded properly yet; later it will be detected during init.
This new code has much better timing values for TMR0 and removes all interrupt services except for result rx. It uses polling instead for USB and TMR0. I'm not sure if this is needed but I'm trying to give the result capture as fast response as possible.
It's working better at 350 than at 300 now. Does that indicate stability issues with interference or resonance?
**** Also, the clock value is now same as MHz rate not double like before. So set 300 for 300 etc. and default changed to 256.
Note the Rx edge may be different if a second NOR gate is not used. It's rising edge for me, but with one gate it's falling edge, unless you add an inverter to both data and clk (good idea).
|
|
|
|
marto74
|
|
July 11, 2013, 03:23:01 PM |
|
K1
Hashing @ 300 MHz
Hashing @ 350 MHz
We run it for short periods (2-3 min) because there is only a heatsink.
|
|
|
|
BkkCoins (OP)
|
|
July 11, 2013, 04:06:47 PM |
|
K1
We run it for short periods 2-3 min , because there is only heatsink
Good to see a K1 running. What sort of HW error rate do you get?
I'm exploring the power supply here. I have a 100 kHz, 0.5 V amplitude signal in 5 ms bursts every 10 ms on my 1.2 V core power. It's also showing on the 12 V input from the PSU, and on 3.3 V. Does anyone have experience with what can cause that kind of pulsation? Looking closely at the 100 kHz signal, it's quite substantial spikes with damped oscillations of approx. 100 MHz, 400 mV. That's pretty big on a 1.2 V supply. There are also 600 kHz pulses from the switching supply but they're much smaller, about 10 mV. These are present whether I enable or disable the hash clock so I don't think it's the hashing doing it. I've tried turning off everything nearby including FL. lights, TV, laptop, raspi etc.
Close-up of 100 kHz spike made up of 100 MHz oscillation.
|
|
|
|
terrahash
|
|
July 11, 2013, 05:44:42 PM |
|
As you know we have all the 16 chips populated. This is what I changed in klondike.c
Status.ChipCount = 16; // just for testing
// pre-calc nonce range values
BankSize = Status.ChipCount/2; //(Status.ChipCount+1)/2;
Status.MaxCount = WORK_TICKS / BankSize;
NonceRanges[0] = 0;
for(BYTE x = 1; x < BankSize; x++)
  NonceRanges[x] = NonceRanges[x-1] + BankRanges[BankSize-1]; // single bank, double range size
Now all the nonces are coming back twice. If we change back the last line to
NonceRanges[x] = NonceRanges[x-1] + 2*BankRanges[BankSize-1];
the nonces come back only once, but take about double the time. What's the correct setting for 16 chips?
|
|
|
|
BkkCoins (OP)
|
|
July 11, 2013, 06:18:28 PM |
|
As you know we have all the 16 chips populated. This is what I changed in klondike.c
Status.ChipCount = 16; // just for testing
// pre-calc nonce range values
BankSize = Status.ChipCount/2; //(Status.ChipCount+1)/2;
Status.MaxCount = WORK_TICKS / BankSize;
NonceRanges[0] = 0;
for(BYTE x = 1; x < BankSize; x++)
  NonceRanges[x] = NonceRanges[x-1] + BankRanges[BankSize-1]; // single bank, double range size
Now all the nonces are coming back twice. If we change back the last line to
NonceRanges[x] = NonceRanges[x-1] + 2*BankRanges[BankSize-1];
the nonces come back only once, but take about double the time. What's the correct setting for 16 chips?
Oh yes, I forgot to mention something else. You want BankRanges not doubled but have to uncomment line 49 in the asic.c write code. As below,
// disable for single bank last_bit0 = last_bit1 = split;
should be,
last_bit0 = last_bit1 = split;
This causes it to write the high bit 0 for bank 1, and high bit 1 for bank 2, effectively splitting the ranges over both banks.
Give that a whirl. I haven't mounted chips in the second bank yet and so this will be the first testing of that.
edit: Also, did you notice I pushed another update a bit later today which worked better for error rates?
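To illustrate the high-bit split described above, here is a minimal sketch of how the starting nonce for each chip could be laid out. It assumes two equally populated banks, an even split of each half of the 32-bit space between the chips of a bank, and the top nonce bit selecting the bank. The function name is hypothetical and this is not the code in asic.c or klondike.c.

```c
#include <stdint.h>

/* Start of the nonce range for chip `idx` (0-based) in bank `bank` (0 or 1).
   Assumes the high bit selects the bank and each bank's half of the
   nonce space is split evenly between its chips. */
uint32_t chip_range_start(unsigned bank, unsigned idx, unsigned chips_per_bank)
{
    uint32_t per_chip = (UINT32_C(1) << 31) / chips_per_bank;
    return ((uint32_t)bank << 31) | ((uint32_t)idx * per_chip);
}
```

With 8 chips per bank each chip covers 2^28 nonces, which agrees with the 2^32/clk/16 range time given later in the thread. If the high bit were the same for both banks, the two banks would search identical ranges and every nonce would be reported twice.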
|
|
|
|
terrahash
|
|
July 11, 2013, 06:53:13 PM |
|
edit: Also, did you notice I pushed another update a bit later today which worked better for error rates.
Yes I pulled the latest updates. I think the error rate has come down.
Oh yes, I forgot to mention something else. You want BankRanges not doubled but have to uncomment line 49 in the asic.c write code. As below,
// disable for single bank last_bit0 = last_bit1 = split;
should be,
last_bit0 = last_bit1 = split;
This causes it to write the high bit 0 for bank 1, and high bit 1 for bank 2, effectively splitting the ranges over both banks.
Give that a whirl. I haven't mounted chips in the second bank yet and so this will be first testing of that.
I tried this. I am still getting the nonce back twice.
|
|
|
|
menace_one
|
|
July 11, 2013, 07:20:53 PM |
|
@BKK and TH: nice to see you talking at this high technical level.
@All: are there any options for watercooling? User toolhead created a very nice solution for Burnin's board, and Burnin will (optionally) assemble his boards directly with toolhead's watercooling block. So it would be very nice if I could use my pump and radiator for all my boards, including my TH's K64.
|
|
|
|
BkkCoins (OP)
|
|
July 11, 2013, 07:47:30 PM |
|
I tried this. I am still getting the nonce back twice.
Probably it's running twice as long as it should, so the tick count needs to be half, i.e.
Status.MaxCount = WORK_TICKS / BankSize;
needs to be,
Status.MaxCount = WORK_TICKS / BankSize / 2;
(dividing by ChipCount will work unless an odd number of chips is mounted)
The time interval between two nonces should be 2^32/clk/16, e.g. at 128 MHz, 2^32 / 128000000 / 16 = 2.09 seconds; at 300 MHz, 0.895 seconds.
In ktest if you send "ww" it will do 2 works and check the time between them. It now sends multi-char cmds as a sequence of cmds.
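The interval formula above is easy to check numerically. The helper below is just that arithmetic written out, assuming each of the chips covers an equal share of the 2^32 nonces at one hash per clock; the function name is made up for illustration.

```c
/* Seconds for one chip to sweep its share of the 32-bit nonce space,
   assuming an even split and one hash per clock cycle. */
double range_time_seconds(double clk_hz, unsigned chips)
{
    return 4294967296.0 / clk_hz / chips; /* 2^32 / clk / chips */
}
```

For 16 chips this gives about 2.097 s at 128 MHz and about 0.895 s at 300 MHz, in line with the figures in the post.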
|
|
|
|
terrahash
|
|
July 11, 2013, 07:49:15 PM |
|
After uncommenting line 49 in asic.c, I have noticed the following:
1. In ktest, the nonces are still coming back twice.
2. In cgminer, the hash rate has gone up to 2.4 GH/s (5s) and 790 MH/s (avg). The avg never went above 200 before.
I have also noticed that a lot of times there is no activity in the log except for the regular USB scanning. This period of lull always follows "Pushing work from pool 0 to hash queue". I am assuming this is because sometimes the nonce never comes back, and you probably have a certain wait period in cgminer, after which it re-sends the work data to the board.
|
|
|
|
terrahash
|
|
July 11, 2013, 07:54:11 PM |
|
Before reducing tick count:
tg@tg-DP700A3D-DM700A3D-DB701A3D-DP700A7D:~/Desktop/Github/klondike/utils$ ./ktest
Klondike device opened
Version:10, ProductID:K16, Serial#:deadbeef Cmds [WAISCE.Q]:
ww
State:W, ASICs:16, Slaves:0 WorkQ:0, WorkID:01, Temp:98, Fan:0, ErrCount:0, HashCount:0, MaxCount:2048 Cmds [WAISCE.Q]:
State:W, ASICs:16, Slaves:0 WorkQ:1, WorkID:01, Temp:98, Fan:0, ErrCount:0, HashCount:1, MaxCount:2048 Cmds [WAISCE.Q]:
Nonce Found - WorkID:01, Value:749fcd72 (0.305 secs) , Nonce:b2cc9f74 GOOD Cmds [WAISCE.Q]:
Nonce Found - WorkID:01, Value:749fcd72 (1.048 secs) , Nonce:b2cc9f74 GOOD Cmds [WAISCE.Q]:
Nonce Found - WorkID:02, Value:749fcd72 (0.983 secs) , Nonce:b2cc9f74 GOOD Cmds [WAISCE.Q]:
Nonce Found - WorkID:02, Value:749fcd72 (1.048 secs) , Nonce:b2cc9f74 GOOD Cmds [WAISCE.Q]:
After reducing tick count:
tg@tg-DP700A3D-DM700A3D-DB701A3D-DP700A7D:~/Desktop/Github/klondike/utils$ ./ktest
Klondike device opened
Version:10, ProductID:K16, Serial#:deadbeef Cmds [WAISCE.Q]: ww
State:W, ASICs:16, Slaves:0 WorkQ:0, WorkID:01, Temp:118, Fan:0, ErrCount:0, HashCount:0, MaxCount:1024 Cmds [WAISCE.Q]:
State:W, ASICs:16, Slaves:0 WorkQ:1, WorkID:01, Temp:118, Fan:0, ErrCount:0, HashCount:1, MaxCount:1024 Cmds [WAISCE.Q]:
Nonce Found - WorkID:01, Value:749fcd72 (0.305 secs) , Nonce:b2cc9f74 GOOD Cmds [WAISCE.Q]:
Nonce Found - WorkID:02, Value:749fcd72 (1.016 secs) , Nonce:b2cc9f74 GOOD Cmds [WAISCE.Q]:
|
|
|
|
BkkCoins (OP)
|
|
July 11, 2013, 07:56:53 PM Last edit: July 11, 2013, 08:10:28 PM by BkkCoins |
|
I have also noticed that a lot of times there is no activity in the log except for the regular USB scanning. This period of lull always follows "Pushing work from pool 0 to hash queue". I am assuming this is because sometimes the nonce never comes back, and you probably have a certain wait period in cgminer, after which it re-sends the work data to the board.
No. It sends the work to the board only once. Some work has no nonces: about 37% according to kano's stats, and 37% has 1 nonce, and if I recall 17% has 2, and less often 3 and 4 etc. Back up this thread a ways for actual numbers.
Strangely, I think the BFL source code only submits 1 nonce/work because it calls work_completed after the first nonce, which removes the work from the queue such that further nonces won't get found. I could be missing something there but that's how I read it.
edit: Kano's numbers were, 1000 results: 374/364/175/64/19/1/2/1/0/0 (that's 0/1/2/3/4/5/6/7/8/9 nonces per work unit).
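Those percentages are close to what a Poisson distribution with a mean of one nonce per work unit would predict. The sketch below computes that distribution for comparison. It assumes that a full sweep of the 2^32 nonce range yields one difficulty-1 nonce on average, which is the usual model and not something stated in the post, and the function name is hypothetical.

```c
/* exp(-1), hard-coded so the sketch does not need libm. */
static const double EXP_NEG1 = 0.36787944117144233;

/* Probability that a work unit contains exactly k nonces,
   assuming a Poisson distribution with mean 1: e^-1 / k! */
double nonces_per_work_prob(unsigned k)
{
    double p = EXP_NEG1;
    for (unsigned i = 2; i <= k; i++)
        p /= i; /* divide by k! one factor at a time */
    return p;
}
```

Under that assumption the model gives roughly 36.8%, 36.8%, 18.4%, 6.1% and 1.5% for 0 to 4 nonces, against 37.4%, 36.4%, 17.5%, 6.4% and 1.9% in Kano's 1000 results.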
|
|
|
|
BkkCoins (OP)
|
|
July 11, 2013, 08:02:57 PM Last edit: July 11, 2013, 08:13:05 PM by BkkCoins |
|
That's good then. In the first it's 0.983 + 1.048 = 2.031 seconds. In the second it's just 1.016 seconds. You have to ignore the first time interval because that's the time from cmd to nonce, which indicates how far into the range it is rather than the full range time. Add up the times with the same WorkID (but don't use the first Work unit since).
1.016 seconds gives 2^32 / 16 / 1.016 = 264208125 Hz ~ 264 MHz (but probably 260 because I always cut the work short rather than long). Cutting short has no downside but running long gives duplicates.
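The clock estimate above is the range-time formula turned around. A small sketch, assuming 16 chips each sweeping an equal share of the 2^32 nonces, with a made-up function name:

```c
/* Effective hash clock in Hz implied by a measured full-range time,
   assuming `chips` ASICs each cover an equal share of 2^32 nonces. */
double estimate_clk_hz(double range_seconds, unsigned chips)
{
    return 4294967296.0 / chips / range_seconds; /* 2^32 / chips / t */
}
```

Feeding in the 1.016 seconds from the ktest log with 16 chips gives about 264.2 MHz. As the post notes, the true clock is probably a little lower because the work is deliberately cut short.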
|
|
|
|
TomKeddie
|
|
July 11, 2013, 08:09:24 PM |
|
@All: are there any options of watercooling? user toolhead created a very nice solution for Burnins Board and Burnin will actually directly assemble (optionally) his boards with toolheads Watercooling block.
so it would be very nice if I could use my pump and radiator for all my boards including my TH's K64.
There is a separate thread on cooling, https://bitcointalk.org/index.php?topic=208381.0. This thread is for the electronics, firmware and drivers. BKK isn't delivering a product, he's delivering a design - that's a different point of view from Burnin's.
|
|
|
|
|