Bitcoin Forum
September 16, 2024, 04:10:10 PM *
Author Topic: Klondike - 16 chip ASIC Open Source Board - Preliminary  (Read 435356 times)
GandalfG
Sr. Member
****
Offline Offline

Activity: 259
Merit: 250


Dig your freedom


View Profile
July 04, 2013, 07:45:54 AM
 #1981

"handling an interrupt, so interrupts can be interrupted."  BKK "Don Rumsfeld" Coins  Grin



“There are interrupted interrupts; there are things we interrupt that we interrupt.
There are interrupts uninterrupted; that is to say, there are things that we now interrupt we don't interrupt.
But there are also uninterrupted uninterrupts – there are things we do not interrupt we don't interrupt.”

—Klondike Secretary of Design, BKK "Donald Rumsfeld" Coins

Oh lol, I laughed to tears :)

Want to say thanks? 16ragydppe9QFRVhrdwEUjgfMS7KCfEFGY
BkkCoins (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1009


firstbits:1MinerQ


View Profile WWW
July 04, 2013, 04:23:23 PM
 #1982

Today's Update.

I spent all day testing and trying to find what is causing HW errors.
I also did some comparison/companion testing with the Erupter that a very generous turtle83 sent me. The Klondike and Erupter ran fine together, and the cgminer menu items seem to be fine now too, after updating to 3.3.1.

I spent a lot of time analysing share.logs and running the data through my kslog util to generate work data for ktest. What I found was that almost all the HW errors are non-repeatable. If I take accepted data and feed it back in manually, I get the same nonce out. When I feed in similar data that resulted in error nonces, I usually get NO nonce out at all. This seems to indicate some problem with midstate/precalc/data not getting into the ASIC correctly, rather than errors caused by bad result capture. I've checked my code several times trying to find anywhere the data gets corrupted before it's pushed to the ASIC, and I can't see it.
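
For readers following along, here is a minimal sketch of the kind of replay check described above: rebuild the 80-byte block header for a given nonce, hash it twice with SHA-256, and see whether the result qualifies as a difficulty-1 share. This is not kslog or ktest; `build_header`, `block_hash_hex` and `is_diff1_nonce` are hypothetical names, and the "32 leading zero bits" test is a simplifying assumption about how a returned nonce is judged good or bad. The publicly known Bitcoin genesis block is used as the test vector.

```python
import hashlib
import struct


def build_header(version, prev_hash_hex, merkle_root_hex, ntime, nbits, nonce):
    """Serialize an 80-byte Bitcoin block header (hashes given in display order)."""
    return (
        struct.pack("<I", version)
        + bytes.fromhex(prev_hash_hex)[::-1]      # stored byte-reversed
        + bytes.fromhex(merkle_root_hex)[::-1]    # stored byte-reversed
        + struct.pack("<III", ntime, nbits, nonce)
    )


def block_hash_hex(header):
    """Double SHA-256 of the header, returned in the usual display byte order."""
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()


def is_diff1_nonce(header):
    """True if the hash has at least 32 leading zero bits (assumed diff-1 test)."""
    return block_hash_hex(header).startswith("00000000")


# Test vector: the Bitcoin genesis block, whose winning nonce is publicly known.
GENESIS = dict(
    version=1,
    prev_hash_hex="00" * 32,
    merkle_root_hex="4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    ntime=1231006505,
    nbits=0x1D00FFFF,
    nonce=2083236893,
)

good = build_header(**GENESIS)
bad = build_header(**{**GENESIS, "nonce": GENESIS["nonce"] + 1})
print(is_diff1_nonce(good), is_diff1_nonce(bad))
```

Feeding the same work back in with the nonce the chip returned, and checking it this way on the host, separates "the chip hashed the wrong data" from "the result was captured wrongly", which is the distinction drawn above.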

As the day progressed I found the error rate dropping off as well. After a run of 1.5 hours along with the Erupter I found that the Klondike had about a 3% error rate, and the Erupter about 1.5%. But I'd been getting a lot of Rejected shares and I wondered if that was due to the slow speed and delays in submitting shares or what. So this evening I switched from 50btc (getwork) to BTCGuild (stratum) and saw that Rejects dropped a lot, and so far HW errors are completely gone, down to 0 (knock on wood).

So it could even be that some problem with generating work with GetWork is sending bad data to the Klondike (?? weird), as with stratum (local block generation) I have not been getting HW errors. I'm trying to understand how that can be. I never see USB disconnects at all now. And if HW errors drop right off with stratum, then I'll probably add another ASIC and start checking the chaining next. Right now it's running at a 150 MHz clock with no heat sink, and it's a bit hottish, but touchable with fingers for about 5 seconds.

Or maybe error rates actually get lower as the clock rate rises because going from 128 to 150 seems to have lowered the HW errors. Hmmm. Figure that out.

Plan for tomorrow: solder down more chips.


kano
Legendary
*
Offline Offline

Activity: 4592
Merit: 1851


Linux since 1997 RedHat 4


View Profile
July 04, 2013, 04:36:08 PM
 #1983

A few things:
1) The HW: is reported in 1diff, but 3.3.1 (and earlier) report A: and R: in shares (which can be any diff - depends on what you are talking to)
Current git reports them in 1diff (i.e. the next cgminer version will only be 1diff for all of HW, A and R) - we changed that a few days ago in git.
In API devs I report both.

2) In current git I have also implemented what we call cps - on Icarus and ModMinerQuad. The AMU (ASICMiner USB) is an Icarus device.
In my mining, the AMU at 335MH/s gets around 1% errors (certainly less than 1.5%).
Without cps you would expect more errors.

3) In my API stats I've added 2 new fields: "USB Pipe" and "USB Delay"
If "USB Pipe" is non-zero then there are USB problems happening that could also be causing errors.
"USB Delay" shows if there are timing 'issues' occurring in the code (cps fixes these and reports them in "USB Delay")

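To illustrate point 1, here is a small sketch of putting A, R and HW into the same units before comparing them. It assumes the pool's share difficulty stayed constant over the run and that one share at difficulty N is worth, on average, N diff-1 shares. `to_diff1` and `comparable_counts` are hypothetical helper names, not cgminer functions, and the numbers are made up.

```python
def to_diff1(share_count, share_difficulty):
    """Express a count of pool shares as the equivalent number of diff-1 shares."""
    return share_count * share_difficulty


def comparable_counts(accepted, rejected, hw_errors, share_difficulty):
    """Return (A, R, HW) in diff-1 units; HW is assumed to be diff-1 already."""
    return (
        to_diff1(accepted, share_difficulty),
        to_diff1(rejected, share_difficulty),
        hw_errors,
    )


# Hypothetical run: 50 accepted and 1 rejected share at pool difficulty 4,
# with 6 hardware errors counted at diff 1.
print(comparable_counts(50, 1, 6, share_difficulty=4))
```

With A and R left in raw shares, the same run would look like 50 accepted against 6 HW errors, which overstates the error rate by the pool difficulty.
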
Pool: https://kano.is - low 0.5% fee PPLNS 3 Days - Most reliable Solo with ONLY 0.5% fee   Bitcointalk thread: Forum
Discord support invite at https://kano.is/ Majority developer of the ckpool code - k for kano
The ONLY active original developer of cgminer. Original master git: https://github.com/kanoi/cgminer
cardcomm
Sr. Member
****
Offline Offline

Activity: 294
Merit: 250



View Profile
July 04, 2013, 04:46:31 PM
 #1984

Sweet!!! Now things are getting really interesting. :)

Thanks again for the hard work and determination!

Easily see your cgminer status with my cgminerLCDStats app:  http://cardcomm.github.io/cgminerLCDStats/
Did my post help you or make you laugh? Let me know with Bitcoins at: 1CQfpMHQ5zVuZ5i9uxSHSSx4J8ZhehSjn3  Smiley
Bicknellski
Hero Member
*****
Offline Offline

Activity: 924
Merit: 1000



View Profile
July 04, 2013, 04:58:49 PM
 #1985

That will do BKK. That will do.



Dogie trust abuse, spam, bullying, conspiracy posts & insults to forum members. Ask the mods or admins to move Dogie's spam or off topic stalking posts to the link above.
alfabitcoin
Sr. Member
****
Offline Offline

Activity: 476
Merit: 250


View Profile
July 04, 2013, 05:23:22 PM
 #1986

So the pool protocol causes the high HW errors?
BkkCoins (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1009


firstbits:1MinerQ


View Profile WWW
July 04, 2013, 05:37:36 PM
 #1987

So the pool protocol causes the high HW errors?
Makes no sense, I know. And I'm not saying it does, but when I switched to stratum the rates dropped right down. Still scratching my head. I'm just letting both Erupter and Klondike run now. Klondike currently has A:99 R:0 HW:2 - which is the best it's been yet, though not as good as the Erupter at A:293 R:0 HW:3.
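
As a rough worked example of the figures just quoted, the sketch below turns A/R/HW counts into an error percentage. It assumes all three counters are in the same diff-1 units and defines the rate as HW errors over everything the device returned (A + R + HW); that definition and the name `hw_error_percent` are assumptions for illustration, not necessarily how cgminer reports it.

```python
def hw_error_percent(accepted, rejected, hw_errors):
    """HW errors as a percentage of all results returned (A + R + HW)."""
    total = accepted + rejected + hw_errors
    if total == 0:
        return 0.0
    return 100.0 * hw_errors / total


print(round(hw_error_percent(99, 0, 2), 2))   # Klondike: 1.98
print(round(hw_error_percent(293, 0, 3), 2))  # Erupter: 1.01
```

With counts this small, one extra HW error moves the Klondike figure by about a full percentage point, so the two devices are closer than the raw numbers suggest.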

k9quaint
Legendary
*
Offline Offline

Activity: 1190
Merit: 1000



View Profile
July 04, 2013, 05:46:15 PM
 #1988

Awesome sauce. The getwork vs stratum is puzzling.

Bitcoin is backed by the full faith and credit of YouTube comments.
alfabitcoin
Sr. Member
****
Offline Offline

Activity: 476
Merit: 250


View Profile
July 04, 2013, 05:52:35 PM
 #1989

So the pool protocol causes the high HW errors?
Makes no sense, I know. And I'm not saying it does, but when I switched to stratum the rates dropped right down. Still scratching my head. I'm just letting both Erupter and Klondike run now. Klondike currently has A:99 R:0 HW:2 - which is the best it's been yet, though not as good as the Erupter at A:293 R:0 HW:3.
Well, it makes sense and it doesn't. You designed the K16 from scratch, you don't have the ASIC comm protocol source, and you don't know the Erupter comm protocol either. So something there is causing the problem. Maybe Avalon will release the comm protocol soon so we can be sure.
fasmax
Sr. Member
****
Offline Offline

Activity: 378
Merit: 250


View Profile
July 04, 2013, 05:52:59 PM
 #1990

Concerning the 128 MHz vs 150 MHz issue, maybe the internal PLL has stability problems at different frequencies.
cardcomm
Sr. Member
****
Offline Offline

Activity: 294
Merit: 250



View Profile
July 04, 2013, 06:34:52 PM
 #1991


Isn't the GetWork protocol deprecated anyway? Not that it shouldn't work, but I thought stratum was the preferred protocol.

Easily see your cgminer status with my cgminerLCDStats app:  http://cardcomm.github.io/cgminerLCDStats/
Did my post help you or make you laugh? Let me know with Bitcoins at: 1CQfpMHQ5zVuZ5i9uxSHSSx4J8ZhehSjn3  Smiley
BkkCoins (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1009


firstbits:1MinerQ


View Profile WWW
July 04, 2013, 06:38:38 PM
 #1992

Concerning the 128 MHz vs 150 MHz issue, maybe the internal PLL has stability problems at different frequencies.
I hadn't thought about that, but it's possible, and perhaps it's tuned for higher frequencies than I'm currently using. We'll see pretty soon. After I get a few more chips mounted I'll add a heat sink and bump up the clock. I think my plan is to add one more on the same bank, and then after that two more on the opposite bank.



Isn't the GetWork protocol deprecated anyway? Not that it shouldn't work, but I thought stratum was the preferred protocol.
I haven't been following that but I'm sure stratum is preferred. And if it works that much better, for whatever reasons, then I'm not going to worry much about getwork.

****
I pushed new updates to github earlier with some small tweaks.

The firmware now takes clock cfg values from 256 up to 900. These are double the MHz rate, so that's 128 - 450 MHz (not that you can run at 450, but the PLL on the ASIC accepts values that high). The code now detects values below 500 and sets the half-clock bit for them. It also excludes 451-499 (i.e. roughly 225-249 MHz) by forcing them to 450, since the PLL doesn't support that range.

cp1
Hero Member
*****
Offline Offline

Activity: 616
Merit: 500


Stop using branwallets


View Profile
July 04, 2013, 06:43:01 PM
 #1993

What's the input clock for the avalon running at?

Guide to armory offline install on USB key:  https://bitcointalk.org/index.php?topic=241730.0
BkkCoins (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1009


firstbits:1MinerQ


View Profile WWW
July 04, 2013, 06:46:31 PM
 #1994

What's the input clock for the avalon running at?
32 MHz

There are 2 PLL control values, R and N. By setting R=32 you get N = 2x MHz rate, which is what I expose as the clk cfg value. Documented range is 500 - 900. But a "half rate" bit allows dividing that by 2. So for N < 500 I set that bit and use 2N for the control value. I don't allow a cfg value below 256 even though the PLL allows down to 250.
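
The mapping described above can be sketched as follows. This is an illustration in Python, not the PIC firmware: `pll_settings` and the constant names are hypothetical, and clamping out-of-range requests (rather than rejecting them) is an assumption. The sketch only models N and the half-rate bit; R is taken as fixed at 32.

```python
CFG_MIN = 256         # lowest cfg the firmware accepts (128 MHz)
CFG_MAX = 900         # highest value the PLL accepts (450 MHz)
HALF_THRESHOLD = 500  # below this the half-rate bit is needed
HALF_MAX = 450        # highest cfg reachable with the half-rate bit set


def pll_settings(cfg):
    """Map a clock cfg value (2 x MHz) to (pll_n, half_rate_bit, mhz)."""
    # Assumption: out-of-range requests are clamped, not rejected.
    cfg = max(CFG_MIN, min(CFG_MAX, cfg))
    if cfg < HALF_THRESHOLD:
        # 451-499 would need N above 900, so those are forced down to 450.
        cfg = min(cfg, HALF_MAX)
        return 2 * cfg, True, cfg / 2
    return cfg, False, cfg / 2


for requested in (300, 470, 500, 900):
    print(requested, pll_settings(requested))
```

So a request of 300 (150 MHz) becomes N = 600 with the half-rate bit set, while 470 collapses to the same setting as 450 (N = 900, half-rate, 225 MHz).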

siran
Newbie
*
Offline Offline

Activity: 18
Merit: 0


View Profile
July 04, 2013, 06:50:56 PM
 #1995

I hadn't thought about that, but it's possible, and perhaps it's tuned for higher frequencies than I'm currently using. We'll see pretty soon. After I get a few more chips mounted I'll add a heat sink and bump up the clock. I think my plan is to add one more on the same bank, and then after that two more on the opposite bank.

About that heat sink. Isn't it true that Avalon chips must be cooled from below? I mean you can't put the heat sink on top of the chip; it has to go below the PCB with a silicone thermal pad. That's how the Block Erupter is cooled.
BkkCoins (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1009


firstbits:1MinerQ


View Profile WWW
July 04, 2013, 06:52:35 PM
 #1996

I hadn't thought about that, but it's possible, and perhaps it's tuned for higher frequencies than I'm currently using. We'll see pretty soon. After I get a few more chips mounted I'll add a heat sink and bump up the clock. I think my plan is to add one more on the same bank, and then after that two more on the opposite bank.

About that heat sink. Isn't it true that Avalon chips must be cooled from below? I mean you can't put the heat sink on top of the chip; it has to go below the PCB with a silicone thermal pad. That's how the Block Erupter is cooled.
Yes, that's right. The heat sink is mounted under the board. There are 1cm x 1cm exposed pads with thermal vias to help dissipation to the heat sink. Or, rather, the chips are mounted on bottom and heat sink on top - so the board is upside down...

Bicknellski
Hero Member
*****
Offline Offline

Activity: 924
Merit: 1000



View Profile
July 04, 2013, 07:27:56 PM
 #1997

Liquid cooling... I wanna see 450.

Dogie trust abuse, spam, bullying, conspiracy posts & insults to forum members. Ask the mods or admins to move Dogie's spam or off topic stalking posts to the link above.
Igor_Rast
Newbie
*
Offline Offline

Activity: 40
Merit: 0


View Profile
July 04, 2013, 08:57:21 PM
 #1998

Liquid cooling... I wanna see 450.

Dunk it in mineral oil :P
babcoccl
Newbie
*
Offline Offline

Activity: 36
Merit: 0


View Profile
July 04, 2013, 09:51:20 PM
 #1999

In my RL job, I previously worked on a project where a PLL was throwing our whole system out of whack. The problem was that it would lock only about 50 percent of the time, so we would get intermittent valid data with occasional garbage. After thoroughly tracing out various components, we observed that an unusual amount of noise was getting into the PLL, causing it to lose its lock occasionally. This was compounded by the noise level varying with frequency. Once we filtered the noise out, we were able to maintain a continuous lock and produce clean data.

PLL might be a good place to start looking. Just make sure your PLL maintains a good lock.
Taugeran
Hero Member
*****
Offline Offline

Activity: 658
Merit: 500


CCNA: There i fixed the internet.


View Profile
July 05, 2013, 12:57:09 AM
 #2000

I don't remember if anyone has asked this before. I've been silently watching in the background...

Anyway, for the PIC firmware, I remember you stating that it subdivides the nonce range by n chips and pushes those ranges to the chips.

How difficult (or possible) would it be to rework the FW to do 1 job per chip?

This is just out of curiosity, since I put in an order for 5 chips in a group buy + board (once finalized [TY, T13Hydra]).

-Taugeran
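
For anyone wondering what "subdividing the nonce range by n chips" looks like in practice, here is a minimal sketch. It is not the Klondike PIC firmware; `split_nonce_range` is a hypothetical name, and handing any remainder to the first few chips is just one reasonable choice.

```python
NONCE_SPACE = 2 ** 32  # a Bitcoin nonce is a 32-bit value


def split_nonce_range(n_chips):
    """Return a list of (start, end_exclusive) nonce ranges, one per chip."""
    if n_chips < 1:
        raise ValueError("need at least one chip")
    base, extra = divmod(NONCE_SPACE, n_chips)
    ranges, start = [], 0
    for chip in range(n_chips):
        # Spread any remainder over the first `extra` chips.
        size = base + (1 if chip < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


for chip, (lo, hi) in enumerate(split_nonce_range(5)):
    print(chip, hex(lo), hex(hi - 1))
```

With one job per chip instead, every chip would scan the full 32-bit range on its own work item, so the host would have to supply n times as many jobs but each job would last n times longer.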

Bitfury HW & Habañero : 1.625Th/s
tips/Donations: 1NoS89H3Mr6U5CmP4VwWzU2318JEMxHL1
Come join Coinbase