Bitcoin Forum
May 22, 2024, 07:37:05 AM *
Author Topic: [ANN] TeamRedMiner v0.10.10 - Ironfish/Kaspa/ZIL/Kawpow/Etchash and More  (Read 211467 times)
kerney666
June 11, 2019, 02:43:37 PM
 #1581

2 kerney666

Hi, my rig can't run longer than 1-2 hours on v0.4.5 and newer. One GPU hangs (this is from 0.5.1 after autoconfig):

[2019-06-10 21:22:55] GPU 0 [57C, fan 87%] cnr: 2.486kh/s, avg 2.477kh/s, pool 2.709kh/s a:46 r:0 hw:0
[2019-06-10 21:22:55] GPU 1 [58C, fan 85%] cnr: 2.490kh/s, avg 2.482kh/s, pool 1.261kh/s a:22 r:0 hw:0
[2019-06-10 21:22:55] GPU 2 [56C, fan 86%] cnr: 2.486kh/s, avg 2.478kh/s, pool 2.969kh/s a:51 r:0 hw:1
[2019-06-10 21:22:55] GPU 3 [63C, fan 87%] cnr: 2.480kh/s, avg 2.468kh/s, pool 2.377kh/s a:42 r:0 hw:0
[2019-06-10 21:22:55] GPU 4 [45C, fan 90%] cnr: 2.483kh/s, avg 2.471kh/s, pool 2.737kh/s a:47 r:0 hw:3
[2019-06-10 21:22:55] Total                cnr: 12.42kh/s, avg 12.38kh/s, pool 12.05kh/s a:208 r:0 hw:4
[2019-06-10 21:23:05] GPU 4: detected DEAD (11:00.0), will execute restart script watchdog.sh

but v0.4.4 with exactly the same config can run for weeks:

[2019-06-10 19:40:27] Stats Uptime: 13 days, 12:58:15
[2019-06-10 19:40:27] GPU 0 [59C, fan 87%] cnr: 2.471kh/s, avg 2.470kh/s, pool 2.340kh/s a:7780 r:0 hw:17
[2019-06-10 19:40:27] GPU 1 [60C, fan 85%] cnr: 2.474kh/s, avg 2.473kh/s, pool 2.423kh/s a:8057 r:0 hw:39
[2019-06-10 19:40:27] GPU 2 [57C, fan 86%] cnr: 2.471kh/s, avg 2.471kh/s, pool 2.337kh/s a:7768 r:0 hw:106
[2019-06-10 19:40:27] GPU 3 [63C, fan 87%] cnr: 2.464kh/s, avg 2.464kh/s, pool 2.381kh/s a:7917 r:0 hw:73
[2019-06-10 19:40:27] GPU 4 [55C, fan 88%] cnr: 2.470kh/s, avg 2.468kh/s, pool 2.259kh/s a:7519 r:0 hw:321
[2019-06-10 19:40:27] Total                cnr: 12.35kh/s, avg 12.35kh/s, pool 11.74kh/s a:39041 r:0 hw:556
[2019-06-10 19:40:39] Pool pool.supportxmr.com received new job. (job_id: +kOSIEF95a5dlkxX6slHR0EW+l34)

I know I'm pushing hard on the limit, but what changed in TRM 0.4.5 that causes this instability? With 0.4.4 and older this rig was rock stable. OS is Linux with the amd18.3 drivers.

Thank you for your answer,

Migo

Hi!

Man, it's such a hard question to answer. The changes between 0.4.4 and 0.4.5 are really tiny, and nothing that "should" affect stability. For cn/r, absolutely nothing of interest was touched in the kernels, nor anything specific in the host-side code. For every release, we get a few people telling us how stable things are with the new version, and then others who (like you) unfortunately have a harder time keeping things running smoothly.

Since you're running linux, do you see anything interesting in your "dmesg" output from the kernel when a crash occurs?

-- K
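
For anyone unsure how to check dmesg after a crash, here is a rough sketch in Python 3 that keeps only the kernel log lines likely to concern the GPU driver. The keyword list is an assumption chosen for illustration (`amdgpu` is the kernel driver name; the rest is a guess at what might be interesting), and the helper names `gpu_related` and `read_dmesg` are made up for this example. It is not part of TRM.

```python
import subprocess
from typing import Iterable, List, Sequence

# Assumed keywords: "amdgpu" is the kernel driver name, "timeout" is a guess
# at what a hang might mention. Adjust to whatever your kernel actually prints.
KEYWORDS = ("amdgpu", "timeout")


def gpu_related(lines: Iterable[str], keywords: Sequence[str] = KEYWORDS) -> List[str]:
    """Return the lines that contain any keyword (case-insensitive)."""
    wanted = tuple(k.lower() for k in keywords)
    return [ln.rstrip("\n") for ln in lines if any(k in ln.lower() for k in wanted)]


def read_dmesg() -> List[str]:
    """Run dmesg and return its output as a list of lines.

    Reading the kernel log may require root on systems that restrict it.
    """
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Typical use right after a DEAD message would be `for line in gpu_related(read_dmesg()): print(line)`, then posting or PMing whatever shows up around the crash time.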


lupaarSen
June 11, 2019, 02:45:55 PM
Last edit: June 11, 2019, 02:59:03 PM by lupaarSen
 #1582

I can't get more than 1950 H/s on a Pulse 56 with stock BIOS and Samsung memory / Lucky timing from config file (56-Hynix) @ 1407/905 Mem:950 13*13:AAA
That really sucks... any ideas?

Before 0.5.0 = 2100H/S

Rakly3
June 11, 2019, 03:32:33 PM
Last edit: June 11, 2019, 03:57:45 PM by Rakly3
 #1583

Thank you for the link! I can't believe it never came up before.
My pleasure Smiley


I cant get more than 1950 H/s Pulse 56 Stock Bios Samsung memory /Lucky Timing from config file (56-Hynix) @ 1407/905 Mem:950 13*13:AAA
That suck really.... ideas?

Before 0.5.0 = 2100H/S


Man... you all use so much power on your Vegas.
I get 1950-1990 h/s with both the core and mem at 880 mV (still on stock timings; I haven't had the cards that long and am still learning Vega :) )
The core power draw averages 117 W, with dips to 95 W and spikes to 130 W (Eth & CNr).
I can't get an accurate reading for the mem, not even with HWiNFO. It is stuck at 1.2 V without budging once, but changing the mV does have an impact on stability and hashrate, so I'm sure it's not really 1.2 V.

I ran several autotunes and used those configs
ilovetrm

As a side note, the list-devices bat file recognizes all my 4x0 and 5x0 cards as 580s.
Dunno if that's normal?
And with autotune, my 4 GB cards are faster than my 8 GB cards.
It's a mishmash of brands and memory brands though (no Vegas in these rigs, just FYI). The MSI Armor 470 8 GB Micron is just abysmal, barely breaking 700 h/s.
kerney666
June 11, 2019, 03:39:52 PM
 #1584

I cant get more than 1950 H/s Pulse 56 Stock Bios Samsung memory /Lucky Timing from config file (56-Hynix) @ 1407/905 Mem:950 13*13:AAA
That suck really.... ideas?

Before 0.5.0 = 2100H/S


Wow, I think you're the first one to report a clearly degraded hashrate going from 0.4.x to 0.5.0! So, am I understanding you correctly in that you had 2100 h/s with the same clocks and timings with TRM 0.4.x, but running 0.5.0 or 0.5.1 only gives you 1950 h/s?

Do you know the CN config you used for 2100 h/s in previous versions? Also, are you 100% certain the mem timings did stick? Your hashrates are very close to what my Gigabyte V56 Hynix gets at stock timings vs modded timings.
lupaarSen
June 11, 2019, 04:01:37 PM
 #1585

Thank you for the link! I can't believe it never came up before.
My pleasure Smiley


I cant get more than 1950 H/s Pulse 56 Stock Bios Samsung memory /Lucky Timing from config file (56-Hynix) @ 1407/905 Mem:950 13*13:AAA
That suck really.... ideas?

Before 0.5.0 = 2100H/S


Man... you all use so much power on your vegas.
I have 1950h/s - 1990h/s with both the core and mem at 880mV (stock timings still, I don't have them that long yet. Still learning Vega Smiley ))
The core powerdraw is avg 117W, with dips to 95W and spikes to 130W (Eth & CNr)
The mem i can't get an accurate reading, not even with HWinfo. It is stuck at 1.2V without budging once, but changing the mV does have impact on stability and hashrate, so I'm sure it's not 1.2V)

I ran several autotunes and used those configs
ilovetrm

As a sidenote. the list devices batfile recognizes all my 4x0 and 5x0 as 580's.
Dunno if that's normal?
And with autotune, my 4gb cards are faster than my 8gb cards.
It's a mishmash of brands and memory brands tho.(no vega's in these rigs just fyi) (the MSI armor 470 8Gb Micron is just abysmal. Barely breaking 700h/s)

https://ibb.co/vc2qXPj

That doesn't change the fact I get between 100-200 H/s more...
Rakly3
June 11, 2019, 04:05:46 PM
Last edit: June 11, 2019, 04:22:12 PM by Rakly3
 #1586

Today I managed to solve the "one card crashing" issue on Windows 10. It looks like the compute mode switch is broken: setting it by script and switching it manually in the AMD "control center" both give me the same result. So to get the cards to work I need to be in "graphics mode"... But in the registry editor:

Computer\HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\Class\{4d36e968-e325-11ce-bfc1-08002be10318}

you can see your cards as 0001, 0002, etc., and there is a "blockchain support" setting, or something like that, with a value of 0. I think that is the new name of "compute mode". After switching it to 1, everything still works fine :)
I'm on the 19.6.1 driver.
I just uninstalled all the AMD software and kept the drivers plus a tool that can enable/disable compute mode. The card behaves now.
IMO it's Wattman causing many of the system instabilities. Even if you never 'turned it on', it is still being used.
CRAP!
It happened again!
Guess uninstalling the AMD software didn't fix diddly squat.
Drivers 19.4.1
TRM versions 0.4.5 and up. Actually, I came here for the 0.5.x version yesterday to try and fix the problem.
Meanwhile I changed my other rigs to 0.5.1 too for the autotune, but I only have this problem on my Vega rig.

https://ibb.co/vc2qXPj

Not changing the fact i get bewteen 100-200H/s more...
Sweet! Don't worry, I'll get there ;)
lupaarSen
June 11, 2019, 04:08:49 PM
 #1587

Hmm, I'm thinking that I didn't use timings before, and the Pulse 56 is a Nano PCB. Do you think timings can screw all of this up? I'm using the new experimental Minerstat OS (18.04-19.10) and am flashing the stable version now. If that doesn't help I'll go back to *+* with stock timings and report here. Another question: I set my mem voltage to 875-900-950 for a mem clock of 950. Any advice?
migo77
June 11, 2019, 04:16:17 PM
 #1588

Hi, thank you for your answer! I'm sorry, I haven't looked into dmesg yet. I'll stop 0.4.4 and run 0.5.1 again to check dmesg. Can I provide any more info after a crash?

0.4.4 has been running nicely since yesterday's last 0.5.1 experiment:

[2019-06-11 18:16:28] Stats Uptime: 0 days, 20:38:06
[2019-06-11 18:16:28] GPU 0 [59C, fan 87%] cnr: 2.470kh/s, avg 2.470kh/s, pool 2.443kh/s a:512 r:0 hw:2
[2019-06-11 18:16:28] GPU 1 [60C, fan 84%] cnr: 2.470kh/s, avg 2.475kh/s, pool 2.537kh/s a:532 r:0 hw:0
[2019-06-11 18:16:28] GPU 2 [56C, fan 85%] cnr: 2.468kh/s, avg 2.471kh/s, pool 2.303kh/s a:482 r:0 hw:8
[2019-06-11 18:16:28] GPU 3 [63C, fan 87%] cnr: 2.461kh/s, avg 2.464kh/s, pool 2.401kh/s a:503 r:0 hw:7
[2019-06-11 18:16:28] GPU 4 [55C, fan 88%] cnr: 2.464kh/s, avg 2.468kh/s, pool 2.314kh/s a:485 r:0 hw:26
[2019-06-11 18:16:28] Total                cnr: 12.33kh/s, avg 12.35kh/s, pool 12.00kh/s a:2514 r:0 hw:43
[2019-06-11 18:16:30] Pool pool.supportxmr.com received new job. (job_id: BI3HJirVchNMe6LpNGRZuX5bez1a)

Now I'm on 0.5.1 for debugging:

          Team Red Miner version 0.5.1
[2019-06-11 18:18:19] Auto-detected AMD OpenCL platform 0
[2019-06-11 18:18:20] Initializing GPU 0.
[2019-06-11 18:18:21] Initializing GPU 1.
[2019-06-11 18:18:22] Initializing GPU 2.
[2019-06-11 18:18:23] Initializing GPU 3.
[2019-06-11 18:18:24] Initializing GPU 4.
[2019-06-11 18:18:25] Watchdog thread starting.
[2019-06-11 18:18:25] Runtime Command Keys: h - help, s - stats, e - enable gpu, d - disable gpu, t - tuning mode, q - quit
[2019-06-11 18:18:25] API initialized on 127.0.0.1:4028
[2019-06-11 18:18:25] Successfully initialized GPU 0: Vega with 64 CU (PCIe 03:00.0) (CN 16*14:CAA)
[2019-06-11 18:18:25] Successfully initialized GPU 1: Vega with 64 CU (PCIe 08:00.0) (CN 16*14:CAA)
[2019-06-11 18:18:25] Successfully initialized GPU 2: Vega with 64 CU (PCIe 0b:00.0) (CN 16*14:CAA)
[2019-06-11 18:18:25] Successfully initialized GPU 3: Vega with 64 CU (PCIe 0e:00.0) (CN 16*14:CAA)
[2019-06-11 18:18:25] Successfully initialized GPU 4: Vega with 64 CU (PCIe 11:00.0) (CN 16*14:CAA)


Thank you,

Migo  
XxXBigDickXxX
June 12, 2019, 06:22:03 AM
 #1589

Hi Kerney666! Will we have a miner for RandomX?
migo77
June 12, 2019, 06:54:49 AM
 #1590

Hi, this time it ran a bit longer:


[2019-06-12 04:25:27] GPU 4 [55C, fan 88%] cnr: 2.483kh/s, avg 2.485kh/s, pool 2.078kh/s a:225 r:0 hw:18
[2019-06-12 04:25:27] Total                cnr: 12.42kh/s, avg 12.43kh/s, pool 11.40kh/s a:1240 r:0 hw:25
[2019-06-12 04:25:57] Stats Uptime: 0 days, 10:07:37
[2019-06-12 04:25:57] GPU 0 [60C, fan 88%] cnr: 2.484kh/s, avg 2.486kh/s, pool 2.371kh/s a:259 r:0 hw:0
[2019-06-12 04:25:57] GPU 1 [61C, fan 85%] cnr: 2.489kh/s, avg 2.490kh/s, pool 2.168kh/s a:236 r:0 hw:3
[2019-06-12 04:25:57] GPU 2 [57C, fan 86%] cnr: 2.485kh/s, avg 2.487kh/s, pool 2.320kh/s a:256 r:0 hw:2
[2019-06-12 04:25:57] GPU 3 [63C, fan 87%] cnr: 2.478kh/s, avg 2.480kh/s, pool 2.458kh/s a:264 r:0 hw:2
[2019-06-12 04:25:57] GPU 4 [46C, fan 89%] cnr: 2.484kh/s, avg 2.484kh/s, pool 2.076kh/s a:225 r:0 hw:18
[2019-06-12 04:25:57] Total                cnr: 12.42kh/s, avg 12.43kh/s, pool 11.39kh/s a:1240 r:0 hw:25
[2019-06-12 04:26:09] GPU 4: detected DEAD (11:00.0), will execute restart script watchdog.sh

Relevant dmesg output sent by PM, I don't want to pollute the thread...

Thanx,

Milan
argominer
June 12, 2019, 08:15:04 AM
 #1591

I think I have the same issue (0.5.1). 6 x Vega 56 with timings run for maybe an hour and then crash.

Watchdog GPU 0: stuck in enqueue, reporting.
GPU 0: detected DEAD (03:00.0), will execute restart script watchdog.bat
Rakly3
June 12, 2019, 03:56:34 PM
Last edit: June 20, 2019, 09:43:30 AM by Rakly3
 #1592

I think i have same issue (0.5.1). 6 x vega56 with timmings run maybe hour and then crach.

Watchdog GPU 0: stuck in enqueue, reporting.
GPU 0: detected DEAD (03:00.0), will execute restart script watchdog.bat
[2019-06-12 11:45:51] GPU 5: detected DEAD (03:00.0), no restart script configured, will continue mining.

Bus 3? What about the others with the same problem?
H110 D3A mobo; bus 3 is the x16 slot on this board. The iGFX is turned on as the primary display adapter.
It's always the same card/bus. I'll try swapping it with another Vega and see what happens.

BTW, where can I find some info about setting up a watchdog script (bat)? Like how to send runtime commands to teamredminer, or how to get the ID of the open/running miner?

I got bored last night and wrote this, in case anyone wants a setup template:

Code:
@echo off
set ALGO=
set POOL=
set PORT=
set WALLET=
set PASSWORD=x
set DEVICES=
set INTENSITY=

:: !! optional: create logfile(s)? set LOG=YES
set LOG=YES

:: !! optional: reorder GPUs according to bus number. set REORDER=YES
:: !! NOTE! Intensity and Devices will correspond to reorder.
set REORDER=

:: !! optional: works only for cryptonote algos and if pool allows! Otherwise leave blank. (doesn't work with Nicehash)
set RIGNAME=
set DIFFICULTY=

:: !! optional: name of mining pool. (for logfile purposes only. Can be left blank.)
set POOLNAME=

:: !! optional: pause for error message? set PAUSE=YES (Prevents the command window from closing if you have a problem launching the miner.)
set PAUSE=



:: --------------------Change below settings at own risk!--------------------------

set GPU_MAX_ALLOC_PERCENT=100
set GPU_SINGLE_ALLOC_PERCENT=100
set GPU_MAX_HEAP_SIZE=100
set GPU_USE_SYNC_OBJECTS=1
:: The substring offsets below assume %date% looks like "Wed 06/12/2019" (US locale).
set CUR_YYYY=%date:~10,4%
set CUR_MM=%date:~4,2%
set CUR_DD=%date:~7,2%
if defined PORT set PORT=:%PORT%
if defined DEVICES set DEVICES=-d %DEVICES%
:: Only pass --cn_config when an intensity was actually set.
if defined INTENSITY set INTENSITY=--cn_config=%INTENSITY%
if defined RIGNAME set RIGNAME=--rig_id %RIGNAME%
if defined DIFFICULTY set DIFFICULTY=.%DIFFICULTY%
if not exist LOGS\%POOLNAME% mkdir LOGS\%POOLNAME%
:: Any value other than YES clears the option so it never ends up on the command line.
if not "%LOG%"=="YES" set LOG=
if not "%REORDER%"=="YES" set REORDER=
if defined LOG set LOG=--log_file=LOGS\%POOLNAME%\LOG_%POOLNAME%_%CUR_YYYY%.%CUR_MM%.%CUR_DD%_%ALGO%.txt
if defined REORDER set REORDER=--bus_reorder


@echo on
teamredminer.exe -a %ALGO% -o %POOL%%PORT%%DIFFICULTY% -u %WALLET% -p %PASSWORD% %REORDER% %DEVICES% %INTENSITY% %LOG% %RIGNAME%

@echo off
:: Keep the window open on request, otherwise close it.
if "%PAUSE%"=="YES" (pause) else (exit)
I actually just wanted some structure in my logfiles for troubleshooting these dead GPU issues but ended up with this :D
Rakly3
June 12, 2019, 05:05:09 PM
Last edit: June 12, 2019, 05:59:54 PM by Rakly3
 #1593

GPU 2: detected DEAD (13:00.0)

A completely different card & bus now :/

I'm still using the stock BIOS btw. Different clock speeds or voltages don't seem to affect it.
The timing level doesn't seem to matter either (driver setting, no mod), but I haven't tested for that specifically yet.

Second run since switching the GPUs, and although it's not the same card or bus as before, it is again the one on bus 13
(bus 3 prior to swapping the two cards).

The card that is about to fail also always seems to spike in hashrate (by about 200-300 h/s) right before crashing.


Before the swap
[2019-06-12 11:43:48] GPU 5 [55C, fan 62%] cnr: 1.986kh/s, avg 1.981kh/s, pool 1.887kh/s a:90 r:3 hw:0
[2019-06-12 11:44:18] GPU 5 [55C, fan 62%] cnr: 1.986kh/s, avg 1.981kh/s, pool 1.884kh/s a:90 r:3 hw:0
[2019-06-12 11:44:48] GPU 5 [55C, fan 62%] cnr: 1.986kh/s, avg 1.981kh/s, pool 1.882kh/s a:90 r:3 hw:0
[2019-06-12 11:45:18] GPU 5 [55C, fan 62%] cnr: 1.986kh/s, avg 1.981kh/s, pool 1.880kh/s a:90 r:3 hw:0
[2019-06-12 11:45:48] GPU 5 [54C, fan 58%] cnr: 2.359kh/s, avg 1.980kh/s, pool 1.877kh/s a:90 r:3 hw:0
[2019-06-12 11:45:51] GPU 5: detected DEAD (03:00.0), no restart script configured, will continue mining.



After the swap:
[2019-06-12 18:58:31] GPU 2 [56C, fan 66%] cnr: 1.987kh/s, avg 1.984kh/s, pool 3.603kh/s a:17 r:0 hw:0
[2019-06-12 18:59:01] GPU 2 [56C, fan 62%] cnr: 1.987kh/s, avg 1.984kh/s, pool 3.555kh/s a:17 r:0 hw:0
[2019-06-12 18:59:31] GPU 2 [57C, fan 66%] cnr: 1.987kh/s, avg 1.984kh/s, pool 3.509kh/s a:17 r:0 hw:0
[2019-06-12 19:00:01] GPU 2 [49C, fan 62%] cnr: 2.125kh/s, avg 1.967kh/s, pool 3.463kh/s a:17 r:0 hw:0
[2019-06-12 19:00:03] GPU 2: detected DEAD (13:00.0), no restart script configured, will continue mining.


[2019-06-12 19:24:23] GPU 2 [58C, fan 66%] cnr: 1.987kh/s, avg 1.980kh/s, pool 1.667kh/s a:5 r:0 hw:0
[2019-06-12 19:24:53] GPU 2 [58C, fan 66%] cnr: 1.987kh/s, avg 1.980kh/s, pool 1.601kh/s a:5 r:0 hw:0
[2019-06-12 19:25:23] GPU 2 [57C, fan 66%] cnr: 1.987kh/s, avg 1.980kh/s, pool 1.539kh/s a:5 r:0 hw:0
[2019-06-12 19:25:53] GPU 2 [56C, fan 66%] cnr: 1.987kh/s, avg 1.979kh/s, pool 1.482kh/s a:5 r:0 hw:0
[2019-06-12 19:26:23] GPU 2 [55C, fan 60%] cnr: 2.293kh/s, avg 1.973kh/s, pool 1.429kh/s a:5 r:0 hw:0
[2019-06-12 19:26:34] GPU 2: detected DEAD (13:00.0), no restart script configured, will continue mining.


Spikes like these first of all make me think the mem clockspeed boosted somehow. Is there a way to track that? (with logging, I'm not gonna watch a graph all day)
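
One way to catch these spikes without watching a graph is to scan the miner's log file for per-GPU lines where the current rate strays from the running average. The sketch below assumes Python 3 and the per-GPU line format shown in the logs above; the 5% threshold and the names `Sample`, `parse_samples` and `find_deviations` are made up for illustration. Note that it only tracks the reported hashrate, not the actual memory clock, so it can show when a spike happened but not why.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Matches per-GPU stats lines such as:
# [2019-06-12 11:45:48] GPU 5 [54C, fan 58%] cnr: 2.359kh/s, avg 1.980kh/s, ...
# "Total" lines and "detected DEAD" lines do not match and are skipped.
LINE_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] GPU (?P<gpu>\d+) \[(?P<temp>\d+)C, fan (?P<fan>\d+)%\] "
    r"\w+: (?P<cur>[\d.]+)kh/s, avg (?P<avg>[\d.]+)kh/s"
)


class Sample(NamedTuple):
    timestamp: str
    gpu: int
    temp_c: int
    current_khs: float
    average_khs: float


def parse_samples(lines: Iterable[str]) -> Iterator[Sample]:
    """Yield one Sample per per-GPU stats line, ignoring everything else."""
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            yield Sample(m["ts"], int(m["gpu"]), int(m["temp"]),
                         float(m["cur"]), float(m["avg"]))


def find_deviations(lines: Iterable[str], threshold: float = 0.05) -> Iterator[Sample]:
    """Yield samples whose current rate differs from the average by more
    than `threshold` (as a fraction of the average), up or down."""
    for s in parse_samples(lines):
        if s.average_khs > 0:
            deviation = abs(s.current_khs - s.average_khs) / s.average_khs
            if deviation > threshold:
                yield s
```

Feeding it an open log file, e.g. `find_deviations(open("trm.log"))` with whatever log path you use, would list the timestamp, GPU index and temperature of every spike or drop, which can then be lined up against the DEAD messages.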
kerney666
June 12, 2019, 05:27:19 PM
 #1594

GPU 2: detected DEAD (13:00.0)

completely different card & bus now :/

I'm still using the stock bios btw. Different clockspeeds or voltages don't seem to impact it.
Timing level don't seem to matter either (driver setting, no mod) But I havn't tested for that specifically yet.

Hi! Even though I’m not replying to every message I’m always reading everything. I’m currently testing a few things and have gone through our full commit history from 0.4.4 and forward.

We have a few small bug fixes on the way out, but it’s impossible to tell if any of those are the underlying issue here. I’ll reach out to a few of you in pm to see if you can run some test builds. Would love to nail this, if possible.
Rakly3
June 12, 2019, 05:35:16 PM
Last edit: June 12, 2019, 06:16:03 PM by Rakly3
 #1595

GPU 2: detected DEAD (13:00.0)

completely different card & bus now :/

I'm still using the stock bios btw. Different clockspeeds or voltages don't seem to impact it.
Timing level don't seem to matter either (driver setting, no mod) But I havn't tested for that specifically yet.

Hi! Even though I’m not replying to every message I’m always reading everything. I’m currently testing a few things and have gone through our full commit history from 0.4.4 and forward.

We have a few small bug fixes on the way out, but it’s impossible to tell if any of those are the underlying issue here. I’ll reach out to a few of you in pm to see if you can run some test builds. Would love to nail this, if possible.
Oh hi!
I often edit my posts with more info, like I just did now. (So as not to spam the thread; I talk a lot.)
I added some hashing speed data from the logs.
I don't have the logs from before; I deleted them while testing my start_template.bat.
But I did see the same thing.
While copying some of the log data I also noticed there seems to be a temperature drop. (Ramp-up below 55C?)
Afterburner is controlling the fans.
But I find three examples a bit few to go on atm. Although in migo's log data I also see 55C or less on his crashing card, but no ramp-up, probably because he is already at 2.2k+?

I'm gonna do another run, and after that swap the cards back to their original order.
I have no idea how long it will take though. It ran for hours overnight just fine.
Rakly3
June 12, 2019, 08:13:34 PM
Last edit: June 12, 2019, 08:29:45 PM by Rakly3
 #1596

So this time something a little different happened, but still conforms to my mem clockspeed theory.

[2019-06-12 21:08:46] GPU 2 [55C, fan 62%] cnr: 1.988kh/s, avg 1.985kh/s, pool 2.857kh/s a:22 r:1 hw:0
[2019-06-12 21:09:16] GPU 2 [57C, fan 60%] cnr: 1.987kh/s, avg 1.986kh/s, pool 2.838kh/s a:22 r:1 hw:0
[2019-06-12 21:09:46] GPU 2 [56C, fan 62%] cnr: 2.150kh/s, avg 1.985kh/s, pool 2.820kh/s a:22 r:1 hw:0
[2019-06-12 21:10:16] GPU 2 [54C, fan 54%] cnr: 1.550kh/s, avg 1.982kh/s, pool 2.802kh/s a:22 r:1 hw:0
[2019-06-12 21:10:46] GPU 2 [54C, fan 60%] cnr: 1.549kh/s, avg 1.979kh/s, pool 2.784kh/s a:22 r:1 hw:0
...
[2019-06-12 21:14:16] GPU 2 [52C, fan 53%] cnr: 1.549kh/s, avg 1.961kh/s, pool 2.666kh/s a:22 r:1 hw:0
[2019-06-12 21:14:46] GPU 2 [47C, fan 43%] cnr: 1.551kh/s, avg 1.950kh/s, pool 2.650kh/s a:22 r:1 hw:0
[2019-06-12 21:14:51] GPU 2: detected DEAD (13:00.0), no restart script configured, will continue mining.


Instead of crashing immediately, it seems the clock speed got reset to the BIOS default first.

Now I'm gonna swap the two cards back to their original places.
arlekin
June 12, 2019, 11:07:20 PM
 #1597

I have the same problem on 4 rigs with 4 different Vega 56 cards (reference with Samsung, and Asus/PowerColor with Hynix, with and without timings). The card runs for an hour or two and then the hashrate drops to zero or to 1400 h/s with a DEAD message. I have observed this on several rigs, but for some reason the problem is always with the first PCIe slot (PCIe 3:0.0). If you do not put a card into the first PCIe slot (the x16 slot), the problem disappears. Now I just don't use the x16 PCIe slot :(
migo77
June 13, 2019, 07:21:54 AM
Last edit: June 13, 2019, 09:15:42 AM by migo77
 #1598

I have the same problem on 4 rigs with 4 various Vega 56 cards (ref with samsun g and asus/powercolor with hynix, with timings and without timings ). The card runs for an hour or two and then the hash rate drops to zero or to 1400 h/s with a DEAD message. I have observed in several rigs, but for some reason problems always with the first PCI slot (PCIe 3:0.0). If you do not stick the card into the first PCI slot (PCI 16x), then the problem disappears. Now I just do not use 16x pci slot   Sad

Hi, interesting... for me it is GPU 4, and it sits in the PCIe x16 slot too.

LnkSta:   Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
lebuawu2
June 13, 2019, 01:41:48 PM
 #1599

Hi Dev,

Why do I often get this message:
Quote
Dev pool connection was closed due to an error.
Mining will proceed at reduced rate while not connected to dev pool.

but after some time it connects to the dev pool again:
Quote
Dev pool connected and ready.

Is there any solution for this message?



Thanks.
kerney666
June 13, 2019, 04:06:15 PM
 #1600

Hi Dev,

Why I often got this message :
Quote
Dev pool connection was closed due to an error.
Mining will proceed at reduced rate while not connected to dev pool.

but after sometime it will connect to dev pool :
Quote
Dev pool connected and ready.

is there any solution for this message?



Thanks.

Well, it should really only happen when we restart our servers, which isn't often at all. So, there's something else cutting the connection for you. In some cases home routers cut connections; in other cases, many countries/jurisdictions like neither SSL traffic nor mining traffic. May I ask where you're located? You can PM me if you want to. Also, I'd love to know what version of TRM you're running. We have noticed a higher degree of short-lived inbound connections lately, but can't explain why.
