STR359 (OP)
Newbie
Offline
Activity: 7
Merit: 0
|
|
June 30, 2018, 05:33:05 PM Last edit: July 10, 2018, 05:56:46 PM by STR359 |
|
Hello everyone! I need your advice on a problem with one of my rigs. The rig has 4x ASUS RX580 8GB cards: three OC versions and one STRIX (with 3 fans). It ran for 6 months without any problems at all, but about a month ago one of the cards (an OC one) started dropping hash rate randomly. It mines at 30.1 Mh/s, suddenly drops to ~28 Mh/s, and a second later is OK again. A minute later it drops to ~26 Mh/s and the next second is back to 30.1 as usual, and so on until I restart the rig. At first it happened rarely, but now it is very frequent. After a restart the rig mines without any problem for several hours and then starts doing it again.
The first thing that came to mind was to reimage Windows. I made an image of the C: partition 5 months ago, when I tested the rig and it was stable, so the image is well tested. Nothing changed in the card's behavior. Did a clean install of drivers: same thing. Latest versions of drivers and Claymore: same thing. Moved the card to a different riser and different power rails (for the riser and the 8-pin): the same card behaves the same no matter the riser. So I have ruled out risers and cables as the issue, as well as the driver and Claymore versions and the Windows installation. Almost forgot: I also tried stock clock settings, same thing.
So I am asking for advice: is there something else I can try, or is the card on its way to refurbishment?
Mining Ether on the ethermine pool. Windows 10 (1709), 4 GB RAM, 30 GB virtual memory, 850W Corsair Gold power supply. Problematic card settings: 1150/2120@0.85V, fans ~50%, temps 64/73 for GPU/VRMs. I use OverdriveNTool for tuning, and the memory is tweaked with Polaris editor using the proposed settings for Hynix memory. As I said earlier, until last month this card had been running fine with the same settings since January. No settings were changed.
Sorry for the long post; all advice is much appreciated!
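For anyone who wants to count how often such drops happen rather than watch the miner window, here is a minimal sketch. It assumes the hash rate has already been collected into a plain list of Mh/s readings (how you get them out of the miner is not covered here). The function name `find_dips`, the 0.5 Mh/s tolerance, and the sample values are illustrative assumptions shaped like the description above, not measurements.

```python
from typing import List, Tuple


def find_dips(samples: List[float], baseline: float,
              tolerance: float = 0.5) -> List[Tuple[int, float]]:
    """Return (index, value) for every reading more than
    `tolerance` Mh/s below the expected `baseline`."""
    return [(i, s) for i, s in enumerate(samples) if baseline - s > tolerance]


# Illustrative readings only: 30.1 Mh/s normally, brief drops to ~28 and ~26.
samples = [30.1, 30.1, 28.0, 30.1, 30.1, 26.0, 30.1]
dips = find_dips(samples, baseline=30.1)

for index, value in dips:
    print(f"sample {index}: {value} Mh/s")
print(f"{len(dips)} dips out of {len(samples)} samples")
```

Counting dips per hour this way makes it easier to tell whether a settings change really reduced the problem or the drops just became rarer.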
|
|
|
|
tg88
Legendary
Online
Activity: 2492
Merit: 1491
|
|
July 01, 2018, 12:30:12 AM |
|
Try to reduce the clocks and see if you get a more stable hashrate.
|
|
|
|
xxcsu
|
|
July 01, 2018, 12:40:56 AM |
|
What were the GPU/memory temperatures and fan speed when your card was stable? What are they now that the hashrate has become inconsistent? If the ambient temperature changed and/or your card is running much hotter, that can be one reason! Try lowering the GPU/memory frequencies, and also try playing with the memory/GPU voltage. Or place an extra cooling fan behind your rig and push some colder air to your GPUs.
|
|
|
|
STR359 (OP)
Newbie
Offline
Activity: 7
Merit: 0
|
|
July 01, 2018, 06:48:18 AM |
|
Hi, and thanks for your feedback.
Temperatures are almost the same: around 64-65 for the GPU and 73-74 for the VRMs. Fan speeds are around 45% when it is cold in the room and were around 60-63% when it was much hotter, about 2 months ago. Now the weather here is mild, both outside and inside. When the inconsistent hash rate happens there is no change in any parameter, whether set manually or by the card itself: clocks, voltages, etc. I monitor them with HWiNFO and GPU-Z. Fan speed is adjusted automatically by the card, of course, but within the range above. I have not seen fan speed above 66% even on a very hot day.
I have now given the cards 2 steps higher voltage, up to 0.9V, but the behavior is the same. And as I stated above, I tried stock GPU/memory clocks and voltages, and the hash rate still dropped occasionally by 2, 3 or 5 Mh/s, in that case not from 30.1 but from 28.1.
I will try 1200/2050 to test your suggestions. The strange thing is that after I restart the rig the hashrate is consistent for several hours regardless of clock/voltage, so I will need a few days to confirm the behavior with the new settings.
Many thanks once again!
|
|
|
|
Geraldo
Sr. Member
Offline
Activity: 588
Merit: 272
⭐⭐⭐⭐⭐
|
|
July 01, 2018, 07:06:06 AM |
|
Have you tried unplugging one card (one that is working correctly)? Then you will have three cards in your system (2 cards that work fine plus the troubled one). If that doesn't fix your issue, you may need to re-mod that card (the troubled one).
|
|
|
|
STR359 (OP)
Newbie
Offline
Activity: 7
Merit: 0
|
|
July 03, 2018, 07:45:56 PM |
|
Hi. A little update. I changed the clocks to 1200/2050@0.912V and so far it has run 2 days with no fluctuations. That is a bit odd to me, as I tried stock settings 1360/2000 before and the problem was still there, but with these settings it seems OK. I am wondering whether it could be something with the drivers: maybe this particular card is somehow mapped to certain driver components in the OS, and no matter where I move it, the same drivers get assigned to this physical card. I know this is a wild guess, just wondering. In that case a clean reinstall would do the job, but as I mentioned, I have an image of the C: drive and revert to it when something goes wrong, in order to rule out the OS as the problem. With this particular OS image the rig worked for almost 6 months without any issues, and no updates have been installed since then.
I have not tried removing one card yet, but that will be the next step. Will update again in a few days. @Geraldo: please clarify what you mean by "re-mod the card"? Flashing the BIOS again with "Turbo" memory timings, or just tweaking the voltages and MHz with different values?
Cheers!
|
|
|
|
markiz73
|
|
July 03, 2018, 08:20:21 PM |
|
I think the whole problem is the voltage on the GPU core. 0.85 V is very low, and not all video cards can work stably at this voltage. So when the card is short of power, the hashrate drops. The optimal voltage for the GPU core is in the range of 0.9-1 volts.
|
|
|
|
STR359 (OP)
Newbie
Offline
Activity: 7
Merit: 0
|
|
July 03, 2018, 09:07:15 PM |
|
Hi markiz73. How would you explain that this card worked absolutely fine for about 5 months and only started acting up now? That puzzles me. If the problem were voltage or clocks, it would have shown up at the very beginning, when I initially tested all the cards and installed them in the rig. I am trying to work out whether there is something that could wear out. If it were a fan I would expect it, but for a well-managed and properly cooled card I cannot explain it. For now I am trying 1175/2050@0.912V to reduce heat a bit, and crossing my fingers that it keeps working fine for the next few days. If it does, I will try increasing the memory clock. Still looking for opinions though! Any brainstorming will help me and hopefully someone else in the future.
Just a side note: I am very thankful to all of you and the entire community. Cheers!
|
|
|
|
Geraldo
Sr. Member
Offline
Activity: 588
Merit: 272
⭐⭐⭐⭐⭐
|
|
July 04, 2018, 08:04:06 AM |
|
"@Geraldo: clarify please what you mean by "re-mod the card"?" = To flash the BIOS again.
Yeah, that is what I meant.
|
|
|
|
abhiseshakana
Legendary
Offline
Activity: 2408
Merit: 2281
From Zero to 2 times Self-Made Legendary
|
|
July 04, 2018, 12:21:58 PM Last edit: July 09, 2018, 07:22:06 AM by abhiseshakana |
|
Did you check for memory errors using HWiNFO? If memory errors are reported, your GPU is probably overclocked too much.
Reset the problem card to default clock settings, then start mining and see the result. But if you don't get any memory errors, the GPU has no problem running at those clock rates.
One more thing: make sure your GPU Workload is still set to Compute mode, and try disabling ULPS using Trixx.
|
|
|
|
markiz73
|
|
July 04, 2018, 07:32:26 PM |
|
Hi, I'm an engineer, and for many years I've heard one phrase: "This device worked properly for a month, six months, a year." But when a device stops working, you need to look for the cause of the failure, not at how it worked yesterday. If a video card's hashrate varies over time, I first look at its power. Giving advice over the Internet is generally a thankless business, though. The problem could be in the contacts of the power circuit. There are 6 power lines on the video card: 5 go to the GPU and 1 to the memory. Perhaps the problem is in one of these parts of the card, and now it does not get enough power at 0.85 volts. But all of this can only be established by diagnosis.
|
|
|
|
swogerino
Legendary
Offline
Activity: 3332
Merit: 1248
|
|
July 04, 2018, 07:48:18 PM |
|
If you have a spare computer or a spare motherboard, which every enthusiast miner should have, you can do an elementary test. Remove the card and test it in a PCIe x16 slot, the larger slot on the motherboard where a card is usually installed for gaming.
If it shows the same fluctuating hashrate there, the card itself needs to be diagnosed. As already suggested, reflash it with another BIOS; Anorach tech is a website full of these, where you can download them and find the one that works best for you. If none works, I would say the card is faulty.
If the card gives a stable 30 Mh/s in the other motherboard for days, something has changed in your Windows configuration, especially if you have left updates on.
|
|
|
|
STR359 (OP)
Newbie
Offline
Activity: 7
Merit: 0
|
|
July 08, 2018, 06:54:13 AM |
|
Hi guys! When I set the card to 1175/2050 it was fluctuating again, but with 1130/2050@0.9V it has been OK for 2 days now. However, 2 cards are now running much higher fan speeds than the other two. I have set the target temperature to 65 degrees. Previously the fans held this value at 44-46% at a room temperature similar to the current one, but now they are at around 60%, just for these 2 cards. They were cleaned regularly and are spaced far apart. So I am pretty sure it is time to re-flash the cards; that is the only obvious explanation I have for such odd behavior. The worrying thing is that the fan issue affects 2 cards, one of which is the card with the fluctuations.
I was trying to avoid a Windows reinstallation, but now I will have to do it together with the card reflash. I think the odd issues will continue if I don't, so rather than fill this thread with more of them, I will do a clean install of everything and a BIOS reflash, then share the results.
To answer @abhiseshakana: no memory errors to speak of. A few show up here and there, but spread over months. To sum up, over around 30 days I have a total of 4-5 errors across all 4 cards, so I think this is considered normal.
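Since several clock/voltage combinations have been tried by now, a small record of them can help keep track. The sketch below only restates the settings and outcomes reported in this thread so far. The `Trial` class and `format_trial` helper are hypothetical names, and "stable" means only that no drops were seen during the short observation window mentioned in the posts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    core_mhz: int
    mem_mhz: int
    volts: Optional[float]  # None where the post does not state a voltage
    stable: bool            # True = no drops seen in the observed window
    note: str


# Settings and outcomes as reported in the posts above.
trials: List[Trial] = [
    Trial(1150, 2120, 0.85, False, "original settings, drops began after months"),
    Trial(1360, 2000, None, False, "stock clocks, drops still seen"),
    Trial(1200, 2050, 0.912, True, "no fluctuations for 2 days"),
    Trial(1175, 2050, 0.912, False, "fluctuating again"),
    Trial(1130, 2050, 0.9, True, "OK for 2 days so far"),
]


def format_trial(t: Trial) -> str:
    """Render one trial as a fixed-width text row."""
    volts = f"{t.volts:.3f}V" if t.volts is not None else "stock V"
    status = "stable" if t.stable else "drops"
    return f"{t.core_mhz}/{t.mem_mhz}@{volts:<8} {status:<7} {t.note}"


for t in trials:
    print(format_trial(t))

stable_count = sum(t.stable for t in trials)
print(f"{stable_count} of {len(trials)} settings ran without drops")
```

Laid out like this, the pattern is not monotonic in core clock (1200 looked fine, 1175 did not), which suggests the 2-day observation windows may simply be too short to call a setting stable.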
|
|
|
|
STR359 (OP)
Newbie
Offline
Activity: 7
Merit: 0
|
|
July 09, 2018, 07:30:26 PM |
|
Last update, I think! Reinstalled Windows 10 version 1803 with AMD drivers 17.1.14. The problematic card was reflashed first with the original BIOS, then again with the modded one. Same behavior. I tried removing one of the other, working cards and the issue is the same: it appears much more rarely, but it still appears. I saw in GPU-Z that during the low-hashrate moments the VDDC power draw drops as well. I am not sure why, but it is related to the low hashrate. So I have pretty much run out of ideas, and everything points to a card problem. But until it fails completely I cannot make a warranty claim. Many thanks to all of you, and if someone thinks of something else, even something unorthodox, just let me know and I will test it.
Cheers!
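To put a number on the observation that VDDC power draw falls together with the hash rate, one option is a correlation coefficient over paired readings. This is a minimal sketch, assuming you have two equally long lists sampled at the same moments (for example, copied by hand from a sensor log). The `pearson` helper is written out here rather than taken from any tool, and the watt values are invented placeholders, not measurements from this card.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Placeholder readings taken at the same moments (assumed, not measured).
hashrate_mhs = [30.1, 30.1, 28.0, 30.1, 26.0, 30.1]
vddc_power_w = [80.0, 80.0, 74.0, 80.0, 68.0, 80.0]

r = pearson(hashrate_mhs, vddc_power_w)
print(f"correlation between hash rate and VDDC power: {r:.3f}")
```

A value close to 1 would only confirm that the two drop together. It does not say whether the power drop causes the lower hash rate or the other way around.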
|
|
|
|
Lunga Chung
Member
Offline
Activity: 277
Merit: 23
|
|
July 09, 2018, 09:07:20 PM |
|
Replacing the riser with a new-gen 009 worked for me. On 12 rigs, some cards drop some hash now and then, but it comes back in a few seconds; I don't know why you are bothered by this at all.
|
|
|
|
STR359 (OP)
Newbie
Offline
Activity: 7
Merit: 0
|
|
July 10, 2018, 03:55:20 PM Last edit: July 10, 2018, 05:57:02 PM by STR359 |
|
Hi. Nice one, thanks! The v009 has some kind of voltage regulator. I did not think of this possibility at all. I will purchase one and try it for sure. I bought v006 risers because they are simple, hoping there would be fewer things to go wrong under this load and these temperatures. Thanks again.
|
|
|
|
|