Bitcoin Forum
Author Topic: VEGA 64 - Cannot Enable HBCC on 2/6 cards in rig after reboot.  (Read 5689 times)
miner1337 (OP) | Newbie | Activity: 36 | Merit: 0
September 23, 2017, 08:12:22 PM  #1

I'm really stuck trying to diagnose this problem.

The problem is straightforward to describe. I have 6x VEGA 64s. I go into Wattman to enable/adjust HBCC for each card. When I click "Apply", the screen flickers once and the setting takes. However, for 2 of the 6 cards it just crashes Wattman, and when I reopen Wattman the setting remains unchanged for that card.

At one point all six cards were working fine. It was not until I rebooted that I got into this state, and now I cannot get out of it.

It gets a bit weirder. I've tried the following on both the latest blockchain drivers and the latest non-blockchain drivers:
- Safe Mode
- DDU, remove all (also tried the same steps with the AMD Clean Utility)
- Boot with 1 GPU
- Install drivers
- Shut down, connect another GPU, boot, wait for the drivers to see the new card
- Repeat the last step until all 6 cards are working

Again, 2 cards are unable to adjust HBCC. I get no error message and no details as to why; the setting just refuses to change for those two cards.

I am running:
- Windows 10
- 32 GB RAM
- 64 GB virtual memory (required for the xmr-stak-amd setup to reach 1900 H/s+).


Questions:
- Is there an error log AMD writes that I can look at? (Windows records no event log entries when the HBCC adjustment crashes Wattman.)
- Could this be a BIOS issue?
- What, if anything, can I do to continue diagnosing this?

Any other info I can provide to help others help me?

TL;DR: Help me please! My rig is short 1200 H/s because two cards won't enable HBCC.
miner1337 (OP) | Newbie | Activity: 36 | Merit: 0
September 24, 2017, 07:52:04 PM  #2

More info.

The two problematic cards are plugged into PCI-E 3.0 x1 slots (#2 and #3 on my mobo, counting from the top).

So I swapped a working HBCC card with one of the ones that cannot enable HBCC. It seems whichever two cards are plugged into these PCI-E slots are the problematic ones: the swapped card now cannot enable/adjust HBCC either.

I verified this a few more times. Whatever two cards I have plugged into the #2 and #3 PCI-E slots cannot enable HBCC.

This problem also goes away entirely if I just run 4 GPUs; then those slots are fully functional.

rednoW | Legendary | Activity: 1510 | Merit: 1003
September 24, 2017, 08:14:33 PM  #3

Quote from: miner1337 on September 24, 2017, 07:52:04 PM
> It appears whatever two cards I have plugged into the #2 and #3 PCI-E slots cannot enable HBCC. This problem also goes away entirely if I just run 4 GPUs.



It could be that HBCC is only supported on the CPU's own PCI-E lanes (Ryzen, modern Intel) and has problems when used over chipset PCI-E lanes... just a guess... I've read something like that in AMD documentation...
miner1337 (OP) | Newbie | Activity: 36 | Merit: 0
September 24, 2017, 08:26:54 PM  #4

Hmmm. I suppose this is important info, given your response.

Mobo: MSI Z270-A Pro ATX LGA1151
CPU: Intel Celeron G3930

Perhaps the CPU is the problem?
rednoW | Legendary | Activity: 1510 | Merit: 1003
September 24, 2017, 08:51:27 PM  #5

Quote from: miner1337 on September 24, 2017, 08:26:54 PM
> Mobo: MSI Z270-A Pro ATX LGA1151
> CPU: Intel Celeron G3930
> Perhaps the CPU is the problem?

I don't think your CPU is the problem. The problem could be how the motherboard BIOS or Windows decided to connect your 5th and 6th GPUs to the system: directly to the CPU's PCI-E lanes, or through the southbridge switch.

I don't know if this is our case, but you can read up here: https://github.com/RadeonOpenCompute/ROCm
Search for the "Supported CPUs" chapter.
miner1337 (OP) | Newbie | Activity: 36 | Merit: 0
September 24, 2017, 10:22:43 PM  #6

Reading up on the ROCm-supported CPUs, I noticed it claims to require Gen3 PCI-E.

My BIOS was set to Gen 2. I changed it to Gen 3. That did not help.

Both of these cards did work at one point, and that one point was the very first boot with them plugged in. So it is very likely a Windows issue. I should try a complete clean install.

Is there any way to check whether a PCI-E slot connects directly to the CPU's PCI-E lanes or through the southbridge switch? I examined Windows System Information and could not tell from there.
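On Linux this is straightforward to see: `lspci -t` prints the PCIe topology tree, and a GPU either hangs off a CPU root port or off a chipset (PCH) root port. Below is a minimal illustrative sketch of telling the two apart, assuming an Intel-style layout where the x16 PEG slot sits behind root port 00:01.0 and the 00:1c.x root ports belong to the chipset; the sample tree is invented, not taken from this rig:

```python
import re

# Sketch (assumption: a Linux box where `lspci -t` prints the PCIe topology
# tree; the sample tree below is invented for illustration).  On most Intel
# boards the x16 PEG slot hangs off root port 00:01.0 (CPU lanes), while the
# 00:1c.x root ports live in the chipset (PCH) and reach the CPU over DMI.

CPU_ROOT_PORTS = {"01.0"}  # board-specific: PEG port(s) wired to CPU lanes

def classify(root_port: str) -> str:
    """Classify a device by the root-port function it sits behind."""
    return "cpu" if root_port in CPU_ROOT_PORTS else "chipset"

def parse_lspci_tree(tree: str) -> dict:
    """Map each downstream bus (e.g. '01') to its root-port function.

    Matches `lspci -t` branches of the form '+-01.0-[01]----00.0'.
    """
    return {m.group(2): m.group(1)
            for m in re.finditer(r"\+-(\d\w\.\d)-\[(\w+)(?:-\w+)?\]", tree)}

SAMPLE = """\
-[0000:00]-+-00.0
           +-01.0-[01]----00.0
           +-1c.0-[02]----00.0
           +-1c.4-[03]----00.0
"""

for bus, port in sorted(parse_lspci_tree(SAMPLE).items()):
    print(f"bus {bus}: behind {port} -> {classify(port)} lanes")
```

On Windows, a rough equivalent is Device Manager's "View → Devices by connection", which shows which root port each GPU sits behind, though it does not label the ports as CPU or chipset.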
dj--alex | Member | Activity: 81 | Merit: 10
September 25, 2017, 10:11:49 AM  #7

Have you tried Linux?
On Linux I have auto-overclock and auto-fan working on NVIDIA.
I've heard the AMDGPU-PRO driver works well on Vega now.
miner1337 (OP) | Newbie | Activity: 36 | Merit: 0
September 26, 2017, 03:07:59 AM  #8

More info.

I followed the power table mods described here: https://bitcointalk.org/index.php?topic=2002025.msg22158580#msg22158580

They worked on 4 of the 6 cards. The same two cards, when I hit "Reset" after a reboot, retain the default stock settings. All the other cards are functioning nicely with those power table mods.

So the same two cards refuse to read their registry settings from the same location as all the others. Definitely something up with Windows, I think. I might try Linux.
rafinirt | Newbie | Activity: 4 | Merit: 0
September 26, 2017, 05:18:53 PM  #9

Hey there,

I just registered here because I have the exact same problem with 6x VEGA 56... 4 running with HBCC and 2 not. AMD Settings crashes every time after activating HBCC on either of those cards.

Let me know if you figure something out :)

I would love to use Linux, because I hate that GUI stuff on Windows, but as far as I know there is no HBCC support on Linux yet...
fanatic26 | Hero Member | Activity: 756 | Merit: 560
September 26, 2017, 05:27:40 PM  #10

What effect does HBCC have on mining? From everything I have read it is a technology with no real use. In lab tests they had to absolutely hammer the card with 8K-resolution benchmarks to even see HBCC in use. In real-world gaming and workstation applications it is 100% useless.

rafinirt | Newbie | Activity: 4 | Merit: 0
September 26, 2017, 05:39:05 PM  #11

The effect on xmr-stak-amd is 1300 H/s vs 1900 H/s, so yes, a big deal... :(
miner1337 (OP) | Newbie | Activity: 36 | Merit: 0
September 26, 2017, 08:32:02 PM  #12

Quote from: rafinirt on September 26, 2017, 05:39:05 PM
> The effect on xmr-stak-amd is 1300 H/s vs 1900 H/s, big deal... :(

Yup, exactly that. I am losing 1200 H/s because 2 cards won't enable HBCC. Not the end of the world, but I hate having problems I cannot solve.

My next move is likely a clean install of Windows.

I tried OLDComer's power table mod and it took perfectly on 4 of the 6 cards, while the same 2 cards that cannot enable HBCC simply ignore it. The registry section for those 2 cards has the soft power table entry, and it is correct. Yet those 2 cards just reset to AMD stock settings when I hit "Reset" in Wattman, while the other 4 jump to OLDComer's settings.

Clearly this is a clue, or related to the same HBCC problem (or an annoying coincidence).

So I am thinking this problem is solvable. I suspect Windows has done something wrong with the card.
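The registry check described above can be automated. A hypothetical sketch, assuming the layout the power-table-mod guides describe (numbered adapter subkeys 0000, 0001, ... under the display class key HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e968-e325-11ce-bfc1-08002be10318}, with the mod stored as a binary value named PP_PhmSoftPowerPlayTable; names taken from those guides, not verified here). The registry is mocked as a plain dict so the logic stays self-contained:

```python
# Sketch for automating the check the OP did by hand.  Assumption (from the
# linked power-table-mod thread, not verified here): each GPU gets a numbered
# subkey under the display class key, and the mod is stored there as a binary
# value named PP_PhmSoftPowerPlayTable.  A real script would read these with
# the winreg module; here the registry is mocked as a dict.

PPT_VALUE = "PP_PhmSoftPowerPlayTable"

def adapters_missing_ppt(adapters: dict) -> list:
    """Return the adapter subkeys that lack the soft power table value."""
    return sorted(key for key, values in adapters.items()
                  if PPT_VALUE not in values)

# Simulated registry contents for a 6-GPU rig (illustrative only):
rig = {
    "0000": {"DriverDesc", PPT_VALUE},
    "0001": {"DriverDesc", PPT_VALUE},
    "0002": {"DriverDesc"},          # no table -> card falls back to stock
    "0003": {"DriverDesc"},
    "0004": {"DriverDesc", PPT_VALUE},
    "0005": {"DriverDesc", PPT_VALUE},
}
print(adapters_missing_ppt(rig))
```

In the OP's case the value is reportedly present and correct for all six adapters, so a check like this coming back empty would point at the driver ignoring the table, rather than at a missing registry entry.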
rafinirt | Newbie | Activity: 4 | Merit: 0
September 26, 2017, 09:33:32 PM  #13

Quote from: miner1337 on September 26, 2017, 08:32:02 PM
> My next move is likely a clean install of Windows.

That's why I hate Windows... this randomness sucks :-[
Please let me know if it fixes the problem... :)

If not, I guess we have to wait for HBCC support in the Linux drivers...
miner1337 (OP) | Newbie | Activity: 36 | Merit: 0
September 27, 2017, 06:32:32 PM  #14

More info.

I got a dummy HDMI plug. So far my VEGA 64s have not needed one; they always run fine, monitor or not.

While xmr-stak-amd was mining, I plugged the dummy plug into one of the two problematic cards. Everything froze, the screen went black, flickered a bit, then came back. xmr-stak-amd looked stalled but eventually came back to life.

I checked the hash rates: the card I plugged the dummy plug into was hashing at 1900 H/s, but only for ~10 seconds!! Then it dropped back down.

So I figured, let's reboot with the HDMI dummy plug connected from the start. If I do this, I cannot enable HBCC on any card, as the machine always crashes.

So the best I can do now is get everything set up for 1900 H/s (clock speeds, HBCC toggled), then once xmr-stak is running, shove a dummy plug into a card hashing low and get ~10 seconds of 1900 H/s before it drops back to 1300 H/s.
rafinirt | Newbie | Activity: 4 | Merit: 0
September 27, 2017, 08:33:23 PM  #15

Ok, weird... Thanks for the info. I'll try tomorrow with a second monitor; let's see what happens...
cryptoinvestor_x | Newbie | Activity: 71 | Merit: 0
September 28, 2017, 01:01:08 AM  #16

Exact same issue: when a card is plugged into a PCI-E x1 slot, any alteration to HBCC has no effect, even though it shows as enabled in AMD Control Center. The hash rate is 1300 H/s instead of the 1900 H/s it should reach.

I have also run into a bug that is preventing me from installing 6 Vegas on an MSI Z97 Gaming 5 board.

I substituted an R9 290 for the 6th Vega, and the BSODs on driver installs for the 6th card went away.

4x Vega rigs seem to be trouble-free, though.
miner1337 (OP) | Newbie | Activity: 36 | Merit: 0
September 29, 2017, 04:43:21 PM  #17

Does anyone know of a way to compare the PCI-E lane setup across GPUs, or at least how Windows sees it?

According to GPU-Z they are all PCI-E 3.0 x16. I need more detail than that.
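One detail worth checking: tools often report the link the slot is *capable* of rather than the link that was actually negotiated, so a card on an x1 riser can still show as "PCIE 3.0 16x". On Linux, `lspci -vv` prints both, as LnkCap (maximum) and LnkSta (current). A small illustrative parser; the sample output below is invented:

```python
import re

# Sketch: compare a GPU's maximum link (LnkCap) with what it actually
# negotiated (LnkSta).  The sample text mimics `lspci -vv` output on Linux;
# the speeds and widths are invented for illustration.

LNK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed (\S+?GT/s).*?Width x(\d+)")

def link_info(lspci_vv: str) -> dict:
    """Return {'LnkCap': (speed, width), 'LnkSta': (speed, width)}."""
    return {m.group(1): (m.group(2), int(m.group(3)))
            for m in LNK_RE.finditer(lspci_vv)}

SAMPLE = """\
        LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1
        LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+
"""

# Here the card is capable of x16 @ 8 GT/s but negotiated x1 @ 2.5 GT/s,
# exactly what you would expect on a riser.
print(link_info(SAMPLE))
```

A large gap between LnkCap and LnkSta on the two problem cards would confirm they are running over the narrow chipset-fed x1 slots rather than the CPU-fed x16 slot.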
jointheredditarmy | Newbie | Activity: 7 | Merit: 0
October 02, 2017, 08:45:49 AM  #18

Has anyone managed to figure this out yet? I'm thinking about adding 2 more Vegas to my rig of 4 (and filling out the 2 empty slots in this server case).

I'm guessing from this post that I absolutely should not do that?
miner1337 (OP) | Newbie | Activity: 36 | Merit: 0
October 02, 2017, 07:48:49 PM  #19

I have been unable to locate a solution.

More info, though. If I do a completely clean install with 1 GPU connected, then boot with all 6 and disable CrossFire, I can enable HBCC on 5 of the 6 cards before rebooting. Reboot once and it is 4 of 6 cards forever. Given that we have to reboot to install the soft power tables, I don't even consider this half a solution.

I am sure the problem is addressable by AMD.

For now I mine Monero with four cards and mine something else with the other two.
l1xx | Member | Activity: 115 | Merit: 10
October 02, 2017, 09:27:53 PM  #20

How do you make them mine Monero with 4 and something else with the other two?

A friend with 5 cards found a solution that is working (at least for him). I am going to try it on the next restart: after the reboot, change the virtual RAM size. In his case he changed it from 90 to 80, but I guess any change would do the trick. Then move the sliders and confirm the HBCC size.

For me it worked with 5 cards once, when I upped the virtual RAM from 64 to 80. I got that info from him and have not tried it again yet, as I am very busy today, but I will on the next restart.

In any case, to make 6 work it seems we need 96 or more of virtual RAM, and I simply don't have that much space on the SSD. Someone with a bigger SSD should try it.
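The numbers in this thread suggest the pagefile has to back each card's HBCC segment, roughly 16 GB per Vega; that figure is inferred from the sizes quoted above, not from AMD documentation. A back-of-envelope helper under that assumption:

```python
# Back-of-envelope pagefile sizing for HBCC rigs.  Assumption (inferred from
# the sizes quoted in this thread, not from AMD docs): each Vega's HBCC
# segment needs roughly 16 GB of pagefile behind it.

def pagefile_gb(num_gpus: int, per_gpu_gb: int = 16, headroom_gb: int = 0) -> int:
    """Rough minimum virtual memory, in GB, for an HBCC mining rig."""
    return num_gpus * per_gpu_gb + headroom_gb

for n in (4, 5, 6):
    print(n, "GPUs ->", pagefile_gb(n), "GB pagefile")
```

By this rough math the figures in the thread line up: the OP's 64 GB pagefile covers 4 cards at full speed, 80 GB covered the friend's 5, and 6 cards would need 96 GB or more.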