Author Topic: KnC Titan Unofficial Support Thread  (Read 1386 times)
r00tdude (OP)
Full Member
***
Offline Offline

Activity: 206
Merit: 100


View Profile
November 21, 2015, 08:54:48 PM
 #1

I found it incredibly unusual that there wasn't an official or unofficial support thread for us Titan owners here, especially given there's no real official method of sending a rig for repair and receiving it back with certainty. My RMA process was fraught with pitfalls - so much so that in the end I decided it best to just do what I could with my misbehaving cubes rather than risk sending them back to KnC for repair and possibly being charged, having them lost, stolen or broken, never getting them back, or getting them back in time for the next millennium... you get the idea. I've spent an inordinate amount of time coming up with a tuning procedure that has worked every time on my rigs, and I've wanted to share it for a long time - and I'd like to know what others have done to overcome bad controllers/cables/PSU issues, etc.

So I'll kick off this thread with my tuning procedure. As anyone with one of these rigs knows, they are as fidgety as a blind man in a shooting gallery. The dies, over time, go dark or just inactive and no amount of rebooting (cold or warm) will bring them back. When this happens...

    Clock Speed Phase

    • Do a cold-boot as you normally would to recover an iffy die.
    • Do a factory-reset on your rig (FWIW, clearing a cube's saved settings and history sometimes seems to get it hashing again).
    • After the reset, put in a pool you know to give you a consistent hashrate, like coinotron, and let the rig settle out.
    • Note the dies that aren't initializing or hashing - leave your voltages alone during this phase. Set every die that isn't initializing or hashing to 150MHz and warm-reboot. Consider dies hashing properly at 300MHz (stock speed) to be locked in.
    • Again, let the rig settle in. Any die that has recovered at 150MHz and is hashing, turn up one setting to 175MHz. The ones that haven't, set them to 100MHz and warm-reboot.
    • Settle. Any dies that were hashing at 150MHz but aren't at 175MHz? Set those back to 150MHz and consider them locked in. Ones that are still hashing at 175MHz, tick them up to 200MHz. Any dies still inactive at 100MHz? Shut those OFF - they aren't really worth working on further. They won't recover without other means.
    • Settle. Any dies hashing at 200MHz? If so, tick those up a slot again. Not hashing? Drop them back to 175MHz and consider them locked in.
    • Settle. Continue this until all dies are locked in and hashing. The clock speed phase is over.
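
The clock speed phase above boils down to a small search per die. Here is a rough Python sketch of it, not anything that runs on the rig: `hashes_at` is a hypothetical stand-in for "set the die to this frequency, warm-reboot, let it settle, and see if it hashes". It also assumes 25MHz steps between settings and treats a die that only hashes at 100MHz as locked in there, which the procedure doesn't spell out.

```python
from typing import Callable

STOCK_MHZ = 300   # stock speed, per the procedure above
STEP_MHZ = 25     # assumption: one "setting" up or down is 25MHz


def lock_in_frequency(hashes_at: Callable[[int], bool]) -> int:
    """Return the frequency (MHz) to lock a die at, or 0 to shut it OFF.

    hashes_at(freq) is a hypothetical check: True if the die hashes
    after being set to freq and allowed to settle.
    """
    # Dies hashing properly at stock speed are locked in right away.
    if hashes_at(STOCK_MHZ):
        return STOCK_MHZ

    # Problem dies start at 150MHz; if that fails, try 100MHz, then give up.
    if not hashes_at(150):
        return 100 if hashes_at(100) else 0

    # Tick up one setting at a time; the first failure means
    # "drop back to the last good setting and lock it in".
    freq = 150
    while freq + STEP_MHZ < STOCK_MHZ:
        if not hashes_at(freq + STEP_MHZ):
            break
        freq += STEP_MHZ
    return freq
```

For example, a die that in reality tops out somewhere around 210MHz would be modelled as `lock_in_frequency(lambda f: f <= 210)` and end up locked at 200MHz. Real dies are not this well behaved, so treat it as a way to read the procedure rather than a tool.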

    Power Tuning Phase - After all dies are locked into their highest possible frequency, now's the time to tune back the core-voltage to minimize power consumption and heat (a bit).

    • The stock V of -0.0366 is fine for OVERCLOCKED dies. That is, those that you push up to 325MHz. Almost all of my 300MHz dies run at -0.0513 with minimal (<1%) HW errors.
    • As a matter of principle, you can undervolt two steps for every 25MHz clock speed drop. Meaning, if a die will hash at 200MHz, a setting of -0.1099 will likely supply enough V to maintain the hashing and keep errors low. This also avoids supplying power the die doesn't need and ticks down the heat it generates. Win/win.
    • How do you know about HW errors? SSH to the rig's IP address, issue a 'screen -r' command at the prompt and look at the latest info next to each cube, 'HW: XX/XX%' - you want the percentage to be below 1%. If it isn't, tick up that cube's dies one step of core-V at a time and restart mining to verify, until the dies are getting enough power.
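
To make the "two steps per 25MHz" rule concrete, here is a small Python sketch of the arithmetic. The step size of about 7.3242 mV is not documented anywhere in this thread; it is an assumption inferred from the three settings quoted above (-0.0366, -0.0513 and -0.1099 come out as 5, 7 and 15 steps). The function names are made up for illustration.

```python
# Assumption: one core-V step is ~7.3242 mV, inferred from the settings
# quoted in the post (-0.0366, -0.0513, -0.1099 = 5, 7 and 15 steps).
VOLT_STEP = 0.0073242
STEPS_AT_300MHZ = 7      # -0.0513, the setting used for 300MHz dies
STEPS_PER_25MHZ = 2      # "undervolt two steps for every 25MHz drop"


def suggested_offset(freq_mhz: int) -> float:
    """Starting-point voltage offset for a die locked in at freq_mhz."""
    if freq_mhz % 25 != 0:
        raise ValueError("frequency must be a multiple of 25MHz")
    drops = (300 - freq_mhz) // 25          # negative when overclocking to 325
    steps = STEPS_AT_300MHZ + STEPS_PER_25MHZ * drops
    return round(-steps * VOLT_STEP, 4)


def needs_more_voltage(hw_error_percent: float) -> bool:
    """True if the HW error rate is not below the 1% target."""
    return hw_error_percent >= 1.0
```

`suggested_offset(200)` gives -0.1099 and `suggested_offset(325)` gives -0.0366, matching the values in the list. Whatever it suggests is only a starting point: check the HW percentage and tick the voltage back up a step at a time if `needs_more_voltage` says so.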

    That's all the tough stuff. If you do this right, you'll be rewarded with a much more efficient rig, both power- and temp-wise, with more working dies. Even a die hashing at 100MHz is an improvement over a non-hashing one at 300MHz. Pay particular attention to temps after you are done. Tune 'hot' cubes and dies down to fit a temp envelope of lower than 105 degrees C to keep things going smoothly. Yes, many say the cooler the better; however, I have cubes and dies that won't hash at all when they are cool (75 degrees C) but hash fine at 100 degrees.

    Always perform a visual inspection of cables, connectors and PSUs at least once or twice a week to check for discoloration or melting cables, and replace them immediately as necessary. As an added measure, note the affected cube's number on the rig itself, and tune that cube down a bit (MHz-wise) so it draws less current over those cables - they'll last longer that way. I know there are many here that run their cubes balls-out at 325 and full V all day and all night long - but unless you are running 12 gauge PSU cables, there will be a comeuppance for you and your rigs (and possibly your house).

    I'm sure many of you have your own procedures, but these have served me well (so far) - and I hope others join this thread and add their own experiences and wisdom. I'm definitely looking for someone to chime in on how to remove the heatsinks and re-do their thermal paste applications on these things. I have some dies myself that run 10-20 degrees hotter than others, even in the same cube, and would love to be walked through removal of the heatsink so as not to damage the cube altogether.

    All we have is each other - let's share experiences and info.

    Donations Welcome: 163fDhK9sNwL7fdjWK6QZ6gNYayYDRWGFn

    Bitrated user: r00tdude.
    GenTarkin
    Legendary
    *
    Offline Offline

    Activity: 2450
    Merit: 1002


    View Profile
    November 21, 2015, 09:04:28 PM
    Last edit: November 21, 2015, 09:22:26 PM by GenTarkin
     #2

    I'm gonna go out on a limb here and guess you haven't seen my unofficial insanely awesome firmware for the Titan =)
    It does everything you mentioned above... for the most part automatically.

    https://github.com/GenTarkin/Titan  - is the informational Github


    Also, I'm working on the next release, which will feature "Energy Saver"... basically it performs per-die voltage tuning.


    My progress and conversations w/ the community largely take place in this thread: https://bitcointalk.org/index.php?topic=170332.41700

    One of the original reasons I started this project is that my Titan had "dropping die" problems too when I purchased it, and it became a PITA. After figuring out that the per-die DCDCs were fully programmable, I came up with a method to power cycle a die's DCDCs to bring it back online, without requiring a power cycle of the entire rig.
    From there, with many supporters, donations & people buying my firmware... the project grew into something way more "intelligent" than I'd ever expected.

    Further notes:
    1. For 325MHz I recommend NEVER running at -0.0366V for an entire cube!! One or two dies is fine if needed... the current draw on the DCDCs is over their rated limit, and the current draw on the 6-pin PCIe connector is already well beyond spec even at lower voltage.
    2. 105C is WAYYYYY too hot for these DCDCs given the current we are pulling from them. 85C is the max I recommend. The reason is that the DCDCs' lifespan is drastically decreased if you ask them to deliver their max current at temps around 85C and higher. IIRC at 85C-ish and 0.7V these DCDCs are derated to something like 35 amps max. There's a graph in the documentation for these DCDCs that shows what I'm talking about.
    3. At 42 amps and higher the DCDCs' efficiency starts to decline RAPIDLY!
    4. 46 amps is their max before they start "sputtering current".
    5. 50 amps is their overcurrent shutdown point, I believe.
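
The thresholds in notes 2-5 can be read as a simple checklist. Below is a Python sketch of it. The numbers are taken straight from the notes above, which are themselves from memory ("IIRC", "I believe"), not from a datasheet, and `dcdc_warnings` is a made-up helper name. How you would actually read a DCDC's current and temperature is outside what this thread describes.

```python
from typing import List

# Thresholds as recalled in the notes above, not datasheet values.
MAX_RECOMMENDED_TEMP_C = 85
DERATED_MAX_AMPS_AT_85C = 35
EFFICIENCY_DROP_AMPS = 42
SPUTTER_AMPS = 46
OVERCURRENT_SHUTDOWN_AMPS = 50


def dcdc_warnings(amps: float, temp_c: float) -> List[str]:
    """Return human-readable warnings for one DCDC reading."""
    warnings = []

    # Current checks: report only the most severe band that applies.
    if amps >= OVERCURRENT_SHUTDOWN_AMPS:
        warnings.append("at or above the overcurrent shutdown point")
    elif amps >= SPUTTER_AMPS:
        warnings.append("at or above the max before current sputters")
    elif amps >= EFFICIENCY_DROP_AMPS:
        warnings.append("efficiency declining rapidly")

    # Temperature checks, including the derating recalled for ~85C.
    if temp_c > MAX_RECOMMENDED_TEMP_C:
        warnings.append("above the recommended 85C max")
    if temp_c >= MAX_RECOMMENDED_TEMP_C and amps > DERATED_MAX_AMPS_AT_85C:
        warnings.append("over the ~35A derated limit for this temperature")

    return warnings
```

A reading of 30 amps at 70C comes back clean, while something like 43 amps at 90C trips the efficiency, temperature and derating warnings together, which is the combination the notes are warning about.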

    GenTarkin's MOD Kncminer Titan custom firmware! v1.0.4! -- !!NO LONGER AVAILABLE!!
    Donations: bitcoin- 1Px71mWNQNKW19xuARqrmnbcem1dXqJ3At || litecoin- LYXrLis3ik6TRn8tdvzAyJ264DRvwYVeEw
    r00tdude (OP)
    November 22, 2015, 09:55:58 PM
     #3

    I own a copy of your firmware and look forward to the self-leveling function - and as it isn't available yet, I think the primer is still relevant, no? Also, you are selling it for $50 on eBay, so having a method of performing a tune-up, gratis, is a good thing.

    What I'd love to see from you if you're willing to contribute back to the community, is a photo walk-through of the heat-sink removal so people can do their own maintenance on the thermal paste on those. I've gone as far as cracking the cube and removing the screws from the heatsink, but haven't had the balls to pull it off, as I didn't want to do any damage if there was a step I was missing. That'd be very helpful for those of us with dies that run much hotter than others, meaning they likely aren't making good contact with the heatsink.

    Also, among others, I'm sure, we've been asking for fixes and enhancements to your firmware since you first released it - posting a URL where we can vote on the enhancements we'd most like to see would be helpful.

    FWIW, I have cubes, as do many others, that will show activity (they sit idle) and they won't hash, regardless of whether your firmware is trying to awaken them or not. Is there another method we can try to revive those? I have a full cube doing exactly this and reviving any of the dies on that cube would be a huge benefit.

    Thanks for your contributions!

    GenTarkin
    November 24, 2015, 02:03:07 AM
     #4

    Quote

    I own a copy of your firmware and look forward to the self-leveling function - and as it isn't available yet, I think the primer is still relevant, no? Also, you are selling it for $50 on eBay, so having a method of performing a tune-up, gratis, is a good thing.

    What I'd love to see from you if you're willing to contribute back to the community, is a photo walk-through of the heat-sink removal so people can do their own maintenance on the thermal paste on those. I've gone as far as cracking the cube and removing the screws from the heatsink, but haven't had the balls to pull it off, as I didn't want to do any damage if there was a step I was missing. That'd be very helpful for those of us with dies that run much hotter than others, meaning they likely aren't making good contact with the heatsink.

    Also, among others, I'm sure, we've been asking for fixes and enhancements to your firmware since you first released it - posting a URL where we can vote on the enhancements we'd most like to see would be helpful.

    FWIW, I have cubes, as do many others, that will show activity (they sit idle) and they won't hash, regardless of whether your firmware is trying to awaken them or not. Is there another method we can try to revive those? I have a full cube doing exactly this and reviving any of the dies on that cube would be a huge benefit.

    Thanks for your contributions!

    I don't actually have a Titan physically, I've been coding everything remotely =)

    I would like to see logs of the cubes that still sit around idle even though the firmware has powered them off and on again.

    r00tdude (OP)
    November 26, 2015, 03:52:08 PM
     #5

    Quote

    I don't actually have a Titan physically, I've been coding everything remotely =)

    I would like to see logs of the cubes that still sit around idle even though the firmware has powered them off and on again.

    The dude can provide - PM me with your email address and I'll send over some logs of a rig with some sleepy dies.

    Also, if you're doing the coding, I've sent over a really useful feature to bitcoinlasvegas for implementation (months ago actually) - regarding die freq and volt profiles (backups saved to a path on the rig) that are called with a cron job to automatically change hash speed based on time - faster in the cool evenings, slower during the day... It'd be very useful to have a feature like that.
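
For anyone wondering what the time-based switching could look like, here is a minimal Python sketch of just the "which profile applies right now" decision. Everything in it is hypothetical: the profile names, the hours, and the idea that a cron job would call something like this and then restore the matching saved freq/volt profile. It is not part of any existing firmware.

```python
from datetime import time

# Hypothetical schedule: run fast from 19:00 until 08:00, slow otherwise.
EVENING_START = time(19, 0)
MORNING_END = time(8, 0)


def profile_for(now: time) -> str:
    """Pick the 'fast' profile in the cool hours and 'slow' during the day."""
    # The cool window wraps around midnight, so it is an OR of two ranges.
    is_cool = now >= EVENING_START or now < MORNING_END
    return "fast" if is_cool else "slow"
```

With this schedule `profile_for(time(22, 0))` picks "fast" and `profile_for(time(13, 0))` picks "slow". Keeping the decision in one small function means the cron entry only has to fire at the two switch-over times.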

    coinut
    Full Member
    ***
    Offline Offline

    Activity: 253
    Merit: 100


    View Profile
    December 02, 2015, 06:41:54 AM
     #6

    hey there got a question.

    I have a couple of Titans that I'm trying to get running stable, which is proving to be a hard task - I'm still at it after many nights of trying new settings.
    I have one stable so far; the other is giving me trouble with random dies dropping out.

    I noticed that the stable Titan's controller has 3 green LEDs on when booted and the other has only 2 green LEDs lit; the middle one of the three is off.
    I was wondering if anyone knows what the 3 green LEDs mean, and maybe even what it means when the middle one is off?

    cheers