Bitcoin Forum
Warning: One or more bitcointalk.org users have reported that they strongly believe that the creator of this topic is a scammer. (Login to see the detailed trust ratings.) While the bitcointalk.org administration does not verify such claims, you should proceed with extreme caution.
Topic: Antminer S3 batch 6 overclocking
canford (Member)
December 22, 2014, 01:01:22 AM  #101

Quote:
> 4. If you've recently started the rig, could you post the cgminer startup string (usually logged in the system log)?
From the process tab:
cgminer --bitmain-options 115200:32:8:14:275:0a82 -o stratum+tcp://us1.ghash.io:3333 -O USER.WORKER:Any -o stratum+tcp://stratum.mining.eligius.st:3334 -O ADDR_WORKER:Any -o stratum+tcp://mint.bitminter.com:3333 -O USER_WORKER:PASS --bitmain-nobeeper --api-listen --api-network --bitmain-checkn2diff --bitmain-hwerror --version-file /usr/bin/compile_time --lowmem
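
For anyone unfamiliar with the colon-separated string passed to --bitmain-options above, here is a minimal Python sketch that splits it into fields. The field names (baud, miner_count, asic_count, timeout, frequency_mhz, reg_data) are assumptions based on how this option is commonly described for Bitmain's cgminer fork; the thread itself only shows that 275 matches the frequency being tested. Check the driver source for your firmware before relying on them.

```python
from typing import NamedTuple


class BitmainOptions(NamedTuple):
    # All field names below are assumed, not confirmed by this thread.
    baud: int
    miner_count: int
    asic_count: int
    timeout: int
    frequency_mhz: float
    reg_data: str  # hex string, kept as text


def parse_bitmain_options(value: str) -> BitmainOptions:
    """Split a six-field --bitmain-options string into named fields."""
    parts = value.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 colon-separated fields, got {len(parts)}")
    baud, miners, asics, timeout, freq, reg = parts
    return BitmainOptions(
        int(baud), int(miners), int(asics), int(timeout), float(freq), reg.lower()
    )


opts = parse_bitmain_options("115200:32:8:14:275:0a82")
print(opts)
```

The frequency is parsed as a float because fractional settings such as 262.5 come up elsewhere in this thread.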


Quote:
> I remember reading somewhere that it is not a good idea to run without a queue; better to use a reduced one than the 2048 one that ships. I run mine with --queue 1024, so you could possibly try --queue 512 if you are averse to having a long one. (I am not conversant with the whys of this, so please don't ask!)

Here's where that came from, ckolivas's recommendation when he released the S3 binary I am currently using on the OC unit:
Quote from: ckolivas
> Here's an updated S3 binary.
>
> http://ck.kolivas.org/apps/cgminer/antminer/s3/4.6.1-141020/cgminer
>
> Recommended if you're mining on p2pool, for the default binary actually discards stale shares, which you should never do, especially on p2pool. Also includes changes to queuing and memory usage that were necessary on S4 but probably only of minor benefit here. Recommend you edit the cgminer startup script to remove the --queue value entirely, and add --lowmem. Performance should be pretty much unchanged.

Death spiral on the OC unit definitive now, average down to 523 GH/s.  All chips good with "o", still just the 3 HW errors, temps 42/38.  This would bother me less if I had some explanation for what is happening.

Use it for free and tell us what you need improved: trd.ai
Video on how to use the project: https://www.youtube.com/watch?v=pNhx715vOOk&feature=youtu.be

pekatete (OP, Hero Member)
December 22, 2014, 01:11:33 AM  #102


Quote from: canford
> Here's where that came from, ckolivas's recommendation when he released the S3 binary I am currently using on the OC unit:
> Quote from: ckolivas
> > Here's an updated S3 binary.
> >
> > http://ck.kolivas.org/apps/cgminer/antminer/s3/4.6.1-141020/cgminer
> >
> > Recommended if you're mining on p2pool, for the default binary actually discards stale shares, which you should never do, especially on p2pool. Also includes changes to queuing and memory usage that were necessary on S4 but probably only of minor benefit here. Recommend you edit the cgminer startup script to remove the --queue value entirely, and add --lowmem. Performance should be pretty much unchanged.
>
> Death spiral on the OC unit definitive now, average down to 523 GH/s.  All chips good with "o", still just the 3 HW errors, temps 42/38.  This would bother me less if I had some explanation for what is happening.

On the queue, I'll take ckolivas' word over what I may recall ...

And yes, that is more like a death spiral, too damn right! I'd have expected it to be flashing x's all over, plus an increase in HW errors, at this point, but then again the new binaries support that --bitmain-hwerror option (or some such) that I have never gotten my head around! I think it's time to try the 0800 setting .... I am convinced it is a heat problem your rigs are encountering, given how consistent the drop-off is, so reducing the voltage may help (but it may need a power cycle; in fact I'd say do one, even though I do all my tests initially without one).

canford (Member)
December 22, 2014, 01:24:56 AM  #103

Quote from: pekatete
> And yes, that is more like a death spiral, too damn right! I'd have expected it to be flashing x's all over, plus an increase in HW errors, at this point, but then again the new binaries support that --bitmain-hwerror option (or some such) that I have never gotten my head around! I think it's time to try the 0800 setting .... I am convinced it is a heat problem your rigs are encountering, given how consistent the drop-off is, so reducing the voltage may help (but it may need a power cycle; in fact I'd say do one, even though I do all my tests initially without one).

OK, I'm going to call it a death spiral now.  98 minutes into the run, 15m hashrate of 375 at the pool, miner reports average dropped to 495.  All chips still "o", temps 41/41, and a total of 5 HW errors.  But I would also expect the symptoms you describe with an overheat, and I'm not seeing them.  No increasing temps, no "x"s on the chips, HW errors still negligible.  Fan speeds are dropping, now 1800-1900, down from 2200-2300.  So the unit is getting cooler as it slows down.  I saw the same behavior on all six units, so we are missing something here.

On the one test unit, I will now try 250/0750, hardware reset then software reset.  Will see what happens!  One thing I am wondering is if we are triggering an internal chip overheat of some kind, that throttles it back but doesn't report a bunch of failures.  So we try to push them harder, and they actually go slower.  But I'm just guessing at this point.

pekatete (OP, Hero Member)
December 22, 2014, 01:30:04 AM  #104

Quote from: canford
> <snip> .... </snip> But I would also expect the symptoms you describe with an overheat, and I'm not seeing them.  No increasing temps, no "x"s on the chips, HW errors still negligible.  Fan speeds are dropping, now 1800-1900, down from 2200-2300.  So the unit is getting cooler as it slows down. .... <snip> ... </snip>

There, my friend, is the sign you are looking for!

Fan speeds dropping ... unit is getting cooler

That means there are some chips that have gone offline! EDIT: Either that, or the chips are not getting work ..... now you see how the queue argument was fomented in my mind?

pekatete (OP, Hero Member)
December 22, 2014, 01:35:32 AM  #105

Quote from: canford
> On the one test unit, I will now try 250/0750, hardware reset then software reset.  Will see what happens!  One thing I am wondering is if we are triggering an internal chip overheat of some kind, that throttles it back but doesn't report a bunch of failures.  So we try to push them harder, and they actually go slower.  But I'm just guessing at this point.

It's your rig, but I'd try 275/0800 first. Simply enter the voltage, save and apply, then do a power cycle.
As for pushing them harder when they are going to throttle back without showing any outward sign ..... that is the question.

canford (Member)
December 22, 2014, 01:40:55 AM  #106

Quote from: pekatete
> Quote from: canford
> > <snip> .... </snip> But I would also expect the symptoms you describe with an overheat, and I'm not seeing them.  No increasing temps, no "x"s on the chips, HW errors still negligible.  Fan speeds are dropping, now 1800-1900, down from 2200-2300.  So the unit is getting cooler as it slows down. .... <snip> ... </snip>
>
> There, my friend, is the sign you are looking for!
>
> Fan speeds dropping ... unit is getting cooler
>
> That means there are some chips that have gone offline! EDIT: Either that, or the chips are not getting work ..... now you see how the queue argument was fomented in my mind?

Does it?  Why are all chips still reporting "o" and not "-" or "x"?  That is why I'm wondering about the internal throttling.  There should be plenty of work, I have tried leaving queue at default as well as deleting queue parameter completely.  I have not tried --queue 1024 yet though.  But I saw the same result with two different pools and with and without deletion of the parameter.

New test run started: 1024 firmware, 250/0750, queue left at stock, cgminer left at stock.  Only change to stock is the additional options in cgminer.lua, which do not change the 250 values.

pekatete (OP, Hero Member)
December 22, 2014, 01:44:27 AM  #107

Quote from: canford
> New test run started: 1024 firmware, 250/0750, queue left at stock, cgminer left at stock.  Only change to stock is the additional options in cgminer.lua, which do not change the 250 values.

OK, I need to catch up on my sleep ..... it's getting to 2am where I am, so I'll pick up from where you got to in several hours' time.

canford (Member)
December 22, 2014, 01:44:57 AM  #108

Quote from: pekatete
> Quote from: canford
> > On the one test unit, I will now try 250/0750, hardware reset then software reset.  Will see what happens!  One thing I am wondering is if we are triggering an internal chip overheat of some kind, that throttles it back but doesn't report a bunch of failures.  So we try to push them harder, and they actually go slower.  But I'm just guessing at this point.
>
> It's your rig, but I'd try 275/0800 first. Simply enter the voltage, save and apply, then do a power cycle.
> As for pushing them harder when they are going to throttle back without showing any outward sign ..... that is the question.

Sorry, missed this post.  I will try 275/0800 after we see what happens with 250/0750.  My procedure lately is to put in freq/volt, save & apply, power cycle, then system/reboot.  I added on the last step after just a power cycle would result in bad stats in Miner Status.  Have you tried OC on an S3+ with factory thermal paste?  The six I am working with are untouched.  So if it is a chip-level thermal problem that is not reported to the sw, then I guess that could explain the difference here.

pekatete (OP, Hero Member)
December 22, 2014, 01:55:18 AM  #109

Quote from: canford
> Sorry, missed this post.  I will try 275/0800 after we see what happens with 250/0750.  My procedure lately is to put in freq/volt, save & apply, power cycle, then system/reboot.  I added on the last step after just a power cycle would result in bad stats in Miner Status.

I used to have that issue (and always had to do an SSH reboot after a power cycle). Of late, though, it seems to have been fixed .... all I did, once, was this: before rebooting via SSH, I first went into the System -> Start Up tab and stopped cgminer, then entered reboot in the SSH window. Now the stats are OK even after a brutal power cycle .... (but I was called names when I mentioned the stats problem on this forum!)

Quote from: canford
> Have you tried OC on an S3+ with factory thermal paste?  The six I am working with are untouched.  So if it is a chip-level thermal problem that is not reported to the sw, then I guess that could explain the difference here.

I did OC one S3+ before I redid the paste which worked OK at 262.5 (which was the only "good" freq I had at the time). I now redo all the S3's I get by, at the very least, putting heat-pads on the chips, so for now all have been "modded" if you like.

canford (Member)
December 22, 2014, 04:46:42 AM  #110

It took longer, but 250/0750 also a dead end.  Over 500 first hour, then down.  After 3h down to 485 average, 458 last hour on the pool.  All "o" on the chips and only 1 HW.  Dropping to 243/0750 for the next run.
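
The miner's reported average is cumulative since start, so it lags behind the current rate. A rough Python sketch of backing out the rate over the most recent interval from two cumulative averages follows. It assumes the average is simply total work divided by elapsed time, and it uses 500 GH/s for the first hour only because the post says "over 500"; both are assumptions, not measurements.

```python
def interval_rate(avg_start, t_start_h, avg_end, t_end_h):
    """Average rate between two times, given the cumulative average at each."""
    if t_end_h <= t_start_h:
        raise ValueError("t_end_h must be later than t_start_h")
    # average * elapsed time is proportional to the total work done so far
    work_end = avg_end * t_end_h
    work_start = avg_start * t_start_h
    return (work_end - work_start) / (t_end_h - t_start_h)


# Assumed inputs: 500 GH/s average at 1 h, 485 GH/s average at 3 h.
rate_h2_h3 = interval_rate(500.0, 1.0, 485.0, 3.0)
print(f"implied average over hours 2-3: {rate_h2_h3:.1f} GH/s")
```

Because the first hour was actually above 500, the true figure for hours 2-3 would be somewhat lower than this estimate, which is at least consistent with the 458 the pool showed for the last hour.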

canford (Member)
December 22, 2014, 05:09:57 AM  #111

Quote from: pekatete
> Quote from: canford
> > My procedure lately is to put in freq/volt, save & apply, power cycle, then system/reboot.  I added on the last step after just a power cycle would result in bad stats in Miner Status.
>
> I used to have that issue (and always had to do an SSH reboot after a power cycle). Of late, though, it seems to have been fixed .... all I did, once, was this: before rebooting via SSH, I first went into the System -> Start Up tab and stopped cgminer, then entered reboot in the SSH window. Now the stats are OK even after a brutal power cycle .... (but I was called names when I mentioned the stats problem on this forum!)

It doesn't happen every time, but a significant % of power cycles result in Miner Status using the previous elapsed time, so everything gets pushed lower.  A warm reboot always results in the correct outcome.  This is a funny hobby to have, these things have minds of their own.

canford (Member)
December 22, 2014, 08:59:48 AM  #112

Quote from: canford
> It took longer, but 250/0750 also a dead end.  Over 500 first hour, then down.  After 3h down to 485 average, 458 last hour on the pool.  All "o" on the chips and only 1 HW.  Dropping to 243/0750 for the next run.


243/0750 also a dead end, dropped to 4h running average of 471, with last hour on the pool at 449.  Trying 237/0750.

I also started playing with the slowest of the others (225 for 453, no voltage change).  Trying 250/0750, and so far 2 hours in it is rock solid at the expected 504.  Not going to count those chickens, will see what it looks like tomorrow.
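
The "expected 504" figure is consistent with hashrate scaling linearly with frequency: 225 MHz gives about 453 GH/s and 250 MHz about 504 GH/s, both roughly 2.016 GH/s per MHz. Here is a small Python sketch for comparing a run against that expectation. It assumes the linear scaling holds across the range tried in this thread, and it is calibrated only on the 250 -> 504 point quoted above.

```python
# Calibration point taken from the post above: 250 MHz should give 504 GH/s.
CAL_FREQ_MHZ = 250.0
CAL_RATE_GHS = 504.0


def expected_ghs(freq_mhz):
    """Expected hashrate under the assumed linear frequency scaling."""
    return freq_mhz * CAL_RATE_GHS / CAL_FREQ_MHZ


def shortfall_pct(observed_ghs, freq_mhz):
    """How far below the expected rate an observed average is, in percent."""
    expected = expected_ghs(freq_mhz)
    return 100.0 * (expected - observed_ghs) / expected


# Observed multi-hour averages reported earlier in the thread.
for freq, observed in [(250, 485), (243, 471)]:
    print(f"{freq} MHz: expected {expected_ghs(freq):.1f} GH/s, "
          f"observed {observed}, shortfall {shortfall_pct(observed, freq):.1f}%")
```

Both failing runs sit a few percent under the expected figure on the miner's own average, with the pool-side numbers lower still.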

canford (Member)
December 22, 2014, 11:50:54 AM  #113

Quote from: canford
> 243/0750 also a dead end, dropped to 4h running average of 471, with last hour on the pool at 449.  Trying 237/0750.
>
> I also started playing with the slowest of the others (225 for 453, no voltage change).  Trying 250/0750, and so far 2 hours in it is rock solid at the expected 504.  Not going to count those chickens, will see what it looks like tomorrow.

237 was repeated "x" from chips over four reboots, so bailed.  231 was 428 over an hour at the pool, also bailed.  That unit just doesn't want to OC.  Reverted to 225/no-volt.

Other unit, however, continues to hum along at 504 after 5h.  I do not understand these things.

pekatete (OP, Hero Member)
December 22, 2014, 11:58:10 AM  #114

Quote from: canford
> 237 was repeated "x" from chips over four reboots, so bailed.  231 was 428 over an hour at the pool, also bailed.  That unit just doesn't want to OC.  Reverted to 225/no-volt.
>
> Other unit, however, continues to hum along at 504 after 5h.  I do not understand these things.

I noticed that repeated x's clear when you restart cgminer, e.g. via the System Start Up tab (which also happens when you Save & Apply settings) - just saying, so you can reduce your restart / reboot cycle length.

Been said a lot of times. S3's are the same, but different ..... most of the S3 variants in my stable run best at different settings, and those with the same settings, on the same pool, over the same connection produce differing results! So there goes.

canford (Member)
December 22, 2014, 08:16:22 PM  #115

Quote from: pekatete
> I noticed that repeated x's clear when you restart cgminer, e.g. via the System Start Up tab (which also happens when you Save & Apply settings) - just saying, so you can reduce your restart / reboot cycle length.
>
> Been said a lot of times. S3's are the same, but different ..... most of the S3 variants in my stable run best at different settings, and those with the same settings, on the same pool, over the same connection produce differing results! So there goes.

All three OC experiments failed.  Both of the other units I tried 250/0750 with looked fantastic for the first hour, then slid into slow decline without any symptoms other than declining hashrate.  They bottomed out at about 430 after 12 hours.  Reverting to stock and taking a break!

pekatete (OP, Hero Member)
December 23, 2014, 02:57:04 AM  #116

Quote from: canford
> All three OC experiments failed.  Both of the other units I tried 250/0750 with looked fantastic for the first hour, then slid into slow decline without any symptoms other than declining hashrate.  They bottomed out at about 430 after 12 hours.  Reverting to stock and taking a break!

I am not by any means disputing your results as that would be futile, however if I may point out my experience with S3's, it is very rare that you find any two units hashing the same, even when using similar freq & voltage settings, PSU and connected to the same pool.

In your case however, that you have experienced the same wall of performance after the first hour, and that happening on most of your units, leads me to believe the issue must be local to your setup.

My first instinct was a heat / temperature problem, however, you seem to discount this as a possibility, and that the units run as normal on stock frequencies seems to justify that. I am not completely sold though since I have never run an S3 variant at temps as high as yours. Before the season change and my moving my rigs outside, I had them all running the fans at full pelt, aka blue wire hack, which kept the temps low, and now that the seasons have changed and my rigs reside in the garden, my temps are a lot lower.

My second guess is how you are powering your rigs. Though you appear to have the right PSU's, this may be a cause if you have them all plugged into the same extension lead. They may not trip your circuit, but they will compete for power, especially if you are running them off a 110v circuit.

If you choose to attempt the OC again, I wish you luck, but I think in your instance I have given as much help as I possibly can, bearing in mind I am neither an engineer nor an electrician.

canford (Member)
January 01, 2015, 06:07:28 AM  #117

Quote from: pekatete
> I am not by any means disputing your results as that would be futile, however if I may point out my experience with S3's, it is very rare that you find any two units hashing the same, even when using similar freq & voltage settings, PSU and connected to the same pool.
>
> In your case however, that you have experienced the same wall of performance after the first hour, and that happening on most of your units, leads me to believe the issue must be local to your setup.
>
> My first instinct was a heat / temperature problem, however, you seem to discount this as a possibility, and that the units run as normal on stock frequencies seems to justify that. I am not completely sold though since I have never run an S3 variant at temps as high as yours. Before the season change and my moving my rigs outside, I had them all running the fans at full pelt, aka blue wire hack, which kept the temps low, and now that the seasons have changed and my rigs reside in the garden, my temps are a lot lower.
>
> My second guess is how you are powering your rigs. Though you appear to have the right PSU's, this may be a cause if you have them all plugged into the same extension lead. They may not trip your circuit, but they will compete for power, especially if you are running them off a 110v circuit.
>
> If you choose to attempt the OC again, I wish you luck, but I think in your instance I have given as much help as I possibly can, bearing in mind I am neither an engineer nor an electrician.

Thanks for trying to help!  I am both an engineer and an electrician.  I do not think I have power issues, as all 6 units are running from Corsair 750 or 800W supplies, with all four PCI-E connectors well seated.  Three power supplies are on one 20A 120V circuit, and three are on another.  So there is plenty of spare power.
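
As a rough check on the "plenty of spare power" statement, the arithmetic can be sketched in Python as below. The 80% continuous-load factor is the usual rule of thumb for breaker sizing, and the 400 W per-unit wall draw is purely a placeholder assumption (the post gives PSU ratings, not measured draw); substitute a measured figure.

```python
def circuit_headroom_w(breaker_amps, volts, unit_draw_w, units,
                       continuous_factor=0.8):
    """Watts left on a circuit after the miners' assumed continuous draw."""
    usable_w = breaker_amps * volts * continuous_factor
    return usable_w - unit_draw_w * units


# 20 A, 120 V circuit with three units, as described in the post.
# ASSUMED_DRAW_W is hypothetical, not a figure from this thread.
ASSUMED_DRAW_W = 400.0
headroom = circuit_headroom_w(20, 120, ASSUMED_DRAW_W, 3)
print(f"headroom on one circuit: {headroom:.0f} W")
```

With these placeholder numbers each circuit has several hundred watts to spare, which supports the conclusion above as long as the real per-unit draw is in that neighbourhood.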

I think what is happening is a slow overheating.  Interestingly, the S3s do not appear to respond by increasing the fan speeds, instead the hash rate goes down as we have discussed.  All six units are in a well ventilated space that stays around 18C.  The S3s report temps around 39-42C, so not very high.

If I find the time, I may try redoing the paste and/or maxing the fan speeds to see if that helps.  I will also keep an eye out for any S3 firmware updates that fix something.  For now, I have concluded that my six units are not overclockable other than the slight bump I get from setting the frequencies to 225/231/243/237/225/237 for the six.

Very frustrating to see such promising hashrate increases in the first hour, but then nothing sustainable.  But overall I am impressed with the units, they are all working to spec and running just fine.

kaltar (Sr. Member)
March 16, 2015, 06:54:18 PM  #118

Even though this thread is a bit old, I just wanted to stop by and tell you that YOU, my friend, are a GOD.

I followed your instructions and my S3+ is doing 553 GH/s stable, with very few HW errors (about 0.0002%), for 48 hours.
Tonight I will be doing my other 5 units.

Happy Hashing to everyone

Bicknellski (Hero Member)
April 18, 2015, 12:57:31 PM  #119

Still running your S3's Pek?

pekatete (OP, Hero Member)
April 18, 2015, 01:00:38 PM  #120

Quote from: Bicknellski
> Still running your S3's Pek?

Yep, still running them ...
