Bitcoin Forum
Author Topic: Efudd Z-Series Fuddware 2.3 -Z11/Z11e/Z11j/Z9/Mini  (Read 45450 times)
efudd (OP)
November 28, 2018, 09:40:56 PM
 #441

Folk,

I'd like to share a lesson learned that may be useful to others. I have a Z9 Mini here that I just checked and it has a hash rate of 0. I immediately went to the logs to see what was going on and found the following:

Code:
Nov 28 21:33:52 (none) local0.err cgminer[23558]: bm1740_verify_nonce_integrality CRC error. cal-crc=374c, chip-crc=60bf
Nov 28 21:33:52 (none) local0.warn cgminer[23558]: receive a error nonce. total = 8908
Nov 28 21:33:52 (none) local0.err cgminer[23546]: bm1740_verify_nonce_integrality CRC error. cal-crc=ac2d, chip-crc=3f77
Nov 28 21:33:52 (none) local0.warn cgminer[23546]: receive a error nonce. total = 8705

The key here is that these are happening constantly, every second, and the count is up to 8000+. (If these are infrequent, every few minutes to hours, they can be ignored.) What's going on? To figure that out, let's take a look at the process list ("System" -> "Monitor")... and what do I find?

Code:
23546 23545 root     S <   225m  98%  50% /usr/bin/cgminer --version-file=/usr/bin/compile_time --config=/config/cgminer.conf -T --syslog
23558 23557 root     S <   257m 111%  40% /usr/bin/cgminer --version-file=/usr/bin/compile_time --config=/config/cgminer.conf -T --syslog

Two copies of cgminer running! How could that happen? The answer is in this little program right here:

Code:
1012     1 root     S     2152   1%   0% {monitorcg} /bin/sh /sbin/monitorcg

This is a factory process that tries to be a "watchdog" for cgminer and restart it if it is not running. From the factory it ran every 20 seconds, but I modified it to sleep for 60 seconds to try to limit the possibility of this race condition.

What happens is that if you change the frequency or pool configuration, cgminer is stopped and restarted. While that stop/start is occurring, monitorcg has a chance to see that cgminer is not running and start one itself. End result: two cgminers stepping on each other.

I may end up removing /sbin/monitorcg from the firmware as I've attempted to fix this particular race a myriad of ways... but when two separate processes (web interface actions and monitorcg) are both touching the same resource ("cgminer"), there is not any good way to prevent them from stepping on each other unless they are talking to each other constantly to achieve what is called "quorum".
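
For reference, the usual way to get that kind of coordination between two processes is a shared lock that both take before touching cgminer. What follows is only a minimal sketch, assuming both the web interface's restart path and monitorcg could be changed to go through the same wrapper; the lock path and the function name are made up for illustration and are not part of the firmware:

```shell
#!/bin/sh
# Hypothetical lock directory. mkdir is atomic, so only one caller can create it.
LOCK_DIR="${LOCK_DIR:-/tmp/cgminer-restart.lock}"

# Run a command while holding the lock.
# Returns 1 without running the command if another caller holds the lock.
with_restart_lock() {
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        "$@"
        status=$?
        rmdir "$LOCK_DIR"
        return $status
    fi
    return 1
}

# Intended use from both the web action and the watchdog:
#   with_restart_lock /etc/init.d/cgminer.sh restart
```

The watchdog would also have to do its "is cgminer running?" check while holding the lock, and a lock left behind by a killed process would need cleaning up, so treat this as a starting point rather than a fix.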

What's the lesson here? Many times the errors that you may see are a function of this particular race condition... and if you have two cgminer processes running, the fix is to kill/restart them. The simplest way to do that is just to go to the frequency page and click submit. That will terminate both cgminers and hopefully restart one before monitorcg tries to help. A guaranteed way to fix it is to reboot, but I am not a fan of unnecessary reboots.
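
If you would rather check for the duplicate from a shell than from the web page, something along these lines should do it. It is a rough sketch that assumes the process list looks like the one above; count_cgminer is just a name picked for this example:

```shell
#!/bin/sh
# Count cgminer processes in `ps` output supplied on stdin.
# The [/] trick keeps a live grep from matching its own command line, and the
# trailing space avoids matching other binaries whose names start with cgminer.
count_cgminer() {
    grep -c '[/]usr/bin/cgminer ' || true
}

# Demo on the process list from this post (on the miner: ps | count_cgminer).
sample='23546 23545 root     S <   225m  98%  50% /usr/bin/cgminer --version-file=/usr/bin/compile_time --config=/config/cgminer.conf -T --syslog
23558 23557 root     S <   257m 111%  40% /usr/bin/cgminer --version-file=/usr/bin/compile_time --config=/config/cgminer.conf -T --syslog
 1012     1 root     S     2152   1%   0% {monitorcg} /bin/sh /sbin/monitorcg'

n="$(printf '%s\n' "$sample" | count_cgminer)"
if [ "$n" -gt 1 ]; then
    echo "WARNING: $n cgminer processes running"
fi
```

Anything above 1 means the miner is in the state described here.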

Hopefully this bit of information will be useful to someone. I've been meaning to write posts like this explaining various scenarios for a while.

Thank you,

Jason

chipless
November 28, 2018, 10:37:23 PM
 #442

...snip...

This should fix the problem the majority of the time, if not completely. The second check gives cgminer up to 120 seconds to come back before a new instance of the miner is started.

Place this in the monitorcg file:

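#!/bin/sh
#set -x
check_inter="60s"

# Second check: wait one more interval, then restart cgminer only if it is still missing.
# Defined before the loop so the shell knows it by the time it is called.
chk_again() {
   sleep $check_inter
   a="$(ps | grep cgminer | grep -v 'grep cgminer')"
   if [ -z "$a" ] ; then
      /etc/init.d/cgminer.sh restart
   fi
}

# First check: every interval, look for a running cgminer.
while true; do
   sleep $check_inter
   #date
   a="$(ps | grep cgminer | grep -v 'grep cgminer')"
   if [ -z "$a" ] ; then
      chk_again
   fi
done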


There also needs to be some checking added for the ASIC status. If a set number of ASICs report an x for status, then cgminer restarts or the system reboots. This can help when overclocking and a board fails out of the blue. The miner will at least restart rather than stay dropped out, losing speed.
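
A rough sketch of what that status check could look like. It assumes the status comes as a string with 'o' for a healthy chip and 'x' for a failed one; how to read that string out of cgminer is left out, and both function names are made up for this example:

```shell
#!/bin/sh
# Count the 'x' characters in a status string (assumed: 'o' healthy, 'x' failed).
count_failed() {
    printf '%s' "$1" | tr -cd 'x' | wc -c | tr -d ' '
}

# Succeed when the number of failed chips is at or above the threshold.
needs_restart() {
    [ "$(count_failed "$1")" -ge "$2" ]
}

# Demo with a made-up status string and a threshold of 2.
if needs_restart "oooo xoxo oooo" 2; then
    echo "would restart cgminer"
fi
```

In monitorcg the echo would be replaced by the restart or reboot, and the threshold is whatever number of dead chips you are willing to live with.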

Share your results with others on my Discord channel
https://discord.gg/6t62apJ
efudd (OP)
November 28, 2018, 10:53:28 PM
 #443

...snip...
This should fix the problem the majority of the time if not completely. Doing a double check gives cgminer 120 seconds and a double check before it starts a new instance of the miner.

...snip...

Please clean up your quotes instead of re-quoting everyone else before you each time.

That will not fix the race. You can add "chk_again" as many times as you want; it does not remove it.

Jason

efudd (OP)
November 28, 2018, 10:58:21 PM
 #444

... snip ...
Kernel log, monitor log, screenshot System, Miner Status and img fan emu. Password in PM
Link https://dropmefiles.com/3iSv8

Thank you for those details. I think I might know what is going on -- can you check your PM and email me at the email address I provided?

Once we can get confirmation, I should be able to get this fixed reasonably quickly.
....snip...
Ok, this is going to take a little longer. This is Roskomnadzor. I am looking for a workaround.

xkosx - by the time you wake up the issue should be resolved. I have migrated primary services to something not blocked by Roskomnadzor. In fact, I can already see Russian installations coming online.

Thank you,

Jason

badbart
November 28, 2018, 11:56:37 PM
 #445

I installed 2.1 and I don't have an option to upload a license file.

The system page says:
Efudd's Z9 Series Firmware v2.1
No dev-fee until 12/01/2018!

But under upgrade there is no option to upload a license file.

P.S. My Z9 is running faster now than with your old firmware, with no clock changes.
efudd (OP)
November 29, 2018, 12:18:07 AM
 #446

I installed 2.1 and I don't have an option to upload a license file.

The system page says:
Efudd's Z9 Series Firmware v2.1
No dev-fee until 12/01/2018!

But under upgrade there is no option to upload a license file.

P.S. My Z9 is running faster now than with your old firmware, with no clock changes.

Refresh your browser cache on the upgrade page: Shift-F5, or Cmd-F5 if you are on a Mac. Once the license is uploaded, you will be told whether it was applied. The summary page will update with your license status on the next poll or restart. That page may be cached as well; same fix.

In the next release (currently being tested), I have fixed the page cache issues.

-Jason

efudd (OP)
November 29, 2018, 01:33:28 AM
Last edit: November 29, 2018, 02:24:39 AM by efudd
 #447

Folk,

For the month of December, I will be running a contest for users of the Z9 and Z9 Mini version 2.1 or later firmware. There will be one automatic entry per day per miner. On 12/24, a random machine will be selected. The Summary page on the version 2.1 firmware will be automatically updated to let you know who the winner is. Details will be posted in the thread here and on the Equihash discord.

I've created a thread to discuss this at https://bitcointalk.org/index.php?topic=5077347.0

Users will find the Summary page on their miner updating with this information automatically over the next 24 hours.

I have put details in the original post, but will copy here also.



Thank you,

Jason

waterman
November 29, 2018, 02:29:23 AM
 #448

Excellent, good job dude! I want that PS4 :D

efudd (OP)
November 29, 2018, 02:35:42 AM
 #449

Excellent, good job dude! I want that PS4 :D

<best phone operator voice> Install now! Supplies are Limited! </phone operator voice>

I was gonna return it and then thought... wait a second!

Best of luck to you. It'll go to someone!

(I can't help but type this while reading it in a "Saul Goodman" voice).

Jason

Marchcat2008
November 29, 2018, 06:19:09 AM
 #450

@efudd Please read PM. Thanks.
efudd (OP)
November 29, 2018, 04:26:50 PM
 #451

@efudd Please read PM. Thanks.

@Marchcat2008 - Responded. At this point in time, I am not selling new licenses. The developer supported version will remain available. If this changes in the future, I will update the original post in this thread.

Thank you,

Jason

efudd (OP)
November 29, 2018, 05:26:24 PM
 #452

Folk,

I wanted to get some feedback on dev-fees: Once per day, or split up throughout the day? I've had feedback from both, but am leaning towards once-per-day.

I personally think that the once-per-day has the least impact since it greatly reduces the swapping/moving things around.

Can you please share your viewpoint on this and your reasoning?

Thank you,

Jason

Pizzi_h
November 29, 2018, 08:09:16 PM
 #453

Thoughts.

Z9 Mini
I have my miners hosted outside, or on direct outside air. We have been having around -15C and the miner worked great; I got it up to a stable 681 MHz since release.
2 fans: front 1800 rpm, rear 1640 rpm. Chip temps around 28-30C, hash avg 14.9 ksol/s.

Today the weather changed drastically to +2C and I got all 3 boards showing xxxx.

It seems that it was the bm1740_verify_nonce_integrality CRC error. I rebooted, but it only took about 7 minutes before I got the same error again.
After that I have only been able to maintain 656 MHz.

As soon as I go above that I lose one board.

Could it be that the colder the chips are kept, the higher the MHz we can maintain? I never tried above 681 MHz.
j.weber
November 29, 2018, 08:22:55 PM
 #454

Definitely once a day. Just a quick question: is there a way I can set the frequency for the different hashboards via PuTTY / the JSON?
efudd (OP)
November 29, 2018, 08:27:35 PM
 #455

Thoughts.

Z9 Mini
I have my miners hosted outside, or on direct outside air. We have been having around -15C and the miner worked great; I got it up to a stable 681 MHz since release.
2 fans: front 1800 rpm, rear 1640 rpm. Chip temps around 28-30C, hash avg 14.9 ksol/s.

Today the weather changed drastically to +2C and I got all 3 boards showing xxxx.

It seems that it was the bm1740_verify_nonce_integrality CRC error. I rebooted, but it only took about 7 minutes before I got the same error again.
After that I have only been able to maintain 656 MHz.

As soon as I go above that I lose one board.

Could it be that the colder the chips are kept, the higher the MHz we can maintain? I never tried above 681 MHz.

This is a very good question. First on the CRC error -- that is going to happen some and is only a problem if it is constant. It happens on even the stock firmwares depending on machines, temps, frequencies, and phase of the moon.
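
To put a number on "constant", you can count the CRC errors per minute in the log. This is only a sketch, assuming the syslog lines look like the ones quoted earlier in the thread; crc_errors_per_minute is a name made up for this example, and it reads the log on stdin:

```shell
#!/bin/sh
# Read syslog lines on stdin and print "<month> <day> <HH:MM> <count>"
# for every minute that contains at least one CRC error line.
crc_errors_per_minute() {
    awk '/CRC error/ { counts[$1 " " $2 " " substr($3, 1, 5)]++ }
         END { for (m in counts) print m, counts[m] }' | sort
}

# Demo on the log lines quoted earlier in the thread.
crc_errors_per_minute <<'EOF'
Nov 28 21:33:52 (none) local0.err cgminer[23558]: bm1740_verify_nonce_integrality CRC error. cal-crc=374c, chip-crc=60bf
Nov 28 21:33:52 (none) local0.warn cgminer[23558]: receive a error nonce. total = 8908
Nov 28 21:33:52 (none) local0.err cgminer[23546]: bm1740_verify_nonce_integrality CRC error. cal-crc=ac2d, chip-crc=3f77
Nov 28 21:33:52 (none) local0.warn cgminer[23546]: receive a error nonce. total = 8705
EOF
```

A handful per hour is the harmless case; dozens in every minute is the constant case that needs attention.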

Temperatures will play into how far you can push these, but there is not a clear formula for that. What's really interesting is I have a customer with a large install (1000+ machines) who has observed that there is a point where the machines get too cold and slow down! I'm unsure of the exact details on the temperatures, just the observation that was shared with me.

So yes, temperature plays a part both when going up and when going down.

The summers here are very hot -- I had to constantly tune my miners, even through the day, to get the maximum out of them; they always ran best at night.

I hope this helps some.

Jason

efudd (OP)
November 29, 2018, 08:36:27 PM
 #456

Definitely once a day. Just a quick question: is there a way I can set the frequency for the different hashboards via PuTTY / the JSON?

Yessir, bitmain-freq1, bitmain-freq2, bitmain-freq3 are the 3 variables for that.

The only caveat is if you set the frequencies via that method the web interface will not get updated to reflect it until you go into the web interface and "save frequencies".
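
As a rough example of doing that from a shell: the helper below rewrites one of those keys in the config file. It assumes the entries are stored as quoted strings such as "bitmain-freq1" : "650", which I have not shown here, so check your own file first; set_board_freq is a name made up for this example:

```shell
#!/bin/sh
# Set one per-board frequency key in a cgminer config file.
# usage: set_board_freq <config-file> <board 1-3> <MHz>
# Assumes entries look like:  "bitmain-freq1" : "650",
set_board_freq() {
    conf="$1"; board="$2"; mhz="$3"
    tmp="${conf}.tmp"
    # Replace only the quoted value that follows the matching key.
    sed "s/\(\"bitmain-freq${board}\" *: *\)\"[^\"]*\"/\1\"${mhz}\"/" "$conf" > "$tmp" &&
        mv "$tmp" "$conf"
}
```

For instance, set_board_freq /config/cgminer.conf 2 650 would change the second board only (that path is the one in the cgminer command line shown earlier in the thread). cgminer has to be restarted before the new value takes effect.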

Jason


Pizzi_h
November 29, 2018, 08:37:24 PM
 #457


This is a very good question. First on the CRC error -- that is going to happen some and is only a problem if it is constant. It happens on even the stock firmwares depending on machines, temps, frequencies, and phase of the moon.

Temperatures will play into how far you can push these, but there is not a clear formula for that. What's really interesting is I have a customer with a large install (1000+ machines) who has observed that there is a point where the machines get too cold and slow down! I'm unsure of the exact details on the temperatures, just the observation that was shared with me.

So yes, temperature plays a part both when going up and when going down.

The summers here are very hot -- I had to constantly tune my miners, even through the day, to get the maximum out of them; they always ran best at night.

I hope this helps some.

Jason

I reflashed the firmware and now I can push it higher than 656 MHz. Temps are now 49-50C, fans 2000 rpm.

Okay. Well, if it starts to drop when it is too cold outside, I will have to split the intake air a bit :) Thanks for that info.

Another question...

I also tried the Biggie firmware earlier. With the same MHz it could spike to 17+ ksol/s; the avg was around 14.5, I think.
I got higher spikes with the "biggie" firmware than with the mini.

Avg is better with the mini FW though :)
Good job! Will buy the license when you start bringing in new ones :)
efudd (OP)
November 29, 2018, 08:42:17 PM
 #458

...snip...

I reflashed the firmware and now I can push it higher than 656 MHz. Temps are now 49-50C, fans 2000 rpm.

Okay. Well, if it starts to drop when it is too cold outside, I will have to split the intake air a bit :) Thanks for that info.

Another question...

I also tried the Biggie firmware earlier. With the same MHz it could spike to 17+ ksol/s; the avg was around 14.5, I think.
I got higher spikes with the "biggie" firmware than with the mini.

Avg is better with the mini FW though :)
Good job! Will buy the license when you start bringing in new ones :)


The spikes are gonna be completely random for what it is worth. Your miner could get really lucky on calculations for a few seconds and jump to 2x what you would otherwise expect, but the average is where the truth really sits.

I honestly am not sure I am going to sell new licenses; I may instead stick with the dev-supported model. It actually is cheaper for users that way, to be honest... it'll take 3-6 months of runtime or more for me to make up what the license fee was at 3%. It's just a lot easier on me to not manage individual licenses.

Jason

efudd (OP)
November 29, 2018, 09:31:20 PM
 #459

Folk,

Due to a seriously idiotic oversight on my part, the frequency list for the Minis maxes out at 700. I'll be releasing a 'b' variant this evening that corrects the issue. If you are already running the 2.1 version, you will receive a notice on your Overview page with details on the 'b' release, as well as a direct link to download it.

Thank you,

Jason

chipless
November 29, 2018, 09:34:09 PM
 #460

Thoughts.

Z9 Mini
I have my miners hosted outside, or on direct outside air. We have been having around -15C and the miner worked great; I got it up to a stable 681 MHz since release.
2 fans: front 1800 rpm, rear 1640 rpm. Chip temps around 28-30C, hash avg 14.9 ksol/s.

Today the weather changed drastically to +2C and I got all 3 boards showing xxxx.

It seems that it was the bm1740_verify_nonce_integrality CRC error. I rebooted, but it only took about 7 minutes before I got the same error again.
After that I have only been able to maintain 656 MHz.

As soon as I go above that I lose one board.

Could it be that the colder the chips are kept, the higher the MHz we can maintain? I never tried above 681 MHz.

The recommended minimum temp is around 40C; any colder and you may lose speed, and too hot and you will lose speed as well. The optimal temp here seems to be about 55C. They put out enough heat to keep my whole house warm. I adjust the temp by opening or closing windows.
