Great input,
I also tested with 7 different PSUs, with the same results. As for the controller, I switched the boards around on the controller, and in my case the same board shows errors whichever controller port it is on.
One thing I noticed is that whenever the pool stops responding, those errors come up, and when it starts receiving jobs again, everything goes back to normal (apart from the HW Errors count, of course). This has happened 100% of the time so far.
So I'd say it's a software bug mixed with hardware issues.
Also, when the pool stops responding, the fans spin up to their max (bad software design, I'd say).
I also tried underclocking to under 15 GH/s, and the same issues happen.
As for the networking part, well, I have had other things running on the network for months now with no issues... so I've ruled that one out.
Let's wait and see what Bitmain does about this.
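If anyone wants to put a number on the "errors show up when the pool stops responding" observation, here is a rough Python sketch. It assumes you have noted down, at regular intervals, whether the pool was alive and what the HW Errors counter showed (for example from the miner status page). The function name and the sample format are made up for illustration, and how you collect the samples is up to you.

```python
def split_hw_errors(samples):
    """Split HW error growth into 'pool down' and 'pool up' buckets.

    samples: list of (pool_alive, hw_errors_total) tuples in time order.
    This format is an assumption for the sketch, not something the miner exports.
    Returns (errors_while_pool_down, errors_while_pool_up).
    """
    down = up = 0
    for (prev_alive, prev_total), (alive, total) in zip(samples, samples[1:]):
        # The counter restarts after a reboot, so ignore negative jumps.
        delta = max(total - prev_total, 0)
        if prev_alive and alive:
            up += delta
        else:
            # Any interval that touches a pool outage counts as "down".
            down += delta
    return down, up
```

If nearly all of the growth lands in the first number, that supports the idea that the errors are tied to pool outages rather than to a steadily failing board.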
Hi Guys,
Just a quick follow-up: I've had a chance to do some more testing with 5 D3s and did the following:
1) Tested with a single APW3++ PSU on a 230V mains source
2) Tested with a 3600 W single-rail PSU (a lab test-bench PSU, so not a converted server PSU or PC PSU)
3) Tested with a power conditioner (the kind used in high-end audio setups)
4) Flashed the latest available firmware from:
https://s3.cn-north-1.amazonaws.com.cn/shop-bitmain/download/Antminer-D3-201709131713-0M.tar.gz
5) Hard-wired connection to a Cisco 24-port managed switch (tested on different ports with different cables)
6) Set the mining pool to Antpool
7) Inspected the hashboards of a single D3 for damaged solder points, power lanes, etc. and loose/missing heatsinks -> all looked fine.
Results:
- All of the D3s had the random error-mode red LED warning flash
- All of the D3s had the random "read_temp_func: can't read all sensor's temperature, close PIC and need reboot!!!" message in the kernel logs (the red LED is in sync with this message)
- All of the D3s had the occasional "all x's" on 1 or 2 hashboards, returning to normal after a "reboot"
This seems to be what most of the contributors to this thread experienced as well.
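For anyone who wants to compare how often this happens across units, a small Python sketch for counting the read_temp_func message in saved kernel logs could look like this. The message text is copied from the log line quoted above. The function names and the miner labels are my own placeholders, and it assumes you have saved each kernel log as plain text.

```python
# Message text as quoted from the kernel log in this post.
TEMP_ERROR = "read_temp_func: can't read all sensor's temperature"


def count_temp_read_failures(log_lines):
    """Count the lines that contain the read_temp_func failure message."""
    return sum(1 for line in log_lines if TEMP_ERROR in line)


def failures_per_miner(logs_by_miner):
    """Map each miner label (any name you like) to its failure count."""
    return {
        name: count_temp_read_failures(lines)
        for name, lines in logs_by_miner.items()
    }
```

To use it on a saved log, open the file and pass the file object straight in, e.g. `count_temp_read_failures(open("kernel_log.txt"))`, where the file name is just an example.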
- So it seems that it's not related to the stability of the power supply used.
- It seems unlikely that each D3 has 1 or 2 malfunctioning hashboards (at least for the posters in this thread who have several D3s exhibiting the exact same behavior), and I assume that Bitmain would notice this in their quality assurance tests.
That kind of leaves me with:
- Software bugs
- Controller (board) bugs
What I still want to test:
- Whether the behavior is the same with 1 or 2 hashboards disconnected.
I'm hoping that Bitmain is able to sort this out with a firmware update (if it's indeed a software problem). However, they might not be inclined to do so, since the rapid increase in Dash difficulty might make this an uninteresting investment.