Title: S9 - Fan Error when 2+ boards hooked up
Post by: jcarpenter09 on January 22, 2018, 04:00:07 PM
Hello,
I recently purchased a used S9 with 1 board not hashing. I had the previous owner reboot the device and saw it come back hashing on 2 boards. He had it hooked up with an APW3++ on 220V. I brought it home and hooked it up to the APW3++ on 110V, but left the non-hashing board unhooked (knowing the power limitations of the APW3++ on 110V). Here's what happened:
1. I let it sit for over an hour. No hashing. The kernel log showed both boards recognized, but it then started cycling through fan checks and said "Fatal error: fan speed too low".
2. Unhooked the I/O cable for 2 boards, leaving board #1 connected. It booted up and started hashing within 5 minutes.
3. Disconnected board #1 and connected #2. It started hashing within 5 minutes.
4. Hooked both boards back up. The kernel log shows the fan error again.
Any ideas?
Title: Re: S9 - Fan Error when 2+ boards hooked up
Post by: fanatic26 on January 22, 2018, 06:07:58 PM
Flash the device with different firmware? Try a different fan port on the controller?
Title: Re: S9 - Fan Error when 2+ boards hooked up
Post by: jcarpenter09 on January 22, 2018, 11:05:56 PM
Here is my kernel log. Booting Linux on physical CPU 0x0 Initializing cgroup subsys cpuset Linux version 3.10.31-ltsi-00003-gcf03eb9 (lzq@armdev01) (gcc version 4.7.3 20121106 (prerelease) (crosstool-NG linaro-1.13.1-4.7-2012.11-20121123 - Linaro GCC 2012.11) ) #81 SMP Mon Apr 25 11:20:36 CST 2016 CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=10c5387d CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache Machine: Altera SOCFPGA, model: Altera SOCFPGA Cyclone V Memory policy: ECC disabled, Data cache writealloc On node 0 totalpages: 258048 free_area_init_node: node 0, pgdat 806e5cc0, node_mem_map 8072a000 Normal zone: 2016 pages used for memmap Normal zone: 0 pages reserved Normal zone: 258048 pages, LIFO batch:31 PERCPU: Embedded 8 pages/cpu @80f17000 s11200 r8192 d13376 u32768 pcpu-alloc: s11200 r8192 d13376 u32768 alloc=8*4096 pcpu-alloc: [0] 0 [0] 1 Built 1 zonelists in Zone order, mobility grouping on. Total pages: 256032 Kernel command line: mem=1008M console=ttyS0,115200 root=/dev/mtdblock3 rw rootfstype=jffs2 PID hash table entries: 4096 (order: 2, 16384 bytes) Dentry cache hash table entries: 131072 (order: 7, 524288 bytes) Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) Memory: 1008MB = 1008MB total Memory: 1015844k/1015844k available, 16348k reserved, 0K highmem Virtual kernel memory layout: vector : 0xffff0000 - 0xffff1000 ( 4 kB) fixmap : 0xfff00000 - 0xfffe0000 ( 896 kB) vmalloc : 0xbf800000 - 0xff000000 (1016 MB) lowmem : 0x80000000 - 0xbf000000 (1008 MB) modules : 0x7f000000 - 0x80000000 ( 16 MB) .text : 0x80008000 - 0x8065a930 (6475 kB) .init : 0x8065b000 - 0x806adbc0 ( 331 kB) .data : 0x806ae000 - 0x806e9990 ( 239 kB) .bss : 0x806e9990 - 0x80729384 ( 255 kB) SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1 Hierarchical RCU implementation. NR_IRQS:16 nr_irqs:16 16 sched_clock: 32 bits at 100MHz, resolution 10ns, wraps every 42949ms Console: colour dummy device 80x30 Calibrating delay loop... 
1196.85 BogoMIPS (lpj=5984256) pid_max: default: 32768 minimum: 301 Mount-cache hash table entries: 512 CPU: Testing write buffer coherency: ok ftrace: allocating 17687 entries in 52 pages CPU0: thread -1, cpu 0, socket 0, mpidr 80000000 Setting up static identity map for 0x804ab220 - 0x804ab278 CPU1: failed to come online Brought up 1 CPUs SMP: Total of 1 processors activated (1196.85 BogoMIPS). CPU: All CPU(s) started in SVC mode. devtmpfs: initialized NET: Registered protocol family 16 fpga bridge driver DMA: preallocated 256 KiB pool for atomic coherent allocations L310 cache controller enabled l2x0: 8 ways, CACHE_ID 0x410030c9, AUX_CTRL 0x32460000, Cache size: 524288 B syscon fffef000.l2-cache: regmap [mem 0xfffef000-0xfffeffff] registered syscon ffd05000.rstmgr: regmap [mem 0xffd05000-0xffd05fff] registered syscon ffc25000.sdrctl: regmap [mem 0xffc25000-0xffc25fff] registered syscon ff800000.l3regs: regmap [mem 0xff800000-0xff800fff] registered syscon ffd08000.sysmgr: regmap [mem 0xffd08000-0xffd0bfff] registered hw-breakpoint: found 5 (+1 reserved) breakpoint and 1 watchpoint registers. hw-breakpoint: maximum watchpoint size is 4 bytes. altera_hps2fpga_bridge fpgabridge.2: fpga bridge [hps2fpga] registered as device hps2fpga altera_hps2fpga_bridge fpgabridge.2: init-val not specified altera_hps2fpga_bridge fpgabridge.3: fpga bridge [lshps2fpga] registered as device lwhps2fpga altera_hps2fpga_bridge fpgabridge.3: init-val not specified altera_hps2fpga_bridge fpgabridge.4: fpga bridge [fpga2hps] registered as device fpga2hps altera_hps2fpga_bridge fpgabridge.4: init-val not specified bio: create slab <bio-0> at 0 FPGA Mangager framework driver SCSI subsystem initialized usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb pps_core: LinuxPPS API ver. 1 registered pps_core: Software ver. 
5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it> PTP clock support registered Switching to clocksource timer0 NET: Registered protocol family 2 TCP established hash table entries: 8192 (order: 4, 65536 bytes) TCP bind hash table entries: 8192 (order: 4, 65536 bytes) TCP: Hash tables configured (established 8192 bind 8192) TCP: reno registered UDP hash table entries: 512 (order: 2, 16384 bytes) UDP-Lite hash table entries: 512 (order: 2, 16384 bytes) NET: Registered protocol family 1 RPC: Registered named UNIX socket transport module. RPC: Registered udp transport module. RPC: Registered tcp transport module. RPC: Registered tcp NFSv4.1 backchannel transport module. hw perfevents: enabled with ARMv7 Cortex-A9 PMU driver, 7 counters available arm-pmu arm-pmu: PMU:CTI successfully enabled for 1 cores NFS: Registering the id_resolver key type Key type id_resolver registered Key type id_legacy registered NTFS driver 2.1.30 [Flags: R/W]. jffs2: version 2.2. (NAND) © 2001-2006 Red Hat, Inc. 
msgmni has been set to 1984 io scheduler noop registered (default) Serial: 8250/16550 driver, 2 ports, IRQ sharing disabled ffc02000.serial0: ttyS0 at MMIO 0xffc02000 (irq = 194) is a 16550A console [ttyS0] enabled altera_fpga_manager ff706000.fpgamgr: fpga manager [Altera FPGA Manager] registered as minor 0 brd: module loaded denali-nand-dt ff900000.nand: Dump timing register values:acc_clks: 4, re_2_we: 20, re_2_re: 20 we_2_re: 12, addr_2_data: 14, rdwr_en_lo_cnt: 2 rdwr_en_hi_cnt: 2, cs_setup_cnt: 2 ONFI param page 0 valid ONFI flash detected NAND device: Manufacturer ID: 0x2c, Chip ID: 0xda (Micron MT29F2G08ABAEAWP), 256MiB, page size: 2048, OOB size: 64 Bad block table found at page 131008, version 0x01 Bad block table found at page 130944, version 0x01 5 ofpart partitions found on MTD device denali-nand Creating 5 MTD partitions on "denali-nand": 0x000000000000-0x000001000000 : "NAND Flash Boot Area 16MB" 0x000001000000-0x000002000000 : "NAND Flash Boot Area backup1 16MB" 0x000002000000-0x000003000000 : "NAND Flash Boot Area backup2 16MB" 0x000003000000-0x00000b000000 : "NAND Flash jffs2 Root Filesystem 128MB" 0x00000b000000-0x000010000000 : "NAND Flash jffs2 Root Filesystem 80MB" dw_spi_mmio fff00000.spi: master is unqueued, this is deprecated CAN device driver interface c_can_platform ffc00000.d_can: invalid resource c_can_platform ffc00000.d_can: control memory is not used for raminit c_can_platform ffc00000.d_can: c_can_platform device registered (regs=bf8dc000, irq=163) stmmac_hw_init: 1000M stmmac - user ID: 0x10, Synopsys ID: 0x37 Ring mode enabled DMA HW capability register supported Enhanced/Alternate descriptors Enabled extended descriptors RX Checksum Offload Engine supported (type 2) TX Checksum insertion supported Enable RX Mitigation via HW Watchdog Timer libphy: stmmac: probed eth0: PHY ID 0007c0f1 at 0 IRQ POLL (stmmac-0:00) active usbcore: registered new interface driver usb-storage mousedev: PS/2 mouse device common for all mice i2c /dev 
entries driver Synopsys Designware Multimedia Card Interface Driver dwmmc_socfpga ff704000.dwmmc0: couldn't determine pwr-en, assuming pwr-en = 0 dwmmc_socfpga ff704000.dwmmc0: Using internal DMA controller. dwmmc_socfpga ff704000.dwmmc0: Version ID is 240a dwmmc_socfpga ff704000.dwmmc0: DW MMC controller at irq 171, 32 bit host data width, 1024 deep fifo mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 400000Hz, actual 396825HZ div = 63) dwmmc_socfpga ff704000.dwmmc0: 1 slots initialized ledtrig-cpu: registered to indicate activity on CPUs usbcore: registered new interface driver usbhid usbhid: USB HID core driver oprofile: using arm/armv7-ca9 TCP: cubic registered NET: Registered protocol family 10 sit: IPv6 over IPv4 tunneling driver NET: Registered protocol family 17 NET: Registered protocol family 15 can: controller area network core (rev 20120528 abi 9) NET: Registered protocol family 29 can: raw protocol (rev 20120528) can: broadcast manager protocol (rev 20120528 t) can: netlink gateway (rev 20130117) max_hops=1 8021q: 802.1Q VLAN Support v1.8 Key type dns_resolver registered VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 4 ThumbEE CPU extension supported. 
Registering SWP/SWPB emulation handler mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 300000Hz, actual 297619HZ div = 84) mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 200000Hz, actual 200000HZ div = 125) mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 100000Hz, actual 100000HZ div = 250) mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 400000Hz, actual 396825HZ div = 63) mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 300000Hz, actual 297619HZ div = 84) mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 200000Hz, actual 200000HZ div = 125) mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 100000Hz, actual 100000HZ div = 250) jffs2: jffs2_scan_inode_node(): CRC failed on node at 0x059937cc: Read 0xffffffff, calculated 0x3ebc8775 jffs2: Empty flash at 0x05993824 ends at 0x05994000 jffs2: jffs2_scan_inode_node(): CRC failed on node at 0x073087c4: Read 0xffffffff, calculated 0xcfaca8a3 VFS: Mounted root (jffs2 filesystem) on device 31:3. devtmpfs: mounted Freeing unused kernel memory: 328K (8065b000 - 806ad000) eth0: device MAC address 4a:67:55:e8:8d:3a init phy ok PHY DMA init OK eth0: device MAC address 00:e9:6d:16:03:f9 init phy ok PHY DMA init OK IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready libphy: stmmac-0:00 - Link is Up - 100/Full IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready In axi fpga driver! Original value in RESET_MANAGER_BASE_ADDR + BRGMODRST_ADDR is 0x0 request_mem_region OK! AXI fpga dev virtual address is 0xbf942000 *base_vir_addr = 0xc50f In fpga mem driver! request_mem_region OK! fpga mem virtual address is 0xc0000000 eth0: device MAC address 00:e9:6d:16:03:f9 init phy ok PHY DMA init OK IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready eth0: device MAC address 00:e9:6d:16:03:f9 init phy ok PHY DMA init OK IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready libphy: stmmac-0:00 - Link is Up - 100/Full IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready This is C5 board. 
DETECT HW version=0000c50f miner ID : 00f14388186cfa13 Miner Type = S9 AsicType = 1387 real AsicNum = 63 use critical mode to search freq... get PLUG ON=0x00000007 Find hashboard on Chain[0] Find hashboard on Chain[1] Find hashboard on Chain[2] set_reset_allhashboard = 0x0000ffff Check chain[0] PIC fw version=0x03 Check chain[1] PIC fw version=0x03 Check chain[2] PIC fw version=0x03 chain[0]: [63:22] [63:5] [63:25] [63:4] [63:5] [63:19] [63:255] [63:255] has freq in PIC, will disable freq setting. chain[0] has freq in PIC and will jump over... Chain[0] has core num in PIC Chain[0] ASIC[12] has core num=3 Chain[0] ASIC[15] has core num=2 Chain[0] ASIC[17] has core num=4 Chain[0] ASIC[43] has core num=1 Check chain[0] PIC fw version=0x03 chain[1]: [63:22] [63:5] [63:24] [63:24] [63:70] [63:0] [63:255] [63:255] has freq in PIC, will disable freq setting. chain[1] has freq in PIC and will jump over... Chain[1] has core num in PIC Chain[1] ASIC[1] has core num=1 Chain[1] ASIC[15] has core num=3 Chain[1] ASIC[38] has core num=1 Chain[1] ASIC[62] has core num=1 Check chain[1] PIC fw version=0x03 chain[2]: [63:22] [63:5] [63:24] [63:25] [63:40] [63:53] [63:255] [63:255] has freq in PIC, will disable freq setting. chain[2] has freq in PIC and will jump over... 
Chain[2] has core num in PIC Chain[2] ASIC[4] has core num=1 Chain[2] ASIC[12] has core num=2 Chain[2] ASIC[15] has core num=2 Chain[2] ASIC[19] has core num=1 Chain[2] ASIC[20] has core num=1 Chain[2] ASIC[21] has core num=3 Chain[2] ASIC[23] has core num=1 Chain[2] ASIC[25] has core num=1 Chain[2] ASIC[27] has core num=2 Chain[2] ASIC[28] has core num=9 Chain[2] ASIC[30] has core num=5 Chain[2] ASIC[32] has core num=5 Chain[2] ASIC[33] has core num=1 Chain[2] ASIC[35] has core num=4 Chain[2] ASIC[37] has core num=5 Chain[2] ASIC[38] has core num=1 Chain[2] ASIC[39] has core num=1 Chain[2] ASIC[41] has core num=5 Chain[2] ASIC[42] has core num=1 Chain[2] ASIC[43] has core num=1 Chain[2] ASIC[49] has core num=12 Chain[2] ASIC[50] has core num=1 Chain[2] ASIC[58] has core num=1 Chain[2] ASIC[62] has core num=7 Check chain[2] PIC fw version=0x03 get PIC voltage=74 on chain[0], value=900 get PIC voltage=108 on chain[1], value=880 get PIC voltage=6 on chain[2], value=940 set_reset_allhashboard = 0x00000000 chain[0] temp offset record: 62,-3,32,-4,0,0,0,0 chain[0] temp chip I2C addr=0x98 chain[1] temp offset record: 62,-4,32,-4,0,0,0,0 chain[1] temp chip I2C addr=0x98 chain[2] temp offset record: 62,-4,32,-7,0,0,0,0 chain[2] temp chip I2C addr=0x98 set_reset_allhashboard = 0x0000ffff set_reset_allhashboard = 0x00000000 CRC error counter=0 set command mode to VIL
--- check asic number After Get ASIC NUM CRC error counter=0 set_baud=0 The min freq=700 set real timeout 52, need sleep=379392 After TEST CRC error counter=0 set_reset_allhashboard = 0x0000ffff set_reset_allhashboard = 0x00000000 search freq for 1 times, completed chain = 3, total chain num = 3 set_reset_allhashboard = 0x0000ffff set_reset_allhashboard = 0x00000000 restart Miner chance num=2 waiting for receive_func to exit! waiting for pic heart to exit! bmminer not found= 365 root 0:00 grep bmminer
bmminer not found, restart bmminer ... This is user mode for mining This is C5 board. Miner Type = S9 Miner compile time: Tue Aug 15 11:37:46 CST 2017 type: Antminer S9set_reset_allhashboard = 0x0000ffff set_reset_allhashboard = 0x00000000 set_reset_allhashboard = 0x0000ffff miner ID : 00f14388186cfa13 set_reset_allhashboard = 0x0000ffff get fan[0] speed=1680 get fan[0] speed=1680 get fan[0] speed=1680 Checking fans!get fan[0] speed=1680 get fan[1] speed=4200 get fan[0] speed=1680 get fan[1] speed=4200 get fan[0] speed=1680 get fan[1] speed=4200 get fan[0] speed=1680 get fan[1] speed=4200 get fan[0] speed=1680 get fan[1] speed=4200 get fan[0] speed=1680 get fan[1] speed=4200 get fan[0] speed=1680 get fan[1] speed=4200 get fan[0] speed=1680 get fan[1] speed=4200 get fan[0] speed=1680 get fan[1] speed=4200 get fan[0] speed=1680 get fan[1] speed=4200 get fan[0] speed=1680 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] 
speed=4200 get fan[0] speed=4320 get fan[1] speed=4200 get fan[0] speed=4320 get fan[1] speed=4080 get fan[0] speed=4320 get fan[1] speed=4080 get fan[0] speed=4320 get fan[1] speed=4080 get fan[0] speed=4320 get fan[1] speed=4080 get fan[0] speed=4320 get fan[1] speed=4080 get fan[0] speed=4320 get fan[1] speed=4080 get fan[0] speed=4320 get fan[1] speed=4080 get fan[0] speed=4320 get fan[1] speed=4080 get fan[0] speed=4320 get fan[1] speed=4080 get fan[0] speed=4320 get fan[1] speed=4080 get fan[0] speed=4320 get fan[1] speed=4080 get fan[0] speed=4320 Fatal Error: some Fan lost or Fan speed low!
Title: Re: S9 - Fan Error when 2+ boards hooked up
Post by: Fujitsu on August 10, 2018, 04:50:40 AM
How did you fix this problem? I have the same problem now:
get fan[3] speed=720 get fan[5] speed=4560 Fatal Error: some Fan lost or Fan speed low!
Both fans work, but at low RPM.
Title: Re: S9 - Fan Error when 2+ boards hooked up
Post by: NotFuzzyWarm on August 10, 2018, 01:17:18 PM
No, only 1 fan is working. The one reading 720 is only spinning because of the other fan forcing air through it. Usually it is the exhaust fan that dies.
Title: Re: S9 - Fan Error when 2+ boards hooked up
Post by: tim-bc on September 02, 2018, 03:03:41 AM
Quote from NotFuzzyWarm: "No, only 1 fan is working. The one reading 720 is only spinning because of the other fan forcing air through it. Usually it is the exhaust fan that dies."
You are right, but it is not always the exhaust fan that dies. In my high-corrosion area the intake fan fails maybe 4 out of 5 times. The higher-numbered fan sockets are closer to the edge of the controller board. Just follow the wire from the socket to find out which fan you need to replace. (So in your case, fan[3] is the lower-numbered one, and its socket is farther from the edge of the board.)
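If you want to pick the weak fan out of a long log without reading every line, something like the sketch below might help. It only assumes the log contains entries shaped like `get fan[N] speed=RPM`, as in the pastes above. The function names and the 50% ratio are made up for illustration; they are not bmminer's actual threshold or logic.

```python
import re

# Matches entries such as "get fan[3] speed=720" as seen in the pasted logs.
FAN_LINE = re.compile(r"get fan\[(\d+)\] speed=(\d+)")


def latest_fan_speeds(log_text):
    """Return {fan_index: last reported RPM} found in the log text."""
    speeds = {}
    for idx, rpm in FAN_LINE.findall(log_text):
        speeds[int(idx)] = int(rpm)  # later readings overwrite earlier ones
    return speeds


def suspect_fans(speeds, ratio=0.5):
    """Flag fans spinning below `ratio` times the fastest fan.

    The ratio is an assumed rule of thumb, not a firmware value.
    """
    if not speeds:
        return []
    fastest = max(speeds.values())
    return sorted(i for i, rpm in speeds.items() if rpm < ratio * fastest)


if __name__ == "__main__":
    log = ("get fan[3] speed=720 get fan[5] speed=4560 "
           "Fatal Error: some Fan lost or Fan speed low!")
    speeds = latest_fan_speeds(log)
    print(speeds)                # {3: 720, 5: 4560}
    print(suspect_fans(speeds))  # [3]
```

A fan flagged this way is only a candidate. As noted above, a dead fan can still report a few hundred RPM from the airflow of the working one, so trace the wire from the socket before ordering a replacement.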