Warmth in the Depths of Winter - SUN E4500 Goes Down from Overheating

Last weekend a database server, a SUN E4500, developed a fault, overheated, and went down. So how hot did it get?
[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to over-temperature condition on SBus FFB SOC+ IO board 1
[ID 350302 kern.notice] NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)
[ID 538492 kern.notice] NOTICE: System shutdown due to over-temperature condition cancelled
[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to over-temperature condition on SBus FFB SOC+ IO board 1
[ID 350302 kern.notice] NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)
[ID 538492 kern.notice] NOTICE: System shutdown due to over-temperature condition cancelled
[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to over-temperature condition on SBus FFB SOC+ IO board 1
[ID 350302 kern.notice] NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)
[ID 538492 kern.notice] NOTICE: System shutdown due to over-temperature condition cancelled
[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to over-temperature condition on SBus FFB SOC+ IO board 1
[ID 350302 kern.notice] NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)
[ID 538492 kern.notice] NOTICE: System shutdown due to over-temperature condition cancelled
[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to over-temperature condition on SBus FFB SOC+ IO board 1
[ID 470940 kern.warning] WARNING: SBus FFB SOC+ IO board 1 still too hot (temperature: 68C). Overtemp shutdown started
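The kernel messages above follow a regular pattern, so the temperature readings can be pulled out mechanically. What follows is only a minimal sketch in Python, assuming the messages look exactly like the ones quoted here; the function name `extract_temperatures` and the regular expression are illustrative and not part of any Solaris tool.

```python
import re

# Matches the three over-temperature message shapes seen in the log above.
# The board name is matched loosely, but may not cross a "[" or ":" so that
# one match never spans two separate messages.
TEMP_RE = re.compile(
    r"(WARNING|NOTICE): ([^\[\]:\n]+?) "
    r"(is very hot|is cooling|still too hot) "
    r"\(temperature: (\d+)C\)"
)

def extract_temperatures(log_text):
    """Return (severity, board, state, temp_c) tuples in order of appearance."""
    return [
        (severity, board, state, int(temp))
        for severity, board, state, temp in TEMP_RE.findall(log_text)
    ]

# A short excerpt of the log, used as sample input.
sample = (
    "[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C) "
    "[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to "
    "over-temperature condition on SBus FFB SOC+ IO board 1 "
    "[ID 350302 kern.notice] NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)"
)

readings = extract_temperatures(sample)
hottest = max(temp for _, _, _, temp in readings)
print(readings)
print("hottest reading:", hottest)
```

On this excerpt the sketch finds two readings, 68C and 67C, which is the same one-degree oscillation around the shutdown threshold that the log shows.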
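By the time the system shut down, the temperature had reached 68°C. In this cold winter weather, that is warm indeed. A check after the machine came back up showed that an IO board was at fault: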
bash-2.03# /usr/platform/sun4u/sbin/prtdiag -v
System Configuration:  Sun Microsystems  sun4u 8-slot Sun Enterprise E4500/E5500
System clock frequency: 100 MHz
Memory size: 2048Mb

========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 0     0     0      400     8.0   US-II   10.0
 0     1     1      400     8.0   US-II   10.0
 2     4     0      400     8.0   US-II   10.0
 2     5     1      400     8.0   US-II   10.0
 4     8     0      400     8.0   US-II   10.0
 4     9     1      400     8.0   US-II   10.0

========================= Memory =========================

                                              Intrlv.  Intrlv.
Brd   Bank   MB    Status   Condition  Speed   Factor   With
---  -----  ----  -------  ----------  -----  -------  -------
 0     0    1024   Active      OK       60ns   2-way     A
 2     0    1024   Active      OK       60ns   2-way     A

========================= IO Cards =========================

     Bus   Freq
Brd  Type  MHz   Slot        Name                          Model
---  ----  ----  ----------  ----------------------------  --------------------
 1   SBus   25       0       SUNW,socal/sf (scsi-3)        501-5266
 1   SBus   25       3       SUNW,hme
 1   SBus   25       3       SUNW,fas/sd (block)
 1   SBus   25      13       SUNW,socal/sf (scsi-3)        501-3060
 1   UPA   100       2       FFB, Double Buffered          SUNW,501-4790

Detached Boards
===============
  Slot  State      Type    Info
  ----  ---------  ------  -----------------------------------------
    3   failed     disk    Disk 0: no disk    Disk 1: no disk

Failed Field Replaceable Units (FRU) in System:
==============================================
disk-board unavailable on IO board #3
        PROM fault string: fail
        Failed Field Replaceable Unit is IO board 3

Detected System Faults
======================
Board 1 fault: Overtemp
        Detected Sat Dec 16 02:24:21 2006
Unit 2 Core Power Supply failure
        Detected Fri Dec 15 23:24:23 2006
Unit 1 Core Power Supply failure
        Detected Fri Dec 15 23:24:23 2006
PROM detected failure
        Detected Fri Dec 15 23:24:23 2006

Most recent AC Power Failure:
=============================
Fri May 27 14:53:06 2005

========================= Environmental Status =========================

Keyswitch position is in Normal Mode
System Power Status: Minimum Available
System LED Status:    GREEN     YELLOW     GREEN
WARNING                ON         ON       BLINKING

Fans:
-----
Unit   Status
----   ------
Rack   OK
Key    OK
AC     OK

System Temperatures (Celsius):
------------------------------
Brd   State   Current  Min  Max  Trend
---  -------  -------  ---  ---  -----
 0   OK          39     36   43  stable
 1   WARNING     66     46   67  stable
 2   OK          39     36   43  stable
 4   OK          53     50   55  stable
CLK  OK          38     37   40  stable

Power Supplies:
---------------
Supply                        Status
---------                     ------
0                               OK
1                              FAIL
2                              FAIL
3                               OK
PPS                             OK
    System 3.3v                 OK
    System 5.0v                 OK
    Peripheral 5.0v             OK
    Peripheral 12v              OK
    Auxilary 5.0v               OK
    Peripheral 5.0v precharge   OK
    Peripheral 12v precharge    OK
    System 3.3v precharge       OK
    System 5.0v precharge       OK
AC Power                        OK

========================= HW Revisions =========================

ASIC Revisions:
---------------
Brd  FHC  AC  SBus0  SBus1  PCI0  PCI1  FEPS  Board Type      Attributes
---  ---  --  -----  -----  ----  ----  ----  ----------      ----------
 0     1   5                                  CPU             100MHz Capable
 1     1   5      1                       22  UPA-SBus-SOC+   100MHz Capable
 2     1   5                                  CPU             100MHz Capable
 3                                            Unknown         100MHz Capable
 4     1   5                                  CPU             100MHz Capable

Board 1 FFB Hardware Configuration:
-----------------------------------
        Board rev: 2
        FBC version: 0x3241906d
        DAC: Brooktree 9070, version 1
        3DRAM: Mitsubishi 130b, version 2

System Board PROM revisions:
----------------------------
Board  0:   OBP   3.2.29 2001/06/18 17:28   POST 3.9.29 2001/06/18 17:50
Board  1:   FCODE 1.8.29 2001/06/18 17:26   iPOST 3.4.29 2001/06/18 17:49
Board  2:   OBP   3.2.29 2001/06/18 17:28   POST 3.9.29 2001/06/18 17:50
Board  4:   OBP   3.2.29 2001/06/18 17:28   POST 3.9.29 2001/06/18 17:50
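The "System Temperatures" table in the prtdiag output is the part worth watching between now and the repair. As a rough sketch only, the Python below picks out every board whose state is not OK. It assumes the table has been saved as text with one row per line and six whitespace-separated columns, as in the output above; the helper name `boards_not_ok` is hypothetical.

```python
def boards_not_ok(temperature_table):
    """Return (board, state, current_c) for each row whose state is not OK."""
    flagged = []
    for line in temperature_table.splitlines():
        fields = line.split()
        # A data row has six fields with numeric Current/Min/Max columns;
        # the header and the dashed separator line fail this check.
        if len(fields) == 6 and all(f.isdigit() for f in fields[2:5]):
            board, state, current = fields[0], fields[1], int(fields[2])
            if state != "OK":
                flagged.append((board, state, current))
    return flagged

# The temperature table from the prtdiag output above, used as sample input.
table = """\
Brd   State   Current  Min  Max  Trend
---  -------  -------  ---  ---  -----
 0   OK          39     36   43  stable
 1   WARNING     66     46   67  stable
 2   OK          39     36   43  stable
 4   OK          53     50   55  stable
CLK  OK          38     37   40  stable
"""

flagged = boards_not_ok(table)
print(flagged)
```

For the table shown here, only board 1 is flagged, at 66°C in the WARNING state.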
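What is even more frustrating is that this server is in a critical period of operation right now, so it cannot be restarted to replace the hardware. The repair will have to wait for the next downtime, whenever that turns out to be.
-The End-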