While checking the Exadata IB network with infinicheck today, I found that the db nodes reported errors while the cell nodes were fine.
The machine in question is an Exadata X5-2:
[root@dm01db01 ibdiagtools]# /opt/oracle.SupportTools/CheckHWnFWProfile -c loose
[SUCCESS] The hardware and firmware matches supported profile for server=ORACLE_SERVER_X5-2
[root@dm01db01 ibdiagtools]#
Here is the output of infinicheck (the command takes a rich set of options, but it can also be run with no arguments at all; the defaults are fine):
INFINICHECK
        [Network Connectivity, Configuration and Performance]
        [Version IBD VER 2.d ]

Verifying User Equivalance of user=root to all hosts.
(If it isn't setup correctly, an authentication prompt will appear to push keys to all the nodes)
Verifying User Equivalance of user=root to all cells.
(If it isn't setup correctly, an authentication prompt will appear to push keys to all the nodes)

        ####  CONNECTIVITY TESTS  ####
        [COMPUTE NODES -> STORAGE CELLS]
        (30 seconds approx.)
[SUCCESS]..............Results OK
[SUCCESS]....... All can talk to all storage cells

Verifying Subnet Masks on all nodes
[SUBNET MASKS DIFFER].....2 entries found
Prechecking for uniformity of rds-tools on all nodes
[SUCCESS].... rds-tools version is the same across the cluster
Checking for bad links in the fabric
[SUCCESS].......... No bad fabric links found

        [COMPUTE NODES -> COMPUTE NODES]
        (30 seconds approx.)
[SUCCESS]..............Results OK
[SUCCESS]....... All hosts can talk to all other nodes

        ####  PERFORMANCE TESTS  ####
        [(1) Storage Cell to Compute Node]
        (375 seconds approx)
[ INFO ].............Performance test between 192.168.10.5 and 192.168.10.3 has been started.
[ INFO ].............Performance test between 192.168.10.6 and 192.168.10.4 has been started.
[ INFO ].............Performance test between 192.168.10.7 and 192.168.10.2 has been started.
[ INFO ].............Performance test between 192.168.10.8 and 192.168.10.1 has been started.
[ INFO ].............Performance test between 192.168.10.9 and 192.168.10.3 has been started.
[ INFO ].............Performance test between 192.168.10.10 and 192.168.10.4 has been started.
[CRITICAL].............192.168.10.3 rds-stress commands did not run as expected on this host.
           PLEASE run [./infinicheck -z] to cleanup before re-run.
           PLEASE ensure that user equivalence for root is setup (./infinicheck -s)
           Also ensure all other workloads are turned off
[CRITICAL].............192.168.10.4 rds-stress commands did not run as expected on this host.
           PLEASE run [./infinicheck -z] to cleanup before re-run.
           PLEASE ensure that user equivalence for root is setup (./infinicheck -s)
           Also ensure all other workloads are turned off
[CRITICAL].............192.168.10.3 rds-stress commands did not run as expected on this host.
           PLEASE run [./infinicheck -z] to cleanup before re-run.
           PLEASE ensure that user equivalence for root is setup (./infinicheck -s)
           Also ensure all other workloads are turned off
[CRITICAL].............192.168.10.4 rds-stress commands did not run as expected on this host.
           PLEASE run [./infinicheck -z] to cleanup before re-run.
           PLEASE ensure that user equivalence for root is setup (./infinicheck -s)
           Also ensure all other workloads are turned off

        [(2) Every COMPUTE NODE to another COMPUTE NODE]
        (195 seconds approx)
[ INFO ].............Performance test between 192.168.10.2 and 192.168.10.3 has been started.
[ INFO ].............Performance test between 192.168.10.1 and 192.168.10.4 has been started.
[CRITICAL].............192.168.10.3 rds-stress commands did not run as expected on this host.
           PLEASE run [./infinicheck -z] to cleanup before re-run.
           PLEASE ensure that user equivalence for root is setup (./infinicheck -s)
           Also ensure all other workloads are turned off
[CRITICAL].............192.168.10.4 rds-stress commands did not run as expected on this host.
           PLEASE run [./infinicheck -z] to cleanup before re-run.
           PLEASE ensure that user equivalence for root is setup (./infinicheck -s)
           Also ensure all other workloads are turned off

        [(3) Every COMPUTE NODE to ALL STORAGE CELLS] (looking for SymbolErrors)
        (195 seconds approx)
[ INFO ].............Performance test between 192.168.10.5 and 192.168.10.3 has been started.
[ INFO ].............Performance test between 192.168.10.6 and 192.168.10.4 has been started.
[ INFO ].............Performance test between 192.168.10.7 and 192.168.10.2 has been started.
[ INFO ].............Performance test between 192.168.10.8 and 192.168.10.1 has been started.
[ INFO ].............Performance test between 192.168.10.9 and 192.168.10.3 has been started.
[ INFO ].............Performance test between 192.168.10.10 and 192.168.10.4 has been started.
[CRITICAL].............192.168.10.3 rds-stress commands did not run as expected on this host.
           PLEASE run [./infinicheck -z] to cleanup before re-run.
           PLEASE ensure that user equivalence for root is setup (./infinicheck -s)
           Also ensure all other workloads are turned off
[CRITICAL].............192.168.10.4 rds-stress commands did not run as expected on this host.
           PLEASE run [./infinicheck -z] to cleanup before re-run.
           PLEASE ensure that user equivalence for root is setup (./infinicheck -s)
           Also ensure all other workloads are turned off
[CRITICAL].............192.168.10.3 rds-stress commands did not run as expected on this host.
           PLEASE run [./infinicheck -z] to cleanup before re-run.
           PLEASE ensure that user equivalence for root is setup (./infinicheck -s)
           Also ensure all other workloads are turned off
[CRITICAL].............192.168.10.4 rds-stress commands did not run as expected on this host.
           PLEASE run [./infinicheck -z] to cleanup before re-run.
           PLEASE ensure that user equivalence for root is setup (./infinicheck -s)
           Also ensure all other workloads are turned off

[SUCCESS]....... No port errors found

Infinicheck failures reported.. please check log files
From this we can see that every test that involves a db node fails.
Under the hood, infinicheck calls the rds-stress command, for example:

rds-stress -r 192.168.10.1 -p 10584
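For illustration, here is roughly how a manual rds-stress run between two nodes looks. The passive side is the command quoted above; the active side adds -s. The tuning flags (-q request size, -a ack size, -t tasks, -d queue depth) are standard rds-stress options, but the values below are illustrative, not necessarily what infinicheck itself passes:

# passive side: run on the node that owns 192.168.10.1, listening on port 10584
rds-stress -r 192.168.10.1 -p 10584

# active side: run on a peer (e.g. a cell owning 192.168.10.5), pointing at the listener
rds-stress -r 192.168.10.5 -s 192.168.10.1 -p 10584 -q 256 -a 256 -t 1 -d 1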
Of course, besides infinicheck there are plenty of other ways to check the fabric, for example rds-ping (this is the command that ExaWatcher and OSWatcher invoke).
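A quick sketch of rds-ping, which probes RDS reachability the way ping probes ICMP (-c and -i behave like their ping counterparts; the target IP is db node 2's first IB address from the output above):

# send 5 RDS probes, one second apart
rds-ping -c 5 -i 1 192.168.10.3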
Strange, though: why do only the db nodes report errors?
So I ran infinicheck with the -b and -g options to check and set up SSH connectivity over the db nodes' IB network:
Here I made a mistake: this command needs root SSH equivalence configured over the IB IP addresses, not over hostnames.
INFINICHECK
        [Network Connectivity, Configuration and Performance]
        [Version IBD VER 2.d ]

ping: unknown host dm01db01-priv
[FAILURE] Host dm01db01-priv is Unreachable and is excluded from testing
ping: unknown host dm01db02-priv
[FAILURE] Host dm01db02-priv is Unreachable and is excluded from testing
Please supply Infiniband IP addresses only in cell_ib_group
The message is crystal clear: the names cannot even be pinged. Haha, that makes the problem easy to deal with.
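Before rerunning, the group files also need IB IP addresses instead of hostnames, as the last line of the output demands. A sketch for the cells file it names, one address per line, with the cell IPs taken from the earlier run (I am assuming the file sits next to infinicheck in /opt/oracle.SupportTools/ibdiagtools; the db-node group file needs the same treatment):

# rewrite cell_ib_group with IB IPs only
cat > cell_ib_group <<'EOF'
192.168.10.5
192.168.10.6
192.168.10.7
192.168.10.8
192.168.10.9
192.168.10.10
EOF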
Next, let's ping by hand:
[root@dm01db01 ~]# ping dm01db02-priv
ping: unknown host
[root@dm01db01 ~]#
Now ping the second node's IB IP directly, to confirm that the network itself is fine and the problem is purely name resolution:
[root@dm01db01 ~]# ping 192.168.10.3
PING 192.168.10.3 (192.168.10.3) 56(84) bytes of data.
64 bytes from 192.168.10.3: icmp_seq=1 ttl=64 time=0.026 ms
64 bytes from 192.168.10.3: icmp_seq=2 ttl=64 time=0.025 ms
^C
--- 192.168.10.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1902ms
rtt min/avg/max/mdev = 0.025/0.025/0.026/0.005 ms
[root@dm01db01 ~]#
Sure enough, it is a name-resolution problem: the IP answers, but the hostname does not resolve.
Since the IB network is used only for Exadata's internal interconnect, its names are not registered in DNS; they are resolved only through /etc/hosts.
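That resolution order is governed by /etc/nsswitch.conf; on a stock install the hosts line typically reads as follows, with files (i.e. /etc/hosts) consulted before DNS:

grep '^hosts' /etc/nsswitch.conf
# expected on a typical configuration:
# hosts:      files dns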
The /etc/hosts file itself is laid down by onecommand (unless the system was installed by hand, every configuration file is generated automatically by onecommand from the configuration XML file):
[root@dm01db01 ~]# cat /etc/hosts
#### BEGIN Generated by Exadata. DO NOT MODIFY ####
127.0.0.1       localhost.localdomain   localhost
# 192.168.10.1    dm01db01-priv1.lunar.com    dm01db01-priv1
# 192.168.10.2    dm01db01-priv2.lunar.com    dm01db01-priv2
... (the VIP and SCAN IP entries are omitted here)
#### END Generated by Exadata ####
#### BEGIN Added by Configuration Utility ####
192.168.10.1    dm01db01-priv1    dm01db01-priv1.800best.com
192.168.10.10   dm01cel03-priv2   dm01cel03-priv2.800best.com
192.168.10.2    dm01db01-priv2    dm01db01-priv2.800best.com
192.168.10.3    dm01db02-priv1    dm01db02-priv1.800best.com
192.168.10.4    dm01db02-priv2    dm01db02-priv2.800best.com
192.168.10.5    dm01cel01-priv1   dm01cel01-priv1.800best.com
192.168.10.6    dm01cel01-priv2   dm01cel01-priv2.800best.com
192.168.10.7    dm01cel02-priv1   dm01cel02-priv1.800best.com
192.168.10.8    dm01cel02-priv2   dm01cel02-priv2.800best.com
192.168.10.9    dm01cel03-priv1   dm01cel03-priv1.800best.com
#### END Added by Configuration Utility ####
[root@dm01db01 ~]#
From this we can see that the format of the IB IP entries is wrong. The correct format is:
127.0.0.1 localhost.localdomain localhost
and the wrong one is:
192.168.10.1 dm01db01-priv1.lunar.com dm01db01-priv1
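For illustration only, a sketch of corrected db-node entries, following the IP / FQDN / short-name layout of the localhost line and using the 800best.com domain that appears elsewhere in this file (this is my reconstruction, not a dump of the fixed file):

192.168.10.1    dm01db01-priv1.800best.com    dm01db01-priv1
192.168.10.2    dm01db01-priv2.800best.com    dm01db01-priv2
192.168.10.3    dm01db02-priv1.800best.com    dm01db02-priv1
192.168.10.4    dm01db02-priv2.800best.com    dm01db02-priv2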
After correcting the hosts file as described, pinging by hostname works again:
[root@dm01db01 ~]# ping dm01db01-priv1
PING dm01db01-priv1.800best.com (192.168.10.1) 56(84) bytes of data.
64 bytes from dm01db01-priv1.800best.com (192.168.10.1): icmp_seq=1 ttl=64 time=0.010 ms
64 bytes from dm01db01-priv1.800best.com (192.168.10.1): icmp_seq=2 ttl=64 time=0.010 ms
^C
--- dm01db01-priv1.800best.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1586ms
rtt min/avg/max/mdev = 0.010/0.010/0.010/0.000 ms
[root@dm01db01 ~]# ping dm01db01-priv1.800best.com
PING dm01db01-priv1.800best.com (192.168.10.1) 56(84) bytes of data.
64 bytes from dm01db01-priv1.800best.com (192.168.10.1): icmp_seq=1 ttl=64 time=0.011 ms
64 bytes from dm01db01-priv1.800best.com (192.168.10.1): icmp_seq=2 ttl=64 time=0.008 ms
^C
--- dm01db01-priv1.800best.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1525ms
rtt min/avg/max/mdev = 0.008/0.009/0.011/0.003 ms
[root@dm01db01 ~]#
One more oddity: the hosts files on the cell nodes had the same wrong entries, yet hostname pings from the cells succeeded. I suspect this has something to do with name caching:
[root@dm01cel02 ~]# cat /etc/hosts
#### BEGIN Generated by Exadata. DO NOT MODIFY ####
127.0.0.1       localhost.localdomain   localhost
# 192.168.10.7    dm01cel02-priv1.800best.com    dm01cel02-priv1
# 192.168.10.8    dm01cel02-priv2.800best.com    dm01cel02-priv2
10.45.1.194     dm01cel02.800best.com   dm01cel02
#### END Generated by Exadata ####
#### BEGIN Added by Configuration Utility ####
192.168.10.1    dm01db01-priv1    dm01db01-priv1.800best.com
192.168.10.10   dm01cel03-priv2   dm01cel03-priv2.800best.com
192.168.10.2    dm01db01-priv2    dm01db01-priv2.800best.com
192.168.10.3    dm01db02-priv1    dm01db02-priv1.800best.com
192.168.10.4    dm01db02-priv2    dm01db02-priv2.800best.com
192.168.10.5    dm01cel01-priv1   dm01cel01-priv1.800best.com
192.168.10.6    dm01cel01-priv2   dm01cel01-priv2.800best.com
192.168.10.7    dm01cel02-priv1   dm01cel02-priv1.800best.com
192.168.10.8    dm01cel02-priv2   dm01cel02-priv2.800best.com
192.168.10.9    dm01cel03-priv1   dm01cel03-priv1.800best.com
#### END Added by Configuration Utility ####
[root@dm01cel02 ~]# ping dm01db01-priv1
PING dm01db01-priv1 (192.168.10.1) 56(84) bytes of data.
64 bytes from dm01db01-priv1 (192.168.10.1): icmp_seq=1 ttl=64 time=0.056 ms
64 bytes from dm01db01-priv1 (192.168.10.1): icmp_seq=2 ttl=64 time=0.064 ms
64 bytes from dm01db01-priv1 (192.168.10.1): icmp_seq=3 ttl=64 time=0.051 ms
^C
--- dm01db01-priv1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2637ms
rtt min/avg/max/mdev = 0.051/0.057/0.064/0.005 ms
[root@dm01cel02 ~]# ping dm01db01-priv1.800best.com
PING dm01db01-priv1 (192.168.10.1) 56(84) bytes of data.
64 bytes from dm01db01-priv1 (192.168.10.1): icmp_seq=1 ttl=64 time=0.043 ms
64 bytes from dm01db01-priv1 (192.168.10.1): icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from dm01db01-priv1 (192.168.10.1): icmp_seq=3 ttl=64 time=0.029 ms
^C
--- dm01db01-priv1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2833ms
rtt min/avg/max/mdev = 0.027/0.033/0.043/0.007 ms
[root@dm01cel02 ~]#
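Two quick checks help with the caching theory (a sketch; whether nscd is even enabled on these images varies). getent answers through /etc/nsswitch.conf exactly as applications do, and nscd, if running, can keep serving cached hosts answers after /etc/hosts has been edited:

# what the resolver actually returns for this name on this node
getent hosts dm01db01-priv1

# is the name-service cache daemon running? if so, flush its hosts table
service nscd status
nscd -i hosts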
Now run infinicheck with -b -g once more to check the db nodes' SSH connectivity over IB. This time it completes without errors (the output is long and involves no user input or interaction, so it is not shown here).
With the setup finished, test it:
[root@dm01db01 oracle.SupportTools]# dcli -g all_ibip_group -l root 'date'
192.168.10.1: Sun Apr 5 08:14:58 CST 2015
192.168.10.2: Sun Apr 5 08:14:58 CST 2015
192.168.10.3: Sun Apr 5 08:14:59 CST 2015
192.168.10.4: Sun Apr 5 08:14:58 CST 2015
192.168.10.5: Sun Apr 5 08:14:58 CST 2015
192.168.10.6: Sun Apr 5 08:14:59 CST 2015
192.168.10.7: Sun Apr 5 08:14:58 CST 2015
192.168.10.8: Sun Apr 5 08:14:58 CST 2015
192.168.10.9: Sun Apr 5 08:14:58 CST 2015
192.168.10.10: Sun Apr 5 08:14:58 CST 2015
[root@dm01db01 oracle.SupportTools]#
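The file handed to dcli -g is just a plain-text list with one host or IP per line; judging from the output, all_ibip_group simply holds the ten IB IPs:

# all_ibip_group (contents inferred from the dcli output above)
192.168.10.1
192.168.10.2
192.168.10.3
192.168.10.4
192.168.10.5
192.168.10.6
192.168.10.7
192.168.10.8
192.168.10.9
192.168.10.10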
Notice that each node has two IPs. Starting with release 11.2.3.3.0, Exadata no longer bonds the IB interfaces, in order to increase bandwidth, and the entire IB private network behaves like one big VLAN.
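A quick way to see this on a db node is to look at the IB interfaces directly (ib0 and ib1 are the usual Exadata interface names; a sketch, not captured from this machine):

# each IB port carries its own address now that the interfaces are not bonded
ip -4 -o addr show ib0
ip -4 -o addr show ib1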
Now run infinicheck again. Because my network connection was poor and a full infinicheck run takes five minutes or more, I ran it in the background under screen:
[root@dm01db01 ~]# screen -S lunar
[root@dm01db01 ~]# date;/opt/oracle.SupportTools/ibdiagtools/infinicheck;date
Sun Apr 5 08:33:13 CST 2015
INFINICHECK
        [Network Connectivity, Configuration and Performance]
        [Version IBD VER 2.d ]

Verifying User Equivalance of user=root to all hosts.
(If it isn't setup correctly, an authentication prompt will appear to push keys to all the nodes)
Verifying User Equivalance of user=root to all cells.
(If it isn't setup correctly, an authentication prompt will appear to push keys to all the nodes)

        ####  CONNECTIVITY TESTS  ####
        [COMPUTE NODES -> STORAGE CELLS]
        (30 seconds approx.)
[SUCCESS]..............Results OK
[SUCCESS]....... All can talk to all storage cells

Verifying Subnet Masks on all nodes
[SUCCESS] ......... Subnet Masks is same across the network
Prechecking for uniformity of rds-tools on all nodes
[SUCCESS].... rds-tools version is the same across the cluster
Checking for bad links in the fabric
[SUCCESS].......... No bad fabric links found

        [COMPUTE NODES -> COMPUTE NODES]
        (30 seconds approx.)
[SUCCESS]..............Results OK
[SUCCESS]....... All hosts can talk to all other nodes

        ####  PERFORMANCE TESTS  ####
        [(1) Storage Cell to Compute Node]
        (375 seconds approx)
[ INFO ].............Performance test between 192.168.10.5 and 192.168.10.3 has been started.
[ INFO ].............Performance test between 192.168.10.6 and 192.168.10.4 has been started.
[ INFO ].............Performance test between 192.168.10.7 and 192.168.10.2 has been started.
[ INFO ].............Performance test between 192.168.10.8 and 192.168.10.1 has been started.
[ INFO ].............Performance test between 192.168.10.9 and 192.168.10.3 has been started.
[ INFO ].............Performance test between 192.168.10.10 and 192.168.10.4 has been started.
[SUCCESS]..............Results OK

        [(2) Every COMPUTE NODE to another COMPUTE NODE]
        (195 seconds approx)
[ INFO ].............Performance test between 192.168.10.2 and 192.168.10.3 has been started.
[ INFO ].............Performance test between 192.168.10.1 and 192.168.10.4 has been started.
[SUCCESS]..............Results OK

        [(3) Every COMPUTE NODE to ALL STORAGE CELLS] (looking for SymbolErrors)
        (195 seconds approx)
[ INFO ].............Performance test between 192.168.10.5 and 192.168.10.3 has been started.
[ INFO ].............Performance test between 192.168.10.6 and 192.168.10.4 has been started.
[ INFO ].............Performance test between 192.168.10.7 and 192.168.10.2 has been started.
[ INFO ].............Performance test between 192.168.10.8 and 192.168.10.1 has been started.
[ INFO ].............Performance test between 192.168.10.9 and 192.168.10.3 has been started.
[ INFO ].............Performance test between 192.168.10.10 and 192.168.10.4 has been started.
[SUCCESS]..............Results OK
[SUCCESS]....... No port errors found

INFINICHECK REPORTS SUCCESS FOR NETWORK CONNECTIVITY and PERFORMANCE

----------DIAGNOSTICS -----------
6 Cell ips found: .. 192.168.10.5 | 192.168.10.6 | 192.168.10.7 | 192.168.10.8 | 192.168.10.9 | 192.168.10.10
4 Host ips found: .. 192.168.10.3 | 192.168.10.4 | 192.168.10.2 | 192.168.10.1

########## Host to Cell Connectivity ##########
Analyzing cells_conntest.log...
[SUCCESS]..... All nodes can talk to all other nodes

Now Analyzing Compute Node-Compute Node connectivity
########## Inter-Host Connectivity ##########
Analyzing hosts_conntest.log...
[SUCCESS]..... All hosts can talk to all its peers

########## Performance Diagnostics ##########
### [(1) STORAGE CELL to COMPUTE NODE ######
Analyzing perf_cells.log.* logfile(s)....
--------Throughput results using rds-stress --------
2300 MB/s and above is expected for runs on quiet machines
dm01db02( 192.168.10.3 ) to dm01cel01( 192.168.10.5 ) : 3987 MB/s...OK
dm01db02( 192.168.10.4 ) to dm01cel01( 192.168.10.6 ) : 4204 MB/s...OK
dm01db01( 192.168.10.2 ) to dm01cel02( 192.168.10.7 ) : 3848 MB/s...OK
dm01db01( 192.168.10.1 ) to dm01cel02( 192.168.10.8 ) : 3876 MB/s...OK
dm01db02( 192.168.10.3 ) to dm01cel03( 192.168.10.9 ) : 3868 MB/s...OK
dm01db02( 192.168.10.4 ) to dm01cel03( 192.168.10.10 ) : 3971 MB/s...OK

########## Performance Diagnostics ##########
#### [(2) Every DBNODE to its PEER ######
Analyzing perf_hosts.log.* logfile(s)....
--------Throughput results using rds-stress --------
2300 MB/s and above is expected for runs on quiet machines
dm01db02( 192.168.10.3 ) to dm01db01( 192.168.10.2 ) : 3958 MB/s...OK
dm01db02( 192.168.10.4 ) to dm01db01( 192.168.10.1 ) : 3865 MB/s...OK
-------------------------
Results are available in the file diagnostics.output
Sun Apr 5 08:39:30 CST 2015
[root@dm01db01 ~]#
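A side note on the screen session: had the SSH connection dropped during the roughly six-minute run, infinicheck would have kept running inside it, and the session can be picked up again afterwards:

# list sessions, then reattach to the one created with screen -S lunar
screen -ls
screen -r lunar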
As you can see, the throughput is around 3.8 GB/s to 4 GB/s (note: capital B, i.e. bytes). Very impressive, haha.
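As a sanity check on those numbers: the X5-2's IB ports are 40 Gbit/s QDR, and after the 8b/10b encoding overhead that leaves 32 Gbit/s of payload, i.e. 32 / 8 = 4 GB/s per port. So 3.8 to 4 GB/s from rds-stress is essentially line rate.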