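Contact: QQ (5163721)
Title: With ASM NORMAL REDUNDANCY, who performs the mirrored data I/O?
Author: Lunar © All rights reserved [Reposting is permitted, but the source must be credited with a link; otherwise legal liability will be pursued.]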
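A few days ago, some friends were discussing whether, for a NORMAL redundancy disk group in ASM, the mirroring of data is performed by the Oracle RDBMS processes or by the ASM processes.
As we know, an ASM NORMAL REDUNDANCY disk group behaves much like RAID 10, that is, mirroring plus striping (although ASM mirrors at the extent level rather than mirroring whole disks).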
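In a traditional architecture, Oracle writes only one copy of the data, and data protection (mirroring) is handled by the storage array or the RAID controller. So in ASM, does the DB likewise perform a single write, with ASM then synchronizing the mirror copy?
Based on the test below, the conclusion is:
The DB processes perform all I/O for application data in the database, including the I/O for the mirror copies. The ASM processes are responsible only for maintaining ASM metadata (metadata extents) and for the I/O on it.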
The test in detail:
First, we create a normal redundancy disk group to hold the database redo logs, for example +REDODG:
SQL>select GROUP_NUMBER,NAME,SECTOR_SIZE,BLOCK_SIZE,ALLOCATION_UNIT_SIZE,TYPE FROM V$ASM_DISKGROUP where name='REDODG';

GROUP_NUMBER NAME                           SECTOR_SIZE BLOCK_SIZE ALLOCATION_UNIT_SIZE TYPE
------------ ------------------------------ ----------- ---------- -------------------- ------
           6 REDODG                                 512       4096              1048576 NORMAL

SQL>
SQL>col path for a50
SQL>col library for a15
SQL>select GROUP_NUMBER,DISK_NUMBER,REDUNDANCY,LIBRARY,NAME,PATH from v$asm_disk WHERE GROUP_NUMBER=6;

GROUP_NUMBER DISK_NUMBER REDUNDA LIBRARY         NAME                           PATH
------------ ----------- ------- --------------- ------------------------------ --------------------------------------------------
           6           1 UNKNOWN System          REDODG_0001                    /dev/mapper/redolun2
           6           0 UNKNOWN System          REDODG_0000                    /dev/mapper/redolun1

SQL>
The failure group information for these two disks is as follows:
SQL>SELECT GROUP_NUMBER,DISK_NUMBER,STATE,REDUNDANCY,LIBRARY,NAME,FAILGROUP,PATH,REPAIR_TIMER FROM V$ASM_DISK WHERE GROUP_NUMBER=6;

GROUP_NUMBER DISK_NUMBER STATE    REDUNDA LIBRARY         NAME                           FAILGROUP
------------ ----------- -------- ------- --------------- ------------------------------ ------------------------------
PATH                                               REPAIR_TIMER
-------------------------------------------------- ------------
           6           1 NORMAL   UNKNOWN System          V5DATA_0001                    V5DATA_0001
/dev/mapper/v5lun2                                            0

           6           0 NORMAL   UNKNOWN System          V5DATA_0000                    V5DATA_0000
/dev/mapper/v5lun1                                            0

SQL>
[oracle@lunardb1 ~]$ ll /dev/mapper/redolun*
brw-rw---- 1 oracle oinstall 253, 8 Jun 16 10:39 /dev/mapper/redolun1
brw-rw---- 1 oracle oinstall 253, 9 Jun 16 10:39 /dev/mapper/redolun2
[oracle@lunardb1 ~]$
Next, we created 9 redo log groups on REDODG (all redo logs of this 10.2.0.4 RAC database are placed on it):
[oracle@lunardb1 ~]$ ss

SQL*Plus: Release 10.2.0.4.0 - Production on Tue Jun 16 10:37:49 2015

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Data Mining
and Real Application Testing options

sys@LUNAR>select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIM
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------
         1          1       4623   52428800          1 YES INACTIVE             902491454 16-JUN-15
         2          1       4621   52428800          1 YES INACTIVE             901886291 16-JUN-15
         3          2       1621   52428800          1 YES INACTIVE             900432674 16-JUN-15
         4          2       1624   52428800          1 NO  CURRENT              902514208 16-JUN-15
         5          1       4624 1073741824          1 YES INACTIVE             902511227 16-JUN-15
         6          1       4625 1073741824          1 NO  CURRENT              903006387 16-JUN-15
         7          1       4622 1073741824          1 YES INACTIVE             901890974 16-JUN-15
         8          2       1622 1073741824          1 YES INACTIVE             901661757 16-JUN-15
         9          2       1623 1073741824          1 YES INACTIVE             901886509 16-JUN-15

9 rows selected.

sys@LUNAR>col member for a70
sys@LUNAR>select * from v$logfile;

    GROUP# STATUS  TYPE    MEMBER                                                                 IS_
---------- ------- ------- ---------------------------------------------------------------------- ---
         2         ONLINE  +REDODG/lunar/onlinelog/group_2.269.855587247                          NO
         1         ONLINE  +REDODG/lunar/onlinelog/group_1.270.855587247                          NO
         3         ONLINE  +REDODG/lunar/onlinelog/group_3.264.855587433                          NO
         4         ONLINE  +REDODG/lunar/onlinelog/group_4.263.855587433                          NO
         5         ONLINE  +REDODG/lunar/onlinelog/group_5.341.855591573                          NO
         6         ONLINE  +REDODG/lunar/onlinelog/group_6.303.855591671                          NO
         7         ONLINE  +REDODG/lunar/onlinelog/group_7.403.855591683                          NO
         8         ONLINE  +REDODG/lunar/onlinelog/redo08.log                                     NO
         9         ONLINE  +REDODG/lunar/onlinelog/redo09.log                                     NO

9 rows selected.
The LGWR process ID of this database instance is 11159:
[oracle@lunardb1 ~]$ ps -ef|grep lgwr|grep lunar
oracle   11159     1  0 Mar03 ?        08:01:25 ora_lgwr_lunar1
[oracle@lunardb1 ~]$

sys@lunar>select spid from v$process where PROGRAM like '%LGWR%';

SPID
------------
11159

sys@lunar>
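The PID lookup above can also be scripted. The following is only a minimal sketch: the helper name lgwr_pid is made up for illustration, and it assumes the background process is named ora_lgwr_<instance>, as in the ps output above.

```shell
# Hypothetical helper (not from the original post): read `ps -ef` output on
# stdin and print the PID of the LGWR background process of one instance.
# $1 is the instance name, e.g. lunar1.
lgwr_pid() {
    # Compare the last column exactly, so that grep/awk processes whose
    # command lines merely contain the string are never reported.
    awk -v proc="ora_lgwr_$1" '$NF == proc { print $2 }'
}

# Usage (on the database host):
#   ps -ef | lgwr_pid lunar1
```

Matching on the exact process name avoids the usual second grep that filters out the grep process itself.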
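Now we use strace to trace what this process does when the database switches logs. If LGWR writes to only one device, for example /dev/mapper/redolun1 or /dev/mapper/redolun2, then we can go on to trace the ASMB process.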
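If LGWR writes to both devices, that is, the corresponding I/O is issued to both /dev/mapper/redolun2 and /dev/mapper/redolun1, then we can conclude that the database's LGWR itself performs all the I/O for both the primary extent and the mirror extent.
This is also what the Oracle documentation has always stated: "ASM is responsible for the I/O on the ASM instance's metadata, while the DB performs the I/O on the actual application data."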
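The original does not show the strace invocation. The log layout in this post (PID, then a relative timestamp, then the system call) is consistent with attaching to the process with -p, using -f and -r, and writing to a file with -o, so a hypothetical reconstruction might look like the sketch below. The helper name trace_cmd is invented for illustration, and it only prints the command instead of running it.

```shell
# Hypothetical reconstruction; the exact command used by the author is unknown.
# $1: PID to attach to, $2: output file. Prints the command without running it.
trace_cmd() {
    # -f : follow child processes (and prefix each line with the PID)
    # -r : print a relative timestamp for each system call
    # -o : write the trace to a file
    # -p : attach to an already running process
    printf 'strace -f -r -o %s -p %s\n' "$2" "$1"
}

# Usage (as a user allowed to attach to the process):
#   eval "$(trace_cmd 11159 /tmp/lgwr_lunar1_strace-1.log)"
# then force a log switch from another session so that LGWR has work to do.
```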
The trace file is as follows:
First, we can see Oracle writing the same content to two file descriptors:
[oracle@lunardb1 ~]$ tail -f /tmp/lgwr_lunar1_strace-1.log
......
11159 0.000078 times(NULL) = 1336555656
11159 0.000043 pread(16, "\1\"\0\0\1\0\0\0\26\22\0\0\0\200\245K\0\0\0\0\0\3 \n-\250\371\232lunar"..., 512, 105444803072) = 512
11159 0.007057 times(NULL) = 1336555657
11159 0.000081 times(NULL) = 1336555657
11159 0.000045 pread(16, "\25\302\0\0f\0\0\0\342\334\6\0\377\377\1\4\375-\0\0\3\0\2\0\0\0\0\0\0\0+"..., 16384, 586383360) = 16384
11159 0.000222 times(NULL) = 1336555657
11159 0.000077 times(NULL) = 1336555657
11159 0.000059 pwrite(17, "\1\"\0\0\1\0\0\0\27\22\0\0\0\200\246\335\0\0\0\0\0\3 \n-\250\371\232lunar"..., 512, 1400898048) = 512
11159 0.005443 times(NULL) = 1336555658
11159 0.000063 times(NULL) = 1336555658
11159 0.000049 pwrite(16, "\1\"\0\0\1\0\0\0\27\22\0\0\0\200\246\335\0\0\0\0\0\3 \n-\250\371\232lunar"..., 512, 1400898048) = 512
11159 0.004075 times(NULL) = 1336555658
11159 0.000098 times(NULL) = 1336555658
11159 0.000120 pread(16, "\1\"\0\0\1\0\0\0\26\22\0\0\0\200\245K\0\0\0\0\0\3 \n-\250\371\232lunar"..., 512, 105444803072) = 512
11159 0.000148 times(NULL) = 1336555658
11159 0.000068 times(NULL) = 1336555658
11159 0.000044 pwrite(16, "\1\"\0\0\1\0\0\0\26\22\0\0\0\200\255\364\0\0\0\0\0\3 \n-\250\371\232lunar"..., 512, 105444803072) = 512
11159 0.000472 times(NULL) = 1336555658
11159 0.000060 times(NULL) = 1336555658
11159 0.000052 pwrite(17, "\1\"\0\0\1\0\0\0\26\22\0\0\0\200\255\364\0\0\0\0\0\3 \n-\250\371\232lunar"..., 512, 105444803072) = 512
11159 0.000399 times(NULL) = 1336555658
11159 0.000075 times(NULL) = 1336555658
......
The trace above clearly shows LGWR writing two identical copies of the data in succession, to the devices behind fds 16 and 17.
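Spotting the paired writes by eye works for a short excerpt; for a longer log, a small filter can help. The sketch below is an illustration only: the function name mirrored_writes is hypothetical, and it assumes, as happens to be the case in this trace, that both copies land at the same offset on the two disks, which ASM does not guarantee in general. It compares offset and length, not the buffer contents.

```shell
# Hypothetical helper: list every (offset, length) pair that was written with
# pwrite() on more than one file descriptor in an strace log ($1).
mirrored_writes() {
    awk '
    $3 ~ /^pwrite\(/ {
        # Field 3 looks like "pwrite(17," ; strip the call name and comma.
        fd = $3; sub(/^pwrite\(/, "", fd); sub(/,$/, "", fd)
        # Count from the end of the line, because the quoted buffer may
        # itself contain spaces: "... <len>, <offset>) = <ret>"
        off = $(NF-2); sub(/\)$/, "", off)
        len = $(NF-3); sub(/,$/, "", len)
        key = off " " len
        if (!((key, fd) in seen)) {
            seen[key, fd] = 1
            fds[key] = fds[key] " " fd
            nfd[key]++
        }
    }
    END {
        for (k in nfd)
            if (nfd[k] > 1)
                print k ":" fds[k]
    }' "$1" | sort -n
}

# Usage:
#   mirrored_writes /tmp/lgwr_lunar1_strace-1.log
```

Each output line has the form "offset length: fd fd", so a line listing both 16 and 17 corresponds to one mirrored write.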
So what are 16 and 17?
[oracle@lunardb1 fd]$ cd /proc/11159/fd
[oracle@lunardb1 fd]$ ls -lrt
total 0
lr-x------ 1 oracle oinstall 64 Jun 13 17:04 0 -> /dev/null
lrwx------ 1 oracle oinstall 64 Jun 13 17:04 9 -> /u01/oracle/app/product/10.2/db_1/dbs/lkinstlunar1 (deleted)
l-wx------ 1 oracle oinstall 64 Jun 13 17:04 8 -> /u01/oracle/app/admin/lunar/bdump/alert_lunar1.log
lrwx------ 1 oracle oinstall 64 Jun 13 17:04 7 -> /u01/oracle/app/product/10.2/db_1/dbs/hc_lunar1.dat
l-wx------ 1 oracle oinstall 64 Jun 13 17:04 6 -> /u01/oracle/app/admin/lunar/bdump/alert_lunar1.log
l-wx------ 1 oracle oinstall 64 Jun 13 17:04 5 -> /u01/oracle/app/admin/lunar/udump/lunar1_ora_11099.trc
lr-x------ 1 oracle oinstall 64 Jun 13 17:04 4 -> /dev/null
lr-x------ 1 oracle oinstall 64 Jun 13 17:04 3 -> /dev/null
l-wx------ 1 oracle oinstall 64 Jun 13 17:04 2 -> /u01/oracle/app/admin/lunar/bdump/lunar1_lgwr_11159.trc
lr-x------ 1 oracle oinstall 64 Jun 13 17:04 18 -> /u01/oracle/app/product/10.2/db_1/rdbms/mesg/oraus.msb
lrwx------ 1 oracle oinstall 64 Jun 13 17:04 17 -> /dev/mapper/redolun2
lrwx------ 1 oracle oinstall 64 Jun 13 17:04 16 -> /dev/mapper/redolun1
lrwx------ 1 oracle oinstall 64 Jun 13 17:04 15 -> socket:[32662]
lrwx------ 1 oracle oinstall 64 Jun 13 17:04 14 -> /u01/oracle/app/product/10.2/db_1/dbs/hc_lunar1.dat
lr-x------ 1 oracle oinstall 64 Jun 13 17:04 13 -> /u01/oracle/app/product/10.2/db_1/rdbms/mesg/oraus.msb
lr-x------ 1 oracle oinstall 64 Jun 13 17:04 12 -> /dev/zero
lr-x------ 1 oracle oinstall 64 Jun 13 17:04 11 -> /dev/zero
lrwx------ 1 oracle oinstall 64 Jun 13 17:04 10 -> socket:[32659]
lr-x------ 1 oracle oinstall 64 Jun 13 17:04 1 -> /dev/null
[oracle@lunardb1 fd]$
[oracle@lunardb1 fd]$ ll 17
lrwx------ 1 oracle oinstall 64 Jun 13 17:04 17 -> /dev/mapper/redolun2
[oracle@lunardb1 fd]$ ll 16
lrwx------ 1 oracle oinstall 64 Jun 13 17:04 16 -> /dev/mapper/redolun1
[oracle@lunardb1 fd]$
Here we can see that 16 and 17 are exactly the two disks used by REDODG. In other words, LGWR itself performs the I/O for both the primary extent and the mirror extent.
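The fd-to-device lookup can be done directly with readlink instead of listing the whole directory. This is a small sketch assuming a Linux /proc filesystem; the function name fd_targets is made up for illustration.

```shell
# Hypothetical helper: for a given PID ($1), print the target of each file
# descriptor number passed as a further argument, one "fd -> target" per line.
fd_targets() {
    pid="$1"
    shift
    for fd in "$@"; do
        # /proc/<pid>/fd/<n> is a symbolic link to the open file or device.
        printf '%s -> %s\n' "$fd" "$(readlink "/proc/$pid/fd/$fd")"
    done
}

# Usage (as the process owner or root):
#   fd_targets 11159 16 17
```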
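This makes things quite clear, and we can infer that the I/O of DBWR and the other database processes is likewise performed by the DB's own processes, while ASM is responsible only for the I/O and maintenance of its metadata.
Anyone interested can run the detailed traces themselves.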
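The trace also shows that Oracle uses asynchronous I/O (io_submit, io_getevents, and so on) to periodically synchronize the control file information, again writing to both devices, fds 16 and 17.
It then notifies the ARCH process to archive the log and, once that is done, writes to alert.log: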
......
11159 0.000050 io_submit(46982646722560, 2, {{0x2abafffaa3c8, 0, 1, 0, 16}, {0x2abafffaa6d8, 0, 1, 0, 17}}) = 2
11159 0.000127 io_getevents(46982646722560, 1, 1024, {{0x2abafffaa3c8, 0x2abafffaa3c8, 16384, 0}}, {600, 0}) = 1
11159 0.000545 times(NULL) = 1336555658
11159 0.000053 io_getevents(46982646722560, 1, 1023, {{0x2abafffaa6d8, 0x2abafffaa6d8, 16384, 0}}, {600, 0}) = 1
11159 0.000048 times(NULL) = 1336555658
11159 0.000075 times(NULL) = 1336555658
11159 0.000068 pread(17, "\25\302\0\0\t\3\0\0\376\326\6\0\377\377\1\4\311N\0\0\4\2\2\0\337\7\0\0\0\0\0\0"..., 16384, 591937536) = 16384
11159 0.000230 times(NULL) = 1336555658
11159 0.000086 times(NULL) = 1336555658
11159 0.000045 pread(17, "\25\302\0\0\253\1\0\0\t\251\n\0\377\377\1\0041\330\0\0\270\35\2264\1\0\0\0\363\27\0\0"..., 16384, 588038144) = 16384
11159 0.005841 times(NULL) = 1336555659
11159 0.000094 times(NULL) = 1336555659
11159 0.000046 pread(17, "\25\302\0\0.\0\0\0)\251\n\0\377\377\1\4\356\231\0\0\0\220\1\0\24\22\0\0\2\0\0\0"..., 16384, 587300864) = 16384
11159 0.000195 times(NULL) = 1336555659
11159 0.000076 times(NULL) = 1336555659
11159 0.000099 io_submit(46982646722560, 2, {{0x2abafffaa3c8, 0, 1, 0, 16}, {0x2abafffaa6d8, 0, 1, 0, 17}}) = 2
11159 0.000164 io_getevents(46982646722560, 1, 1024, {{0x2abafffaa6d8, 0x2abafffaa6d8, 16384, 0}, {0x2abafffaa3c8, 0x2abafffaa3c8, 16384, 0}}, {600, 0}) = 2
11159 0.000329 times(NULL) = 1336555659
11159 0.000050 times(NULL) = 1336555659
11159 0.000065 times(NULL) = 1336555659
11159 0.000045 pread(17, "\25\302\0\0,\0\0\0\27\251\n\0\377\377\1\4\310Z\0\0\17\0\0\0-\246\3465\0\0\375>"..., 16384, 587268096) = 16384
11159 0.000221 times(NULL) = 1336555659
11159 0.000098 times(NULL) = 1336555659
11159 0.000054 io_submit(46982646722560, 2, {{0x2abafffaa3c8, 0, 1, 0, 16}, {0x2abafffaa6d8, 0, 1, 0, 17}}) = 2
11159 0.000121 io_getevents(46982646722560, 1, 1024, {{0x2abafffaa6d8, 0x2abafffaa6d8, 16384, 0}}, {600, 0}) = 1
11159 0.000379 times(NULL) = 1336555659
11159 0.000048 io_getevents(46982646722560, 1, 1023, {{0x2abafffaa3c8, 0x2abafffaa3c8, 16384, 0}}, {600, 0}) = 1
11159 0.000047 times(NULL) = 1336555659
11159 0.000077 times(NULL) = 1336555659
11159 0.000053 io_submit(46982646722560, 2, {{0x2abafffaa3c8, 0, 1, 0, 16}, {0x2abafffaa6d8, 0, 1, 0, 17}}) = 2
11159 0.000108 io_getevents(46982646722560, 1, 1024, {{0x2abafffaa6d8, 0x2abafffaa6d8, 16384, 0}, {0x2abafffaa3c8, 0x2abafffaa3c8, 16384, 0}}, {600, 0}) = 2
11159 0.000425 times(NULL) = 1336555659
11159 0.000038 times(NULL) = 1336555659
11159 0.000073 times(NULL) = 1336555659
11159 0.000050 io_submit(46982646722560, 2, {{0x2abafffaa3c8, 0, 1, 0, 16}, {0x2abafffaa6d8, 0, 1, 0, 17}}) = 2
11159 0.000114 io_getevents(46982646722560, 1, 1024, {{0x2abafffaa3c8, 0x2abafffaa3c8, 16384, 0}}, {600, 0}) = 1
11159 0.000421 times(NULL) = 1336555659
11159 0.000041 io_getevents(46982646722560, 1, 1023, {{0x2abafffaa6d8, 0x2abafffaa6d8, 16384, 0}}, {600, 0}) = 1
11159 0.000047 times(NULL) = 1336555659
11159 0.000076 times(NULL) = 1336555659
11159 0.000054 io_submit(46982646722560, 2, {{0x2abafffaa6d8, 0, 1, 0, 16}, {0x2abafffaa3c8, 0, 1, 0, 17}}) = 2
11159 0.000128 io_getevents(46982646722560, 1, 1024, {{0x2abafffaa6d8, 0x2abafffaa6d8, 16384, 0}, {0x2abafffaa3c8, 0x2abafffaa3c8, 16384, 0}}, {600, 0}) = 2
11159 0.000318 times(NULL) = 1336555659
11159 0.000038 times(NULL) = 1336555659
11159 0.000060 times(NULL) = 1336555659
11159 0.000044 pread(16, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4\16\243\0\0\0\0\0\0\0\3 \n-\250\371\232"..., 16384, 581976064) = 16384
11159 0.000244 times(NULL) = 1336555660
11159 0.000067 times(NULL) = 1336555660
11159 0.000117 times(NULL) = 1336555660
11159 0.000044 times(NULL) = 1336555660
11159 0.000037 times(NULL) = 1336555660
11159 0.000343 times(NULL) = 1336555660
11159 0.000065 semctl(720901, 51, SETVAL, 0x7fff00000001) = 0
11159 0.000081 times(NULL) = 1336555660
11159 0.000053 pread(16, "\25\302\0\0f\0\0\0\342\334\6\0\377\377\1\4\375-\0\0\3\0\2\0\0\0\0\0\0\0+V"..., 16384, 586383360) = 16384
11159 0.000234 times(NULL) = 1336555660
11159 0.000062 times(NULL) = 1336555660
11159 0.000081 semctl(720901, 18, SETVAL, 0x2abb00000001) = 0
11159 0.000062 semctl(720901, 19, SETVAL, 0x2abb00000001) = 0
11159 0.000123 semctl(720901, 20, SETVAL, 0x2abb00000001) = 0
11159 0.000251 open("/proc/11356/stat", O_RDONLY) = 19
11159 0.000113 read(19, "11356 (oracle) S 1 11356 11356 0"..., 999) = 249
11159 0.000118 close(19) = 0
11159 0.000120 semctl(720901, 36, SETVAL, 0x2abb00000001) = 0
11159 0.000239 close(8) = 0
11159 0.000044 open("/u01/oracle/app/admin/lunar/bdump/alert_lunar1.log", O_WRONLY|O_CREAT|O_APPEND, 0660) = 8
11159 0.000069 writev(8, [{"Tue Jun 16 14:47:51 2015\n", 25}, {"Thread 1 advanced to log sequenc"..., 52}, {"\n", 1}], 3) = 78
11159 0.000075 times(NULL) = 1336555660
11159 0.000043 times(NULL) = 1336555660
11159 0.000053 close(8) = 0
11159 0.000053 open("/u01/oracle/app/admin/lunar/bdump/alert_lunar1.log", O_WRONLY|O_CREAT|O_APPEND, 0660) = 8
11159 0.000057 writev(8, [{" Current log# 2 seq# 4631 mem# "..., 79}, {"\n", 1}], 2) = 80
11159 0.000061 times(NULL) = 1336555660
11159 0.000043 times(NULL) = 1336555660
11159 0.000043 semtimedop(720901, 0x7fff585eeef0, 1, {1, 960000000}) = 0
11159 0.105071 times(NULL) = 1336555670
11159 0.000058 times(NULL) = 1336555670
11159 0.000102 times(NULL) = 1336555670
......
[oracle@lunardb1 ~]$
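To summarize which devices the asynchronous writes go to, the io_submit lines can be tallied per file descriptor. The sketch below rests on one assumption about this older strace output format: the last of the five values printed for each request appears to be the file descriptor (16 and 17 here). The function name aio_fds is hypothetical.

```shell
# Hypothetical helper: count, per file descriptor, how many requests were
# queued with io_submit() in an strace log ($1).
# Assumption: each request is printed as {data, key, opcode, prio, fd}.
aio_fds() {
    awk '
    index($0, "io_submit(") > 0 {
        # Split the line at every closing brace; each piece that still
        # contains "{0x" holds exactly one request.
        n = split($0, parts, "}")
        for (i = 1; i <= n; i++) {
            if (index(parts[i], "{0x") > 0) {
                m = split(parts[i], f, ", ")
                count[f[m]]++        # last comma-separated value = fd
            }
        }
    }
    END {
        for (fd in count)
            print fd, count[fd]
    }' "$1" | sort -n
}

# Usage:
#   aio_fds /tmp/lgwr_lunar1_strace-1.log
```

Equal counts for fds 16 and 17 would be consistent with every asynchronous write being issued once per mirror side.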
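At this point we can conclude that redundancy in ASM is handled in two parts:
1. For the actual application data in the database, both the primary extent and the mirror extent are written by the database itself.
2. Mirroring of the ASM metadata is performed by the ASM processes themselves.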