关于本文本文是Oracle AWR(Automatic Workload Repository)报告的完整解析指南,涵盖所有关键性能指标。建议配合实际AWR报告阅读,以深入理解各项指标的含义。
什么是AWR?
AWR (Automatic Workload Repository) 是Oracle 10g开始引入的自动工作负载存储库,用于收集和维护数据库性能统计信息。
关键知识点- AWR数据存储在SYSAUX表空间
- 默认快照间隔1小时
- 10g保存7天,11g保存8天
- 可通过
DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS修改配置
AWR维护进程
AWR主要由MMON(Manageability Monitor Process)和它的工作进程(m00x)维护:
- 启动slave进程m00x做AWR快照
- 当某个度量阀值被超过时发出alert告警
- 为最近改变过的SQL对象捕获指标信息
AWR常用操作
-- 手动创建快照
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;
-- 创建AWR基线
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE(
start_snap_id,
end_snap_id,
baseline_name
);
-- 生成AWR报告
@?/rdbms/admin/awrrpt -- 本实例
@?/rdbms/admin/awrrpti -- RAC中选择实例号
@?/rdbms/admin/awrddrpt -- AWR比对报告
@?/rdbms/admin/awrgrpt -- RAC全局AWR
1. 报告总结 Report Summary
报告头部信息
WORKLOAD REPOSITORY report for
DB Name DB Id Instance Inst Num Startup Time Release RAC
------------ ----------- ------------ -------- --------------- ----------- ---
MAC 2629627371 askmaclean.com 1 22-Jan-13 16:49 11.2.0.3.0 YES
Host Name Platform CPUs Cores Sockets Memory(GB)
---------------- -------------------------------- ---- ----- ------- ----------
MAC10 AIX-Based Systems (64-bit) 128 32 320.00
Snap Id Snap Time Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin Snap: 5853 23-Jan-13 15:00:56 3,520 1.8
End Snap: 5854 23-Jan-13 15:30:41 3,765 1.9
Elapsed: 29.75 (mins)
DB Time: 7,633.76 (mins)
关键概念:DB TimeDB TIME = 所有前台session花费在database调用上的总和时间
- 仅包含前台进程(foreground sessions)
- 包括CPU时间、IO Time、和其他非空闲等待时间
- DB Time ≠ 响应时间
核心公式 DB TIME = DB CPU + Non-Idle Wait + Wait on CPU queue
Average Active Session (AAS) = DB Time / Elapsed Time
AAS示例解读:
| DB Time | Elapsed Time | AAS | 分析 |
|---|---|---|---|
| 60 min | 60 min | 1 | 负载一般 |
| 1 min | 60 min | 0.017 | 负载很轻 |
| 60000 min | 60 min | 1000 | 系统可能hang了 |
1.1内存参数大小 Cache Sizes
Cache Sizes Begin End
~~~~~~~~~~~ ---------- ----------
Buffer Cache: 49,152M 49,152M Std Block Size: 8K
Shared Pool Size: 13,312M 13,312M Log Buffer: 334,848K
注意事项- 在ASMM、AMM和11gR2 MSMM下,Buffer Cache和Shared Pool Size的begin/end值可能会变化
- Shared Pool一直收缩可能导致row cache lock等解析等待
- Shared Pool持续grow说明原有大小不足以满足需求(可能是大量硬解析)
1.2Load Profile 负载画像
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~ --------------- --------------- ---------- ----------
DB Time(s): 256.6 0.2 0.07 0.03
DB CPU(s): 3.7 0.0 0.00 0.00
Redo size: 1,020,943.0 826.5
Logical reads: 196,888.0 159.4
Block changes: 6,339.4 5.1
Physical reads: 5,076.7 4.1
Physical writes: 379.2 0.3
User calls: 10,157.4 8.2
Parses: 204.0 0.2
Hard parses: 0.9 0.0
W/A MB processed: 5.0 0.0
Logons: 1.7 0.0
Executes: 3,936.6 3.2
Rollbacks: 1,126.3 0.9
Transactions: 1,235.3
% Blocks changed per Read: 53.49 Recursive Call %: 98.04
Rollback per transaction %: 36.57 Rows per Sort: 73.70
Load Profile 关键指标解读
| 指标 | 说明 |
|---|---|
| Redo size | 单位bytes,用于估量update/insert/delete的频率。大的redo size对LGWR写日志和ARCH归档造成I/O压力 |
| Logical Read | 单位:次数×块数。逻辑读耗CPU,主频和核数都很重要。逻辑读高则DB CPU往往高,也可能看到latch: cache buffer chains等待 |
| Block changes | 单位:次数×块数,描绘数据变化频率 |
| Physical Read | 单位:次数×块数。物理读消耗IO读,体现在IOPS和吞吐量等维度上 |
| Physical writes | 主要是DBWR写datafile,也有direct path write。DBWR长期写出慢会导致log file switch(checkpoint not complete) |
| Parses | 解析次数,包括软解析+硬解析 |
| Hard parses | 万恶之源,可能导致Cursor pin s on X, library cache: mutex X等等待。硬解析最好少于每秒20次 |
| Executes | 执行次数,反应执行频率 |
| Transactions | 每秒事务数,是数据库层的TPS |
派生指标公式 % Blocks changed per Read = (block changes) / (logical reads)
Recursive Call % = (recursive calls) / (user calls)
Rollback per transaction % = (rollback) / (transactions)
Rows per Sort = (sorts(rows)) / (sorts(disk) + sorts(memory))
1.3Instance Efficiency Percentages 实例效率
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.97 Redo NoWait %: 100.00
Buffer Hit %: 97.43 In-memory Sort %: 100.00
Library Hit %: 99.88 Soft Parse %: 99.58
Execute to Parse %: 94.82 Latch Hit %: 99.95
Parse CPU to Parse Elapsd %: 1.75 % Non-Parse CPU: 99.85
健康指标参考值- 80%以上: %Non-Parse CPU
- 90%以上: Buffer Hit%, In-memory Sort%, Soft Parse%
- 95%以上: Library Hit%, Redo Nowait%, Buffer Nowait%
- 98%以上: Latch Hit%
关键效率指标详解
| 指标 | 含义 | 计算公式 |
|---|---|---|
| Buffer Nowait % | session申请buffer不等待的比例 | 1 – sum(waitstat.wait_count) / session logical reads |
| Buffer Hit % | 高速缓存命中率 | 1 – (physical reads cache) / (consistent gets + db block gets) |
| Library Hit % | library cache命中率 | 1 – (pin Requests × Pct Miss) / Sum(Pin Requests) |
| Soft Parse % | 软解析比例 | 1 – parse count(hard) / parse count(total) |
| Execute to Parse % | 执行解析比 | 1 – (parse count / execute count) |
| Latch Hit % | willing-to-wait latch不等待的比例 | 1 – (Sum(misses) / Sum(gets)) |
| % Non-Parse CPU | 非解析CPU比例 | (DB CPU – Parse CPU) / DB CPU |
关于Execute to ParseExecute to Parse反映了执行解析比。如果soft parse%接近99%而Execute to Parse不足90%,说明执行解析比低,需要通过静态SQL、动态绑定、session_cached_cursor、open cursors等技术减少软解析。
1.4Shared Pool Statistics
Shared Pool Statistics Begin End
------ ------
Memory Usage %: 84.64 79.67
% SQL with executions>1: 93.77 24.69
% Memory for SQL w/exec>1: 85.36 34.80
| 指标 | 说明 |
|---|---|
| Memory Usage % | (shared pool实时大小 – free memory) / shared pool实时大小,代表空间使用率 |
| % SQL with executions>1 | 复用SQL占总SQL的比率。如果小于90%,需要关注非绑定变量SQL |
| % Memory for SQL w/exec>1 | 执行2次以上的SQL占用内存占总SQL内存的比率 |
SQL复用率低的解决方案短期:ALTER SYSTEM SET CURSOR_SHARING=FORCE;
长期:修改代码使用绑定变量
注意:不推荐使用CURSOR_SHARING=SIMILAR,Oracle从11g开始已宣称废弃此选项。
1.5Top 5 Timed Events
Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
gc buffer busy 79,083 73,024 923 65.4 Cluster
enq: TX - row lock contention 35,068 17,123 488 15.3 Application
CPU time 12,205 10.9
gc current request 2,714 3,315 1221 3.0 Cluster
gc cr multi block request 83,666 1,008 12 0.9 Cluster
Top 5 Events是AWR报告中最重要的部分它指出了数据库sessions花费时间最多的等待事件。基于Wait Interface的调优是目前的主流方法。
等待类型(Wait Class)说明
| Wait Class | 说明 |
|---|---|
| User I/O | 前台进程I/O操作,如db file sequential/scattered read |
| System I/O | 后台进程维护I/O,如control file parallel write、log file parallel write |
| Concurrency | 并行争用类型,如latch: shared pool、latch: library cache |
| Cluster | RAC环境等待事件,如gc buffer busy |
| Application | 应用造成的等待,如enq: TM/TX contention |
| Configuration | 配置引起的等待,如log file switch、enq: SQ contention |
| Commit | 仅log file sync |
| Network | 网络类型,如SQL*Net more data to client |
常见等待事件分析
db file scattered read平均等待时间应小于20ms。一般表明正在做全表扫描或全索引扫描(Multi block I/O)。需要结合table scans (long tables)和index fast full scans一起分析。
db file sequential read平均等待时间应小于20ms。单块读等待是最常见的物理IO等待事件,通常由索引读取引起。
buffer busy wait / read by other session可能由以下操作引起:
- select/select:read by other session,需要从数据文件读入buffer cache
- select/update:需要在undo中重建一致性读块
- update/update:热点块争用(非同一行)
- insert/insert:freelist争用,考虑使用ASSM
2. 时间模型统计 Time Model Statistics
2.1Time Model Statistics
Time Model Statistics DB/Inst: ITSCMP/itscmp2 Snaps: 70719-70723
-> Total time in database user-calls (DB Time): 883542.2s
Statistic Name Time (s) % of DB Time
------------------------------------------ ------------------ ------------
sql execute elapsed time 805,159.7 91.1
sequence load elapsed time 41,159.2 4.7
DB CPU 20,649.1 2.3
parse time elapsed 1,112.8 .1
hard parse elapsed time 995.2 .1
hard parse (sharing criteria) elapsed time 237.3 .0
hard parse (bind mismatch) elapsed time 227.6 .0
connection management call elapsed time 29.7 .0
PL/SQL execution elapsed time 9.2 .0
PL/SQL compilation elapsed time 6.6 .0
failed parse elapsed time 2.0 .0
关键时间指标
| 指标 | 说明 |
|---|---|
| parse time elapsed / hard parse elapsed time | 结合来看解析是否是主要矛盾 |
| sequence load elapsed time | sequence序列争用是否是问题焦点 |
| PL/SQL execution elapsed time | 纯耗费在PL/SQL解释器上的时间,不包括SQL执行和解析时间 |
| connection management call elapsed time | 建立和断开数据库session连接的耗时 |
| failed parse elapsed time | 解析失败,例如由于ORA-4031 |
| hard parse (sharing criteria) | 由于无法共享游标造成的硬解析 |
| hard parse (bind mismatch) | 由于bind type或bind size不一致造成的硬解析 |
注意时间模型中的指标存在包含关系,所以Time Model Statistics加起来超过100%是正常的。
时间模型层次结构DB time
├── DB CPU
├── connection management call elapsed time
├── sequence load elapsed time
├── sql execute elapsed time
├── parse time elapsed
│ ├── hard parse elapsed time
│ │ └── hard parse (sharing criteria) elapsed time
│ │ └── hard parse (bind mismatch) elapsed time
│ └── failed parse elapsed time
│ └── failed parse (out of shared memory) elapsed time
├── PL/SQL execution elapsed time
├── PL/SQL compilation elapsed time
└── repeated bind elapsed time
2.5Operating System Statistics
Operating System Statistics Snaps: 70719-70723
Statistic Value End Value
------------------------- ---------------------- ----------------
BUSY_TIME 2,894,855
IDLE_TIME 5,568,240
IOWAIT_TIME 18,973
SYS_TIME 602,532
USER_TIME 2,090,082
LOAD 8 13
NUM_CPUS 24
NUM_CPU_CORES 12
NUM_CPU_SOCKETS 2
PHYSICAL_MEMORY_BYTES 101,221,343,232
| 统计项 | 描述 |
|---|---|
| NUM_CPU_SOCKETS | 物理CPU的数目 |
| NUM_CPU_CORES | CPU的核数 |
| NUM_CPUS | 逻辑CPU的数目 |
| SYS_TIME | 内核态消耗的CPU时间片(1/100秒) |
| USER_TIME | 用户态消耗的CPU时间片(1/100秒) |
| BUSY_TIME | SYS_TIME + USER_TIME |
| IDLE_TIME | 空闲的CPU时间片 |
| IOWAIT_TIME | 所有CPU等待I/O完成的时间 |
CPU使用率计算 所有CPU总时间片 = BUSY_TIME + IDLE_TIME = ELAPSED_TIME × CPU_COUNT
% of busy CPU for Instance = (DB CPU + background cpu time) / (BUSY_TIME / 100)
% of Total CPU for Instance = (DB CPU + background cpu time) / ((BUSY_TIME + IDLE_TIME) / 100)
3. SQL Statistics SQL统计
AWR包含多种SQL统计视图,数据主要来源于DBA_HIST_SQLSTAT:
| 统计类型 | 用途 |
|---|---|
| SQL ordered by Elapsed Time | 按消耗时间排序,识别最耗时的SQL |
| SQL ordered by CPU Time | 按CPU时间排序,识别CPU密集型SQL |
| SQL ordered by Gets | 按逻辑读排序,识别需要调优的SQL |
| SQL ordered by Reads | 按物理读排序,识别I/O密集型SQL |
| SQL ordered by Executions | 按执行次数排序,识别高频SQL |
| SQL ordered by Parse Calls | 按解析调用排序,识别解析开销大的SQL |
| SQL ordered by Version Count | 识别子游标过多的SQL |
SQL ordered by Gets DB/Inst: ITSCMP/itscmp2 Snaps: 70719-70723
-> Total Buffer Gets: 2,021,476,421
-> Captured SQL account for 68.2% of Total
Buffer Gets Elapsed
Gets Executions per Exec %Total Time (s) %CPU %IO SQL Id
----------- ----------- ------------ ------ ---------- ------ ------ -----------
4.61155E+08 1,864,424 247.3 22.8 4,687.8 33.0 65.7 8g6a701j83c
调优建议对于OLTP事务操作:
- 单次buffer gets应小于2000
- 单次physical reads应小于100
- 平均执行时间应小于0.1秒甚至更短
4. Instance Activity Stats 实例活动统计
数据来源于DBA_HIST_SYSSTAT,每个指标代表一种数据库行为的活跃度。
IO相关重要指标
| 指标 | 含义 |
|---|---|
| physical read bytes | 应用造成的物理读字节数 |
| physical read total bytes | 包括RMAN备份恢复和后台任务的物理读字节数 |
| physical read IO requests | 物理读IOPS |
| physical write bytes | 物理写吞吐量 |
| physical write IO requests | 物理写IOPS |
IO指标计算 总物理吞吐量/秒 = physical read total bytes + physical write total bytes
总物理IOPS = physical read total IO requests + physical write total IO requests
5. IO统计
Tablespace IO Stats 表空间IO统计
Tablespace IO Stats DB/Inst: ITSCMP/itscmp2 Snaps: 70719-70723
Tablespace
------------------------------
Av Av Av Av Buffer Av Buf
Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms)
-------------- ------- ------- ------- ------------ -------- ---------- -------
DATA_TS
17,349,398 4,801 2.3 1.5 141,077 39 4,083,704 5.8
INDEX_TS
9,193,122 2,544 2.0 1.0 238,563 66 3,158,187 46.1
UNDOTBS1
1,582,659 438 0.7 1.0 2 0 12,431 69.0
| 列名 | 说明 |
|---|---|
| Reads | 物理读的次数(不是块数) |
| Av Reads/s | 平均每秒物理读次数 |
| Av Rd(ms) | 平均每次读取延迟,应小于20ms |
| Av Blks/Rd | 平均每次读取的块数,OLTP环境应接近1 |
| Writes | 物理写的次数 |
| Buffer Waits | buffer busy waits + read by other session的次数 |
| Av Buf Wt(ms) | 平均buffer等待时间 |
6. Buffer Pool Statistics 缓冲池统计
Buffer Pool Statistics Snaps: 70719-70723
Free Writ Buffer
Number of Pool Buffer Physical Physical Buff Comp Busy
P Buffers Hit% Gets Reads Writes Wait Wait Waits
--- ---------- ---- ------------ ------------ ----------- ------ ------ --------
D 2,259,159 98 2.005084E+09 42,753,650 560,460 0 1 8.51E+06
| Pool类型 | 说明 |
|---|---|
| D | Default buffer pool |
| K | Keep Pool |
| R | Recycle Pool |
| 2k/4k/8k/16k/32k | 各种非标准块大小的缓冲池 |
7. Advisory统计 内存建议
Buffer Pool Advisory
帮助评估增大或减小buffer cache对物理读的影响:
使用场景- 增加buffer pool:关注Size Factor > 1时Est Phys Read Factor是否显著减少
- 减少buffer pool:关注Size Factor减小时Est Phys Read Factor是否显著增大
PGA Advisory
关键指标当使用Auto Memory Mgmt时,应选择Estd PGA Overalloc Count = 0的pga_aggregate_target值。
Shared Pool Advisory
评估共享池大小对library cache性能的影响。一般推荐shared pool至少有300~500 MB的free memory。
8. Wait Statistics 等待统计
Buffer Wait Statistics
Buffer Wait Statistics Snaps: 70719-70723
Class Waits Total Wait Time (s) Avg Time (ms)
------------------ ----------- ------------------- --------------
data block 8,442,041 407,259 48
undo header 16,212 1,711 106
undo block 21,023 557 26
segment header 197 13 66
Enqueue Activity 队列锁等待
| Enqueue Type | 说明 |
|---|---|
| TX-Transaction (index contention) | 索引块争用 |
| TX-Transaction (row lock contention) | 行锁等待 |
| TX-Transaction (allocate ITL entry) | ITL槽位争用 |
| TM-DML | 表级锁争用 |
| SQ-Sequence Cache | 序列缓存争用 |
| HW-Segment High Water Mark | 高水位线争用 |
10. Latch Statistics 闩锁统计
Latch Activity Snaps: 70719-70723
Pct Avg Wait Pct
Get Get Slps Time NoWait NoWait
Latch Name Requests Miss /Miss (s) Requests Miss
------------------------ -------------- ------ ------ ------ ------------ ------
cache buffers chains 3,365,097,866 0.4 0.0 66 0 N/A
shared pool 9,988,637 0.0 0.1 0 0 N/A
library cache 6,753,468 0.0 0.0 0 0 N/A
| 列名 | 说明 |
|---|---|
| Get Requests | 以willing-to-wait模式申请并获得latch的次数 |
| Pct Get Miss | 申请latch但需要等待的比例 |
| Avg Slps/Miss | 每次miss平均sleep的次数 |
| NoWait Requests | 以no-wait模式申请latch的次数 |
| Pct NoWait Miss | no-wait模式申请直接失败的比例 |
Latch调优建议- cache buffers chains: 优化SQL减少全表扫描和逻辑读
- redo copy/redo allocation: 考虑增大LOG_BUFFER
- library cache: 考虑增大shared_pool_size
11. Segment Statistics 段级统计
AWR提供多种维度的段级统计:
| 统计类型 | 用途 |
|---|---|
| Segments by Logical Reads | 识别逻辑读最多的段 |
| Segments by Physical Reads | 识别物理读最多的段 |
| Segments by Physical Writes | 识别物理写最多的段 |
| Segments by Row Lock Waits | 识别行锁等待最多的段 |
| Segments by ITL Waits | 识别ITL等待最多的段 |
| Segments by Buffer Busy Waits | 识别buffer busy等待最多的段 |
| Segments by GC Buffer Busy | 识别RAC gc buffer busy最多的段 |
15. RAC相关指标
Global Cache Load Profile
| 指标 | 说明 |
|---|---|
| Global Cache blocks received | 通过interconnect收到远程实例数据块的数量 |
| Global Cache blocks served | 通过interconnect发送到远程实例数据块的数量 |
| GCS/GES messages received | 收到的GCS/GES消息数量,代表RAC服务开销 |
| GCS/GES messages sent | 发送的GCS/GES消息数量 |
| DBWR Fusion writes | 融合写入次数 |
Global Cache Efficiency Percentages
缓存访问效率 Local Cache Buffer Access Ratio = 1 - (physical reads cache + Global Cache blocks received) / Logical Reads
Remote Cache Buffer Access Ratio = Global Cache blocks received / Logical Reads
Disk Access Ratio = 1 - physical reads cache / Logical Reads
RAC性能目标在OLTP应用中,Buffer access – local cache % 应尽可能高,因为本地缓存访问是最快的。
Buffer access – local cache % + Buffer access – remote cache % 的和应尽可能接近100%。
Global Cache Workload Characteristics
| 指标 | 正常值 | 说明 |
|---|---|---|
| Avg global enqueue get time | < 20ms | 获取全局队列锁的时间 |
| Avg global cache cr block receive time | < 15ms | 接收CR块的时间 |
| Avg global cache current block receive time | < 30ms | 接收Current块的时间 |
Interconnect Statistics
关键指标- Avg message sent queue time on ksxp (ms): 直接反映网络延迟,应小于1ms
- % of flow controlled messages: 应小于1%
参考资源- 开Oracle调优鹰眼,深入理解AWR性能报告
- AWR鹰眼第二讲
- Oracle官方文档:Statistics Descriptions
- MOS文档:1466035.1, 1523048.1

