关于本文

本文是Oracle AWR(Automatic Workload Repository)报告的完整解析指南，涵盖所有关键性能指标。建议配合实际AWR报告阅读，以深入理解各项指标的含义。

什么是AWR?

AWR (Automatic Workload Repository) 是Oracle 10g开始引入的自动工作负载存储库，用于收集和维护数据库性能统计信息。

关键知识点

AWR数据存储在SYSAUX表空间
默认快照间隔1小时
10g保存7天，11g保存8天
可通过DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS修改配置

AWR维护进程

AWR主要由MMON(Manageability Monitor Process)和它的工作进程(m00x)维护：

启动slave进程m00x做AWR快照
当某个度量阀值被超过时发出alert告警
为最近改变过的SQL对象捕获指标信息

AWR常用操作

SQL

-- 手动创建快照
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;

-- 创建AWR基线
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE(
    start_snap_id, 
    end_snap_id, 
    baseline_name
);

-- 生成AWR报告
@?/rdbms/admin/awrrpt      -- 本实例
@?/rdbms/admin/awrrpti     -- RAC中选择实例号
@?/rdbms/admin/awrddrpt    -- AWR比对报告
@?/rdbms/admin/awrgrpt     -- RAC全局AWR

1. 报告总结 Report Summary

报告头部信息

AWR Report Header

WORKLOAD REPOSITORY report for

DB Name         DB Id    Instance     Inst Num Startup Time    Release     RAC
------------ ----------- ------------ -------- --------------- ----------- ---
MAC           2629627371 askmaclean.com       1 22-Jan-13 16:49 11.2.0.3.0  YES

Host Name        Platform                         CPUs Cores Sockets Memory(GB)
---------------- -------------------------------- ---- ----- ------- ----------
MAC10            AIX-Based Systems (64-bit)        128    32             320.00

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:      5853 23-Jan-13 15:00:56     3,520       1.8
  End Snap:      5854 23-Jan-13 15:30:41     3,765       1.9
   Elapsed:               29.75 (mins)
   DB Time:            7,633.76 (mins)

关键概念：DB Time

DB TIME = 所有前台session花费在database调用上的总和时间

仅包含前台进程(foreground sessions)
包括CPU时间、IO Time、和其他非空闲等待时间
DB Time ≠ 响应时间

核心公式

DB TIME = DB CPU + Non-Idle Wait + Wait on CPU queue

Average Active Session (AAS) = DB Time / Elapsed Time

AAS示例解读：

DB Time	Elapsed Time	AAS	分析
60 min	60 min	1	负载一般
1 min	60 min	0.017	负载很轻
60000 min	60 min	1000	系统可能hang了

1.1内存参数大小 Cache Sizes

Cache Sizes

Cache Sizes                       Begin        End
~~~~~~~~~~~                  ---------- ----------
               Buffer Cache:    49,152M    49,152M  Std Block Size:         8K
           Shared Pool Size:    13,312M    13,312M      Log Buffer:   334,848K

注意事项

在ASMM、AMM和11gR2 MSMM下，Buffer Cache和Shared Pool Size的begin/end值可能会变化
Shared Pool一直收缩可能导致row cache lock等解析等待
Shared Pool持续grow说明原有大小不足以满足需求（可能是大量硬解析）

1.2Load Profile 负载画像

Load Profile

Load Profile              Per Second    Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~         ---------------    --------------- ---------- ----------
      DB Time(s):              256.6                0.2       0.07       0.03
       DB CPU(s):                3.7                0.0       0.00       0.00
       Redo size:        1,020,943.0              826.5
   Logical reads:          196,888.0              159.4
   Block changes:            6,339.4                5.1
  Physical reads:            5,076.7                4.1
 Physical writes:              379.2                0.3
      User calls:           10,157.4                8.2
          Parses:              204.0                0.2
     Hard parses:                0.9                0.0
W/A MB processed:                5.0                0.0
          Logons:                1.7                0.0
        Executes:            3,936.6                3.2
       Rollbacks:            1,126.3                0.9
    Transactions:            1,235.3

  % Blocks changed per Read:   53.49    Recursive Call %:    98.04
 Rollback per transaction %:   36.57       Rows per Sort:    73.70

Load Profile 关键指标解读

指标	说明
Redo size	单位bytes，用于估量update/insert/delete的频率。大的redo size对LGWR写日志和ARCH归档造成I/O压力
Logical Read	单位：次数×块数。逻辑读耗CPU，主频和核数都很重要。逻辑读高则DB CPU往往高，也可能看到latch: cache buffer chains等待
Block changes	单位：次数×块数，描绘数据变化频率
Physical Read	单位：次数×块数。物理读消耗IO读，体现在IOPS和吞吐量等维度上
Physical writes	主要是DBWR写datafile，也有direct path write。DBWR长期写出慢会导致log file switch(checkpoint not complete)
Parses	解析次数，包括软解析+硬解析
Hard parses	万恶之源，可能导致Cursor pin s on X, library cache: mutex X等等待。硬解析最好少于每秒20次
Executes	执行次数，反应执行频率
Transactions	每秒事务数，是数据库层的TPS

派生指标公式

% Blocks changed per Read = (block changes) / (logical reads)

Recursive Call % = (recursive calls) / (user calls)

Rollback per transaction % = (rollback) / (transactions)

Rows per Sort = (sorts(rows)) / (sorts(disk) + sorts(memory))

1.3Instance Efficiency Percentages 实例效率

Instance Efficiency

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:   99.97       Redo NoWait %:  100.00
            Buffer  Hit   %:   97.43    In-memory Sort %:  100.00
            Library Hit   %:   99.88        Soft Parse %:   99.58
         Execute to Parse %:   94.82         Latch Hit %:   99.95
Parse CPU to Parse Elapsd %:    1.75     % Non-Parse CPU:   99.85

健康指标参考值

80%以上： %Non-Parse CPU
90%以上： Buffer Hit%, In-memory Sort%, Soft Parse%
95%以上： Library Hit%, Redo Nowait%, Buffer Nowait%
98%以上： Latch Hit%

关键效率指标详解

指标	含义	计算公式
Buffer Nowait %	session申请buffer不等待的比例	1 – sum(waitstat.wait_count) / session logical reads
Buffer Hit %	高速缓存命中率	1 – (physical reads cache) / (consistent gets + db block gets)
Library Hit %	library cache命中率	1 – (pin Requests × Pct Miss) / Sum(Pin Requests)
Soft Parse %	软解析比例	1 – parse count(hard) / parse count(total)
Execute to Parse %	执行解析比	1 – (parse count / execute count)
Latch Hit %	willing-to-wait latch不等待的比例	1 – (Sum(misses) / Sum(gets))
% Non-Parse CPU	非解析CPU比例	(DB CPU – Parse CPU) / DB CPU

关于Execute to Parse

Execute to Parse反映了执行解析比。如果soft parse%接近99%而Execute to Parse不足90%，说明执行解析比低，需要通过静态SQL、动态绑定、session_cached_cursor、open cursors等技术减少软解析。

1.4Shared Pool Statistics

Shared Pool Statistics

 Shared Pool Statistics        Begin    End
                              ------  ------
             Memory Usage %:   84.64   79.67
    % SQL with executions>1:   93.77   24.69
  % Memory for SQL w/exec>1:   85.36   34.80

指标	说明
Memory Usage %	(shared pool实时大小 – free memory) / shared pool实时大小，代表空间使用率
% SQL with executions>1	复用SQL占总SQL的比率。如果小于90%，需要关注非绑定变量SQL
% Memory for SQL w/exec>1	执行2次以上的SQL占用内存占总SQL内存的比率

SQL复用率低的解决方案

短期：ALTER SYSTEM SET CURSOR_SHARING=FORCE;

长期：修改代码使用绑定变量

注意：不推荐使用CURSOR_SHARING=SIMILAR，Oracle从11g开始已宣称废弃此选项。

1.5Top 5 Timed Events

Top 5 Timed Events

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
gc buffer busy                       79,083      73,024    923   65.4    Cluster
enq: TX - row lock contention        35,068      17,123    488   15.3 Application
CPU time                                         12,205          10.9           
gc current request                    2,714       3,315   1221    3.0    Cluster
gc cr multi block request            83,666       1,008     12    0.9    Cluster

Top 5 Events是AWR报告中最重要的部分

它指出了数据库sessions花费时间最多的等待事件。基于Wait Interface的调优是目前的主流方法。

等待类型(Wait Class)说明

Wait Class	说明
User I/O	前台进程I/O操作，如db file sequential/scattered read
System I/O	后台进程维护I/O，如control file parallel write、log file parallel write
Concurrency	并行争用类型，如latch: shared pool、latch: library cache
Cluster	RAC环境等待事件，如gc buffer busy
Application	应用造成的等待，如enq: TM/TX contention
Configuration	配置引起的等待，如log file switch、enq: SQ contention
Commit	仅log file sync
Network	网络类型，如SQL*Net more data to client

常见等待事件分析

db file scattered read

平均等待时间应小于20ms。一般表明正在做全表扫描或全索引扫描(Multi block I/O)。需要结合table scans (long tables)和index fast full scans一起分析。

db file sequential read

平均等待时间应小于20ms。单块读等待是最常见的物理IO等待事件，通常由索引读取引起。

buffer busy wait / read by other session

可能由以下操作引起：

select/select：read by other session，需要从数据文件读入buffer cache
select/update：需要在undo中重建一致性读块
update/update：热点块争用（非同一行）
insert/insert：freelist争用，考虑使用ASSM

2. 时间模型统计 Time Model Statistics

2.1Time Model Statistics

Time Model Statistics

Time Model Statistics             DB/Inst: ITSCMP/itscmp2  Snaps: 70719-70723
-> Total time in database user-calls (DB Time): 883542.2s

Statistic Name                                       Time (s) % of DB Time
------------------------------------------ ------------------ ------------
sql execute elapsed time                            805,159.7         91.1
sequence load elapsed time                           41,159.2          4.7
DB CPU                                               20,649.1          2.3
parse time elapsed                                    1,112.8           .1
hard parse elapsed time                                 995.2           .1
hard parse (sharing criteria) elapsed time              237.3           .0
hard parse (bind mismatch) elapsed time                 227.6           .0
connection management call elapsed time                  29.7           .0
PL/SQL execution elapsed time                             9.2           .0
PL/SQL compilation elapsed time                           6.6           .0
failed parse elapsed time                                 2.0           .0

关键时间指标

指标	说明
parse time elapsed / hard parse elapsed time	结合来看解析是否是主要矛盾
sequence load elapsed time	sequence序列争用是否是问题焦点
PL/SQL execution elapsed time	纯耗费在PL/SQL解释器上的时间，不包括SQL执行和解析时间
connection management call elapsed time	建立和断开数据库session连接的耗时
failed parse elapsed time	解析失败，例如由于ORA-4031
hard parse (sharing criteria)	由于无法共享游标造成的硬解析
hard parse (bind mismatch)	由于bind type或bind size不一致造成的硬解析

注意

时间模型中的指标存在包含关系，所以Time Model Statistics加起来超过100%是正常的。

时间模型层次结构

DB time
├── DB CPU
├── connection management call elapsed time
├── sequence load elapsed time
├── sql execute elapsed time
├── parse time elapsed
│   ├── hard parse elapsed time
│   │   └── hard parse (sharing criteria) elapsed time
│   │       └── hard parse (bind mismatch) elapsed time
│   └── failed parse elapsed time
│       └── failed parse (out of shared memory) elapsed time
├── PL/SQL execution elapsed time
├── PL/SQL compilation elapsed time
└── repeated bind elapsed time

2.5Operating System Statistics

OS Statistics

Operating System Statistics         Snaps: 70719-70723

Statistic                                  Value        End Value
------------------------- ---------------------- ----------------
BUSY_TIME                              2,894,855
IDLE_TIME                              5,568,240
IOWAIT_TIME                               18,973
SYS_TIME                                 602,532
USER_TIME                              2,090,082
LOAD                                           8               13
NUM_CPUS                                      24
NUM_CPU_CORES                                 12
NUM_CPU_SOCKETS                                2
PHYSICAL_MEMORY_BYTES            101,221,343,232

统计项	描述
NUM_CPU_SOCKETS	物理CPU的数目
NUM_CPU_CORES	CPU的核数
NUM_CPUS	逻辑CPU的数目
SYS_TIME	内核态消耗的CPU时间片（1/100秒）
USER_TIME	用户态消耗的CPU时间片（1/100秒）
BUSY_TIME	SYS_TIME + USER_TIME
IDLE_TIME	空闲的CPU时间片
IOWAIT_TIME	所有CPU等待I/O完成的时间

CPU使用率计算

所有CPU总时间片 = BUSY_TIME + IDLE_TIME = ELAPSED_TIME × CPU_COUNT

% of busy CPU for Instance = (DB CPU + background cpu time) / (BUSY_TIME / 100)

% of Total CPU for Instance = (DB CPU + background cpu time) / ((BUSY_TIME + IDLE_TIME) / 100)

3. SQL Statistics SQL统计

AWR包含多种SQL统计视图，数据主要来源于DBA_HIST_SQLSTAT：

统计类型	用途
SQL ordered by Elapsed Time	按消耗时间排序，识别最耗时的SQL
SQL ordered by CPU Time	按CPU时间排序，识别CPU密集型SQL
SQL ordered by Gets	按逻辑读排序，识别需要调优的SQL
SQL ordered by Reads	按物理读排序，识别I/O密集型SQL
SQL ordered by Executions	按执行次数排序，识别高频SQL
SQL ordered by Parse Calls	按解析调用排序，识别解析开销大的SQL
SQL ordered by Version Count	识别子游标过多的SQL

SQL ordered by Gets

SQL ordered by Gets               DB/Inst: ITSCMP/itscmp2  Snaps: 70719-70723
-> Total Buffer Gets:   2,021,476,421
-> Captured SQL account for   68.2% of Total

     Buffer                 Gets              Elapsed
      Gets   Executions   per Exec   %Total   Time (s)   %CPU    %IO    SQL Id
----------- ----------- ------------ ------ ---------- ------ ------ -----------
4.61155E+08   1,864,424        247.3   22.8    4,687.8   33.0   65.7 8g6a701j83c

调优建议

对于OLTP事务操作：

单次buffer gets应小于2000
单次physical reads应小于100
平均执行时间应小于0.1秒甚至更短

4. Instance Activity Stats 实例活动统计

数据来源于DBA_HIST_SYSSTAT，每个指标代表一种数据库行为的活跃度。

IO相关重要指标

指标	含义
physical read bytes	应用造成的物理读字节数
physical read total bytes	包括RMAN备份恢复和后台任务的物理读字节数
physical read IO requests	物理读IOPS
physical write bytes	物理写吞吐量
physical write IO requests	物理写IOPS

IO指标计算

总物理吞吐量/秒 = physical read total bytes + physical write total bytes

总物理IOPS = physical read total IO requests + physical write total IO requests

5. IO统计

Tablespace IO Stats 表空间IO统计

Tablespace IO Stats

Tablespace IO Stats               DB/Inst: ITSCMP/itscmp2  Snaps: 70719-70723

Tablespace
------------------------------
                 Av       Av     Av                       Av     Buffer  Av Buf
         Reads Reads/s  Rd(ms) Blks/Rd       Writes Writes/s      Waits  Wt(ms)
-------------- ------- ------- ------- ------------ -------- ---------- -------
DATA_TS
    17,349,398   4,801     2.3     1.5      141,077       39  4,083,704     5.8
INDEX_TS
     9,193,122   2,544     2.0     1.0      238,563       66  3,158,187    46.1
UNDOTBS1
     1,582,659     438     0.7     1.0            2        0     12,431    69.0

列名	说明
Reads	物理读的次数（不是块数）
Av Reads/s	平均每秒物理读次数
Av Rd(ms)	平均每次读取延迟，应小于20ms
Av Blks/Rd	平均每次读取的块数，OLTP环境应接近1
Writes	物理写的次数
Buffer Waits	buffer busy waits + read by other session的次数
Av Buf Wt(ms)	平均buffer等待时间

6. Buffer Pool Statistics 缓冲池统计

Buffer Pool Statistics

Buffer Pool Statistics              Snaps: 70719-70723

                                                            Free   Writ   Buffer
     Number of Pool       Buffer     Physical    Physical   Buff   Comp     Busy
P      Buffers Hit%         Gets        Reads      Writes   Wait   Wait    Waits
--- ---------- ---- ------------ ------------ ----------- ------ ------ --------
D    2,259,159   98 2.005084E+09   42,753,650     560,460      0      1 8.51E+06

Pool类型	说明
D	Default buffer pool
K	Keep Pool
R	Recycle Pool
2k/4k/8k/16k/32k	各种非标准块大小的缓冲池

7. Advisory统计内存建议

Buffer Pool Advisory

帮助评估增大或减小buffer cache对物理读的影响：

使用场景

增加buffer pool：关注Size Factor > 1时Est Phys Read Factor是否显著减少
减少buffer pool：关注Size Factor减小时Est Phys Read Factor是否显著增大

PGA Advisory

关键指标

当使用Auto Memory Mgmt时，应选择Estd PGA Overalloc Count = 0的pga_aggregate_target值。

Shared Pool Advisory

评估共享池大小对library cache性能的影响。一般推荐shared pool至少有300~500 MB的free memory。

8. Wait Statistics 等待统计

Buffer Wait Statistics

Buffer Wait Statistics          Snaps: 70719-70723

Class                    Waits Total Wait Time (s)  Avg Time (ms)
------------------ ----------- ------------------- --------------
data block           8,442,041             407,259             48
undo header             16,212               1,711            106
undo block              21,023                 557             26
segment header             197                  13             66

Enqueue Activity 队列锁等待

Enqueue Type	说明
TX-Transaction (index contention)	索引块争用
TX-Transaction (row lock contention)	行锁等待
TX-Transaction (allocate ITL entry)	ITL槽位争用
TM-DML	表级锁争用
SQ-Sequence Cache	序列缓存争用
HW-Segment High Water Mark	高水位线争用

10. Latch Statistics 闩锁统计

Latch Activity

Latch Activity        Snaps: 70719-70723
                                           Pct    Avg   Wait                 Pct
                                    Get    Get   Slps   Time       NoWait NoWait
Latch Name                     Requests   Miss  /Miss    (s)     Requests   Miss
------------------------ -------------- ------ ------ ------ ------------ ------
cache buffers chains      3,365,097,866    0.4    0.0     66            0    N/A
shared pool                   9,988,637    0.0    0.1      0            0    N/A
library cache                 6,753,468    0.0    0.0      0            0    N/A

列名	说明
Get Requests	以willing-to-wait模式申请并获得latch的次数
Pct Get Miss	申请latch但需要等待的比例
Avg Slps/Miss	每次miss平均sleep的次数
NoWait Requests	以no-wait模式申请latch的次数
Pct NoWait Miss	no-wait模式申请直接失败的比例

Latch调优建议

cache buffers chains: 优化SQL减少全表扫描和逻辑读
redo copy/redo allocation: 考虑增大LOG_BUFFER
library cache: 考虑增大shared_pool_size

11. Segment Statistics 段级统计

AWR提供多种维度的段级统计：

统计类型	用途
Segments by Logical Reads	识别逻辑读最多的段
Segments by Physical Reads	识别物理读最多的段
Segments by Physical Writes	识别物理写最多的段
Segments by Row Lock Waits	识别行锁等待最多的段
Segments by ITL Waits	识别ITL等待最多的段
Segments by Buffer Busy Waits	识别buffer busy等待最多的段
Segments by GC Buffer Busy	识别RAC gc buffer busy最多的段

15. RAC相关指标

Global Cache Load Profile

指标	说明
Global Cache blocks received	通过interconnect收到远程实例数据块的数量
Global Cache blocks served	通过interconnect发送到远程实例数据块的数量
GCS/GES messages received	收到的GCS/GES消息数量，代表RAC服务开销
GCS/GES messages sent	发送的GCS/GES消息数量
DBWR Fusion writes	融合写入次数

Global Cache Efficiency Percentages

缓存访问效率

Local Cache Buffer Access Ratio = 1 - (physical reads cache + Global Cache blocks received) / Logical Reads

Remote Cache Buffer Access Ratio = Global Cache blocks received / Logical Reads

Disk Access Ratio = 1 - physical reads cache / Logical Reads

RAC性能目标

在OLTP应用中，Buffer access – local cache % 应尽可能高，因为本地缓存访问是最快的。

Buffer access – local cache % + Buffer access – remote cache % 的和应尽可能接近100%。

Global Cache Workload Characteristics

指标	正常值	说明
Avg global enqueue get time	< 20ms	获取全局队列锁的时间
Avg global cache cr block receive time	< 15ms	接收CR块的时间
Avg global cache current block receive time	< 30ms	接收Current块的时间

Interconnect Statistics