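Problem: one node of a production RAC database could not generate AWR snapshots automatically
The trace file of the MMON slave process m000 contained the following errors: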
*** KEWROCISTMTEXEC - encountered error: (ORA-12751: cpu time or run time policy violation)
*** SQLSTR: total-len=295, dump-len=240,
STR={insert into WRH$_SERVICE_STAT (snap_id, dbid, instance_number, service_name_hash, stat_id, value) select :snap_id, :dbid, :instance_number, stat.service_name_hash, stat.stat_id, stat.value from v$active_services asvc, v$service_st}
DDE rules only execution for: ORA 12751
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
Executing ASYNC actions
----- START DDE Action: 'ORA_12751_DUMP' (Sync) -----
CPU time exceeded 300 seconds
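Taking an AWR snapshot manually was also very slow (about 10 minutes). The active sessions showed that the AWR snapshot session was running the following SQL: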
insert into WRH$_SERVICE_STAT
(snap_id, dbid, instance_number, service_name_hash, stat_id, value)
select
:snap_id,
:dbid,
:instance_number,
stat.service_name_hash,
stat.stat_id,
stat.value
from v$active_services asvc, v$service_stats stat
where asvc.name_hash = stat.service_name_hash
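Following up on why this SQL was slow, it turned out that the v$service_stats view contained an extremely large number of rows.
This brought to mind the service name changes that appear in the alert log whenever an expdp backup runs.
At the start of every export, expdp adds service names, and it removes them again when the export finishes. These changes are recorded in the alert log: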
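As a rough sketch of how the row count can be inspected, the query below groups the rows of v$service_stats by service name. It is not taken from the original investigation; it assumes a session on the affected instance with access to the V$ views, and the alias row_count is only illustrative.

```sql
-- Count the rows held in v$service_stats on the current instance,
-- grouped by service name, largest groups first.
-- A healthy instance normally shows only a handful of services here.
select service_name,
       count(*) as row_count
  from v$service_stats
 group by service_name
 order by row_count desc;
```

If one service name accounts for almost all of the rows, that is a strong hint about where the growth comes from.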
Tue Apr 26 18:15:48 2016
ALTER SYSTEM SET service_names='***','***','***','SYS$SYS.KUPC$C_2_20160426181545.db' SCOPE=MEMORY SID='db2';
ALTER SYSTEM SET service_names='SYS$SYS.KUPC$C_2_20160426181545.db','***','***','***','SYS$SYS.KUPC$S_2_20160426181545.db' SCOPE=MEMORY SID='db2';
Tue Apr 26 18:15:49 2016
DM00 started with pid=782, OS id=5830, job SYSTEM.SYS_EXPORT_TABLE_02
Tue Apr 26 18:15:52 2016
DW00 started with pid=943, OS id=5832, wid=1, job SYSTEM.SYS_EXPORT_TABLE_02
Tue Apr 26 18:15:59 2016
ALTER SYSTEM SET service_names='SYS$SYS.KUPC$S_2_20160426181545.db','***','***','***' SCOPE=MEMORY SID='db2';
ALTER SYSTEM SET service_names='***','***','***' SCOPE=MEMORY SID='db2';
This add-and-remove cycle leaves many rows with an unknown service name in the v$service_stats view.
Reference material related to the expdp problem:
Bug 10421418 : UNKNOWN SERVICE_NAME IN V$SERVICE_STATS EVERY TIME CREATE/DELETE SERVICES
On RAC, expdp Removes the Service Name [ID 1269319.1]
Bug 11927911 : THE AWR SNAPSHOT IS NOT OBTAINED
Bug 5974572 : WRH$_SERVICE_STAT CAN GROW VERY LARGE ON RAC
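Based on the symptoms above, the MOS notes, and testing, the cause can be determined with reasonable confidence:
1 A Data Pump bug adds 56 rows with service_name=unknown to the v$service_stats view on every logical export. Because the daily Data Pump backup job runs on this node, a large number of unknown service name rows accumulated in v$service_stats (about 250,000 rows in this database);
2 While generating an AWR snapshot, Oracle runs the SQL above. Because the fixed table statistics were inaccurate or missing, Oracle chose an inefficient execution plan, the SQL consumed a large amount of time, the maintenance task hit the CPU time policy violation (ORA-12751), and the AWR snapshot was aborted.
Fix for the slow SQL inside the AWR snapshot: gather fixed table statistics manually (this took about 14 minutes). The execution plan then changed and the SQL ran efficiently.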
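Whether fixed table statistics exist at all can be checked from the data dictionary. The query below is a minimal sketch, assuming access to DBA_TAB_STATISTICS; the column aliases are illustrative and not part of the original analysis.

```sql
-- Summarize statistics on fixed tables (the X$ structures behind V$ views).
-- analyzed_tables = 0 means fixed object statistics were never gathered.
select count(*)             as fixed_tables,
       count(last_analyzed) as analyzed_tables,
       max(last_analyzed)   as most_recent_gather
  from dba_tab_statistics
 where object_type = 'FIXED TABLE';
```

A zero or very old result here would be consistent with the second cause, though it does not by itself prove that the plan was poor.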
-- Before 12c, fixed object statistics were not gathered automatically
begin
  DBMS_STATS.GATHER_FIXED_OBJECTS_STATS(no_invalidate => false);
end;
/
Original execution plan:
New execution plan:
After gathering the fixed table statistics, AWR snapshots are generated normally:
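One possible way to confirm this is to take a snapshot by hand and then list the recent snapshots for each instance. This is a sketch only: it assumes the session has execute privilege on DBMS_WORKLOAD_REPOSITORY and can read DBA_HIST_SNAPSHOT, and the one-day window is an arbitrary choice.

```sql
-- Take a manual AWR snapshot; it should now complete quickly.
begin
  dbms_workload_repository.create_snapshot;
end;
/

-- List snapshots from the last day. Every RAC instance should appear
-- for each snap_id once automatic snapshots are working again.
select snap_id,
       instance_number,
       begin_interval_time,
       end_interval_time
  from dba_hist_snapshot
 where begin_interval_time > systimestamp - interval '1' day
 order by snap_id, instance_number;
```

Gaps for a single instance_number in this output would indicate that snapshots are still failing on that node.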