WHY AWR SNAPSHOTS ARE NOT GENERATED SUDDENLY?- UNABLE TO GENERATE ANY REPORT AWR/ADDM/ASH REPORT IN ORACLE DATABASE
AWR snapshots are crucial for the oracle performance diagnosis report to troubleshoot any critical slowness in the database operations
If snapshots are not either fully or partially collected due to some reason, then valuable information is lost which will be really revenue bleed for any organization without any RCA or Proactive tuning. There are two issues which needs to be solved in this case,
- AWR snapshots are not generated
- If there is any performance issue which cannot be diagnosed due to lack of snapshots, then it is a cause of point 1
- Alternatively, statspack can be used but enabling both at the same time needs expensive disks
Hence the following steps should be helpful to diagnose snapshot generation issues
Perform the prechecks:
1)Check the retention and snap interval details
kIsH@Xhydra<>col SNAP_INTERVAL for 999999999999
kIsH@Xhydra<>col RETENTION for 99999999999
kIsH@Xhydra<>select SYSTIMESTAMP,MOST_RECENT_SNAP_TIME,extract(HOUR from SNAP_INTERVAL) SNAP_INTERVAL,extract(DAY from RETENTION) RETENTION from wrm$_wr_control;
SYSTIMESTAMP MOST_RECENT_SNAP_TIM SNAP_INTERVAL RETENTION
-------------------- -------------------- ------------- ------------
07-AUG-22 06.24.30.4 07-AUG-22 06.21.40.5 1 8
19756 PM +05:30 87 PM
2)Check the lag count of snaps to get a hint on time difference between each hour. If the STATUS is 1, then either snapshot generation is stuck or failed due to some reason. FLUSH ELAPSED should not be 0 and ERROR COUNT should be 0. If both these conditions are not met, then the error should be investigated
kIsH@Xhydra<>set lines 200 pages 1000
col SNAP_ID for 99999
col HOUR for 999999999
col BEGIN_INTERVAL_TIME for a20
col FLUSH_ELAPSED for a20
select SNAP_ID,
BEGIN_INTERVAL_TIME,
extract(HOUR FROM BEGIN_INTERVAL_TIME) HOUR,
(extract(HOUR FROM BEGIN_INTERVAL_TIME)) - (lag(extract(HOUR FROM BEGIN_INTERVAL_TIME)) over (order by BEGIN_INTERVAL_TIME)) LAG,
ERROR_COUNT,
FLUSH_ELAPSED,
STATUS
from wrm$_snapshot
order by SNAP_ID
fetch next 10 rows only;
SNAP_ID BEGIN_INTERVAL_TIME HOUR LAG ERROR_COUNT FLUSH_ELAPSED STATUS
------- -------------------- ---------- ---------- ----------- -------------------- ----------
304 04-AUG-22 10.35.38.0 22 0 +00000 00:00:00.6 0
00 PM
305 05-AUG-22 11.31.19.0 11 -11 0 +00000 00:00:01.5 0
00 AM
306 05-AUG-22 11.45.11.0 11 0 0 +00000 00:00:00.1 0
00 AM
307 05-AUG-22 12.55.27.8 12 1 0 +00000 00:00:00.1 0
67 PM
308 05-AUG-22 09.57.39.0 21 9 0 +00000 00:00:00.8 0
00 PM
309 06-AUG-22 01.18.08.0 13 -8 0 +00000 00:00:01.8 0
00 PM
310 06-AUG-22 01.28.14.2 13 0 0 +00000 00:00:00.4 0
27 PM
311 06-AUG-22 02.55.34.5 14 1 0 +00000 00:00:00.8 0
63 PM
312 06-AUG-22 03.55.12.2 15 1 0 +00000 00:00:00.5 0
13 PM
313 06-AUG-22 04.55.26.4 16 1 0 +00000 00:00:00.1 0
89 PM
10 rows selected.
3)Below queries can assist in getting snap details with timings
select SNAP_ID, lag(extract(HOUR FROM BEGIN_INTERVAL_TIME)) over (order by BEGIN_INTERVAL_TIME) * 60 * 60 * 24) from wrm$_snapshot order by SNAP_ID;
select SNAP_ID, (extract(HOUR FROM END_INTERVAL_TIME) - extract(HOUR FROM BEGIN_INTERVAL_TIME)) from wrm$_snapshot order by SNAP_ID;
select count(*) TOTAL_SNAPS_TODAY
from wrm$_snapshot
where to_char(BEGIN_INTERVAL_TIME,'DD-MON-RR')=trunc(sysdate)
group by to_char(BEGIN_INTERVAL_TIME,'DD-MON-RR');
select 24 - count(*) SNAPS_PEND_TODAY
from wrm$_snapshot
where to_char(BEGIN_INTERVAL_TIME,'DD-MON-RR')=trunc(sysdate)
group by to_char(BEGIN_INTERVAL_TIME,'DD-MON-RR');
kIsH@Xhydra<>select min(snap_id), max(snap_id) from dba_hist_snapshot;
MIN(SNAP_ID) MAX(SNAP_ID)
------------ ------------
244 264
4)Below query should at least contain one row
kIsH@Xhydra<>set lines 200 pages 1000
kIsH@Xhydra<>col PLATFORM_NAME for a20
kIsH@Xhydra<>col STARTUP_TIME for a20
kIsH@Xhydra<>select STARTUP_TIME,VERSION,DB_NAME,INSTANCE_NAME,LAST_ASH_SAMPLE_ID,PLATFORM_NAME from wrm$_database_instance;
STARTUP_TIME VERSION DB_NAME INSTANCE_NAME LAST_ASH_SAMPLE_ID PLATFORM_NAME
-------------------- ----------------- --------- ---------------- ------------------ --------------------
22-JUL-22 03.49.59.0 19.0.0.0.0 TEST test 53924 Linux x86 64-bit
00 PM
5)Check if the database was down during the period, when the snapshot were not taken. Also check if the database is in restricted mode or open read only
kIsH@Xhydra<>select to_char(STARTUP_TIME,'dd-mm-yy hh24:mi:ss') UPTIME,open_mode from v$instance,v$database;
UPTIME OPEN_MODE
----------------- --------------------
08-08-22 17:36:48 READ WRITE
6)If the database is in restricted mode, then enable snapshots using below procedure
exec dbms_workload_repository.control_restricted_snapshot(allow => TRUE);
6)Check for any errors in wrm$_snap_error
kIsH@Xhydra<>select * from wrm$_snap_error order by snap_id;
no rows selected
Diagnosis:
7)Enable MMON trace for both action and flush. Scrutinize the trace carefully.
kIsH@Xhydra<>alter session set tracefile_identifier='swrf';
Session altered.
kIsH@Xhydra<>alter session set "_swrf_test_action" = 28;
Session altered.
kIsH@Xhydra<>--Wait for sometime
kIsH@Xhydra<>
kIsH@Xhydra<>alter session set "_swrf_test_action" = 10;
Session altered.
kIsH@Xhydra<>--Wait for sometime
kIsH@Xhydra<>
kIsH@Xhydra<>--disable
kIsH@Xhydra<>alter session set "_swrf_test_action" = 29;
Session altered.
kIsH@Xhydra<>alter session set "_swrf_test_action" = 11;
Session altered.
8)Alternatively, trace can also be enabled in this way
# trace AWR snapshots
kIsH@Xhydra<>alter session set events 'immediate trace name awr_test level 1';
Session altered.
# trace AWR purging
kIsH@Xhydra<>alter session set events 'immediate trace name awr_test level 2';
Session altered.
# trace AWR SQL
kIsH@Xhydra<>alter session set events 'immediate trace name awr_test level 3';
kIsH@Xhydra<>select QUALIFIER_ID1,QUALIFIER_ID2 from dba_enabled_traces;
QUALIFIER_ID1
----------------------------------------------------------------
QUALIFIER_ID2
----------------------------------------------------------------
MMON_SLAVE
Auto-Flush Slave Action
MMON_SLAVE
Remote-Flush Slave Action
Session altered.
# Turn all of them off
kIsH@Xhydra<>alter session set events 'immediate trace name awr_test level 4';
Session altered.
9)Recreate AWR workload repository by executing the below script. This will recreate the table wrm$_database_instance which will be accessed by wrm$_snapshot. Then execute a manual snapshot
wrm$_snapshot —> wrm$_database_instance
kIsH@Xhydra<>@?/rdbms/admin/catnoawr.sql
kIsH@Xhydra<>select STARTUP_TIME,VERSION,DB_NAME,INSTANCE_NAME,LAST_ASH_SAMPLE_ID,PLATFORM_NAME from wrm$_database_instance;
select STARTUP_TIME,VERSION,DB_NAME,INSTANCE_NAME,LAST_ASH_SAMPLE_ID,PLATFORM_NAME from wrm$_database_instance
*
ERROR at line 1:
ORA-00942: table or view does not exist
kIsH@Xhydra<>@?/rdbms/admin/catawr.sql
kIsH@Xhydra<>startup force;
ORACLE instance started.
Total System Global Area 1459614512 bytes
Fixed Size 9134896 bytes
Variable Size 369098752 bytes
Database Buffers 1073741824 bytes
Redo Buffers 7639040 bytes
Database mounted.
Database opened.
kIsH@Xhydra<>BEGIN
DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT('TYPICAL');
END;
/ 2 3 4
PL/SQL procedure successfully completed.
10)Check if AWR license is enabled
kIsH@Xhydra<>show parameter control_mana
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
control_management_pack_access string DIAGNOSTIC+TUNING
11)Check the statistics level value. It should be BASIC
kIsH@Xhydra<>show parameter statistics_level
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
client_statistics_level string TYPICAL
statistics_level string TYPICAL
12)Check sysaux space availability using awrinfo report
Warning: Non Default AWR Setting!
--------------------------------------------------------------------------------
Snapshot interval is 60 minutes and Retention is 8 days
(1a) SYSAUX usage - Schema breakdown (dba_segments)
*****************************************************
|
| Total SYSAUX size 798.1 MB ( 2% of 32,768.0 MB MAX with AUTOEXTEND ON )
|
| Schema SYS occupies 497.9 MB ( 62.4% )
| Schema MDSYS occupies 201.2 MB ( 25.2% )
| Schema XDB occupies 68.4 MB ( 8.6% )
| Schema SYSTEM occupies 12.6 MB ( 1.6% )
| Schema WMSYS occupies 6.6 MB ( 0.8% )
| Schema DVSYS occupies 4.6 MB ( 0.6% )
(5c) AWR snapshots -- OLDEST Non-Baselined snapshots
**********************************
DBID INST SNAP_ID ENDTM STATUS ERROR_COUNT
---------- ----- ---------- ----------------- ------ -----------
1010810948 1 304 22:45:40 (08/04) 0 0
**********************************
(6) AWR Control Settings - interval, retention
**********************************
DBID LSNAPID LSPLITID LSNAPTIME LPURGETIME FLAG INTERVAL RETENTION VRSN
----------- -------- -------- -------------- -------------- ----- ----------------- ----------------- ----
1010810948 370 370 08/07 18:21:40 08/07 18:40:35 2 +00000 01:00:00.0 +00008 00:00:00.0 30
**********************************
(7a) AWR Contents - row counts for each snapshots
**********************************
SNAP_ID INST ASH SQL SQBND FILES SEGST SYSEVT
---------- ----- ---------- ---------- ---------- ---------- ---------- ----------
321 1 0 68 224 0 66 135
13)Check the duration of flush for all AWR internal tables. Here WRH$_MVPARAMETER is the table which took 352 milliseconds which is not high but the topper.
kIsH@Xhydra<>set lines 200 pages 1000
col WRH_TABLE for a20
col TIME for a20
select tb.TABLE_NAME_KEWRTB WRH_TABLE,
(sd.END_TIME - sd.BEGIN_TIME) TIME
from wrm$_snapshot_details sd
inner join x$KEWRTB tb on (sd.TABLE_ID = tb.TABLE_ID_KEWRTB)
order by TIME desc
fetch next 5 rows only;
WRH_TABLE TIME
-------------------- --------------------
WRH$_MVPARAMETER +000000000 00:00:00.
352
SQL Group +000000000 00:00:00.
261
SQL Group +000000000 00:00:00.
245
SQL Group +000000000 00:00:00.
240
WRH$_MVPARAMETER +000000000 00:00:00.
226
Action:
In some cases, MMON logs are filled with below message
KEWRAFC: Flush slave failed, AWR Enqueue Timeout
Disable flush of the table which consume significant amount of time
alter system set "_awr_disabled_flush_tables"='WRH$_MVPARAMETER';
If MMON traces show below message, then disable below parameter. A single database may run out of CPU resources if the server is overloaded, especially if AWR snapshots are being taken when there is significant consumption.
ORA-12751 “cpu time or run time policy violation”
alter system set "_awr_mmon_cpuusage" = false;
14)Restart MMON process by enabling restricted session and disable it immediately. Downtime is needed for this activity because only users with restricted session privilege can access the database
kIsH@Xhydra<>alter system enable restricted session;
System altered.
kIsH@Xhydra<>alter system disable restricted session;
System altered.
15)Kill MMON process from OS level. After killing MMON, immediately fresh MMON process should automatically spawn. The snapshot integrity will be broken after killing the MMON process specifically in RAC instance with random timings in each nodes. If not, then try to enable and disable restricted session again. If still MMON process is not starting, then database needs a reboot
[oracle@xhydra ~]$ ps -ef|grep mmon
oracle 3927 1 0 17:41 ? 00:00:02 ora_mmon_db9zx
16)If all the above options do not work, then reboot the database as a last resort
kIsH@Xhydra<>shu immediate
kIsH@Xhydra<>startup;
17)If the database is in multitenant mode and snapshots are not generated in pdb, then refer below post. You may also encounter “ORA-20200” during AWR report generation
http://alphaoragroup.com/2022/07/01/enable-snapshotawr-or-addm-or-ash-in-pdb-oracle/
18) Enable 10046 trace with manual snapshot generation and isolate the query which takes long time
kIsH@Xhydra<>alter session set tracefile_identifier=SNAPTRACE;
Session altered.
kIsH@Xhydra<>alter session set events '10046 trace name context forever,level 12';
Session altered.
kIsH@Xhydra<>exec DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();
PL/SQL procedure successfully completed.
kIsH@Xhydra<>alter session set events '10046 trace name context off';
Session altered.
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 439 0.01 0.01 0 0 0 0
Execute 757 0.30 0.32 323 1384 9527 7888
Fetch 753 0.06 0.07 0 3824 1 1022
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1949 0.39 0.41 323 5208 9528 8910
The _kebm_suspension_time parameter, which by default is set to 82800 sec, controls the suspend time. In addition, _kebm_nstrikes (default = 3) sets the threshold at which the m00N worker will be rendered inoperable due to a policy violation.
kIsH@Xhydra<>set lines 200 pages 1000
col "Hidden_parameter" for a20
col "It's Value" for a20
select x.ksppinm as Hidden_parameter,
y.ksppstvl "It's Value"
from
x$ksppi x
inner join x$ksppcv y on (x.indx = y.indx)
and
x.ksppinm in ('_kebm_suspension_time','_kebm_nstrikes')
kIsH@Xhydra<>kIsH@Xhydra<>kIsH@Xhydra<> 2 3 4 5 6 7 8 order by x.ksppinm;
HIDDEN_PARAMETER It's Value
-------------------- --------------------
_kebm_nstrikes 3
_kebm_suspension_tim 104400
e
Check the suspension time for MMON and reinitiate MMON
kIsH@Xhydra<>oradebug unit_test kebm_dmp_slv_attrs kewrmrfsa_
Status: 2
Flags: 0
Runtime limit: 1800
CPU time limit: 600
Violations: 0
Suspended until: 0
kIsH@Xhydra<>oradebug unit_test kebm_set_slv_attrs kewrmafsa_ retain retain retain retain 0 0
Modified attributes of kewrmafsa_ (slave id 29)
Hope this post helps and deserves a like!