WHY AWR SNAPSHOTS ARE NOT GENERATED SUDDENLY?- UNABLE TO GENERATE ANY REPORT AWR/ADDM/ASH REPORT IN ORACLE DATABASE

WHY AWR SNAPSHOTS ARE NOT GENERATED SUDDENLY?- UNABLE TO GENERATE ANY REPORT AWR/ADDM/ASH REPORT IN ORACLE DATABASE

AWR snapshots are crucial for the oracle performance diagnosis report to troubleshoot any critical slowness in the database operations

If snapshots are not either fully or partially collected due to some reason, then valuable information is lost which will be really revenue bleed for any organization without any RCA or Proactive tuning. There are two issues which needs to be solved in this case,

  • AWR snapshots are not generated
  • If there is any performance issue which cannot be diagnosed due to lack of snapshots, then it is a cause of point 1
  • Alternatively, statspack can be used but enabling both at the same time needs expensive disks

Hence the following steps should be helpful to diagnose snapshot generation issues

Perform the prechecks:

1)Check the retention and snap interval details

kIsH@Xhydra<>col SNAP_INTERVAL for 999999999999
kIsH@Xhydra<>col RETENTION for 99999999999
kIsH@Xhydra<>select SYSTIMESTAMP,MOST_RECENT_SNAP_TIME,extract(HOUR from SNAP_INTERVAL) SNAP_INTERVAL,extract(DAY from RETENTION) RETENTION from wrm$_wr_control;

SYSTIMESTAMP         MOST_RECENT_SNAP_TIM SNAP_INTERVAL    RETENTION
-------------------- -------------------- ------------- ------------
07-AUG-22 06.24.30.4 07-AUG-22 06.21.40.5             1            8
19756 PM +05:30      87 PM

2)Check the lag count of snaps to get a hint on time difference between each hour. If the STATUS is 1, then either snapshot generation is stuck or failed due to some reason. FLUSH ELAPSED should not be 0 and ERROR COUNT should be 0. If both these conditions are not met, then the error should be investigated

kIsH@Xhydra<>set lines 200 pages 1000
col SNAP_ID for 99999
col HOUR for 999999999
col BEGIN_INTERVAL_TIME for a20
col FLUSH_ELAPSED for a20
select SNAP_ID,
       BEGIN_INTERVAL_TIME,
       extract(HOUR FROM BEGIN_INTERVAL_TIME) HOUR,
       (extract(HOUR FROM BEGIN_INTERVAL_TIME)) - (lag(extract(HOUR FROM BEGIN_INTERVAL_TIME)) over (order by BEGIN_INTERVAL_TIME)) LAG,
           ERROR_COUNT,
           FLUSH_ELAPSED,
           STATUS
from wrm$_snapshot
order by SNAP_ID
fetch next 10 rows only;  

SNAP_ID BEGIN_INTERVAL_TIME        HOUR        LAG ERROR_COUNT FLUSH_ELAPSED            STATUS
------- -------------------- ---------- ---------- ----------- -------------------- ----------
    304 04-AUG-22 10.35.38.0         22                      0 +00000 00:00:00.6             0
        00 PM

    305 05-AUG-22 11.31.19.0         11        -11           0 +00000 00:00:01.5             0
        00 AM

    306 05-AUG-22 11.45.11.0         11          0           0 +00000 00:00:00.1             0
        00 AM

    307 05-AUG-22 12.55.27.8         12          1           0 +00000 00:00:00.1             0
        67 PM

    308 05-AUG-22 09.57.39.0         21          9           0 +00000 00:00:00.8             0
        00 PM

    309 06-AUG-22 01.18.08.0         13         -8           0 +00000 00:00:01.8             0
        00 PM

    310 06-AUG-22 01.28.14.2         13          0           0 +00000 00:00:00.4             0
        27 PM

    311 06-AUG-22 02.55.34.5         14          1           0 +00000 00:00:00.8             0
        63 PM

    312 06-AUG-22 03.55.12.2         15          1           0 +00000 00:00:00.5             0
        13 PM

    313 06-AUG-22 04.55.26.4         16          1           0 +00000 00:00:00.1             0
        89 PM


10 rows selected.

3)Below queries can assist in getting snap details with timings

select SNAP_ID, lag(extract(HOUR FROM BEGIN_INTERVAL_TIME)) over (order by BEGIN_INTERVAL_TIME) * 60 * 60 * 24) from wrm$_snapshot order by SNAP_ID;

select SNAP_ID, (extract(HOUR FROM END_INTERVAL_TIME) - extract(HOUR FROM BEGIN_INTERVAL_TIME)) from wrm$_snapshot order by SNAP_ID;

     select count(*) TOTAL_SNAPS_TODAY
     from wrm$_snapshot
     where to_char(BEGIN_INTERVAL_TIME,'DD-MON-RR')=trunc(sysdate)
     group by to_char(BEGIN_INTERVAL_TIME,'DD-MON-RR');

     select 24 - count(*) SNAPS_PEND_TODAY
     from wrm$_snapshot
     where to_char(BEGIN_INTERVAL_TIME,'DD-MON-RR')=trunc(sysdate)
     group by to_char(BEGIN_INTERVAL_TIME,'DD-MON-RR');

kIsH@Xhydra<>select min(snap_id), max(snap_id) from dba_hist_snapshot;

MIN(SNAP_ID) MAX(SNAP_ID)
------------ ------------
         244          264

4)Below query should at least contain one row

kIsH@Xhydra<>set lines 200 pages 1000
kIsH@Xhydra<>col PLATFORM_NAME for a20
kIsH@Xhydra<>col STARTUP_TIME for a20
kIsH@Xhydra<>select STARTUP_TIME,VERSION,DB_NAME,INSTANCE_NAME,LAST_ASH_SAMPLE_ID,PLATFORM_NAME from wrm$_database_instance;

STARTUP_TIME         VERSION           DB_NAME   INSTANCE_NAME    LAST_ASH_SAMPLE_ID PLATFORM_NAME
-------------------- ----------------- --------- ---------------- ------------------ --------------------
22-JUL-22 03.49.59.0 19.0.0.0.0        TEST      test                          53924 Linux x86 64-bit
00 PM

5)Check if the database was down during the period, when the snapshot were not taken. Also check if the database is in restricted mode or open read only

kIsH@Xhydra<>select to_char(STARTUP_TIME,'dd-mm-yy hh24:mi:ss') UPTIME,open_mode from v$instance,v$database;

UPTIME            OPEN_MODE
----------------- --------------------
08-08-22 17:36:48 READ WRITE

6)If the database is in restricted mode, then enable snapshots using below procedure

exec dbms_workload_repository.control_restricted_snapshot(allow => TRUE);

6)Check for any errors in wrm$_snap_error

kIsH@Xhydra<>select * from wrm$_snap_error order by snap_id;

no rows selected

Diagnosis:

7)Enable MMON trace for both action and flush. Scrutinize the trace carefully.

kIsH@Xhydra<>alter session set tracefile_identifier='swrf';

Session altered.

kIsH@Xhydra<>alter session set "_swrf_test_action" = 28;

Session altered.

kIsH@Xhydra<>--Wait for sometime
kIsH@Xhydra<>
kIsH@Xhydra<>alter session set "_swrf_test_action" = 10;

Session altered.

kIsH@Xhydra<>--Wait for sometime
kIsH@Xhydra<>
kIsH@Xhydra<>--disable
kIsH@Xhydra<>alter session set "_swrf_test_action" = 29;

Session altered.

kIsH@Xhydra<>alter session set "_swrf_test_action" = 11;

Session altered.

8)Alternatively, trace can also be enabled in this way

# trace AWR snapshots
kIsH@Xhydra<>alter session set events 'immediate trace name awr_test level 1';

Session altered.

# trace AWR purging
kIsH@Xhydra<>alter session set events 'immediate trace name awr_test level 2';

Session altered.

# trace AWR SQL
kIsH@Xhydra<>alter session set events 'immediate trace name awr_test level 3';

kIsH@Xhydra<>select QUALIFIER_ID1,QUALIFIER_ID2 from dba_enabled_traces;

QUALIFIER_ID1
----------------------------------------------------------------
QUALIFIER_ID2
----------------------------------------------------------------
MMON_SLAVE
Auto-Flush Slave Action

MMON_SLAVE
Remote-Flush Slave Action

Session altered.

# Turn all of them off
kIsH@Xhydra<>alter session set events 'immediate trace name awr_test level 4';

Session altered.

9)Recreate AWR workload repository by executing the below script. This will recreate the table wrm$_database_instance which will be accessed by wrm$_snapshot. Then execute a manual snapshot

wrm$_snapshot —> wrm$_database_instance

kIsH@Xhydra<>@?/rdbms/admin/catnoawr.sql

kIsH@Xhydra<>select STARTUP_TIME,VERSION,DB_NAME,INSTANCE_NAME,LAST_ASH_SAMPLE_ID,PLATFORM_NAME from wrm$_database_instance;
select STARTUP_TIME,VERSION,DB_NAME,INSTANCE_NAME,LAST_ASH_SAMPLE_ID,PLATFORM_NAME from wrm$_database_instance
                                                                                        *
ERROR at line 1:
ORA-00942: table or view does not exist

kIsH@Xhydra<>@?/rdbms/admin/catawr.sql

kIsH@Xhydra<>startup force;
ORACLE instance started.

Total System Global Area 1459614512 bytes
Fixed Size                  9134896 bytes
Variable Size             369098752 bytes
Database Buffers         1073741824 bytes
Redo Buffers                7639040 bytes
Database mounted.
Database opened.
kIsH@Xhydra<>BEGIN
DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT('TYPICAL');
END;
/  2    3    4

PL/SQL procedure successfully completed.

10)Check if AWR license is enabled

kIsH@Xhydra<>show parameter control_mana

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
control_management_pack_access       string      DIAGNOSTIC+TUNING

11)Check the statistics level value. It should be BASIC

kIsH@Xhydra<>show parameter statistics_level

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
client_statistics_level              string      TYPICAL
statistics_level                     string      TYPICAL

12)Check sysaux space availability using awrinfo report

Warning: Non Default AWR Setting!
--------------------------------------------------------------------------------
Snapshot interval is 60 minutes and Retention is 8 days


(1a) SYSAUX usage - Schema breakdown (dba_segments)
*****************************************************
|
| Total SYSAUX size                        798.1 MB ( 2% of 32,768.0 MB MAX with AUTOEXTEND ON )
|
| Schema  SYS          occupies            497.9 MB (  62.4% )
| Schema  MDSYS        occupies            201.2 MB (  25.2% )
| Schema  XDB          occupies             68.4 MB (   8.6% )
| Schema  SYSTEM       occupies             12.6 MB (   1.6% )
| Schema  WMSYS        occupies              6.6 MB (   0.8% )
| Schema  DVSYS        occupies              4.6 MB (   0.6% )
(5c) AWR snapshots -- OLDEST Non-Baselined snapshots
**********************************

      DBID  INST    SNAP_ID ENDTM             STATUS ERROR_COUNT
---------- ----- ---------- ----------------- ------ -----------
1010810948     1        304 22:45:40 (08/04)       0           0

**********************************
(6) AWR Control Settings - interval, retention
**********************************

       DBID  LSNAPID LSPLITID LSNAPTIME      LPURGETIME      FLAG INTERVAL          RETENTION         VRSN
----------- -------- -------- -------------- -------------- ----- ----------------- ----------------- ----
 1010810948      370      370 08/07 18:21:40 08/07 18:40:35     2 +00000 01:00:00.0 +00008 00:00:00.0   30

**********************************
(7a) AWR Contents - row counts for each snapshots
**********************************

   SNAP_ID  INST        ASH        SQL      SQBND      FILES      SEGST     SYSEVT
---------- ----- ---------- ---------- ---------- ---------- ---------- ----------
       321     1          0         68        224          0         66        135

13)Check the duration of flush for all AWR internal tables. Here WRH$_MVPARAMETER is the table which took 352 milliseconds which is not high but the topper.

kIsH@Xhydra<>set lines 200 pages 1000
col WRH_TABLE for a20
col TIME for a20
select tb.TABLE_NAME_KEWRTB WRH_TABLE,
       (sd.END_TIME - sd.BEGIN_TIME) TIME
from wrm$_snapshot_details sd
inner join x$KEWRTB tb on (sd.TABLE_ID = tb.TABLE_ID_KEWRTB)
order by TIME desc
fetch next 5 rows only;

WRH_TABLE            TIME
-------------------- --------------------
WRH$_MVPARAMETER     +000000000 00:00:00.
                     352

SQL Group            +000000000 00:00:00.
                     261

SQL Group            +000000000 00:00:00.
                     245

SQL Group            +000000000 00:00:00.
                     240

WRH$_MVPARAMETER     +000000000 00:00:00.
                     226

Action:

In some cases, MMON logs are filled with below message

KEWRAFC: Flush slave failed, AWR Enqueue Timeout

Disable flush of the table which consume significant amount of time

alter system set "_awr_disabled_flush_tables"='WRH$_MVPARAMETER';

If MMON traces show below message, then disable below parameter. A single database may run out of CPU resources if the server is overloaded, especially if AWR snapshots are being taken when there is significant consumption.

ORA-12751 “cpu time or run time policy violation”

alter system set "_awr_mmon_cpuusage" = false;

14)Restart MMON process by enabling restricted session and disable it immediately. Downtime is needed for this activity because only users with restricted session privilege can access the database

kIsH@Xhydra<>alter system enable restricted session;

System altered.

kIsH@Xhydra<>alter system disable restricted session;

System altered.

15)Kill MMON process from OS level. After killing MMON, immediately fresh MMON process should automatically spawn. The snapshot integrity will be broken after killing the MMON process specifically in RAC instance with random timings in each nodes. If not, then try to enable and disable restricted session again. If still MMON process is not starting, then database needs a reboot

[oracle@xhydra ~]$ ps -ef|grep mmon
oracle      3927       1  0 17:41 ?        00:00:02 ora_mmon_db9zx

16)If all the above options do not work, then reboot the database as a last resort

kIsH@Xhydra<>shu immediate
kIsH@Xhydra<>startup;

17)If the database is in multitenant mode and snapshots are not generated in pdb, then refer below post. You may also encounter “ORA-20200” during AWR report generation

http://alphaoragroup.com/2022/07/01/enable-snapshotawr-or-addm-or-ash-in-pdb-oracle/

18) Enable 10046 trace with manual snapshot generation and isolate the query which takes long time

kIsH@Xhydra<>alter session set tracefile_identifier=SNAPTRACE;

Session altered.

kIsH@Xhydra<>alter session set events '10046 trace name context forever,level 12';

Session altered.

kIsH@Xhydra<>exec DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();

PL/SQL procedure successfully completed.

kIsH@Xhydra<>alter session set events '10046 trace name context off';

Session altered.

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse      439      0.01       0.01          0          0          0           0
Execute    757      0.30       0.32        323       1384       9527        7888
Fetch      753      0.06       0.07          0       3824          1        1022
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     1949      0.39       0.41        323       5208       9528        8910

The _kebm_suspension_time  parameter, which by default is set to 82800 sec, controls the suspend time. In addition, _kebm_nstrikes  (default = 3) sets the threshold at which the m00N worker will be rendered inoperable due to a policy violation.

kIsH@Xhydra<>set lines 200 pages 1000
col "Hidden_parameter" for a20
col "It's Value" for a20
select x.ksppinm as Hidden_parameter,
            y.ksppstvl "It's Value"
     from
  x$ksppi x
  inner join x$ksppcv y on (x.indx = y.indx)
and
  x.ksppinm in ('_kebm_suspension_time','_kebm_nstrikes')
kIsH@Xhydra<>kIsH@Xhydra<>kIsH@Xhydra<>  2    3    4    5    6    7    8  order by x.ksppinm;

HIDDEN_PARAMETER     It's Value
-------------------- --------------------
_kebm_nstrikes       3
_kebm_suspension_tim 104400
e

Check the suspension time for MMON and reinitiate MMON

kIsH@Xhydra<>oradebug unit_test kebm_dmp_slv_attrs kewrmrfsa_
Status: 2
Flags: 0
Runtime limit: 1800
CPU time limit: 600
Violations: 0
Suspended until: 0


kIsH@Xhydra<>oradebug unit_test kebm_set_slv_attrs kewrmafsa_ retain retain retain retain 0 0
Modified attributes of kewrmafsa_ (slave id 29)

Hope this post helps and deserves a like!

Leave a Reply

Discover more from XscalibaL

Subscribe now to keep reading and get access to the full archive.

Continue reading