FASTER RECOVERY AFTER CRASH FAILURE OR FASTER RESPONSE TIME BEFORE CRASH FAILURE?

Recovery is the process that uses recorded redo logs to automatically rebuild data blocks after a failure.

There are two types of recovery:

  • Instance recovery (RAC)
  • Crash recovery (standalone)

A graceful shutdown of the database (shutdown normal or immediate) initiates a checkpoint during the close operation to flush the dirty blocks from the buffer cache to the datafiles.

An abnormal shutdown or shutdown abort closes the instance immediately without any checkpoint. When the instance is started again, the database has to go through two phases:

  • Redo recovery (cache) – roll forward: apply redo from the online redo logs to the affected data blocks
  • Undo recovery (transactions) – roll back: discard uncommitted transactions using undo

Checkpoints:

Frequent checkpoints speed up instance recovery, but the trade-off is performance degradation during heavy update activity. So determine the type of database and configure accordingly. For example, if a database must meet tier-1 datacenter goals such as 99.7% uptime (banking, finance, payment processing, etc.), then the system needs to perform well at all times. In this case, configuring an aggressive MTTR for fast recovery trades away that performance: the cost paid 99% of the time outweighs the benefit of fast recovery from the rare 1% crash.

But if the business or application is less concerned about performance than about fast recovery after failures (for example, commodity hardware that crashes frequently), then consider setting the MTTR (mean time to recover) target.

While the database is up and running normally, DBWR flushes to disk all dirty blocks whose SCN is lower than the checkpoint SCN during each checkpoint.

After a crash, only changes with SCNs greater than the checkpoint SCN are applied to the data blocks as part of the recovery process.
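This boundary can be shown with a toy model (hypothetical SCN values, not Oracle internals): redo at or below the checkpoint SCN is already reflected in the datafiles, so only the portion above it must be rolled forward.

```python
# Toy model of the checkpoint SCN boundary (hypothetical values,
# not Oracle internals): redo records at or below the checkpoint
# SCN are already on disk; only records above it must be rolled
# forward during crash recovery.

def redo_to_apply(redo_scns, ckpt_scn):
    """Return the SCNs that crash recovery must replay."""
    return [scn for scn in redo_scns if scn > ckpt_scn]

redo_stream = [8023660, 8023673, 8023900, 8024014, 8024043]
ckpt_scn = 8023673

pending = redo_to_apply(redo_stream, ckpt_scn)
print(pending)  # only the SCNs above the checkpoint need recovery
```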

Configure the redo log size so that there is roughly one log switch every 20 minutes, i.e., three to four switches per hour.

A common misconception is that LGWR suffers performance issues from the redo log size, since it is the process that writes to the redo log files.

But that is not the case. The size of the log files indirectly influences the performance of CKPT and DBWR, not LGWR.

CKPT performance depends on the redo log size and the fast_start_mttr_target parameter.

If the redo logs are small and the fast_start_mttr_target value is low, CKPT and the DBWR processes cannot afford to be lazy 🙂

CKPT has to work fast, frequently advancing the checkpoint SCN and getting the data written to disk, to maintain the recovery-time SLA for a "once in a blue moon" crash failure.

This is why the redo logs should not be undersized: small logs add extra checkpoint overhead and cause overlapping checkpoints ("Checkpoint not complete").

Check whether the MTTR advisor is enabled in the database. Setting fast_start_mttr_target triggers incremental checkpoints to meet a recovery-time goal expressed in seconds. With conventional checkpointing, DBWR lazily defers writes and then bulk-loads dirty blocks up to the checkpoint SCN, causing sudden I/O spikes; this deferral is also the main reason more pending redo accumulates to be applied during crash recovery.
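The difference in write patterns can be sketched with a toy simulation (illustrative numbers only, not Oracle internals): incremental checkpointing spreads DBWR writes out, while conventional checkpointing defers them into one burst.

```python
# Toy illustration (not Oracle internals): incremental checkpointing
# smooths DBWR writes over time, while a conventional checkpoint
# defers them into a single large I/O burst.
dirty_per_tick = [50] * 10  # dirty blocks generated per time tick

# Conventional: nothing is written until the checkpoint at the end,
# so the final tick carries the whole backlog.
conventional_writes = [0] * 9 + [sum(dirty_per_tick)]

# Incremental: each tick's dirty blocks are written as they accumulate.
incremental_writes = dirty_per_tick[:]

print(max(conventional_writes))  # peak I/O of the bulk checkpoint
print(max(incremental_writes))   # much lower, steady peak I/O
```

Both schemes write the same total amount; only the peak differs, which is why incremental checkpoints avoid the I/O spike at the cost of more frequent writes.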

The following parameters should be enabled (statistics_level must be TYPICAL or ALL for the MTTR advisor):

kIsH@STAX<>show parameter statistics_l

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
client_statistics_level              string      TYPICAL
statistics_level                     string      TYPICAL

kIsH@STAX<>show parameter MTTR

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_mttr_target               integer     0

kIsH@STAX<>set lines 200 pages 1000
kIsH@STAX<>SELECT RECOVERY_ESTIMATED_IOS,ACTUAL_REDO_BLKS,TARGET_REDO_BLKS,TARGET_MTTR,ESTIMATED_MTTR,OPTIMAL_LOGFILE_SIZE from v$instance_recovery;

RECOVERY_ESTIMATED_IOS ACTUAL_REDO_BLKS TARGET_REDO_BLKS TARGET_MTTR ESTIMATED_MTTR OPTIMAL_LOGFILE_SIZE
---------------------- ---------------- ---------------- ----------- -------------- --------------------
                    46              110           663552           0             11

The checkpoint parameters log_checkpoint_interval and fast_start_io_target below should remain disabled (0) when MTTR is enabled.

kIsH@STAX<>show parameter log_check

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_checkpoint_interval              integer     0
log_checkpoint_timeout               integer     1800
log_checkpoints_to_alert             boolean     FALSE

kIsH@STAX<>show parameter fast_start

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_io_target                 integer     0
fast_start_mttr_target               integer     1
fast_start_parallel_rollback         string      LOW

Take a backup of the existing database using RMAN:

RMAN> run
2> {
3> backup as compressed backupset database plus archivelog delete input;
4> backup current controlfile;
5> }

TEST 1:

Simulate the issue by aborting the instance during heavy DML:

kIsH@STAX<>conn kish/password
Connected.
kIsH@STAX<>insert into lob2 select * from lob1;

100000 rows created.
--Another session
kIsH@STAX<>UPDATE ztbl SET city='None' WHERE xid between 1000000 and 10000000;

1000000 rows updated.

kIsH@STAX<>conn / as sysdba
Connected.
kIsH@STAX<>shu abort
ORACLE instance shut down.

Thread 1 cannot allocate new log, sequence 131
Checkpoint not complete <===== Happens due to small redo log file
  Current log# 1 seq# 130 mem# 0: /u01/app/oracle/oradata/EXHYDRA/onlinelog/o1_mf_1_kryzocqy_.log
  Current log# 1 seq# 130 mem# 1: /apps01/RECO/EXHYDRA/onlinelog/o1_mf_1_kryzod3g_.log

2023-01-31T21:20:18.178373+05:30
Thread 1 advanced to log sequence 131 (LGWR switch),  current SCN: 8024014
  Current log# 2 seq# 131 mem# 0: /u01/app/oracle/oradata/EXHYDRA/onlinelog/o1_mf_2_kryzocrs_.log
  Current log# 2 seq# 131 mem# 1: /apps01/RECO/EXHYDRA/onlinelog/o1_mf_2_kryzod2h_.log
2023-01-31T21:20:18.633187+05:30
ARC1 (PID:10306): Archived Log entry 118 added for T-1.S-130 ID 0x28507ac1 LAD:1
2023-01-31T21:20:31.056345+05:30
Shutting down ORACLE instance (abort) (OS id: 33052)
2023-01-31T21:20:31.066008+05:30
Shutdown is initiated by sqlplus@staxhost (TNS V1-V3).

The alert log records crash recovery starting from the checkpoint SCN:

2023-01-31T21:42:51.987587+05:30
ALTER DATABASE OPEN
Smart fusion block transfer is disabled:
  instance mounted in exclusive mode.
2023-01-31T21:42:51.999312+05:30
Crash Recovery excluding pdb 2 which was cleanly closed.
2023-01-31T21:42:52.003721+05:30
Beginning crash recovery of 1 threads
 Thread 1: Recovery starting at checkpoint rba (logseq 129 block 189187), scn 0 <======= Check point 
2023-01-31T21:42:52.006353+05:30
Started redo scan
2023-01-31T21:42:52.382964+05:30
Completed redo scan
 read 331728 KB redo, 18875 data blocks need recovery
2023-01-31T21:42:52.592817+05:30
Started redo application at
 Thread 1: logseq 129, block 189187, offset 0
2023-01-31T21:42:52.593587+05:30
Recovery of Online Redo Log: Thread 1 Group 3 Seq 129 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/EXHYDRA/onlinelog/o1_mf_3_kryzodxp_.log
  Mem# 1: /apps01/RECO/EXHYDRA/onlinelog/o1_mf_3_kryzof5n_.log
2023-01-31T21:42:53.138252+05:30
Recovery of Online Redo Log: Thread 1 Group 1 Seq 130 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/EXHYDRA/onlinelog/o1_mf_1_kryzocqy_.log
  Mem# 1: /apps01/RECO/EXHYDRA/onlinelog/o1_mf_1_kryzod3g_.log
2023-01-31T21:42:54.598711+05:30
Recovery of Online Redo Log: Thread 1 Group 2 Seq 131 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/EXHYDRA/onlinelog/o1_mf_2_kryzocrs_.log
  Mem# 1: /apps01/RECO/EXHYDRA/onlinelog/o1_mf_2_kryzod2h_.log
2023-01-31T21:42:54.763982+05:30
Completed redo application of 267.93MB
2023-01-31T21:42:54.794460+05:30
Completed crash recovery at
 Thread 1: RBA 131.59771.16, nab 59771, scn 0x00000000007a6feb
 18875 data blocks read, 18875 data blocks written, 331728 redo k-bytes read

RBA information and other recovery details can be queried from the fixed tables below:

kIsH@STAX<>select * from X$KCTICW;

ADDR                   INDX    INST_ID     CON_ID MTTR_WRITES LOGFILE_SIZE_WRITES CKPT_SETTING_WRITES OTHER_WRITES AUTO_WRITES FULL_WRITES
---------------- ---------- ---------- ---------- ----------- ------------------- ------------------- ------------ ----------- -----------
00007F52C45654C0          0          1          0           0                   0                   0            0        3400           0
kIsH@STAX<>SELECT ADDR,INDX,LOGFILESZ,TOTALLOGSZ,TARGET_RBA_SEQ from x$targetrba;

ADDR                   INDX  LOGFILESZ TOTALLOGSZ TARGET_RBA_SEQ
---------------- ---------- ---------- ---------- --------------
00007F52C458FAD8          0     663552    1228800              0

kIsH@STAX<>set lines 200 pages 1000
kIsH@STAX<>col NAME for a20
kIsH@STAX<>SELECT NAME,THREAD#,SEQUENCE#,FIRST_CHANGE#,NEXT_CHANGE# from v$archived_log WHERE SEQUENCE#=129;

NAME                    THREAD#  SEQUENCE# FIRST_CHANGE# NEXT_CHANGE#
-------------------- ---------- ---------- ------------- ------------
                              1        129       8023673      8023842

SEQUENCE# is the ordinal position of a redo log file in the series of log switches from one file to the next, and the SCN is the change number recorded in the control file and datafile headers for every block change, to maintain consistency.

Redo log files of 200 MB each are not sufficient for this redo volume.

kIsH@STAX<>SELECT GROUP#,SEQUENCE#,(BYTES/1048576) MB,STATUS from v$log;

    GROUP#  SEQUENCE#         MB STATUS
---------- ---------- ---------- ----------------
         1        133        200 INACTIVE
         2        134        200 INACTIVE
         3        135        200 CURRENT

At times, 14 log switches per hour are observed for the insert and update workload, so the log file size should be increased. The math: 60/14 = 4.28 ≈ 4, meaning a log switch roughly every 4 minutes. To stretch that to one switch every 20 minutes, the logs need to be 20/4 = 5 times larger than the current size: 5 × 200 MB = 1 GB.

SQL QUERY TO FIND LOG SWITCH PER HOUR HISTORY IN ORACLE – eXhydra (alphaoragroup.com)

DAY  H0  H1  H2  H3  H4   H5   H6   H7   H8   H9  H10  H11  H12  H13  H14  H15  H16  H17  H18  H19  H20  H21  H22  H23
--- --- --- --- --- --- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
SUN   0   0   0   0   0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0
MON   0   0   0   0   0    0    0    0    0    0    0    0    0    0    0    0    0    2    0    0    0    0    0    0
TUE   0   0   0   0   0    0    0    0    0    0    0    0    0    0    3    0    2    5   14    0    7   11    0   14
WED   0   0   0   0   0    0    1   10    0    0    2    0    0    0    4    4    1    7    9    0    1    4    3    0
THU   0   0   0   0   0    0    0    0    0    0    0    0    0    0    3    1    0    4    9    0    0    0    0    0
FRI   0   0   0   0   0    0    0    0    0    0    0    0    0    0    1    0    1    0    0    0    0    4    1    0
SAT   0   0   0   0   0    0    0    0    0    0    0    1    1    0    3    0    0    0    0    0    0    0    0    0

7 rows selected.
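The sizing arithmetic above can be sketched as a quick calculation (the 14 switches/hour peak and 200 MB current size come from this test; the 20-minute target is the rule of thumb used earlier):

```python
# Redo log sizing from the observed switch rate (figures from the post).
switches_per_hour = 14           # observed peak log switches in one hour
current_size_mb = 200            # current redo log size
target_minutes_per_switch = 20   # rule of thumb: one switch every ~20 min

minutes_per_switch = 60 / switches_per_hour           # ~4.3 min per switch
growth_factor = target_minutes_per_switch / round(minutes_per_switch)
recommended_mb = growth_factor * current_size_mb

print(round(minutes_per_switch, 2))  # 4.29
print(growth_factor)                 # 5.0
print(recommended_mb)                # 1000.0 MB, i.e. ~1 GB
```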

The last checkpoint SCN is 8023673, and 8024043 is the highest SCN recorded in the redo logs. So 8024043 − 8023673 = 370 SCNs' worth of changes remain to be applied to the data blocks during crash recovery. All dirty blocks at or below 8023673 were already flushed to the datafiles.

Thread 1 checkpoint: logseq 129, block 2, scn 0x00000000007a6e79 <=====(8023673) Check point SCN
kctdrsrt: on-disk rba 131.59771.0
  cache-low rba: logseq 129, block 189187
    on-disk rba: logseq 131, block 59771, scn 0x00000000007a6feb <========(8024043) Higher SCN than check point which need to be applied from redo log file for crash recovery
Synchronized sync destination connectivity mask for thread 1 is 0x0
Using largest hash table of 262144 entries (2Mb), htbl_size = 2097232, max_alloc = 1073729536

*** 2023-01-31T21:42:52.377955+05:30 (CDB$ROOT(1))
Started writing zeroblks thread 1 seq 131 blocks 59771-59778

*** 2023-01-31T21:42:52.380670+05:30 (CDB$ROOT(1))
Completed writing zeroblks thread 1 seq 131
==== Redo read statistics for thread 1 ====
Total physical reads (from disk and memory): 334612Kb
-- Redo read_disk statistics --
Read rate (ASYNC): 331729Kb in 0.37s => 875.55 Mb/sec
Longest record: 118Kb, moves: 0/1213186 (0%)
Change moves: 19/22 (86%), moved: 0Mb
Longest LWN: 2521Kb, moves: 80/285 (28%), moved: 94Mb
Last redo scn: 0x00000000007a6fea (8024042) <======
----------------------------------------------
----- Recovery Hash Table Statistics ---------
Hash table buckets = 262144
Longest hash chain = 1
Average hash chain = 18875/18875 = 1.0
Max compares per lookup = 1
Avg compares per lookup = 2421574/2452540 = 1.0
----------------------------------------------

*** 2023-01-31T21:42:52.388997+05:30 (CDB$ROOT(1))
KCRA: start recovery claims for 18875 data blocks

*** 2023-01-31T21:42:52.592614+05:30 (CDB$ROOT(1))
KCRA: blocks processed = 18875/18875, claimed = 18875, eliminated = 0

*** 2023-01-31T21:42:52.593610+05:30 (CDB$ROOT(1))
Recovery of Online Redo Log: Thread 1 Group 3 Seq 129 Reading mem 0

*** 2023-01-31T21:42:53.138270+05:30 (CDB$ROOT(1))
Recovery of Online Redo Log: Thread 1 Group 1 Seq 130 Reading mem 0
*** 2023-01-31T21:42:54.598730+05:30 (CDB$ROOT(1))
Recovery of Online Redo Log: Thread 1 Group 2 Seq 131 Reading mem 0

*** 2023-01-31T21:42:54.763804+05:30 (CDB$ROOT(1))
Completed redo application of 267.93MB

*** 2023-01-31T21:42:54.794153+05:30 (CDB$ROOT(1))
Completed recovery checkpoint

TEST 2:

Let us test the MTTR parameter with a value of 1 second.

kIsH@STAX<>ALTER SYSTEM SET fast_start_mttr_target=1;

System altered.

kIsH@STAX<>SELECT OPTIMAL_LOGFILE_SIZE from v$instance_recovery;

OPTIMAL_LOGFILE_SIZE
--------------------
                 977

The MTTR advisor recommends an optimal log size of 977 MB. Rather than jumping straight to that value, let us start with 600 MB and increase gradually instead of making one sudden large jump.

kIsH@STAX<>ALTER database add logfile '/u01/app/oracle/oradata/EXHYDRA/onlinelog/redo01.log' size 600M;

Database altered.

kIsH@STAX<>ALTER database add logfile '/u01/app/oracle/oradata/EXHYDRA/onlinelog/redo02.log' size 600M;

Database altered.

kIsH@STAX<>ALTER database add logfile GROUP 6 '/apps01/RECO/EXHYDRA/onlinelog/redo01.log' size 600M;

Database altered.

kIsH@STAX<>ALTER database add logfile GROUP 7 '/apps01/RECO/EXHYDRA/onlinelog/redo02.log' size 600M;

Database altered.

kIsH@STAX<>ALTER database add logfile GROUP 8 '/u01/app/oracle/oradata/EXHYDRA/onlinelog/redo03.log' size 600M;

Database altered.

kIsH@STAX<>ALTER database add logfile GROUP 9 '/u01/app/oracle/oradata/EXHYDRA/onlinelog/redo05.log' size 600M;

Database altered.

kIsH@STAX<>alter database drop logfile group 1;

Database altered.

kIsH@STAX<>alter database drop logfile group 2;

Database altered.

kIsH@STAX<>alter database drop logfile group 3;

Database altered.
kIsH@STAX<>ALTER database add logfile MEMBER '/apps01/RECO/EXHYDRA/onlinelog/redo03.log' to GROUP 4;

Database altered.

kIsH@STAX<>ALTER database add logfile MEMBER '/apps01/RECO/EXHYDRA/onlinelog/redo04.log' to GROUP 5;

Database altered.

kIsH@STAX<>ALTER database add logfile MEMBER '/apps01/RECO/EXHYDRA/onlinelog/redo05.log' to GROUP 6;

Database altered.

kIsH@STAX<>ALTER database add logfile MEMBER '/apps01/RECO/EXHYDRA/onlinelog/redo06.log' to GROUP 7;

Database altered.
kIsH@STAX<>ALTER database add logfile MEMBER '/apps01/RECO/EXHYDRA/onlinelog/redo07.log' to GROUP 8;

Database altered.

kIsH@STAX<>ALTER database add logfile MEMBER '/apps01/RECO/EXHYDRA/onlinelog/redo08.log' to GROUP 9;

Database altered.

Verify the added log files:

kIsH@STAX<>set lines 200 pages 1000
kIsH@STAX<>select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME     CON_ID
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- ------------ --------- ----------
         4          1        145  629145600        512          2 NO  CURRENT                8512450 02-FEB-23   9.2954E+18                    0
         5          1          0  629145600        512          2 YES UNUSED                       0                      0                    0
         6          1          0  629145600        512          2 YES UNUSED                       0                      0                    0
         7          1          0  629145600        512          2 YES UNUSED                       0                      0                    0
         8          1          0  629145600        512          2 YES UNUSED                       0                      0                    0
         9          1          0  629145600        512          2 YES UNUSED                       0                      0                    0

Crash recovery took less than a second after the log resize and MTTR change. Also note that the amount of redo to be recovered dropped to 3.11 MB, from 267 MB in the previous test.
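To put the improvement in perspective, a quick comparison of the redo applied in the two tests (figures taken from the alert-log excerpts in this post):

```python
# Redo applied during crash recovery before and after resizing the
# logs and setting fast_start_mttr_target=1 (figures from the post).
redo_before_mb = 267.93   # TEST 1: 200 MB logs, no MTTR target
redo_after_mb = 3.11      # TEST 2: 600 MB logs, MTTR target of 1s

reduction = redo_before_mb / redo_after_mb
print(round(reduction, 1))  # roughly 86x less redo to roll forward
```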

Read rate (ASYNC): 442632Kb in 0.16s => 2701.61 Mb/sec



*** 2023-02-02T20:58:42.006593+05:30 (CDB$ROOT(1))
Recovery of Online Redo Log: Thread 1 Group 5 Seq 146 Reading mem 0

*** 2023-02-02T20:58:42.033915+05:30 (CDB$ROOT(1))
Recovery of Online Redo Log: Thread 1 Group 6 Seq 147 Reading mem 0

*** 2023-02-02T20:58:42.157070+05:30 (CDB$ROOT(1))
Completed redo application of 3.11MB
 <==== Amount of redo to recover at crash-recovery startup is small after the MTTR change!

Even though the MTTR parameter is set to recover from a failure in 1 second, Oracle may still sometimes take longer, depending on the redo log size and the amount of pending redo that needs to be applied. But it tries to hit the target (1 second) as closely as possible.

If both performance and faster recovery need to be balanced, set fast_start_mttr_target to as high a value as acceptable. The maximum value is 3600 seconds and the default is 0 (disabled).

SUMMARY:

  • Configuring MTTR causes frequent (incremental) checkpoints.
  • A low MTTR value gives faster crash recovery.
  • Roll forward reads and applies redo for blocks above the checkpoint SCN; recovery time depends on how much dirty-block redo sits above the checkpoint SCN.
