log file switch (checkpoint incomplete) oracle

log file switch (checkpoint incomplete) oracle

Log file switches happen during a check point activity to write the dirty buffers from buffer cache to datafiles which is performed by DBWR. LGWR process, as soon as the current redo log is full, it jumps to another redo log file to write the change vector. Redo’s will not be generated until the flush is completed by DBWR.

In this wait, the LGWR,

  • try to use the next log file from current
  • but the contents of next log file is still in check point phase.
  • has to wait for the DBWR to write the dirty buffer to disk

There may be multiple reasons for this event to occur and subsequent or underlying process need to be identified which is waited upon by LGWR

In AWR report, the top event is log file switch (checkpoint incomplete) and it waits for 1797 s in the total duration of the issue.

Next, check the number of log switches per hour. Normally, there should be 3-5 switches per hour which means every 20 minutes, there should be one switch. The calculation is as follows,

log switches = 10 ; duration = 60 mins ; Interval = 20 mins

===> Redo log size = 20/10 = 2 times bigger than the current size

kish@x3z6zx9<^>select GROUP#,count(*),bytes/1048576 MB from v$log group by bytes/1048576,GROUP#;

                                            GROUP#     COUNT(*)                                                 MB
-------------------------------------------------- ------------ --------------------------------------------------
                                                 2            1                                                 50
                                                 3            1                                                 50
                                                 1            1                                                 50

In our case, it is too that there are 1153 switches per hour!! šŸ™

Here for every 60/1161 = 0.052 min ~ 3.2 sec, there is a switch observed. Our goal is to size the redo logs in such a way that, there is a log switch for every 10 minutes.

log switches = 1161 ; duration = 60 mins ; Interval = 10 mins

===> Current redo log file size = 50 MB

===> New redo log file size = 10/0.052 = 192 times bigger than current size. But this is not feasible if there are no sufficient space in disk. Hence, test it with a reduced size and gradually reduce the waits.

Check the IOstat section for background processes which consumed most of the IO.

Here both LGWR and Logfile are on the top of the IO contributors

Check the histogram for long waits

If background waits are also related to log files, then there should be a thorough investigation on the LGWR, redo log sizing and MTTR.

Wait classes give a hint on the configuration class which means some configuration is not optimal. Here it is redo log.

Check for OS %iowait for any issues on IO service time from OS layer.

Check the size of the redo and number of requests

On checking the chains of wait in the ASH using ASHDIG.sql, the blocker processes during the issue time are log file parallel write and control file sequential read

  SID TIME               WHO     PROGRAM USERNAM STATE      W_CHAINS                                                     WHO_BLOCK_WHO
----- ------------------ ------- ------- ------- ---------- ------------------------------------------------------------ --------------------
    6 15-05-22 08:57:14  oracle  LGWR    SYS     LONG_WAIT  NULL                                                         |NULL blocks===> 6|
  145 15-05-22 08:57:16  sqlplus TNS V1- SYS     LONG_WAIT  |log file parallel write [|]<~~~ log file switch (checkpoint |6 blocks===> 145|
                                 V3                          incomplete)|

    6 15-05-22 08:57:17  oracle  LGWR    SYS     LONG_WAIT  NULL                                                         |NULL blocks===> 6|
   15 15-05-22 08:57:17  oracle  ARC2    SYS     LONG_WAIT  NULL                                                         |NULL blocks===> 15|
  145 15-05-22 08:57:17  sqlplus TNS V1- SYS     LONG_WAIT  NULL                                                         |NULL blocks===> 145
                                 V3                                                                                      |

  145 15-05-22 08:57:18  sqlplus TNS V1- SYS     LONG_WAIT  NULL                                                         |NULL blocks===> 145
                                 V3                                                                                      |

   15 15-05-22 08:57:18  oracle  ARC2    SYS     LONG_WAIT  NULL                                                         |NULL blocks===> 15|
  145 15-05-22 08:57:19  sqlplus TNS V1- SYS     LONG_WAIT  |control file sequential read [|]<~~~ log file switch (check |6 blocks===> 145|

Check the counts for the waits which wait for too long and sql which waits on it. More blocks are towards log file switch (checkpoint incomplete) waiting on control file sequential read. Hence we need to tune control files IO by placing or spreading multiplexed control files in different faster disks. Post that, log files switch need to be tuned by adding additional groups in redo logs or increasing the size of redo logs.

W_CHAINS                                                CNTR SQL_ID
-------------------------------------------------- --------- -------------
NULL                                                    1509
NULL                                                     333 99yaq3qcnc74t
|control file sequential read [|]<~~~ log file swi        94 99yaq3qcnc74t
tch (checkpoint incomplete)|

|log file sequential read [|]<~~~ log file switch         80 99yaq3qcnc74t
(checkpoint incomplete)|

|log file switch (checkpoint incomplete) [|]<~~~ l        67 99yaq3qcnc74t

Check the sql which causes the waits.

kish@x3z6zx9<^>select sql_text from v$sql where sql_id='99yaq3qcnc74t';

SQL_TEXT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
update kish.xtbl set salary=100000 where xid between 1830000 and 183090000

Action:

  • Add redo log groups in faster disks
  • Increase redo log file size
  • Use direct path load for insert statements
  • Use nologging for DML transactions
  • Disable force logging if enabled

Add new groups for redo log in faster disks

kish@x3z6zx9<^>alter database add logfile member '+REDOX' to group 4;

Database altered.

kish@x3z6zx9<^>alter database add logfile member '+REDOX' to group 5;

Database altered.

kish@x3z6zx9<^>alter database add logfile member '+REDOX' to group 6;

Database altered.

kish@x3z6zx9<^>alter database add logfile member '+REDOX' to group 7;

Database altered.

kish@x3z6zx9<^>alter database add logfile member '+REDOX' to group 8;

Database altered.

kish@x3z6zx9<^>alter database add logfile member '+REDOX' to group 9;

Database altered.

Check the new groups

kish@x3z6zx9<^>select * from v$logfile;

                                            GROUP# STATUS  TYPE
-------------------------------------------------- ------- -------
MEMBER
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
IS_
---
                                                 4 INVALID ONLINE
+REDOX/x3z6zx9/onlinelog/group_4.262.1104939995
NO

                                                 5 INVALID ONLINE
+REDOX/x3z6zx9/onlinelog/group_5.263.1104940005
NO

                                                 6 INVALID ONLINE
+REDOX/x3z6zx9/onlinelog/group_6.264.1104940011
NO

                                                 4         ONLINE
+REDOX/x3z6zx9/onlinelog/group_4.256.1104939409
NO

                                                 5         ONLINE
+REDOX/x3z6zx9/onlinelog/group_5.257.1104939429
NO

                                                 6         ONLINE
+REDOX/x3z6zx9/onlinelog/group_6.258.1104939453
NO

                                                 7         ONLINE
+REDOX/x3z6zx9/onlinelog/group_7.259.1104939455
NO

                                                 8         ONLINE
+REDOX/x3z6zx9/onlinelog/group_8.260.1104939459
NO

                                                 9         ONLINE
+REDOX/x3z6zx9/onlinelog/group_9.261.1104939461
NO

                                                 7 INVALID ONLINE
+REDOX/x3z6zx9/onlinelog/group_7.265.1104940019
NO

                                                 8 INVALID ONLINE
+REDOX/x3z6zx9/onlinelog/group_8.266.1104940025
NO

                                                 9 INVALID ONLINE
+REDOX/x3z6zx9/onlinelog/group_9.267.1104940031
NO


12 rows selected.

Post the addition of extra log files, new set of waits creep in.

W_CHAINS                                                CNTR SQL_ID
-------------------------------------------------- --------- -------------
NULL                                                    3556
NULL                                                    1356 99yaq3qcnc74t
NULL                                                      58 5mw55dp9p6nyt
NULL                                                      42 6b8yxwua9048n
|log file parallel write [|]<~~~ log buffer space|        27 99yaq3qcnc74t
NULL                                                      26 dksgsw2v42bru
|log file sequential read [|]<~~~ log buffer space        22 99yaq3qcnc74t
|

NULL                                                      17 gd3aqp7xuk8yd
NULL                                                      17 gf87bnxzf9raf
NULL                                                      13 gq4u3dppjkmyr
NULL                                                      10 6cahay53y2xz0
|log file sequential read [|]<~~~ log file switch          8 99yaq3qcnc74t
completion|

|control file sequential read [|]<~~~ log file swi         7 99yaq3qcnc74t
tch completion|

|control file parallel write [|]<~~~ log file swit         6 99yaq3qcnc74t
ch completion|

|log file sequential read [|]<~~~ enq: CF - conten         6
tion|

Post validate from AWR shows that “log file switch (checkpoint incomplete)” waits are slightly reduced but another new wait “log buffer space” appear on the top events. Also the log switch count is reduced. This is how the event is tuned

Summary:

  • Log file sizes should be optimally sized to reduce frequent checkpoints
  • Log file can be placed in faster disks to avoid IO contention
  • Check for dependent root wait causing log file switch (checkpoint incomplete) and tune the root culprit first

Leave a Reply

%d bloggers like this: