log file switch (archiving needed) in oracle

log file switch happens after every check point activity in the database. log file switch (archiving needed) is a wait event which occurs because archiver process ARCn is not able to cope up with the LGWR in archiving the data into archive log file. During a log switch, the archiver process need to immediately archive the redo’s of the current group before jumping to next group which are generated at the right momentum. A further delay in archival lead to delay in log switch causing this event. The amount of DML operation happening in the database is higher than significant with a continuous flood of changes in the tables. Another case of this event occurrence is due to tablespaces kept in hot backup mode and DML on top of the hot backup enabled tablespace.

  • Check if any of the archive’s are failed to be written
  • Plan to multiplex the redo log files
  • Plan to increase the frequency of archival backup’s
  • Plan to ship the archive log’s in stripped disk’s

Troubleshoot the issue:

waitSpade shows the top time consumer as log file switch (archiving needed)

W_CHAINS                                                TIMER CHART                SQL_ID         TOTAVGACT     PER%AS
-------------------------------------------------- ---------- -------------------- ------------- ---------- ----------
|log file switch (archiving needed) [|]<~~~ log fi       1062  <><><><><><><><><>< 5awjvvs8rvx7g       3.23       25.2
le switch (archiving needed)|                                 ><><><><><><><><><><
                                                              ><><><><><><><><><><
                                                              ><><><><><><><><><><
                                                              ><><><><><><><><><><
                                                              ><><><

|log file switch (archiving needed) [|]<~~~ log fi        685  <><><><><><><><><><                     3.23       16.3
le switch (archiving needed)|                                 ><><><><><><><><><><
                                                              ><><><><><><><><><><
                                                              ><><><><

|log file switch (archiving needed) [|]<~~~ log fi        578  <><><><><><><><><>< frqcv71skm5rv       3.23       13.7
le switch (archiving needed)|                                 ><><><><><><><><><><
                                                              ><><><><><><><><>

|log file switch (archiving needed) [|]<~~~ log fi        245  <><><><><><><><><>< g4gp07gt2z920       3.23        5.8
le switch (archiving needed)|                                 ><><

|log file switch (archiving needed) [|]<~~~ buffer        137  <><><><><><>        g4gp07gt2z920       3.23        3.3
 busy waits|

Checking all the archiver process, there is no blocking sessions identified.

  SID TIME               WHO     PROGRAM SQL_ID        USERNAM STATE      W_CHAINS                       WHO_BLOCK_WHO
----- ------------------ ------- ------- ------------- ------- ---------- ------------------------------ --------------------
  337 10-09-22 12:53:48  oracle  ARC6                  SYS     LONG_WAIT  NULL                           |NULL blocks===> 337
                                                                                                         |

  341 10-09-22 12:57:59  oracle  ARCE                  SYS     LONG_WAIT  NULL                           |NULL blocks===> 341
                                                                                                         |

   28 10-09-22 13:17:07  oracle  ARCB                  SYS     LONG_WAIT  NULL                           |NULL blocks===> 28|
   27 10-09-22 13:52:33  oracle  ARC9                  SYS     LONG_WAIT  NULL                           |NULL blocks===> 27|
   23 10-09-22 14:54:21  oracle  ARC2                  SYS     LONG_WAIT  NULL                           |NULL blocks===> 23|
  336 10-09-22 14:55:23  oracle  ARC5                  SYS     LONG_WAIT  NULL                           |NULL blocks===> 336
                                                                                                         |

   23 10-09-22 14:58:32  oracle  ARC2                  SYS     LONG_WAIT  NULL                           |NULL blocks===> 23|
   28 10-09-22 14:59:01  oracle  ARCB                  SYS     LONG_WAIT  NULL                           |NULL blocks===> 28|

8 rows selected.

AWR report gives us information on the event

Under top events, subjected event has consume almost 34 % of the database time which is a bit high not too much

Load profile section has lot of valuable details on the redo log transactions with 162 M of IO throughput and out of that 127 M are writes

There are significant log switches per hour with 666 switch per hour which is too far from baseline value(5 switches per hour)

Check segments which has maximum block changes. Here DBSRC is the object which is modified very frequently in blistering seconds

ASH top events shows that the sql involved is an ‘UPDATE’ statement. This statement need to be improved for less frequent executions(165 executions in an hour) and areas to tune.

Scrutinizing the individual archiver traces shows some weird error ORA-00312, implying that the archive information are failed to be written in the archive files. Also there are IO error’s err=27072 errno=28 from OS level which indicates OS error. fsync call is trying to write the buffer’s to disk device files called archive log file as per the linux file descriptor. This error may happen due to corruption in the disk or lack of space in the disk

krsv_proc_add: Request to add process to V$MANAGED_STANDBY [krss.c:2851]
DDE: Problem Key 'ORA 312' was flood controlled (0x1) (no incident)

*** 2022-09-09T10:03:42.278089+05:30
ORA-00312: online log 5 thread 1: '/apps01/RECO/DB9ZX/onlinelog/redo03.log'
DDE: Problem Key 'ORA 312' was flood controlled (0x1) (no incident)
ORA-00312: online log 5 thread 1: '/apps01/RECO/DB9ZX/onlinelog/redo02.log'
ORA-00312: online log 5 thread 1: '/apps01/RECO/DB9ZX/onlinelog/redo03.log'

*** 2022-09-09T10:03:54.457590+05:30 (CDB$ROOT(1))
=================== skgfqio Request Dump =======================
OSD Context: aiopend=0, aiodone=0, limfsiz=4294967295l, sigwinchslot=0
Request flags: WRITE
- - - - skgfrrq request element 1 - - - -
BLOCKNO = 251904
IOV: addr=0x0x71b92698, fib=0x0x71bddca8, maxaio=0, seal=0x45726963, fd=260
     fsync required?=TRUE, offset=18446744073709551615, aiopend=0
FIB: addr=0x0x71bddca8, lblksiz=512, ora ftype=4, pblksiz=512, filsiz=390153
     maxvec=16, fname=/apps01/RECO/DB9ZX/archivelog/2022_09_09/o1_mf_1_6550_kkojkl1o_.arc, serr=0, seal=0x45726963
     fstype=0x58465342, unix ftype=0x81a0, last block=390154
IOSB: addr=0x0x7f5c2dfa6440, status=3, time=0, qstatus=0,AIO start time=0
err=27072 errno=28 ose[0]=4 ose[1]=251904 ose[2]=-1
BUFFER: addr=0x0x7f5c2d956000, len=1048576
=================== skgfqio Request Dump =======================
OSD Context: aiopend=0, aiodone=0, limfsiz=4294967295l, sigwinchslot=0
Request flags: WRITE
- - - - skgfrrq request element 1 - - - -
BLOCKNO = 251904
IOV: addr=0x0x71b92698, fib=0x0x71bddca8, maxaio=0, seal=0x45726963, fd=260
     fsync required?=TRUE, offset=18446744073709551615, aiopend=0
FIB: addr=0x0x71bddca8, lblksiz=512, ora ftype=4, pblksiz=512, filsiz=390153
     maxvec=16, fname=/apps01/RECO/DB9ZX/archivelog/2022_09_09/o1_mf_1_6550_kkojkl1o_.arc, serr=0, seal=0x45726963
     fstype=0x58465342, unix ftype=0x81a0, last block=390154
IOSB: addr=0x0x7f5c2dfa6440, status=3, time=0, qstatus=0,AIO start time=0
err=27072 errno=28 ose[0]=4 ose[1]=251904 ose[2]=-1
BUFFER: addr=0x0x7f5c2d956000, len=1048576

By checking the archive log files which are generated, they are

kIsH@Xhydra<>col NAME for a20
set lines 200 pages 1000
select NAME,THREAD#,SEQUENCE#,ARCHIVED,STATUS,COMPLETION_TIME,END_OF_REDO,END_OF_REDO_TYPE from v$archived_log order by NAME asc;

Last hour archival generation exceeded 700 which is huge count. Use the script from http://alphaoragroup.com/2022/09/10/shell-script-to-parse-the-total-log-switch-from-oracle-database-alert-log/

[oracle@xhydra ~]$ ./lswitchparser.sh -l 3
670

Check if all the redo log group’s are online. If the status of any group is invalid, then clear the unarchived log files manually. In my case, all the redo log group’s are online.

kIsH@Xhydra<>select group#,status,type from v$logfile;

    GROUP# STATUS  TYPE
---------- ------- -------
         3         ONLINE
         3         ONLINE
         2         ONLINE
         2         ONLINE
         1         ONLINE
         1         ONLINE
         4         ONLINE
         5         ONLINE
         5         ONLINE
         6         ONLINE
         6         ONLINE

11 rows selected.

Further more research on the error bring me to another error which clearly confirm space crunch from OS level

ORA-27072: File I/O error
Linux-x86_64 Error: 28: No space left on device
Additional information: 4
Additional information: 94208
Additional information: 4294967295
<error barrier> at 0x7ffe9e14f980 placed krsi.c@7069
ORA-19502: write error on file "/apps01/RECO/DB9ZX/archivelog/2022_09_09/o1_mf_1_6551_kkojkl5g_.arc", block number 94208 (block size=512)
krsi_dst_close: Exception 19502 encountered when closing file
<error barrier> at 0x7ffe9e1502d8 placed krse.c@7169
ORA-19502: write error on file "/apps01/RECO/DB9ZX/archivelog/2022_09_09/o1_mf_1_6551_kkojkl5g_.arc", block number 94208 (block size=512)
I/O error 19502 archiving LNO:1 to '/apps01/RECO/DB9ZX/archivelog/2022_09_09/o1_mf_1_6551_kkojkl5g_.arc'
*** 2022-09-09 10:03:54.541503 [krsi.c:8737]
krsi_dst_fail_caller: LAD:1 err:19502 force:0 blast:1
DDE: Problem Key 'ORA 312' was flood controlled (0x1) (no incident)
ORA-00312: online log 1 thread 1: '/apps01/RECO/DB9ZX/onlinelog/o1_mf_1_k8dcj6p9_.log'
DDE: Problem Key 'ORA 312' was flood controlled (0x1) (no incident)
ORA-00312: online log 1 thread 1: '/apps01/oradata/DB9ZX/onlinelog/o1_mf_1_k8dcj6b0_.log'

It is obvious that the mount point which holds the archive log files is full due to heavy archive generation. This also give me a learning that, heavy archival should be caused by heavy DML from the database. log file switch (archiving needed) event happen not only due to slow ARCn process or IO, it may also happen due to lack of storage to accommodate the flooding archives at a time.

[oracle@xhydra ~]$ df -h /apps01/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        30G   29G   11G  97% /apps01

Solution:

Increase the parameter “log_archive_max_processes” to higher value to reduce the load on existing archiver process. In this case, 4 archiver process is not sufficient to handle the redo workload. Hence, if workload is thrice the current archiver processes, then it should be increased to 16.

kIsH@Xhydra<>show parameter log_archive_max

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_max_processes            integer     4
kIsH@Xhydra<>alter system set log_archive_max_processes=16;

System altered.


11:10:52 kIsH@Xhydra<>!ps -ef|grep 'ora_arc'|grep -v 'grep'
oracle      8074       1  0 09:06 ?        00:00:22 ora_arc0_db9zx
oracle      8078       1  0 09:06 ?        00:00:22 ora_arc1_db9zx
oracle      8080       1  0 09:06 ?        00:00:23 ora_arc2_db9zx
oracle      8082       1  0 09:06 ?        00:00:22 ora_arc3_db9zx
oracle     27687       1  0 11:00 ?        00:00:00 ora_arc4_db9zx
oracle     27689       1  0 11:00 ?        00:00:01 ora_arc5_db9zx
oracle     27691       1  0 11:00 ?        00:00:01 ora_arc6_db9zx
oracle     27693       1  0 11:00 ?        00:00:01 ora_arc7_db9zx
oracle     29193       1  0 11:10 ?        00:00:00 ora_arc8_db9zx
oracle     29195       1  0 11:10 ?        00:00:00 ora_arc9_db9zx
oracle     29197       1  0 11:10 ?        00:00:00 ora_arca_db9zx
oracle     29199       1  0 11:10 ?        00:00:00 ora_arcb_db9zx
oracle     29201       1  0 11:10 ?        00:00:00 ora_arcc_db9zx
oracle     29203       1  0 11:10 ?        00:00:00 ora_arcd_db9zx
oracle     29205       1  0 11:10 ?        00:00:00 ora_arce_db9zx
oracle     29207       1  0 11:10 ?        00:00:00 ora_arcf_db9zx

Also, create a shell script to take frequent backup’s of archive logs so that, they can be deleted frequent to give room for subsequent archive files.

[oracle@xhydra ~]$ cat archbak.sh
#!/bin/bash

#For logging purpose
_LOG_0()
{
echo "*************************************$1"
}

#Set the environment variables
_SET_ENV_1()
{
export ORACLE_SID=db9zx
export ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1
export PATH=/u01/app/oracle/product/19.0.0/dbhome_1/bin:/usr/sbin:/usr/local/bin:/home/oracle/.local/bin:/home/oracle/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
}

_BAKDELARCH_2(){
$ORACLE_HOME/bin/rman << EOF
connect target /
crosscheck archivelog all;
backup archivelog all delete input;
exit;
EOF
}

_SET_ENV_1
_BAKDELARCH_2

After performing all the changes, the wait event vanished all of a sudden

W_CHAINS                                                TIMER CHART                SQL_ID         TOTAVGACT     PER%AS
-------------------------------------------------- ---------- -------------------- ------------- ---------- ----------
|enq: CR - block range reuse ckpt [|]<~~~ enq: CR         331  <><><><><><><><><>< 5awjvvs8rvx7g       1.36        4.3
- block range reuse ckpt|                                     ><><><><><><>

|enq: CR - block range reuse ckpt [|]<~~~ reliable         83  <><><><             5awjvvs8rvx7g       1.36        1.1
 message|

Leave a Reply