log file switch (archiving needed) in oracle
log file switch happens after every check point activity in the database. log file switch (archiving needed) is a wait event which occurs because archiver process ARCn is not able to cope up with the LGWR in archiving the data into archive log file. During a log switch, the archiver process need to immediately archive the redo’s of the current group before jumping to next group which are generated at the right momentum. A further delay in archival lead to delay in log switch causing this event. The amount of DML operation happening in the database is higher than significant with a continuous flood of changes in the tables. Another case of this event occurrence is due to tablespaces kept in hot backup mode and DML on top of the hot backup enabled tablespace.
- Check if any of the archive’s are failed to be written
- Plan to multiplex the redo log files
- Plan to increase the frequency of archival backup’s
- Plan to ship the archive log’s in stripped disk’s
Troubleshoot the issue:
waitSpade shows the top time consumer as log file switch (archiving needed)
W_CHAINS TIMER CHART SQL_ID TOTAVGACT PER%AS
-------------------------------------------------- ---------- -------------------- ------------- ---------- ----------
|log file switch (archiving needed) [|]<~~~ log fi 1062 <><><><><><><><><>< 5awjvvs8rvx7g 3.23 25.2
le switch (archiving needed)| ><><><><><><><><><><
><><><><><><><><><><
><><><><><><><><><><
><><><><><><><><><><
><><><
|log file switch (archiving needed) [|]<~~~ log fi 685 <><><><><><><><><>< 3.23 16.3
le switch (archiving needed)| ><><><><><><><><><><
><><><><><><><><><><
><><><><
|log file switch (archiving needed) [|]<~~~ log fi 578 <><><><><><><><><>< frqcv71skm5rv 3.23 13.7
le switch (archiving needed)| ><><><><><><><><><><
><><><><><><><><>
|log file switch (archiving needed) [|]<~~~ log fi 245 <><><><><><><><><>< g4gp07gt2z920 3.23 5.8
le switch (archiving needed)| ><><
|log file switch (archiving needed) [|]<~~~ buffer 137 <><><><><><> g4gp07gt2z920 3.23 3.3
busy waits|
Checking all the archiver process, there is no blocking sessions identified.
SID TIME WHO PROGRAM SQL_ID USERNAM STATE W_CHAINS WHO_BLOCK_WHO
----- ------------------ ------- ------- ------------- ------- ---------- ------------------------------ --------------------
337 10-09-22 12:53:48 oracle ARC6 SYS LONG_WAIT NULL |NULL blocks===> 337
|
341 10-09-22 12:57:59 oracle ARCE SYS LONG_WAIT NULL |NULL blocks===> 341
|
28 10-09-22 13:17:07 oracle ARCB SYS LONG_WAIT NULL |NULL blocks===> 28|
27 10-09-22 13:52:33 oracle ARC9 SYS LONG_WAIT NULL |NULL blocks===> 27|
23 10-09-22 14:54:21 oracle ARC2 SYS LONG_WAIT NULL |NULL blocks===> 23|
336 10-09-22 14:55:23 oracle ARC5 SYS LONG_WAIT NULL |NULL blocks===> 336
|
23 10-09-22 14:58:32 oracle ARC2 SYS LONG_WAIT NULL |NULL blocks===> 23|
28 10-09-22 14:59:01 oracle ARCB SYS LONG_WAIT NULL |NULL blocks===> 28|
8 rows selected.
AWR report gives us information on the event
Under top events, subjected event has consume almost 34 % of the database time which is a bit high not too much

Load profile section has lot of valuable details on the redo log transactions with 162 M of IO throughput and out of that 127 M are writes

There are significant log switches per hour with 666 switch per hour which is too far from baseline value(5 switches per hour)

Check segments which has maximum block changes. Here DBSRC is the object which is modified very frequently in blistering seconds

ASH top events shows that the sql involved is an ‘UPDATE’ statement. This statement need to be improved for less frequent executions(165 executions in an hour) and areas to tune.

Scrutinizing the individual archiver traces shows some weird error ORA-00312, implying that the archive information are failed to be written in the archive files. Also there are IO error’s err=27072 errno=28 from OS level which indicates OS error. fsync call is trying to write the buffer’s to disk device files called archive log file as per the linux file descriptor. This error may happen due to corruption in the disk or lack of space in the disk
krsv_proc_add: Request to add process to V$MANAGED_STANDBY [krss.c:2851]
DDE: Problem Key 'ORA 312' was flood controlled (0x1) (no incident)
*** 2022-09-09T10:03:42.278089+05:30
ORA-00312: online log 5 thread 1: '/apps01/RECO/DB9ZX/onlinelog/redo03.log'
DDE: Problem Key 'ORA 312' was flood controlled (0x1) (no incident)
ORA-00312: online log 5 thread 1: '/apps01/RECO/DB9ZX/onlinelog/redo02.log'
ORA-00312: online log 5 thread 1: '/apps01/RECO/DB9ZX/onlinelog/redo03.log'
*** 2022-09-09T10:03:54.457590+05:30 (CDB$ROOT(1))
=================== skgfqio Request Dump =======================
OSD Context: aiopend=0, aiodone=0, limfsiz=4294967295l, sigwinchslot=0
Request flags: WRITE
- - - - skgfrrq request element 1 - - - -
BLOCKNO = 251904
IOV: addr=0x0x71b92698, fib=0x0x71bddca8, maxaio=0, seal=0x45726963, fd=260
fsync required?=TRUE, offset=18446744073709551615, aiopend=0
FIB: addr=0x0x71bddca8, lblksiz=512, ora ftype=4, pblksiz=512, filsiz=390153
maxvec=16, fname=/apps01/RECO/DB9ZX/archivelog/2022_09_09/o1_mf_1_6550_kkojkl1o_.arc, serr=0, seal=0x45726963
fstype=0x58465342, unix ftype=0x81a0, last block=390154
IOSB: addr=0x0x7f5c2dfa6440, status=3, time=0, qstatus=0,AIO start time=0
err=27072 errno=28 ose[0]=4 ose[1]=251904 ose[2]=-1
BUFFER: addr=0x0x7f5c2d956000, len=1048576
=================== skgfqio Request Dump =======================
OSD Context: aiopend=0, aiodone=0, limfsiz=4294967295l, sigwinchslot=0
Request flags: WRITE
- - - - skgfrrq request element 1 - - - -
BLOCKNO = 251904
IOV: addr=0x0x71b92698, fib=0x0x71bddca8, maxaio=0, seal=0x45726963, fd=260
fsync required?=TRUE, offset=18446744073709551615, aiopend=0
FIB: addr=0x0x71bddca8, lblksiz=512, ora ftype=4, pblksiz=512, filsiz=390153
maxvec=16, fname=/apps01/RECO/DB9ZX/archivelog/2022_09_09/o1_mf_1_6550_kkojkl1o_.arc, serr=0, seal=0x45726963
fstype=0x58465342, unix ftype=0x81a0, last block=390154
IOSB: addr=0x0x7f5c2dfa6440, status=3, time=0, qstatus=0,AIO start time=0
err=27072 errno=28 ose[0]=4 ose[1]=251904 ose[2]=-1
BUFFER: addr=0x0x7f5c2d956000, len=1048576
By checking the archive log files which are generated, they are
kIsH@Xhydra<>col NAME for a20
set lines 200 pages 1000
select NAME,THREAD#,SEQUENCE#,ARCHIVED,STATUS,COMPLETION_TIME,END_OF_REDO,END_OF_REDO_TYPE from v$archived_log order by NAME asc;
Last hour archival generation exceeded 700 which is huge count. Use the script from http://alphaoragroup.com/2022/09/10/shell-script-to-parse-the-total-log-switch-from-oracle-database-alert-log/
[oracle@xhydra ~]$ ./lswitchparser.sh -l 3
670
Check if all the redo log group’s are online. If the status of any group is invalid, then clear the unarchived log files manually. In my case, all the redo log group’s are online.
kIsH@Xhydra<>select group#,status,type from v$logfile;
GROUP# STATUS TYPE
---------- ------- -------
3 ONLINE
3 ONLINE
2 ONLINE
2 ONLINE
1 ONLINE
1 ONLINE
4 ONLINE
5 ONLINE
5 ONLINE
6 ONLINE
6 ONLINE
11 rows selected.
Further more research on the error bring me to another error which clearly confirm space crunch from OS level
ORA-27072: File I/O error
Linux-x86_64 Error: 28: No space left on device
Additional information: 4
Additional information: 94208
Additional information: 4294967295
<error barrier> at 0x7ffe9e14f980 placed krsi.c@7069
ORA-19502: write error on file "/apps01/RECO/DB9ZX/archivelog/2022_09_09/o1_mf_1_6551_kkojkl5g_.arc", block number 94208 (block size=512)
krsi_dst_close: Exception 19502 encountered when closing file
<error barrier> at 0x7ffe9e1502d8 placed krse.c@7169
ORA-19502: write error on file "/apps01/RECO/DB9ZX/archivelog/2022_09_09/o1_mf_1_6551_kkojkl5g_.arc", block number 94208 (block size=512)
I/O error 19502 archiving LNO:1 to '/apps01/RECO/DB9ZX/archivelog/2022_09_09/o1_mf_1_6551_kkojkl5g_.arc'
*** 2022-09-09 10:03:54.541503 [krsi.c:8737]
krsi_dst_fail_caller: LAD:1 err:19502 force:0 blast:1
DDE: Problem Key 'ORA 312' was flood controlled (0x1) (no incident)
ORA-00312: online log 1 thread 1: '/apps01/RECO/DB9ZX/onlinelog/o1_mf_1_k8dcj6p9_.log'
DDE: Problem Key 'ORA 312' was flood controlled (0x1) (no incident)
ORA-00312: online log 1 thread 1: '/apps01/oradata/DB9ZX/onlinelog/o1_mf_1_k8dcj6b0_.log'
It is obvious that the mount point which holds the archive log files is full due to heavy archive generation. This also give me a learning that, heavy archival should be caused by heavy DML from the database. log file switch (archiving needed) event happen not only due to slow ARCn process or IO, it may also happen due to lack of storage to accommodate the flooding archives at a time.
[oracle@xhydra ~]$ df -h /apps01/
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 30G 29G 11G 97% /apps01
Solution:
Increase the parameter “log_archive_max_processes” to higher value to reduce the load on existing archiver process. In this case, 4 archiver process is not sufficient to handle the redo workload. Hence, if workload is thrice the current archiver processes, then it should be increased to 16.
kIsH@Xhydra<>show parameter log_archive_max
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_max_processes integer 4
kIsH@Xhydra<>alter system set log_archive_max_processes=16;
System altered.
11:10:52 kIsH@Xhydra<>!ps -ef|grep 'ora_arc'|grep -v 'grep'
oracle 8074 1 0 09:06 ? 00:00:22 ora_arc0_db9zx
oracle 8078 1 0 09:06 ? 00:00:22 ora_arc1_db9zx
oracle 8080 1 0 09:06 ? 00:00:23 ora_arc2_db9zx
oracle 8082 1 0 09:06 ? 00:00:22 ora_arc3_db9zx
oracle 27687 1 0 11:00 ? 00:00:00 ora_arc4_db9zx
oracle 27689 1 0 11:00 ? 00:00:01 ora_arc5_db9zx
oracle 27691 1 0 11:00 ? 00:00:01 ora_arc6_db9zx
oracle 27693 1 0 11:00 ? 00:00:01 ora_arc7_db9zx
oracle 29193 1 0 11:10 ? 00:00:00 ora_arc8_db9zx
oracle 29195 1 0 11:10 ? 00:00:00 ora_arc9_db9zx
oracle 29197 1 0 11:10 ? 00:00:00 ora_arca_db9zx
oracle 29199 1 0 11:10 ? 00:00:00 ora_arcb_db9zx
oracle 29201 1 0 11:10 ? 00:00:00 ora_arcc_db9zx
oracle 29203 1 0 11:10 ? 00:00:00 ora_arcd_db9zx
oracle 29205 1 0 11:10 ? 00:00:00 ora_arce_db9zx
oracle 29207 1 0 11:10 ? 00:00:00 ora_arcf_db9zx
Also, create a shell script to take frequent backup’s of archive logs so that, they can be deleted frequent to give room for subsequent archive files.
[oracle@xhydra ~]$ cat archbak.sh
#!/bin/bash
#For logging purpose
_LOG_0()
{
echo "*************************************$1"
}
#Set the environment variables
_SET_ENV_1()
{
export ORACLE_SID=db9zx
export ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1
export PATH=/u01/app/oracle/product/19.0.0/dbhome_1/bin:/usr/sbin:/usr/local/bin:/home/oracle/.local/bin:/home/oracle/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
}
_BAKDELARCH_2(){
$ORACLE_HOME/bin/rman << EOF
connect target /
crosscheck archivelog all;
backup archivelog all delete input;
exit;
EOF
}
_SET_ENV_1
_BAKDELARCH_2
After performing all the changes, the wait event vanished all of a sudden
W_CHAINS TIMER CHART SQL_ID TOTAVGACT PER%AS
-------------------------------------------------- ---------- -------------------- ------------- ---------- ----------
|enq: CR - block range reuse ckpt [|]<~~~ enq: CR 331 <><><><><><><><><>< 5awjvvs8rvx7g 1.36 4.3
- block range reuse ckpt| ><><><><><><>
|enq: CR - block range reuse ckpt [|]<~~~ reliable 83 <><><>< 5awjvvs8rvx7g 1.36 1.1
message|