Day: May 30, 2020

ORA-03113: end-of-file on communication channel

Error:

First check logfile to identify the cause !

[oracle@orcl19x ~]$ locate alert_orcl19x1.log
/apps01/base/diag/rdbms/orcl19x1/orcl19x1/trace/alert_orcl19x1.log
/apps01/diag/rdbms/orcl19x1/orcl19x1/trace/alert_orcl19x1.log
/data01/base/diag/rdbms/orcl19x1/orcl19x1/trace/alert_orcl19x1.log
[oracle@orcl19x ~]$ tail -f /apps01/diag/rdbms/orcl19x1/orcl19x1/trace/alert_orcl19x1.log
ORA-19502: write error on file "/data01/ORCL12X/archivelog/2021_01_17/o1_mf_1_299_j08yt23w_.arc", block number 26624 (block size=512)
ARCH: I/O error 19502 archiving log 2 to '/data01/ORCL12X/archivelog/2021_01_17/o1_mf_1_299_j08yt23w_.arc'
Sun Jan 17 13:00:34 2021
Errors in file /app01/oracle/product/base/diag/rdbms/orcl12x/orcl12x/trace/orcl12x_ora_15741.trc:
ORA-16038: log 2 sequence# 299 cannot be archived
ORA-19502: write error on file "", block number  (block size=)
ORA-00312: online log 2 thread 1: '/app01/oracle/product/base/oradata/orcl12x/redo02.log'
USER (ospid: 15741): terminating the instance due to error 16038
Sun Jan 17 13:00:34 2021
System state dump requested by (instance=1, osid=15741), summary=[abnormal instance termination].
System State dumped to trace file /app01/oracle/product/base/diag/rdbms/orcl12x/orcl12x/trace/orcl12x_diag_15701_20210117130034.trc
Sun Jan 17 13:00:34 2021
Dumping diagnostic data in directory=[cdmp_20210117130034], requested by (instance=1, osid=15741), summary=[abnormal instance termination].
Sun Jan 17 13:00:34 2021
Instance terminated by USER, pid = 15741

While startup, the instance crashed due to multiple reasons because of space crunch in filesystem or ASM,FRA gets filled up and so on.

SQL> startup;
ORACLE instance started.

Total System Global Area 1140850688 bytes
Fixed Size                  2923584 bytes
Variable Size             419431360 bytes
Database Buffers          704643072 bytes
Redo Buffers               13852672 bytes
ORA-03113: end-of-file on communication channel
Process ID: 6215
Session ID: 1 Serial number: 46504


SQL> startup;
SP2-0642: SQL*Plus internal error state 2133, context 3114:0:0
Unsafe to proceed
ORA-03114: not connected to ORACLE
SP2-0642: SQL*Plus internal error state 2133, context 3114:0:0
Unsafe to proceed
ORA-03114: not connected to ORACLE

SQL> !oerr ora 3113
03113, 00000, “end-of-file on communication channel”
// *Cause: The connection between Client and Server process was broken.
// *Action: There was a communication error that requires further investigation.
// First, check for network problems and review the SQL*Net setup.
// Also, look in the alert.log file for any errors. Finally, test to
// see whether the server process is dead and whether a trace file
// was generated at failure time.

Solution:

Close your session and reopen sql*plus again

Sometimes due to network ipc waits,we get this kind of error

SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
[oracle@orcl ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Sat May 30 21:36:46 2020

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup;
ORACLE instance started.

Total System Global Area 1140850688 bytes
Fixed Size                  2923584 bytes
Variable Size             419431360 bytes
Database Buffers          704643072 bytes
Redo Buffers               13852672 bytes
Database mounted.
Database opened.

There might be another reason that you face this error!!

Check if your mount point or ASM has sufficient space and not full because of archivelog files fill.

[oracle@orcl12x ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        20G   11G  7.5G  59% /
tmpfs           2.0G  1.1G  953M  52% /dev/shm
/dev/sda2        15G  8.4G  5.3G  62% /app01
/dev/sda5        12G  2.8G  8.1G  26% /data01

Then delete unwanted trace files and get rid of archivelogs files by backing them up to tape.

RMAN> crosscheck archivelog all;
RMAN> backup archivelog all delete input;
RMAN> delete expired archivelog all;

I will create test case for this error to occur by simulating ora 3113

Check the FRA total space

SQL> show parameter db_recovery_file_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest                string      /data01
db_recovery_file_dest_size           big integer 6000M

Check filesystem mountpoint space

SQL> !df -h /data01
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5        12G  9.6G  1.3G  89% /data01         <---------------

I manually fill up FRA space with archivelogs by populating data into table

SQL> create table t(name varchar2(10),id number(10));

Table created.

SQL> begin
  2  for i in 1 .. 10000000 loop
  3  insert into t values('xyz',i);
  4  end loop;
  5  commit;
  6  end;
  7  /

Archives flood up the FRA after insert of data


[oracle@orcl12x ~]$ tail -50f /app01/oracle/product/base/diag/rdbms/orcl12x/orcl12x/trace/alert_orcl12x.log|grep 'Archived Log'
Archived Log entry 217 added for thread 1 sequence 243 ID 0x748f1abe dest 1:
Archived Log entry 218 added for thread 1 sequence 244 ID 0x748f1abe dest 1:
Archived Log entry 219 added for thread 1 sequence 245 ID 0x748f1abe dest 1:
Archived Log entry 220 added for thread 1 sequence 246 ID 0x748f1abe dest 1:
Archived Log entry 221 added for thread 1 sequence 247 ID 0x748f1abe dest 1:
Archived Log entry 222 added for thread 1 sequence 248 ID 0x748f1abe dest 1:
Archived Log entry 223 added for thread 1 sequence 249 ID 0x748f1abe dest 1:
Archived Log entry 224 added for thread 1 sequence 250 ID 0x748f1abe dest 1:
Archived Log entry 225 added for thread 1 sequence 251 ID 0x748f1abe dest 1:
Archived Log entry 226 added for thread 1 sequence 252 ID 0x748f1abe dest 1:
Archived Log entry 227 added for thread 1 sequence 253 ID 0x748f1abe dest 1:
Archived Log entry 228 added for thread 1 sequence 254 ID 0x748f1abe dest 1:
Archived Log entry 229 added for thread 1 sequence 255 ID 0x748f1abe dest 1:
Archived Log entry 230 added for thread 1 sequence 256 ID 0x748f1abe dest 1:
Archived Log entry 231 added for thread 1 sequence 257 ID 0x748f1abe dest 1:
Archived Log entry 232 added for thread 1 sequence 258 ID 0x748f1abe dest 1:
Archived Log entry 233 added for thread 1 sequence 259 ID 0x748f1abe dest 1:
Archived Log entry 234 added for thread 1 sequence 260 ID 0x748f1abe dest 1:
Archived Log entry 235 added for thread 1 sequence 261 ID 0x748f1abe dest 1:
Archived Log entry 236 added for thread 1 sequence 262 ID 0x748f1abe dest 1:
Archived Log entry 237 added for thread 1 sequence 263 ID 0x748f1abe dest 1:
Archived Log entry 238 added for thread 1 sequence 264 ID 0x748f1abe dest 1:
Archived Log entry 239 added for thread 1 sequence 265 ID 0x748f1abe dest 1:
Archived Log entry 240 added for thread 1 sequence 266 ID 0x748f1abe dest 1:
Archived Log entry 241 added for thread 1 sequence 267 ID 0x748f1abe dest 1:

Archive logs are the high contributor of FRA space



SQL> select * from v$flash_recovery_area_usage order by PERCENT_SPACE_USED desc;

FILE_TYPE               PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE
----------------------- ------------------ -------------------------
NUMBER_OF_FILES     CON_ID
--------------- ----------
ARCHIVED LOG                         44.57                         0
             56          0

BACKUP PIECE                         34.66                     11.49
             21          0

CONTROL FILE                             0                         0
              0          0

As expected,there is no space to populate further data into FRA

[oracle@orcl12x ~]$ df -h /data01/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5        12G   11G   34M 100% /data01

I abort the instance,startup again and there it is the weird error! although it is common


SQL> shu abort;
ORACLE instance shut down.
SQL> startup;
ORACLE instance started.

Total System Global Area 1644167168 bytes
Fixed Size                  2925024 bytes
Variable Size             989859360 bytes
Database Buffers          637534208 bytes
Redo Buffers               13848576 bytes
Database mounted.
ORA-03113: end-of-file on communication channel
Process ID: 6723
Session ID: 1 Serial number: 60575

After strace on the particular process id,it is clear that archivelogs are not being written in disk

[oracle@orcl12x cdmp_20210117042038]$ strace -p 15682
Process 15682 attached
write(1, "alter database open\n", 20)   = 20
write(1, "*\n", 2)                      = 2
lseek(4, 5120, SEEK_SET)                = 5120
read(4, "\r\0\351\0\0\0V\0\352\0\0\0\220\0\353\0\0\0\240\0\356\0\0\0\320\0\357\0\0\0\344\0"..., 512) = 512
write(1, "ERROR at line 1:\n", 17)      = 17  <-----------
write(1, "ORA-03113: end-of-file on commun"..., 48) = 48
write(1, "Process ID: 15741\n", 18)     = 18
write(1, "Session ID: 1 Serial number: 363"..., 35) = 35
write(1, "\n", 1)                       = 1
write(1, "\n", 1)                       = 1
write(1, "SQL> ", 5)                    = 5
read(0, 0x7f9c11fd3000, 1024)           = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=15741, si_status=0, si_utime=16, si_stime=5} ---

Stack trace also clearly shows that archivelog fill up space due to space crunch in both FRA and Filesystem!

Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----
ORA-16038: log 2 sequence# 299 cannot be archived
ORA-19502: write error on file "", block number  (block size=) <--------- 
ORA-00312: online log 2 thread 1: '/app01/oracle/product/base/oradata/orcl12x/redo02.log'
2021-01-17 13:00:34.545399 :kjzduptcctx(): Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+244<-kjzdssdmp()+321<-kjzduptcctx()+692<-kjzdicrshnfy()+992<-ksuitm()+5857<-kcfopd()+8296<-adbdrv_options()+1656<-adbdrv()+151<-opiexe()+20208<-opiosq0()+4554<-kpoal8()+1223<-opiodr()+1165<-ttcpip()+2699<-opitsk()+1734<-opiino()+945<-opiodr()+1165
<-opidrv()+587<-sou2o()+145<-opimai_real()+154<-ssthrdmain()+412
----- End of Abridged Call Stack Trace -----

*** 2021-01-17 13:00:34.580
USER (ospid: 15741): terminating the instance due to error 16038
ksuitm: waiting up to [5] seconds before killing DIAG(15701)

Start the database in mount stage,open rman and report all obsolete backups

RMAN> startup mount;

Oracle instance started
database mounted

Total System Global Area    1644167168 bytes

Fixed Size                     2925024 bytes
Variable Size                989859360 bytes
Database Buffers             637534208 bytes
Redo Buffers                  13848576 bytes

RMAN> report obsolete;

RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 1
Report of obsolete backups and copies
Type                 Key    Completion Time    Filename/Handle
-------------------- ------ ------------------ --------------------
Backup Set           1      12-DEC-20
  Backup Piece       1      12-DEC-20          /data01/1955551678-20201212-01vhsu9i_1_1
Backup Set           2      12-DEC-20
  Backup Piece       2      12-DEC-20          /data01/1955551678-20201212-02vhsu9m_1_1

Delete all obsolete backups

RMAN> delete obsolete;

RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=21 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=22 device type=DISK

Crosscheck the backups


RMAN> crosscheck backup;
crosschecked backup piece: found to be 'AVAILABLE'
backup piece handle=/data01/ORCL12X/backupset/2021_01_17/o1_mf_nnndf_TAG20210117T021408_j07ry2vp_.bkp RECID=144 STAMP=1062036850
crosschecked backup piece: found to be 'AVAILABLE'
.
RMAN> crosscheck archivelog all;
archived log file name=/data01/ORCL12X/archivelog/2021_01_17/o1_mf_1_298_j07txh61_.arc RECID=272 STAMP=1062038880
.
Crosschecked 70 objects

I dont need any backuppiece in disk as they already saved in tape,so i will delete them

It is always recommended to have a recent backup both in disk and tape

RMAN> delete noprompt backuppiece tag TAG20210117T021408;
.
.
backup piece handle=/data01/ORCL12X/backupset/2021_01_17/o1_mf_nnndf_TAG20210117T021408_j07ry2vp_.bkp RECID=144 STAMP=1062036850
deleted backup piece
backup piece handle=/data01/ORCL12X/backupset/2021_01_17/o1_mf_nnndf_TAG20210117T021408_j07ry31l_.bkp RECID=145 STAMP=1062036850
Deleted 12 objects

Now i have sufficient space to open the database 🙂

[oracle@orcl12x archivelog]$ df -h /data01/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5        12G  6.0G  5.0G  55% /data01

Archivelogs still spill up FRA

My FRA dont have enough room to hold the backup and archive

[oracle@orcl12x archivelog]$ du -sh *
4.0K    2020_12_20
4.0K    2020_12_21
4.0K    2020_12_25
4.0K    2020_12_26
4.0K    2020_12_27
4.0K    2021_01_13
3.3G    2021_01_17

Anyway i can open database without any issue

RMAN> alter database open;

Statement processed

Finally, take a full cold backup and delete all the archives from disk