Brief view on OWI ORACLE WAIT EVENTS part 1

Types of wait events:

  • foreground
  • Background
  • Idle
  • Non Idle

RESPONSE TIME = SERVICE TIME + WAIT TIME or CPU TIME + QUEUE TIME

CPU used for normal work (CPU_W) = CPU used when call started – parse time CPU – recursive CPU usage

In 11g, below are the list of wait events which i used to describe as 4c’s in wait class .These 4c’s include Cluster,commit,concurrency and configuration.These events are most common waits in both RAC and standalone databases.We will see cluster events in another post part 2.

kish@exdbx<>select name,wait_class from v$event_name where wait_class in ('Commit','Concurrency','Configuration','Cluster') order by wait_class;

NAME                                                             WAIT_CLASS
---------------------------------------------------------------- ----------------------------------------------------------------
retry contact SCN lock master                                    Cluster
gc buffer busy acquire                                           Cluster
gc buffer busy release                                           Cluster
pi renounce write complete                                       Cluster
gc current request                                               Cluster
gc cr request                                                    Cluster
gc cr disk request                                               Cluster
gc cr multi block request                                        Cluster
gc current multi block request                                   Cluster
gc block recovery request                                        Cluster
gc cr block 2-way                                                Cluster
gc cr block 3-way                                                Cluster
gc cr block busy                                                 Cluster
gc cr block congested                                            Cluster
gc cr failure                                                    Cluster
gc cr block lost                                                 Cluster
gc cr block unknown                                              Cluster
gc current block 2-way                                           Cluster
gc current block 3-way                                           Cluster
gc current block busy                                            Cluster
log file sync                                                    Commit
enq: BB - 2PC across RAC instances                               Commit
latch: cache buffers chains                                      Concurrency
Shared IO Pool Memory                                            Concurrency
logout restrictor                                                Concurrency
os thread startup                                                Concurrency
Streams apply: waiting for dependency                            Concurrency
pipe put                                                         Concurrency
securefile chain update                                          Concurrency
buffer busy waits                                                Concurrency
db flash cache invalidate wait                                   Concurrency
enq: TX - index contention                                       Concurrency
latch: Undo Hint Latch                                           Concurrency
latch: In memory undo latch                                      Concurrency
latch: MQL Tracking Latch                                        Concurrency
Streams apply: waiting to commit                                 Configuration
checkpoint completed                                             Configuration
Global transaction acquire instance locks                        Configuration
statement suspended, wait error to be cleared                    Configuration
enq: SQ - contention                                             Configuration
sort segment request                                             Configuration

Below are the wait events w.r.t application and user I/O which also should be noted when there is a performance issue

kish@exdbx<>select name,wait_class from v$event_name where wait_class in ('Application','User I/O') order by wait_class;

NAME                                                             WAIT_CLASS
---------------------------------------------------------------- ----------------------------------------------------------------
enq: PW - flush prewarm buffers                                  Application
WCR: replay lock order                                           Application
enq: RO - fast object reuse                                      Application
enq: KO - fast object checkpoint                                 Application
enq: TM - contention                                             Application
enq: TX - row lock contention                                    Application
Wait for Table Lock                                              Application
enq: RC - Result Cache: Contention                               Application
Streams capture: filter callback waiting for ruleset             Application
Streams: apply reader waiting for DDL to apply                   Application
SQL*Net break/reset to client                                    Application
SQL*Net break/reset to dblink                                    Application
External Procedure initial connection                            Application
External Procedure call                                          Application
enq: UL - contention                                             Application
OLAP DML Sleep                                                   Application
enq: RO - contention                                             Application
Parameter File I/O                                               User I/O
Disk file operations I/O                                         User I/O
Disk file I/O Calibration                                        User I/O
Disk file Mirror Read                                            User I/O
Disk file Mirror/Media Repair Write                              User I/O
direct path sync                                                 User I/O
Datapump dump file I/O                                           User I/O
dbms_file_transfer I/O                                           User I/O
DG Broker configuration file I/O                                 User I/O
Data file init write                                             User I/O
Log file init write                                              User I/O
Shared IO Pool IO Completion                                     User I/O

Let check the histogram buckets for most common exadata wait event.There are different number of buckets in the wait_time_milli column

Cell single block physical read has 12 buckets upto 2048 ms;Cell multiblock physical read has 10 buckets upto 512 ms;

wait_count for Cell single block physical read ranges from 707*1ms to 3*2048ms

Here second bucket shows 2*1069ms of waits which is higher than the 1st bucket (707)

But if you take Cell multiblock physical read,there are few buckets(10) upto 512 ms.From this information,smaller number of long waits and large number of shorter waits can be derived.

kish@exdbx<>select event#,event,wait_time_milli,wait_count from v$event_histogram where event like '%cell%';

    EVENT# EVENT                                                            WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------------------------------- --------------- ----------
       403 cell single block physical read                                                1        707
       403 cell single block physical read                                                2       1069
       403 cell single block physical read                                                4        691
       403 cell single block physical read                                                8        497
       403 cell single block physical read                                               16       1414
       403 cell single block physical read                                               32       1492
       403 cell single block physical read                                               64        432
       403 cell single block physical read                                              128         93
       403 cell single block physical read                                              256         14
       403 cell single block physical read                                              512          6
       403 cell single block physical read                                             1024          2
       403 cell single block physical read                                             2048          3
       404 cell multiblock physical read                                                  1         23
       404 cell multiblock physical read                                                  2        207
       404 cell multiblock physical read                                                  4        283
       404 cell multiblock physical read                                                  8        175
       404 cell multiblock physical read                                                 16        410
       404 cell multiblock physical read                                                 32        471
       404 cell multiblock physical read                                                 64        120
       404 cell multiblock physical read                                                128         34
       404 cell multiblock physical read                                                256         23
       404 cell multiblock physical read                                                512         14
33 rows selected.

Common wait events:

Buffer busy waits:

Whenever a session needs to access a datablock in buffer cache ,the same block is accessed or modified by some other session.The other session is either reading the same data block into buffer cache from datafile,or it is modifying the one in buffer cache.

To guarantee that reader session has a consistent image of the datablock with dirty block or free block,the session modifying the block marks the block header with flag to intimate other sessions know that the change is taking place and to wait until complete change is applied.Below are the most common classes of buffer which undergo BBW are,

kish@exdbx<>select * from v$waitstat where count > 0;

CLASS                   COUNT       TIME
------------------ ---------- ----------
data block                 56         61
segment header              2          7
1st level bmb               2          2
2nd level bmb               2          0
undo header                 4          0
undo block                  1          0

Control file parallel write:

Control file parallel write(CFPW) occur if session waits until all controlfiles are updated with the latest SCN by CKPT process and other backup and recovery information to maintain consistency of datafile and redolog files.Every 3 seconds CKPT(checkpoint) process writes the checkpoint position in all members of online redologs for recovery purpose.Oracle also store the unrecoverable SCN operations like DML performed using nologging options.

Here the session waits for the I/O on OS holding CF enqueue to complete the write operations on the controlfiles.

When it is considered problematic? If there are enormous waits and elapsed time observed on CFPW,then it might be an indication of slow I/O subsystem or slow writes by CKPT process.

USERNAME                    SID    SERIAL# EVENT
-------------------- ---------- ---------- ------------------------------
WAIT_CLASS       WAIT_TIME SECONDS_IN_WAIT STATE
--------------- ---------- --------------- -------------------

(oracle)                     12          1 control file sequential read
System I/O               0               0 WAITING

DB file parallel read:

Dont get fooled by the naming convention of this wait event.This event is purely occur during the recovery of the database.It also occurs during read of single or multiple block parallely from datafile to database buffer cache.

DB file parallel write:

Similar to previous event,this events is responsible for writing the dirty buffer from db buffer to datafile where DBWR plays a major role.The session waits for the I/O subsystem on OS for the writes to complete.This may also impact the read requests simultaneously during write on the same disk.DBWR perform batch writes to the datafile either synchronously or asynchronously.

If you dont see this event in your database,then disk_async_io parameter must be set to false

When db file parallel write average wait time > 100ms ,then you have slow IO subsystem.

kish@exdbx<>select KSPPINM from x$ksppi where KSPPINM in ('fast_start_mttr_target','_db_large_dirty_queue','_db_block_max_dirty_target');

KSPPINM
--------------------------------------------------------------------------------
_db_large_dirty_queue
fast_start_mttr_target

DB file scattered read:

You must have heard of this most common wait event,which occur during SELECT statement and a process waits for multiblock read to complete.Multiple datablocks from datafile are read by server process and datablocks are collected and scattered randomly into database buffercache in a non contiguous manner.Also we can notice full table scan or index fast full scan during this event

DB_FILE_MULTIBLOCK_READ_COUNT is the parameter which determines the number of datablock which can be read from the datafile.This event is usual and occur most of the time.Consider this event as a nightmare,when you observe significant amount of wait compared to other events.

SQL> select event,sample_time, session_state,count(*) from v$active_session_history where event like '%scattered%' group by event,sample_time, session_state;

EVENT                SAMPLE_TIME          SESSION_STATE          COUNT(*)
-------------------- -------------------- -------------------- ----------
db file scattered re 29-MAR-21 11.39.35.0 WAITING                       1
ad                   15 AM

This wait falls under USER I/O and if there is high wait observed,then most probably it is an application issue

Three parameters are used for this event :file#,first block# and block count

kish@exdbx<>select a.sid,b.name,a.value from v$sesstat a ,v$statname b
  2  where a.statistic# = b.statistic# and a.value <> 0
  3  and b.name = 'table scan blocks gotten' order by 3,1;

       SID NAME                                                                 VALUE
---------- ---------------------------------------------------------------- ----------
        50 table scan blocks gotten                                                 1
        36 table scan blocks gotten                                                 2
        51 table scan blocks gotten                                                38
         7 table scan blocks gotten                                                44
         5 table scan blocks gotten                                               187
        26 table scan blocks gotten                                               298
        62 table scan blocks gotten                                               501
        60 table scan blocks gotten                                              9566
        22 table scan blocks gotten                                             18935

9 rows selected.

DB file sequential read:

This event is also common when a process waits for a single block read to complete.This event is displayed by oracle ,when there is index scan,undo segments,table access by rowid,rebuilding controlfiles,dumping datafile headers.If you observe significant wait time for single block read,then you include this event in your trouleshooting list.

Recommendation:

  • Since it is totally IO based event where we need to concentrate on reducing the physical and logical IO’s by tuning the sql query used by application. Index reads causes heavy IO when blocks are requested from disk to buffercache and if there is poor extent outline in the datafiles,then demand for IO increases.
  • Reduce the average wait time by tuning the sql query

Here the thumb rule is to maintain AVERAGE_WAIT time should not be more than 10ms as a standard measure.

Clustering factor:

Clustering factor of an index should be equal to the number of blocks in a table which means rows are ordered in the table as per the index.If the clustering factor is equal to number of rows in table,then probably you have bad clustering factor.Index requires a rebuild.

Optimal solution is to rewrite the query by writing efficiently queries with better logic

DB file single write:

This event is also one of DBWR event ,occur when datafile headers are updated by oracle.If your database has ample number of datafiles ,then you see this event in your top events

Direct path read(DPR):

This event occurs when oracle directly read the datablocks using PGA instead SGA buffer.DPR can be performed using sync or async I/O based on the hardware.Init parameter DISK_AYNC_IO determine the type of IO during block read.

There are some scenarios where in DPR can be expected

  • Parallel server execution
  • sorting(sort by/group by clause on a select query)
  • hash joins

If the session cannot attain sync IO,then it has to wait until there is a acknowledgement and IO complete.

If session can attain async IO,then multiple DPR requests are sent and they continue to process the blocks that are already cached in PGA.

Direct path write:

If a session writes datablocks using PGA from data buffer to the datafiles.Multiple write requests would be sent to continue processing.OS handles the IO operation.

Scenarios when this event is posted:

  • INSERT operation with APPEND like IAS or CTAS
  • Parallel DML operations

For LOB segments,specifically there is a direct path write (lob) event

USERNAME                    SID    SERIAL# EVENT
-------------------- ---------- ---------- ------------------------------
WAIT_CLASS       WAIT_TIME SECONDS_IN_WAIT STATE
--------------- ---------- --------------- -------------------

SYS                          60        391 direct path write
User I/O                 0               0 WAITING

Enqueue:

An enqueue is a shared memory structure which serialize the control to acquire to the database resource.To access a memory structure say for instance a particular transaction,the process must get the enqueue lock of resource.Eg: If A acquires an enqueue lock on resource X and B also needs to access resource X,then B has to wait in queue to attain the respective enqueue when A releases X

Types of enqueue

  • ST – space management
  • SQ – sequence wait
  • TX – transaction wait
USERNAME                    SID    SERIAL# EVENT                          WAIT_CLASS       WAIT_TIME SECONDS_IN_WAIT STATE
-------------------- ---------- ---------- ------------------------------ --------------- ---------- --------------- -------------------
SYS                           1         47 enq: PS - contention           Other    0               6 WAITING
USERNAME                    SID    SERIAL# EVENT

SYS                          60        391 enq: CF - contention
Other                    0               0 WAITING

Identify the sid and sqlid of the locked sessions

kish@exdbx<>select sid,sql_id from v$session where sid in ('66','58');

                                     SID SQL_ID
---------------------------------------- -------------
                                     
                                      58 au7znvpsmyyff
                                      66 5s96y62hq2rhk

Find the sql which are under TX enqueue

kish@exdbx<>select sql_text,sql_id from v$sql where sql_id in ('au7znvpsmyyff','5s96y62hq2rhk');

SQL_TEXT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID
-------------
update tabx set order_id=369 where mod(order_id,4)=1
5s96y62hq2rhk

update tabx set order_id=369 where mod(order_id,5)=1
au7znvpsmyyff

Identify the lock mode to get the lock type held on the object

kish@exdbx<>select * from v$lock where sid in ('66','58');

ADDR             KADDR                                                 SID TY                                      ID1    ID2                                    LMODE
---------------- ---------------- ---------------------------------------- -- ---------------------------------------- ---------------------------------------- ----------------------------------------
                                 REQUEST                                    CTIME                                    BLOCK
---------------------------------------- ---------------------------------------- ----------------------------------------
00000000D3D3AD18 00000000D3D3AD70                                       66 AE                                      100      0                                        4
                                       0                                     2357                                     2

00000000D3D3B098 00000000D3D3B0F0                                       58 AE                                      100      0                                        4
                                       0                                     2350                                     2

00000000D3D3B798 00000000D3D3B7F0                                       58 TX                                   262167    932                                        0
                                       6                                     1764                                     0

00000000D3D3BCF0 00000000D3D3BD48                                       66 TX                                   327702   1124                                        0
                                       6                                     1772                                     0

Free buffer waits:

This event occurs when there is no room for datablocks to get accomodate or to build a CR image of block on the buffer cache of SGA.Either the buffer cache is small or the dirty buffers are not flushed to the disk in regular intervals.The server process will notify DBWR to clean the dirty buffers to disk as part the algorithm rules.

Latch free:

When a process holding a latch on a memory structure,another process requiring the latch needs to wait in queue.Oracle maintains consistency with the help of latch.Only one process is allowed to either update or modify the data structures on a latch.Processes which need to attain the latch need to wait

Below are the list of latchname filtered by latch.There are some more latch which were not listed here!Each latch is identified by latch#

SQL> select * from v$latchname where name like '%latch%';

    LATCH# NAME                                                                   HASH
---------- ---------------------------------------------------------------- ----------
         2 hot latch diags                                                  1435869216
        18 ksupkttest latch                                                 1143481739
        20 ksim membership request latch                                    3846407325
        25 ksbxic instance latch                                            3429070552
        31 enqueue freelist latch                                           1930520624
  .........
...........
    

Library cache pin:

This event is related to library cache concurrency where a session pins an object in library cache in shared pool to read or modify them.Thus preventing others sessions from updating the same object.

It occurs mainly when executing

  • PL/SQL procedures
  • Views

When this event is considered problematic?

  1. Dynamic SQL inside PL/SQL procedure has to perform recompilation of the code from the calling procedure
  2. Blocking session

Solution:

  • Shared pool needs to be tuned
  • Blocking session must be taken care
SQL> select s.sid,kglpnmod "Mode",kglpnreq "Req" from x$kglpn p,v$session s where p.kglpnuse=s.addr and kglpnhdl='&P1RAW';
Enter value for p1raw:

We can get the locks and pins detail in db_object_cache view which is very useful when tuning library cache.If these locks on objects are significant,then its time to have a look on these locks

SQL> select * from (select owner,locks,pins from v$db_object_cache order by locks desc) where rownum < 6 and owner is not null;

OWNER                                                                 LOCKS       PINS
---------------------------------------------------------------- ---------- ----------
SYS                                                                       9          0
SYS                                                                       7          0
SYS                                                                       7          0
SYS                                                                       5          0
SYS                                                                       5          0

lets test the same with a scenario

Open two sessions A and B and execute procedure with dbms_lock upto 30 seconds time

kish@exdbx<>create or replace procedure x as
  2  begin
  3  dbms_lock.sleep(30);
  4  end;
  5  /
Elapsed: 00:00:00.15


kish@exdbx<>exec x
Elapsed: 00:00:30.01

At the sametime,compile the procedure from session B.Note that the command hanged for 26 seconds until the procedure complete execution

kish@exdbx<>alter procedure x compile;

Procedure altered.

Elapsed: 00:00:26.53

Simultaneously observe the locks and pins for the procedure with recent timestamp.

kish@exdbx<>select name,owner,pins,locks,timestamp from v$db_object_cache where name like '%DBMS_LOCK%' order by timestamp;


NAME
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
OWNER                                                                                                PINS                 LOCKS
---------------------------------------------------------------- ---------------------------------------- ----------------------------------------
TIMESTAMP
----------------------------------------------------------------------------
DBMS_LOCK
SYS                                                                                                     1             1
2013-08-24/11:41:29

Library cache lock:

A session must acquire a lock on library cache to prevent other sessions access of same object at the same time to maintain consistency or to maintain dependency or to locate an object on library cache.

Log buffer space:

When session has to write redo on log buffer,but has no space to write new information.LGWR process flush log from log buffer to redolog file as per the 3 rules of LGWR(every 3 seconds,every 1/3rd of log buffer full and user commit).This event tells us that LGWR cant write the redo information faster and cope up with the high redo transactions generated by application.

This event is posted

  • Redolog files are on disks with I/O contention
  • if log buffer size is not optimal

Log file parallel write:

When session waits for LGWR process to write redo information from log buffer to redolog members in group.If LGWR writes redo in ASYNC mode,then it parallely writes the redo information to the active redologfile else it writes in serial manner.

Main reason for this event to occur are

  • redolog files are sized small
USERNAME                              SID    SERIAL#
------------------------------ ---------- ----------
EVENT
----------------------------------------------------------------
WAIT_CLASS                                                        WAIT_TIME
---------------------------------------------------------------- ----------
SECONDS_IN_WAIT STATE
--------------- -------------------
(oracle)                               20          1
log file parallel write
System I/O                                                                0
              0 WAITING

Log file sequential read:

When process waits for blocks to be read from online redolog files this event is posted.When ARCH process archives the redo to archivelog file,this event is posted.

USERNAME                    SID    SERIAL# EVENT
-------------------- ---------- ---------- ------------------------------
WAIT_CLASS       WAIT_TIME SECONDS_IN_WAIT STATE
--------------- ---------- --------------- -------------------
(oracle)                     41          1 log file sequential read
System I/O               0               0 WAITING

Log file switch(archiving needed) and log file switch (checkpoint incomplete)

*Archiving needed* message indicates that ARCH process is not coping up with LGWR process and is slow in archiving the redo to archivelog file than writing the redo to redolog file.This occurs only if database is in archivelog mode and LGWR has to wait until the redos are archived from one of the member being full.If archiving is failed,then this event is posted

*Checkpoint incomplete* indicate that LGWR wait for log file switch to complete because checkpoint for the logfile is not completed.

This is also an indication of redolog file size being small

kish@exdbx<>select bytes/1048576 from v$log;

                           BYTES/1048576
----------------------------------------
                                      50
                                      50
                                      50
                                      50

Let check the current time of database

kish@exdbx<>select to_char(sysdate,'dd-mm-yy hh24:mi:ss') from dual;

TO_CHAR(SYSDATE,'
-----------------
07-04-21 23:10:05

Lets run the log advisory report for past one hour where there was high DML activity which generated high redo generation.As you can see the size recommendation is very high at time when there was application activity on the database

   kish@exdbx<>SELECT
   (SELECT
      ROUND (AVG (BYTES) / 1024 / 1024, 2)
   FROM
      V$LOG) AS "Redo size (MB)",
   ROUND ( (20 / AVERAGE_PERIOD) * (SELECT AVG (BYTES) FROM V$LOG) / 1024 / 1024, 2) AS "Recommended Size (MB)"
FROM
   (SELECT AVG ( (NEXT_TIME - FIRST_TIME) * 24 * 60) AS
   AVERAGE_PERIOD
   FROM
   V$ARCHIVED_LOG
   WHERE
   FIRST_TIME > SYSDATE - 1
   AND
   TO_CHAR (FIRST_TIME, 'HH24:MI') BETWEEN '22:00' AND '23:00');

                          Redo size (MB)                    Recommended Size (MB)
---------------------------------------- ----------------------------------------
                                      50                                  2045.45

This doesnt mean that we blindly increase the redolog size because this also increase the mean time for recovery during instance crash(MTTR) determined by fast_start_mttr_target.Depending on the database and its behaviour,a deep analysis of optimal redo is required to determine the redolog size.

Log file sync:

User session performs DML transactions like insert,update or delete on the database and issues a commit or rollback to save the records.When user issues a commit or rollback,the changes made on the objects must be immediately saved by LGWR to members of online redolog files to maintain data protection.Until LGWR write the redos from log buffer to redolog files ,the server process has to wait for an acknowledgement from the LGWR.

If the session waits on same log buffer#,seq# of v$session_wait with an increment on those columns,then the session waits due to slow LGWR process.Here we need to identify if LGWR itself is slow due to short transaction commits or due to slow disks like RAID-5.

Solution:

  • Tune LGWR
  • Batch the transactions to reduce the commits
  • Check if all the members of redolog file are equally and optimally sized

SQL*NET message from client:

This wait event is posted if a client sits in front of sqlplus or sql developer terminal and process wait for a message from the client to action.This may indicate that the client go for break and come back after sometime and the process sits idle until clients provide any input.Another scenario may be batch programs run on the system non interactively which doesnot have control over client.The event itself state that problem is not from database but from network or application layer.

SYS                          60        391 SQL*Net more data from dblink
Network                  0               0 WAITING

SYS                          37         45 SQL*Net message to client
Network                 -1               0 WAITED SHORT TIME

SQL*NET message to client:

This event is contradictory to previous event.The client process is too busy in some task and server process has to wait for the acknowledgement from the client or due to network latency

USERNAME                    SID    SERIAL# EVENT                          WAIT_CLASS       WAIT_TIME SECONDS_IN_WAIT STATE
-------------------- ---------- ---------- ------------------------------ --------------- ---------- --------------- -------------------
SYS                          74        249 SQL*Net message to client      Network                 -1               0 WAITED SHORT TIME

ASM file metadata operation:

This event posts waits when the ASM processes perform operations very slow than the database process or the database process sends multiple request faster than the ASM can respond during a smart scan.

USERNAME                    SID    SERIAL# EVENT
-------------------- ---------- ---------- ------------------------------
WAIT_CLASS       WAIT_TIME SECONDS_IN_WAIT STATE
--------------- ---------- --------------- -------------------

(oracle)                     56        643 ASM file metadata operation
Other                    0               1 WAITING

VSSWRITER service stopped

WHAT:

VSSwriter is one of the services on windows used specifically for writing backup and recovery information in the form of snapshots and create volume based and component based shadow copies

Manages and implements Volume Shadow Copies used for backup and other purposes. If this service is stopped, shadow copies will be unavailable for backup and the backup may fail. If this service is disabled, any services that explicitly depend on it will fail to start.

If oracle database reside on windows OS,then oracle VSS writer enables all types of backup like log,copy,full,incremental differential and cumulative

All the options used in vssadmin are self explanatory

C:\Windows\system32>vssadmin
vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001-2013 Microsoft Corp.

Error: Invalid command.

---- Commands Supported ----

Delete Shadows        - Delete volume shadow copies
List Providers        - List registered volume shadow copy providers
List Shadows          - List existing volume shadow copies
List ShadowStorage    - List volume shadow copy storage associations
List Volumes          - List volumes eligible for shadow copies
List Writers          - List subscribed volume shadow copy writers
Resize ShadowStorage  - Resize a volume shadow copy storage association

vssadmin is a useful utility to control the vss writers and for listing status of all writers on the host and volumn shadow copy

VSS writer process may stop sometime due to unstable state of resource like CPU,memory etc or host itself.This may lead to failure of dependent process like backup if scheduled at the time

C:\Windows\system32>vssadmin list writers

vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001-2013 Microsoft Corp.

Writer name: 'Task Scheduler Writer'
   Writer Id: {d61d61c8-d73a-4eee-8cdd-f6f9786b7124}
   Writer Instance Id: {1bddd48e-5052-49db-9b07-b96f96727e6b}
   State: [1] Stable
   Last error: No error

Writer name: 'VSS Metadata Store Writer'
   Writer Id: {75dfb225-e2e4-4d39-9ac9-ffaff65ddf06}
   Writer Instance Id: {088e7a7d-09a8-4cc6-a609-ad90e75ddc93}
   State: [1] Stable
   Last error: No error

Writer name: 'Performance Counters Writer'
   Writer Id: {0bada1de-01a9-4625-8278-69e735f39dd2}
   Writer Instance Id: {f0086dda-9efc-47c5-8eb6-a944c3d09381}
   State: [1] Stable
   Last error: No error

Writer name: 'System Writer'
   Writer Id: {e8132975-6f93-4464-a53e-1050253ae220}
   Writer Instance Id: {6f04bff6-5501-4e09-a045-843c66492a22}
   State: [5] Waiting for completion
   Last error: No error

Writer name: 'Shadow Copy Optimization Writer'
   Writer Id: {4dc3bdd4-ab48-4d07-adb0-3bee2926fd7f}
   Writer Instance Id: {f6747060-ebb0-4c11-a94f-4e687c5d7981}
   State: [1] Stable
   Last error: No error

Writer name: 'ASR Writer'
   Writer Id: {be000cbe-11fe-4426-9c58-531aa6355fc4}
   Writer Instance Id: {3266cf90-c973-4df6-ad30-ab57ff536b9f}
   State: [1] Stable
   Last error: No error

Writer name: 'MSSearch Service Writer'
   Writer Id: {cd3f2362-8bef-46c7-9181-d62844cdc0b2}
   Writer Instance Id: {09a979e6-c403-4a37-bb07-eac141c6dcb6}
   State: [5] Waiting for completion
   Last error: No error

Writer name: 'IIS Config Writer'
   Writer Id: {2a40fd15-dfca-4aa8-a654-1f8c654603f6}
   Writer Instance Id: {ef00cb03-3995-4411-9ded-556d4a80a7bc}
   State: [1] Stable
   Last error: No error

Writer name: 'WMI Writer'
   Writer Id: {a6ad56c2-b509-4e6c-bb19-49d8f43532f0}
   Writer Instance Id: {23134a5e-23be-4087-ab74-1cf091229962}
   State: [5] Waiting for completion
   Last error: No error

Writer name: 'Registry Writer'
   Writer Id: {afbab4a2-367d-4d15-a586-71dbb18f8485}
   Writer Instance Id: {fc569deb-26d1-4e5d-bf76-304e493b09b7}
   State: [1] Stable
   Last error: No error

Writer name: 'COM+ REGDB Writer'
   Writer Id: {542da469-d3e1-473c-9f4f-7847f01fc64f}
   Writer Instance Id: {a8228371-281e-4d2e-9bff-f29f36b4aa06}
   State: [1] Stable
   Last error: No error

To list the volumes related to VSS

C:\Windows\system32>vssadmin list volumes
vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001-2013 Microsoft Corp.

Volume path: C:\
    Volume name: \\?\Volume{e8efcbfe-13a7-44a0-85b9-3f3d8bb9efa7}\
Volume path: \\?\Volume{b684ff23-a96c-45bc-a37e-6f4006673de9}\
    Volume name: \\?\Volume{b684ff23-a96c-45bc-a37e-6f4006673de9}\
Volume path: D:\
    Volume name: \\?\Volume{7677cd95-e711-4056-a59f-c21c5dd7b6a4}\
Volume path: E:\
    Volume name: \\?\Volume{9529e32f-701a-489a-81ac-0a0f82a64d15}\
Volume path: F:\
    Volume name: \\?\Volume{4af99bb0-57c1-4c90-8b42-e4e9052c9e24}\

To check the provider name and details

C:\Windows\system32>vssadmin list providers
vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001-2013 Microsoft Corp.

Provider name: 'Microsoft Software Shadow Copy provider 1.0'
   Provider type: System
   Provider Id: {b5946137-7b9f-4925-af80-51abd60b20d5}
   Version: 1.0.0.7

oravssw is another useful utlity which is linked to dll file can be helpful to start,stop and check status of the current writer

C:\Windows\system32>oravssw


oravssw options are:

        { /q [/start | /stop | /status] } |
        { SID [/tl <trace_level>] [/tf <trace_file>] } |
        { SID [/i {/user:<userid> /password:<password>}] } |
        { SID [/d] }

        /q to operate on all Oracle VSS Writer services
        /status to display status for Oracle VSS Writer service
        /start to start Oracle VSS Writers Services
        /stop to stop Oracle VSS Writer Services

        /tl to specify trace level
        /tf to specify trace file name

        /i {/user:<userid> /password:<password>}
                to Install Oracle VSS writer Service for specified SID

        /d to De-Install Oracle VSS writer Service for specified SID

To check the status of oracle VSS writer

C:\Windows\system32>oravssw /q status
OracleVssWriterORCL

We can also check the service status in services.msc

Volume shadow copy is the service name which runs in background.This service can be either restarted automatically or manually if the service fail

There are four options to choose to control the volume shadow copy service

manual,automatic,automatic(delayed start),disabled

Any of the above options can be chosen based on the manipulation purpose of the service

ORA-12838: cannot read/modify an object after modifying it in parallel

Cause:

The error occur when multiple session try to load data on the same table with direct path load operation + parallel

SQL> begin
  2  for i in 1 .. 100000 loop
  3  insert /*+append */ into sales select * from demo_dup;
  4  end loop;
  5  commit;
  6  end;
  7  /
begin
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel
ORA-06512: at line 3
[oracle@exdbadm01 ~]$ oerr ora 12838
12838, 00000, "cannot read/modify an object after modifying it in parallel"
// *Cause: Within the same transaction, an attempt was made to add read or
// modification statements on a table after it had been modified in parallel
// or with direct load. This is not permitted.
// *Action: Rewrite the transaction, or break it up into two transactions:
// one containing the initial modification and the second containing the
// parallel modification operation.

Workaround:

There are no constraints on the table where data load happen so constraints error were skipped and ora 12838 is thrown.Four sessions were connected to same database and all sessions perform insert with append option

kish@exdbx<>select a.sid,a.serial#,a.username,a.sql_id,a.event,b.sql_text from v$session a,v$sql b where a.sql_id=b.sql_id;

                                     SID                                  SERIAL# USERNAME                       SQL_ID        EVENT
---------------------------------------- ---------------------------------------- ------------------------------ ------------- ----------------------------------------------------------------
SQL_TEXT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


                                      36                                      133 TEST                           b08gc4pv7cprr enq: TM - contention
INSERT /*+append */ INTO SALES SELECT * FROM DEMO_DUP


                                      71                                      657 TEST                           b08gc4pv7cprr enq: TM - contention
INSERT /*+append */ INTO SALES SELECT * FROM DEMO_DUP

Execute the insert with no append because by default direct path load work in parallel and rows are appended above highwatermark ignoring the free space gaps below.If constraints are present on the table,then oracle force conventional data load to maintain consistency

Either exclude append hint from the insert or run one insert with append ,commit it and then go for next insert and so on

Session 1:

SQL> begin
  2  for i in 1 .. 100000 loop
  3  insert /*+append */ into sales select * from demo_dup;
  4  end loop;
  5  commit;
  6  end;
  7  /

commited

After transaction 1 commit from session 1,run next transaction in session 2
Session 2:


SQL> begin
  2  for i in 100000 .. 100001 loop
  3  insert /*+append */ into sales select * from demo_dup;
  4  end loop;
  5  commit;
  6  end;
  7  /

IDENTIFY MASTER NODE IN RAC

Get master node details from

1)ohasd logfile

[oracle@exdbadm01 ohasd]$ grep 'MASTER' ohasd.log|tail
2021-04-07 16:23:05.580: [   CRSPE][2999859520]{0:0:2} PE Role|State Update: old role [MASTER] new [MASTER]; old state [Configuring] new [Starting]
2021-04-07 16:23:05.582: [   CRSPE][2999859520]{0:0:2} PE Role|State Update: old role [MASTER] new [MASTER]; old state [Starting] new [Running]
2021-04-08 13:39:13.858: [   CRSPE][839031104]{0:0:2} PE Role|State Update: old role [INVALID] new [MASTER]; old state [Enabling: waiting for role] new [Configuring]
2021-04-08 13:39:13.858: [   CRSPE][839031104]{0:0:2} PE MASTER NAME: exdbadm01
2021-04-08 13:39:14.569: [   CRSPE][839031104]{0:0:2} PE Role|State Update: old role [MASTER] new [MASTER]; old state [Configuring] new [Starting]
2021-04-08 13:39:14.571: [   CRSPE][839031104]{0:0:2} PE Role|State Update: old role [MASTER] new [MASTER]; old state [Starting] new [Running]
2021-04-09 13:21:01.639: [   CRSPE][2623547712]{0:0:2} PE Role|State Update: old role [INVALID] new [MASTER]; old state [Enabling: waiting for role] new [Configuring]
2021-04-09 13:21:01.639: [   CRSPE][2623547712]{0:0:2} PE MASTER NAME: exdbadm01
2021-04-09 13:21:02.548: [   CRSPE][2623547712]{0:0:2} PE Role|State Update: old role [MASTER] new [MASTER]; old state [Configuring] new [Starting]
2021-04-09 13:21:02.551: [   CRSPE][2623547712]{0:0:2} PE Role|State Update: old role [MASTER] new [MASTER]; old state [Starting] new [Running]

2)v$ges_resource dynamic parameter

kish@exdbx<>select RESP,RESOURCE_NAME,MASTER_NODE,NEXT_CVT_LEVEL from v$ges_resource where rownum < 11;

RESP             RESOURCE_NAME                                               MASTER_NODE NEXT_CVT_
---------------- ------------------------------ ---------------------------------------- ---------
00000000C64D7128 [0x43c0f61e][0xb9c6fb9e],[QQ][                                        0 KJUSERNL
00000000D1977698 [0xd9ac5aaf][0x38a98ebb],[QQ][                                        0 KJUSERNL
00000000D1913028 [0x682062be][0x33658b50],[QQ][                                        0 KJUSERNL
00000000D19011D8 [0x4c4f434b][0x39],[IV][ext 0x                                        0 KJUSERNL
00000000D187C048 [0x33708fee][0x6bf1a91f],[QQ][                                        1 KJUSERNL
00000000D18F53E0 [0x36818230][0x751c0852],[QI][                                        0 KJUSERNL
00000000C665CF90 [0x801502][0x1],[QC][ext 0x0,0                                        1 KJUSERNL
00000000C680AC78 [0xf1b15c0b][0x50a5d6ae],[QQ][                                        1 KJUSERNL
00000000D1890688 [0xe7532ae4][0x24fdb5df],[QQ][                                        1 KJUSERNL
00000000D18CE9C8 [0xf9a5536][0x6254bff1],[QQ][e                                        1 KJUSERNL

3)ocssd logfile

[oracle@exdbadm01 ~]$ grep 'MASTER' /u01/app/11.2.0/log/exdbadm01/cssd/ocssd.log|tail

2021-04-09 14:06:25.972: [    CSSD][1021745472]clssgmTestSetLastGrockUpdate: grock(CLSN.RLB.exdbx.MASTER), updateseq(5) msgseq(6), lastupdt<0x20992b0>, ignoreseq(0)
2021-04-09 14:06:25.972: [    CSSD][1021745472]clssgmGrockOpTagProcess: Request to commission member(1) using key(1) for grock(CLSN.RLB.exdbx.MASTER)
2021-04-09 14:06:25.972: [    CSSD][1021745472]clssgmUpdateGrpData: grock(CLSN.RLB.exdbx.MASTER), commissioner(1/1)
2021-04-09 14:06:25.973: [    CSSD][1021745472]clssgmQueueGrockEvent: groupName(CLSN.RLB.exdbx.MASTER) count(2) master(2) event(18), incarn 1, mbrc 1, to member 1, events 0xa0, state 0x0
2021-04-09 14:06:25.973: [    CSSD][1021745472]clssgmHandleGrockRcfgUpdate: grock(CLSN.RLB.exdbx.MASTER), updateseq(6), status(0), sendresp(1)
2021-04-09 14:06:26.554: [    CSSD][1021745472]clssgmTestSetLastGrockUpdate: grock(CLSN.RLB.exdbx.MASTER), updateseq(6) msgseq(7), lastupdt<0x2098dd0>, ignoreseq(0)
2021-04-09 14:06:26.554: [    CSSD][1021745472]clssgmGrockOpTagProcess: Request to commission member(-1) using key(1) for grock(CLSN.RLB.exdbx.MASTER)
2021-04-09 14:06:26.554: [    CSSD][1021745472]clssgmUpdateGrpData: grock(CLSN.RLB.exdbx.MASTER), commissioner(-1/0)
2021-04-09 14:06:26.554: [    CSSD][1021745472]clssgmQueueGrockEvent: groupName(CLSN.RLB.exdbx.MASTER) count(2) master(2) event(18), incarn 0, mbrc 1, to member 1, events 0xa0, state 0x0
2021-04-09 14:06:26.554: [    CSSD][1021745472]clssgmHandleGrockRcfgUpdate: grock(CLSN.RLB.exdbx.MASTER), updateseq(7), status(0), sendresp(1)

4)crsd logfile

[oracle@exdbadm01 ~]$ grep 'MASTER' /u01/app/11.2.0/log/exdbadm01/crsd/crsd.log|tail


2021-04-07 16:37:55.261: [  OCRMAS][1307658560]th_master: NEW OCR MASTER IS 2
2021-04-07 16:37:57.188: [   CRSPE][1370696000]{1:39863:2} PE MASTER NAME: exdbadm02
2021-04-08 14:38:20.591: [  OCRMAS][1711864128]th_master:12: I AM THE NEW OCR MASTER at incar 1. Node Number 1
2021-04-08 14:38:23.231: [   CRSPE][1849760064]{1:43178:2} PE Role|State Update: old role [INVALID] new [MASTER]; old state [Enabling: waiting for role] new [Configuring]
2021-04-08 14:38:23.231: [   CRSPE][1849760064]{1:43178:2} PE MASTER NAME: exdbadm01
2021-04-08 14:38:24.739: [   CRSPE][1849760064]{1:43178:2} PE Role|State Update: old role [MASTER] new [MASTER]; old state [Configuring] new [Starting]
2021-04-08 14:38:24.745: [   CRSPE][1849760064]{1:43178:2} PE Role|State Update: old role [MASTER] new [MASTER]; old state [Starting] new [Running]
2021-04-09 14:02:49.296: [  OCRMAS][2666457408]proath_master: SUCCESSFULLY CONNECTED TO THE MASTER
2021-04-09 14:02:49.296: [  OCRMAS][2666457408]th_master: NEW OCR MASTER IS 2
2021-04-09 14:02:52.784: [   CRSPE][2729494848]{1:57928:2} PE MASTER NAME: exdbadm02

5)Recent OCR backup taken by any node is MASTER B)

[oracle@exdbadm01 ~]$ ocrconfig -showbackup

exdbadm02     2021/04/09 22:18:00     /u01/app/11.2.0/cdata/exdbadm-scan/backup00.ocr <==== I am Master 

exdbadm02     2021/04/09 18:17:58     /u01/app/11.2.0/cdata/exdbadm-scan/backup01.ocr

exdbadm01     2021/04/08 22:53:32     /u01/app/11.2.0/cdata/exdbadm-scan/backup02.ocr

exdbadm01     2021/04/08 18:53:31     /u01/app/11.2.0/cdata/exdbadm-scan/day.ocr

exdbadm01     2021/04/03 17:27:41     /u01/app/11.2.0/cdata/exdbadm-scan/week.ocr

SHELL SCRIPT TO CHECK HIGH CPU SESSION FROM LINUX TO ORACLE

#This script will check the top oracle process which consume high CPU from OS and map the OSPID to database spid 
_check_high_CPU(){


CPU_LOG=/tmp/CPU.f
CPU_LEVEL=15.0
CPU_FACTOR=ps -eo pcpu,pid,user,etime|grep 'oracle'|sort -nr|head -1|awk '{print $1,$2,$3}'
#echo $CPU_FACTOR > /tmp/TOP_CPU_PROCESS.f
if [ echo "$CPU_FACTOR"|head -1|awk '{print $1}' > $CPU_LEVEL ];then echo "HIGH CPU";fi
SPID=echo "$CPU_FACTOR"|awk '{print $2}'
echo "$SPID"
$ORACLE_HOME/bin/sqlplus -S "/ as sysdba" << EOF > $CPU_LOG
DEFINE SPID='$SPID'
set lines 200
set pages 1000
set echo on
column SPID format a20
column USERNAME format a20
column PROGRAM format a20
column SQL_ID format a20
SELECT se.inst_id,
       se.sid,
       se.serial#,
       se.sql_id,
       pr.spid,
       se.username,
       se.program
FROM   gv\$session se
       INNER JOIN gv\$process pr ON pr.addr = se.paddr AND pr.inst_id = se.inst_id
WHERE  p.spid='${SPID}' 
order by se.sid;
EOF
exit;


grep "sid" $CPU_LOG
if [ $? -eq 0 ];then
mailx -s "`$ORACLE_SID`:HIGH CPU USAGE" youremail.gmail.com
fi

}
_check_high_CPU

DO YOU KNOW TO IDENTIFY IPC PROTOCOL USED BY ORACLE RAC?

Here in this article you learn to check the ipc protocol used in your RAC system using trace dump

To get the RAC ipc protocol details used for communication between the nodes,use tracing utility like oradebug

Below output shows an example of RDS protocol used for interprocess communication


kish@exdbx<>oradebug ipc
IPC information written to the trace file

kish@exdbx<>oradebug tracefile_name
/data/base/diag/rdbms/exdbx/exdbx1/trace/exdbx1_ora_17035.trc


SKGXP:[2b9012c62c48.35]{ctx}:   SSKGXPT 0x2b9012c641b8 flags 0x4 { READPENDING } sockno 4 IP 169.254.66.74 RDS 4377 lerr 0
SKGXP:[2b9012c62c48.36]{ctx}:
SKGXP:[2b9012c62c48.37]{ctx}: post port:
SKGXP:[2b9012c62c48.38]{ctx}:   SSKGXPT 0x2b9012c64680 flags 0x2 { WRITE } sockno 7 IP 127.0.0.1 UDP 60304 lerr 0
SKGXP:[2b9012c62c48.39]{ctx}:
SKGXP:[2b9012c62c48.40]{ctx}: qos adm 0/0 -> 255 UNKNOWN QOS LEVEL rdma 0 -> 0 flags dis

ORA-16038: log 1 sequence# 191 cannot be archived – ORA-19809: limit exceeded for recovery files – ORA-00312: online log 1 thread 1: ‘+DATA/exdbx/onlinelog/group_1.261.1063822485’ – ORA-00312: online log 1 thread 1: ‘+FRA/exdbx/onlinelog/group_1.257.1063822491’

Cause:

RAC database instances were not starting after investigating for long time.

ORA-16038: log 1 sequence# 191 cannot be archived
ORA-19809: limit exceeded for recovery files
ORA-00312: online log 1 thread 1: '+DATA/exdbx/onlinelog/group_1.261.1063822485'
ORA-00312: online log 1 thread 1: '+FRA/exdbx/onlinelog/group_1.257.1063822491'
USER (ospid: 17363): terminating the instance due to error 16038
Tue Mar 30 12:12:36 2021
System state dump requested by (instance=1, osid=17363), summary=[abnormal instance termination].
System State dumped to trace file /u01/base/diag/rdbms/exdbx/exdbx1/trace/exdbx1_diag_17302_20210330121236.trc
Dumping diagnostic data in directory=[cdmp_20210330121236], requested by (instance=1, osid=17363), summary=[abnormal instance termination].
Instance terminated by USER, pid = 17363

srvctl throw error with respect to archivelog sequence and ended up with end of file on communicatio channel

[oracle@exdbadm01 ~]$ srvctl start database -d exdbx
PRCR-1079 : Failed to start resource ora.exdbx.db
CRS-5017: The resource action "ora.exdbx.db start" encountered the following error:
ORA-16038: log 1 sequence# 191 cannot be archived
ORA-19809: limit exceeded for recovery files
ORA-00312: online log 1 thread 1: '+DATA/exdbx/onlinelog/group_1.261.1063822485'
ORA-00312: online log 1 thread 1: '+FRA/exdbx/onlinelog/group_1.257.1063822491'
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/exdbadm02/agent/crsd/oraagent_oracle/oraagent_oracle.log".

CRS-2674: Start of 'ora.exdbx.db' on 'exdbadm02' failed
CRS-5017: The resource action "ora.exdbx.db start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 17363
Session ID: 1 Serial number: 3
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/exdbadm01/agent/crsd/oraagent_oracle/oraagent_oracle.log".

CRS-2674: Start of 'ora.exdbx.db' on 'exdbadm01' failed
CRS-2632: There are no more servers to try to place resource 'ora.exdbx.db' on that would satisfy its placement policy

When i checked FRA,sufficient space was available but still error persists.

kish@exdbx<>show parameter db_recovery_file_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest                string      +FRA
db_recovery_file_dest_size           big integer 4868M

Workaround:

After investigating on the root cause, archivelog files were directly removed using OS utility and not cleared using RMAN graceful archive delete



RMAN> crosscheck archivelog all;
RMAN> delete expired archivelog all;

After running above archivelog delete commands in RMAN,issue got cleared

[oracle@exdbadm01 lib]$ srvctl stop database -d exdbx                                                                 [oracle@exdbadm01 lib]$ srvctl start database -d exdbx
[oracle@exdbadm01 lib]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Tue Mar 30 17:30:28 2021

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

kish@exdbx<>select status from gv$instance;

STATUS
------------
OPEN
OPEN

ORA-01119: error in creating database file ‘+DATA’ – ORA-17502: ksfdcre:4 Failed to create file +DATA – ORA-15041: diskgroup “DATA” space exhausted

Cause:

The error creeps in ,when datafile or tempfile cant be added due to space crunch on asm diskgroup

kish@exdbx<>create temporary tablespace temp2 tempfile '+DATA' size 1G;
create temporary tablespace temp2 tempfile '+DATA' size 1G
*
ERROR at line 1:
ORA-01119: error in creating database file '+DATA'
ORA-17502: ksfdcre:4 Failed to create file +DATA
ORA-15041: diskgroup "DATA" space exhausted

Workaround:

Check the space for DATA diskgroup.We have 112 MB space on the diskgroup which has external redundancy.So there is no room for incoming datafile or tempfile to accomodate in DATA diskgroup

ASMCMD> lsdg --discovery
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  4194304      5760      112                0             112              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  4194304      1440     1304                0            1304              0             N  DATA2/
MOUNTED  EXTERN  N         512   4096  4194304      5040     1784                0            1784              0             N  FRA/
MOUNTED  EXTERN  N         512   4096  1048576      1440     1042                0            1042              0             Y  MGMT/

Check with your storage team to allocate the disks to the server and check disk header status in v$asm_disk

If it is exadata,then all the luns should be added as celldisks which in turn should be mapped to ASM griddisks.In my case, after storage team added 4 arrays(cell20,cell21,cell22,cell23),symbolic link should be created from the logical cylinders

[root@exceladm00 raw]# ls -lrt
total 0
lrwxrwxrwx 1 root root 9 Feb 25 01:38 cell23 -> /dev/sdad
lrwxrwxrwx 1 root root 9 Feb 25 01:38 cell22 -> /dev/sdac
lrwxrwxrwx 1 root root 9 Feb 25 01:38 cell21 -> /dev/sdab
lrwxrwxrwx 1 root root 9 Feb 25 01:38 cell20 -> /dev/sdaa

Use a shell script to generate symbolic link commands in one go

fdisk -l 2>/dev/null | grep "805 MB" | awk '{ printf "%s%02d\n", "ln -s "$2" cell", NR }'|sed "s/://"

  392  ln -s /dev/sdaa cell20
  393  ln -s /dev/sdab cell21
  394  ln -s /dev/sdac cell22
  395  ln -s /dev/sdad cell23

In storage server,check if physical disk which were allocated reflect!

CellCLI> list physicaldisk

         /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell20    /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell20          normal
         /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell21    /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell21          normal
         /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell22    /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell22          normal
         /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell23    /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell23          normal

Create celldisk from the physical disk with size clause to allocate only specific size for capacity

CellCLI> create celldisk  all;
CellDisk FD_06_stocell successfully created
CellDisk FD_07_stocell successfully created
CellDisk FD_08_stocell successfully created
CellDisk CD_cell20_stocell successfully created
CellDisk CD_cell21_stocell successfully created
CellDisk CD_cell22_stocell successfully created
CellDisk CD_cell23_stocell successfully created

Next map your asm to celldisks with griddisk creation

CellCLI> create griddisk all harddisk prefix=DATA
Cell disks were skipped because they had no freespace for grid disks: CD_cell10_stocell, CD_cell11_stocell, CD_cell12_stocell, CD_cell13_stocell, CD_cell14_stocell, CD_cell15_stocell, CD_cell16_stocell, CD_cell17_stocell, CD_cell18_stocell.
GridDisk DATA_CD_cell20_stocell successfully created
GridDisk DATA_CD_cell21_stocell successfully created
GridDisk DATA_CD_cell22_stocell successfully created
GridDisk DATA_CD_cell23_stocell successfully created

Newly added disk header status should reflect as CANDIDATE disk.Below,there are 4 candidate disks which can be added to the DATA diskgroup for space.

SQL> select name,header_status,state from v$asm_disk;

NAME                           HEADER_STATU STATE
------------------------------ ------------ --------
                               CANDIDATE    NORMAL
                               CANDIDATE    NORMAL
                               CANDIDATE    NORMAL
                               CANDIDATE    NORMAL
DATA_CD_CELL07_STOCELL         MEMBER       NORMAL
MGMT_CD_CELL15_STOCELL         MEMBER       NORMAL
FRA_CD_CELL08_STOCELL          MEMBER       NORMAL
MGMT_CD_CELL16_STOCELL         MEMBER       NORMAL
DATA_CD_CELL19_STOCELL         MEMBER       NORMAL
FRA_CD_CELL12_STOCELL          MEMBER       NORMAL
FRA_CD_CELL11_STOCELL          MEMBER       NORMAL

NAME                           HEADER_STATU STATE
------------------------------ ------------ --------
DATA_CD_CELL09_STOCELL         MEMBER       NORMAL
FRA_CD_CELL09_STOCELL          MEMBER       NORMAL
DATA_CD_CELL05_STOCELL         MEMBER       NORMAL
DATA_CD_CELL06_STOCELL         MEMBER       NORMAL
FRA_CD_CELL14_STOCELL          MEMBER       NORMAL
FRA_CD_CELL10_STOCELL          MEMBER       NORMAL
DATA_CD_CELL03_STOCELL         MEMBER       NORMAL
FRA_CD_CELL13_STOCELL          MEMBER       NORMAL
DATA_CD_CELL04_STOCELL         MEMBER       NORMAL
DATA_CD_CELL01_STOCELL         MEMBER       NORMAL
DATA_CD_CELL02_STOCELL         MEMBER       NORMAL

NAME                           HEADER_STATU STATE
------------------------------ ------------ --------
DATA_CD_CELL08_STOCELL         MEMBER       NORMAL

23 rows selected.

Use the below command with rebalance power 9 to speed up the rebalance operation.Make sure there is sufficient CPU power on your host so that 9 parallel I/O’s can be performed by ARB process.

SQL> alter diskgroup DATA add disk 'o/192.168.56.33/DATA_CD_cell2*' reba                                      lance power 9


NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (2,8) to disk (o/192.168.56.33/DATA_CD_cell21_stocell)
NOTE: requesting all-instance membership refresh for group=2
NOTE: initializing header on grp 2 disk DATA_CD_CELL21_STOCELL
NOTE: requesting all-instance disk validation for group=2
Thu Mar 25 22:01:18 2021
NOTE: skipping rediscovery for group 2/0x1f0821be (DATA) on local instance.
NOTE: initiating PST update: grp = 2
Thu Mar 25 22:01:20 2021
GMON updating group 2 at 13 for pid 29, osid 9120
NOTE: PST update grp = 2 completed successfully
NOTE: membership refresh pending for group 2/0x1f0821be (DATA)
GMON querying group 2 at 14 for pid 19, osid 4885
NOTE: cache opening disk 8 of grp 2: DATA_CD_CELL21_STOCELL path:o/192.168.56.33                                      /DATA_CD_cell21_stocell
GMON querying group 2 at 15 for pid 19, osid 4885
Thu Mar 25 22:01:34 2021
SUCCESS: refreshed membership for 2/0x1f0821be (DATA)
Thu Mar 25 22:01:36 2021
SUCCESS: alter diskgroup DATA add disk 'o/192.168.56.33/DATA_CD_cell21_stocell'                                       rebalance power 9
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
NOTE: starting rebalance of group 2/0x1f0821be (DATA) at power 9
Starting background process ARB0
Thu Mar 25 22:01:38 2021
ARB0 started with pid=36, OS id=9478
NOTE: assigning ARB0 to group 2/0x1f0821be (DATA) with 9 parallel I/Os

Disk are added to DATA diskgroup and datafile or tempfile to instance can be added

State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  Y         512   4096  4194304      7920     2248                0            2248              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  4194304      1440     1304                0            1304              0             N  DATA2/
MOUNTED  EXTERN  N         512   4096  4194304      5040     1752                0            1752              0             N  FRA/
MOUNTED  EXTERN  N         512   4096  1048576      1440     1042                0            1042              0             Y  MGMT/
kish@exdbx<>create temporary tablespace temp2 tempfile '+DATA2' size 100M autoextend on next 50M maxsize 1G;

Tablespace created.

Elapsed: 00:00:04.26

Does killing arch process crash the database instance?

What happen if you kill the archiver background process in oracle database? Does it kill the database or there is no impact for database? Great! Lets research

It is always good to practically test a theoritical question which are sceptical

[oracle@orcl12x ~]$ ps -ef|grep arc
oracle    3829     1  0 05:55 ?        00:00:00 ora_arc0_orcl12x
oracle    3831     1  0 05:55 ?        00:00:00 ora_arc1_orcl12x
oracle    3833     1  0 05:55 ?        00:00:00 ora_arc2_orcl12x
oracle    3835     1  0 05:55 ?        00:00:00 ora_arc3_orcl12x
oracle    3980  3361  0 05:56 pts/0    00:00:00 grep arc

I have killed all the archive process running in background.There is no archiver process running.

[oracle@orcl12x ~]$ kill -9 3829
[oracle@orcl12x ~]$ kill -9 3833
[oracle@orcl12x ~]$ kill -9 3835
[oracle@orcl12x ~]$ kill -9 3831

[oracle@orcl12x ~]$ ps -ef|grep arc
oracle    4036  3361  0 05:56 pts/0    00:00:00 grep arc

Lets check the database status

There is no impact for database on killing arch process as the process gets restart automatically once killed

[oracle@orcl12x ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Mon Mar 1 05:56:58 2021

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> select status from v$instance;

STATUS
------------
OPEN

PLS-00201: identifier ‘DBMS_LOGSTDBY.BUILD’ must be declared

Cause:

SQL> execute dbms_logstdby.build;
BEGIN dbms_logstdby.build; END;

      *
ERROR at line 1:
ORA-06550: line 1, column 7:
PLS-00201: identifier 'DBMS_LOGSTDBY.BUILD' must be declared
ORA-06550: line 1, column 7:
PL/SQL: Statement ignored

Workaround:

Check the user and status of the database.

Database should be in open mode

SQL> show user
USER is "SYS"
SQL> select status from v$instance;

STATUS
------------
MOUNTED