RAC database hung analysis

RAC database hung analysis

If rac database instance is slow due to deadlock or other blocking ,we can use oradebug hang analyzer to deep dive into the root of cause.

Lets troubleshoot with a simulation

I create a sample table for testing purpose

SQL> CREATE TABLE regions
   ( region_id NUMBER
   CONSTRAINT region_id_nn NOT NULL
   , region_name VARCHAR2(25)
   );
CREATE UNIQUE INDEX reg_id_pk
         ON regions (region_id);
ALTER TABLE regions
         ADD ( CONSTRAINT reg_id_pk
   PRIMARY KEY (region_id)
   ) ;  2    3    4    5


Table created.

SQL>   2
Index created.

SQL>   2    3    4


Table altered.

I insert rows on node 1 and donot commit the changes

SQL> SQL> SQL> SQL>
SQL>
SQL> INSERT INTO regions VALUES
   ( 1
   , 'Europe'
   );
INSERT INTO regions VALUES
   ( 2
   , 'Americas'
   );
INSERT INTO regions VALUES
   ( 3
   , 'Asia'
   );
INSERT INTO regions VALUES
   ( 4
   , 'Middle East and Africa'
   );  2    3    4

1 row created.

SQL>   2    3    4
1 row created.

SQL>   2    3    4
1 row created.

SQL>   2    3    4
1 row created.

At the same time i insert the same rows on the same table with different user but from node 2!

SQL> select name from v$database;

NAME
---------
RACDB

SQL> select instance_name from v$instance;

INSTANCE_NAME
----------------
RACDB2

SQL> show user
USER is "TEST"
SQL> INSERT INTO sys.regions VALUES
   ( 1
   , 'Europe'
   );
INSERT INTO sys.regions VALUES
   ( 2
   , 'Americas'
   );
INSERT INTO sys.regions VALUES
   ( 3
   , 'Asia'
   );
INSERT INTO sys.regions VALUES
   ( 4
   , 'Middle East and Africa'
   );  2    3    4

The database is hung as node 1 blocks node 2 to insert rows

Enable the hang analyzer on one of the RAC nodes to generate trace files for analyzing

SQL> oradebug setmypid
oStatement processed.
SQL> radebug unlimit
oradebugStatement processed.
SQL> oradebug setmypid
SP2-0734: unknown command beginning "oradebugor..." - rest of line ignored.
SQL> oradebug setmypid
Statement processed.
SQL> oradebug unlimit
Statement processed.
SQL> oradebug setinst all
Statement processed.
SQL> oradebug -g all hanganalyze 3
Hang Analysis in /grid/base/diag/rdbms/racdb/RACDB1/trace/RACDB1_diag_3917.trc
SQL> oradebug -g all hanganalyze 3
Hang Analysis in /grid/base/diag/rdbms/racdb/RACDB1/trace/RACDB1_diag_3917.trc
SQL> oradebug -g all hanganalyze 3
Hang Analysis in /grid/base/diag/rdbms/racdb/RACDB1/trace/RACDB1_diag_3917.trc

In the hang trace file ,we can find the diagnostic details of RAC1 blocking RAC2 with wait event ,blocking id,waiting time etc which is pretty much valuable information!!

Chains most likely to have caused the hang:
 [a] Chain 1 Signature: <not in a wait><='rdbms ipc reply'<='enq: TX - row lock contention'
     Chain 1 Signature Hash: 0x7ba3a1e1

===============================================================================
Non-intersecting chains:

-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 2 (racdb.racdb2)
                   os id: 5922
              process id: 40, oracle@RAC2 (TNS V1-V3)
              session id: 60
        session serial #: 67
    }
    is waiting for 'enq: TX - row lock contention' with wait info:
    {
                      p1: 'name|mode'=0x54580004
                      p2: 'usn<<16 | slot'=0x6000e
                      p3: 'sequence'=0x32b
            time in wait: 5 min 50 sec
           timeout after: never
                 wait id: 97
               blocking: 0 sessions
             current sql: INSERT INTO sys.regions VALUES
   ( 1
   , 'Europe'
   )
             short stack: ksedsts()+265<-ksdxfstk()+19<-ksdxcb()+1646<-sspuser()+100<-semtimedop()+36<-sskgpwwait()+202<-skgpwwait()+138<-ksliwat()+1602<-kslwaitctx()+138<-kjusuc()+6900<-ksipgetctxi()+1641<-ksqcmi()+23174<-ksqgtlctx()+3204<-ksqgelctx()+469<-ktuGetTxForXid()+95<-ktcwit1()+272<-ktbgtl0()+958<-kdiins0()+46577<-kdiinsp()+61<-kauxsin()+1634<-qesltcLoadIndexList()+718<-qesltcLoadIndexes()+43<-qerltcSimpleSingleInsRowCBK()+65<-qerltcSingleRowLoad()+228<-qerltcFetch()+301<-insexe()+567<-opiexe()+4953<-kpoal8()+1811<-opiodr()+962<-
            wait history:
              * time between current wait and wait #1: 0.002054 sec
              1.       event: 'SQL*Net message from client'
                 time waited: 26.277790 sec
                     wait id: 96              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
              * time between wait #1 and #2: 0.000002 sec
              2.       event: 'SQL*Net message to client'
                 time waited: 0.000001 sec
                     wait id: 95              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
              * time between wait #2 and #3: 0.000031 sec
              3.       event: 'SQL*Net message from client'
                 time waited: 0.000146 sec
                     wait id: 94              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (racdb.racdb1)
                   os id: 11456


oradebug is the advisable tool in times of performance issues!!

Until the session 1 in node 1 releases the lock from the rows in the table,the other session couldnot modify the rows in the same table.So if the user doesnot commit and flush the block changes to the disk ,other sessions have to sit idle and wait in the queue!!

Leave a Reply

Discover more from XscalibaL

Subscribe now to keep reading and get access to the full archive.

Continue reading