The following article consists of suggestions based on personal experience ,learning and own style of troubleshooting. This page is just a perception of mine and not to imitate anyone or any organization.

Basically, CPU usage = Workload / Capacity

CPU is mainly used for processing the SQL statements like parsing, fetching, executing etc

Here we are going to see CPU usage from oracle database perspective not from OS perspective. But we will use OS tools to identify the processes consuming CPU

Check the top process to monitor the pid eating up high CPU. Note that , top command will sometimes show process which consume CPU consistently. So you do high level analysis first to check the load average for last five minutes and idle% on the output

First thing to check on top output is ,the user who is on the top of the top command. It should be either root or oracle

If it is a root user, then you may need to involve system admin to resolve the issue. But if the user is oracle ,then it is DBA who need to resolve

If load average value for last 5 minutes is high and idle% is 0.0% consistently, then your oracle database might end up in performance issues because of lack of CPU resource for other processes. This also means your server is CPU bound with 100% CPU used if you find this kind of behaviour most of the time.



top - 23:04:10 up  1:18,  1 user,  load average: 8.35, 2.17, 0.92
Tasks: 277 total,  25 running, 245 sleeping,   6 stopped,   1 zombie
Cpu(s): 36.6%us, 16.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi, 46.7%si,  0.0%st
Mem:   4723712k total,  3673968k used,  1049744k free,   125640k buffers
Swap:  3172828k total,        0k used,  3172828k free,  1907968k cached

10739 oracle    16   0 2113m 122m 117m R 36.5  2.7   0:06.07 oracle

If you see the below top output, on the left side, you have to perform three observations

  • Keep an eye on ‘%id’ column for – 0.0 (100% cores of CPU is utilized)
  • Keep an eye on Load average for first 5 mins (highlighted in blue) – Should not be signifantly high compared to 10 mins and 15 mins load. In below example, 56.07 (5 mins load) is significantly high compared to 43.49(10 mins load) and 30.38 (15 mins load)
  • Keep an eye on top PID and USER and check if it is oracle or OS or application process
Fig – 1.0 – Top output with high load average and CPU 100%


Use vmstat to see if CPU run queue exceed the total CPU core capacity . Focus on column ‘r’ under procs of vmstat output. It shows that there are high concurrent sessions which are waiting in queue for CPU resource. This means you have to either add more CPU cycles or distribute the application workload around the clock or schedule the jobs during idle time of database. If below capacity and workload is compared , then run queue (r) > cpu_count which means system is overloaded than capacity it has.

[oracle@exdbadm01 ~]$ vmstat 5 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
** r ** b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
19  0      0 1370648  38384 1932252    0    0   386   116  483 1479  7  7 80  4  3
38  0      0 1354784  38408 1939472    0    0     2    66 1363 1468 35 62  3  0  0
33  0      0 1338144  38432 1945248    0    0     2    66 1298 1372 30 70  0  0  0
43  0      0 1313560  38456 1947764    0    0     2    52 1589 1300 25 75  0  0  0
21  0      0 1301236  38480 1950140    0    0     2    66 1132 1178 28 72  0  0  0
14  0      0 1296888  38496 1953904    0    0     2    67  934 1317 32 68  0  0  0
11  0      0 1291784  38520 1959876    0    0     2    87 1189 1400 39 61  0  0  0
29  0      0 1290676  38536 1961076    0    0     2    82 1367 1368 16 82  2  0  0
18  0      0 1285328  38552 1961964    0    0     2    34 1699 1370 14 86  0  0  0
30  0      0 1282732  38576 1966356    0    0     2    97 1206 1452 31 69  0  0  0

CPU’s can be verified like this from OS level

[oracle@exdbadm01 ~]$ lscpu|grep 'CPU(s):'|grep -v 'NUMA'
CPU(s):                6

Also verify from database level

SQL> show parameter CPU_COUNT

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cpu_count                            integer     6

Check the plist-sz column of sar output and plot it in a graph like timeseries plot. Visualization often helps us to easily identify the spikes in resources or workload during a particular time interval.

Fig – 1.1 – Time series plot of total process in database

Plot the load average with respect to CPU using sar output to analyze the history of system behaviour with respect to the workload. If you see at Fig – 1.1 graph, we can easily narrow down the unexpected pile up of CPU utilization on the right corner of graph.

Compare Fig -1.1 graph with Fig – 1.2 . You can observe a pattern of correlation

  • Higher the task count(>600) , higher is the CPU load.
  • Upto some extent the CPU usage is low. But when task count is increased from 18:00:01 till 23:50:03,the CPU usage started to climb the hill
  • If you see below output , CPU usage has not increased till 23:20:04 in Fig 1.2 ,however task count start to increase at 17:50:01 itself in Fig 1.1. This shows that increase in task count(600) at 17:50:01 doesnot suddenly increase the CPU to (idle% (0.0) – 100%). CPU usage is constant for more than 3 hours irrespective of the process count but once task count > 600 at 23:10:01, there is sudden spike observed below.
Fig – 1.2 – Load average CPU in server

To understand it clearly, see the below graph. Red line indicates the load. Yellow,grey and orange represent the CPU usage

Fig – 1.3 – Increase in load(>600) after some extent(600) increase the CPU usage drastically

If you are not sure about the process which consume high CPU, then use below command to check top 5 process which consume high CPU. This command will also not help always to nail down the bottleneck. Because, we need to narrow down process which eats up CPU consistently but not an adhoc process which suddenly cause CPU to spike and immediately disappear from the system. Even though a process is not on the toppers list, they hide between other temporary high processes and deviate us. Its good to be careful in focusing on the right process.

eg: Process X consume high CPU(100%) at the moment and CPU spike observed. After blink of your eyes, the process got vanished from top. Process X tempt us to conclude bottleneck. Thats not always the case.

Process Y consume high CPU(80%) at the moment and it continues to stay at 80% CPU for next 20 minutes consistently. But the process is not in toppers list from top output. Still process Y may be the cause. This is just an illustrated example. In reality , there are various other factors to consider.


From database perspective, there are certain things to rule out to narrow down bottleneck. They are

  • SQL query
  • Database internal jobs like stats, advisory, oem etc…
  • RMAN backups

Generic commands to check CPU process

Here we identified the top process which is from oracle user and pid is 10739

[oracle@exdbadm01 ~]$ ps -eo pcpu,user,pid|sort -nr|head -5
20.1 oracle   10739
 3.0 oracle   10850
 2.5 oracle   10849
 1.6 oracle   10814
 1.0 oracle   10812

Check the pid for more details. Here we identified that the connection is not local but connected remotely from either sql developer or toad or …

[oracle@exdbadm01 ~]$ ps -ef|grep 10739
oracle   10739     1  3 23:03 ?        00:01:15 oracleexdbx1 (LOCAL=NO)
oracle   13024  5505  0 23:38 pts/0    00:00:00 grep 10739

Login to the database and grab the details of the user

kish@exdbx<>select PID,SPID,PNAME,USERNAME,SERIAL#,TERMINAL,PROGRAM from v$process where spid=10739;

   PID     SPID            PNAME           USERNAME         SERIAL#   TERMINAL        PROGRAM
---------- --------------- --------------- --------------- ---------- --------------- ---------------
        81 10739           NULL            oracle                   1 UNKNOWN         oracle@exdbadm0

Check why this process consumes high CPU?


Then you can use the below query to narrow down the problematic session which consume high CPU and if it is caused by a SQL query or other job .

set lines 250
set pages 2000
col username format a15 justify center
col program format a20 justify center
col event format a20 justify center
col sid format 999999 justify center
col CPU_USAGE format 999999999 justify center
col CPUPCT format 9999999999 justify center
col value format 999999
col PCT% format 99999999 justify center 
col blocking_session format 999999 justify center 
set serveroutput off
variable v_cpu number;
select value into :v_cpu from v$parameter where name like '%cpu_count%';
* from (
        select z.sid,
               nvl(z.username,'oracle-bg') as "username",
               nvl(z.SQL_ID,'non-SQL') as "SQL_ID",
               sum(y.value/100) "CPU_USAGE",
               round((y.value/100) / round((sysdate - z.logon_time)*24*60*60)*100,2) * (:v_cpu) as "PCT%", -- Here 1 denotes the number of cpu cores,
from v$statname x
     inner join v$sesstat y on (x.STATISTIC# = y.STATISTIC#)
     inner join v$session z on (y.SID = z.SID)
where in 
('CPU used by this session') 
group by 
order by 8 desc
where rownum < 20;

After identification of high CPU session, correlate the sql id to find the sql text

kish@exdbx<>select sq.sql_text,
se.event from 
v$sql sq,
v$session se 
where se.sql_id=sq.sql_id 

--------------- ----------
update SALES_TA enq: TX -
B_COPY set NULL row lock c
_COLUMN = 'ABC' ontention
 where mod(ORDE

update SALES_TA SQL*Net me
B_COPY set NULL ssage from
_COLUMN = 'ABC'  client
 where mod(ORDE

You can also check the pid which causes high CPU from database using column spid to confirm the user and notify the application team

kish@exdbx<>select s.sid,
from v$process p,v$session s 
where p.spid=10739 

 SID      USERNAME        SQL_ID     SPID
------ --------------- ------------- ---------------
    35 TEST            9013dtzg5xx47 10739
    36 TEST            9013dtzg5xx47 10739
    44 TEST            9013dtzg5xx47 10739
    49 TEST            9013dtzg5xx47 10739
    51 TEST            74b11p5tzdtgy 10739


Let us look at the most common cause of high CPU

Causes of high CPU due to SQL query:


Check for any parallel processes used by any user from OS perspective. Here we can tell that oracle user has used parallel slaves. Here we can grep the parallel process using p0*

Please note that if “parallel_min_servers” init parameter has some value n, then by default n number of parallel processes are readily available for work in OS. We cannot blindly decide the utilization of parallel, if we get p0* output.

SQL> show parameter parallel_min_servers

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
parallel_min_servers                 integer     2
[oracle@exdbadm01 ~]$ ps -ef|grep ora_p0|grep -v 'grep'|wc -l


oracle   10347     1  0 22:58 ?        00:00:00 ora_p000_exdbx1
oracle   10349     1  0 22:58 ?        00:00:00 ora_p001_exdbx1
oracle   10351     1  0 22:58 ?        00:00:00 ora_p002_exdbx1
oracle   10358     1  0 22:58 ?        00:00:00 ora_p003_exdbx1
oracle   10360     1  0 22:58 ?        00:00:00 ora_p004_exdbx1
oracle   10362     1  0 22:58 ?        00:00:00 ora_p005_exdbx1
oracle   10364     1  0 22:58 ?        00:00:00 ora_p006_exdbx1
oracle   10366     1  0 22:58 ?        00:00:00 ora_p007_exdbx1
oracle   10368     1  0 22:58 ?        00:00:00 ora_p008_exdbx1

Check parallel process usage from database perspective if you confirm that parallelism used from OS.Here we can observe high parallel activity being utilized in a 6 core CPU which is beyond the workload and is not normal

kish@exdbx<>select * from v$px_process;

SERVER_NAME      STATUS      PID     SPID               SID      SERIAL#
--------------- --------- ---------- --------------- ---------- ----------
P026               IN USE         79 10515           NULL       NULL
P020            AVAILABLE         73 10502           NULL       NULL
P018            AVAILABLE         71 10498           NULL       NULL
P007            AVAILABLE         58 10366           NULL       NULL
P013            AVAILABLE         64 10379           NULL       NULL
P016            AVAILABLE         69 10494           NULL       NULL
P003            AVAILABLE         54 10358           NULL       NULL
P021               IN USE         74 10504           NULL       NULL
P009               IN USE         60 10371           NULL       NULL
P025            AVAILABLE         78 10513           NULL       NULL
PZ99            AVAILABLE         52 12623           NULL       NULL
P011            AVAILABLE         62 10375           NULL       NULL
P002            AVAILABLE         53 10351           NULL       NULL
P000            AVAILABLE         49 10347           NULL       NULL
P017            AVAILABLE         70 10496           NULL       NULL
P010            AVAILABLE         61 10373           NULL       NULL
P006            AVAILABLE         57 10364           NULL       NULL
P001            AVAILABLE         50 10349           NULL       NULL
P023               IN USE         76 10509           NULL       NULL
P024            AVAILABLE         77 10511           NULL       NULL
P014            AVAILABLE         65 10381           NULL       NULL
P012            AVAILABLE         63 10377           NULL       NULL
P015            AVAILABLE         66 10383           NULL       NULL
P022            AVAILABLE         75 10506           NULL       NULL
P019            AVAILABLE         72 10500           NULL       NULL
P005            AVAILABLE         56 10362           NULL       NULL
P008            AVAILABLE         59 10368           NULL       NULL
P004            AVAILABLE         55 10360           NULL       NULL

28 rows selected.

Now correlate the sid to find username and sql id of the parallel consumption

kish@exdbx<>select s.sid,
from v$process p,v$session s,v$px_process px 
 where p.spid=px.spid 

   SID     USERNAME           SQL_ID     SPID               PID
---------- --------------- ------------- --------------- ----------
        80 TEST            9013dtzg5xx47 13630                   40
        81 TEST            9013dtzg5xx47 13632                   45
        87 TEST            a1xbv4d9f876d 13577                   50
        88 TEST            a1xbv4d9f876d 13579                   52
        89 TEST            a1xbv4d9f876d 13581                   53
        57 TEST            a1xbv4d9f876d 13583                   54
        36 TEST            a1xbv4d9f876d 13588                   55
        95 TEST            a1xbv4d9f876d 13590                   56
        49 TEST            a1xbv4d9f876d 13592                   57
        83 TEST            a1xbv4d9f876d 13594                   58
        52 TEST            a1xbv4d9f876d 13596                   59
        84 TEST            a1xbv4d9f876d 13598                   60
        72 TEST            a1xbv4d9f876d 13600                   61
        70 TEST            a1xbv4d9f876d 13602                   62
        85 TEST            a1xbv4d9f876d 13604                   63
        74 TEST            a1xbv4d9f876d 13606                   64
        66 TEST            a1xbv4d9f876d 13608                   65
        86 TEST            a1xbv4d9f876d 13612                   66
       100 TEST            9013dtzg5xx47 13616                   69
        69 TEST            9013dtzg5xx47 13618                   70
        65 TEST            9013dtzg5xx47 13620                   71
        79 TEST            9013dtzg5xx47 13622                   72
        35 TEST            9013dtzg5xx47 13624                   73
        44 TEST            9013dtzg5xx47 13626                   74
        60 TEST            9013dtzg5xx47 13628                   75
        73 TEST            9013dtzg5xx47 13634                   76

26 rows selected.


Check if backups run during high CPU. Channels allocation from rman cause CPU to pile up.

kish@exdbx<>col STATUS format a15
col hrs format 999.99
to_char(START_TIME,'mm/dd/yy hh24:mi') start_time,
to_char(END_TIME,'mm/dd/yy hh24:mi') end_time,
elapsed_seconds/3600 hrs
order by session_key desc;kish@exdbx<>kish@exdbx<>  2    3    4    5    6    7

INPUT_TYPE          STATUS        START_TIME      END_TIME        HRS
--------------- --------------- -------------- -------------- -------
ARCHIVELOG      FAILED          04/12/21 23:15 NULL           NULL
ARCHIVELOG      FAILED          04/06/21 22:13 NULL           NULL

Elapsed: 00:00:00.65

If backup run currently, then check for ‘MMNL’ in PROGRAM column of v$session. If the program is on top of CPU contributors, then you probably have to keep an eye on media layer.

SQL> select sid,program,status
  2  from
  3  v$session
  4  where paddr=(select addr from v$process where program like '%MMNL%');

       SID PROGRAM                                          STATUS
---------- ------------------------------------------------ --------
        16 oracle@orcl19x (MMNL)                            ACTIVE
SQL> set lines 250
SQL> set pages 2000
col username format a15 justify center
col program format a20 justify center
col event format a30 justify center
col sid format 99999 justify center
col SESSION_CPU_USAGE format 99999 justify center
select * from
select z.sid,
            nvl(z.username,'oracle-bg') as username,
            nvl(z.SQL_ID,'non-SQL') as SQL_ID,
            round(sum(y.value)/100,6) as "SESSION_CPU_USAGE"
from v$statname x
          inner join v$sesstat y on x.STATISTIC# = y.STATISTIC#
          inner join v$session z on y.SID = z.SID
where in ('CPU used by this session')
and z.status = 'ACTIVE'
and z.program = 'MMNL'
group by
z.sid,z.username,z.SQL_ID,z.EVENT,z.program order by SESSION_CPU_USAGE desc
where rownum < 10;

no rows selected


Last but not the least! Check for any stats job run on your database using sys user which consume high CPU. To mitigate the high CPU usage, there are couple of options

  • Disable the auto task jobs. This is not feasible for stats job but can be applicable to tuning advisor or segment advisor
  • Cap the resource allocation using resource manager for the SYS group. But this solution has its pros and cons because if the resource allocation is limited for Autotask’s, then they cannot complete within the scheduled maintenance window timeout and fail which may lead to suboptimal plan due to insufficient stats.
  • Third option is to reschedule the maintenance window to an idle time of the system.
  • Raise SR with MOS to investigate on any bug
kish@exdbx<>select OPERATION,TARGET,to_char(START_TIME,'dd-mm-yy hh24:mi:ss'),to_char(END_TIME,'dd-mm-yy hh24:mi:ss') from dba_optstat_operations order by END_TIME desc;

                           OPERATION                             TARGET          TO_CHAR(START_TIM TO_CHAR(END_TIME,
---------------------------------------------------------------- --------------- ----------------- -----------------
purge_stats                                                      NULL            08-06-21 22:10:42 08-06-21 22:10:46
gather_database_stats(auto)                                      NULL            08-06-21 22:00:10 08-06-21 22:10:42
gather_table_stats                                               SYS.WRH$_FILEST 07-06-21 22:22:50 07-06-21 22:22:51

As this is not a proactive approach , we have to act immediately to resolve high CPU issue. As this may cause database performance to slow down. Kill the pid which cause high CPU. We dont have any other options other than killing the session

[oracle@exdbadm01 ~]$ kill -9 10739


Sometimes if resource manager is configured to limit the CPU for resource group , “resmgr:cpu quantum” wait event will be on top the top foreground wait events which also indicate task wait to acquire CPU resource. This events denotes that default_maintanence_plan is active and has the resource plan for the consumer groups. Here resource manager limits the CPU for a particular group based on the allocation in the resource plan for either SYS, OTHER GROUPS and AUTOTASKS

Event                    Waits   Time(s)  Avg wait(ms) % DB time Wait Class

------------------------ ------- -------- ------------ -------------- ---------- -----------

resmgr:cpu quantum         1,596    281         134           3.9 Scheduler

db file scattered read     1,051    778          86           3.81 User I/O

To know if the CPU is consumed by foreground or background session, v$sess_time_model can be quite helpful

kish@exdbx<>select sid,stat_name,value
  2  from v$sess_time_model
  3  where stat_name like '%CPU%' and value > 0
  4  or
  5  stat_name like '%back%' and value > 0;

   SID     STAT_NAME         VALUE
---------- --------------- ----------
        34 DB CPU               68003
        88 DB CPU             3116194
         2 background elap     240888
           sed time

         3 background elap   21428756
           sed time

For long term solution, we need to tune the SQL queries to access less CPU and distribute the application workload to a non business hour

Long term solution:

Don’t schedule multiple critical jobs with overlapping time. Distribute the jobs evenly to utilize all available resource so that there is not an imbalance in resource utilization.

Disable all unwanted or useless reports which is not of priority and not examined by anyone in the team or project. Research says that almost 60 to 80% of the reports are not checked by anyone.

Isolate the problematic SQL query if it is an adhoc or repeated query with the help of application team

Check the time at which you face performance issue!

Wed Sep 29 03:03:41 IST 2021

Use the three pillars of performance diagnosis to correlate the dots

  • AWR(time analysis)
  • ADDM
  • ASH(sample analysis)

Sequential analysis

  1. AWR :

DB time is higher than elapsed time. As we know that,

DB time = CPU time + Non idle wait events

This indicates that parallelism is used in the database which lead to DB time > Elapsed time. This also indicates that sessions wait on to some events to process the tasks

If DB time > Elapsed time, then load is high and database has slowness issue.

If no parallel slaves or PQ or no waits are observed , then probably DB time <= Elapsed time

High load period

Calculation for CPU

Available CPU = num_cpu * Elapsed time = 2 * 36 = 72 minutes

CPU burned (out of 72 min)= 69 minutes(DB time)

Check top wait events. If events like DB CPU, db file scattered read, direct path read, direct path write, direct path read temp, px waits etc are observed, then probably SQL statement consumes more CPU due to full table scans with unavailability of data blocks in buffer cache and a need to populate the blocks from disk reads.

Check for low Parse statistics with respect to CPU.

In HOST CPU section,

  • %user means CPU usage by oracle or grid
  • %system means CPU usage by OS user like root

There is no use in looking at all the sections in AWR. Just identify the top SQL ordered by executions , Elapsed and CPU time and tune the SQL. Here the top SQL is an insert statement which uses enormous parallel with direct path load and root cause for high CPU is identified.

If there are significant executions, then map the sql to SQL ordered by Elapsed and CPU time


  • Reduce the parallel to increase the throughput for database
  • Reschedule the resource intensive jobs during off business hours
  • Tune the slow SQL with a better plan
  • Increase resources(last resort)


ADDM report is self explanatory with a set of boosted recommendations to your analysis like adding a cherry on top of cake and hence there is not much to discuss about ADDM.

3) ASH

Probably, ASH is the most lovable analysis report in the planet with a contrary to the so called time based analysis which take snapshots with samples.

Generally, ASH report gives us clues for high CPU usage.

CPU + Wait for CPU – this event tells us that the SQL waits and starves for CPU but unable to get due to resource shortage

This indicates us that the SQL should be tuned to consume less resource

How to tune SQL:

So far the diagnosis may be quite straightforward but implementing a solution is quite complicated.

Tuning SQL statement should not only improve the speed but also should not affect performance of other SQL’s which is quite important. For this scenario, SQL performance analyzer can be used to test the SQL performance in a workload

If the top SQL is identified, check for the execution plans which affect the performance of the SQL. This can be identified by generating a SQLT or SQLHC or SQL tuning advisor report for the SQL id.

  • Identify the best execution plan and pin the plan using a SQL profile or constrain a plan using SPM
  • Create relevant index on columns wherever performance can be improved. The secret lies in indexing skills.
  • Use hints to influence an alternative plan
  • Rewrite the SQL

In this case, we may need to reproduce the cause for which the SQL is slow. So, Enable extended SQL trace for a particular SQL if top SQL is identified.

alter session set max_dump_file_size=unlimited;
alter session set timed_statistics=true;
alter session set statistics_level=all;
alter session set events '10046 trace name context forever, level 12';
--Execute the query to be traced
select * from dual; --to close parse call for the main query
alter session set events '10046 trace name context off';

Use TKPROF to format the trace if raw trace file looks clumsy for analysis

kish@exdbx<>alter session set tracefile_identifier='CPU_OPTIM';

Session altered.

Elapsed: 00:00:00.32
kish@exdbx<>alter session set events 'trace[rdbms.SQL_Optimizer.*][sql:gdgkr8y3k9zdw]';

Session altered.
kish@exdbx<>show parameter tracefile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
tracefile_identifier                 string      NULL
kish@exdbx<>alter session set tracefile_identifier='CPU_TRACE';

Session altered.

Elapsed: 00:00:00.01
kish@exdbx<>alter system set events 'sql_trace [sql:&&sql_id] bind=true , wait=true';
old   1: alter system set events 'sql_trace [sql:&&sql_id] bind=true , wait=true'
new   1: alter system set events 'sql_trace [sql:gdgkr8y3k9zdw] bind=true , wait=true'

System altered.

Still under construction ! To be continued!

Hope you liked this post!

Leave a Reply

%d bloggers like this: