WAYS TO IMPROVE IMPROVE CTAS PERFORMANCE

WAYS TO IMPROVE IMPROVE CTAS PERFORMANCE

CTAS(create table as select) is one of the efficient way to perform a bulk load of data from a big table to another fresh table. But if CTAS is itself slower, then some tweaks should be used to improve the performance.

What exactly happen during CTAS at high level?

  • create a new table
  • select the rows from source table
  • insert the rows using direct path load using parallel by default in new table

If either of the above step become slow due to some reason, then the benefits of using a CTAS are outweighed to another approach.

Main areas of slowness may be

  • DDL statement which restructure source table cause library cache contention
  • Lack of CPU to copy a larger table
  • Parallel waits
  • Segment level checkpoint issues
  • Redo log generation during bulk load
  • PGA not sufficient for direct path operation
  • Lack of temp space to sort
  • Statistics gathering adds additional time
  • DBWR not able to write dirty blocks faster to cope up with checkpoints
  • IO waits due to slow response time from block devices
  • Undo shortage

Create a LOB table for testing purpose

kIsH@STAX<>conn kish/password
Connected.
kIsH@STAX<>create table lob1 (
                             n number(10),
                            c clob
                            )
          lob (c) store as securefile ( enable storage in row nocompress) tablespace USERS;

Table created.

Populate some random data into the table

kIsH@STAX<>insert /*+parallel(8)*/ into lob1 nologging
                 select /*+parallel(8)*/ rownum, rpad('lob',10000,'lob')
                        from dual
          connect by level <= 100000;

100000 rows created.

kIsH@STAX<>commit;

Commit complete.

Approach 1:

Normal CTAS itself takes 11 minutes to create a new table. There is step with “OPTIMIZER STATISTICS GATHERING” which gather statistics during the load.

kIsH@STAX<>set timing on
kIsH@STAX<>create table lob2 as select * from lob1;

Table created.

Elapsed: 00:11:27.06

kIsH@STAX<>explain plan for create table lob2 as select * from lob1;

Explained.

Elapsed: 00:00:00.01
kIsH@STAX<>set lines 200 pages 1000
kIsH@STAX<>select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3273868302

-----------------------------------------------------------------------------------------
| Id  | Operation                        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT           |      |   117K|   226M|  7053   (1)| 00:00:01 |
|   1 |  LOAD AS SELECT                  | LOB2 |       |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |      |   117K|   226M|   205   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL             | LOB1 |   117K|   226M|   205   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

14 rows selected.

Observe something suspicious here? Optimizer simultaneously gather statistics during the load for the insert statement due to the configurations. It takes approximate 0.363494 seconds additionally for stats gather which is not necessary during bulk load. Statistics can be gathered later once the data is loaded.

create table lob2 as select * from lob1


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0         37          2           0
Execute      1     20.98    1409.55     100016       9451     117171      100000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2     20.98    1409.55     100016       9488     117173      100000

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 111
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  LOAD AS SELECT  LOB2 (cr=10339 pr=100018 pw=100679 time=1409747397 us starts=1)
    100000     100000     100000   OPTIMIZER STATISTICS GATHERING  (cr=695 pr=0 pw=0 time=363494 us starts=1 cost=205 size=198904680 card=98712)
    100000     100000     100000    TABLE ACCESS FULL LOB1 (cr=695 pr=0 pw=0 time=251449 us starts=1 cost=205 size=198904680 card=98712)

Approach 2:

Check the execution plans without statistics. Additional “OPTIMIZER STATISTICS GATHERING” step is removed after disabling the parameter.

kIsH@STAX<>drop table kish.lob2;

Table dropped.

kIsH@STAX<>alter session set "_optimizer_gather_stats_on_load"=false;

Session altered.

kIsH@STAX<>explain plan for create table lob2 as select * from lob1;

Explained.

kIsH@STAX<>set lines 200 pages 1000
kIsH@STAX<>select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3273868302

-------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT |      |   112K|   216M|  6714   (1)| 00:00:01 |
|   1 |  LOAD AS SELECT        | LOB2 |       |       |            |          |
|   2 |   TABLE ACCESS FULL    | LOB1 |   112K|   216M|   205   (1)| 00:00:01 |
-------------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

13 rows selected.

kIsH@STAX<>alter session set "_optimizer_gather_stats_on_load"=true;

Session altered.

kIsH@STAX<>explain plan for create table lob2 as select /*+NO_GATHER_OPTIMIZER_STATISTICS*/ * from lob1;

Explained.

kIsH@STAX<>select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3273868302

-------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT |      |   112K|   216M|  6714   (1)| 00:00:01 |
|   1 |  LOAD AS SELECT        | LOB2 |       |       |            |          |
|   2 |   TABLE ACCESS FULL    | LOB1 |   112K|   216M|   205   (1)| 00:00:01 |
-------------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

13 rows selected.

Disabling the load statistics improved the performance drastically by 90% as below from 11 min to 37 seconds.

kIsH@STAX<>set timing on
kIsH@STAX<>alter session set "_optimizer_gather_stats_on_load"=false;

Session altered.

Elapsed: 00:00:00.00
kIsH@STAX<>create table lob2 as select * from lob1;

Table created.

Elapsed: 00:00:37.45

Approach 3:

No statistics + nologging improves the performance further down to 33 sec. ‘nologging’ or ‘unrecoverable’ means skipping the redo vectors to be written to redo logs. DBA should be cautious to use nologging due to its non recoverable trap. Hence roll forward is not possible. If the database is in archive log mode, using logging can help to recover the data up to a point in time. Since nologging skips redo log, archive logs are also not generated for the CTAS operation. This option can be used to clone tables for test or dev which does not cause any business impact.

kIsH@STAX<>drop table kish.lob2;

Table dropped.
kIsH@STAX<>alter session set "_optimizer_gather_stats_on_load"=false;

Session altered.
kIsH@STAX<>create table lob2 nologging as select /*+parallel(8)*/ * from lob1;

Table created.

Elapsed: 00:00:33.30

Check explain plan for the new approach

kIsH@STAX<>explain plan for create table lob2 nologging as select /*+parallel(8)*/ * from lob1;

Explained.

kIsH@STAX<>set lines 200 pages 1000
kIsH@STAX<>select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3155667128

----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT |          |   112K|   216M|   842   (0)| 00:00:01 |        |      |            |
|   1 |  LOAD AS SELECT        | LOB2     |       |       |            |          |        |      |            |
|   2 |   PX COORDINATOR       |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |   112K|   216M|    28   (0)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   4 |     PX BLOCK ITERATOR  |          |   112K|   216M|    28   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|   5 |      TABLE ACCESS FULL | LOB1     |   112K|   216M|    28   (0)| 00:00:01 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - Degree of Parallelism is 8 because of hint

17 rows selected.

Approach 4:

This approach need some extra resource for the buffer cache which is memory. KEEP pool is one of the underrated pool which do not strike during a performance issue. Use KEEP pool, if the object to be kept is considerably small to memory size. If the segment is bigger than memory, then this approach may not be efficient. Example: 500GB LOB is not a good candidate to be placed in KEEP. This approach reduced the slowness further by 3 seconds

kIsH@STAX<>select segment_name,bytes/1048576 MB from dba_segments where segment_name='LOB1';

SEGMENT_NAME                                                                                                                             MB
-------------------------------------------------------------------------------------------------------------------------------- ----------
LOB1                                                                                                                                      6

kIsH@STAX<>select BUFFER_POOL from dba_tables where table_name='LOB1';

BUFFER_
-------
DEFAULT

kIsH@STAX<>alter system set db_keep_cache_size=500M;

System altered.

kIsH@STAX<>alter table LOB1 storage (buffer_pool keep);

Table altered.

Elapsed: 00:00:00.04
kIsH@STAX<>select BUFFER_POOL from dba_tables where table_name='LOB1';

BUFFER_
-------
KEEP

kIsH@STAX<>drop table lob2;

Table dropped.

kIsH@STAX<>create table lob2 nologging as select /*+parallel(8)*/ * from lob1;

Table created.

Elapsed: 00:00:30.62
kIsH@STAX<>alter table LOB1 storage (buffer_pool default);

Table altered.

Approach 5:

If more parallel resource can be utilized, then use parallel degree to select statement as well to burn dual way CPU for both load(write) and select(read). If management is ready to spend some cash on extra CPU, then increase the parallel gradually until the expected response time

kIsH@STAX<>create table lob2 parallel(degree 8) nologging as select /*+parallel(8)*/ * from lob1;

Table created.

Elapsed: 00:00:27.90

HYBRID TSM/HWMB” this message is printed in the execution plan for dual parallel usage

kIsH@STAX<>explain plan for create table lob2 parallel(degree 8) nologging as select /*+parallel(8)*/ * from lob1;

Explained.

Elapsed: 00:00:00.01
kIsH@STAX<>select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1111949198

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT             |          |   112K|   216M|   842   (0)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                    |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)              | :TQ10000 |   112K|   216M|    28   (0)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)| LOB2     |       |       |            |          |  Q1,00 | PCWP |            |
|   4 |     PX BLOCK ITERATOR              |          |   112K|   216M|    28   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|   5 |      TABLE ACCESS FULL             | LOB1     |   112K|   216M|    28   (0)| 00:00:01 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - Degree of Parallelism is 8 because of hint

17 rows selected.

Alternatively, an order clause can also used to sort by primary key column. This approach may or may not speed up depending on the situation.

kIsH@STAX<>create table lob2 parallel(degree 8) nologging as select /*+parallel(8)*/ * from lob1 order by n;

Table created.

Elapsed: 00:00:33.11

kIsH@STAX<>explain plan for create table lob2 parallel(degree 8) nologging as select /*+parallel(8)*/ * from lob1 order by n;

Explained.

kIsH@STAX<>select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1487570938

------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT             |          |   100K|    13M|       |    72   (2)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                    |          |       |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)              | :TQ10001 |   100K|    13M|       |    30   (7)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)| LOB2     |       |       |       |            |          |  Q1,01 | PCWP |            |
|   4 |     OPTIMIZER STATISTICS GATHERING |          |   100K|    13M|       |    30   (7)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      SORT ORDER BY                 |          |   100K|    13M|    14M|    30   (7)| 00:00:01 |  Q1,01 | PCWP |            |
|   6 |       PX RECEIVE                   |          |   100K|    13M|       |    28   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   7 |        PX SEND RANGE               | :TQ10000 |   100K|    13M|       |    28   (0)| 00:00:01 |  Q1,00 | P->P | RANGE      |
|   8 |         PX BLOCK ITERATOR          |          |   100K|    13M|       |    28   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|   9 |          TABLE ACCESS FULL         | LOB1     |   100K|    13M|       |    28   (0)| 00:00:01 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------------

Approach 6:

Setting parallel_execution_message_size to higher value speed up the amount of chunks which are transferred between px buffer, if memory is plenty in the system. Warning: If other critical jobs are executing at the same time, then setting this parameter exhaust the memory resource aggressively.

kIsH@STAX<>alter system set parallel_execution_message_size=32768 scope=spfile;

System altered.

kIsH@STAX<>show parameter parallel_execution

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
parallel_execution_message_size      integer     32768
kIsH@STAX<>create table lob2 parallel(degree 8) nologging as select /*+parallel(8)*/ * from lob1;

Table created.

Elapsed: 00:00:27.19

Approach 7:

Last but not the least, gather proper statistics on the source table.

Summary:

  • CTAS is one of the most efficient way to replicate a table efficiently, but if any part of CTAS become slow, then whole CTAS will be slow
  • Always tune the SELECT and INSERT part of the query which comprises most of time in CTAS
  • Try to distribute parallel chunks in such a way that parallel servers are not overutilized in one place
  • CTAS is more efficient in larger tables compared to other methods
  • Mostly, tune the PGA to avoid temp spill scans during Direct IO
  • If possible, “_smm_isort_cap” can be increased to higher value to avoid frequent temp sorts
  • Statistics can be gathered later after a load than during the load.
  • Monitor the wait events specific to the CTAS using ASH or 10046 and tune that event
  • Last but not the least, check for better execution plan.

Leave a Reply

%d bloggers like this: