SECRET OF INCREMENTAL STATISITCS – IMPROVE SPEED FOR STATISTICS COLLECTION OF PARTITIONED TABLES

SECRET OF INCREMENTAL STATISITCS – IMPROVE SPEED FOR STATISTICS COLLECTION OF PARTITIONED TABLES

Incremental statistics is similar to incremental backup where only the updated changes will be recorded using synopsis table. This option is introduced for large partitioned tables where global statistics often perform full table scans regardless of the number of partitions in the table.

Consider an example where you write a research thesis paper. Due to tiredness, you slept off. Next day, again you take previous day notes and continue from where you left. This is good and efficient. But what if you need to write the paper from the beginning daily with the updated content?

Similarly a table has 100 partitions and only one partition requires a modification. In this case, if the whole 100 partitions need to be scanned, then it is suboptimal and time consuming.

To overcome this challenge, oracle come up with incremental stats which consider changed records and update the changes at global level via synopsis

Synopsis in general means a brief summary of something. (https://languages.oup.com/google-dictionary-en)

NDV which means number of distinct value is a statistics for getting the cardinality estimates of the each partition. Example if a column has 100 males and 50 females, then NDV = 2 (male + female)

Check the partitioned table and part count. There are 11 partitions in the table.

kIsH@zxd00<^>col owner for a15
kIsH@zxd00<^>col table_name for a15
kIsH@zxd00<^>select owner,table_name,partition_count from dba_part_tables where table_name='XSAL';

OWNER           TABLE_NAME      PARTITION_COUNT
--------------- --------------- ---------------
KISH            XSAL                         11

Check total count of table

kIsH@zxd00<^>select count(*) from XSAL;

  COUNT(*)
----------
   2000000

Check the stats preference at global level. Check if incremental parameter is set to false.

In order for incremental statistics to function normally, ‘PUBLISH‘ parameter should be set to true because the statistics of individual partition will be gathered and should be established to the global level at table.

Both granularity and estimate percent should be auto. These values cannot be manually calculated because static value is not always applicable in a changing database. Script available in https://alphaoragroup.com/2022/02/14/sql-to-check-statistics-preferences-oracle/?preview=true

STATS_PARAM
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
AUTOSTATS_TARGET =====> AUTO
CASCADE =====> DBMS_STATS.AUTO_CASCADE
CONCURRENT =====> OFF
DEGREE =====> NULL
ESTIMATE_PERCENT =====> DBMS_STATS.AUTO_SAMPLE_SIZE
GRANULARITY =====> AUTO
INCREMENTAL =====> FALSE
INCREMENTAL_LEVEL =====> PARTITION
INCREMENTAL_STALENESS =====> ALLOW_MIXED_FORMAT
METHOD_OPT =====> FOR ALL COLUMNS SIZE AUTO
OPTIONS =====> GATHER
PUBLISH =====> TRUE
STALE_PERCENT =====> 10

13 rows selected.

There are huge modifications with inserts on table partitions on XSAL

kIsH@zxd00<^>select partition_name,inserts from dba_tab_modifications where table_name='XSAL';

PARTITION_NAME     INSERTS
--------------- ----------
P_TIR               100023
SYS_P387            199080
SYS_P388            200256
SYS_P389            200470
SYS_P390            199965
SYS_P391            200017
SYS_P392            199834
SYS_P393            201160
SYS_P394             99708
SYS_P395            199809
SYS_P396            199678

11 rows selected.

Insert data into only one partition and observe the time taken for statistics gather without enabling incremental statistics. It took 8.2 seconds to complete.

kIsH@zxd00<^>insert /*+append parallel(8)*/ into xsal
  2  select xid.nextval,
       dbms_random.string('a','10'),
           decode(round(dbms_random.value(0,10)),0,'TINNEVELI',1),
           floor(dbms_random.value(1,10)),
           to_date(trunc(dbms_random.value(to_char(date '1979-01-01','J'),
                                           to_char(date '9999-12-31','J'))),'J')
from (select level from dual connect by level <= 100),
(select level from dual connect by level <= 100),
(select level from dual connect by level <= 100);  3    4    5    6    7    8    9   10

1000000 rows created.

Elapsed: 00:00:16.84
kIsH@zxd00<^>commit;

Commit complete.

Elapsed: 00:00:00.02
kIsH@zxd00<^>
kIsH@zxd00<^>
kIsH@zxd00<^>
kIsH@zxd00<^>exec dbms_stats.set_global_prefs('INCREMENTAL','FALSE');

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
kIsH@zxd00<^>exec dbms_stats.set_table_prefs('KISH','XSAL','INCREMENTAL','FALSE');

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.02
kIsH@zxd00<^>EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'KISH',TABNAME => 'XSAL');

PL/SQL procedure successfully completed.

Elapsed: 00:00:08.25

Either full table scan or full index scan is used for statistics gather on partition without incremental option

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  HASH JOIN  (cr=0 pr=0 pw=0 time=242 us starts=1 cost=4 size=47 card=1)
         0          0          0   NESTED LOOPS  (cr=0 pr=0 pw=0 time=220 us starts=1 cost=4 size=47 card=1)
         0          0          0    STATISTICS COLLECTOR  (cr=0 pr=0 pw=0 time=63 us starts=1)
         0          0          0     TABLE ACCESS BY INDEX ROWID BATCHED OBJ$ (cr=0 pr=0 pw=0 time=31 us starts=1 cost=3 size=43 card=1)
         0          0          0      INDEX RANGE SCAN I_OBJ1 (cr=0 pr=0 pw=0 time=25 us starts=1 cost=2 size=0 card=1)(object id 36)
         1          1          1       FAST DUAL  (cr=0 pr=0 pw=0 time=1 us starts=1 cost=2 size=0 card=1)
         0          0          0    INDEX RANGE SCAN I_USER2 (cr=0 pr=0 pw=0 time=0 us starts=0 cost=1 size=4 card=1)(object id 47)
         0          0          0   INDEX FULL SCAN I_USER2 (cr=0 pr=0 pw=0 time=0 us starts=0 cost=1 size=4 card=1)(object id 47)

Check last analyzed column which has equal time for all partitions. This concludes that the whole table is scanned even though we inserted data into one partition P_TIR

TABLE_NAME           PARTITION_NAME         NUM_ROWS LAST_STATS     GLO AVG_ROW_LEN HIGH_VALUE
-------------------- -------------------- ---------- -------------- --- ----------- --------------------
XSAL                 P_TIR                    500196 24-03-22 22:34 YES          38 'TINNEVELI'
XSAL                 SYS_P387                 698771 24-03-22 22:34 YES          35 'LISBON'
XSAL                 SYS_P388                 699475 24-03-22 22:34 YES          35 'LONDON'
XSAL                 SYS_P389                 700120 24-03-22 22:34 YES          37 'CANBERRA'
XSAL                 SYS_P390                 700292 24-03-22 22:34 YES          35 'MOSCOW'
XSAL                 SYS_P391                 699726 24-03-22 22:34 YES          35 'MUMBAI'
XSAL                 SYS_P392                 700590 24-03-22 22:34 YES          35 'BERLIN'
XSAL                 SYS_P393                 701718 24-03-22 22:34 YES          35 'MADRID'
XSAL                 SYS_P394                 349921 24-03-22 22:34 YES          33 'ROME'
XSAL                 SYS_P395                 698830 24-03-22 22:34 YES          34 'PARIS'
XSAL                 SYS_P396                 700265 24-03-22 22:34 YES          39 'WASHINGTON'
XSAL                 SYS_P454                2850096 24-03-22 22:34 YES          30 '1'

If incremental is false then set to true to enable synopsis on the columns and incremental statistics gather.

--Table level
kIsH@zxd00<^>exec dbms_stats.set_table_prefs('KISH','XSAL','INCREMENTAL','TRUE');

PL/SQL procedure successfully completed.
or 
--Global level
kIsH@zxd00<^>exec dbms_stats.set_global_prefs('INCREMENTAL','TRUE');

PL/SQL procedure successfully completed.


kIsH@zxd00<^>col INC_OR_FULL for a20
kIsH@zxd00<^>select dbms_stats.get_prefs('INCREMENTAL','KISH','XSAL') as INC_OR_NOT from dual;

INC_OR_NOT
--------------------
TRUE

Always when incremental stats is enabled for first time, it will take lot of time to build the synopsis on the columns and construct metadata. Consequent run will be faster because all synopsis are in place and only the work is to collect changes to partition. It took 2.1 seconds in the second run. First run is not posted for brevity.

kIsH@zxd00<^>insert /*+append parallel(8)*/ into xsal
select xid.nextval,
       dbms_random.string('a','10'),
           decode(round(dbms_random.value(0,10)),0,'TINNEVELI',1),
           floor(dbms_random.value(1,10)),
           to_date(trunc(dbms_random.value(to_char(date '1979-01-01','J'),
                                           to_char(date '9999-12-31','J'))),'J')
from (select level from dual connect by level <= 100),
(select level from dual connect by level <= 100),
(select level from dual connect by level <= 100);  2    3    4    5    6    7    8    9   10

1000000 rows created.

Elapsed: 00:00:17.59
kIsH@zxd00<^>commit;

Commit complete.
kIsH@zxd00<^>EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'KISH',TABNAME => 'XSAL');

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.12

Only the partition which are modified, are analyzed and rest of them are ignored.

kIsH@zxd00<^>col HIGH_VALUE for a20
col PARTITION_NAME for a20
kIsH@zxd00<^>kIsH@zxd00<^>col TABLE_NAME for a20
col HIGH_VALUE for a20
col PARTITION_NAME for a20
col TABLE_NAME for a20
select table_name,
                 partition_name,
                 num_rows,
                 to_char(last_analyzed,'dd-mm-yy hh24:mi') last_stats,
                 global_stats,
                 avg_row_len,
                 high_value
  from dba_tab_partitions
  where table_name='XSAL';kIsH@zxd00<^>kIsH@zxd00<^>kIsH@zxd00<^>kIsH@zxd00<^>  2    3    4    5    6    7    8    9

TABLE_NAME           PARTITION_NAME         NUM_ROWS LAST_STATS     GLO AVG_ROW_LEN HIGH_VALUE
-------------------- -------------------- ---------- -------------- --- ----------- --------------------
XSAL                 P_TIR                    500196 24-03-22 22:32 YES          38 'TINNEVELI'
XSAL                 SYS_P387                 698771 24-03-22 21:55 YES          35 'LISBON'
XSAL                 SYS_P388                 699475 24-03-22 21:55 YES          35 'LONDON'
XSAL                 SYS_P389                 700120 24-03-22 21:55 YES          37 'CANBERRA'
XSAL                 SYS_P390                 700292 24-03-22 21:55 YES          35 'MOSCOW'
XSAL                 SYS_P391                 699726 24-03-22 21:55 YES          35 'MUMBAI'
XSAL                 SYS_P392                 700590 24-03-22 21:55 YES          35 'BERLIN'
XSAL                 SYS_P393                 701718 24-03-22 21:55 YES          35 'MADRID'
XSAL                 SYS_P394                 349921 24-03-22 21:55 YES          33 'ROME'
XSAL                 SYS_P395                 698830 24-03-22 21:55 YES          34 'PARIS'
XSAL                 SYS_P396                 700265 24-03-22 21:55 YES          39 'WASHINGTON'
XSAL                 SYS_P454                2850096 24-03-22 22:32 YES          30 '1'

12 rows selected.

Check the state column which updated to a newer NDV. Previously there where 11 distinct columns and now it is 12

kIsH@zxd00<^>col COLUMN_NAME for a20
kIsH@zxd00<^>col TABLE_NAME for a20
kIsH@zxd00<^>select table_name,column_name,num_distinct from dba_tab_columns where table_name='XSAL';

TABLE_NAME           COLUMN_NAME          NUM_DISTINCT
-------------------- -------------------- ------------
XSAL                 ID                       10000000
XSAL                 NAME                      9958400
XSAL                 STATE                          12 <<==
XSAL                 AMOUNT                          9
XSAL                 SDATE                     2889728

After enabling incremental, index range scan is used and synopsis table(WRI$_OPTSTAT_SYNOPSIS_HEAD$) can be visible

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  FILTER  (cr=58 pr=0 pw=0 time=153 us starts=1)
         0          0          0   FAST DUAL  (cr=0 pr=0 pw=0 time=0 us starts=0 cost=2 size=0 card=1)
         0          0          0   HASH JOIN  (cr=58 pr=0 pw=0 time=152 us starts=1 cost=32 size=26 card=1)
         0          0          0    TABLE ACCESS BY INDEX ROWID BATCHED WRI$_OPTSTAT_SYNOPSIS_HEAD$ (cr=58 pr=0 pw=0 time=109 us starts=1 cost=3 size=24 card=1)
        60         60         60     INDEX RANGE SCAN I_WRI$_OPTSTAT_SYNOPHEAD (cr=2 pr=0 pw=0 time=72 us starts=1 cost=2 size=0 card=1)(object id 15990)
         0          0          0    COLLECTION ITERATOR PICKLER FETCH (cr=0 pr=0 pw=0 time=0 us starts=0 cost=29 size=8 card=4)

********************************************************************************

Below tables contains the information on synopsis

kIsH@Xhydra<>select OWNER,TABLE_NAME,PARTITIONING_TYPE,SUBPARTITIONING_TYPE from dba_part_tables where TABLE_NAME in ('WRI$_OPTSTAT_SYNOPSIS$','WRI$_OPTSTAT_SYNOPSIS_HEAD$');

OWNER                TABLE_NAME           PARTITIONING_TYPE    SUBPARTITIONING_TYPE
-------------------- -------------------- -------------------- --------------------
SYS                  WRI$_OPTSTAT_SYNOPSI LIST                 HASH
                     S$

Summary:

It is not always recommended to use incremental statistics if partition comes into picture. If there is a small partition which will not be updated for next 10 to 20 years, then there is not of much gain for incremental statistics and rather sampling based stats collection can be utilized. If there is a huge table with daily modifications to each partition in the table, then incremental statistics will be the best strategy in focus of long term goal. Having awareness of the business goals and objectives can lead to choose the right tactics. If the table size reached TB , then its too late to implement incremental because there are limitations of first time slow stat gather and rigorous planning should be incorporated with downtime. So its always better to be proactive.

Leave a Reply

Discover more from XscalibaL

Subscribe now to keep reading and get access to the full archive.

Continue reading