CELL SMART SCAN/OFFLOADING SECRET SAUCE OF EXADATA
Offloading in its name says that something is offloaded somewhere.Here in exadata ,high volume of data is offloaded to the cell layer so that the storage gets the total hit rather than the database.
You often heard of ‘smart work vs hard work’ and would have seen this picture

In similar way,exadata has a smart storage which has the intelligence to process the data in storage layer without disturbing the database layer much.So the database layer would always be happy đŸ™‚ with less pressure from the clients who retrieve,insert and update repeated data using sql query
There is a saying always that “a single poorly written sql query might collapse a powerful database “
However it is not the problem in database itself but a combination of layers like OS,network,storage and database layer

So sql query which is resource intensive is offloaded to storage layer and volume of data resturned to the database layer will be filtered and reduced with multiple layers of abstraction.
In cell offloading,there are three types of filtering the data
- Column projection(columns of interest)
- Predicate filtering(rows of interest)
- Storage indexes(min-max value)
We will see all these optimizations in detail in another post.So lets dont waste time in definitions of above three points and test the exciting features of exadata
I perform a test case which proves the efficiency of the cell smart scan
I print the statistics for cell before testing which is zero for all category
kish@exdbx<>select a.name, b.value/1024/1024 MB from v$sysstat a, v$mystat b
where a.statistic# = b.statistic# and (a.name in ('physical read total bytes',
'physical write total bytes', 'cell IO uncompressed bytes')
or a.name like '%cell physical%'); 2 3 4
NAME MB
---------------------------------------------------------------- ----------
physical read total bytes 0
physical write total bytes 0
cell physical IO interconnect bytes 0
cell physical IO bytes saved during optimized file creation 0
cell physical IO bytes saved during optimized RMAN file restore 0
cell physical IO bytes eligible for predicate offload 0
cell physical IO bytes saved by storage index 0
cell physical IO bytes sent directly to DB node to balance CPU 0
cell physical IO interconnect bytes returned by smart scan 0
cell IO uncompressed bytes 0
10 rows selected.
Sales table is what we use for testing which has million rows
kish@exdbx<>select count(*) from sales;
COUNT(*)
----------
1048576
I flush the contents in shared pool and buffer cache so that we dont blindly get fooled by the speed which we get from the cached data in buffer gets and soft parse in SGA
kish@exdbx<>alter system flush shared_pool;
System altered.
Elapsed: 00:00:00.00
kish@exdbx<>alter system flush buffer_cache;
System altered.
Elapsed: 00:00:00.02
TEST1: Disable the cell offloading and check the response time ! In this test,i get the aggregate value of unit cost and total profit columns and use inner join sales with sales3 table
kish@exdbx<>alter session set cell_offload_processing=false;
Session altered.
kish@exdbx<>select a.country,a.order_date,sum(a.unit_cost),sum(a.total_profit) from sales a
inner join sales2 b on a.order_id = b.order_id
group by a.country,a.order_date
order by sum(a.total_profit) desc; 2 3 4
220105 rows selected.
Elapsed: 01:44:58.07 <======================
TEST2:Enable cell offloading and check the response time
kish@exdbx<>alter system flush shared_pool;
System altered.
Elapsed: 00:00:00.04
kish@exdbx<>alter system flush buffer_cache;
System altered.
Elapsed: 00:00:00.16
See the difference between previous run and after enabling smart scan !
kish@exdbx<>alter session set cell_offload_processing=true;
Session altered.
Elapsed: 00:00:00.00
kish@exdbx<>select a.country,a.order_date,sum(a.unit_cost),sum(a.total_profit) from sales a
inner join sales2 b on a.order_id = b.order_id
group by a.country,a.order_date
order by sum(a.total_profit) desc ; 2 3 4
220105 rows selected.
Elapsed: 00:00:11.37 <==================
There is a huge difference in response time from 1 hour 44 minutes to 11 minutes 37 seconds which accounts to almost 1 hour 33 minutes time saved by smart scan with IO bytes saved
The statistics information is also recorded in sysstat and mystat views where there is good evidence for the total IO saved by storage cell
See ‘cell physical IO interconnect bytes returned by smart scan‘ and other rows highlighted with arrow which shows that IO’s requested for database is taken care by storage cell thus reducing the volume of data
kish@exdbx<>select a.name, b.value/1024/1024 MB from v$sysstat a, v$mystat b
where a.statistic# = b.statistic# and (a.name in ('physical read total bytes',
'physical write total bytes', 'cell IO uncompressed bytes')
or a.name like '%cell physical%'); 2 3 4
NAME MB
---------------------------------------------------------------- ----------
physical read total bytes 2429.28906
physical write total bytes .0078125
cell physical IO interconnect bytes 792.847153 <========
cell physical IO bytes saved during optimized file creation 0
cell physical IO bytes saved during optimized RMAN file restore 0
cell physical IO bytes eligible for predicate offload 2224.3125 <========
cell physical IO bytes saved by storage index 0
cell physical IO bytes sent directly to DB node to balance CPU 20.734375
cell physical IO interconnect bytes returned by smart scan 587.862778 <=========
cell IO uncompressed bytes 2198.10156
Take a look at the execution plan with column projection information
As we have both column values and where clause in the previous query ,oracle took both the benefits by using predicate filtering and column projection
Column projection works when there is no where clause and predicate filtering works if there is where clause in predicate
Check out the new term highlighted with * in execution plan called ‘Table access storage full’ which denotes that the table is scanned on the storage layer and this term cant be seen in normal database apart from exadata.
Identify the sql_id of the query

Generated the execution plan with column projection information
kish@exdbx<>select * from table(dbms_xplan.display_cursor('&sql_id','&child_no','+projection'));
Enter value for sql_id: cupasgmbw87j5
Enter value for child_no: 0
old 1: select * from table(dbms_xplan.display_cursor('&sql_id','&child_no','+projection'))
new 1: select * from table(dbms_xplan.display_cursor('cupasgmbw87j5','0','+projection'))
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID cupasgmbw87j5, child number 0
-------------------------------------
select a.country,a.order_date,sum(a.unit_cost),sum(a.total_profit) from
sales a inner join sales2 b on a.order_id = b.order_id group by
a.country,a.order_date order by sum(a.total_profit) desc
Plan hash value: 2885808317
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 21560 (100)| |
| 1 | SORT ORDER BY | | 157K| 7524K| 56M| 21560 (1)| 00:04:19 |
| 2 | HASH GROUP BY | | 157K| 7524K| 56M| 21560 (1)| 00:04:19 |
|* 3 | HASH JOIN | | 1053K| 49M| 18M| 10082 (1)| 00:02:01 |
| 4 | *** TABLE ACCESS STORAGE FULL***| SALES2 | 1048K| 6144K| | 3224 (1)| 00:00:39 |
| 5 | *** TABLE ACCESS STORAGE FULL***| SALES | 1048K| 43M| | 3228 (1)| 00:00:39 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("A"."ORDER_ID"="B"."ORDER_ID")
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) SUM("A"."TOTAL_PROFIT")[22], "A"."COUNTRY"[VARCHAR2,380],
"A"."ORDER_DATE"[DATE,7], SUM("A"."UNIT_COST")[22]
2 - "A"."COUNTRY"[VARCHAR2,380], "A"."ORDER_DATE"[DATE,7],
SUM("A"."TOTAL_PROFIT")[22], SUM("A"."UNIT_COST")[22]
3 - (#keys=1) "A"."COUNTRY"[VARCHAR2,380], "A"."ORDER_DATE"[DATE,7],
"A"."TOTAL_PROFIT"[NUMBER,22], "A"."UNIT_COST"[NUMBER,22]
4 - "B"."ORDER_ID"[NUMBER,22]
5 - "A"."COUNTRY"[VARCHAR2,380], "A"."ORDER_DATE"[DATE,7],
"A"."ORDER_ID"[NUMBER,22], "A"."UNIT_COST"[NUMBER,22], "A"."TOTAL_PROFIT"[NUMBER,22]
37 rows selected.
Elapsed: 00:00:01.16
Lets check the individual sql statement for IO conserve by smart scan
kish@exdbx<>select sql_id,IO_CELL_OFFLOAD_ELIGIBLE_BYTES,IO_INTERCONNECT_BYTES,100*(IO_CELL_OFFLOAD_ELIGIBLE_BYTES-IO_INTERCONNECT_BYTES)/IO_CELL_OFFLOAD_ELIGIBLE_BYTES "IO_SAVED_%",sql_text from v$sql where sql_id='cupasgmbw87j5';
SQL_ID IO_CELL_OFFLOAD_ELIGIBLE_BYTES IO_INTERCONNECT_BYTES IO_SAVED_% SQL_TEXT
------------- ------------------------------ --------------------- ---------- ----------
cupasgmbw87j5 194363392 49433192 74.5666139 select a.c
ountry,a.o
rder_date,
sum(a.unit
_cost),sum
As you can see that approx 75% of IO is saved by storage scan.If same IO is processed by database layer,then think about how the database badly gets overloaded by resource intensive queries during busy schedule !
kish@exdbx<>set verify off
set pagesize 999
set lines 190
col sql_text format a40 trunc
col child format 99999 heading CHILD
col execs format 9,999,999
col avg_etime format 99,999.99
col avg_cpu format 9,999,999.99
col avg_lio format 999,999,999
col avg_pio format 999,999,999
col "IO_SAVED_%" format 999.99
col avg_px format 999
col offload for a7
select sql_id, child_number child,
decode(IO_CELL_OFFLOAD_ELIGIBLE_BYTES,0,'No','Yes') Offload,
decode(IO_CELL_OFFLOAD_ELIGIBLE_BYTES,0,0,
100*(IO_CELL_OFFLOAD_ELIGIBLE_BYTES-IO_INTERCONNECT_BYTES)
/decode(IO_CELL_OFFLOAD_ELIGIBLE_BYTES,0,1,IO_CELL_OFFLOAD_ELIGIBLE_BYTES)) "IO_ SAVED_%",
(elapsed_time/1000000)/decode(nvl(executions,0),0,1,executions) avg_etime,
--decode(IO_CELL_OFFLOAD_ELIGIBLE_BYTES,0,buffer_gets/decode(nvl(executions,0),0 ,1,executions),null) avg_lio,
--decode(IO_CELL_OFFLOAD_ELIGIBLE_BYTES,0,disk_reads/decode(nvl(executions,0),0, 1,executions),null) avg_pio,
sql_text
from v$sql s
where upper(sql_text) like upper(nvl('&sql_text',sql_text))
and sql_text not like 'BEGIN :sql_text := %'
and sql_text not like '%IO_CELL_OFFLOAD_ELIGIBLE_BYTES%'
and sql_id like nvl('&sql_id',sql_id)
order by 1, 2, 3
/
Enter value for sql_text: select UNITS_SOLD from sales where order_id > 10000
Enter value for sql_id: 6xtxn2qknnw1p
SQL_ID CHILD OFFLOAD IO_SAVED_% AVG_ETIME SQL_TEXT
------------- ------ ------- ---------- ---------- ----------------------------------------
6xtxn2qknnw1p 0 Yes 87.60 .52 select UNITS_SOLD from sales where order
Again we check the IO SAVED by offloading for a particular query and you can see that 88 % of IO hits on storage layer