Very simple error when spfile is not found in $ORACLE_HOME/dbs location
SQL> create pfile from spfile;
create pfile from spfile
*
ERROR at line 1:
ORA-01565: error in identifying file '?/dbs/spfile@.ora'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Workaround:
Spfile is not in use by instance
SQL> show parameter spfile
NAME TYPE VALUE
———————————— ———– ——————————
spfile string
Create spfile from pfile first and bounce the database instance to use spfile
Offloading in its name says that something is offloaded somewhere.Here in exadata ,high volume of data is offloaded to the cell layer so that the storage gets the total hit rather than the database.
You often heard of ‘smart work vs hard work’ and would have seen this picture
In similar way,exadata has a smart storage which has the intelligence to process the data in storage layer without disturbing the database layer much.So the database layer would always be happy 🙂 with less pressure from the clients who retrieve,insert and update repeated data using sql query
There is a saying always that “a single poorly written sql query might collapse a powerful database “
However it is not the problem in database itself but a combination of layers like OS,network,storage and database layer
So sql query which is resource intensive is offloaded to storage layer and volume of data resturned to the database layer will be filtered and reduced with multiple layers of abstraction.
In cell offloading,there are three types of filtering the data
Column projection(columns of interest)
Predicate filtering(rows of interest)
Storage indexes(min-max value)
We will see all these optimizations in detail in another post.So lets dont waste time in definitions of above three points and test the exciting features of exadata
I perform a test case which proves the efficiency of the cell smart scan
I print the statistics for cell before testing which is zero for all category
kish@exdbx<>select a.name, b.value/1024/1024 MB from v$sysstat a, v$mystat b
where a.statistic# = b.statistic# and (a.name in ('physical read total bytes',
'physical write total bytes', 'cell IO uncompressed bytes')
or a.name like '%cell physical%'); 2 3 4
NAME MB
---------------------------------------------------------------- ----------
physical read total bytes 0
physical write total bytes 0
cell physical IO interconnect bytes 0
cell physical IO bytes saved during optimized file creation 0
cell physical IO bytes saved during optimized RMAN file restore 0
cell physical IO bytes eligible for predicate offload 0
cell physical IO bytes saved by storage index 0
cell physical IO bytes sent directly to DB node to balance CPU 0
cell physical IO interconnect bytes returned by smart scan 0
cell IO uncompressed bytes 0
10 rows selected.
Sales table is what we use for testing which has million rows
kish@exdbx<>select count(*) from sales;
COUNT(*)
----------
1048576
I flush the contents in shared pool and buffer cache so that we dont blindly get fooled by the speed which we get from the cached data in buffer gets and soft parse in SGA
kish@exdbx<>alter system flush shared_pool;
System altered.
Elapsed: 00:00:00.00
kish@exdbx<>alter system flush buffer_cache;
System altered.
Elapsed: 00:00:00.02
TEST1: Disable the cell offloading and check the response time ! In this test,i get the aggregate value of unit cost and total profit columns and use inner join sales with sales3 table
kish@exdbx<>alter session set cell_offload_processing=false;
Session altered.
kish@exdbx<>select a.country,a.order_date,sum(a.unit_cost),sum(a.total_profit) from sales a
inner join sales2 b on a.order_id = b.order_id
group by a.country,a.order_date
order by sum(a.total_profit) desc; 2 3 4
220105 rows selected.
Elapsed: 01:44:58.07 <======================
TEST2:Enable cell offloading and check the response time
kish@exdbx<>alter system flush shared_pool;
System altered.
Elapsed: 00:00:00.04
kish@exdbx<>alter system flush buffer_cache;
System altered.
Elapsed: 00:00:00.16
See the difference between previous run and after enabling smart scan !
kish@exdbx<>alter session set cell_offload_processing=true;
Session altered.
Elapsed: 00:00:00.00
kish@exdbx<>select a.country,a.order_date,sum(a.unit_cost),sum(a.total_profit) from sales a
inner join sales2 b on a.order_id = b.order_id
group by a.country,a.order_date
order by sum(a.total_profit) desc ; 2 3 4
220105 rows selected.
Elapsed: 00:00:11.37 <==================
There is a huge difference in response time from 1 hour 44 minutes to 11 minutes 37 seconds which accounts to almost 1 hour 33 minutes time saved by smart scan with IO bytes saved
The statistics information is also recorded in sysstat and mystat views where there is good evidence for the total IO saved by storage cell
See ‘cell physical IO interconnect bytes returned by smart scan‘ and other rows highlighted with arrow which shows that IO’s requested for database is taken care by storage cell thus reducing the volume of data
kish@exdbx<>select a.name, b.value/1024/1024 MB from v$sysstat a, v$mystat b
where a.statistic# = b.statistic# and (a.name in ('physical read total bytes',
'physical write total bytes', 'cell IO uncompressed bytes')
or a.name like '%cell physical%'); 2 3 4
NAME MB
---------------------------------------------------------------- ----------
physical read total bytes 2429.28906
physical write total bytes .0078125
cell physical IO interconnect bytes 792.847153 <========
cell physical IO bytes saved during optimized file creation 0
cell physical IO bytes saved during optimized RMAN file restore 0
cell physical IO bytes eligible for predicate offload 2224.3125 <========
cell physical IO bytes saved by storage index 0
cell physical IO bytes sent directly to DB node to balance CPU 20.734375
cell physical IO interconnect bytes returned by smart scan 587.862778 <=========
cell IO uncompressed bytes 2198.10156
Take a look at the execution plan with column projection information
As we have both column values and where clause in the previous query ,oracle took both the benefits by using predicate filtering and column projection
Column projection works when there is no where clause and predicate filtering works if there is where clause in predicate
Check out the new term highlighted with * in execution plan called ‘Table access storage full’ which denotes that the table is scanned on the storage layer and this term cant be seen in normal database apart from exadata.
Identify the sql_id of the query
Generated the execution plan with column projection information
kish@exdbx<>select * from table(dbms_xplan.display_cursor('&sql_id','&child_no','+projection'));
Enter value for sql_id: cupasgmbw87j5
Enter value for child_no: 0
old 1: select * from table(dbms_xplan.display_cursor('&sql_id','&child_no','+projection'))
new 1: select * from table(dbms_xplan.display_cursor('cupasgmbw87j5','0','+projection'))
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID cupasgmbw87j5, child number 0
-------------------------------------
select a.country,a.order_date,sum(a.unit_cost),sum(a.total_profit) from
sales a inner join sales2 b on a.order_id = b.order_id group by
a.country,a.order_date order by sum(a.total_profit) desc
Plan hash value: 2885808317
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 21560 (100)| |
| 1 | SORT ORDER BY | | 157K| 7524K| 56M| 21560 (1)| 00:04:19 |
| 2 | HASH GROUP BY | | 157K| 7524K| 56M| 21560 (1)| 00:04:19 |
|* 3 | HASH JOIN | | 1053K| 49M| 18M| 10082 (1)| 00:02:01 |
| 4 | *** TABLE ACCESS STORAGE FULL***| SALES2 | 1048K| 6144K| | 3224 (1)| 00:00:39 |
| 5 | *** TABLE ACCESS STORAGE FULL***| SALES | 1048K| 43M| | 3228 (1)| 00:00:39 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("A"."ORDER_ID"="B"."ORDER_ID")
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) SUM("A"."TOTAL_PROFIT")[22], "A"."COUNTRY"[VARCHAR2,380],
"A"."ORDER_DATE"[DATE,7], SUM("A"."UNIT_COST")[22]
2 - "A"."COUNTRY"[VARCHAR2,380], "A"."ORDER_DATE"[DATE,7],
SUM("A"."TOTAL_PROFIT")[22], SUM("A"."UNIT_COST")[22]
3 - (#keys=1) "A"."COUNTRY"[VARCHAR2,380], "A"."ORDER_DATE"[DATE,7],
"A"."TOTAL_PROFIT"[NUMBER,22], "A"."UNIT_COST"[NUMBER,22]
4 - "B"."ORDER_ID"[NUMBER,22]
5 - "A"."COUNTRY"[VARCHAR2,380], "A"."ORDER_DATE"[DATE,7],
"A"."ORDER_ID"[NUMBER,22], "A"."UNIT_COST"[NUMBER,22], "A"."TOTAL_PROFIT"[NUMBER,22]
37 rows selected.
Elapsed: 00:00:01.16
Lets check the individual sql statement for IO conserve by smart scan
As you can see that approx 75% of IO is saved by storage scan.If same IO is processed by database layer,then think about how the database badly gets overloaded by resource intensive queries during busy schedule !
kish@exdbx<>set verify off
set pagesize 999
set lines 190
col sql_text format a40 trunc
col child format 99999 heading CHILD
col execs format 9,999,999
col avg_etime format 99,999.99
col avg_cpu format 9,999,999.99
col avg_lio format 999,999,999
col avg_pio format 999,999,999
col "IO_SAVED_%" format 999.99
col avg_px format 999
col offload for a7
select sql_id, child_number child,
decode(IO_CELL_OFFLOAD_ELIGIBLE_BYTES,0,'No','Yes') Offload,
decode(IO_CELL_OFFLOAD_ELIGIBLE_BYTES,0,0,
100*(IO_CELL_OFFLOAD_ELIGIBLE_BYTES-IO_INTERCONNECT_BYTES)
/decode(IO_CELL_OFFLOAD_ELIGIBLE_BYTES,0,1,IO_CELL_OFFLOAD_ELIGIBLE_BYTES)) "IO_ SAVED_%",
(elapsed_time/1000000)/decode(nvl(executions,0),0,1,executions) avg_etime,
--decode(IO_CELL_OFFLOAD_ELIGIBLE_BYTES,0,buffer_gets/decode(nvl(executions,0),0 ,1,executions),null) avg_lio,
--decode(IO_CELL_OFFLOAD_ELIGIBLE_BYTES,0,disk_reads/decode(nvl(executions,0),0, 1,executions),null) avg_pio,
sql_text
from v$sql s
where upper(sql_text) like upper(nvl('&sql_text',sql_text))
and sql_text not like 'BEGIN :sql_text := %'
and sql_text not like '%IO_CELL_OFFLOAD_ELIGIBLE_BYTES%'
and sql_id like nvl('&sql_id',sql_id)
order by 1, 2, 3
/
Enter value for sql_text: select UNITS_SOLD from sales where order_id > 10000
Enter value for sql_id: 6xtxn2qknnw1p
SQL_ID CHILD OFFLOAD IO_SAVED_% AVG_ETIME SQL_TEXT
------------- ------ ------- ---------- ---------- ----------------------------------------
6xtxn2qknnw1p 0 Yes 87.60 .52 select UNITS_SOLD from sales where order
Again we check the IO SAVED by offloading for a particular query and you can see that 88 % of IO hits on storage layer
LIBCELL in its name tells us that LIB stands for library and CELL stands for storage cell
LIBCELL acts as a liquid layer which has the source code that allows the oracle kernel to communicate to the storage cells using network protocol called RDS(reliable datagram sockets)
Do you know that there is small library file used in exadata machine which is the core library which speaks with the iDB(intelligence database protocol)/RDS to get the data from storage cell
Let us get some practical insights from the machine
There are set of kernel parameters and error codes inside the file
This Libcell file actively interacts between kernel in the OS layer and storage layer and has the intelligence to ask for the data from storage cells via iDB which is required by the database or end user
[root@exceladm00 ~]# head libcell && echo "=================================================" &&tail libcell
_DYNAMIC
_GLOBAL_OFFSET_TABLE_
__gmon_start__
__cxa_finalize
_Jv_RegisterClasses
dbgc_get_gp
kgefac_
kgeasnmierr
__assert_fail
dbgdChkEventInt
=================================================
^^^^^^^^Error: Buffer overrun occurred, forced exit
^^^^^^^^Initialization of symbol handler failed. Error code %d
^^^^^^^^RtlCaptureContext function not found in ntdll.dll
^^^^^^^^StackWalk is terminated abnormally. Error code %d
^^^^^^^^Exception is raised during stack walking
^^^^^^^^Signal %s is raised at 0x%p
^^^^^^^^Intel(R) Pentium(R) M and compatible Intel processors
^^^^^^^^Fatal Error: Can not initiate the Heap
basic
disabled
Lets view the process
[root@exceladm00 ~]# lsof|grep libcell
cellrssrm 4312 root mem REG 8,1 1717370 587643 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/lib/libcell11.so
cellrssrm 4321 root mem REG 8,1 1717370 587643 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/lib/libcell11.so
cellrssrm 4322 root mem REG 8,1 1717370 587643 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/lib/libcell11.so
java 4323 root mem REG 8,1 1717370 587643 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/lib/libcell11.so
cellrsbkm 4324 root mem REG 8,1 1717370 587643 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/lib/libcell11.so
cellrsbkm 4339 root mem REG 8,1 1717370 587643 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/lib/libcell11.so
Lets take a close look at trace for libcell process during a sql transaction and observe the insights from it
I triggered a select statement and grabbed the pid from the session for libcell
we can see lseek () system call which has the file pointer and increment the seek set from 0 to 512
we observe read() and write() system call which performs the data read from storage cell and write it to the buffer
Libcell constructs the sql statement over iDB/RDS and requests the data from the storage layer
you can observe the db block size as 8208 in the read system call which is shown in trace
The contents shown in this article are just for general audience for understanding purpose and not standard operation procedure.Never ever try this in a production machine
SELECT customer_id,first_name,last_name
FROM customers
WHERE NOT REGEXP_LIKE(first_name, '[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}', 'i');
There is a hidden parameter which has a heartbeat timeout threshold for cell offload
90 seconds is cell induced limit which is defined by the cell level parameter _cell_oflsrv_heartbeat_timeout_sec.
One of the solution is to disable Regular expression offloading at the DB level
Identify the function id of the regexplike() first
The function id details looks something like this but the below output is not exact function id of regexplike
SQL> select name,offloadable,datatype,func_id from v$sqlfn_metadata where name like '%REGEXP%';
NAME OFF DATATYPE FUNC_ID
------------------------------ --- -------- ----------
REGEXP_LIKE YES STRING 534
Disable only the particular function
SQL>alter system set cell_offload_parameters="OPT_DISABLED={534}" scope=both sid='*' ;
Another efficient solution would be to rewrite the query with different logic without using regex function
I try to drop an asm diskgroup which is not properly created but i get the error that diskgroup is not mounted
SQL> drop diskgroup DATA;
drop diskgroup DATA
*
ERROR at line 1:
ORA-15039: diskgroup not dropped
ORA-15001: diskgroup "DATA" does not exist or is not mounted
Error code:
[oracle@exdbadm02 ~]$ oerr ora 15039 && oerr ora 15001
15039, 00000, "diskgroup not dropped"
// *Cause: An attempt to drop a diskgroup failed.
// *Action: See the associated messages for details about why the drop
// was not successful.
//
15001, 00000, "diskgroup \"%s\" does not exist or is not mounted"
// *Cause: An operation failed because the diskgroup specified does not
// exist or is not mounted by the current ASM instance.
// *Action: Verify that the diskgroup name used is valid, that the
// diskgroup exists, and that the diskgroup is mounted by
// the current ASM instance.
//
When i checked the logfile,i can see couple of errors because the physical disks are removed from the storage tier
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"
ERROR: alter diskgroup DATA MOUNT /* ASMCA */
Sat Feb 06 16:58:19 2021
NOTE: No asm libraries found in the system
ASM Health Checker found 1 new failures
Sat Feb 06 16:59:11 2021
SQL> drop diskgroup DATA
ORA-15039: diskgroup not dropped
ORA-15001: diskgroup "DATA" does not exist or is not mounted
Workaround:
If graceful drop doesnot work ,then we can forcefully drop the diskgroup with contents option
SQL> drop diskgroup DATA force including contents;
Diskgroup dropped.
Oracle forcefully erase the diskgroup header and metadata of the diskgroups
NOTE: Assigning number (2,9) to disk (o/192.168.56.33/DATA_CD_cell19_stocell)
NOTE: Assigning number (2,8) to disk (o/192.168.56.33/DATA_CD_cell09_stocell)
NOTE: erasing header on grp 2 disk DATA_CD_CELL09_STOCELL
NOTE: erasing header on grp 2 disk DATA_CD_CELL19_STOCELL
NOTE: Disk DATA_CD_CELL09_STOCELL in mode 0x19 marked for de-assignment
NOTE: Disk DATA_CD_CELL19_STOCELL in mode 0x19 marked for de-assignment
SUCCESS: diskgroup DATA was force dropped
SUCCESS: drop diskgroup DATA force including contents
The error occurs while installing RAC software and already an existing single instance oracle restart is present on the host which prevents the installation
Cause - The installer detected single-instance versions of Cluster Synchronization Services (CSS) on the following nodes in the cluster:exdbadm02
Action - Before proceeding with the installation, you must deconfigure CSS, shut down any Oracle Automatic Storage Management (Oracle ASM) instances, and shut down all databases that use Oracle ASM for storage on all of the listed nodes. Oracle Clusterware installation starts a cluster-wide version of CSS. After Oracle Clusterware installation is complete, you may start up any databases and Oracle ASM instances that previously used CSS. When they are brought back up, Oracle Clusterware provides CSS.
Workaround:
Check the single instance oracle restart configuration on the server
[root@exdbadm02 oracle]# ps -ef|grep has
root 4198 1 0 16:59 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 7046 5418 0 17:25 pts/6 00:00:00 grep has