disk_asynch_io and filesystemio_options in oracle – LEVERAGE SYSTEM IO PERFORMANCE IN FILESYSTEM

disk_asynch_io and filesystemio_options in oracle – LEVERAGE SYSTEM IO PERFORMANCE IN FILESYSTEM

Basically, there are two types of IO

  • Synchronous I/O- A process waits for previous operation to complete safely and also waits for acknowledgement until completion to signal next activity.
  • Asynchronous I/O – A process continue to operate with multiple I/O tasks by overlapping without any acknowledgement of the completion or failure of previous tasks irrespectively.

In general linux or unix filesystems, I/O operations has to pass through filesystem cache. This is the default behavior of the system which requires additional CPU resources to process the tasks. For database files stored in ASM, by default async IO is enabled and hence there is no need to configure async IO manually.

Below output shows the FS cache usage on oracle datafiles

Look at RES and PAGES column for system datafile which is quite high in a non busy system.

RES stands for resident memory which is the physical memory used by a process without swaps

[root@xhydra bin]# cd /apps01/oradata/DB9ZX/datafile/
[root@xhydra datafile]# fincore *|sort -hr|head
 181M 46323   910M o1_mf_system_k582h8cy_.dbf <<<=== File which consume FS cache
63.8M 16334   660M o1_mf_sysaux_k582jcl6_.dbf
  30M  7692 707.5M o1_mf_users_k582jvpo_.dbf
16.3M  4170   270M o1_mf_system_k582qwgh_.dbf
 576K   144   330M o1_mf_sysaux_k582qwgm_.dbf
  24K     6    36M temp012022-04-11_07-03-55-578-AM.dbf
  16K     4   100M o1_mf_undotbs1_k582qwgp_.dbf
   8K     2   131M o1_mf_temp_k582l8cq_.tmp
  RES PAGES   SIZE FILE
   0B     0    10M undo01.dbf
[root@xhydra datafile]# pwd
/apps01/oradata/DB9ZX/datafile

To bypass the filesystem cache, there are some tweaks in oracle database to control the speed of processing with the help of filesystemio_options and disk_asynch_io parameters. Oracle can influence these IO parameters to control the type of IO for bypassing the filesystem cache and speed up the service. So lets test with different settings and observe the results.

If disk_asynch_io is true, then asynchronous IO operation is performed for filesystems which is faster compared to safe synchronous mode.

Specifically db file async I/O submit is mimicked in place of ‘db file parallel write’ . To disable the mimic event, disable disk_asynch_io .

To confirm if asynchronous is used or not, check the kioctx. If there are non zero values on first two columns, then async IO is used.

[root@xhydra ~]# cat /proc/slabinfo |grep 'kio'
kioctx                11     11    704   11    2 : tunables    0    0    0 : slabdata      1      1      0

List of background wait events with disk_asynch_io =true and filesystemio_options =none

List of foreground events may vary depending on the SQL statements. But look at IO events specifically.

Now we perform simple tests without any workload in the database for different values of filesystemio_options and disk_asynch_io =true

First, we perform some insert operation into a table with filesystemio_options = none option

It took 13 seconds for the IO to complete. Note: These behavior may not be same in your system due to different platforms, workload, resources etc so test it thoroughly before implementation.

kIsH@Xhydra<>create table async (name varchar2(100),num number(38));

Table created.

kIsH@Xhydra<>begin
  2  for i in 1 .. 1000000 loop
  3  insert into async values('abc',i);
  4  end loop;
  5  commit;
  6  end;
  7  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:13.80

pwrite64 is the system call used for synchronous calls

[root@xhydra oracle]# grep -A3 'pwrite64' syncio

     0.000069 pwrite64(259,  "\6\242\0\0c\2\300\0\357\3214\0\0\0\1\6\3214\0\0\2\0\0\0\252\1\0\0\356\3214\0"..., 8192, 5005312) = 8192
     0.000859 pwrite64(259, "\6\242\0\0\2544\301\0\357\3214\0\0\0\1\6\321\336\0\0\1\0\0\0\247\1\0\0\356\3214\0"..., 8192, 647331840) = 8192
     0.000648 pwrite64(259, "\6\242\0\0k\2\300\0\357\3214\0\0\0\1\6\224\261\0\0\2\0\0\0\253\1\0\0\356\3214\0"..., 8192, 5070848) = 8192
     0.000620 pwrite64(259, " \242\0\0\2504\301\0\356\3214\0\0\0\1\4\235D\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192, 647299072) = 8192
     0.000560 semtimedop(3, [{22, -1, 0}], 1, {tv_sec=3, tv_nsec=0}) = -1 EAGAIN (Resource temporarily unavailable)
     3.005406 getrusage(RUSAGE_THREAD, {ru_utime={tv_sec=0, tv_usec=23368}, ru_stime={tv_sec=0, tv_usec=53586}, ...}) = 0
     0.000064 semtimedop(3, [{22, -1, 0}], 1, {tv_sec=3, tv_nsec=0}) = -1 EAGAIN (Resource temporarily unavailable)

First, we enable setall value which denotes both async and direct IO can be used whenever possible.

kIsH@Xhydra<>alter system set filesystemio_options=setall scope=spfile;

System altered.

kIsH@Xhydra<>startup force;
ORACLE instance started.

Total System Global Area 1962932432 bytes
Fixed Size                  9136336 bytes
Variable Size            1140850688 bytes
Database Buffers          805306368 bytes
Redo Buffers                7639040 bytes
Database mounted.
Database opened.

kIsH@Xhydra<>select name,value from v$parameter where name like '%disk_async%' or name like '%filesystemio%';

NAME                 VALUE
-------------------- --------------------
disk_asynch_io       TRUE
filesystemio_options SETALL

Insert statement completed in less than previous run which denotes that asynchronous IO is used. How to confirm this?

kIsH@Xhydra<>create table async as select * from sys.async;

Table created.

Elapsed: 00:00:00.36
kIsH@Xhydra<>begin
  2  for i in 1 .. 1000000 loop
  3  insert into async values('abc',i);
  4  end loop;
  5  commit;
  6  end;
  7  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:11.64

io_submit and io_getevents are the system calls used for asynchronous and direct IO which confirms the usage. Refer to https://man7.org/linux/man-pages/man2/io_getevents.2.html and https://man7.org/linux/man-pages/man2/io_submit.2.html or man page

Pseudocode for async IO with control blocks

           #include <linux/aio_abi.h>

           struct iocb {
               __u64   aio_data;
               __u32   PADDED(aio_key, aio_rw_flags);
               __u16   aio_lio_opcode;
               __s16   aio_reqprio;
               __u32   aio_fildes;
               __u64   aio_buf;
               __u64   aio_nbytes;
               __s64   aio_offset;
               __u64   aio_reserved2;
               __u32   aio_flags;
               __u32   aio_resfd;
           };

strace output shows and confirm the asynchronous operations, after modifying the parameter.

[root@xhydra oracle]# egrep 'aio|io_' asyncio|tail -5

     0.009802 io_getevents(0x7f7c33305000, 1, 128, [{data=0x7f7c33feb3a0, obj=0x7f7c33feb3a0, res=1048576, res2=0}], {tv_sec=600, tv_nsec=0}) = 1
     0.002236 io_getevents(0x7f7c33305000, 1, 128, [{data=0x7f7c33f51ef0, obj=0x7f7c33f51ef0, res=1048576, res2=0}, {data=0x7f7c33f57f80, obj=0x7f7c33f57f80, res=1048576, res2=0}, {data=0x7f7c33f85080, obj=0x7f7c33f85080, res=1048576, res2=0}, {data=0x7f7c33f7e980, obj=0x7f7c33f7e980, res=1048576, res2=0}, {data=0x7f7c33f38960, obj=0x7f7c33f38960, res=1048576, res2=0}, {data=0x7f7c33f32260, obj=0x7f7c33f32260, res=1048576, res2=0}, {data=0x7f7c34090970, obj=0x7f7c34090970, res=1048576, res2=0}, {data=0x7f7c3404e9b0, obj=0x7f7c3404e9b0, res=1048576, res2=0}, {data=0x7f7c340c7b60, obj=0x7f7c340c7b60, res=1048576, res2=0}, {data=0x7f7c340929a0, obj=0x7f7c340929a0, res=983040, res2=0}, {data=0x7f7c3403c800, obj=0x7f7c3403c800, res=1048576, res2=0}, {data=0x7f7c33f96550, obj=0x7f7c33f96550, res=1048576, res2=0}, {data=0x7f7c3409aa60, obj=0x7f7c3409aa60, res=1048576, res2=0}, {data=0x7f7c34029300, obj=0x7f7c34029300, res=1048576, res2=0}], {tv_sec=600, tv_nsec=0}) = 14
     0.001353 io_submit(0x7f7c33305000, 3, [{aio_data=0x7f7c33f50530, aio_lio_opcode=IOCB_CMD_PWRITE, aio_fildes=261, aio_buf="\2\242\0\0jP@\4\"\2226\0\0\0?\4\6\336\0\0\16\0\20\0\231\7\0\0\335\0tt"..., aio_nbytes=1048576, aio_offset=168640512}, {aio_data=0x7f7c33f450f0, aio_lio_opcode=IOCB_CMD_PWRITE, aio_fildes=261, aio_buf="\2\242\0\0\352P@\4[\2226\0\0\0u\4\270\221\0\0\16\0\20\0\231\7\0\0\335\0tt"..., aio_nbytes=1048576, aio_offset=169689088}, {aio_data=0x7f7c33f3e9f0, aio_lio_opcode=IOCB_CMD_PWRITE, aio_fildes=261, aio_buf="\2\242\0\0jQ@\4\225\2226\0\0\0u\4\221\336\0\0\16\0\20\0\231\7\0\0\336\0tt"..., aio_nbytes=131072, aio_offset=170737664}]) = 3
     0.000207 io_getevents(0x7f7c33305000, 1, 128, [{data=0x7f7c33f50530, obj=0x7f7c33f50530, res=1048576, res2=0}], {tv_sec=600, tv_nsec=0}) = 1
     0.002980 io_getevents(0x7f7c33305000, 1, 128, [{data=0x7f7c33f3e9f0, obj=0x7f7c33f3e9f0, res=131072, res2=0}, {data=0x7f7c33f450f0, obj=0x7f7c33f450f0, res=1048576, res2=0}], {tv_sec=600, tv_nsec=0}) = 2

Now set only asynch option

kIsH@Xhydra<>alter system set filesystemio_options=asynch scope=spfile;

System altered.

Insert is faster than ‘none‘ option but slower than ‘setall

kIsH@Xhydra<>create table async as select * from sys.async;

Table created.

Elapsed: 00:00:00.01
kIsH@Xhydra<>set timing on
kIsH@Xhydra<>begin
  2  for i in 1 .. 1000000 loop
  3  insert into async values('abc',i);
  4  end loop;
  5  commit;
  6  end;
  7  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:11.75

Even for asynch option, io_getevents and io_submit can be observed with reduced

[root@xhydra oracle]# egrep 'aio|io_' asyncio|tail -5

     0.003120 io_getevents(0x7f61a0825000, 1, 128, [{data=0x7f61a15fdd70, obj=0x7f61a15fdd70, res=32768, res2=0}], {tv_sec=600, tv_nsec=0}) = 1
     0.000067 io_submit(0x7f61a0825000, 1, [{aio_data=0x7f61a15fdd70, aio_lio_opcode=IOCB_CMD_PWRITE, aio_fildes=261, aio_buf="&\242\0\0\220\0@\4\212S8\0\0\0\1\0042\32\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., aio_nbytes=8192, aio_offset=1179648}]) = 1
     0.003446 io_getevents(0x7f61a0825000, 1, 128, [{data=0x7f61a15fdd70, obj=0x7f61a15fdd70, res=8192, res2=0}], {tv_sec=600, tv_nsec=0}) = 1
     0.000075 io_submit(0x7f61a0825000, 1, [{aio_data=0x7f61a15fdd70, aio_lio_opcode=IOCB_CMD_PWRITE, aio_fildes=261, aio_buf="\2\242\0\0,\34@\4EN8\0\0\0\1\4\353\27\0\0\f\0\33\0u\1\0\0l\0\17\17"..., aio_nbytes=24576, aio_offset=59080704}]) = 1
     0.003171 io_getevents(0x7f61a0825000, 1, 128, [{data=0x7f61a15fdd70, obj=0x7f61a15fdd70, res=24576, res2=0}], {tv_sec=600, tv_nsec=0}) = 1

Similarly enabling direct IO

kIsH@Xhydra<>alter system set filesystemio_options=directio scope=spfile;

System altered.

This value has effect on the database layer rather than filesystem layer.

kIsH@Xhydra<>create table async as select * from sys.async;

Table created.

Elapsed: 00:00:00.22
kIsH@Xhydra<>set timing on
kIsH@Xhydra<>set timing on
kIsH@Xhydra<>begin
  2  for i in 1 .. 1000000 loop
  3  insert into async values('abc',i);
  4  end loop;
  5  commit;
  6  end;
  7  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:12.23

Again linux pwrite is used for direct IO with synchronous calls

[root@xhydra oracle]# egrep 'pwrite' asyncio
     0.001275 pwrite64(260, "#\242\0\0\342@\301\1\22\3559\0\0\0\37\4\215\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192, 672940032) = 8192
     0.000187 pwrite64(259, "\6\242\0\0c\2\300\0\276\3529\0\0\0\1\6\224\253\0\0\2\0\0\0\252\1\0\0\275\3529\0"..., 8192, 5005312) = 8192
     0.002090 pwrite64(259, "\6\242\0\0\2524\301\0\276\3529\0\0\0\1\6Q\26\0\0\1\0\0\0\247\1\0\0\275\3529\0"..., 8192, 647315456) = 8192
     0.001655 pwrite64(259, "\6\242\0\0k\2\300\0\276\3529\0\0\0\1\6\"C\0\0\2\0\0\0\253\1\0\0\275\3529\0"..., 8192, 5070848) = 8192

Leave a Reply

%d bloggers like this: