disk_asynch_io and filesystemio_options in oracle – LEVERAGE SYSTEM IO PERFORMANCE IN FILESYSTEM
Basically, there are two types of IO
- Synchronous I/O- A process waits for previous operation to complete safely and also waits for acknowledgement until completion to signal next activity.
- Asynchronous I/O – A process continue to operate with multiple I/O tasks by overlapping without any acknowledgement of the completion or failure of previous tasks irrespectively.
In general linux or unix filesystems, I/O operations has to pass through filesystem cache. This is the default behavior of the system which requires additional CPU resources to process the tasks. For database files stored in ASM, by default async IO is enabled and hence there is no need to configure async IO manually.
Below output shows the FS cache usage on oracle datafiles
Look at RES and PAGES column for system datafile which is quite high in a non busy system.
RES stands for resident memory which is the physical memory used by a process without swaps
[root@xhydra bin]# cd /apps01/oradata/DB9ZX/datafile/
[root@xhydra datafile]# fincore *|sort -hr|head
181M 46323 910M o1_mf_system_k582h8cy_.dbf <<<=== File which consume FS cache
63.8M 16334 660M o1_mf_sysaux_k582jcl6_.dbf
30M 7692 707.5M o1_mf_users_k582jvpo_.dbf
16.3M 4170 270M o1_mf_system_k582qwgh_.dbf
576K 144 330M o1_mf_sysaux_k582qwgm_.dbf
24K 6 36M temp012022-04-11_07-03-55-578-AM.dbf
16K 4 100M o1_mf_undotbs1_k582qwgp_.dbf
8K 2 131M o1_mf_temp_k582l8cq_.tmp
RES PAGES SIZE FILE
0B 0 10M undo01.dbf
[root@xhydra datafile]# pwd
/apps01/oradata/DB9ZX/datafile
To bypass the filesystem cache, there are some tweaks in oracle database to control the speed of processing with the help of filesystemio_options and disk_asynch_io parameters. Oracle can influence these IO parameters to control the type of IO for bypassing the filesystem cache and speed up the service. So lets test with different settings and observe the results.
If disk_asynch_io is true, then asynchronous IO operation is performed for filesystems which is faster compared to safe synchronous mode.
Specifically db file async I/O submit is mimicked in place of ‘db file parallel write’ . To disable the mimic event, disable disk_asynch_io .
To confirm if asynchronous is used or not, check the kioctx. If there are non zero values on first two columns, then async IO is used.
[root@xhydra ~]# cat /proc/slabinfo |grep 'kio'
kioctx 11 11 704 11 2 : tunables 0 0 0 : slabdata 1 1 0
List of background wait events with disk_asynch_io =true and filesystemio_options =none

List of foreground events may vary depending on the SQL statements. But look at IO events specifically.

Now we perform simple tests without any workload in the database for different values of filesystemio_options and disk_asynch_io =true
First, we perform some insert operation into a table with filesystemio_options = none option
It took 13 seconds for the IO to complete. Note: These behavior may not be same in your system due to different platforms, workload, resources etc so test it thoroughly before implementation.
kIsH@Xhydra<>create table async (name varchar2(100),num number(38));
Table created.
kIsH@Xhydra<>begin
2 for i in 1 .. 1000000 loop
3 insert into async values('abc',i);
4 end loop;
5 commit;
6 end;
7 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:13.80
pwrite64 is the system call used for synchronous calls
[root@xhydra oracle]# grep -A3 'pwrite64' syncio
0.000069 pwrite64(259, "\6\242\0\0c\2\300\0\357\3214\0\0\0\1\6\3214\0\0\2\0\0\0\252\1\0\0\356\3214\0"..., 8192, 5005312) = 8192
0.000859 pwrite64(259, "\6\242\0\0\2544\301\0\357\3214\0\0\0\1\6\321\336\0\0\1\0\0\0\247\1\0\0\356\3214\0"..., 8192, 647331840) = 8192
0.000648 pwrite64(259, "\6\242\0\0k\2\300\0\357\3214\0\0\0\1\6\224\261\0\0\2\0\0\0\253\1\0\0\356\3214\0"..., 8192, 5070848) = 8192
0.000620 pwrite64(259, " \242\0\0\2504\301\0\356\3214\0\0\0\1\4\235D\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192, 647299072) = 8192
0.000560 semtimedop(3, [{22, -1, 0}], 1, {tv_sec=3, tv_nsec=0}) = -1 EAGAIN (Resource temporarily unavailable)
3.005406 getrusage(RUSAGE_THREAD, {ru_utime={tv_sec=0, tv_usec=23368}, ru_stime={tv_sec=0, tv_usec=53586}, ...}) = 0
0.000064 semtimedop(3, [{22, -1, 0}], 1, {tv_sec=3, tv_nsec=0}) = -1 EAGAIN (Resource temporarily unavailable)
First, we enable setall value which denotes both async and direct IO can be used whenever possible.
kIsH@Xhydra<>alter system set filesystemio_options=setall scope=spfile;
System altered.
kIsH@Xhydra<>startup force;
ORACLE instance started.
Total System Global Area 1962932432 bytes
Fixed Size 9136336 bytes
Variable Size 1140850688 bytes
Database Buffers 805306368 bytes
Redo Buffers 7639040 bytes
Database mounted.
Database opened.
kIsH@Xhydra<>select name,value from v$parameter where name like '%disk_async%' or name like '%filesystemio%';
NAME VALUE
-------------------- --------------------
disk_asynch_io TRUE
filesystemio_options SETALL
Insert statement completed in less than previous run which denotes that asynchronous IO is used. How to confirm this?
kIsH@Xhydra<>create table async as select * from sys.async;
Table created.
Elapsed: 00:00:00.36
kIsH@Xhydra<>begin
2 for i in 1 .. 1000000 loop
3 insert into async values('abc',i);
4 end loop;
5 commit;
6 end;
7 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:11.64
io_submit and io_getevents are the system calls used for asynchronous and direct IO which confirms the usage. Refer to https://man7.org/linux/man-pages/man2/io_getevents.2.html and https://man7.org/linux/man-pages/man2/io_submit.2.html or man page
Pseudocode for async IO with control blocks
#include <linux/aio_abi.h>
struct iocb {
__u64 aio_data;
__u32 PADDED(aio_key, aio_rw_flags);
__u16 aio_lio_opcode;
__s16 aio_reqprio;
__u32 aio_fildes;
__u64 aio_buf;
__u64 aio_nbytes;
__s64 aio_offset;
__u64 aio_reserved2;
__u32 aio_flags;
__u32 aio_resfd;
};
strace output shows and confirm the asynchronous operations, after modifying the parameter.
[root@xhydra oracle]# egrep 'aio|io_' asyncio|tail -5
0.009802 io_getevents(0x7f7c33305000, 1, 128, [{data=0x7f7c33feb3a0, obj=0x7f7c33feb3a0, res=1048576, res2=0}], {tv_sec=600, tv_nsec=0}) = 1
0.002236 io_getevents(0x7f7c33305000, 1, 128, [{data=0x7f7c33f51ef0, obj=0x7f7c33f51ef0, res=1048576, res2=0}, {data=0x7f7c33f57f80, obj=0x7f7c33f57f80, res=1048576, res2=0}, {data=0x7f7c33f85080, obj=0x7f7c33f85080, res=1048576, res2=0}, {data=0x7f7c33f7e980, obj=0x7f7c33f7e980, res=1048576, res2=0}, {data=0x7f7c33f38960, obj=0x7f7c33f38960, res=1048576, res2=0}, {data=0x7f7c33f32260, obj=0x7f7c33f32260, res=1048576, res2=0}, {data=0x7f7c34090970, obj=0x7f7c34090970, res=1048576, res2=0}, {data=0x7f7c3404e9b0, obj=0x7f7c3404e9b0, res=1048576, res2=0}, {data=0x7f7c340c7b60, obj=0x7f7c340c7b60, res=1048576, res2=0}, {data=0x7f7c340929a0, obj=0x7f7c340929a0, res=983040, res2=0}, {data=0x7f7c3403c800, obj=0x7f7c3403c800, res=1048576, res2=0}, {data=0x7f7c33f96550, obj=0x7f7c33f96550, res=1048576, res2=0}, {data=0x7f7c3409aa60, obj=0x7f7c3409aa60, res=1048576, res2=0}, {data=0x7f7c34029300, obj=0x7f7c34029300, res=1048576, res2=0}], {tv_sec=600, tv_nsec=0}) = 14
0.001353 io_submit(0x7f7c33305000, 3, [{aio_data=0x7f7c33f50530, aio_lio_opcode=IOCB_CMD_PWRITE, aio_fildes=261, aio_buf="\2\242\0\0jP@\4\"\2226\0\0\0?\4\6\336\0\0\16\0\20\0\231\7\0\0\335\0tt"..., aio_nbytes=1048576, aio_offset=168640512}, {aio_data=0x7f7c33f450f0, aio_lio_opcode=IOCB_CMD_PWRITE, aio_fildes=261, aio_buf="\2\242\0\0\352P@\4[\2226\0\0\0u\4\270\221\0\0\16\0\20\0\231\7\0\0\335\0tt"..., aio_nbytes=1048576, aio_offset=169689088}, {aio_data=0x7f7c33f3e9f0, aio_lio_opcode=IOCB_CMD_PWRITE, aio_fildes=261, aio_buf="\2\242\0\0jQ@\4\225\2226\0\0\0u\4\221\336\0\0\16\0\20\0\231\7\0\0\336\0tt"..., aio_nbytes=131072, aio_offset=170737664}]) = 3
0.000207 io_getevents(0x7f7c33305000, 1, 128, [{data=0x7f7c33f50530, obj=0x7f7c33f50530, res=1048576, res2=0}], {tv_sec=600, tv_nsec=0}) = 1
0.002980 io_getevents(0x7f7c33305000, 1, 128, [{data=0x7f7c33f3e9f0, obj=0x7f7c33f3e9f0, res=131072, res2=0}, {data=0x7f7c33f450f0, obj=0x7f7c33f450f0, res=1048576, res2=0}], {tv_sec=600, tv_nsec=0}) = 2
Now set only asynch option
kIsH@Xhydra<>alter system set filesystemio_options=asynch scope=spfile;
System altered.
Insert is faster than ‘none‘ option but slower than ‘setall‘
kIsH@Xhydra<>create table async as select * from sys.async;
Table created.
Elapsed: 00:00:00.01
kIsH@Xhydra<>set timing on
kIsH@Xhydra<>begin
2 for i in 1 .. 1000000 loop
3 insert into async values('abc',i);
4 end loop;
5 commit;
6 end;
7 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:11.75
Even for asynch option, io_getevents and io_submit can be observed with reduced
[root@xhydra oracle]# egrep 'aio|io_' asyncio|tail -5
0.003120 io_getevents(0x7f61a0825000, 1, 128, [{data=0x7f61a15fdd70, obj=0x7f61a15fdd70, res=32768, res2=0}], {tv_sec=600, tv_nsec=0}) = 1
0.000067 io_submit(0x7f61a0825000, 1, [{aio_data=0x7f61a15fdd70, aio_lio_opcode=IOCB_CMD_PWRITE, aio_fildes=261, aio_buf="&\242\0\0\220\0@\4\212S8\0\0\0\1\0042\32\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., aio_nbytes=8192, aio_offset=1179648}]) = 1
0.003446 io_getevents(0x7f61a0825000, 1, 128, [{data=0x7f61a15fdd70, obj=0x7f61a15fdd70, res=8192, res2=0}], {tv_sec=600, tv_nsec=0}) = 1
0.000075 io_submit(0x7f61a0825000, 1, [{aio_data=0x7f61a15fdd70, aio_lio_opcode=IOCB_CMD_PWRITE, aio_fildes=261, aio_buf="\2\242\0\0,\34@\4EN8\0\0\0\1\4\353\27\0\0\f\0\33\0u\1\0\0l\0\17\17"..., aio_nbytes=24576, aio_offset=59080704}]) = 1
0.003171 io_getevents(0x7f61a0825000, 1, 128, [{data=0x7f61a15fdd70, obj=0x7f61a15fdd70, res=24576, res2=0}], {tv_sec=600, tv_nsec=0}) = 1
Similarly enabling direct IO
kIsH@Xhydra<>alter system set filesystemio_options=directio scope=spfile;
System altered.
This value has effect on the database layer rather than filesystem layer.
kIsH@Xhydra<>create table async as select * from sys.async;
Table created.
Elapsed: 00:00:00.22
kIsH@Xhydra<>set timing on
kIsH@Xhydra<>set timing on
kIsH@Xhydra<>begin
2 for i in 1 .. 1000000 loop
3 insert into async values('abc',i);
4 end loop;
5 commit;
6 end;
7 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:12.23
Again linux pwrite is used for direct IO with synchronous calls
[root@xhydra oracle]# egrep 'pwrite' asyncio
0.001275 pwrite64(260, "#\242\0\0\342@\301\1\22\3559\0\0\0\37\4\215\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192, 672940032) = 8192
0.000187 pwrite64(259, "\6\242\0\0c\2\300\0\276\3529\0\0\0\1\6\224\253\0\0\2\0\0\0\252\1\0\0\275\3529\0"..., 8192, 5005312) = 8192
0.002090 pwrite64(259, "\6\242\0\0\2524\301\0\276\3529\0\0\0\1\6Q\26\0\0\1\0\0\0\247\1\0\0\275\3529\0"..., 8192, 647315456) = 8192
0.001655 pwrite64(259, "\6\242\0\0k\2\300\0\276\3529\0\0\0\1\6\"C\0\0\2\0\0\0\253\1\0\0\275\3529\0"..., 8192, 5070848) = 8192