IMPDP PERFORMANCE SPEED UP IN DATAPUMP ORACLE
We all get pressure from application team for data migration of big tables and schemas which is hard time for us during business production hours.There should be some faster approach to save us in this situation 🙂 You can test the below methods and be a super hero oracle dba to your team.Let us see how!!
Always test anything before you mess up. Testing makes things clear!
Case study :There are 4 big tables to be exported from prod and imported to dev.
Database name – source – testdb
destination – test1db
version – 12.1.0
Total size of the four tables = 436GB
Prod schema: test
Dev schema: test1
1) Estimated time taken for traditional import = 13 hours
2) Importing tables separately ,followed by constraints and indexes took 7 hours
3)Firing the import of four tables parallely and their objects took 6 hours
i usually estimate parallelism based on the formula ===> ” DOP = ” parallel threads per CPU * number of cpu cores “ and with the total available CPU cycles
Take the export backup of full tables with the parameters with max parallel servers:
~]# vi test.par directory = export_dir dumpfile = exptest_%U.dmp logfile = exptest.log tables = test.big1,test.big2,test.big3,test.big4 filesize=10GB #this parameter will split the export dumpfiles to reduce single file I/O read and write. parallel = 4 -------> parallel = "parallel = "Total size of objects/CPU capacity" ~]# expdp user/password parfile=test.par
In this case , datapump take 1 hour for exporting the data.
Job “SYSTEM”.”SYS_EXPORT_TABLE_01″ successfully completed at Thu Dec 21 10:25:26 2018 elapsed 001:01:03
Importing the data with normal method:
directory=export_dir dumpfile=exptest_%U.dmp logfile=imp_speed.log tables=test.big1,test.big2,test.big3,test.big4 remap_schema=test:test1 remap_tablespace=prod1:dev1,prod2:dev2 parallel=4 logtime=all
Time taken for normal method:
nohup expdp username/password parfile=test.par
22-DEC-18 15:09:59.953: Job “SYSTEM”.”SYS_IMPORT_TABLE_02″ successfully completed at Thu Dec 22 15:09:59 2018 elapsed 0 14:11:03
1)Importing with faster approach:
~] vi test.par directory=export_dir dumpfile=expfast_%U.dmp logfile=impfast.log tables=test.big1,test.big2,test.big3,test.big4 remap_schema=test:test1 table_exists_action=replace parallel=4 logtime=all exclude=index,constraint,statistics,grant <===== access_method=direct_path <======= transform=disable_archive_logging:y <======= cluster=y <====== if you have RAC setup,then this parameter utilizes cluster resources to speed up the import. Always use this parameter for larger datasets as it causes cluster overhead nohup impdp username/password parfile=test.par
The above parameters which are highlighted in red are the major impact for the performance enhancement.
direct path access method insert the rows on top of the high water mark. Note that this parameter will not always be used by oracle due to certain limitations of table properties but mostly for all tables oracle automatically choose access method
disable archive logging parameter will not archive the redologs during import. If the database is in force_logging mode , then the parameter changes doesnot take any effect.If archivelogs are disabled,then there is not much impact of this parameter
Command for dumping DDL to script file:
---impdp username/password directory=default tables=test.big1,test.big2,test.big3,test.big4 sqlfile=script_ddl.sql include=constraint,index,grant
a script file is placed in your current directory named “script_ddl.sql”
Once the ddl are dumped just modify the ddl with higher DOP for index creation and enable novalidate on constraints in the script file. Edit the file in vi editor.Split the index, constraint and grant in separate files and run it.
Substitute “noparallel” or “parallel 1” with “parallel 16” on all ddl
The degree of parallel is calculated by Total size of objects / CPU capacity..
Calculating the number of dumpfiles. Note that oracle internally distributes the data across the dumpfiles. So if you specify substitution variables like %U, then there may be random number of dumpfiles.
kish@exdbx<>select 450/50 from dual; 450/50 ---------- 9
This means that depending on the number of tables we can use parallel servers which is 4 here!In datapump,no matter how many parallel server you use,oracle will utilize parallel degree per table count.If i have 4 tables,then using 16 parallel degree would make remaining 12 worker processes idle.
Note1: Degree of parallelism is dependent on the number of cores of CPU on your server.This server has 48 cores.If your server has less cores then you can use less parallel resource
Substitute “enable” to “enable novalidate” for the ddl command
Note2: ‘Enable novalidate’ can be used in case of emergency like business hours for faster import.After constraints are imported without validation,you can ‘enable validate’ leisurely so that there will not be application user complaint.
Eg: ~] vi script_ddl.sql
---set timing on; ---create index "test"."big1_ind" on "test"."big1"("row1","row2","row3" DESC) PCTFREE 10 INITRANS 2 MAXTRANS 255 STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "dev1" parallel 16;
---alter index "test"."big1_ind" parallel 16;
---alter table "test"."big1" add constraint "big1_pk" primary key("row11","row12","row13") using index "test"."big1_ind" enable novalidate; (By default, oracle validate constraints which take huge amount of time)<======
Note: Using novalidate clause means, oracle will not validate constraints only while import but in future if there are constraint violation, then they will be validated
use “sqlplus” to import the constraints and indexes
---nohup sqlplus username/password@script_ddl.sql
Time taken to import without constraints and indexes:
22-DEC-18 22:10:33.991: Job “SYSTEM”.”SYS_IMPORT_TABLE_02″ successfully completed at Thu Dec 22 22:10:33 2018 elapsed 0 00:48:04
Time until index and constraint validated:
Table data without metadata like constraints and indexes get import faster.
Major time spent on validating constraints and importing indexes. we reduce the time by not validating the constraints and creating index with parallelism.
elapsed time: 06:30:00 hours
For faster import,we can enable novalidate to import first and then we can validate constraint.
After the process, validate constraints by replacing “enable validate” on the script_ddl.sql file if there are performance issues.
This approach saved 9 hours from the previous method.
2)Again do the import process by firing the import parallely for four tables with four putty sessions at the same time to parallelize the operation manually.Follow the same process 1) but split the tables.
this approach reduced 30 mins from previous method.
elapsed time: 06:01:00 hours
The methods shown are tested in a different environment.The performance varies based on the configuration of the system.There is no guarantee of speed.Please test it in POC before refreshing in production or development.
If you have big schemas,then follow the same approach with high parallel servers and make sure to split the import jobs with multiple chunks of dumpfile to make parallel more effective
eg: if a schema SCOTT has 50 tables,then you can follow the same method which has been used for 4 tables.But parallel should be equal to or less than 50 (or 48 or 32 …) based the CPU power you have.
If out of 50 tables only ten tables are big and remaining 45 tables are considerably small,then start import of 5 big tables separately with 5 different putty sessions excluding index,constraints and statistics and parallely import 45 small tables by schema import excluding the 5 big tables with 48 parallel degree without excluding index,constraint and statistics so that you save time and dont fall into serialization trap!
Init parameters to refine before fresh migration:
Processes and sessions parameter should be set appropriately to avoid ORA-0020
Flashback and force logging can be switch off if enabled
db_block_checksum and db_block_checking are datablock checking parameters for block validation and block corruption. These can be set to off to bypass the block validation.
disk_asynch_io should be enabled to encourage asynchronous IO from OS level
parallel_min_servers and parallel_max_servers are parameters which set soft and hard limit for parallelism. Set it wisely
parallel_degree_policy can be set to AUTO if tons of CPU is available but still oracle not use parallel
12c feature _optimizer_gather_stats_on_load should be disabled before load to avoid auto statistics gathering and enable after load
aq_tm_process should be set to a higher value to control the message queue of the jobs. Expdp/Impdp totally depends on AQ processing.
_memory_broker_stat_interval should be increased to 999 to control auto sga resize operations
SQL> select name,value from v$parameter where name in ('db_block_checking','db_block_checksum','sessions','processes','disk_asynch_io','parallel_max_servers','parallel_min_servers','parallel_degree_policy','undo_retention'); NAME VALUE -------------------- -------------------- processes 500 sessions 772 disk_asynch_io TRUE db_block_checksum TYPICAL undo_retention 900 db_block_checking FALSE parallel_min_servers 4 parallel_max_servers 40 parallel_degree_poli MANUAL cy 9 rows selected.
THANK YOU !! Hope you enjoyed the post