We all get pressure from application team for data migration of big tables and schemas which is hard time for us during business production hours.There should be some faster approach to save us in this situation 🙂 You can test the below methods and be a super hero oracle dba to your team.Let us see how!!

Always test anything before you mess up. Testing makes things clear!

Case study :There are 4 big tables to be exported from prod and imported to dev.

Database name – source – testdb

destination – test1db

version – 12.1.0

Total size of the four tables = 436GB

Tables: big1,big2,big3,big4

Prod schema: test

Dev schema: test1

1) Estimated time taken for traditional import = 13 hours

2) Importing tables separately ,followed by constraints and indexes took 7 hours

3)Firing the import of four tables parallely and their objects took 6 hours

i usually estimate parallelism based on the formula ===> ” DOP = ” parallel threads per CPU * number of cpu cores “ and with the total available CPU cycles

Take the export backup of full tables with the parameters with max parallel servers:

~]# vi test.par 
directory = export_dir 
dumpfile = exptest_%U.dmp 
logfile = exptest.log 
tables = test.big1,test.big2,test.big3,test.big4
filesize=10GB #this parameter will split the export dumpfiles to reduce single file I/O read and write.
parallel = 4  -------> parallel = "parallel = "Total size of objects/CPU capacity"
~]# expdp user/password parfile=test.par 

In this case , datapump take 1 hour for exporting the data.

Job “SYSTEM”.”SYS_EXPORT_TABLE_01″ successfully completed at Thu Dec 21 10:25:26 2018 elapsed 001:01:03

Importing the data with normal method:

~]vi test.par



Time taken for normal method:

nohup expdp username/password parfile=test.par

22-DEC-18 15:09:59.953: Job “SYSTEM”.”SYS_IMPORT_TABLE_02″ successfully completed at Thu Dec 22 15:09:59 2018 elapsed 0 14:11:03

1)Importing with faster approach:


~] vi test.par
exclude=index,constraint,statistics,grant  <=====
access_method=direct_path <=======
transform=disable_archive_logging:y <=======
cluster=y <====== if you have RAC setup,then this parameter utilizes cluster resources to speed up the import. Always use this parameter for larger datasets as it causes cluster overhead

nohup impdp username/password parfile=test.par

The above parameters which are highlighted in red are the major impact for the performance enhancement.

direct path access method insert the rows on top of the high water mark. Note that this parameter will not always be used by oracle due to certain limitations of table properties but mostly for all tables oracle automatically choose access method

disable archive logging parameter will not archive the redologs during import. If the database is in force_logging mode , then the parameter changes doesnot take any effect.If archivelogs are disabled,then there is not much impact of this parameter

Command for dumping DDL to script file:

---impdp username/password directory=default tables=test.big1,test.big2,test.big3,test.big4 sqlfile=script_ddl.sql include=constraint,index,grant

a script file is placed in your current directory named “script_ddl.sql”

Once the ddl are dumped just modify the ddl with higher DOP for index creation and enable novalidate on constraints in the script file. Edit the file in vi editor.Split the index, constraint and grant in separate files and run it.

Substitute “noparallel” or “parallel 1” with “parallel 16” on all ddl

The degree of parallel is calculated by Total size of objects / CPU capacity..

Calculating the number of dumpfiles. Note that oracle internally distributes the data across the dumpfiles. So if you specify substitution variables like %U, then there may be random number of dumpfiles.

kish@exdbx<>select 450/50 from dual;


This means that depending on the number of tables we can use parallel servers which is 4 here!In datapump,no matter how many parallel server you use,oracle will utilize parallel degree per table count.If i have 4 tables,then using 16 parallel degree would make remaining 12 worker processes idle.

Note1: Degree of parallelism is dependent on the number of cores of CPU on your server.This server has 48 cores.If your server has less cores then you can use less parallel resource

Substitute “enable” to “enable novalidate” for the ddl command

Note2: ‘Enable novalidate’ can be used in case of emergency like business hours for faster import.After constraints are imported without validation,you can ‘enable validate’ leisurely so that there will not be application user complaint.

Eg: ~] vi script_ddl.sql

---set timing on;
---alter index "test"."big1_ind" parallel 16; 
---alter table "test"."big1" add constraint "big1_pk" primary key("row11","row12","row13") using index "test"."big1_ind" enable novalidate; (By default, oracle validate constraints which take huge amount of time)<======

Note: Using novalidate clause means, oracle will not validate constraints only while import but in future if there are constraint violation, then they will be validated

use “sqlplus” to import the constraints and indexes

---nohup sqlplus username/password@script_ddl.sql

Time taken to import without constraints and indexes:

22-DEC-18 22:10:33.991: Job “SYSTEM”.”SYS_IMPORT_TABLE_02″ successfully completed at Thu Dec 22 22:10:33 2018 elapsed 0 00:48:04

Time until index and constraint validated:

Table data without metadata like constraints and indexes get import faster.

Major time spent on validating constraints and importing indexes. we reduce the time by not validating the constraints and creating index with parallelism.

index altered

elapsed time: 06:30:00 hours

For faster import,we can enable novalidate to import first and then we can validate constraint.

After the process, validate constraints by replacing “enable validate” on the script_ddl.sql file if there are performance issues.

This approach saved 9 hours from the previous method.

2)Again do the import process by firing the import parallely for four tables with four putty sessions at the same time to parallelize the operation manually.Follow the same process 1) but split the tables.

session 1

impdp tables=big1

session 2

impdp tables=big2

session 3

impdp tables=big3

session 4

impdp tables=big4

this approach reduced 30 mins from previous method.

elapsed time: 06:01:00 hours

The methods shown are tested in a different environment.The performance varies based on the configuration of the system.There is no guarantee of speed.Please test it in POC before refreshing in production or development.

If you have big schemas,then follow the same approach with high parallel servers and make sure to split the import jobs with multiple chunks of dumpfile to make parallel more effective

eg: if a schema SCOTT has 50 tables,then you can follow the same method which has been used for 4 tables.But parallel should be equal to or less than 50 (or 48 or 32 …) based the CPU power you have.

If out of 50 tables only ten tables are big and remaining 45 tables are considerably small,then start import of 5 big tables separately with 5 different putty sessions excluding index,constraints and statistics and parallely import 45 small tables by schema import excluding the 5 big tables with 48 parallel degree without excluding index,constraint and statistics so that you save time and dont fall into serialization trap!

Init parameters to refine before fresh migration:

Processes and sessions parameter should be set appropriately to avoid ORA-0020

Flashback and force logging can be switch off if enabled

db_block_checksum and db_block_checking are datablock checking parameters for block validation and block corruption. These can be set to off to bypass the block validation.

disk_asynch_io should be enabled to encourage asynchronous IO from OS level

parallel_min_servers and parallel_max_servers are parameters which set soft and hard limit for parallelism. Set it wisely

parallel_degree_policy can be set to AUTO if tons of CPU is available but still oracle not use parallel

12c feature _optimizer_gather_stats_on_load should be disabled before load to avoid auto statistics gathering and enable after load

aq_tm_process should be set to a higher value to control the message queue of the jobs. Expdp/Impdp totally depends on AQ processing.

_memory_broker_stat_interval should be increased to 999 to control auto sga resize operations

SQL> select name,value from v$parameter where name in ('db_block_checking','db_block_checksum','sessions','processes','disk_asynch_io','parallel_max_servers','parallel_min_servers','parallel_degree_policy','undo_retention');

NAME                 VALUE
-------------------- --------------------
processes            500
sessions             772
disk_asynch_io       TRUE
db_block_checksum    TYPICAL
undo_retention       900
db_block_checking    FALSE
parallel_min_servers 4
parallel_max_servers 40
parallel_degree_poli MANUAL

9 rows selected.

THANK YOU !! Hope you enjoyed the post

1 Comment

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s