We all get pressure from the application team to migrate big tables and schemas, and during production business hours that is a hard time for us. There should be a faster approach to save us in this situation 🙂 You can test the methods below and be a superhero Oracle DBA for your team. Let us see how!!
Case study: There are 4 big tables to be exported from prod and imported into dev.
Source database – testdb
Destination database – test1db
Version – 12.1.0
Total size of the four tables – 436GB
Prod schema – test
Dev schema – test1
1) Estimated time for a traditional import: 13 hours
2) Importing the tables separately, followed by constraints and indexes: 7 hours
3) Firing the import of the four tables and their objects in parallel: 6 hours
I usually estimate parallelism with the formula parallel = total size of objects / filesize (i.e., the total number of dumpfiles), bounded by the total available CPU cycles.
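To make that concrete with this case study's numbers (436GB at a 10GB filesize), here is a quick worked example; the prompt is just my SQL*Plus session:

kish@exdbx<>select ceil(436/10) "DUMPFILES" from dual;

 DUMPFILES
----------
        44

Roughly 44 dumpfiles, but because this is a table-mode job on only 4 tables, the effective degree is capped at 4 (see the note on per-table workers later in this post).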
Take an export backup of the full tables, with the parameters below and the maximum parallel servers:
~]# vi test.par

directory=export_dir
dumpfile=exptest_%U.dmp
logfile=exptest.log
tables=test.big1,test.big2,test.big3,test.big4
filesize=10GB    # splits the export into multiple dumpfiles to reduce single-file I/O
parallel=4       # parallel = total size of objects / filesize, capped by the table count

~]# expdp user/password parfile=test.par
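While the job runs, you can watch it from another session. A minimal monitoring sketch against the standard DBA_DATAPUMP_JOBS view (the job name will match the one printed in your log):

select owner_name, job_name, operation, state, degree
from dba_datapump_jobs;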
In this case, Data Pump took 1 hour to export the data.
Job "SYSTEM"."SYS_EXPORT_TABLE_01" successfully completed at Thu Dec 21 10:25:26 2018 elapsed 0 01:01:03
Importing the data with the normal method:
directory=export_dir
dumpfile=exptest_%U.dmp
logfile=imp_speed.log
tables=test.big1,test.big2,test.big3,test.big4
remap_schema=test:test1
remap_tablespace=prod1:dev1,prod2:dev2
parallel=4
logtime=all
~]# nohup impdp username/password parfile=test.par &

Time taken for the normal method:
22-DEC-18 15:09:59.953: Job "SYSTEM"."SYS_IMPORT_TABLE_02" successfully completed at Thu Dec 22 15:09:59 2018 elapsed 0 14:11:03
1) Importing with the faster approach:
~]# vi test.par

directory=export_dir
dumpfile=expfast_%U.dmp
logfile=impfast.log
tables=test.big1,test.big2,test.big3,test.big4
remap_schema=test:test1
table_exists_action=replace
parallel=4
logtime=all
exclude=index,constraint,statistics,grant    <=====
access_method=direct_path                    <=====
transform=disable_archive_logging:y          <=====
cluster=y                                    <===== if you have a RAC setup, this parameter utilizes cluster resources to speed up the import

~]# nohup impdp username/password parfile=test.par &
The parameters marked with arrows above have the major impact on the performance enhancement.
The direct_path access method inserts rows above the high water mark. Note that Oracle will not always use this method, due to certain limitations on table properties, but for most tables Oracle chooses the access method automatically.
The disable_archive_logging transform suppresses redo logging while the data is imported. If the database is in force_logging mode, the transform takes no effect. If archiving is already disabled, the parameter does not have much impact either.
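Before counting on that transform, it is worth checking the database's logging state first. A quick sanity check against standard v$database columns:

select log_mode, force_logging from v$database;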
Command for dumping the DDL to a script file:
impdp username/password directory=export_dir dumpfile=exptest_%U.dmp tables=test.big1,test.big2,test.big3,test.big4 sqlfile=script_ddl.sql include=constraint,index,grant
A script file named "script_ddl.sql" is written to the Data Pump directory.
Once the DDL is dumped, just modify it: raise the DOP for index creation and set enable novalidate on the constraints in the script file. Edit the file in the vi editor, split the index, constraint, and grant statements into separate files, and run them.
Substitute "noparallel" or "parallel 1" with "parallel 16" in all the DDL (a sed sketch follows).
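For a large sqlfile, a one-liner beats hand-editing. A hedged sed sketch; check the case and spacing your sqlfile actually uses before running it, and keep a backup:

cp script_ddl.sql script_ddl.sql.bak
sed -i 's/NOPARALLEL/PARALLEL 16/g; s/PARALLEL 1$/PARALLEL 16/g' script_ddl.sql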
The degree of parallelism is calculated as total size of objects / filesize, which gives the number of dumpfiles. E.g., with a total size of 450GB and a filesize of 50GB, the number of dumpfiles is:
kish@exdbx<>select 450/50 from dual;

    450/50
----------
         9
This means that, depending on the number of tables, we can use that many parallel servers, which is 4 here! In Data Pump, no matter how many parallel servers you request, Oracle utilizes the parallel degree per table count. If I have 4 tables, then using a parallel degree of 16 would leave the remaining 12 worker processes idle.
Note 1: The degree of parallelism also depends on the number of CPU cores on your server. This server has 48 cores; if your server has fewer cores, allocate fewer parallel resources.
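To check how many CPUs the instance itself sees, query the standard cpu_count parameter:

select name, value from v$parameter where name = 'cpu_count';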
Substitute "enable" with "enable novalidate" in the constraint DDL (see the sketch after the note below).
Note 2: 'enable novalidate' can be used in an emergency, such as a faster import during business hours. After the constraints are imported without validation, you can 'enable validate' them at leisure, so there will be no application user complaints.
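As with the parallel change, the substitution can be scripted. A hedged sed sketch, assuming the sqlfile ends each constraint statement with the uppercase keyword; verify against your file first:

sed -i 's/ENABLE;/ENABLE NOVALIDATE;/g' script_ddl.sql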
Eg: ~] vi script_ddl.sql
set timing on;

create index "test"."big1_ind" on "test"."big1" ("row1","row2","row3" DESC)
PCTFREE 10 INITRANS 2 MAXTRANS 255
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT
FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "dev1" parallel 16;
alter index "test"."big1_ind" parallel 16;
alter table "test"."big1" add constraint "big1_pk" primary key ("row11","row12","row13") using index "test"."big1_ind" enable novalidate;    <===== (by default Oracle validates constraints, which takes a huge amount of time)
Use sqlplus to create the constraints and indexes:
nohup sqlplus username/password @script_ddl.sql &
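Since the DDL was split into separate files earlier, the pieces can also run as concurrent background sessions. A sketch with hypothetical file names standing in for however you split script_ddl.sql; let the index session finish first if the constraints reference the indexes:

nohup sqlplus username/password @index_ddl.sql > index_ddl.out 2>&1 &
nohup sqlplus username/password @constraint_ddl.sql > constraint_ddl.out 2>&1 &
nohup sqlplus username/password @grant_ddl.sql > grant_ddl.out 2>&1 &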
Time taken to import without constraints and indexes:
22-DEC-18 22:10:33.991: Job "SYSTEM"."SYS_IMPORT_TABLE_02" successfully completed at Thu Dec 22 22:10:33 2018 elapsed 0 00:48:04
Time until the indexes were created and the constraints validated:

Table data imports faster without metadata such as constraints and indexes. The major time is spent on validating constraints and building indexes; we reduce it by not validating the constraints and by creating the indexes with parallelism.
elapsed time: 06:30:00 hours
For a faster import, we create the constraints with enable novalidate first, and then we can validate them.
After the process, validate the constraints by replacing "enable novalidate" with "enable validate" in the script_ddl.sql file and re-running it, at a time when it will not cause performance issues.
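The dictionary shows what still needs validating. A sketch against the standard DBA_CONSTRAINTS view, followed by the per-constraint statement (the schema and names are this post's examples):

select owner, table_name, constraint_name
from dba_constraints
where validated = 'NOT VALIDATED'
and owner = 'TEST1';

alter table "test1"."big1" enable validate constraint "big1_pk";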
This approach saved 9 hours compared with the previous method.
2) Alternatively, do the import by firing the imports of the four tables at the same time from four PuTTY sessions, to parallelize the operation manually. Follow the same process as in 1), but split the tables across the sessions, one parfile per table, as sketched below.
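A sketch of the manual split; each session gets its own parfile differing only in table name and log file (big1.par shown, big2.par to big4.par analogous). Multiple impdp jobs reading the same dump set concurrently is assumed here, so test it in your POC first:

~]# vi big1.par

directory=export_dir
dumpfile=exptest_%U.dmp
logfile=impfast_big1.log
tables=test.big1
remap_schema=test:test1
table_exists_action=replace
exclude=index,constraint,statistics,grant
transform=disable_archive_logging:y

~]# nohup impdp username/password parfile=big1.par &    # session 1 of 4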
This approach shaved another 30 minutes off the previous method.
elapsed time: 06:01:00 hours
The methods shown were tested in a different environment. Performance varies based on the configuration of the system, and there is no guarantee of speed. Please test it in a POC before refreshing production or development.
If you have big schemas, then follow the same approach with high parallel servers, and make sure to split the dump into multiple chunks of dumpfiles to make parallel more effective.
E.g., if a schema SCOTT has 50 tables, then you can follow the same method used for the 4 tables, but parallel should be equal to or less than 50 (or 48, or 32, ...) based on the CPU power you have.
If, out of the 50 tables, only 5 are big and the remaining 45 are considerably small, then start the import of the 5 big tables separately in 5 different PuTTY sessions, excluding indexes, constraints, and statistics, and in parallel import the 45 small tables with a schema import that excludes the 5 big tables, using 48 parallel degree and without excluding indexes, constraints, and statistics. That way you save time and don't fall into the serialization trap! A sketch follows.
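A hedged sketch of that split, with hypothetical table names; the EXCLUDE=TABLE filter is standard Data Pump syntax, but the quoting is far easier inside a parfile than on the command line:

~]# vi small_tables.par

directory=export_dir
dumpfile=expscott_%U.dmp
logfile=imp_small.log
schemas=scott
exclude=TABLE:"IN ('BIG1','BIG2','BIG3','BIG4','BIG5')"
parallel=48

~]# nohup impdp username/password parfile=small_tables.par &
# plus one session per big table, each with its own parfile as in the earlier sketch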
THANK YOU !! Hope you enjoyed the post