COMPRESSION ALGORITHMS (BZIP2 VS GZIP)
BZIP2 – lossless, released 1996, slow, usually higher compression ratio, more CPU bound
GZIP – lossless, released 1992, fast, usually lower compression ratio, less CPU bound
BZIP2 compression algorithm:
bzip2 is a slow, lossless compression algorithm: no data is lost across compression and decompression. It usually achieves a higher compression ratio than gzip, but not always. The advantage can disappear on some inputs, such as the large tar file in the second example below, most likely because its contents are already compressed or otherwise hard to compress.
Below is an example which shows the high compression ratio of bzip2: an 81M dump file shrinks to 20M.
[root@exdbadm01 oracle]# du -sh test.dmp
81M test.dmp
[root@exdbadm01 oracle]# du -sh test.dmp.bz2
20M test.dmp.bz2
[root@exdbadm01 oracle]# time bzip2 -d test.dmp.bz2
real 0m23.836s
user 0m18.273s
sys 0m1.120s
Below is an example of compressing a tar file where bzip2 is not effective: 2.5G shrinks only to 2.4G after more than 19 minutes of work.
[oracle@exdbadm01 data]$ du -sh database.tar
2.5G database.tar
[oracle@exdbadm01 data]$ time bzip2 -z database.tar
real 19m31.744s
user 16m46.855s
sys 0m18.529s
[oracle@exdbadm01 data]$ du -sh database.tar.bz2
2.4G database.tar.bz2
[oracle@exdbadm01 ~]$ time bzip2 -d database.tar.bz2
real 10m8.992s
user 8m28.808s
sys 0m22.101s
[root@exdbadm01 oracle]# du -b database.tar.bz2
2555887227 database.tar.bz2
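Before spending 19 minutes on a file like this, it can help to estimate how compressible it is from a small sample. The following is a minimal sketch, not a definitive method: sample_pct is a hypothetical helper name, the 1 MiB default sample size is an arbitrary assumption, and the first bytes of a file are not always representative of the rest.

```shell
# sample_pct FILE [BYTES]
# Compresses the first BYTES bytes of FILE with gzip and prints the compressed
# size as a percentage of the sample. Values near 100 suggest the data is
# already compressed or otherwise hard to compress.
sample_pct() {
    sample_file=$1
    sample_bytes=${2:-1048576}   # assumed default: 1 MiB
    in_bytes=$(head -c "$sample_bytes" "$sample_file" | wc -c)
    out_bytes=$(head -c "$sample_bytes" "$sample_file" | gzip -c | wc -c)
    LC_ALL=C awk -v i="$in_bytes" -v o="$out_bytes" 'BEGIN {
        if (i == 0) { print "0.0"; exit }   # empty input: nothing to measure
        printf "%.1f\n", 100 * o / i
    }'
}

# Example call (file name taken from the transcript above):
# sample_pct database.tar
```

A result close to 100 would be consistent with the 2.5G to 2.4G outcome above, where neither compressor has much redundancy to remove.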
bzip2 CPU consumption is high, at about 88% CPU while decompressing database.tar.bz2:
[oracle@exdbadm01 ~]$ ps -ef|grep bzip2
root 14408 3138 0 21:41 pts/0 00:00:04 bzip2 -z test.dmp
root 22188 3138 0 23:40 pts/0 00:00:00 bzip2 -z test.dmp
root 22291 3138 88 23:42 pts/0 00:06:48 bzip2 -d database.tar.bz2
oracle 22759 5673 0 23:49 pts/1 00:00:00 grep bzip2
[oracle@exdbadm01 ~]$ ps -eo user,pcpu,pid |grep 22291
root 88.1 22291
[oracle@exdbadm01 ~]$ ps -eo user,pcpu,pid |grep 22291
root 88.0 22291
GZIP compression algorithm:
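gzip is a fast, lossless compression algorithm that is widely used for file compression. Compared to bzip2, it usually gives a lower compression ratio. On the same 81M test.dmp it produces a 29M file, against 20M for bzip2, but it compresses in about 11 seconds and decompresses in about 6 seconds, against about 24 seconds for bzip2 decompression.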
[root@exdbadm01 oracle]# time gzip test.dmp
real 0m11.136s
user 0m9.849s
sys 0m0.228s
[root@exdbadm01 oracle]# du -sh test.dmp.gz
29M test.dmp.gz
[root@exdbadm01 oracle]# time gzip -d test.dmp.gz
real 0m6.080s
user 0m1.436s
sys 0m0.264s
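To put the two test.dmp results side by side, the sizes reported by du can be turned into percentages of the original. This is a minimal sketch: pct_of_original is a hypothetical helper name, and the inputs are the rounded du -sh figures in MB, so the results are approximate.

```shell
# pct_of_original ORIGINAL COMPRESSED
# Prints the compressed size as a percentage of the original (one decimal).
pct_of_original() {
    LC_ALL=C awk -v o="$1" -v c="$2" 'BEGIN { printf "%.1f\n", 100 * c / o }'
}

pct_of_original 81 20   # test.dmp.bz2 relative to test.dmp
pct_of_original 81 29   # test.dmp.gz relative to test.dmp
```

On these figures bzip2 leaves roughly a quarter of the original size and gzip a little over a third, which matches the usual expectation for the two tools.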
Below is an example of gzip on the same tar file, where it beats bzip2 on speed and, marginally, on compressed size. This is surprising and shows that bzip2 does not always produce the smaller file.
[root@exdbadm01 oracle]# du -sh database.tar
2.5G database.tar
[root@exdbadm01 oracle]# time gzip database.tar
real 5m10.489s
user 3m48.818s
sys 0m12.417s
[root@exdbadm01 oracle]# du -sh database.tar.gz
2.4G database.tar.gz
[root@exdbadm01 oracle]# time gzip -d database.tar.gz
real 3m3.695s
user 0m29.742s
sys 0m10.673s
[root@exdbadm01 oracle]# du -b database.tar.gz
2546273526 database.tar.gz
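The two du -b outputs allow an exact comparison, since du -sh rounds both files to 2.4G. A minimal sketch using the byte counts from the transcripts; size_gap is a hypothetical helper name.

```shell
# Byte counts copied from the du -b outputs in this post.
bz2_bytes=2555887227   # database.tar.bz2
gz_bytes=2546273526    # database.tar.gz

# size_gap LARGER SMALLER
# Prints the difference in bytes and as a percentage of the larger file.
size_gap() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN {
        printf "%d bytes (%.2f%%)\n", a - b, 100 * (a - b) / a
    }'
}

size_gap "$bz2_bytes" "$gz_bytes"
```

The gzip output is smaller by about 9.6 million bytes, which is well under half a percent of the file, so on this input the size difference is marginal and the time difference is what matters.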
gzip uses less CPU than bzip2: about 74% CPU while compressing database.tar, against about 88% for the bzip2 decompression shown earlier.
[oracle@exdbadm01 ~]$ ps -ef|grep gzip
root 16978 3138 0 22:20 pts/0 00:00:00 gzip -d test.dmp.gz
root 22862 3138 73 23:50 pts/0 00:01:00 gzip database.tar
oracle 22951 5673 0 23:52 pts/1 00:00:00 grep gzip
[oracle@exdbadm01 ~]$ ps -eo user,pcpu,pid |grep 22862
root 73.9 22862
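The ps figures can be cross-checked against the time outputs, since (user + sys) / real gives the average CPU share over the whole run. A minimal sketch; to_seconds and cpu_pct are hypothetical helper names, and the parser only assumes the XmY.ZZZs format that time prints in the transcripts.

```shell
# to_seconds converts a time-style value such as 10m8.992s into seconds.
to_seconds() {
    LC_ALL=C awk -v t="$1" 'BEGIN {
        split(t, p, "m")        # p[1] = minutes, p[2] = seconds with trailing "s"
        sub(/s$/, "", p[2])
        printf "%.3f\n", p[1] * 60 + p[2]
    }'
}

# cpu_pct REAL USER SYS prints (user + sys) / real as a percentage.
cpu_pct() {
    r=$(to_seconds "$1"); u=$(to_seconds "$2"); s=$(to_seconds "$3")
    LC_ALL=C awk -v r="$r" -v u="$u" -v s="$s" 'BEGIN { printf "%.1f\n", 100 * (u + s) / r }'
}

cpu_pct 10m8.992s 8m28.808s 0m22.101s   # bzip2 -d database.tar.bz2
cpu_pct 5m10.489s 3m48.818s 0m12.417s   # gzip database.tar
cpu_pct 3m3.695s 0m29.742s 0m10.673s    # gzip -d database.tar.gz
```

The first two results land close to the 88% and 74% sampled with ps. The third is much lower, which suggests that gzip decompression of this file spent most of its wall-clock time waiting on I/O rather than on the CPU.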
tar can also be used, but by itself it only archives files and does not compress them, so the archive is no smaller than the input:
[root@exdbadm01 oracle]# time tar -cvf test.dmp.tar test.dmp >/dev/null 2>&1
real 0m0.282s
user 0m0.012s
sys 0m0.268s
[root@exdbadm01 oracle]# du -sh test.dmp.tar
81M test.dmp.tar
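Although tar does not compress on its own, it can pass the archive through gzip or bzip2 in the same step with the -z and -j options. A minimal sketch; archive_gz and archive_bz2 are hypothetical wrapper names, and the output file names in the usage comments are only examples.

```shell
# archive_gz SOURCE OUTPUT: create a gzip-compressed tar archive (tar -z).
archive_gz() {
    tar -czf "$2" "$1"
}

# archive_bz2 SOURCE OUTPUT: create a bzip2-compressed tar archive (tar -j).
archive_bz2() {
    tar -cjf "$2" "$1"
}

# Example calls (file names are illustrative):
# archive_gz  test.dmp test.dmp.tar.gz
# archive_bz2 test.dmp test.dmp.tar.bz2
```

The same trade-off measured above applies here: -z should finish sooner, while -j will usually, but not always, give the smaller archive.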