COMPRESSION ALGORITHMS (BZIP2 VS GZIP)

BZIP2 – lossless, slower, usually higher compression ratio, more CPU-intensive

GZIP – lossless, faster, usually lower compression ratio, less CPU-intensive, more widely used

BZIP2 compression algorithm:

bzip2 is a lossless compressor: decompressing its output returns exactly the original data. It is slower than gzip but typically achieves a higher compression ratio. That advantage is not guaranteed, though. On the large tar file tested further down, bzip2 saves almost nothing.
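A quick way to confirm the lossless property is to compress, decompress, and compare the bytes. The sketch below is only an illustration and was not part of the original test run: roundtrip_ok is a hypothetical helper name, and the sample file is throwaway data rather than test.dmp.

```shell
# roundtrip_ok TOOL FILE: succeed if FILE survives compress + decompress unchanged.
# TOOL is expected to be gzip or bzip2 (both accept -c and -d).
roundtrip_ok() {
    "$1" -c "$2" | "$1" -dc | cmp -s - "$2"
}

# Demo on a throwaway file (hypothetical sample data, not the dump used here).
sample=$(mktemp)
printf 'lossless means every byte comes back\n' > "$sample"
for tool in gzip bzip2; do
    # Skip a tool that is not installed on this machine.
    command -v "$tool" >/dev/null 2>&1 || continue
    if roundtrip_ok "$tool" "$sample"; then
        echo "$tool: round trip is byte-identical"
    else
        echo "$tool: round trip differs"
    fi
done
rm -f "$sample"
```

Nothing is written next to the original file, because both tools stream to stdout with -c.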

The example below shows bzip2 at its best: an 81 MB dump file compressed to 20 MB, which takes about 24 seconds to decompress.

[root@exdbadm01 oracle]# du -sh test.dmp
81M     test.dmp

[root@exdbadm01 oracle]# du -sh test.dmp.bz2
20M     test.dmp.bz2

[root@exdbadm01 oracle]# time bzip2 -d test.dmp.bz2

real    0m23.836s
user    0m18.273s
sys     0m1.120s

Below is a tar file on which bzip2 is not effective. After more than 19 minutes of compression, the 2.5 GB archive shrinks only to 2.4 GB.

[oracle@exdbadm01 data]$ du -sh database.tar
2.5G    database.tar

[oracle@exdbadm01 data]$ time bzip2 -z database.tar

real    19m31.744s
user    16m46.855s
sys     0m18.529s

[oracle@exdbadm01 data]$ du -sh database.tar.bz2
2.4G    database.tar.bz2

[oracle@exdbadm01 ~]$ time bzip2 -d database.tar.bz2

real    10m8.992s
user    8m28.808s
sys     0m22.101s


[root@exdbadm01 oracle]# du -b database.tar.bz2
2555887227      database.tar.bz2
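The post does not say what database.tar contains, so the cause of the poor ratio is not known. One common cause (an assumption here, not a finding from this test) is input that is already compressed or otherwise close to random. The sketch below compares repetitive text with random bytes to show how much the input matters. compressed_size is a hypothetical helper, and both sample files are generated on the spot.

```shell
# compressed_size TOOL FILE: print the byte count of FILE after compression by TOOL.
compressed_size() {
    "$1" -c "$2" | wc -c | tr -d ' '
}

work=$(mktemp -d)
# 1 MiB of repetitive text (highly compressible).
yes 'the same line again and again' | head -c 1048576 > "$work/repetitive.txt"
# 1 MiB of random bytes, standing in for data that is already compressed.
head -c 1048576 /dev/urandom > "$work/random.bin"

for f in "$work/repetitive.txt" "$work/random.bin"; do
    for tool in gzip bzip2; do
        # Skip a tool that is not installed on this machine.
        command -v "$tool" >/dev/null 2>&1 || continue
        echo "$tool $(basename "$f"): $(compressed_size "$tool" "$f") bytes from 1048576"
    done
done
rm -rf "$work"
```

The repetitive file should shrink to a tiny fraction of its size, while the random file should come out at roughly its original size with either tool.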

bzip2 is CPU-heavy. While decompressing database.tar.bz2, ps reports the process at about 88% CPU:

[oracle@exdbadm01 ~]$ ps -ef|grep bzip2
root     14408  3138  0 21:41 pts/0    00:00:04 bzip2 -z test.dmp
root     22188  3138  0 23:40 pts/0    00:00:00 bzip2 -z test.dmp
root     22291  3138 88 23:42 pts/0    00:06:48 bzip2 -d database.tar.bz2
oracle   22759  5673  0 23:49 pts/1    00:00:00 grep bzip2
[oracle@exdbadm01 ~]$ ps -eo user,pcpu,pid |grep 22291
root     88.1 22291
[oracle@exdbadm01 ~]$ ps -eo user,pcpu,pid |grep 22291
root     88.0 22291

GZIP compression algorithm:

gzip is a fast, widely used lossless compressor. Compared to bzip2, it usually achieves a lower compression ratio. On the same 81 MB test.dmp it produces a 29 MB file (bzip2: 20 MB), but it compresses in about 11 seconds and decompresses in about 6.

[root@exdbadm01 oracle]# time gzip test.dmp

real    0m11.136s
user    0m9.849s
sys     0m0.228s

[root@exdbadm01 oracle]# du -sh test.dmp.gz
29M     test.dmp.gz


[root@exdbadm01 oracle]# time gzip -d test.dmp.gz

real    0m6.080s
user    0m1.436s
sys     0m0.264s
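To put the two test.dmp results side by side, the sizes reported by du can be turned into a percentage of space saved. This is a small sketch: space_saved is a hypothetical helper, and the inputs are the rounded megabyte figures from du -sh, so the results are approximate.

```shell
# space_saved ORIGINAL COMPRESSED: print the percentage of space saved.
# Both arguments must use the same unit (here: MB as reported by du -sh).
space_saved() {
    awk -v o="$1" -v c="$2" 'BEGIN { printf "%.1f\n", (1 - c / o) * 100 }'
}

# test.dmp is 81M; bzip2 produced 20M and gzip produced 29M.
echo "bzip2 saved $(space_saved 81 20)%"
echo "gzip saved $(space_saved 81 29)%"
```

On this file bzip2 saves roughly three quarters of the space and gzip roughly two thirds.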

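Below is gzip on the 2.5 GB database.tar. It is much faster than bzip2 (about 5 minutes to compress versus more than 19), and its output is even slightly smaller (2,546,273,526 bytes versus 2,555,887,227). This is surprising, and it shows that bzip2 does not always produce the smaller file. Note, however, that neither tool shrinks this archive by much.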

[root@exdbadm01 oracle]# du -sh database.tar
2.5G    database.tar
[root@exdbadm01 oracle]# time gzip database.tar

real    5m10.489s
user    3m48.818s
sys     0m12.417s

[root@exdbadm01 oracle]# du -sh database.tar.gz
2.4G    database.tar.gz
[root@exdbadm01 oracle]#

[root@exdbadm01 oracle]# time gzip -d database.tar.gz

real    3m3.695s
user    0m29.742s
sys     0m10.673s


[root@exdbadm01 oracle]# du -b database.tar.gz
2546273526      database.tar.gz
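Because du -sh rounds both results to 2.4G, the exact byte counts from du -b are needed to see the difference. A minimal sketch of the arithmetic, using only the two figures shown in this post (the variable names are illustrative):

```shell
bz2_bytes=2555887227   # du -b database.tar.bz2
gz_bytes=2546273526    # du -b database.tar.gz

# Absolute gap between the two compressed files.
gap=$((bz2_bytes - gz_bytes))
echo "gzip output is smaller by $gap bytes"

# Express the gap as a percentage of the bzip2 output.
awk -v g="$gap" -v b="$bz2_bytes" \
    'BEGIN { printf "that is %.2f%% of the bzip2 file\n", g / b * 100 }'
```

The gap is about 9.6 MB, well under one percent, so on this archive the two tools are effectively tied on size and differ mainly in time.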

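gzip uses less CPU than bzip2. While compressing database.tar, ps reports about 74% CPU, against about 88% for bzip2 (note that the bzip2 figure above was taken during decompression):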

[oracle@exdbadm01 ~]$ ps -ef|grep gzip
root     16978  3138  0 22:20 pts/0    00:00:00 gzip -d test.dmp.gz
root     22862  3138 73 23:50 pts/0    00:01:00 gzip database.tar
oracle   22951  5673  0 23:52 pts/1    00:00:00 grep gzip
[oracle@exdbadm01 ~]$ ps -eo user,pcpu,pid |grep 22862
root     73.9 22862
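The ps snapshots can be cross-checked against the time output, since (user + sys) / real gives the average CPU utilisation over the whole run. The sketch below assumes a single-threaded process and uses a hypothetical helper, cpu_util. The timings are the ones recorded above, converted to seconds by hand.

```shell
# cpu_util REAL USER SYS: print (user + sys) / real as a percentage.
# All three arguments are in seconds.
cpu_util() {
    awk -v r="$1" -v u="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (u + s) / r * 100 }'
}

# bzip2 -d database.tar.bz2: real 10m8.992s, user 8m28.808s, sys 0m22.101s
echo "bzip2 -d: $(cpu_util 608.992 508.808 22.101)%"

# gzip database.tar: real 5m10.489s, user 3m48.818s, sys 0m12.417s
echo "gzip:     $(cpu_util 310.489 228.818 12.417)%"
```

The averages come out close to the ps readings, which suggests the snapshots are representative of the whole run. They still compare a bzip2 decompression with a gzip compression, so the CPU comparison is only indicative.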

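We can also use tar, but on its own it only archives files and does not compress them. Compression happens only when tar is given an option such as -z (gzip) or -j (bzip2). A plain archive is the same size as its input: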

[root@exdbadm01 oracle]# time tar -cvf test.dmp.tar test.dmp >/dev/null 2>&1

real    0m0.282s
user    0m0.012s
sys     0m0.268s

[root@exdbadm01 oracle]# du -sh test.dmp.tar
81M     test.dmp.tar
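Archiving and compressing are often done in one step. The sketch below shows two equivalent ways to get a gzip-compressed archive. make_archives and the sample file are hypothetical names used for illustration, and the -z option is assumed to be available (it is in GNU tar).

```shell
# make_archives DIR NAME: archive DIR/NAME three ways inside DIR.
make_archives() {
    # Archive only: no compression is applied.
    tar -cf "$1/plain.tar" -C "$1" "$2"
    # Archive and gzip in one step (-z). With GNU tar, -j selects bzip2 instead.
    tar -czf "$1/archive.tar.gz" -C "$1" "$2"
    # Same result as a pipeline: tar writes to stdout (-f -), gzip compresses it.
    tar -cf - -C "$1" "$2" | gzip > "$1/piped.tar.gz"
}

work=$(mktemp -d)
printf 'sample payload\n' > "$work/sample.txt"   # stand-in for test.dmp
make_archives "$work" sample.txt

# List the compressed archive to confirm it is readable.
tar -tzf "$work/archive.tar.gz"
rm -rf "$work"
```

The pipeline form is useful when the compressor needs its own options, since gzip can then be given any flags independently of tar.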
