Topic: Comparison between various compression formats
rork
Guest
« on: January 11, 2011, 01:31:46 pm »

I compared five compression formats. I tested what I consider the most common ones: gzipped tar, zip and RAR. I also tested two formats I had high expectations of: 7z and bzipped tar. Of these formats zip is the oldest, dating from 1989, and 7z is the youngest, dating from 1999; the others date from 1992 to 1996. RAR is the only proprietary file format.

Because different file types compress differently, I compressed four kinds of data: an mp4 movie, a Linux distribution CD image, a folder with a lot of documents (Word documents, Excel sheets, PDFs, jpg images, etc.) and my scripts folder, which contains lots of plain-text scripts and log files.

Tools
I used command-line tools for compression, but graphical tools are also available. On KDE, Ark supports all of these formats if the appropriate command-line tools are installed. On Windows you can use 7-Zip or the non-open-source WinRAR, which despite its name also supports the other formats.

Gzipped tar files can be made with tar, which is installed by default on Linux systems. Tar concatenates the files into one archive and compresses that archive if requested. I used the following switches: -c to create the archive, -v for verbose output, -z to gzip it and -f to specify the filename. Subdirectories are included in the archive by default.

tar -cvzf archive.tar.gz file1 file2
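
To see what the -z switch adds on top of plain concatenation, the same directory can be archived with and without it. This is only a minimal sketch: the demo directory and its files are made up for illustration, and the numbers will differ for real data.

```shell
#!/bin/sh
# Hypothetical sample data: a small directory with one subdirectory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p demo/sub
seq 1 20000 > demo/numbers.txt
seq 1 20000 > demo/sub/more.txt

# Plain tar only concatenates; adding -z also gzips the result.
tar -cf demo.tar demo
tar -czf demo.tar.gz demo

# List the archive contents; the subdirectory is included without an extra switch.
tar -tzf demo.tar.gz

# Compare the sizes in bytes.
wc -c demo.tar demo.tar.gz
```

Listing with -t is a quick check that everything ended up in the archive before the originals are removed.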

A bzipped tar file is also made with tar, using the -j switch instead of -z.

tar -cvjf archive.tar.bz2 file1 file2

For 7z support the package p7zip-full has to be installed. 7z needs a command (a) to tell it to add files to an archive and a switch (-r) for including subdirectories.

7z a -r archive.7z file1 file2

RAR isn't installed by default either; the package name is rar. The command is similar to that of 7z, but rar has a lot more options. With the -m switch (-m1 to -m5) you can change the compression level; I used -m3, which is the default value. I also tested -m5 on the scripts directory, which resulted in a file size of 9.4% rather than 12.2%.

rar a -r archive.rar file1 file2

Zip is installed by default on Linux and Windows systems. It requires the -r switch to include subdirectories.

zip -r archive.zip file1 file2

Movie
Type      Size (bytes)   Size (%)   Time (m)
raw          988288114      100.0
tar.gz       987812224      100.0       1.04
zip          987811646      100.0       1.06
rar          990054726      100.2      15.55
7z          1000427177      101.2       5.56
tar.bz2      991963626      100.4       6.50

The movie compressed very badly: the savings are measured in kilobytes and amount to less than a tenth of a percent. Zip gave the best result, with a file size that rounds to 100.0%. RAR, 7z and bzip2 even increased the file size.
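
For reference, the Size (%) column is just the compressed size divided by the raw size. A small sketch of that arithmetic, using byte counts from the movie table (the helper name pct is made up for this example):

```shell
#!/bin/sh
# pct COMPRESSED RAW: print the compressed size as a percentage of the raw
# size, rounded to one decimal. (pct is a hypothetical helper name.)
pct() {
    awk -v c="$1" -v r="$2" 'BEGIN { printf "%.1f\n", 100 * c / r }'
}

# Byte counts taken from the movie table.
pct 987811646 988288114     # zip -> 100.0
pct 1000427177 988288114    # 7z  -> 101.2

# Absolute saving of zip on the movie, in bytes.
echo $((988288114 - 987811646))    # -> 476468
```

The zip saving of 476468 bytes is about 0.05% of the raw file, which is why it disappears when rounding to one decimal.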

ISO
Type      Size (bytes)   Size (%)   Time (m)
raw          729368576      100.0
tar.gz       719342040       98.6       0.48
zip          719341215       98.6       0.48
rar          715718713       98.1      11.38
7z           717557590       98.4      13.17
tar.bz2      725450540       99.5       5.39

Compressed sizes are close to the raw size. RAR performed best with a file size of 98.1%.

Documents
Type      Size (bytes)   Size (%)   Time (m)
raw          586378627      100.0
tar.gz       519330479       88.6       1.25
zip          519496727       88.6       0.40
rar          517262880       88.2       6.45
7z           497506048       84.8       6.15
tar.bz2      513376131       87.6       6.34

7z performed best with a filesize of 84.8%.

Scripts

Type      Size (bytes)   Size (%)   Time (m)
raw          236930461      100.0
tar.gz        32282504       13.6       0.14
zip           33135226       14.0       0.13
rar           28959918       12.2       0.53
7z            21812121        9.2       2.00
tar.bz2       21837414        9.2       2.16

7z performed best with a file size of 9.2%; bzip2 reaches nearly the same ratio but loses by about 25 kilobytes.

Conclusion
7z had the best compression ratio, followed by bzip2, RAR, gzip and zip. Gzip and zip were a lot faster than 7z and bzip2. RAR was moderately fast on the scripts but the slowest on the documents. I got these results with the default settings; additional switches might improve the compression ratio and the speed.
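
As an illustration of what such switches can do, here is a rough sketch for gzip only. It pipes an uncompressed tar stream through gzip so the level can be set explicitly (-1 is fastest, -9 compresses hardest). The demo data is made up, and I did not use this in the tests above.

```shell
#!/bin/sh
# Hypothetical sample data for the comparison.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir demo
seq 1 200000 > demo/numbers.txt

# Write the tar stream to stdout (-f -) and let gzip compress it
# with an explicit level instead of tar's built-in -z.
tar -cf - demo | gzip -1 > fast.tar.gz
tar -cf - demo | gzip -9 > best.tar.gz

# Compare the resulting sizes in bytes.
wc -c fast.tar.gz best.tar.gz
```

How much the higher level gains, and how much extra time it costs, depends heavily on the data, so it is worth measuring on your own files.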

The mp4 movie and the Linux distribution ISO were hardly compressed; because of their encoding and contents they are probably already compressed themselves. For such files an archive mainly serves as a container for distributing multiple files. The ISO might compress better if it were an image of a data CD.
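
The effect is easy to reproduce on a small scale. In the sketch below, random bytes stand in for already compressed data such as the movie, and a repeated log line stands in for plain-text logs. Both are assumptions made for illustration, not the files from my tests.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Random bytes: a stand-in for data that is already compressed (assumption).
head -c 1000000 /dev/urandom > random.bin

# A repeated log line: a stand-in for plain-text scripts and log files (assumption).
awk 'BEGIN { for (i = 0; i < 20000; i++) print "2011-01-11 13:31:46 backup finished without errors" }' > sample.log

# -c writes the compressed data to stdout, so the originals are kept.
gzip -c random.bin > random.bin.gz
gzip -c sample.log > sample.log.gz

# The random file stays about the same size; the log shrinks to a tiny fraction.
wc -c random.bin random.bin.gz sample.log sample.log.gz
```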

The documents shrank by up to 15%. The limited gain could be because such documents are already compressed when saved; the photos were also jpg-encoded, which is already a compact format. Thanks to the mix of file types some compression was still possible. The scripts directory compressed very well, down to 9.2% of its original size.