This is the mail archive of the cygwin-talk mailing list for the cygwin project.
RE: Compressing hippos really fast
- From: "Phil Betts" <Phil dot Betts at ascribe dot com>
- To: <cygwin-talk at cygwin dot com>
- Date: Tue, 4 Mar 2008 18:35:13 -0000
- Subject: RE: Compressing hippos really fast
- Reply-to: The Vulgar and Unprofessional Cygwin-Talk List <cygwin-talk at cygwin dot com>
Corinna Vinschen wrote on Tuesday, March 04, 2008 3:43 PM:
> Hi,
>
>
> does anybody know about a compression tool which is above all capable
> of compressing really fast? The compression ratio is only a mild
> concern; it's rather more important that the tool doesn't act as a
> bottleneck when compressing files which are barely compressible.
> Unfortunately, the usual compression tools are more interested in good
> compression than in good speed when streaming lots of data.
>
> Here are a couple of disks which are supposed to be backed up. Right
> now this is done using a script which creates tar.gz archives of all
> disks. Some of these disks are quite big and contain many files which
> are already compressed. It turns out that gzipping these disks is
> *the* bottleneck when backing up.
>
> When not compressing, tar creates archives with 37MB/s. When creating
> tar.gz archives, the compression takes so much time that the speed
> goes down to 6MB/s. Using gzip --fast doesn't help much. bzip2 is a
> lot slower than gzip.
>
> So the question is, does anybody know a compression tool which can be
> used with tar, which doesn't slow down the backup by a factor of 6?
> It would be cool to have a tool which is as quick as the hardware
> compression used in modern tape drives, but that's just dreaming...
>
>
> May the hippos be with you,
> Corinna
I had this problem ages ago. My solution was to run two backups:
one uncompressed, including only files matching *.gz, *.t[bg]z, *.[zZ],
*.bz2, *.zip etc., and one for the remainder, which was piped
through gzip.
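The two-pass scheme could be sketched like this with GNU tar and find
(all paths and filenames here are made up for illustration, and the
demo directory is created just so the commands have something to chew on):

```shell
# Hypothetical demo tree -- in practice $SRC would be the disk to back up.
SRC=/tmp/backup-demo
mkdir -p "$SRC"
printf 'already compressed' | gzip > "$SRC/old.log.gz"
printf 'plain text data'          > "$SRC/notes.txt"

# Pass 1: already-compressed files go into an uncompressed archive.
find "$SRC" -type f \( -name '*.gz' -o -name '*.t[bg]z' -o -name '*.[zZ]' \
    -o -name '*.bz2' -o -name '*.zip' \) -print0 \
    | tar --null -T - -cf /tmp/backup-compressed.tar

# Pass 2: everything else gets piped through gzip as usual.
find "$SRC" -type f ! \( -name '*.gz' -o -name '*.t[bg]z' -o -name '*.[zZ]' \
    -o -name '*.bz2' -o -name '*.zip' \) -print0 \
    | tar --null -T - -czf /tmp/backup-rest.tar.gz
```

Using -print0 with tar's --null keeps filenames containing spaces or
newlines from being mangled on the way into the archive.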
Even a fast compression algorithm is just wasting time trying to
compress previously compressed files. Worse, since most compressors
work on some variant of Lempel-Ziv, if they're fed a mixture of
compressible and incompressible data, the incompressible data flushes
the dictionary, making the compression of the compressible part worse.
Phil