This is the mail archive of the cygwin-talk mailing list for the cygwin project.



RE: Compressing hippos really fast


Corinna Vinschen wrote on Tuesday, March 04, 2008 3:43 PM:

> Hi,
> 
> 
> does anybody know about a compression tool which is above all capable
> of compressing really fast?  The compression ratio is only a mild
> concern, it's rather more important that the tool is not acting as
> a bottleneck when compressing files which are badly compressible.
> Unfortunately the usual compression tools are more interested in a
> good compression ratio than in good speed when streaming lots of data.
> 
> Here are a couple of disks which are supposed to be backed up.  Right
> now this is done using a script which creates tar.gz archives of all
> disks.  Some of these disks are quite big and contain many files which
> are already compressed.  It turns out that gzipping these disks is
> *the* bottleneck when backing up.
> 
> When not compressing, tar creates archives with 37MB/s.  When creating
> tar.gz archives, the compression takes so much time that the speed
> goes down to 6MB/s.  Using gzip --fast doesn't help much.  bzip is a
> lot slower than gzip.
> 
> So the question is, does anybody know a compression tool which can be
> used with tar, which doesn't slow down the backup by a factor of 6? 
> It would be cool to have a tool which is as quick as the hardware
> compression used in modern tape drives, but that's just dreaming...
> 
> 
> May the hippos be with you,
> Corinna

I had this problem ages ago.  My solution was to run two backups:
one uncompressed, including only files matching the globs *.gz,
*.t[bg]z, *.[zZ], *.bz2, *.zip etc., and one for the remainder,
which was piped through gzip.
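
The two-backup split described above could be sketched roughly as
follows.  This is only an illustration in Python using the standard
tarfile module, not the script that was actually used; the names
split_backup, is_precompressed and COMPRESSED_PATTERNS are made up
for the example, and the pattern list contains only the globs named
above (the "etc" is left for the reader to fill in).

```python
import fnmatch
import os
import tarfile

# Globs taken from the message; the list is deliberately not exhaustive.
COMPRESSED_PATTERNS = ["*.gz", "*.t[bg]z", "*.[zZ]", "*.bz2", "*.zip"]


def is_precompressed(name):
    """Return True if the file name matches one of the compressed-file globs."""
    base = os.path.basename(name)
    return any(fnmatch.fnmatchcase(base, pat) for pat in COMPRESSED_PATTERNS)


def split_backup(root, plain_tar, gz_tar):
    """Store pre-compressed files in plain_tar, everything else in gz_tar.

    Simplification: only files found by os.walk are archived, so empty
    directories are not recorded in either archive.
    """
    with tarfile.open(plain_tar, "w") as plain, \
         tarfile.open(gz_tar, "w:gz") as packed:
        for dirpath, _dirnames, filenames in os.walk(root):
            for fname in sorted(filenames):
                path = os.path.join(dirpath, fname)
                arcname = os.path.relpath(path, root)
                # Already-compressed files skip gzip entirely.
                target = plain if is_precompressed(fname) else packed
                target.add(path, arcname=arcname, recursive=False)
```

A call might look like split_backup("/data", "/backup/data-plain.tar",
"/backup/data-rest.tar.gz"), with hypothetical paths; the archives
should be written outside the tree being backed up so they are not
picked up by the walk.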

Even a fast compression algorithm just wastes time trying to
compress previously compressed files.  And as most compressors work
on some variant of Lempel-Ziv, if they're fed a mixture of
compressible and incompressible data, the incompressible data
flushes the dictionary, making the compression of the compressible
part worse.
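
The first point, that compressing already-compressed data buys
nothing, is easy to check with a small experiment.  The sketch below
is an assumption-laden illustration: it uses Python's zlib at level 1
(roughly comparable to gzip --fast) and random bytes as a stand-in
for already-compressed file content.  It does not attempt to measure
the dictionary-flushing effect on mixed data.

```python
import os
import zlib

SIZE = 1 << 20  # 1 MiB per sample


def ratio(data, level=1):
    """Compressed size divided by original size at the given zlib level."""
    return len(zlib.compress(data, level)) / len(data)


line = b"May the hippos be with you.\n"
# Highly repetitive text, trimmed to exactly SIZE bytes.
text_like = (line * (SIZE // len(line) + 1))[:SIZE]
# Random bytes behave much like the contents of a .gz or .zip file.
random_like = os.urandom(SIZE)

print("text-like ratio:   %.3f" % ratio(text_like))
print("random-like ratio: %.3f" % ratio(random_like))
```

The text-like sample shrinks to a small fraction of its size, while
the random sample comes out no smaller than it went in, so all the
CPU time spent on it is wasted.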

Phil

