This is the mail archive of the cygwin-apps mailing list for the Cygwin project.



Re: ITP: mingw-xz (1.7 only)


Christopher Faylor wrote:
> You've said what you want to do but you haven't said *why*.  What is the
> advantage of further complicating a complicated program by adding .xz
> support, especially given the lack of adoption of the lzma format?

Even I haven't adopted the .lzma format for cygwin packages, because I
still (feel the need to) support cygwin-1.5, whose setup is a Dead
Parrot, and which doesn't include the support for .lzma.  So, rather
than complicate my life by making 1.5 packages as .tar.bz2, and 1.7
packages as .tar.lzma, I'm still using .tar.bz2 for both -- until 1.7
goes gold and I can drop 1.5 like a bad habit.

Then I'll switch to (one of) .lzma|.xz.  Yaakov has expressed interest
in making the default behavior of cygport use .lzma -- and I explicitly
waved him off for the reasons above (plus, I was hoping to skip .lzma
and go straight to .xz, but we can't do THAT until setup supports it).

Note that there is no rush to try to get this done before 1.7. It should
be transparent to end users: eventually setup.exe would support it. Then
a few packages might be uploaded to release-2 in .xz form (and hopefully
no one even notices!).  Then a few more.  Finally, a month or two or
three later, cygport switches its default.  This ITP is just the first
step in a methodical (read: slow, no rush) process.

The reasons to switch from .tar.bz2 to .lzma or .xz are:
  1) better compression; smaller, faster downloads, etc.
  2) faster decompression for equivalent uncompressed size (like .gz,
but unlike .bz2 IIRC, .lzma/.xz is a non-symmetric algorithm:
compression is significantly harder, so that decompression is
significantly easier and faster).  Hopefully this means that setup will
be able to install packages faster, but I'm not sure whether the
compute-bound decompression step dominates, or whether the io-bound
write-to-the-disk + mess-with-the-ACLs step does.  But faster
decompression certainly can't HURT.

Now, I don't think adding support for .xz really complicates setup.exe
much at all; by LOC it probably simplifies it by moving the codec to an
external library.  Since the xz library (liblzma) supports both the old
.lzma format and the newer .xz format -- AND the decoder interface is
quite similar to that of the liblzmadec files I used in setup
originally -- I'd rename
  compress_lzma.h   --> compress_xz.h
  compress_lzma.cc  --> compress_xz.cc
and slightly modify them to use the liblzma functions that are cognate
with the ones I was using from
  lzma-sdk/LzmaDec.h
  lzma-sdk/LzmaDec.c
  lzma-sdk/Types.h
There are a few simplifications to the "recognize the file as an
.lzma/.xz" procedure, too, by offloading to the library IIRC. Also, I'd
remove those lzma-sdk files.  Then some Makefile.am changes.
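
To illustrate the "one library, both formats" point: the sketch below uses
Python's standard lzma module, which wraps liblzma, rather than setup's
actual C++ code.  The payload and the unpack() helper are made up for the
example.  It only shows that a single auto-detecting decoder accepts both
the legacy .lzma container and the newer .xz container.

```python
import lzma

# Hypothetical stand-in for a package tarball.
payload = b"example package contents\n" * 200

# Produce the same data in both container formats.
legacy_blob = lzma.compress(payload, format=lzma.FORMAT_ALONE)  # old .lzma
modern_blob = lzma.compress(payload, format=lzma.FORMAT_XZ)     # newer .xz


def unpack(blob: bytes) -> bytes:
    """Decode either container; liblzma works out which one it is."""
    return lzma.decompress(blob, format=lzma.FORMAT_AUTO)


# One code path handles both inputs.
assert unpack(legacy_blob) == payload
assert unpack(modern_blob) == payload
```

In liblzma's C API the equivalent convenience is the auto-detecting
decoder; whether setup would use that or a format-specific decoder is a
design choice this sketch does not settle.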

======== Aside:
I have another reason as well, but it's not important for the cygwin
list. I'd like to investigate adapting cygwin's setup program for use
with the MinGW project. Their distribution is soon going to get very
unwieldy, what with separate packages for the libiconv DLL, PPL dll,
GMP, MPFR, Cloog, etc, not to mention all the "normal"
separately-distributed pieces like gfortran, g++, java.  Plus, the
DLLized runtime libraries will all be distributed in separate packages.

They *need* something better -- with decent dependency tracking -- than
they have.  However, there is a wrinkle: none of their packages can use
symlinks, because (at least until Vista, and maybe not even then) native
apps can't grok them.  Therefore, whenever symlinks would be used on
sane systems, mingw packages use copies (I'd have chosen hardlinks, but
that wouldn't have worked on non-NTFS disks; I suppose
running-from-a-FAT-thumb-drive is a reasonable configuration for mingw
to try to support).

Now, neither bzip2 nor gzip is very efficient when "compressing" multiple
copies of large files within a tarball, because their dictionary (window
or block) size is fairly small.  However, .lzma/.xz does an outstanding
job -- almost as
efficient as if the tarball actually included real hardlink references
for the copies.
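
A rough way to see the dictionary-size effect, again using Python's
standard-library codecs rather than anything from setup or MinGW: the
sketch builds a "tarball-like" buffer holding two copies of the same 2 MiB
incompressible block and compares the compressed sizes.  The 2 MiB size is
an arbitrary choice, picked only so that the second copy lies beyond
gzip's 32 KiB window and bzip2's 900 kB block while still fitting in the
default xz dictionary.

```python
import bz2
import lzma
import os
import zlib

# One "large file": 2 MiB of random bytes, i.e. incompressible on its own.
block = os.urandom(2 * 1024 * 1024)

# Two verbatim copies back to back, as when a package ships a copy
# where a symlink or hardlink would otherwise have been used.
tarball_like = block + block

sizes = {
    "raw": len(tarball_like),
    "gz": len(zlib.compress(tarball_like, 9)),   # deflate, as used by gzip
    "bz2": len(bz2.compress(tarball_like, 9)),
    "xz": len(lzma.compress(tarball_like)),      # default preset
}

for name, size in sizes.items():
    print(f"{name:>4}: {size:>9} bytes ({size / sizes['raw']:.2f} of raw)")
```

With random input, gz and bz2 should come out at roughly the raw size,
while xz should land near half of it, because the second copy is encoded
as back-references into the first.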

So, when I get around to trying to adapt setup for MinGW, I'd like it to
support .lzma/.xz.  But, the other reasons outlined above that provide
benefit to cygwin proper are still valid.
======== End Aside

Ok, so that's why I want to support .lzma/.xz.  Since we already have
.lzma, why do I want to add .xz?

Because the .lzma file format itself is a hack.  It supports neither
proper identification nor internal integrity checks (this may be less of
an issue for cygwin/setup, as we have external integrity checking via
md5sum).  It has no reasonable identity/header bytes at the front of the
file -- the best you can do is "this kinda looks like it might be an
.lzma file, let's try to decode it and hope it works".  Plus,
development of the .lzma format/encoder/decoder software is dead -- .xz
IS .lzma2, and that is where all the development and support now happen.
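
The identification and integrity points can be checked with a short
sketch, once more via Python's lzma module as a stand-in for liblzma.  The
six magic bytes are the ones defined by the .xz file format; the payload
and the looks_like_xz() helper are hypothetical names for the example.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"  # fixed six-byte header of every .xz stream

payload = b"example package contents\n" * 200
xz_blob = lzma.compress(payload, format=lzma.FORMAT_XZ)
lzma_blob = lzma.compress(payload, format=lzma.FORMAT_ALONE)


def looks_like_xz(blob: bytes) -> bool:
    """Cheap, unambiguous identification by header bytes."""
    return blob[:6] == XZ_MAGIC


assert looks_like_xz(xz_blob)
assert not looks_like_xz(lzma_blob)  # .lzma has no comparable signature

# Flip one byte in the middle of the .xz stream and try to decode it.
corrupted = bytearray(xz_blob)
corrupted[len(corrupted) // 2] ^= 0xFF
try:
    lzma.decompress(bytes(corrupted), format=lzma.FORMAT_XZ)
    corruption_detected = False
except lzma.LZMAError:
    corruption_detected = True

assert corruption_detected  # the decoder rejects the damaged stream
```

The same byte flip in a .lzma file is not guaranteed to be caught, since
that container carries no checksum of its own; that is where the external
md5sum check comes in.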

Some projects, for example
http://ftp.gnu.org/pub/gnu/m4/
introduced .lzma tarballs for their source distributions last year, and
are now switching to .xz instead of .lzma.

So, for all intents and purposes, .lzma is actually a *deprecated*
compression format.  If we're going to support packages using one of
these variants of the LZMA algorithm for compression, it should be the
new, non-deprecated, one.

--
Chuck

