This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: Performance optimization in av::fixup - use buffered IO, not mapped file


On 12/12/2012 9:04 AM, Eric Blake wrote:
> On 12/12/2012 06:22 AM, Corinna Vinschen wrote:
>> On Dec 12 06:11, Eric Blake wrote:
>>> On 12/11/2012 08:13 PM, Daniel Colascione wrote:
>>>> Considering the horrible and
>>>> unexpected performance implications of sparse files, I don't think generating
>>>> them automatically from a sequence of seeks and writes is the right thing to do.
>>> Why can't we instead use posix_fallocate() as a means of identifying a
>>> file that must not be sparse, and then just patch the compiler to use
>>> posix_fallocate() to never generate a sparse executable (but let all
>>> other sparse files continue to behave as normal)?
>>
>> posix_fallocate is not allowed to generate sparse files, due to the
>> following restriction: "If posix_fallocate() returns successfully,
>> subsequent writes to the specified file data shall not fail due to the
>> lack of free space on the file system storage media." See
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_fallocate.html
>> Therefore only ftruncate and lseek potentially generate sparse files.
>>
>> On second thought, I don't quite understand what you mean by "use
>> posix_fallocate() as a means of identifying a file that must not be
>> sparse". Can you explain, please?
>
> Since we know that an executable must NOT be sparse in order to make it
> more efficient with the Windows loader, then gcc should use
> posix_fallocate() to guarantee that the file is NOT sparse, even if it
> happens to issue a sequence of lseek() that would default to making it
> sparse without the fallocate.
>
> In other words, I'm proposing that we delete nothing from cygwin1.dll,
> and instead fix the problem apps (gcc, emacs unexec) that actually
> create executables, so that the files they create are non-sparse because
> we have proven that they should not be sparse for performance reasons.
> Meanwhile, all non-executable files (such as virtual machine disk
> images, which are typically much bigger than executables, and where
> being sparse really does matter) do not have to jump through extra hoops
> of using ftruncate() when plain lseek() would do to keep them sparse.

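Eric's suggestion amounts to reserving the whole file before anything is written to it. A minimal C sketch, assuming a POSIX system that provides posix_fallocate() (the helper names build_file() and allocated_bytes() are illustrative only, not taken from gcc, ld, emacs or Cygwin): the same "seek past a gap, then write" sequence is run with and without an up-front posix_fallocate() of the final size, and st_blocks shows how much storage is actually allocated. It only demonstrates POSIX-level holes; it does not check whether Cygwin maps them to the NTFS sparse attribute. Note that the caller has to know the final size before the first write.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path` as a file of `total` bytes whose only
 * explicit data is `data` written at `offset` (requires
 * offset + strlen(data) <= total). When `preallocate` is nonzero, the whole
 * file is reserved with posix_fallocate() BEFORE the first write, so no
 * later write can leave a hole behind.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int build_file(const char *path, off_t total, off_t offset,
                      const char *data, int preallocate)
{
    size_t len = strlen(data);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0755);
    if (fd < 0)
        return -1;

    int rc = 0;
    if (preallocate) {
        /* posix_fallocate() returns an error number instead of setting errno. */
        int err = posix_fallocate(fd, 0, total);
        if (err != 0) {
            errno = err;
            rc = -1;
        }
    }
    /* Seeking past EOF and then writing is what leaves a gap in the sparse case. */
    if (rc == 0 && lseek(fd, offset, SEEK_SET) == (off_t)-1)
        rc = -1;
    if (rc == 0 && write(fd, data, len) != (ssize_t)len)
        rc = -1;
    /* Extend to the final size; without preallocation this adds a trailing gap. */
    if (rc == 0 && ftruncate(fd, total) != 0)
        rc = -1;
    if (close(fd) != 0 && rc == 0)
        rc = -1;
    return rc;
}

/* Bytes actually allocated on disk (st_blocks counts 512-byte units). */
static long long allocated_bytes(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_blocks * 512;
}
```

On a filesystem that supports holes, the file built without preallocation typically reports far fewer allocated bytes than its size, while the preallocated one is fully backed.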
Does gcc/ld/whatever know the final file size before the first write?

You have to posix_fallocate() the entire file before any write that might create a hole, because once the sparse flag is set it slows down the Windows loader, and it persists even if all gaps are later filled. For example, if I invoke the following commands:

cp --sparse=always "$(which emacs-nox)" sparse
cp --sparse=never "$(which emacs-nox)" dense
for f in sparse dense; do echo "$f"; time ./"$f" -Q --batch --eval '(kill-emacs)'; done
# Overwrite the existing "sparse" file in place with the dense contents,
# filling every gap without recreating the file
cp --sparse=never dense sparse
for f in sparse dense; do echo "$f"; time ./"$f" -Q --batch --eval '(kill-emacs)'; done
du dense sparse


The relevant output (first loop, then the second loop after refilling "sparse", then du) is:
sparse
real    0m1.791s

dense
real    0m0.606s

sparse
real    0m3.158s

dense
real    0m0.081s

16728   dense
16768   sparse
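The second loop is the case where POSIX-level checks are blind: after cp rewrites every byte, du suggests the file is fully allocated, yet the timings above show it still loading slowly. A minimal C sketch, assuming a system that offers the SEEK_HOLE extension to lseek() (Linux, FreeBSD and Solaris do; whether a given Cygwin release does is not assumed here), shows that a hole check reports such a refilled file as having no holes, so it cannot stand in for inspecting the sparse attribute itself. The helper names fill_range() and has_holes() are hypothetical.

```c
#define _GNU_SOURCE /* exposes SEEK_HOLE in glibc's <unistd.h> */
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrite `len` bytes starting at `offset` with the
   byte `fill`, roughly what `cp --sparse=never` does to every byte of the
   destination. A nonzero fill avoids filesystems that may store all-zero
   blocks as holes. Returns 0 on success, -1 on error. */
static int fill_range(int fd, off_t offset, size_t len, char fill)
{
    char buf[4096];
    memset(buf, fill, sizeof buf);
    while (len > 0) {
        size_t n = len < sizeof buf ? len : sizeof buf;
        ssize_t w = pwrite(fd, buf, n, offset);
        if (w < 0)
            return -1;
        offset += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Hypothetical helper: returns 1 if `fd` has a hole before EOF, 0 if not,
   -1 on error. SEEK_HOLE yields the first hole at or after offset 0; EOF
   always counts as a hole, so a file without gaps reports exactly st_size
   (as do filesystems that do not track holes). The file offset is restored. */
static int has_holes(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    off_t saved = lseek(fd, 0, SEEK_CUR);
    if (saved == (off_t)-1)
        return -1;

    int result;
    off_t hole = lseek(fd, 0, SEEK_HOLE);
    if (hole == (off_t)-1)
        result = (errno == ENXIO) ? 0 : -1; /* ENXIO: offset 0 is at EOF */
    else
        result = hole < st.st_size;

    if (lseek(fd, saved, SEEK_SET) == (off_t)-1)
        return -1;
    return result;
}
```

Detecting the slow case on Windows would instead mean querying the file's FILE_ATTRIBUTE_SPARSE_FILE attribute, which this sketch does not attempt.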

Given that we're talking about Cygwin-specific patches for emacs and binutils anyway, would it be better to add a Cygwin-specific fcntl call that clears the file's sparse flag?


Ryan

