This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: native symlink


On 4/3/2013 11:29 AM, Corinna Vinschen wrote:
> On Apr  3 00:33, Jeffrey Altman wrote:
>> I tried representing AFS Symlinks using a Microsoft assigned Reparse
>> Point Tag.  The downside of following that approach was that Cygwin does
>> not properly handle Reparse Point Tags that it does not recognize.  By
>> discarding the RP attribute and preserving the other reparse point stat
>> information (timestamps, attributes, size, etc) it introduces data
>> corrupting behaviors into Cygwin applications.
> 
> You never explained why this happens and at which point in the code.  So
> far it was the right thing to do, and I'm pretty sure you know why.  I
> don't change that, unless you can show me where and when this leads to
> wrong behaviour.  I asked for these details but you didn't offer an
> explanation besides the fact itself so far.  And it would have been
> no problem to add a special handling for AFS in the cases where it went
> wrong.  I guess this is kind of a moot point, now that you converted
> to native symlinks, but this had to be said.

I highlighted the bad section of code in the patch Christopher commented
on.  The code in question is:

The final else block of symlink_info::check_reparse_point() in path.cc:

/* Maybe it's a reparse point, but it's certainly not one we recognize.
   Drop REPARSE attribute so we don't try to use the flag accidentally.
   It's just some arbitrary file or directory for us. */

   fileattr &= ~FILE_ATTRIBUTE_REPARSE_POINT;

As a result of dropping the attribute, the timestamps and size of the
reparse point itself are reported to the application instead of the
stat information of the reparse point's target.

There are two options that I believe could be implemented here in place
of discarding the reparse point attribute:

1. Open the reparse point target and read its stat information.  Replace
the stat information of the reparse point with that of the target.

2. Open the reparse point target.  Perform a FileNameInformation query
to obtain the actual path of the target.  Replace the reparse point with
a virtual symlink using the FileNameInformation response as the target path.

I believe the 2nd option is the better of the two because it is possible
for a file system driver to implement CreateFile followed by
FileNameInformation queries without requiring that the target be
accessed unless the target is required to determine authorization.

In either case, appropriate checks for reparse points as targets of
reparse points and recursion must be implemented.
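
The recursion check described above can be sketched in isolation.  This
is a toy model, not Cygwin code: ReparseTable stands in for the file
system, and resolve_reparse_chain and max_hops are hypothetical names
chosen for illustration.  It only shows the bounded, loop-aware walk
that either option would need when a reparse point targets another
reparse point.

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>

// Toy stand-in for a file system: maps the path of a reparse point to the
// path of its target.  Paths absent from the table are ordinary objects.
using ReparseTable = std::map<std::string, std::string>;

// Follows reparse points starting at `path`.  Returns the first path that
// is not a reparse point, or std::nullopt if a loop is detected or more
// than `max_hops` redirections would be required.
std::optional<std::string> resolve_reparse_chain(const ReparseTable& table,
                                                 std::string path,
                                                 int max_hops = 10)
{
    std::set<std::string> seen;
    for (int hops = 0; ; ++hops) {
        auto it = table.find(path);
        if (it == table.end())
            return path;                  // ordinary object: chain ends here
        if (hops == max_hops)
            return std::nullopt;          // too many redirections
        if (!seen.insert(path).second)
            return std::nullopt;          // already visited: loop
        path = it->second;                // follow one redirection
    }
}
```

A caller would treat std::nullopt the way POSIX code treats ELOOP.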

>> However, there is a very clear test that can be applied to determine
>> when Microsoft Symlinks should be generated in preference to Cygwin
>> symlinks:
>> [...]
>> There are probably additional approaches but none of them are clean and
>> transparent.  The second two involved significantly more complexity than
>> maintaining support within Cygwin's path.cc and could potentially
>> introduce incompatibilities with future Cygwin path.cc changes.
> 
> No test could be sufficient to switch on native symlinks automatically.
> 
> We were all very excited when it became clear that Microsoft introduced
> native symlinks on NTFS with Vista, and I was early on playing around
> with them and to try integrating them into Cygwin.  My local testcase
> uses DeviceIoControl to work around any restrictions imposed by
> CreateSymbolicLink.  And I'm still playing around with them every now
> and then, thinking that we could use them, but the restrictions are
> disappointing me each time anew.
> 
> There are some downsides to native symlinks which make them hard to
> justify, if not downright useless in a POSIX environment.
> 
> - The inability of normal users to create symlinks by default.
> 
>     This can be worked around by changing the policy, but it's still a
>     PITA.  Normal users don't know about the policy, some of them don't
>     even have the "Local Security Policy" MMC snap in.  Even in a
>     corporate environment it requires to change the policy settings and
>     we all know how admins don't like to *soften* a policy.  But let's
>     say we can help along with a FAQ entry.

Working around the policy by issuing DeviceIoControl() operations is
possible but will open another can of worms.  I do not believe that
Cygwin should provide a backdoor.

> - Native symlinks are marked as file or directory.
> 
>     This has been added clearly for the benefit of Windows Explorer.
>     But it's a PITA as well because it destroys interoperability.  It's
>     common that POSIX symlinks are created before the target exists.
>     How on earth should the symlink(2) function know if the target is
>     supposed to be a dir or a file.  But Explorer as well as CMD will do
>     the wrong thing if the symlink is using a non-matching dir/file
>     marker.

The target of the symlink must be resolved and the
FILE_ATTRIBUTE_DIRECTORY flag set appropriately for all
GetFileAttributes[Ex] and Find*File[Ex] operation responses.  It is the
inclusion of stat information in the directory enumeration output which
mandates this behavior.

Given the inclusion of stat information and the fact that reparse points
can refer to objects that have a very high latency to access, it is a
reasonable design choice to require the reparse point expose the
FILE_ATTRIBUTE_DIRECTORY bit that the target will have.

I have come to the conclusion that given the need to provide stat
information in the directory enumeration, the implementation of reparse
points is sane.  The implementation permits directory enumeration to be
fast by not requiring the target objects be opened.   For example, a
reparse point to an object stored in a HSM may take hours to load.
Another is a reparse point to a backup snapshot which may require
extended time to restore before it can be accessed.
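
For the symlink-creation side of the dir/file marker, standard C++17
already exposes the distinction as two functions,
std::filesystem::create_symlink and
std::filesystem::create_directory_symlink.  The sketch below is not
Cygwin's implementation; make_typed_symlink is a hypothetical helper,
and the choice to fall back to a file-type link when the target does
not exist yet is an assumption made only for the example.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: creates `link` pointing at `target`, choosing the
// directory or file flavour by inspecting the target if it exists.
// Returns false on failure and leaves the reason in `ec`.
bool make_typed_symlink(const fs::path& target, const fs::path& link,
                        std::error_code& ec)
{
    // A relative target is interpreted relative to the link's directory,
    // not relative to the current working directory.
    fs::path resolved = target.is_absolute() ? target
                                             : link.parent_path() / target;

    std::error_code probe;  // a missing target is not an error here
    bool is_dir = fs::is_directory(resolved, probe);

    // Assumption of this sketch: a dangling target gets a file-type link,
    // because nothing is known about what it will become.
    if (is_dir)
        fs::create_directory_symlink(target, link, ec);
    else
        fs::create_symlink(target, link, ec);
    return !ec;
}
```

The dangling case is exactly where the marker can end up wrong, since
the target may later be created as a directory.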

> - Only Windows paths are stored.
> 
>     In a POSIX env a symlink created by POSIX tools should point to a
>     POSIX path.  For instance, mount points change the fact where a
>     symlink actually points to and the symlink should not still
>     magically work afterwards.
> 
>     But, hey, native symlinks store the path twice, the SubstituteName
>     and the PrintName.  Shouldn't it be possible to store the Windows
>     path in one of them and the POSIX path in the other?  Yes and no.
>     It's possible to write into these members whatever you like, but for
>     some weird reason, both members have to be Windows paths to work for
>     native Windows tools.
> 
>     But we could store the POSIX path with backslashes, thus working
>     around the issue, no?  No.  An absolute path starting with a
>     backslash is possible, but the Windows tools will evaluate it as
>     root-relative to the current drive.  cd to another drive in cmd,
>     and interop is broken again.

It took me a long time to understand how these fields are used.  The
field names were poorly chosen.

The SubstituteName is a path that is used as-is by the Multiple UNC
Provider to redirect a request to the correct file system for
processing.  This is always an absolute path.  In other words, this is
the kernel version of the path.

The PrintName is a user-land UNC path or relative path which is not only
intended for user readability but also for user-land tools such as
robocopy to use when moving a symlink from one location to another.

When storing absolute paths, you must store them as absolute paths from
the device namespace not from the drive namespace.  For example, here is
a symlink stored in AFS which refers to C:\.

[\\afs\yfs\user\jaltman]junction local_disk

\\afs\yfs\user\jaltman\local_disk: SYMBOLIC LINK
   Print Name     : c:\
   Substitute Name: \??\c:\

And here is a symlink stored in c:\ which refers to the root of AFS.

C:\afs: SYMBOLIC LINK
   Print Name     : \\afs\all
   Substitute Name: \??\UNC\afs\all

The output is from the Sysinternals tool junction v1.06.

Note the inclusion of \??\UNC prior to UNC references and \??\<drive>:\
for DOS device name references.   The DOS device maps to a volume name
and you could provide a link to the volume instead of the DOS device if
that was desirable.
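
The mapping from PrintName-style paths to SubstituteName-style paths
shown in the two junction listings can be written as a small string
transformation.  This is only a sketch: to_substitute_name is a
hypothetical helper, it uses narrow strings for brevity although the
reparse buffer holds wide characters, and it deliberately rejects
anything other than plain drive and UNC paths.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: maps an absolute Win32 path to the device-namespace
// form seen in the Substitute Name lines above.  Returns an empty string
// for inputs this sketch does not handle (relative paths, root-relative
// paths, and paths already carrying a \\?\ or \\.\ prefix).
std::string to_substitute_name(const std::string& win32_path)
{
    const std::string nt_prefix = "\\??\\";

    // UNC path:  \\server\share  ->  \??\UNC\server\share
    if (win32_path.size() > 2 && win32_path.compare(0, 2, "\\\\") == 0) {
        if (win32_path[2] == '?' || win32_path[2] == '.')
            return std::string();
        return nt_prefix + "UNC\\" + win32_path.substr(2);
    }

    // Drive path:  c:\dir  ->  \??\c:\dir
    if (win32_path.size() >= 3 &&
        std::isalpha(static_cast<unsigned char>(win32_path[0])) &&
        win32_path[1] == ':' && win32_path[2] == '\\')
        return nt_prefix + win32_path;

    return std::string();
}
```

With the examples above, "c:\" maps to "\??\c:\" and "\\afs\all" maps
to "\??\UNC\afs\all".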

Does this help?

> - Remote and local symlinks may behave different in different environments.
> 
>     Apart from the security policy, symlinks are also affected by an
>     fsutil setting.  The admins can decide if symlinks work at all, or
>     if symlinks don't work depending on their own location and the
>     location of the target they are pointing to (local->local, local->remote,
>     remote->local, remote->remote)
> 
>     So it's possible that local->local symlinks can be resolved while
>     opening local->remote symlinks simply fail with ugly status codes.
>     How on earth do you integrate that reliably into an environment in
>     which a symlink is a plain and simple thing, readable and writable
>     by everyone, wherever located, just apart from parent dir permissions.
> 
>> As I see it, as flawed as Microsoft Symlinks are they are the common
>> interface that enables mixed applications to communicate with one
>> another.  As such, where they can be used, they should be used.  What is
>> the point of cross-platform support if mixed platform applications
>> cannot transparently share the data?
> 
> Cygwin is a POSIX environment in the first place.  Interop is fine,
> but if it collides with POSIX, we're clearly favoring POSIX.

Understood.  Which is why I haven't suggested that Cygwin symlinks be
replaced by Microsoft symlinks in cases where they cannot be used safely.

> Having said that.
> 
> Chris and I had a private discussion (not the first one on the subject!)
> and we're willing to revisit the use of native symlinks in Cygwin but
> it will be a while before that happens.  A change to the path handling
> code like this is not something that we'd consider for 1.7.18 which is
> long overdue anyway.

Understood.

> What I will do is to add a new CYGWIN environment variable option, along
> the lines of the winsymlinks option(*), or, which is very likely the
> more elegant solution, a mount option, which will result in trying to
> create native symlinks first, and a Cygwin symlink only if creating
> a native symlink failed.  That should help you along.

An environment variable should address James' use case.  For creating
symlinks in AFS, a test for the file system name "AFSRDRFsd" in the
volume information can be used as an indicator that the DOS SYSTEM
attribute is not supported.
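
As a rough illustration of that test, the decision reduces to a
comparison on the file system name reported in the volume information
(on Windows that name is available from, for example,
GetVolumeInformationW).  SymlinkKind and preferred_symlink_kind are
hypothetical names, and the exact, case-sensitive match is an
assumption of this sketch.

```cpp
#include <string>

// Which on-disk representation to use for a newly created symlink.
enum class SymlinkKind { Native, CygwinSystemFile };

// Hypothetical helper: picks a representation from the file system name.
// AFS does not support the DOS SYSTEM attribute that the Cygwin-style
// symlink file depends on, so a native symlink is preferred there.
SymlinkKind preferred_symlink_kind(const std::string& fs_name)
{
    return fs_name == "AFSRDRFsd" ? SymlinkKind::Native
                                  : SymlinkKind::CygwinSystemFile;
}
```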

> 
> Corinna
> 
> 
> (*) In your blog you were musing why Cygwin supports lnk files but
> not native symlinks.  Here's the answer:  lnk files support using 
> POSIX paths.

Where by POSIX paths you mean specifying a path with forward slashes and
without also indicating the type of the object.

Thank you.

Jeffrey Altman




