This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: deadlock/hang when calling close()/connect() at the same time on the same socket


Hi Ed,

On Jun 12 11:39, Ed Martin wrote:
> It appears that opening a socket, and then calling connect() on it and
> from another thread calling close() on the socket while it's still in
> connect() results in a deadlock. Furthermore in this state the thread
> cannot be canceled and connect() will never return (my testcase uses
> pthread_cancel(), but it happens without that as well)

Thanks for the testcase.

I tried this with Cygwin 2.0.4 on 32 bit Windows 7 and 64 bit Windows
8.1.  In both cases I get

  $ ./bug
  Test started
  connect: Bad address
  no bug

The "Bad address" isn't exactly right.  I changed that to return the
same error codes as if shutdown had been called.  Note that there's no
hang.  I can't reproduce a deadlock.

The difference is, on Linux connect will continue to hang until the call
to pthread_cancel, while on Cygwin it will return with an error message
after you call close.  I don't see that this behaviour can be emulated
under Cygwin due to the way Windows socket event handling works (which
is what Cygwin uses under the hood).  Anyway, either way should be fine
since it unblocks the connect call.

However, calling close on a descriptor while performing a system call
on this descriptor in another thread is undefined.  Even the Linux
man page for close warns:

  It is probably unwise to close file descriptors while they may be in
  use by system calls in other threads in the same process.  Since a file
  descriptor may be reused, there are some obscure race conditions that
  may cause unintended side effects.

See, e.g., http://linux.die.net/man/2/close

In Cygwin the problem is that a close() call also removes the objects
and data structures connected to the descriptor.  Calling close on a
descriptor in one thread ultimately lets still-running system calls in
other threads access wrong memory or synchronization objects.

What you should do, in theory, is to use nonblocking sockets in
conjunction with select, or signal the blocking thread so connect
returns with EINTR, and only then close the socket.

The problem with the signal approach is that it won't work with socket
functions in Cygwin up to 2.0.4 :(

The reason is that, for some reason, SA_RESTART is enforced in all
threads other than the main thread.  The code in question pretty much
looks like outdated behaviour.
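
For illustration, a sketch of the handler setup the signal approach
relies on, assuming a system that honors a handler installed without
SA_RESTART.  The names install_interrupting_handler, on_interrupt and
interrupted are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler so the interrupted thread can tell why it woke up. */
static volatile sig_atomic_t interrupted = 0;

static void on_interrupt(int signo)
{
    (void)signo;
    interrupted = 1;
}

/* Hypothetical helper: install a handler for signo WITHOUT SA_RESTART,
 * so a blocking call interrupted by this signal fails with EINTR
 * instead of being restarted.  Returns 0 on success, -1 on error. */
int install_interrupting_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_interrupt;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;    /* deliberately no SA_RESTART */
    return sigaction(signo, &sa, NULL);
}
```

After installing the handler for, say, SIGUSR1, another thread would
call pthread_kill() on the thread blocked in connect.  Where SA_RESTART
is not forced, connect then returns -1 with errno set to EINTR, and the
socket can be closed safely afterwards.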

I applied patches to fix or work around the problems outlined above
and uploaded new developer snapshots to https://cygwin.com/snapshots/
Please give 'em a try.


Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
