This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: select() hangs sometimes, for TCP connections


I finally found the culprit.
It seems to be a Tcl extension which was badly
built.

The DB replication scripts are written in Tcl.
For the communication between hosts, the extension
Tcl-DP is used, with TCP socket channels.

The extension is provided as C sources. So I had
to build a DLL. The Windows version was intended to
be built with VC++.
As I didn't have access to this compiler, my plan
was to build it with gcc. Nice, isn't it?
But, as there is always a but, it was my first
use of autotools, so I wasn't always aware of what
I was doing. I fixed some bugs in the sources,
made some adaptations for the integration with
the Tcl sources, and tried many combinations in the
autotools configuration to get a DLL loadable
with the Tcl command "package require dp".
As there were too many errors with a basic build,
the -mno-cygwin version was mandatory.
Unfortunately, before finding a working solution
with the -shared option of gcc, I was using
libtool. It seems that libtool introduces a
dependency on cygwin1.dll through the way it imposes
the entry point (-Wl,-e,...). And because of the
-mno-cygwin option, msvcrt.dll is used as well.
IIRC, mixing cygwin1 and msvcrt at the same time
is not advised (even though some claim to have
succeeded in building such an executable).
I made a version with VC++, and another one
with gcc, without libtool or unnecessary options.
These two leave the system stable.
I can't say for sure the problem is solved, just
that the system is more stable. I ran the replication
more than 570 and 220 times.

Here are the dependencies for a working version:

D:/cygwin/usr/share/tcl8.4/dp4.0/win/dp40.dll
  D:\cygwin\bin\tcl84.dll
    C:\WINNT\System32\ADVAPI32.DLL
      C:\WINNT\System32\ntdll.dll
      C:\WINNT\System32\KERNEL32.dll
      C:\WINNT\System32\USER32.dll
        C:\WINNT\System32\GDI32.dll
      C:\WINNT\System32\RPCRT4.dll
    D:\cygwin\bin\cygwin1.dll
  C:\WINNT\System32\msvcrt.dll
  C:\WINNT\System32\WS2_32.DLL
    C:\WINNT\System32\WS2HELP.dll

And for the bad version:

D:/cygwin/usr/share/tcl8.4/dp4.0/win/dp40.dll.gcc.ko
  D:\cygwin\bin\cygwin1.dll
                ^^^^^^^
    C:\WINNT\System32\ADVAPI32.DLL
      C:\WINNT\System32\ntdll.dll
      C:\WINNT\System32\KERNEL32.dll
      C:\WINNT\System32\USER32.dll
        C:\WINNT\System32\GDI32.dll
      C:\WINNT\System32\RPCRT4.dll
  D:\cygwin\bin\tcl84.dll
  C:\WINNT\System32\msvcrt.dll
                    ^^^^^^^
  C:\WINNT\System32\WSOCK32.DLL
    C:\WINNT\System32\WS2_32.dll
      C:\WINNT\System32\WS2HELP.dll
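
The difference between the two listings can be checked
mechanically. Below is a minimal Python sketch, assuming
the dependency tree is available as indented text like the
listings above. The function names (direct_imports,
mixes_runtimes) and the GOOD/BAD constants are hypothetical
helpers written for this illustration; they are not part of
Cygwin, Tcl-DP, or any dependency tool. It only reports
whether the top-level module imports both cygwin1.dll and
msvcrt.dll directly.

```python
import re

# Dependency listings copied from this message (indentation = tree depth).
GOOD = r"""
D:/cygwin/usr/share/tcl8.4/dp4.0/win/dp40.dll
  D:\cygwin\bin\tcl84.dll
    C:\WINNT\System32\ADVAPI32.DLL
      C:\WINNT\System32\ntdll.dll
      C:\WINNT\System32\KERNEL32.dll
      C:\WINNT\System32\USER32.dll
        C:\WINNT\System32\GDI32.dll
      C:\WINNT\System32\RPCRT4.dll
    D:\cygwin\bin\cygwin1.dll
  C:\WINNT\System32\msvcrt.dll
  C:\WINNT\System32\WS2_32.DLL
    C:\WINNT\System32\WS2HELP.dll
"""

BAD = r"""
D:/cygwin/usr/share/tcl8.4/dp4.0/win/dp40.dll.gcc.ko
  D:\cygwin\bin\cygwin1.dll
                ^^^^^^^
    C:\WINNT\System32\ADVAPI32.DLL
  D:\cygwin\bin\tcl84.dll
  C:\WINNT\System32\msvcrt.dll
                    ^^^^^^^
  C:\WINNT\System32\WSOCK32.DLL
    C:\WINNT\System32\WS2_32.dll
      C:\WINNT\System32\WS2HELP.dll
"""


def _indent(line):
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def direct_imports(listing):
    """Return lowercase basenames of DLLs imported directly by the root module."""
    # Drop blank lines and the "^^^^" marker lines used for emphasis.
    lines = [
        ln for ln in listing.splitlines()
        if ln.strip() and not set(ln.strip()) <= {"^"}
    ]
    if not lines:
        return set()
    root_indent = _indent(lines[0])
    child_indent = None
    found = set()
    for ln in lines[1:]:
        ind = _indent(ln)
        if ind <= root_indent:
            break  # a second root module starts here
        if child_indent is None:
            child_indent = ind  # first child fixes the depth of direct imports
        if ind == child_indent:
            # Keep only the file name, whichever path separator is used.
            found.add(re.split(r"[\\/]", ln.strip())[-1].lower())
    return found


def mixes_runtimes(listing):
    """True if the root module imports both cygwin1.dll and msvcrt.dll directly."""
    return {"cygwin1.dll", "msvcrt.dll"} <= direct_imports(listing)


if __name__ == "__main__":
    for name, listing in (("working", GOOD), ("bad", BAD)):
        print(name, "mixes runtimes:", mixes_runtimes(listing))
```

With the two listings from this message, the check is
false for the working DLL (cygwin1.dll is only reached
through tcl84.dll) and true for the bad one.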

The imported functions of cygwin1.dll are:
abort
cygwin_detach_dll
cygwin_internal
dll_dllcrt0
pthread_atfork
calloc
malloc
realloc
free

--- Patrick Samson wrote:
> Problem: sometimes select() doesn't return.
> 
> Context: I run a DB replication scenario
> with cron, every 5 minutes. There is no change in
> the DB, so the scenario is always the same. Most of
> the time, it works. But eventually, after some time
> (maybe some minutes or hours), a process A keeps
> waiting forever in select() for a response on a TCP
> socket. With gdb I can see that the other end B is
> back in its ReadCommand() function, meaning it has
> sent its response and is waiting for a new command,
> so this side should be OK.
> 
[snip]

