This is the mail archive of the cygwin mailing list for the Cygwin project.
Race condition spawning childs/pipe stuff?
- From: "Michelsen, Robert" <Robert dot Michelsen at vitk dot vossloh dot com>
- To: <cygwin at cygwin dot com>
- Date: Wed, 19 Oct 2005 10:43:26 +0200
- Subject: Race condition spawning childs/pipe stuff?
Hello,
I seem to be hitting a race condition when running large recursive build
processes (make).
Occasionally, the build hangs with a spawned child (sh.exe) consuming
100% user CPU.
It seems the build command itself (the spawned make) has finished, but
the child/parent(?) shell doesn't exit.
When I kill sh.exe manually, the (recursive) build process continues and
finishes.
I suspect some kind of race condition somewhere in the pipe handling.
The condition itself is not reproducible.
Cygwin dll is: 1.5.19, api ver: 0.138, build date: 2005-10-03 13:32
I attached gdb to the process and examined its threads:
----------- snip ----
$ ./gdb
GNU gdb 6.3.50.20050926
....
(gdb) attach 3048
Attaching to process 3048
[Switching to thread 3048.0xca8]
(gdb) info threads
* 3 thread 3048.0xca8 0x7c911231 in ntdll!DbgUiConnectToDbg () from
/cygdrive/c/WINDOWS/system32/ntdll.dll
2 thread 3048.0xd04 0x7c91eb94 in ntdll!LdrAccessResource () from
/cygdrive/c/WINDOWS/system32/ntdll.dll
1 thread 3048.0xf90 0x7c91eb94 in ntdll!LdrAccessResource () from
/cygdrive/c/WINDOWS/system32/ntdll.dll
(gdb) thread 1
[Switching to thread 1 (thread 3048.0xf90)]#0 0x7c91eb94 in
ntdll!LdrAccessResource ()
from /cygdrive/c/WINDOWS/system32/ntdll.dll
(gdb) bt
#0 0x7c91eb94 in ntdll!LdrAccessResource () from
/cygdrive/c/WINDOWS/system32/ntdll.dll
#1 0x7c91ea53 in ntdll!ZwYieldExecution () from
/cygdrive/c/WINDOWS/system32/ntdll.dll
#2 0x7c81e956 in SwitchToThread () from
/cygdrive/c/WINDOWS/system32/kernel32.dll
#3 0x61054215 in low_priority_sleep (secs=0) at
/netrel/src/cygwin-snapshot-20051003-1/winsup/cygwin/miscfuncs.cc:245
#4 0xfffffffe in ?? ()
(gdb) thread 2
[Switching to thread 2 (thread 3048.0xd04)]#0 0x7c91eb94 in
ntdll!LdrAccessResource ()
from /cygdrive/c/WINDOWS/system32/ntdll.dll
(gdb) bt
#0 0x7c91eb94 in ntdll!LdrAccessResource () from
/cygdrive/c/WINDOWS/system32/ntdll.dll
#1 0x7c91e288 in ntdll!ZwReadFile () from
/cygdrive/c/WINDOWS/system32/ntdll.dll
#2 0x7c801875 in ReadFile () from
/cygdrive/c/WINDOWS/system32/kernel32.dll
#3 0x0000074c in ?? ()
(gdb) thread 3
[Switching to thread 3 (thread 3048.0xca8)]#0 0x7c911231 in
ntdll!DbgUiConnectToDbg ()
from /cygdrive/c/WINDOWS/system32/ntdll.dll
(gdb) bt
#0 0x7c911231 in ntdll!DbgUiConnectToDbg () from
/cygdrive/c/WINDOWS/system32/ntdll.dll
#1 0x7c9607a8 in ntdll!KiIntSystemCall () from
/cygdrive/c/WINDOWS/system32/ntdll.dll
#2 0x00000005 in ?? ()
(gdb) q
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: , Pid 3048
----------- snip ----
Thread 1 seems to be the one eating the CPU.
gdb doesn't reveal much, so I used my favorite Win32 user-mode
debugger, OllyDbg:
----------- snip ----
Threads
Ident Entry Data block Last error Status
Priority User time System time
00000388 7C96077B 7FFDD000 ERROR_SUCCESS (00000000) Active
32 + 0 0.0000 s 0.0000 s
00000D04 7C810856 7FFDE000 ERROR_SUCCESS (00000000) Active
32 + 0 0.0000 s 0.0000 s
00000F90 00000000 7FFDF000 ERROR_SUCCESS (00000000) Active
32 + 0 52.8437 s 94.5156 s
----------- snip ----
As you can see, the (main) thread 0xf90 is eating all the CPU.
I examined the call stack and used gdb's "l/info" commands to get
symbols (I have the appropriate .dbg file).
I manually added the symbols as comments "(xxxx)":
----------- snip ----
Call stack of main thread
Address Stack Procedure
Called from Frame
0022DD84 7C91EA53 Includes ntdll.KiFastSystemCallRet
ntdll.7C91EA51
0022DD88 7C81E956 ntdll.ZwYieldExecution
kernel32.7C81E950
0022DD8C 61054215 cygwin1.610F5138
cygwin1.61054210 (low_priority_sleep + 80)
0022DDAC 6106DF57 cygwin1.610541C0 (low_priority_sleep, miscfuncs.cc:230)
cygwin1.6106DF52 (_pinfo::sync_proc_pipe() + 34)
0022DDBC 61095984 cygwin1.6106DF30 (_pinfo::sync_proc_pipe(), pinfo.cc:977)
cygwin1.6109597F (spawn_guts(char const* ...) + 5263)
0022E99C 61095E35 ? cygwin1.610944F0 (spawn_guts(char const* ...), spawn.cc)
cygwin1.61095E30 (spawnve + 224) 0022E998
0022E9CC 610188AB cygwin1.61095D50 (spawnve)
cygwin1.610188A6 (execve + 38) 0022E9C8
----------- snip ----
I searched the current Cygwin sources and found the following snippets ...
----- snip spawn.cc ----
static int __stdcall
spawn_guts (const char * prog_arg, const char *const *argv,
            const char *const envp[], int mode)
{
...
  /* If wr_proc_pipe doesn't exist then this process was not started by a cygwin
     process.  So, we need to wait around until the process we've just "execed"
     dies.  Use our own wait facility to wait for our own pid to exit (there
     is some minor special case code in proc_waiter and friends to accommodate
     this).
     If wr_proc_pipe exists, then it should be duplicated to the child.
     If the child has exited already, that's ok.  The parent will pick up
     on this fact when we exit.  dup_proc_pipe will close our end of the pipe.
     Note that wr_proc_pipe may also be == INVALID_HANDLE_VALUE.  That will make
     dup_proc_pipe essentially a no-op.  */
  if (!newargv.win16_exe && myself->wr_proc_pipe)
    {
      myself->sync_proc_pipe ();  /* Make sure that we own wr_proc_pipe
                                     just in case we've been previously
                                     execed. */
      myself.zap_cwd ();
      myself->dup_proc_pipe (pi.hProcess);
    }
----- snip pinfo.cc ----
void
_pinfo::sync_proc_pipe ()
{
  if (wr_proc_pipe && wr_proc_pipe != INVALID_HANDLE_VALUE)
    while (wr_proc_pipe_owner != GetCurrentProcessId ())
      low_priority_sleep (0);
}
---------------------------
It seems sync_proc_pipe is looping forever: the condition
"wr_proc_pipe_owner != GetCurrentProcessId ()" holds on entry and never
becomes false, so the loop never exits.
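To illustrate why such a loop shows up as a thread burning CPU, here is a
minimal, portable sketch of the same wait pattern with a deadline added.
It is not Cygwin code: wait_for_ownership, the owner variable and the
timeout are hypothetical names introduced only for this example, and
std::this_thread::yield () merely stands in for low_priority_sleep (0),
which the backtrace above shows ending up in SwitchToThread ().

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical model of the wait in _pinfo::sync_proc_pipe (); "owner" plays
// the role of wr_proc_pipe_owner and "self" the role of GetCurrentProcessId ().
// Returns true once owner == self, false if the timeout elapses first.
bool wait_for_ownership (const std::atomic<unsigned long> &owner,
                         unsigned long self,
                         std::chrono::milliseconds timeout)
{
  const auto deadline = std::chrono::steady_clock::now () + timeout;
  while (owner.load () != self)
    {
      // Without this check the loop is unbounded: if nobody ever stores
      // "self" into "owner", the thread spins here indefinitely.
      if (std::chrono::steady_clock::now () >= deadline)
        return false;
      // Yielding gives up the rest of the time slice but does not block,
      // so on an otherwise idle CPU the thread is rescheduled at once.
      std::this_thread::yield ();
    }
  return true;
}
```

With an owner value that is never updated, the bounded version returns
false after the timeout, whereas the unbounded loop would keep spinning.
Whether a timeout is an acceptable fix inside Cygwin is a separate
question; the sketch is only meant to make the failure mode visible.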
I have updated the Cygwin core several times, but this error persists.
What gives?
Regards,
Robert Michelsen