This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: perl threads on 2008 R2 64bit = crash ( was: perl 5.10 threads on 1.5.25 = instant crash )


On Jul 16 17:18, Christopher Faylor wrote:
> On Thu, Jul 16, 2009 at 09:55:52PM +0200, Corinna Vinschen wrote:
> >On Jul 16 17:47, Dave Korn wrote:
> >>   You might want to try again with a watchpoint:
> >> 
> >> watch *(unsigned int*)0x88ce68
> >> 
> >> ... and see how and where that head entry gets set up and whether it
> >> subsequently gets overwritten somehow.
> >[...]
> >(gdb) bt
> >#0  _cygtls::init_exception_handler (this=0x88ce64,
> >    eh=0x6103ce20 <_cygtls::handle_exceptions(_EXCEPTION_RECORD*, _exception_list*, _CONTEXT*, void*)>)
> >    at /home/corinna/src/cygwin/vanilla/winsup/cygwin/cygtls.cc:244
> >#1  0x61033ff5 in dll_dllcrt0_1 (x=0x883edc)
> >    at /home/corinna/src/cygwin/vanilla/winsup/cygwin/dll_init.cc:321
> >#2  0x6103414f in dll_dllcrt0 (h=0x6eb70000, p=0x6eb79070)
> >    at /home/corinna/src/cygwin/vanilla/winsup/cygwin/dll_init.cc:302
> >#3  0x6eb77acf in _cygwin_dll_entry@12 ()
> >   from /usr/lib/perl5/5.10/i686-cygwin/auto/threads/threads.dll
> >^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> >So this exception handler is installed as part of the Perl threads DLL
> >initialization.  But apparently the address is not valid anymore when
> >leaving the DLL initialization.
> >
> >For testing I disabled the 
> >
> >  _my_tls.init_exception_handler (_cygtls::handle_exceptions);
> >
> >call in dll_init.cc:dll_dllcrt0_1() and re-ran the Perl testcase.
> >Now it runs fine:
> >
> >  $ perl ./perlthread.pl
> >  Testing threads...
> >  I'm a thread!
> >  Testing done
> >
> >Is it possible that we have to remove the exception handler before
> >dll_dllcrt0_1 returns?
> 
> Are you saying that perl is not cleaning up after itself here?  If so,
> that sounds like a perl bug.

I'm not saying that.  Maybe it is a Perl bug, but it looks like a Cygwin
bug to me.

After having started Perl, at the start of main(), the SEH chain
looks entirely normal:

  (gdb) x/xw 0x7efdd000
  0x7efdd000:     0x0088ce68
  (gdb) x/2x 0x0088ce68
  0x88ce68:       0x0088ffc4      0x6103ce20
  (gdb) x/2x 0x0088ffc4
  0x88ffc4:       0x0088ffe4      0x77cc03dd
  (gdb) x/2x 0x0088ffe4
  0x88ffe4:       0xffffffff      0x77d16900

Note that the start of the SEH chain is already at the address which
gets changed erroneously in the later DLL initialization.  It's our
_my_tls.el entry.
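
For readers who want to follow the x/2x dumps: each pair of words is one
SEH registration record, (prev, handler), and the chain ends at the record
whose prev is 0xffffffff.  Below is a minimal sketch of walking such a
chain, on a model of the list rather than the real %fs:0 list.  The names
seh_record, SEH_END and walk_chain are made up for illustration and are
not Cygwin or Windows identifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one SEH registration record as seen in the dumps:
// first word is the previous (outer) record, second word is the handler.
struct seh_record {
  seh_record *prev;
  std::uintptr_t handler;
};

// Stand-in for the 0xffffffff end-of-chain marker (all bits set).
static seh_record *const SEH_END =
    reinterpret_cast<seh_record *>(~std::uintptr_t(0));

// Collect the handler addresses from head to the end marker.  The walk
// stops after max_records so that a corrupted chain cannot loop forever.
std::vector<std::uintptr_t> walk_chain(const seh_record *head,
                                       std::size_t max_records = 64) {
  assert(head != nullptr);
  std::vector<std::uintptr_t> handlers;
  for (const seh_record *r = head;
       r != SEH_END && handlers.size() < max_records; r = r->prev)
    handlers.push_back(r->handler);
  return handlers;
}
```

Building three records that mirror the dump above (0x6103ce20, then
0x77cc03dd, then 0x77d16900 with prev set to the end marker) and passing
the first one to walk_chain yields the three handlers in that order.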

Now I set a breakpoint at the start of the dll_dllcrt0 function, which
is called when the DLL gets loaded:

  (gdb) br "dll_init.cc:302"
  Breakpoint 3 at 0x61034144: file /home/corinna/src/cygwin/vanilla/winsup/cygwin/dll_init.cc, line 302.
  (gdb) c
  Continuing.

  Breakpoint 3, dll_dllcrt0 (h=0x6eb70000, p=0x6eb79070)
      at /home/corinna/src/cygwin/vanilla/winsup/cygwin/dll_init.cc:302
  302         dll_dllcrt0_1 (&x);
  Current language:  auto; currently c++
  (gdb) bt
  #0  dll_dllcrt0 (h=0x6eb70000, p=0x6eb79070)
      at /home/corinna/src/cygwin/vanilla/winsup/cygwin/dll_init.cc:302
  #1  0x6eb77acf in _cygwin_dll_entry@12 ()
     from /usr/lib/perl5/5.10/i686-cygwin/auto/threads/threads.dll
  #2  0x77c897c0 in ntdll!RtlQueryInformationActiveActivationContext ()
     from /cygdrive/c/Windows/system32/ntdll.dll

Ok, so the loaded DLL is the threads.dll lib.  What does the SEH chain
look like now?

  (gdb) x/xw 0x7efdd000
  0x7efdd000:     0x0088400c
  (gdb) x/2x 0x0088400c
  0x88400c:       0x00884178      0x77cc03dd
  (gdb) x/2x 0x00884178
  0x884178:       0x0088ce68      0x77cc03dd
  (gdb) x/2x 0x0088ce68
  0x88ce68:       0x0088ffc4      0x6103ce20
  (gdb) x/2x 0x0088ffc4
  0x88ffc4:       0x0088ffe4      0x77cc03dd
  (gdb) x/2x 0x0088ffe4
  0x88ffe4:       0xffffffff      0x77d16900

As you can see, the OS has added two handlers to the chain.  Now I step
to the code which is supposed to add the Cygwin exception handler:

  (gdb) s
  dll_dllcrt0_1 (x=0x883edc)
      at /home/corinna/src/cygwin/vanilla/winsup/cygwin/dll_init.cc:311
  311       HMODULE& h = ((dllcrt0_info *)x)->h;
  (gdb) n
  312       per_process*& p = ((dllcrt0_info *)x)->p;
  (gdb)
  313       int& res = ((dllcrt0_info *)x)->res;
  (gdb)
  321       _my_tls.init_exception_handler (_cygtls::handle_exceptions);
  (gdb) s
  _cygtls::init_exception_handler (this=0x88ce64,
      eh=0x6103ce20 <_cygtls::handle_exceptions(_EXCEPTION_RECORD*, _exception_list*, _CONTEXT*, void*)>)
      at /home/corinna/src/cygwin/vanilla/winsup/cygwin/cygtls.cc:231
  231       el.handler = eh;

Ok, so _my_tls.el, the SEH chain entry, gets overwritten now with the
new entries.  Where is el?

  (gdb) p/x &el
  $2 = 0x88ce68

Yes, that's still the same _my_tls.el.  That's also the watch address and
it's now an entry in the middle of the current SEH chain.

  (gdb) s
  243       el.prev = _except_list;
  (gdb)
  244       _except_list = &el;
  (gdb) p/x el
  $3 = {prev = 0x88400c, handler = 0x6103ce20}

Now the new prev address points to an address lower than the current
address and...

  (gdb) s
  245     }
  (gdb) x/xw 0x7efdd000
  0x7efdd000:     0x0088ce68

Now the SEH entry address has been moved to the new address and the
SEH chain is invalid since it's a circular list:

  (gdb) x/xw 0x7efdd000
  0x7efdd000:     0x0088ce68
  (gdb) x/2x 0x0088ce68
  0x88ce68:       0x0088400c      0x6103ce20
  (gdb) x/2x 0x0088400c
  0x88400c:       0x00884178      0x77cc03dd
  (gdb) x/2x 0x00884178
  0x884178:       0x0088ce68      0x77cc03dd
  (gdb) x/2x 0x0088ce68
  0x88ce68:       0x0088400c      0x6103ce20

AFAICS, the problem is that _my_tls.el is not the active SEH handler at
this point, but it is already part of the chain.
_cygtls::init_exception_handler doesn't check for that and just
overwrites el.prev, which makes the chain circular.
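
The effect can be reproduced on a small model of the chain.  This is only
a sketch: seh_record, SEH_END, push_unchecked and has_cycle are
hypothetical names, and push_unchecked merely mimics the two assignments
at cygtls.cc lines 243 and 244 shown in the trace above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an SEH registration record: (prev, handler).
struct seh_record {
  seh_record *prev;
  std::uintptr_t handler;
};

// Stand-in for the 0xffffffff end-of-chain marker (all bits set).
static seh_record *const SEH_END =
    reinterpret_cast<seh_record *>(~std::uintptr_t(0));

// Mimics the unconditional relinking: el.prev = head; head = &el;
void push_unchecked(seh_record *&head, seh_record &el) {
  el.prev = head;
  head = &el;
}

// Floyd's tortoise-and-hare check: true if following prev pointers from
// head never reaches the end marker.
bool has_cycle(const seh_record *head) {
  assert(head != nullptr);
  const seh_record *slow = head, *fast = head;
  while (fast != SEH_END && fast->prev != SEH_END) {
    slow = slow->prev;
    fast = fast->prev->prev;
    if (slow == fast)
      return true;
  }
  return false;
}
```

If el is already linked behind two newer records (as with the two OS
handlers here) and push_unchecked is called with the current head, el.prev
then points at the newest record, whose chain leads back to el, and
has_cycle reports true.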

If it's correct to set %fs:0 to our _my_tls.el address in this case,
thus just ignoring the OS handlers, then it seems incorrect in this
specific situation to change el.prev, because it already points to a
valid address.  Actually, the entire el SEH entry is already set up
correctly; only `_except_list = &el;' would have to be executed to skip
the OS handlers.

If it's not correct to just skip the OS handlers, we would have to
invent a new SEH entry at a lower stack address, rather than reusing
the _my_tls entry, which is already in use.

Assuming that skipping the OS handlers is OK, I have applied this
(too?) simple patch to have some crude sanity check:

Index: cygtls.cc
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/cygtls.cc,v
retrieving revision 1.67
diff -u -p -r1.67 cygtls.cc
--- cygtls.cc	7 Jul 2009 08:07:38 -0000	1.67
+++ cygtls.cc	17 Jul 2009 08:54:07 -0000
@@ -240,6 +240,7 @@ _cygtls::init_exception_handler (excepti
      Windows 2008, which irremediably gets into an endless loop, taking 100%
      CPU.  That's why we reverted to a normal SEH chain and changed the way
      the exception handler returns to the application. */
-  el.prev = _except_list;
+  if (_except_list > el.prev)
+    el.prev = _except_list;
   _except_list = &el;
 }

With this patch, the Perl testcase works fine.  I'm sure there's
a better way to implement a sanity check, though.
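
One conceivable alternative to the address comparison, sketched on a model
of the chain and not tested against Cygwin itself, is to check whether el
is already reachable from the current head and to leave el.prev alone in
that case.  All names here (seh_record, SEH_END, in_chain,
install_checked) are hypothetical, and the sketch assumes that skipping
the OS handlers is acceptable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of an SEH registration record: (prev, handler).
struct seh_record {
  seh_record *prev;
  std::uintptr_t handler;
};

// Stand-in for the 0xffffffff end-of-chain marker (all bits set).
static seh_record *const SEH_END =
    reinterpret_cast<seh_record *>(~std::uintptr_t(0));

// True if el is one of the first max_records records reachable from head.
bool in_chain(const seh_record *head, const seh_record *el,
              std::size_t max_records = 64) {
  std::size_t n = 0;
  for (const seh_record *r = head;
       r != nullptr && r != SEH_END && n < max_records; r = r->prev, ++n)
    if (r == el)
      return true;
  return false;
}

// Install el as the active record.  If el is already linked somewhere in
// the chain, keep its prev pointer and just move head back to it, which
// skips any records that were added in front of it in the meantime.
void install_checked(seh_record *&head, seh_record &el,
                     std::uintptr_t handler) {
  assert(head != nullptr);
  el.handler = handler;
  if (!in_chain(head, &el))
    el.prev = head;  // el is new: link it in front of the current chain
  head = &el;
}
```

In the situation from the trace, install_checked leaves el.prev pointing at
the record it pointed to before the DLL was loaded, so the chain still
terminates.  The bounded walk costs a few pointer reads per call, which
may or may not be acceptable in this code path.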


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


