This is the mail archive of the gdb-prs@sources.redhat.com mailing list for the GDB project.

Re: cli/1056: divide by zero hangs gdb

From: mec at shout dot net
To: andrel at u dot arizona dot edu, gdb-prs at sources dot redhat dot com, nobody at sources dot redhat dot com
Date: 28 Aug 2003 07:34:23 -0000
Subject: Re: cli/1056: divide by zero hangs gdb
Reply-to: mec at shout dot net, andrel at u dot arizona dot edu, gdb-prs at sources dot redhat dot com, nobody at sources dot redhat dot com, gdb-gnats at sources dot redhat dot com

Synopsis: divide by zero hangs gdb

State-Changed-From-To: open->analyzed
State-Changed-By: chastain
State-Changed-When: Thu Aug 28 07:34:23 2003
State-Changed-Why:
    
    The problem still happens with gdb gdb_6_0-branch 2003-08-27 and gdb HEAD 2003-08-27 on i686-pc-linux-gnu.
    
    Here is the analysis: gdb has a signal handler for SIGFPE, handle_sigfpe.  handle_sigfpe calls mark_async_signal_handler_wrapper, re-enables itself, and returns.
    
    After handle_sigfpe returns from the signal, the Linux kernel restarts the same instruction that produced the SIGFPE!  This causes an endless stream of SIGFPEs.
    
    The easy way to see this is to play with this program:
    
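      #include <signal.h>
      #include <stdio.h>

      /* Build without optimization so the unused division is kept.  */
      int v1 = 1;
      int v2 = 0;

      /* Re-arm the handler and return, as handle_sigfpe in gdb does.  */
      static void handle_sigfpe (int sig)
      {
        signal (sig, handle_sigfpe);
      }

      int main (void)
      {
        int v = 0;
        signal (SIGFPE, handle_sigfpe);
        v = v1 / v2;   /* SIGFPE is delivered here, over and over.  */
        return 0;
      }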
    
    If you run this program you can see it take the SIGFPE over and over at the divide instruction where division by zero happens.
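    
    One way to watch the loop without a debugger is to count the deliveries.  The sketch below is not from gdb; count_refaults, count_sigfpe, MAX_DELIVERIES and the other names are made up for illustration.  It assumes a POSIX system and an architecture where integer division by zero traps (i386 and x86-64 do; some others, such as AArch64, do not).  The handler records si_addr, returns a few times so that the divide is restarted, and then escapes with siglongjmp so the program can finish.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

#define MAX_DELIVERIES 5            /* hypothetical bound for the demo */

static sigjmp_buf stop_point;
static volatile sig_atomic_t deliveries = 0;
static volatile sig_atomic_t same_address = 1;
static void *volatile first_fault_addr = NULL;

/* Count each SIGFPE and check that it reports the same faulting address.  */
static void count_sigfpe (int sig, siginfo_t *info, void *context)
{
  (void) sig;
  (void) context;
  if (deliveries == 0)
    first_fault_addr = info->si_addr;
  else if (info->si_addr != first_fault_addr)
    same_address = 0;
  if (++deliveries >= MAX_DELIVERIES)
    siglongjmp (stop_point, 1);     /* break the loop after a few rounds */
  /* Otherwise return: execution resumes at the faulting instruction.  */
}

/* Divide num by den and return how many SIGFPEs were delivered.  */
int count_refaults (int num, int den)
{
  volatile int n = num, d = den, q = 0;
  struct sigaction sa;

  memset (&sa, 0, sizeof sa);
  sa.sa_sigaction = count_sigfpe;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = SA_SIGINFO;
  sigaction (SIGFPE, &sa, NULL);

  deliveries = 0;
  same_address = 1;
  if (sigsetjmp (stop_point, 1) == 0)
    q = n / d;                      /* faults repeatedly when d == 0 */
  (void) q;
  return (int) deliveries;
}
```

    On i686-pc-linux-gnu, count_refaults (1, 0) should come back with MAX_DELIVERIES and with same_address still set, which is the endless stream described above, cut short after a few rounds.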
    
    The problem is that the CPU traps the instruction before it executes rather than after.  The CPU does this so that the operating system can provide a math emulator in software.  Thus, the kernel receives the address of the offending "divide" instruction, and when the program resumes after handle_sigfpe returns, it just does the "divide" over again.
    
    The Single UNIX Specification, version 3, blesses this behavior.  Its signal() documentation says:
    
      If and when the function [the signal handler] returns,
      if the value of sig was SIGFPE, SIGILL, or SIGSEGV or
      any other implementation-defined value corresponding
      to a computational exception, the behavior is undefined.
    
    So: the CPU faults with the program counter pointing to the faulting instruction rather than after it.  The kernel is not in the business of code-reading so it is not going to adjust the program counter to point after the instruction.  Our naive signal handler isn't doing that, either.  There's no simple place to just get past the instruction.
    
    At least on this architecture, it would actually be better to have *no* signal handler for SIGFPE rather than the one we have now, because as soon as a SIGFPE happens, gdb is dead anyway.
    
    More usefully, it looks like handle_sigfpe will have to call error() directly, rather than using the async event handling mechanism.
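    
    The point of calling error() directly is that the handler then leaves non-locally instead of returning to the faulting instruction.  Here is a minimal standalone sketch of that idea using sigsetjmp and siglongjmp.  It is not gdb code: run_guarded, recover_point, guard_active and the op_* helpers are hypothetical names, and the siglongjmp merely stands in for whatever unwinding error() performs.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recover_point;              /* hypothetical name */
static volatile sig_atomic_t guard_active = 0;

/* Leave the handler non-locally rather than returning to the
   instruction that faulted.  */
static void handle_sigfpe (int sig)
{
  (void) sig;
  if (guard_active)
    siglongjmp (recover_point, 1);
  /* No recovery point: restore the default action, so the next
     fault terminates the process instead of looping.  */
  signal (SIGFPE, SIG_DFL);
}

/* Run op; return 0 if it completed, -1 if SIGFPE interrupted it.  */
int run_guarded (void (*op) (void))
{
  struct sigaction sa;

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = handle_sigfpe;
  sigemptyset (&sa.sa_mask);
  sigaction (SIGFPE, &sa, NULL);

  /* Saving the signal mask (second argument 1) unblocks SIGFPE
     again when the handler jumps back here.  */
  if (sigsetjmp (recover_point, 1) != 0)
    {
      guard_active = 0;
      return -1;
    }
  guard_active = 1;
  op ();
  guard_active = 0;
  return 0;
}

/* Sample operations.  The volatile operands keep the division from
   being folded or discarded by the compiler.  */
void op_divide_ok (void)
{
  volatile int n = 6, d = 3, q;
  q = n / d;
  (void) q;
}

void op_divide_by_zero (void)   /* traps on i386 and x86-64 */
{
  volatile int n = 1, d = 0, q;
  q = n / d;
  (void) q;
}

void op_raise_sigfpe (void)     /* portable way to get a SIGFPE */
{
  raise (SIGFPE);
}
```

    With this shape, run_guarded (op_divide_by_zero) reports the failure once and the caller carries on, instead of spinning at the divide instruction.  The jump is reasonable here only because the fault is synchronous and the interrupted code is a plain arithmetic operation.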
    
    
      

http://sources.redhat.com/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gdb&pr=1056
