This is the mail archive of the gdb-prs@sources.redhat.com mailing list for the GDB project.



Re: threads/1153: gdb-5.3 + glibc-2.3.2 fail for my multithreaded program


The following reply was made to PR threads/1153; it has been noted by GNATS.

From: Mat Hostetter <mat at curl dot com>
To: Daniel Jacobowitz <drow at mvista dot com>
Cc: gdb-gnats at sources dot redhat dot com
Subject: Re: threads/1153: gdb-5.3 + glibc-2.3.2 fail for my multithreaded program
Date: 27 Mar 2003 10:47:50 -0500

 [sorry for the lack of info; I accidentally filed the ticket before
 I had pasted in the important info]
 
 I have a large, multithreaded program that doesn't work under gdb-5.3.
 I am running RedHat Linux 8.0 using glibc-2.3.2 (RedHat's official rpm
 upgrade) with all upgrade RPMs applied.
 
 I have also tried gdb-20030325 and gdb 5.2.1 (RedHat 8's stock gdb)
 with the same results.
 
 My hardware is a dual 2.8 GHz Xeon with hyperthreading enabled,
 although I see the same failure on a different machine, so the
 hardware seems not to matter.
 
 It is important to point out that:
 
 - My program runs fine outside gdb
 - gdb 5.3 runs the program fine on RedHat 6.1
 - I'm 95% sure gdb 5.3 worked before I installed RedHat's glibc upgrade RPM.
 
 So the problem may be a glibc bug, or it may be that gdb is making
 some technically invalid assumption that doesn't work with the new
 glibc.  But even if my main problem is somehow a glibc bug, there is
 definitely a minor gdb bug; see BUG2 below.
 
 Before I start, here is the obligatory version info:
 
     unicron:~$ uname -a
     Linux unicron.curl.com 2.4.18-27.8.0smp #1 SMP Fri Mar 14 05:47:33 EST 2003 i686 i686 i386 GNU/Linux
     unicron:~$ gcc --version
     gcc (GCC) 3.2 20020903 (Red Hat Linux 8.0 3.2-7)
     Copyright (C) 2002 Free Software Foundation, Inc.
     This is free software; see the source for copying conditions.  There is NO
     warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
     
     unicron:~$ /scratch/mat/gdb-5.3/gdb/gdb --version
     GNU gdb 5.3
     Copyright 2002 Free Software Foundation, Inc.
     GDB is free software, covered by the GNU General Public License, and you are
     welcome to change it and/or distribute copies of it under certain conditions.
     Type "show copying" to see the conditions.
     There is absolutely no warranty for GDB.  Type "show warranty" for details.
     This GDB was configured as "i686-pc-linux-gnu".
 
 And before I get into the details of what I know, here is a session
 showing just how quickly it fails.  It actually illustrates two bugs:
 
 unicron:/scratch/mat/trunk/bin$ /scratch/mat/gdb-5.3/gdb/gdb ./curl-builder
 GNU gdb 5.3
 Copyright 2002 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "i686-pc-linux-gnu"...
 (gdb) handle SIG32 noprint
 Signal        Stop      Print   Pass to program Description
 SIG32         No        No      Yes             Real-time event 32
 (gdb) run -s
 Starting program: /local/scratch/mat/trunk/bin/curl-builder -s
 Couldn't get registers: Operation not permitted.
 (gdb) info threads
 Segmentation fault
 
 
 As you can see, there are two related problems shown in the
 above session:
 
     BUG1) The "Operation not permitted" error.
     BUG2) gdb SEGVs during "info threads"
 
 [The "handle SIG32 noprint" I typed isn't important, it just silences
  a few SIG32 messages (I think that's the signal used by pthreads; and
  gdb too?)  Without it, it stops a few times on SIG32 before failing
  in the same way.  At least, it usually does; exactly when it fails
  changes sometimes if I have to type "cont", perhaps suggesting a race
  condition.  Perhaps the SIG32 message is very significant; are other
  pthreads users told about this?  It's very annoying.]
 
 
 I will say what I know about each bug.
 
 
 BUG1: The "Operation not permitted" error.
 
 Before you ask, I am running all processes as myself in the
 straightforward way.  There's no process attaching or setuid
 strangeness going on.
 
 What's failing is the call to ptrace(PTRACE_GETREGS,...) in fetch_regs
 in i386-linux-nat.c.  It works for a while but eventually fails with
 EPERM.
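
 For reference, the failing call has roughly the shape below.  This is
 only a minimal sketch, assuming x86 Linux; try_fetch_regs is a
 hypothetical helper written for illustration, not gdb's fetch_regs.
 Returning the errno value lets a caller tell EPERM apart from other
 failures such as ESRCH instead of only printing a message.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

/* Hypothetical helper (not part of gdb): read the general-purpose
   registers of the traced process PID into *REGS.
   Returns 0 on success, or the errno value set by ptrace on failure.  */
static int
try_fetch_regs (pid_t pid, struct user_regs_struct *regs)
{
  errno = 0;
  if (ptrace (PTRACE_GETREGS, pid, NULL, regs) == -1)
    {
      int saved = errno;   /* save before fprintf can clobber it */
      fprintf (stderr, "PTRACE_GETREGS on pid %ld failed: %s\n",
               (long) pid, strerror (saved));
      return saved;
    }
  return 0;
}
```

 A caller would pass the pid of a process it is already tracing and
 that is currently stopped; for any other pid the call is expected to
 fail and the helper returns a nonzero errno value.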
 
 I ran 'strace' on gdb to get a better feel for why this was failing.
 If you look for EPERM below you'll see the failing system call; I
 don't know anything beyond this.  Judging from my debugging sessions,
 the PTRACE_??? in the strace log below really means PTRACE_GETREGS.
 
 My best guess is that a subprocess (or thread) is somehow exiting and
 gdb isn't handling it properly.  There are a few comments in lin-lwp.c
 that might be relevant.  Indeed, when I get the "Operation not
 permitted" error, if I look at the pid for the thread created by the
 process spawned by gdb (one greater than the pid returned by gdb's
 vfork) it's a zombie ("defunct") process.  So maybe we are doing
 GETREGS on a process that has already exited.
 
 However, the program I'm running should have had no threads exit when
 gdb fails, so this is strange.
 
 It may also be telling that exiting gdb after the permission error,
 rather than typing "info threads", causes gdb to hang forever, sitting
 in a call to "wait4".  Perhaps this is also a missing child death
 notification?
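
 To illustrate what a "child death notification" looks like at the
 wait level, here is a small sketch.  The names child_state,
 classify_wait_status and poll_child are hypothetical and are not taken
 from gdb; the sketch only shows how a status word from wait4/waitpid
 distinguishes a stop from an exit, and how WNOHANG avoids blocking
 forever when no status is pending.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical classification of a wait status word.  */
enum child_state
{
  CHILD_STOPPED,    /* stopped by a signal, e.g. SIGTRAP under ptrace */
  CHILD_EXITED,     /* called exit or returned from main */
  CHILD_SIGNALED,   /* terminated by a signal */
  CHILD_UNKNOWN
};

static enum child_state
classify_wait_status (int status)
{
  if (WIFSTOPPED (status))
    return CHILD_STOPPED;
  if (WIFEXITED (status))
    return CHILD_EXITED;
  if (WIFSIGNALED (status))
    return CHILD_SIGNALED;
  return CHILD_UNKNOWN;
}

/* Poll child PID without blocking.  Returns 1 and fills *STATE if a
   status was collected, 0 if nothing is pending yet, -1 on error.
   Limitation of this sketch: it does not pass __WCLONE, which Linux
   requires in order to wait for clone children.  */
static int
poll_child (pid_t pid, enum child_state *state)
{
  int status;
  pid_t got = waitpid (pid, &status, WNOHANG | WUNTRACED);

  if (got == -1)
    return -1;
  if (got == 0)
    return 0;
  *state = classify_wait_status (status);
  return 1;
}
```

 A tracer that only ever expects CHILD_STOPPED and then blocks in a
 plain wait could sit there indefinitely if the exit status was
 already consumed or never arrives, which would be consistent with the
 hang described above, though I have not confirmed that this is what
 happens.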
 
 One thing I would like to understand better is why gdb uses SIGRTMIN.
 My understanding from other debugging is that pthreads uses the same
 signal (signal 32) for thread communication, and I wonder if this
 could cause any problems.  Just speculating.
 
 Anyway, here's the strace.  I can supply the full log on request,
 it's about 6000 lines long.  Look for EPERM:
 
 unicron:~$ strace -o ~/log.txt /scratch/mat/gdb-5.3/gdb/gdb ./curl-builder
 
 yields:
 
 fstat64(3, {st_mode=S_IFREG|0755, st_size=50808, ...}) = 0
 stat64("/local/scratch/mat/trunk/bin/curl-builder", {st_mode=S_IFREG|0755, st_size=50808, ...}) = 0
 stat64("/local/scratch/mat/trunk/bin/curl-builder", {st_mode=S_IFREG|0755, st_size=50808, ...}) = 0
 write(1, "Starting program: /local/scratch"..., 63) = 63
 vfork()                                 = 10579
 wait4(-1, [WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP], 0, NULL) = 10579
 --- SIGCHLD (Child exited) ---
 sigreturn()                             = ? (mask now [RTMIN])
 ptrace(PTRACE_???, 10579, 0, 0xbfffd7e0) = 0
 ptrace(PTRACE_PEEKUSER, 10579, offsetof(struct user, u_debugreg) + 24, [0]) = 0
 fcntl64(0, F_GETFL)                     = 0x2 (flags O_RDWR)
 brk(0)                                  = 0x828f000
 brk(0x8292000)                          = 0x8292000
 ioctl(0, SNDCTL_TMR_TIMEBASE, {B38400 opost isig icanon echo ...}) = 0
 ioctl(0, 0x540f, [10575])               = 0
 ioctl(0, SNDCTL_TMR_TIMEBASE, {B38400 opost isig icanon echo ...}) = 0
 fcntl64(0, F_SETFL, O_RDONLY)           = 0
 fcntl64(0, F_SETFL, O_RDONLY)           = 0
 ioctl(0, SNDCTL_TMR_START, {B38400 opost isig icanon echo ...}) = 0
 ioctl(0, SNDCTL_TMR_TIMEBASE, {B38400 opost isig icanon echo ...}) = 0
 ioctl(0, 0x5410, [10579])               = 0
 ptrace(PTRACE_CONT, 10579, 0, SIG_0)    = 0
 wait4(-1, [WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP], 0, NULL) = 10579
 --- SIGCHLD (Child exited) ---
 sigreturn()                             = ? (mask now [RTMIN])
 ptrace(PTRACE_???, 10579, 0, 0xbfffd7e0) = 0
 
 ...[lots of stuff elided]...
 
 ptrace(PTRACE_PEEKTEXT, 10579, 0x8049f2c, [0x2d322d65]) = 0
 ptrace(PTRACE_PEEKTEXT, 10579, 0x8049f30, [0x6f732e31]) = 0
 ptrace(PTRACE_PEEKTEXT, 10579, 0x8049f34, [0]) = 0
 fcntl64(0, F_SETFL, O_RDWR)             = 0
 fcntl64(0, F_SETFL, O_RDWR)             = 0
 ioctl(0, SNDCTL_TMR_START, {B38400 opost isig icanon echo ...}) = 0
 ioctl(0, SNDCTL_TMR_TIMEBASE, {B38400 opost isig icanon echo ...}) = 0
 ioctl(0, 0x5410, [10579])               = 0
 ptrace(PTRACE_PEEKTEXT, 10579, 0x4000bb00, [0x5de58955]) = 0
 ptrace(PTRACE_SINGLESTEP, 10579, 0, SIG_0) = 0
 --- SIGCHLD (Child exited) ---
 sigreturn()                             = ? (mask now [RTMIN])
 wait4(-1, [WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP], 0, NULL) = 10579
 ptrace(PTRACE_???, 10579, 0, 0xbfffd910) = 0
 ptrace(PTRACE_PEEKUSER, 10579, offsetof(struct user, u_debugreg) + 24, [0xffff4ff0]) = 0
 ptrace(PTRACE_PEEKTEXT, 10579, 0x4000bb00, [0x5de58955]) = 0
 ptrace(PTRACE_PEEKTEXT, 10579, 0x4000bb00, [0x5de58955]) = 0
 ptrace(PTRACE_POKEDATA, 10579, 0x4000bb00, 0x5de589cc) = 0
 ptrace(PTRACE_CONT, 10579, 0, SIG_0)    = 0
 wait4(-1, [WIFSTOPPED(s) && WSTOPSIG(s) == SIGRTMIN], 0, NULL) = 10579
 --- SIGCHLD (Child exited) ---
 sigreturn()                             = ? (mask now [RTMIN])
 ptrace(PTRACE_???, 10579, 0, 0xbfffd910) = 0
 ptrace(PTRACE_PEEKUSER, 10579, offsetof(struct user, u_debugreg) + 24, [0xffff4ff0]) = 0
 ptrace(PTRACE_CONT, 10579, 0, SIGRTMIN) = 0
 wait4(-1, [WIFSTOPPED(s) && WSTOPSIG(s) == SIGRTMIN], 0, NULL) = 10579
 --- SIGCHLD (Child exited) ---
 sigreturn()                             = ? (mask now [RTMIN])
 ptrace(PTRACE_???, 10579, 0, 0xbfffd910) = 0
 ptrace(PTRACE_PEEKUSER, 10579, offsetof(struct user, u_debugreg) + 24, [0xffff4ff0]) = 0
 ptrace(PTRACE_CONT, 10579, 0, SIGRTMIN) = 0
 wait4(-1, [WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP], 0, NULL) = 10579
 --- SIGCHLD (Child exited) ---
 sigreturn()                             = ? (mask now [RTMIN])
 ptrace(PTRACE_???, 10579, 0, 0xbfffd910) = -1 EPERM (Operation not permitted)
 
 
 
 
 
 BUG2: gdb SEGVs during "info threads"
 
 gdb crashes because 'selected_frame' is NULL, a case that this code
 in 'info_threads_command' really isn't expecting:
 
   cur_frame = find_relative_frame (selected_frame, &counter);
 
 I don't know what the right fix is, but it seems to me that either:
 
    1) 'info_threads_command' should check for a NULL selected_frame
       and print out a message about "no stack" or somesuch, or
       some kind of warning; or
 
    2) if some invariant was broken elsewhere that 'info_threads_command'
       shouldn't have to worry about, that code needs to be changed
       to preserve the invariant.  'selected_frame' being NULL is no
       doubt caused by the EPERM from ptrace described above.  But a
       failed call to 'ptrace' should not be able to make gdb SEGV.
 
 Program received signal SIGSEGV, Segmentation fault.
 0x0807e28a in get_next_frame (frame=0x0) at blockframe.c:287
 (top-gdb) bt
 #0  0x0807e28a in get_next_frame (frame=0x0) at blockframe.c:287
 #1  0x080b3401 in find_relative_frame (frame=0x0, level_offset_ptr=0xbfffdccc)
     at stack.c:1608
 #2  0x080b43ab in info_threads_command (arg=0x0, from_tty=1) at thread.c:471
 #3  0x08074306 in do_cfunc (c=0x0, args=0x0, from_tty=1) at cli/cli-decode.c:53
 #4  0x08075c12 in cmd_func (cmd=0x8237188, args=0x0, from_tty=1)
     at cli/cli-decode.c:1523
 #5  0x080ebb42 in execute_command (p=0x82286dd "", from_tty=1) at top.c:711
 #6  0x080b5d4d in command_handler (command=0x82286d8 "i thr")
     at event-top.c:504
 #7  0x080b616c in command_line_handler (rl=0x827eb28 "\030\353'\b")
     at event-top.c:799
 #8  0x08183845 in rl_callback_read_char () at callback.c:114
 #9  0x080b5707 in rl_callback_read_char_wrapper (client_data=0x0)
     at event-top.c:168
 #10 0x080b5c5a in stdin_event_handler (error=0, client_data=0x0)
     at event-top.c:418
 #11 0x080b502c in handle_file_event (event_file_desc=22992) at event-loop.c:714
 #12 0x080b4b74 in process_event () at event-loop.c:334
 #13 0x080b4bbc in gdb_do_one_event (data=0x0) at event-loop.c:371
 #14 0x080eb812 in do_catch_errors (uiout=0x8248070, data=0x0) at top.c:492
 #15 0x080eb770 in catcher (func=0x80eb804 <do_catch_errors>, 
     func_uiout=0x8248070, func_args=0xbfffdfc0, func_val=0xbfffdfb8, 
     func_caught=0xbfffdfbc, errstring=0x0, mask=6) at top.c:424
 #16 0x080eb848 in catch_errors (func=0, func_args=0x0, errstring=0x819b8b2 "", 
     mask=6) at top.c:504
 #17 0x080b4bdf in start_event_loop () at event-loop.c:395
 #18 0x08072532 in captured_command_loop (data=0x0) at main.c:96
 #19 0x080eb812 in do_catch_errors (uiout=0x8248070, data=0x0) at top.c:492
 #20 0x080eb770 in catcher (func=0x80eb804 <do_catch_errors>, 
     func_uiout=0x8248070, func_args=0xbfffe140, func_val=0xbfffe138, 
     func_caught=0xbfffe13c, errstring=0x0, mask=6) at top.c:424
 #21 0x080eb848 in catch_errors (func=0, func_args=0x0, errstring=0x819b8b2 "", 
     mask=6) at top.c:504
 #22 0x08072e7e in captured_main (data=0x0) at main.c:729
 #23 0x080eb812 in do_catch_errors (uiout=0x820b640, data=0x0) at top.c:492
 #24 0x080eb770 in catcher (func=0x80eb804 <do_catch_errors>, 
     func_uiout=0x820b640, func_args=0xbfffe3f0, func_val=0xbfffe3e8, 
     func_caught=0xbfffe3ec, errstring=0x0, mask=6) at top.c:424
 #25 0x080eb848 in catch_errors (func=0, func_args=0x0, errstring=0x819b8b2 "", 
     mask=6) at top.c:504
 #26 0x08072f83 in gdb_main (args=0x59d0) at main.c:738
 #27 0x080724ec in main (argc=0, argv=0x0) at gdb.c:33
 #28 0x4009d907 in __libc_start_main () from /lib/libc.so.6
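
 Option 1 above could look something like the sketch below.  It is not
 gdb source: struct frame and print_selected_frame_level are
 hypothetical stand-ins, and the "No stack." wording is only an
 assumption about what a suitable message might be.  The point is
 simply that the NULL case is checked before the frame is used, so the
 walk that crashes in get_next_frame (frame=0x0) is never started.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for gdb's frame type; the real structure
   carries far more state.  */
struct frame
{
  int level;
};

/* Stand-in for the global that the backtrace shows to be NULL.  */
static struct frame *selected_frame = NULL;

/* Print the level of the selected frame to OUT.
   Returns 0 on success, or -1 after printing a message when there is
   no selected frame, instead of dereferencing a NULL pointer.  */
static int
print_selected_frame_level (FILE *out)
{
  if (selected_frame == NULL)
    {
      fprintf (out, "No stack.\n");
      return -1;
    }
  fprintf (out, "Selected frame is at level %d.\n", selected_frame->level);
  return 0;
}
```

 Whether the check belongs in info_threads_command itself or in the
 code that lets selected_frame become NULL after the failed ptrace call
 is exactly the choice between options 1 and 2.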

