This is the mail archive of the
archer@sourceware.org
mailing list for the Archer project.
Re: Q: mutlithreaded tracees && clone/exit
- From: Jan Kratochvil <jan dot kratochvil at redhat dot com>
- To: Oleg Nesterov <oleg at redhat dot com>
- Cc: archer at sourceware dot org
- Date: Mon, 19 Jul 2010 18:01:27 +0200
- Subject: Re: Q: mutlithreaded tracees && clone/exit
On Fri, 16 Jul 2010 22:51:47 +0200, Oleg Nesterov wrote:
> In case this matters, I used gdb-7.1 for testing.
FSF GDB (not Fedora/RHEL GDB) probably.
> Q1: if gdbstub reported that the tracee has exited (say, we sent
> '$X09#..' to gdb), can gdbstub assume it can forget about this thread?
`X' is about processes, not threads ('W'=TARGET_WAITKIND_EXITED,
'X'=TARGET_WAITKIND_SIGNALLED).
Thread death is handled by the GDB-driven 'T' packet (remote_thread_alive).
(I have mostly just read the GDB sources; I have no hands-on experience with
the remote GDB code.)
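To make the packet shapes concrete, here is a minimal Python sketch of the
framing and of the first-letter distinction described above. The helper names
frame_packet and classify_stop_reply are made up for this illustration and are
not GDB or gdbserver code. The checksum rule (sum of the payload bytes modulo
256, written as two hex digits) is the one the remote protocol uses.

```python
def frame_packet(payload: str) -> str:
    """Wrap a payload as $payload#checksum (hypothetical helper)."""
    # Checksum: sum of all payload bytes modulo 256, two lowercase hex digits.
    checksum = sum(payload.encode("ascii")) % 256
    return f"${payload}#{checksum:02x}"


def classify_stop_reply(payload: str) -> str:
    """Map the first letter of a stop reply to what it reports."""
    kinds = {
        "W": "process exited (TARGET_WAITKIND_EXITED)",
        "X": "process killed by signal (TARGET_WAITKIND_SIGNALLED)",
        # As a stop reply 'T' reports a stopped thread; this is distinct from
        # the 'T' thread-alive query that GDB sends to the stub.
        "T": "thread stopped with signal",
    }
    return kinds.get(payload[:1], "other")


if __name__ == "__main__":
    print(frame_packet("X09"))
    print(classify_stop_reply("X09"))
```

Neither 'W' nor 'X' carries a thread id in this basic form, which matches the
point that they describe the process rather than a single thread.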
> I mean, can it assume that gdb won't send something like 'D;EXITED_PID'?
TARGET_WAITKIND_EXITED and TARGET_WAITKIND_SIGNALLED in
handle_inferior_event() call target_mourn_inferior(), this is very terminal.
> Looking at gdb sources/behaviour, I think the answer is yes, it can
> forget. But I'd like to have the confirmation.
Yes, I think so too, but I cannot give an authoritative confirmation.
> And. I'd like to let you know that gdb is buggy ;)
Please file those bugs while discussing them here:
http://sourceware.org/bugzilla/enter_bug.cgi?product=gdb
> The user presses ^C, gdb sends 3 and waits for reply. Suppose that
> gdbstub doesn't reply immediately.
IMHO the remote GDB protocol and non-stop mode are primarily tested with
Eclipse-over-MI, so bugs hit only by the GDB CLI are going to be very common.
> I noticed this bug when I found another problem, gdb+gdbserver doesn't
> work correctly if the main thread exits. But let's forget about this
> problem for now.
This does not work well even with linux-nat.c (local GDB). At the current
development stage of ugdb I believe we do not have to solve it before
linux-nat.c gets fixed:
GDB hangs with simple multi-threaded program on linux
http://sourceware.org/ml/gdb/2010-07/msg00045.html
> The main question is, I do not understand how gdbstub should handle the
> multithreaded targets.
[...]
> (gdb) file test1
> (gdb) target extended-remote :2000
> (gdb) attach 16927
> Attached to process 16927
> ...
> 0x00000033af60e57d in pause () from /lib64/libpthread.so.0
> (gdb)
>
> OK. gdbserver ptraces both threads. But afaics gdb doesn't know this
> program is multithreaded,
> Q2: Shouldn't gdbstub let debugger know about sub-threads somehow?
GDB did not ask for it, so why should gdbserver tell GDB about it?
(gdb) info threads
[New Thread 14739.14740] <-- GDB has noticed it now.
2 Thread 14739.14740 0x000000349e8a6a6d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 14739.14739 0x000000349f007fbd in pthread_join (threadid=140515741927184, thread_return=0x0) at pthread_join.c:89
Eclipse apparently does `info threads' over MI, so it is not a problem.
Also, as you state, in non-stop mode GDB asks for the thread list anyway.
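As a rough sketch of the stub side of such a thread list request: the remote
protocol documents the query packets qfThreadInfo/qsThreadInfo (these names
come from the protocol documentation, not from this thread), answered with
'm' followed by comma-separated thread ids, and finally 'l' for end of list.
The function name, the max_payload limit and the assumption that the
multiprocess 'pPID.TID' thread-id syntax is in use are all illustrative.

```python
def thread_list_replies(threads, max_payload=64):
    """Yield reply payloads: 'm<id>,<id>...' chunks, then 'l' (end of list).

    threads is an iterable of (pid, tid) pairs; ids are written in hex using
    the multiprocess syntax pPID.TID (assumed to be negotiated already).
    """
    chunk = ""
    for pid, tid in threads:
        item = f"p{pid:x}.{tid:x}"
        candidate = item if not chunk else f"{chunk},{item}"
        # +1 accounts for the leading 'm' of the reply payload.
        if chunk and len(candidate) + 1 > max_payload:
            yield "m" + chunk
            chunk = item
        else:
            chunk = candidate
    if chunk:
        yield "m" + chunk
    yield "l"


if __name__ == "__main__":
    for reply in thread_list_replies([(14739, 14739), (14739, 14740)]):
        print(reply)
```

The first chunk would answer qfThreadInfo and each later one a qsThreadInfo,
so the stub only ever speaks when GDB asks.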
> gdbserver resumes both threads. Press enter, the sub-thread exits.
>
> And nothing happens! gdbserver sends nothing to gdb, it just reaps
> the tracee silently:
...
> Q3: is it correct? shouldn't we inform the debugger?
GDB will sooner or later use the 'T' packet (remote_thread_alive) to reclaim
dead threads.
With libthread_db (linux-thread-db.c) it just sets
thread_info->private->dying = 1; on TD_DEATH anyway and keeps tracking the
thread until its kernel task finally dies.
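A minimal sketch of how a stub could answer that 'T' query, assuming it keeps
a set of (pid, tid) pairs for the tasks it still traces. The function names
and the live_threads set are hypothetical; the "OK" and "E01" replies are the
ones discussed in this thread. The special thread ids (such as -1) and a bare
'pPID' are deliberately not handled here.

```python
def parse_thread_id(text: str):
    """Parse 'pPID.TID' or a plain 'TID' (hex numbers) into (pid, tid)."""
    if text.startswith("p"):
        pid_text, _, tid_text = text[1:].partition(".")
        return int(pid_text, 16), int(tid_text, 16)
    return None, int(text, 16)


def handle_thread_alive(payload: str, live_threads: set) -> str:
    """Answer a 'T<thread-id>' query: 'OK' if still tracked, else 'E01'."""
    if not payload.startswith("T"):
        raise ValueError("not a thread-alive query")
    pid, tid = parse_thread_id(payload[1:])
    if pid is None:
        # No process given: match the tid in any traced process.
        alive = any(t == tid for _, t in live_threads)
    else:
        alive = (pid, tid) in live_threads
    return "OK" if alive else "E01"


if __name__ == "__main__":
    live = {(0x3993, 0x3993)}  # the sub-thread 0x3994 has already been reaped
    print(handle_thread_alive("Tp3993.3993", live))
    print(handle_thread_alive("Tp3993.3994", live))
```

With this shape the stub can reap an exited sub-thread silently and simply
drop it from the set; GDB finds out the next time it asks.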
> So. Afaics, gdb can only find the new thread if the user does
> "info threads", or if this thread reports somehow about itself
> (say, it gets a signal and gdbserver sends "$T..." with its tid).
Yes, GDB is the master of the remote protocol communication, not gdbserver.
> Also. gdb can't know the sub-thread has exited unless the user
> does "info threads" again, or something like "$TpPID.TGID" gets
> "E01" in reply.
>
> Correct?
>
> Q4: is this what we want to implement?
IMO yes, we should first get ugdb a bit on par with linux-nat.c, shouldn't we?
> I am asking because I thought that gdb+gdbserver should
> try to work the same way as it works without gdbserver, and
> thus it should see clone/exit.
GDB has two lists of "threads":
Real libpthread / libthread_db / linux-thread-db.c / struct thread_info *,
which is the one primarily used. The name is displayed by
thread_db_pid_to_str().
Then there are also kernel tasks / linux-nat.c / struct lwp_info *, which are
used when libthread_db is not available. This second category IIRC does
not work as well, since linux-thread-db.c is more commonly in use (I do not
have a failing testcase offhand, it may work). The name is displayed by
linux_nat_pid_to_str().
Both types of "threads" are displayed by GDB CLI `info threads'. Their name
format differs a bit, according to the display name function (to_pid_to_str).
> However, gdbserver sends nothing to gdb if the tracee does
> pthread_create() or pthread_exit().
Yes.
On Sun, 18 Jul 2010 19:48:51 +0200, Oleg Nesterov wrote:
> gdbserver tracks PTRACE_EVENT_CLONE, yes. But it doesn't inform gdb.
IMO we can tune the non-libpthread mode later; AFAIK it does not work well
with linux-nat.c anyway.
> > gdb also uses higher-level knowledge read from user memory
> > (libthread_db) for some aspects of thread tracking.
>
> Well, yes and no (if I understood your message correctly).
>
> I have already looked at this code in horror. I really hope this magic
> is not needed for our purposes.
>
> It is gdbserver, not gdb, who uses libthread_db to find sub-threads and
> do other things.
>
> gdbserver asks gdb what is the symbol's address (say, _thread_db_list_t_next)
> via 'qSymbol'.
I see this can be a problem for ugdb. I guess we will need to change GDB to
support a new variant of proc-service.c working over the GDB protocol wire.
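For reference, the existing 'qSymbol' exchange can be sketched as follows,
going by my reading of the remote protocol documentation: the stub asks with
"qSymbol:" plus the hex-encoded symbol name, and GDB answers with
"qSymbol:<value>:<hex-encoded name>", leaving the value empty when the symbol
is unknown. The function names are invented for this sketch and the example
symbol name is arbitrary.

```python
def encode_symbol_request(name: str) -> str:
    """Stub -> GDB: ask for the address of a symbol (name is hex-encoded)."""
    return "qSymbol:" + name.encode("ascii").hex()


def decode_symbol_reply(payload: str):
    """GDB -> stub: 'qSymbol:<value>:<hex name>' into (name, value or None)."""
    prefix, value_text, name_hex = payload.split(":", 2)
    if prefix != "qSymbol":
        raise ValueError("not a qSymbol reply")
    name = bytes.fromhex(name_hex).decode("ascii")
    # An empty value field means GDB could not resolve the symbol.
    value = int(value_text, 16) if value_text else None
    return name, value


if __name__ == "__main__":
    request = encode_symbol_request("nptl_version")
    print(request)
    # A made-up reply carrying a made-up address, only to show the parsing.
    print(decode_symbol_reply("qSymbol:1000:" + request.split(":", 1)[1]))
```

This only covers symbol lookup; the memory reads that proc-service.c also
needs would have to travel over the wire in some comparable way.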
> However, there is the complication I already mentioned. If the main
> thread exits, this confuses gdbserver at least.
As replied above, this is first of all a GDB bug even with linux-nat.c. It was
fixed in Fedora GDB before, but for some cases it apparently still does not
work.
Thanks,
Jan