This is the mail archive of the gdb@sources.redhat.com mailing list for the GDB project.



Re: Linux kernel problem -- food for thoughts


On Wed, Apr 16, 2003 at 10:43:12AM -0400, Elena Zannoni wrote:
> Daniel Jacobowitz writes:
>  > On Wed, Apr 16, 2003 at 10:24:03AM -0400, Elena Zannoni wrote:
>  > > 
>  > > Gdb is currently having a 'little problem' backtracing out of system
>  > > calls in x86 kernels which support NPTL. I think the current public
>  > > 2.5 kernel would make this problem show up.
>  > > 
>  > > Right now, if you are in system calls the backtrace will show up as:
>  > > 
>  > >  0xffffe002 in ??
>  > 
>  > I was just thinking about this.  My reaction is:
>  >   - the page needs to be readable; I vaguely remember badgering Linus
>  > about this and getting it fixed, but it might have been someone else,
>  > or it might not have gotten fixed.
>  >   - GDB needs to get the location of the EH information from glibc
>  > somehow.  My instinct is to make glibc export this in a global symbol,
>  > just like the way we get signal numbers from linuxthreads.
>  > 
>  > How does that sound?
> 
> Roland (but I'll let him speak) has had a thought about creating a
> /proc/pid/vsyscall file, which gdb could then read with add-symbol-file....
> 
> The page is readable right now in 2.5, and the patch for the .eh_frame
> has been integrated.
> 
> Core files will also need to be addressed.

Oww.  Should we just include the page in core dumps?  That might be
the simplest solution.

Roland, I think that doing it in glibc is a better idea than doing it
in /proc somewhere; it will make remote debugging and core debugging
more straightforward.

>  > Note that we don't use eh information on i386 yet.  We need to fix
>  > that.  I tried once and got distracted by another project, I think :)
> 
> Yep, of course.
> 
> elena
> 
> 
>  > 
>  > > 
>  > > Here is an explanation of the problem that Roland has provided:
>  > > 
>  > > ---------------
>  > > Previously asm or C code in libc entered the kernel by setting some
>  > > registers and using the "int $0x80" instruction.  e.g.
>  > > 
>  > > 00000000 <__getpid>:
>  > >    0:	b8 14 00 00 00       	mov    $0x14,%eax
>  > >    5:	cd 80                	int    $0x80
>  > >    7:	c3                   	ret    
>  > > 
>  > > That is the function called __getpid in libc, the pre-NPTL build.  (In the
>  > > shared library you will see this if you've run with LD_ASSUME_KERNEL=2.4.1
>  > > so that /lib/i686/libc.so.6 is what you are using.)
>  > > 
>  > > In the new libc (/lib/tls/libc.so.6), that function looks like this:
>  > > 
>  > > 00000000 <__getpid>:
>  > >    0:	b8 14 00 00 00       	mov    $0x14,%eax
>  > >    5:	65 ff 15 10 00 00 00 	call   *%gs:0x10
>  > >    c:	c3                   	ret    
>  > > 
>  > > %gs:0x10 is a location that has been initialized to a kernel-supplied
>  > > special entry point address.  In the current kernels, that address is
>  > > always 0xffffe000.  But that is not part of the ABI, which is why it's
>  > > indirect instead of a literal "call 0xffffe000".  The kernel supplies the
>  > > actual entry point address to libc at startup time, and nothing in the
>  > > kernel-user interface prevents it from using a different address in each
>  > > process if it chose to.
>  > > 
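To make the encoding of that indirect call concrete, here is a minimal
sketch in C of how a tool might recognize the call site from its bytes.
This is an illustration only, not something GDB is known to do; the name
decode_gs_indirect_call is hypothetical.  It matches the %gs segment
prefix (0x65), opcode 0xff with ModRM byte 0x15 (an indirect call through
a 32-bit absolute displacement), and reads the little-endian displacement.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: recognize "call *%gs:disp32" (65 ff 15 <disp32>).
   Returns 1 and stores the displacement on a match, 0 otherwise.  */
int
decode_gs_indirect_call (const uint8_t *insn, size_t len, uint32_t *disp)
{
  if (len < 7)
    return 0;
  if (insn[0] != 0x65 || insn[1] != 0xff || insn[2] != 0x15)
    return 0;
  /* disp32 is stored little-endian after the ModRM byte.  */
  *disp = (uint32_t) insn[3] | ((uint32_t) insn[4] << 8)
          | ((uint32_t) insn[5] << 16) | ((uint32_t) insn[6] << 24);
  return 1;
}

/* Bytes at offset 5 of __getpid in the /lib/tls listing quoted above.  */
static const uint8_t getpid_call[] =
  { 0x65, 0xff, 0x15, 0x10, 0x00, 0x00, 0x00 };
```

Fed the seven bytes from the listing, the helper reports displacement
0x10, the %gs-relative slot that holds the kernel-supplied entry point.
The address stored in that slot still has to be read from the process;
nothing here assumes it is 0xffffe000.
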
>  > > The reason for this is that there can be multiple ways to enter the kernel,
>  > > not just the "int $0x80" trap instruction.  Some kernels on some hardware
>  > > may use a different method that performs better.  By using this
>  > > kernel-supplied entry point address, no user code has to be changed to
>  > > select the method.  It's entirely the kernel's choice.
>  > > 
>  > > In all the RH kernels we have right now, the entry point page contains:
>  > > 
>  > > 	0xffffe000:	int $0x80
>  > > 	0xffffe002:	ret
>  > > 
>  > > But user code cannot presume what this code sequence looks like exactly.
>  > > It will be some sequence of register and stack moves and special trap
>  > > instructions, but you have to disassemble to know exactly.  In the case
>  > > above, the PC value seen while a thread is in the kernel is 0xffffe002.
>  > > You can disassemble the "ret" there and see that you have to pop the PC off
>  > > the stack to recover the caller's frame.  
>  > > 
>  > > Another example of what this code might look like when you disassemble it is:
>  > > 
>  > > 	0xffffe000:	push   %ecx
>  > > 	0xffffe001:	push   %edx
>  > > 	0xffffe002:	push   %ebp
>  > > 	0xffffe003: 	mov    %esp,%ebp
>  > > 	0xffffe005: 	sysenter 
>  > > 	0xffffe007:	nop    
>  > > 	0xffffe008:	nop    
>  > > 	0xffffe009:	nop    
>  > > 	0xffffe00a:	nop    
>  > > 	0xffffe00b:	nop    
>  > > 	0xffffe00c:	nop    
>  > > 	0xffffe00d:	nop    
>  > > 	0xffffe00e: 	jmp    0xffffe003
>  > > 	0xffffe010:	pop    %ebp
>  > > 	0xffffe011:	pop    %edx
>  > > 	0xffffe012:	pop    %ecx
>  > > 	0xffffe013:	ret    
>  > > 
>  > > In this example, depending on what happened inside the kernel the PC you
>  > > usually see may be either 0xffffe00e or 0xffffe010.  If the process gets a
>  > > signal, or you attach asynchronously, and so forth, the PC might be at any of
>  > > the earlier instructions as well.  You cannot rely on exactly what the
>  > > sequence is, so you must be able to disassemble from where you are and
>  > > cope.  In this case you will most often see 0xffffe010, in which case you
>  > > need to pop those three registers and the PC off the stack to restore the
>  > > caller's frame.
>  > > 
>  > > So, these cases are like a leaf function with no debugging info.  The
>  > > first solution idea was interpreting the epilogue code.  It will
>  > > probably be safe to assume that it looks like epilogue code normally
>  > > does, i.e. register pops and not any arbitrary instructions.
>  > > 
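The epilogue-interpretation idea can be sketched roughly as follows, in C.
This is a toy, assuming the epilogue consists only of one-byte
"pop %reg" instructions (opcodes 0x58 to 0x5f) followed by "ret" (0xc3);
the function names and the byte arrays (transcribed by hand from the two
listings above) are made up for illustration and are not GDB code.

```c
#include <stddef.h>
#include <stdint.h>

/* Scan forward from OFFSET in CODE, counting "pop %reg" (0x58..0x5f)
   instructions until a "ret" (0xc3).  Returns the number of pops, or -1
   if any other byte is seen or the buffer ends first.  */
int
count_epilogue_pops (const uint8_t *code, size_t len, size_t offset)
{
  int pops = 0;
  for (size_t i = offset; i < len; i++)
    {
      if (code[i] == 0xc3)
        return pops;
      if (code[i] >= 0x58 && code[i] <= 0x5f)
        pops++;
      else
        return -1;
    }
  return -1;
}

/* Address of the stack slot holding the caller's PC (4-byte slots).  */
uint32_t
return_address_slot (uint32_t esp, int pops)
{
  return esp + 4u * (uint32_t) pops;
}

/* The two example entry pages from the message, as raw bytes.  */
static const uint8_t int80_page[] = { 0xcd, 0x80, 0xc3 };
static const uint8_t sysenter_page[] = {
  0x51, 0x52, 0x55,                         /* push %ecx; %edx; %ebp */
  0x89, 0xe5,                               /* mov %esp,%ebp */
  0x0f, 0x34,                               /* sysenter */
  0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, /* 7 x nop */
  0xeb, 0xf3,                               /* jmp 0xffffe003 */
  0x5d, 0x5a, 0x59,                         /* pop %ebp; %edx; %ecx */
  0xc3                                      /* ret */
};
```

Starting at offset 0x10 of the sysenter page, the scan finds three pops,
so the caller's PC sits at %esp + 12.  Starting at offset 2 of the int
$0x80 page it finds none.  At offset 0x0e it gives up on the jmp, which
shows why a pure epilogue scan cannot cover every PC in the page.
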
>  > > Another solution I was considering is to have the system somewhere provide
>  > > DWARF unwind info matching the possible PC addresses in the vsyscall page.
>  > > I am now pretty sure this is the way to go.  The recent development is that
>  > > NPTL now needs .eh_frame information for these PCs as well, and Ulrich has
>  > > made a kernel change to provide it.  The .eh_frame info for the vsyscall
>  > > PCs is on the same read-only kernel page.  The C library now uses this as
>  > > if the vsyscall page were a DSO with .eh_frame info to register, so that
>  > > exception-style unwinding from any valid PC in a magic entry point works.
>  > > 
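For a sense of what per-PC unwind info buys over disassembly, here is a
rough sketch in C of the information such a table carries for the
sysenter example above.  The rows are derived by hand from that listing
and are NOT the kernel's actual .eh_frame contents; the struct and
function names are hypothetical.  Each row says where the caller's PC
lives relative to %esp for a range of PCs.

```c
#include <stddef.h>
#include <stdint.h>

/* For PCs in [start, end), the caller's PC is at %esp + ra_offset.  */
struct unwind_row
{
  uint32_t start;
  uint32_t end;
  int ra_offset;
};

/* Hand-derived from the sysenter listing; not real kernel data.  */
static const struct unwind_row sysenter_rows[] = {
  { 0xffffe000u, 0xffffe001u, 0 },   /* nothing pushed yet */
  { 0xffffe001u, 0xffffe002u, 4 },   /* after push %ecx */
  { 0xffffe002u, 0xffffe003u, 8 },   /* after push %edx */
  { 0xffffe003u, 0xffffe011u, 12 },  /* after push %ebp, up to pop %ebp */
  { 0xffffe011u, 0xffffe012u, 8 },   /* after pop %ebp */
  { 0xffffe012u, 0xffffe013u, 4 },   /* after pop %edx */
  { 0xffffe013u, 0xffffe014u, 0 },   /* at ret */
};

/* Returns the offset of the return address from %esp, or -1 if PC is
   outside the ranges the table describes.  */
int
lookup_ra_offset (uint32_t pc)
{
  size_t n = sizeof sysenter_rows / sizeof sysenter_rows[0];
  for (size_t i = 0; i < n; i++)
    if (pc >= sysenter_rows[i].start && pc < sysenter_rows[i].end)
      return sysenter_rows[i].ra_offset;
  return -1;
}
```

Unlike an epilogue scan, the table gives an answer for every PC in the
sequence, including 0xffffe00e (the jmp) and the prologue pushes, without
the debugger needing to understand the instructions.
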
>  > > So, there is a .eh_frame section available for this code, and getting it
>  > > from where it is into gdb can be done by hook or by crook.  I have the
>  > > impression that gdb turning an available .eh_frame section into happy
>  > > backtraces is something that might be expected real soon now.  
>  > > Sounds like a winner.
>  > > 
>  > > I think that elucidates all but the dreariest bits of the technical issues.
>  > > Now the practical questions.  Oh, one dreary bit: 83172 mostly talks about
>  > > the fact that ptrace refuses to read the 0xffffe000 page for you, which is
>  > > presumed a prerequisite for dealing with the real can of worms (unwinding).
>  > > 
>  > > --------------------
>  > > 
>  > > 
>  > > I think right now the public 2.5 kernel has a fix to make the page
>  > > readable, and another one to provide the .eh_frame information. There
>  > > is no mechanism yet to make that debug info accessible to gdb.
>  > > 
>  > > 
>  > > elena
>  > > 
>  > 
>  > -- 
>  > Daniel Jacobowitz
>  > MontaVista Software                         Debian GNU/Linux Developer
> 

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

