This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



Re: [PATCH] Linux kernel thread runtime support


Hi Peter

On Wed, 25 Jan 2017 12:43:39 +0000
Peter Griffin <peter.griffin@linaro.org> wrote:

> Hi Philipp,
> 
> On Thu, 12 Jan 2017, Philipp Rudo wrote:
> 
> > Hi Peter
> > 
> > I finally managed to send my patches. I would appreciate it if you
> > could have a look at them. Any feedback is welcome.  
> 
> OK great, I will take a look at them. OOI can you post a GIT tree link
> so I can easily pull your patches and have a play around?

Thanks. Currently I'm working on a v2 incorporating Pedro's and Luis's
comments. When it's done (most likely at the beginning of next week) I
(or rather Andreas) will push the commits to a private branch of the
GDB repository. By then I hopefully will also have found a way to make
a dump available to you.

> > 
> > 
> > On Tue, 10 Jan 2017 15:55:56 +0000
> > Peter Griffin <peter.griffin@linaro.org> wrote:
> > 
> > [...]
> >   
> > > > > void
> > > > > lkthread_get_per_cpu_offsets(int numcores)
> > > > > {
> > > > >   enum bfd_endian byte_order = gdbarch_byte_order (target_gdbarch ());
> > > > >   int length = TYPE_LENGTH (builtin_type (target_gdbarch ())
> > > > >                             ->builtin_data_ptr);
> > > > >   CORE_ADDR curr_addr = ADDR (__per_cpu_offset);
> > > > >   int core;
> > > > > 
> > > > >   if (!HAS_ADDR (__per_cpu_offset))
> > > > >     {
> > > > >       if (debug_linuxkthread_threads)
> > > > >         fprintf_unfiltered (gdb_stdlog, "Assuming non-SMP kernel.\n");
> > > > > 
> > > > >       return;
> > > > >     }
> > > > > 
> > > > >   for (core=0; core < numcores; core++)
> > > > 
> > > > This iteration does not work. As CPUs can be added/removed, the
> > > > logical CPU ids needn't be contiguous, e.g. with two CPUs the
> > > > ids can be 0 and 2. You have to walk the cpu_online_mask to see
> > > > which CPUs are in use at the moment. This iteration appears
> > > > several times in the code.
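
To make this concrete, the walk could look roughly like the sketch
below. It is untested; the symbol name __cpu_online_mask and its layout
as an unsigned long bitmap are assumptions about the target kernel, and
ADDR and max_cores are the helper macro and global from your patch.

/* Sketch: call FUNC for every CPU whose bit is set in the kernel's
   online mask, instead of assuming the ids are 0..numcores-1.  */
static void
lkthread_for_each_online_cpu (void (*func) (int cpu))
{
  enum bfd_endian byte_order = gdbarch_byte_order (target_gdbarch ());
  int length = TYPE_LENGTH (builtin_type (target_gdbarch ())
                            ->builtin_unsigned_long);
  CORE_ADDR mask_addr = ADDR (__cpu_online_mask);
  int word, bit;

  /* Each unsigned long of the bitmap holds 8 * length CPU bits.  */
  for (word = 0; word * 8 * length < max_cores; word++)
    {
      ULONGEST val = read_memory_unsigned_integer
        (mask_addr + word * length, length, byte_order);

      for (bit = 0; bit < 8 * length; bit++)
        if (val & ((ULONGEST) 1 << bit))
          func (word * 8 * length + bit);
    }
}
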
> > > 
> > > Good point, that is something I had not considered! Will fix this
> > > in V2, although the code makes this assumption in quite a few
> > > places :(
> > > 
> > > To get this working nicely I think either GDB will need a
> > > delete_thread() API which also decrements the thread numbers. With
> > > the physical-CPUs-being-added/removed debug use case we will need
> > > to add/remove a varying number of swapper threads, and we don't
> > > want the GDB thread numbering to be 'ever increasing' in such a
> > > situation throughout the debug session.  
> > 
> > I don't see why it should be a problem to simply add additional
> > swapper tasks. I don't expect too many CPUs to be added/removed, and
> > the kernel tries to recycle the CPU ids when possible. Thus I'd
> > prefer this method ...  
> 
> The problem comes with preserving the thread numbering throughout the
> debug session. You probably won't have experienced this problem if
> you only enumerate the thread list once in the core dump use case.
> 
> Essentially, atm in linux-kthread all swappers are added first, and
> then the Linux threads, with the (incorrect) assumption that the
> number of physical CPUs stays the same throughout the debug session.
> 
> This is done to preserve the thread numbering of the Linux threads
> when the user issues 'info threads'.
> 
> For example: -
> 
> (gdb) info threads
>   Id   Target Id         Frame 
> * 1    [swapper/0] (pid: 0 tgid: 0 <C0>) cpu_v7_do_idle ()
>     at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/arch/arm/mm/proc-v7.S:75
>   2    init (pid: 1 tgid: 1) context_switch (cookie=..., next=<optimized out>, prev=<optimized out>, rq=<optimized out>)
>     at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/kernel/sched/core.c:2902
>   3    [kthreadd] (pid: 2 tgid: 2) context_switch (cookie=..., next=<optimized out>, prev=<optimized out>, rq=<optimized out>)
>     at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/kernel/sched/core.c:2902
> <snip>
>   92   getty (pid: 2243 tgid: 2243) context_switch (cookie=..., next=<optimized out>, prev=<optimized out>, rq=<optimized out>)
>     at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/kernel/sched/core.c:2902
> 
> Now if we continue the target and halt sometime later, and we now have
> 2 CPUs online, then we need 2 swapper threads, which, if added first,
> would alter the thread numbering of all the Linux threads.
> 
> e.g.
> 
> (gdb) info threads
>   Id   Target Id         Frame 
> * 1    [swapper/0] (pid: 0 tgid: 0 <C0>) cpu_v7_do_idle ()
>     at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/arch/arm/mm/proc-v7.S:75
>   2    [swapper/1] (pid: 0 tgid: 0 <C1>) cpu_v7_do_idle ()
>     at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/arch/arm/mm/proc-v7.S:75
>   3    init (pid: 1 tgid: 1) context_switch (cookie=..., next=<optimized out>, prev=<optimized out>, rq=<optimized out>)
>     at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/kernel/sched/core.c:2902
>   4    [kthreadd] (pid: 2 tgid: 2) context_switch (cookie=..., next=<optimized out>, prev=<optimized out>, rq=<optimized out>)
>     at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/kernel/sched/core.c:2902
> <snip>
>   93   getty (pid: 2243 tgid: 2243) context_switch (cookie=..., next=<optimized out>, prev=<optimized out>, rq=<optimized out>)
>     at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/kernel/sched/core.c:2902
> 
> Where 'getty' is now thread 93 instead of thread 92 as before.
> Having the thread numbering move around like that will be confusing
> for the user, I think, which is where the suggestion of adding all
> possible swappers came from.
> 
> Alternatively we could always add the swappers *after* adding all the
> Linux threads, which would mean the Linux thread numbering would
> remain constant, but the swapper thread numbers would jump around as
> CPUs come online / offline and more Linux threads are created over
> time.
> 
> I think adding the swappers *after* the Linux threads would be
> preferable. Either approach requires a change in core GDB thread
> numbering logic, which currently only ever 'increases' (the thread
> number is not decremented when a thread is deleted).

That's right, we don't have that problem, and I can see why you want
the feature. Your second idea, adding the swappers after the Linux
threads, sounds much better to me. Then you only need to add the
swappers you actually need.

As an alternative you could also add a command to access a thread by
pid instead of by GDB Id. The code should be straightforward and
possible to implement in Python. The only problem would be the swapper
tasks all having the same pid, but that is solvable. From the user's
point of view I think this approach is even preferable, and you don't
need to worry about GDB's ordering.
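
For illustration, here is a rough sketch of what such a command could
look like in C (a Python version via gdb.Command would be similar).
The command name and helpers are made up, and it assumes, like the
ptid_get_tid use in the patch above, that the task's pid is stored in
the tid slot of the ptid, which would need verifying:

/* Hypothetical "thread-pid" command: switch to the GDB thread whose
   task pid matches the argument.  All swappers share pid 0, so this
   naive version just returns the first match.  */
static int
match_task_pid (struct thread_info *tp, void *arg)
{
  return ptid_get_tid (tp->ptid) == *(long *) arg;
}

static void
thread_pid_command (char *args, int from_tty)
{
  long pid;
  struct thread_info *tp;

  if (args == NULL)
    error (_("Argument required (task pid)."));

  pid = strtol (args, NULL, 0);
  tp = iterate_over_threads (match_task_pid, &pid);
  if (tp == NULL)
    error (_("No thread with pid %ld."), pid);

  switch_to_thread (tp->ptid);
  print_stack_frame (get_selected_frame (NULL), 1, SRC_AND_LOC);
}

/* Registered in the target's _initialize function, e.g.:
   add_com ("thread-pid", class_run, thread_pid_command,
            _("Switch to the thread with the given Linux pid."));  */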

> > > Alternatively (maybe better) I could create dummy swapper entries
> > > for every 'possible CPU' of the target; if some CPUs are offline
> > > these will remain dummy swapper threads. This has the benefit that
> > > the Linux threads, which are added after the swappers, will keep
> > > the same thread numbering in GDB regardless of which CPUs are
> > > brought online / offline.  
> > 
> > ... over your alternative. At least for S390 the max number of
> > possible CPUs is 256, which would add quite an overhead...  
> 
> Eeek! Ok so that isn't really a feasible option then ;)

Yep, our machines are a little bigger than the average ARM system ;)

> > > > >     {
> > > > >       if (!lkthread_h->per_cpu_offset[core])
> > > > >         lkthread_h->per_cpu_offset[core] =
> > > > >           read_memory_unsigned_integer (curr_addr, length, byte_order);
> > > > > 
> > > > >       curr_addr += (CORE_ADDR) length;
> > > > > 
> > > > >       if (!lkthread_h->per_cpu_offset[core])
> > > > >         {
> > > > >           warning ("Suspicious null per-cpu offsets,"
> > > > >                    " or wrong number of detected cores:\n"
> > > > >                    "ADDR (__per_cpu_offset) = %s\nmax_cores = %d",
> > > > >                    phex (ADDR (__per_cpu_offset), 4), max_cores);
> > > > > 
> > > > >           break;
> > > > >         }
> > > > >     }
> > > > > 
> > > > >   if (debug_linuxkthread_threads)
> > > > >     fprintf_unfiltered (gdb_stdlog, "SMP kernel. %d cores detected\n",
> > > > >                         numcores);
> > > > > }    
> > > > 
> > > > [...]
> > > >     
> > > > > static void
> > > > > lkthread_init (void)
> > > > > {
> > > > >   struct thread_info *th = NULL;
> > > > >   struct cleanup *cleanup;
> > > > >   int size =
> > > > >     TYPE_LENGTH (builtin_type (target_gdbarch ())->builtin_unsigned_long);
> > > > > 
> > > > >   /* Ensure thread list from beneath target is up to date.  */
> > > > >   cleanup = make_cleanup_restore_integer (&print_thread_events);
> > > > >   print_thread_events = 0;
> > > > >   update_thread_list ();
> > > > >   do_cleanups (cleanup);
> > > > > 
> > > > >   /* Count the h/w threads.  */
> > > > >   max_cores = thread_count ();    
> > > > 
> > > > The number of CPUs does not have to be constant. Especially
> > > > when you are running in a VM you can add/remove CPUs. Thus
> > > > max_cores has to be determined dynamically or set to the max
> > > > number of CPUs, e.g. by taking the size (in bits) of
> > > > cpu_online_mask.    
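
Roughly what I had in mind, as an untested sketch; the symbol name and
the idea of reading the mask's size via GDB's symbol lookup are
assumptions about the target kernel and GDB internals that would need
checking:

/* Sketch: derive an upper bound for the CPU id from the size of the
   kernel's online mask rather than from the current thread count.  */
static int
lkthread_max_possible_cpus (void)
{
  struct symbol *sym
    = lookup_symbol ("__cpu_online_mask", NULL, VAR_DOMAIN, NULL).symbol;

  if (sym == NULL)
    return 1;  /* Assume a non-SMP kernel.  */

  /* The mask is a struct cpumask holding one bit per possible CPU, so
     its size in bits bounds the highest possible CPU id.  */
  return 8 * TYPE_LENGTH (SYMBOL_TYPE (sym));
}
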
> > > 
> > > Good point, will fix in V2, most likely by taking the max possible
> > > CPUs from cpu_online_mask and adding dummy swappers for each (as
> > > suggested above).
> > > 
> > > Although I'm wondering what different GDB remote stubs do in this
> > > situation when a CPU has been taken offline. In Qemu at least it
> > > is still reported by the GDB remote stub, and we can still get a
> > > backtrace from it even when offline.
> > > 
> > > I suspect with an OpenOCD setup, when a CPU is powered off OpenOCD
> > > will lose access to the debug unit and the SWD/JTAG connection will
> > > drop (certainly my experience debugging with OpenOCD in the past
> > > with cpuidle enabled in the kernel).  
> > 
> > For me the Qemu behavior sounds like a bug; Qemu shouldn't report
> > any offline CPU. The OpenOCD behavior sounds much more reasonable.
> > Anyway, your code shouldn't need to access this data as it doesn't
> > contain any information you need. However, I haven't worked with
> > the remote target and gdbserver yet and don't know their exact
> > behavior.

I talked to some of our Qemu developers. They also think that this is a
bug in Qemu / the GDB stub in Qemu. They believe that the struct
representing the CPU is just marked invalid but not removed, and that
the GDB stub simply does not check for that. Unfortunately this is
arch-dependent code ...

> > > > >   gdb_assert (max_cores);
> > > > > 
> > > > >   if (debug_linuxkthread_threads)
> > > > >     {
> > > > >       fprintf_unfiltered (gdb_stdlog, "lkthread_init() cores(%d) GDB"
> > > > >                           "HW threads\n", max_cores);
> > > > >       iterate_over_threads (thread_print_info, NULL);
> > > > >     }
> > > > > 
> > > > >   /* Allocate per cpu data.  */
> > > > >   lkthread_alloc_percpu_data(max_cores);    
> > > > 
> > > > Together with the comment above, allocating the percpu_data to a
> > > > fixed size can cause problems when the number of CPUs is
> > > > increased.    
> > > 
> > > Will fix in v2.
> > >   
> > > >     
> > > > >   lkthread_get_per_cpu_offsets(max_cores);
> > > > > 
> > > > >   if (!lkthread_get_runqueues_addr () && (max_cores > 1))
> > > > >     fprintf_unfiltered (gdb_stdlog, "Could not find the address of CPU"
> > > > >                         " runqueues current context information maybe "
> > > > >                         "less precise\n.");
> > > > > 
> > > > >   /* Invalidate the linux-kthread cached list.  */
> > > > >   lkthread_invalidate_threadlist ();
> > > > > }    
> > > > 
> > > > [...]
> > > >     
> > > > > static linux_kthread_info_t **
> > > > > lkthread_get_threadlist_helper (linux_kthread_info_t ** ps)
> > > > > {
> > > > >   struct linux_kthread_arch_ops *arch_ops =
> > > > >     gdbarch_linux_kthread_ops (target_gdbarch ());
> > > > >   CORE_ADDR rq_idle_taskstruct;
> > > > >   CORE_ADDR g, t, init_task_addr;
> > > > >   int core = 0, i;
> > > > > 
> > > > >   if (debug_linuxkthread_threads)
> > > > >     fprintf_unfiltered (gdb_stdlog, "lkthread_get_threadlist_helper\n");
> > > > > 
> > > > >   init_task_addr = ADDR (init_task);
> > > > >   g = init_task_addr;
> > > > > 
> > > > >   do
> > > > >     {
> > > > >       t = g;
> > > > >       do
> > > > >         {
> > > > >           if (!arch_ops->is_kernel_address(t))
> > > > >             {
> > > > >               warning ("parsing of task list stopped because of invalid address"
> > > > >                        "%s", phex (t, 4));
> > > > >               break;
> > > > >             }
> > > > > 
> > > > >           lkthread_get_task_info (t, ps, core /*zero-based */);
> > > > >           core = CORE_INVAL;
> > > > > 
> > > > >           if (ptid_get_tid (PTID_OF (*ps)) == 0)
> > > > >             {
> > > > >               /* This is init_task, let's insert the other cores
> > > > >                  swapper now.  */
> > > > >               for (i = 1; i < max_cores; i++)
> > > > >                 {
> > > > >                   ps = &((*ps)->next);
> > > > >                   rq_idle_taskstruct = lkthread_get_rq_idle (i);
> > > > >                   lkthread_get_task_info (rq_idle_taskstruct, ps, i);
> > > > >                 }
> > > > >             }    
> > > > 
> > > > How does the assignment of a core work? To me it looks like
> > > > core != CORE_INVAL only for init_task and the swapper tasks. Is
> > > > that the intended behavior?    
> > > 
> > > Not sure if I have understood your question correctly, but...
> > > 
> > > It does look like a bug there though, as these two lines
> > >   
> > > > >           lkthread_get_task_info (t, ps, core /*zero-based */);
> > > > >           core = CORE_INVAL;    
> > > 
> > > should be swapped: rather than adding all threads with core=0,
> > > they should be added with core=CORE_INVAL.  
> > 
> > Well, not all threads are added with core=0, just init_task is...
> > The only other place where core is used is at the beginning of the
> > function, where it is defined and initialized to zero. Thus after
> > the first loop iteration (with init_task) core will never change
> > again to a value other than CORE_INVAL. If you swap those two lines
> > you can get rid of core altogether and simply pass CORE_INVAL to
> > lkthread_get_task_info.
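
Concretely, with core gone the inner loop body would read roughly like
this (a sketch based on the helper quoted above):

  if (!arch_ops->is_kernel_address (t))
    {
      warning ("parsing of task list stopped because of invalid address"
               "%s", phex (t, 4));
      break;
    }

  /* Every task walked on the list is added without a CPU binding; only
     the swapper tasks inserted after init_task get a real core id.  */
  lkthread_get_task_info (t, ps, CORE_INVAL);
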
> > > The user is informed which thread is running on which physical CPU
> > > core by the linux_kthread_extra_thread_info() callback. This calls
> > > lkthread_is_curr_task() for each GDB thread, which calls
> > > lkthread_get_running(), which updates 'current->core'.
> > > 
> > > If a thread is currently executing on a CPU then
> > > linux_kthread_extra_thread_info() adds a 'C<num>' suffix to the
> > > thread name, so it is clear to the user when issuing an
> > > 'info threads' command.
> > > 
> > > e.g.
> > > 
> > > ^C[New getty]
> > > 
> > > Thread 1 received signal SIGINT, Interrupt.
> > > cpu_v7_do_idle ()
> > >     at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/arch/arm/mm/proc-v7.S:75
> > > 75		ret	lr
> > > (gdb) info threads
> > >   Id   Target Id         Frame 
> > > * 1    [swapper/0] (pid: 0 tgid: 0 <C0>) cpu_v7_do_idle ()
> > >                                    ^^^^
> > >     at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/arch/arm/mm/proc-v7.S:75
> > >   2    init (pid: 1 tgid: 1) context_switch (cookie=..., next=<optimized out>, prev=<optimized out>, rq=<optimized out>)
> > >     at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/kernel/sched/core.c:2902
> > >   3    [kthreadd] (pid: 2 tgid: 2) context_switch (cookie=..., next=<optimized out>, prev=<optimized out>, rq=<optimized out>)
> > >     at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/kernel/sched/core.c:2902
> > > 
> > > When the target halted, CPU0 was executing in the swapper.
> > > 
> > > Did that answer your question?  
> > 
> > Thanks, that helped. Although I still don't fully understand how
> > the initialization works; I find lkthread_get_task_info quite
> > confusing.  
> 
> I agree that function could do with being simplified / rewritten, so
> it is easier to follow.

Good to know it isn't just me ;)

Philipp

