This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC] Multiple kprobes at an address redux (take3)


It would be nice to summarize all the requirements upfront, before
moving on to the design.

One observation I have is that a register_single_kprobe equivalent 
is still useful for those who might be setting up a kprobe for debugging
or fault injection purposes --  or rather maybe something like
register_exclusive_kprobe, because one might not want to be surprised by
having multiple handlers clicking away in such a situation.

That brings me to the other question -- typically whenever we have
a call-chain of this sort, policy isn't very far behind ... quite
soon we find ourselves wanting priorities between handlers or
to decide whether to continue calling rest of the handlers or
short circuit through depending on the result of an earlier handler
etc. Sort of like notifiers ...

Perhaps that isn't something that would be very relevant for
probes which are just used for tracing purposes by various subsystems,
which is where I guess the multiple handler requirement arose.

But in the more general case, is the design flexible enough to
allow policy to be built _over_ this infrastructure (again of course
without cluttering the core kprobes logic) ?

Nomenclature wise - both aggr_probe and multi-probe are names that
sound more like a set of different probes, not multiple handlers
for the same probe as is really intended. Ideas for better names ?

Regards
Suparna

On Mon, Apr 11, 2005 at 03:32:35PM -0400, Ananth N Mavinakayanahalli wrote:
> Hi,
> 
> This patch builds on Suparna's idea of a layered approach to providing
> multiple kprobe support at an address.
> 
> Salient features:
> 
> - The interfaces to register/unregister kprobes remains the same.
> - The kprobes infrastructure takes care if a kprobe already exists at
>   the requested address.
> - No new structures are added.
> - Current kprobes code not modified (except to renaming of routines).
> - Architecture agnostic approach.
> - We now track the currently executing kprobe (in case of multiple
>   kprobes at the address). In the case of a fault during handler
>   execution, only the current kprobe's fault handler is invoked.
>   This insulates other kprobes at the location from the fault.
> 
> Details of the design are in the accompanying document.
> 
> Patch is against 2.6.12-rc2-mm2.
> 
> Thanks,
> Ananth

> Draft of 11 April 2005.
> 
> 1	Multiple kprobes at an address
> 
> 1.1	Overview
> 
> One of the requirements of SystemTAP is the capability of defining 
> multiple kprobes per probe address. Current design of kprobes does 
> not allow the registration of more than one kprobe at an address.
> 
> This is an attempt to come up with a design that will enable a user
> to register multiple kprobes at a given address. 
> 
> 
> 2	Design
> 
> 2.1	struct kprobe
> 
> Struct kprobe will now have an additional "list" field to 
> facilitate list.h usage.
> 
> struct kprobe {
> 	struct hlist_head hlist;
> 	struct list_head list;
> 	kprobe_opcode_t *addr;
> 	kprobe_pre_handler_t pre_handler;
> 	kprobe_post_handler_t post_handler;
> 	kprobe_fault_handler_t fault_handler;
> 	kprobe_break_handler_t break_handler;
> 	kprobe_opcode_t opcode;
> 	struct arch_specific_insn ainsn;
> };
> 
> NOTE: It is always a good idea to set the handler fields not being 
> used explicitly to NULL. The design determines if a probe is a 
> jprobe by virtue of the fact that only jprobes define break_handlers. 
> 
> 2.2	Base kprobes infrastructure
> 
> With this design, the "current" register/unregister_kprobe() 
> routines have been renamed register/unregister_kprobe_single() 
> and a wrapper takes care of invoking the _single() versions when 
> necessary. The _single() versions are NOT exported. 
> 
> With this change:
> 
> - The current interfaces (register/unregister_kprobes()) continue
>   to be the only ones available
> - The interfaces can be used without bothering if a kprobe already 
>   existed at the required address. The wrappers take care of that
> - The current base kprobes code does not change, except of course,
>   the renaming.
> 
> 2.2.1	Registering a kprobe
> 
> 1. Kprobes are registered with a call to register_kprobe().
> 2. The kprobe infrastructure determines if a probe already exists at
>    the requested location. 
> 3. If this is a new kprobe, registration proceeds as it done
>    currently.
> 4. If a kprobe already exists at the requested address:
>    a. If the call is to register a jprobe, the new handler 
>       registration fails since we can't have a jprobe and a kprobe 
>       at the same address. Whether a registration is for a jprobe
>       or not is determined by the presence of a break_handler.
>    b. If the call is to register a kprobe and if the pre_handler
>       for the existing kprobe is not aggr_pre_handler, we then:
>       	1. Allocate a new struct kprobe (manager kprobe).
> 	2. Populate the manager kprobe with appropriate values for
> 	   the addr, opcode, ainsn fields
> 	3. Set the manager kprobe's handlers to custom handlers 
> 	   with list walk logic
> 	4. Add the existing kprobe and the new kprobe to be 
> 	   manager kprobe's "list"
>    c. If the call is to register a kprobe and if the pre_handler
>       for the existing kprobe is aggr_pre_handler, it means that
>       we already have an manager kprobe defined at this address.
>       Unless the new probe to register is not a jprobe, it is 
>       initialized and added to the manager kprobe's list.
>       The registration fails if the new probe is a jprobe.
>       
> 2.2.2	Unregistering a kprobe
> 
> 1. Kprobes are unregistered by making a call to unregister_kprobe()
> 2. If a kprobe exists and its pre_handler is aggr_pre_handler(),
>    then, this is one of the possibly many kprobes at the address:
>    a. Delete the kprobe from the manager kprobe's list.
>    b. If the manager kprobe's "list" is empty, then free the
>       manager kprobe.
> 3. If the kprobe to be unregistered is the only one at the address,
>    remove it as is done currently.
> 
> 2.3	Handler invocation
> 
> Multiple kprobes at an address are managed by a "manager kprobe"
> allocated and freed by the kernel. All kprobes at the address are
> added to the "list" of the manager kprobe. The handlers of the
> manager kprobe are aliased to aggregate handlers that have in
> them the required list walk logic to invoke the registered
> kprobes' handlers.
> 
> This is accomplished by aliasing the aggr_probe's kprobe handlers
> to be:
> 
>         ap->kp.pre_handler = aggr_pre_handler;
>         ap->kp.post_handler = aggr_post_handler;
>         ap->kp.fault_handler = aggr_fault_handler;
> 
> Please refer to the accompanying patch for details about these 
> handlers.
> 
> 
> 2.4	More points on handler invocation
> 
> 2.4.1	Pre handlers
> 
> We care about the return code for a pre_handler only in case 
> of a jprobe. Return code from all other (read kprobe) handlers 
> are ignored. But, it is advised that all pre_handlers return 0 
> for compatibility sake with future enhancements.
> 
> For a multiple kprobes at an address, all pre handlers will be 
> called.
> 
> 2.4.2	Post handlers
> 
> All post handlers will be called.
> 
> 2.4.3	Fault handlers
> 
> We now track, using a "curr_kprobe", which kprobe handler 
> (pre/post) is currently executing. When a fault occurs and the 
> curr_kprobe is not NULL, it means that the fault occured while 
> executing the user defined handler. In that case, the 
> aggr_fault_handler invokes just the fault_handler that is 
> defined for the curr_kprobe
> 
> No other fault handlers are called.
> 
> NOTE: 
> 
> a. Fault handlers are meant not just to handle faults during
>    the execution of the handlers, but also in cases when we fault
>    while single-stepping out of line. This is the most common case
>    with user-space probes, but, the design has to keep this in 
>    mind so as to accomodate addition of user-space probes at a 
>    later date.
> 
> 2.5	Arch specific implementations
> 
> This design does not require any changes to the arch specific 
> kprobe implementations.
> 
> 
> 3	Caveats
> 
> - It is assumed that the handlers written behave well. :-)
> - Multiple kprobes cannot be registered at a location that already
>   has a jprobe registered. In other words, to register a jprobe,
>   no other kprobes must be registered at that address and 
>   vice-versa.
> - The current design builds on the existing kprobe locking 
>   infrastructure. It is hoped that this design (or a variant)
>   can be the base for the scalability changes envisaged. 
>   Suggestions are welcome.
> - There is a small race window while registering a kprobe wherein,
>   the registration may fail with an -EEXIST. This limitation can
>   be resolved if/when the x86_64 "wart" is removed. It can also
>   be tackled when we work on the kprobes scalability issues.
>  
> 
> 4. TODO
> 
> a. Indicate that a kprobe is disabled when disarm_kprobe() is 
>    called.
> b. jprobe/kprobe coexistance.
> 
> 
> 5	Questions
> 
> - Is the design implementable on all architectures kprobes is 
>   currently available?
>   A: Already done! :-)
>      With the new design, no changes are required for the arch
>      specific kprobe implementations.
> 

> diff -Naurp temp/linux-2.6.12-rc2/include/linux/kprobes.h linux-2.6.12-rc2/include/linux/kprobes.h
> --- temp/linux-2.6.12-rc2/include/linux/kprobes.h	2005-04-04 12:38:05.000000000 -0400
> +++ linux-2.6.12-rc2/include/linux/kprobes.h	2005-04-11 13:37:10.000000000 -0400
> @@ -43,6 +43,9 @@ typedef int (*kprobe_fault_handler_t) (s
>  struct kprobe {
>  	struct hlist_node hlist;
>  
> +	/* list of kprobes for multi-handler support */
> +	struct list_head list;
> +
>  	/* location of the probe point */
>  	kprobe_opcode_t *addr;
>  
> diff -Naurp temp/linux-2.6.12-rc2/kernel/kprobes.c linux-2.6.12-rc2/kernel/kprobes.c
> --- temp/linux-2.6.12-rc2/kernel/kprobes.c	2005-04-04 12:40:16.000000000 -0400
> +++ linux-2.6.12-rc2/kernel/kprobes.c	2005-04-11 14:05:02.000000000 -0400
> @@ -44,6 +44,7 @@ static struct hlist_head kprobe_table[KP
>  
>  unsigned int kprobe_cpu = NR_CPUS;
>  static DEFINE_SPINLOCK(kprobe_lock);
> +static struct kprobe *curr_kprobe;
>  
>  /* Locks kprobe: irqs must be disabled */
>  void lock_kprobes(void)
> @@ -73,7 +74,50 @@ struct kprobe *get_kprobe(void *addr)
>  	return NULL;
>  }
>  
> -int register_kprobe(struct kprobe *p)
> +int aggr_pre_handler(struct kprobe *p, struct pt_regs *regs)
> +{
> +	struct kprobe *kp;
> +
> +	list_for_each_entry(kp, &p->list, list) {
> +		if (kp->pre_handler) {
> +			curr_kprobe = kp;
> +			kp->pre_handler(kp, regs);
> +			curr_kprobe = NULL;
> +		}
> +	}
> +	return 0;
> +}
> +
> +void aggr_post_handler(struct kprobe *p, struct pt_regs *regs, 
> +		unsigned long flags)
> +{
> +	struct kprobe *kp;
> +
> +	list_for_each_entry(kp, &p->list, list) {
> +		if (kp->post_handler) {
> +			curr_kprobe = kp;
> +			kp->post_handler(kp, regs, flags);
> +			curr_kprobe = NULL;
> +		}
> +	}
> +	return;
> +}
> +
> +int aggr_fault_handler(struct kprobe *p, struct pt_regs *regs, int trapnr)
> +{
> +	/* 
> +	 * if we faulted "during" the execution of a user specified
> +	 * probe handler, invoke just that probe's fault handler
> +	 * to see if it can handle the fault.
> +	 */ 
> +	if (curr_kprobe && curr_kprobe->fault_handler) {
> +		if (curr_kprobe->fault_handler(curr_kprobe, regs, trapnr))
> +			return 1;
> +	}
> +	return 0;
> +}
> +
> +static int register_kprobe_single(struct kprobe *p)
>  {
>  	int ret = 0;
>  	unsigned long flags = 0;
> @@ -104,7 +148,7 @@ rm_kprobe:
>  	return ret;
>  }
>  
> -void unregister_kprobe(struct kprobe *p)
> +static void unregister_kprobe_single(struct kprobe *p)
>  {
>  	unsigned long flags;
>  	arch_remove_kprobe(p);
> @@ -116,6 +160,120 @@ void unregister_kprobe(struct kprobe *p)
>  	spin_unlock_irqrestore(&kprobe_lock, flags);
>  }
>  
> +static inline void add_aggr_kprobe(struct kprobe *ap, struct kprobe *p)
> +{
> +	ap->addr = p->addr;
> +	ap->opcode = p->opcode;
> +	memcpy(&ap->ainsn, &p->ainsn, 
> +			sizeof(struct arch_specific_insn));
> +
> +	ap->pre_handler = aggr_pre_handler;
> +	ap->post_handler = aggr_post_handler;
> +	ap->fault_handler = aggr_fault_handler;
> +
> +	INIT_LIST_HEAD(&ap->list);
> +	list_add(&p->list, &ap->list);
> +
> +	INIT_HLIST_NODE(&ap->hlist);
> +	hlist_del(&p->hlist);
> +	hlist_add_head(&ap->hlist,
> +		&kprobe_table[hash_ptr(ap->addr,
> +				KPROBE_HASH_BITS)]);
> +}
> +
> +static int register_aggr_kprobe(struct kprobe *p)
> +{
> +	int ret = 0;
> +	unsigned long flags = 0;
> +	struct kprobe *old_p, *ap;
> +
> +	ap = kcalloc(1, sizeof(struct kprobe), GFP_KERNEL);
> +	if (!ap) 
> +		return -ENOMEM;
> +
> +	spin_lock_irqsave(&kprobe_lock, flags);
> +	old_p = get_kprobe(p->addr);
> +	if (old_p) {
> +		if (old_p->break_handler || p->break_handler) {
> +			ret = -EEXIST;
> +			goto free_ap;
> +		} else if (old_p->pre_handler == aggr_pre_handler) {
> +			list_add(&p->list, &old_p->list);
> +			goto free_ap;
> +		} else {
> +			add_aggr_kprobe(ap, old_p);
> +			list_add(&p->list, &ap->list);
> +			goto out;
> +		}
> +	} else {
> +		/* 
> +		 * kprobe at this addr was deleted between the
> +		 * check in register_kprobe() and the get_kprobe()
> +		 * call above
> +		 */
> +		spin_unlock_irqrestore(&kprobe_lock, flags);
> +		ret = register_kprobe_single(p);
> +		if (ret == -EEXIST) {
> +			spin_lock_irqsave(&kprobe_lock, flags);
> +			old_p = get_kprobe(p->addr);
> +			if (old_p) {
> +				add_aggr_kprobe(ap, old_p);
> +				list_add(&p->list, &ap->list);
> +			}
> +			goto out;
> +		} else
> +			kfree(ap);
> +		return ret;
> +	}
> +free_ap:
> +	kfree(ap);
> +out:
> +	spin_unlock_irqrestore(&kprobe_lock, flags);
> +	return ret;
> +}
> +
> +static void unregister_aggr_kprobe(struct kprobe *old_p, 
> +		struct kprobe *p, unsigned long flags)
> +{
> +	/* we have taken the spinlock in unregister_kprobe() */
> +	list_del(&p->list);
> +
> +	if (list_empty(&old_p->list)) {
> +		spin_unlock_irqrestore(&kprobe_lock, flags);
> +		unregister_kprobe_single(old_p);
> +		kfree(old_p);
> +		return;
> +	} 
> +	spin_unlock_irqrestore(&kprobe_lock, flags);
> +}
> +
> +int register_kprobe(struct kprobe *p)
> +{
> +	int ret = 0;
> +
> +	ret = register_kprobe_single(p);
> +	if (ret == -EEXIST) 
> +		ret = register_aggr_kprobe(p);
> +
> +	return ret;
> +}
> +
> +void unregister_kprobe(struct kprobe *p)
> +{
> +	unsigned long flags;
> +	struct kprobe *old_p;
> +
> +	spin_lock_irqsave(&kprobe_lock, flags);
> +	old_p = get_kprobe(p->addr);
> +
> +	if (old_p && (old_p->pre_handler == aggr_pre_handler))
> +		unregister_aggr_kprobe(old_p, p, flags);
> +	else if (old_p == p) {
> +		spin_unlock_irqrestore(&kprobe_lock, flags);
> +		unregister_kprobe_single(p);
> +	}
> +}
> +
>  static struct notifier_block kprobe_exceptions_nb = {
>  	.notifier_call = kprobe_exceptions_notify,
>  	.priority = 0x7fffffff /* we need to notified first */


-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]