This is the mail archive of the ecos-discuss@sources.redhat.com mailing list for the eCos project.



Re: regarding hal_interrupt_stack_call_pending_DSRs call


Brij Bihari Pandey <fuzzhead012@yahoo.com> writes:

> Hi,
> 
> I don't see posts on the list suggesting that there
> are people working with eCos on multiprocessor
> architectures. Has anyone ported eCos to a
> multiprocessor architecture? Will the synthetic target
> on Linux be able to use the underlying multiple
> processors?

There was a patch from <sandeep@codito.com> for some of the tests to
make them work better in SMP. So there is at least one SMP user out
there.

> 
> Has anyone on the list had live experience of SMP eCos
> on an SMP architecture? If no one has, then SMP eCos is
> not fully tested, and the possibility can't be ruled
> out that some unknown (though rare) race conditions
> are hidden there, to be discovered, luckily or
> unluckily, once a large number of parallel
> architectures come to market and many more people
> start using them for eCos.

I doubt that there are any more race conditions in the SMP code than
in the uniprocessor code. The current implementation errs on the side
of caution -- serializing CPUs through the scheduler lock. If I had
chosen to implement a per-CPU scheduler lock with inter-CPU spinlocks I
would be more inclined to agree that there may be hidden races,
deadlocks and livelocks.
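
As a rough illustration of what "serializing CPUs through the scheduler
lock" means, here is a minimal sketch of a single global lock with an
owner CPU and a nesting count. This is not the eCos implementation: the
type sched_lock_t, the helper names and the NO_OWNER constant are all
hypothetical, and C11 atomics stand in for whatever the HAL provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_OWNER (-1)   /* hypothetical marker: no CPU holds the lock */

/* Hypothetical model of one global scheduler lock shared by all CPUs. */
typedef struct {
    atomic_int owner;   /* id of the CPU holding the lock, or NO_OWNER */
    int count;          /* nesting depth, only touched by the owner */
} sched_lock_t;

static void sched_lock_init(sched_lock_t *l)
{
    atomic_init(&l->owner, NO_OWNER);
    l->count = 0;
}

/* One attempt to claim the lock; true if 'cpu' holds it afterwards. */
static bool sched_lock_try(sched_lock_t *l, int cpu)
{
    if (atomic_load(&l->owner) == cpu) {
        l->count++;                 /* nested claim by the owner */
        return true;
    }
    int expected = NO_OWNER;
    if (atomic_compare_exchange_strong(&l->owner, &expected, cpu)) {
        l->count = 1;               /* first claim by this CPU */
        return true;
    }
    return false;                   /* another CPU holds the lock */
}

/* Spin until the lock is ours; this is where other CPUs are held. */
static void sched_lock_acquire(sched_lock_t *l, int cpu)
{
    while (!sched_lock_try(l, cpu)) {
        /* busy-wait */
    }
}

/* Drop one nesting level; false if 'cpu' is not the owner. */
static bool sched_lock_release(sched_lock_t *l, int cpu)
{
    if (atomic_load(&l->owner) != cpu)
        return false;
    assert(l->count > 0);
    if (--l->count == 0)
        atomic_store(&l->owner, NO_OWNER);
    return true;
}
```

In this model a second CPU calling sched_lock_acquire() simply spins
until the owner's count drops back to zero, so only one CPU at a time
runs scheduler code.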

> 
> But as Nick (if I remember right) pointed out a few
> days ago, he doesn't see heavy demand for or usage of
> SMP eCos in the market for a couple of years yet. So
> it is not currently worth investing effort in finding
> such rare possibilities.

At present all we need to do is to ensure that the SMP code does not
rot and that any new code is SMP-aware where necessary. The
availability of SMP hardware is also an issue. I have a dual CPU
Pentium machine here, but few people will want to deploy such a
monster for embedded applications.

> 
> For example (from my understanding of the eCos code,
> this seems like an absurd possibility that should
> never happen) -- 
> 
> [1] The code enters
> hal_interrupt_stack_call_pending_DSRs, switches to the
> interrupt stack, and enables interrupts (note that
> sched_lock is 1 now and we have reached here via the
> unlock_inner(0) call from interrupt_end).
> 
> [2] Can it happen, in some rare case, that someone
> executing on another processor coincidentally sets
> sched_lock to zero (currently zero_sched_lock and
> get_sched_lock don't check whether the caller is on
> the owner CPU)?

This can never happen: zero_sched_lock() will only be called if the
CPU has already claimed the scheduler lock. There are only three
places where it is called, and these are all in the kernel, where we
know what we are doing.

> 
> [3] So if an interrupt happens to arrive on the CPU
> while it is in the ...pending_DSRs code (step 1
> above), well, things are likely to get messed up.
> 
> I have seen asserts and some debugging checks in the
> code that check for such rare, not-supposed-to-happen
> situations. Are there any plans to enforce checks in
> crucial code like zero_sched_lock, along the lines of
> "if you are not the owner of sched_lock, just wait, or
> print PANIC messages and/or halt"?

There is already an assertion in the HAL_SMP_SCHEDLOCK_ZERO() macro
that tests this.
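
The real HAL_SMP_SCHEDLOCK_ZERO() macro is not reproduced here. As a
hedged sketch only, an ownership check of the kind described might look
like the following; struct model_sched_lock, CPU_NONE and both function
names are hypothetical and exist purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define CPU_NONE (-1)   /* hypothetical marker: lock is unowned */

/* Hypothetical model of the scheduler lock state. */
struct model_sched_lock {
    int owner_cpu;      /* CPU that currently holds the lock */
    unsigned count;     /* nesting depth of the lock */
};

/* True only if 'cpu' is the current owner of the lock. */
static bool model_is_owner(const struct model_sched_lock *l, int cpu)
{
    return l->owner_cpu == cpu;
}

/* Zero the lock, asserting first that the caller owns it.
 * A non-owner CPU trips the assertion instead of silently
 * clearing a lock that another CPU still depends on. */
static void model_sched_lock_zero(struct model_sched_lock *l, int cpu)
{
    assert(model_is_owner(l, cpu));
    l->count = 0;
    l->owner_cpu = CPU_NONE;
}
```

With assertions enabled, a call such as model_sched_lock_zero(&lock, 1)
while CPU 0 owns the lock would abort rather than corrupt the state.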

> 
> One situation where this coincidence can happen is
> bad programming: calling reschedule or
> unlock_reschedule in a code sequence without having
> taken the scheduler lock shortly before the
> reschedule call.
> 
> ** assuming sched_lock = 0
> 
> ** CPU1 - interrupt came, code went to interrupt_end
> -> unlock_inner -> ... pending_DSRs..
> 
> ** CPU2 - code happens to execute
> reschedule->unlock_inner->context switch call
> that brings in another thread on CPU2 that falls
> through its earlier execution and zeroes out
> sched_lock.
> 
> ** CPU1 - still in pending_DSRs call and another
> interrupt has come.
>

This cannot happen because CPU2 will be held trying to acquire the
scheduler lock until CPU1's DSR is finished.


-- 
Nick Garnett - eCos Kernel Architect
http://www.eCosCentric.com/

-- 
Before posting, please read the FAQ: http://sources.redhat.com/fom/ecos
and search the list archive: http://sources.redhat.com/ml/ecos-discuss

