This is the mail archive of the ecos-discuss@sources.redhat.com mailing list for the eCos project.



Re: regarding hal_interrupt_stack_call_pending_DSRs call


hi,

I don't see posts on the list suggesting that
people are working with eCos on multiprocessor
architectures. Has anyone ported eCos to a
multiprocessor architecture? Will the synthetic
target on Linux be able to use multiple underlying
processors?

Has anyone on the list had live experience of SMP eCos
on an SMP architecture? If no one has, SMP eCos is not
fully tested, and it can't be ruled out that some
unknown (though rare) race conditions are hidden in it,
waiting to be discovered once a large number of
parallel architectures come onto the market and many
people start using them with eCos.

But as Nick (if I remember right) pointed out a few
days back, he doesn't see heavy demand for or usage of
SMP eCos in the market for a couple of years yet. So
it is not a current priority to invest effort in
finding such rare possibilities.

For example (from my understanding of the eCos code,
this seems like an absurd possibility that should
never happen):

[1] The code enters
hal_interrupt_stack_call_pending_DSRs, switches to the
interrupt stack, and enables interrupts (note that
sched_lock is 1 at this point, and we have reached
here via the unlock_inner(0) call from interrupt_end).

[2] Can it happen, in some rare case, that code
executing on another processor coincidentally sets
sched_lock to zero? (Currently zero_sched_lock and
get_sched_lock don't check whether the caller is on
the owner CPU.)

[3] If an interrupt then arrives on the CPU that is in
the ...pending_DSRs code (step 1 above), well.. things
are likely to be messed up.

I have seen asserts and some debugging checks in the
code that check for such rare not-supposed-to-happen
situations. Are there any plans to enforce checks in
crucial code like zero_sched_lock, along the lines of
"if you are not the owner of sched_lock, wait, or
print PANIC messages and/or halt"?
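
The kind of check meant here can be sketched as a small
standalone C++ model. This is only an illustration
under stated assumptions, not eCos code: ModelSchedLock,
model_lock and model_zero_sched_lock are hypothetical
names, and the lock is assumed to record which CPU took
it. A non-owner that tries to zero the lock gets a
PANIC message and an error result instead.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical model, not an eCos API: a scheduler lock that
// remembers which CPU currently holds it.
constexpr int kNoOwner = -1;

struct ModelSchedLock {
    int count = 0;         // nesting depth; 0 means unlocked
    int owner = kNoOwner;  // id of the CPU holding the lock
};

enum class ZeroResult { Ok, NotOwner };

// Take the lock for `cpu`. The owner may nest; any other CPU is
// refused (a real kernel would more likely spin here).
inline bool model_lock(ModelSchedLock &l, int cpu) {
    assert(cpu >= 0);  // CPU ids are assumed to be non-negative
    if (l.count > 0 && l.owner != cpu) return false;
    l.owner = cpu;
    ++l.count;
    return true;
}

// Owner-checked zeroing: only the owning CPU may drop the lock
// to zero. A non-owner gets a PANIC message and an error result,
// and the lock state is left untouched.
inline ZeroResult model_zero_sched_lock(ModelSchedLock &l, int cpu) {
    if (l.count > 0 && l.owner != cpu) {
        std::fprintf(stderr,
                     "PANIC: CPU %d zeroing sched_lock owned by CPU %d\n",
                     cpu, l.owner);
        return ZeroResult::NotOwner;
    }
    l.count = 0;
    l.owner = kNoOwner;
    return ZeroResult::Ok;
}
```

Whether to wait, print, or halt on NotOwner is a policy
choice; the sketch only reports it so that the caller
can decide.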

One situation where this coincidence can happen is
bad programming: calling reschedule or
unlock_reschedule in a code sequence without having
taken the sched lock shortly before the reschedule call.

** Assume sched_lock = 0.

** CPU1 - an interrupt arrives; the code goes
interrupt_end -> unlock_inner -> ...pending_DSRs.

** CPU2 - the code happens to execute
reschedule -> unlock_inner -> context switch, which
brings in another thread on CPU2 that resumes
its earlier execution and zeroes out sched_lock.

** CPU1 - still in the pending_DSRs call when another
interrupt arrives.
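
The ordering above can be replayed deterministically in
a small standalone C++ model. This is a sketch of the
hypothesised interleaving only; it does not show that
eCos actually permits it. Replay and
replay_interleaving are made-up names, and the
owner_check flag is an assumed extension that refuses
zeroing by a CPU that does not hold the lock.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Standalone model; the names below are illustrative, not eCos APIs.
struct Replay {
    int sched_lock = 0;            // shared scheduler lock counter
    int owner = -1;                // CPU that took the lock, -1 if none
    bool invariant_broken = false; // CPU1 saw the lock at 0 mid-DSR
    std::vector<std::string> log;  // human-readable trace of the steps
};

// Replays the three steps in a fixed order. With owner_check set,
// a CPU that does not own the lock is refused when it tries to
// zero it.
Replay replay_interleaving(bool owner_check) {
    Replay r;

    // CPU1: interrupt -> interrupt_end -> unlock_inner -> pending DSRs.
    r.sched_lock = 1;
    r.owner = 1;
    r.log.push_back("CPU1: in pending_DSRs, sched_lock=1");

    // CPU2: a resumed thread tries to zero the lock.
    const int cpu2 = 2;
    if (owner_check && r.owner != cpu2) {
        r.log.push_back("CPU2: zero refused, not owner");
    } else {
        r.sched_lock = 0;
        r.owner = -1;
        r.log.push_back("CPU2: sched_lock zeroed");
    }

    // CPU1: a second interrupt arrives while DSRs are still running.
    // The scenario assumes the DSR code relies on a nonzero lock here.
    if (r.sched_lock == 0) {
        r.invariant_broken = true;
        r.log.push_back("CPU1: second interrupt with sched_lock=0");
    } else {
        r.log.push_back("CPU1: second interrupt, lock still held");
    }

    assert(r.log.size() == 3);  // one trace entry per step
    return r;
}
```

replay_interleaving(false) ends with invariant_broken
set and sched_lock at 0; replay_interleaving(true)
leaves sched_lock at 1 and the invariant intact.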

I don't think such race conditions will actually exist,
because so far my efforts in understanding the eCos
code tell me it is quite sturdy work.

Sorry if this sounds pessimistic, but I am just curious
and still trying to understand eCos better, while
avoiding Linux biases as far as possible. Maybe
someday, if I get to work on eCos on some hardware,
my spare-time efforts in understanding eCos will pay off.

brij




