This is the mail archive of the ecos-discuss@sourceware.org mailing list for the eCos project.



Re: DSR Scheduling Problem


Grant Edwards <grante@visi.com> writes:
> In gmane.os.ecos.general, you wrote:
>
[...]
>> We just need to come to a reasonable decision after weighing
>> all pros and cons, -- that's why I have posted this in the
>> first place.
>
> As I've said, I think keeping the old DSR scheduler and adding
> an optional new one is the reasonable choice.  For the vast
> majority of applications it just doesn't matter,

What bothers me is that it's hard to tell whether it matters or not for a
given application. It could be that the unfortunate worst-case behavior
of LIFO just hasn't happened yet because of its low probability. It's
somewhat similar to having a race condition somewhere: it could work for
years and then suddenly break next Friday the 13th :(

> and leaving the current one as the default has the lowest risk of
> breaking existing applications.  I guess I just don't think there are
> that many applications where FIFO has a measurable advantage to take
> the risk.

And I have tried to perform the analysis, and it left me feeling that
almost any application could be affected :( Well, we could assume that
the probability of the worst-case behavior is small enough to ignore,
but I still feel uneasy about it.

[...]
>>>> How will user compare the choices in his tests when most of time the
>>>> algorithms behave exactly the same?
>>>
>>> That's up to the user.
>>
>> That seems like putting responsibilities on the user that he can't
>> cope with :(
>
> How are we supposed to run/evaluate tests of the user's application?

The problem is that problems of this kind are very difficult, if not
impossible, to find by testing. Even if an application behaves well for
a few days, it can still suddenly break tomorrow. That's what I meant:
the user probably has less chance of finding the problem in his tests
than we have by analyzing the system :(

[...]
>> [What actually bothers me is why you don't care: do you
>> still have a feeling that LIFO could be better in some
>> cases? If so, I'd be thankful if you shared it with me.]
>
> I have the feeling that for everything I've done, LIFO works
> just as well as FIFO would.  I'm convinced that changing to a
> new DSR scheduling scheme will be of no benefit to my
> applications and represents a small (but non-zero) risk.

In fact we are in roughly the same situation. Please tell me how you
managed to convince yourself that your applications aren't affected,
because after the analysis I still feel uneasy about my own application,
even though it has worked fine with LIFO so far.

To be certain of being unaffected, an application must either have
fewer than 3 asynchronous IRQ sources, or be able to tolerate a DSR
latency equal to the sum of the execution times of all its ISRs and
DSRs. If all your applications are like that, then there is indeed no
reason to bother.

> For example: We recently switched from the NetBSD TCP stack to
> the FreeBSD stack because the latter is what's recommended and
> what is being actively maintained.  There was a fringe benefit
> of somewhat lower CPU load and higher TCP/IP throughput.
>
> However, it broke our application in certain scenarios.  There
> was a bug in the FreeBSD stack.  It was fixed 6 years ago in
> the NetBSD stack, but never got fixed in the FreeBSD stack.  We
> now have a rather frustrated and irate customer and have spent
> quite a few hours duplicating the problem and tracking down the
> bug in the FreeBSD stack.

Well, we both know such things happen :( Sorry you've run into such
trouble. Still, switching from one TCP stack implementation to another
is IMHO orders of magnitude riskier than switching from LIFO to FIFO,
where things are much simpler and easier to understand.

>
> Change is risk.
>

Yes, my switch from 1.3.1 to 2.0 also resulted in hard-to-find breakage
(due to a CPU hardware bug that had been hidden by the 1.3.1 ARM HAL
code and was exposed by the HAL re-implementation). Do I think the old
HAL should have been kept around under an option? No, I don't.
Progress requires changes, and changes may break things; we are all
used to coping with that.

>>>> The only one I see is backward compatibility, but due to the fact
>>>> that eCos never specified exact order of DSRs it shouldn't matter.
>>>
>>> Lots of things that shouldn't matter do.
>>
>> Yes, indeed.
>
>> I don't believe you really think that *every* change to eCos sources
>> should be put under yet another option as some "working" system
>> somewhere may break, right?
>
> I think that in general, new features or fundamental changes to
> existing features should be optional if possible.  Sometimes
> that's simply not practical, but I think it is in this case.

Well, in this particular case we still disagree, as the risk is IMHO
negligible, but I don't disagree strongly enough to keep arguing
against the new options.

Keeping LIFO the default may have another unfortunate effect. The issues
involved in selecting a particular option are so non-obvious that most
people will probably just keep the default, following the reasonable
rule "if you aren't sure what an option is about, leave it at the
default". I still think the safest option should be made the default,
but I do understand you probably won't agree with me.

-- Sergei.


-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss

