This is the mail archive of the ecos-patches@sources.redhat.com mailing list for the eCos project.



Re: RFC, fix for bogus timeouts in select()


Thu, 21.10.2004 at 20:38, Andrew Lunn wrote:
> > Index: current/ChangeLog
> > ===================================================================
> > RCS file: /cvs/ecos/ecos/packages/io/fileio/current/ChangeLog,v
> > retrieving revision 1.46
> > diff -u -w -r1.46 ChangeLog
> > --- current/ChangeLog	4 Oct 2004 11:50:06 -0000	1.46
> > +++ current/ChangeLog	21 Oct 2004 12:32:21 -0000
> > @@ -1,3 +1,15 @@
> > +2004-10-21  Oyvind Harboe  <oyvind.harboe@zylin.com>
> > +
> > +        * src/select.cxx: Fix problem with bogus timeouts in select().
> > +	The problem is that a thread can receive data while it is currently
> > +	starved for CPU. It can then wake up with data arrived and timeout
> > +	expired. The fix is to check for data after timeout has expired. One
> > +	can of course claim that select() is "doing the right thing", but
> > +	it is a royal pain for developers to track down this sort of thing
> > +	so removing this API tripwire seems worthwhile. E.g. serial drivers
> > +	can spend a lot of time in DSRs copying lots of traffic. Not easily
> > +	dealt with at an application level.
> 
> Although the current implementation is probably not optimal, I don't
> see any tripwire in the API. A task/process can get
> descheduled/rescheduled at any time. Think about this on a Unix
> system.

Unix is not a realtime OS like eCos.

How do other realtime operating systems implement select()?

Hmmm... I wonder if Linux actually does an extra check after it wakes up
after a timeout...

What if the Linux implementation actually performs a check for more data
after a timeout, would that swing your opinion?

> The select() system call exited on a timeout and you are back
> into the libc select() function when you get time sliced. While some
> other process is running the ethernet device interrupt goes off and
> the stack puts new data into the socket ready for the userspace to
> read sometime in the future. Your process then gets the CPU back and
> the libc select function exits back into your application. Select tells
> you it has timed out, but there is in fact data to be read on the
> socket.

So what is the correct use of select then?

1. 	select() w/timeout
2.	select() w/zero timeout
3.	blocking read() for data if available
4.	else print "timeout error message"

Yuk!
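
For illustration, the four steps above might look roughly like this with a
POSIX-style select(). This is only a sketch: wait_readable() and
read_with_timeout() are hypothetical helper names, not part of eCos or of
the patch, and EINTR handling is left out for brevity.

```c
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: 1 = fd readable, 0 = genuine timeout, -1 = error. */
static int wait_readable(int fd, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int n;

    /* Step 1: select() with the real timeout. */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n != 0)
        return n < 0 ? -1 : 1;

    /* Step 2: select() again with a zero timeout, to catch data that
     * arrived while this thread was starved for CPU. The fd_set and
     * timeval must be re-initialised because select() may modify them. */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = 0;
    tv.tv_usec = 0;
    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return n > 0 ? 1 : 0;
}

/* Hypothetical wrapper covering steps 3 and 4. */
static ssize_t read_with_timeout(int fd, void *buf, size_t len, long timeout_ms)
{
    int r = wait_readable(fd, timeout_ms);

    if (r == 1)
        return read(fd, buf, len);              /* Step 3: data is there */
    if (r == 0)
        fprintf(stderr, "timeout error message\n");  /* Step 4 */
    return -1;
}
```

Even with the second check there is still a small window between the
zero-timeout select() and the error message, so this narrows the race
rather than removing it.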

> In practice this makes little difference. The next time around
> the loop select will exit immediately telling you there is data on
> the socket.

Hmmm.... I wonder how many applications get this right.

My definition of an "API tripwire" is an API where more than n% of users
get the implementation wrong.

I believe that it is DSR interrupts that are starving me, not other
threads.

Leaving the implementation as is doesn't exactly give me a warm-fuzzy.

> Any application that assumes that select returning a timeout means
> there is no data on the socket is broken.

I don't think the patch itself will pass muster. The mail was intended
to bring up the issue. 

select() isn't the easiest code to read. I particularly like the "goto
camouflaged as break" trick :-)



> I will take a closer look at the patch though.
> 
>         Andrew
-- 
Øyvind Harboe
http://www.zylin.com


