This is the mail archive of the
ecos-patches@sources.redhat.com
mailing list for the eCos project.
Re: RFC, fix for bogus timeouts in select()
- From: Øyvind Harboe <oyvind dot harboe at zylin dot com>
- To: Andrew Lunn <andrew at lunn dot ch>
- Cc: ecos-patches at sources dot redhat dot com
- Date: Thu, 21 Oct 2004 22:30:29 +0200
- Subject: Re: RFC, fix for bogus timeouts in select()
- References: <1098362231.21934.21.camel@famine> <20041021183809.GI18923@lunn.ch>
On Thu, 21.10.2004 at 20:38, Andrew Lunn wrote:
> > Index: current/ChangeLog
> > ===================================================================
> > RCS file: /cvs/ecos/ecos/packages/io/fileio/current/ChangeLog,v
> > retrieving revision 1.46
> > diff -u -w -r1.46 ChangeLog
> > --- current/ChangeLog 4 Oct 2004 11:50:06 -0000 1.46
> > +++ current/ChangeLog 21 Oct 2004 12:32:21 -0000
> > @@ -1,3 +1,15 @@
> > +2004-10-21 Oyvind Harboe <oyvind.harboe@zylin.com>
> > +
> > + * src/select.cxx: Fix problem with bogus timeouts in select().
> > + The problem is that a thread can receive data while it is currently
> > + starved for CPU. It can then wake up with data arrived and timeout
> > + expired. The fix is to check for data after timeout has expired. One
> > + can of course claim that select() is "doing the right thing", but
> > + it is a royal pain for developers to track down this sort of thing
> > + so removing this API tripwire seems worthwhile. E.g. serial drivers
> > + can spend a lot of time in DSRs copying lots of traffic. Not easily
> > + dealt with at an application level.
>
> Although the current implementation is probably not optimal, I don't
> see any tripwire in the API. A task/process can get
> descheduled/rescheduled at any time. Think about this on a Unix
> system.
Unix is not a realtime OS like eCos.
How do other realtime operating systems implement select()?
Hmmm... I wonder if Linux actually does an extra check after it wakes up
after a timeout...
What if the Linux implementation actually performs a check for more data
after a timeout, would that swing your opinion?
> The select() system call exited on a timeout and you are back
> into the libc select() function when you get time sliced. While some
> other process is running the ethernet device interrupt goes off and
> the stack puts new data into the socket ready for the userspace to
> read sometime in the future. Your process then gets the CPU back and
> the libc select function exits back into your application. Select tells
> you it has timed out, but there is in fact data to be read on the
> socket.
So what is the correct use of select then?
1. select() w/timeout
2. select() w/zero timeout
3. blocking read() for data if available
4. else print "timeout error message"
Yuk!
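For concreteness, steps 1 and 2 above could be wrapped up roughly as follows. This is only a sketch assuming a POSIX-style select(); wait_readable_recheck() is a made-up name for illustration and is neither part of eCos nor the code in the patch under discussion.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper, not part of eCos or the patch.
 * Waits up to timeout_ms for fd to become readable.  If select()
 * reports a timeout, it polls once more with a zero timeout to catch
 * data that arrived while the thread was starved for CPU.
 * Returns 1 if readable, 0 on a genuine timeout, -1 on error. */
static int wait_readable_recheck(int fd, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int n;

    /* FD_SET is only defined for descriptors below FD_SETSIZE. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    /* Step 1: select() with the real timeout. */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n != 0)
        return n < 0 ? -1 : 1;

    /* Step 2: select() again with a zero timeout (a pure poll). */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = 0;
    tv.tv_usec = 0;
    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return n > 0 ? 1 : 0;
}
```

Steps 3 and 4 stay with the caller: read() when the helper returns 1, and report a timeout only when it returns 0.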
> In practice this makes little difference. The next time around
> the loop select will exit immediately telling you there is data on
> the socket.
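The loop described in the quoted text might look roughly like the sketch below, where a timeout is treated as "nothing to do this round" rather than as proof that the socket is empty. It assumes a POSIX-style select() and read(); read_until_idle() and its max_idle parameter are hypothetical names used only for illustration, and EINTR handling is left out for brevity.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical loop, for illustration only.
 * Reads from fd until max_idle consecutive select() timeouts occur
 * or end-of-file is seen.  Returns total bytes read, or -1 on error. */
static long read_until_idle(int fd, long timeout_ms, int max_idle)
{
    char buf[64];
    long total = 0;
    int idle = 0;

    /* FD_SET is only defined for descriptors below FD_SETSIZE. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    while (idle < max_idle) {
        fd_set rfds;
        struct timeval tv;
        ssize_t got;
        int n;

        /* Both the set and the timeout must be rebuilt on every pass. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        n = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (n < 0)
            return -1;
        if (n == 0) {
            /* Timeout: draw no conclusion about the socket.  Data that
             * arrived late is picked up by the next select() call. */
            idle++;
            continue;
        }

        got = read(fd, buf, sizeof buf);
        if (got < 0)
            return -1;
        if (got == 0)
            break;              /* end of file */
        total += got;
        idle = 0;
    }
    return total;
}
```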
Hmmm.... I wonder how many applications get this right.
My definition of an "API tripwire" is an API where more than n% of
users get the implementation wrong.
I believe that it is DSR interrupts that are starving me, not other
threads.
Leaving the implementation as is doesn't exactly give me a warm fuzzy feeling.
> Any application that assumes that select returning a timeout means
> there is no data on the socket is broken.
I don't think the patch itself will pass muster. The mail was intended
to bring up the issue.
select() isn't the easiest code to read. I particularly like the "goto
camouflaged as break" trick :-)
> I will take a closer look at the patch though.
>
> Andrew
--
Øyvind Harboe
http://www.zylin.com