This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
On 01/17/2014 03:55 AM, Corinna Vinschen wrote: > > This subtil difference has interesting side-effects, as soon as > processes have forked or spawned child processes which inherited a > stream read by the parent. Consider these two scenarios as examples: > > Input file foo.txt: > > aaaaaaa > bbbbbbb > ccccccc > > Example 1 (pseudo code): > > fp = fopen ("foo.txt", "r"); > while (fgets (buf, fp)) { > fprintf (stderr, "%s %ld\n", buf, lseek(fileno(fp), 0, SEEK_CUR)); > switch (fork ()) { > case CHILD: > exit (0); > case PARENT: > wait (); > break; > } > } I still stand by my claim that this program is triggering undefined semantics in POSIX (http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_05). Remember, this is undefined behavior: putc('a'); fork(); exit(); because the exit() triggers a flush() in both parent and child, causing the same data to be flushed twice. But the undefined behavior is easily avoided: putc('a'); fflush(NULL); fork(); exit(); because it forces the stream position to be reflected into the underlying file position prior to either parent or child trying to call exit. Your sample program has the same bug, except that it is an input stream where you are failing to fflush(NULL) in order to sync state. Let's revisit the POSIX requirements: ==== > Note that after a fork(), two handles exist where one existed before. The application shall ensure that, if both handles can ever be accessed, they are both in a state where the other could become the active handle first. The application shall prepare for a fork() exactly as if it were a change of active handle. (If the only action performed by one of the processes is one of the exec functions or _exit() (not exit()), the handle is never accessed in that process.) Let's label our two handles: Handle 1 is the parent's handle, as well as the lone handle that existed pre-fork. Handle 2 is the child's handle. > > For the first handle, the first applicable condition below applies. After the actions required below are taken, if the handle is still open, the application can close it. > > If it is a file descriptor, no action is required. But the handle is a stream, not a file descriptor, so this is not met. > > If the only further action to be performed on any handle to this open file descriptor is to close it, no action need be taken. The code isn't calling close(fileno(fp)), so this is not met. > > If it is a stream which is unbuffered, no action need be taken. > fp is buffered, so this is not met. > If it is a stream which is line buffered, and the last byte written to the stream was a <newline> (that is, as if a: > > putc('\n') > > was the most recent operation on that stream), no action need be taken. fp is not line buffered, so this is not met. > > If it is a stream which is open for writing or appending (but not also open for reading), the application shall either perform an fflush(), or the stream shall be closed. fp is not open for writing, so this is not met. > > If the stream is open for reading and it is at the end of the file (feof() is true), no action need be taken. fp is not at EOF, so this is not met. > > If the stream is open with a mode that allows reading and the underlying open file description refers to a device that is capable of seeking, the application shall either perform an fflush(), or the stream shall be closed. The application did not call fflush(fp) (or fflush(NULL)), and the stream was not closed prior to fork, so this is not met. > > For the second handle: > > If any previous active handle has been used by a function that explicitly changed the file offset, except as required above for the first handle, the application shall perform an lseek() or fseek() (as appropriate to the type of handle) to an appropriate location. The previous active handle was used to change offset (by fgets), but none of the requirements on the first handle were met, and we fail to fseek() on the second handle. Therefore, the fact that exit() calls fflush() and changes the offset of the fd, leading to an infloop in the parent, is a result of the bug in the program violating the POSIX constraints on active handle manipulation. > > If the active handle ceases to be accessible before the requirements on the first handle, above, have been met, the state of the open file description becomes undefined. This might occur during functions such as a fork() or _exit(). ==== That said, it's a nasty corner case, and there is enough code out there relies on BSD behavior on their undefined semantics, that it is nicer to accommodate that code if strict compliance to POSIX isn't required. > Example 2: > > sh$ { sed -n 1q; sed -n 1q; cat; } < foo.txt > > Output under BSD semantics: > > <nothing> This is a bug in BSD semantics. It is possible to work around the bug (in fact, many programs that use gnulib DO work around it, just fine, by explicitly calling fflush() on any seekable read stream before exiting - GNU sed hasn't yet been patched, but coreutils has). Since it can be worked around, and since it makes cygwin behave like other (buggy) implementations, it's less surprising, so I'm fine with enabling the switch for cygwin, even though... > > Output under POSIX semantics: > > ccccccc For this example, this behavior is indeed well-defined, and this is the behavior that I would prefer if it weren't for the fact that so many buggy programs out there fail to fflush(NULL) before calling fork(). It _might_ be possible to fix newlib to implicitly call fflush(NULL) before fork() on behalf of all users, but I don't know what other side effects that might expose. > > So my patch now introduces the _STDIO_BSD_SEMANTICS flag. If you define > it for your target (for instance, in sys/config.h), you get BSD > semantics, while, as long as _STDIO_BSD_SEMANTICS is undefined, your > target sticks to POSIX semantics. Thanks for tackling this. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Attachment:
signature.asc
Description: OpenPGP digital signature
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |