This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: strptime() could hangs for hours


Corinna:
     (I apologize that this got so long, but strptime() is a bit messy.)
     I've taken a look at this, and have several thoughts.  The biggest
is
that the entire strptime really needs to be re-written as it comes
nowhere
close to doing all that is called for.  (It does not check ranges, it
does
not support the field widths for %C or %Y, it does not restrict the
lengths read, the E and O modifiers are lacking given that locale
support
has generally been added, etc.)  It does, however, provide full basic
functionality.  That major comment aside, as I don't wish to do major
surgery, here are some more practical thoughts.
     1) POSIX (2008) says "It is unspecified whether multiple calls to
strptime() using the same tm structure will update the current contents
of the structure or overwrite all contents of the structure. Conforming
applications should make a single call to strptime() with a format and
all data needed to completely specify the date and time being
converted."
     Given that applications are required to give a single call with all
data needed, and making an assumption that we don't at this time want to
go so far as to check for and enforce it, I propose that the whole
struct
tm be initialized at the start of the call.  It is permitted.  Junk
should
not pass through.  That is, if a user wants strptime() to do part of the
work and they'll edit a little afterwards, fine--but they must edit the
struct tm afterwards and not before.  (That is, the example given should
not
be able to influence the results.  Especially since automatic variables
are not initialized and can--and do--have random trash in them.)  The
only drawback to this is that it could change behavior
slightly--possibly
breaking applications that are not using it in a proper manner (as
defined by the standard's usage guidelines).
     2)  The first_day() function has another issue other than maybe
just
taking a long time to run.  It assumes that the year is not before 1970,
and gives incorrect results if it is.  POSIX explicitly speaks of %y
being able to specify 1969, so that, at least, should come out right.
Trying to make it completely general is problematic due to 1752, or
1753,
or one of the other many years that different countries went from Julian
to Gregorian.  I propose that it refuse to work before a certain year
to avoid the mess.  The two that come to mind are 1900 (tm_year value of
0), and 1753.  Since England went Gregorian in 1752 and September was
robbed of 11 days, 1753 is the first "safe" year.  England was almost
the
last to adopt, thus this covers most countries.  I chose 1753 in the
edit that I've made.  (Trivia:  I just noticed that this was written
orignally in Sweden.  They made the change 1 year after England, so we
could use 1754 in their honor, I suppose.)
     3)  Yes, present behavior is incorrect, and U, V, and W processing
need to be delayed as you suggest.  But this introduces a potential
complication.  This is because U, W, and V should be mutually exclusive.
The present implementation lets them override each other, with the last
one in the format taking precedence.  I preserved that behavior.  (I
was tempted to error if more than 1 were set, but POSIX doesn't comment
on that.  It is explicit about scanning the format from left to right.)
     4)  Code has some non-POSIX formats:  k, l, V, and Z.  (At least,
not in 2008, and not mentioned in the notes as having been removed.)
k is an alias for H, l is an alias for I, V is similar to U and W, and
Z is ignored.  Personally, I'd be inclined to remove all of them.
Thoughts?
     5)  The %y specifier does not account for POSIX qualifying
the assumed century with the clause "When format contains neither a C
conversion specifier nor a Y conversion specifier".  (This would be
a bit messy to do, so I passed for now.  I did put a "FIXME" note in.)
     6)  strptime() manpage information is missing.
     Attached is a patch that addresses the first 3 items.  It addresses
#2 and #3 by re-writing first_day()--which also addresses the possible
long run-time observation.  (It is valid for the user to give 0x6FFFFFFF
as a year (via %Y), actually, even if it makes no sense.  But at least
now it will take the same amount of time as a sane year.)  #1 was
addressed
by zeroing the whole struct tm before any other processing, except for
tm_isdst, which is set to -1 (indicating that DST status is unknown).
     If there is general agreement about deleting the non-standard
formats
mentioned in #4, I can either amend the patch or make another one.
     I have tested the new first_day() function but not the other
changes.
				Craig



-----Original Message-----
From: newlib-owner@sourceware.org [mailto:newlib-owner@sourceware.org]
On Behalf Of Corinna Vinschen
Sent: Monday, February 14, 2011 8:06 AM
To: newlib@sourceware.org
Subject: Re: strptime() could hangs for hours

On Feb 14 11:41, Aleksandr Platonov wrote:
> Hi,
> strptime() function could hangs if format string contains characters
> which set date by week number (U,W,V).
> Example:
> struct tm tm;
> tm.tm_year = 0x6FFFFFFF;
> strptime("52", "%U", &tm);
> 
> This happens because first_day() function call hangs for hours if big
> year value is passed to it.
> We could not assume that tm_year field of tm structure has correct
value
> at processing U,V,W characters in strptime(), so first_day() function
> parameter could have any value.
> In BSD libc characters U,V and W are ignored (only range check is
> performed).
> Is this behaviour of strptime() in newlib correct?

No, it's not.  Probably the right thing to do would be to store the
information that U,V,W is present together with the given week number,
and to calculate the information when the string has been completely
scanned and we only have a valid year, but no day or month.  Or,
as on BSD, ignore the input and just check validity.

Does anybody want to provide a patch?


Corinna

-- 
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat

Attachment: ChangeLog.txt
Description: ChangeLog.txt

Attachment: strptime.patch.txt
Description: strptime.patch.txt


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]