This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Improving libm-test.inc structure and maintenance


On Sun, May 05, 2013 at 08:23:12PM +0000, Joseph S. Myers wrote:
> On Sun, 5 May 2013, Ondrej Bilka wrote:
> 
> > You do not have to review if you do the following:
> 
> Tools may be able to use various heuristics to reduce the number of cases 
> presented for human review.  That human review is still needed to ensure 
> good, valid bug reports.  (Note that Jakub found various bugs in MPFR in 
> his random fma testing.  You need to decide what component the bug is in 
> before reporting it.)
>

That depends on what is found. If it finds only 10 cases a year, then
filtering is not necessary. My main concern is that when the testing finds a
new bug (which can be a needle in a haystack of existing bugs), everybody
will have forgotten that the run took place and nobody will read the logs.
Some notification system is necessary.

Bugzilla is the best place for notification. The alternative is to send
mail, which has a higher probability of being ignored.

> > You have a random tester that periodically rebuilds libm and retests
> > the results that it knows are inaccurate. Then it tries, say, 1000000
> > random inputs.
> 
> (1000000 is pretty small for this purpose.)
> 
> > It reports a bug for the first wrong result and holds off further
> > reporting until that first wrong result becomes accurate, at which
> > point it can report a bug again. Repeat.
> 
> I'm thinking more on the lines of John Regehr's testing of compilers with 
> Csmith.  Reporting one bug doesn't wait on other bugs being fixed if it 
> looks to a human that they are different.  Failures appearing in different 
> functions may have the same underlying cause, while failures in the same 
> function may have different causes - that's something a human can judge.
> 
In libm the functions are mostly standalone; a shared underlying cause can
only come from a pattern that is repeated across the code. In that case
having a list of the affected functions is handy.

I do not quite follow how the Csmith testing would translate here. Generate
random expressions and look at how the functions behave?
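
To make the random tester sketched above concrete: a minimal harness could
look roughly like the code below. It assumes MPFR as the correctly rounded
reference (the same role it played in Jakub's fma testing); the choice of
exp, the argument range and the 1-ulp reporting cutoff are arbitrary
placeholders, not a proposal for what a real tool should use.

/* Minimal sketch of a random accuracy tester for one libm function.
   It assumes MPFR is installed and uses exp as an arbitrary example;
   the 1-ulp cutoff is a placeholder, not glibc policy.  */
#include <math.h>
#include <mpfr.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Ulp distance between two finite doubles of the same sign, measured
   by comparing their bit patterns.  */
static int64_t
ulp_distance (double a, double b)
{
  int64_t ia, ib;
  memcpy (&ia, &a, sizeof ia);
  memcpy (&ib, &b, sizeof ib);
  return llabs (ia - ib);
}

int
main (void)
{
  mpfr_t x, r;
  /* 128 bits is comfortably more precise than double.  */
  mpfr_init2 (x, 128);
  mpfr_init2 (r, 128);

  for (int i = 0; i < 1000000; i++)
    {
      /* Random argument in roughly [-700, 700] so exp stays in the
         normal range.  */
      double d = (drand48 () - 0.5) * 1400.0;

      double got = exp (d);

      /* Correctly rounded reference value via MPFR.  */
      mpfr_set_d (x, d, MPFR_RNDN);
      mpfr_exp (r, x, MPFR_RNDN);
      double want = mpfr_get_d (r, MPFR_RNDN);

      int64_t ulps = ulp_distance (got, want);
      if (ulps > 1)
        printf ("exp (%a) = %a, want %a (%lld ulps)\n",
                d, got, want, (long long) ulps);
    }

  mpfr_clears (x, r, (mpfr_ptr) 0);
  return 0;
}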

> I think this testing for glibc would be more about finding bugs in 
> existing code than regressions introduced by changes.

The main purpose is to find new bugs, but it is hard to automatically
distinguish those from regressions.

> Also, cases that 
> aren't bugs but where the ulps are slightly higher than the biggest known 
> for a function in the testsuite, indicating that a new testcase should be 
> added to make the automatic documentation of maximum errors more accurate.
> 
Possible, but we need to set a tolerance for where it becomes a bug.
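
For example, a harness like the one sketched above could split its findings
into two tiers along these lines; the thresholds below are invented for
illustration, not anything agreed for glibc:

/* Sketch of a two-tier classification for the harness above; the
   thresholds are made-up placeholders, not agreed glibc policy.  */
#include <stdint.h>
#include <stdio.h>

#define KNOWN_MAX_ULPS 2   /* largest error currently recorded for the function */
#define BUG_ULPS       16  /* beyond this, treat it as an accuracy bug */

static void
classify (const char *func, double arg, int64_t ulps)
{
  if (ulps > BUG_ULPS)
    printf ("candidate bug: %s (%a) off by %lld ulps\n",
            func, arg, (long long) ulps);
  else if (ulps > KNOWN_MAX_ULPS)
    printf ("new worst case: %s (%a) off by %lld ulps, "
            "worth adding as a testcase\n",
            func, arg, (long long) ulps);
}

int
main (void)
{
  classify ("exp", 0x1.8p+2, 3);   /* would be flagged as a new worst case */
  classify ("exp", 0x1.8p+2, 40);  /* would be flagged as a candidate bug */
  return 0;
}

The first tier would still go through the human filter you describe; the
second only suggests adding a known worst case to the documented maximum
errors.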

> > This will pair automatically generated bugs with manual ones, and
> > marking these as duplicates has small overhead.
> 
> The duplicates shouldn't be filed at all.
> 
> I think automatic bug filing is always a bad idea - an automatic process 
> may produce a list of *candidate* issues, tracked however is convenient, 
> but the human should be in the loop before any such candidate issue 
> becomes an actual bug report in glibc Bugzilla, not just after.
> 
What about adding a separate state, for example GENERATED, that is not shown
unless explicitly asked for?

> > I think the first part, having a system that automates submission and
> > resolves a bug when it is fixed, is viable. A simple submission form
> > could be http://kam.mff.cuni.cz/~ondra/inaccuracy_example
> 
> Automatic closing of bugs is also a bad idea; a human needs to judge 
> whether the whole issue is genuinely fixed or whether the commit only 
> fixes particular cases and other parts of the same issue remain to fix.
> 
A test that covers only particular cases is an inadequate test. You cannot
decide whether an issue is fixed using tests that are green both before and
after, and you also cannot reliably tell whether a regression happened.
Closing the bug is a good way to get that fixed and to make a human add any
additional necessary data.

> A tool could usefully monitor bugs for signs of being fixed (given that a 
> human has verified the validity of those bugs) - there's no reason to 
> limit that to libm bugs, I expect a lot more bugs could have testcases 
> written in a suitable form for periodic automatic testing - but a human 
> should then judge in each case whether an automatic report that a test has 
> stopped failing does really indicate that the bug is fixed.
>
 
> > We would need properly configured machines to run the tests on to make
> > this viable.
> 
> I think testing on different architectures would be useful in order to 
> cover the range of different implementations for different floating-point 
> formats.
> 
> But this isn't about purely automated testing - it's also about a tunable 
> tool that different people could use in different ways to search for 
> particular classes of bugs of interest to them (e.g. if someone wants to 
> do a more thorough test for issues with subnormal inputs to functions).
> 
> The randomness and tunability is why this isn't well-suited to being a 
> standard part of the glibc testsuite and so an external project seems 
> better (being able to test other libm implementations with it is a 
> side-effect).
> 
> I suspect there are other areas of standard C libraries, not just libm, 
> where some form of random test generation could also be useful.
> 
I plan to write something like this but currently do not have much time. I
have added it to my TODO list and will probably look at it during the freeze.
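
As one example of the tunability you mention, restricting the search to
subnormal arguments only needs a different input generator. A rough sketch,
assuming IEEE 754 binary64 and again purely illustrative:

/* Sketch of a tunable input generator; this variant produces only
   subnormal arguments.  It assumes IEEE 754 binary64 doubles and is
   only an illustration, not part of any existing tool.  */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Random positive subnormal: sign 0, exponent field 0, random
   non-zero 52-bit mantissa.  */
static double
random_subnormal (void)
{
  uint64_t bits = ((uint64_t) lrand48 () << 31) | (uint64_t) lrand48 ();
  uint64_t mant = bits & 0x000fffffffffffffULL;
  if (mant == 0)
    mant = 1;
  double d;
  memcpy (&d, &mant, sizeof d);
  return d;
}

int
main (void)
{
  for (int i = 0; i < 4; i++)
    printf ("%a\n", random_subnormal ());
  return 0;
}

A generator like that could be dropped into the harness sketched above in
place of plain drand48.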

Everybody would be welcome to join. What are the options for where to host it?

Ondra

