This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Improving libm-test.inc structure and maintenance

From: "Joseph S. Myers" <joseph at codesourcery dot com>
To: OndÅej BÃlka <neleai at seznam dot cz>
Cc: <libc-alpha at sourceware dot org>
Date: Sun, 5 May 2013 20:23:12 +0000
Subject: Re: Improving libm-test.inc structure and maintenance
References: <Pine dot LNX dot 4 dot 64 dot 1305022244550 dot 12072 at digraph dot polyomino dot org dot uk> <20130505130414 dot GA18328 at domone dot kolej dot mff dot cuni dot cz> <Pine dot LNX dot 4 dot 64 dot 1305051340490 dot 16386 at digraph dot polyomino dot org dot uk> <20130505165832 dot GA30896 at domone>

On Sun, 5 May 2013, Ondrej Bilka wrote:

> You do not have to review if you do following: 

Tools may be able to use various heuristics to reduce the number of cases 
presented for human review.  That human review is still needed to ensure 
good, valid bug reports.  (Note that Jakub found various bugs in MPFR in 
his random fma testing.  You need to decide what component the bug is in 
before reporting it.)

> You have random tester that is periodicaly rebuilds lm and tests 
> on results that he knows are inaccurate. Then it tries say 1000000
> random inputs.

(1000000 is pretty small for this purpose.)

> It reports bug for first wrong result and waits with reporting until 
> first wrong result becomes accurate where it can report bug again. 
> Repeat. 

I'm thinking more on the lines of John Regehr's testing of compilers with 
Csmith.  Reporting one bug doesn't wait on other bugs being fixed if it 
looks to a human that they are different.  Failures appearing in different 
functions may have the same underlying cause, while failures in the same 
function may have different causes - that's something a human can judge.

I think this testing for glibc would be more about finding bugs in 
existing code than regressions introduced by changes.  Also, cases that 
aren't bugs but where the ulps are slightly higher than the biggest known 
for a function in the testsuite, indicating that a new testcase should be 
added to make the automatic documentation of maximum errors more accurate.

> This will pair automaticaly generated bugs with manual ones and it is
> marking these duplicate has small overhead. 

The duplicates shouldn't be filed at all.

I think automatic bug filing is always a bad idea - an automatic process 
may produce a list of *candidate* issues, tracked however is convenient, 
but the human should be in the loop before any such candidate issue 
becomes an actual bug report in glibc Bugzilla, not just after.

> I think first part of having system that automates submission and
> resolves bug when it is fixed is viable. A simple submission form could
> be http://kam.mff.cuni.cz/~ondra/inaccuracy_example

Automatic closing of bugs is also a bad idea; a human needs to judge 
whether the whole issue is genuinely fixed or whether the commit only 
fixes particular cases and other parts of the same issue remain to fix.

A tool could usefully monitor bugs for signs of being fixed (given that a 
human has verified the validity of those bugs) - there's no reason to 
limit that to libm bugs, I expect a lot more bugs could have testcases 
written in a suitable form for periodic automatic testing - but a human 
should then judge in each case whether an automatic report that a test has 
stopped failing does really indicate that the bug is fixed.

> We would need properly configured machines to run tests on them to make
> this viable.

I think testing on different architectures would be useful in order to 
cover the range of different implementations for different floating-point 
formats.

But this isn't about purely automated testing - it's also about a tunable 
tool that different people could use in different ways to search for 
particular classes of bugs of interest to them (e.g. if someone wants to 
do a more thorough test for issues with subnormal inputs to functions).

The randomness and tunability is why this isn't well-suited to being a 
standard part of the glibc testsuite and so an external project seems 
better (being able to test other libm implementations with it is a 
side-effect).

I suspect there are other areas of standard C libraries, not just libm, 
where some form of random test generation could also be useful.

-- 
Joseph S. Myers
joseph@codesourcery.com

Follow-Ups:
- Re: Improving libm-test.inc structure and maintenance
  - From: OndÅej BÃlka

References:
- Improving libm-test.inc structure and maintenance
  - From: Joseph S. Myers
- Re: Improving libm-test.inc structure and maintenance
  - From: OndÅej BÃlka
- Re: Improving libm-test.inc structure and maintenance
  - From: Joseph S. Myers
- Re: Improving libm-test.inc structure and maintenance
  - From: OndÅej BÃlka

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]