This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Improving libm-test.inc structure and maintenance
- From: "Joseph S. Myers" <joseph at codesourcery dot com>
- To: OndÅej BÃlka <neleai at seznam dot cz>
- Cc: <libc-alpha at sourceware dot org>
- Date: Sun, 5 May 2013 20:23:12 +0000
- Subject: Re: Improving libm-test.inc structure and maintenance
- References: <Pine dot LNX dot 4 dot 64 dot 1305022244550 dot 12072 at digraph dot polyomino dot org dot uk> <20130505130414 dot GA18328 at domone dot kolej dot mff dot cuni dot cz> <Pine dot LNX dot 4 dot 64 dot 1305051340490 dot 16386 at digraph dot polyomino dot org dot uk> <20130505165832 dot GA30896 at domone>
On Sun, 5 May 2013, Ondrej Bilka wrote:
> You do not have to review if you do the following:
Tools may be able to use various heuristics to reduce the number of cases
presented for human review. That human review is still needed to ensure
good, valid bug reports. (Note that Jakub found various bugs in MPFR in
his random fma testing. You need to decide what component the bug is in
before reporting it.)
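As one example of such a heuristic (entirely my invention, not an
existing tool): bucket candidate failures by function and input binade,
and keep only the worst case per bucket, so a reviewer sees one
representative instead of thousands of near-duplicates:

#include <math.h>
#include <string.h>

/* A candidate issue produced by the random tester, before any human
   review.  */
struct candidate
{
  const char *func;
  double input;
  double ulps;
};

#define MAX_BUCKETS 1024
static struct candidate worst[MAX_BUCKETS];
static int nbuckets;

/* Keep only the worst candidate per (function, input binade).  */
static void
record_candidate (struct candidate c)
{
  int e, ei;
  frexp (c.input, &e);
  for (int i = 0; i < nbuckets; i++)
    {
      frexp (worst[i].input, &ei);
      if (strcmp (worst[i].func, c.func) == 0 && ei == e)
        {
          if (c.ulps > worst[i].ulps)
            worst[i] = c;  /* Keep the worse of the two.  */
          return;
        }
    }
  if (nbuckets < MAX_BUCKETS)
    worst[nbuckets++] = c;
}

That only reduces the queue; the judgment calls described above still
need a human.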
> You have a random tester that periodically rebuilds libm and tests
> on results that it knows are inaccurate. Then it tries, say, 1000000
> random inputs.
(1000000 is pretty small for this purpose.)
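For concreteness, a minimal sketch of such a tester, assuming MPFR as
the reference implementation (the function, input range, seed, and
iteration count here are all illustrative, not a proposal):

/* Sketch: compare glibc's sin against MPFR on random inputs.
   Build with: gcc -std=c99 fuzz-sin.c -lmpfr -lgmp -lm  */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <mpfr.h>

int
main (void)
{
  mpfr_t x, ref;
  /* 128-bit working precision; rounding the MPFR result to double
     then almost always yields the correctly rounded answer.  */
  mpfr_inits2 (128, x, ref, (mpfr_ptr) 0);
  srand48 (1);  /* Fixed seed, so any failure is reproducible.  */
  for (long i = 0; i < 1000000; i++)
    {
      /* Random input in [-10, 10); a real tester would also cover
         huge arguments, subnormals and special values.  */
      double d = (drand48 () - 0.5) * 20.0;
      double got = sin (d);
      mpfr_set_d (x, d, MPFR_RNDN);
      mpfr_sin (ref, x, MPFR_RNDN);
      double want = mpfr_get_d (ref, MPFR_RNDN);
      if (got != want)
        printf ("sin (%a) = %a, correctly rounded %a\n", d, got, want);
    }
  mpfr_clears (x, ref, (mpfr_ptr) 0);
  return 0;
}

Anything this prints is only a candidate issue for the review queue,
not something to file directly.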
> It reports a bug for the first wrong result and then holds further
> reports until that result becomes accurate, at which point it can
> report a bug again. Repeat.
I'm thinking more along the lines of John Regehr's testing of compilers with
Csmith. Reporting one bug doesn't wait on other bugs being fixed if it
looks to a human that they are different. Failures appearing in different
functions may have the same underlying cause, while failures in the same
function may have different causes - that's something a human can judge.
I think this testing for glibc would be more about finding bugs in
existing code than regressions introduced by changes. It would also find
cases that aren't bugs, but where the ulps are slightly higher than the
largest known for a function in the testsuite, indicating that a new
testcase should be added to make the automatic documentation of maximum
errors more accurate.
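To make "slightly higher ulps" measurable, a tester needs an error
metric; here is a rough sketch (my illustration, not a glibc helper) of
computing the error in ulps of the double format against the MPFR
reference:

#include <mpfr.h>

/* Return |got - ref| in units of the last place of a 53-bit double
   at ref's magnitude.  Assumes ref is finite and nonzero; subnormals
   are ignored for brevity.  */
static double
ulps_error (double got, mpfr_srcptr ref)
{
  mpfr_t err;
  mpfr_init2 (err, mpfr_get_prec (ref));
  mpfr_sub_d (err, ref, got, MPFR_RNDN);
  mpfr_abs (err, err, MPFR_RNDN);
  /* mpfr_get_exp gives e with 2^(e-1) <= |ref| < 2^e, so one ulp of
     a double there is 2^(e-53).  */
  mpfr_div_2si (err, err, mpfr_get_exp (ref) - 53, MPFR_RNDN);
  double u = mpfr_get_d (err, MPFR_RNDN);
  mpfr_clear (err);
  return u;
}

A tester could compare this value against the maximum recorded for the
function in libm-test-ulps and flag anything larger as a candidate
testcase.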
> This will pair automatically generated bugs with manual ones, and
> marking these as duplicates has small overhead.
The duplicates shouldn't be filed at all.
I think automatic bug filing is always a bad idea - an automatic process
may produce a list of *candidate* issues, tracked however is convenient,
but the human should be in the loop before any such candidate issue
becomes an actual bug report in glibc Bugzilla, not just after.
> I think the first part, having a system that automates submission and
> resolves a bug when it is fixed, is viable. A simple submission form
> could be http://kam.mff.cuni.cz/~ondra/inaccuracy_example
Automatic closing of bugs is also a bad idea; a human needs to judge
whether the whole issue is genuinely fixed or whether the commit only
fixes particular cases and other parts of the same issue remain to fix.
A tool could usefully monitor bugs for signs of being fixed (given that a
human has verified the validity of those bugs) - there's no reason to
limit that to libm bugs, I expect a lot more bugs could have testcases
written in a suitable form for periodic automatic testing - but a human
should then judge in each case whether an automatic report that a test has
stopped failing does really indicate that the bug is fixed.
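As a sketch of what such monitoring could look like (the table entry
and bug number below are made up; real entries would come from verified
reports, with the correct result taken from MPFR):

#include <stdio.h>
#include <math.h>

struct known_failure
{
  const char *bug;         /* Bugzilla reference, recorded by a human.  */
  double (*fn) (double);
  const char *name;
  double input;
  double correct;          /* Correctly rounded result, from MPFR.  */
};

static const struct known_failure failures[] = {
  /* Hypothetical entry, for illustration only.  */
  { "BZ#NNNNN", sin, "sin", 1.0, 0x1.aed548f090ceep-1 },
};

int
main (void)
{
  for (size_t i = 0; i < sizeof failures / sizeof failures[0]; i++)
    {
      const struct known_failure *f = &failures[i];
      if (f->fn (f->input) == f->correct)
        printf ("%s: %s (%a) now correctly rounded; a human should "
                "check whether the whole bug is fixed\n",
                f->bug, f->name, f->input);
    }
  return 0;
}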
> We would need properly configured machines to run the tests on to
> make this viable.
I think testing on different architectures would be useful in order to
cover the range of different implementations for different floating-point
formats.
But this isn't about purely automated testing - it's also about a tunable
tool that different people could use in different ways to search for
particular classes of bugs of interest to them (e.g. if someone wants to
do a more thorough test for issues with subnormal inputs to functions).
The randomness and tunability are why this isn't well-suited to being a
standard part of the glibc testsuite and so an external project seems
better (being able to test other libm implementations with it is a
side-effect).
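For instance, a tunable tool could let users plug in generators biased
toward one input class; a small sketch for random subnormal doubles
(the helper name is mine):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Random subnormal double (or zero): zero exponent field, random
   sign and significand bits.  Assumes srand48 has been seeded.  */
static double
random_subnormal (void)
{
  uint64_t bits = ((uint64_t) lrand48 () << 31 | (uint64_t) lrand48 ())
                  & 0x000fffffffffffffULL;  /* 52 significand bits.  */
  bits |= (uint64_t) (lrand48 () & 1) << 63;  /* Random sign.  */
  double d;
  memcpy (&d, &bits, sizeof d);  /* Reinterpret the bit pattern.  */
  return d;
}

Generators for huge arguments, or for values near function-specific
thresholds, would plug in the same way.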
I suspect there are other areas of standard C libraries, not just libm,
where some form of random test generation could also be useful.
--
Joseph S. Myers
joseph@codesourcery.com