This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: On glibc's resolver
- From: "Carlos O'Donell" <carlos at systemhalted dot org>
- To: Dimitrios Apostolou <jimis at gmx dot net>
- Cc: libc-alpha at sourceware dot org
- Date: Tue, 25 Dec 2012 23:17:22 -0500
- Subject: Re: On glibc's resolver
- References: <alpine.LFD.2.02.1212260413530.1646@soupermouf>
On Tue, Dec 25, 2012 at 10:14 PM, Dimitrios Apostolou <jimis@gmx.net> wrote:
> I was trying to write a patch for glibc so hopefully this is the appropriate
> list, please let me know otherwise.
Excellent question, and this is the right list.
> I have been tracing weird behaviour of my mail client (alpine) and ended up
> in getaddrinfo() calls, which are handled by glibc's resolver. In
> particular, when I connect my laptop to different networks and the previous
> DNS server is unreachable, resolver never re-reads its cache and all queries
> time out after several retries.
What we need is a test case with expected and observed behaviour.
Given a test case we can justify or refute the expected or observed
behaviour against relevant standards or prior art.
> Apparently this is a known issue, and a web search reveals discussions from
> as early as 2003. I'd appreciate your opinions, I was thinking of writing a
> patch but I can't figure out where it should go, alpine or glibc, code or
> documentation! Here are the replies I gathered from a web search:
Could you please provide references to the prior discussions so we can
review them also?
> 1) Use a caching daemon (nscd maybe, some argue that it does not provide a
> solution) which should be restarted/reloaded when changing networks.
>
> 2) Call res_init() if getaddrinfo() fails.
These two solutions are interesting in that the
distribution/application is in control of when and why the resolver
should carry out the costly operation of reloading whatever data is
required to resolve a name.
The distribution can restart or reload nscd when the network is reconfigured. The
application can call res_init() as required (perhaps as documented by
new documentation).
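A minimal sketch of what option (2) might look like in an application
follows. The helper name resolve_with_reinit() is hypothetical and not
part of glibc. It also assumes that calling res_init() more than once is
safe and makes the resolver pick up the new configuration, which has not
been verified in this thread.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <netdb.h>
#include <resolv.h>
#include <stddef.h>

/* Hypothetical helper (not a glibc interface): try getaddrinfo() once and,
   if it fails, re-initialise the resolver and try exactly one more time.  */
int
resolve_with_reinit (const char *node, const char *service,
                     const struct addrinfo *hints, struct addrinfo **result)
{
  int rc = getaddrinfo (node, service, hints, result);
  if (rc == 0)
    return 0;

  /* ASSUMPTION: a repeated res_init() call is safe and causes the
     resolver configuration to be read again.  If it fails, report the
     original getaddrinfo() error.  */
  if (res_init () != 0)
    return rc;

  /* Single retry; the caller sees whatever this second attempt returns.  */
  return getaddrinfo (node, service, hints, result);
}
```

The caller uses it exactly like getaddrinfo() and frees a successful
result with freeaddrinfo(). The cost of re-initialising is only paid
after a failure, which keeps the decision in the application's hands.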
> 3) Patch glibc to stat() /etc/resolv.conf, checking for changes. Debian,
> Ubuntu are patched.
This sounds like the worst possible solution, imposing a penalty on
all applications for a change that is well defined at a higher level.
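To make the cost concrete, here is a rough sketch of the general
technique of detecting a changed file with stat(). It is not the Debian
or Ubuntu patch, which I have not seen; struct conf_stamp and
conf_changed() are names made up for illustration. If a check like this
ran on every lookup, every lookup would pay for an extra system call.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical record of what a file looked like at the last check.  */
struct conf_stamp
{
  bool valid;    /* false until the first successful stat()  */
  time_t mtime;  /* last modification time seen  */
  off_t size;    /* file size seen  */
  ino_t ino;     /* inode seen; changes if the file is replaced  */
};

/* Return true if PATH differs from what STAMP recorded, or if this is the
   first successful check, and update STAMP.  Return false if the file is
   unchanged or cannot be examined.  */
bool
conf_changed (const char *path, struct conf_stamp *stamp)
{
  struct stat st;

  /* One system call per check: this is the per-request penalty.  */
  if (stat (path, &st) != 0)
    return false;

  bool changed = !stamp->valid
                 || st.st_mtime != stamp->mtime
                 || st.st_size != stamp->size
                 || st.st_ino != stamp->ino;

  stamp->valid = true;
  stamp->mtime = st.st_mtime;
  stamp->size = st.st_size;
  stamp->ino = st.st_ino;
  return changed;
}
```

A caller would keep one zero-initialised struct conf_stamp per file and
reparse only when conf_changed() returns true. Note that st_mtime has
one-second granularity, so two rewrites within the same second that
leave size and inode unchanged would go unnoticed.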
> 4) Use a custom DNS library, glibc is unsuitable for this purpose.
Certainly an option. You are allowed to do as you wish with your system.
However, I do not think that glibc is unsuitable for these purposes,
and I believe that with some effort we can put together a solution.
> Here is my take. About nscd, I'm having the problem on a major distro
> (Fedora) so I can only guess there are good reasons for not using it by
> default.
The complexity of caching name server requests is not something that
should be enabled by default unless there is a specific need.
> On (2), res_init() is a BSD non-standard function, and its man page doesn't
> mention such a purpose. In fact I can't be sure if it's safe to call it
> multiple times and I see no guarantee that it will re-initialise the
> resolver more than once. If it's the proposed way shouldn't it be mentioned
> in both res_init() and getaddrinfo()'s man pages, or otherwise a big warning
> that resolv.conf is never reparsed?
This seems like a sensible solution, e.g. an API call that guarantees
that the resolver can operate correctly after a network configuration
change.
I haven't reviewed the code in question so I don't actually know if
res_init() is safe to be used this way. Part of your work would be to
look into this and propose the documentation patch and provide
sufficient background to justify the changes.
> On (3) I don't have a Debian system to check it, but the overhead of
> stat'ing on every request is probably unacceptable. I was thinking of
> writing a patch that would stat() and reparse after a single request
> timeout, so that following retries (unless RES_DFLRETRY is reached) will
> automatically connect to the new servers. Would that be acceptable?
No.
> Finally using a custom library sounded logical, until I started reading
> glibc's resolver. Really, with such size and complexity and even
> asynchronous interface provided, shouldn't we also provide the simplest
> facilities?
We should.
> And a related question, is there a way to set up resolver behaviour (timeout,
> retries) for a process programmatically, instead of changing the system-wide
> resolv.conf?
There is no interface for this. This is another place where
enhancements would be greatly appreciated.
Please feel free to email libc-help@sourceware.org if you have any
more general questions about how to X or Y.
Cheers,
Carlos.