This is the mail archive of the libc-alpha@cygnus.com mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: linux syslog problems (now linux-2.2.1+glibc-2.0.112)


[me]
>   That will have to do for now.  The DNS/poll() problem is no longer
> easily reproduced, but the syslog() problem is still around in linux-2.2.1
> & glibc-2.0.112.  I'm chasing the DHCP daemon since it doesn't fork
> as much and croaks a lot faster than sendmail does (presumably due to
> volume of syslog output).
> 
>   Right now it is being chased in the kernel arena.

  ... Which is to say, tracking system calls using strace.

  Right now, I'm pretty sure that this is going to turn out to be a
problem with, of all things, the syslog daemon!

  I don't know how it runs on other operating systems, but linux seems
to prefer using a SOCK_STREAM (vs SOCK_DGRAM) PF_UNIX socket to talk
to the syslog daemon (via /dev/log).  All the evidence is pointing to a
particular version of a syslog package (sysklogd-1.3-27 is what I had,
but it is an evolving series and I don't know how many made it out into
the wild) that would "forget" some of the sockets on 15-minute intervals
when it was sending a "mark" message to the syslog file.

  I don't know if that is the standard.  glibc tries both (favoring
SOCK_DGRAM first), but the syslog daemon only tries SOCK_STREAM for a
AF_UNIX socket.  I've seen it for a very long time.  I think I remember
noticing it and HP-UX 9.05 using different types way back when.

  Damning symptoms were the clients locked up in syslog(), but still
trying to send output (which didn't show up in the syslog file after
the point where the socket was forgotten) up until the SOCK_STREAM
buffer/backlog was filled, at which point it just blocked.  At that point
it was a pain to debug because no amount of tweaking with the daemon
will help.  Restarting (kill and restart, not just a HUP) with the daemon
before the buffer is filled up & blocked will generate a SIGPIPE in the
client, which will cause the socket to get reset and hide the problem
for a while.

  I've yet to see a client get paranoid about a syslog() not returning
promptly, probably because of the unreliable SOCK_DGRAM background.
That explains the DHCP and sendmail daemons getting wedged and might
explain the sendmail/resolver problems (named uses syslog too and,
in my case, 127.1 was my first listed resolver).


  Other than mentioning symptoms, identifying the daemon isn't that
much fun.  The three version I have (-20, -27 & -31) all identify
themselves as "syslogd 1.3-3".  The version I'm running that seems
to work just fine is sysklogd-1.3-31.  The last version that I used
(before -27) was -20, and it also seems to be very stable.  I think
this problem managed to escape my attention for so long because you
need enough syslog-volume (probably 100K or so bytes of syslog output)
on just the right socket 15 minutes or so after the daemon starts.
My less-productional machines just didn't produce the right kind of
output to choke themselves up.


  In any case, my daemons would normally be dead by now so I'm going
to give them a full day before being calling it fixed.
								--- john


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]