This is the mail archive of the libc-alpha@cygnus.com mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: linux syslog problems (now linux-2.2.1+glibc-2.0.111)


[Andreas Jaeger]
>  Could this be a bug in syslog?  Please try
>  updating syslog first.  The home directory is:
>  ftp://ftp.infodrom.north.de/pub/people/joey/sysklogd/

  [I'll CC the kernel folks and see if that rings any extra bells.]

  I don't think so, but I've been trying to hold syslog a constant.
Too many damn variables, but I did give it a thought.

  Right now, I'm running sysklogd-1.3-27 and I've grabbed -1.3-31
when I want to upgrade.  While everything else is hung, syslogd is still
happily printing out it's `-- MARK --' messages and I have a cron-job
that prints out a similar message every 10 minutes.  I don't seem to be
missing any of those, but syslogd sending messages to itself probably
doesn't count and the cron-job is single shot.  Killing (-HUP or -9 and
restart) the syslog daemon doesn't unstick any of the hung daemons.

  To debug dhcpd, which doesn't spawn quite so many children, I typically
will find it stuck in this state:

	#0  0x400c6672 in __libc_send ()
	#1  0x400c3131 in vsyslog ()
	#2  0x400c2e51 in syslog ()
	#3  0x8059309 in note (fmt=0x805a780 "DHCPREQUEST for %s from %s via %s")
	    at errwarn.c:133
	#4  0x804a060 in dhcprequest (packet=0xbfffe018) at dhcp.c:197
	#5  0x8049c4c in dhcp (packet=0xbfffe018) at dhcp.c:66
	#6  0x80577b0 in do_packet (interface=0x8194268, packet=0xbfffe88c, len=300, 
	    from_port=17408, from={len = 4, 
	      iabuf = "\204ñd\001", '\000' <repeats 11 times>}, hfrom=0xbffff8a4)
	    at options.c:605
	#7  0x805273f in got_one (l=0x81945c0) at dispatch.c:596
	#8  0x8052626 in dispatch () at dispatch.c:560
	#9  0x8049a86 in main (argc=1, argv=0xbffffb14, envp=0xbffffb1c) at dhcpd.c:278
	#10 0x400308af in __libc_start_main ()

  Now, doing a strace on that dhcp process gives me some interesting
results.  For one, it looks like it is trying to send() syslog output
and the syslog daemon isn't receiving it.  Other daemons aren't having
any problem at the same instant (named just dumped a bunch of *STATS
into the same file that dhcpd output goes).

  In one test, the last message syslog saw was at 14:17:38 and the
last message that syslog hung on was 14:38:54.  My first one captured
in the strace log was 14:19:12, but there were approximately 627 messages
that it tried to log that were lost.  A little breakdown of their
sizes were:

    [grep 'dhcpd:' | sed 's/^.* = //' | sort | uniq -c | sort -rn]

	    139	98
	    134	80
	    110	97
	     82	79
	     28	90
	     27	92
	     24	100
	     17	99
	     17	81
	     12	93
	     12	91
	     10	94
	      9	96
	      2	95
	      1	89
	      1	84
	      1	83
	      1	82

  Just the first three entries alone (139*98+134*80+110*97) is around
34K of missing messages.  If I'm filling up some buffer, it is big.

  Looking at it with the debugger may be interfering with its normal
operation or it normally does it but I can't see it.  In any case,
lots of this going happily on, even if syslog doesn't see it:

	SYS_174(0xd, 0xbfffcb28, 0xbfffca9c, 0x8, 0xd) = 0
	send(0, "<30>Feb  1 14:28:35 dhcpd: Ignor"..., 97, 0) = 97

  Someone suggested that it might be "stuck" in a system call and
causing an EINTR would be enough to capture meaningful output with strace.
Signals seem to be mostly fatal, but SIGCONT didn't jog it free.

  I let it run until it halted and it got stuck where I normally see it:

	select(9, [7 8], [], [], NULL)          = 1 (in [8])
	time([917908734])                       = 917908734
	recvfrom(8, "\1\1\6\0\237\6\0\0\230:\257\1\0\0"..., 4096, 0, {sin_family=AF_INET, sin_port=htons(68), sin_addr=inet_addr("132.241.175.1")}, [16]) = 300
	time([917908734])                       = 917908734
	SYS_174(0xd, 0xbfffcb28, 0xbfffca9c, 0x8, 0xd) = 0
	send(0, "<30>Feb  1 14:38:54 dhcpd: BOOTR"..., 80, 0) = 80
	SYS_174(0xd, 0xbfffcb2c, 0, 0x8, 0xd)   = 0
	time([917908734])                       = 917908734
	SYS_174(0xd, 0xbfffcb28, 0xbfffca9c, 0x8, 0xd) = 0
	send(0, "<30>Feb  1 14:38:54 dhcpd: Ignor"..., 98, 0
								--- john


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]