This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



Problem with sigaction / SIGALRM / x86_64 arch.


Hi,

I'm not sure this is the right mailing list, but I have a strange
problem with sigaction on my 64-bit system.

Sorry if this is a bit long, and sorry if my question is stupid: I may
be an 'old coder', but I'm quite new to programming Linux, signals and
64-bit. Sorry also for my poor English, as it's not my native
language :).

Some facts :
----------------

First of all, I searched for this problem on Google and found 0
matches, so I suppose the problem comes from something I'm doing
wrong.

Second, the same code worked perfectly on my previous 32-bit Linux
installation (I use Slackware in both its 32-bit and 64-bit versions).

Also, it's very hard to give a simple 'test case': the code is part of
a cross-platform C++ framework and the 'guilty' code could be in
several places. On top of that, I use a "homemade" build system that
generates makefiles from "homemade" project files.

So I'll try to explain as best I can. I'm pretty sure what the problem
is, but I don't know "why" it happens or how to get rid of it.

The context :
----------------

I have a SIGALRM-based timer which worked well on 32-bit Linux. The
idea is that a 'timer thread' accepts SIGALRM, and the action callback
re-arms the alarm, so I get regular calls that let me do some 'clock'
stuff. This is, I suppose, quite a common situation.

Of course SIGALRM is blocked on all threads but the 'timer thread'.
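
To make the arrangement concrete, here is a minimal sketch of it using
plain POSIX calls. It is not the framework code: the names on_alarm,
timer_thread, run_timer_ticks and tick_count are hypothetical, and as a
simplification it uses a periodic setitimer() instead of re-arming the
alarm from inside the callback.

```cpp
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical tick counter, written only from the signal handler.
static volatile sig_atomic_t tick_count = 0;

// SA_SIGINFO-style handler: it only counts ticks.
static void on_alarm(int signo, siginfo_t *info, void *ucontext)
{
    (void)signo; (void)info; (void)ucontext;
    tick_count++;
}

// The only thread that accepts SIGALRM.
static void *timer_thread(void *arg)
{
    int wanted = *(int *)arg;

    // Unblock SIGALRM here; every other thread keeps it blocked.
    sigset_t alarm_only;
    sigemptyset(&alarm_only);
    sigaddset(&alarm_only, SIGALRM);
    pthread_sigmask(SIG_UNBLOCK, &alarm_only, NULL);

    // Periodic 10 ms timer (assumption: stands in for re-arming
    // the alarm from the callback).
    struct itimerval tv;
    memset(&tv, 0, sizeof tv);
    tv.it_interval.tv_usec = 10000;
    tv.it_value.tv_usec = 10000;
    setitimer(ITIMER_REAL, &tv, NULL);

    // Sleep until enough ticks were delivered to this thread.
    while (tick_count < wanted)
        pause();

    // Disarm the timer before the thread exits.
    memset(&tv, 0, sizeof tv);
    setitimer(ITIMER_REAL, &tv, NULL);
    return NULL;
}

// Returns the number of ticks seen, or -1 on setup failure.
static int run_timer_ticks(int wanted)
{
    // Block SIGALRM in the caller; new threads inherit this mask.
    sigset_t alarm_only;
    sigemptyset(&alarm_only);
    sigaddset(&alarm_only, SIGALRM);
    pthread_sigmask(SIG_BLOCK, &alarm_only, NULL);

    struct sigaction action;
    memset(&action, 0, sizeof action);
    action.sa_flags = SA_SIGINFO;
    action.sa_sigaction = on_alarm;
    sigemptyset(&action.sa_mask);
    if (sigaction(SIGALRM, &action, NULL) != 0)
        return -1;

    tick_count = 0;
    pthread_t tid;
    if (pthread_create(&tid, NULL, timer_thread, &wanted) != 0)
        return -1;
    pthread_join(tid, NULL);
    return (int)tick_count;
}
```

Calling run_timer_ticks(3) from the main thread should return once the
timer thread has received at least 3 alarms. The caller is left with
SIGALRM blocked, which matches the rule that only the timer thread
accepts the signal.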

The "bug" :
---------------

The problem is that I get a SIGSEGV when the 'retq' instruction of the
alarm callback executes. This is usually a stack frame problem. What's
strange is that it crashes even with an empty callback, so the cause is
not inside the callback code (and anyway it worked on 32-bit). What I
suspect is that the kernel calls me in one mode while my app runs in
another. My ELF is 64-bit (readelf says so) and my kernel is 64-bit
(that's the point of running a 64-bit system :) ), so this shouldn't
happen. But since it crashes with a segfault, I still suspect that some
return address does not have the same size when pushed by the kernel
call and when popped by my app. I traced it with DDD, but it only gives
me the crash position (retq), and nothing looked 'bad' in the
registers (at least during the callback execution).

Some specs :
------------------

I have developed the framework on 32-bit Slackware 12.2 and 13 for a
year now, without any problem.
I now use Slackware64 current (virtually 13.X or 14).

I build and link statically, because the purpose of the framework is to
let developers deliver binary applications (for the commercial game
industry, sadly, open source is a 'no way'). Shared libraries are a
dependency hell anyway, so the objective is a 'ready to run'
application. This should not be a problem: the only ABI involved should
be kernel calls, which go through an interrupt (if I understood
correctly), and since X11 is a protocol, it is not a problem either.
All 'third party' libraries are compiled statically in an isolated
sub-tree, so they don't interfere with the system libraries. I also
tested a 32-bit binary on several distributions and versions (Ubuntu 9,
Slackware 12.2 / 13, Redhat ??) and it worked well. So I guess that, at
least in 32-bit, static linking is not really a problem; I only mention
it in case it could influence this crash.

Some clue ?
---------------

I looked at what I believe is the implementation of sigaction for
64-bit Linux ( /sysdeps/unix/sysv/linux/ia64/sigaction.c ) and found
this comment at the top:
/* Linux/ia64 only has rt signals, thus we do not even want to try falling
   back to the old style signals as the default Linux handler does. */
I might be paranoid, but it's the only 'clue' I have that 'something
strange could happen', and I can't really understand what it means.

Another idea is related to the static link: I'm invoking LD
explicitly, and I wonder whether, 'by default', it links against a
'shared library version' of glibc. But even then I don't understand
"why" the function return would crash...

Some (useless?) code extracts  :
-----------------------------------------

[...]
struct	 sigaction	action,oldAction;
[...]
void	_onAlarmSignal(int	signal,siginfo_t* sigInfo,pvoid pUContext);
void	_registerSignals()	{
	binaryErase(action);
	action.sa_flags		=SA_SIGINFO;
	//~ |SA_RESTART
	//~ |SA_ONSTACK
	action.sa_sigaction	=_onAlarmSignal;
	action.sa_restorer	=NULL;
	sigemptyset(&action.sa_mask);

	PTIMERCALL(sigaction(SIGALRM,&action,&oldAction));
}
[...]
void	_onAlarmSignal(int	signal,siginfo_t* sigInfo,pvoid pUContext)	{
#if	0        //<---------- Yes, it's disabled, to be sure I don't crash the stack myself...
	printf("\n[ALRM START] ");
	SLinuxHPTimer::Ref	timer=getHPTimer();
	nsAssert(SIGALRM==signal);
	nsAssert(timer.tid==pthread_self());	//	Are we executed on the timer thread??
	timer._userTick();
	printf(" [ALRM END]\n");
#endif
}
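
A reduced test along the lines of the extract above might help
separate the framework from the toolchain. This is only a sketch under
assumptions: empty_alarm_handler, alarm_handler_returns and
handler_seen are hypothetical names, memset() stands in for
binaryErase() (assuming that helper zero-fills the struct), and
sa_restorer is deliberately never assigned, since the Linux
sigaction(2) man page says that field is not intended for application
use. Building it with the same link step as the framework would show
whether a bare handler can return there.

```cpp
#include <signal.h>
#include <string.h>

// Set by the handler so the caller can tell that it ran.
static volatile sig_atomic_t handler_seen = 0;

// Near-empty handler: it only records the signal number.
static void empty_alarm_handler(int signo, siginfo_t *info, void *ucontext)
{
    (void)info; (void)ucontext;
    handler_seen = signo;
}

// Returns 1 if the handler ran and returned normally, 0 otherwise.
// Meant for a single-threaded test program.
static int alarm_handler_returns(void)
{
    struct sigaction action;
    struct sigaction old_action;
    memset(&action, 0, sizeof action);   // sa_restorer is left untouched
    action.sa_flags = SA_SIGINFO;
    action.sa_sigaction = empty_alarm_handler;
    sigemptyset(&action.sa_mask);
    if (sigaction(SIGALRM, &action, &old_action) != 0)
        return 0;

    // Make sure SIGALRM is deliverable in this thread.
    sigset_t alarm_only, old_mask;
    sigemptyset(&alarm_only);
    sigaddset(&alarm_only, SIGALRM);
    sigprocmask(SIG_UNBLOCK, &alarm_only, &old_mask);

    handler_seen = 0;
    raise(SIGALRM);   // handler runs before raise() returns
    int ok = (handler_seen == SIGALRM);

    // Put the previous disposition and mask back.
    sigaction(SIGALRM, &old_action, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return ok;
}
```

If alarm_handler_returns() comes back with 1, the handler both ran and
returned; a crash at the handler's return in this reduced program would
point away from the framework code itself.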

Last words :
--------------
I've spent 3 weeks trying to get rid of this bug by myself,
reading everything I could, from the LD/glibc docs to Linux signals and
64-bit concerns, but I can't find anything close to my problem. I
understand this may be even more difficult to solve remotely, but I
hope someone has encountered this OR knows what could cause it. I can
provide any additional information if required.

I hope I didn't bother you and, of course, that someone can give me
some enlightenment ;).

Thanks,

Garry.

