This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Consensus: Tuning runtime behaviour with environment variables.


On Sun, Jun 02, 2013 at 12:40:07AM -0300, Alexandre Oliva wrote:
> > This does not make access more efficient.
> 
> It does when using the optimized TLS relocations I introduced.  If
> there's room in the static TLS segment, the dynamic loader resolves TLS
> references to code equivalent to initial exec; if there isn't, it has to
> fall back to the much slower dynamic access modes, even though there are
> optimized fast paths there as well, compared with the TLS ABI used by
> default on x86 and x86_64.
> 
> Note that these optimizations (still) aren't the default TLS mode on
> these architectures, even though it was adopted by a few other
> architectures as the TLS ABI.  On x86 and x86_64 (and ARM IIRC) you have
> to compile with -mtls-dialect=gnu2 (and -fPIC) to get these optimizable
> relocations.
> 
> Please read http://people.redhat.com/aoliva/writeups/TLS/RFC-TLSDESC-x86.txt
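
For readers unfamiliar with the access modes being contrasted above, here is a minimal sketch, assuming GCC or Clang on an ELF target. The names gd_counter, ie_counter, bump_gd, bump_ie and tls_demo are hypothetical and appear nowhere in glibc or musl. The first variable is left to the compiler's default model (general dynamic when built with -fPIC for a shared object), while the second is pinned to initial exec with the tls_model attribute, which only works for a library whose TLS fits in the static TLS segment.

```c
#include <assert.h>

/* Default model: with -fPIC in a shared object this is general dynamic,
 * i.e. a __tls_get_addr call (or a TLS descriptor call when built with
 * -mtls-dialect=gnu2). */
static __thread int gd_counter;

/* Explicitly request initial-exec: the offset from the thread pointer is
 * fixed at load time, so no function call is needed to find the variable. */
static __thread int ie_counter __attribute__((tls_model("initial-exec")));

int bump_gd(void) { return ++gd_counter; }
int bump_ie(void) { return ++ie_counter; }

/* Hypothetical sanity check: the two variables occupy distinct per-thread
 * storage, whichever access model is used to reach them. */
void tls_demo(void)
{
	int before = ie_counter;
	bump_gd();
	assert(ie_counter == before);
}
```

Building this as a shared object with and without -mtls-dialect=gnu2 and comparing the disassembly of bump_gd shows the two calling sequences under discussion. It does not by itself say anything about their relative cost.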

Do you have any performance figures to justify this? It strikes me as
a hideous hack for tiny, possibly imaginary gains. The only real place
I see a possible benefit is also the ugliest part: the alternate
calling convention that allows the caller to avoid spilling registers.
In particular, __tls_get_addr should otherwise be nearly as fast as
the do-nothing function, at least the variant that adds the TP. If
not, this is an issue in glibc's __tls_get_addr that should be fixed.
In musl, we have:

void *__tls_get_addr(size_t *v)
{
	pthread_t self = __pthread_self();
	if (self->dtv && v[0]<=(size_t)self->dtv[0] && self->dtv[v[0]])
		return (char *)self->dtv[v[0]]+v[1];
[...]

I just noticed that allowing self->dtv to be NULL is a design flaw
with respect to performance, but even with that check, all the
branches are predictable and the hot path is just a few memory
accesses. If glibc's is considerably slower than this, I think fixing
that issue would make more sense than doing fancy hacks that only
apply to libraries whose TLS gets allocated in the static TLS
segment...
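
To make the "few memory accesses" claim concrete, here is a self-contained toy model of that fast path. It is only a sketch: struct toy_thread, toy_install and toy_tls_get_addr are hypothetical names, the thread descriptor is passed explicitly instead of coming from __pthread_self(), and the slow path that the excerpt above elides is replaced by simply returning NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a thread descriptor; not the real musl or
 * glibc layout.  dtv[0] holds the number of valid slots, and dtv[i] points
 * to this thread's TLS block for module i (NULL if not yet allocated). */
struct toy_thread {
	void **dtv;
};

/* Record the TLS block for a module in this thread's DTV. */
static void toy_install(struct toy_thread *self, size_t module, void *block)
{
	assert(module >= 1 && module <= (size_t)(uintptr_t)self->dtv[0]);
	self->dtv[module] = block;
}

/* Fast path only: v[0] is the module ID, v[1] the offset in that module's
 * block.  Returns NULL where a real implementation would take its slow
 * path (allocating the block or growing the DTV). */
static void *toy_tls_get_addr(const struct toy_thread *self, const size_t *v)
{
	if (self->dtv && v[0] <= (size_t)(uintptr_t)self->dtv[0] && self->dtv[v[0]])
		return (char *)self->dtv[v[0]] + v[1];
	return NULL;
}
```

The hot path is three loads (dtv, dtv[0], dtv[v[0]]) plus an add, and the three conditions are true on every call once the block exists, which is why the branches predict well.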

Rich
