This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Consensus: Tuning runtime behaviour with environment variables.


On Sun, Jun 02, 2013 at 08:02:02PM -0300, Alexandre Oliva wrote:
> > The hot path of __tls_get_addr should be just a couple dereferences
> > and branches which are always predicted correctly.
> 
> For anyone who didn't know better, it would seem like you're arguing
> that Initial Exec is pointless.

Not pointless, but overrated. And it's not obvious to me that your
optimization is closer to initial-exec in performance than it is to
global-dynamic.

> > If it's slower than that in glibc, that's a bug in glibc.
> 
> It's not, but a couple of dereferences and branches is a lot more than
> nothing.  With the TLS access model I proposed and implemented, you save
> all of that, including the cost of going through the PLT for the call.

Indeed, these aspects seem more worthwhile than eliminating the body
of __tls_get_addr.

> What you don't save is the cost of a naked call to a function that just
> returns, but that's still a lot less than that plus dereferences plus
> branches plus PLT plus frame setup plus saving and restoring
> call-clobbered registers at the caller, don't you agree?

For certain values of "a lot".

> Phrasing it another way, which of these two scenarios seem faster to
> you?

The one whose _measured_ performance is better. While measurement of
the access itself (e.g. load from TLS and store to a volatile
variable, surrounded by RDTSC) would be very interesting, what really
matters is whether you can find a real-world example where the
performance is measurably different.
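
A minimal sketch of the kind of microbenchmark described above, assuming
GCC or Clang (for __thread, __attribute__((noinline)) and __rdtsc from
<x86intrin.h>). The names tls_counter, sink, load_tls and time_tls_loads
are hypothetical, chosen only for illustration. The TLS load is kept in a
non-inlined function so that the address computation is redone on every
iteration instead of being hoisted out of the loop; this adds the cost of
a call to every sample, so the result is only useful for comparing builds
against each other, not as an absolute figure.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime in the fallback path */
#endif
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>            /* __rdtsc (GCC and Clang) */
#endif

/* Hypothetical thread-local variable whose access cost is being measured. */
static __thread int tls_counter = 42;

/* Volatile destination, so the store in the loop cannot be optimized away. */
static volatile int sink;

/* Read a tick counter: RDTSC on x86, a monotonic nanosecond clock elsewhere. */
static uint64_t read_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Not inlined, and containing a compiler barrier, so the compiler cannot
 * hoist the TLS address computation or the load out of the caller's loop. */
__attribute__((noinline)) int load_tls(void)
{
    __asm__ volatile("" ::: "memory");
    return tls_counter;
}

/* Returns the total ticks spent on `iterations` TLS loads, each followed
 * by a store to the volatile sink. Divide by `iterations` for an average. */
uint64_t time_tls_loads(unsigned long iterations)
{
    uint64_t start = read_ticks();
    for (unsigned long i = 0; i < iterations; i++)
        sink = load_tls();
    return read_ticks() - start;
}
```

Calling time_tls_loads() with a large iteration count from a small main()
and dividing by that count gives a rough per-access figure. Which access
model actually gets measured depends on how the file is built: compiled
into a shared object with -fPIC, or into a plain executable, or with an
explicit -ftls-model= option, the compiler and linker may pick different
models, so the same source can be used to compare them.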

Rich

