This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: Consensus: Tuning runtime behaviour with environment variables.


On Mon, Jun 03, 2013 at 07:41:01PM -0300, Alexandre Oliva wrote:
> On Jun  2, 2013, Rich Felker <dalias@aerifal.cx> wrote:
> 
> > On Sun, Jun 02, 2013 at 08:02:02PM -0300, Alexandre Oliva wrote:
> >> > The hot path of __tls_get_addr should be just a couple dereferences
> >> > and branches which are always predicted correctly.
> >> 
> >> For anyone who didn't know better, it would seem like you're arguing
> >> that Initial Exec is pointless.
> 
> > Not pointless, but overrated. And it's not obvious to me that your
> > optimization is closer to initial-exec in performance than it is to
> > global-dynamic.
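
For readers unfamiliar with the two access models being compared, here is a minimal sketch using the GCC/Clang `tls_model` attribute. The variable and function names are made up for illustration and are not from the thread. Which code sequence is finally emitted also depends on flags such as -fPIC and on linker relaxation, so this only shows how a model is requested, not what any given toolchain produces.

```c
#include <assert.h>

/* Hypothetical counters, one per thread. */

/* Global-dynamic: the most general model. In a shared library built
   with -fPIC, each access normally goes through __tls_get_addr. */
__thread int counter_gd __attribute__((tls_model("global-dynamic")));

/* Initial-exec: the offset from the thread pointer is fixed at load
   time, so an access is a load of that offset plus a thread-pointer
   relative access, with no function call. */
__thread int counter_ie __attribute__((tls_model("initial-exec")));

int bump_global_dynamic(void)
{
    return ++counter_gd;
}

int bump_initial_exec(void)
{
    return ++counter_ie;
}
```

Compiling this with `-fPIC -O2 -S` and comparing the two functions is one way to see the difference in the generated access sequences on a given target.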
> 
> So, you find two branches and two loads (besides the call) "not too
> much", but a call and a return "too much"? Or do you just enjoy
> pointless discussions?  :-)

I just don't like tradeoffs made without justification. Your modified
TLS requires a great deal of arch-specific code. It also adds, within
each arch, a good deal more ways TLS can be done. Both of these are
major costs in terms of the combinatorics of testing and maintenance.
So the question isn't whether there's a case where there's some
nonzero benefit, but whether the benefits are worth the cost.

For glibc, it's probably 0.1% more arch-specific code because there's
already so much, so the relative cost is small. If we were to add the
feature to musl, it would probably be at least a 3-5% increase in the
amount of arch-specific code, since we have so little, and
significantly more if we actually made the arch-specific code optimal
rather than just wrapping __tls_get_addr with the equivalent of
pusha/popa.

If you follow uClibc development at all, most of the problems reported
on the mailing list these days are results of the hideous
combinatorics of archs, pthread and TLS options, and other build-time
configurables having gotten out of control to the point that nobody
keeps track of what works. I'm quite happy not to have that situation.
:-)

With all that said, after reading your paper and the performance
results, I do find them somewhat compelling. What would make them even
more compelling is if they're impressive enough (and publicized
enough) to get people to fix their broken thread-unsafe library code
to use TLS (not the preferable solution, but much easier than other
approaches, especially if the API is otherwise thread-unsafe like
OpenGL by virtue of lacking proper context arguments).
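
As a rough illustration of the kind of fix meant here, the sketch below takes a hypothetical library function that used to return a pointer to a shared static buffer and gives each thread its own copy with C11 `_Thread_local`. The function name and buffer size are assumptions made up for the example, not taken from any real library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical library function. The thread-unsafe version would
   declare the buffer as plain `static char buf[32];`, so concurrent
   callers would overwrite each other's results. */
const char *format_id(int id)
{
    /* One buffer per thread: the API stays the same, but the result
       is only clobbered by later calls from the same thread. */
    static _Thread_local char buf[32];

    snprintf(buf, sizeof buf, "id-%d", id);
    return buf;
}
```

The interface is unchanged, which is why this is the easy fix rather than the preferable one: the cleaner design would have the caller pass in the buffer or a context argument.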

> > While measurement of the access itself (e.g. load from TLS and store
> > to a volatile variable, surrounded by RDTSC) would be very
> > interesting,
> 
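
A minimal sketch of the measurement described in the quoted text, assuming x86 with GCC or Clang (`__rdtsc` from `<x86intrin.h>`). The names `tls_var`, `sink`, and `min_tls_load_cycles` are made up for illustration. RDTSC is not a serializing instruction and the timer overhead is comparable to the access being timed, so the numbers are only a crude indication.

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc(); x86 with GCC/Clang only */

__thread int tls_var = 42;   /* hypothetical TLS variable under test */
volatile int sink;           /* volatile store keeps the load alive */

/* Time one TLS load plus volatile store `reps` times and return the
   smallest cycle count seen, which filters out interrupts and cache
   misses to some degree. */
uint64_t min_tls_load_cycles(int reps)
{
    uint64_t best = UINT64_MAX;

    for (int i = 0; i < reps; i++) {
        uint64_t start = __rdtsc();
        sink = tls_var;
        uint64_t end = __rdtsc();

        if (end - start < best)
            best = end - start;
    }
    return best;
}
```

To compare access models, the same loop would be built once into a shared library with -fPIC and once into the main executable, keeping everything else identical.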
> Glad you liked the paper ;-)

I liked it once I figured out why every PDF viewer was telling me the
PDF file was corrupt/invalid. :-)

> I theorize that an important reason why TLS is not used in
> performance-critical plugins is that its performance is so unbearable.

I think it's more the *belief* that performance is bad, rather than
measurements of bad performance, keeping people from using it. But
it's hard to collect data supporting one view or the other.

> One of the goals of that work was to do away with that reason to avoid
> TLS in plugins.  The other is that, well, computing stuff you don't have
> to just because Rich Felker thinks you should is not a mandate I live by
> ;-)

*nod* I'm completely on board with this motivation.

Rich

