This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
Re: [Dri-devel] Re: OpenGL and the LinuxThreads pthread_descr structure
Keith Whitwell wrote:
>
>>
>> __thread doesn't require -fpic. There are 4 different TLS models (on
>> IA-32):
>> -ftls-model=global-dynamic
>> -ftls-model=local-dynamic
>> -ftls-model=initial-exec
>> -ftls-model=local-exec
>>
>> None of these requires -fpic, though the first 3 use the PIC
>> register (if not -fpic, they just load it into some arbitrary register).
>> The GD model is for dlopenable libraries referencing __thread variables
>> that can be anywhere (and is most expensive, a function call), LD is for
>> dlopenable libraries referencing __thread variables within that library
>> (again, a function call, but can be one per whole function for all __thread
>> vars mentioned in it), IE is for libraries/programs which cannot be dlopened
>> and can reference __thread variables anywhere in the startup program
>> or its dependencies and LE is for programs only, referencing
>> __thread variables in it. IE involves a memory load from GOT and subtracts
>> that value from %gs:0, LE results in immediate being added to %gs:0.
>
>
>
> It doesn't sound like there's anything in there for us that's a real
> improvement: Both of the 'dlopenable' variants require a function call?
> That's a huge overhead for the application we're talking about.
Yes, in coding up an example to send you all, this issue became clear to
me. We need to define thread-local variables in libGL.so and reference
them from a dlopened driver backend. The important functions that
reference these variables are often tiny (less than 10 instructions), so
a function call here is a killer.
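
To make the discussion concrete, here is a minimal sketch of the kind of tiny accessor in question. The names current_ctx, get_current_ctx, set_current_ctx and tls_demo are hypothetical and are not taken from libGL. The sketch assumes GCC's tls_model variable attribute, which selects the same model as -ftls-model=initial-exec for a single variable. Whether initial-exec is actually usable from a dlopened driver depends on the dynamic loader having static TLS space available, so treat this as an illustration of the access pattern rather than a recommendation.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-thread "current context" pointer.
   The tls_model attribute requests the initial-exec model for this
   one variable, like -ftls-model=initial-exec does for a whole file. */
__thread void *current_ctx __attribute__((tls_model("initial-exec")));

/* Tiny accessors of the kind described above. With initial-exec the
   access needs no call into the TLS runtime. */
static inline void *get_current_ctx(void) { return current_ctx; }
static inline void set_current_ctx(void *ctx) { current_ctx = ctx; }

/* Runs in a second thread: checks that it starts with its own
   zero-initialized copy, then stores and returns its own value. */
static void *ctx_worker(void *arg) {
    if (get_current_ctx() != NULL)
        return NULL;
    set_current_ctx(arg);
    return get_current_ctx();
}

/* Returns 1 if each thread saw only its own value, 0 otherwise. */
int tls_demo(void) {
    int a, b;
    pthread_t t;
    void *seen = NULL;

    set_current_ctx(&a);
    if (pthread_create(&t, NULL, ctx_worker, &b) != 0)
        return 0;
    if (pthread_join(t, &seen) != 0)
        return 0;
    /* The worker saw &b; this thread's copy still holds &a. */
    return seen == (void *)&b && get_current_ctx() == (void *)&a;
}
```

Calling tls_demo() from main and checking for a return value of 1 is enough to exercise it. Compiling with gcc -S under each -ftls-model setting shows how the accessor's code changes.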
However, here's a critical issue that came to my attention over the
weekend: How do you generate code at runtime to reference __thread
variables? Doing runtime code generation for the immediate mode API
calls in the driver backend is quite common (there's an example of this
in Keith's T&L driver for the Radeon, mentioned earlier in this thread).
It's not clear to me how a library generating code to dereference a
__thread variable can know where that variable is. Am I mistaken?
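
One possible approach, sketched here under a stated assumption: if the variable lives in the static TLS block (the main program or a library loaded at startup), its offset from the thread pointer is the same in every thread. A code generator could then compute that offset once and emit a thread-pointer-relative access with the offset as an immediate. This does not hold in general for variables in libraries loaded later with dlopen, whose TLS may be allocated dynamically. The names tls_counter, thread_pointer, tls_offset and offsets_match are hypothetical; the inline assembly assumes GCC-style syntax on x86, and other targets fall back to GCC's __builtin_thread_pointer.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread variable that generated code must reach. */
static __thread int tls_counter;

/* Read the thread pointer: stored at %gs:0 on IA-32 and at %fs:0 on
   x86-64. Other targets use the compiler builtin. */
static uintptr_t thread_pointer(void) {
#if defined(__i386__)
    uintptr_t tp;
    __asm__ volatile("movl %%gs:0, %0" : "=r"(tp));
    return tp;
#elif defined(__x86_64__)
    uintptr_t tp;
    __asm__ volatile("movq %%fs:0, %0" : "=r"(tp));
    return tp;
#else
    return (uintptr_t)__builtin_thread_pointer();
#endif
}

/* Offset of the calling thread's copy from its own thread pointer.
   This is the constant a code generator would embed. */
static ptrdiff_t tls_offset(void) {
    return (ptrdiff_t)((uintptr_t)&tls_counter - thread_pointer());
}

/* Runs in a second thread and records the offset it observes. */
static void *record_offset(void *arg) {
    *(ptrdiff_t *)arg = tls_offset();
    return NULL;
}

/* Returns 1 if two threads compute the same offset, 0 otherwise. */
int offsets_match(void) {
    ptrdiff_t other = 0;
    pthread_t t;

    if (pthread_create(&t, NULL, record_offset, &other) != 0)
        return 0;
    if (pthread_join(t, NULL) != 0)
        return 0;
    return other == tls_offset();
}
```

When this is built as an ordinary executable, offsets_match() should return 1, because the variable sits in the static TLS block. The same check run from a dlopened library is not guaranteed to pass.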
To give you an idea of how important runtime code generation is to
modern OpenGL drivers, my Viewperf scores are easily three or four times
higher with an API front end generated at runtime (plus the optimizations
that this allows further down the pipe).
-- Gareth