This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v4] Make bindresvport() function to multithread-safe


On Thu, Oct 25, 2012 at 1:39 PM, Rich Felker <dalias@aerifal.cx> wrote:
> On Thu, Oct 25, 2012 at 11:46:34AM -0400, Carlos O'Donell wrote:
>> > (2) The runtime cost of this function is bounded by syscalls, so
>> > optimizing userspace code for performance is mostly useless. (Not
>> > entirely since the locking cost could invalidate cache in other
>> > threads, but I still suspect this will be very rare.)
>>
>> Now this is a *great* counter-argument to using __thread variables.
>>
>> However, I still don't buy that this is the right way forward for
>> the long-term.
>
> I'm not convinced as to which approach is better either. I just think
> the topic merits discussion rather than making arbitrary decisions
> based on premature optimization.

I agree that arbitrary decisions are a bad move and that we should talk
this through.

>> > Of course for this one function, the cost (a few bytes of TLS) does
>> > not matter. I'm more thinking from a general policy perspective;
>> > adding TLS all over the place will eventually add up. It would be even
>> > worse if this were initialized TLS, since that adds runtime cost too.
>>
>> I agree that adding TLS all over the place will eventually add up,
>> *but* the number of cores on devices is going up and quickly.
>> If we keep adding locks we're bound to get hosed for performance
>> since locks require complex coherency protocols between the cores.
>> If we could elide the locks and simply use local data it's a huge
>> concurrency win.
>
> The bind() in kernelspace is already going to involve host-global
> locks, so concurrency is not possible.

That's something I hadn't considered, and it's a good point.
I see at the very least an rcu_read_lock() to try to look up the fd
from the fs layer. However, it doesn't preclude doing a good job
in userspace: if userspace can operate without taking a lock, then
all you need is a kernel upgrade to fix the other half of the problem.

>> I can't condone the use of a lock when a small amount of TLS solves
>> the problem, even if the TLS is repeated for every process in the
>> system. Even then we could switch to some lazy TLS allocation
>> (which we have) for more kinds of TLS variables and it's a win-win.
>
> Lazy TLS allocation means an otherwise-correct program could
> unexpectedly crash on the first usage if memory has been exhausted. I
> consider this an unacceptably bad quality-of-implementation issue.
> It's certainly unacceptable for applications with robust/realtime
> requirements.

That's also a very good point and a problem that the static lock
solution doesn't have.

However, I would expect that such a program would want to run with
LD_BIND_NOW=1, and that would avoid the lazy allocation at the
cost of .tdata size for something you don't use.

> In reality, it seems like glibc's libc.so requires TLS from the moment
> it starts, so success/failure will be resolved at startup, at least
> for the main thread. It's unclear to me however whether new threads
> have TLS reserved for libc.so as soon as they start, or whether it's
> allocated on the first usage (which will probably be almost
> immediately since errno seems to be using TLS rather than a member of
> the pthread structure these days -- am I right?).

When a thread starts, the static TLS image is reserved as part of
creating the thread, so pthread_create will fail with ENOMEM if there
isn't enough memory for the stack plus the implementation-required
space.

The TLS variables in libc.so get allocated at thread startup and
contribute to the thread's size. Hence my comment that every thread
will consume +4 bytes in the TLS solution.
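
To illustrate (hypothetical variable, not the actual patch):

/* An initial-exec TLS variable inside libc.so.  Being initialized, it
   lands in .tdata (uninitialized would be .tbss); either way it is
   carved out of every thread's static TLS block, which pthread_create
   sizes and allocates up front together with the stack, hence the
   ENOMEM at creation time rather than a crash on first use.  */
static __thread short last_port = 512;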

> I know this is rather tangential to the topic at hand, but I think
> it's an important policy discussion to have since the current
> direction seems to be leading towards making glibc unusable for
> applications with robustness requirements.

I appreciate you taking the time to share your opinion and experience.

>> I'm open to more conversation on this topic since it's a policy
>> issue that will continue to be relevant in the years to come.
>
> Indeed.

To summarize.

The two scenarios are (v2 of this summary; a rough code sketch of each
follows the list):

(a) Mutex
- Decreased concurrency.
- Every program increases .data by sizeof(mutex) (maybe 40-50 bytes).
  * Technically only if COW breaks the particular page of .data?
  * Almost no startup cost, and no additional relocations.
- Function is now thread-safe.

(b) __thread
- Increased concurrency.
  * Not immediately relevant if the kernel still needs to take an fs
    lock to find the fd.
- Every thread increases .tdata by sizeof(short) (maybe 4 bytes with alignment).
  * Possibly reduced by using the LD or GD TLS access models, but at the
    cost of additional startup relocation processing. Not very feasible.
- Function is now thread-safe.
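
For concreteness, here is a rough sketch of what the two variants could
look like. The identifiers and the range handling are mine, not Peng's
patch; I'm only assuming the usual reserved range of
[IPPORT_RESERVED/2, IPPORT_RESERVED):

#include <pthread.h>
#include <netinet/in.h>	/* IPPORT_RESERVED */

#define NPORTS (IPPORT_RESERVED - IPPORT_RESERVED / 2)

/* (a) Mutex: one shared cursor, serialized by a process-wide lock.  */
static short port;
static pthread_mutex_t port_lock = PTHREAD_MUTEX_INITIALIZER;

static short
next_port_locked (void)
{
  short p;
  pthread_mutex_lock (&port_lock);
  p = (short) (IPPORT_RESERVED / 2 + port);
  port = (short) ((port + 1) % NPORTS);
  pthread_mutex_unlock (&port_lock);
  return p;
}

/* (b) __thread: one cursor per thread, no lock; each thread's static
   TLS block grows by sizeof (short) (plus alignment).  */
static __thread short tls_port;

static short
next_port_tls (void)
{
  short p = (short) (IPPORT_RESERVED / 2 + tls_port);
  tls_port = (short) ((tls_port + 1) % NPORTS);
  return p;
}

Either function would be called from bindresvport's retry loop. The
caller-visible behavior is the same; the only difference is where the
cursor lives and whether two threads can race on it.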

My tally:
- Rich likes (a) Mutex.
- Roland likes (a) Mutex.
- Carlos likes (b) __thread.
- Peng would like us to choose one.

I'll ask Peng for a comparison of the performance between (a) and (b).

Cheers,
Carlos.

