This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 07/10] Add __pthread_set_abort_hook export


On Thu, 2013-01-24 at 17:35 +0100, Andi Kleen wrote:
> > (1) This is not necessary for existing and correctly synchronized code
> > because assertions will also fail in nontransactional executions (and
> > the failure will be reported as expected).
> 
> It's useful for existing code too because a transaction may see a different
> state (due to races with other threads) than the re-execution.
> 
> Of course if the code is correctly synchronized it shouldn't have any such
> races, but not all code is.

There's a reason why I wrote "existing and correctly synchronized code".

> > (2) For explicitly transactional code (ie, code in which some programmer
> > explicitly used TSX), you want a facility to communicate some
> > information out of transactions without having to finish execution of
> > these transactions.
> 
> I want it for all transactional code, both implicit and explicit.

In that case, wouldn't it be better to have something that is robust
enough so that we can put it into assert()?  Otherwise, if you have
existing implicitly transactional code, you couldn't use the assertions
it might contain, and would have to rewrite them to use TXN_ASSERT
instead.

> 
> > For (2), if the explicitly transactional code is correct and it's just
> > performance issues we want to debug, we could do without terminating the
> > transaction (unless we have to write so much data out that we're hitting
> 
> Transactional performance issues usually lead to aborts. When an abort
> happened all the information (except what we can get from the profiler)
> is lost.

If it's not always an abort, that could be fine because then this is
like sampling.  That might miss some cases, but in contrast to
assertions the program can actually continue.

> > HTM capacity limits, etc.).  That is, I'm wondering whether assertions
> > are the right tool for this.
> 
> I did a lot of transactional debugging and I think they are quite
> useful for hard problems. For easy to medium problems the information from the 
> profiler is usually enough.

I agree, and I wouldn't argue that printf is _never_ useful for
debugging.  Nonetheless, I don't see what's wrong with looking for
something better.

> > 
> > For (3) and also (2) if it's not just a performance problem, we need to
> > terminate the current transaction to be able to get information out of
> > it when we can't continue to execute it.  With TSX, we can either use
> > the 8 bits that we can communicate via abort, or we could commit the
> > transaction early, and then abort.
> 
> Hmm, you mean _xend(); assert()? That doesn't work for nested locking.
> 
> Ok one could do while (_xtest()) _xend(); assert(). 
> 
> HLE is not really supported by the abort hooks, requires RTM.
> 
> > Early commit would work if just RTM is used (ie, while (_xtest())
> > _xend(); ).  But I guess it would fail if xacquire/xrelease is mixed
> > in, or does TSX not complain about replacing xrelease with an RTM
> > commit?
> 
> RTM inside HLE aborts.

And as you said, HLE inside RTM txns aborts too, which means that
whenever we could get something out with an abort, we could also commit
the txn early (ie, with the simple loop I suggested).  Or not?
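To make the early-commit idea concrete, here is a rough sketch of the loop
wrapped into an assertion macro.  This is only an illustration, not code from
the patch: the names txn_commit_all, EARLY_COMMIT_ASSERT, USE_HW_RTM and
sim_depth are made up here.  By default a plain counter stands in for the
hardware nesting depth so the sketch runs anywhere; building with -mrtm and
-DUSE_HW_RTM switches to the real _xtest()/_xend() intrinsics.  It assumes
that only RTM is in use (no xacquire/xrelease mixed in).

```c
#include <stdio.h>
#include <stdlib.h>

#if defined(USE_HW_RTM) && defined(__RTM__)
#include <immintrin.h>              /* _xtest(), _xend(); needs -mrtm */
#define TXN_ACTIVE()  _xtest()
#define TXN_COMMIT()  _xend()
#else
/* Simulation (assumption for this sketch): a counter models the
   transactional nesting depth so the code runs without TSX hardware. */
static int sim_depth;
#define TXN_ACTIVE()  (sim_depth > 0)
#define TXN_COMMIT()  ((void)--sim_depth)
#endif

/* Commit every enclosing RTM nesting level.  Returns the number of
   levels that were committed (0 when called outside a transaction). */
static int txn_commit_all(void)
{
    int levels = 0;
    while (TXN_ACTIVE()) {
        TXN_COMMIT();
        levels++;
    }
    return levels;
}

/* Hypothetical assertion: evaluate the condition once, and on failure
   leave all transactions first so that the report is not rolled back. */
#define EARLY_COMMIT_ASSERT(expr)                                      \
    do {                                                               \
        if (!(expr)) {                                                 \
            txn_commit_all();                                          \
            fprintf(stderr, "%s:%d: assertion failed: %s\n",           \
                    __FILE__, __LINE__, #expr);                        \
            abort();                                                   \
        }                                                              \
    } while (0)
```

Note that the condition is evaluated only inside the transaction; the state
the transaction has written becomes visible once the commits happen, which is
exactly the trade-off of committing early instead of aborting.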

> > 
> > If TSX complains, we get a fault, IIRC, so when this fault happened
> > within the code with the loop above, we'd still know that some assertion
> > fired.  If we inline this code, or add other hints regarding what called
> > it, I guess we could find out which assertion triggered the fault by
> > looking at the code around where the fault happened?  Thoughts?
> 
> Even when inlined, the only way to identify the code is to use XABORT
> and encode it in the abort code.

Do you mean that the fault will not reveal the real address but just
the xbegin instruction's address?  Forgot about this one...

> That is what TXN_ASSERT() does essentially,
> just in a more user friendly way.
> 
> For many cases the profiler works too, but not for all, that is why
> we added the abort hook mechanism.
> 
> > only if the outermost transaction was not started with xacquire.  But
> > with TSX we just have <255 values that we can get out (ie, without the
> > values reserved for hold locks etc.).  And when we abort, we jump to
> > whatever started the outermost transaction, which could be code in
> > applications (programmers using transactions explicitly), glibc (e.g.,
> > lock elision), libstdc++ (if it doesn't use glibc locks), boost
> > (likewise), libitm (__transaction { }), and so on.  So to make this work
> > in general, all those components would have to support the special
> > assertions.
> 
> They all would need to call the abort hook correctly.

Which tells me that it would be much more practical to commit early.
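For comparison, this is roughly what every component that starts an outermost
transaction would have to do on the abort path: decode the status word it gets
back and pull out the 8-bit XABORT code.  A minimal sketch; the bit positions
are the ones immintrin.h defines as _XABORT_EXPLICIT, _XABORT_CODE() and so
on, but they are redefined here under made-up ST_* names so the example builds
without -mrtm.  The helper name explicit_abort_code is hypothetical too.

```c
#include <stdint.h>

/* Abort status bits as returned by _xbegin() after an abort.  Same
   values as the _XABORT_* macros in immintrin.h, renamed for this
   sketch so it compiles without RTM support. */
#define ST_EXPLICIT  (1u << 0)   /* aborted by an XABORT instruction */
#define ST_RETRY     (1u << 1)   /* retry may succeed */
#define ST_CONFLICT  (1u << 2)   /* memory conflict with another thread */
#define ST_CAPACITY  (1u << 3)   /* internal buffer overflowed */
#define ST_DEBUG     (1u << 4)   /* debug breakpoint hit */
#define ST_NESTED    (1u << 5)   /* abort happened in a nested txn */
#define ST_CODE(s)   (((s) >> 24) & 0xffu)

/* Returns the 8-bit code passed to XABORT, or -1 if the abort was not
   caused by XABORT (in which case bits 31:24 carry no information). */
static int explicit_abort_code(uint32_t status)
{
    if (!(status & ST_EXPLICIT))
        return -1;
    return (int)ST_CODE(status);
}
```

Everything beyond these 8 bits is lost on abort, and each of the components
listed above would need its own copy of this decoding step plus a call into
the hook.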

> However a common case is just using the pthread locking, with that it
> just works.

Not sure what "common" means here given that you can't buy the hardware
yet...

> > 
> > To actually support the assertions, abort codes need to be interpreted
> > consistently, and all assertions in a process need to be encoded using
> > <255 values.
> > 
> > Who is supposed to be the consumer of the abort codes? (I've asked this
> > previously, but you haven't answered.)  Is this code in the program, or
> > something else?  This matters because it's the other end of the
> > assertion, obviously.
> 
> For TXN_ASSERT() it's just the assert facility inside the program.
> 
> For some common cases that are in the standard lock library we have a few
> reserved codes that can be observed in the profiler.
> 
> 0xff = lock busy
> 0xfe = lock is locked (not in pthread)
> 0xfd = nested trylock (just added)
> 
> The profiler doesn't need the abort hook of course.
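As an illustration of how a consumer could keep these reserved codes apart
from assertion codes, here is a small sketch.  Only the three values and their
meanings come from the list above; the enum names, both helper functions, and
the assumption that everything below 0xfd is free for assertions are made up
for this example.

```c
#include <stddef.h>

/* Reserved abort codes from the list above; the names are hypothetical. */
enum {
    ABORT_LOCK_BUSY      = 0xff,
    ABORT_LOCK_LOCKED    = 0xfe,   /* not used in pthread */
    ABORT_NESTED_TRYLOCK = 0xfd,
    ABORT_FIRST_RESERVED = 0xfd    /* assumption: nothing lower is reserved */
};

/* Human-readable name of a reserved code, or NULL if the code is not
   one of the three reserved values. */
static const char *reserved_abort_name(unsigned code)
{
    switch (code) {
    case ABORT_LOCK_BUSY:      return "lock busy";
    case ABORT_LOCK_LOCKED:    return "lock is locked";
    case ABORT_NESTED_TRYLOCK: return "nested trylock";
    default:                   return NULL;
    }
}

/* Nonzero if an 8-bit code could be handed out to an assertion without
   colliding with the reserved range. */
static int code_free_for_assertions(unsigned code)
{
    return code < ABORT_FIRST_RESERVED;
}
```

Under that assumption, 253 codes (0x00 to 0xfc) are left for all assertions in
the process, which is the encoding limit discussed above.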
> 
> > What do you think?  Are there any other alternatives?
> 
> while (_xtest()) _xend(); assert() may work.

So let's try this instead.

It would be nice if we could get this into assert_fail (if I remember
the name correctly), but I'm not very optimistic that we can do this in
a robust way given all that this relies on (e.g., no RTM inside HLE, etc.).


Torvald

