This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug nptl/4294] rwlock hangs under stress load


------- Additional Comments From twong at gear6 dot com  2008-06-17 00:18 -------
In fact, we hit the same problem just in the past few weeks.
We are running a 2.6.12 kernel on AMD processors.

After heavy use of the rwlock, we found that the reader count can
go negative (sometimes -1, -2, -3, etc.) when in fact there were no
readers.  The system would hang because the writers could never get
the rwlock.

We have added some code to keep a separate reader count (protected
by a different mutex), and we stop when we detect -1 in the rwlock.

In one case, our independent log shows that there were 2 simultaneous
reads, so the number of readers should have been 2 after the rwlock
was acquired.  However, it remained at 1.  It seems that one of the
incl instructions did not take effect.

This is reproduced by running our reflexOS, an NFS cache application
with a great many network connections, which uses the rwlock.  We can
reproduce this problem in 30 minutes to 1 hour with our code.

We have tried writing smaller standalone programs to reproduce the
problem, but could not.  We believe it has a lot to do with the
workload and memory usage.  Unfortunately, no root cause has been
reported here, but we will switch to a mutex as suggested.

We suspect it has something to do with the memory barrier behavior
on AMD processors.


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=4294

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

