This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



pthread_cond_timedwait/posix timers


There are some problems with the POSIX timer and condition wait
implementation in glibc.  I am very interested in fixing them, but think
that it would be best to have a discussion over the list first.

Problems

1.  The pthread_cond_timedwait() function, through the call sequence
    pthread_cond_timedwait_relative() -> timedsuspend() ->
    __pthread_timedsuspend_new() -> __libc_nanosleep(), effects an absolute
    timeout with a relative timeout, which has the following two results:

1.1 There is a race condition in __pthread_timedsuspend_new(): if the
    thread is preempted between its __gettimeofday() and __libc_nanosleep()
    calls, the relative timeout it computed is already stale, and
    pthread_cond_timedwait() will return after the desired time.

1.2 SUSv3 implies in [1] and [2] that pthread_cond_timedwait() should
    respect system clock discontinuities.  As I understand its phrasing, if
    CLOCK_REALTIME is rolled back, pthread_cond_timedwait() should notice
    and not expire until the absolute time in its argument is registered,
    and if CLOCK_REALTIME is advanced, the function should notice and expire
    immediately if necessary.  Because of the above-mentioned relative time
    calculation, if the system clock is altered discontinuously
    pthread_cond_timedwait() will return at a different absolute time than
    desired (see test case [3]).

2.  As mentioned in a previous post to libc-alpha[4], glibc's timers are
    broken by system clock discontinuities.  Because timers are implemented
    with a pthread_cond_timedwait(), I assumed that the latter function was
    the problem.  After discovering problem #1 explained above, I went back
    and found that the timer problem is actually a comparison
    (line 408, linuxthreads/sysdeps/pthread/timer_routines.c) of a
    clock_gettime() result with the timer_node's expirytime value, which
    has the following result:

2.1 If an application program sets up an interval timer of 500ms, and the
    system clock is rolled back by 1 minute between timer signal deliveries,
    the next signal will not be delivered in 500ms, but rather 1.5s.  This
    is just one example; there are several different race conditions which
    can occur.

So in short there are two inverse problems: pthread_cond_timedwait() is
supposed to respect system clock changes but does not, and thread_func() is
supposed to ignore them but does not.

Possible Solution

1.  Make pthread_cond_timedwait() not just a wrapper but rather its own
    function that does respect system clock changes.

2.  Modify pthread_cond_timedwait_relative() to take an actual relative
    timeout and an additional argument similar to the second argument of
    nanosleep().

3.  Leave thread_func() mostly as is: have it pass the new
    pthread_cond_timedwait_relative() function the timer_node.it_interval
    value; add a needs_expiry flag to timer_node, set when the latter
    returns ETIMEDOUT and reset upon timer signal delivery; and change the
    check on line 408 to test the needs_expiry flag.

Please comment.  Thanks.

Amos Waterland

----

[1] http://www.opengroup.org/onlinepubs/007904975/functions/clock_settime.html

  If the value of the CLOCK_REALTIME clock is set via clock_settime(),
  the new value of the clock shall be used to determine the time of
  expiration for absolute time services based upon the CLOCK_REALTIME
  clock. This applies to the time at which armed absolute timers expire.
  If the absolute time requested at the invocation of such a time
  service is before the new value of the clock, the time service shall
  expire immediately as if the clock had reached the requested time
  normally.

[2] http://www.opengroup.org/onlinepubs/007904975/functions/pthread_cond_timedwait.html

  For cases when the system clock is advanced discontinuously by an
  operator, it is expected that implementations process any timed wait
  expiring at an intervening time as if that time had actually occurred.

  ... there is a race condition associated with specifying an absolute
  timeout on top of a function that specifies relative timeouts.

[3]

/* Test for pthread_cond_timedwait not respecting absolute time.
 * Amos Waterland <apw@us.ibm.com>
 * 07 Aug 2002
 */

#include <errno.h>
#include <error.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

int
main (int argc, char *argv[])
{
  int result = 0;

  {
    int r;
    struct timespec now, then, timeout;
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

    if (pthread_mutex_lock (&mutex))
      error (1, errno, "mutex lock failed");

    if (clock_gettime (CLOCK_REALTIME, &then))
      error (1, errno, "clock_gettime failed");

    timeout.tv_sec = then.tv_sec + 2;  /* Timeout in 2s.  */
    timeout.tv_nsec = then.tv_nsec;

    switch (fork()) {
    case -1:
      error (1, errno, "fork failed");
    case 0:
    {
      printf ("[child] started ... sleeping for 1s\n");
      sleep (1);

      if (clock_gettime (CLOCK_REALTIME, &now))
        error (1, errno, "clock_gettime failed");

      then.tv_sec = now.tv_sec - 10;  /* Set time backwards by 10s.  */
      then.tv_nsec = now.tv_nsec;

      if (clock_settime (CLOCK_REALTIME, &then))
        error (1, errno, "clock_settime failed");

      printf ("[child] returning ... system clock is now rewound by 10s\n");
      exit (0);
    }
    default:
      break;
    }

    printf ("[parent] starting timed wait with 2s timeout ...\n");
    r = pthread_cond_timedwait (&cond, &mutex, &timeout);
    printf ("[parent] leaving timed wait ...\n");

    if (clock_gettime (CLOCK_REALTIME, &now))
      error (1, errno, "clock_gettime failed");

    if (now.tv_sec < timeout.tv_sec)
      {
        ++result;
        printf ("\npthread_cond_timedwait is not respecting absolute time\n");
        printf ("pthread_cond_timedwait should've timed out at: {%li %li}\n",
                timeout.tv_sec, timeout.tv_nsec);
        printf ("pthread_cond_timedwait actually timed out at:  {%li %li}\n",
                now.tv_sec, now.tv_nsec);
      }

    if (clock_gettime (CLOCK_REALTIME, &now))
      error (1, errno, "clock_gettime failed");

    then.tv_sec = now.tv_sec + 10;  /* Restore 10s to clock.  */
    then.tv_nsec = now.tv_nsec;

    if (clock_settime (CLOCK_REALTIME, &then))
      error (1, errno, "clock_settime failed");
  }

  exit (result);
}

[4] http://sources.redhat.com/ml/libc-alpha/2002-07/msg00207.html

