This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug libc/15630] New: Fix use of cpu_set_t with sched_getaffinity when booted on a system with more than 1024 possible cpus.


http://sourceware.org/bugzilla/show_bug.cgi?id=15630

            Bug ID: 15630
           Summary: Fix use of cpu_set_t with sched_getaffinity when
                    booted on a system with more than 1024 possible cpus.
           Product: glibc
           Version: 2.18
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
          Assignee: unassigned at sourceware dot org
          Reporter: carlos at redhat dot com
                CC: drepper.fsp at gmail dot com

The glibc functions sched_getaffinity and sched_setaffinity have slightly
different semantics than the kernel sched_getaffinity and sched_setaffinity
functions.

The result is that if you boot on a system with more than 1024 possible cpus
and use a fixed-size cpu_set_t with sched_getaffinity, the call never
succeeds and always returns EINVAL. The glibc manual page does not document
sched_getaffinity as returning EINVAL. The call to sched_getaffinity should
always succeed.
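
For reference, the affected usage pattern looks roughly like the sketch below.
The helper name count_affinity_fixed is hypothetical and is used only for
illustration. On a system with at most 1024 possible cpus the call is expected
to succeed; on an affected system the error branch is where EINVAL would show
up.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: query the calling thread's affinity with a
   fixed-size cpu_set_t (room for CPU_SETSIZE == 1024 cpus).
   Returns the number of cpus in the mask, or -1 on failure.  */
static int
count_affinity_fixed (void)
{
  cpu_set_t set;

  CPU_ZERO (&set);
  if (sched_getaffinity (0, sizeof (set), &set) != 0)
    {
      /* With more than 1024 possible cpus this reports EINVAL.  */
      fprintf (stderr, "sched_getaffinity: %s\n", strerror (errno));
      return -1;
    }
  return CPU_COUNT (&set);
}
```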

This is not a hypothetical problem; I am already seeing users hit it.

Let us talk more about the API differences and what can be done in glibc to
mitigate the problem.

The most important difference is that if you call either of the kernel routines
with a cpusetsize that is smaller than the kernel's possible cpu mask size, the
kernel routines return EINVAL. The kernel previously did accounting based on
the configured maximum rather than possible cpus, leading to problems if you'd
simply compiled with NR_CPUS > 1024 instead of actually booting on a system
where the low-level firmware detected > 1024 possible CPUs.

There are 3 ways to determine the correct size of the possible cpu mask:

(a) Read it from sysfs /sys/devices/system/cpu/online, which has the actual
number of possibly online cpus.

(b) Interpret /proc/cpuinfo or /proc/stat.

(c) Call the kernel syscall sched_getaffinity with increasingly larger values
for cpusetsize in an attempt to manually determine the cpu mask size.

Methods (a) and (b) are already used by sysconf(_SC_NPROCESSORS_ONLN) to
determine the value to return.
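
As an illustration of method (a), the sysfs file holds a list of cpu ranges
such as "0-3,8-11". The sketch below counts the cpus named in such a string.
It is not glibc's parser; the function name count_cpu_list is hypothetical,
and reading the file itself is left to the caller.

```c
#include <stdlib.h>

/* Hypothetical helper: count the cpus named in a sysfs-style range
   list such as "0-3,8-11\n".  Returns -1 if the list is malformed.  */
static int
count_cpu_list (const char *list)
{
  int count = 0;
  const char *p = list;

  while (*p != '\0' && *p != '\n')
    {
      char *end;
      long first = strtol (p, &end, 10);
      if (end == p || first < 0)
        return -1;

      long last = first;
      if (*end == '-')
        {
          /* A range "first-last"; parse the upper bound.  */
          p = end + 1;
          last = strtol (p, &end, 10);
          if (end == p || last < first)
            return -1;
        }
      count += (int) (last - first + 1);

      p = end;
      if (*p == ',')
        p++;
      else if (*p != '\0' && *p != '\n')
        return -1;
    }
  return count;
}
```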

Method (c) is used by sched_setaffinity to determine the size of the kernel
mask and then reject any bits which are set outside of the mask and return
EINVAL.

Method (c) is recommended by a patched RHEL man page [1] for sched_getaffinity,
but that patch has not made it upstream to the Linux Kernel man pages project.
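
A minimal sketch of method (c) follows. It calls the raw syscall rather than
the glibc wrapper and doubles the buffer each time the kernel answers EINVAL.
The function name, the 128-byte starting size and the 1 MiB upper bound are
assumptions made for this example. On success the raw syscall returns the
number of bytes the kernel wrote, which is what the sketch reports.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: find a buffer size the kernel accepts for
   sched_getaffinity and return the number of bytes it wrote.
   Returns 0 on any error other than EINVAL, or if the (arbitrary)
   upper bound is exceeded.  */
static size_t
probe_kernel_mask_size (void)
{
  size_t size = 128;                 /* 1024 bits, like cpu_set_t */
  const size_t limit = 1u << 20;     /* arbitrary bound for the sketch */

  while (size <= limit)
    {
      void *buf = malloc (size);
      if (buf == NULL)
        return 0;

      long res = syscall (SYS_sched_getaffinity, 0, size, buf);
      int err = errno;               /* save before free can touch it */
      free (buf);

      if (res > 0)
        return (size_t) res;
      if (err != EINVAL)
        return 0;
      size *= 2;                     /* too small for the kernel: retry */
    }
  return 0;
}
```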

The goal is therefore to make using a fixed cpu_set_t work at all times, but
only support the first 1024 cpus. To support more than 1024 cpus you need to
use the dynamically sized macros and method (a) (if you want all the cpus).
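
The dynamically sized macros can be used along these lines. This is a sketch
only: the helper name count_affinity_dynamic and its ncpus parameter are
hypothetical, and the caller is assumed to have obtained a cpu count that
covers every possible cpu (for example from sysfs).

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: query the calling thread's affinity using a
   mask sized for NCPUS cpus.  Returns the number of cpus in the
   mask, or -1 on failure.  */
static int
count_affinity_dynamic (int ncpus)
{
  cpu_set_t *set = CPU_ALLOC (ncpus);
  if (set == NULL)
    return -1;

  size_t size = CPU_ALLOC_SIZE (ncpus);
  int n = -1;

  CPU_ZERO_S (size, set);
  if (sched_getaffinity (0, size, set) == 0)
    n = CPU_COUNT_S (size, set);

  CPU_FREE (set);
  return n;
}
```

If ncpus is smaller than the kernel's possible cpu count, the call can still
fail with EINVAL, so the count passed in matters.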

In order to make a fixed cpu_set_t size work all the time the following changes
need to be made to glibc:

(1) Enhance sysconf(_SC_NPROCESSORS_ONLN) to additionally use method (c) as a
last resort to determine the number of online cpus. In addition sysconf should
cache the value for the lifetime of the process. The code in sysconf should be
the only place we cache the value (currently we also cache it in
sched_setaffinity).

(2) Clean up sched_setaffinity to call sysconf to determine the number of online
cpus and use that to check if the incoming bitmask is valid. Additionally, if
possible, we should check for non-zero entries one long at a time instead of
one byte at a time.

(3) Fix sched_getaffinity and have it call sysconf to determine the number of
online cpus and use that to get the kernel cpu mask affinity values, copying
back the minimum of the sizes, either user or kernel, and zeroing the rest.
This call should never fail.
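
The checks in steps (2) and (3) can be sketched in user space as below. This
is not the proposed glibc patch. Both helper names are hypothetical, the
kernel mask size is passed in by the caller rather than taken from sysconf,
and mask sizes are assumed to be multiples of sizeof (unsigned long).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Step (2) idea: return 1 if any bit is set in MASK at or beyond
   byte offset VALID_BYTES, scanning one long at a time.  */
static int
has_bits_beyond (const void *mask, size_t mask_size, size_t valid_bytes)
{
  size_t word = sizeof (unsigned long);

  for (size_t off = valid_bytes / word * word; off + word <= mask_size;
       off += word)
    {
      unsigned long w;
      memcpy (&w, (const char *) mask + off, word);
      if (w != 0)
        return 1;
    }
  return 0;
}

/* Step (3) idea: query with a scratch buffer of KERNEL_SIZE bytes,
   copy back the smaller of the two sizes and zero the rest.  */
static int
getaffinity_truncating (pid_t pid, size_t user_size, void *user_mask,
                        size_t kernel_size)
{
  /* The kernel wants a whole number of longs.  */
  size_t word = sizeof (unsigned long);
  size_t tmp_size = (kernel_size + word - 1) / word * word;
  cpu_set_t *tmp = calloc (1, tmp_size);
  if (tmp == NULL)
    return -1;

  if (sched_getaffinity (pid, tmp_size, tmp) != 0)
    {
      int saved = errno;
      free (tmp);
      errno = saved;
      return -1;
    }

  size_t ncopy = user_size < tmp_size ? user_size : tmp_size;
  memcpy (user_mask, tmp, ncopy);
  memset ((char *) user_mask + ncopy, 0, user_size - ncopy);
  free (tmp);
  return 0;
}
```

With a large enough kernel_size, a caller passing sizeof (cpu_set_t) gets the
first 1024 cpus of the mask instead of EINVAL.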

Static applications can't easily be fixed to work around this problem. The only
solution there is to have the kernel stop returning EINVAL and instead do what
glibc does, which is to copy only the part of the buffer that the user
requested. However, doing that would break existing glibc versions, which rely
on EINVAL to compute the mask size. Therefore changing the kernel semantics is
not a good solution (except on a system-by-system basis in the extreme case
where a single static application is being supported).

Step (3) ensures that using a fixed cpu_set_t size works when you are booted on
hardware that has more than 1024 possible cpus.

Unfortunately it breaks the recommended pattern of using sched_getaffinity and
looking for EINVAL to determine the size of the mask, but this was never a
method that glibc documented or supported. The patched man page has the
starting buffer size of 1024, so at least such a pattern would allow access to
the first 1024 cpus. It is strongly recommended that users use sysconf to
determine the number of possible cpus.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=974679


