This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



NIS performance patches


Thorsten and glibc developers,

I am a systems administrator at Texas Instruments. We have been deploying Linux clusters in our datacenter here in Dallas, and as more clusters go online, we've been noticing a performance problem with NIS. By default, ypbind pings the NIS servers every 20 seconds asking "are you still alive?", and every 15 minutes it rebinds to the NIS server regardless of whether there were any problems. With the number of Linux clients on our network, this puts a heavy load on our NIS servers. (nscd helps a little, but we've had problems with nscd dying; we're still debugging that.)
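A rough sketch of what these default intervals add up to. It assumes each client sends one request per 20-second ping and one rebind attempt per 15-minute interval; the client count of 1000 used below is hypothetical, not a measured figure from our network, and requests_per_hour() is an illustrative helper, not ypbind code.

```c
#include <assert.h>

#define SECONDS_PER_HOUR 3600L

/* Requests per hour generated by `clients` hosts that each contact the
   NIS servers once every `interval_sec` seconds.  Integer division is
   exact for the 20 s and 900 s intervals discussed here. */
static long requests_per_hour(long clients, long interval_sec)
{
    assert(interval_sec > 0);
    return clients * (SECONDS_PER_HOUR / interval_sec);
}
```

For one client this gives requests_per_hour(1, 20) = 180 pings and requests_per_hour(1, 15 * 60) = 4 rebinds per hour; a hypothetical 1000 clients would send 180000 pings per hour.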

There is a -no-ping option to ypbind that disables this 20sec/15min activity; however, it also disables a section of code that is useful when a NIS server goes down. With the 20-second pinging enabled, if a NIS server crashes, ypbind finds a new server to bind to. With -no-ping, the client remains bound to the dead server forever. In the test_bindings() function (ypbind-mt-1.12/src/serv_list.c), there are two lines near the top:

    if (ping_interval < 1)
      pthread_exit (&success);

The -no-ping option sets ping_interval=0, so this thread exits and the code that tries to rebind never runs.
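The effect of that early exit can be shown with a small simulated-clock model. This is a simplified sketch, not ypbind source: first_check_after_crash() is a hypothetical helper, and it assumes the periodic check fires exactly every ping_interval seconds and that the first check after the crash notices the dead server.

```c
#include <assert.h>

/* Simplified model of the ypbind checking thread (not ypbind source).
   Returns the first simulated second at which the periodic check runs
   after the bound server died at `crash_time`, i.e. when ypbind would
   start looking for a new server, or -1 if no check runs within
   `horizon` seconds. */
static int first_check_after_crash(int ping_interval, int crash_time, int horizon)
{
    assert(crash_time >= 0 && horizon >= 0);

    /* Mirrors "if (ping_interval < 1) pthread_exit (&success);":
       with -no-ping the checking thread is gone, so nothing rebinds. */
    if (ping_interval < 1)
        return -1;

    /* Otherwise a check fires every ping_interval seconds. */
    for (int t = ping_interval; t <= horizon; t += ping_interval)
        if (t >= crash_time)
            return t;

    return -1;
}
```

With the default interval, a server that dies at second 30 is noticed at second 40; with ping_interval=0 it is never noticed.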


We have developed some patches that modify the behavior of both glibc and ypbind so that, in the case of an error, ypbind will look for a new server even if -no-ping was used. We're also working with RedHat to incorporate these patches, but RedHat does not want to create a forked version of ypbind and glibc. They would be more comfortable if these patches were accepted by you.

I've attached the patches to this e-mail, and I have a more detailed description of what the patches do below. These patches were generated against the sources from RedHat Enterprise Linux 3, namely:
ypbind-mt-1.12
glibc-2.3.2-200309260658


You'll also notice in the patch for ypbind that I've defined USE_BROADCAST=0. We have another issue: a few NIS servers respond to pings extremely fast even when heavily loaded, so of our 11 NIS servers, most clients bind to the fastest 3 while the other 8 sit idle. We're experimenting with USE_BROADCAST=0 and a Perl script that randomizes the order of the servers in /etc/yp.conf to get better load balancing. If this works well, a real 'configure' switch would be better than my patch, which simply hacks the 'configure' script.
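The Perl script is not shown in this message. As a hedged illustration of the idea only, randomizing the server order amounts to a Fisher-Yates shuffle; the C sketch below shuffles an in-memory array of server names. The function name shuffle_servers() and the host names in the test are hypothetical, and parsing and rewriting /etc/yp.conf is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fisher-Yates shuffle of a list of NIS server names, so that clients
   trying servers in list order do not all start with the same one.
   rand() % i has a slight modulo bias, which is acceptable for
   spreading load but not for anything security-related. */
static void shuffle_servers(const char **servers, size_t n)
{
    assert(servers != NULL || n == 0);
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;   /* random index in 0..i-1 */
        const char *tmp = servers[i - 1];
        servers[i - 1] = servers[j];
        servers[j] = tmp;
    }
}
```

Seeding rand() differently on each client (for example from the host name or the time) is what spreads the clients across all 11 servers rather than the same few.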

Please take a look at our patches and let me know what you think. I believe these patches will help make Linux a better product for the enterprise.

Thank you!

-----------------
Jeff Bastian
jmbastia@ti.com
Unix System Admin
Texas Instruments
-----------------

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Detailed Description

What exactly do our ypbind patches do?
 1) There is a chunk of code in the test_bindings() function in
    (ypbind-mt-1.12/src/serv_list.c) that tests the current server
    for accessibility and finds a new server if it's broken.  This
    chunk of code is inside a while(1) loop that controls the
    20sec/15min actions.  We've moved this chunk of code to a
    function called test_bindings_once() and placed a call to
    this new function in test_bindings(), so the behavior here is
    unchanged.
 2) In the ypbindproc_domain() function
    (ypbind-mt-1.12/src/ypbind_server.c) there is an added call to
    the test_bindings_once() function.  If everything is working,
    the added test costs very little.  If the server is down, it
    will look for a new one.
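A simplified model of the two ypbind changes, with hypothetical data structures (not the real ypbind ones): the periodic loop and the request handler both call the same one-shot check. The real test_bindings_once() takes an argument (it is called as test_bindings_once(1) below) whose meaning is not described here, so the model omits it, and the sleep between rounds is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SERVERS 16

/* Hypothetical state: which servers answer, and which one we are
   bound to (-1 = unbound).  Not the real ypbind data structures. */
struct domain_state {
    bool alive[MAX_SERVERS];
    size_t nservers;
    int bound;
};

/* One pass of the check: keep a working binding, otherwise bind to
   the first server that answers.  Returns the (possibly new) binding. */
static int test_bindings_once_model(struct domain_state *d)
{
    assert(d->nservers <= MAX_SERVERS);
    if (d->bound >= 0 && d->alive[d->bound])
        return d->bound;                 /* cheap case: nothing to do */
    d->bound = -1;
    for (size_t i = 0; i < d->nservers; i++)
        if (d->alive[i]) {
            d->bound = (int)i;
            break;
        }
    return d->bound;
}

/* Change (1): the periodic thread just calls the extracted function. */
static void test_bindings_model(struct domain_state *d, int ping_interval, int rounds)
{
    if (ping_interval < 1)
        return;                          /* -no-ping: no periodic checks */
    for (int r = 0; r < rounds; r++)
        test_bindings_once_model(d);     /* real code sleeps between rounds */
}

/* Change (2): check the binding once before answering a client. */
static int ypbindproc_domain_model(struct domain_state *d)
{
    return test_bindings_once_model(d);
}
```

With -no-ping the periodic loop does nothing, but the next client request still moves the binding off a dead server.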

So, the only change in behavior here is that ypbindproc_domain() will
test the current binding once before returning.  If necessary, a new
command line flag can be added so ypbindproc_domain() only calls
test_bindings_once() if this new flag is present, e.g.
 ypbind -no-ping --xyz
 ...
 ypbindproc_domain(...)
 {
   ...
   if (xyz)
     test_bindings_once(1);
   find_domain (domain, result);
   ...
 }



However, this is only half the picture.  Meanwhile, over in glibc
land, more changes are made in the NIS client code to make this whole
new system work.

What exactly do our glibc patches do?
 1) Like test_bindings() above, we take the __yp_bind() function
    (glibc-2.3.2-200309260658/nis/ypclnt.c) and move chunks of it
    into three smaller functions.
      a) __yp_bind_client_create() is a small chunk of code that
         was duplicated in __yp_bind(), once for the section of
         code that looks at /var/yp/binding, the other for the
         section that talks to the ypbind daemon
      b) the section that looks at /var/yp/binding was moved into
         the __yp_bind_file() function, with a call to
         __yp_bind_client_create() where the duplicate code used
         to reside
      c) the section that talks to ypbind daemon was moved into
         the __yp_bind_ypbindprog() function, again, with a call to
         __yp_bind_client_create() replacing the duplicate code
    And, of course, calls to __yp_bind_file() and
    __yp_bind_ypbindprog() are inserted into __yp_bind() where the
    code used to reside.  Also, like ypbind change (1) above, this
    does not alter the functionality at all.
 2) If /var/yp/binding has bad data in it (e.g., a server that went
    offline), then calls to do_ypcall() fail without trying to find
    a new NIS server.  So, two lines of code are added to do_ypcall()
    that, in the event of an error from clnt_call(), try calling
    __yp_bind_ypbindprog() (one of our new functions), which in turn
    does
      clnt_call (client, YPBINDPROC_DOMAIN, ...)
    which in turn calls ypbindproc_domain() in the ypbind daemon,
    which calls test_bindings_once() and hopefully finds a new
    server.
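A hedged model of the glibc side, not glibc's ypclnt.c: each *_model function only plays the role of the function named in its comment, a binding is reduced to a server name, and rpc_stub()/ypbind_stub() are hypothetical stand-ins for the network. The sketch assumes the call is retried exactly once after rebinding; whatever retry logic do_ypcall() already has is not reproduced.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A binding reduced to the name of the server we talk to. */
struct binding {
    char server[64];
};

/* Hypothetical stubs: only "nis-new" answers RPCs, and ypbind, having
   re-tested its own binding, now reports "nis-new". */
static int rpc_stub(const char *server)
{
    return strcmp(server, "nis-new") == 0 ? 0 : -1;
}

static const char *ypbind_stub(void)
{
    return "nis-new";
}

/* Role of __yp_bind_client_create(): code formerly duplicated in both
   branches of __yp_bind(). */
static int client_create_model(struct binding *b, const char *server)
{
    assert(b != NULL);
    if (server == NULL || server[0] == '\0')
        return -1;
    snprintf(b->server, sizeof b->server, "%s", server);
    return 0;
}

/* Role of __yp_bind_file(): use whatever /var/yp/binding says. */
static int bind_file_model(struct binding *b, const char *file_server)
{
    return client_create_model(b, file_server);
}

/* Role of __yp_bind_ypbindprog(): ask the ypbind daemon. */
static int bind_ypbindprog_model(struct binding *b)
{
    return client_create_model(b, ypbind_stub());
}

/* Role of __yp_bind(): binding file first, then the daemon. */
static int yp_bind_model(struct binding *b, const char *file_server)
{
    if (bind_file_model(b, file_server) == 0)
        return 0;
    return bind_ypbindprog_model(b);
}

/* Role of do_ypcall() after the patch: if the RPC fails, ask ypbind
   for a (possibly new) server and retry once. */
static int do_ypcall_model(struct binding *b)
{
    if (rpc_stub(b->server) == 0)
        return 0;
    if (bind_ypbindprog_model(b) != 0)   /* the added rebind step */
        return -1;
    return rpc_stub(b->server);
}
```

A stale binding file pointing at "nis-old" now recovers: the first call fails, ypbind is asked, and the retry goes to "nis-new".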

In summary, while at first our patches appear to do lots of surgery
to the source code, there are really only two small changes:
 1) Call test_bindings_once() from ypbindproc_domain(), possibly
    controlled by a new command line flag
 2) Call __yp_bind_ypbindprog() from do_ypcall() if clnt_call()
    returns an error



Attachment: nis_patches_rhel3.tar.gz
Description: application/gzip

