This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
Re: glibc aio performance
- From: "Amos P Waterland" <waterland at us dot ibm dot com>
- To: "Don Capps" <don dot capps2 at verizon dot net>
- Cc: libc-alpha at sources dot redhat dot com, "Thomas Gall" <tom_gall at vnet dot ibm dot com>
- Date: Thu, 30 May 2002 15:46:37 -0600
- Subject: Re: glibc aio performance
Don:
Thank you very much for your analysis. I have a few questions.
I ran the suggested options on my Red Hat 7.3 box and got the following
results. (I noticed that in your results the second command line had -s
200M, but the report showed 307200; maybe the 2 was just a typo in the email?)
% iozone -r 64 -s 300M -i 0 -i 1
      KB  reclen   write  rewrite    read  reread
  307200      64   32881    32494   42239   43256
% iozone -k 32 -r 64 -s 300M -i 0 -i 1
      KB  reclen   write  rewrite    read  reread
  307200      64   18548    19151   26866   26512
As you can see, the results are much better, but the AIO run is still
roughly 35-45% slower than the synchronous (SIO) run. (I re-ran the tests
several times to try to iron out timing anomalies.) Do you think that thread
setup, teardown, and overhead account for this? (I did try using just two
threads, but did not get significantly better throughput.)
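
For reference, the size of the gap can be recomputed from the two runs above. The following is only a small Python sketch; the names SYNC, AIO, and slowdown_percent are made up for illustration and are not part of iozone. On these figures every column comes out somewhere between 35% and 45% slower.

```python
# Throughput figures (KB/s) copied from the two iozone runs above.
SYNC = {"write": 32881, "rewrite": 32494, "read": 42239, "reread": 43256}
AIO = {"write": 18548, "rewrite": 19151, "read": 26866, "reread": 26512}


def slowdown_percent(sync_kbs, aio_kbs):
    """Return how much slower the AIO run is, as a percentage of the sync rate."""
    return 100.0 * (sync_kbs - aio_kbs) / sync_kbs


if __name__ == "__main__":
    # Print one line per test column (write, rewrite, read, reread).
    for test in SYNC:
        print(f"{test:8s} {slowdown_percent(SYNC[test], AIO[test]):5.1f}% slower")
```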
Thanks in advance.
Amos Waterland
"Don Capps" <don.capps2@verizon.net>
05/30/02 02:34 PM
Please respond to "Don Capps"
To: <libc-alpha@sources.redhat.com>, Amos P Waterland/Austin/IBM@IBMUS
cc: Thomas Gall/Rochester/IBM@IBMUS
Subject: Re: glibc aio performance
Amos,
I believe that this may be a case of pilot error :-)
In the case where you ran
iozone -i0 -i1 -k128
you asked Iozone to use POSIX async I/O with 128
async writes/reads of 4k each, and wrote/read
a file that was only 512 Kbytes in size.
Thus there are 128 async threads, each doing exactly
one 4k operation.
Let's look under the hood.
If one spawns 128 async reads/writes then this is roughly
the equivalent of calling fork() 128 times and having
each new process do a single 4k operation and then
terminate. The overhead of creating and terminating
the threads is eating your lunch.
Here are some suggestions that you might find useful.
Try using a file size that is much larger.
Example: -s 300M
Try using a transfer size that is larger.
Example: -r 64
Try using fewer async ops.
Example: -k 32
In the examples above the threads will do a more
significant amount of work and they will be
re-used many times before they terminate.
There would be 32 threads and each thread will
do 64 kbyte transfers. The total number of
transfers will be 4800 (307200 KB / 64 KB). Each thread
will now do 150 ops (4800 / 32) before it terminates.
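
The arithmetic above can be checked with a short Python sketch. It assumes, as described in this message, one thread per outstanding async operation and an even split of the work; ops_per_thread is a made-up helper name, not an iozone function.

```python
def ops_per_thread(file_kb, record_kb, num_threads):
    """Return (total transfers, transfers per thread) for an evenly divided run."""
    total_ops = file_kb // record_kb
    return total_ops, total_ops // num_threads


if __name__ == "__main__":
    # Suggested run: -s 300M -r 64 -k 32
    print(ops_per_thread(300 * 1024, 64, 32))  # (4800, 150)
    # Original run: 512 KB file, 4 KB records, -k 128
    print(ops_per_thread(512, 4, 128))         # (128, 1)
```

The second call shows why the original run was dominated by overhead: each of the 128 threads had only one operation to do.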
Here is output from my Red Hat 7.2 box.
./iozone -r 64 -s 300M -i 0 -i 1
      KB  reclen   write  rewrite    read  reread
  307200      64   37314    29174   28042   28057
./iozone -k 32 -r 64 -s 200M -i 0 -i 1
      KB  reclen   write  rewrite    read  reread
  307200      64   25793    30625   27376   27335
Since the filesystem is on a single disk drive there
is no advantage to the async operations. But, the
result is well within reason.
I don't believe that there is anything wrong with
glibc or with Iozone; this is more a case of getting
what you asked for and finding out that the question
was probably not a good one :-)
The moral of the story is: If you are going to spawn
a thread then it would be wise to have it do some
significant work before it terminates.
Hope this helps,
Don Capps
----- Original Message -----
From: "Amos P Waterland" <waterland@us.ibm.com>
To: <libc-alpha@sources.redhat.com>
Cc: <capps@iozone.org>; <wnorcott@us.oracle.com>; <tom_gall@vnet.ibm.com>
Sent: Thursday, May 30, 2002 1:01 PM
Subject: glibc aio performance
> Hello, I have been looking at the glibc asynchronous I/O implementation,
> and have run into a bit of an issue.
>
> When I run IOzone (an open source filesystem benchmark tool) with AIO
> enabled, it reports write KB/s throughput on the order of 45 times slower
> than that reported without AIO.
>
> % iozone -i0 -i1 #run just the write and read tests
> [snip]
>       KB  reclen   write  rewrite    read  reread [snip]
>      512       4  101769   178334  331595  345702
> % iozone -i0 -i1 -k128 #do same, but use no-bcopy aio
> [snip]
>       KB  reclen   write  rewrite    read  reread [snip]
>      512       4    2232    47210  121874  103372
>
> I have looked at the source code for the glibc implementation, and it is
> not obvious to me why keeping a thread pool, each of whose constituents
> performs a pread(2) or pwrite(2), should be so much slower than synchronous
> I/O. I looked at the source code for IOzone, and found that it uses
> libasync.c for AIO, but could find no obvious performance problems in its
> code.
>
> So my question is: Might there be a problem with the way IOzone is using
> glibc's implementation of AIO, or is glibc's implementation known to have
> performance problems?
>
> Amos Waterland