This is the mail archive of the libc-hacker@sources.redhat.com mailing list for the glibc project.

Note that libc-hacker is a closed list. You may look at the archives of this list, but subscription and posting are not open.


[Fwd: [Fwd: reopening fprintf I18N issue]]

To: libchacker <libc-hacker at sourceware dot cygnus dot com>
Subject: [Fwd: [Fwd: reopening fprintf I18N issue]]
From: Mark Brown <bmark at us dot ibm dot com>
Date: Thu, 28 Sep 2000 10:20:50 -0500
Organization: IBM Corp.
Reply-To: bmark at us dot ibm dot com

This is from Gary Miller....

gwm@us.ibm.com wrote:
> 
> Mark,
> 
> The distinction between the fprintf family of functions and the fwprintf
> family of functions is that fprintf output consists of bytes, while
> fwprintf output consists of wide characters (wchar_t). The fwprintf wide
> characters are printed as if they had been processed by fputwc(), but the
> fundamental difference between the two families of functions is that one is
> "byte" oriented and the other is "wide character (wchar_t)" oriented.
> 
>    The metric for the output of the fprintf family of functions for
>    padding, precision, etc. should be bytes.
> 
>    The metric for the output (intermediate wide character sequence) of the
>    fwprintf family functions for padding, precision, etc. should be wide
>    characters (wchar_t).
> 
> This may seem counterintuitive, but it is the only way to have a rational
> distinction of the functions. One of the things that might seem strange is
> that one can have a differing number of bytes of printed output for the
> fwprintf functions after fputwc() has been applied depending on the
> underlying multibyte codeset of the locale: consider the differences
> between SJIS and EUC-JP.
> 
> BTW, because of the differences among codesets (SJIS, EUC-JP, EUC-TW,
> EUC-CN, 8859-x, UTF-8), I believe that it is a bad idea to attempt to apply
> precision operations on character (byte) strings. Precision operations on
> wide character strings should have "consistent" visual output -- the
> underlying data buffers will have varying numbers of bytes due to the
> differences in codesets after the data has been processed by fputwc().
> 
> Gary W. Miller                             Phone: ( 512) 838-8297
> IBM 2BCA/903 ZIP 9350           T/L:        678-8297
> 11400 Burnet Road                    FAX:     (512) 838-0169
> Austin, Texas 78758                  Internet: gwm@us.ibm.com
> 
> Mark Brown/Austin/IBM@IBMUS on 09-27-2000 11:04:20 PM
> 
> Please respond to Mark Brown/Austin/IBM@IBMUS
> 
> To:   Gary W Miller/Austin/IBM@IBMUS
> cc:
> Subject:  [Fwd: reopening fprintf I18N issue]
> 
> Gary
> 
> FYI
> 
> Mark
> 
> Ulrich Drepper wrote:
> >
> > Sorry for the late reply, I'm still catching up.
> >
> > > This is an incorrect interpretation. The printf() class of functions
> > > is always _byte_based_; a char is a byte in ISO C. Note that there
> > > is no "l" (ell) qualifier present (SUSv2), thus the argument to %s
> > > is to be a pointer to an array of char (ISO C) -- this is because it
> > > is going to be treated as bytes. Note the "if precision....that many
> > > bytes are written" sentence.
> >
> > I believe you that this is what current implementations do because it
> > is what one expects from a non-locale-aware implementation.
> >
> > > As to what is going on in the test results, let me add something:
> > >
> > > [FAIL] printf([%-6.1s],??????)
> > >     sys[[<SPC><SPC><SPC><SPC><SPC><SPC>]] != exp[[<SPC><SPC><SPC><SPC><SPC>]]
> > >                                                   ^
> > >                                                    There should be an
> > >                                                    undisplayable single byte
> > >                                                    here if you look at the
> > >                                                    actual output!
> >
> > There is none in the files I got, and that is a good thing.  The problem
> > is that if this byte were there, the entire output would be unusable.  I
> > just changed the code to implement it the way you suggest, and now I
> > cannot even use iconv anymore.  I cannot imagine that this is what
> > people want to use.
> >
> > This leaves in my opinion only two ways out:
> >
> > - just like the test output I have, the byte is simply omitted.  This
> >   has the big drawback that now the output precision is not honored in
> >   some cases and string concatenation etc. might fail because junk
> >   characters are included in the string.
> >
> > - do it the way I've implemented it.  It always provides a usable output.
> >
> > I do not really know what to do.  Writing out garbage bytes seems much
> > worse than diverging from the behavior of other implementations.
> >
> > > > [FAIL] printf([%-6.3s],??????)
> > > >     sys[[?<SPC><SPC><SPC><SPC>]] != exp[[?<SPC><SPC><SPC>]]
> > >                                             ^
> > >                                             a single byte here as well.
> >
> > This byte is not present here either.  I guess your Japanese guys
> > agree with me that this additional byte is bad.
> >
> > > Now, onward to the swprintf() issue. Gary thinks the spec here is horribly
> > > muddled, and that both the test and glibc are doing the wrong thing. We are
> > > going to submit an aardvark to Austin Group on this. For what it is worth,
> > > glibc is closer to Gary's expected behavior.
> >
> > Meanwhile, I've received some comments from the original author of the
> > amd1 specs.  His intentions were a bit different from what I had
> > implemented, and I have now changed this.  I think my implementation
> > is now in line with what ISO C99 is intended to be.
> >
> > --
> > ---------------.                          ,-.   1325 Chesapeake Terrace
> > Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
> > Red Hat          `--' drepper at redhat.com   `------------------------
> 
> --
> Mark S. Brown
> bmark@us.ibm.com
> Senior Technical Staff Member                          512.838.3926  T/L678.3926
> IBM RS/6000 AIX System Architecture                    Mark Brown/Austin/IBM
> IBM Corporation, Austin, Texas

-- 
Mark S. Brown                                                   bmark@us.ibm.com
Senior Technical Staff Member                          512.838.3926  T/L678.3926
IBM RS/6000 AIX System Architecture                        Mark Brown/Austin/IBM
IBM Corporation, Austin, Texas
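
The byte versus wide character metric discussed in the quoted messages can be illustrated with a minimal sketch. It assumes the two-character string "あい" encoded as UTF-8 (three bytes per character) for the narrow case; the helper names format_narrow and format_wide are hypothetical and do not come from glibc or from the test suite mentioned above. It shows the byte-oriented interpretation argued for in this thread, where a precision of 1 on %s keeps a single byte, while the same precision on %ls in swprintf() keeps one whole wide character.

```c
#include <stdio.h>
#include <wchar.h>

/* Assumed sample input: "あい" as UTF-8, two characters of three bytes each. */
static const char narrow_text[] = "\xe3\x81\x82\xe3\x81\x84";

/* The same two characters as wide characters (U+3042, U+3044). */
static const wchar_t wide_text[] = L"\x3042\x3044";

/*
 * Byte-oriented formatting: with %s and no "l" qualifier, the precision
 * and the field width are measured in bytes.  A precision of 1 therefore
 * keeps only the first byte of the first character, which is the
 * "undisplayable single byte" mentioned in the thread.
 * Returns the number of bytes produced, or 0 on error.
 */
size_t format_narrow(char *out, size_t size)
{
    int n = snprintf(out, size, "[%-6.1s]", narrow_text);
    return n < 0 ? 0 : (size_t)n;
}

/*
 * Wide-character formatting: with %ls in swprintf(), the precision and
 * the field width are measured in wide characters, so a precision of 1
 * keeps the whole first character.
 * Returns the number of wide characters produced, or 0 on error.
 */
size_t format_wide(wchar_t *out, size_t count)
{
    int n = swprintf(out, count, L"[%-6.1ls]", wide_text);
    return n < 0 ? 0 : (size_t)n;
}
```

Under these assumptions both calls report a length of 8 units, but the narrow buffer holds a lone lead byte followed by five padding spaces, whereas the wide buffer holds one complete character followed by five padding spaces. How many bytes the wide result occupies once written through fputwc() depends on the codeset of the locale.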
