This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.



Re: Using XML in GDB?


On Fri, Jan 27, 2006 at 08:25:42PM +0200, Eli Zaretskii wrote:
> Can we at least have a pipe-dream list of things we think GDB would
> ideally like to know about targets, and how structured each one of
> them is?

Well, I don't think I can.  I haven't a clue; every time I think I've
got a handle on the set, people come up with creative new ones.  For
instance, I hadn't considered that we might want memory-mapped I/O
devices to be explicitly described to GDB.

> > If we're going to do that, it would be a real shame not to consider
> > localization; most ARM system programmers can probably manage the
> > English names of the registers, but if we want to offer help text,
> > being able to provide it in Japanese is a big win.  So that means
> > character encodings, and in turn that means we need to be somewhat
> > careful with the contents of descriptions.
> 
> That part is something I never understood in your reasoning: XML does
> not do anything special to allow UTF-8, nor help you deal with the
> resulting non-ASCII text on the GDB side.  If the underlying libc
> supports UTF-8, you have that now; if it doesn't, you won't be better
> off even if the target speaks XML.

The mere existence of character encodings isn't the issue; the
issue is encoding free-form text, possibly containing strange
"characters", within a structured element, and in particular within
a structured element that a client may not recognize or support.

We've got field separators - colon and semicolon in my working copy -
and the status of newlines is fuzzy.  If separators may validly occur
within free-form text, we need an alternate way to escape them.  In
ASCII, how to do this is clear-cut.  In UTF-8 it's a little less
clear-cut, although still pretty simple, but it does require knowing
how UTF-8 lays out multibyte characters when defining the escape
scheme, if you want the escaped result to still be valid UTF-8.  And I
do, because otherwise it will become awkward to edit the descriptions
in a text editor.
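
A minimal sketch of the UTF-8 point, assuming a hypothetical escape scheme in which each separator (and the escape character itself) is written as a backslash plus two hex digits. Because the special characters are all ASCII and every byte of a UTF-8 multibyte sequence is 0x80 or above, a byte-level escaper never touches the inside of a non-ASCII character, so the output stays valid UTF-8. The separator set, escape syntax and function names are illustrative assumptions, not anything GDB defines.

```python
# Hypothetical separator set: ':' and ';' between fields, '\n' between
# records, plus '\' because it is the escape character itself.
SPECIAL = b":;\n\\"


def escape_field(raw: bytes) -> bytes:
    """Replace each special ASCII byte with a backslash and two hex digits.

    UTF-8 lead and continuation bytes are all >= 0x80, so they can never
    equal one of the ASCII bytes in SPECIAL and are copied through intact.
    """
    out = bytearray()
    for b in raw:  # iterating over bytes yields ints
        if b in SPECIAL:
            out += b"\\%02x" % b
        else:
            out.append(b)
    return bytes(out)


def unescape_field(field: bytes) -> bytes:
    """Invert escape_field."""
    out = bytearray()
    i = 0
    while i < len(field):
        if field[i] == 0x5C:  # backslash introduces a two-digit hex escape
            out.append(int(field[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(field[i])
            i += 1
    return bytes(out)


# Free-form help text containing both separators and non-ASCII characters.
help_ja = "汎用レジスタ: r0; 引数".encode("utf-8")
record = b":".join([b"r0", escape_field(help_ja)])
name, desc = record.split(b":")  # safe: the escaped field has no ':'
```

The escaped field can still be opened and edited as ordinary UTF-8 text, which is the property asked for above.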

If you want to optionally support encodings other than UTF-8, it
becomes even trickier: eventually you have to know how each field is
encoded.  For us I don't think that's necessary; we can define all
encoded text as UTF-8.  But there's a similar problem if someone wants
to add a descriptive element transferred as a binary blob for some
reason.  I don't have an example of this, but I can certainly accept
that someone will come up with one someday.  Maybe bytecode!
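
For the binary-blob case, one minimal option (an assumption for illustration, not something proposed in this thread) is to hex-encode the bytes, much as GDB's remote protocol already sends memory contents as hex pairs. Hex digits can never collide with the separators, at the cost of doubling the size. The record layout and field names below are hypothetical.

```python
SEPARATORS = b":;\n"


def encode_blob(blob: bytes) -> bytes:
    """Hex-encode arbitrary bytes so they fit in a separator-delimited field."""
    return blob.hex().encode("ascii")


def decode_blob(field: bytes) -> bytes:
    """Invert encode_blob."""
    return bytes.fromhex(field.decode("ascii"))


# Hypothetical bytecode that happens to contain ':', ';' and '\n' bytes.
bytecode = bytes([0x3A, 0x3B, 0x0A, 0x00, 0xFF])

record = b";".join([b"name:example", b"code:" + encode_blob(bytecode)])

# A reader can split on the separators without knowing what "code" means.
fields = dict(f.split(b":", 1) for f in record.split(b";"))
```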

XML has already considered this and solved it.  There are defined ways
to declare a document's encoding, and to escape characters that
would otherwise serve as syntax elements.  You can store arbitrary text,
or byte sequences encoded as text (e.g. base64, as OpenDocument does),
in an element.
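
A minimal sketch of what XML provides out of the box, using Python's `xml.etree.ElementTree` on a hypothetical `<register>` element (the element and attribute names are invented for illustration, not a GDB schema). The serializer declares the encoding and escapes markup characters in the help text, and the binary blob travels as base64 text.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical help text containing XML syntax characters, a newline
# and some non-ASCII text.
help_text = "Flags <N,Z,C,V> & mode bits\n汎用レジスタ"
# Hypothetical binary blob; a raw NUL byte is not allowed in XML text.
blob = bytes([0x00, 0x3C, 0x26, 0xFF])

root = ET.Element("register", name="cpsr")
ET.SubElement(root, "description").text = help_text
ET.SubElement(root, "bytecode", encoding="base64").text = (
    base64.b64encode(blob).decode("ascii"))

# tostring() escapes '<', '>' and '&' in text and, with xml_declaration=True
# (Python 3.8+), emits an <?xml ... encoding=...?> declaration.
doc = ET.tostring(root, encoding="UTF-8", xml_declaration=True)

# Any conforming XML parser recovers the original values.
parsed = ET.fromstring(doc)
description = parsed.find("description").text
recovered = base64.b64decode(parsed.find("bytecode").text)
```

A client that does not understand `<bytecode>` can still parse the document and simply skip that element.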

> > The biggest win of XML, for me, is that there are standard answers to
> > all of these problems and standard tools for editing and
> > checking XML files.
> 
> Is XML the only widely used standard that supports what we want?

I'm sure it isn't, but I think it's the most standardized.  You could,
for instance, do something similar in an RFC-822-style format
("Header: value" lines as in email, for any list readers who aren't
familiar with RFC-822).  It handles multiline values too, but I'm not
sure how well it copes with encoded text.
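
To make the RFC-822 alternative concrete, here is a minimal sketch using Python's standard `email` package on a made-up register description (all field names are hypothetical). Folded continuation lines give multiline values; for non-ASCII text, the mail world's answer is RFC 2047 encoded-words, which `email.header` can produce and decode.

```python
from email.header import Header, decode_header, make_header
from email.parser import HeaderParser

# Hypothetical target-description fields in RFC 822 "Name: value" form.
ja = "汎用レジスタ"
sample = "\n".join([
    "Register: r0",
    "Size: 32",
    "Description: General-purpose register,",
    " also used for the first argument",  # folded continuation line
    # Non-ASCII text carried as an RFC 2047 encoded-word.
    "Description-ja: " + Header(ja, "utf-8").encode(),
    "",  # blank line ends the header block
    "",
])


def unfold(value: str) -> str:
    """Undo RFC 822 folding (a line break followed by whitespace)."""
    return " ".join(part.strip() for part in value.splitlines())


headers = HeaderParser().parsestr(sample)
description = unfold(headers["Description"])
description_ja = str(make_header(decode_header(headers["Description-ja"])))
```

It works, but an encoded-word is not something anyone would enjoy editing by hand in a text editor.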

I'm not a die-hard XML advocate.  In fact, I've never used it for a
new project before, although I'm fairly familiar with it.  If someone
has an alternate representation that they believe is superior, I'm
listening.  What I want to do, however, is draw the line past which
we should use standardized representations instead of ad-hoc ones.

-- 
Daniel Jacobowitz
CodeSourcery

