This is the mail archive of the
gdb@sources.redhat.com
mailing list for the GDB project.
[RFC] File-I/O, target access to host file system via gdb remote protocol enhancement
- From: Corinna Vinschen <vinschen at redhat dot com>
- To: gdb at sources dot redhat dot com
- Date: Mon, 11 Nov 2002 13:13:54 +0100
- Reply-to: gdb at sources dot redhat dot com
Hi,
This RFC introduces a remote protocol enhancement which has
already been implemented at Red Hat. The idea is to allow
the remote target (which likely has no file system of its own)
to access the host file system to store and retrieve data during
a gdb session, as if the host's file system were local to the target.
Basically this means that the gdb stub on the target translates calls
to low-level I/O routines such as open, read, write and close into
requests to gdb on the host machine, which in turn calls these routines
locally on behalf of the target.
A second part of this implementation maps the basic stdio
streams (file descriptors 0-2) to the gdb console, so that
the user can interact with a remote target application.
This should work in gdb's CLI as well as in the GUI.
The existing implementation maps only a handful of useful functions,
but the protocol itself is easily extended to support many more.
We would like to contribute the File-I/O enhancement to the FSF.
The official text follows.
Thanks in advance,
Corinna
==========================================================================
Abstract:
File I/O shall allow the target to use the host's file system
and console I/O when calling various system calls. For that
reason, system calls on the target system are translated
into a remote communication to the host system, which then performs
the needed actions and returns an appropriate response to
the target system. This simulates file system operations even
on file system-less targets. Since remote communication between
GDB and a target system is already well defined, the file I/O
protocol will be part of this already existing GDB remote serial
protocol.
Requirements:
The protocol should be host- and target-system independent.
The protocol can't expect that datatypes, or the values used to
control the exact behaviour of system calls, are identical
on host and target. This requires the protocol to use an
independent representation of datatypes and values. It is
the responsibility of both endpoints (RedBoot on the
target, GDB on the host) to translate the system dependent
values into the unified protocol values when data is transmitted.
The communication is synchronous. A system call is possible only
when GDB is waiting for the continuing or stepping target. While
GDB handles the request for a syscall, the target is stopped to allow
deterministic access to the target's memory.
Therefore file I/O is not interruptible by target signals. It is
possible to interrupt file I/O by a user interrupt (Ctrl-C), though.
The target's request to perform a host system call does not end
the current action. That means that, after finishing the system call,
the target resumes the previous activity (continue, step).
No additional continue or step request from GDB is required:
(gdb) continue
<- target requests 'syscall X'
target is stopped, GDB executes syscall
-> GDB returns result
... target continues, GDB returns to wait for the target
<- target hits breakpoint and sends a Txx packet
The protocol is only used for files on the host file system and
for I/O on the console. Character or block special devices, pipes,
named pipes or sockets or any other communication method on the host
system are not supported by this protocol.
Protocol basics:
The file I/O protocol is part of the already existing GDB remote serial
protocol. It uses the so far unused 'F' packet type for the communication.
Since a file I/O system call can only occur when GDB is waiting
for the continuing or stepping target, the file I/O request is
a new reply that GDB has to expect in response to a preceding 'c',
'C', 's' or 'S' packet.
This 'F' packet contains all information needed to allow GDB to
call the appropriate host system call. In particular, this includes:
- A unique identifier for the requested syscall.
- All parameters to the syscall. Pointers are given as addresses
into the target memory. Pointers to strings are given as a
pointer/length pair. Numerical values are given as they are. Numerical
control values are given in the protocol specific representation.
At that point GDB has to perform the following actions.
- If parameter pointer values are given, which point to data
needed as input to a system call, GDB requests this data
from the target with a standard 'm' packet request. This
additional communication has to be expected by the target
implementation and is handled as any other 'm' packet
communication.
- GDB translates all values from protocol representation to host
representation as needed. Datatypes are coerced into the
host types.
- GDB calls the system call.
- GDB coerces datatypes back to protocol representation.
- If pointer parameters in the request packet point to buffer
space into which the system call is expected to copy data,
the data is transmitted to the target using an 'M' packet.
This packet has to be expected by the target implementation
and is handled as any other 'M' packet communication.
Finally GDB replies with another 'F' packet which contains all
necessary information for the target to continue. This contains
at least
- Return value.
- Errno, if it has been changed by the system call.
- "Ctrl-C" flag.
After having done the needed type and value coercion, the target
resumes the previous continue or step action.
Memory transfer:
Structured data which is transferred using a memory read or write
packet, such as a struct stat, is expected to be in a protocol specific
format with all numerical multibyte datatypes being big endian.
This conversion should be done by the target before the 'F' packet is
sent, or by GDB before it transfers memory to the target. Transferred
pointers to structured data should point to the already coerced data
at any time.
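The big-endian conversion described above can be sketched as follows.
This is only an illustration, not code from the Red Hat implementation;
the helper names fio_put_be32, fio_put_be64 and fio_get_be32 are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: store and load integers in the protocol's
   big-endian byte order, independent of the byte order of the machine
   running this code. */
void fio_put_be32(uint8_t *buf, uint32_t val)
{
    assert(buf != NULL);
    buf[0] = (uint8_t)(val >> 24);
    buf[1] = (uint8_t)(val >> 16);
    buf[2] = (uint8_t)(val >> 8);
    buf[3] = (uint8_t)val;
}

void fio_put_be64(uint8_t *buf, uint64_t val)
{
    assert(buf != NULL);
    fio_put_be32(buf, (uint32_t)(val >> 32));   /* high word first */
    fio_put_be32(buf + 4, (uint32_t)val);
}

uint32_t fio_get_be32(const uint8_t *buf)
{
    assert(buf != NULL);
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
         | ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}
```

Working on single bytes with shifts avoids any dependency on the
endianness of either side, so the same helpers could be used on the
target and on the host.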
The "Ctrl-C" message:
A special case arises if the "Ctrl-C" flag is set in the GDB reply
packet. In this case the target should behave as if it had
received a break message. The meaning for the target is "system
call interrupted by SIGINT". Consequently, the target should
actually stop (as with a break message) and return to GDB with
a "T02" packet. In this case, it's important for the target
to know in which state the system call was interrupted. Since
this action is by design not an atomic operation, we have to
distinguish between two cases.
- The syscall hasn't been performed on the host yet.
- The syscall on the host has been finished.
These two states can be distinguished by the target by the value
of the returned errno. If it's the protocol representation of
EINTR, the syscall hasn't been performed. This is equivalent
to the EINTR handling on POSIX systems. In any other case,
the target may presume that the syscall has been finished --
successful or not -- and should behave as if the break message
arrived right after the syscall.
IMPORTANT: GDB must behave reliably. If the system call has not
been called yet, GDB may send the 'F' reply immediately, setting
EINTR as errno in the packet. If the system call on the host has
been finished before the user requests a break, the full action
must be finished by GDB. This requires sending 'M' packets as
needed. The 'F' packet may only be sent when either nothing has
happened or the full action has been completed.
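On the target side, the two states could be told apart along the
following lines. This is a sketch under assumptions: the enum and the
function name are hypothetical, and FIO_EINTR uses the value 4 listed
for EINTR in Appendix C.

```c
#include <assert.h>
#include <stdbool.h>

#define FIO_EINTR 4   /* protocol representation of EINTR (Appendix C) */

/* Hypothetical classification of an 'F' reply as seen by the target. */
enum fio_break_state {
    FIO_NO_BREAK,            /* no Ctrl-C flag: just continue */
    FIO_BREAK_BEFORE_CALL,   /* host syscall was never performed */
    FIO_BREAK_AFTER_CALL     /* host syscall finished, then break */
};

enum fio_break_state fio_classify_reply(bool ctrl_c, int protocol_errno)
{
    assert(protocol_errno >= 0);   /* protocol errno values are not negative */
    if (!ctrl_c)
        return FIO_NO_BREAK;
    return protocol_errno == FIO_EINTR ? FIO_BREAK_BEFORE_CALL
                                       : FIO_BREAK_AFTER_CALL;
}
```

In both break states the target would then stop and report "T02"; the
distinction only tells it whether the results of the call are valid.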
The 'F' request packet:
The 'F' request packet has the following format:
F<call-id>[,<parameter>]...
<call-id> is the identifier which says which host system call should
be called. This is just the name of the function as listed in
Appendix A.
Parameters are hexadecimal integer values, either the real values
or pointers to target buffer space. These are appended to the
call-id, each separated from its predecessor by a comma. All
values are transmitted in their ASCII string representation,
conforming to the following regular expression:
[+-]?[0-9a-fA-F]+
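As an illustration of the request format, a target stub might build the
payload of a write request like this. The function name is hypothetical
and the sketch produces only the packet payload, without the '$'...'#'
framing and checksum of the remote serial protocol.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format "Fwrite,fd,bufptr,count" with all
   parameters in hexadecimal ASCII. Returns the payload length, or -1
   if the output buffer is too small. */
int fio_format_write_request(char *out, size_t outsz, int fd,
                             unsigned long bufptr, unsigned int count)
{
    int n;

    assert(out != NULL && fd >= 0);
    n = snprintf(out, outsz, "Fwrite,%x,%lx,%x",
                 (unsigned int)fd, bufptr, count);
    return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}
```

Called with fd 3, buffer address 0x1234 and count 6, this yields the
request used in example E.1.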
The 'F' reply packet:
The 'F' reply packet has the following format:
F<retcode>[,<errno>[,<Ctrl-C-Flag>]][;<call specific attachment>]
The call specific attachment isn't used in this first proposal,
but it's intended to allow extensions needed by special, not
yet defined calls. No contents are defined yet.
The parameters have to be transmitted as hexadecimal ASCII strings
as described in the previous section.
<retcode> is the return code of the call as hexadecimal value.
<errno> is the errno set by the call, in protocol specific
representation. This parameter can be omitted if the call
was successful.
<Ctrl-C-Flag> is only sent if the user requested a break. In this
case, the errno must be sent as well, even if the call was successful.
The Ctrl-C flag itself consists of the character 'C':
F0,0,C
or, if the call was interrupted before the host call has been performed:
F-1,4,C
assuming 4 is the protocol specific representation of EINTR.
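A target stub could decode the reply packet roughly as follows. The
struct and function names are hypothetical. Note that strtoll() is more
lenient than the regular expression given for parameters (it also skips
leading white space and accepts a "0x" prefix), so a strict
implementation would validate the digits itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical decoded form of an 'F' reply. */
struct fio_reply {
    long long retcode;   /* return code of the host call */
    long err;            /* protocol errno, 0 if omitted */
    int ctrl_c;          /* 1 if the 'C' flag was present */
};

/* Parse "F<retcode>[,<errno>[,C]][;<attachment>]".
   Returns 0 on success, -1 on a malformed packet. */
int fio_parse_reply(const char *pkt, struct fio_reply *r)
{
    char *end;

    assert(pkt != NULL && r != NULL);
    r->retcode = 0;
    r->err = 0;
    r->ctrl_c = 0;
    if (pkt[0] != 'F')
        return -1;
    r->retcode = strtoll(pkt + 1, &end, 16);   /* optional sign allowed */
    if (end == pkt + 1)
        return -1;                             /* no digits at all */
    if (*end == ',') {
        const char *p = end + 1;
        r->err = strtol(p, &end, 16);
        if (end == p)
            return -1;
        if (*end == ',') {
            if (end[1] != 'C')
                return -1;
            r->ctrl_c = 1;
            end += 2;
        }
    }
    /* Anything after ';' is the (currently undefined) attachment. */
    return (*end == '\0' || *end == ';') ? 0 : -1;
}
```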
Console I/O:
By default, and unless explicitly closed by the target system, the file
descriptors 0, 1 and 2 are connected to the GDB console. Output on the
GDB console is handled as any other file output operation (write(1,...)
or write(2,...)). Console input is handled by GDB so that after the
target's read request from file descriptor 0, all following typing is
buffered until one of the following conditions is met:
- The user presses Ctrl-C. The behaviour is as explained above;
the read() system call is treated as finished.
- The user presses <Enter>. This is treated as end of input with
a trailing line feed.
- The user presses Ctrl-D. This is treated as end of input. No
trailing character, in particular no Ctrl-D, is appended to the input.
If the user has typed more characters than fit in the buffer given to
the read call, the trailing characters are buffered in GDB until
either another read(0,...) is requested by the target or debugging
is stopped at the user's request.
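The handling of surplus console input could look like the following
sketch. It is not taken from GDB; the buffer type, its fixed size and
the function names are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CONSOLE_BUF_SIZE 256   /* arbitrary size for this sketch */

/* Hypothetical holding area for input the user has already typed. */
struct console_buf {
    char data[CONSOLE_BUF_SIZE];
    size_t len;
};

/* Append a finished input line. Returns 0, or -1 if it does not fit. */
int console_push(struct console_buf *cb, const char *line, size_t n)
{
    assert(cb != NULL && line != NULL);
    if (n > CONSOLE_BUF_SIZE - cb->len)
        return -1;
    memcpy(cb->data + cb->len, line, n);
    cb->len += n;
    return 0;
}

/* Serve one read(0, dst, count): hand out at most count bytes and keep
   the remainder for the next read request. */
size_t console_read(struct console_buf *cb, char *dst, size_t count)
{
    size_t n;

    assert(cb != NULL && dst != NULL);
    n = cb->len < count ? cb->len : count;
    memcpy(dst, cb->data, n);
    memmove(cb->data, cb->data + n, cb->len - n);
    cb->len -= n;
    return n;
}
```

If the user types "hello" followed by <Enter> and the target reads with
a count of 4, the first read returns "hell" and the next one returns
"o" plus the line feed.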
The "isatty" call:
A special case in this protocol is the library call isatty(3), which
is implemented as its own call inside of this protocol. It returns
1 to the target if the file descriptor given as parameter is attached
to the GDB console, 0 otherwise. Implementing it through system calls
would require implementing ioctl() and would be more complex than
needed.
The "system" call:
The other special case in this protocol is the system(3) call, which
is implemented as its own call, too. GDB takes over the full
task of calling the necessary host calls to perform the system()
call. The return value of system() is simplified before it's returned
to the target. Basically, the only error condition transmitted back is
EINTR, in case the user pressed Ctrl-C. Otherwise the return value
consists entirely of the exit status of the called command.
Appendix A: List of calls.
All constants are given in their POSIX notation. The usage inside
of protocol packets requires translation from host/target representation
into protocol representation. The values of these constants are given
in Appendix C. The protocol representation of the used datatypes is
given in Appendix B.
A.1 open
Call-Id: open
Synopsis: int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
Request: Fopen,pathptr/len,flags,mode
`flags' is the bitwise or of the following values:
O_CREAT If the file does not exist it will be created. The host
rules apply as far as file ownership and time stamps
are concerned.
O_EXCL When used with O_CREAT, if the file already exists it is
an error and open() fails.
O_TRUNC If the file already exists and the open mode allows writing
(O_RDWR or O_WRONLY is given) it will be truncated to length 0.
O_APPEND The file is opened in append mode.
O_RDONLY The file is opened for reading only.
O_WRONLY The file is opened for writing only.
O_RDWR The file is opened for reading and writing.
Each other bit is silently ignored.
`mode' is the bitwise or of the following values:
S_IRUSR User has read permission.
S_IWUSR User has write permission.
S_IRGRP Group has read permission.
S_IWGRP Group has write permission.
S_IROTH Others have read permission.
S_IWOTH Others have write permission.
Each other bit is silently ignored.
Return value: open returns the new file descriptor or -1 if an error
occurred.
Errors:
EEXIST pathname already exists and O_CREAT and O_EXCL were used.
EISDIR pathname refers to a directory.
EACCES The requested access is not allowed.
ENAMETOOLONG
pathname was too long.
ENOENT A directory component in pathname does not exist.
ENODEV pathname refers to a device, pipe, named pipe or socket.
EROFS pathname refers to a file on a read-only filesystem and
write access was requested.
EFAULT pathname is an invalid pointer value.
ENOSPC No space on device to create the file.
EMFILE The process already has the maximum number of files open.
ENFILE The limit on the total number of files open on the system
has been reached.
EINTR The call was interrupted by the user.
A.2 close
Call-Id: close
Synopsis: int close(int fd);
Request: Fclose,fd
Return value: close returns zero on success, or -1 if an error occurred.
Errors:
EBADF fd isn't a valid open file descriptor.
EINTR The call was interrupted by the user.
A.3 read
Call-Id: read
Synopsis: int read(int fd, void *buf, unsigned int count);
Request: Fread,fd,bufptr,count
Return value: On success, the number of bytes read is returned.
Zero indicates end of file. If count is zero, read
returns zero as well. On error, -1 is returned.
Errors:
EBADF fd is not a valid file descriptor or is not open for reading.
EFAULT buf is an invalid pointer value.
EINTR The call was interrupted by the user.
A.4 write
Call-Id: write
Synopsis: int write(int fd, const void *buf, unsigned int count);
Request: Fwrite,fd,bufptr,count
Return value: On success, the number of bytes written is returned.
Zero indicates nothing was written. On error, -1 is returned.
Errors:
EBADF fd is not a valid file descriptor or is not open for writing.
EFAULT buf is an invalid pointer value.
EFBIG An attempt was made to write a file that exceeds the host
specific maximum file size allowed.
ENOSPC No space on device to write the data.
EINTR The call was interrupted by the user.
A.5 lseek
Call-Id: lseek
Synopsis: long lseek (int fd, long offset, int flag);
Request: Flseek,fd,offset,flag
`flag' is one of:
SEEK_SET The offset is set to offset bytes.
SEEK_CUR The offset is set to its current location plus offset bytes.
SEEK_END The offset is set to the size of the file plus offset bytes.
Return value: On success, the resulting unsigned offset in bytes from the
beginning of the file is returned. Otherwise, a value of -1
is returned.
Errors:
EBADF fd is not a valid open file descriptor.
ESPIPE fd is associated with the GDB console.
EINVAL flag is not a proper value.
EINTR The call was interrupted by the user.
A.6 rename
Call-Id: rename
Synopsis: int rename(const char *oldpath, const char *newpath);
Request: Frename,oldpathptr/len,newpathptr/len
Return value: On success, zero is returned. On error, -1 is returned.
Errors:
EISDIR newpath is an existing directory, but oldpath is not a
directory.
EEXIST newpath is a non-empty directory.
EBUSY oldpath or newpath is a directory that is in use by some
process.
EINVAL An attempt was made to make a directory a subdirectory of
itself.
ENOTDIR A component used as a directory in oldpath or newpath
is not a directory. Or oldpath is a directory and newpath
exists but is not a directory.
EFAULT oldpathptr or newpathptr are invalid pointer values.
EACCES No access to the file or the path of the file.
ENAMETOOLONG
oldpath or newpath was too long.
ENOENT A directory component in oldpath or newpath does not exist.
EROFS The file is on a read-only filesystem.
ENOSPC The device containing the file has no room for the new
directory entry.
EINTR The call was interrupted by the user.
A.7 unlink
Call-Id: unlink
Synopsis: int unlink(const char *pathname);
Request: Funlink,pathnameptr/len
Return value: On success, zero is returned. On error, -1 is returned.
Errors:
EACCES No access to the file or the path of the file.
EPERM The system does not allow unlinking of directories.
EBUSY The file pathname cannot be unlinked because it's
being used by another process.
EFAULT pathnameptr is an invalid pointer value.
ENAMETOOLONG
pathname was too long.
ENOENT A directory component in pathname does not exist.
ENOTDIR A component of the path is not a directory.
EROFS The file is on a read-only filesystem.
EINTR The call was interrupted by the user.
A.8 stat, fstat
Call-Id: stat, fstat
Synopsis: int stat(const char *pathname, struct stat *buf);
int fstat(int fd, struct stat *buf);
Request: Fstat,pathnameptr/len,bufptr
Ffstat,fd,bufptr
Return value: On success, zero is returned. On error, -1 is returned.
Errors:
EBADF fd is not a valid open file descriptor.
ENOENT A directory component in pathname does not exist or the
path is an empty string.
ENOTDIR A component of the path is not a directory.
EFAULT pathnameptr is an invalid pointer value.
EACCES No access to the file or the path of the file.
ENAMETOOLONG
pathname was too long.
EINTR The call was interrupted by the user.
A.9 gettimeofday
Call-Id: gettimeofday
Synopsis: int gettimeofday(struct timeval *tv, void *tz);
Request: Fgettimeofday,tvptr,tzptr
Return value: On success, 0 is returned, -1 otherwise.
Errors:
EINVAL tz is a non-NULL pointer.
EFAULT tvptr and/or tzptr is an invalid pointer value.
A.10 isatty
Call-Id: isatty
Synopsis: int isatty(int fd);
Request: Fisatty,fd
Return value: Returns 1 if fd refers to the GDB console, 0 otherwise.
Errors:
EINTR The call was interrupted by the user.
A.11 system
Call-Id: system
Synopsis: int system(const char *command);
Request: Fsystem,commandptr/len
Return value: The value returned is -1 on error and the return status
of the command otherwise. Only the exit status of the
command is returned, which is extracted from the host's
system() return value by calling WEXITSTATUS(retval).
In case /bin/sh could not be executed, 127 is returned.
Errors:
EINTR The call was interrupted by the user.
Appendix B: Protocol specific representation of datatypes.
B.1 Integral datatypes
The integral datatypes used in the system calls are
int, unsigned int, long, unsigned long, mode_t and time_t.
Int, unsigned int, mode_t and time_t are implemented as 32 bit values
in this protocol.
Long and unsigned long are implemented as 64 bit types.
To allow range checking on host and target, corresponding MIN and MAX
values (similar to those in limits.h) are defined in Appendix C.
B.2 Pointer values
Pointers to target data are transmitted as they are. An exception
is made for pointers to buffers for which the length isn't
transmitted as part of the function call, namely strings. Strings
are transmitted as a pointer/length pair, both as hex values, e.g.
1aaf/12
which is a pointer to data of length 18 bytes at position 0x1aaf.
The length is defined as the full string length in bytes, including
the trailing null byte. Example:
"hello, world" at address 0x123456
is transmitted as
123456/d
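A small sketch of how such a pointer/length parameter could be
formatted. The function name is hypothetical; the length includes the
trailing null byte, as defined above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the "ptr/len" pair for a string that
   lives at target address addr. Returns the number of characters
   written, or -1 if the output buffer is too small. */
int fio_format_string_param(char *out, size_t outsz,
                            unsigned long addr, const char *s)
{
    int n;

    assert(out != NULL && s != NULL);
    n = snprintf(out, outsz, "%lx/%lx",
                 addr, (unsigned long)(strlen(s) + 1));   /* +1 for '\0' */
    return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}
```

For "hello, world" at address 0x123456 this produces 123456/d, matching
the example above (12 characters plus the null byte are 13 = 0xd bytes).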
B.3 struct stat
The buffer of type struct stat used by the target and GDB is defined
as follows:
struct stat {
unsigned int st_dev; /* device */
unsigned int st_ino; /* inode */
mode_t st_mode; /* protection */
unsigned int st_nlink; /* number of hard links */
unsigned int st_uid; /* user ID of owner */
unsigned int st_gid; /* group ID of owner */
unsigned int st_rdev; /* device type (if inode device) */
unsigned long st_size; /* total size, in bytes */
unsigned long st_blksize; /* blocksize for filesystem I/O */
unsigned long st_blocks; /* number of blocks allocated */
time_t st_atime; /* time of last access */
time_t st_mtime; /* time of last modification */
time_t st_ctime; /* time of last change */
};
The integral datatypes conform to the definition in B.1, so this
structure is of size 64 bytes.
The values of several fields have a restricted meaning and/or
range of values.
st_dev: 0 file
1 console
st_ino: No valid meaning for the target. Transmitted unchanged.
st_mode: Valid mode bits are described in Appendix C. Any other
bits have currently no meaning for the target.
st_uid: No valid meaning for the target. Transmitted unchanged.
st_gid: No valid meaning for the target. Transmitted unchanged.
st_rdev: No valid meaning for the target. Transmitted unchanged.
st_atime, st_mtime, st_ctime:
These values have a host and file system dependent
accuracy. Especially on Windows hosts the file systems
don't support exact timing values.
The target gets a struct stat in the above representation and is
responsible for coercing it to the target representation before
continuing.
Note that due to size differences between the host and target
representation of stat members, these members may
get truncated on the target.
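The 64-byte wire layout can be illustrated with a packing routine that
writes each member in order, big endian, using the sizes from B.1 (seven
32-bit members, three 64-bit members, three 32-bit time stamps). The
struct fio_stat_values type and the function names are hypothetical and
not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIO_STAT_SIZE 64

/* Hypothetical holder for already translated stat values. */
struct fio_stat_values {
    uint32_t dev, ino, mode, nlink, uid, gid, rdev;
    uint64_t size, blksize, blocks;
    uint32_t atime, mtime, ctime;
};

/* Write the low nbytes bytes of v in big-endian order, return new position. */
static uint8_t *put_be(uint8_t *p, uint64_t v, int nbytes)
{
    int i;

    for (i = nbytes - 1; i >= 0; i--)
        *p++ = (uint8_t)(v >> (8 * i));
    return p;
}

/* Pack the values into out[FIO_STAT_SIZE]; returns the bytes written. */
size_t fio_pack_stat(uint8_t *out, const struct fio_stat_values *s)
{
    uint8_t *p = out;

    assert(out != NULL && s != NULL);
    p = put_be(p, s->dev, 4);
    p = put_be(p, s->ino, 4);
    p = put_be(p, s->mode, 4);
    p = put_be(p, s->nlink, 4);
    p = put_be(p, s->uid, 4);
    p = put_be(p, s->gid, 4);
    p = put_be(p, s->rdev, 4);
    p = put_be(p, s->size, 8);
    p = put_be(p, s->blksize, 8);
    p = put_be(p, s->blocks, 8);
    p = put_be(p, s->atime, 4);
    p = put_be(p, s->mtime, 4);
    p = put_be(p, s->ctime, 4);
    assert((size_t)(p - out) == FIO_STAT_SIZE);
    return (size_t)(p - out);
}
```

Packing member by member avoids relying on compiler-specific structure
padding, which would otherwise make the in-memory size differ from the
64 bytes given above.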
B.4 struct timeval
The buffer of type struct timeval used by the target and GDB is defined
as follows:
struct timeval {
time_t tv_sec; /* second */
long tv_usec; /* microsecond */
};
The integral datatypes conform to the definition in B.1, so this
structure is of size 12 bytes (4 bytes for tv_sec, 8 bytes for tv_usec).
Appendix C: Constants
The following values are used for the constants inside of the
protocol. GDB and target are responsible for translating these
values before and after the call as needed.
C.1 Open flags
All values are given in hexadecimal representation.
O_RDONLY 0
O_WRONLY 1
O_RDWR 2
O_APPEND 8
O_CREAT 200
O_TRUNC 400
O_EXCL 800
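A sketch of the translation from a system's own open flags into the
protocol values listed above. It assumes a system whose flags come from
a POSIX <fcntl.h>; the FIO_O_* macro names and the function name are
hypothetical.

```c
#include <assert.h>
#include <fcntl.h>

/* Protocol values from C.1 (hexadecimal). */
#define FIO_O_RDONLY 0x0
#define FIO_O_WRONLY 0x1
#define FIO_O_RDWR   0x2
#define FIO_O_APPEND 0x8
#define FIO_O_CREAT  0x200
#define FIO_O_TRUNC  0x400
#define FIO_O_EXCL   0x800

/* Translate local open(2) flags into protocol flags. Bits without a
   protocol equivalent are dropped. */
int fio_flags_from_host(int host_flags)
{
    int f;

    assert(host_flags >= 0);   /* flags are a non-negative bit mask */
    switch (host_flags & O_ACCMODE) {
    case O_WRONLY: f = FIO_O_WRONLY; break;
    case O_RDWR:   f = FIO_O_RDWR;   break;
    default:       f = FIO_O_RDONLY; break;
    }
    if (host_flags & O_APPEND) f |= FIO_O_APPEND;
    if (host_flags & O_CREAT)  f |= FIO_O_CREAT;
    if (host_flags & O_TRUNC)  f |= FIO_O_TRUNC;
    if (host_flags & O_EXCL)   f |= FIO_O_EXCL;
    return f;
}
```

The access mode is handled separately from the other bits because
O_RDONLY, O_WRONLY and O_RDWR are values of one field rather than
independent flags.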
C.2 mode_t values
All values are given in octal representation.
S_IFREG 100000
S_IFDIR 40000
S_IRUSR 400
S_IWUSR 200
S_IXUSR 100
S_IRGRP 40
S_IWGRP 20
S_IXGRP 10
S_IROTH 4
S_IWOTH 2
S_IXOTH 1
C.3 Errno values
All values are given in decimal representation.
EPERM 1
ENOENT 2
EINTR 4
EBADF 9
EACCES 13
EFAULT 14
EBUSY 16
EEXIST 17
ENODEV 19
ENOTDIR 20
EISDIR 21
EINVAL 22
ENFILE 23
EMFILE 24
EFBIG 27
ENOSPC 28
ESPIPE 29
EROFS 30
ENAMETOOLONG 91
EUNKNOWN 9999
EUNKNOWN is used as a fallback error value if a host system returns
any error value not in the list of supported error numbers.
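The errno translation, including the EUNKNOWN fallback, might be written
as below. This is a sketch assuming a POSIX host that defines all of the
listed error names in <errno.h>; the function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>

#define FIO_EUNKNOWN 9999

/* Map a host errno to its protocol value from C.3 (decimal). */
int fio_errno_from_host(int host_errno)
{
    assert(host_errno >= 0);
    switch (host_errno) {
    case 0:            return 0;    /* no error */
    case EPERM:        return 1;
    case ENOENT:       return 2;
    case EINTR:        return 4;
    case EBADF:        return 9;
    case EACCES:       return 13;
    case EFAULT:       return 14;
    case EBUSY:        return 16;
    case EEXIST:       return 17;
    case ENODEV:       return 19;
    case ENOTDIR:      return 20;
    case EISDIR:       return 21;
    case EINVAL:       return 22;
    case ENFILE:       return 23;
    case EMFILE:       return 24;
    case EFBIG:        return 27;
    case ENOSPC:       return 28;
    case ESPIPE:       return 29;
    case EROFS:        return 30;
    case ENAMETOOLONG: return 91;
    default:           return FIO_EUNKNOWN;   /* not in the supported list */
    }
}
```

An explicit table is used even where host and protocol numbers happen to
agree, since the protocol must not depend on the host's numbering.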
C.4 Lseek flags
SEEK_SET 0
SEEK_CUR 1
SEEK_END 2
C.5 Limits
INT_MIN -2147483648
INT_MAX 2147483647
UINT_MAX 4294967295
LONG_MIN -9223372036854775808
LONG_MAX 9223372036854775807
ULONG_MAX 18446744073709551615
Appendix D: GDB setting for system(3)
Because of security concerns about always allowing `system(3)' to be
called on the host, GDB gets an additional setting. The user has to
explicitly allow the system(3) call in the user interface. Otherwise
the system(3) call fails and the target receives the error code EPERM.
The setting is done using the following syntax:
set remote system-call-allowed VAL
with VAL being 0 or 1 for disallowing or allowing the system(3)
call, respectively. The user can view the setting by calling
show remote system-call-allowed
Appendix E: Examples
In the examples below, `<-' indicates data received by GDB from the
target and `->' indicates data transmitted by GDB to the target.
E.1 write call
<- Fwrite,3,1234,6 <== fd=3, bufptr=0x1234, len=6
-> m1234,6 <== read memory from target
<- XXXXXX
-> F6 <== return "6 bytes written"
E.2 read call
<- Fread,3,1234,6 <== fd=3, bufptr=0x1234, len=6
-> M1234,6:XXXXXX <== write syscall result into...
<- OK <== target's memory
-> F6 <== return "6 bytes read"
E.3 read call, call fails on the host due to invalid file descriptor
<- Fread,3,1234,6
-> F-1,9 <== EBADF
E.4 read call, writing data on target fails
<- Fread,3,1234,6
-> M1234,6:XXXXXX
<- Ee
-> F-1,e
E.5 read call, user presses Ctrl-C before syscall on host is called
<- Fread,3,1234,6
... <== M request or not, depends on user
-> F-1,4,C
<- T02
E.6 read call, user presses Ctrl-C after syscall on host is called
<- Fread,3,1234,6
-> M1234,6:XXXXXX
<- OK
-> F6,0,C
<- T02
===========================================================================
--
Corinna Vinschen
Cygwin Developer
Red Hat, Inc.
mailto:vinschen@redhat.com