This is the mail archive of the gdb@sources.redhat.com mailing list for the GDB project.


[RFC] File-I/O, target access to host file system via gdb remote protocol enhancement


Hi,

this RFC introduces a remote protocol enhancement which has
already been implemented at Red Hat.  The idea is to allow
the remote target (which likely has no file system of its own)
to access the host file system to store and retrieve data during
a gdb session, as if the host's file system were local to the target.

Basically this means that the gdb stub on the target translates calls
to low level I/O routines such as open, read, write and close into
requests to gdb on the host machine, which in turn calls these
routines locally on behalf of the target.

A second part of this implementation maps the basic stdio
streams (file descriptors 0-2) to the gdb console, so that
the user can interact with a remote target application.
This should work in gdb's CLI as well as in the GUI.

The existing implementation maps only a handful of useful functions
but the protocol itself is easily expandable to support a lot more.

We would like to contribute the File-I/O enhancement to the FSF.
The official text follows.

Thanks in advance,
Corinna

==========================================================================
Abstract:

  File I/O shall allow the target to use the host's file system
  and console I/O when calling various system calls.  For that
  reason, system calls on the target system are translated into
  a remote request to the host system, which then performs the
  needed actions and returns an appropriate response to the
  target system.  This simulates file system operations even
  on file system-less targets.  Since remote communication between
  GDB and a target system is already well defined, the file I/O
  protocol will be part of this already existing GDB remote serial
  protocol.

Requirements:

  The protocol should be host- and target-system independent.
  The protocol can't expect that datatypes, or the values used to
  control the exact behaviour of system calls, are identical
  on host and target.  This requires the protocol to use an
  independent representation of datatypes and values.  It is
  the responsibility of both endpoints (Redboot on the
  target, GDB on the host) to translate the system dependent
  values into the unified protocol values when data is transmitted.

  The communication is synchronous.  A system call is possible only
  when GDB is waiting for the continuing or stepping target.  While
  GDB handles the request for a syscall, the target is stopped to allow
  deterministic access to the target's memory.
  Therefore file I/O is not interruptible by target signals.  It is
  possible to interrupt file I/O by a user interrupt (Ctrl-C), though.

  The target's request to perform a host system call does not finish
  the latest action.  That means, after finishing the system call,
  the target returns to continuing the previous activity (continue, step).
  No additional continue or step request from GDB is required:

    (gdb) continue

      <- target requests 'syscall X'

      target is stopped, GDB executes syscall

      -> GDB returns result

      ... target continues, GDB returns to wait for the target

      <- target hits breakpoint and sends a Txx packet

  The protocol is only used for files on the host file system and
  for I/O on the console.  Character or block special devices, pipes,
  named pipes or sockets or any other communication method on the host
  system are not supported by this protocol.

Protocol basics:

  The file I/O protocol is part of the already existing GDB remote serial
  protocol.  It uses the previously unused 'F' packet type for the
  communication.

  Since a file I/O system call can only occur when GDB is waiting
  for the continuing or stepping target, the file I/O request is
  a new reply that GDB has to expect as a result of a former 'c',
  'C', 's' or 'S' packet.

  This 'F' packet contains all information needed to allow GDB to
  call the appropriate host system call.  This especially includes:

  - A unique identifier for the requested syscall.
  - All parameters to the syscall.  Pointers are given as addresses
    into the target memory.  Pointers to strings are given as pointer/
    length pair.  Numerical values are given as they are.  Numerical
    control values are given in the protocol specific representation.

  At that point GDB has to perform the following actions:

  - If the parameters include pointers to data needed as input
    to the system call, GDB requests this data from the target
    with a standard 'm' packet request.  The target
    implementation has to expect this additional communication
    and handles it like any other 'm' packet communication.

  - Translate all values from protocol representation to host
    representation as needed.  Datatypes are coerced into the
    host types.

  - Call the requested system call.

  - Coerce datatypes back to protocol representation.

  - If pointer parameters in the request packet point to buffer
    space into which the system call is expected to copy data,
    transmit the data to the target using an 'M' packet.
    The target implementation has to expect this packet and
    handles it like any other 'M' packet communication.

  Finally GDB replies with another 'F' packet which contains all
  necessary information for the target to continue.  This contains
  at least

  - Return value.
  - Errno, if it has been changed by the system call.
  - "Ctrl-C" flag.

  After having done the needed type and value coercion, the target
  continues the latest continue or step action.

Memory transfer:

  Structured data which is transferred using a memory read or write
  packet, such as a struct stat, is expected to be in a protocol specific
  format with all multibyte numerical datatypes in big endian byte order.
  The target has to perform this conversion before it sends the 'F'
  packet, and GDB has to perform it before it transfers memory to the
  target.  Transferred pointers to structured data should point to the
  already coerced data at any time.
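
  The byte order rule above can be sketched with a few helpers.  This
  is only an illustration; the function names are hypothetical and
  not part of the proposal.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of the proposal: store and load
   protocol integers in big endian byte order, independent of the
   byte order of the machine running the code.  */

/* Store a 32 bit value (protocol int, unsigned int, mode_t, time_t).  */
static void
fio_put_be32 (unsigned char *dst, uint32_t value)
{
  dst[0] = (unsigned char) (value >> 24);
  dst[1] = (unsigned char) (value >> 16);
  dst[2] = (unsigned char) (value >> 8);
  dst[3] = (unsigned char) value;
}

/* Store a 64 bit value (protocol long, unsigned long).  */
static void
fio_put_be64 (unsigned char *dst, uint64_t value)
{
  fio_put_be32 (dst, (uint32_t) (value >> 32));
  fio_put_be32 (dst + 4, (uint32_t) value);
}

/* Load a 32 bit value written by fio_put_be32.  */
static uint32_t
fio_get_be32 (const unsigned char *src)
{
  return ((uint32_t) src[0] << 24) | ((uint32_t) src[1] << 16)
         | ((uint32_t) src[2] << 8) | (uint32_t) src[3];
}
```

  Using shifts instead of casting pointers keeps the helpers
  independent of host and target endianness and alignment.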

The "Ctrl-C" message:

  A special case arises if the "Ctrl-C" flag is set in the GDB reply
  packet.  In this case the target should behave as if it had
  received a break message.  The meaning for the target is "system
  call interrupted by SIGINT".  Consequently, the target should
  actually stop (as with a break message) and return to GDB with
  a "T02" packet.  In this case, it's important for the target
  to know in which state the system call was interrupted.  Since
  this action is by design not an atomic operation, we have to
  distinguish between two cases:

  - The syscall hasn't been performed on the host yet.
  - The syscall on the host has been finished.

  These two states can be distinguished by the target by the value
  of the returned errno.  If it's the protocol representation of
  EINTR, the syscall hasn't been performed.  This is equivalent
  to the EINTR handling on POSIX systems.  In any other case,
  the target may presume that the syscall has been finished --
  successful or not -- and should behave as if the break message
  arrived right after the syscall.

  IMPORTANT:  GDB must behave reliably.  If the system call has not
  been called yet, GDB may send the 'F' reply immediately, setting
  EINTR as errno in the packet.  If the system call on the host has
  been finished before the user requests a break, the full action
  must be finished by GDB.  This requires sending 'M' packets as
  needed.  The 'F' packet may only be sent when either nothing has
  happened or the full action has been completed.
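
  On the target side, the decision described in this section could
  look roughly like the following sketch.  The enum, the function
  name and the FIO_EINTR macro are hypothetical; the value 4 for
  EINTR is taken from Appendix C.  The sketch relies on the rule that
  the errno is always present when the Ctrl-C flag is sent.

```c
#include <stdlib.h>

#define FIO_EINTR 4   /* protocol representation of EINTR (Appendix C) */

/* Hypothetical classification of an 'F' reply, as seen by the target.  */
enum fio_outcome
{
  FIO_RESUME,              /* no Ctrl-C flag: resume continue/step */
  FIO_STOP_CALL_NOT_DONE,  /* Ctrl-C, errno is EINTR: call not performed */
  FIO_STOP_CALL_DONE,      /* Ctrl-C, any other errno: call has finished */
  FIO_MALFORMED            /* not a usable 'F' reply */
};

static enum fio_outcome
fio_classify_reply (const char *reply)
{
  char *end;
  long proto_errno = 0;

  if (reply[0] != 'F')
    return FIO_MALFORMED;

  /* The return code is parsed only to find the end of the field.  */
  (void) strtoll (reply + 1, &end, 16);
  if (end == reply + 1)
    return FIO_MALFORMED;

  if (*end == ',')
    {
      const char *start = end + 1;

      proto_errno = strtol (start, &end, 16);
      if (end == start)
        return FIO_MALFORMED;
    }

  if (end[0] == ',' && end[1] == 'C')
    return proto_errno == FIO_EINTR
           ? FIO_STOP_CALL_NOT_DONE : FIO_STOP_CALL_DONE;

  return FIO_RESUME;
}
```

  In both FIO_STOP_* cases the target would stop and send "T02"; the
  two cases differ only in whether the target may assume that the
  system call has taken effect.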

The 'F' request packet:

  The 'F' request packet has the following format:

    F<call-id>[,<parameter>]...

  <call-id> is the identifier which says which host system call should
  be called.  This is just the name of the function as listed in
  Appendix A.

  Parameters are hexadecimal integer values, either the real values
  or pointers to target buffer space.  These are appended to the
  call-id, each separated from its predecessor by a comma.  All
  values are transmitted in their ASCII string representation,
  conforming to the following regular expression

    [+-]?[0-9a-fA-F]+
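
  A parser for a single parameter of this form might look like the
  following sketch.  The function name is hypothetical, and overflow
  checking is left out for brevity.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of the proposal: parse one parameter
   of the form [+-]?[0-9a-fA-F]+ starting at *pp.  On success, store
   the value in *out, advance *pp past the last digit and return 0.
   Return -1 if no hex digit is found.  Overflow is not checked.  */
static int
fio_parse_param (const char **pp, long long *out)
{
  const char *p = *pp;
  int negative = 0;
  int ndigits = 0;
  unsigned long long value = 0;

  if (*p == '+' || *p == '-')
    negative = (*p++ == '-');

  while (isxdigit ((unsigned char) *p))
    {
      int c = tolower ((unsigned char) *p);

      value = value * 16 + (c >= 'a' ? c - 'a' + 10 : c - '0');
      p++;
      ndigits++;
    }

  if (ndigits == 0)
    return -1;

  *out = negative ? -(long long) value : (long long) value;
  *pp = p;
  return 0;
}
```

  A caller would invoke this once per parameter and skip the
  separating comma (or the '/' of a pointer/length pair) in between.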

The 'F' reply packet:

  The 'F' reply packet has the following format:

    F<retcode>[,<errno>[,<Ctrl-C-Flag>]][;<call specific attachment>]

  The call specific attachment isn't used in this first proposal.
  It is reserved for extensions needed by special, not yet defined
  calls.  No contents are defined yet.

  The parameters have to be transmitted as hexadecimal ASCII strings
  as described in the previous section.

  <retcode> is the return code of the call as hexadecimal value.  

  <errno> is the errno set by the call, in protocol specific
  representation.  This parameter can be omitted if the call
  was successful.

  <Ctrl-C-Flag> is only sent if the user requested a break.  In this
  case, the errno must be sent as well, even if the call was successful.
  The Ctrl-C flag itself consists of the character 'C':

    F0,0,C

  or, if the call was interrupted before the host call has been performed:

    F-1,4,C

  assuming 4 is the protocol specific representation of EINTR.
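
  On the GDB side, building such a reply could be sketched as
  follows.  The function name and its parameters are hypothetical.
  The errno is written whenever the Ctrl-C flag is set, as required
  above.

```c
#include <stdio.h>

/* Hypothetical helper, not part of the proposal: format an 'F' reply
   into BUF.  PROTO_ERRNO must already be in protocol representation.
   SEND_ERRNO is nonzero if the call failed; CTRL_C is nonzero if the
   user requested a break, which forces the errno field to be sent.
   Returns the value of snprintf.  */
static int
fio_format_reply (char *buf, size_t size, long long retcode,
                  int proto_errno, int send_errno, int ctrl_c)
{
  /* Print sign and magnitude separately, since %llx is unsigned.  */
  const char *sign = retcode < 0 ? "-" : "";
  unsigned long long mag = retcode < 0
    ? 0ULL - (unsigned long long) retcode
    : (unsigned long long) retcode;

  if (ctrl_c)
    return snprintf (buf, size, "F%s%llx,%x,C", sign, mag,
                     (unsigned int) proto_errno);
  if (send_errno)
    return snprintf (buf, size, "F%s%llx,%x", sign, mag,
                     (unsigned int) proto_errno);
  return snprintf (buf, size, "F%s%llx", sign, mag);
}
```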

Console I/O:

  By default, and unless explicitly closed by the target system, the file
  descriptors 0, 1 and 2 are connected to the GDB console.  Output on the
  GDB console is handled as any other file output operation (write(1,...)
  or write(2,...)).  Console input is handled by GDB so that after the
  target's read request on file descriptor 0 all following typing is
  buffered until one of the following conditions is met:

  - The user presses Ctrl-C.  The behaviour is as explained above,
    the read() system call is treated as finished.

  - The user presses <Enter>.  This is treated as end of input with
    a trailing line feed.

  - The user presses Ctrl-D.  This is treated as end of input.  No
    trailing character, especially no Ctrl-D is appended to the input.

  If the user has typed more characters than fit in the buffer given to
  the read call, the trailing characters are buffered in GDB until
  either another read(0,...) is requested by the target or debugging
  is stopped on the user's request.
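
  The handling of surplus input can be pictured with a small sketch.
  The structure, its fixed buffer size and the function name are
  assumptions made for illustration only; they do not describe GDB's
  actual console code.

```c
#include <string.h>

/* Hypothetical console state: input typed by the user which has not
   yet been handed to the target.  */
struct fio_console
{
  char pending[256];
  size_t len;
};

/* Serve a read(0,...) request of COUNT bytes from the pending input.
   Copies at most COUNT bytes to DST, keeps the remainder for the next
   request and returns the number of bytes copied.  */
static size_t
fio_console_read (struct fio_console *con, char *dst, size_t count)
{
  size_t n = con->len < count ? con->len : count;

  memcpy (dst, con->pending, n);
  /* Move the surplus to the front so a later read starts there.  */
  memmove (con->pending, con->pending + n, con->len - n);
  con->len -= n;
  return n;
}
```

  With "hello" plus a line feed pending and a 4 byte target buffer,
  the first request returns 4 bytes and the second request returns
  the remaining 2.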
  
The "isatty" call:

  A special case in this protocol is the library call isatty(3) which
  is implemented as its own call inside of this protocol.  It returns
  1 to the target if the file descriptor given as parameter is attached
  to the GDB console, 0 otherwise.  Implementing it through system calls
  would require implementing ioctl() and would be more complex than
  needed.

The "system" call:

  The other special case in this protocol is the system(3) call which
  is implemented as its own call, too.  GDB takes over the full
  task of calling the necessary host calls to perform the system()
  call.  The return value of system is simplified before it's returned
  to the target.  Basically, the only signal transmitted back is EINTR
  in case the user pressed Ctrl-C.  Otherwise the return value consists
  entirely of the exit status of the called command.

Appendix A: List of calls.

  All constants are given in their POSIX notation.  The usage inside
  of protocol packets requires translation from host/target representation
  into protocol representation.  The values of these constants are given
  in Appendix C.  The protocol representation of the used datatypes is
  given in Appendix B.

A.1 open

  Call-Id:	open

  Synopsis:	int open(const char *pathname, int flags);
  		int open(const char *pathname, int flags, mode_t mode);

  Request:	Fopen,pathptr/len,flags,mode

    `flags' is the bitwise or of the following values:

    O_CREAT	If the file does not exist it will be created.  The host
    		rules apply as far as file ownership and time stamps
		are concerned.

    O_EXCL	When used with O_CREAT, if the file already exists it is
    		an error and open() fails.

    O_TRUNC	If the file already exists and the open mode allows writing
    		(O_RDWR or O_WRONLY is given) it will be truncated to length 0.

    O_APPEND	The file is opened in append mode.

    O_RDONLY	The file is opened for reading only.

    O_WRONLY	The file is opened for writing only.

    O_RDWR	The file is opened for reading and writing.

    Each other bit is silently ignored.

    `mode' is the bitwise or of the following values:

    S_IRUSR	User has read permission.

    S_IWUSR	User has write permission.

    S_IRGRP	Group has read permission.

    S_IWGRP	Group has write permission.

    S_IROTH	Others have read permission.

    S_IWOTH	Others have write permission.

    Each other bit is silently ignored.

  Return value:	open returns the new file descriptor or -1 if an error
  		occurred.

  Errors:

    EEXIST	pathname already exists and O_CREAT and O_EXCL were used.

    EISDIR	pathname refers to a directory.

    EACCES	The requested access is not allowed.

    ENAMETOOLONG
    		pathname was too long.

    ENOENT	A directory component in pathname does not exist.

    ENODEV	pathname refers to a device, pipe, named pipe or socket.

    EROFS	pathname refers to a file on a read-only filesystem and
    		write access was requested.

    EFAULT	pathname is an invalid pointer value.

    ENOSPC	No space on device to create the file.

    EMFILE	The process already has the maximum number of files open.

    ENFILE	The limit on the total number of files open on the system
    		has been reached.

    EINTR       The call was interrupted by the user.

A.2 close

  Call-Id:	close

  Synopsis:	int close(int fd);

  Request:	Fclose,fd

  Return value:	close returns zero on success, or -1 if an error occurred.

  Errors:

    EBADF	fd isn't a valid open file descriptor.

    EINTR       The call was interrupted by the user.

A.3 read

  Call-Id:      read

  Synopsis:     int read(int fd, void *buf, unsigned int count);

  Request:	Fread,fd,bufptr,count

  Return value:	On success, the number of bytes read is returned.
		Zero indicates end of file.  If count is zero, read
		returns zero as well.  On error, -1 is returned.

  Errors:

    EBADF	fd is not a valid file descriptor or is not open for reading.

    EFAULT	buf is an invalid pointer value.

    EINTR       The call was interrupted by the user.

A.4 write

  Call-Id:      write

  Synopsis:     int write(int fd, const void *buf, unsigned int count);

  Request:	Fwrite,fd,bufptr,count

  Return value:	On success, the number of bytes written is returned.
  		Zero indicates nothing was written.  On error, -1 is returned.

  Errors:

    EBADF	fd is not a valid file descriptor or is not open for writing.

    EFAULT	buf is an invalid pointer value.

    EFBIG	An attempt was made to write a file that exceeds the host
		specific maximum file size allowed.

    ENOSPC	No space on device to write the data.

    EINTR       The call was interrupted by the user.

A.5 lseek

  Call-Id:      lseek

  Synopsis:	long lseek (int fd, long offset, int flag);

  Request:      Flseek,fd,offset,flag

  `flag' is one of:

    SEEK_SET	The offset is set to offset bytes.

    SEEK_CUR	The offset is set to its current location plus offset bytes.

    SEEK_END	The offset is set to the size of the file plus offset bytes.

  Return value: On success, the resulting unsigned offset in bytes from the
  		beginning of the file is returned.  Otherwise, a value of -1
		is returned.

  Errors:

    EBADF	fd is not a valid open file descriptor.

    ESPIPE	fd is associated with the GDB console.

    EINVAL	flag is not a proper value.

    EINTR       The call was interrupted by the user.

A.6 rename

  Call-Id:      rename

  Synopsis:     int rename(const char *oldpath, const char *newpath);

  Request:      Frename,oldpathptr/len,newpathptr/len

  Return value:	On success, zero is returned.  On error, -1 is returned.

  Errors:

    EISDIR	newpath is an existing directory, but oldpath is not a
    		directory.

    EEXIST	newpath is a non-empty directory.

    EBUSY	oldpath or newpath is a directory that is in use by some
    		process.

    EINVAL	An attempt was made to make a directory a subdirectory of
    		itself.

    ENOTDIR	A component used as a directory in oldpath or newpath
    		is not a directory.  Or oldpath is a directory and newpath
		exists but is not a directory.

    EFAULT	oldpathptr or newpathptr are invalid pointer values.

    EACCES	No access to the file or the path of the file.

    ENAMETOOLONG
    		oldpath or newpath was too long.

    ENOENT	A directory component in oldpath or newpath does not exist.

    EROFS	The file is on a read-only filesystem.

    ENOSPC	The device containing the file has no room for the new
		directory entry.

    EINTR       The call was interrupted by the user.

A.7 unlink

  Call-Id:      unlink

  Synopsis:     int unlink(const char *pathname);

  Request:      Funlink,pathnameptr/len

  Return value: On success, zero is returned.  On error, -1 is returned.

  Errors:

    EACCES	No access to the file or the path of the file.

    EPERM	The system does not allow unlinking of directories.

    EBUSY	The file pathname cannot be unlinked because it's
    		being used by another process.

    EFAULT	pathnameptr is an invalid pointer value.

    ENAMETOOLONG
    		pathname was too long.

    ENOENT	A directory component in pathname does not exist.

    ENOTDIR	A component of the path is not a directory.

    EROFS	The file is on a read-only filesystem.

    EINTR       The call was interrupted by the user.

A.8 stat, fstat

  Call-Id:      stat, fstat

  Synopsis:     int stat(const char *pathname, struct stat *buf);
  		int fstat(int fd, struct stat *buf);

  Request:	Fstat,pathnameptr/len,bufptr
  		Ffstat,fd,bufptr

  Return value: On success, zero is returned.  On error, -1 is returned.

  Errors:

    EBADF	fd is not a valid open file descriptor.

    ENOENT	A directory component in pathname does not exist or the
    		path is an empty string.

    ENOTDIR	A component of the path is not a directory.

    EFAULT	pathnameptr is an invalid pointer value.

    EACCES	No access to the file or the path of the file.

    ENAMETOOLONG
    		pathname was too long.

    EINTR       The call was interrupted by the user.

A.9 gettimeofday

  Call-Id:	gettimeofday

  Synopsis:	int gettimeofday(struct timeval *tv, void *tz);

  Request:	Fgettimeofday,tvptr,tzptr

  Return value:	On success, 0 is returned, -1 otherwise.

  Errors:

    EINVAL	tz is a non-NULL pointer.

    EFAULT	tvptr and/or tzptr is an invalid pointer value.

A.10 isatty

  Call-Id:	isatty

  Synopsis:	int isatty(int fd);

  Request:	Fisatty,fd

  Return value:	Returns 1 if fd refers to the GDB console, 0 otherwise.

  Errors:

    EINTR       The call was interrupted by the user.

A.11 system

  Call-Id:	system

  Synopsis:     int system(const char *command);

  Request:      Fsystem,commandptr/len

  Return value: The value returned is -1 on error and the return status
  		of the command otherwise.  Only the exit status of the
		command is returned, which is extracted from the host's
		system return value by calling WEXITSTATUS(retval).
		In case /bin/sh could not be executed, 127 is returned.

  Errors:

    EINTR       The call was interrupted by the user.

Appendix B: Protocol specific representation of datatypes.

B.1 Integral datatypes

  The integral datatypes used in the system calls are

    int, unsigned int, long, unsigned long, mode_t and time_t.

  Int, unsigned int, mode_t and time_t are implemented as 32 bit values
  in this protocol.

  Long and unsigned long are implemented as 64 bit types.
  
  To allow range checking on host and target, corresponding MIN and MAX
  values (similar to those in limits.h) are defined in Appendix C.

B.2 Pointer values

  Pointers to target data are transmitted as they are.  A difference
  is made for pointers to buffers for which the length isn't
  transmitted as part of the function call, namely strings.  Strings
  are transmitted as a pointer/length pair, both as hex values, e. g.

    1aaf/12

  which is a pointer to data of length 18 bytes at position 0x1aaf.
  The length is defined as the full string length in bytes, including
  the trailing null byte.  Example:

    "hello, world" at address 0x123456

  is transmitted as

    123456/d
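
  A sketch of how a stub might produce such a pair is shown below.
  The function name is hypothetical.  The length counts the trailing
  null byte, as defined above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the proposal: format the
   pointer/length pair for the null terminated string S which is
   located at target address ADDR.  Returns the value of snprintf.  */
static int
fio_format_strptr (char *buf, size_t size, unsigned long long addr,
                   const char *s)
{
  /* strlen does not count the trailing null byte, so add one.  */
  return snprintf (buf, size, "%llx/%zx", addr, strlen (s) + 1);
}
```

  For "hello, world" at address 0x123456 this yields 123456/d, since
  the string occupies 12 characters plus the null byte.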

B.3 struct stat

  The buffer of type struct stat used by the target and GDB is defined
  as follows:

    struct stat {
	unsigned int  st_dev;      /* device */
	unsigned int  st_ino;      /* inode */
	mode_t        st_mode;     /* protection */
	unsigned int  st_nlink;    /* number of hard links */
	unsigned int  st_uid;      /* user ID of owner */
	unsigned int  st_gid;      /* group ID of owner */
	unsigned int  st_rdev;     /* device type (if inode device) */
	unsigned long st_size;     /* total size, in bytes */
	unsigned long st_blksize;  /* blocksize for filesystem I/O */
	unsigned long st_blocks;   /* number of blocks allocated */
	time_t        st_atime;    /* time of last access */
	time_t        st_mtime;    /* time of last modification */
	time_t        st_ctime;    /* time of last change */
    };

  The integral datatypes conform to the definition in B.1, so this
  structure is 64 bytes in size, with no padding between the members.

  The values of several fields have a restricted meaning and/or
  range of values.

    st_dev:	0	file
		1	console

    st_ino:	No valid meaning for the target.  Transmitted unchanged.

    st_mode:	Valid mode bits are described in Appendix C.  Any other
    		bits have currently no meaning for the target.

    st_uid:	No valid meaning for the target.  Transmitted unchanged.

    st_gid:	No valid meaning for the target.  Transmitted unchanged.

    st_rdev:	No valid meaning for the target.  Transmitted unchanged.

    st_atime, st_mtime, st_ctime:
    		These values have a host and file system dependent
		accuracy.  Especially on Windows hosts the file systems
		don't support exact timing values.

  The target gets a struct stat of the above representation and is
  responsible for coercing it to the target representation before
  continuing.

  Note that due to size differences between the host and target
  representation of stat members, these members could get truncated
  on the target.
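
  The following sketch shows one way GDB could lay out the 64 byte
  protocol buffer.  The structure and function names are
  hypothetical; the member order and sizes follow the definition of
  struct stat above, with 32 bit and 64 bit members as given in B.1,
  big endian byte order and no padding.

```c
#include <stddef.h>
#include <stdint.h>

#define FIO_STAT_SIZE 64

/* Hypothetical host side copy of the values to transmit.  */
struct fio_stat_values
{
  uint32_t dev, ino, mode, nlink, uid, gid, rdev;
  uint64_t size, blksize, blocks;
  uint32_t atime, mtime, ctime;
};

/* Write the low NBYTES bytes of VALUE at P in big endian byte order
   and return the position following the written bytes.  */
static unsigned char *
fio_put_be (unsigned char *p, uint64_t value, int nbytes)
{
  int i;

  for (i = nbytes - 1; i >= 0; i--)
    *p++ = (unsigned char) (value >> (8 * i));
  return p;
}

/* Pack ST into BUF, which must hold FIO_STAT_SIZE bytes.  Returns the
   number of bytes written.  */
static size_t
fio_pack_stat (unsigned char *buf, const struct fio_stat_values *st)
{
  unsigned char *p = buf;

  p = fio_put_be (p, st->dev, 4);
  p = fio_put_be (p, st->ino, 4);
  p = fio_put_be (p, st->mode, 4);
  p = fio_put_be (p, st->nlink, 4);
  p = fio_put_be (p, st->uid, 4);
  p = fio_put_be (p, st->gid, 4);
  p = fio_put_be (p, st->rdev, 4);
  p = fio_put_be (p, st->size, 8);
  p = fio_put_be (p, st->blksize, 8);
  p = fio_put_be (p, st->blocks, 8);
  p = fio_put_be (p, st->atime, 4);
  p = fio_put_be (p, st->mtime, 4);
  p = fio_put_be (p, st->ctime, 4);
  return (size_t) (p - buf);
}
```

  Packing member by member avoids any dependence on how the host
  compiler pads or aligns its own structures.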

B.4 struct timeval

  The buffer of type struct timeval used by the target and GDB is defined
  as follows:

    struct timeval {
        time_t tv_sec;  /* second */
	long   tv_usec; /* microsecond */
    };

  The integral datatypes conform to the definition in B.1 (time_t is
  32 bit, long is 64 bit), so this structure is of size 12 bytes.

Appendix C: Constants

  The following values are used for the constants inside of the
  protocol.  GDB and target are responsible for translating these
  values before and after the call as needed.

C.1 Open flags

  All values are given in hexadecimal representation.

  O_RDONLY	  0
  O_WRONLY	  1
  O_RDWR	  2
  O_APPEND	  8
  O_CREAT	200
  O_TRUNC	400
  O_EXCL	800
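
  On a POSIX host, the translation from protocol flags to host flags
  could be sketched as follows.  The FIO_* macro names and the
  function name are hypothetical.  Treating the two low bits as the
  access mode, with O_RDWR taking precedence, is an assumption of
  this sketch.

```c
#include <fcntl.h>

/* Protocol values of the open flags, taken from the table above.  */
#define FIO_O_RDONLY 0x0
#define FIO_O_WRONLY 0x1
#define FIO_O_RDWR   0x2
#define FIO_O_APPEND 0x8
#define FIO_O_CREAT  0x200
#define FIO_O_TRUNC  0x400
#define FIO_O_EXCL   0x800

/* Hypothetical helper: translate protocol open flags into the flags
   of the host.  Bits not listed in the table are silently ignored.  */
static int
fio_open_flags_to_host (int pflags)
{
  int hflags;

  if (pflags & FIO_O_RDWR)
    hflags = O_RDWR;
  else if (pflags & FIO_O_WRONLY)
    hflags = O_WRONLY;
  else
    hflags = O_RDONLY;

  if (pflags & FIO_O_APPEND)
    hflags |= O_APPEND;
  if (pflags & FIO_O_CREAT)
    hflags |= O_CREAT;
  if (pflags & FIO_O_TRUNC)
    hflags |= O_TRUNC;
  if (pflags & FIO_O_EXCL)
    hflags |= O_EXCL;

  return hflags;
}
```

  The target needs the mirror image of this function to translate
  its own flag values into protocol values before sending the
  request.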

C.2 mode_t values

  All values are given in octal representation.

  S_IFREG	100000
  S_IFDIR	 40000
  S_IRUSR	   400
  S_IWUSR	   200
  S_IXUSR	   100
  S_IRGRP	    40
  S_IWGRP	    20
  S_IXGRP	    10
  S_IROTH	     4
  S_IWOTH	     2
  S_IXOTH	     1

C.3 Errno values

  All values are given in decimal representation.

  EPERM		  1
  ENOENT	  2
  EINTR		  4
  EBADF		  9
  EACCES	 13
  EFAULT	 14
  EBUSY		 16
  EEXIST	 17
  ENODEV	 19
  ENOTDIR	 20
  EISDIR	 21
  EINVAL	 22
  ENFILE	 23
  EMFILE	 24
  EFBIG		 27
  ENOSPC	 28
  ESPIPE	 29
  EROFS		 30
  ENAMETOOLONG	 91
  EUNKNOWN	 9999

  EUNKNOWN is used as a fallback error value if a host system returns
  any error value not in the list of supported error numbers.
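
  A host side translation including the EUNKNOWN fallback might look
  like this sketch for a POSIX host.  The function and macro names
  are hypothetical.  The function is meant to be called only after
  a failed system call.

```c
#include <errno.h>

#define FIO_EUNKNOWN 9999

/* Hypothetical helper: translate a host errno value into the protocol
   representation listed above.  Any value not in the list is mapped
   to EUNKNOWN.  */
static int
fio_errno_from_host (int host_errno)
{
  switch (host_errno)
    {
    case EPERM:        return 1;
    case ENOENT:       return 2;
    case EINTR:        return 4;
    case EBADF:        return 9;
    case EACCES:       return 13;
    case EFAULT:       return 14;
    case EBUSY:        return 16;
    case EEXIST:       return 17;
    case ENODEV:       return 19;
    case ENOTDIR:      return 20;
    case EISDIR:       return 21;
    case EINVAL:       return 22;
    case ENFILE:       return 23;
    case EMFILE:       return 24;
    case EFBIG:        return 27;
    case ENOSPC:       return 28;
    case ESPIPE:       return 29;
    case EROFS:        return 30;
    case ENAMETOOLONG: return 91;
    default:           return FIO_EUNKNOWN;
    }
}
```

  Note that the table is decimal, while the value is transmitted in
  hexadecimal inside the 'F' reply, so ENAMETOOLONG appears as 5b on
  the wire.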

C.4 Lseek flags

  SEEK_SET	0
  SEEK_CUR	1
  SEEK_END	2

C.5 Limits

  INT_MIN	-2147483648
  INT_MAX	 2147483647
  UINT_MAX	 4294967295
  LONG_MIN	-9223372036854775808
  LONG_MAX	 9223372036854775807
  ULONG_MAX	 18446744073709551615
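
  Range checking against these limits could be sketched as below.
  The macro and function names are hypothetical.  Only the 32 bit
  types need a check here, since the C type used for the argument
  already covers exactly the 64 bit range of the protocol long.

```c
/* Protocol limits of the 32 bit types, taken from the table above.  */
#define FIO_INT_MIN  (-2147483647LL - 1)
#define FIO_INT_MAX  2147483647LL
#define FIO_UINT_MAX 4294967295ULL

/* Hypothetical helper: nonzero if V fits into a protocol int.  */
static int
fio_fits_int (long long v)
{
  return v >= FIO_INT_MIN && v <= FIO_INT_MAX;
}

/* Hypothetical helper: nonzero if V fits into a protocol unsigned int,
   which is also the size of mode_t and time_t in this protocol.  */
static int
fio_fits_uint (unsigned long long v)
{
  return v <= FIO_UINT_MAX;
}
```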

Appendix D: GDB setting for system(3)

  Due to security concerns about always allowing the target to call
  `system(3)' on the host, GDB gets an additional setting.  The user has
  to explicitly allow the system(3) call in the user interface.  Otherwise
  the system(3) call will fail and the target receives the error code EPERM.

  The setting is done using the following syntax:

    set remote system-call-allowed VAL

  with VAL being 0 or 1 for disallowing or allowing the system(3)
  call, respectively.  The user can view the setting by calling

    show remote system-call-allowed

Appendix E: Examples

  In the examples below, `->' and `<-' indicate data transmitted and
  received by GDB, respectively.

E.1 write call

  <- Fwrite,3,1234,6		<== fd=3, bufptr=0x1234, len=6
  -> m1234,6			<== read memory from target
  <- XXXXXX
  -> F6				<== return "6 bytes written"

E.2 read call

  <- Fread,3,1234,6		<== fd=3, bufptr=0x1234, len=6
  -> M1234,6,XXXXXX		<== write syscall result into...
  <- OK				<== target's memory
  -> F6				<== return "6 bytes read"

E.3 read call, call fails on the host due to invalid file descriptor

  <- Fread,3,1234,6
  -> F-1,9			<== EBADF

E.4 read call, writing data on target fails

  <- Fread,3,1234,6
  -> M1234,6,XXXXXX
  <- Ee
  -> F-1,e			<== EFAULT

E.5 read call, user presses Ctrl-C before syscall on host is called

  <- Fread,3,1234,6
  ...				<== M request or not, depends on user
  -> F-1,4,C
  <- T02

E.6 read call, user presses Ctrl-C after syscall on host is called

  <- Fread,3,1234,6
  -> M1234,6,XXXXXX
  <- OK
  -> F6,0,C
  <- T02

===========================================================================

-- 
Corinna Vinschen
Cygwin Developer
Red Hat, Inc.
mailto:vinschen@redhat.com

