This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] fork-child.c: Avoid unnecessary heap-allocation / string copying (Re: [PATCH v5 3/5] C++-fy and prepare for sharing fork_inferior)


On Wednesday, April 12 2017, Pedro Alves wrote:

> On 04/12/2017 11:26 PM, Sergio Durigan Junior wrote:
>> On Wednesday, April 12 2017, Pedro Alves wrote:
>> 
>>> On 04/12/2017 06:04 AM, Sergio Durigan Junior wrote:
>>>
>>>>>>>  static void
>>>>>>> -breakup_args (char *scratch, char **argv)
>>>>>>> +breakup_args (const std::string &scratch, std::vector<char *> &argv)
>>>>>>>  {
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>> +
>>>>>>> +      std::string arg = scratch.substr (cur_pos, next_sep - cur_pos);
>>>>>>> +
>>>>>>
>>>>>> This creates a temporary string (heap allocates) ...
>>>>>>
>>>>>>> +      argv.push_back (xstrdup (arg.c_str ()));
>>>>>>
>>>>>> ... and here you create yet another copy.
>>>>>>
>>>>>> You should be able to avoid it by using e.g., savestring:
>>>>>>
>>>>>>     char *arg = savestring (scratch.c_str () + cur_pos, next_sep - cur_pos);
>>>>>>     argv.push_back (arg);
>>>>>
>>>>> Fair enough.  I had my mind on "C++-only mode" when writing this code.
>>>
>>> Yup, in C++, it's good to keep unnecessary temporaries
>>> and hidden heap allocations in mind.  Actually, now that I look a bit
>>> deeper, I think we can avoid a "premature pessimization" here, keeping
>>> the same level of clarity.
>>>
>>> I think it'd be good to push this patch below.  WDYT?
>> 
>> Hm, I don't know.  I mean, I understand where you're coming from and
>> what you're trying to achieve, but somehow I thought the old code
>> (especially the part that modified the argument string in place, using
>> \0's as separators) was somewhat hacky.  Even though the new code uses
>> more memory, it it also more readable, at least IMNSHO.  And also more
>> const-correct, since we can make breakup_args_for_exec receive a const
>> as the first argument.
>
> There's nothing wrong with writing to a string in place.  A std::string
> is just a _container_ of chars with convenience methods.  It's
> been designed to support embedded \0s.  And it's not a problem for a
> function to take a copy of a value if it needs to modify it!
> const-correct refers to when functions that _don't need to
> modify an argument nevertheless declare non-const reference/pointer.

Sorry, maybe I should have made myself clearer.  I did not mean to say
that there was anything actually wrong with modifying a string in-place;
I just meant that it seems somewhat hacky and more error-prone to me
than separate the string into substrings.

I totally understand that a std::string is just a container, and that it
is OK to modify it if needed.  But we're not talking about a generic
string manipulation here; we're talking about inserting \0's on the
string in order to split it.  Anyway, I totally understand this is valid
code.

About the const-correctness comment: you're right, I should have said
that the original function is better IMHO because it doesn't modify its
arguments.

> Effectively you can't avoid _some_ creating _some_ new string(s). 
> But instead of taking one single copy (O(1)), the current code is making
> many (N) tiny copies.  Trips to the generic allocator / malloc should
> be one of the first things to avoid if easy to avoid, as a principle.

Yep, I am aware that the code is more efficient using less strings, but
I thought "OK, how many args will we be dealing with here, usually?",
and I decided that it wasn't worth.

> The patch I had posted is exactly equivalent to the one below, except
> the state/storage in this one is moved to a class.  Maybe you'd
> find this clearer?  (It'd need some cleaning up for comments
> further at least, but it's getting quite late here...)

I liked your patch.  There are some things I'd like to comment, but
overall it is good and I think it does a nice job of separating things
on fork_inferior.  As I said before, I won't really make a fuss about
this approach if you think it is the right way to go; my comments were
solely based on my personal taste.

> From 689a881839f9d4aeb36695676856db2eaf8e97ab Mon Sep 17 00:00:00 2001
> From: Pedro Alves <palves@redhat.com>
> Date: Thu, 13 Apr 2017 04:13:22 +0100
> Subject: [PATCH] fork-child.c: Avoid unnecessary heap-allocation / string
>  copying
>
> The previous patch converted the argv building from an
> alloca-allocated array of non-owning arg pointers, to a std::vector of
> owning pointers, which results in N string dups, with N being the
> number of arguments in the vector, and then requires manually
> releasing the pointers owned by the vector.
>
> This patch makes the vector hold non-owning pointers, and avoids the
> string dups, by doing one single string copy of the arguments upfront,
> and replacing separators with NULL terminators in place, like we used
> to.
>
> With this, there's no need to remember to call free_vector_argv either.
>
> gdb/ChangeLog:
> yyyy-mm-dd  Pedro Alves  <palves@redhat.com>
>
> 	* fork-child.c (breakup_args): Make 'scratch' non-const.
> 	Replace separators with NULL terminators in place.  Change
> 	type of vector.  (fork_inferior): The argument vector now
> 	holds non-owning pointers.  Don't strdup strings into the
> 	vector.  Remove free_vector_argv call.
> ---
>  gdb/fork-child.c | 278 ++++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 172 insertions(+), 106 deletions(-)
>
> diff --git a/gdb/fork-child.c b/gdb/fork-child.c
> index 6b7386e..91986f7 100644
> --- a/gdb/fork-child.c
> +++ b/gdb/fork-child.c
> @@ -43,38 +43,96 @@ extern char **environ;
>  
>  static char *exec_wrapper;
>  
> -/* Break up SCRATCH into an argument vector suitable for passing to
> -   execvp and store it in ARGV.  E.g., on "run a b c d" this routine
> +/* Build the argument vector for execv.  */
> +
> +class execv_argv_builder
> +{
> +public:
> +  /* EXEC_FILE is the file to run.  ALLARGS is a string containing the
> +     arguments to the program.  If starting with a shell, SHELL_FILE
> +     is the shell to run.  Otherwise, SHELL_FILE is NULL.  */
> +  execv_argv_builder (const char *exec_file,
> +		      const std::string &allargs,
> +		      const char *shell_file);
> +
> +  /* Return a pointer to the built argv, in the type expected by
> +     execv.  */
> +  char **argv ()
> +  {
> +    /* It is guaranteed that the exec functions do not modify the
> +       arguments, but they nevertheless expect "char **", so that's
> +       the type we return.  */
> +    return const_cast<char **> (&m_argv[0]);
> +  }
> +
> +private:
> +  void build_no_shell (const char *exec_file,
> +		       const std::string &allargs);
> +
> +  void build_shell (const char *exec_file,
> +		    const std::string &allargs,
> +		    const char *shell_file);

Missing comments on the functions.

> +
> +  /* The argument vector built.  Holds non-owning pointers.  */
> +  std::vector<const char *> m_argv;

OOC, I notice you're prefixing a lot (if not all) of your class
variables with "m_".  I understand this is a code convention to signal
when a variable is a member of a class, but are we following this
officially?  Because (again!) I'm not really fond of this :-).

> +  /* Storage.  Arguments in C_ARGV point inside these.  */
> +
> +  /* For the !shell case.  */
> +  std::string m_broke_up_args;

If we're going this way, then I'd like to see this comment expanded to
explain how the args are split, i.e., using \0 as the delimiter.

> +
> +  /* For the shell case.  */
> +  std::string m_shell_command;

> +};
> +
> +/* Break up allargs into an argument vector suitable for passing to
> +   execvp and store it in M_ARGV.  E.g., on "run a b c d" this routine
>     would get as input the string "a b c d", and as output it would
> -   fill in ARGV with the four arguments "a", "b", "c", "d".  */
> +   fill in M_ARGV with the four arguments "a", "b", "c", "d".  */
>  
> -static void
> -breakup_args (const std::string &scratch, std::vector<char *> &argv)
> +void
> +execv_argv_builder::build_no_shell (const char *exec_file,
> +				    const std::string &allargs)
>  {
> -  for (size_t cur_pos = 0; cur_pos < scratch.size ();)
> +  /* We're going to call execvp.  Create argument vector.  */
> +
> +  /* Save/work with a copy.  The pointers pushed to M_ARGV point
> +     directly into M_BROKE_UP_ARGS, which is modified in place with
> +     the necessary NULL terminators.  */
> +  m_broke_up_args = allargs;
> +
> +  m_argv.push_back (exec_file);
> +
> +  for (size_t cur_pos = 0; cur_pos < m_broke_up_args.size ();)
>      {
>        /* Skip whitespace-like chars.  */
> -      std::size_t pos = scratch.find_first_not_of (" \t\n", cur_pos);
> +      std::size_t pos = m_broke_up_args.find_first_not_of (" \t\n", cur_pos);
>  
>        if (pos != std::string::npos)
>  	cur_pos = pos;
>  
>        /* Find the position of the next separator.  */
> -      std::size_t next_sep = scratch.find_first_of (" \t\n", cur_pos);
> +      std::size_t next_sep = m_broke_up_args.find_first_of (" \t\n", cur_pos);
>  
> -      /* No separator found, which means this is the last
> -	 argument.  */
>        if (next_sep == std::string::npos)
> -	next_sep = scratch.size ();
> +	{
> +	  /* No separator found, which means this is the last
> +	     argument.  */
> +	  next_sep = m_broke_up_args.size ();
> +	}
> +      else
> +	{
> +	  /* Replace the separator with a terminator.  */
> +	  m_broke_up_args[next_sep++] = '\0';
> +	}
>  
> -      char *arg = savestring (scratch.c_str () + cur_pos, next_sep - cur_pos);
> -      argv.push_back (arg);
> +      m_argv.push_back (&m_broke_up_args[cur_pos]);
>  
>        cur_pos = next_sep;
>      }
>  
>    /* NULL-terminate the vector.  */
> -  argv.push_back (NULL);
> +  m_argv.push_back (NULL);
>  }
>  
>  /* When executing a command under the given shell, return non-zero if
> @@ -101,6 +159,101 @@ escape_bang_in_quoted_argument (const char *shell_file)
>    return 0;
>  }

Missing a comment here explaining the function below (or pointing to the
right place).

> +execv_argv_builder::execv_argv_builder (const char *exec_file,
> +					const std::string &allargs,
> +					const char *shell_file)
> +{
> +  if (shell_file == NULL)
> +    build_no_shell (exec_file, allargs);
> +  else
> +    build_shell (exec_file, allargs, shell_file);
> +}

And here.

> +
> +void
> +execv_argv_builder::build_shell (const char *exec_file,
> +				 const std::string &allargs,
> +				 const char *shell_file)
> +{
> +  /* We're going to call a shell.  */
> +  const char *p;
> +  int need_to_quote;
> +  const int escape_bang = escape_bang_in_quoted_argument (shell_file);
> +
> +  m_shell_command = "exec ";
> +
> +  /* Add any exec wrapper.  That may be a program name with arguments,
> +     so the user must handle quoting.  */
> +  if (exec_wrapper)
> +    {
> +      m_shell_command += exec_wrapper;
> +      m_shell_command += ' ';
> +    }
> +
> +  /* Now add exec_file, quoting as necessary.  */
> +
> +  /* Quoting in this style is said to work with all shells.  But csh
> +     on IRIX 4.0.1 can't deal with it.  So we only quote it if we need
> +     to.  */
> +  p = exec_file;
> +  while (1)
> +    {
> +      switch (*p)
> +	{
> +	case '\'':
> +	case '!':
> +	case '"':
> +	case '(':
> +	case ')':
> +	case '$':
> +	case '&':
> +	case ';':
> +	case '<':
> +	case '>':
> +	case ' ':
> +	case '\n':
> +	case '\t':
> +	  need_to_quote = 1;
> +	  goto end_scan;
> +
> +	case '\0':
> +	  need_to_quote = 0;
> +	  goto end_scan;
> +
> +	default:
> +	  break;
> +	}
> +      ++p;
> +    }
> + end_scan:
> +  if (need_to_quote)
> +    {
> +      m_shell_command += '\'';
> +      for (p = exec_file; *p != '\0'; ++p)
> +	{
> +	  if (*p == '\'')
> +	    m_shell_command += "'\\''";
> +	  else if (*p == '!' && escape_bang)
> +	    m_shell_command += "\\!";
> +	  else
> +	    m_shell_command += *p;
> +	}
> +      m_shell_command += '\'';
> +    }
> +  else
> +    m_shell_command += exec_file;
> +
> +  m_shell_command += " " + allargs;
> +
> +  /* If we decided above to start up with a shell, we exec the shell,
> +     "-c" says to interpret the next arg as a shell command to
> +     execute, and this command is "exec <target-program> <args>".  */
> +  m_argv.reserve (4);
> +  m_argv.push_back (shell_file);
> +  m_argv.push_back ("-c");
> +  m_argv.push_back (m_shell_command.c_str ());
> +  m_argv.push_back (NULL);
> +}
> +
>  /* See inferior.h.  */
>  
>  void
> @@ -155,7 +308,6 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs,
>    static const char *exec_file;
>    char **save_our_env;
>    int shell = 0;
> -  std::vector<char *> argv;

I believe you can also remove the 'static' keyword from the 'shell_file'
declaration.

>    const char *inferior_io_terminal = get_inferior_io_terminal ();
>    struct inferior *inf;
>    int i;
> @@ -183,95 +335,9 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs,
>        shell = 1;
>      }
>  
> -  if (!shell)
> -    {
> -      /* We're going to call execvp.  Create argument vector.  */
> -      argv.push_back (xstrdup (exec_file));
> -      breakup_args (allargs, argv);
> -    }
> -  else
> -    {
> -      /* We're going to call a shell.  */
> -      std::string shell_command;
> -      const char *p;
> -      int need_to_quote;
> -      const int escape_bang = escape_bang_in_quoted_argument (shell_file);
> -
> -      shell_command = std::string ("exec ");
> -
> -      /* Add any exec wrapper.  That may be a program name with arguments, so
> -	 the user must handle quoting.  */
> -      if (exec_wrapper)
> -	{
> -	  shell_command += exec_wrapper;
> -	  shell_command += ' ';
> -	}
> -
> -      /* Now add exec_file, quoting as necessary.  */
> -
> -      /* Quoting in this style is said to work with all shells.  But
> -         csh on IRIX 4.0.1 can't deal with it.  So we only quote it if
> -         we need to.  */
> -      p = exec_file;
> -      while (1)
> -	{
> -	  switch (*p)
> -	    {
> -	    case '\'':
> -	    case '!':
> -	    case '"':
> -	    case '(':
> -	    case ')':
> -	    case '$':
> -	    case '&':
> -	    case ';':
> -	    case '<':
> -	    case '>':
> -	    case ' ':
> -	    case '\n':
> -	    case '\t':
> -	      need_to_quote = 1;
> -	      goto end_scan;
> -
> -	    case '\0':
> -	      need_to_quote = 0;
> -	      goto end_scan;
> -
> -	    default:
> -	      break;
> -	    }
> -	  ++p;
> -	}
> -    end_scan:
> -      if (need_to_quote)
> -	{
> -	  shell_command += '\'';
> -	  for (p = exec_file; *p != '\0'; ++p)
> -	    {
> -	      if (*p == '\'')
> -		shell_command += "'\\''";
> -	      else if (*p == '!' && escape_bang)
> -		shell_command += "\\!";
> -	      else
> -		shell_command += *p;
> -	    }
> -	  shell_command += '\'';
> -	}
> -      else
> -	shell_command += exec_file;
> -
> -      shell_command += " " + allargs;
> -
> -      /* If we decided above to start up with a shell, we exec the
> -	 shell, "-c" says to interpret the next arg as a shell command
> -	 to execute, and this command is "exec <target-program>
> -	 <args>".  We xstrdup all the strings here because they will
> -	 be free'd later in the code.  */
> -      argv.push_back (xstrdup (shell_file));
> -      argv.push_back (xstrdup ("-c"));
> -      argv.push_back (xstrdup (shell_command.c_str ()));
> -      argv.push_back (NULL);
> -    }
> +  /* Build the argument vector.  */
> +  execv_argv_builder argv_builder (exec_file, allargs,
> +				   shell ? shell_file : NULL);

Good code cleanup!

BTW, it is possible to simplify this part and remove the "shell"
variable completely, using only "shell_file".

>  
>    /* Retain a copy of our environment variables, since the child will
>       replace the value of environ and if we're vforked, we have to
> @@ -376,6 +442,8 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs,
>           path to find $SHELL.  Rich Pixley says so, and I agree.  */
>        environ = env;
>  
> +      char **argv = argv_builder.argv ();
> +
>        if (exec_fun != NULL)
>          (*exec_fun) (argv[0], &argv[0], env);
>        else
> @@ -393,8 +461,6 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs,
>        _exit (0177);
>      }
>  
> -  free_vector_argv (argv);
> -
>    /* Restore our environment in case a vforked child clob'd it.  */
>    environ = save_our_env;
>  
> -- 
> 2.5.5

Otherwise, LGTM.

Thanks for doing this,

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]