This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: [PATCH] fork-child.c: Avoid unnecessary heap-allocation / string copying (Re: [PATCH v5 3/5] C++-fy and prepare for sharing fork_inferior)
- From: Sergio Durigan Junior <sergiodj at redhat dot com>
- To: Pedro Alves <palves at redhat dot com>
- Cc: GDB Patches <gdb-patches at sourceware dot org>
- Date: Thu, 13 Apr 2017 00:31:21 -0400
- Subject: Re: [PATCH] fork-child.c: Avoid unnecessary heap-allocation / string copying (Re: [PATCH v5 3/5] C++-fy and prepare for sharing fork_inferior)
- Authentication-results: sourceware.org; auth=none
- Authentication-results: ext-mx01.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
- Authentication-results: ext-mx01.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=sergiodj at redhat dot com
- Dkim-filter: OpenDKIM Filter v2.11.0 mx1.redhat.com 43F6885376
- Dmarc-filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 43F6885376
- References: <1482464361-4068-1-git-send-email-sergiodj@redhat.com> <20170330014915.4894-1-sergiodj@redhat.com> <20170330014915.4894-4-sergiodj@redhat.com> <f39b6736-db30-93ef-f6f6-83dfe2c28b60@redhat.com> <877f2q3223.fsf@redhat.com> <87o9w2z068.fsf@redhat.com> <36f28afa-1174-df6a-cb54-fdcf129fa816@redhat.com> <878tn5w9d7.fsf@redhat.com> <01d7a597-ac60-9764-142b-bcaefd269271@redhat.com>
On Wednesday, April 12 2017, Pedro Alves wrote:
> On 04/12/2017 11:26 PM, Sergio Durigan Junior wrote:
>> On Wednesday, April 12 2017, Pedro Alves wrote:
>>
>>> On 04/12/2017 06:04 AM, Sergio Durigan Junior wrote:
>>>
>>>>>>> static void
>>>>>>> -breakup_args (char *scratch, char **argv)
>>>>>>> +breakup_args (const std::string &scratch, std::vector<char *> &argv)
>>>>>>> {
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>> +
>>>>>>> + std::string arg = scratch.substr (cur_pos, next_sep - cur_pos);
>>>>>>> +
>>>>>>
>>>>>> This creates a temporary string (heap allocates) ...
>>>>>>
>>>>>>> + argv.push_back (xstrdup (arg.c_str ()));
>>>>>>
>>>>>> ... and here you create yet another copy.
>>>>>>
>>>>>> You should be able to avoid it by using e.g., savestring:
>>>>>>
>>>>>> char *arg = savestring (scratch.c_str () + cur_pos, next_sep - cur_pos);
>>>>>> argv.push_back (arg);
>>>>>
>>>>> Fair enough. I had my mind on "C++-only mode" when writing this code.
>>>
>>> Yup, in C++, it's good to keep unnecessary temporaries
>>> and hidden heap allocations in mind. Actually, now that I look a bit
>>> deeper, I think we can avoid a "premature pessimization" here, keeping
>>> the same level of clarity.
>>>
>>> I think it'd be good to push this patch below. WDYT?
>>
>> Hm, I don't know. I mean, I understand where you're coming from and
>> what you're trying to achieve, but somehow I thought the old code
>> (especially the part that modified the argument string in place, using
>> \0's as separators) was somewhat hacky. Even though the new code uses
>> more memory, it it also more readable, at least IMNSHO. And also more
>> const-correct, since we can make breakup_args_for_exec receive a const
>> as the first argument.
>
> There's nothing wrong with writing to a string in place. A std::string
> is just a _container_ of chars with convenience methods. It's
> been designed to support embedded \0s. And it's not a problem for a
> function to take a copy of a value if it needs to modify it!
> const-correct refers to when functions that _don't need to
> modify an argument nevertheless declare non-const reference/pointer.
Sorry, maybe I should have made myself clearer. I did not mean to say
that there was anything actually wrong with modifying a string in-place;
I just meant that it seems somewhat hacky and more error-prone to me
than separate the string into substrings.
I totally understand that a std::string is just a container, and that it
is OK to modify it if needed. But we're not talking about a generic
string manipulation here; we're talking about inserting \0's on the
string in order to split it. Anyway, I totally understand this is valid
code.
About the const-correctness comment: you're right, I should have said
that the original function is better IMHO because it doesn't modify its
arguments.
> Effectively you can't avoid _some_ creating _some_ new string(s).
> But instead of taking one single copy (O(1)), the current code is making
> many (N) tiny copies. Trips to the generic allocator / malloc should
> be one of the first things to avoid if easy to avoid, as a principle.
Yep, I am aware that the code is more efficient using less strings, but
I thought "OK, how many args will we be dealing with here, usually?",
and I decided that it wasn't worth.
> The patch I had posted is exactly equivalent to the one below, except
> the state/storage in this one is moved to a class. Maybe you'd
> find this clearer? (It'd need some cleaning up for comments
> further at least, but it's getting quite late here...)
I liked your patch. There are some things I'd like to comment, but
overall it is good and I think it does a nice job of separating things
on fork_inferior. As I said before, I won't really make a fuss about
this approach if you think it is the right way to go; my comments were
solely based on my personal taste.
> From 689a881839f9d4aeb36695676856db2eaf8e97ab Mon Sep 17 00:00:00 2001
> From: Pedro Alves <palves@redhat.com>
> Date: Thu, 13 Apr 2017 04:13:22 +0100
> Subject: [PATCH] fork-child.c: Avoid unnecessary heap-allocation / string
> copying
>
> The previous patch converted the argv building from an
> alloca-allocated array of non-owning arg pointers, to a std::vector of
> owning pointers, which results in N string dups, with N being the
> number of arguments in the vector, and then requires manually
> releasing the pointers owned by the vector.
>
> This patch makes the vector hold non-owning pointers, and avoids the
> string dups, by doing one single string copy of the arguments upfront,
> and replacing separators with NULL terminators in place, like we used
> to.
>
> With this, there's no need to remember to call free_vector_argv either.
>
> gdb/ChangeLog:
> yyyy-mm-dd Pedro Alves <palves@redhat.com>
>
> * fork-child.c (breakup_args): Make 'scratch' non-const.
> Replace separators with NULL terminators in place. Change
> type of vector. (fork_inferior): The argument vector now
> holds non-owning pointers. Don't strdup strings into the
> vector. Remove free_vector_argv call.
> ---
> gdb/fork-child.c | 278 ++++++++++++++++++++++++++++++++++---------------------
> 1 file changed, 172 insertions(+), 106 deletions(-)
>
> diff --git a/gdb/fork-child.c b/gdb/fork-child.c
> index 6b7386e..91986f7 100644
> --- a/gdb/fork-child.c
> +++ b/gdb/fork-child.c
> @@ -43,38 +43,96 @@ extern char **environ;
>
> static char *exec_wrapper;
>
> -/* Break up SCRATCH into an argument vector suitable for passing to
> - execvp and store it in ARGV. E.g., on "run a b c d" this routine
> +/* Build the argument vector for execv. */
> +
> +class execv_argv_builder
> +{
> +public:
> + /* EXEC_FILE is the file to run. ALLARGS is a string containing the
> + arguments to the program. If starting with a shell, SHELL_FILE
> + is the shell to run. Otherwise, SHELL_FILE is NULL. */
> + execv_argv_builder (const char *exec_file,
> + const std::string &allargs,
> + const char *shell_file);
> +
> + /* Return a pointer to the built argv, in the type expected by
> + execv. */
> + char **argv ()
> + {
> + /* It is guaranteed that the exec functions do not modify the
> + arguments, but they nevertheless expect "char **", so that's
> + the type we return. */
> + return const_cast<char **> (&m_argv[0]);
> + }
> +
> +private:
> + void build_no_shell (const char *exec_file,
> + const std::string &allargs);
> +
> + void build_shell (const char *exec_file,
> + const std::string &allargs,
> + const char *shell_file);
Missing comments on the functions.
> +
> + /* The argument vector built. Holds non-owning pointers. */
> + std::vector<const char *> m_argv;
OOC, I notice you're prefixing a lot (if not all) of your class
variables with "m_". I understand this is a code convention to signal
when a variable is a member of a class, but are we following this
officially? Because (again!) I'm not really fond of this :-).
> + /* Storage. Arguments in C_ARGV point inside these. */
> +
> + /* For the !shell case. */
> + std::string m_broke_up_args;
If we're going this way, then I'd like to see this comment expanded to
explain how the args are split, i.e., using \0 as the delimiter.
> +
> + /* For the shell case. */
> + std::string m_shell_command;
> +};
> +
> +/* Break up allargs into an argument vector suitable for passing to
> + execvp and store it in M_ARGV. E.g., on "run a b c d" this routine
> would get as input the string "a b c d", and as output it would
> - fill in ARGV with the four arguments "a", "b", "c", "d". */
> + fill in M_ARGV with the four arguments "a", "b", "c", "d". */
>
> -static void
> -breakup_args (const std::string &scratch, std::vector<char *> &argv)
> +void
> +execv_argv_builder::build_no_shell (const char *exec_file,
> + const std::string &allargs)
> {
> - for (size_t cur_pos = 0; cur_pos < scratch.size ();)
> + /* We're going to call execvp. Create argument vector. */
> +
> + /* Save/work with a copy. The pointers pushed to M_ARGV point
> + directly into M_BROKE_UP_ARGS, which is modified in place with
> + the necessary NULL terminators. */
> + m_broke_up_args = allargs;
> +
> + m_argv.push_back (exec_file);
> +
> + for (size_t cur_pos = 0; cur_pos < m_broke_up_args.size ();)
> {
> /* Skip whitespace-like chars. */
> - std::size_t pos = scratch.find_first_not_of (" \t\n", cur_pos);
> + std::size_t pos = m_broke_up_args.find_first_not_of (" \t\n", cur_pos);
>
> if (pos != std::string::npos)
> cur_pos = pos;
>
> /* Find the position of the next separator. */
> - std::size_t next_sep = scratch.find_first_of (" \t\n", cur_pos);
> + std::size_t next_sep = m_broke_up_args.find_first_of (" \t\n", cur_pos);
>
> - /* No separator found, which means this is the last
> - argument. */
> if (next_sep == std::string::npos)
> - next_sep = scratch.size ();
> + {
> + /* No separator found, which means this is the last
> + argument. */
> + next_sep = m_broke_up_args.size ();
> + }
> + else
> + {
> + /* Replace the separator with a terminator. */
> + m_broke_up_args[next_sep++] = '\0';
> + }
>
> - char *arg = savestring (scratch.c_str () + cur_pos, next_sep - cur_pos);
> - argv.push_back (arg);
> + m_argv.push_back (&m_broke_up_args[cur_pos]);
>
> cur_pos = next_sep;
> }
>
> /* NULL-terminate the vector. */
> - argv.push_back (NULL);
> + m_argv.push_back (NULL);
> }
>
> /* When executing a command under the given shell, return non-zero if
> @@ -101,6 +159,101 @@ escape_bang_in_quoted_argument (const char *shell_file)
> return 0;
> }
Missing a comment here explaining the function below (or pointing to the
right place).
> +execv_argv_builder::execv_argv_builder (const char *exec_file,
> + const std::string &allargs,
> + const char *shell_file)
> +{
> + if (shell_file == NULL)
> + build_no_shell (exec_file, allargs);
> + else
> + build_shell (exec_file, allargs, shell_file);
> +}
And here.
> +
> +void
> +execv_argv_builder::build_shell (const char *exec_file,
> + const std::string &allargs,
> + const char *shell_file)
> +{
> + /* We're going to call a shell. */
> + const char *p;
> + int need_to_quote;
> + const int escape_bang = escape_bang_in_quoted_argument (shell_file);
> +
> + m_shell_command = "exec ";
> +
> + /* Add any exec wrapper. That may be a program name with arguments,
> + so the user must handle quoting. */
> + if (exec_wrapper)
> + {
> + m_shell_command += exec_wrapper;
> + m_shell_command += ' ';
> + }
> +
> + /* Now add exec_file, quoting as necessary. */
> +
> + /* Quoting in this style is said to work with all shells. But csh
> + on IRIX 4.0.1 can't deal with it. So we only quote it if we need
> + to. */
> + p = exec_file;
> + while (1)
> + {
> + switch (*p)
> + {
> + case '\'':
> + case '!':
> + case '"':
> + case '(':
> + case ')':
> + case '$':
> + case '&':
> + case ';':
> + case '<':
> + case '>':
> + case ' ':
> + case '\n':
> + case '\t':
> + need_to_quote = 1;
> + goto end_scan;
> +
> + case '\0':
> + need_to_quote = 0;
> + goto end_scan;
> +
> + default:
> + break;
> + }
> + ++p;
> + }
> + end_scan:
> + if (need_to_quote)
> + {
> + m_shell_command += '\'';
> + for (p = exec_file; *p != '\0'; ++p)
> + {
> + if (*p == '\'')
> + m_shell_command += "'\\''";
> + else if (*p == '!' && escape_bang)
> + m_shell_command += "\\!";
> + else
> + m_shell_command += *p;
> + }
> + m_shell_command += '\'';
> + }
> + else
> + m_shell_command += exec_file;
> +
> + m_shell_command += " " + allargs;
> +
> + /* If we decided above to start up with a shell, we exec the shell,
> + "-c" says to interpret the next arg as a shell command to
> + execute, and this command is "exec <target-program> <args>". */
> + m_argv.reserve (4);
> + m_argv.push_back (shell_file);
> + m_argv.push_back ("-c");
> + m_argv.push_back (m_shell_command.c_str ());
> + m_argv.push_back (NULL);
> +}
> +
> /* See inferior.h. */
>
> void
> @@ -155,7 +308,6 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs,
> static const char *exec_file;
> char **save_our_env;
> int shell = 0;
> - std::vector<char *> argv;
I believe you can also remove the 'static' keyword from the 'shell_file'
declaration.
> const char *inferior_io_terminal = get_inferior_io_terminal ();
> struct inferior *inf;
> int i;
> @@ -183,95 +335,9 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs,
> shell = 1;
> }
>
> - if (!shell)
> - {
> - /* We're going to call execvp. Create argument vector. */
> - argv.push_back (xstrdup (exec_file));
> - breakup_args (allargs, argv);
> - }
> - else
> - {
> - /* We're going to call a shell. */
> - std::string shell_command;
> - const char *p;
> - int need_to_quote;
> - const int escape_bang = escape_bang_in_quoted_argument (shell_file);
> -
> - shell_command = std::string ("exec ");
> -
> - /* Add any exec wrapper. That may be a program name with arguments, so
> - the user must handle quoting. */
> - if (exec_wrapper)
> - {
> - shell_command += exec_wrapper;
> - shell_command += ' ';
> - }
> -
> - /* Now add exec_file, quoting as necessary. */
> -
> - /* Quoting in this style is said to work with all shells. But
> - csh on IRIX 4.0.1 can't deal with it. So we only quote it if
> - we need to. */
> - p = exec_file;
> - while (1)
> - {
> - switch (*p)
> - {
> - case '\'':
> - case '!':
> - case '"':
> - case '(':
> - case ')':
> - case '$':
> - case '&':
> - case ';':
> - case '<':
> - case '>':
> - case ' ':
> - case '\n':
> - case '\t':
> - need_to_quote = 1;
> - goto end_scan;
> -
> - case '\0':
> - need_to_quote = 0;
> - goto end_scan;
> -
> - default:
> - break;
> - }
> - ++p;
> - }
> - end_scan:
> - if (need_to_quote)
> - {
> - shell_command += '\'';
> - for (p = exec_file; *p != '\0'; ++p)
> - {
> - if (*p == '\'')
> - shell_command += "'\\''";
> - else if (*p == '!' && escape_bang)
> - shell_command += "\\!";
> - else
> - shell_command += *p;
> - }
> - shell_command += '\'';
> - }
> - else
> - shell_command += exec_file;
> -
> - shell_command += " " + allargs;
> -
> - /* If we decided above to start up with a shell, we exec the
> - shell, "-c" says to interpret the next arg as a shell command
> - to execute, and this command is "exec <target-program>
> - <args>". We xstrdup all the strings here because they will
> - be free'd later in the code. */
> - argv.push_back (xstrdup (shell_file));
> - argv.push_back (xstrdup ("-c"));
> - argv.push_back (xstrdup (shell_command.c_str ()));
> - argv.push_back (NULL);
> - }
> + /* Build the argument vector. */
> + execv_argv_builder argv_builder (exec_file, allargs,
> + shell ? shell_file : NULL);
Good code cleanup!
BTW, it is possible to simplify this part and remove the "shell"
variable completely, using only "shell_file".
>
> /* Retain a copy of our environment variables, since the child will
> replace the value of environ and if we're vforked, we have to
> @@ -376,6 +442,8 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs,
> path to find $SHELL. Rich Pixley says so, and I agree. */
> environ = env;
>
> + char **argv = argv_builder.argv ();
> +
> if (exec_fun != NULL)
> (*exec_fun) (argv[0], &argv[0], env);
> else
> @@ -393,8 +461,6 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs,
> _exit (0177);
> }
>
> - free_vector_argv (argv);
> -
> /* Restore our environment in case a vforked child clob'd it. */
> environ = save_our_env;
>
> --
> 2.5.5
Otherwise, LGTM.
Thanks for doing this,
--
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF 31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/