This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
deleting relocs, objcopy and BFD

From: jose dot marchesi at oracle dot com (Jose E. Marchesi)
To: binutils at sourceware dot org
Date: Thu, 27 Apr 2017 17:42:31 +0200
Subject: deleting relocs, objcopy and BFD
Authentication-results: sourceware.org; auth=none
Howdy!  Apologies in advance for the long email :)

The recent commit

commit 1d15e434f43bc41a07bc7b0648fcb7e6ccbe8dcc
Author: Nick Clifton <nickc@redhat.com>
Date:   Thu Apr 13 14:50:56 2017 +0100

    Add note merging to strip and add code to merge stack size notes.

    * objcopy.c: Add --no-merge-notes option to disable note merging.
      Add --[no-]merge-notes option to strip, and enable it by default.
      (num_bytes): New function.
      (merge_gnu_build_notes): Add code to merge stack size notes.
    * binutils.texi: Update strip and objcopy documentation.
    * readelf.c (print_gnu_build_attribute_name): Use defined
      constants for note types.

Introduced a regression in two ELF targets: elf64-sparc and elf64-mips:

FAIL: merge notes section (64-bits)

This happens because objcopy segfauls when relocs get deleted as the
result of removing GNU build notes in a merge:

$ ./objcopy --merge-notes tmpdir/bintest.o tmpdir/copy.o
Segmentation fault

While investigating it I found two different problems: one in the way
BFD handles relocations and another in the way objcopy deletes
relocations.

Let me describe these problems and suggest some solutions.

Problem 1: incoherence with internal relocs and external relocs breaks
the API.

The main fields involved in the handling of relocations in `asection'
structs are (from section.c):

.  {* If an input section, a pointer to a vector of relocation
.     records for the data in this section.  *}
.  struct reloc_cache_entry *relocation;
.
.  {* If an output section, a pointer to a vector of pointers to
.     relocation records for the data in this section.  *}
.  struct reloc_cache_entry **orelocation;
.
.  {* The number of relocation records in one of the above.  *}
.  unsigned reloc_count;

For the vast majority of the targets supported by BFD, the semantics
documented above are correct: there is a 1-1 relationship between
internal relocs (arelents) and the external relocs.

However, there are two ELF targets, sparc64 and mips64, for which the
relationship is different:

- In mips64 every external relocation is implementing using three
  internal relocations.

- In sparc64 all external relocations are implemented using a single
  internal relocation, with the exception of R_SPARC_OLO10, which
  is implemented using two internal relocations.

Due to the above, `sec->reloc_count' really contains the number of
external relocations, and thus:

- In mips64 `sec->relocation' contains `sec->reloc_count * 3' entries.

- In sparc64 `sec->relocation' contains `sec->reloc_count + N' entries,
  where N can only be determined traversing all the SHR_RELA external
  relocations.

- In any other target, `sec->relocation' contains `sec->reloc_count'
  entries.

The ELF backend handles this in two ways in two different areas: linking
and other purposes.

The ELF linker routines use a constant in `struct elf_size_info', which
is defined by the ELF backends:

/* The number of internal relocations to allocate per external
   relocation entry.  */
unsigned char int_rels_per_ext_rel;

Like all the other fields in this struct, `int_rels_per_ext_rel' is a
constant, and it is defined:

elf64-mips: int_rels_per_ext_rel = 3
elf64-sparc: int_rels_per_ext_rel = 1
All other targets: int_rels_per_ext_rel = 1

Note that `1' is right for sparc64 because it looks like the
R_SPARC_OLO10 relocation is not used during linking.  So no problems
there.

For every other purpose (i.e. assembler and binutils) mips64 and sparc64
accomodated their peculiar requirements by providing special versions of
`bfd_canonicalize_reloc'.

`bfd_canonicalize_reloc' is used to get a copy of the `sec->relocation'
array in `*relpp'.  The documentation of this function in bfd.c says:

  "Call the back end associated with the open BFD @var{abfd} and
   translate the external form of the relocation information attached to
   @var{sec} into the internal canonical form.  Place the table into
   memory at @var{loc}, which has been preallocated, usually by a call
   to <<bfd_get_reloc_upper_bound>>.  Returns the number of relocs, or
   -1 on error."

By "the number of relocs" it means the number of arelents stored in
`sec->relocation' after the operation, i.e. the number of internal
relocations.

- The default ELF implementation in elf.c first calls
  `slurp_reloc_table', that is expected to read the SHT_REL and SHT_RELA
  sections, convert from the external ELF relocation entries to internal
  arelents and update `sec->relocation'.

  Then it copies `sec->reloc_count' entries from `sec->relocation' to
  the output argument `relpp' and finally returns `sec->reloc_count' to
  the caller.

- The mips64 ELF target does something slightly different: it also calls
  `slurp_reloc_table' that updates `sec->relocation'.

  Then it copies `sec->reloc_count * 3' entries from `sec->relocation'
  to the output argument `relpp' and finally returns `sec->reloc_count *
  3' to the caller.

- The ELF64 ELF target is also different: it calls `slurp_reloc_table'
  that updates `sec->relocation'.

  Then it copies `elf_section_data(sec)->reloc_count' entries from
  `sec->relocation' to the output argument `relpp' and finally returns
  `elf_section_data(sec)->reloc_count'.

  Note that `bfd_elf_section_data(sec)->reloc_count' is _not_
  `sec->reloc_count'.  It is an additional field added by the SPARC ELF
  backend in elfxx-sparc.h used to store the exact number of arelents
  generated from the external relocs.

Lets now look at `slurp_reloc_table'.  As mentioned above, this function
is expected to scan the SHT_REL and SHT_RELA sections, transform the ELF
relocations to arelents and set the value of `sec->relocation'
accordingly.  mips64 and sparc64 also have to provide specialized
versions of this internal function:

- The default ELF implementation in elfcode.h first checks that
  `sec->relocation' is not NULL.  If it is, it just returns.

  Then it counts the number of relocations stored in SHR_REL
  (reloc_count) and SH_RELA (reloc_count2) for the section, and
  allocates space for `sec->relocation', sets its contents, and return:

  amt = (reloc_count + reloc_count2) * sizeof (arelent);
  relents = (arelent *) bfd_alloc (abfd, amt);
  [...]
  /* Set contents of `relents'.  */
  [...]
  asection->relocation = relents;
  return TRUE;

- The mips64 ELF target also checks `sec->relocation' and happily
  returns if it is NULL.

  Then it counts the number of relocations stored in SHR_REL and
  SHR_RELA for the section, and allocates space for `sec->relocation',
  but it knows that an ELF relocation translates into 3 arelents, so it
  does:

  amt = (reloc_count + reloc_count2) * 3 * sizeof (arelent);
  relents = bfd_alloc (abfd, amt);

  Interestingly, this implementation _do_ update `sec->reloc_count' as
  it scans the sections reading the relocation information.
  Effectively, `sec->reloc_count' is set to `reloc_count +
  reloc_count2'.

  It finally returns the arelents just created:

  asect->relocation = relents;
  return TRUE;

- The sparc64 ELF target also checks `sec->relocation' and happily
  returns if it is NULL.

  But then it doesn't count the number of relocations stored in SHT_REL
  and SHR_RELA to determine the number of relocations in the section,
  but relies on the current contents of `sec->reloc_count' instead
  (note, _not_ `bfd_elf_section_data(sec)->reloc_count' that is
  0 at this point):

  amt = asect->reloc_count;
  amt *= 2 * sizeof (arelent);
  asect->relocation = (arelent *) bfd_alloc (abfd, amt);
           
  This is an upper bound, as the ratio between ELF relocations and
  arelents is not constant (like 1-1 for generic and 1-3 for mips64)
  since only one SPARC relocation type (R_SPARC_OLO10) needs two
  arelents.  So, this implementation updates
  `bfd_elf_section_data(sec)->reloc_count' as it scans and transforms
  the relocations, setting it to the exact number of ELF relocations.

  Then it returns.

So we can conclude:

1. Before being called for the first (and unique) time,
   `sec->reloc_count' is expected to contain the number of external
   relocations associated with the section.

2. After being called, `sec->reloc_count' still contains the number of
   external relocations associated with the section.  In the case of
   sparc64, `bfd_elf_section_data(sec)->reloc_count' also contains the
   exact number of internal relocations generated.  This is needed
   because, unlike in generic and mips64, the ratio is not constant.

Another function for which mips64 and sparc64 have to provide
specialized implementations is `bfd_get_reloc_upper_bound'.  This one is
easy:

- The default ELF implementation returns space for `sec->reloc_count + 1'
  arelents.

- The mips64 ELF implementation returns space for `sec->reloc_count * 3
  + 1' arelents.

- The sparc64 ELF implementation returns space for `sec->reloc_count * 2
  + 1' arelents.  This is a worst-case scenario in which all external
  relocs are R_SPARC_OLO10.

However, there is a problem with `bfd_set_reloc'.  There is a single
implementation that is used for all targets, and all it does is:

void
bfd_set_reloc (bfd *ignore_abfd ATTRIBUTE_UNUSED,
               sec_ptr asect,
               arelent **location,
               unsigned int count)
{
  asect->orelocation = location;
  asect->reloc_count = count;
}
                                                
i.e. it assumes a 1-1 relationship between internal relocs and external
relocs.

In the simple use case:

  relcount = bfd_canonicalize_reloc (ibfd, isec, relpp, sympp);

  bfd_set_reloc (ibfd, isec, relpp, relcount);

  /* ... Work with relocs in `relpp' ...  */

  sec_ptr osec = isec->output_section;
  relcount = bfd_canonicalize_reloc (obfd, osec, relpp, isympp);

  bfd_close (obfd);

`bfd_close' calls `_bfd_elf_write_object_contents (obfd)' for
every section, which in turn calls `elfNN_BE_write_relocs'.

Fortunately, both `elf64_mips_write_relocs' and
`elf64_sparc_write_relocs' interpret `sec->reloc_count' as the number of
_internal relocations_ stored in `sec->orelocation', so all is
good. (even if pretty confusing to the casual reader.)

But now consider this:

  relcount = bfd_canonicalize_reloc (ibfd, isec, relpp, sympp); /* (a) */

  /* ... Work with relocs in `relpp' ... */

  bfd_set_reloc (ibfd, isec, relpp, relcount);                  /* (b) */

  [...]

  relcount = bfd_canonicalize_reloc (ibfd, isec, relpp, sympp); /* (c) */

  /* ... Work with relocs in `relpp' ... */

  sec_ptr osec = isec->output_section;
  bfd_set_reloc (obfd, osec, relpp, relcount);                  /* (d) */

  bfd_close (obfd);

The above code is buggy in both mips64 and sparc64, for related but
slightly different reasons:

- In mips64 it fails because the `sec->reloc_count' set in (b) is the
  number of internal relocations, which is misinterpreted by the
  `bfd_canonicalize_reloc' in (c), that returns a bogus (bigger) count.
  This bogus count is then set in (d), and `bfd_close' will likely
  segfault (or corrupt memory) while writing relocs.

- In sparc64 it fails if the relcount set in (b) or (d) is different
  than the internally saved `bfd_elf_section_data(sec)->reloc_count' the
  first (and only) time `slurp_reloc_table' scans the SHT_RELA relocs.

The above sequence of events is what happens in objcopy when it removes
relocations in `merge_gnu_build_notes':

  relsize = bfd_get_reloc_upper_bound (abfd, sec);

  [...]

  relpp = (arelent **) xmalloc (relsize);
  relcount = bfd_canonicalize_reloc (abfd, sec, relpp, isympp);

  [...]

  for (rel = relpp; rel < relpp + relcount; rel ++)
   {
    [...]
    if ((* rel)->address >= (bfd_vma) ((new + note_size) - new_contents))
        (* rel)->address -= note_size;
    else   
      (* rel)->howto = NULL;
   }

   [...]

   for (rel = relpp; rel < relpp + relcount; rel ++)
     if ((* rel)->howto == NULL)
      {
        /* Delete eliminated relocs.
           FIXME: There are better ways to do this. */
        memmove (rel, rel + 1, ((relcount - (rel - relpp)) - 1) * sizeof (*rel));
        relcount --;
      }
   bfd_set_reloc (abfd, sec, relpp, relcount);

and then later copies the relocations of the merged section in
`copy_relocations_in_section':

  [...]

  relpp = (arelent **) xmalloc (relsize);
  relcount = bfd_canonicalize_reloc (ibfd, isection, relpp, isympp);

  [...]

  bfd_set_reloc (obfd, osection, relcount == 0 ? NULL : relpp, relcount);


So, how to fix this problem?  I can think on two approaches:

a) To document in the BFD API that after `bfd_set_reloc' is invoked on
   some section, it is invalid to use `bfd_canonicalize_reloc' in the
   same section, adding an assert to detect such invalid uses (the
   assert could check for the presence of a non-NULL sec->orelocation).

   Client code will have to save the number of internal relocations in a
   variable to avoid having to call `bfd_canonicalize_reloc' again.

   This would obviously involve to fix objcopy to not call
   `bfd_canonicalize_reloc' twice.

b) To remove the API limitation/bug in BFD, somehow.  Maybe adding
   end-of-list sentinels to `sec->relocation' and `sec->orelocation',
   adjusting `bfd_set_reloc' to install it according to its `relcount'
   argument, leaving `sec->reloc_count' untouched, and also making
   `elfNN_BE_write_relocs' to use the sentinel when writing.

WDYT?  a), b) or something else?  Better ideas for b)?

---

Problem 2: objcopy may break internal relocation sequences while merging
build attribute notes.
  
Both mips64 and sparc64 use sequences of internal relocs that conform a
single external relocation.  When objcopy deletes relocations (as the
result of merged build notes, or while stripping symbols) it may break
some of these sequences.

For example:

1. Build binutils with --target=mips64-unknown-openbsd

2. ./gas/as-new binutils/testsuite/binutils-all/note-2-64.s -o foo.o

3. ./binutils/objcopy --merge-notes foo.o bar.o will segfault.

4. Run the above command under GDB and check that:

   - At the beginning of `merge_gnu_build_notes' the
    .gnu_build_attributes section has 3 external relocations.  This
    makes `bfd_canonicalize_reloc' to return a count of 3 * 3 = 9
    internal relocs.

  - The deletion logic in `merge_gnu_build_notes' then deletes two
    internal relocs, leaving 7.  This sounds like a sequence of three
    internal relocs gets broken.  I am not sure if this really a problem
    (don't know much of mips64) but it is worth a check.

The same problem could happen with the sparc64 R_SPARC_LO10,R_SPARC_13
sequence.  Note that in both the mips64 triads and the sparc64 tuples
all the relocations share the same address.

Thoughts?
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]