ld: Canonical format

5.1.2 The BFD canonical object-file format
------------------------------------------

The greatest potential for loss of information occurs when there is the
least overlap between the information provided by the source format,
that stored by the canonical format, and that needed by the destination
format.  A brief description of the canonical form may help you
understand which kinds of data you can count on preserving across
conversions.

_files_
     Information stored on a per-file basis includes the target machine
     architecture, the particular implementation format type, a
     demand-pageable bit, and a write-protected bit.  Unix magic numbers
     themselves are not stored here, only their meaning, so a 'ZMAGIC'
     file would have both the demand-pageable bit and the
     write-protected text bit set.  The byte order of the target is
     stored on a per-file basis, so that big- and little-endian object
     files may be used with one another.

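The way a magic number is reduced to its meaning can be sketched as
follows.  This is a minimal illustration, not BFD's own code: the
'FileFlags' class, its bit values, and the 'canonical_flags' helper are
hypothetical names introduced here.  The octal values are the
traditional 'a.out' magic numbers.

```python
from enum import IntFlag


class FileFlags(IntFlag):
    """Hypothetical per-file flag bits; names and values are illustrative."""
    WRITE_PROTECTED_TEXT = 0x1
    DEMAND_PAGEABLE = 0x2


# Traditional a.out magic numbers (octal).
OMAGIC, NMAGIC, ZMAGIC = 0o407, 0o410, 0o413

# Only the meaning of each magic number is kept, never the number itself.
MAGIC_MEANING = {
    OMAGIC: FileFlags(0),
    NMAGIC: FileFlags.WRITE_PROTECTED_TEXT,
    ZMAGIC: FileFlags.WRITE_PROTECTED_TEXT | FileFlags.DEMAND_PAGEABLE,
}


def canonical_flags(magic: int) -> FileFlags:
    """Return the canonical flag bits implied by an a.out magic number."""
    return MAGIC_MEANING[magic]
```

Because only the flag bits survive, a destination format with its own
way of marking demand-paged files can set that marking without knowing
anything about 'a.out' magic numbers.
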
_sections_
     Each section in the input file contains the name of the section,
     the section's original address in the object file, size and
     alignment information, various flags, and pointers into other BFD
     data structures.

_symbols_
     Each symbol contains a pointer to the information for the object
     file that originally defined it, its name, its value, and various
     flag bits.  When a BFD back end reads in a symbol table, it
     relocates all symbols to make them relative to the base of the
     section where they were defined.  Doing this ensures that each
     symbol points to its containing section.  Each symbol also has a
     varying amount of hidden private data for the BFD back end.  Since
     the symbol points to the original file, the private data format for
     that symbol is accessible.  As a result, 'ld' can operate on a
     collection of symbols of wildly different formats without problems.

     Normal global and simple local symbols are maintained on output, so
     an output file (no matter its format) will retain symbols pointing
     to functions and to global, static, and common variables.  Some
     symbol information is not worth retaining; in 'a.out', type
     information is stored in the symbol table as long symbol names.
     This information would be useless to most COFF debuggers, so the
     linker has command-line switches that let users throw it away.

     There is one word of type information within the symbol, so if the
     format supports symbol type information within symbols (for
     example, COFF, IEEE, Oasys) and the type is simple enough to fit
     within one word (nearly everything but aggregates), the information
     will be preserved.

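The section-relative form of a symbol can be sketched as follows.  This
is a simplified model, not BFD's data layout: the 'Section' and 'Symbol'
classes and both helper functions are hypothetical.  In particular,
finding the section by address range is an assumption made to keep the
example short; a real back end normally learns the section from the
format's own symbol record.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    name: str
    vma: int   # address of the section base
    size: int


@dataclass
class Symbol:
    name: str
    value: int        # offset from the base of `section`
    section: Section  # the containing section


def canonicalize(name: str, address: int, sections: List[Section]) -> Symbol:
    """Turn an absolute symbol address into a section-relative symbol."""
    for sec in sections:
        if sec.vma <= address < sec.vma + sec.size:
            return Symbol(name, address - sec.vma, sec)
    raise ValueError(f"{name}: address {address:#x} lies in no section")


def absolute_address(sym: Symbol) -> int:
    """Recover the absolute address from the section base plus the offset."""
    return sym.section.vma + sym.value
```

With this representation, moving a section only changes the section's
base address; every symbol defined in it follows automatically:

```python
text = Section(".text", 0x1000, 0x200)
main = canonicalize("main", 0x1040, [text])
text.vma = 0x8000                 # the linker places .text elsewhere
print(hex(absolute_address(main)))
```
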
_relocation level_
     Each canonical BFD relocation record contains a pointer to the
     symbol to relocate to, the offset of the data to relocate, the
     section the data is in, and a pointer to a relocation type
     descriptor.  Relocation is performed by passing messages through
     the relocation type descriptor and the symbol pointer.  Therefore,
     relocations can be performed on output data using a relocation
     method that is only available in one of the input formats.  For
     instance, Oasys provides a byte relocation format.  A relocation
     record requesting this relocation type would point indirectly to a
     routine that performs it, so the relocation may be performed on a
     byte being written to a 68k COFF file, even though 68k COFF has no
     such relocation type.

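The indirection through a relocation type descriptor can be sketched as
follows.  Everything here is hypothetical and chosen for illustration:
the 'RelocType' and 'Reloc' classes, the 'perform' function, and the
rule that a byte relocation adds the symbol value to the byte in place
are assumptions, not a description of the Oasys format or of BFD's
internals.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Symbol:
    name: str
    value: int


@dataclass
class RelocType:
    """Relocation type descriptor: carries the routine that does the work."""
    name: str
    apply: Callable[[bytearray, int, int], None]


def apply_byte(data: bytearray, offset: int, value: int) -> None:
    # Assumed semantics: add the symbol value to one byte, wrapping at 8 bits.
    data[offset] = (data[offset] + value) & 0xFF


BYTE_RELOC = RelocType("byte", apply_byte)


@dataclass
class Reloc:
    symbol: Symbol        # symbol to relocate to
    offset: int           # offset of the data to relocate
    section: str          # section the data is in
    type_desc: RelocType  # pointer to the relocation type descriptor


def perform(reloc: Reloc, contents: Dict[str, bytearray]) -> None:
    """Apply a relocation without knowing anything about the output format."""
    data = contents[reloc.section]
    reloc.type_desc.apply(data, reloc.offset, reloc.symbol.value)
```

The code writing the output never inspects the relocation type itself;
it only calls through the descriptor, which is why a type the output
format lacks can still be applied.
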
_line numbers_
     Object formats can contain, for debugging purposes, some form of
     mapping between symbols, source line numbers, and addresses in the
     output file.  These addresses have to be relocated along with the
     symbol information.  Each symbol with an associated list of line
     number records points to the first record of the list.  The head of
     a line number list consists of a pointer to the symbol, which makes
     it possible to find the address of the function whose line numbers
     are being described.  The rest of the list is made up of pairs:
     offsets into the section and line numbers.  Any format that can
     simply derive this information (COFF, IEEE, and Oasys) can pass it
     successfully to another format.

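The shape of a line number list can be sketched as follows.  The
'LineList' class and the 'line_for_offset' lookup are hypothetical and
serve only to show the structure described above: a head that points
back to the symbol, followed by (section offset, line number) pairs.
The lookup assumes the pairs are sorted by increasing offset.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Symbol:
    name: str
    value: int                            # section-relative function address
    linenos: Optional["LineList"] = None  # first record of the list, if any


@dataclass
class LineList:
    symbol: Symbol                        # head: pointer back to the symbol
    pairs: List[Tuple[int, int]] = field(default_factory=list)  # (offset, line)


def line_for_offset(lines: LineList, offset: int) -> Optional[int]:
    """Return the line of the last pair at or before `offset`, if any."""
    best = None
    for pair_offset, line in lines.pairs:
        if pair_offset <= offset:
            best = line
    return best
```

Because the offsets are relative to the section and the head points to
the symbol, relocating the section and its symbols is enough to keep the
mapping valid; the pairs themselves need no rewriting in this model.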