as: Xtensa Automatic Alignment

1 
1 9.55.3.2 Automatic Instruction Alignment
1 ........................................
1 
1 The Xtensa assembler will automatically align certain instructions, both
1 to optimize performance and to satisfy architectural requirements.
1 
1    As an optimization to improve performance, the assembler attempts to
1 align branch targets so they do not cross instruction fetch boundaries.
1 (Xtensa processors can be configured with either 32-bit or 64-bit
1 instruction fetch widths.)  An instruction immediately following a call
1 is treated as a branch target in this context, because it will be the
1 target of a return from the call.  This alignment has the potential to
1 reduce branch penalties at some expense in code size.  This optimization
1 is enabled by default.  You can disable it with the '--no-target-align'
1 command-line option (see Command Line Options: Xtensa Options).
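
To make the notion of "crossing an instruction fetch boundary" concrete, the following is a minimal Python sketch of the underlying arithmetic. It is only an illustration, not the assembler's implementation; the function name `crosses_fetch_boundary` and its parameters are hypothetical.

```python
def crosses_fetch_boundary(address: int, size: int, fetch_bytes: int = 4) -> bool:
    """Return True if an instruction of `size` bytes starting at `address`
    spans two fetch words of `fetch_bytes` bytes each.

    fetch_bytes is 4 for a 32-bit fetch width and 8 for a 64-bit one.
    """
    first_word = address // fetch_bytes               # word holding the first byte
    last_word = (address + size - 1) // fetch_bytes   # word holding the last byte
    return first_word != last_word
```

Under these assumptions, a 3-byte instruction at address 0x102 crosses a boundary with a 32-bit fetch width but not with a 64-bit one.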
1 
1    The target alignment optimization is done without adding instructions
1 that could increase the execution time of the program.  If there are
1 density instructions in the code preceding a target, the assembler can
1 change the target alignment by widening some of those instructions to
1 the equivalent 24-bit instructions.  Extra bytes of padding can be
1 inserted immediately following unconditional jump and return
1 instructions.  This approach is usually successful in aligning many, but
1 not all, branch targets.
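
The effect of widening can be sketched as follows. Each density instruction widened from 16 to 24 bits moves the following target forward by one byte, so the question is how many widenings, if any, let the target fit in one fetch word. This is a simplified model under stated assumptions, not the assembler's actual algorithm; `widenings_needed` and its parameters are hypothetical names, and padding after jumps and returns is not modeled.

```python
from typing import Optional


def widenings_needed(target_addr: int, target_size: int,
                     narrow_available: int, fetch_bytes: int = 4) -> Optional[int]:
    """Return the smallest number of 16-bit to 24-bit widenings (one extra
    byte each) that places the target inside a single fetch word, or None
    if the available density instructions are not enough."""
    for extra in range(narrow_available + 1):
        start = target_addr + extra
        # The target fits if its first and last bytes share a fetch word.
        if start // fetch_bytes == (start + target_size - 1) // fetch_bytes:
            return extra
    return None
```

For example, a 3-byte target at 0x102 with two density instructions before it needs both widened; with only one available, this model finds no solution, which matches the observation that not every target can be aligned.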
1 
1    The 'LOOP' family of instructions must be aligned such that the first
1 instruction in the loop body does not cross an instruction fetch
1 boundary (e.g., with a 32-bit fetch width, a 'LOOP' instruction must
1 start at an address congruent to 1 or 2 modulo 4).  The assembler knows
1 about this restriction and inserts the minimal number of 2-byte or
1 3-byte no-op instructions to satisfy it.  When no-op instructions are added, any
1 label immediately preceding the original loop will be moved in order to
1 refer to the loop instruction, not the newly generated no-op
1 instruction.  To preserve binary compatibility across processors with
1 different fetch widths, the assembler conservatively assumes a 32-bit
1 fetch width when aligning 'LOOP' instructions (unless the first
1 instruction in the loop is a 64-bit instruction).
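
The padding rule for the 32-bit case can be illustrated with a short Python sketch. It searches for the fewest 2-byte or 3-byte no-ops that move a 'LOOP' instruction to an address that is 1 or 2 modulo 4. The function name `loop_padding` is hypothetical, and the search is a brute-force illustration rather than the assembler's own method.

```python
from itertools import combinations_with_replacement
from typing import List, Sequence


def loop_padding(address: int, nop_sizes: Sequence[int] = (2, 3)) -> List[int]:
    """Return the sizes of the fewest no-ops to insert before a LOOP at
    `address` so that it starts at 1 or 2 modulo 4 (32-bit fetch width)."""
    for count in range(4):  # try 0, 1, 2, then 3 no-ops
        for combo in combinations_with_replacement(sorted(nop_sizes), count):
            if (address + sum(combo)) % 4 in (1, 2):
                return list(combo)
    raise ValueError("no padding found with the given no-op sizes")
```

In this model a 'LOOP' at an address that is 0 or 3 modulo 4 needs a single 2-byte no-op, while one at 1 or 2 modulo 4 needs none. If only 3-byte no-ops are allowed (`nop_sizes=(3,)`), an address that is 0 modulo 4 needs two of them.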
1 
1    Previous versions of the assembler automatically aligned 'ENTRY'
1 instructions to 4-byte boundaries, but that alignment is now the
1 programmer's responsibility.
1