gcc: Extended Asm

1 
1 6.45.2 Extended Asm - Assembler Instructions with C Expression Operands
1 -----------------------------------------------------------------------
1 
1 With extended 'asm' you can read and write C variables from assembler
1 and perform jumps from assembler code to C labels.  Extended 'asm'
1 syntax uses colons (':') to delimit the operand parameters after the
1 assembler template:
1 
1      asm ASM-QUALIFIERS ( ASSEMBLERTEMPLATE
1                       : OUTPUTOPERANDS
1                       [ : INPUTOPERANDS
1                       [ : CLOBBERS ] ])
1 
1      asm ASM-QUALIFIERS ( ASSEMBLERTEMPLATE
1                            :
1                            : INPUTOPERANDS
1                            : CLOBBERS
1                            : GOTOLABELS)
 In the last form, ASM-QUALIFIERS contains 'goto'; in the first form, it
does not.
1 
1  The 'asm' keyword is a GNU extension.  When writing code that can be
1 compiled with '-ansi' and the various '-std' options, use '__asm__'
1 instead of 'asm' (⇒Alternate Keywords).
1 
1 Qualifiers
1 ..........
1 
'volatile'
     The typical use of extended 'asm' statements is to manipulate input
     values to produce output values.  However, your 'asm' statements
     may also produce side effects.  If so, you may need to use the
     'volatile' qualifier to disable certain optimizations.
     ⇒Volatile.

'inline'
     If you use the 'inline' qualifier, then for inlining purposes the
     size of the asm is taken as the smallest size possible
     (⇒Size of an asm).
1 
1 'goto'
1      This qualifier informs the compiler that the 'asm' statement may
1      perform a jump to one of the labels listed in the GOTOLABELS.
1      ⇒GotoLabels.
1 
1 Parameters
1 ..........
1 
1 ASSEMBLERTEMPLATE
1      This is a literal string that is the template for the assembler
1      code.  It is a combination of fixed text and tokens that refer to
1      the input, output, and goto parameters.  ⇒AssemblerTemplate.
1 
1 OUTPUTOPERANDS
1      A comma-separated list of the C variables modified by the
1      instructions in the ASSEMBLERTEMPLATE.  An empty list is permitted.
1      ⇒OutputOperands.
1 
INPUTOPERANDS
     A comma-separated list of C expressions read by the instructions in
     the ASSEMBLERTEMPLATE.  An empty list is permitted.
     ⇒InputOperands.
1 
1 CLOBBERS
1      A comma-separated list of registers or other values changed by the
1      ASSEMBLERTEMPLATE, beyond those listed as outputs.  An empty list
1      is permitted.  ⇒Clobbers and Scratch Registers.
1 
1 GOTOLABELS
1      When you are using the 'goto' form of 'asm', this section contains
1      the list of all C labels to which the code in the ASSEMBLERTEMPLATE
1      may jump.  ⇒GotoLabels.
1 
1      'asm' statements may not perform jumps into other 'asm' statements,
1      only to the listed GOTOLABELS.  GCC's optimizers do not know about
1      other jumps; therefore they cannot take account of them when
1      deciding how to optimize.
1 
1  The total number of input + output + goto operands is limited to 30.
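
 As a minimal sketch of the 'goto' form, the following function tests a
bit and jumps to a C label when the bit is set.  It assumes an x86
target and a compiler that supports 'asm goto'.  The name 'bit_is_set',
the label 'bit_set', and the plain C fallback are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true if bit number 'bit' (0-31) of
   'word' is set.  */
static bool
bit_is_set (unsigned int word, unsigned int bit)
{
#if defined (__x86_64__) || defined (__i386__)
  __asm__ goto ("btl %1, %0\n\t"  /* Copy the selected bit into CF.  */
                "jc %l2"          /* Jump to the C label if CF is set.  */
                : /* No outputs.  */
                : "r" (word), "Ir" (bit)
                : "cc"
                : bit_set);       /* %l2: two inputs precede the label.  */
  return false;

bit_set:
  return true;
#else
  /* Assumed fallback for other targets: plain C.  */
  return (word >> bit) & 1u;
#endif
}
```

 The label is referenced as '%l2' because the two input operands occupy
positions 0 and 1.  Since the statement has no outputs, it is implicitly
volatile.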
1 
1 Remarks
1 .......
1 
1 The 'asm' statement allows you to include assembly instructions directly
1 within C code.  This may help you to maximize performance in
1 time-sensitive code or to access assembly instructions that are not
1 readily available to C programs.
1 
 Note that extended 'asm' statements must be inside a function.  Only
basic 'asm' may be outside functions (⇒Basic Asm).  Functions
declared with the 'naked' attribute also require basic 'asm'
(⇒Function Attributes).
1 
1  While the uses of 'asm' are many and varied, it may help to think of an
1 'asm' statement as a series of low-level instructions that convert input
1 parameters to output parameters.  So a simple (if not particularly
1 useful) example for i386 using 'asm' might look like this:
1 
1      int src = 1;
1      int dst;
1 
1      asm ("mov %1, %0\n\t"
1          "add $1, %0"
1          : "=r" (dst)
1          : "r" (src));
1 
1      printf("%d\n", dst);
1 
 This code copies 'src' to 'dst' and adds 1 to 'dst'.
1 
1 6.45.2.1 Volatile
1 .................
1 
1 GCC's optimizers sometimes discard 'asm' statements if they determine
1 there is no need for the output variables.  Also, the optimizers may
1 move code out of loops if they believe that the code will always return
1 the same result (i.e.  none of its input values change between calls).
1 Using the 'volatile' qualifier disables these optimizations.  'asm'
1 statements that have no output operands, including 'asm goto'
1 statements, are implicitly volatile.
1 
1  This i386 code demonstrates a case that does not use (or require) the
1 'volatile' qualifier.  If it is performing assertion checking, this code
1 uses 'asm' to perform the validation.  Otherwise, 'dwRes' is
1 unreferenced by any code.  As a result, the optimizers can discard the
1 'asm' statement, which in turn removes the need for the entire 'DoCheck'
1 routine.  By omitting the 'volatile' qualifier when it isn't needed you
1 allow the optimizers to produce the most efficient code possible.
1 
1      void DoCheck(uint32_t dwSomeValue)
1      {
1         uint32_t dwRes;
1 
1         // Assumes dwSomeValue is not zero.
1         asm ("bsfl %1,%0"
1           : "=r" (dwRes)
1           : "r" (dwSomeValue)
1           : "cc");
1 
1         assert(dwRes > 3);
1      }
1 
1  The next example shows a case where the optimizers can recognize that
1 the input ('dwSomeValue') never changes during the execution of the
1 function and can therefore move the 'asm' outside the loop to produce
1 more efficient code.  Again, using 'volatile' disables this type of
1 optimization.
1 
1      void do_print(uint32_t dwSomeValue)
1      {
1         uint32_t dwRes;
1 
1         for (uint32_t x=0; x < 5; x++)
1         {
1            // Assumes dwSomeValue is not zero.
1            asm ("bsfl %1,%0"
1              : "=r" (dwRes)
1              : "r" (dwSomeValue)
1              : "cc");
1 
1            printf("%u: %u %u\n", x, dwSomeValue, dwRes);
1         }
1      }
1 
1  The following example demonstrates a case where you need to use the
1 'volatile' qualifier.  It uses the x86 'rdtsc' instruction, which reads
1 the computer's time-stamp counter.  Without the 'volatile' qualifier,
1 the optimizers might assume that the 'asm' block will always return the
1 same value and therefore optimize away the second call.
1 
     uint64_t msr;

     asm volatile ( "rdtsc\n\t"    // Returns the time in EDX:EAX.
             "shl $32, %%rdx\n\t"  // Shift the upper bits left.
             "or %%rdx, %0"        // 'Or' the upper bits into the lower bits.
             : "=a" (msr)
             :
             : "rdx");

     // The cast makes the argument match '%llx' on every target.
     printf("msr: %llx\n", (unsigned long long) msr);

     // Do other work...

     // Reprint the timestamp
     asm volatile ( "rdtsc\n\t"    // Returns the time in EDX:EAX.
             "shl $32, %%rdx\n\t"  // Shift the upper bits left.
             "or %%rdx, %0"        // 'Or' the upper bits into the lower bits.
             : "=a" (msr)
             :
             : "rdx");

     printf("msr: %llx\n", (unsigned long long) msr);
1 
1  GCC's optimizers do not treat this code like the non-volatile code in
1 the earlier examples.  They do not move it out of loops or omit it on
1 the assumption that the result from a previous call is still valid.
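
 The 'rdtsc' example above shifts '%rdx' and therefore only assembles
for x86-64.  As a hedged alternative sketch, the two halves can be
declared as separate outputs and combined in C, which also works on
i386.  The helper name 'read_tsc' and the fallback branch are
assumptions made for illustration.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: returns the time-stamp counter on x86.  */
static inline uint64_t
read_tsc (void)
{
#if defined (__x86_64__) || defined (__i386__)
  uint32_t lo, hi;

  /* 'rdtsc' leaves the low half in EAX and the high half in EDX, so
     both registers are outputs and no clobber list is needed.
     'volatile' keeps repeated calls from being merged or removed.  */
  __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
  return ((uint64_t) hi << 32) | lo;
#else
  /* Assumed fallback for targets without 'rdtsc': wall-clock seconds.  */
  return (uint64_t) time (NULL);
#endif
}
```

 Calling 'read_tsc' twice in a row executes 'rdtsc' twice, because each
'asm volatile' statement is kept.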
1 
1  Note that the compiler can move even volatile 'asm' instructions
1 relative to other code, including across jump instructions.  For
1 example, on many targets there is a system register that controls the
1 rounding mode of floating-point operations.  Setting it with a volatile
1 'asm', as in the following PowerPC example, does not work reliably.
1 
1      asm volatile("mtfsf 255, %0" : : "f" (fpenv));
1      sum = x + y;
1 
1  The compiler may move the addition back before the volatile 'asm'.  To
1 make it work as expected, add an artificial dependency to the 'asm' by
1 referencing a variable in the subsequent code, for example:
1 
1      asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
1      sum = x + y;
1 
1  Under certain circumstances, GCC may duplicate (or remove duplicates
1 of) your assembly code when optimizing.  This can lead to unexpected
1 duplicate symbol errors during compilation if your asm code defines
1 symbols or labels.  Using '%=' (⇒AssemblerTemplate) may help
1 resolve this problem.
1 
1 6.45.2.2 Assembler Template
1 ...........................
1 
1 An assembler template is a literal string containing assembler
1 instructions.  The compiler replaces tokens in the template that refer
1 to inputs, outputs, and goto labels, and then outputs the resulting
1 string to the assembler.  The string can contain any instructions
1 recognized by the assembler, including directives.  GCC does not parse
1 the assembler instructions themselves and does not know what they mean
1 or even whether they are valid assembler input.  However, it does count
1 the statements (⇒Size of an asm).
1 
1  You may place multiple assembler instructions together in a single
1 'asm' string, separated by the characters normally used in assembly code
1 for the system.  A combination that works in most places is a newline to
1 break the line, plus a tab character to move to the instruction field
1 (written as '\n\t').  Some assemblers allow semicolons as a line
1 separator.  However, note that some assembler dialects use semicolons to
1 start a comment.
1 
1  Do not expect a sequence of 'asm' statements to remain perfectly
1 consecutive after compilation, even when you are using the 'volatile'
1 qualifier.  If certain instructions need to remain consecutive in the
1 output, put them in a single multi-instruction asm statement.
1 
1  Accessing data from C programs without using input/output operands
1 (such as by using global symbols directly from the assembler template)
1 may not work as expected.  Similarly, calling functions directly from an
1 assembler template requires a detailed understanding of the target
1 assembler and ABI.
1 
1  Since GCC does not parse the assembler template, it has no visibility
1 of any symbols it references.  This may result in GCC discarding those
1 symbols as unreferenced unless they are also listed as input, output, or
1 goto operands.
1 
1 Special format strings
1 ......................
1 
1 In addition to the tokens described by the input, output, and goto
1 operands, these tokens have special meanings in the assembler template:
1 
1 '%%'
1      Outputs a single '%' into the assembler code.
1 
1 '%='
1      Outputs a number that is unique to each instance of the 'asm'
1      statement in the entire compilation.  This option is useful when
1      creating local labels and referring to them multiple times in a
1      single template that generates multiple assembler instructions.
1 
1 '%{'
1 '%|'
1 '%}'
1      Outputs '{', '|', and '}' characters (respectively) into the
1      assembler code.  When unescaped, these characters have special
1      meaning to indicate multiple assembler dialects, as described
1      below.
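
 As an illustration of '%=', the following sketch defines a local label
inside the template and jumps to it.  It assumes an x86 target and an
assembler that treats names beginning with '.L' as local labels (as the
GNU assembler does for ELF).  The helper name 'clamp_to_zero' and the
label stem 'nonneg' are hypothetical.

```c
/* Hypothetical helper: returns 'v', or 0 if 'v' is negative.  */
static inline int
clamp_to_zero (int v)
{
#if defined (__x86_64__) || defined (__i386__)
  __asm__ ("test %0, %0\n\t"      /* Set SF from the sign of v.  */
           "jns .Lnonneg%=\n\t"   /* Skip the next insn if v >= 0.  */
           "xor %0, %0\n"         /* v = 0.  */
           ".Lnonneg%=:"          /* '%=' makes the label unique.  */
           : "+r" (v)
           : /* No inputs.  */
           : "cc");
  return v;
#else
  /* Assumed fallback for other targets: plain C.  */
  return v < 0 ? 0 : v;
#endif
}
```

 Without '%=', inlining this function at two call sites would emit the
same label twice and the assembler would report a duplicate symbol.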
1 
1 Multiple assembler dialects in 'asm' templates
1 ..............................................
1 
On targets such as x86, GCC supports multiple assembler dialects.  The
'-masm' option controls which dialect GCC uses as its default for inline
assembler.  The target-specific documentation for the '-masm' option
contains the list of supported dialects, as well as the default dialect
if the option is not specified.  This information may be important to
understand, since assembler code that works correctly when compiled
using one dialect will likely fail if compiled using another.
⇒x86 Options.
1 
1  If your code needs to support multiple assembler dialects (for example,
1 if you are writing public headers that need to support a variety of
1 compilation options), use constructs of this form:
1 
1      { dialect0 | dialect1 | dialect2... }
1 
1  This construct outputs 'dialect0' when using dialect #0 to compile the
1 code, 'dialect1' for dialect #1, etc.  If there are fewer alternatives
1 within the braces than the number of dialects the compiler supports, the
1 construct outputs nothing.
1 
1  For example, if an x86 compiler supports two dialects ('att', 'intel'),
1 an assembler template such as this:
1 
1      "bt{l %[Offset],%[Base] | %[Base],%[Offset]}; jc %l2"
1 
1 is equivalent to one of
1 
1      "btl %[Offset],%[Base] ; jc %l2"   /* att dialect */
1      "bt %[Base],%[Offset]; jc %l2"     /* intel dialect */
1 
1  Using that same compiler, this code:
1 
1      "xchg{l}\t{%%}ebx, %1"
1 
1 corresponds to either
1 
1      "xchgl\t%%ebx, %1"                 /* att dialect */
1      "xchg\tebx, %1"                    /* intel dialect */
1 
1  There is no support for nesting dialect alternatives.
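
 The dialect construct can be sketched in a complete function as
follows.  This is an illustration only: it assumes an x86 target with
the two dialects 'att' and 'intel', and the helper name 'add_u32' is
hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + b (modulo 2^32).  */
static inline uint32_t
add_u32 (uint32_t a, uint32_t b)
{
#if defined (__x86_64__) || defined (__i386__)
  /* att dialect:   addl %1, %0   (source operand first)
     intel dialect: add %0, %1    (destination operand first)  */
  __asm__ ("add{l %1, %0| %0, %1}"
           : "+r" (a)
           : "r" (b)
           : "cc");
  return a;
#else
  /* Assumed fallback for other targets: plain C.  */
  return a + b;
#endif
}
```

 The same source should then assemble with either '-masm=att' or
'-masm=intel', since each dialect only sees its own alternative.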
1 
1 6.45.2.3 Output Operands
1 ........................
1 
1 An 'asm' statement has zero or more output operands indicating the names
1 of C variables modified by the assembler code.
1 
1  In this i386 example, 'old' (referred to in the template string as
1 '%0') and '*Base' (as '%1') are outputs and 'Offset' ('%2') is an input:
1 
1      bool old;
1 
1      __asm__ ("btsl %2,%1\n\t" // Turn on zero-based bit #Offset in Base.
1               "sbb %0,%0"      // Use the CF to calculate old.
1         : "=r" (old), "+rm" (*Base)
1         : "Ir" (Offset)
1         : "cc");
1 
1      return old;
1 
1  Operands are separated by commas.  Each operand has this format:
1 
1      [ [ASMSYMBOLICNAME] ] CONSTRAINT (CVARIABLENAME)
1 
1 ASMSYMBOLICNAME
1      Specifies a symbolic name for the operand.  Reference the name in
1      the assembler template by enclosing it in square brackets (i.e.
1      '%[Value]').  The scope of the name is the 'asm' statement that
1      contains the definition.  Any valid C variable name is acceptable,
1      including names already defined in the surrounding code.  No two
1      operands within the same 'asm' statement can use the same symbolic
1      name.
1 
1      When not using an ASMSYMBOLICNAME, use the (zero-based) position of
1      the operand in the list of operands in the assembler template.  For
1      example if there are three output operands, use '%0' in the
1      template to refer to the first, '%1' for the second, and '%2' for
1      the third.
1 
1 CONSTRAINT
1      A string constant specifying constraints on the placement of the
1      operand; ⇒Constraints, for details.
1 
     Output constraints must begin with either '=' (a variable
     overwriting an existing value) or '+' (when reading and writing).
     When using '=', do not assume the location contains the existing
     value on entry to the 'asm', except when the operand is tied to an
     input; ⇒Input Operands.
1 
1      After the prefix, there must be one or more additional constraints
1      (⇒Constraints) that describe where the value resides.
1      Common constraints include 'r' for register and 'm' for memory.
1      When you list more than one possible location (for example,
1      '"=rm"'), the compiler chooses the most efficient one based on the
1      current context.  If you list as many alternates as the 'asm'
1      statement allows, you permit the optimizers to produce the best
1      possible code.  If you must use a specific register, but your
1      Machine Constraints do not provide sufficient control to select the
1      specific register you want, local register variables may provide a
1      solution (⇒Local Register Variables).
1 
1 CVARIABLENAME
1      Specifies a C lvalue expression to hold the output, typically a
1      variable name.  The enclosing parentheses are a required part of
1      the syntax.
1 
 When the compiler selects the registers to use to represent the output
operands, it does not use any of the clobbered registers
(⇒Clobbers and Scratch Registers).
1 
1  Output operand expressions must be lvalues.  The compiler cannot check
1 whether the operands have data types that are reasonable for the
1 instruction being executed.  For output expressions that are not
1 directly addressable (for example a bit-field), the constraint must
1 allow a register.  In that case, GCC uses the register as the output of
1 the 'asm', and then stores that register into the output.
1 
1  Operands using the '+' constraint modifier count as two operands (that
1 is, both as input and output) towards the total maximum of 30 operands
1 per 'asm' statement.
1 
1  Use the '&' constraint modifier (⇒Modifiers) on all output
1 operands that must not overlap an input.  Otherwise, GCC may allocate
1 the output operand in the same register as an unrelated input operand,
1 on the assumption that the assembler code consumes its inputs before
1 producing outputs.  This assumption may be false if the assembler code
1 actually consists of more than one instruction.
1 
1  The same problem can occur if one output parameter (A) allows a
1 register constraint and another output parameter (B) allows a memory
1 constraint.  The code generated by GCC to access the memory address in B
1 can contain registers which _might_ be shared by A, and GCC considers
1 those registers to be inputs to the asm.  As above, GCC assumes that
1 such input registers are consumed before any outputs are written.  This
1 assumption may result in incorrect behavior if the asm writes to A
1 before using B.  Combining the '&' modifier with the register constraint
1 on A ensures that modifying A does not affect the address referenced by
1 B.  Otherwise, the location of B is undefined if A is modified before
1 using B.
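
 The need for '&' can be sketched with a template that writes its output
before it has read all of its inputs.  The example assumes an x86
target; the helper name 'double_and_add' is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: returns 2 * a + b (modulo 2^32).  */
static inline uint32_t
double_and_add (uint32_t a, uint32_t b)
{
#if defined (__x86_64__) || defined (__i386__)
  uint32_t out;

  __asm__ ("mov %1, %0\n\t"   /* out = a: the output is written first...  */
           "add %1, %0\n\t"   /* out += a  */
           "add %2, %0"       /* ...while input b is still needed here.  */
           : "=&r" (out)      /* '&': never share a register with a or b.  */
           : "r" (a), "r" (b)
           : "cc");
  return out;
#else
  /* Assumed fallback for other targets: plain C.  */
  return 2 * a + b;
#endif
}
```

 With a plain '"=r"' constraint the compiler would be free to place
'out' in the register that holds 'b', in which case the first 'mov'
would destroy 'b' before the last instruction reads it.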
1 
 'asm' supports operand modifiers on operands (for example '%k2' instead
of simply '%2').  Typically these modifiers are hardware dependent.
The list of supported modifiers for x86 is found at
⇒x86 Operand Modifiers.
1 
 If the C code that follows the 'asm' makes no use of any of the output
operands, use 'volatile' for the 'asm' statement to prevent the
optimizers from discarding the 'asm' statement as unneeded
(see ⇒Volatile).
1 
1  This code makes no use of the optional ASMSYMBOLICNAME.  Therefore it
1 references the first output operand as '%0' (were there a second, it
1 would be '%1', etc).  The number of the first input operand is one
1 greater than that of the last output operand.  In this i386 example,
1 that makes 'Mask' referenced as '%1':
1 
1      uint32_t Mask = 1234;
1      uint32_t Index;
1 
1        asm ("bsfl %1, %0"
1           : "=r" (Index)
1           : "r" (Mask)
1           : "cc");
1 
1  That code overwrites the variable 'Index' ('='), placing the value in a
1 register ('r').  Using the generic 'r' constraint instead of a
1 constraint for a specific register allows the compiler to pick the
1 register to use, which can result in more efficient code.  This may not
1 be possible if an assembler instruction requires a specific register.
1 
1  The following i386 example uses the ASMSYMBOLICNAME syntax.  It
1 produces the same result as the code above, but some may consider it
1 more readable or more maintainable since reordering index numbers is not
1 necessary when adding or removing operands.  The names 'aIndex' and
1 'aMask' are only used in this example to emphasize which names get used
1 where.  It is acceptable to reuse the names 'Index' and 'Mask'.
1 
1      uint32_t Mask = 1234;
1      uint32_t Index;
1 
1        asm ("bsfl %[aMask], %[aIndex]"
1           : [aIndex] "=r" (Index)
1           : [aMask] "r" (Mask)
1           : "cc");
1 
1  Here are some more examples of output operands.
1 
1      uint32_t c = 1;
1      uint32_t d;
1      uint32_t *e = &c;
1 
1      asm ("mov %[e], %[d]"
1         : [d] "=rm" (d)
1         : [e] "rm" (*e));
1 
1  Here, 'd' may either be in a register or in memory.  Since the compiler
1 might already have the current value of the 'uint32_t' location pointed
1 to by 'e' in a register, you can enable it to choose the best location
1 for 'd' by specifying both constraints.
1 
1 6.45.2.4 Flag Output Operands
1 .............................
1 
Some targets have a special register that holds the "flags" for the
result of an operation or comparison.  Normally, the contents of that
register are either unmodified by the asm, or the asm is considered to
clobber the contents.
1 
1  On some targets, a special form of output operand exists by which
1 conditions in the flags register may be outputs of the asm.  The set of
1 conditions supported are target specific, but the general rule is that
1 the output variable must be a scalar integer, and the value is boolean.
1 When supported, the target defines the preprocessor symbol
1 '__GCC_ASM_FLAG_OUTPUTS__'.
1 
1  Because of the special nature of the flag output operands, the
1 constraint may not include alternatives.
1 
 Most often, the target has only one flags register, which is thus an
implied operand of many instructions.  In this case, the operand should
not be referenced within the assembler template via '%0' etc, as there's
no corresponding text in the assembly language.
1 
1 x86 family
1      The flag output constraints for the x86 family are of the form
1      '=@ccCOND' where COND is one of the standard conditions defined in
1      the ISA manual for 'jCC' or 'setCC'.
1 
1      'a'
1           "above" or unsigned greater than
1      'ae'
1           "above or equal" or unsigned greater than or equal
1      'b'
1           "below" or unsigned less than
1      'be'
1           "below or equal" or unsigned less than or equal
1      'c'
1           carry flag set
1      'e'
1      'z'
1           "equal" or zero flag set
1      'g'
1           signed greater than
1      'ge'
1           signed greater than or equal
1      'l'
1           signed less than
1      'le'
1           signed less than or equal
1      'o'
1           overflow flag set
1      'p'
1           parity flag set
1      's'
1           sign flag set
1      'na'
1      'nae'
1      'nb'
1      'nbe'
1      'nc'
1      'ne'
1      'ng'
1      'nge'
1      'nl'
1      'nle'
1      'no'
1      'np'
1      'ns'
1      'nz'
1           "not" FLAG, or inverted versions of those above
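
 As a hedged sketch of an x86 flag output, the following function
returns the carry flag produced by an 'add' instruction.  It is compiled
only when '__GCC_ASM_FLAG_OUTPUTS__' is defined on an x86 target.  The
helper name 'add_carries' is hypothetical, and the '"cc"' clobber is
omitted on the assumption that a flags register described as an output
should not also be listed as clobbered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: stores a + b in *sum and returns true if the
   32-bit addition produced a carry.  */
static inline bool
add_carries (uint32_t a, uint32_t b, uint32_t *sum)
{
#if defined (__GCC_ASM_FLAG_OUTPUTS__) \
    && (defined (__x86_64__) || defined (__i386__))
  bool carry;

  /* The flag output has no text in the template, so only the two
     ordinary operands are referenced there.  */
  __asm__ ("add %[b], %[a]"
           : [a] "+r" (a), "=@ccc" (carry)
           : [b] "r" (b));
  *sum = a;
  return carry;
#else
  /* Assumed fallback when flag outputs are unavailable: plain C.  */
  *sum = a + b;
  return *sum < a;
#endif
}
```

 Compared with materializing the flag inside the template (for example
with 'setc'), this form lets the compiler branch on the condition
directly.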
1 
1 6.45.2.5 Input Operands
1 .......................
1 
1 Input operands make values from C variables and expressions available to
1 the assembly code.
1 
1  Operands are separated by commas.  Each operand has this format:
1 
1      [ [ASMSYMBOLICNAME] ] CONSTRAINT (CEXPRESSION)
1 
1 ASMSYMBOLICNAME
1      Specifies a symbolic name for the operand.  Reference the name in
1      the assembler template by enclosing it in square brackets (i.e.
1      '%[Value]').  The scope of the name is the 'asm' statement that
1      contains the definition.  Any valid C variable name is acceptable,
1      including names already defined in the surrounding code.  No two
1      operands within the same 'asm' statement can use the same symbolic
1      name.
1 
1      When not using an ASMSYMBOLICNAME, use the (zero-based) position of
1      the operand in the list of operands in the assembler template.  For
1      example if there are two output operands and three inputs, use '%2'
1      in the template to refer to the first input operand, '%3' for the
1      second, and '%4' for the third.
1 
1 CONSTRAINT
1      A string constant specifying constraints on the placement of the
1      operand; ⇒Constraints, for details.
1 
1      Input constraint strings may not begin with either '=' or '+'.
1      When you list more than one possible location (for example,
1      '"irm"'), the compiler chooses the most efficient one based on the
1      current context.  If you must use a specific register, but your
1      Machine Constraints do not provide sufficient control to select the
1      specific register you want, local register variables may provide a
1      solution (⇒Local Register Variables).
1 
1      Input constraints can also be digits (for example, '"0"').  This
1      indicates that the specified input must be in the same place as the
1      output constraint at the (zero-based) index in the output
1      constraint list.  When using ASMSYMBOLICNAME syntax for the output
1      operands, you may use these names (enclosed in brackets '[]')
1      instead of digits.
1 
1 CEXPRESSION
1      This is the C variable or expression being passed to the 'asm'
1      statement as input.  The enclosing parentheses are a required part
1      of the syntax.
1 
 When the compiler selects the registers to use to represent the input
operands, it does not use any of the clobbered registers
(⇒Clobbers and Scratch Registers).
1 
1  If there are no output operands but there are input operands, place two
1 consecutive colons where the output operands would go:
1 
1      __asm__ ("some instructions"
1         : /* No outputs. */
1         : "r" (Offset / 8));
1 
1  *Warning:* Do _not_ modify the contents of input-only operands (except
1 for inputs tied to outputs).  The compiler assumes that on exit from the
1 'asm' statement these operands contain the same values as they had
1 before executing the statement.  It is _not_ possible to use clobbers to
1 inform the compiler that the values in these inputs are changing.  One
1 common work-around is to tie the changing input variable to an output
1 variable that never gets used.  Note, however, that if the code that
1 follows the 'asm' statement makes no use of any of the output operands,
1 the GCC optimizers may discard the 'asm' statement as unneeded (see
1 ⇒Volatile).
1 
 'asm' supports operand modifiers on operands (for example '%k2' instead
of simply '%2').  Typically these modifiers are hardware dependent.
The list of supported modifiers for x86 is found at
⇒x86 Operand Modifiers.
1 
1  In this example using the fictitious 'combine' instruction, the
1 constraint '"0"' for input operand 1 says that it must occupy the same
1 location as output operand 0.  Only input operands may use numbers in
1 constraints, and they must each refer to an output operand.  Only a
1 number (or the symbolic assembler name) in the constraint can guarantee
1 that one operand is in the same place as another.  The mere fact that
1 'foo' is the value of both operands is not enough to guarantee that they
1 are in the same place in the generated assembler code.
1 
1      asm ("combine %2, %0"
1         : "=r" (foo)
1         : "0" (foo), "g" (bar));
1 
1  Here is an example using symbolic names.
1 
1      asm ("cmoveq %1, %2, %[result]"
1         : [result] "=r"(result)
1         : "r" (test), "r" (new), "[result]" (old));
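
 Because 'combine' and 'cmoveq' above are not x86 instructions, here is
a comparable sketch that can be assembled for x86.  It ties the input
'a' to the output 'sum' by symbolic name.  The helper name 'add_tied'
and the operand names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + b (modulo 2^32).  */
static inline uint32_t
add_tied (uint32_t a, uint32_t b)
{
#if defined (__x86_64__) || defined (__i386__)
  uint32_t sum;

  /* "[sum]" as an input constraint places 'a' in the same register as
     the output 'sum', so the two-operand 'add' can update it in place.  */
  __asm__ ("add %[addend], %[sum]"
           : [sum] "=r" (sum)
           : "[sum]" (a), [addend] "r" (b)
           : "cc");
  return sum;
#else
  /* Assumed fallback for other targets: plain C.  */
  return a + b;
#endif
}
```

 Writing '"0" (a)' instead of '"[sum]" (a)' expresses the same tie using
the output's zero-based position.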
1 
1 6.45.2.6 Clobbers and Scratch Registers
1 .......................................
1 
1 While the compiler is aware of changes to entries listed in the output
1 operands, the inline 'asm' code may modify more than just the outputs.
1 For example, calculations may require additional registers, or the
1 processor may overwrite a register as a side effect of a particular
1 assembler instruction.  In order to inform the compiler of these
1 changes, list them in the clobber list.  Clobber list items are either
1 register names or the special clobbers (listed below).  Each clobber
1 list item is a string constant enclosed in double quotes and separated
1 by commas.
1 
 Clobber descriptions may not in any way overlap with an input or output
operand.  For example, you may not have an operand describing a register
class with one member when listing that register in the clobber list.
Variables declared to live in specific registers (⇒Explicit Register
Variables) and used as 'asm' input or output operands must have no part
mentioned in the clobber description.  In particular, there is no way to
specify that input operands get modified without also specifying them as
output operands.
1 
1  When the compiler selects which registers to use to represent input and
1 output operands, it does not use any of the clobbered registers.  As a
1 result, clobbered registers are available for any use in the assembler
1 code.
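
 For an x86 illustration of a register overwritten as a side effect, the
one-operand 'mul' instruction writes the high half of the product to
EDX.  The following sketch lists that register as a clobber; the helper
name 'mul_low' is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: returns the low 32 bits of a * b.  */
static inline uint32_t
mul_low (uint32_t a, uint32_t b)
{
#if defined (__x86_64__) || defined (__i386__)
  /* 'mull' computes EDX:EAX = EAX * operand.  EAX is the in/out
     operand; EDX is only overwritten, so it is named as a clobber,
     which also keeps the compiler from placing 'b' there.  */
  __asm__ ("mull %1"
           : "+a" (a)
           : "r" (b)
           : "edx", "cc");
  return a;
#else
  /* Assumed fallback for other targets: plain C.  */
  return a * b;
#endif
}
```

 If the high half were needed as well, EDX would instead be declared as
an output (for example with the '"=d"' constraint) and removed from the
clobber list, since clobbers may not overlap operands.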
1 
1  Here is a realistic example for the VAX showing the use of clobbered
1 registers:
1 
1      asm volatile ("movc3 %0, %1, %2"
1                         : /* No outputs. */
1                         : "g" (from), "g" (to), "g" (count)
1                         : "r0", "r1", "r2", "r3", "r4", "r5", "memory");
1 
1  Also, there are two special clobber arguments:
1 
1 '"cc"'
1      The '"cc"' clobber indicates that the assembler code modifies the
1      flags register.  On some machines, GCC represents the condition
1      codes as a specific hardware register; '"cc"' serves to name this
1      register.  On other machines, condition code handling is different,
1      and specifying '"cc"' has no effect.  But it is valid no matter
1      what the target.
1 
1 '"memory"'
1      The '"memory"' clobber tells the compiler that the assembly code
1      performs memory reads or writes to items other than those listed in
1      the input and output operands (for example, accessing the memory
1      pointed to by one of the input parameters).  To ensure memory
1      contains correct values, GCC may need to flush specific register
1      values to memory before executing the 'asm'.  Further, the compiler
1      does not assume that any values read from memory before an 'asm'
1      remain unchanged after that 'asm'; it reloads them as needed.
1      Using the '"memory"' clobber effectively forms a read/write memory
1      barrier for the compiler.
1 
1      Note that this clobber does not prevent the _processor_ from doing
1      speculative reads past the 'asm' statement.  To prevent that, you
1      need processor-specific fence instructions.
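
 A common use of the '"memory"' clobber is an empty 'asm' that acts
purely as a compiler barrier.  The following target-independent sketch
shows the idiom; the names 'compiler_barrier', 'publish', 'payload' and
'ready' are hypothetical.

```c
static int payload;
static int ready;

/* Emits no instructions, but the compiler may not cache memory values
   in registers across it or move memory accesses past it.  */
static inline void
compiler_barrier (void)
{
  __asm__ volatile ("" : : : "memory");
}

/* Hypothetical helper: stores the payload, then sets the flag.  */
static void
publish (int value)
{
  payload = value;
  compiler_barrier ();  /* Keep the compiler from reordering the stores.  */
  ready = 1;
}
```

 This constrains only the compiler.  Ordering that must be visible to
other processors still requires the processor-specific fence
instructions mentioned above.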
1 
1  Flushing registers to memory has performance implications and may be an
1 issue for time-sensitive code.  You can provide better information to
1 GCC to avoid this, as shown in the following examples.  At a minimum,
1 aliasing rules allow GCC to know what memory _doesn't_ need to be
1 flushed.
1 
 Here is a fictitious sum of squares instruction that takes two
pointers to floating-point values in memory and produces a
floating-point register output.  Notice that 'x' and 'y' both appear
twice in the 'asm' parameters, once to specify memory accessed, and once
to specify a base register used by the 'asm'.  You won't normally be
wasting a register by doing this as GCC can use the same register for
both purposes.  However, it would be foolish to use both '%1' and '%3'
for 'x' in this 'asm' and expect them to be the same.  In fact, '%3' may
well not be a register.  It might be a symbolic memory reference to the
object pointed to by 'x'.
1 
1      asm ("sumsq %0, %1, %2"
1           : "+f" (result)
1           : "r" (x), "r" (y), "m" (*x), "m" (*y));
1 
1  Here is a fictitious '*z++ = *x++ * *y++' instruction.  Notice that the
1 'x', 'y' and 'z' pointer registers must be specified as input/output
1 because the 'asm' modifies them.
1 
1      asm ("vecmul %0, %1, %2"
1           : "+r" (z), "+r" (x), "+r" (y), "=m" (*z)
1           : "m" (*x), "m" (*y));
1 
1  An x86 example where the string memory argument is of unknown length.
1 
1      asm("repne scasb"
1          : "=c" (count), "+D" (p)
1          : "m" (*(const char (*)[]) p), "0" (-1), "a" (0));
1 
1  If you know the above will only be reading a ten byte array then you
1 could instead use a memory input like: '"m" (*(const char (*)[10]) p)'.
1 
1  Here is an example of a PowerPC vector scale implemented in assembly,
1 complete with vector and condition code clobbers, and some initialized
1 offset registers that are unchanged by the 'asm'.
1 
1      void
1      dscal (size_t n, double *x, double alpha)
1      {
1        asm ("/* lots of asm here */"
1             : "+m" (*(double (*)[n]) x), "+&r" (n), "+b" (x)
1             : "d" (alpha), "b" (32), "b" (48), "b" (64),
1               "b" (80), "b" (96), "b" (112)
1             : "cr0",
1               "vs32","vs33","vs34","vs35","vs36","vs37","vs38","vs39",
1               "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47");
1      }
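
 For readers without a PowerPC toolchain, here is a much smaller x86-64
analogue of the same operand layout.  It is only a sketch: the name
'scale_in_place', the loop body, and the choice of 'xmm1' as a clobbered
scratch register are assumptions made for illustration, and other
targets use a plain C loop.

```c
#include <stddef.h>

/* Hypothetical helper: multiplies each of the N elements of X by ALPHA.  */
void
scale_in_place (size_t n, double *x, double alpha)
{
  if (n == 0)
    return;
#if defined (__GNUC__) && defined (__x86_64__)
  __asm__ ("1:\n\t"
           "movsd (%[p]), %%xmm1\n\t"   /* load *p */
           "mulsd %[a], %%xmm1\n\t"     /* multiply by alpha */
           "movsd %%xmm1, (%[p])\n\t"   /* store the product back */
           "add $8, %[p]\n\t"           /* advance to the next double */
           "dec %[n]\n\t"               /* one fewer element left */
           "jnz 1b"
           : "+m" (*(double (*)[n]) x), /* the whole array is read and written */
             [n] "+&r" (n), [p] "+r" (x)
           : [a] "x" (alpha)
           : "cc", "xmm1");
#else
  /* Assumed fallback for other targets.  */
  for (size_t i = 0; i < n; i++)
    x[i] *= alpha;
#endif
}
```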
1 
1  Rather than allocating fixed registers via clobbers to provide scratch
1 registers for an 'asm' statement, an alternative is to define a variable
1 and make it an early-clobber output as with 'a2' and 'a3' in the example
1 below.  This gives the compiler register allocator more freedom.  You
1 can also define a variable and make it an output tied to an input as
1 with 'a0' and 'a1', tied respectively to 'ap' and 'lda'.  Of course,
1 with tied outputs your 'asm' can't use the input value after modifying
1 the output register since they are one and the same register.  What's
1 more, if you omit the early-clobber on the output, it is possible that
1 GCC might allocate the same register to another of the inputs if GCC
1 could prove they had the same value on entry to the 'asm'.  This is why
1 'a1' has an early-clobber.  Its tied input, 'lda', might conceivably be
1 known to have the value 16 and, without an early-clobber, share the same
1 register as '%11'.  On the other hand, 'ap' can't be the same as any of
1 the other inputs, so an early-clobber on 'a0' is not needed.  It is also
1 not desirable in this case.  An early-clobber on 'a0' would cause GCC to
1 allocate a separate register for the '"m" (*(const double (*)[]) ap)'
1 input.  Note that tying an input to an output is the way to set up an
1 initialized temporary register modified by an 'asm' statement.  An input
1 not tied to an output is assumed by GCC to be unchanged, for example
1 '"b" (16)' below sets up '%11' to 16, and GCC might use that register in
1 following code if the value 16 happened to be needed.  You can even use
1 a normal 'asm' output for a scratch if all inputs that might share the
1 same register are consumed before the scratch is used.  The VSX
1 registers clobbered by the 'asm' statement could have used this
1 technique except for GCC's limit on the number of 'asm' parameters.
1 
1      static void
1      dgemv_kernel_4x4 (long n, const double *ap, long lda,
1                        const double *x, double *y, double alpha)
1      {
1        double *a0;
1        double *a1;
1        double *a2;
1        double *a3;
1 
1        __asm__
1          (
1           /* lots of asm here */
1           "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
1           "#a0=%3 a1=%4 a2=%5 a3=%6"
1           :
1             "+m" (*(double (*)[n]) y),
1             "+&r" (n),	// 1
1             "+b" (y),	// 2
1             "=b" (a0),	// 3
1             "=&b" (a1),	// 4
1             "=&b" (a2),	// 5
1             "=&b" (a3)	// 6
1           :
1             "m" (*(const double (*)[n]) x),
1             "m" (*(const double (*)[]) ap),
1             "d" (alpha),	// 9
1             "r" (x),		// 10
1             "b" (16),	// 11
1             "3" (ap),	// 12
1             "4" (lda)	// 13
1           :
1             "cr0",
1             "vs32","vs33","vs34","vs35","vs36","vs37",
1             "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
1           );
1      }
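
 The early-clobber scratch and tied-input techniques can also be seen in
a small x86 sketch.  The function name 'sum_times_difference' and the
instruction sequence are assumptions chosen for illustration; other
targets use a plain C fallback.  Here 'tmp' is written before 'b' is
read, so it needs the early-clobber, while 'result' is tied to 'a' and
is modified only by instructions at or after the last read of 'b'.

```c
/* Hypothetical helper: returns (a + b) * (a - b).  */
long
sum_times_difference (long a, long b)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  long result;  /* %0: tied to input 'a' through the "0" constraint */
  long tmp;     /* %1: scratch chosen by the register allocator */
  __asm__ ("mov %0, %1\n\t"   /* tmp = a */
           "sub %3, %1\n\t"   /* tmp = a - b */
           "add %3, %0\n\t"   /* result = a + b */
           "imul %1, %0"      /* result = (a + b) * (a - b) */
           : "=r" (result), "=&r" (tmp)
           : "0" (a), "r" (b)
           : "cc");
  return result;
#else
  /* Assumed fallback for other targets.  */
  return (a + b) * (a - b);
#endif
}
```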
1 
1 6.45.2.7 Goto Labels
1 ....................
1 
1 'asm goto' allows assembly code to jump to one or more C labels.  The
1 GOTOLABELS section in an 'asm goto' statement contains a comma-separated
1 list of all C labels to which the assembler code may jump.  GCC assumes
1 that 'asm' execution falls through to the next statement (if this is not
1 the case, consider using the '__builtin_unreachable' intrinsic after the
1 'asm' statement).  Optimization of 'asm goto' may be improved by using
1 the 'hot' and 'cold' label attributes (⇒Label Attributes).
1 
1  An 'asm goto' statement cannot have outputs.  This is due to an
1 internal restriction of the compiler: control transfer instructions
1 cannot have outputs.  If the assembler code does modify anything, use
1 the '"memory"' clobber to force the optimizers to flush all register
1 values to memory and reload them if necessary after the 'asm' statement.
1 
1  Also note that an 'asm goto' statement is always implicitly considered
1 volatile.
1 
1  To reference a label in the assembler template, prefix it with '%l'
1 (lowercase 'L') followed by its (zero-based) position in GOTOLABELS plus
1 the number of input operands.  For example, if the 'asm' has three
1 inputs and references two labels, refer to the first label as '%l3' and
1 the second as '%l4'.
1 
1  Alternatively, you can reference labels using the actual C label name
1 enclosed in brackets.  For example, to reference a label named 'carry',
1 you can use '%l[carry]'.  The label must still be listed in the
1 GOTOLABELS section when using this approach.
1 
1  Here is an example of 'asm goto' for i386:
1 
1      asm goto (
1          "btl %1, %0\n\t"
1          "jc %l2"
1          : /* No outputs. */
1          : "r" (p1), "r" (p2)
1          : "cc"
1          : carry);
1 
1      return 0;
1 
1      carry:
1      return 1;
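
 The fragment above can be placed in a complete function as follows.
This is a sketch: 'bit_is_set' is a hypothetical name, the label is
referenced by name ('%l[carry]') rather than by position ('%l2'), and
non-x86 targets use a plain C fallback.

```c
/* Hypothetical helper: returns 1 if bit BIT (taken modulo 32) of VALUE
   is set, and 0 otherwise.  */
int
bit_is_set (unsigned int value, unsigned int bit)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  __asm__ goto ("btl %1, %0\n\t"     /* copy the selected bit into CF */
                "jc %l[carry]"       /* jump to the C label if CF is set */
                : /* No outputs. */
                : "r" (value), "r" (bit)
                : "cc"
                : carry);
  return 0;

carry:
  return 1;
#else
  /* Assumed fallback for other targets.  */
  return (int) ((value >> (bit & 31)) & 1u);
#endif
}
```

 For example, 'bit_is_set (5, 0)' evaluates to 1 and 'bit_is_set (5, 1)'
evaluates to 0.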
1 
1  The following example shows an 'asm goto' that uses a memory clobber.
1 
1      int frob(int x)
1      {
1        int y;
1        asm goto ("frob %%r5, %0; jc %l[error]; mov (%1), %%r5"
1                  : /* No outputs. */
1                  : "r"(x), "r"(&y)
1                  : "r5", "memory"
1                  : error);
1        return y;
1      error:
1        return -1;
1      }
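
 Since 'frob' is fictitious, here is a sketch of the same idea using
real x86 instructions.  The name 'add_checked' and its return
convention are assumptions made for illustration.  The 'asm' stores
through a pointer, and because an 'asm goto' cannot describe that store
with an output operand, the '"memory"' clobber is used instead.

```c
#include <limits.h>

/* Hypothetical helper: adds DELTA to *ACC with wraparound.  Returns 0
   normally and -1 if the signed addition overflowed.  */
int
add_checked (int *acc, int delta)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  __asm__ goto ("addl %1, (%0)\n\t"     /* *acc += delta */
                "jo %l[overflow]"       /* jump if the overflow flag is set */
                : /* No outputs. */
                : "r" (acc), "r" (delta)
                : "cc", "memory"
                : overflow);
  return 0;

overflow:
  return -1;
#else
  /* Assumed fallback for other targets; the narrowing conversion is
     expected to wrap, as it does with GCC.  */
  long long wide = (long long) *acc + delta;
  *acc = (int) (unsigned int) wide;
  return (wide > INT_MAX || wide < INT_MIN) ? -1 : 0;
#endif
}
```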
1 
1 6.45.2.8 x86 Operand Modifiers
1 ..............................
1 
1 References to input, output, and goto operands in the assembler template
1 of extended 'asm' statements can use modifiers to affect the way the
1 operands are formatted in the code output to the assembler.  For
1 example, the following code uses the 'h' and 'b' modifiers for x86:
1 
1      uint16_t  num;
1      asm volatile ("xchg %h0, %b0" : "+a" (num) );
1 
1 These modifiers generate this assembler code:
1 
1      xchg %ah, %al
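
 Wrapped in a function, the same statement swaps the two bytes of a
16-bit value.  This is a sketch: 'swap_bytes' is a hypothetical name,
and non-x86 targets use a shift-based fallback.

```c
#include <stdint.h>

/* Hypothetical helper: exchanges the high and low bytes of NUM.  */
uint16_t
swap_bytes (uint16_t num)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  /* %h0 and %b0 name the high and low byte registers of operand 0.  */
  __asm__ ("xchg %h0, %b0" : "+a" (num));
  return num;
#else
  /* Assumed fallback for other targets.  */
  return (uint16_t) ((num >> 8) | (num << 8));
#endif
}
```

 For example, 'swap_bytes (0x1234)' evaluates to '0x3412'.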
1 
1  The rest of this discussion uses the following code for illustrative
1 purposes.
1 
1      int main()
1      {
1         int iInt = 1;
1 
1      top:
1 
1         asm volatile goto ("some assembler instructions here"
1         : /* No outputs. */
1         : "q" (iInt), "X" (sizeof(unsigned char) + 1)
1         : /* No clobbers. */
1         : top);
1      }
1 
1  With no modifiers, this is what the output from the operands would be
1 for the 'att' and 'intel' dialects of assembler:
1 
1 Operand   'att'    'intel'
1 -----------------------------------
1 '%0'      '%eax'   'eax'
1 '%1'      '$2'     '2'
1 '%2'      '$.L2'   'OFFSET FLAT:.L2'
1 
1  The table below shows the list of supported modifiers and their
1 effects.
1 
1 Modifier   Description                                  Operand   'att'   'intel'
1 ------------------------------------------------------------------------------------
1 'z'        Print the opcode suffix for the size of      '%z0'     'l'
1            the current integer operand (one of
1            'b'/'w'/'l'/'q').
1 'b'        Print the QImode name of the register.       '%b0'     '%al'   'al'
1 'h'        Print the QImode name for a "high"           '%h0'     '%ah'   'ah'
1            register.
1 'w'        Print the HImode name of the register.       '%w0'     '%ax'   'ax'
1 'k'        Print the SImode name of the register.       '%k0'     '%eax'  'eax'
1 'q'        Print the DImode name of the register.       '%q0'     '%rax'  'rax'
1 'l'        Print the label name with no punctuation.    '%l2'     '.L2'   '.L2'
1 'c'        Require a constant operand and print the     '%c1'     '2'     '2'
1            constant expression with no punctuation.
1 
1  'V' is a special modifier which prints the name of the full integer
1 register without '%'.
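
 As a sketch of how the size modifiers are typically combined, the
following hypothetical functions ('low_byte' and 'low_word' are names
invented here) use 'b', 'w' and 'k' to select sub-registers for the
zero-extending moves 'movzbl' and 'movzwl'.  The default 'att' dialect
is assumed, and non-x86 targets use a masking fallback.

```c
/* Hypothetical helper: returns the lowest 8 bits of VALUE.  */
unsigned int
low_byte (unsigned long value)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  unsigned int out;
  /* %b1 prints the QImode name of the input register, %k0 the SImode
     name of the output register.  */
  __asm__ ("movzbl %b1, %k0" : "=r" (out) : "q" (value));
  return out;
#else
  return (unsigned int) (value & 0xff);   /* assumed fallback */
#endif
}

/* Hypothetical helper: returns the lowest 16 bits of VALUE.  */
unsigned int
low_word (unsigned long value)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  unsigned int out;
  /* %w1 prints the HImode name of the input register.  */
  __asm__ ("movzwl %w1, %k0" : "=r" (out) : "r" (value));
  return out;
#else
  return (unsigned int) (value & 0xffff); /* assumed fallback */
#endif
}
```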
1 
1 6.45.2.9 x86 Floating-Point 'asm' Operands
1 ..........................................
1 
1 On x86 targets, there are several rules on the usage of stack-like
1 registers in the operands of an 'asm'.  These rules apply only to the
1 operands that are stack-like registers:
1 
1   1. Given a set of input registers that die in an 'asm', it is
1      necessary to know which are implicitly popped by the 'asm', and
1      which must be explicitly popped by GCC.
1 
1      An input register that is implicitly popped by the 'asm' must be
1      explicitly clobbered, unless it is constrained to match an output
1      operand.
1 
1   2. For any input register that is implicitly popped by an 'asm', it is
1      necessary to know how to adjust the stack to compensate for the
1      pop.  If any non-popped input is closer to the top of the reg-stack
1      than the implicitly popped register, it would not be possible to
1      know what the stack looked like--it's not clear how the rest of the
1      stack "slides up".
1 
1      All implicitly popped input registers must be closer to the top of
1      the reg-stack than any input that is not implicitly popped.
1 
1      It is possible that if an input dies in an 'asm', the compiler
1      might use the input register for an output reload.  Consider this
1      example:
1 
1           asm ("foo" : "=t" (a) : "f" (b));
1 
1      This code says that input 'b' is not popped by the 'asm', and that
1      the 'asm' pushes a result onto the reg-stack, i.e., the stack is
1      one deeper after the 'asm' than it was before.  But, it is possible
1      that reload may think that it can use the same register for both
1      the input and the output.
1 
1      To prevent this from happening, if any input operand uses the 'f'
1      constraint, all output register constraints must use the '&'
1      early-clobber modifier.
1 
1      The example above is correctly written as:
1 
1           asm ("foo" : "=&t" (a) : "f" (b));
1 
1   3. Some operands need to be in particular places on the stack.  All
1      output operands fall in this category--GCC has no other way to know
1      which registers the outputs appear in unless you indicate this in
1      the constraints.
1 
1      Output operands must specifically indicate which register an output
1      appears in after an 'asm'.  '=f' is not allowed: the operand
1      constraints must select a class with a single register.
1 
1   4. Output operands may not be "inserted" between existing stack
1      registers.  Since no 387 opcode uses a read/write operand, all
1      output operands are dead before the 'asm', and are pushed by the
1      'asm'.  It makes no sense to push anywhere but the top of the
1      reg-stack.
1 
1      Output operands must start at the top of the reg-stack: output
1      operands may not "skip" a register.
1 
1   5. Some 'asm' statements may need extra stack space for internal
1      calculations.  This can be guaranteed by clobbering stack registers
1      unrelated to the inputs and outputs.
1 
1  This 'asm' takes one input, which is internally popped, and produces
1 two outputs.
1 
1      asm ("fsincos" : "=t" (cos), "=u" (sin) : "0" (inp));
1 
1 This 'asm' takes two inputs, which are popped by the 'fyl2xp1' opcode,
1 and replaces them with one output.  The 'st(1)' clobber is necessary for
1 the compiler to know that 'fyl2xp1' pops both inputs.
1 
1      asm ("fyl2xp1" : "=t" (result) : "0" (x), "u" (y) : "st(1)");
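
 The two statements above can be wrapped in functions as sketched
below.  The names 'x87_sincos' and 'x87_ylog2xp1' and the convention of
returning 1 on success and 0 when the code was not built for x86 are
assumptions made for illustration.  The usual operand range limits of
the underlying instructions still apply and are not checked here.

```c
/* Hypothetical helper: stores sin(ANGLE) and cos(ANGLE) using the x87
   'fsincos' instruction.  Returns 0 if not built for x86.  */
int
x87_sincos (double angle, double *sine, double *cosine)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  double s, c;
  /* The input is popped and replaced; cos ends up in st(0), sin in st(1).  */
  __asm__ ("fsincos" : "=t" (c), "=u" (s) : "0" (angle));
  *sine = s;
  *cosine = c;
  return 1;
#else
  (void) angle; (void) sine; (void) cosine;
  return 0;
#endif
}

/* Hypothetical helper: stores Y * log2 (X + 1) using 'fyl2xp1'.
   Returns 0 if not built for x86.  */
int
x87_ylog2xp1 (double x, double y, double *result)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  double r;
  /* Both inputs are popped; the "st(1)" clobber tells GCC so.  */
  __asm__ ("fyl2xp1" : "=t" (r) : "0" (x), "u" (y) : "st(1)");
  *result = r;
  return 1;
#else
  (void) x; (void) y; (void) result;
  return 0;
#endif
}
```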
1