gcc: Vector Extensions

6.50 Using Vector Instructions through Built-in Functions
=========================================================

On some targets, the instruction set contains SIMD vector instructions
that operate on multiple values contained in one large register at the
same time.  For example, on the x86 the MMX, 3DNow! and SSE extensions
can be used this way.

 The first step in using these extensions is to provide the necessary
data types.  This should be done using an appropriate 'typedef':

     typedef int v4si __attribute__ ((vector_size (16)));

The 'int' type specifies the base type, while the attribute specifies
the vector size for the variable, measured in bytes.  For example, the
declaration above causes the compiler to set the mode for the 'v4si'
type to be 16 bytes wide and divided into 'int' sized units.  For a
32-bit 'int' this means a vector of 4 units of 4 bytes, and the
corresponding mode of 'v4si' is V4SI.
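 The layout described above can be checked with a small sketch.  It
assumes a target where 'int' is 32 bits wide; the helper name
'v4si_elem_count' is made up for illustration and is not part of GCC.

```c
#include <stddef.h>

/* Same typedef as in the text: 16 bytes of 'int' elements.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: number of 'int' elements in a v4si.
   Yields 4 on targets where 'int' is 4 bytes wide.  */
size_t
v4si_elem_count (void)
{
  return sizeof (v4si) / sizeof (int);
}
```

 On such a target 'sizeof (v4si)' is 16, the value given to
'vector_size'.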
1 
 The 'vector_size' attribute is only applicable to integral and
floating-point scalars, although arrays, pointers, and function return
values are allowed in conjunction with this construct.  Only sizes that
are a power of two are currently allowed.

 All the basic integer types can be used as base types, both as signed
and as unsigned: 'char', 'short', 'int', 'long', 'long long'.  In
addition, 'float' and 'double' can be used to build floating-point
vector types.
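 As an illustration of other base types, the sketch below declares a
few 16-byte vector types and uses one of them.  The type names follow
the mode-style naming of 'v4si' but are otherwise arbitrary, the helper
'axpy' is hypothetical, and the element counts in the comments assume a
32-bit 'float' and a 64-bit 'double'.

```c
typedef float         v4sf  __attribute__ ((vector_size (16))); /* 4 floats  */
typedef double        v2df  __attribute__ ((vector_size (16))); /* 2 doubles */
typedef unsigned char v16qu __attribute__ ((vector_size (16))); /* 16 bytes  */

/* Hypothetical helper: element-wise a * x + y on float vectors.  */
v4sf
axpy (v4sf a, v4sf x, v4sf y)
{
  return a * x + y;
}
```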
1 
 Specifying a combination that is not valid for the current architecture
causes GCC to synthesize the instructions using a narrower mode.  For
example, if you specify a variable of type 'V4SI' and your architecture
does not allow for this specific SIMD type, GCC produces code that uses
four 'SI' values instead.
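 The following sketch uses a 32-byte vector, which many targets do not
support natively (for example x86-64 without AVX, as an assumption about
the build flags).  The code is the same either way; only the generated
instructions differ.  The helper 'add_v8si' is hypothetical, and it takes
pointers so that the example does not depend on how the ABI passes wide
vectors by value.

```c
/* Eight 'int' elements in 32 bytes, assuming a 32-bit 'int'.  */
typedef int v8si __attribute__ ((vector_size (32)));

/* Hypothetical helper: element-wise sum of two 8-element vectors.
   If the target has no 32-byte vector registers, GCC is expected to
   split the addition into narrower operations.  */
void
add_v8si (const v8si *a, const v8si *b, v8si *sum)
{
  *sum = *a + *b;
}
```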
1 
 The types defined in this manner can be used with a subset of normal C
operations.  Currently, GCC allows using the following operators on
these types: '+, -, *, /, unary minus, ^, |, &, ~, %'.

 The operations behave like C++ 'valarrays'.  Addition is defined as the
addition of the corresponding elements of the operands.  For example, in
the code below, each of the 4 elements in A is added to the
corresponding element in B and the resulting vector is stored in C.

     typedef int v4si __attribute__ ((vector_size (16)));

     v4si a, b, c;

     c = a + b;

 Subtraction, multiplication, division, and the bitwise operations
operate in a similar manner.  Likewise, the result of using the unary
minus or complement operators on a vector type is a vector whose
elements are the negative or complemented values of the corresponding
elements in the operand.
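 A short sketch of several of these operators working together follows.
The helper names 'diff_times_neg' and 'clear_bits' are invented for this
example.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: computes (a[i] - b[i]) * -a[i] for each i.  */
v4si
diff_times_neg (v4si a, v4si b)
{
  return (a - b) * -a;
}

/* Hypothetical helper: clears in each a[i] the bits set in mask[i].  */
v4si
clear_bits (v4si a, v4si mask)
{
  return a & ~mask;
}
```

 For 'a = {1,2,3,4}' and 'b = {4,3,2,1}', 'diff_times_neg (a, b)'
yields '{3,2,-3,-12}'.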
1 
 It is possible to use the shift operators '<<' and '>>' on integer-type
vectors.  The operation is defined as follows: '{a0, a1, ..., an} >>
{b0, b1, ..., bn} == {a0 >> b0, a1 >> b1, ..., an >> bn}'.  Vector
operands must have the same number of elements.
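 The element-wise shift rule can be sketched as follows.  The helper
names are hypothetical.  As with scalar shifts, each count is assumed to
be non-negative and smaller than the element width.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: v[i] << n[i] for each element.  */
v4si
shift_left_each (v4si v, v4si n)
{
  return v << n;
}

/* Hypothetical helper: v[i] >> n[i] for each element.  */
v4si
shift_right_each (v4si v, v4si n)
{
  return v >> n;
}
```

 For example, shifting '{1,2,3,4}' left by '{0,1,2,3}' gives
'{1,4,12,32}'.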
1 
 For convenience, a binary vector operation may take a scalar as one of
its operands.  In that case the compiler transforms the scalar operand
into a vector where each element is the scalar from the operation.  The
transformation happens only if the scalar can be safely converted to
the vector-element type.  Consider the following code.

     typedef int v4si __attribute__ ((vector_size (16)));

     v4si a, b, c;
     long l;

     a = b + 1;    /* a = b + {1,1,1,1}; */
     a = 2 * b;    /* a = {2,2,2,2} * b; */

     a = l + a;    /* Error, cannot convert long to int. */
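 The same scalar-to-vector conversion can be wrapped in a function, as in
this sketch.  The helper 'scale_and_offset' is hypothetical.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: computes 2 * v[i] + 1 for each element.
   Both integer constants are converted to {2,2,2,2} and {1,1,1,1}.  */
v4si
scale_and_offset (v4si v)
{
  return 2 * v + 1;
}
```

 For 'v = {1,2,3,4}' the result is '{3,5,7,9}'.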
1 
 Vectors can be subscripted as if the vector were an array with the same
number of elements and base type.  Out-of-bounds accesses invoke
undefined behavior at run time.  Warnings for out-of-bounds accesses in
vector subscripting can be enabled with '-Warray-bounds'.
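 Subscripting can be sketched as below, both for reading and for writing
an element.  The helper names are hypothetical, and 'set_element' masks
the index only to keep this particular example within bounds.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: sum of the four elements of v.  */
int
sum_elements (v4si v)
{
  int total = 0;
  for (int i = 0; i < 4; i++)
    total += v[i];
  return total;
}

/* Hypothetical helper: returns a copy of v with one element replaced.
   The index is reduced modulo 4 so the access stays in bounds.  */
v4si
set_element (v4si v, int index, int value)
{
  v[index & 3] = value;
  return v;
}
```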
1 
 Vector comparison is supported with the standard comparison operators:
'==, !=, <, <=, >, >='.  Comparison operands can be vector expressions
of integer type or real type.  Comparison between integer-type vectors
and real-type vectors is not supported.  The result of the comparison is
a vector of the same width and number of elements as the comparison
operands, with a signed integral element type.

 Vectors are compared element-wise, producing 0 when the comparison is
false and -1 (a constant of the appropriate type where all bits are set)
otherwise.  Consider the following example.

     typedef int v4si __attribute__ ((vector_size (16)));

     v4si a = {1,2,3,4};
     v4si b = {3,2,1,4};
     v4si c;

     c = a >  b;     /* The result would be {0, 0,-1, 0}  */
     c = a == b;     /* The result would be {0,-1, 0,-1}  */
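 Because a true comparison yields an element with all bits set, the
result can serve as a bit mask.  The sketch below relies on that to
select elements without branching.  The helper names 'max_each' and
'count_equal' are hypothetical.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: element-wise maximum of a and b.  */
v4si
max_each (v4si a, v4si b)
{
  v4si gt = a > b;               /* -1 where a[i] > b[i], else 0 */
  return (a & gt) | (b & ~gt);   /* pick a[i] or b[i] by mask */
}

/* Hypothetical helper: number of positions where a[i] == b[i].  */
int
count_equal (v4si a, v4si b)
{
  v4si eq = a == b;              /* each element is -1 or 0 */
  return -(eq[0] + eq[1] + eq[2] + eq[3]);
}
```

 With 'a = {1,2,3,4}' and 'b = {3,2,1,4}' as above, 'max_each (a, b)' is
'{3,2,3,4}' and 'count_equal (a, b)' is 2.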
1 
 In C++, the ternary operator '?:' is available.  'a?b:c', where 'b' and
'c' are vectors of the same type and 'a' is an integer vector with the
same number of elements of the same size as 'b' and 'c', evaluates all
three operands and creates a vector '{a[0]?b[0]:c[0], a[1]?b[1]:c[1],
...}'.  Note that unlike in OpenCL, 'a' is thus interpreted as 'a != 0'
and not 'a < 0'.  As in the case of binary operations, this syntax is
also accepted when one of 'b' or 'c' is a scalar that is then
transformed into a vector.  If both 'b' and 'c' are scalars and the type
of 'true?b:c' has the same size as the element type of 'a', then 'b' and
'c' are converted to a vector type whose elements have this type and
with the same number of elements as 'a'.

 In C++, the logical operators '!, &&, ||' are available for vectors.
'!v' is equivalent to 'v == 0', 'a && b' is equivalent to 'a!=0 & b!=0'
and 'a || b' is equivalent to 'a!=0 | b!=0'.  For mixed operations
between a scalar 's' and a vector 'v', 's && v' is equivalent to
's?v!=0:0' (the evaluation is short-circuit) and 'v && s' is equivalent
to 'v!=0 & (s?-1:0)'.
1 
 Vector shuffling is available using the functions '__builtin_shuffle
(vec, mask)' and '__builtin_shuffle (vec0, vec1, mask)'.  Both functions
construct a permutation of elements from one or two vectors and return a
vector of the same type as the input vector(s).  The MASK is an integral
vector with the same width (W) and element count (N) as the output
vector.

 The elements of the input vectors are numbered in memory ordering of
VEC0 beginning at 0 and VEC1 beginning at N.  The elements of MASK are
considered modulo N in the single-operand case and modulo 2*N in the
two-operand case.

 Consider the following example:

     typedef int v4si __attribute__ ((vector_size (16)));

     v4si a = {1,2,3,4};
     v4si b = {5,6,7,8};
     v4si mask1 = {0,1,1,3};
     v4si mask2 = {0,4,2,5};
     v4si res;

     res = __builtin_shuffle (a, mask1);       /* res is {1,2,2,4}  */
     res = __builtin_shuffle (a, b, mask2);    /* res is {1,5,3,6}  */
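 Two common permutations can be written with these built-ins, as in the
sketch below.  It assumes GCC itself, since other compilers may not
provide '__builtin_shuffle'.  The helper names 'reverse' and
'interleave_low' are hypothetical.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: returns the elements of v in reverse order.  */
v4si
reverse (v4si v)
{
  v4si mask = {3, 2, 1, 0};
  return __builtin_shuffle (v, mask);
}

/* Hypothetical helper: returns {a[0], b[0], a[1], b[1]}.
   Indices 0..3 select from a, indices 4..7 select from b.  */
v4si
interleave_low (v4si a, v4si b)
{
  v4si mask = {0, 4, 1, 5};
  return __builtin_shuffle (a, b, mask);
}
```

 With 'a = {1,2,3,4}' and 'b = {5,6,7,8}', 'reverse (a)' is '{4,3,2,1}'
and 'interleave_low (a, b)' is '{1,5,2,6}'.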
1 
 Note that '__builtin_shuffle' is intentionally semantically compatible
with the OpenCL 'shuffle' and 'shuffle2' functions.

 You can declare variables of vector types and use them in function
calls and returns, as well as in assignments and some casts.  You can
specify a vector type as a return type for a function.  Vector types can
also be used as function arguments.  It is possible to cast from one
vector type to another, provided they are of the same size (in fact, you
can also cast vectors to and from other datatypes of the same size).

 You cannot operate between vectors of different lengths or different
signedness without a cast.
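 The sketch below shows both uses of a cast.  A cast between vector types
of the same size reinterprets the bits and does not convert the element
values.  The second helper relies on that and additionally assumes a
32-bit 'int' and a 32-bit IEEE 754 'float'.  All names are hypothetical.

```c
typedef int          v4si __attribute__ ((vector_size (16)));
typedef unsigned int v4su __attribute__ ((vector_size (16)));
typedef float        v4sf __attribute__ ((vector_size (16)));

/* Hypothetical helper: adds a signed and an unsigned vector.
   The cast makes both operands the same type.  */
v4su
add_mixed (v4si s, v4su u)
{
  return (v4su) s + u;
}

/* Hypothetical helper: element-wise absolute value of a float vector,
   obtained by clearing the sign bit of each element (assumes IEEE 754
   single precision).  */
v4sf
abs_each (v4sf v)
{
  v4si bits = (v4si) v;          /* same 16 bytes, viewed as ints */
  bits = bits & 0x7fffffff;      /* clear the sign bit of each element */
  return (v4sf) bits;
}
```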