gcc: Vector Extensions

6.50 Using Vector Instructions through Built-in Functions
=========================================================

On some targets, the instruction set contains SIMD vector instructions
which operate on multiple values contained in one large register at the
same time. For example, on the x86 the MMX, 3DNow! and SSE extensions
can be used this way.

The first step in using these extensions is to provide the necessary
data types. This should be done using an appropriate 'typedef':

     typedef int v4si __attribute__ ((vector_size (16)));

The 'int' type specifies the base type, while the attribute specifies
the vector size for the variable, measured in bytes. For example, the
declaration above causes the compiler to set the mode for the 'v4si'
type to be 16 bytes wide and divided into 'int'-sized units. For a
32-bit 'int' this means a vector of 4 units of 4 bytes, and the
corresponding mode of 'v4si' is V4SI.

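A minimal sketch of the size arithmetic above, assuming GCC: the element count is derived from 'sizeof' rather than hard-coded. The helper name 'v4si_lanes' is hypothetical.

```c
#include <stddef.h>

/* The typedef from the text: 16 bytes split into int-sized elements. */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: number of elements in a v4si.
   With a 32-bit int this is 16 / 4 == 4. */
size_t v4si_lanes (void)
{
  return sizeof (v4si) / sizeof (int);
}
```
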
The 'vector_size' attribute is only applicable to integral and
floating-point scalars, although arrays, pointers, and function return
values are allowed in conjunction with this construct. Only sizes that
are a power of two are currently allowed.

All the basic integer types can be used as base types, both as signed
and as unsigned: 'char', 'short', 'int', 'long', 'long long'. In
addition, 'float' and 'double' can be used to build floating-point
vector types.

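The sketch below, assuming GCC, declares 16-byte vectors over a few of the base types listed above, plus an array of vectors and a pointer into it. The type names 'v16qu', 'v8hi', 'v4sf', 'v2df' and the helper functions are illustrative only.

```c
#include <stddef.h>

/* Illustrative 16-byte vector types over other base types. */
typedef unsigned char v16qu __attribute__ ((vector_size (16)));
typedef short         v8hi  __attribute__ ((vector_size (16)));
typedef float         v4sf  __attribute__ ((vector_size (16)));
typedef double        v2df  __attribute__ ((vector_size (16)));

/* An array of vectors; a pointer to one of its elements is returned below. */
static v4sf table[2];

v4sf *table_entry (size_t i)
{
  return &table[i];
}

/* The byte size is fixed at 16; the element count follows the base type. */
size_t lanes_qu (void) { return sizeof (v16qu) / sizeof (unsigned char); }
size_t lanes_df (void) { return sizeof (v2df) / sizeof (double); }
```
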
Specifying a combination that is not valid for the current architecture
causes GCC to synthesize the instructions using a narrower mode. For
example, if you declare a variable of type 'v4si' (mode V4SI) and your
architecture does not support this specific SIMD type, GCC produces code
that operates on 4 separate SImode (32-bit integer) values.

The types defined in this manner can be used with a subset of normal C
operations. Currently, GCC allows using the following operators on
these types: '+', '-', '*', '/', unary minus, '^', '|', '&', '~', '%'.

The operations behave like C++ 'valarray'. Addition is defined as the
addition of the corresponding elements of the operands. For example, in
the code below, each of the 4 elements in A is added to the
corresponding element in B and the resulting vector is stored in C.

     typedef int v4si __attribute__ ((vector_size (16)));

     v4si a, b, c;

     c = a + b;

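To make the element-wise definition concrete, this sketch (assuming GCC) pairs the vector addition from the example with a scalar loop that computes the same result; 'vec_add' and 'loop_add' are hypothetical names.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Element-wise addition with the vector extension, as in the text. */
v4si vec_add (v4si a, v4si b)
{
  return a + b;
}

/* The equivalent scalar loop, one element at a time. */
void loop_add (const int *a, const int *b, int *c, int n)
{
  for (int i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}
```

Both forms produce the same values; the vector form lets GCC use a SIMD instruction where the target has one, and otherwise falls back to narrower operations as described earlier.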
Subtraction, multiplication, division, remainder, and the bitwise
operations work in a similar manner. Likewise, the result of using the
unary minus or complement operators on a vector type is a vector whose
elements are the negative or complemented values of the corresponding
elements in the operand.

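A sketch of the remaining operators, one per hypothetical helper, assuming GCC and a 32-bit 'int'. Division and '%' follow the usual C rules lane by lane, so negative quotients truncate toward zero (for example, -8 / 3 gives -2 and -8 % 3 gives -2).

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Each helper applies one operator to every lane. */
v4si vec_neg (v4si a)         { return -a; }     /* unary minus */
v4si vec_not (v4si a)         { return ~a; }     /* bitwise complement */
v4si vec_div (v4si a, v4si b) { return a / b; }  /* truncating division */
v4si vec_mod (v4si a, v4si b) { return a % b; }  /* remainder */
v4si vec_xor (v4si a, v4si b) { return a ^ b; }  /* bitwise exclusive or */
```
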
It is possible to use the shift operators '<<' and '>>' on integer-type
vectors. The operation is defined as follows: '{a0, a1, ..., an} >>
{b0, b1, ..., bn} == {a0 >> b0, a1 >> b1, ..., an >> bn}'. Vector
operands must have the same number of elements.

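The following sketch, assuming GCC and a 32-bit 'unsigned int', shifts one vector by another lane by lane; it also shows a scalar shift count, which GCC accepts and applies to every lane. The helper names are hypothetical, and the counts stay below the element width because larger counts are not well defined.

```c
typedef unsigned int v4su __attribute__ ((vector_size (16)));

/* Vector by vector: lane i of A is shifted right by lane i of COUNTS.
   Each count should be smaller than the element width (32 bits here). */
v4su shift_right (v4su a, v4su counts)
{
  return a >> counts;
}

/* Vector by scalar: every lane is shifted left by the same amount N. */
v4su shift_left_by (v4su a, int n)
{
  return a << n;
}
```
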
For convenience, a binary vector operation may have one scalar operand.
In that case the compiler transforms the scalar operand into a vector
in which every element is that scalar. The transformation happens only
if the scalar can be safely converted to the vector element type.
Consider the following code.

     typedef int v4si __attribute__ ((vector_size (16)));

     v4si a, b, c;
     long l;

     a = b + 1;    /* a = b + {1,1,1,1}; */
     a = 2 * b;    /* a = {2,2,2,2} * b; */

     a = l + a;    /* Error where long is wider than int:
                      cannot convert long to int.  */

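A sketch of the scalar-to-vector conversion, assuming GCC: 'add_one' relies on the implicit broadcast, 'add_one_explicit' spells out the same vector with a compound literal, and 'scale' passes a 'short', which converts to 'int' without loss and is therefore accepted. All three function names are hypothetical.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* The scalar 1 is widened to {1, 1, 1, 1} by the compiler. */
v4si add_one (v4si b)
{
  return b + 1;
}

/* The same operation with the broadcast written out by hand. */
v4si add_one_explicit (v4si b)
{
  return b + (v4si){1, 1, 1, 1};
}

/* A short converts to int safely, so it may be used as the scalar. */
v4si scale (v4si b, short s)
{
  return s * b;
}
```
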
Vectors can be subscripted as if the vector were an array with the same
number of elements and base type. Out-of-bounds accesses invoke
undefined behavior at run time. Warnings for out-of-bounds vector
subscripts can be enabled with '-Warray-bounds'.

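Because subscripts work as they do on arrays, elements can be read and written in an ordinary loop. This sketch assumes GCC; 'horizontal_sum', 'set_lane' and 'V4SI_LANES' are hypothetical names, and the loop bound is derived from 'sizeof' so every index stays in range.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Element count computed from the sizes rather than hard-coded. */
#define V4SI_LANES ((int) (sizeof (v4si) / sizeof (int)))

/* Sum all elements using ordinary array subscripts. */
int horizontal_sum (v4si v)
{
  int sum = 0;
  for (int i = 0; i < V4SI_LANES; i++)
    sum += v[i];          /* every index is within 0 .. V4SI_LANES - 1 */
  return sum;
}

/* A subscripted element is an lvalue, so single lanes can be written.
   The caller must keep I in range; out-of-bounds I is undefined. */
v4si set_lane (v4si v, int i, int value)
{
  v[i] = value;
  return v;
}
```
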
Vector comparison is supported with the standard comparison operators:
'==', '!=', '<', '<=', '>', '>='. Comparison operands can be vector
expressions of integer type or real type. Comparisons between
integer-type vectors and real-type vectors are not supported. The
result of the comparison is a vector of the same width and number of
elements as the comparison operands, with a signed integral element
type.

Vectors are compared element-wise, producing 0 when the comparison is
false and -1 (a constant of the appropriate type with all bits set)
otherwise. Consider the following example.

     typedef int v4si __attribute__ ((vector_size (16)));

     v4si a = {1,2,3,4};
     v4si b = {3,2,1,4};
     v4si c;

     c = a > b;     /* The result would be {0, 0,-1, 0}  */
     c = a == b;    /* The result would be {0,-1, 0,-1}  */

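The all-bits-set convention makes a comparison result usable directly as a bit mask. The sketch below, assuming GCC, builds an element-wise maximum from 'a > b' and the bitwise operators; 'vec_max' is a hypothetical helper, not a GCC built-in.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Element-wise maximum built from a comparison mask:
   lanes where a > b are -1 (all bits set), the others are 0. */
v4si vec_max (v4si a, v4si b)
{
  v4si mask = a > b;
  return (a & mask) | (b & ~mask);
}
```
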
In C++, the ternary operator '?:' is available. 'a?b:c', where 'b' and
'c' are vectors of the same type and 'a' is an integer vector with the
same number of elements of the same size as 'b' and 'c', evaluates all
three operands and creates a vector '{a[0]?b[0]:c[0], a[1]?b[1]:c[1],
...}'. Note that, unlike in OpenCL, 'a' is thus interpreted as 'a != 0'
and not 'a < 0'. As in the case of binary operations, this syntax is
also accepted when one of 'b' or 'c' is a scalar, which is then
transformed into a vector. If both 'b' and 'c' are scalars and the type
of 'true?b:c' has the same size as the element type of 'a', then 'b' and
'c' are converted to a vector type whose elements have this type and
which has the same number of elements as 'a'.

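Since the text describes the vector '?:' for C++ only, here is a sketch of the same selection in C, assuming GCC: a comparison against a zero vector produces the mask, and the bitwise operators pick each lane from 'b' or 'c'. 'vec_select' is a hypothetical name. With 'a' = {0, 1, -1, 2}, lanes 1, 2 and 3 come from 'b'; under the OpenCL 'a < 0' rule only lane 2 would.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* C emulation of the C++ vector 'a ? b : c'.  A lane of A counts as
   true when it is nonzero (not when it is negative, as in OpenCL). */
v4si vec_select (v4si a, v4si b, v4si c)
{
  v4si zero = {0, 0, 0, 0};
  v4si take_b = a != zero;   /* -1 where a[i] != 0, 0 elsewhere */
  return (b & take_b) | (c & ~take_b);
}
```
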
In C++, the logical operators '!', '&&' and '||' are available for
vectors. '!v' is equivalent to 'v == 0', 'a && b' is equivalent to
'a!=0 & b!=0' and 'a || b' is equivalent to 'a!=0 | b!=0'. For mixed
operations between a scalar 's' and a vector 'v', 's && v' is
equivalent to 's?v!=0:0' (the evaluation is short-circuit) and 'v && s'
is equivalent to 'v!=0 & (s?-1:0)'.

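The vector-vector equivalences above can be written in C with comparisons and bitwise operators. This sketch assumes GCC; the helper names and the 'zero' constant are hypothetical. Unlike the scalar '&&', these helpers always evaluate both vectors.

```c
typedef int v4si __attribute__ ((vector_size (16)));

static const v4si zero = {0, 0, 0, 0};

/* '!v'      ==  v == 0 */
v4si vec_lnot (v4si v)         { return v == zero; }

/* 'a && b'  ==  a!=0 & b!=0 */
v4si vec_land (v4si a, v4si b) { return (a != zero) & (b != zero); }

/* 'a || b'  ==  a!=0 | b!=0 */
v4si vec_lor (v4si a, v4si b)  { return (a != zero) | (b != zero); }
```
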
Vector shuffling is available using the functions '__builtin_shuffle
(vec, mask)' and '__builtin_shuffle (vec0, vec1, mask)'. Both functions
construct a permutation of elements from one or two vectors and return
a vector of the same type as the input vector(s). MASK is an integral
vector with the same width (W) and element count (N) as the output
vector.

The elements of the input vectors are numbered in memory order,
starting at 0 for VEC0 and at N for VEC1. The elements of MASK are
taken modulo N in the single-operand case and modulo 2*N in the
two-operand case.

Consider the following example:

     typedef int v4si __attribute__ ((vector_size (16)));

     v4si a = {1,2,3,4};
     v4si b = {5,6,7,8};
     v4si mask1 = {0,1,1,3};
     v4si mask2 = {0,4,2,5};
     v4si res;

     res = __builtin_shuffle (a, mask1);       /* res is {1,2,2,4}  */
     res = __builtin_shuffle (a, b, mask2);    /* res is {1,5,3,6}  */

Note that '__builtin_shuffle' is intentionally semantically compatible
with the OpenCL 'shuffle' and 'shuffle2' functions.

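A few more masks for GCC's '__builtin_shuffle', as a sketch: reversing one vector, interleaving the low halves of two vectors, and a one-operand mask whose values exceed N and are therefore reduced modulo 4. The helper names are hypothetical, and the code assumes GCC.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Reverse the element order of one vector. */
v4si reverse (v4si v)
{
  v4si mask = {3, 2, 1, 0};
  return __builtin_shuffle (v, mask);
}

/* Interleave the low halves of two vectors:
   indices 0..3 name lanes of A, 4..7 name lanes of B. */
v4si interleave_low (v4si a, v4si b)
{
  v4si mask = {0, 4, 1, 5};
  return __builtin_shuffle (a, b, mask);
}

/* One-operand masks are taken modulo N (4 here),
   so {4, 5, 6, 7} selects the same lanes as {0, 1, 2, 3}. */
v4si identity_mod (v4si v)
{
  v4si mask = {4, 5, 6, 7};
  return __builtin_shuffle (v, mask);
}
```
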
You can declare variables of vector type and use them in function calls
and returns, as well as in assignments and some casts. You can specify
a vector type as the return type of a function, and vector types can
also be used as function arguments. It is possible to cast from one
vector type to another, provided they are of the same size (in fact,
you can also cast vectors to and from other data types of the same
size).

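Casts between same-sized vector types reinterpret the bits; they do not convert values. The sketch below, assuming GCC and IEEE 754 single-precision 'float', passes a 'v4sf' as an argument and returns its bits as a 'v4si'; 'float_bits' is a hypothetical name. For example, the lane holding 1.0f comes back as 0x3f800000.

```c
typedef int   v4si __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* A vector type used both as a parameter type and as a return type. */
v4si float_bits (v4sf f)
{
  /* Both types are 16 bytes, so the cast is allowed.  It keeps the
     bit pattern of each float; it does not round floats to ints. */
  return (v4si) f;
}
```
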
You cannot operate between vectors of different lengths or different
signedness without a cast.

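A sketch of the cast the previous paragraph calls for, assuming GCC and a 32-bit 'int': per the text, mixing a signed and an unsigned vector needs a cast, so 'add_mixed' (a hypothetical name) converts the signed operand first. Unsigned lanes wrap modulo 2^32, so -1 + 1 gives 0.

```c
typedef int          v4si __attribute__ ((vector_size (16)));
typedef unsigned int v4su __attribute__ ((vector_size (16)));

/* The signedness of A and U differs, so one operand is cast first;
   the cast keeps each lane's bits and only changes the element type. */
v4su add_mixed (v4si a, v4su u)
{
  return (v4su) a + u;
}
```
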