gcc: MIPS DSP Built-in Functions

1 
1 6.59.13 MIPS DSP Built-in Functions
1 -----------------------------------
1 
1 The MIPS DSP Application-Specific Extension (ASE) adds instructions
1 designed to improve the performance of DSP and media applications.  It
1 provides instructions that operate on packed 8-bit and 16-bit integer
1 data and on Q7, Q15 and Q31 fractional data.
1 
1  GCC supports MIPS DSP operations using both the generic vector
1 extensions (⇒Vector Extensions) and a collection of MIPS-specific
1 built-in functions.  Both kinds of support are enabled by the '-mdsp'
1 command-line option.
1 
1  Revision 2 of the ASE was introduced in the second half of 2006.  This
1 revision adds extra instructions to the original ASE, but is otherwise
1 backwards-compatible with it.  You can select revision 2 using the
1 command-line option '-mdspr2'; this option implies '-mdsp'.
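
Code that must build both with and without these options can test the macros GCC predefines for them.  The following is a minimal sketch: it assumes the '__mips_dsp' and '__mips_dsp_rev' macros described for '-mdsp' and '-mdspr2' in the MIPS options section of the manual, and 'dsp_ase_revision' is a hypothetical helper name.

```c
/* Return the DSP ASE revision this translation unit was compiled for:
   0 when the DSP ASE is not enabled, otherwise 1 or 2.
   Assumes GCC's predefined macros __mips_dsp and __mips_dsp_rev.  */
int dsp_ase_revision (void)
{
#if defined (__mips_dsp) && defined (__mips_dsp_rev)
  return __mips_dsp_rev;
#elif defined (__mips_dsp)
  return 1;
#else
  return 0;
#endif
}
```

On a non-MIPS target, or without '-mdsp', the function simply returns 0, which makes it usable as a guard around the DSP-specific code paths.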
1 
1  The SCOUNT and POS bits of the DSP control register are global.  The
1 WRDSP, EXTPDP, EXTPDPV and MTHLIP instructions modify the SCOUNT and POS
1 bits.  Because these instructions change global state, the compiler
1 does not delete them during optimization, and it does not delete calls
1 to functions that contain them.
1 
1  At present, GCC only provides support for operations on 32-bit vectors.
1 The vector type associated with 8-bit integer data is usually called
1 'v4i8', the vector type associated with Q7 is usually called 'v4q7', the
1 vector type associated with 16-bit integer data is usually called
1 'v2i16', and the vector type associated with Q15 is usually called
1 'v2q15'.  They can be defined in C as follows:
1 
1      typedef signed char v4i8 __attribute__ ((vector_size(4)));
1      typedef signed char v4q7 __attribute__ ((vector_size(4)));
1      typedef short v2i16 __attribute__ ((vector_size(4)));
1      typedef short v2q15 __attribute__ ((vector_size(4)));
1 
1  'v4i8', 'v4q7', 'v2i16' and 'v2q15' values are initialized in the same
1 way as aggregates.  For example:
1 
1      v4i8 a = {1, 2, 3, 4};
1      v4i8 b;
1      b = (v4i8) {5, 6, 7, 8};
1 
1      v2q15 c = {0x0fcb, 0x3a75};
1      v2q15 d;
1      d = (v2q15) {0.1234 * 0x1.0p15, 0.4567 * 0x1.0p15};
1 
1  _Note:_ The CPU's endianness determines the order in which values are
1 packed.  On little-endian targets, the first value is the least
1 significant and the last value is the most significant.  The opposite
1 order applies to big-endian targets.  For example, the code above sets
1 the lowest byte of 'a' to '1' on little-endian targets and '4' on
1 big-endian targets.
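
The packing order can be observed by copying a vector into a 32-bit integer.  This sketch uses only the generic vector extension, so it should compile on any GCC target; the helper names are illustrative and not part of GCC.

```c
#include <stdint.h>
#include <string.h>

typedef signed char v4i8 __attribute__ ((vector_size(4)));

/* Return the least significant byte of the 32-bit word that holds V.  */
unsigned lowest_byte (v4i8 v)
{
  uint32_t word;
  memcpy (&word, &v, sizeof word);   /* reinterpret the four packed bytes */
  return word & 0xff;
}

/* Apply lowest_byte to the vector 'a' from the example above.  */
unsigned lowest_byte_of_example (void)
{
  v4i8 a = {1, 2, 3, 4};
  return lowest_byte (a);
}
```

'lowest_byte_of_example' returns 1 on a little-endian target and 4 on a big-endian one, matching the note above.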
1 
1  _Note:_ Q7, Q15 and Q31 values must be initialized with their integer
1 representation.  As shown in the example above, the integer
1 representation of a Q15 value can be obtained by multiplying the
1 fractional value by '0x1.0p15'.  The equivalent for Q7 values is to
1 multiply by '0x1.0p7', and for Q31 values by '0x1.0p31'.
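
The scaling rule can be wrapped in small conversion helpers.  This is only a sketch: the function names are hypothetical, the conversions truncate toward zero in the same way as the initializers above, and no range checking is done, so the input is assumed to lie in [-1, 1).

```c
/* Fraction -> Q15 integer representation (scale by 2^15, truncate).  */
short to_q15 (double x)
{
  return (short) (x * 0x1.0p15);
}

/* Q15 integer representation -> fraction.  */
double from_q15 (short q)
{
  return q / 0x1.0p15;
}

/* Fraction -> Q31 integer representation (scale by 2^31, truncate).  */
int to_q31 (double x)
{
  return (int) (x * 0x1.0p31);
}
```

With these helpers, 'to_q15 (0.1234)' yields '0x0fcb' and 'to_q15 (0.4567)' yields '0x3a75', the same values used to initialize 'c' in the earlier example.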
1 
1  The table below lists the 'v4i8' and 'v2q15' operations for which
1 hardware support exists.  'a' and 'b' are 'v4i8' values, and 'c' and 'd'
1 are 'v2q15' values.
1 
1 C code                               MIPS instruction
1 'a + b'                              'addu.qb'
1 'c + d'                              'addq.ph'
1 'a - b'                              'subu.qb'
1 'c - d'                              'subq.ph'
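
The operators in the table are ordinary vector-extension arithmetic, so the C source is the same on every target; only the generated instruction differs.  The sketch below uses hypothetical function names and sample values.  Whether GCC actually selects 'addu.qb' or 'subq.ph' for it depends on compiling for MIPS with '-mdsp'.

```c
typedef signed char v4i8 __attribute__ ((vector_size(4)));
typedef short v2q15 __attribute__ ((vector_size(4)));

/* Element-wise byte addition ('a + b' in the table).  */
v4i8 add_bytes (v4i8 a, v4i8 b)
{
  return a + b;
}

/* Element-wise Q15 subtraction ('c - d' in the table).  */
v2q15 sub_halfwords (v2q15 c, v2q15 d)
{
  return c - d;
}

/* Lane I of a + b for the sample vectors {1,2,3,4} and {5,6,7,8}.  */
int demo_sum_lane (int i)
{
  v4i8 a = {1, 2, 3, 4};
  v4i8 b = {5, 6, 7, 8};
  v4i8 r = add_bytes (a, b);
  return r[i];
}

/* Lane I of c - d for two sample Q15 vectors.  */
int demo_diff_lane (int i)
{
  v2q15 c = {0x4000, 0x2000};
  v2q15 d = {0x1000, 0x1000};
  v2q15 r = sub_halfwords (c, d);
  return r[i];
}
```

These operators do not saturate.  Code that needs saturating arithmetic has to call the corresponding '_s_' built-in functions listed later in this section.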
1 
1  The table below lists the 'v2i16' operation for which hardware support
1 exists in revision 2 of the DSP ASE.  'e' and 'f' are 'v2i16' values.
1 
1 C code                               MIPS instruction
1 'e * f'                              'mul.ph'
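
The multiplication can be written the same way.  In this sketch the function names and the sample values are illustrative, and the values are chosen so that each product fits in 16 bits.

```c
typedef short v2i16 __attribute__ ((vector_size(4)));

/* Element-wise 16-bit multiplication ('e * f' in the table).  */
v2i16 mul_halfwords (v2i16 e, v2i16 f)
{
  return e * f;
}

/* Lane I of e * f for the sample vectors {3, -4} and {5, 6}.  */
int demo_product_lane (int i)
{
  v2i16 e = {3, -4};
  v2i16 f = {5, 6};
  v2i16 r = mul_halfwords (e, f);
  return r[i];
}
```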
1 
1  It is easier to describe the DSP built-in functions if we first define
1 the following types:
1 
1      typedef int q31;
1      typedef int i32;
1      typedef unsigned int ui32;
1      typedef long long a64;
1 
1  'q31' and 'i32' are actually the same as 'int', but we use 'q31' to
1 indicate a Q31 fractional value and 'i32' to indicate a 32-bit integer
1 value.  Similarly, 'a64' is the same as 'long long', but we use 'a64' to
1 indicate values that are placed in one of the four DSP accumulators
1 ('$ac0', '$ac1', '$ac2' or '$ac3').
1 
1  Some built-in functions prefer or require immediate operands, because
1 the corresponding DSP instructions accept either an immediate or a
1 register operand, or accept only an immediate.  The ranges of the
1 immediate parameters are as follows.
1 
1      imm0_3: 0 to 3.
1      imm0_7: 0 to 7.
1      imm0_15: 0 to 15.
1      imm0_31: 0 to 31.
1      imm0_63: 0 to 63.
1      imm0_255: 0 to 255.
1      imm_n32_31: -32 to 31.
1      imm_n512_511: -512 to 511.
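
As an illustration of an immediate parameter, the sketch below passes a literal shift count in the 'imm0_7' range to '__builtin_mips_shll_qb', one of the built-in functions listed later in this section.  The wrapper name is hypothetical.  The '#else' branch is a portable stand-in written for this example, based on the assumption that each byte is shifted left and the bits shifted out are discarded, so the code also compiles when the DSP ASE is not enabled.

```c
typedef signed char v4i8 __attribute__ ((vector_size(4)));

/* Shift every byte of V left by 3.  The count is a literal constant,
   so it is a valid imm0_7 argument.  */
v4i8 shift_bytes_left_3 (v4i8 v)
{
#ifdef __mips_dsp
  return __builtin_mips_shll_qb (v, 3);
#else
  /* Stand-in for targets without the DSP ASE (an assumption about the
     result, not GCC's implementation).  */
  v4i8 r = v;
  for (int i = 0; i < 4; i++)
    r[i] = (signed char) (unsigned char) ((unsigned char) v[i] << 3);
  return r;
#endif
}

/* Lane I of the shifted sample vector {1, 2, 3, 4}.  */
int demo_shift_lane (int i)
{
  v4i8 v = {1, 2, 3, 4};
  v4i8 r = shift_bytes_left_3 (v);
  return r[i];
}
```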
1 
1  The following built-in functions map directly to a particular MIPS DSP
1 instruction.  Please refer to the architecture specification for details
1 on what each instruction does.
1 
1      v2q15 __builtin_mips_addq_ph (v2q15, v2q15)
1      v2q15 __builtin_mips_addq_s_ph (v2q15, v2q15)
1      q31 __builtin_mips_addq_s_w (q31, q31)
1      v4i8 __builtin_mips_addu_qb (v4i8, v4i8)
1      v4i8 __builtin_mips_addu_s_qb (v4i8, v4i8)
1      v2q15 __builtin_mips_subq_ph (v2q15, v2q15)
1      v2q15 __builtin_mips_subq_s_ph (v2q15, v2q15)
1      q31 __builtin_mips_subq_s_w (q31, q31)
1      v4i8 __builtin_mips_subu_qb (v4i8, v4i8)
1      v4i8 __builtin_mips_subu_s_qb (v4i8, v4i8)
1      i32 __builtin_mips_addsc (i32, i32)
1      i32 __builtin_mips_addwc (i32, i32)
1      i32 __builtin_mips_modsub (i32, i32)
1      i32 __builtin_mips_raddu_w_qb (v4i8)
1      v2q15 __builtin_mips_absq_s_ph (v2q15)
1      q31 __builtin_mips_absq_s_w (q31)
1      v4i8 __builtin_mips_precrq_qb_ph (v2q15, v2q15)
1      v2q15 __builtin_mips_precrq_ph_w (q31, q31)
1      v2q15 __builtin_mips_precrq_rs_ph_w (q31, q31)
1      v4i8 __builtin_mips_precrqu_s_qb_ph (v2q15, v2q15)
1      q31 __builtin_mips_preceq_w_phl (v2q15)
1      q31 __builtin_mips_preceq_w_phr (v2q15)
1      v2q15 __builtin_mips_precequ_ph_qbl (v4i8)
1      v2q15 __builtin_mips_precequ_ph_qbr (v4i8)
1      v2q15 __builtin_mips_precequ_ph_qbla (v4i8)
1      v2q15 __builtin_mips_precequ_ph_qbra (v4i8)
1      v2q15 __builtin_mips_preceu_ph_qbl (v4i8)
1      v2q15 __builtin_mips_preceu_ph_qbr (v4i8)
1      v2q15 __builtin_mips_preceu_ph_qbla (v4i8)
1      v2q15 __builtin_mips_preceu_ph_qbra (v4i8)
1      v4i8 __builtin_mips_shll_qb (v4i8, imm0_7)
1      v4i8 __builtin_mips_shll_qb (v4i8, i32)
1      v2q15 __builtin_mips_shll_ph (v2q15, imm0_15)
1      v2q15 __builtin_mips_shll_ph (v2q15, i32)
1      v2q15 __builtin_mips_shll_s_ph (v2q15, imm0_15)
1      v2q15 __builtin_mips_shll_s_ph (v2q15, i32)
1      q31 __builtin_mips_shll_s_w (q31, imm0_31)
1      q31 __builtin_mips_shll_s_w (q31, i32)
1      v4i8 __builtin_mips_shrl_qb (v4i8, imm0_7)
1      v4i8 __builtin_mips_shrl_qb (v4i8, i32)
1      v2q15 __builtin_mips_shra_ph (v2q15, imm0_15)
1      v2q15 __builtin_mips_shra_ph (v2q15, i32)
1      v2q15 __builtin_mips_shra_r_ph (v2q15, imm0_15)
1      v2q15 __builtin_mips_shra_r_ph (v2q15, i32)
1      q31 __builtin_mips_shra_r_w (q31, imm0_31)
1      q31 __builtin_mips_shra_r_w (q31, i32)
1      v2q15 __builtin_mips_muleu_s_ph_qbl (v4i8, v2q15)
1      v2q15 __builtin_mips_muleu_s_ph_qbr (v4i8, v2q15)
1      v2q15 __builtin_mips_mulq_rs_ph (v2q15, v2q15)
1      q31 __builtin_mips_muleq_s_w_phl (v2q15, v2q15)
1      q31 __builtin_mips_muleq_s_w_phr (v2q15, v2q15)
1      a64 __builtin_mips_dpau_h_qbl (a64, v4i8, v4i8)
1      a64 __builtin_mips_dpau_h_qbr (a64, v4i8, v4i8)
1      a64 __builtin_mips_dpsu_h_qbl (a64, v4i8, v4i8)
1      a64 __builtin_mips_dpsu_h_qbr (a64, v4i8, v4i8)
1      a64 __builtin_mips_dpaq_s_w_ph (a64, v2q15, v2q15)
1      a64 __builtin_mips_dpaq_sa_l_w (a64, q31, q31)
1      a64 __builtin_mips_dpsq_s_w_ph (a64, v2q15, v2q15)
1      a64 __builtin_mips_dpsq_sa_l_w (a64, q31, q31)
1      a64 __builtin_mips_mulsaq_s_w_ph (a64, v2q15, v2q15)
1      a64 __builtin_mips_maq_s_w_phl (a64, v2q15, v2q15)
1      a64 __builtin_mips_maq_s_w_phr (a64, v2q15, v2q15)
1      a64 __builtin_mips_maq_sa_w_phl (a64, v2q15, v2q15)
1      a64 __builtin_mips_maq_sa_w_phr (a64, v2q15, v2q15)
1      i32 __builtin_mips_bitrev (i32)
1      i32 __builtin_mips_insv (i32, i32)
1      v4i8 __builtin_mips_repl_qb (imm0_255)
1      v4i8 __builtin_mips_repl_qb (i32)
1      v2q15 __builtin_mips_repl_ph (imm_n512_511)
1      v2q15 __builtin_mips_repl_ph (i32)
1      void __builtin_mips_cmpu_eq_qb (v4i8, v4i8)
1      void __builtin_mips_cmpu_lt_qb (v4i8, v4i8)
1      void __builtin_mips_cmpu_le_qb (v4i8, v4i8)
1      i32 __builtin_mips_cmpgu_eq_qb (v4i8, v4i8)
1      i32 __builtin_mips_cmpgu_lt_qb (v4i8, v4i8)
1      i32 __builtin_mips_cmpgu_le_qb (v4i8, v4i8)
1      void __builtin_mips_cmp_eq_ph (v2q15, v2q15)
1      void __builtin_mips_cmp_lt_ph (v2q15, v2q15)
1      void __builtin_mips_cmp_le_ph (v2q15, v2q15)
1      v4i8 __builtin_mips_pick_qb (v4i8, v4i8)
1      v2q15 __builtin_mips_pick_ph (v2q15, v2q15)
1      v2q15 __builtin_mips_packrl_ph (v2q15, v2q15)
1      i32 __builtin_mips_extr_w (a64, imm0_31)
1      i32 __builtin_mips_extr_w (a64, i32)
1      i32 __builtin_mips_extr_r_w (a64, imm0_31)
1      i32 __builtin_mips_extr_r_w (a64, i32)
1      i32 __builtin_mips_extr_rs_w (a64, imm0_31)
1      i32 __builtin_mips_extr_rs_w (a64, i32)
1      i32 __builtin_mips_extr_s_h (a64, imm0_31)
1      i32 __builtin_mips_extr_s_h (a64, i32)
1      i32 __builtin_mips_extp (a64, imm0_31)
1      i32 __builtin_mips_extp (a64, i32)
1      i32 __builtin_mips_extpdp (a64, imm0_31)
1      i32 __builtin_mips_extpdp (a64, i32)
1      a64 __builtin_mips_shilo (a64, imm_n32_31)
1      a64 __builtin_mips_shilo (a64, i32)
1      a64 __builtin_mips_mthlip (a64, i32)
1      void __builtin_mips_wrdsp (i32, imm0_63)
1      i32 __builtin_mips_rddsp (imm0_63)
1      i32 __builtin_mips_lbux (void *, i32)
1      i32 __builtin_mips_lhx (void *, i32)
1      i32 __builtin_mips_lwx (void *, i32)
1      a64 __builtin_mips_ldx (void *, i32) [MIPS64 only]
1      i32 __builtin_mips_bposge32 (void)
1      a64 __builtin_mips_madd (a64, i32, i32)
1      a64 __builtin_mips_maddu (a64, ui32, ui32)
1      a64 __builtin_mips_msub (a64, i32, i32)
1      a64 __builtin_mips_msubu (a64, ui32, ui32)
1      a64 __builtin_mips_mult (i32, i32)
1      a64 __builtin_mips_multu (ui32, ui32)
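
A short usage sketch for two of the functions above follows.  The wrapper names are hypothetical.  The '#else' branches are portable stand-ins written for this example from the usual meaning of a saturating Q15 add and of a signed multiply-accumulate; they are assumptions, not a substitute for the architecture specification.

```c
typedef short v2q15 __attribute__ ((vector_size(4)));
typedef int i32;
typedef long long a64;

/* Saturating Q15 addition of two packed halfword vectors.  */
v2q15 add_q15_saturating (v2q15 x, v2q15 y)
{
#ifdef __mips_dsp
  return __builtin_mips_addq_s_ph (x, y);
#else
  v2q15 r = x;
  for (int i = 0; i < 2; i++)
    {
      int sum = x[i] + y[i];            /* computed in int, no overflow */
      if (sum > 32767) sum = 32767;     /* clamp to the Q15 maximum */
      if (sum < -32768) sum = -32768;   /* clamp to the Q15 minimum */
      r[i] = (short) sum;
    }
  return r;
#endif
}

/* acc + x * y, with the running total held in a 64-bit accumulator.  */
a64 multiply_accumulate (a64 acc, i32 x, i32 y)
{
#ifdef __mips_dsp
  return __builtin_mips_madd (acc, x, y);
#else
  return acc + (a64) x * y;
#endif
}

/* Lane I of a sum that overflows Q15 in both directions.  */
int demo_saturating_lane (int i)
{
  v2q15 x = {0x7000, -0x7000};
  v2q15 y = {0x2000, -0x2000};
  v2q15 r = add_q15_saturating (x, y);
  return r[i];
}
```

Passing the accumulator in and returning it as an 'a64' value is what lets a sequence of such calls chain through one of the DSP accumulators.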
1 
1  The following built-in functions map directly to a particular
1 instruction from revision 2 of the MIPS DSP ASE.  Please refer to the
1 architecture specification for details on what each instruction does.
1 
1      v4q7 __builtin_mips_absq_s_qb (v4q7);
1      v2i16 __builtin_mips_addu_ph (v2i16, v2i16);
1      v2i16 __builtin_mips_addu_s_ph (v2i16, v2i16);
1      v4i8 __builtin_mips_adduh_qb (v4i8, v4i8);
1      v4i8 __builtin_mips_adduh_r_qb (v4i8, v4i8);
1      i32 __builtin_mips_append (i32, i32, imm0_31);
1      i32 __builtin_mips_balign (i32, i32, imm0_3);
1      i32 __builtin_mips_cmpgdu_eq_qb (v4i8, v4i8);
1      i32 __builtin_mips_cmpgdu_lt_qb (v4i8, v4i8);
1      i32 __builtin_mips_cmpgdu_le_qb (v4i8, v4i8);
1      a64 __builtin_mips_dpa_w_ph (a64, v2i16, v2i16);
1      a64 __builtin_mips_dps_w_ph (a64, v2i16, v2i16);
1      v2i16 __builtin_mips_mul_ph (v2i16, v2i16);
1      v2i16 __builtin_mips_mul_s_ph (v2i16, v2i16);
1      q31 __builtin_mips_mulq_rs_w (q31, q31);
1      v2q15 __builtin_mips_mulq_s_ph (v2q15, v2q15);
1      q31 __builtin_mips_mulq_s_w (q31, q31);
1      a64 __builtin_mips_mulsa_w_ph (a64, v2i16, v2i16);
1      v4i8 __builtin_mips_precr_qb_ph (v2i16, v2i16);
1      v2i16 __builtin_mips_precr_sra_ph_w (i32, i32, imm0_31);
1      v2i16 __builtin_mips_precr_sra_r_ph_w (i32, i32, imm0_31);
1      i32 __builtin_mips_prepend (i32, i32, imm0_31);
1      v4i8 __builtin_mips_shra_qb (v4i8, imm0_7);
1      v4i8 __builtin_mips_shra_r_qb (v4i8, imm0_7);
1      v4i8 __builtin_mips_shra_qb (v4i8, i32);
1      v4i8 __builtin_mips_shra_r_qb (v4i8, i32);
1      v2i16 __builtin_mips_shrl_ph (v2i16, imm0_15);
1      v2i16 __builtin_mips_shrl_ph (v2i16, i32);
1      v2i16 __builtin_mips_subu_ph (v2i16, v2i16);
1      v2i16 __builtin_mips_subu_s_ph (v2i16, v2i16);
1      v4i8 __builtin_mips_subuh_qb (v4i8, v4i8);
1      v4i8 __builtin_mips_subuh_r_qb (v4i8, v4i8);
1      v2q15 __builtin_mips_addqh_ph (v2q15, v2q15);
1      v2q15 __builtin_mips_addqh_r_ph (v2q15, v2q15);
1      q31 __builtin_mips_addqh_w (q31, q31);
1      q31 __builtin_mips_addqh_r_w (q31, q31);
1      v2q15 __builtin_mips_subqh_ph (v2q15, v2q15);
1      v2q15 __builtin_mips_subqh_r_ph (v2q15, v2q15);
1      q31 __builtin_mips_subqh_w (q31, q31);
1      q31 __builtin_mips_subqh_r_w (q31, q31);
1      a64 __builtin_mips_dpax_w_ph (a64, v2i16, v2i16);
1      a64 __builtin_mips_dpsx_w_ph (a64, v2i16, v2i16);
1      a64 __builtin_mips_dpaqx_s_w_ph (a64, v2q15, v2q15);
1      a64 __builtin_mips_dpaqx_sa_w_ph (a64, v2q15, v2q15);
1      a64 __builtin_mips_dpsqx_s_w_ph (a64, v2q15, v2q15);
1      a64 __builtin_mips_dpsqx_sa_w_ph (a64, v2q15, v2q15);
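
The revision 2 functions can be guarded in the same way as the original ones.  The sketch below assumes the predefined macro '__mips_dsp_rev' and uses a hypothetical wrapper name.  Its fallback branch assumes that '__builtin_mips_addqh_ph' adds each pair of halfwords and halves the sum without rounding; check the architecture specification before relying on that.

```c
typedef short v2q15 __attribute__ ((vector_size(4)));

/* Halved sum of two packed Q15 vectors.  */
v2q15 average_q15 (v2q15 x, v2q15 y)
{
#if defined (__mips_dsp_rev) && __mips_dsp_rev >= 2
  return __builtin_mips_addqh_ph (x, y);
#else
  /* Stand-in for targets without DSP ASE revision 2 (an assumption).  */
  v2q15 r = x;
  for (int i = 0; i < 2; i++)
    r[i] = (short) ((x[i] + y[i]) >> 1);
  return r;
#endif
}

/* Lane I of the halved sum of two sample vectors.  */
int demo_average_lane (int i)
{
  v2q15 x = {0x4000, 0x2000};
  v2q15 y = {0x2000, 0x1000};
  v2q15 r = average_q15 (x, y);
  return r[i];
}
```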
1