gawk: Bitwise Functions

1 
1 9.1.6 Bit-Manipulation Functions
1 --------------------------------
1 
1      I can explain it for you, but I can't understand it for you.
1                             -- _Anonymous_
1 
1    Many languages provide the ability to perform "bitwise" operations on
1 two integer numbers.  In other words, the operation is performed on each
1 successive pair of bits in the operands.  Three common operations are
11 bitwise AND, OR, and XOR. The operations are described in ⇒Table
 9.6 table-bitwise-ops.
1 
1                 Bit operator
1           |  AND  |   OR  |  XOR
1           |---+---+---+---+---+---
1 Operands  | 0 | 1 | 0 | 1 | 0 | 1
1 ----------+---+---+---+---+---+---
1     0     | 0   0 | 0   1 | 0   1
1     1     | 0   1 | 1   1 | 1   0
1 
1 Table 9.6: Bitwise operations
1 
1    As you can see, the result of an AND operation is 1 only when _both_
1 bits are 1.  The result of an OR operation is 1 if _either_ bit is 1.
1 The result of an XOR operation is 1 if either bit is 1, but not both.
1 The next operation is the "complement"; the complement of 1 is 0 and the
1 complement of 0 is 1.  Thus, this operation "flips" all the bits of a
1 given value.
1 
1    Finally, two other common operations are to shift the bits left or
1 right.  For example, if you have a bit string '10111001' and you shift
1 it right by three bits, you end up with '00010111'.(1)  If you start
1 over again with '10111001' and shift it left by three bits, you end up
1 with '11001000'.  The following list describes 'gawk''s built-in
1 functions that implement the bitwise operations.  Optional parameters
1 are enclosed in square brackets ([ ]):
1 
1 'and(V1, V2 [, ...])'
1      Return the bitwise AND of the arguments.  There must be at least
1      two.
1 
1 'compl(VAL)'
1      Return the bitwise complement of VAL.
1 
1 'lshift(VAL, COUNT)'
1      Return the value of VAL, shifted left by COUNT bits.
1 
1 'or(V1, V2 [, ...])'
1      Return the bitwise OR of the arguments.  There must be at least
1      two.
1 
1 'rshift(VAL, COUNT)'
1      Return the value of VAL, shifted right by COUNT bits.
1 
1 'xor(V1, V2 [, ...])'
1      Return the bitwise XOR of the arguments.  There must be at least
1      two.
1 
1      CAUTION: Beginning with 'gawk' version 4.2, negative operands are
1      not allowed for any of these functions.  A negative operand
1      produces a fatal error.  See the sidebar "Beware The Smoke and
1      Mirrors!"  for more information as to why.
1 
1    Here is a user-defined function (⇒User-defined) that
1 illustrates the use of these functions:
1 
1      # bits2str --- turn an integer into readable ones and zeros
1 
1      function bits2str(bits,        data, mask)
1      {
1          if (bits == 0)
1              return "0"
1 
1          mask = 1
1          for (; bits != 0; bits = rshift(bits, 1))
1              data = (and(bits, mask) ? "1" : "0") data
1 
1          while ((length(data) % 8) != 0)
1              data = "0" data
1 
1          return data
1      }
1 
1      BEGIN {
1          printf "123 = %s\n", bits2str(123)
1          printf "0123 = %s\n", bits2str(0123)
1          printf "0x99 = %s\n", bits2str(0x99)
1          comp = compl(0x99)
1          printf "compl(0x99) = %#x = %s\n", comp, bits2str(comp)
1          shift = lshift(0x99, 2)
1          printf "lshift(0x99, 2) = %#x = %s\n", shift, bits2str(shift)
1          shift = rshift(0x99, 2)
1          printf "rshift(0x99, 2) = %#x = %s\n", shift, bits2str(shift)
1      }
1 
1 This program produces the following output when run:
1 
1      $ gawk -f testbits.awk
1      -| 123 = 01111011
1      -| 0123 = 01010011
1      -| 0x99 = 10011001
1      -| compl(0x99) = 0x3fffffffffff66 =
1      -| 00111111111111111111111111111111111111111111111101100110
1      -| lshift(0x99, 2) = 0x264 = 0000001001100100
1      -| rshift(0x99, 2) = 0x26 = 00100110
1 
1    The 'bits2str()' function turns a binary number into a string.
1 Initializing 'mask' to one creates a binary value where the rightmost
1 bit is set to one.  Using this mask, the function repeatedly checks the
1 rightmost bit.  ANDing the mask with the value indicates whether the
1 rightmost bit is one or not.  If so, a '"1"' is concatenated onto the
1 front of the string.  Otherwise, a '"0"' is added.  The value is then
1 shifted right by one bit and the loop continues until there are no more
1 one bits.
1 
1    If the initial value is zero, it returns a simple '"0"'.  Otherwise,
1 at the end, it pads the value with zeros to represent multiples of 8-bit
1 quantities.  This is typical in modern computers.
1 
1    The main code in the 'BEGIN' rule shows the difference between the
11 decimal and octal values for the same numbers (⇒
 Nondecimal-numbers), and then demonstrates the results of the
1 'compl()', 'lshift()', and 'rshift()' functions.
1 
1                      Beware The Smoke and Mirrors!
1 
1    It other languages, bitwise operations are performed on integer
1 values, not floating-point values.  As a general statement, such
1 operations work best when performed on unsigned integers.
1 
1    'gawk' attempts to treat the arguments to the bitwise functions as
1 unsigned integers.  For this reason, negative arguments produce a fatal
1 error.
1 
1    In normal operation, for all of these functions, first the
1 double-precision floating-point value is converted to the widest C
1 unsigned integer type, then the bitwise operation is performed.  If the
1 result cannot be represented exactly as a C 'double', leading nonzero
1 bits are removed one by one until it can be represented exactly.  The
1 result is then converted back into a C 'double'.(2)
1 
1    However, when using arbitrary precision arithmetic with the '-M'
1 option (⇒Arbitrary Precision Arithmetic), the results may differ.
1 This is particularly noticeable with the 'compl()' function:
1 
1      $ gawk 'BEGIN { print compl(42) }'
1      -| 9007199254740949
1      $ gawk -M 'BEGIN { print compl(42) }'
1      -| -43
1 
1    What's going on becomes clear when printing the results in
1 hexadecimal:
1 
1      $ gawk 'BEGIN { printf "%#x\n", compl(42) }'
1      -| 0x1fffffffffffd5
1      $ gawk -M 'BEGIN { printf "%#x\n", compl(42) }'
1      -| 0xffffffffffffffd5
1 
1    When using the '-M' option, under the hood, 'gawk' uses GNU MP
1 arbitrary precision integers which have at least 64 bits of precision.
1 When not using '-M', 'gawk' stores integral values in regular
1 double-precision floating point, which only maintain 53 bits of
1 precision.  Furthermore, the GNU MP library treats (or at least seems to
1 treat) the leading bit as a sign bit; thus the result with '-M' in this
1 case is a negative number.
1 
1    In short, using 'gawk' for any but the simplest kind of bitwise
1 operations is probably a bad idea; caveat emptor!
1 
1    ---------- Footnotes ----------
1 
1    (1) This example shows that zeros come in on the left side.  For
1 'gawk', this is always true, but in some languages, it's possible to
1 have the left side fill with ones.
1 
1    (2) If you don't understand this paragraph, the upshot is that 'gawk'
1 can only store a particular range of integer values; numbers outside
1 that range are reduced to fit within the range.
1