gawk: Bitwise Functions
1
1 9.1.6 Bit-Manipulation Functions
1 --------------------------------
1
1 I can explain it for you, but I can't understand it for you.
1 -- _Anonymous_
1
1 Many languages provide the ability to perform "bitwise" operations on
1 two integer numbers. In other words, the operation is performed on each
1 successive pair of bits in the operands. Three common operations are
11 bitwise AND, OR, and XOR. The operations are described in ⇒Table
9.6 table-bitwise-ops.
1
1 Bit operator
1 | AND | OR | XOR
1 |---+---+---+---+---+---
1 Operands | 0 | 1 | 0 | 1 | 0 | 1
1 ----------+---+---+---+---+---+---
1 0 | 0 0 | 0 1 | 0 1
1 1 | 0 1 | 1 1 | 1 0
1
1 Table 9.6: Bitwise operations
1
1 As you can see, the result of an AND operation is 1 only when _both_
1 bits are 1. The result of an OR operation is 1 if _either_ bit is 1.
1 The result of an XOR operation is 1 if either bit is 1, but not both.
1 The next operation is the "complement"; the complement of 1 is 0 and the
1 complement of 0 is 1. Thus, this operation "flips" all the bits of a
1 given value.
1
1 Finally, two other common operations are to shift the bits left or
1 right. For example, if you have a bit string '10111001' and you shift
1 it right by three bits, you end up with '00010111'.(1) If you start
1 over again with '10111001' and shift it left by three bits, you end up
1 with '11001000'. The following list describes 'gawk''s built-in
1 functions that implement the bitwise operations. Optional parameters
1 are enclosed in square brackets ([ ]):
1
1 'and(V1, V2 [, ...])'
1 Return the bitwise AND of the arguments. There must be at least
1 two.
1
1 'compl(VAL)'
1 Return the bitwise complement of VAL.
1
1 'lshift(VAL, COUNT)'
1 Return the value of VAL, shifted left by COUNT bits.
1
1 'or(V1, V2 [, ...])'
1 Return the bitwise OR of the arguments. There must be at least
1 two.
1
1 'rshift(VAL, COUNT)'
1 Return the value of VAL, shifted right by COUNT bits.
1
1 'xor(V1, V2 [, ...])'
1 Return the bitwise XOR of the arguments. There must be at least
1 two.
1
1 CAUTION: Beginning with 'gawk' version 4.2, negative operands are
1 not allowed for any of these functions. A negative operand
1 produces a fatal error. See the sidebar "Beware The Smoke and
1 Mirrors!" for more information as to why.
1
1 Here is a user-defined function (⇒User-defined) that
1 illustrates the use of these functions:
1
1 # bits2str --- turn an integer into readable ones and zeros
1
1 function bits2str(bits, data, mask)
1 {
1 if (bits == 0)
1 return "0"
1
1 mask = 1
1 for (; bits != 0; bits = rshift(bits, 1))
1 data = (and(bits, mask) ? "1" : "0") data
1
1 while ((length(data) % 8) != 0)
1 data = "0" data
1
1 return data
1 }
1
1 BEGIN {
1 printf "123 = %s\n", bits2str(123)
1 printf "0123 = %s\n", bits2str(0123)
1 printf "0x99 = %s\n", bits2str(0x99)
1 comp = compl(0x99)
1 printf "compl(0x99) = %#x = %s\n", comp, bits2str(comp)
1 shift = lshift(0x99, 2)
1 printf "lshift(0x99, 2) = %#x = %s\n", shift, bits2str(shift)
1 shift = rshift(0x99, 2)
1 printf "rshift(0x99, 2) = %#x = %s\n", shift, bits2str(shift)
1 }
1
1 This program produces the following output when run:
1
1 $ gawk -f testbits.awk
1 -| 123 = 01111011
1 -| 0123 = 01010011
1 -| 0x99 = 10011001
1 -| compl(0x99) = 0x3fffffffffff66 =
1 -| 00111111111111111111111111111111111111111111111101100110
1 -| lshift(0x99, 2) = 0x264 = 0000001001100100
1 -| rshift(0x99, 2) = 0x26 = 00100110
1
1 The 'bits2str()' function turns a binary number into a string.
1 Initializing 'mask' to one creates a binary value where the rightmost
1 bit is set to one. Using this mask, the function repeatedly checks the
1 rightmost bit. ANDing the mask with the value indicates whether the
1 rightmost bit is one or not. If so, a '"1"' is concatenated onto the
1 front of the string. Otherwise, a '"0"' is added. The value is then
1 shifted right by one bit and the loop continues until there are no more
1 one bits.
1
1 If the initial value is zero, it returns a simple '"0"'. Otherwise,
1 at the end, it pads the value with zeros to represent multiples of 8-bit
1 quantities. This is typical in modern computers.
1
1 The main code in the 'BEGIN' rule shows the difference between the
11 decimal and octal values for the same numbers (⇒
Nondecimal-numbers), and then demonstrates the results of the
1 'compl()', 'lshift()', and 'rshift()' functions.
1
1 Beware The Smoke and Mirrors!
1
1 It other languages, bitwise operations are performed on integer
1 values, not floating-point values. As a general statement, such
1 operations work best when performed on unsigned integers.
1
1 'gawk' attempts to treat the arguments to the bitwise functions as
1 unsigned integers. For this reason, negative arguments produce a fatal
1 error.
1
1 In normal operation, for all of these functions, first the
1 double-precision floating-point value is converted to the widest C
1 unsigned integer type, then the bitwise operation is performed. If the
1 result cannot be represented exactly as a C 'double', leading nonzero
1 bits are removed one by one until it can be represented exactly. The
1 result is then converted back into a C 'double'.(2)
1
1 However, when using arbitrary precision arithmetic with the '-M'
1 option (⇒Arbitrary Precision Arithmetic), the results may differ.
1 This is particularly noticeable with the 'compl()' function:
1
1 $ gawk 'BEGIN { print compl(42) }'
1 -| 9007199254740949
1 $ gawk -M 'BEGIN { print compl(42) }'
1 -| -43
1
1 What's going on becomes clear when printing the results in
1 hexadecimal:
1
1 $ gawk 'BEGIN { printf "%#x\n", compl(42) }'
1 -| 0x1fffffffffffd5
1 $ gawk -M 'BEGIN { printf "%#x\n", compl(42) }'
1 -| 0xffffffffffffffd5
1
1 When using the '-M' option, under the hood, 'gawk' uses GNU MP
1 arbitrary precision integers which have at least 64 bits of precision.
1 When not using '-M', 'gawk' stores integral values in regular
1 double-precision floating point, which only maintain 53 bits of
1 precision. Furthermore, the GNU MP library treats (or at least seems to
1 treat) the leading bit as a sign bit; thus the result with '-M' in this
1 case is a negative number.
1
1 In short, using 'gawk' for any but the simplest kind of bitwise
1 operations is probably a bad idea; caveat emptor!
1
1 ---------- Footnotes ----------
1
1 (1) This example shows that zeros come in on the left side. For
1 'gawk', this is always true, but in some languages, it's possible to
1 have the left side fill with ones.
1
1 (2) If you don't understand this paragraph, the upshot is that 'gawk'
1 can only store a particular range of integer values; numbers outside
1 that range are reduced to fit within the range.
1