gawk: General Data Types

16.4.2 General-Purpose Data Types
---------------------------------

     I have a true love/hate relationship with unions.
                                                     -- _Arnold Robbins_

     That's the thing about unions: the compiler will arrange things so
     they can accommodate both love and hate.
                                                         -- _Chet Ramey_

The extension API defines a number of simple types and structures for
general-purpose use. Additional, more specialized, data structures are
introduced in subsequent minor nodes, together with the functions that
use them.

The general-purpose types and structures are as follows:

'typedef void *awk_ext_id_t;'
     A value of this type is received from 'gawk' when an extension is
     loaded. That value must then be passed back to 'gawk' as the first
     parameter of each API function.

'#define awk_const ...'
     This macro expands to 'const' when compiling an extension, and to
     nothing when compiling 'gawk' itself. This makes certain fields in
     the API data structures unwritable from extension code, while
     allowing 'gawk' to use them as it needs to.
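The following minimal sketch shows how one header can serve both sides. It assumes that the switch is a preprocessor symbol named 'GAWK', defined only when building 'gawk' itself. Check 'gawkapi.h' for the exact test. The 'demo_info_t' structure and its functions are hypothetical and exist only to illustrate the effect.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the gawk build defines GAWK and extension builds do not.
 * This mirrors the idea behind awk_const; it is not copied from gawkapi.h. */
#ifdef GAWK
#define awk_const
#else
#define awk_const const
#endif

/* Hypothetical API structure: 'count' is read-only for extensions. */
typedef struct demo_info {
	awk_const size_t count;
	int flags;
} demo_info_t;

/* Extension code may read an awk_const field ... */
size_t demo_get_count(const demo_info_t *info)
{
	assert(info != NULL);
	return info->count;
}

/* ... and may write ordinary fields. A statement such as
 *     info->count = 42;
 * would be rejected by the compiler here, because 'count' is const
 * unless GAWK is defined. */
void demo_set_flags(demo_info_t *info, int flags)
{
	assert(info != NULL);
	info->flags = flags;
}
```

If the same header is compiled with 'GAWK' defined, 'awk_const' expands to nothing and the field becomes writable. That is how 'gawk' can fill in such fields while extensions cannot.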

'typedef enum awk_bool {'
'    awk_false = 0,'
'    awk_true'
'} awk_bool_t;'
     A simple Boolean type.

'typedef struct awk_string {'
'    char *str;      /* data */'
'    size_t len;     /* length thereof, in chars */'
'} awk_string_t;'
     This represents a mutable string. If 'gawk' supplied the value, it
     owns the memory pointed to. If the extension supplied the value,
     'gawk' takes ownership of the memory pointed to. _Such memory must
     come from calling one of the 'gawk_malloc()', 'gawk_calloc()', or
     'gawk_realloc()' functions!_

     As mentioned earlier, strings are maintained using the current
     multibyte encoding.

'typedef enum {'
'    AWK_UNDEFINED,'
'    AWK_NUMBER,'
'    AWK_STRING,'
'    AWK_REGEX,'
'    AWK_STRNUM,'
'    AWK_ARRAY,'
'    AWK_SCALAR,         /* opaque access to a variable */'
'    AWK_VALUE_COOKIE    /* for updating a previously created value */'
'} awk_valtype_t;'
     This 'enum' indicates the type of a value. It is used in the
     following 'struct'.

'typedef struct awk_value {'
'    awk_valtype_t   val_type;'
'    union {'
'        awk_string_t       s;'
'        awk_number_t       n;'
'        awk_array_t        a;'
'        awk_scalar_t       scl;'
'        awk_value_cookie_t vc;'
'    } u;'
'} awk_value_t;'
     An "'awk' value." The 'val_type' member indicates what kind of
     value the 'union' holds, and each member is of the appropriate
     type.

'#define str_value      u.s'
'#define strnum_value   str_value'
'#define regex_value    str_value'
'#define num_value      u.n.d'
'#define num_type       u.n.type'
'#define num_ptr        u.n.ptr'
'#define array_cookie   u.a'
'#define scalar_cookie  u.scl'
'#define value_cookie   u.vc'
     Using these macros makes accessing the fields of the 'awk_value_t'
     struct more readable.
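The following self-contained sketch shows what the macros expand to. It also shows how code typically reads an 'awk_value_t': it checks 'val_type' first and only then touches a 'union' member. So that the sketch compiles without 'gawkapi.h', it re-declares simplified copies of the types listed above, with the opaque types reduced to plain 'void *'. A real extension includes 'gawkapi.h' instead. The 'describe()' function is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Simplified local copies of the declarations above (not gawkapi.h). */
typedef enum {
	AWK_UNDEFINED, AWK_NUMBER, AWK_STRING, AWK_REGEX,
	AWK_STRNUM, AWK_ARRAY, AWK_SCALAR, AWK_VALUE_COOKIE
} awk_valtype_t;
typedef struct awk_string { char *str; size_t len; } awk_string_t;
enum AWK_NUMBER_TYPE {
	AWK_NUMBER_TYPE_DOUBLE, AWK_NUMBER_TYPE_MPFR, AWK_NUMBER_TYPE_MPZ
};
typedef struct awk_number {
	double d;
	enum AWK_NUMBER_TYPE type;
	void *ptr;
} awk_number_t;
typedef void *awk_array_t;          /* opaque in this sketch */
typedef void *awk_scalar_t;
typedef void *awk_value_cookie_t;
typedef struct awk_value {
	awk_valtype_t val_type;
	union {
		awk_string_t s;
		awk_number_t n;
		awk_array_t a;
		awk_scalar_t scl;
		awk_value_cookie_t vc;
	} u;
} awk_value_t;

#define str_value	u.s
#define strnum_value	str_value
#define regex_value	str_value
#define num_value	u.n.d
#define num_type	u.n.type
#define num_ptr		u.n.ptr
#define array_cookie	u.a
#define scalar_cookie	u.scl
#define value_cookie	u.vc

/* Hypothetical helper: format a value into buf, touching only the union
 * member that val_type says is valid. */
const char *describe(const awk_value_t *v, char *buf, size_t bufsize)
{
	assert(v != NULL && buf != NULL && bufsize > 0);
	switch (v->val_type) {
	case AWK_NUMBER:
		/* v->num_value expands to v->u.n.d */
		snprintf(buf, bufsize, "number %g", v->num_value);
		break;
	case AWK_STRING:
	case AWK_STRNUM:
	case AWK_REGEX:
		/* all three share str_value, i.e. v->u.s; print by length */
		snprintf(buf, bufsize, "text %.*s",
			 (int) v->str_value.len, v->str_value.str);
		break;
	default:
		snprintf(buf, bufsize, "other (val_type %d)", (int) v->val_type);
		break;
	}
	return buf;
}
```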

'enum AWK_NUMBER_TYPE {'
'    AWK_NUMBER_TYPE_DOUBLE,'
'    AWK_NUMBER_TYPE_MPFR,'
'    AWK_NUMBER_TYPE_MPZ'
'};'
     This 'enum' is used in the following structure to specify the type
     of numeric value being worked with. It is declared at the top level
     of the file so that it works correctly for C++ as well as for C.

'typedef struct awk_number {'
'    double d;'
'    enum AWK_NUMBER_TYPE type;'
'    void *ptr;'
'} awk_number_t;'
     This represents a numeric value. Internally, 'gawk' stores every
     number as either a C 'double', a GMP integer, or an MPFR
     arbitrary-precision floating-point value. In order to allow
     extensions to also support GMP and MPFR values, numeric values are
     passed in this structure.

     The double-precision 'd' element is always populated in data
     received from 'gawk'. In addition, by examining the 'type' member,
     an extension can determine whether the 'ptr' member points to a GMP
     integer (type 'mpz_ptr') or to an MPFR floating-point value (type
     'mpfr_ptr'), and cast it appropriately.
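Here is a minimal sketch of dispatching on the 'type' member. It re-declares simplified copies of 'enum AWK_NUMBER_TYPE' and 'awk_number_t' so that it compiles on its own. It does not link against GMP or MPFR, so comments mark where a real extension would cast 'ptr'. The helper name 'number_as_double()' is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified local copies of the declarations above (not gawkapi.h). */
enum AWK_NUMBER_TYPE {
	AWK_NUMBER_TYPE_DOUBLE,
	AWK_NUMBER_TYPE_MPFR,
	AWK_NUMBER_TYPE_MPZ
};

typedef struct awk_number {
	double d;
	enum AWK_NUMBER_TYPE type;
	void *ptr;
} awk_number_t;

/* Hypothetical helper: return the double and report via *has_bignum
 * whether ptr also carries a GMP or MPFR value. */
double number_as_double(const awk_number_t *n, int *has_bignum)
{
	assert(n != NULL && has_bignum != NULL);
	switch (n->type) {
	case AWK_NUMBER_TYPE_MPZ:
		/* With <gmp.h>: mpz_ptr z = (mpz_ptr) n->ptr; */
		*has_bignum = 1;
		break;
	case AWK_NUMBER_TYPE_MPFR:
		/* With <mpfr.h>: mpfr_ptr f = (mpfr_ptr) n->ptr; */
		*has_bignum = 1;
		break;
	case AWK_NUMBER_TYPE_DOUBLE:
	default:
		*has_bignum = 0;
		break;
	}
	/* d is always filled in for values received from gawk, so it is a
	 * usable (possibly less precise) fallback for every type. */
	return n->d;
}
```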

'typedef void *awk_scalar_t;'
     Scalars can be represented as an opaque type. These values are
     obtained from 'gawk' and then passed back into it. This is
     discussed in a general fashion in the text following this list, and
     in more detail in ⇒Symbol table by cookie.

'typedef void *awk_value_cookie_t;'
     A "value cookie" is an opaque type representing a cached value.
     This is also discussed in a general fashion in the text following
     this list, and in more detail in ⇒Cached values.

Scalar values in 'awk' are numbers, strings, strnums, or typed
regexps. The 'awk_value_t' struct represents values. The 'val_type'
member indicates what is in the 'union'.

Representing numbers is easy: the API uses a C 'double' (the 'd' member
of 'awk_number_t'), optionally accompanied by a GMP or MPFR value.
Strings require more work. Because 'gawk' allows embedded NUL bytes in
string values, a string must be represented as a pair containing a data
pointer and a length. This is the 'awk_string_t' type.
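The following short sketch shows why the length matters. The hypothetical 'count_byte()' walks all 'len' bytes, so it also sees data past an embedded NUL, where 'strlen()' would stop. 'awk_string_t' is re-declared locally so the example compiles without 'gawkapi.h'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified local copy of awk_string_t (not gawkapi.h). */
typedef struct awk_string { char *str; size_t len; } awk_string_t;

/* Count occurrences of byte c, honoring len instead of stopping
 * at the first NUL byte. */
size_t count_byte(const awk_string_t *s, char c)
{
	size_t i, count = 0;

	assert(s != NULL && (s->str != NULL || s->len == 0));
	for (i = 0; i < s->len; i++)
		if (s->str[i] == c)
			count++;
	return count;
}
```

For example, take 'char data[] = "a\0a"' and set 'len' to 3. Then 'count_byte()' finds both 'a' bytes, while 'strlen(data)' reports only 1.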

A strnum (numeric string) value is represented as a string and
consists of user input data that appears to be numeric. When an
extension creates a strnum value, the result is a string flagged as user
input. Subsequent parsing by 'gawk' then determines whether it looks
like a number and should be treated as a strnum, or as a regular string.

This is useful in cases where an extension function would like to do
something comparable to the 'split()' function, which sets the strnum
attribute on the array elements it creates. For example, an extension
that implements CSV splitting would want to use this feature. This is
also useful for a function that retrieves a data item from a database.
The PostgreSQL 'PQgetvalue()' function, for example, returns a string
that may be numeric or textual depending on the contents.
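As a sketch of the CSV case, the naive splitter below marks every field as 'AWK_STRNUM' and leaves it to 'gawk' to decide later whether each field looks numeric. Several simplifications apply:

   * The splitter does not handle quoted fields.
   * The types are simplified local copies with a reduced 'union'.
   * 'make_field()' is a hypothetical stand-in for the API's own value-construction functions.
   * The field pointers point into the caller's buffer to keep the example short. A real extension must follow the memory rules given earlier for 'awk_string_t'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified local copies (not gawkapi.h). */
typedef enum {
	AWK_UNDEFINED, AWK_NUMBER, AWK_STRING, AWK_REGEX, AWK_STRNUM
} awk_valtype_t;   /* remaining enumerators omitted */
typedef struct awk_string { char *str; size_t len; } awk_string_t;
typedef struct awk_value {
	awk_valtype_t val_type;
	union {
		awk_string_t s;
		double d;          /* stands in for the other members */
	} u;
} awk_value_t;

#define str_value	u.s
#define strnum_value	str_value

/* Hypothetical stand-in for an API value constructor: mark the text as
 * user input. No numeric check happens here; gawk decides later. */
static awk_value_t *make_field(char *text, size_t len, awk_value_t *result)
{
	result->val_type = AWK_STRNUM;
	result->strnum_value.str = text;
	result->strnum_value.len = len;
	return result;
}

/* Naive comma splitter (no quoting): fills at most max fields and
 * returns how many were stored. Fields point into line. */
size_t split_fields(char *line, awk_value_t *fields, size_t max)
{
	size_t n = 0;
	char *start = line;

	assert(line != NULL && (fields != NULL || max == 0));
	while (n < max) {
		char *comma = strchr(start, ',');
		size_t len = comma != NULL ? (size_t) (comma - start) : strlen(start);

		make_field(start, len, &fields[n++]);
		if (comma == NULL)
			break;
		start = comma + 1;
	}
	return n;
}
```

Splitting "1,abc,3.5" this way yields three strnum fields. 'gawk' would then presumably treat '1' and '3.5' as numeric-looking and 'abc' as a regular string.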

Typed regexp values (⇒Strong Regexp Constants) are not of much
use to extension functions. Extension functions can tell that they've
received them, and create them for scalar values. Otherwise, they can
examine the text of the regexp through 'regex_value.str' and
'regex_value.len'.
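A minimal sketch of examining that text follows. It checks 'val_type' first and then copies the regexp by length into a NUL-terminated buffer. The types are simplified local copies, and 'regex_text()' is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified local copies (not gawkapi.h). */
typedef enum {
	AWK_UNDEFINED, AWK_NUMBER, AWK_STRING, AWK_REGEX, AWK_STRNUM
} awk_valtype_t;   /* remaining enumerators omitted */
typedef struct awk_string { char *str; size_t len; } awk_string_t;
typedef struct awk_value {
	awk_valtype_t val_type;
	union {
		awk_string_t s;
		double d;          /* stands in for the other members */
	} u;
} awk_value_t;

#define str_value	u.s
#define regex_value	str_value

/* Copy the text of a typed regexp into buf as a C string. Returns the
 * number of bytes copied, or -1 if v does not hold a typed regexp. */
long regex_text(const awk_value_t *v, char *buf, size_t bufsize)
{
	size_t n;

	assert(v != NULL && buf != NULL && bufsize > 0);
	if (v->val_type != AWK_REGEX)
		return -1;
	/* rely on len, not on a terminating NUL, to find the end */
	n = v->regex_value.len;
	if (n > bufsize - 1)
		n = bufsize - 1;   /* truncate to fit */
	memcpy(buf, v->regex_value.str, n);
	buf[n] = '\0';
	return (long) n;
}
```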

Identifiers (i.e., the names of global variables) can be associated
with either scalar values or with arrays. In addition, 'gawk' provides
true arrays of arrays, where any given array element can itself be an
array. Discussion of arrays is delayed until ⇒Array Manipulation.

The various macros listed earlier make it easier to use the elements
of the 'union' as if they were fields in a 'struct'; this is a common
coding practice in C. Such code is easier to write and to read, but it
remains _your_ responsibility to make sure that the 'val_type' member
correctly reflects the type of the value in the 'awk_value_t' struct.
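One way to meet that responsibility is to route all writes through small setters. Each setter updates the 'union' member and 'val_type' together. The sketch below uses simplified local copies of the types and some of the macros listed earlier. 'set_number()' and 'set_string()' are hypothetical names, not API functions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified local copies of the declarations above (not gawkapi.h). */
typedef enum {
	AWK_UNDEFINED, AWK_NUMBER, AWK_STRING, AWK_REGEX, AWK_STRNUM
} awk_valtype_t;   /* remaining enumerators omitted */
typedef struct awk_string { char *str; size_t len; } awk_string_t;
enum AWK_NUMBER_TYPE {
	AWK_NUMBER_TYPE_DOUBLE, AWK_NUMBER_TYPE_MPFR, AWK_NUMBER_TYPE_MPZ
};
typedef struct awk_number {
	double d;
	enum AWK_NUMBER_TYPE type;
	void *ptr;
} awk_number_t;
typedef struct awk_value {
	awk_valtype_t val_type;
	union {
		awk_string_t s;
		awk_number_t n;
	} u;               /* other members omitted */
} awk_value_t;

#define str_value	u.s
#define num_value	u.n.d
#define num_type	u.n.type
#define num_ptr		u.n.ptr

/* Store a plain double and mark the value as a number. */
awk_value_t *set_number(awk_value_t *v, double d)
{
	assert(v != NULL);
	v->val_type = AWK_NUMBER;
	v->num_value = d;
	v->num_type = AWK_NUMBER_TYPE_DOUBLE;
	v->num_ptr = NULL;   /* no GMP/MPFR value attached */
	return v;
}

/* Store a counted string and mark the value as a string. */
awk_value_t *set_string(awk_value_t *v, char *str, size_t len)
{
	assert(v != NULL && (str != NULL || len == 0));
	v->val_type = AWK_STRING;
	v->str_value.str = str;
	v->str_value.len = len;
	return v;
}
```

Because neither setter can change one field without the other, a later 'switch' on 'val_type' always reads the member that was actually written.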

Conceptually, the first three members of the 'union' (string, number,
and array) are all that is needed for working with 'awk' values.
However, because the API provides routines for accessing and changing
the value of a global scalar variable only by using the variable's name,
there is a performance penalty: 'gawk' must find the variable each time
it is accessed or changed. This turns out to be a real issue, not just
a theoretical one.

Thus, if you know that your extension will spend considerable time
reading and/or changing the value of one or more scalar variables, you
can obtain a "scalar cookie"(1) object for that variable, and then use
the cookie for getting the variable's value or for changing the
variable's value. The 'awk_scalar_t' type holds a scalar cookie, and
the 'scalar_cookie' macro provides access to the value of that type in
the 'awk_value_t' struct. Given a scalar cookie, 'gawk' can directly
retrieve or modify the value, as required, without having to find it
first.
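A toy model can illustrate the cost difference. The sketch below is not how 'gawk' implements its symbol table. It uses a tiny linear-search table, counts how many name searches it performs, and hands out an opaque 'void *' handle that plays the role of a scalar cookie. All names in it are hypothetical. The real API calls are described in ⇒Symbol table by cookie.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: NOT gawk's symbol table. */
typedef struct toy_var {
	const char *name;
	double value;
} toy_var_t;

typedef void *toy_scalar_cookie_t;   /* opaque to callers, like awk_scalar_t */

static toy_var_t table[] = { { "NR", 0 }, { "count", 0 }, { "total", 0 } };
static size_t searches;              /* number of by-name searches so far */

/* Find a variable by name; every call pays for a search. */
static toy_var_t *find_var(const char *name)
{
	size_t i;

	searches++;
	for (i = 0; i < sizeof table / sizeof table[0]; i++)
		if (strcmp(table[i].name, name) == 0)
			return &table[i];
	return NULL;
}

/* Update by name: one search per update. */
int update_by_name(const char *name, double value)
{
	toy_var_t *var = find_var(name);

	if (var == NULL)
		return 0;
	var->value = value;
	return 1;
}

/* Search once and hand back an opaque cookie ... */
toy_scalar_cookie_t get_scalar_cookie(const char *name)
{
	return find_var(name);
}

/* ... after which updates and reads need no search at all. */
void update_by_cookie(toy_scalar_cookie_t cookie, double value)
{
	assert(cookie != NULL);
	((toy_var_t *) cookie)->value = value;
}

double value_by_cookie(toy_scalar_cookie_t cookie)
{
	assert(cookie != NULL);
	return ((toy_var_t *) cookie)->value;
}

size_t search_count(void)
{
	return searches;
}
```

Updating a variable 100 times by name costs 100 searches. Obtaining a cookie first costs one search, and the 100 cookie-based updates cost none.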

The 'awk_value_cookie_t' type and 'value_cookie' macro are similar.
If you know that you wish to use the same numeric or string _value_ for
one or more variables, you can create the value once, retaining a "value
cookie" for it, and then pass in that value cookie whenever you wish to
set the value of a variable. This saves storage space within the
running 'gawk' process and reduces the time needed to create the value.
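The following toy model illustrates a value cookie. The value is created once, and each variable that should hold it refers to the shared copy instead of building its own. Reference counting decides when the shared copy can be freed. This sketch is not how 'gawk' implements cached values, and all names in it are hypothetical. The real functions are described in ⇒Cached values.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: NOT gawk's implementation. */
typedef struct toy_value {
	char *text;
	size_t refcount;
} toy_value_t;

typedef void *toy_value_cookie_t;    /* opaque, like awk_value_cookie_t */

typedef struct toy_var {
	const char *name;
	toy_value_t *val;                /* shared, never copied */
} toy_var_t;

/* Build the value once; the caller holds the first reference. */
toy_value_cookie_t create_value(const char *text)
{
	toy_value_t *v = malloc(sizeof *v);

	if (v == NULL)
		return NULL;
	v->text = malloc(strlen(text) + 1);
	if (v->text == NULL) {
		free(v);
		return NULL;
	}
	strcpy(v->text, text);
	v->refcount = 1;
	return v;
}

/* Drop one reference; free the value when none remain. */
void release_value(toy_value_t *v)
{
	if (v != NULL && --v->refcount == 0) {
		free(v->text);
		free(v);
	}
}

/* Make var refer to the cached value: no new allocation or copy. */
void assign_value(toy_var_t *var, toy_value_cookie_t cookie)
{
	toy_value_t *v = cookie;

	assert(var != NULL && v != NULL);
	v->refcount++;            /* take the new reference first ... */
	release_value(var->val);  /* ... then drop the old one */
	var->val = v;
}
```

Taking the new reference before releasing the old one keeps the value alive even if a variable is assigned the same cookie it already holds.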

---------- Footnotes ----------

(1) See the "cookie" entry in the Jargon file
(http://catb.org/jargon/html/C/cookie.html) for a definition of
"cookie", and the "magic cookie" entry in the Jargon file
(http://catb.org/jargon/html/M/magic-cookie.html) for a nice example.
See also the entry for "Cookie" in the ⇒Glossary.