gawk: Definition Syntax

1 
1 9.2.1 Function Definition Syntax
1 --------------------------------
1 
1      It's entirely fair to say that the awk syntax for local variable
1      definitions is appallingly awful.
1                          -- _Brian Kernighan_
1 
1    Definitions of functions can appear anywhere between the rules of an
1 'awk' program.  Thus, the general form of an 'awk' program is extended
1 to include sequences of rules _and_ user-defined function definitions.
1 There is no need to put the definition of a function before all uses of
1 the function.  This is because 'awk' reads the entire program before
1 starting to execute any of it.
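   As a minimal sketch of this point, the following shell function runs a
small 'awk' program in which a rule calls a function before the function's
definition appears.  The names 'demo' and 'greet' are hypothetical and
exist only for this illustration.

```shell
demo() {
    awk '
    # The BEGIN rule calls greet() before its definition appears.
    # This works because awk reads the whole program before running it.
    BEGIN { greet("world") }

    # greet() is a hypothetical function used only for illustration.
    function greet(name)
    {
        print "hello, " name
    }
    '
}
demo
```

Running this should print 'hello, world'.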
1 
1    The definition of a function named NAME looks like this:
1 
1      'function' NAME'('[PARAMETER-LIST]')'
1      '{'
1           BODY-OF-FUNCTION
1      '}'
1 
1 Here, NAME is the name of the function to define.  A valid function name
1 is like a valid variable name: a sequence of letters, digits, and
1 underscores that doesn't start with a digit.  Here too, only the 52
1 upper- and lowercase English letters may be used in a function name.
1 Within a single 'awk' program, any particular name can be used in only
1 one way: as a variable, an array, or a function.
1 
1    PARAMETER-LIST is an optional list of the function's arguments and
1 local variable names, separated by commas.  When the function is called,
1 the argument names are used to hold the argument values given in the
1 call.
1 
1    A function cannot have two parameters with the same name, nor may it
1 have a parameter with the same name as the function itself.
1 
1      CAUTION: According to the POSIX standard, function parameters
1      cannot have the same name as one of the special predefined
1      variables (⇒Built-in Variables), nor may a function
1      parameter have the same name as another function.
1 
1      Not all versions of 'awk' enforce these restrictions.  'gawk'
1      always enforces the first restriction.  With '--posix'
1      (⇒Options), it also enforces the second restriction.
1 
1    Local variables act like the empty string if referenced where a
1 string value is required, and like zero if referenced where a numeric
1 value is required.  This is the same as the behavior of regular
1 variables that have never been assigned a value.  (There is more to
1 understand about local variables; ⇒Dynamic Typing.)
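   The behavior of an unassigned local variable can be sketched as follows.
The function 'show' and its local variable 'tmp' are hypothetical names
chosen for this example; 'tmp' is never assigned a value.

```shell
demo() {
    awk '
    # show() is a hypothetical function; "tmp" is a local variable that
    # is never assigned, so it acts as "" in a string context and as 0
    # in a numeric context.
    function show(    tmp)
    {
        print "string: [" tmp "]"
        print "number: " (tmp + 0)
    }
    BEGIN { show() }
    '
}
demo
```

This should print 'string: []' followed by 'number: 0'.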
1 
1    The BODY-OF-FUNCTION consists of 'awk' statements.  It is the most
1 important part of the definition, because it says what the function
1 should actually _do_.  The argument names exist to give the body a way
1 to talk about the arguments; local variables exist to give the body
1 places to keep temporary values.
1 
1    Argument names are not distinguished syntactically from local
1 variable names.  Instead, the number of arguments supplied when the
1 function is called determines how many argument variables there are.
1 Thus, if three argument values are given, the first three names in
1 PARAMETER-LIST are arguments and the rest are local variables.
1 
1    It follows that if the number of arguments is not the same in all
1 calls to the function, some of the names in PARAMETER-LIST may be
1 arguments on some occasions and local variables on others.  Another way
1 to think of this is that omitted arguments default to the null string.
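   One way to see this in practice is a function whose last parameter is
sometimes supplied and sometimes omitted.  In the sketch below, 'join3' and
its parameter 'sep' are hypothetical; the fallback to '-' is a choice made
for this example, not something 'awk' does on its own.

```shell
demo() {
    awk '
    # join3() is a hypothetical function.  "sep" is an argument when the
    # caller supplies it, and an unassigned local variable otherwise.
    function join3(a, b, c, sep)
    {
        if (sep == "")      # true when the fourth argument was omitted
            sep = "-"
        return a sep b sep c
    }
    BEGIN {
        print join3("x", "y", "z", ":")    # four arguments supplied
        print join3("x", "y", "z")         # "sep" omitted
    }
    '
}
demo
```

The first call should print 'x:y:z' and the second 'x-y-z'.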
1 
1    Usually when you write a function, you know how many names you intend
1 to use for arguments and how many you intend to use as local variables.
1 It is conventional to place some extra space between the arguments and
1 the local variables, in order to document how your function is supposed
1 to be used.
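   The spacing convention might look like the following sketch.  The
function 'sum_to' is hypothetical: 'n' is meant to be supplied by the
caller, while 'i' and 'total', written after the extra spaces, are intended
as local variables.

```shell
demo() {
    awk '
    # sum_to() is a hypothetical function.  By convention, "n" is the
    # real argument; "i" and "total" after the extra spaces are locals.
    function sum_to(n,    i, total)
    {
        for (i = 1; i <= n; i++)
            total += i      # "total" starts out acting like zero
        return total
    }
    BEGIN { print sum_to(4) }
    '
}
demo
```

This should print '10'.  The extra space has no meaning to 'awk' itself;
it only documents the intent for human readers.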
1 
1    During execution of the function body, the arguments and local
1 variables hide, or "shadow", any variables of the same names used
1 in the rest of the program.  The shadowed variables are not accessible
1 in the function definition, because there is no way to name them while
1 their names have been taken away for the arguments and local variables.
1 All other variables used in the 'awk' program can be referenced or set
1 normally in the function's body.
1 
1    The arguments and local variables last only as long as the function
1 body is executing.  Once the body finishes, you can once again access
1 the variables that were shadowed while the function was running.
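   Shadowing can be illustrated with a short sketch.  Here 'bump' is a
hypothetical function whose parameter 'x' has the same name as a global
variable, while 'count' is a global variable that is not shadowed.

```shell
demo() {
    awk '
    # bump() is a hypothetical function.  Its parameter "x" shadows the
    # global "x"; the global "count" is not shadowed and can be changed.
    function bump(x)
    {
        x = x + 100         # changes only the parameter
        count++             # changes the global variable
        print "inside: x = " x
    }
    BEGIN {
        x = 1
        bump(5)
        print "after: x = " x ", count = " count
    }
    '
}
demo
```

This should print 'inside: x = 105' and then 'after: x = 1, count = 1',
showing that the global 'x' is untouched once the function returns.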
1 
1    The function body can contain expressions that call functions.  They
1 can even call this function, either directly or by way of another
1 function.  When this happens, we say the function is "recursive".  The
1 act of a function calling itself is called "recursion".
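   A small sketch of direct recursion follows.  The function 'fact' is a
hypothetical example that computes a factorial by calling itself.

```shell
demo() {
    awk '
    # fact() is a hypothetical recursive function: it calls itself
    # directly until n reaches 1.
    function fact(n)
    {
        if (n <= 1)
            return 1
        return n * fact(n - 1)
    }
    BEGIN { print fact(5) }
    '
}
demo
```

This should print '120'.  Each call gets its own copy of the parameter 'n',
which is what makes the recursion work.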
1 
1    All the built-in functions return a value to their caller.
1 User-defined functions can do so also, using the 'return' statement,
1 which is described in detail in ⇒Return Statement.  Many of the
1 subsequent examples in this minor node use the 'return' statement.
1 
1    In many 'awk' implementations, including 'gawk', the keyword
1 'function' may be abbreviated 'func'.  (This is a common extension.)
1 However, POSIX specifies only the keyword 'function'.  This has some
1 practical implications.  If 'gawk' is in POSIX-compatibility mode
1 (⇒Options), then the following statement does _not_ define a function:
1 
1      func foo() { a = sqrt($1) ; print a }
1 
1 Instead, it defines a rule that, for each record, concatenates the value
1 of the variable 'func' with the return value of the function 'foo'.  If
1 the resulting string is non-null, the action is executed.  This is
1 probably not what is desired.  ('awk' accepts this input as
1 syntactically valid, because functions may be used before they are
1 defined in 'awk' programs.(1))
1 
1    To ensure that your 'awk' programs are portable, always use the
1 keyword 'function' when defining a function.
1 
1    ---------- Footnotes ----------
1 
1    (1) This program won't actually run, because 'foo()' is undefined.
1