gawk: Profiling

1 
1 12.5 Profiling Your 'awk' Programs
1 ==================================
1 
1 You may produce execution traces of your 'awk' programs.  This is done
1 by passing the option '--profile' to 'gawk'.  When 'gawk' has finished
1 running, it creates a profile of your program in a file named
1 'awkprof.out'.  Because it is profiling, it also executes up to 45%
1 slower than 'gawk' normally does.
1 
1    As shown in the following example, the '--profile' option can be used
1 to change the name of the file where 'gawk' will write the profile:
1 
1      gawk --profile=myprog.prof -f myprog.awk data1 data2
1 
1 In the preceding example, 'gawk' places the profile in 'myprog.prof'
1 instead of in 'awkprof.out'.
1 
1    Here is a sample session showing a simple 'awk' program, its input
1 data, and the results from running 'gawk' with the '--profile' option.
1 First, the 'awk' program:
1 
1      BEGIN { print "First BEGIN rule" }
1 
1      END { print "First END rule" }
1 
1      /foo/ {
1          print "matched /foo/, gosh"
1          for (i = 1; i <= 3; i++)
1              sing()
1      }
1 
1      {
1          if (/foo/)
1              print "if is true"
1          else
1              print "else is true"
1      }
1 
1      BEGIN { print "Second BEGIN rule" }
1 
1      END { print "Second END rule" }
1 
1      function sing(    dummy)
1      {
1          print "I gotta be me!"
1      }
1 
1    Following is the input data:
1 
1      foo
1      bar
1      baz
1      foo
1      junk
1 
1    Here is the 'awkprof.out' that results from running the 'gawk'
1 profiler on this program and data (this example also illustrates that
1 'awk' programmers sometimes get up very early in the morning to work):
1 
1          # gawk profile, created Mon Sep 29 05:16:21 2014
1 
1          # BEGIN rule(s)
1 
1          BEGIN {
1       1          print "First BEGIN rule"
1          }
1 
1          BEGIN {
1       1          print "Second BEGIN rule"
1          }
1 
1          # Rule(s)
1 
1       5  /foo/ { # 2
1       2          print "matched /foo/, gosh"
1       6          for (i = 1; i <= 3; i++) {
1       6                  sing()
1                  }
1          }
1 
1       5  {
1       5          if (/foo/) { # 2
1       2                  print "if is true"
1       3          } else {
1       3                  print "else is true"
1                  }
1          }
1 
1          # END rule(s)
1 
1          END {
1       1          print "First END rule"
1          }
1 
1          END {
1       1          print "Second END rule"
1          }
1 
1 
1          # Functions, listed alphabetically
1 
1       6  function sing(dummy)
1          {
1       6          print "I gotta be me!"
1          }
1 
1    This example illustrates many of the basic features of profiling
1 output.  They are as follows:
1 
1    * The program is printed in the order 'BEGIN' rules, 'BEGINFILE'
1      rules, pattern-action rules, 'ENDFILE' rules, 'END' rules, and
1      functions, listed alphabetically.  Multiple 'BEGIN' and 'END' rules
1      retain their separate identities, as do multiple 'BEGINFILE' and
1      'ENDFILE' rules.
1 
1    * Pattern-action rules have two counts.  The first count, to the left
1      of the rule, shows how many times the rule's pattern was _tested_.
1      The second count, to the right of the rule's opening left brace in
1      a comment, shows how many times the rule's action was _executed_.
1      The difference between the two indicates how many times the rule's
1      pattern evaluated to false.
1 
1    * Similarly, the count for an 'if'-'else' statement shows how many
1      times the condition was tested.  To the right of the opening left
1      brace for the 'if''s body is a count showing how many times the
1      condition was true.  The count for the 'else' indicates how many
1      times the test failed.
1 
1    * The count for a loop header (such as 'for' or 'while') shows how
1      many times the loop test was executed.  (Because of this, you can't
1      just look at the count on the first statement in a rule to
1      determine how many times the rule was executed.  If the first
1      statement is a loop, the count is misleading.)
1 
1    * For user-defined functions, the count next to the 'function'
1      keyword indicates how many times the function was called.  The
1      counts next to the statements in the body show how many times those
1      statements were executed.
1 
1    * The layout uses "K&R" style with TABs.  Braces are used everywhere,
1      even when the body of an 'if', 'else', or loop is only a single
1      statement.
1 
1    * Parentheses are used only where needed, as indicated by the
1      structure of the program and the precedence rules.  For example,
1      '(3 + 5) * 4' means add three and five, then multiply the total by
1      four.  However, '3 + 5 * 4' has no parentheses, and means '3 + (5 *
1      4)'.
1 
1    * Parentheses are used around the arguments to 'print' and 'printf'
1      only when the 'print' or 'printf' statement is followed by a
1      redirection.  Similarly, if the target of a redirection isn't a
1      scalar, it gets parenthesized.
1 
1    * 'gawk' supplies leading comments in front of the 'BEGIN' and 'END'
1      rules, the 'BEGINFILE' and 'ENDFILE' rules, the pattern-action
1      rules, and the functions.
1 
1    The profiled version of your program may not look exactly like what
1 you typed when you wrote it.  This is because 'gawk' creates the
1 profiled version by "pretty-printing" its internal representation of the
1 program.  The advantage to this is that 'gawk' can produce a standard
1 representation.  Also, things such as:
1 
1      /foo/
1 
1 come out as:
1 
1      /foo/   {
1          print $0
1      }
1 
1 which is correct, but possibly unexpected.
1 
1    Besides creating profiles when a program has completed, 'gawk' can
1 produce a profile while it is running.  This is useful if your 'awk'
1 program goes into an infinite loop and you want to see what has been
1 executed.  To use this feature, run 'gawk' with the '--profile' option
1 in the background:
1 
1      $ gawk --profile -f myprog &
1      [1] 13992
1 
1 The shell prints a job number and process ID number; in this case,
1 13992.  Use the 'kill' command to send the 'USR1' signal to 'gawk':
1 
1      $ kill -USR1 13992
1 
1 As usual, the profiled version of the program is written to
1 'awkprof.out', or to a different file if one was specified with the
1 '--profile' option.
1 
1    Along with the regular profile, as shown earlier, the profile file
1 includes a trace of any active functions:
1 
1      # Function Call Stack:
1 
1      #   3. baz
1      #   2. bar
1      #   1. foo
1      # -- main --
1 
1    You may send 'gawk' the 'USR1' signal as many times as you like.
1 Each time, the profile and function call trace are appended to the
1 output profile file.
1 
1    If you use the 'HUP' signal instead of the 'USR1' signal, 'gawk'
1 produces the profile and the function call trace and then exits.
1 
1    When 'gawk' runs on MS-Windows systems, it uses the 'INT' and 'QUIT'
1 signals for producing the profile, and in the case of the 'INT' signal,
1 'gawk' exits.  This is because these systems don't support the 'kill'
1 command, so the only signals you can deliver to a program are those
1 generated by the keyboard.  The 'INT' signal is generated by the
1 'Ctrl-c' or 'Ctrl-BREAK' key, while the 'QUIT' signal is generated by
1 the 'Ctrl-\' key.
1 
1    Finally, 'gawk' also accepts another option, '--pretty-print'.  When
1 called this way, 'gawk' "pretty-prints" the program into 'awkprof.out',
1 without any execution counts.
1 
1      NOTE: Once upon a time, the '--pretty-print' option would also run
1      your program.  This is is no longer the case.
1 
1    There is a significant difference between the output created when
1 profiling, and that created when pretty-printing.  Pretty-printed output
1 preserves the original comments that were in the program, although their
1 placement may not correspond exactly to their original locations in the
1 source code.(1)
1 
1    However, as a deliberate design decision, profiling output _omits_
1 the original program's comments.  This allows you to focus on the
1 execution count data and helps you avoid the temptation to use the
1 profiler for pretty-printing.
1 
1    Additionally, pretty-printed output does not have the leading
1 indentation that the profiling output does.  This makes it easy to
1 pretty-print your code once development is completed, and then use the
1 result as the final version of your program.
1 
1    Because the internal representation of your program is formatted to
1 recreate an 'awk' program, profiling and pretty-printing automatically
1 disable 'gawk''s default optimizations.
1 
1    Pretty printing also preserves the original format of numeric
1 constants; if you used an octal or hexadecimal value in your source
1 code, it will appear that way in the output.
1 
1    ---------- Footnotes ----------
1 
1    (1) 'gawk' does the best it can to preserve the distinction between
1 comments at the end of a statement and comments on lines by themselves.
1 Due to implementation constraints, it does not always do so correctly,
1 particularly for 'switch' statements.  The 'gawk' maintainers hope to
1 improve this in a subsequent release.
1