gawk: Profiling
1
1 12.5 Profiling Your 'awk' Programs
1 ==================================
1
1 You may produce execution traces of your 'awk' programs. This is done
1 by passing the option '--profile' to 'gawk'. When 'gawk' has finished
1 running, it creates a profile of your program in a file named
1 'awkprof.out'. Because it is profiling, it also executes up to 45%
1 slower than 'gawk' normally does.
1
1 As shown in the following example, the '--profile' option can be used
1 to change the name of the file where 'gawk' will write the profile:
1
1 gawk --profile=myprog.prof -f myprog.awk data1 data2
1
1 In the preceding example, 'gawk' places the profile in 'myprog.prof'
1 instead of in 'awkprof.out'.
1
1 Here is a sample session showing a simple 'awk' program, its input
1 data, and the results from running 'gawk' with the '--profile' option.
1 First, the 'awk' program:
1
1 BEGIN { print "First BEGIN rule" }
1
1 END { print "First END rule" }
1
1 /foo/ {
1 print "matched /foo/, gosh"
1 for (i = 1; i <= 3; i++)
1 sing()
1 }
1
1 {
1 if (/foo/)
1 print "if is true"
1 else
1 print "else is true"
1 }
1
1 BEGIN { print "Second BEGIN rule" }
1
1 END { print "Second END rule" }
1
1 function sing( dummy)
1 {
1 print "I gotta be me!"
1 }
1
1 Following is the input data:
1
1 foo
1 bar
1 baz
1 foo
1 junk
1
1 Here is the 'awkprof.out' that results from running the 'gawk'
1 profiler on this program and data (this example also illustrates that
1 'awk' programmers sometimes get up very early in the morning to work):
1
1 # gawk profile, created Mon Sep 29 05:16:21 2014
1
1 # BEGIN rule(s)
1
1 BEGIN {
1 1 print "First BEGIN rule"
1 }
1
1 BEGIN {
1 1 print "Second BEGIN rule"
1 }
1
1 # Rule(s)
1
1 5 /foo/ { # 2
1 2 print "matched /foo/, gosh"
1 6 for (i = 1; i <= 3; i++) {
1 6 sing()
1 }
1 }
1
1 5 {
1 5 if (/foo/) { # 2
1 2 print "if is true"
1 3 } else {
1 3 print "else is true"
1 }
1 }
1
1 # END rule(s)
1
1 END {
1 1 print "First END rule"
1 }
1
1 END {
1 1 print "Second END rule"
1 }
1
1
1 # Functions, listed alphabetically
1
1 6 function sing(dummy)
1 {
1 6 print "I gotta be me!"
1 }
1
1 This example illustrates many of the basic features of profiling
1 output. They are as follows:
1
1 * The program is printed in the order 'BEGIN' rules, 'BEGINFILE'
1 rules, pattern-action rules, 'ENDFILE' rules, 'END' rules, and
1 functions, listed alphabetically. Multiple 'BEGIN' and 'END' rules
1 retain their separate identities, as do multiple 'BEGINFILE' and
1 'ENDFILE' rules.
1
1 * Pattern-action rules have two counts. The first count, to the left
1 of the rule, shows how many times the rule's pattern was _tested_.
1 The second count, to the right of the rule's opening left brace in
1 a comment, shows how many times the rule's action was _executed_.
1 The difference between the two indicates how many times the rule's
1 pattern evaluated to false.
1
1 * Similarly, the count for an 'if'-'else' statement shows how many
1 times the condition was tested. To the right of the opening left
1 brace for the 'if''s body is a count showing how many times the
1 condition was true. The count for the 'else' indicates how many
1 times the test failed.
1
1 * The count for a loop header (such as 'for' or 'while') shows how
1 many times the loop test was executed. (Because of this, you can't
1 just look at the count on the first statement in a rule to
1 determine how many times the rule was executed. If the first
1 statement is a loop, the count is misleading.)
1
1 * For user-defined functions, the count next to the 'function'
1 keyword indicates how many times the function was called. The
1 counts next to the statements in the body show how many times those
1 statements were executed.
1
1 * The layout uses "K&R" style with TABs. Braces are used everywhere,
1 even when the body of an 'if', 'else', or loop is only a single
1 statement.
1
1 * Parentheses are used only where needed, as indicated by the
1 structure of the program and the precedence rules. For example,
1 '(3 + 5) * 4' means add three and five, then multiply the total by
1 four. However, '3 + 5 * 4' has no parentheses, and means '3 + (5 *
1 4)'.
1
1 * Parentheses are used around the arguments to 'print' and 'printf'
1 only when the 'print' or 'printf' statement is followed by a
1 redirection. Similarly, if the target of a redirection isn't a
1 scalar, it gets parenthesized.
1
1 * 'gawk' supplies leading comments in front of the 'BEGIN' and 'END'
1 rules, the 'BEGINFILE' and 'ENDFILE' rules, the pattern-action
1 rules, and the functions.
1
1 The profiled version of your program may not look exactly like what
1 you typed when you wrote it. This is because 'gawk' creates the
1 profiled version by "pretty-printing" its internal representation of the
1 program. The advantage to this is that 'gawk' can produce a standard
1 representation. Also, things such as:
1
1 /foo/
1
1 come out as:
1
1 /foo/ {
1 print $0
1 }
1
1 which is correct, but possibly unexpected.
1
1 Besides creating profiles when a program has completed, 'gawk' can
1 produce a profile while it is running. This is useful if your 'awk'
1 program goes into an infinite loop and you want to see what has been
1 executed. To use this feature, run 'gawk' with the '--profile' option
1 in the background:
1
1 $ gawk --profile -f myprog &
1 [1] 13992
1
1 The shell prints a job number and process ID number; in this case,
1 13992. Use the 'kill' command to send the 'USR1' signal to 'gawk':
1
1 $ kill -USR1 13992
1
1 As usual, the profiled version of the program is written to
1 'awkprof.out', or to a different file if one was specified with the
1 '--profile' option.
1
1 Along with the regular profile, as shown earlier, the profile file
1 includes a trace of any active functions:
1
1 # Function Call Stack:
1
1 # 3. baz
1 # 2. bar
1 # 1. foo
1 # -- main --
1
1 You may send 'gawk' the 'USR1' signal as many times as you like.
1 Each time, the profile and function call trace are appended to the
1 output profile file.
1
1 If you use the 'HUP' signal instead of the 'USR1' signal, 'gawk'
1 produces the profile and the function call trace and then exits.
1
1 When 'gawk' runs on MS-Windows systems, it uses the 'INT' and 'QUIT'
1 signals for producing the profile, and in the case of the 'INT' signal,
1 'gawk' exits. This is because these systems don't support the 'kill'
1 command, so the only signals you can deliver to a program are those
1 generated by the keyboard. The 'INT' signal is generated by the
1 'Ctrl-c' or 'Ctrl-BREAK' key, while the 'QUIT' signal is generated by
1 the 'Ctrl-\' key.
1
1 Finally, 'gawk' also accepts another option, '--pretty-print'. When
1 called this way, 'gawk' "pretty-prints" the program into 'awkprof.out',
1 without any execution counts.
1
1 NOTE: Once upon a time, the '--pretty-print' option would also run
1 your program. This is is no longer the case.
1
1 There is a significant difference between the output created when
1 profiling, and that created when pretty-printing. Pretty-printed output
1 preserves the original comments that were in the program, although their
1 placement may not correspond exactly to their original locations in the
1 source code.(1)
1
1 However, as a deliberate design decision, profiling output _omits_
1 the original program's comments. This allows you to focus on the
1 execution count data and helps you avoid the temptation to use the
1 profiler for pretty-printing.
1
1 Additionally, pretty-printed output does not have the leading
1 indentation that the profiling output does. This makes it easy to
1 pretty-print your code once development is completed, and then use the
1 result as the final version of your program.
1
1 Because the internal representation of your program is formatted to
1 recreate an 'awk' program, profiling and pretty-printing automatically
1 disable 'gawk''s default optimizations.
1
1 Pretty printing also preserves the original format of numeric
1 constants; if you used an octal or hexadecimal value in your source
1 code, it will appear that way in the output.
1
1 ---------- Footnotes ----------
1
1 (1) 'gawk' does the best it can to preserve the distinction between
1 comments at the end of a statement and comments on lines by themselves.
1 Due to implementation constraints, it does not always do so correctly,
1 particularly for 'switch' statements. The 'gawk' maintainers hope to
1 improve this in a subsequent release.
1