gprof: Sampling Error

6.1 Statistical Sampling Error
==============================

The run-time figures that 'gprof' gives you are based on a sampling
process, so they are subject to statistical inaccuracy. If a function
runs for only a short time, so that on average the sampling process
ought to catch that function in the act only once, there is a pretty
good chance that it will actually find that function zero times, or
twice.

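As a rough illustration of that claim, the sketch below assumes that the
number of samples landing in a function follows a Poisson distribution.
That is a modeling assumption, not something 'gprof' reports. The helper
name 'poisson_prob' is hypothetical, and 'awk' is used only for the
arithmetic.

```shell
# poisson_prob MEAN K
# Hypothetical helper: probability of seeing exactly K samples when MEAN
# samples are expected, under an assumed Poisson model.
poisson_prob() {
    LC_ALL=C awk -v m="$1" -v k="$2" 'BEGIN {
        p = exp(-m)                          # probability of zero samples
        for (i = 1; i <= k; i++) p *= m / i  # build up m^k / k!
        printf "%.3f\n", p
    }'
}

# A function that "ought" to be caught once: expected sample count of 1.
for k in 0 1 2; do
    echo "P($k samples) = $(poisson_prob 1 "$k")"
done
```

Under this model, a function expected to be sampled once is missed
entirely about 37 percent of the time and is caught twice about 18
percent of the time.
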
By contrast, the number-of-calls and basic-block figures are derived
by counting, not sampling. They are completely accurate and will not
vary from run to run if your program is deterministic and
single-threaded. In multi-threaded applications, or single-threaded
applications that link with multi-threaded libraries, the counts are
only deterministic if the counting function is thread-safe. (Note:
the 'mcount' counting function in glibc is _not_ thread-safe.) See
Implementation of Profiling.

The "sampling period" that is printed at the beginning of the flat
profile says how often samples are taken. The rule of thumb is that a
run-time figure is accurate if it is considerably bigger than the
sampling period.

The actual amount of error can be predicted. For N samples, the
_expected_ error is the square root of N samples. For example, if the
sampling period is 0.01 seconds and 'foo''s run-time is 1 second, N is
100 samples (1 second/0.01 seconds), sqrt(N) is 10 samples, so the
expected error in 'foo''s run-time is 0.1 seconds (10*0.01 seconds), or
ten percent of the observed value. Likewise, if the sampling period is
0.01 seconds and 'bar''s run-time is 100 seconds, N is 10000 samples,
sqrt(N) is 100 samples, so the expected error in 'bar''s run-time is
1 second, or one percent of the observed value. The figure is likely to
vary this much _on the average_ from one profiling run to the next.
(_Sometimes_ it will vary more.)

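The arithmetic above can be reproduced with a short shell sketch. The
helper name 'expected_error' is hypothetical and not part of 'gprof'. It
takes a run-time and a sampling period, both in seconds, and uses 'awk'
for the square root.

```shell
# expected_error RUNTIME PERIOD
# Hypothetical helper: applies the sqrt(N) rule, where N = RUNTIME / PERIOD.
expected_error() {
    LC_ALL=C awk -v t="$1" -v p="$2" 'BEGIN {
        n = t / p            # number of samples
        err = sqrt(n) * p    # expected error, converted back to seconds
        printf "samples=%.0f error=%.3fs relative=%.1f%%\n", n, err, 100 * err / t
    }'
}

expected_error 1 0.01      # the foo example: 1 second at a 0.01 s period
expected_error 100 0.01    # the bar example: 100 seconds at a 0.01 s period
```

The two calls reproduce the 'foo' and 'bar' figures: ten percent and one
percent of the observed value, respectively.
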
This does not mean that a small run-time figure is devoid of
information. If the program's _total_ run-time is large, a small
run-time for one function does tell you that that function used an
insignificant fraction of the whole program's time. Usually this means
it is not worth optimizing.

One way to get more accuracy is to give your program more (but
similar) input data so it will take longer. Another way is to combine
the data from several runs, using the '-s' option of 'gprof'. Here is
how:

  1. Run your program once.

  2. Issue the command 'mv gmon.out gmon.sum'.

  3. Run your program again, the same as before.

  4. Merge the new data in 'gmon.out' into 'gmon.sum' with this command:

          gprof -s EXECUTABLE-FILE gmon.out gmon.sum

  5. Repeat the last two steps as often as you wish.

  6. Analyze the cumulative data using this command:

          gprof EXECUTABLE-FILE gmon.sum > OUTPUT-FILE

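The steps above could be wrapped in a small shell function, sketched
here. The name 'profile_runs' and the example names './myprog' and
'profile.txt' are hypothetical. The sketch assumes that the program was
compiled and linked with '-pg' and that it writes 'gmon.out' in the
current directory when it exits.

```shell
# profile_runs EXECUTABLE-FILE RUNS [ARGS...]
# Hypothetical helper: runs the program RUNS times with the same arguments
# and accumulates the profile data in gmon.sum.
profile_runs() {
    exe=$1
    runs=$2
    shift 2
    "$exe" "$@" || return 1              # step 1: first run writes gmon.out
    mv gmon.out gmon.sum || return 1     # step 2: start the summary file
    i=1
    while [ "$i" -lt "$runs" ]; do
        "$exe" "$@" || return 1          # step 3: run again, same as before
        gprof -s "$exe" gmon.out gmon.sum || return 1   # step 4: merge
        i=$((i + 1))                     # step 5: repeat
    done
}

# Example use, followed by step 6:
#   profile_runs ./myprog 5
#   gprof ./myprog gmon.sum > profile.txt
```

The function stops at the first failing command, so a run that crashes
before writing 'gmon.out' is not merged into the summary.
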