autoconf: Limitations of Usual Tools

1 
1 11.15 Limitations of Usual Tools
1 ================================
1 
1 The small set of tools you can expect to find on any machine can still
1 include some limitations you should be aware of.
1 
1 `awk'
1      Don't leave white space before the opening parenthesis in a user
1      function call.  Posix does not allow this and GNU Awk rejects it:
1 
1           $ gawk 'function die () { print "Aaaaarg!"  }
1                   BEGIN { die () }'
1           gawk: cmd. line:2:         BEGIN { die () }
1           gawk: cmd. line:2:                      ^ parse error
1           $ gawk 'function die () { print "Aaaaarg!"  }
1                   BEGIN { die() }'
1           Aaaaarg!
1 
1      Posix says that if a program contains only `BEGIN' actions, and
1      contains no instances of `getline', then the program merely
1      executes the actions without reading input.  However, traditional
1      Awk implementations (such as Solaris 10 `awk') read and discard
1      input in this case.  Portable scripts can redirect input from
1      `/dev/null' to work around the problem.  For example:
1 
1           awk 'BEGIN {print "hello world"}' </dev/null
1 
1      Posix says that in an `END' action, `$NF' (and presumably, `$1')
1      retain their value from the last record read, if no intervening
1      `getline' occurred.  However, some implementations (such as
1      Solaris 10 `/usr/bin/awk', `nawk', or Darwin `awk') reset these
1      variables.  A workaround is to use an intermediate variable prior
1      to the `END' block.  For example:
1 
1           $ cat end.awk
1           { tmp = $1 }
1           END { print "a", $1, $NF, "b", tmp }
1           $ echo 1 | awk -f end.awk
1           a   b 1
1           $ echo 1 | gawk -f end.awk
1           a 1 1 b 1
1 
1      If you want your program to be deterministic, don't depend on the
1      order in which `for (i in arr)' visits array elements:
1 
1           $ cat for.awk
1           END {
1             arr["foo"] = 1
1             arr["bar"] = 1
1             for (i in arr)
1               print i
1           }
1           $ gawk -f for.awk </dev/null
1           foo
1           bar
1           $ nawk -f for.awk </dev/null
1           bar
1           foo
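If the output order matters, one option is to walk an explicit list of keys instead of relying on `for (i in arr)'. This is a minimal sketch that assumes the keys are known in advance. The key list is a comma-separated string, which is a convention chosen for this example, and `split' uses a one-character separator so it stays within the traditional-Awk rule given below. `print_in_order' is a hypothetical name. Piping the output through `sort' is another common approach.

```shell
# Hypothetical helper: print array entries in a fixed order by iterating
# over an explicit key list, not over "for (i in arr)".
print_in_order () {
  awk 'END {
    arr["foo"] = 1
    arr["bar"] = 2
    n = split("bar,foo", keys, ",")
    for (k = 1; k <= n; k++)
      print keys[k], arr[keys[k]]
  }' </dev/null
}
print_in_order
# bar 2
# foo 1
```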
1 
1      Some Awk implementations, such as HP-UX 11.0's native one,
1      mishandle anchors:
1 
1           $ echo xfoo | $AWK '/foo|^bar/ { print }'
1           $ echo bar | $AWK '/foo|^bar/ { print }'
1           bar
1           $ echo xfoo | $AWK '/^bar|foo/ { print }'
1           xfoo
1           $ echo bar | $AWK '/^bar|foo/ { print }'
1           bar
1 
1      Either do not depend on such patterns (i.e., use `/^(.*foo|bar)/'),
1      or use a simple test to reject such implementations.
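A "simple test" could look like the sketch below. It reuses the first failing case shown above: a correct Awk prints `ok', and an implementation that mishandles the anchor in the second alternative prints nothing. The function name `awk_anchor_ok' is hypothetical, and the sketch assumes `$AWK' holds the Awk command to check.

```shell
# Hypothetical probe: succeed only if $AWK matches "xfoo" against /foo|^bar/.
AWK=${AWK-awk}
awk_anchor_ok () {
  out=`echo xfoo | $AWK '/foo|^bar/ { print "ok" }'`
  test "X$out" = Xok
}
if awk_anchor_ok; then
  echo "$AWK handles anchors in alternations"
else
  echo "$AWK mishandles anchors; try another Awk" >&2
fi
```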
1 
1      On `ia64-hp-hpux11.23', Awk mishandles `printf' conversions after
1      `%u':
1 
1           $ awk 'BEGIN { printf "%u %d\n", 0, -1 }'
1           0 0
1 
1      AIX version 5.2 has an arbitrary limit of 399 on the length of
1      regular expressions and literal strings in an Awk program.
1 
1      Traditional Awk implementations derived from Unix version 7, such
1      as Solaris `/bin/awk', have many limitations and do not conform to
1      Posix.  Nowadays `AC_PROG_AWK' (see Particular Programs) finds
1      you an Awk that doesn't have these problems, but if for some
1      reason you prefer not to use `AC_PROG_AWK' you may need to address
1      them.  For more detailed descriptions, see the section `Language
1      History' in the GNU Awk manual.
1 
1      Traditional Awk does not support multidimensional arrays or
1      user-defined functions.
1 
1      Traditional Awk does not support the `-v' option.  You can use
1      assignments after the program instead, e.g., `$AWK '{print v $1}'
1      v=x'; however, don't forget that such assignments are not
1      evaluated until they are encountered (e.g., after any `BEGIN'
1      action).
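The evaluation-order caveat can be seen directly. In this sketch the assignment operand `v=x' takes effect only when Awk reaches it, which is after `BEGIN' has run but before standard input is read, so `BEGIN' sees an empty `v' and the main rule sees `x'.

```shell
# Pass a value without -v: put an assignment operand after the program.
result=`echo a | awk 'BEGIN { print "begin:" v } { print v $1 }' v=x`
echo "$result"
# begin:
# xa
```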
1 
1      Traditional Awk does not support the keywords `delete' or `do'.
1 
1      Traditional Awk does not support the expressions `A?B:C', `!A',
1      `A^B', or `A^=B'.
1 
1      Traditional Awk does not support the predefined `CONVFMT' or
1      `ENVIRON' variables.
1 
1      Traditional Awk supports only the predefined functions `exp',
1      `index', `int', `length', `log', `split', `sprintf', `sqrt', and
1      `substr'.
1 
1      Traditional Awk `getline' is not at all compatible with Posix;
1      avoid it.
1 
1      Traditional Awk has `for (i in a) ...' but no other uses of the
1      `in' keyword.  For example, it lacks `if (i in a) ...'.
1 
1      In code portable to both traditional and modern Awk, `FS' must be a
1      string containing just one ordinary character, and similarly for
1      the field-separator argument to `split'.
1 
1      Traditional Awk has a limit of 99 fields in a record.  Some Awk
1      implementations, like Tru64's, split the input even if you don't
1      refer to any field in the script; to circumvent this problem, set
1      `FS' to an unusual character and use `split'.
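Here is a minimal sketch of that workaround. It assumes Control-A (`\001') never occurs in the input, so each record remains a single field, and then it splits the record explicitly on a one-character separator (a comma here). Traditional Awk may not understand the octal escape, so on such hosts you may need another unusual literal character. `count_fields' is a hypothetical name.

```shell
# Hypothetical helper: count comma-separated fields without letting awk
# split the record itself (FS is a character assumed absent from the data).
count_fields () {
  awk 'BEGIN { FS = "\001" } { n = split($0, f, ","); print n }'
}
printf 'a,b,c\n' | count_fields   # prints 3
```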
1 
1      Traditional Awk has a limit of at most 99 bytes in a number
1      formatted by `OFMT'; for example, `OFMT="%.300e"; print 0.1;'
1      typically dumps core.
1 
1      The original version of Awk had a limit of at most 99 bytes per
1      `split' field, 99 bytes per `substr' substring, and 99 bytes per
1      run of non-special characters in a `printf' format, but these bugs
1      have been fixed on all practical hosts that we know of.
1 
1      HP-UX 11.00 and IRIX 6.5 Awk require that input files have a line
1      length of at most 3070 bytes.
1 
1 `basename'
1      Not all hosts have a working `basename'.  You can use `expr'
1      instead.
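One way to use `expr' for this is sketched below with a hypothetical helper `portable_basename'. Prefixing the operand with `//' ensures that it never begins with `-' and always contains a slash. The greedy `.*/' consumes everything up to the last slash. The sketch does not handle trailing slashes (`a/b/' yields the empty string), and it inherits the `expr' quirks described in the `expr' entry below.

```shell
# Hypothetical helper: print the last component of a file name.
# Limitation: a name ending in "/" yields an empty result.
portable_basename () {
  expr "//$1" : '.*/\(.*\)'
}
portable_basename /usr/local/bin/autoconf   # prints autoconf
portable_basename foo.c                     # prints foo.c
```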
1 
1 `cat'
1      Don't rely on any option.
1 
1 `cc'
1      The command `cc -c foo.c' traditionally produces an object file
1      named `foo.o'.  Most compilers allow `-c' to be combined with `-o'
1      to specify a different object file name, but Posix does not
1      require this combination and a few compilers lack support for it.
1      See C Compiler, for how GNU Make tests for this feature with
1      `AC_PROG_CC_C_O'.
1 
1      When a compilation such as `cc -o foo foo.c' fails, some compilers
1      (such as CDS on Reliant Unix) leave a `foo.o'.
1 
1      HP-UX `cc' doesn't accept `.S' files to preprocess and assemble.
1      `cc -c foo.S' appears to succeed, but in fact does nothing.
1 
1      The default executable produced by `cc foo.c' can be named:
1 
1         * `a.out' -- usual Posix convention.
1 
1         * `b.out' -- i960 compilers (including `gcc').
1 
1         * `a.exe' -- DJGPP port of `gcc'.
1 
1         * `a_out.exe' -- GNV `cc' wrapper for DEC C on OpenVMS.
1 
1         * `foo.exe' -- various MS-DOS compilers.
1 
1      The C compiler's traditional name is `cc', but other names like
1      `gcc' are common.  Posix 1003.1-2001 specifies the name `c99', but
1      older Posix editions specified `c89' and anyway these standard
1      names are rarely used in practice.  Typically the C compiler is
1      invoked from makefiles that use `$(CC)', so the value of the `CC'
1      make variable selects the compiler name.
1 
1 `chgrp'
1 `chown'
1      It is not portable to change a file's group to a group that the
1      owner does not belong to.
1 
1 `chmod'
1      Avoid usages like `chmod -w file'; use `chmod a-w file' instead,
1      for two reasons.  First, plain `-w' does not necessarily make the
1      file unwritable, since it does not affect mode bits that
1      correspond to bits in the file mode creation mask.  Second, Posix
1      says that the `-w' might be interpreted as an
1      implementation-specific option, not as a mode; Posix suggests
1      using `chmod -- -w file' to avoid this confusion, but unfortunately
1      `--' does not work on some older hosts.
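The difference is easy to observe with a known umask. In this sketch `a-w' clears write permission for user, group, and other. The comment explains what Posix says plain `-w' would do under the same umask. The file name `demo.txt' is arbitrary, and the sketch changes the current shell's umask.

```shell
# Make a file unwritable for everyone, independent of the umask.
umask 022
: > demo.txt
chmod 666 demo.txt
chmod a-w demo.txt                     # clears write for user, group, other
perms=`ls -l demo.txt | cut -c1-10`
echo "$perms"                          # prints -r--r--r--
# Under umask 022, Posix says "chmod -w demo.txt" would clear only the
# owner's write bit, since group and other write are set in the mask.
rm -f demo.txt
```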
1 
1 `cmp'
1      `cmp' performs a raw data comparison of two files, while `diff'
1      compares two text files.  Therefore, if you might compare DOS
1      files, even if only checking whether two files are different, use
1      `diff' to avoid spurious differences due to differences of newline
1      encoding.
1 
1 `cp'
1      Avoid the `-r' option, since Posix 1003.1-2004 marks it as
1      obsolescent and its behavior on special files is
1      implementation-defined.  Use `-R' instead.  On GNU hosts the two
1      options are equivalent, but on Solaris hosts (for example) `cp -r'
1      reads from pipes instead of replicating them.  AIX 5.3 `cp -R' may
1      corrupt its own memory with some directory hierarchies and error
1      out or dump core:
1 
1           mkdir -p 12345678/12345678/12345678/12345678
1           touch 12345678/12345678/x
1           cp -R 12345678 t
1           cp: 0653-440 12345678/12345678/: name too long.
1 
1      Some `cp' implementations (e.g., BSD/OS 4.2) do not allow trailing
1      slashes at the end of nonexistent destination directories.  To
1      avoid this problem, omit the trailing slashes.  For example, use
1      `cp -R source /tmp/newdir' rather than `cp -R source /tmp/newdir/'
1      if `/tmp/newdir' does not exist.
1 
1      The ancient SunOS 4 `cp' does not support `-f', although its `mv'
1      does.
1 
1      Traditionally, file timestamps had 1-second resolution, and `cp
1      -p' copied the timestamps exactly.  However, many modern file
1      systems have timestamps with 1-nanosecond resolution.
1      Unfortunately, some older `cp -p' implementations truncate
1      timestamps when copying files, which can cause the destination
1      file to appear to be older than the source.  The exact amount of
1      truncation depends on the resolution of the system calls that `cp'
1      uses.  Traditionally this was `utime', which has 1-second
1      resolution.  Less-ancient `cp' implementations such as GNU Core
1      Utilities 5.0.91 (2003) use `utimes', which has 1-microsecond
1      resolution.  Modern implementations such as GNU Core Utilities
1      6.12 (2008) can set timestamps to the full nanosecond resolution,
1      using the modern system calls `futimens' and `utimensat' when they
1      are available.  As of 2011, though, many platforms do not yet
1      fully support these new system calls.
1 
1      Bob Proulx notes that `cp -p' always _tries_ to copy ownerships.
1      But whether it actually does copy ownerships or not is a system
1      dependent policy decision implemented by the kernel.  If the
1      kernel allows it then it happens.  If the kernel does not allow it
1      then it does not happen.  It is not something `cp' itself has
1      control over.
1 
1      In Unix System V any user can chown files to any other user, and
1      System V also has a non-sticky `/tmp'.  That probably derives from
1      the heritage of System V in a business environment without hostile
1      users.  BSD changed this to be a more secure model where only root
1      can `chown' files and a sticky `/tmp' is used.  That undoubtedly
1      derives from the heritage of BSD in a campus environment.
1 
1      GNU/Linux and Solaris by default follow BSD, but can be configured
1      to allow a System V style `chown'.  On the other hand, HP-UX
1      follows System V, but can be configured to use the modern security
1      model and disallow `chown'.  Since it is an
1      administrator-configurable parameter you can't use the name of the
1      kernel as an indicator of the behavior.
1 
1 `date'
1      Some versions of `date' do not recognize special `%' directives,
1      and unfortunately, instead of complaining, they just pass them
1      through, and exit with success:
1 
1           $ uname -a
1           OSF1 medusa.sis.pasteur.fr V5.1 732 alpha
1           $ date "+%s"
1           %s
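Because such a `date' exits with success, checking the exit status is not enough. A script can instead validate the output. This sketch accepts `date +%s' only if it prints a non-empty string of digits. `epoch_seconds' is a hypothetical name.

```shell
# Hypothetical helper: print seconds since the Epoch, or fail if this
# date passes "%s" through literally (or prints anything non-numeric).
epoch_seconds () {
  s=`date +%s 2>/dev/null`
  case $s in
    '' | *[!0-9]*) return 1 ;;
  esac
  echo "$s"
}
if now=`epoch_seconds`; then
  echo "seconds since the Epoch: $now"
else
  echo "date +%s is not supported here" >&2
fi
```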
1 
1 `diff'
1      Option `-u' is nonportable.
1 
1      Some implementations, such as Tru64's, fail when comparing to
1      `/dev/null'.  Use an empty file instead.
1 
1 `dirname'
1      Not all hosts have a working `dirname', and you should instead use
1      `AS_DIRNAME' (see Programming in M4sh).  For example:
1 
1           dir=`dirname "$file"`       # This is not portable.
1           dir=`AS_DIRNAME(["$file"])` # This is more portable.
1 
1 `egrep'
1      Posix 1003.1-2001 no longer requires `egrep', but many hosts do
1      not yet support the Posix replacement `grep -E'.  Also, some
1      traditional implementations do not work on long input lines.  To
1      work around these problems, invoke `AC_PROG_EGREP' and then use
1      `$EGREP'.
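Outside Autoconf, a similar choice can be made by hand. This is a simplified sketch of the kind of probe `AC_PROG_EGREP' performs, not the macro's actual code, which also searches `PATH' and checks more carefully. It prefers `grep -E' when that works and otherwise falls back on `egrep'.

```shell
# Simplified probe: use "grep -E" if it understands an alternation,
# else assume a traditional egrep is available.
if echo a | (grep -E '(a|b)') >/dev/null 2>&1; then
  EGREP='grep -E'
else
  EGREP=egrep
fi
echo "using: $EGREP"
```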
1 
1      Portable extended regular expressions should use `\' only to escape
1      characters in the string `$()*+.?[\^{|'.  For example, `\}' is not
1      portable, even though it typically matches `}'.
1 
1      The empty alternative is not portable.  Use `?' instead.  For
1      instance with Digital Unix v5.0:
1 
1           > printf "foo\n|foo\n" | $EGREP '^(|foo|bar)$'
1           |foo
1           > printf "bar\nbar|\n" | $EGREP '^(foo|bar|)$'
1           bar|
1           > printf "foo\nfoo|\n|bar\nbar\n" | $EGREP '^(foo||bar)$'
1           foo
1           |bar
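For example, `^(|foo|bar)$' can be rewritten as `^(foo|bar)?$', which makes the whole group optional instead of using an empty alternative. The sketch below counts matching lines with `-c'. Here `foo', the empty line, and `bar' match, while `baz' and `|foo' do not.

```shell
# Replace the empty alternative with "?" on the whole group.
EGREP=${EGREP-'grep -E'}
count=`printf 'foo\n\nbar\nbaz\n|foo\n' | $EGREP -c '^(foo|bar)?$'`
echo "$count"                          # prints 3
```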
1 
1      `$EGREP' also suffers the limitations of `grep' (see the `grep'
1      entry below).
1 
1 `expr'
1      Not all implementations obey the Posix rule that `--' separates
1      options from arguments; likewise, not all implementations provide
1      the extension to Posix that the first argument can be treated as
1      part of a valid expression rather than an invalid option if it
1      begins with `-'.  When performing arithmetic, use `expr 0 + $var'
1      if `$var' might be a negative number, to keep `expr' from
1      interpreting it as an option.
1 
1      No `expr' keyword starts with `X', so use `expr X"WORD" :
1      'XREGEX'' to keep `expr' from misinterpreting WORD.
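Both idioms are shown below. The variable names are illustrative. `0 + ' keeps a negative operand from being read as an option, and the leading `X' keeps a word such as `--verbose' from being misinterpreted. The regular expression also strips the leading dashes.

```shell
var=-5
sum=`expr 0 + "$var"`                  # "-5" is an operand, not an option
word=--verbose
name=`expr X"$word" : 'X-*\(.*\)'`     # strip leading dashes
echo "$sum $name"                      # prints -5 verbose
```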
1 
1      Don't use `length', `substr', `match' and `index'.
1 
1 `expr' (`|')
1      You can use `|'.  Although Posix does require that `expr '''
1      return the empty string, it does not specify the result when you
1      `|' together the empty string (or zero) with the empty string.  For
1      example:
1 
1           expr '' \| ''
1 
1      Posix 1003.2-1992 specifies the empty string for this case, but
1      traditional Unix outputs `0' (Solaris is one such example).  In
1      Posix 1003.1-2001, the specification was changed to match
1      traditional Unix's behavior (which is bizarre, but it's too late
1      to fix this).  Please note that the same problem does arise when
1      the empty string results from a computation, as in:
1 
1           expr bar : foo \| foo : bar
1 
1      Avoid this portability problem by avoiding the empty string.
1 
1 `expr' (`:')
1      Portable `expr' regular expressions should use `\' to escape only
1      characters in the string `$()*.0123456789[\^n{}'.  For example,
1      alternation, `\|', is common but Posix does not require its
1      support, so it should be avoided in portable scripts.  Similarly,
1      `\+' and `\?' should be avoided.
1 
1      Portable `expr' regular expressions should not begin with `^'.
1      Patterns are automatically anchored so leading `^' is not needed
1      anyway.
1 
1      On the other hand, the behavior of the `$' anchor is not portable
1      on multi-line strings.  Posix is ambiguous as to whether the anchor
1      applies to each line, as was done in older versions of the GNU
1      Core Utilities, or whether it applies only to the end of the
1      overall string, as in Coreutils 6.0 and most other implementations.
1 
1           $ baz='foo
1           > bar'
1           $ expr "X$baz" : 'X\(foo\)$'
1 
1           $ expr-5.97 "X$baz" : 'X\(foo\)$'
1           foo
1 
1      The Posix standard is ambiguous as to whether `expr 'a' : '\(b\)''
1      outputs `0' or the empty string.  In practice, it outputs the
1      empty string on most platforms, but portable scripts should not
1      assume this.  For instance, the QNX 4.25 native `expr' returns `0'.
1 
1      One might think that a way to get a uniform behavior would be to
1      use the empty string as a default value:
1 
1           expr a : '\(b\)' \| ''
1 
1      Unfortunately this behaves exactly as the original expression; see
1      the `expr' (`|') entry for more information.
1 
1      Some ancient `expr' implementations (e.g., SunOS 4 `expr' and
1      Solaris 8 `/usr/ucb/expr') have a silly length limit that causes
1      `expr' to fail if the matched substring is longer than 120 bytes.
1      In this case, you might want to fall back on `echo|sed' if `expr'
1      fails.  Nowadays this is of practical importance only for the rare
1      installer who mistakenly puts `/usr/ucb' before `/usr/bin' in
1      `PATH'.
1 
1      On Mac OS X 10.4, `expr' mishandles the pattern `[^-]' in some
1      cases.  For example, the command
1           expr Xpowerpc-apple-darwin8.1.0 : 'X[^-]*-[^-]*-\(.*\)'
1 
1      outputs `apple-darwin8.1.0' rather than the correct `darwin8.1.0'.
1      This particular case can be worked around by substituting `[^--]'
1      for `[^-]'.
1 
1      Don't leave; there is more!
1 
1      The QNX 4.25 `expr', in addition to preferring `0' to the empty
1      string, has a funny behavior in its exit status: it's always 1
1      when parentheses are used!
1 
1           $ val=`expr 'a' : 'a'`; echo "$?: $val"
1           0: 1
1           $ val=`expr 'a' : 'b'`; echo "$?: $val"
1           1: 0
1 
1           $ val=`expr 'a' : '\(a\)'`; echo "$?: $val"
1           1: a
1           $ val=`expr 'a' : '\(b\)'`; echo "$?: $val"
1           1: 0
1 
1      In practice this can be a big problem if you are prepared to catch
1      failures of `expr' programs with some other method (such as using
1      `sed'), since you may get the result twice.  For instance
1 
1           $ expr 'a' : '\(a\)' || echo 'a' | sed 's/^\(a\)$/\1/'
1 
1      outputs `a' on most hosts, but `aa' on QNX 4.25.  A simple
1      workaround consists of testing `expr' and using a variable set to
1      `expr' or to `false' according to the result.
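That workaround is sketched below. It is modeled loosely on the `as_expr' variable that Autoconf-generated scripts set up, so treat the details as an approximation. The probe also rejects an `expr' that turns `001' into `1', which is the Tru64 behavior described next. The file name and the `.c' stripping task are illustrative only.

```shell
# Use expr only if its exit status and output can be trusted; otherwise
# make every use fail so that the sed fallback runs instead.
if expr a : '\(a\)' >/dev/null 2>&1 &&
   test "X`expr 00001 : '.*\(...\)'`" = X001; then
  as_expr=expr
else
  as_expr=false
fi

file=lib/foo.c
# Strip a ".c" suffix; sed runs only if $as_expr fails.
stem=`$as_expr "X$file" : 'X\(.*\)\.c$' 2>/dev/null ||
  echo "X$file" | sed 's/^X//; s/\.c$//'`
echo "$stem"                           # prints lib/foo
```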
1 
1      Tru64 `expr' incorrectly treats the result as a number, if it can
1      be interpreted that way:
1 
1           $ expr 00001 : '.*\(...\)'
1           1
1 
1      On HP-UX 11, `expr' only supports a single sub-expression.
1 
1           $ expr 'Xfoo' : 'X\(f\(oo\)*\)$'
1           expr: More than one '\(' was used.
1 
1 `fgrep'
1      Posix 1003.1-2001 no longer requires `fgrep', but many hosts do
1      not yet support the Posix replacement `grep -F'.  Also, some
1      traditional implementations do not work on long input lines.  To
1      work around these problems, invoke `AC_PROG_FGREP' and then use
1      `$FGREP'.
1 
1      Tru64/OSF 5.1 `fgrep' does not match an empty pattern.
1 
1 `find'
1      The option `-maxdepth' seems to be GNU specific.  Tru64 v5.1,
1      NetBSD 1.5 and Solaris `find' commands do not understand it.
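A common portable substitute uses `-prune'. In this sketch `! -name . -prune' prints and prunes every entry just below the starting directory, which gives roughly the effect of GNU `find . -mindepth 1 -maxdepth 1'. The scratch directory `demo.d' is arbitrary.

```shell
# List only the entries directly inside a directory, using -prune.
mkdir demo.d demo.d/sub
: > demo.d/top.txt
: > demo.d/sub/deep.txt
top=`cd demo.d && find . ! -name . -prune -print | sort`
echo "$top"
# ./sub
# ./top.txt
rm -rf demo.d
```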
1 
1      The replacement of `{}' is guaranteed only if the argument is
1      exactly _{}_, not if it's only a part of an argument.  For
1      instance, on DU, HP-UX 10.20, and HP-UX 11:
1 
1           $ touch foo
1           $ find . -name foo -exec echo "{}-{}" \;
1           {}-{}
1 
1      while GNU `find' reports `./foo-./foo'.
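To get the same result portably, you can pass `{}' as a separate argument to a small shell and build the combined string there. In this sketch, `sh' after the script becomes `$0' and the file name becomes `$1'. The scratch directory `demo.f' is arbitrary.

```shell
# Let a shell do the string building; {} stays a whole argument.
mkdir demo.f
: > demo.f/foo
out=`cd demo.f && find . -name foo -exec sh -c 'echo "$1-$1"' sh {} \;`
echo "$out"                            # prints ./foo-./foo
rm -rf demo.f
```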
1 
1 `grep'
1      Portable scripts can rely on the `grep' options `-c', `-l', `-n',
1      and `-v', but should avoid other options.  For example, don't use
1      `-w', as Posix does not require it and Irix 6.5.16m's `grep' does
1      not support it.  Also, portable scripts should not combine `-c'
1      with `-l', as Posix does not allow this.
1 
1      Some of the options required by Posix are not portable in practice.
1      Don't use `grep -q' to suppress output, because many `grep'
1      implementations (e.g., Solaris) do not support `-q'.  Don't use
1      `grep -s' to suppress output either, because Posix says `-s' does
1      not suppress output, only some error messages; also, the `-s'
1      option of traditional `grep' behaved like `-q' does in most modern
1      implementations.  Instead, redirect the standard output and
1      standard error (in case the file doesn't exist) of `grep' to
1      `/dev/null'.  Check the exit status of `grep' to determine whether
1      it found a match.
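Put together, that advice gives a test like the one below. The helper name `has_match' is hypothetical. The sketch assumes the pattern does not begin with `-', since the portable options listed above do not include `-e'.

```shell
# Hypothetical helper: exit status 0 if PATTERN matches in FILE.
# Output and diagnostics (e.g., a missing file) are discarded.
has_match () {
  grep "$1" "$2" >/dev/null 2>&1
}
printf 'alpha\nbeta\n' > demo.txt
if has_match beta demo.txt; then echo found; else echo 'not found'; fi
rm -f demo.txt
```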
1 
1      The QNX4 implementation fails to count lines with `grep -c '$'',
1      but works with `grep -c '^''.  Other alternatives for counting
1      lines are to use `sed -n '$='' or `wc -l'.
1 
1      Some traditional `grep' implementations do not work on long input
1      lines.  On AIX the default `grep' silently truncates long lines on
1      the input before matching.
1 
1      Also, many implementations do not support multiple regexps with
1      `-e': they either reject `-e' entirely (e.g., Solaris) or honor
1      only the last pattern (e.g., IRIX 6.5 and NeXT).  To work around
1      these problems, invoke `AC_PROG_GREP' and then use `$GREP'.
1 
1      Another possible workaround for the multiple `-e' problem is to
1      separate the patterns by newlines, for example:
1 
1           grep 'foo
1           bar' in.txt
1 
1      except that this fails with traditional `grep' implementations and
1      with OpenBSD 3.8 `grep'.
1 
1      Traditional `grep' implementations (e.g., Solaris) do not support
1      the `-E' or `-F' options.  To work around these problems, invoke
1      `AC_PROG_EGREP' and then use `$EGREP', and similarly for
1      `AC_PROG_FGREP' and `$FGREP'.  Even if you are willing to require
1      support for Posix `grep', your script should not use both `-E' and
1      `-F', since Posix does not allow this combination.
1 
1      Portable `grep' regular expressions should use `\' only to escape
1      characters in the string `$()*.0123456789[\^{}'.  For example,
1      alternation, `\|', is common but Posix does not require its
1      support in basic regular expressions, so it should be avoided in
1      portable scripts.  Solaris and HP-UX `grep' do not support it.
1      Similarly, the following escape sequences should also be avoided:
1      `\<', `\>', `\+', `\?', `\`', `\'', `\B', `\b', `\S', `\s', `\W',
1      and `\w'.
1 
1      Posix does not specify the behavior of `grep' on binary files.  An
1      example where this matters is using BSD `grep' to search text that
1      includes embedded ANSI escape sequences for colored output to
1      terminals (`\033[m' is the sequence to restore normal output); the
1      behavior depends on whether input is seekable:
1 
1           $ printf 'esc\033[mape\n' > sample
1           $ grep . sample
1           Binary file sample matches
1           $ cat sample | grep .
1           escape
1 
1 `join'
1      Solaris 8 `join' has bugs when the second operand is standard
1      input, and when standard input is a pipe.  For example, the
1      following shell script causes Solaris 8 `join' to loop forever:
1 
1           cat >file <<'EOF'
1           1 x
1           2 y
1           EOF
1           cat file | join file -
1 
1      Use `join - file' instead.
1 
1      On NetBSD, `join -a 1 file1 file2' mistakenly behaves like `join
1      -a 1 -a 2 1 file1 file2', resulting in a usage warning; the
1      workaround is to use `join -a1 file1 file2' instead.
1 
1 `ln'
1      Don't rely on `ln' having a `-f' option.  Symbolic links are not
1      available on old systems; use `$(LN_S)' as a portable substitute.
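Outside Autoconf, the same idea can be sketched by hand. The probe below approximates what `AC_PROG_LN_S' does and is not its actual code: it tries `ln -s', then a hard link, then a copy. Because `ln -f' is not portable, the last lines remove any old link before creating a new one. The `conftest.*', `target.txt', and `link.txt' names are arbitrary.

```shell
# Approximate LN_S probe: symbolic link, else hard link, else copy.
rm -f conftest.file conftest.lnk
: > conftest.file
if ln -s conftest.file conftest.lnk 2>/dev/null; then
  LN_S='ln -s'
elif ln conftest.file conftest.lnk 2>/dev/null; then
  LN_S=ln
else
  LN_S='cp -pR'
fi
rm -f conftest.file conftest.lnk

# Instead of "ln -f", remove the old link first.
: > target.txt
rm -f link.txt && $LN_S target.txt link.txt
```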
1 
1      For versions of the DJGPP before 2.04, `ln' emulates symbolic links
1      to executables by generating a stub that in turn calls the real
1      program.  This feature also works with nonexistent files like in
1      the Posix spec.  So `ln -s file link' generates `link.exe', which
1      attempts to call `file.exe' if run.  But this feature only works
1      for executables, so `cp -p' is used instead for these systems.
1      DJGPP versions 2.04 and later have full support for symbolic links.
1 
1 `ls'
1      The portable options are `-acdilrtu'.  Current practice is for
1      `-l' to output both owner and group, even though ancient versions
1      of `ls' omitted the group.
1 
1      On ancient hosts, `ls foo' sent the diagnostic `foo not found' to
1      standard output if `foo' did not exist.  Hence a shell command
1      like `sources=`ls *.c 2>/dev/null`' did not always work, since it
1      was equivalent to `sources='*.c not found'' in the absence of `.c'
1      files.  This is no longer a practical problem, since current `ls'
1      implementations send diagnostics to standard error.
1 
1      The behavior of `ls' on a directory that is being concurrently
1      modified is not always predictable, because of a data race where
1      cached information returned by `readdir' does not match the current
1      directory state.  In fact, MacOS 10.5 has an intermittent bug where
1      `readdir', and thus `ls', sometimes lists a file more than once if
1      other files were added or removed from the directory immediately
1      prior to the `ls' call.  Since `ls' already sorts its output, the
1      duplicate entries can be avoided by piping the results through
1      `uniq'.
1 
1 `mkdir'
1      No `mkdir' option is portable to older systems.  Instead of `mkdir
1      -p FILE-NAME', you should use `AS_MKDIR_P(FILE-NAME)' (see
1      Programming in M4sh) or `AC_PROG_MKDIR_P' (see Particular
1      Programs).
1 
1      Combining the `-m' and `-p' options, as in `mkdir -m go-w -p DIR',
1      often leads to trouble.  FreeBSD `mkdir' incorrectly attempts to
1      change the permissions of DIR even if it already exists.  HP-UX
1      11.23 and IRIX 6.5 `mkdir' often assign the wrong permissions to
1      any newly-created parents of DIR.
1 
1      Posix does not clearly specify whether `mkdir -p foo' should
1      succeed when `foo' is a symbolic link to an already-existing
1      directory.  The GNU Core Utilities 5.1.0 `mkdir' succeeds, but
1      Solaris `mkdir' fails.
1 
1      Traditional `mkdir -p' implementations suffer from race conditions.
1      For example, if you invoke `mkdir -p a/b' and `mkdir -p a/c' at
1      the same time, both processes might detect that `a' is missing,
1      one might create `a', then the other might try to create `a' and
1      fail with a `File exists' diagnostic.  The GNU Core Utilities
1      (`fileutils' version 4.1), FreeBSD 5.0, NetBSD 2.0.2, and OpenBSD
1      2.4 are known to be race-free when two processes invoke `mkdir -p'
1      simultaneously, but earlier versions are vulnerable.  Solaris
1      `mkdir' is still vulnerable as of Solaris 10, and other
1      traditional Unix systems are probably vulnerable too.  This
1      possible race is harmful in parallel builds when several Make
1      rules call `mkdir -p' to construct directories.  You may use
1      `install-sh -d' as a safe replacement, provided this script is
1      recent enough; the copy shipped with Autoconf 2.60 and Automake
1      1.10 is OK, but copies from older versions are vulnerable.
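The race can be tolerated by creating each component in turn and treating a failed `mkdir' as success when the directory exists afterward. The sketch below uses a hypothetical helper, `portable_mkdir_p'. It is not the code of `AS_MKDIR_P' or `install-sh'. It assumes the name contains no glob metacharacters and that `IFS' is set when it is called.

```shell
# Hypothetical helper: create DIR and missing parents, tolerating another
# process that creates the same directories concurrently.
portable_mkdir_p () {
  pm_dir=$1
  case $pm_dir in
    /*) pm_prefix=/ ;;
    *)  pm_prefix= ;;
  esac
  pm_save_IFS=$IFS
  IFS=/
  set x $pm_dir
  shift
  IFS=$pm_save_IFS
  for pm_comp
  do
    test -n "$pm_comp" || continue
    pm_prefix=$pm_prefix$pm_comp
    # A failed mkdir is fine if the directory exists now.
    mkdir "$pm_prefix" 2>/dev/null || test -d "$pm_prefix" || return 1
    pm_prefix=$pm_prefix/
  done
}
portable_mkdir_p demo.tree/a/b && portable_mkdir_p demo.tree/a/c
```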
1 
1 `mkfifo'
1 `mknod'
1      The GNU Coding Standards state that `mknod' is safe to use on
1      platforms where it has been tested to exist; but it is generally
1      portable only for creating named FIFOs, since device numbers are
1      platform-specific.  Autotest uses `mkfifo' to implement parallel
1      testsuites.  Posix states that behavior is unspecified when
1      opening a named FIFO for both reading and writing; on at least
1      Cygwin, this results in failure on any attempt to read or write to
1      that file descriptor.
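To stay within what Posix specifies, open a FIFO from one side for reading and from the other side for writing, and never open it for both, as with the shell's `<>' redirection. In this minimal sketch the reader runs in the background. The writer's open blocks until the reader opens the FIFO. File names are arbitrary.

```shell
# One reader, one writer; each opens the FIFO in a single direction.
rm -f demo.fifo demo.out
mkfifo demo.fifo
cat demo.fifo > demo.out &             # reader
echo hello > demo.fifo                 # writer
wait
cat demo.out                           # prints hello
rm -f demo.fifo
```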
1 
1 `mktemp'
1      Shell scripts can use temporary files safely with `mktemp', but it
1      does not exist on all systems.  A portable way to create a safe
1      temporary file name is to create a temporary directory with mode
1      700 and use a file inside this directory.  Both methods prevent
1      attackers from gaining control, though `mktemp' is far less likely
1      to fail gratuitously under attack.
1 
1      Here is sample code to create a new temporary directory `$dir'
1      safely:
1 
1           # Create a temporary directory $dir in $TMPDIR (default /tmp).
1           # Use mktemp if possible; otherwise fall back on mkdir,
1           # with $RANDOM to make collisions less likely.
1           : "${TMPDIR:=/tmp}"
1           {
1             dir=`
1               (umask 077 && mktemp -d "$TMPDIR/fooXXXXXX") 2>/dev/null
1             ` &&
1             test -d "$dir"
1           } || {
1             dir=$TMPDIR/foo$$-$RANDOM
1             (umask 077 && mkdir "$dir")
1           } || exit $?
1 
1 `mv'
1      The only portable options are `-f' and `-i'.
1 
1      Moving individual files between file systems is portable (it was
1      in Unix version 6), but it is not always atomic: when doing `mv
1      new existing', there's a critical section where neither the old
1      nor the new version of `existing' actually exists.
1 
1      On some systems moving files from `/tmp' can sometimes cause
1      undesirable (but perfectly valid) warnings, even if you created
1      these files.  This is because `/tmp' belongs to a group that
1      ordinary users are not members of, and files created in `/tmp'
1      inherit the group of `/tmp'.  When the file is copied, `mv' issues
1      a diagnostic without failing:
1 
1           $ touch /tmp/foo
1           $ mv /tmp/foo .
1           error-->mv: ./foo: set owner/group (was: 100/0): Operation not permitted
1           $ echo $?
1           0
1           $ ls foo
1           foo
1 
1      This annoying behavior conforms to Posix, unfortunately.
1 
1      Moving directories across mount points is not portable; use `cp'
1      and `rm'.
1 
1      DOS variants cannot rename or remove open files, and do not
1      support commands like `mv foo bar >foo', even though this is
1      perfectly portable among Posix hosts.
1 
1 `od'
1      In Mac OS X 10.3, `od' does not support the standard Posix options
1      `-A', `-j', `-N', or `-t', or the XSI option `-s'.  The only
1      supported Posix option is `-v', and the only supported XSI options
1      are those in `-bcdox'.  The BSD `hexdump' program can be used
1      instead.
1 
1      This problem no longer exists in Mac OS X 10.4.3.
1 
1 `rm'
1      The `-f' and `-r' options are portable.
1 
1      It is not portable to invoke `rm' without options or operands.  On
1      the other hand, Posix now requires `rm -f' to silently succeed
1      when there are no operands (useful for constructs like `rm -rf
1      $filelist' without first checking if `$filelist' was empty).  But
1      this was not always portable; at least NetBSD `rm' built before
1      2008 would fail with a diagnostic.
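If such older hosts matter to you, a guard avoids invoking `rm' with no operands. In this sketch `$filelist' is left unquoted so that it splits into separate names, which assumes the names contain no white space or glob characters.

```shell
# Skip rm entirely when the list is empty.
filelist=
test -z "$filelist" || rm -f $filelist
filelist='demo1.tmp demo2.tmp'
: > demo1.tmp
: > demo2.tmp
test -z "$filelist" || rm -f $filelist
```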
1 
1      A file might not be removed even if its parent directory is
1      writable and searchable.  Many Posix hosts cannot remove a mount
1      point, a named stream, a working directory, or a last link to a
1      file that is being executed.
1 
1      DOS variants cannot rename or remove open files, and do not
1      support commands like `rm foo >foo', even though this is perfectly
1      portable among Posix hosts.
1 
1 `rmdir'
1      Just as with `rm', some platforms refuse to remove a working
1      directory.
1 
1 `sed'
1      Patterns should not include the separator (unless escaped), even
1      as part of a character class.  In conformance with Posix, the Cray
1      `sed' rejects `s/[^/]*$//': use `s%[^/]*$%%'.  Even when escaped,
1      patterns should not include separators that are also used as `sed'
1      metacharacters.  For example, GNU sed 4.0.9 rejects
1      `s,x\{1\,\},,', while sed 4.1 strips the backslash before the comma
1      before evaluating the basic regular expression.
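For example, to strip the last component of a path, the pattern needs `/' inside a bracket expression. Choosing `%' as the separator avoids the problem entirely.

```shell
# Use "%" as the s separator because the pattern contains "/".
dir=`echo /usr/local/bin/autoconf | sed 's%[^/]*$%%'`
echo "$dir"                            # prints /usr/local/bin/
```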
1 
1      Avoid empty patterns within parentheses (i.e., `\(\)').  Posix does
1      not require support for empty patterns, and Unicos 9 `sed' rejects
1      them.
1 
1      Unicos 9 `sed' loops endlessly on patterns like `.*\n.*'.
1 
1      Sed scripts should not use branch labels longer than 7 characters
1      and should not contain comments; AIX 5.3 `sed' rejects indented
1      comments.  HP-UX sed has a limit of 99 commands (not counting `:'
1      commands) and 48 labels, which cannot be circumvented by using
1      more than one script file.  It can execute up to 19 reads with the
1      `r' command per cycle.  Solaris `/usr/ucb/sed' rejects usages that
1      exceed a limit of about 6000 bytes for the internal representation
1      of commands.
1 
1      Avoid redundant `;', as some `sed' implementations, such as NetBSD
1      1.4.2's, incorrectly try to interpret the second `;' as a command:
1 
1           $ echo a | sed 's/x/x/;;s/x/x/'
1           sed: 1: "s/x/x/;;s/x/x/": invalid command code ;
1 
1      Some `sed' implementations have a buffer limited to 4000 bytes,
1      and this limits the size of input lines, output lines, and internal
1      buffers that can be processed portably.  Likewise, not all `sed'
1      implementations can handle embedded `NUL' or a missing trailing
1      newline.
1 
1      Remember that ranges within a bracket expression of a regular
1      expression are only well-defined in the `C' (or `POSIX') locale.
1      Meanwhile, support for character classes like `[[:upper:]]' is not
1      yet universal, so if you cannot guarantee the setting of `LC_ALL',
1      it is better to spell out a range `[ABCDEFGHIJKLMNOPQRSTUVWXYZ]'
1      than to rely on `[A-Z]'.
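
     The following sketch replaces every uppercase ASCII letter without
     depending on the collation order of the current locale.  The
     variable `upper' is only a convenience for this illustration:

```shell
# Spell out the letters instead of writing [A-Z], whose meaning
# depends on the locale's collation order.
upper='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
result=`echo 'Hello World' | sed "s/[$upper]/_/g"`
echo "$result"
```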
1 
1      Additionally, Posix states that regular expressions are only
1      well-defined on characters.  Unfortunately, there exist platforms
1      such as MacOS X 10.5 where not all 8-bit byte values are valid
1      characters, even though that platform has a single-byte `C'
1      locale.  And Posix allows the existence of a multi-byte `C'
1      locale, although that does not yet appear to be a common
1      implementation.  At any rate, it means that not all bytes will be
1      matched by the regular expression `.':
1 
1           $ printf '\200\n' | LC_ALL=C sed -n /./p | wc -l
1           0
1           $ printf '\200\n' | LC_ALL=en_US.ISO8859-1 sed -n /./p | wc -l
1           1
1 
1      Portable `sed' regular expressions should use `\' only to escape
1      characters in the string `$()*.0123456789[\^n{}'.  For example,
1      alternation, `\|', is common but Posix does not require its
1      support, so it should be avoided in portable scripts.  Solaris
1      `sed' does not support alternation; e.g., `sed '/a\|b/d'' deletes
1      only lines that contain the literal string `a|b'.  Similarly, `\+'
1      and `\?' should be avoided.
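
     Portable replacements are usually easy to write: one command per
     alternative instead of `\|', `XX*' instead of `X\+', and
     `X\{0,1\}' instead of `X\?'.  The sketch below relies only on Posix
     basic regular expressions.  The sample input is arbitrary:

```shell
# Instead of /a\|b/d, use one address per alternative.
kept=`printf 'apple\nbanana\ncherry\n' | sed -e '/a/d' -e '/b/d'`
# Instead of a\+ (one or more), repeat the atom and star the copy.
plus=`echo caaat | sed 's/aa*/A/'`
# Instead of u\? (zero or one), use the interval \{0,1\}.
opt=`echo 'color colour' | sed 's/colou\{0,1\}r/C/g'`
echo "$kept $plus $opt"
```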
1 
1      Anchors (`^' and `$') inside groups are not portable.
1 
1      Nested parentheses in patterns (e.g., `\(\(a*\)b*\)') are quite
1      portable to current hosts, but were not supported by some ancient
1      `sed' implementations like SVR3.
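
     As an illustration with arbitrary sample input, the outer group
     below captures `aabb' and the nested inner group captures `aa':

```shell
# \1 refers to the outer group (a*b*), \2 to the inner group (a*).
groups=`echo aabbc | sed 's/\(\(a*\)b*\)c/[\1][\2]/'`
echo "$groups"
```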
1 
1      Some `sed' implementations, e.g., Solaris, restrict the special
1      role of the asterisk `*' to one-character regular expressions and
1      back-references, and the special role of interval expressions
1      `\{M\}', `\{M,\}', or `\{M,N\}' to one-character regular
1      expressions.  This may lead to unexpected behavior:
1 
1           $ echo '1*23*4' | /usr/bin/sed 's/\(.\)*/x/g'
1           x2x4
1           $ echo '1*23*4' | /usr/xpg4/bin/sed 's/\(.\)*/x/g'
1           x
1 
1      The `-e' option is mostly portable.  However, its argument cannot
1      start with `a', `c', or `i', as this runs afoul of a Tru64 5.1 bug.
1      Also, its argument cannot be empty, as this fails on AIX 5.3.
1      Some people prefer to use `-e':
1 
1           sed -e 'COMMAND-1' \
1               -e 'COMMAND-2'
1 
1      as opposed to the equivalent:
1 
1           sed '
1             COMMAND-1
1             COMMAND-2
1           '
1 
1      The following usage is sometimes equivalent:
1 
1           sed 'COMMAND-1;COMMAND-2'
1 
1      but Posix says that this use of a semicolon has undefined effect if
1      COMMAND-1's verb is `{', `a', `b', `c', `i', `r', `t', `w', `:',
1      or `#', so you should use semicolons only with simple scripts that
1      do not use these verbs.
1 
1      Posix up to the 2008 revision requires the argument of the `-e'
1      option to be a syntactically complete script.  GNU `sed' lets you
1      pass multiple script fragments, each as the argument of a separate
1      `-e' option, which are then combined, with newlines between the
1      fragments, and a future Posix revision may allow this as well.
1      This approach is not portable with script fragments ending in
1      backslash; for example, the `sed' programs on Solaris 10, HP-UX
1      11, and AIX don't allow splitting in this case:
1 
1           $ echo a | sed -n -e 'i\
1           0'
1           0
1           $ echo a | sed -n -e 'i\' -e 0
1           Unrecognized command: 0
1 
1      In practice, however, this technique of joining fragments through
1      `-e' works for multiple `sed' functions within `{' and `}', even
1      if that is not specified by Posix:
1 
1           $ echo a | sed -n -e '/a/{' -e s/a/b/ -e p -e '}'
1           b
1 
1      Commands inside { } brackets are further restricted.  Posix 2008
1      says that they cannot be preceded by addresses, `!', or `;', and
1      that each command must be followed immediately by a newline,
1      without any intervening blanks or semicolons.  The closing bracket
1      must be alone on a line, other than white space preceding or
1      following it.  However, a future version of Posix may standardize
1      the use of addresses within brackets.
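
     The following sketch follows these rules: each command and the
     closing brace sit on lines of their own.  The script is kept in a
     variable (named `script' here only for illustration) to keep the
     quoting simple inside backquotes:

```shell
# One command per line inside the braces; `}' alone on its line.
script='/a/{
s/a/x/
p
}'
out=`printf 'a\nb\n' | sed -n "$script"`
echo "$out"
```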
1 
1      Contrary to yet another urban legend, you may portably use `&' in
1      the replacement part of the `s' command to mean "what was
1      matched".  All descendants of Unix version 7 `sed' (at least; we
1      don't have first-hand experience with older `sed' implementations)
1      have supported it.
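
     Here is a quick illustration with arbitrary input.  The `&' in the
     replacement stands for the trailing run of `o' that was matched:

```shell
# `&' in the replacement is the text matched by the regular expression.
out=`echo foo | sed 's/o*$/[&]/'`
echo "$out"
```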
1 
1      Posix requires that there be no white space between `!'
1      and the following command.  It is OK to have blanks between the
1      address and the `!'.  For instance, on Solaris:
1 
1           $ echo "foo" | sed -n '/bar/ ! p'
1           error-->Unrecognized command: /bar/ ! p
1           $ echo "foo" | sed -n '/bar/! p'
1           error-->Unrecognized command: /bar/! p
1           $ echo "foo" | sed -n '/bar/ !p'
1           foo
1 
1      Posix also says that you should not combine `!' and `;'.  If you
1      use `!', it is best to put it on a command that is delimited by
1      newlines rather than `;'.
1 
1      Also note that Posix requires that the `b', `t', `r', and `w'
1      commands be followed by exactly one space before their argument.
1      On the other hand, no white space is allowed between `:' and the
1      subsequent label name.
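
     For instance, this sketch skips the line matching `one' by
     branching to a label.  Note the single space after `b' and the
     absence of any space after `:'.  The label name `skip' is
     arbitrary, but it stays within the 7-character limit mentioned
     earlier:

```shell
# `b skip' has exactly one space; `:skip' has none.
script='/one/b skip
p
:skip
'
out=`printf 'one\ntwo\n' | sed -n "$script"`
echo "$out"
```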
1 
1      If a sed script is specified on the command line and ends in an
1      `a', `c', or `i' command, the last line of inserted text should be
1      followed by a newline.  Otherwise some `sed' implementations
1      (e.g., OpenBSD 3.9) do not append a newline to the inserted text.
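
     Here is a minimal sketch.  The script ends with a newline after
     the appended text.  As before, the script sits in a variable only
     to keep the quoting readable:

```shell
# The final newline after `two' makes every implementation
# terminate the appended line.
script='a\
two
'
out=`echo one | sed "$script"`
echo "$out"
```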
1 
1      Many `sed' implementations (e.g., MacOS X 10.4, OpenBSD 3.9,
1      Solaris 10 `/usr/ucb/sed') strip leading white space from the text
1      of `a', `c', and `i' commands.  Prepend a backslash to work around
1      this incompatibility with Posix:
1 
1           $ echo flushleft | sed 'a\
1           >    indented
1           > '
1           flushleft
1           indented
1           $ echo flushleft | sed 'a\
1           > \   indented
1           > '
1           flushleft
1              indented
1 
1      Posix requires that with an empty regular expression, the last
1      non-empty regular expression from either an address specification
1      or substitution command is applied.  However, busybox 1.6.1
1      complains when using a substitution command with a replacement
1      containing a back-reference to an empty regular expression; the
1      workaround is repeating the regular expression.
1 
1           $ echo abc | busybox sed '/a\(b\)c/ s//\1/'
1           sed: No previous regexp.
1           $ echo abc | busybox sed '/a\(b\)c/ s/a\(b\)c/\1/'
1           b
1 
1 `sed' (`t')
1      Some old systems have `sed' implementations that "forget" to reset
1      their `t' flag when starting a new cycle.  For instance, on MIPS
1      RISC/OS and on IRIX 5.3, if you run the following `sed' script (the
1      trailing `# a', `# 1', etc. labels are not part of the actual text):
1 
1           s/keep me/kept/g  # a
1           t end             # b
1           s/.*/deleted/g    # c
1           :end              # d
1 
1      on
1 
1           delete me         # 1
1           delete me         # 2
1           keep me           # 3
1           delete me         # 4
1 
1      you get
1 
1           deleted
1           delete me
1           kept
1           deleted
1 
1      instead of
1 
1           deleted
1           deleted
1           kept
1           deleted
1 
1      Why?  When processing line 1, (c) matches, therefore sets the `t'
1      flag, and the output is produced.  When processing line 2, the `t'
1      flag is still set (this is the bug).  Command (a) fails to match,
1      but `sed' is not supposed to clear the `t' flag when a
1      substitution fails.  Command (b) sees that the flag is set,
1      therefore it clears it, and jumps to (d), hence you get `delete me'
1      instead of `deleted'.  When processing line 3, `t' is clear, (a)
1      matches, so the flag is set, hence (b) clears the flag and jumps.
1      Finally, since the flag is clear, line 4 is processed properly.
1 
1      There are two things one should remember about `t' in `sed'.
1      Firstly, always remember that `t' jumps if _some_ substitution
1      succeeded, not only the immediately preceding substitution.
1      Therefore, always use a fake `t clear' followed by a `:clear' on
1      the next line, to reset the `t' flag where needed.
1 
1      Secondly, you cannot rely on `sed' to clear the flag at each new
1      cycle.
1 
1      One portable implementation of the script above is:
1 
1           t clear
1           :clear
1           s/keep me/kept/g
1           t end
1           s/.*/deleted/g
1           :end
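
     You can exercise the portable script on the sample input above.
     On a conforming `sed' it should print `deleted', `deleted', `kept',
     and `deleted'.  The variable names are illustrative:

```shell
# `t clear' followed by `:clear' resets the flag at the start of
# every cycle, whatever the implementation does on its own.
script='t clear
:clear
s/keep me/kept/g
t end
s/.*/deleted/g
:end
'
out=`printf 'delete me\ndelete me\nkeep me\ndelete me\n' | sed "$script"`
echo "$out"
```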
1 
1 `sleep'
1      Using `sleep' is generally portable.  However, remember that
1      adding a `sleep' to work around timestamp issues, with a minimum
1      granularity of one second, doesn't scale well for parallel builds
1      on modern machines with sub-second process completion.
1 
1 `sort'
1      Remember that sort order is influenced by the current locale.
1      Inside `configure', the C locale is in effect, but in Makefile
1      snippets, you may need to specify `LC_ALL=C sort'.
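
     Forcing the C locale gives a byte-wise order in which uppercase
     ASCII letters sort before lowercase ones.  The same `LC_ALL=C'
     prefix can go in a Makefile recipe:

```shell
# In the C locale, sort compares bytes, so `A' and `B' precede `a'.
out=`printf 'b\nB\na\nA\n' | LC_ALL=C sort`
echo "$out"
```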
1 
1 `tar'
1      There are multiple file formats for `tar'; if you use Automake,
1      the macro `AM_INIT_AUTOMAKE' has some options controlling which
1      level of portability to use.
1 
1 `touch'
1      If you specify the desired timestamp (e.g., with the `-r' option),
1      older `touch' implementations use the `utime' or `utimes' system
1      call, which can result in the same kind of timestamp truncation
1      problems that `cp -p' has.
1 
1      On ancient BSD systems, `touch' or any command that results in an
1      empty file does not update the timestamps, so use a command like
1      `echo' as a workaround.  Also, GNU `touch' 3.16r (and presumably
1      all before that) fails to work on SunOS 4.1.3 when the empty file
1      is on an NFS-mounted 4.2 volume.  However, these problems are no
1      longer of practical concern.
1 
1 `tr'
1      Not all versions of `tr' handle all backslash character escapes.
1      For example, Solaris 10 `/usr/ucb/tr' treats `\n' as a plain `n',
1      even though Solaris ships more modern `tr' versions in other
1      locations.  Using octal
1      escapes is more portable for carriage returns, since `\015' is the
1      same for both ASCII and EBCDIC, and since use of literal carriage
1      returns in scripts causes a number of other problems.  But for
1      other characters, like newline, using octal escapes ties the
1      operation to ASCII, so it is better to use literal characters.
1 
1           $ { echo moon; echo light; } | /usr/ucb/tr -d '\n' ; echo
1           moo
1           light
1           $ { echo moon; echo light; } | /usr/bin/tr -d '\n' ; echo
1           moonlight
1           $ { echo moon; echo light; } | /usr/ucb/tr -d '\012' ; echo
1           moonlight
1           $ nl='
1           '; { echo moon; echo light; } | /usr/ucb/tr -d "$nl" ; echo
1           moonlight
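
     Following that advice, this sketch strips carriage returns from
     DOS-style input using the octal escape.  The sample line is
     arbitrary:

```shell
# \015 denotes carriage return in both ASCII and EBCDIC.
out=`printf 'dos line\r\n' | tr -d '\015'`
echo "$out"
```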
1 
1      Not all versions of `tr' recognize direct ranges of characters: at
1      least Solaris `/usr/bin/tr' still fails to do so.  But you can use
1      `/usr/xpg4/bin/tr' instead, or add brackets (which in Posix
1      transliterate to themselves).
1 
1           $ echo "Hazy Fantazy" | LC_ALL=C /usr/bin/tr a-z A-Z
1           HAZy FAntAZy
1           $ echo "Hazy Fantazy" | LC_ALL=C /usr/bin/tr '[a-z]' '[A-Z]'
1           HAZY FANTAZY
1           $ echo "Hazy Fantazy" | LC_ALL=C /usr/xpg4/bin/tr a-z A-Z
1           HAZY FANTAZY
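
     Another option is to spell out both sets in full.  This avoids
     ranges, and hence locale-dependent collation, altogether, and it
     should work even with a `tr' that takes `a-z' literally.  The
     variable names are only for illustration:

```shell
# Equal-length, fully spelled-out sets: no ranges, no brackets.
lower=abcdefghijklmnopqrstuvwxyz
upper=ABCDEFGHIJKLMNOPQRSTUVWXYZ
out=`echo "Hazy Fantazy" | tr $lower $upper`
echo "$out"
```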
1 
1      When providing two arguments, be sure the second string is at
1      least as long as the first.
1 
1           $ echo abc | /usr/xpg4/bin/tr bc d
1           adc
1           $ echo abc | coreutils/tr bc d
1           add
1 
1      Posix requires `tr' to operate on binary files.  But at least
1      Solaris `/usr/ucb/tr' and `/usr/bin/tr' silently discard `NUL' in
1      the input prior to doing any translation.  When using `tr' to
1      process a binary file that may contain `NUL' bytes, it is
1      necessary to use `/usr/xpg4/bin/tr' instead, or `/usr/xpg6/bin/tr'
1      if that is available.
1 
1           $ printf 'a\0b' | /usr/ucb/tr x x | od -An -tx1
1            61 62
1           $ printf 'a\0b' | /usr/bin/tr x x | od -An -tx1
1            61 62
1           $ printf 'a\0b' | /usr/xpg4/bin/tr x x | od -An -tx1
1            61 00 62
1 
1      Solaris `/usr/ucb/tr' additionally fails to handle `\0' as the
1      octal escape for `NUL'.
1 
1           $ printf 'abc' | /usr/ucb/tr 'bc' '\0d' | od -An -tx1
1            61 62 63
1           $ printf 'abc' | /usr/bin/tr 'bc' '\0d' | od -An -tx1
1            61 00 64
1           $ printf 'abc' | /usr/xpg4/bin/tr 'bc' '\0d' | od -An -tx1
1            61 00 64
1 
1