autoconf: Limitations of Usual Tools
1
1 11.15 Limitations of Usual Tools
1 ================================
1
1 The small set of tools you can expect to find on any machine can still
1 have limitations you should be aware of.
1
1 `awk'
1 Don't leave white space before the opening parenthesis in a user
1 function call. Posix does not allow this and GNU Awk rejects it:
1
1 $ gawk 'function die () { print "Aaaaarg!" }
1 BEGIN { die () }'
1 gawk: cmd. line:2: BEGIN { die () }
1 gawk: cmd. line:2: ^ parse error
1 $ gawk 'function die () { print "Aaaaarg!" }
1 BEGIN { die() }'
1 Aaaaarg!
1
1 Posix says that if a program contains only `BEGIN' actions, and
1 contains no instances of `getline', then the program merely
1 executes the actions without reading input. However, traditional
1 Awk implementations (such as Solaris 10 `awk') read and discard
1 input in this case. Portable scripts can redirect input from
1 `/dev/null' to work around the problem. For example:
1
1 awk 'BEGIN {print "hello world"}' </dev/null
1
1 Posix says that in an `END' action, `$NF' (and presumably, `$1')
1 retain their value from the last record read, if no intervening
1 `getline' occurred. However, some implementations (such as
1 Solaris 10 `/usr/bin/awk', `nawk', or Darwin `awk') reset these
1 variables. A workaround is to use an intermediate variable prior
1 to the `END' block. For example:
1
1 $ cat end.awk
1 { tmp = $1 }
1 END { print "a", $1, $NF, "b", tmp }
1 $ echo 1 | awk -f end.awk
1 a b 1
1 $ echo 1 | gawk -f end.awk
1 a 1 1 b 1
1
1 If you want your program to be deterministic, don't depend on the
1 order in which `for' traverses an array:
1
1 $ cat for.awk
1 END {
1 arr["foo"] = 1
1 arr["bar"] = 1
1 for (i in arr)
1 print i
1 }
1 $ gawk -f for.awk </dev/null
1 foo
1 bar
1 $ nawk -f for.awk </dev/null
1 bar
1 foo
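
When the order matters, one option is to let an external `sort' impose it. The following is only a sketch: the function name `sorted_keys' is made up for illustration, and `LC_ALL=C' is set on the assumption that a locale-independent order is wanted.

```shell
# Hypothetical helper: print the keys of an Awk array in a stable order.
# The traversal order of `for (i in arr)' is unspecified, so the output
# is sorted outside of Awk.
sorted_keys() {
  awk 'BEGIN {
    arr["foo"] = 1
    arr["bar"] = 1
    for (i in arr)
      print i
  }' </dev/null | LC_ALL=C sort
}

sorted_keys
```

With this approach the result no longer depends on which Awk implementation is installed.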
1
1 Some Awk implementations, such as HP-UX 11.0's native one,
1 mishandle anchors:
1
1 $ echo xfoo | $AWK '/foo|^bar/ { print }'
1 $ echo bar | $AWK '/foo|^bar/ { print }'
1 bar
1 $ echo xfoo | $AWK '/^bar|foo/ { print }'
1 xfoo
1 $ echo bar | $AWK '/^bar|foo/ { print }'
1 bar
1
1 Either do not depend on such patterns (i.e., use `/^(.*foo|bar)/'),
1 or use a simple test to reject such implementations.
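
Such a test might look like the sketch below. It is not the check that Autoconf itself performs; the helper name `awk_anchor_ok' and the variable `awk_out' are assumptions made for this example, which reuses the first failing case shown above.

```shell
# Hypothetical helper: succeed only if the Awk named by $1 handles an
# anchor inside an alternation correctly.
awk_anchor_ok() {
  awk_out=`echo xfoo | "$1" '/foo|^bar/ { print }' 2>/dev/null`
  test "X$awk_out" = Xxfoo
}

if awk_anchor_ok "${AWK-awk}"; then
  echo "anchors look fine"
else
  echo "anchors are mishandled; avoid such patterns"
fi
```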
1
1 On `ia64-hp-hpux11.23', Awk mishandles `printf' conversions after
1 `%u':
1
1 $ awk 'BEGIN { printf "%u %d\n", 0, -1 }'
1 0 0
1
1 AIX version 5.2 has an arbitrary limit of 399 on the length of
1 regular expressions and literal strings in an Awk program.
1
1 Traditional Awk implementations derived from Unix version 7, such
1 as Solaris `/bin/awk', have many limitations and do not conform to
1 Posix. Nowadays `AC_PROG_AWK' (⇒Particular Programs) finds
1 you an Awk that doesn't have these problems, but if for some
1 reason you prefer not to use `AC_PROG_AWK' you may need to address
1 them. For more detailed descriptions, see ⇒`awk' language history
1 ((gawk)Language History).
1
1 Traditional Awk does not support multidimensional arrays or
1 user-defined functions.
1
1 Traditional Awk does not support the `-v' option. You can use
1 assignments after the program instead, e.g., `$AWK '{print v $1}'
1 v=x'; however, don't forget that such assignments are not
1 evaluated until they are encountered (e.g., after any `BEGIN'
1 action).
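
The timing of such assignments can be seen in a small experiment. This is a sketch only; the function name `show_assignment_timing' is made up for the example.

```shell
# Hypothetical demo: v is still unset while BEGIN runs, and is assigned
# just before standard input is read.
show_assignment_timing() {
  echo data | awk 'BEGIN { print "begin:" v } { print "main:" v }' v=x
}

show_assignment_timing
```

The `BEGIN' action prints `begin:' with nothing after it, whereas the main action prints `main:x'.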
1
1 Traditional Awk does not support the keywords `delete' or `do'.
1
1 Traditional Awk does not support the expressions `A?B:C', `!A',
1 `A^B', or `A^=B'.
1
1 Traditional Awk does not support the predefined `CONVFMT' or
1 `ENVIRON' variables.
1
1 Traditional Awk supports only the predefined functions `exp',
1 `index', `int', `length', `log', `split', `sprintf', `sqrt', and
1 `substr'.
1
1 Traditional Awk `getline' is not at all compatible with Posix;
1 avoid it.
1
1 Traditional Awk has `for (i in a) ...' but no other uses of the
1 `in' keyword. For example, it lacks `if (i in a) ...'.
1
1 In code portable to both traditional and modern Awk, `FS' must be a
1 string containing just one ordinary character, and similarly for
1 the field-separator argument to `split'.
1
1 Traditional Awk has a limit of 99 fields in a record. Since some
1 Awk implementations, like Tru64's, split the input even if you
1 don't refer to any field in the script, to circumvent this
1 problem, set `FS' to an unusual character and use `split'.
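
A minimal sketch of this technique follows. It assumes that the character `@' never occurs in the input; the helper name `count_words' and the array name `word' are chosen only for illustration.

```shell
# Hypothetical helper: print the number of blank-separated words on each
# input line without relying on Awk's own field splitting.
# Assumption: "@" never appears in the input, so each record is one field.
count_words() {
  awk 'BEGIN { FS = "@" } { n = split($0, word, " "); print n }'
}

echo 'a b c' | count_words
```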
1
1 Traditional Awk has a limit of at most 99 bytes in a number
1 formatted by `OFMT'; for example, `OFMT="%.300e"; print 0.1;'
1 typically dumps core.
1
1 The original version of Awk had a limit of at most 99 bytes per
1 `split' field, 99 bytes per `substr' substring, and 99 bytes per
1 run of non-special characters in a `printf' format, but these bugs
1 have been fixed on all practical hosts that we know of.
1
1 HP-UX 11.00 and IRIX 6.5 Awk require that input files have a line
1 length of at most 3070 bytes.
1
1 `basename'
1 Not all hosts have a working `basename'. You can use `expr'
1 instead.
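
One way to do this is sketched below. It is a simplification rather than the code Autoconf generates; the function name `my_basename' is hypothetical, and the root directory `/' is deliberately not handled.

```shell
# Hypothetical helper: print the last component of the file name $1.
# The leading "X/" keeps expr from treating the operand as an option or
# keyword, and guarantees that there is a slash to match.
# This sketch does not handle the root directory "/".
my_basename() {
  expr "X/$1" : '.*/\([^/][^/]*\)/*$'
}

my_basename /usr/lib/libfoo.a
```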
1
1 `cat'
1 Don't rely on any option.
1
1 `cc'
1 The command `cc -c foo.c' traditionally produces an object file
1 named `foo.o'. Most compilers allow `-c' to be combined with `-o'
1 to specify a different object file name, but Posix does not
1 require this combination and a few compilers lack support for it.
1 ⇒C Compiler, for how Autoconf tests for this feature with
1 `AC_PROG_CC_C_O'.
1
1 When a compilation such as `cc -o foo foo.c' fails, some compilers
1 (such as CDS on Reliant Unix) leave a `foo.o'.
1
1 HP-UX `cc' doesn't accept `.S' files to preprocess and assemble.
1 `cc -c foo.S' appears to succeed, but in fact does nothing.
1
1 The default executable, produced by `cc foo.c', can be
1
1 * `a.out' -- usual Posix convention.
1
1 * `b.out' -- i960 compilers (including `gcc').
1
1 * `a.exe' -- DJGPP port of `gcc'.
1
1 * `a_out.exe' -- GNV `cc' wrapper for DEC C on OpenVMS.
1
1 * `foo.exe' -- various MS-DOS compilers.
1
1 The C compiler's traditional name is `cc', but other names like
1 `gcc' are common. Posix 1003.1-2001 specifies the name `c99', but
1 older Posix editions specified `c89' and anyway these standard
1 names are rarely used in practice. Typically the C compiler is
1 invoked from makefiles that use `$(CC)', so the value of the `CC'
1 make variable selects the compiler name.
1
1 `chgrp'
1 `chown'
1 It is not portable to change a file's group to a group that the
1 owner does not belong to.
1
1 `chmod'
1 Avoid usages like `chmod -w file'; use `chmod a-w file' instead,
1 for two reasons. First, plain `-w' does not necessarily make the
1 file unwritable, since it does not affect mode bits that
1 correspond to bits in the file mode creation mask. Second, Posix
1 says that the `-w' might be interpreted as an
1 implementation-specific option, not as a mode; Posix suggests
1 using `chmod -- -w file' to avoid this confusion, but unfortunately
1 `--' does not work on some older hosts.
1
1 `cmp'
1 `cmp' performs a raw data comparison of two files, while `diff'
1 compares two text files. Therefore, if you might compare DOS
1 files, even if only checking whether two files are different, use
1 `diff' to avoid spurious differences due to differences of newline
1 encoding.
1
1 `cp'
1 Avoid the `-r' option, since Posix 1003.1-2004 marks it as
1 obsolescent and its behavior on special files is
1 implementation-defined. Use `-R' instead. On GNU hosts the two
1 options are equivalent, but on Solaris hosts (for example) `cp -r'
1 reads from pipes instead of replicating them. AIX 5.3 `cp -R' may
1 corrupt its own memory with some directory hierarchies and error
1 out or dump core:
1
1 mkdir -p 12345678/12345678/12345678/12345678
1 touch 12345678/12345678/x
1 cp -R 12345678 t
1 cp: 0653-440 12345678/12345678/: name too long.
1
1 Some `cp' implementations (e.g., BSD/OS 4.2) do not allow trailing
1 slashes at the end of nonexistent destination directories. To
1 avoid this problem, omit the trailing slashes. For example, use
1 `cp -R source /tmp/newdir' rather than `cp -R source /tmp/newdir/'
1 if `/tmp/newdir' does not exist.
1
1 The ancient SunOS 4 `cp' does not support `-f', although its `mv'
1 does.
1
1 Traditionally, file timestamps had 1-second resolution, and `cp
1 -p' copied the timestamps exactly. However, many modern file
1 systems have timestamps with 1-nanosecond resolution.
1 Unfortunately, some older `cp -p' implementations truncate
1 timestamps when copying files, which can cause the destination
1 file to appear to be older than the source. The exact amount of
1 truncation depends on the resolution of the system calls that `cp'
1 uses. Traditionally this was `utime', which has 1-second
1 resolution. Less-ancient `cp' implementations such as GNU Core
1 Utilities 5.0.91 (2003) use `utimes', which has 1-microsecond
1 resolution. Modern implementations such as GNU Core Utilities
1 6.12 (2008) can set timestamps to the full nanosecond resolution,
1 using the modern system calls `futimens' and `utimensat' when they
1 are available. As of 2011, though, many platforms do not yet
1 fully support these new system calls.
1
1 Bob Proulx notes that `cp -p' always _tries_ to copy ownerships.
1 But whether it actually does copy ownerships or not is a system
1 dependent policy decision implemented by the kernel. If the
1 kernel allows it then it happens. If the kernel does not allow it
1 then it does not happen. It is not something `cp' itself has
1 control over.
1
1 In Unix System V any user can chown files to any other user, and
1 System V also has a non-sticky `/tmp'. That probably derives from
1 the heritage of System V in a business environment without hostile
1 users. BSD changed this to be a more secure model where only root
1 can `chown' files and a sticky `/tmp' is used. That undoubtedly
1 derives from the heritage of BSD in a campus environment.
1
1 GNU/Linux and Solaris by default follow BSD, but can be configured
1 to allow a System V style `chown'. On the other hand, HP-UX
1 follows System V, but can be configured to use the modern security
1 model and disallow `chown'. Since it is an
1 administrator-configurable parameter you can't use the name of the
1 kernel as an indicator of the behavior.
1
1 `date'
1 Some versions of `date' do not recognize special `%' directives,
1 and unfortunately, instead of complaining, they just pass them
1 through, and exit with success:
1
1 $ uname -a
1 OSF1 medusa.sis.pasteur.fr V5.1 732 alpha
1 $ date "+%s"
1 %s
1
1 `diff'
1 Option `-u' is nonportable.
1
1 Some implementations, such as Tru64's, fail when comparing to
1 `/dev/null'. Use an empty file instead.
1
1 `dirname'
1 Not all hosts have a working `dirname', and you should instead use
1 `AS_DIRNAME' (⇒Programming in M4sh). For example:
1
1 dir=`dirname "$file"` # This is not portable.
1 dir=`AS_DIRNAME(["$file"])` # This is more portable.
1
1 `egrep'
1 Posix 1003.1-2001 no longer requires `egrep', but many hosts do
1 not yet support the Posix replacement `grep -E'. Also, some
1 traditional implementations do not work on long input lines. To
1 work around these problems, invoke `AC_PROG_EGREP' and then use
1 `$EGREP'.
1
1 Portable extended regular expressions should use `\' only to escape
1 characters in the string `$()*+.?[\^{|'. For example, `\}' is not
1 portable, even though it typically matches `}'.
1
1 The empty alternative is not portable. Use `?' instead. For
1 instance with Digital Unix v5.0:
1
1 > printf "foo\n|foo\n" | $EGREP '^(|foo|bar)$'
1 |foo
1 > printf "bar\nbar|\n" | $EGREP '^(foo|bar|)$'
1 bar|
1 > printf "foo\nfoo|\n|bar\nbar\n" | $EGREP '^(foo||bar)$'
1 foo
1 |bar
1
1 `$EGREP' also suffers the limitations of `grep' (⇒Limitations of
1 Usual Tools, `grep').
1
1 `expr'
1 Not all implementations obey the Posix rule that `--' separates
1 options from arguments; likewise, not all implementations provide
1 the extension to Posix that the first argument can be treated as
1 part of a valid expression rather than an invalid option if it
1 begins with `-'. When performing arithmetic, use `expr 0 + $var'
1 if `$var' might be a negative number, to keep `expr' from
1 interpreting it as an option.
1
1 No `expr' keyword starts with `X', so use `expr X"WORD" :
1 'XREGEX'' to keep `expr' from misinterpreting WORD.
1
1 Don't use `length', `substr', `match' and `index'.
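
The two idioms above can be sketched as follows. The variable names are arbitrary and serve only as an illustration.

```shell
# A value that could be mistaken for an option if it came first.
var=-5
sum=`expr 0 + $var + 7`

# A word that starts with "-"; the X prefix hides it from expr.
word=-n
first=`expr "X$word" : 'X\(.\)'`

echo "sum=$sum first=$first"
```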
1
1 `expr' (`|')
1 You can use `|'. Although Posix does require that `expr '''
1 return the empty string, it does not specify the result when you
1 `|' together the empty string (or zero) with the empty string. For
1 example:
1
1 expr '' \| ''
1
1 Posix 1003.2-1992 specifies the empty string for this case, but
1 traditional Unix returns `0' (Solaris is one such example). In
1 Posix 1003.1-2001, the specification was changed to match
1 traditional Unix's behavior (which is bizarre, but it's too late
1 to fix this). Please note that the same problem does arise when
1 the empty string results from a computation, as in:
1
1 expr bar : foo \| foo : bar
1
1 Avoid this portability problem by avoiding the empty string.
1
1 `expr' (`:')
1 Portable `expr' regular expressions should use `\' to escape only
1 characters in the string `$()*.0123456789[\^n{}'. For example,
1 alternation, `\|', is common but Posix does not require its
1 support, so it should be avoided in portable scripts. Similarly,
1 `\+' and `\?' should be avoided.
1
1 Portable `expr' regular expressions should not begin with `^'.
1 Patterns are automatically anchored so leading `^' is not needed
1 anyway.
1
1 On the other hand, the behavior of the `$' anchor is not portable
1 on multi-line strings. Posix is ambiguous whether the anchor
1 applies to each line, as was done in older versions of the GNU
1 Core Utilities, or whether it applies only to the end of the
1 overall string, as in Coreutils 6.0 and most other implementations.
1
1 $ baz='foo
1 > bar'
1 $ expr "X$baz" : 'X\(foo\)$'
1
1 $ expr-5.97 "X$baz" : 'X\(foo\)$'
1 foo
1
1 The Posix standard is ambiguous as to whether `expr 'a' : '\(b\)''
1 outputs `0' or the empty string. In practice, it outputs the
1 empty string on most platforms, but portable scripts should not
1 assume this. For instance, the QNX 4.25 native `expr' returns `0'.
1
1 One might think that a way to get a uniform behavior would be to
1 use the empty string as a default value:
1
1 expr a : '\(b\)' \| ''
1
1 Unfortunately this behaves exactly as the original expression; see
1 the `expr' (`|') entry for more information.
1
1 Some ancient `expr' implementations (e.g., SunOS 4 `expr' and
1 Solaris 8 `/usr/ucb/expr') have a silly length limit that causes
1 `expr' to fail if the matched substring is longer than 120 bytes.
1 In this case, you might want to fall back on `echo|sed' if `expr'
1 fails. Nowadays this is of practical importance only for the rare
1 installer who mistakenly puts `/usr/ucb' before `/usr/bin' in
1 `PATH'.
1
1 On Mac OS X 10.4, `expr' mishandles the pattern `[^-]' in some
1 cases. For example, the command
1
1 expr Xpowerpc-apple-darwin8.1.0 : 'X[^-]*-[^-]*-\(.*\)'
1
1 outputs `apple-darwin8.1.0' rather than the correct `darwin8.1.0'.
1 This particular case can be worked around by substituting `[^--]'
1 for `[^-]'.
1
1 Don't leave, there is some more!
1
1 The QNX 4.25 `expr', in addition to preferring `0' to the empty
1 string, has a funny behavior in its exit status: it's always 1
1 when parentheses are used!
1
1 $ val=`expr 'a' : 'a'`; echo "$?: $val"
1 0: 1
1 $ val=`expr 'a' : 'b'`; echo "$?: $val"
1 1: 0
1
1 $ val=`expr 'a' : '\(a\)'`; echo "$?: $val"
1 1: a
1 $ val=`expr 'a' : '\(b\)'`; echo "$?: $val"
1 1: 0
1
1 In practice this can be a big problem if you are ready to catch
1 failures of `expr' programs with some other method (such as using
1 `sed'), since you may get the result twice. For instance
1
1 $ expr 'a' : '\(a\)' || echo 'a' | sed 's/^\(a\)$/\1/'
1
1 outputs `a' on most hosts, but `aa' on QNX 4.25. A simple
1 workaround consists of testing `expr' and using a variable set to
1 `expr' or to `false' according to the result.
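
This workaround might be sketched as below. The probe combines the QNX exit status problem with the Tru64 numeric problem described in this entry; the names `my_expr' and `first_char' are hypothetical, and the `sed' fallback is only one possible choice.

```shell
# Use expr only if it reports success for a parenthesized match and
# does not turn a matched string into a number.
if expr a : '\(a\)' >/dev/null 2>&1 &&
   test "X`expr 00001 : '.*\(...\)'`" = X001; then
  my_expr=expr
else
  my_expr=false
fi

# Hypothetical helper: print the first character of $1.
# When my_expr is `false', only the sed fallback produces output.
first_char() {
  $my_expr "X$1" : 'X\(.\)' 2>/dev/null ||
    echo "X$1" | sed -n 's/^X\(.\).*$/\1/p'
}

first_char abc
```

Since `$my_expr' is `false' on hosts with a broken `expr', the result is printed only once, by the fallback.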
1
1 Tru64 `expr' incorrectly treats the result as a number, if it can
1 be interpreted that way:
1
1 $ expr 00001 : '.*\(...\)'
1 1
1
1 On HP-UX 11, `expr' only supports a single sub-expression.
1
1 $ expr 'Xfoo' : 'X\(f\(oo\)*\)$'
1 expr: More than one '\(' was used.
1
1 `fgrep'
1 Posix 1003.1-2001 no longer requires `fgrep', but many hosts do
1 not yet support the Posix replacement `grep -F'. Also, some
1 traditional implementations do not work on long input lines. To
1 work around these problems, invoke `AC_PROG_FGREP' and then use
1 `$FGREP'.
1
1 Tru64/OSF 5.1 `fgrep' does not match an empty pattern.
1
1 `find'
1 The option `-maxdepth' seems to be GNU specific. Tru64 v5.1,
1 NetBSD 1.5 and Solaris `find' commands do not understand it.
1
1 The replacement of `{}' is guaranteed only if the argument is
1 exactly _{}_, not if it's only a part of an argument. For
1 instance on DU, and HP-UX 10.20 and HP-UX 11:
1
1 $ touch foo
1 $ find . -name foo -exec echo "{}-{}" \;
1 {}-{}
1
1 while GNU `find' reports `./foo-./foo'.
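
A common way around this limitation is to pass `{}' as an argument of its own to a small inline shell script, which can then use the name as often as needed. The sketch below assumes that `find -exec' and `sh -c' behave as Posix describes; the function name `pair_names' is made up.

```shell
# Hypothetical helper: print "NAME-NAME" for every file called foo
# under the directory $1.  "{}" is a separate argument, and the inner
# shell sees the file name as $1.
pair_names() {
  find "$1" -name foo -exec sh -c 'echo "$1-$1"' sh '{}' \;
}
```

For instance, `pair_names .' in a directory that contains `foo' reports `./foo-./foo'.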
1
1 `grep'
1 Portable scripts can rely on the `grep' options `-c', `-l', `-n',
1 and `-v', but should avoid other options. For example, don't use
1 `-w', as Posix does not require it and Irix 6.5.16m's `grep' does
1 not support it. Also, portable scripts should not combine `-c'
1 with `-l', as Posix does not allow this.
1
1 Some of the options required by Posix are not portable in practice.
1 Don't use `grep -q' to suppress output, because many `grep'
1 implementations (e.g., Solaris) do not support `-q'. Don't use
1 `grep -s' to suppress output either, because Posix says `-s' does
1 not suppress output, only some error messages; also, the `-s'
1 option of traditional `grep' behaved like `-q' does in most modern
1 implementations. Instead, redirect the standard output and
1 standard error (in case the file doesn't exist) of `grep' to
1 `/dev/null'. Check the exit status of `grep' to determine whether
1 it found a match.
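
The redirection approach can be wrapped up as in the following sketch; the function name `file_matches' is hypothetical.

```shell
# Hypothetical helper: succeed if the regular expression $1 matches
# some line of the file $2.  Output and diagnostics are discarded, so
# only the exit status of grep is used.
file_matches() {
  grep "$1" "$2" >/dev/null 2>&1
}
```

A caller would then write something like `if file_matches foo in.txt; then ...; fi'.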
1
1 The QNX4 implementation fails to count lines with `grep -c '$'',
1 but works with `grep -c '^''. Other alternatives for counting
1 lines are to use `sed -n '$='' or `wc -l'.
1
1 Some traditional `grep' implementations do not work on long input
1 lines. On AIX the default `grep' silently truncates long lines on
1 the input before matching.
1
1 Also, many implementations do not support multiple regexps with
1 `-e': they either reject `-e' entirely (e.g., Solaris) or honor
1 only the last pattern (e.g., IRIX 6.5 and NeXT). To work around
1 these problems, invoke `AC_PROG_GREP' and then use `$GREP'.
1
1 Another possible workaround for the multiple `-e' problem is to
1 separate the patterns by newlines, for example:
1
1 grep 'foo
1 bar' in.txt
1
1 except that this fails with traditional `grep' implementations and
1 with OpenBSD 3.8 `grep'.
1
1 Traditional `grep' implementations (e.g., Solaris) do not support
1 the `-E' or `-F' options. To work around these problems, invoke
1 `AC_PROG_EGREP' and then use `$EGREP', and similarly for
1 `AC_PROG_FGREP' and `$FGREP'. Even if you are willing to require
1 support for Posix `grep', your script should not use both `-E' and
1 `-F', since Posix does not allow this combination.
1
1 Portable `grep' regular expressions should use `\' only to escape
1 characters in the string `$()*.0123456789[\^{}'. For example,
1 alternation, `\|', is common but Posix does not require its
1 support in basic regular expressions, so it should be avoided in
1 portable scripts. Solaris and HP-UX `grep' do not support it.
1 Similarly, the following escape sequences should also be avoided:
1 `\<', `\>', `\+', `\?', `\`', `\'', `\B', `\b', `\S', `\s', `\W',
1 and `\w'.
1
1 Posix does not specify the behavior of `grep' on binary files. An
1 example where this matters is using BSD `grep' to search text that
1 includes embedded ANSI escape sequences for colored output to
1 terminals (`\033[m' is the sequence to restore normal output); the
1 behavior depends on whether input is seekable:
1
1 $ printf 'esc\033[mape\n' > sample
1 $ grep . sample
1 Binary file sample matches
1 $ cat sample | grep .
1 escape
1
1 `join'
1 Solaris 8 `join' has bugs when the second operand is standard
1 input, and when standard input is a pipe. For example, the
1 following shell script causes Solaris 8 `join' to loop forever:
1
1 cat >file <<'EOF'
1 1 x
1 2 y
1 EOF
1 cat file | join file -
1
1 Use `join - file' instead.
1
1 On NetBSD, `join -a 1 file1 file2' mistakenly behaves like `join
1 -a 1 -a 2 1 file1 file2', resulting in a usage warning; the
1 workaround is to use `join -a1 file1 file2' instead.
1
1 `ln'
1 Don't rely on `ln' having a `-f' option. Symbolic links are not
1 available on old systems; use `$(LN_S)' as a portable substitute.
1
1 For versions of DJGPP before 2.04, `ln' emulates symbolic links
1 to executables by generating a stub that in turn calls the real
1 program. This feature also works with nonexistent files like in
1 the Posix spec. So `ln -s file link' generates `link.exe', which
1 attempts to call `file.exe' if run. But this feature only works
1 for executables, so `cp -p' is used instead for these systems.
1 DJGPP versions 2.04 and later have full support for symbolic links.
1
1 `ls'
1 The portable options are `-acdilrtu'. Current practice is for
1 `-l' to output both owner and group, even though ancient versions
1 of `ls' omitted the group.
1
1 On ancient hosts, `ls foo' sent the diagnostic `foo not found' to
1 standard output if `foo' did not exist. Hence a shell command
1 like `sources=`ls *.c 2>/dev/null`' did not always work, since it
1 was equivalent to `sources='*.c not found'' in the absence of `.c'
1 files. This is no longer a practical problem, since current `ls'
1 implementations send diagnostics to standard error.
1
1 The behavior of `ls' on a directory that is being concurrently
1 modified is not always predictable, because of a data race where
1 cached information returned by `readdir' does not match the current
1 directory state. In fact, MacOS 10.5 has an intermittent bug where
1 `readdir', and thus `ls', sometimes lists a file more than once if
1 other files were added or removed from the directory immediately
1 prior to the `ls' call. Since `ls' already sorts its output, the
1 duplicate entries can be avoided by piping the results through
1 `uniq'.
1
1 `mkdir'
1 No `mkdir' option is portable to older systems. Instead of `mkdir
1 -p FILE-NAME', you should use `AS_MKDIR_P(FILE-NAME)' (⇒Programming
1 in M4sh) or `AC_PROG_MKDIR_P' (⇒Particular Programs).
1
1 Combining the `-m' and `-p' options, as in `mkdir -m go-w -p DIR',
1 often leads to trouble. FreeBSD `mkdir' incorrectly attempts to
1 change the permissions of DIR even if it already exists. HP-UX
1 11.23 and IRIX 6.5 `mkdir' often assign the wrong permissions to
1 any newly-created parents of DIR.
1
1 Posix does not clearly specify whether `mkdir -p foo' should
1 succeed when `foo' is a symbolic link to an already-existing
1 directory. The GNU Core Utilities 5.1.0 `mkdir' succeeds, but
1 Solaris `mkdir' fails.
1
1 Traditional `mkdir -p' implementations suffer from race conditions.
1 For example, if you invoke `mkdir -p a/b' and `mkdir -p a/c' at
1 the same time, both processes might detect that `a' is missing,
1 one might create `a', then the other might try to create `a' and
1 fail with a `File exists' diagnostic. The GNU Core Utilities
1 (`fileutils' version 4.1), FreeBSD 5.0, NetBSD 2.0.2, and OpenBSD
1 2.4 are known to be race-free when two processes invoke `mkdir -p'
1 simultaneously, but earlier versions are vulnerable. Solaris
1 `mkdir' is still vulnerable as of Solaris 10, and other
1 traditional Unix systems are probably vulnerable too. This
1 possible race is harmful in parallel builds when several Make
1 rules call `mkdir -p' to construct directories. You may use
1 `install-sh -d' as a safe replacement, provided this script is
1 recent enough; the copy shipped with Autoconf 2.60 and Automake
1 1.10 is OK, but copies from older versions are vulnerable.
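
Where `install-sh -d' is not at hand, a script can at least tolerate losing the race. The following is a rough sketch, not a replacement for `AS_MKDIR_P' or `install-sh -d': it assumes that `mkdir -p' exists on the host, and the function name `mkdir_p_tolerant' is made up.

```shell
# Hypothetical helper: create directory $1 and its parents.  If
# `mkdir -p' fails, perhaps because another process created a parent
# at the same moment, still succeed as long as the directory exists
# afterwards.
mkdir_p_tolerant() {
  mkdir -p "$1" 2>/dev/null || test -d "$1"
}
```

Note that this hides the diagnostic of a genuine failure, so a caller should report the error itself when the function returns nonzero.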
1
1 `mkfifo'
1 `mknod'
1 The GNU Coding Standards state that `mknod' is safe to use on
1 platforms where it has been tested to exist; but it is generally
1 portable only for creating named FIFOs, since device numbers are
1 platform-specific. Autotest uses `mkfifo' to implement parallel
1 testsuites. Posix states that behavior is unspecified when
1 opening a named FIFO for both reading and writing; on at least
1 Cygwin, this results in failure on any attempt to read or write to
1 that file descriptor.
1
1 `mktemp'
1 Shell scripts can use temporary files safely with `mktemp', but it
1 does not exist on all systems. A portable way to create a safe
1 temporary file name is to create a temporary directory with mode
1 700 and use a file inside this directory. Both methods prevent
1 attackers from gaining control, though `mktemp' is far less likely
1 to fail gratuitously under attack.
1
1 Here is sample code to create a new temporary directory `$dir'
1 safely:
1
1 # Create a temporary directory $dir in $TMPDIR (default /tmp).
1 # Use mktemp if possible; otherwise fall back on mkdir,
1 # with $RANDOM to make collisions less likely.
1 : "${TMPDIR:=/tmp}"
1 {
1 dir=`
1 (umask 077 && mktemp -d "$TMPDIR/fooXXXXXX") 2>/dev/null
1 ` &&
1 test -d "$dir"
1 } || {
1 dir=$TMPDIR/foo$$-$RANDOM
1 (umask 077 && mkdir "$dir")
1 } || exit $?
1
1 `mv'
1 The only portable options are `-f' and `-i'.
1
1 Moving individual files between file systems is portable (it was
1 in Unix version 6), but it is not always atomic: when doing `mv
1 new existing', there's a critical section where neither the old
1 nor the new version of `existing' actually exists.
1
1 On some systems moving files from `/tmp' can sometimes cause
1 undesirable (but perfectly valid) warnings, even if you created
1 these files. This is because `/tmp' belongs to a group that
1 ordinary users are not members of, and files created in `/tmp'
1 inherit the group of `/tmp'. When the file is copied, `mv' issues
1 a diagnostic without failing:
1
1 $ touch /tmp/foo
1 $ mv /tmp/foo .
1 error-->mv: ./foo: set owner/group (was: 100/0): Operation not permitted
1 $ echo $?
1 0
1 $ ls foo
1 foo
1
1 This annoying behavior conforms to Posix, unfortunately.
1
1 Moving directories across mount points is not portable; use `cp'
1 and `rm' instead.
1
1 DOS variants cannot rename or remove open files, and do not
1 support commands like `mv foo bar >foo', even though this is
1 perfectly portable among Posix hosts.
1
1 `od'
1 In Mac OS X 10.3, `od' does not support the standard Posix options
1 `-A', `-j', `-N', or `-t', or the XSI option `-s'. The only
1 supported Posix option is `-v', and the only supported XSI options
1 are those in `-bcdox'. The BSD `hexdump' program can be used
1 instead.
1
1 This problem no longer exists in Mac OS X 10.4.3.
1
1 `rm'
1 The `-f' and `-r' options are portable.
1
1 It is not portable to invoke `rm' without options or operands. On
1 the other hand, Posix now requires `rm -f' to silently succeed
1 when there are no operands (useful for constructs like `rm -rf
1 $filelist' without first checking if `$filelist' was empty). But
1 this was not always portable; at least NetBSD `rm' built before
1 2008 would fail with a diagnostic.
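
On hosts where an empty operand list is a concern, the check can be made explicit, as in this sketch. The function name `remove_list' is hypothetical, and the list is assumed to contain blank-separated names without special characters.

```shell
# Hypothetical helper: remove the files named in the blank-separated
# list $1, and do nothing at all when the list is empty.
remove_list() {
  test -z "$1" || rm -rf $1
}
```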
1
1 A file might not be removed even if its parent directory is
1 writable and searchable. Many Posix hosts cannot remove a mount
1 point, a named stream, a working directory, or a last link to a
1 file that is being executed.
1
1 DOS variants cannot rename or remove open files, and do not
1 support commands like `rm foo >foo', even though this is perfectly
1 portable among Posix hosts.
1
1 `rmdir'
1 Just as with `rm', some platforms refuse to remove a working
1 directory.
1
1 `sed'
1 Patterns should not include the separator (unless escaped), even
1 as part of a character class. In conformance with Posix, the Cray
1 `sed' rejects `s/[^/]*$//': use `s%[^/]*$%%'. Even when escaped,
1 patterns should not include separators that are also used as `sed'
1 metacharacters. For example, GNU sed 4.0.9 rejects
1 `s,x\{1\,\},,', while sed 4.1 strips the backslash before the comma
1 before evaluating the basic regular expression.
1
1 Avoid empty patterns within parentheses (i.e., `\(\)'). Posix does
1 not require support for empty patterns, and Unicos 9 `sed' rejects
1 them.
1
1 Unicos 9 `sed' loops endlessly on patterns like `.*\n.*'.
1
1 Sed scripts should not use branch labels longer than 7 characters
1 and should not contain comments; AIX 5.3 `sed' rejects indented
1 comments. HP-UX sed has a limit of 99 commands (not counting `:'
1 commands) and 48 labels, which cannot be circumvented by using
1 more than one script file. It can execute up to 19 reads with the
1 `r' command per cycle. Solaris `/usr/ucb/sed' rejects usages that
1 exceed a limit of about 6000 bytes for the internal representation
1 of commands.
1
1 Avoid redundant `;', as some `sed' implementations, such as NetBSD
1 1.4.2's, incorrectly try to interpret the second `;' as a command:
1
1 $ echo a | sed 's/x/x/;;s/x/x/'
1 sed: 1: "s/x/x/;;s/x/x/": invalid command code ;
1
1 Some `sed' implementations have a buffer limited to 4000 bytes,
1 and this limits the size of input lines, output lines, and internal
1 buffers that can be processed portably. Likewise, not all `sed'
1 implementations can handle embedded `NUL' or a missing trailing
1 newline.
1
1 Remember that ranges within a bracket expression of a regular
1 expression are only well-defined in the `C' (or `POSIX') locale.
1 Meanwhile, support for character classes like `[[:upper:]]' is not
1 yet universal, so if you cannot guarantee the setting of `LC_ALL',
1 it is better to spell out a range `[ABCDEFGHIJKLMNOPQRSTUVWXYZ]'
1 than to rely on `[A-Z]'.
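
For example, a filter that replaces every uppercase ASCII letter might be written as in the sketch below; the function name `mask_upper' is made up for the illustration.

```shell
# Hypothetical helper: replace each uppercase ASCII letter with "_".
# The letters are spelled out so that the result does not depend on
# the locale's notion of the range A-Z.
mask_upper() {
  sed 's/[ABCDEFGHIJKLMNOPQRSTUVWXYZ]/_/g'
}

echo 'Hello World' | mask_upper
```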
1
1 Additionally, Posix states that regular expressions are only
1 well-defined on characters. Unfortunately, there exist platforms
1 such as MacOS X 10.5 where not all 8-bit byte values are valid
1 characters, even though that platform has a single-byte `C'
1 locale. And Posix allows the existence of a multi-byte `C'
1 locale, although that does not yet appear to be a common
1 implementation. At any rate, it means that not all bytes will be
1 matched by the regular expression `.':
1
1 $ printf '\200\n' | LC_ALL=C sed -n /./p | wc -l
1 0
1 $ printf '\200\n' | LC_ALL=en_US.ISO8859-1 sed -n /./p | wc -l
1 1
1
1 Portable `sed' regular expressions should use `\' only to escape
1 characters in the string `$()*.0123456789[\^n{}'. For example,
1 alternation, `\|', is common but Posix does not require its
1 support, so it should be avoided in portable scripts. Solaris
1 `sed' does not support alternation; e.g., `sed '/a\|b/d'' deletes
1 only lines that contain the literal string `a|b'. Similarly, `\+'
1 and `\?' should be avoided.
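
When the alternation only selects lines, it can often be replaced by one command per alternative. The sketch below deletes lines that contain `a' or `b' this way; the function name `drop_a_or_b' is hypothetical.

```shell
# Hypothetical helper: delete lines that contain "a" or "b" without
# using the nonportable `\|' operator.
drop_a_or_b() {
  sed -e '/a/d' -e '/b/d'
}

printf 'a\nb\nc\n' | drop_a_or_b
```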
1
1 Anchors (`^' and `$') inside groups are not portable.
1
1 Nested parentheses in patterns (e.g., `\(\(a*\)b*\)') are quite
1 portable to current hosts, but were not supported by some ancient
1 `sed' implementations like SVR3.
1
1 Some `sed' implementations, e.g., Solaris, restrict the special
1 role of the asterisk `*' to one-character regular expressions and
1 back-references, and the special role of interval expressions
1 `\{M\}', `\{M,\}', or `\{M,N\}' to one-character regular
1 expressions. This may lead to unexpected behavior:
1
1 $ echo '1*23*4' | /usr/bin/sed 's/\(.\)*/x/g'
1 x2x4
1 $ echo '1*23*4' | /usr/xpg4/bin/sed 's/\(.\)*/x/g'
1 x
1
1 The `-e' option is mostly portable. However, its argument cannot
1 start with `a', `c', or `i', as this runs afoul of a Tru64 5.1 bug.
1 Also, its argument cannot be empty, as this fails on AIX 5.3.
1 Some people prefer to use `-e':
1
1 sed -e 'COMMAND-1' \
1 -e 'COMMAND-2'
1
1 as opposed to the equivalent:
1
1 sed '
1 COMMAND-1
1 COMMAND-2
1 '
1
1 The following usage is sometimes equivalent:
1
1 sed 'COMMAND-1;COMMAND-2'
1
1 but Posix says that this use of a semicolon has undefined effect if
1 COMMAND-1's verb is `{', `a', `b', `c', `i', `r', `t', `w', `:',
1 or `#', so you should use semicolon only with simple scripts that
1 do not use these verbs.
1
1 Posix up to the 2008 revision requires the argument of the `-e'
1 option to be a syntactically complete script. GNU `sed' allows
1 passing multiple script fragments, each as argument of a separate
1 `-e' option, that are then combined, with newlines between the
1 fragments, and a future Posix revision may allow this as well.
1 This approach is not portable with script fragments ending in
1 backslash; for example, the `sed' programs on Solaris 10, HP-UX
1 11, and AIX don't allow splitting in this case:
1
1 $ echo a | sed -n -e 'i\
1 0'
1 0
1 $ echo a | sed -n -e 'i\' -e 0
1 Unrecognized command: 0
1
1 In practice, however, this technique of joining fragments through
1 `-e' works for multiple `sed' functions within `{' and `}', even
1 if that is not specified by Posix:
1
1 $ echo a | sed -n -e '/a/{' -e s/a/b/ -e p -e '}'
1 b
1
1 Commands inside { } brackets are further restricted. Posix 2008
1 says that they cannot be preceded by addresses, `!', or `;', and
1 that each command must be followed immediately by a newline,
1 without any intervening blanks or semicolons. The closing bracket
1 must be alone on a line, other than white space preceding or
1 following it. However, a future version of Posix may standardize
1 the use of addresses within brackets.
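
A layout that follows these restrictions puts every command of the group, and the closing bracket, on its own line. A minimal sketch (the function name and input are illustrative only):

```shell
# Each command inside the brackets is followed immediately by a
# newline, and `}' stands alone on its line; no addresses, `!',
# or `;' appear inside the group.
a_to_b () {
  sed -n '/a/{
s/a/b/
p
}'
}

echo a | a_to_b
```

Lines that do not match `/a/' produce no output, because of `-n'.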
1
1 Contrary to yet another urban legend, you may portably use `&' in
1 the replacement part of the `s' command to mean "what was
1 matched". All descendants of Unix version 7 `sed' (at least; we
1 don't have first hand experience with older `sed' implementations)
1 have supported it.
1
1 Posix requires that there be no white space between `!' and the
1 following command. It is OK to have blanks between the address
1 and the `!'. For instance, on Solaris:
1
1 $ echo "foo" | sed -n '/bar/ ! p'
1 error-->Unrecognized command: /bar/ ! p
1 $ echo "foo" | sed -n '/bar/! p'
1 error-->Unrecognized command: /bar/! p
1 $ echo "foo" | sed -n '/bar/ !p'
1 foo
1
1 Posix also says that you should not combine `!' and `;'. If you
1 use `!', it is best to put it on a command that is delimited by
1 newlines rather than `;'.
1
1 Also note that Posix requires that the `b', `t', `r', and `w'
1 commands be followed by exactly one space before their argument.
1 On the other hand, no white space is allowed between `:' and the
1 subsequent label name.
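
A sketch of a loop that respects both rules, with exactly one space after `t' and nothing between `:' and the label (the label `again' and the function name are arbitrary choices):

```shell
# Collapse every run of spaces into a single space by repeating
# the substitution until it no longer matches.
squeeze_spaces () {
  sed '
:again
s/  / /
t again
'
}

echo 'a    b  c' | squeeze_spaces
```

The script is delimited by newlines rather than semicolons, since `:' and `t' are among the verbs for which semicolons have undefined effect.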
1
1 If a `sed' script is specified on the command line and ends in an
1 `a', `c', or `i' command, the last line of inserted text should be
1 followed by a newline. Otherwise some `sed' implementations
1 (e.g., OpenBSD 3.9) do not append a newline to the inserted text.
1
1 Many `sed' implementations (e.g., MacOS X 10.4, OpenBSD 3.9,
1 Solaris 10 `/usr/ucb/sed') strip leading white space from the text
1 of `a', `c', and `i' commands. Prepend a backslash to work around
1 this incompatibility with Posix:
1
1 $ echo flushleft | sed 'a\
1 >    indented
1 > '
1 flushleft
1 indented
1 $ echo flushleft | sed 'a\
1 > \   indented
1 > '
1 flushleft
1    indented
1
1 Posix requires that with an empty regular expression, the last
1 non-empty regular expression from either an address specification
1 or substitution command is applied. However, busybox 1.6.1
1 complains when using a substitution command with a replacement
1 containing a back-reference to an empty regular expression; the
1 workaround is repeating the regular expression.
1
1 $ echo abc | busybox sed '/a\(b\)c/ s//\1/'
1 sed: No previous regexp.
1 $ echo abc | busybox sed '/a\(b\)c/ s/a\(b\)c/\1/'
1 b
1
1 `sed' (`t')
1 Some old systems have a `sed' that "forgets" to reset its `t' flag
1 when starting a new cycle. For instance, on MIPS RISC/OS and on
1 IRIX 5.3, if you run the following `sed' script (the letters and
1 numbers in the comments are labels, not part of the texts):
1
1 s/keep me/kept/g # a
1 t end # b
1 s/.*/deleted/g # c
1 :end # d
1
1 on
1
1 delete me # 1
1 delete me # 2
1 keep me # 3
1 delete me # 4
1
1 you get
1
1 deleted
1 delete me
1 kept
1 deleted
1
1 instead of
1
1 deleted
1 deleted
1 kept
1 deleted
1
1 Why? When processing line 1, (c) matches, therefore sets the `t'
1 flag, and the output is produced. When processing line 2, the `t'
1 flag is still set (this is the bug). Command (a) fails to match,
1 but `sed' is not supposed to clear the `t' flag when a
1 substitution fails. Command (b) sees that the flag is set,
1 therefore it clears it, and jumps to (d), hence you get `delete me'
1 instead of `deleted'. When processing line 3, `t' is clear, (a)
1 matches, so the flag is set, hence (b) clears the flag and jumps.
1 Finally, since the flag is clear, line 4 is processed properly.
1
1 There are two things one should remember about `t' in `sed'.
1 Firstly, always remember that `t' jumps if _some_ substitution
1 succeeded, not only the immediately preceding substitution.
1 Therefore, always use a fake `t clear' followed by a `:clear' on
1 the next line, to reset the `t' flag where needed.
1
1 Secondly, you cannot rely on `sed' to clear the flag at each new
1 cycle.
1
1 One portable implementation of the script above is:
1
1 t clear
1 :clear
1 s/keep me/kept/g
1 t end
1 s/.*/deleted/g
1 :end
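
To try the script, it can be wrapped in a shell function and fed the four sample lines. This is only a sketch, and `mark_lines' is a hypothetical name:

```shell
# The leading `t clear' / `:clear' pair resets the `t' flag, so the
# later `t end' reacts only to the `keep me' substitution.
mark_lines () {
  sed '
t clear
:clear
s/keep me/kept/g
t end
s/.*/deleted/g
:end
'
}

printf '%s\n' 'delete me' 'delete me' 'keep me' 'delete me' | mark_lines
```

On a conforming `sed' this prints `deleted', `deleted', `kept', `deleted', one per line.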
1
1 `sleep'
1 Using `sleep' is generally portable. However, remember that
1 adding a `sleep' to work around timestamp issues, with a minimum
1 granularity of one second, doesn't scale well for parallel builds
1 on modern machines with sub-second process completion.
1
1 `sort'
1 Remember that sort order is influenced by the current locale.
1 Inside `configure', the C locale is in effect, but in Makefile
1 snippets, you may need to specify `LC_ALL=C sort'.
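
A short sketch of the difference this makes, using a hypothetical wrapper function and sample input:

```shell
# Force the C locale so that the order does not depend on the
# environment of whoever runs the command.
sort_c () {
  LC_ALL=C sort
}

printf '%s\n' b A a B | sort_c
```

On an ASCII system the C locale puts uppercase letters before lowercase ones, so the result is `A', `B', `a', `b'; other locales may interleave the cases.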
1
1 `tar'
1 There are multiple file formats for `tar'; if you use Automake,
1 the macro `AM_INIT_AUTOMAKE' has some options controlling which
1 level of portability to use.
1
1 `touch'
1 If you specify the desired timestamp (e.g., with the `-r' option),
1 older `touch' implementations use the `utime' or `utimes' system
1 call, which can result in the same kind of timestamp truncation
1 problems that `cp -p' has.
1
1 On ancient BSD systems, `touch' or any command that results in an
1 empty file does not update the timestamps, so use a command like
1 `echo' as a workaround. Also, GNU `touch' 3.16r (and presumably
1 all before that) fails to work on SunOS 4.1.3 when the empty file
1 is on an NFS-mounted 4.2 volume. However, these problems are no
1 longer of practical concern.
1
1 `tr'
1 Not all versions of `tr' handle all backslash character escapes.
1 For example, Solaris 10 `/usr/ucb/tr' falls over, even though
1 Solaris contains more modern `tr' in other locations. Using octal
1 escapes is more portable for carriage returns, since `\015' is the
1 same for both ASCII and EBCDIC, and since use of literal carriage
1 returns in scripts causes a number of other problems. But for
1 other characters, like newline, using octal escapes ties the
1 operation to ASCII, so it is better to use literal characters.
1
1 $ { echo moon; echo light; } | /usr/ucb/tr -d '\n' ; echo
1 moo
1 light
1 $ { echo moon; echo light; } | /usr/bin/tr -d '\n' ; echo
1 moonlight
1 $ { echo moon; echo light; } | /usr/ucb/tr -d '\012' ; echo
1 moonlight
1 $ nl='
1 '; { echo moon; echo light; } | /usr/ucb/tr -d "$nl" ; echo
1 moonlight
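
Following the advice above for carriage returns, a sketch that strips them with the octal escape (`strip_cr' is a hypothetical helper, and the input is made up for the example):

```shell
# `\015' is the carriage return; the octal form avoids both the
# `\r' escape and a literal carriage return in the script.
strip_cr () {
  tr -d '\015'
}

printf 'one\015\ntwo\015\n' | strip_cr
```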
1
1 Not all versions of `tr' recognize direct ranges of characters: at
1 least Solaris `/usr/bin/tr' still fails to do so. But you can use
1 `/usr/xpg4/bin/tr' instead, or add brackets (which in Posix
1 transliterate to themselves).
1
1 $ echo "Hazy Fantazy" | LC_ALL=C /usr/bin/tr a-z A-Z
1 HAZy FAntAZy
1 $ echo "Hazy Fantazy" | LC_ALL=C /usr/bin/tr '[a-z]' '[A-Z]'
1 HAZY FANTAZY
1 $ echo "Hazy Fantazy" | LC_ALL=C /usr/xpg4/bin/tr a-z A-Z
1 HAZY FANTAZY
1
1 When providing two arguments, be sure the second string is at
1 least as long as the first.
1
1 $ echo abc | /usr/xpg4/bin/tr bc d
1 adc
1 $ echo abc | coreutils/tr bc d
1 add
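
Padding the second string by repeating its last character sidesteps the difference. A minimal sketch, with a hypothetical function name:

```shell
# Both "b" and "c" have an explicit replacement, so `tr' never has
# to decide how to extend a short second string.
map_bc () {
  tr bc dd
}

echo abc | map_bc
```

This prints `add', the result that the shorter form produces only on some implementations.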
1
1 Posix requires `tr' to operate on binary files. But at least
1 Solaris `/usr/ucb/tr' and `/usr/bin/tr' silently discard `NUL' in
1 the input prior to doing any translation. When using `tr' to
1 process a binary file that may contain `NUL' bytes, it is
1 necessary to use `/usr/xpg4/bin/tr' instead, or `/usr/xpg6/bin/tr'
1 if that is available.
1
1 $ printf 'a\0b' | /usr/ucb/tr x x | od -An -tx1
1 61 62
1 $ printf 'a\0b' | /usr/bin/tr x x | od -An -tx1
1 61 62
1 $ printf 'a\0b' | /usr/xpg4/bin/tr x x | od -An -tx1
1 61 00 62
1
1 Solaris `/usr/ucb/tr' additionally fails to handle `\0' as the
1 octal escape for `NUL'.
1
1 $ printf 'abc' | /usr/ucb/tr 'bc' '\0d' | od -An -tx1
1 61 62 63
1 $ printf 'abc' | /usr/bin/tr 'bc' '\0d' | od -An -tx1
1 61 00 64
1 $ printf 'abc' | /usr/xpg4/bin/tr 'bc' '\0d' | od -An -tx1
1 61 00 64
1
1