10 Reporting Bugs
*****************

Email bug reports to <bug-sed@gnu.org>. Also, please include the output
of 'sed --version' in the body of your report if at all possible.

Please do not send a bug report like this:

     while building frobme-1.3.4
     $ configure
     error-> sed: file sedscr line 1: Unknown option to 's'

If GNU 'sed' doesn't configure your favorite package, take a few extra
minutes to identify the specific problem and make a stand-alone test
case. Unlike other programs such as C compilers, making such test cases
for 'sed' is quite simple.

A stand-alone test case includes all the data necessary to perform the
test, and the specific invocation of 'sed' that causes the problem. The
smaller a stand-alone test case is, the better. A test case should not
involve something as far removed from 'sed' as "try to configure
frobme-1.3.4". Yes, that is in principle enough information to look for
the bug, but that is not a very practical prospect.
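
As an illustration only, here is a sketch of what a stand-alone replacement for the report above might look like. It assumes, hypothetically, that the frobme failure came down to an unescaped '/' inside the replacement of an 's' command, which is one common way to get an "unknown option to 's'" error. The file name testcase.in and the input line are made up. Because the exact wording of the error message differs between versions, the script only reports whether sed accepted the command.

```shell
#!/bin/sh
# Hypothetical stand-alone test case: the input and the script are
# illustrative, not taken from the frobme-1.3.4 build.

# The version of sed under test (GNU sed supports --version).
sed --version | head -n 1

# All the data the test needs, created inline.
printf 'prefix=/usr\n' > testcase.in

# The exact failing invocation. The unescaped '/' ends the replacement
# after 'usr', so the characters after it are read as flags, and 'l'
# is not a valid flag.
if sed 's/usr/usr/local/' testcase.in; then
  echo "sed accepted the command"
else
  echo "sed rejected the command"
fi
```

Once a problem is reduced this far, the cause is often visible. In this case, writing the command with another delimiter, as in 's|usr|usr/local|', avoids the error. If a reduced case still looks like a bug, the script and its output are what belong in the report.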

Here are a few commonly reported bugs that are not bugs.

'N' command on the last line

Most versions of 'sed' exit without printing anything when the 'N'
command is issued on the last line of a file. GNU 'sed' prints the
pattern space before exiting unless, of course, the '-n' command-line
switch has been specified. This choice is by design.

Default behavior (GNU extension, not POSIX-conforming):

     $ seq 3 | sed N
     1
     2
     3

To force POSIX-conforming behavior:

     $ seq 3 | sed --posix N
     1
     2

For example, the behavior of

     sed N foo bar

would depend on whether foo has an even or an odd number of lines(1).
Or, when writing a script to read the next few lines following a
pattern match, traditional implementations of 'sed' would force you to
write something like

     /foo/{ $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N }

instead of just

     /foo/{ N;N;N;N;N;N;N;N;N; }

In any case, the simplest workaround is to use '$d;N' in scripts that
rely on the traditional behavior, or to set the 'POSIXLY_CORRECT'
environment variable to a non-empty value.
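
The GNU default and the '$d;N' workaround can be compared side by side with a short shell sketch. It assumes GNU sed and the 'seq' utility from GNU coreutils.

```shell
#!/bin/sh
# GNU default: when N finds no next line, the pattern space is still
# printed before sed exits (unless -n is given).
seq 3 | sed N          # prints 1, 2, 3

# Traditional behavior: delete the last line before N can run on it,
# so a leftover odd line at the end is dropped, as in other seds.
seq 3 | sed '$d;N'     # prints 1, 2

# With an even number of lines the two scripts agree.
seq 4 | sed '$d;N'     # prints 1, 2, 3, 4
```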

Regex syntax clashes (problems with backslashes)

'sed' uses the POSIX basic regular expression syntax. According to the
standard, the meaning of some escape sequences is undefined in this
syntax; those notable in the case of 'sed' are '\|', '\+', '\?', '\`',
'\'', '\<', '\>', '\b', '\B', '\w', and '\W'.

As in all GNU programs that use POSIX basic regular expressions, 'sed'
interprets these escape sequences as special characters. So, 'x\+'
matches one or more occurrences of 'x'. 'abc\|def' matches either 'abc'
or 'def'.

This syntax may cause problems when running scripts written for other
'sed's. Some 'sed' programs have been written with the assumption that
'\|' and '\+' match the literal characters '|' and '+'. Such scripts
must be modified by removing the spurious backslashes if they are to be
used with modern implementations of 'sed', like GNU 'sed'.

On the other hand, some scripts use 's|abc\|def||g' to remove
occurrences of _either_ 'abc' or 'def'. While this worked until 'sed'
4.0.x, newer versions interpret this as removing the string 'abc|def'.
This is again undefined behavior according to POSIX, and this
interpretation is arguably more robust. Older 'sed's, for example,
required that the regex matcher parse '\/' as '/' in the common case of
escaping a slash, which is again undefined behavior. The new behavior
avoids this, which is good because the regex matcher is only partially
under our control.

In addition, this version of 'sed' supports several escape characters
(some of which are multi-character) to insert non-printable characters
in scripts ('\a', '\c', '\d', '\o', '\r', '\t', '\v', '\x'). These can
cause similar problems with scripts written for other 'sed's.
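
Here is a short sketch, assuming GNU sed, of how these escape sequences behave in practice. It also shows the '-E' (extended regular expression) spelling, which needs no backslashed operators, and how to match '+' and '|' literally in a basic regular expression.

```shell
#!/bin/sh
# GNU sed treats these backslash sequences in basic regexps as operators.
echo 'xxx-y' | sed 's/x\+/X/'                    # X-y
printf 'abc\ndef\nghi\n' | sed 's/abc\|def/#/'   # #, #, ghi

# The same patterns as extended regexps (-E): '+' and '|' are
# operators without a backslash.
echo 'xxx-y' | sed -E 's/x+/X/'                  # X-y
printf 'abc\ndef\nghi\n' | sed -E 's/abc|def/#/' # #, #, ghi

# In a basic regexp, an unescaped '+' or '|' matches itself.
echo 'a+b|c' | sed 's/+/ plus /; s/|/ or /'      # a plus b or c

# With '|' as the delimiter, '\|' in the regex stands for a literal '|',
# so (as described above) this removes the string 'abc|def' rather
# than either word.
echo 'abc|def abc' | sed 's|abc\|def||g'

# Escapes such as \t insert non-printable characters (here, a tab).
printf 'a b\n' | sed 's/ /\t/'
```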

'-i' clobbers read-only files

In short, 'sed -i' will let you delete the contents of a read-only
file, and in general the '-i' option (see 'Invoking sed') lets you
clobber protected files. This is not a bug, but rather a consequence of
how the Unix file system works.

The permissions on a file say what can happen to the data in that file,
while the permissions on a directory say what can happen to the list of
files in that directory. 'sed -i' never opens for writing a file that is
already on disk. Rather, it works on a temporary file that is finally
renamed to the original name. Renaming or deleting files modifies the
contents of the directory, so the operation depends on the permissions
of the directory, not of the file. For this same reason, 'sed' does not
let you use '-i' on a writable file in a read-only directory, and it
breaks hard or symbolic links when '-i' is used on a file that has hard
links or on a symbolic link.
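
The following sketch shows the rename-based behavior described above. It assumes GNU sed and a system that provides 'mktemp -d'. Editing succeeds on a read-only file in a writable scratch directory. A hard link made beforehand keeps the old contents, because 'sed -i' wrote a new file instead of modifying the old one in place.

```shell
#!/bin/sh
# Scratch directory that we own and can write to.
dir=$(mktemp -d)

printf 'old\n' > "$dir/file"
chmod a-w "$dir/file"          # the file itself is now read-only
ln "$dir/file" "$dir/link"     # a second hard link to the same data

# Succeeds even though "file" is read-only: sed writes a temporary file
# in $dir and renames it over "file", which needs write permission on
# the directory only.
sed -i 's/old/new/' "$dir/file"

new_content=$(cat "$dir/file")   # new: the name now refers to the new file
link_content=$(cat "$dir/link")  # old: the link still has the old data
echo "file: $new_content, link: $link_content"

rm -rf "$dir"
```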

'0a' does not work (gives an error)

There is no line 0. 0 is a special address that is only used to treat
addresses like '0,/RE/' as active when the script starts. If you write
'1,/abc/d' and the first line includes the word 'abc', then that match
would be ignored because address ranges must span at least two lines
(barring the end of the file). What you probably wanted is to delete
every line up to the first one including 'abc', and this is obtained
with '0,/abc/d'.
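
Here is a small sketch, assuming GNU sed, that makes the difference between '1,/abc/d' and '0,/abc/d' concrete (the '0,/RE/' address is a GNU extension). The helper input is a hypothetical function that prints four lines, the first of which contains 'abc'. The sketch also shows that '0a' is rejected, and uses '1i' as one possible way to add text before the first line.

```shell
#!/bin/sh
# Hypothetical helper: four input lines, the first containing 'abc'.
input() { printf 'abc\nx\nabc\ny\n'; }

# The end of a '1,/abc/' range is only looked for from line 2 on, so
# the range runs to the second 'abc' and lines 1-3 are deleted.
input | sed '1,/abc/d'       # y

# '0,/abc/' (GNU extension) lets line 1 itself end the range, so only
# that line is deleted.
input | sed '0,/abc/d'       # x, abc, y

# There is no line 0 to append after, so sed rejects '0a'.
if ! input | sed '0a text' 2>/dev/null; then
  echo "sed rejected '0a'"
fi

# One way to add text before the first line is to insert at line 1.
input | sed '1i\
header'                      # header, abc, x, abc, y
```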

'[a-z]' is case insensitive

You are encountering problems with locales. POSIX mandates that '[a-z]'
uses the current locale's collation order; in C parlance, that means
using 'strcoll(3)' instead of 'strcmp(3)'. In some locales, uppercase
and lowercase letters are interleaved in the collation order (a, A, b,
B, ...), so a range such as '[a-z]' can also match uppercase letters.
Other locales don't do this.

Another problem is that '[a-z]' tries to use collation symbols. This
only happens if you are on the GNU system, using GNU libc's regular
expression matcher instead of compiling the one supplied with GNU sed.
In a Danish locale, for example, the regular expression '^[a-z]$'
matches the string 'aa', because this is a single collating symbol that
comes after 'a' and before 'b'. 'll' behaves similarly in Spanish
locales, as does 'ij' in Dutch locales.

To work around these problems, which may cause bugs in shell scripts,
set the 'LC_COLLATE' and 'LC_CTYPE' environment variables to 'C'.
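
One way to apply this workaround to a single command is a small wrapper function. The name sed_c is made up for this sketch. LC_ALL, when set to a non-empty value, takes precedence over both LC_COLLATE and LC_CTYPE, so the wrapper sets it to the empty string for the duration of the command.

```shell
#!/bin/sh
# Hypothetical wrapper: run sed with C collation and character classes.
# A non-empty LC_ALL would override LC_COLLATE and LC_CTYPE, so it is
# set to the empty string for this command only.
sed_c() {
  LC_ALL= LC_COLLATE=C LC_CTYPE=C sed "$@"
}

# In the C locale, [a-z] is exactly the 26 lowercase ASCII letters.
echo 'aBcZ' | sed_c 's/[a-z]/_/g'    # _B_Z
```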

's/.*//' does not clear pattern space

This happens if your input stream includes invalid multibyte sequences.
POSIX mandates that such sequences are _not_ matched by '.', so
's/.*//' will not clear the pattern space as you would expect. In fact,
there is no way to clear sed's buffers in the middle of the script in
most multibyte locales (including UTF-8 locales). For this reason, GNU
'sed' provides a 'z' command (for 'zap') as an extension.

To work around these problems, which may cause bugs in shell scripts,
set the 'LC_COLLATE' and 'LC_CTYPE' environment variables to 'C'.
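
Here is a short sketch, assuming GNU sed, of two ways to get an empty pattern space from a line containing a byte that is not valid UTF-8. The 'z' command is a GNU extension. The helper name bad is hypothetical, and the byte \377 (0xFF) never occurs in valid UTF-8. The second variant uses LC_ALL=C, which sets every locale category, including LC_COLLATE and LC_CTYPE, at once. The trailing 's/^$/(empty)/' only makes the result visible.

```shell
#!/bin/sh
# Hypothetical helper: one line containing byte 0xFF (octal 377),
# which never appears in valid UTF-8.
bad() { printf 'a\377b\n'; }

# 'z' (GNU extension) empties the pattern space whatever it holds.
bad | sed 'z;s/^$/(empty)/'                 # (empty)

# In the C locale every byte is a character, so '.' matches 0xFF and
# 's/.*//' clears the line as expected.
bad | LC_ALL=C sed 's/.*//;s/^$/(empty)/'   # (empty)
```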

---------- Footnotes ----------

(1) which is the actual "bug" that prompted the change in behavior