sed: Reporting Bugs

10 Reporting Bugs
*****************

Email bug reports to <bug-sed@gnu.org>.  Also, please include the output
of 'sed --version' in the body of your report if at all possible.

   Please do not send a bug report like this:

     while building frobme-1.3.4
     $ configure
     error-> sed: file sedscr line 1: Unknown option to 's'

   If GNU 'sed' fails to configure your favorite package, take a few
extra minutes to identify the specific problem and make a stand-alone
test case.  Unlike with other programs such as C compilers, making such
test cases for 'sed' is quite simple.

   A stand-alone test case includes all the data necessary to perform
the test, and the specific invocation of 'sed' that causes the problem.
The smaller a stand-alone test case is, the better.  A test case should
not involve something as far removed from 'sed' as "try to configure
frobme-1.3.4".  Yes, that is in principle enough information to look for
the bug, but that is not a very practical prospect.
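
As a rough illustration, a stand-alone test case might look like the
sketch below.  The input text and the 's' command are hypothetical
placeholders, not taken from a real report; the point is only that the
data and the exact 'sed' invocation travel together.

```shell
# Hypothetical stand-alone test case: the input data and the exact
# invocation appear together, so anyone can re-run it.
sed --version | head -n 1        # version line for the report body

actual=$(printf 'hello world\n' | sed 's/world/sed/')
expected='hello sed'

if [ "$actual" = "$expected" ]; then
    echo "as expected: $actual"
else
    echo "expected '$expected' but got '$actual'"
fi
```

A report built this way needs no package download or configure run to
reproduce.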

   Here are a few commonly reported bugs that are not bugs.

'N' command on the last line

     Most versions of 'sed' exit without printing anything when the 'N'
     command is issued on the last line of a file.  GNU 'sed' prints
     pattern space before exiting unless, of course, the '-n'
     command-line switch has been specified.  This choice is by design.

     Default behavior (GNU extension, non-POSIX conforming):
          $ seq 3 | sed N
          1
          2
          3
     To force POSIX-conforming behavior:
          $ seq 3 | sed --posix N
          1
          2

     The traditional behavior has awkward consequences.  For example,
     the behavior of
          sed N foo bar
     would depend on whether foo has an even or an odd number of
     lines(1).  Or, when writing a script to read the next few lines
     following a pattern match, traditional implementations of 'sed'
     would force you to write something like
          /foo/{ $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N }
     instead of just
          /foo/{ N;N;N;N;N;N;N;N;N; }

     In any case, the simplest workaround is to use '$d;N' in scripts
     that rely on the traditional behavior, or to set the
     'POSIXLY_CORRECT' variable to a non-empty value.
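
The two workarounds can be compared side by side.  This is a minimal
sketch assuming GNU 'sed' and 'seq' are on the PATH; the variable names
are illustrative only.

```shell
# GNU default: the last line is still printed when 'N' finds no next line.
gnu_default=$(seq 3 | sed N)

# Workaround 1: delete the last line before 'N' can run on it.
with_dollar_d=$(seq 3 | sed '$d;N')

# Workaround 2: request the traditional behavior through the environment.
with_env=$(seq 3 | POSIXLY_CORRECT=1 sed N)

printf 'default:\n%s\n' "$gnu_default"
printf 'with $d;N:\n%s\n' "$with_dollar_d"
printf 'with POSIXLY_CORRECT:\n%s\n' "$with_env"
```

Note that 'POSIXLY_CORRECT' also affects other GNU tools started from
the same environment, so setting it for the single command, as here,
keeps the effect local.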

Regex syntax clashes (problems with backslashes)

     'sed' uses the POSIX basic regular expression syntax.  According to
     the standard, the meaning of some escape sequences is undefined in
     this syntax; notable in the case of 'sed' are '\|', '\+', '\?',
     '\`', '\'', '\<', '\>', '\b', '\B', '\w', and '\W'.

     As in all GNU programs that use POSIX basic regular expressions,
     'sed' interprets these escape sequences as special characters.  So,
     'x\+' matches one or more occurrences of 'x'.  'abc\|def' matches
     either 'abc' or 'def'.
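
The interpretation described above can be checked with a few
one-liners.  This is a small sketch assuming GNU 'sed'; the sample
strings are made up for illustration.

```shell
# '\+' means "one or more" in GNU sed's basic regular expressions.
plus=$(echo 'xxx abc' | sed 's/x\+/Y/')

# '\|' separates alternatives.
alt=$(echo 'abc def ghi' | sed 's/abc\|def/Z/g')

# Without the backslash, '+' is an ordinary character.
literal=$(echo 'a+b' | sed 's/+/ plus /')

echo "$plus"      # Y abc
echo "$alt"       # Z Z ghi
echo "$literal"   # a plus b
```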

     This syntax may cause problems when running scripts written for
     other 'sed's.  Some 'sed' scripts have been written with the
     assumption that '\|' and '\+' match the literal characters '|' and
     '+'.  Such scripts must be modified by removing the spurious
     backslashes if they are to be used with modern implementations of
     'sed', like GNU 'sed'.

     On the other hand, some scripts use 's|abc\|def||g' to remove
     occurrences of _either_ 'abc' or 'def'.  While this worked until
     'sed' 4.0.x, newer versions interpret this as removing the string
     'abc|def'.  This is again undefined behavior according to POSIX,
     and this interpretation is arguably more robust: older 'sed's, for
     example, required that the regex matcher parse '\/' as '/' in the
     common case of escaping a slash, which is again undefined behavior;
     the new behavior avoids this, and this is good because the regex
     matcher is only partially under our control.
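
The difference shows up when the same expression is written with two
different delimiters.  A sketch assuming a GNU 'sed' newer than 4.0.x;
the input string is invented for the example.

```shell
# '|' is the delimiter here, so '\|' stands for a literal '|':
# only the exact string 'abc|def' is removed.
literal=$(echo 'abc|def,abc,def' | sed 's|abc\|def||g')

# With '/' as the delimiter, '\|' is alternation again:
# every 'abc' and every 'def' is removed.
either=$(echo 'abc|def,abc,def' | sed 's/abc\|def//g')

echo "$literal"   # ,abc,def
echo "$either"    # |,,
```

Choosing a delimiter that does not occur in the regex avoids the
ambiguity altogether.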

     In addition, this version of 'sed' supports several escape
     characters (some of which are multi-character) to insert
     non-printable characters in scripts ('\a', '\c', '\d', '\o', '\r',
     '\t', '\v', '\x').  These can cause similar problems with scripts
     written for other 'sed's.
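
As one example of these escapes, the sketch below uses '\t' in a
replacement.  It assumes GNU 'sed'; other implementations may treat
the same sequence differently, which is exactly the portability
problem described above.

```shell
# GNU sed turns '\t' in the replacement into a tab character.
tabbed=$(echo 'key value' | sed 's/ /\t/')

# Show the bytes so the tab is visible.
printf '%s\n' "$tabbed" | od -c | head -n 1
```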

'-i' clobbers read-only files

     In short, 'sed -i' will let you delete the contents of a read-only
     file, and in general the '-i' option (*note Invocation: Invoking
     sed.) lets you clobber protected files.  This is not a bug, but
     rather a consequence of how the Unix file system works.

     The permissions on a file say what can happen to the data in that
     file, while the permissions on a directory say what can happen to
     the list of files in that directory.  'sed -i' will not ever open
     for writing a file that is already on disk.  Rather, it will work
     on a temporary file that is finally renamed to the original name:
     if you rename or delete files, you're actually modifying the
     contents of the directory, so the operation depends on the
     permissions of the directory, not of the file.  For this same
     reason, 'sed' does not let you use '-i' on a writable file in a
     read-only directory, and will break hard or symbolic links when
     '-i' is used on such a file.
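
The rename-based behavior can be observed in a scratch directory.
This is a sketch assuming GNU 'sed' and 'mktemp'; the file names
'file' and 'link' are hypothetical.

```shell
# Scratch directory; all names below are made up for the example.
dir=$(mktemp -d)
echo 'secret' > "$dir/file"
ln "$dir/file" "$dir/link"      # second hard link to the same data
chmod a-w "$dir/file"           # file is read-only, directory is writable

sed -i 's/secret/changed/' "$dir/file"

new_content=$(cat "$dir/file")  # the read-only file was replaced
old_content=$(cat "$dir/link")  # the hard link still sees the old data

echo "file: $new_content"
echo "link: $old_content"
rm -rf "$dir"
```

After the edit, 'file' and 'link' no longer refer to the same data,
which is the link breakage mentioned above.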

'0a' does not work (gives an error)

     There is no line 0.  0 is a special address that is only used to
     treat addresses like '0,/RE/' as active when the script starts.  If
     you write '1,/abc/d' and the first line includes the word 'abc',
     then that match would be ignored because address ranges must span
     at least two lines (barring the end of the file).  What you
     probably wanted is to delete every line up to the first one
     including 'abc', and this is obtained with '0,/abc/d'.
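
The two address forms can be compared on a small input whose first
line already matches.  A sketch assuming GNU 'sed' (the '0,/RE/' form
is a GNU extension); the input lines are invented.

```shell
input=$(printf 'abc\nx\nabc\ny')

# '1,/abc/': the range starts on line 1 and the search for 'abc'
# only begins on line 2, so lines 1 through 3 are deleted.
from_one=$(printf '%s\n' "$input" | sed '1,/abc/d')

# '0,/abc/': the regex may match on line 1, so only line 1 is deleted.
from_zero=$(printf '%s\n' "$input" | sed '0,/abc/d')

printf 'with 1,/abc/d:\n%s\n' "$from_one"
printf 'with 0,/abc/d:\n%s\n' "$from_zero"
```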

'[a-z]' is case insensitive

     You are encountering problems with locales.  POSIX mandates that
     '[a-z]' uses the current locale's collation order; in C parlance,
     that means using 'strcoll(3)' instead of 'strcmp(3)'.  Some locales
     have a case-insensitive collation order, others don't.

     Another problem is that '[a-z]' tries to use collation symbols.
     This only happens if you are on the GNU system, using GNU libc's
     regular expression matcher instead of compiling the one supplied
     with GNU sed.  In a Danish locale, for example, the regular
     expression '^[a-z]$' matches the string 'aa', because this is a
     single collating symbol that comes after 'a' and before 'b'; 'll'
     behaves similarly in Spanish locales, or 'ij' in Dutch locales.

     To work around these problems, which may cause bugs in shell
     scripts, set the 'LC_COLLATE' and 'LC_CTYPE' environment variables
     to 'C'.
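
In a shell script the workaround might be applied to a single command,
as in this sketch.  The 'unset LC_ALL' step is an addition not
mentioned above: if 'LC_ALL' is set, it takes precedence over the two
other variables, so it is cleared inside the subshell first.

```shell
lower_removed=$(
    unset LC_ALL      # LC_ALL would override the two variables below
    echo 'Hello World' | LC_COLLATE=C LC_CTYPE=C sed 's/[a-z]//g'
)

echo "$lower_removed"   # H W
```

Because the assignments prefix only the 'sed' command, the rest of the
script keeps the user's locale.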

's/.*//' does not clear pattern space

     This happens if your input stream includes invalid multibyte
     sequences.  POSIX mandates that such sequences are _not_ matched by
     '.', so that 's/.*//' will not clear pattern space as you would
     expect.  In fact, there is no way to clear sed's buffers in the
     middle of the script in most multibyte locales (including UTF-8
     locales).  For this reason, GNU 'sed' provides a 'z' command (for
     'zap') as an extension.

     To work around these problems, which may cause bugs in shell
     scripts, set the 'LC_COLLATE' and 'LC_CTYPE' environment variables
     to 'C'.
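
Both remedies can be tried on input containing a byte that is not
valid UTF-8.  This is a sketch assuming GNU 'sed' with the 'z'
command; 'LC_ALL=C' is used here as a shorthand because it overrides
both 'LC_COLLATE' and 'LC_CTYPE'.

```shell
# 'z' empties pattern space regardless of the locale; the second
# command then confirms that nothing is left.
zapped=$(printf 'a\377b\n' | sed 'z;s/^$/cleared/')

# In the C locale every byte is a character, so '.*' matches it all.
c_locale=$(printf 'a\377b\n' | LC_ALL=C sed 's/.*//;s/^$/cleared/')

echo "$zapped"
echo "$c_locale"
```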

   ---------- Footnotes ----------

   (1) which is the actual "bug" that prompted the change in behavior