sed: Branching and flow control

1 
1 6.4 Branching and Flow Control
1 ==============================
1 
1 The branching commands 'b', 't', and 'T' enable changing the flow of
1 'sed' programs.
1 
1    By default, 'sed' reads an input line into the pattern space, then
1 continues to process all commands in order.  Commands without
1 addresses affect all lines.  Commands with addresses affect only
1 matching lines.  See ⇒Execution Cycle and ⇒Addresses overview.
1 
1    'sed' does not support a typical 'if/then' construct.  Instead, some
1 commands can be used as conditionals or to change the default flow
1 control:
1 
1 'd'
1      delete (clear) the current pattern space, and restart the program
1      cycle without processing the rest of the commands and without
1      printing the pattern space.
1 
1 'D'
1      delete the contents of the pattern space _up to the first newline_,
1      and restart the program cycle without processing the rest of the
1      commands and without printing the pattern space.
1 
1 '[addr]X'
1 '[addr]{ X ; X ; X }'
1 '/regexp/X'
1 '/regexp/{ X ; X ; X }'
1      Addresses and regular expressions can be used as an 'if/then'
1      conditional: If [ADDR] matches the current pattern space, execute
1      the command(s).  For example: The command '/^#/d' means: _if_ the
1      current pattern space matches the regular expression '^#' (a line
1      starting with a hash), _then_ execute the 'd' command: delete the
1      line without printing it, and restart the program cycle
1      immediately.
1 
1 'b'
1      branch unconditionally (that is: always jump to a label, skipping
1      or repeating other commands, without restarting a new cycle).
1      Combined with an address, the branch can be conditionally executed
1      on matched lines.
1 
1 't'
1      branch conditionally (that is: jump to a label) _only if_ a 's///'
1      command has succeeded since the last input line was read or another
1      conditional branch was taken.
1 
1 'T'
1      similar but opposite to the 't' command: branch only if there have
1      been _no_ successful substitutions since the last input line was
1      read.
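
   A minimal sketch of the difference between 't' and 'T', assuming a
'sed' that provides 'T' (a GNU extension).  The helper names and the
sample input are invented for illustration.  Each command is passed with
its own '-e' option so that every label ends with its script fragment:

```shell
# Hypothetical helpers; not part of sed itself.
mark_unchanged() {
    # 't': if s/a/z/ succeeded, jump to :done and skip the tagging step.
    sed -e 's/a/z/' -e 'tdone' -e 's/$/ (unchanged)/' -e ':done' "$@"
}

mark_changed() {
    # 'T': jump to :done only if s/a/z/ did NOT succeed.
    sed -e 's/a/z/' -e 'Tdone' -e 's/$/ (changed)/' -e ':done' "$@"
}

printf '%s\n' a1 b2 a3 | mark_unchanged
# z1
# b2 (unchanged)
# z3

printf '%s\n' a1 b2 a3 | mark_changed
# z1 (changed)
# b2
# z3
```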
1 
1    The following two 'sed' programs are equivalent.  The first
1 (contrived) example uses the 'b' command to skip the 's///' command on
1 lines containing '1'.  The second example uses an address with negation
1 ('!') to perform substitution only on desired lines.  The 'y///' command
1 is still executed on all lines:
1 
1      $ printf '%s\n' a1 a2 a3 | sed -E '/1/bx ; s/a/z/ ; :x ; y/123/456/'
1      a4
1      z5
1      z6
1 
1      $ printf '%s\n' a1 a2 a3 | sed -E '/1/!s/a/z/ ; y/123/456/'
1      a4
1      z5
1      z6
1 
1 6.4.1 Branching and Cycles
1 --------------------------
1 
1 The 'b', 't' and 'T' commands can be followed by a label (typically a
1 single letter).  Labels are defined with a colon followed by one or more
1 letters (e.g.  ':x').  If the label is omitted, the branch commands
1 restart the cycle.  Note the difference between branching to a label and
1 restarting the cycle: when a cycle is restarted, 'sed' first prints the
1 current content of the pattern space, then reads the next input line
1 into the pattern space; jumping to a label (even if it is at the
1 beginning of the program) does not print the pattern space and does not
1 read the next input line.
1 
1    The following program is a no-op.  The 'b' command (the only command
1 in the program) does not have a label, and thus simply restarts the
1 cycle.  On each cycle, the pattern space is printed and the next input
1 line is read:
1 
1      $ seq 3 | sed b
1      1
1      2
1      3
1 
1    The following example is an infinite loop: it doesn't terminate and
1 doesn't print anything.  The 'b' command jumps to the 'x' label, and a
1 new cycle is never started:
1 
1      $ seq 3 | sed ':x ; bx'
1 
1      # The above command requires GNU sed (which supports additional
1      # commands following a label, without a newline). A portable equivalent:
1      #     sed -e ':x' -e bx
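
   By contrast, a loop guarded by 't' ends as soon as the substitution
stops matching.  The following sketch (the 'dot_indent' helper and its
input are invented for illustration) replaces leading spaces with dots,
one per iteration.  Because jumping to ':x' neither prints the pattern
space nor reads new input, each line is printed only once, at the end of
its cycle:

```shell
# Hypothetical helper: turn each leading space into a dot.
dot_indent() {
    # :x               loop target
    # s/^\(\.*\) /\1./ change the first space after any leading dots
    # tx               repeat while the substitution keeps succeeding
    sed -e ':x' -e 's/^\(\.*\) /\1./' -e 'tx' "$@"
}

printf '%s\n' '   abc' 'de f' | dot_indent
# ...abc
# de f
```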
1 
1    Branching is often complemented with the 'n' or 'N' commands: both
1 commands read the next input line into the pattern space without waiting
1 for the cycle to restart.  Before reading the next input line, 'n'
1 prints the current pattern space then empties it, while 'N' appends a
1 newline and the next input line to the pattern space.
1 
1    Consider the following two examples:
1 
1      $ seq 3 | sed ':x ; n ; bx'
1      1
1      2
1      3
1 
1      $ seq 3 | sed ':x ; N ; bx'
1      1
1      2
1      3
1 
1    * Neither example loops forever, despite never starting a new cycle.
1 
1    * In the first example, the 'n' command first prints the content of
1      the pattern space, empties the pattern space, then reads the next
1      input line.
1 
1    * In the second example, the 'N' command appends the next input line
1      to the pattern space (with a newline).  Lines are accumulated in
1      the pattern space until there are no more input lines to read, then
1      the 'N' command terminates the 'sed' program.  When the program
1      terminates, the end-of-cycle actions are performed, and the entire
1      pattern space is printed.
1 
1    * The second example requires GNU 'sed', because it uses the
1      non-POSIX-standard behavior of 'N'.  See the "'N' command on the
1      last line" paragraph in ⇒Reporting Bugs.
1 
1    * To further examine the difference between the two examples, try the
1      following commands:
1           printf '%s\n' aa bb cc dd | sed ':x ; n ; = ; bx'
1           printf '%s\n' aa bb cc dd | sed ':x ; N ; = ; bx'
1           printf '%s\n' aa bb cc dd | sed ':x ; n ; s/\n/***/ ; bx'
1           printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx'
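
   For reference, this is the output the four commands are expected to
produce, assuming GNU 'sed' with 'POSIXLY_CORRECT' unset.  The 'input'
function is only a shorthand introduced here:

```shell
# Hypothetical shorthand for the sample input used above.
input() { printf '%s\n' aa bb cc dd; }

# 'n' prints the old line before '=' prints the new line number.
input | sed ':x ; n ; = ; bx'
# aa
# 2
# bb
# 3
# cc
# 4
# dd

# 'N' prints nothing until the program ends, so the numbers come first.
input | sed ':x ; N ; = ; bx'
# 2
# 3
# 4
# aa
# bb
# cc
# dd

# With 'n' the pattern space never contains a newline: nothing to replace.
input | sed ':x ; n ; s/\n/***/ ; bx'
# aa
# bb
# cc
# dd

# With 'N' every appended newline is replaced before the next line arrives.
input | sed ':x ; N ; s/\n/***/ ; bx'
# aa***bb***cc***dd
```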
1 
1 6.4.2 Branching example: joining lines
1 --------------------------------------
1 
1 As a real-world example of using branching, consider the case of
1 quoted-printable (https://en.wikipedia.org/wiki/Quoted-printable) files,
1 typically used to encode email messages.  In these files long lines are
1 split and marked with a "soft line break" consisting of a single '='
1 character at the end of the line:
1 
1      $ cat jaques.txt
1      All the wor=
1      ld's a stag=
1      e,
1      And all the=
1       men and wo=
1      men merely =
1      players:
1      They have t=
1      heir exits =
1      and their e=
1      ntrances;
1      And one man=
1       in his tim=
1      e plays man=
1      y parts.
1 
1    The following program uses an address match '/=$/' as a conditional:
1 If the current pattern space ends with a '=', it reads the next input
1 line using 'N', replaces all '=' characters which are followed by a
1 newline, and unconditionally branches ('b') to the beginning of the
1 program without restarting a new cycle.  If the pattern space does not
1 end with '=', the default action is performed: the pattern space is
1 printed and a new cycle is started:
1 
1      $ sed ':x ; /=$/ { N ; s/=\n//g ; bx }' jaques.txt
1      All the world's a stage,
1      And all the men and women merely players:
1      They have their exits and their entrances;
1      And one man in his time plays many parts.
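
   The one-line program above places commands after a label on the same
line, which is a GNU extension.  A sketch of the same program with one
command per '-e' option, intended to be more portable, could look like
this (the 'join_soft_breaks' name is invented for illustration, and the
behavior of 'N' on the last input line may still differ between
implementations):

```shell
# Hypothetical helper; same commands as the program above.
join_soft_breaks() {
    sed -e ':x' \
        -e '/=$/{' \
        -e 'N' \
        -e 's/=\n//g' \
        -e 'bx' \
        -e '}' "$@"
}

printf '%s\n' 'And all the=' ' men and wo=' 'men merely =' 'players:' |
    join_soft_breaks
# And all the men and women merely players:
```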
1 
1    Here's an alternative program with a slightly different approach: On
1 all lines except the last, 'N' appends the next input line to the
1 pattern space.  A substitution command then removes soft line breaks
1 ('=' at the end of a line, i.e.  followed by a newline) by replacing
1 them with an empty string.  _If_ the substitution was successful
1 (meaning the pattern space contained a line which should be joined), the
1 conditional branch command 't' jumps to the beginning of the program
1 without completing or restarting the cycle.  If the substitution failed
1 (meaning there were no soft line breaks), the 't' command will _not_
1 branch.  Then, 'P' will print the pattern space content up to the first
1 newline, and 'D' will delete the pattern space content up to the first
1 newline.  (To learn more about the 'N', 'P' and 'D' commands, see
1 ⇒Multiline techniques.)
1 
1      $ sed ':x ; $!N ; s/=\n// ; tx ; P ; D' jaques.txt
1      All the world's a stage,
1      And all the men and women merely players:
1      They have their exits and their entrances;
1      And one man in his time plays many parts.
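
   The same program can be written one command per line with comments,
which makes the control flow easier to follow.  This is only an
annotated sketch; the 'join_with_t' name and the shortened input are
invented for illustration:

```shell
# Hypothetical helper wrapping the annotated script.
join_with_t() {
    sed '
:x
# Append the next input line, unless this is the last line.
$!N
# Remove one soft line break ("=" followed by a newline).
s/=\n//
# If a break was removed, loop without printing or ending the cycle.
tx
# Otherwise print the first (now complete) line ...
P
# ... delete it, and restart the program with whatever remains.
D
' "$@"
}

printf '%s\n' 'They have t=' 'heir exits.' 'And one man=' ' plays.' |
    join_with_t
# They have their exits.
# And one man plays.
```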
1 
1    For more line-joining examples, see ⇒Joining lines.
1