coreutils: Squeezing and deleting

9.1.3 Squeezing repeats and deleting
------------------------------------

When given just the ‘--delete’ (‘-d’) option, ‘tr’ removes any input
characters that are in SET1.
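
   Here is a minimal illustration. It assumes GNU ‘tr’ and a POSIX shell
with ‘printf’, and the sample text is made up. Every input character
that appears in SET1 (here ‘l’ and ‘o’) is dropped, wherever it occurs:

```shell
#!/bin/sh
# Every 'l' and 'o' is removed. The space and the trailing newline are
# not in SET1, so they pass through unchanged. Prints "he wrd".
printf 'hello world\n' | tr -d 'lo'
```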

   When given just the ‘--squeeze-repeats’ (‘-s’) option and not
translating, ‘tr’ replaces each input sequence of a repeated character
that is in SET1 with a single occurrence of that character.
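
   The sketch below shows which runs get squeezed. It again assumes GNU
‘tr’, a POSIX shell, and made-up input. Only consecutive repeats of a
SET1 character collapse. A character that recurs without being adjacent
to itself is left alone:

```shell
#!/bin/sh
# Runs of 'a' and of spaces are squeezed because both characters are in
# SET1. The runs of 'b', 'c' and 'd' are left alone.
# Prints "abbbccc dd".
printf 'aaabbbccc  dd\n' | tr -s 'a '

# Only consecutive repeats are squeezed, so alternating characters are
# untouched. Prints "abab".
printf 'abab\n' | tr -s 'ab'
```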

   When given both ‘--delete’ and ‘--squeeze-repeats’, ‘tr’ first
performs any deletions using SET1, then, in what remains, replaces each
sequence of a repeated character that is in SET2 with a single
occurrence of that character.
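
   The order of the two steps matters, as this small sketch shows. It
assumes GNU ‘tr’ and uses made-up input. Deleting ‘c’ first joins two
separate runs of ‘b’, and the joined run is then squeezed:

```shell
#!/bin/sh
# Step 1: delete 'c' (SET1):     aabbccbbdd -> aabbbbdd
# Step 2: squeeze 'b' (SET2):    aabbbbdd   -> aabdd
# The 'a' and 'd' runs are not in SET2, so they are not squeezed.
# Prints "aabdd".
printf 'aabbccbbdd\n' | tr -ds 'c' 'b'
```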

   The ‘--squeeze-repeats’ option may also be used when translating, in
which case ‘tr’ first performs the translation, then, in the translated
output, replaces each sequence of a repeated character that is in SET2
with a single occurrence of that character.
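
   Here is a sketch of translation combined with squeezing. It assumes
GNU ‘tr’ and uses made-up input. The second command shows that
squeezing is driven by SET2 rather than SET1. The ‘b’s already present
in the input are squeezed too, although ‘b’ does not appear in SET1:

```shell
#!/bin/sh
# Translate ',' and ';' to ':', then squeeze runs of ':' (the SET2
# character): a,,b;;c -> a::b::c -> a:b:c.  Prints "a:b:c".
printf 'a,,b;;c\n' | tr -s ',;' '::'

# Translate 'a' to 'b' (aabb -> bbbb), then squeeze runs of 'b'.
# Prints "b".
printf 'aabb\n' | tr -s 'a' 'b'
```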

   Here are some examples to illustrate various combinations of options:

   • Remove all zero bytes:

          tr -d '\0'

   • Put all words on lines by themselves.  This converts all
     non-alphanumeric characters to newlines, then squeezes each string
     of repeated newlines into a single newline:

          tr -cs '[:alnum:]' '[\n*]'

   • Convert each sequence of repeated newlines to a single newline,
     that is, delete empty lines.  Lines containing only spaces or tabs
     are not empty and are kept, and so is an empty first line:

          tr -s '\n'

   • Find doubled occurrences of words in a document.  For example,
     people often write “the the” with the repeated words separated by a
     newline.  The Bourne shell script below works by first converting
     each sequence of punctuation and blank characters to a single
     newline.  That puts each “word” on a line by itself.  Next it maps
     all uppercase characters to lowercase, and finally it runs ‘uniq’
     with the ‘-d’ option to print only the words that were repeated on
     consecutive lines.

          #!/bin/sh
          cat -- "$@" \
            | tr -s '[:punct:][:blank:]' '[\n*]' \
            | tr '[:upper:]' '[:lower:]' \
            | uniq -d

   • Deleting a small set of characters is usually straightforward.  For
     example, to remove all ‘a’s, ‘x’s, and ‘M’s you would do this:

          tr -d axM

     However, when ‘-’ is one of those characters, it can be tricky
     because ‘-’ has special meanings.  To perform the same task while
     also removing all ‘-’ characters, we might try ‘tr -d -axM’, but
     that would fail because ‘tr’ would try to interpret ‘-a’ as a
     command-line option.  Alternatively, we could try putting the
     hyphen inside the string, ‘tr -d a-xM’, but that wouldn’t work
     either because it would make ‘tr’ interpret ‘a-x’ as the range of
     characters ‘a’...‘x’ rather than the three characters ‘a’, ‘-’, and
     ‘x’.  One way to solve the problem is to put the hyphen at the end
     of the list of characters:

          tr -d axM-

     Or you can use ‘--’ to terminate option processing:

          tr -d -- -axM

     More generally, use the equivalence class notation ‘[=c=]’ with
     ‘-’ (or any other character) in place of the ‘c’:

          tr -d '[=-=]axM'

     Note how single quotes are used in the above example to protect the
     square brackets from interpretation by the shell.
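
   The list examples can be exercised with a short script. This sketch
assumes GNU ‘tr’, ‘uniq’, and a POSIX shell. The inputs are made up,
and each pipeline reads from ‘printf’ instead of from files. The last
command demonstrates the ‘a-xM’ range pitfall described above:

```shell
#!/bin/sh
# Exercise the list examples on small, made-up inputs (GNU tr assumed).

# Remove all zero bytes. Prints "abc".
printf 'a\0b\0c\n' | tr -d '\0'

# One word per line: non-alphanumerics become newlines, and runs of
# newlines are squeezed. Prints Hello, world, 42, times on four lines.
printf 'Hello, world! 42 times.\n' | tr -cs '[:alnum:]' '[\n*]'

# Squeeze newlines: the run of newlines after "one" collapses to one
# newline. The line holding a single space is not empty, so it survives.
printf 'one\n\n\ntwo\n \nthree\n' | tr -s '\n'

# The doubled-word pipeline, reading from printf instead of files.
# Prints "the" because "the" ends one line and starts the next.
printf 'The cat sat on the\nthe mat.\n' \
  | tr -s '[:punct:][:blank:]' '[\n*]' \
  | tr '[:upper:]' '[:lower:]' \
  | uniq -d

# Deleting 'a', 'x', 'M' and '-': all three spellings print "illion".
printf 'Max-a-Million\n' | tr -d axM-
printf 'Max-a-Million\n' | tr -d -- -axM
printf 'Max-a-Million\n' | tr -d '[=-=]axM'

# The pitfall: 'a-x' is a range, so 'i', 'l', 'o' and 'n' are deleted
# too and only the two hyphens remain. Prints "--".
printf 'Max-a-Million\n' | tr -d a-xM
```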