coreutils: Squeezing and deleting

9.1.3 Squeezing repeats and deleting
------------------------------------

When given just the ‘--delete’ (‘-d’) option, ‘tr’ removes any input
characters that are in SET1.

When given just the ‘--squeeze-repeats’ (‘-s’) option and not
translating, ‘tr’ replaces each input sequence of a repeated character
that is in SET1 with a single occurrence of that character.
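As a minimal sketch of these two single-option modes, the commands below run ‘tr’ on made-up sample strings.  The input text is purely illustrative and is not taken from the manual.

```shell
# Delete every 'l' and 'o' (the characters in SET1).
printf 'hello world\n' | tr -d 'lo'
# prints: he wrd

# Squeeze runs of 'a' or 'b' (SET1) down to one occurrence each.
# The repeated 'c' is left alone because it is not in SET1.
printf 'aaabbbccc\n' | tr -s 'ab'
# prints: abccc
```

Note that ‘-s’ only affects characters listed in the set; other repeats pass through unchanged.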

When given both ‘--delete’ and ‘--squeeze-repeats’, ‘tr’ first
performs any deletions using SET1, then squeezes repeats from any
remaining characters using SET2.

The ‘--squeeze-repeats’ option may also be used when translating, in
which case ‘tr’ first performs translation, then squeezes repeats from
any remaining characters using SET2.
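The order of operations in the two combined modes can be sketched as follows.  The sample inputs are invented for illustration; in both commands the squeezing is driven by SET2, not SET1.

```shell
# Delete digits (SET1) first, then squeeze runs of spaces (SET2).
printf 'a1  b22   c\n' | tr -ds '0-9' ' '
# prints: a b c

# Translate ',' and ';' to spaces, then squeeze the resulting runs
# of spaces (SET2) into a single space.
printf 'a,,b;;;c\n' | tr -s ',;' '[ *]'
# prints: a b c
```

In the second command, ‘[ *]’ repeats the space as often as needed so that SET2 is as long as SET1.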

Here are some examples to illustrate various combinations of options:

   • Remove all zero bytes:

          tr -d '\0'

   • Put all words on lines by themselves.  This converts all
     non-alphanumeric characters to newlines, then squeezes each string
     of repeated newlines into a single newline:

          tr -cs '[:alnum:]' '[\n*]'

   • Convert each sequence of repeated newlines to a single newline,
     i.e., delete empty lines:

          tr -s '\n'

   • Find doubled occurrences of words in a document.  For example,
     people often write “the the” with the repeated words separated by a
     newline.  The Bourne shell script below works first by converting
     each sequence of punctuation and blank characters to a single
     newline.  That puts each “word” on a line by itself.  Next it maps
     all uppercase characters to lower case, and finally it runs ‘uniq’
     with the ‘-d’ option to print out only the words that were
     repeated.

          #!/bin/sh
          cat -- "$@" \
            | tr -s '[:punct:][:blank:]' '[\n*]' \
            | tr '[:upper:]' '[:lower:]' \
            | uniq -d

   • Deleting a small set of characters is usually straightforward.  For
     example, to remove all ‘a’s, ‘x’s, and ‘M’s you would do this:

          tr -d axM

     However, when ‘-’ is one of those characters, it can be tricky
     because ‘-’ has special meanings.  Performing the same task as
     above but also removing all ‘-’ characters, we might try ‘tr -d
     -axM’, but that would fail because ‘tr’ would try to interpret ‘-a’
     as a command-line option.  Alternatively, we could try putting the
     hyphen inside the string, ‘tr -d a-xM’, but that wouldn’t work
     either because it would make ‘tr’ interpret ‘a-x’ as the range of
     characters ‘a’...‘x’ rather than the three separate characters
     ‘a’, ‘-’, and ‘x’.  One way to solve the problem is to put the
     hyphen at the end of the list of characters:

          tr -d axM-

     Or you can use ‘--’ to terminate option processing:

          tr -d -- -axM

     More generally, use the equivalence class notation ‘[=c=]’ with
     ‘-’ (or any other character) in place of the ‘c’:

          tr -d '[=-=]axM'

     Note how single quotes are used in the above example to protect the
     square brackets from interpretation by a shell.
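To make the difference between these spellings concrete, here is a small sketch that runs each of them on one sample string.  The variable name ‘input’ and its value are assumptions chosen for illustration; the string contains ‘b’ and ‘c’ so that an accidental ‘a-x’ range is visible in the output.

```shell
# Sample input (made up): contains a, x, M and '-', plus 'b' and 'c',
# which lie inside the range a-x.
input='Max-a-b-c'

# All three spellings delete exactly a, x, M, and '-':
printf '%s\n' "$input" | tr -d axM-
printf '%s\n' "$input" | tr -d -- -axM
printf '%s\n' "$input" | tr -d '[=-=]axM'
# each prints: bc

# By contrast, a-xM is read as the range a..x plus M, so 'b' and 'c'
# are deleted too while the hyphens survive:
printf '%s\n' "$input" | tr -d a-xM
# prints: ---
```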
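The doubled-word script from the examples above can be tried on a small input as follows.  This is only a sketch: the function name ‘find_doubled’ and the two-line sample text are hypothetical, while the pipeline itself is the one given in the script.

```shell
#!/bin/sh
# find_doubled: hypothetical wrapper around the doubled-word pipeline.
# With no file arguments, 'cat --' reads standard input.
find_doubled() {
  cat -- "$@" \
    | tr -s '[:punct:][:blank:]' '[\n*]' \
    | tr '[:upper:]' '[:lower:]' \
    | uniq -d
}

# "the" ends the first line and begins the second.
printf 'Paris in the\nthe spring.\n' | find_doubled
# prints: the
```

Because ‘uniq -d’ compares adjacent lines only, a word is reported only when its two occurrences are consecutive once punctuation and blanks have been turned into newlines.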