coreutils: Translating

9.1.2 Translating
-----------------

‘tr’ performs translation when SET1 and SET2 are both given and the
‘--delete’ (‘-d’) option is not given. ‘tr’ translates each character
of its input that is in SET1 to the corresponding character in SET2.
Characters not in SET1 are passed through unchanged. When a character
appears more than once in SET1 and the corresponding characters in SET2
are not all the same, only the final one is used. For example, these
two commands are equivalent:

     tr aaa xyz
     tr a z
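
As a quick check of the rule that the final duplicate wins, the
following sketch runs both commands on the same input. It assumes GNU
‘tr’; the sample word and the variable names are illustrative only.

```shell
# Assumes GNU tr: when a character is repeated in SET1,
# the last corresponding character in SET2 is the one used.
dup=$(printf 'banana\n' | tr aaa xyz)    # 'a' is mapped to x, then y, then z
single=$(printf 'banana\n' | tr a z)     # 'a' is mapped directly to z
printf '%s\n%s\n' "$dup" "$single"       # both lines read: bznznz
```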

A common use of ‘tr’ is to convert lowercase characters to uppercase.
This can be done in many ways. Here are three of them:

     tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
     tr a-z A-Z
     tr '[:lower:]' '[:upper:]'

Note, however, that ranges such as ‘a-z’ are not portable, because the
characters a range covers depend on the locale and the character
encoding.
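
To see the three forms agree on plain ASCII input, one might run
something like the sketch below. The sample string and variable names
are made up for illustration, and ‘LC_ALL=C’ is set on the assumption
of an ASCII-based system, so that the range form covers exactly the
ASCII letters.

```shell
# Each pipeline should give the same uppercase text for ASCII input.
# LC_ALL=C pins the locale so that a-z means the ASCII lowercase letters.
in='Hello, World'
by_list=$(printf '%s\n' "$in" | LC_ALL=C tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ)
by_range=$(printf '%s\n' "$in" | LC_ALL=C tr a-z A-Z)
by_class=$(printf '%s\n' "$in" | LC_ALL=C tr '[:lower:]' '[:upper:]')
printf '%s\n' "$by_list" "$by_range" "$by_class"   # HELLO, WORLD three times
```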

When ‘tr’ is performing translation, SET1 and SET2 typically have the
same length. If SET1 is shorter than SET2, the extra characters at the
end of SET2 are ignored.
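
A small illustration of the shorter-SET1 case follows; the input and
the variable name are chosen only for the example.

```shell
# SET1 (ab) is shorter than SET2 (xyz): the trailing 'z' is never used,
# and 'c' is not in SET1, so it passes through unchanged.
short=$(printf 'abc\n' | tr ab xyz)
printf '%s\n' "$short"    # xyc
```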

On the other hand, making SET1 longer than SET2 is not portable;
POSIX says that the result is undefined. In this situation, BSD ‘tr’
pads SET2 to the length of SET1 by repeating the last character of SET2
as many times as necessary. System V ‘tr’ truncates SET1 to the length
of SET2.

By default, GNU ‘tr’ handles this case like BSD ‘tr’. When the
‘--truncate-set1’ (‘-t’) option is given, GNU ‘tr’ handles this case
like System V ‘tr’ instead. This option is ignored for operations
other than translation.
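
The two behaviors can be compared side by side with GNU ‘tr’. This is
a sketch with an arbitrary input string; the variable names are
illustrative.

```shell
# Assumes GNU tr. SET1 (abcde) is longer than SET2 (xy).
padded=$(printf 'abcde\n' | tr abcde xy)        # BSD style: SET2 is padded to xyyyy
truncated=$(printf 'abcde\n' | tr -t abcde xy)  # System V style: SET1 is cut to ab
printf '%s\n%s\n' "$padded" "$truncated"        # xyyyy, then xycde
```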

Acting like System V ‘tr’ in this case breaks the relatively common
BSD idiom:

     tr -cs A-Za-z0-9 '\012'

because it converts only zero bytes (the first element in the complement
of SET1), rather than all non-alphanumerics, to newlines.
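
The difference can be observed by running the idiom with and without
‘-t’. The sketch below assumes GNU ‘tr’ on an ASCII-based system; the
sample text and variable names are invented for the example.

```shell
# Assumes GNU tr on an ASCII-based system.
text='one, two; three'
# Default (BSD-like): every non-alphanumeric becomes a newline, runs squeezed.
bsd=$(printf '%s\n' "$text" | tr -cs A-Za-z0-9 '\012')
# With -t (System V-like): only the NUL byte would be translated,
# so this input comes through unchanged.
sysv=$(printf '%s\n' "$text" | tr -cst A-Za-z0-9 '\012')
printf '%s\n---\n%s\n' "$bsd" "$sysv"
```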

Incidentally, the above idiom is not portable because it uses ranges
and because it assumes that the octal code for newline is 012. Assuming
a POSIX-compliant ‘tr’, here is a better way to write it:

     tr -cs '[:alnum:]' '[\n*]'
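
As an illustration of the portable form, the following sketch splits a
line into one word per line. The input text and the variable name are
arbitrary; the ‘[\n*]’ construct repeats newline as often as needed to
fill SET2.

```shell
# Assumes a POSIX-compliant tr. '[\n*]' pads SET2 with newlines, so every
# non-alphanumeric maps to a newline; -s then squeezes runs of newlines.
words=$(printf 'Hello, world! 42\n' | tr -cs '[:alnum:]' '[\n*]')
printf '%s\n' "$words"    # Hello, world and 42, each on its own line
```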