coreutils: Sorting files for join


8.3.2 Pre-sorting
-----------------

‘join’ requires sorted input files.  Each input file should be sorted
according to the key (that is, the field or column number) used in
‘join’.  The recommended sorting option is ‘sort -k 1b,1’ (assuming the
desired key is in the first column).

Typical usage:
     $ sort -k 1b,1 file1 > file1.sorted
     $ sort -k 1b,1 file2 > file2.sorted
     $ join file1.sorted file2.sorted > file3

   Normally, the sort order is that of the collating sequence specified
by the ‘LC_COLLATE’ locale.  Unless the ‘-t’ option is given, the sort
comparison ignores blanks at the start of the join field, as in ‘sort
-b’.  If the ‘--ignore-case’ option is given, the sort comparison
ignores the case of characters in the join field, as in ‘sort -f’:

     $ sort -k 1bf,1 file1 > file1.sorted
     $ sort -k 1bf,1 file2 > file2.sorted
     $ join --ignore-case file1.sorted file2.sorted > file3

   The ‘sort’ and ‘join’ commands should use consistent locales and
options if the output of ‘sort’ is fed to ‘join’.  You can use a command
like ‘sort -k 1b,1’ to sort a file on its default join field, but if you
select a non-default locale, join field, separator, or comparison
options, then you should do so consistently between ‘join’ and ‘sort’.
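
   For instance, with a non-default separator and join field, the same
choices would appear on both commands.  The sketch below assumes
hypothetical comma-separated files whose key is in the second column;
because ‘-t’ is given, leading blanks are significant to ‘join’, so the
‘b’ modifier is left off the sort key.

```shell
# Hypothetical comma-separated data; the join key is column 2 in both files.
cs=$(mktemp -d)
printf '%s\n' 'x,k2,red' 'y,k1,blue' > "$cs/left.csv"
printf '%s\n' '20,k2' '10,k1' > "$cs/right.csv"

# Same separator (-t ,) and same key field (2) for sort ...
sort -t , -k 2,2 "$cs/left.csv"  > "$cs/left.sorted"
sort -t , -k 2,2 "$cs/right.csv" > "$cs/right.sorted"

# ... and for join: field 2 of file 1 (-1 2) against field 2 of file 2 (-2 2).
join -t , -1 2 -2 2 "$cs/left.sorted" "$cs/right.sorted" > "$cs/joined.csv"

cat "$cs/joined.csv"
# k1,y,blue,10
# k2,x,red,20
```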

To avoid any locale-related issues, it is recommended to use the ‘C’
locale for both commands:

     $ LC_ALL=C sort -k 1b,1 file1 > file1.sorted
     $ LC_ALL=C sort -k 1b,1 file2 > file2.sorted
     $ LC_ALL=C join file1.sorted file2.sorted > file3
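
   The effect of the ‘C’ locale is easiest to see with mixed-case keys.
In that locale comparison is by byte value, so uppercase ASCII letters
sort before lowercase ones; other locales may order them differently.
The sample files below are hypothetical.

```shell
# Hypothetical sample data with mixed-case keys.
cl=$(mktemp -d)
printf '%s\n' 'b 1' 'A 2' 'a 3' > "$cl/file1"
printf '%s\n' 'a x' 'b z' 'A y' > "$cl/file2"

# Sort and join under the same C locale so both agree on the ordering.
LC_ALL=C sort -k 1b,1 "$cl/file1" > "$cl/file1.sorted"
LC_ALL=C sort -k 1b,1 "$cl/file2" > "$cl/file2.sorted"
LC_ALL=C join "$cl/file1.sorted" "$cl/file2.sorted" > "$cl/file3"

cat "$cl/file3"
# A 2 y
# a 3 x
# b 1 z
```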