coreutils: sort invocation
1
1 7.1 ‘sort’: Sort text files
1 ===========================
1
1 ‘sort’ sorts, merges, or compares all the lines from the given files, or
1 standard input if none are given or for a FILE of ‘-’. By default,
1 ‘sort’ writes the results to standard output. Synopsis:
1
1 sort [OPTION]... [FILE]...
1
1 Many options affect how ‘sort’ compares lines; if the results are
1 unexpected, try the ‘--debug’ option to see what happened. A pair of
1 lines is compared as follows: ‘sort’ compares each pair of fields (see
1 ‘--key’), in the order specified on the command line, according to the
1 associated ordering options, until a difference is found or no fields
1 are left. If no key fields are specified, ‘sort’ uses a default key of
1 the entire line. Finally, as a last resort when all keys compare equal,
1 ‘sort’ compares entire lines as if no ordering options other than
1 ‘--reverse’ (‘-r’) were specified. The ‘--stable’ (‘-s’) option
1 disables this “last-resort comparison” so that lines in which all fields
1 compare equal are left in their original relative order. The ‘--unique’
1 (‘-u’) option also disables the last-resort comparison.
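The last-resort comparison can be seen in a minimal sketch. The two-line input is made up for illustration; both lines carry the same numeric key in field two, so only the tie-breaking rule decides their order.

```shell
# Hypothetical sample input: both lines share the key "1" in field 2.
input=$(printf 'b 1\na 1\n')

# Default: the keys tie, so the whole-line last-resort comparison
# places "a 1" before "b 1".
default_order=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2,2n)

# With -s the tie is left alone and the input order is preserved.
stable_order=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k 2,2n)

printf 'default:\n%s\nstable:\n%s\n' "$default_order" "$stable_order"
```

`LC_ALL=C` is set only so that the result does not depend on the caller's locale.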
1
1 Unless otherwise specified, all comparisons use the character
1 collating sequence specified by the ‘LC_COLLATE’ locale.(1) A line’s
1 trailing newline is not part of the line for comparison purposes. If
1 the final byte of an input file is not a newline, GNU ‘sort’ silently
1 supplies one. GNU ‘sort’ (as specified for all GNU utilities) has no
1 limit on input line length or restrictions on bytes allowed within
1 lines.
1
1 ‘sort’ has three modes of operation: sort (the default), merge, and
1 check for sortedness. The following options change the operation mode:
1
1 ‘-c’
1 ‘--check’
1 ‘--check=diagnose-first’
1 Check whether the given file is already sorted: if it is not all
1 sorted, print a diagnostic containing the first out-of-order line
1 and exit with a status of 1. Otherwise, exit successfully. At
1 most one input file can be given.
1
1 ‘-C’
1 ‘--check=quiet’
1 ‘--check=silent’
1 Exit successfully if the given file is already sorted, and exit
1 with status 1 otherwise. At most one input file can be given.
1 This is like ‘-c’, except it does not print a diagnostic.
1
1 ‘-m’
1 ‘--merge’
1 Merge the given files by sorting them as a group. Each input file
1 must already be individually sorted. Sorting instead of merging
1 always works; merging is provided because it is faster when the
1 inputs are already sorted.
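As a small sketch of merge mode, the following combines two inputs that are each already sorted. The scratch directory and file names are hypothetical.

```shell
# Hypothetical scratch files, each already sorted on its own.
workdir=$(mktemp -d)
printf 'apple\ncherry\n' > "$workdir/a.txt"
printf 'banana\ndate\n' > "$workdir/b.txt"

# -m interleaves the pre-sorted inputs instead of re-sorting them.
merged=$(LC_ALL=C sort -m "$workdir/a.txt" "$workdir/b.txt")
printf '%s\n' "$merged"

rm -r "$workdir"
```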
1
1 Exit status:
1
1 0 if no error occurred
1 1 if invoked with ‘-c’ or ‘-C’ and the input is not sorted
1 2 if an error occurred
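The exit statuses for check mode can be observed with a short sketch. The helper name `check_status` is an assumption introduced here, not part of `sort`.

```shell
# check_status (hypothetical helper): print the exit status of a quiet
# sortedness check over the lines given as arguments.
check_status() {
    if printf '%s\n' "$@" | LC_ALL=C sort -n -C; then
        echo 0
    else
        echo $?
    fi
}

sorted_status=$(check_status 1 2 3)     # input already in order
unsorted_status=$(check_status 3 1 2)   # input out of order

echo "sorted input: $sorted_status, unsorted input: $unsorted_status"
```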
1
1 If the environment variable ‘TMPDIR’ is set, ‘sort’ uses its value as
1 the directory for temporary files instead of ‘/tmp’. The
1 ‘--temporary-directory’ (‘-T’) option in turn overrides the environment
1 variable.
1
1 The following options affect the ordering of output lines. They may
1 be specified globally or as part of a specific key field. If no key
1 fields are specified, global options apply to comparison of entire
1 lines; otherwise the global options are inherited by key fields that do
1 not specify any special options of their own. In pre-POSIX versions of
1 ‘sort’, global options affect only later key fields, so portable shell
1 scripts should specify global options first.
1
1 ‘-b’
1 ‘--ignore-leading-blanks’
1 Ignore leading blanks when finding sort keys in each line. By
1 default a blank is a space or a tab, but the ‘LC_CTYPE’ locale can
1 change this. Note blanks may be ignored by your locale’s collating
1 rules, but without this option they will be significant for
1 character positions specified in keys with the ‘-k’ option.
1
1 ‘-d’
1 ‘--dictionary-order’
1 Sort in “phone directory” order: ignore all characters except
1 letters, digits and blanks when sorting. By default letters and
1 digits are those of ASCII and a blank is a space or a tab, but the
1 ‘LC_CTYPE’ locale can change this.
1
1 ‘-f’
1 ‘--ignore-case’
1 Fold lowercase characters into the equivalent uppercase characters
1 when comparing so that, for example, ‘b’ and ‘B’ sort as equal.
1 The ‘LC_CTYPE’ locale determines character types. When used with
1 ‘--unique’ those lower case equivalent lines are thrown away.
1 There is currently no way to throw away the upper case equivalent
1 instead; any ‘--reverse’ given would only affect the final result,
1 after the throwing away.
1
1 ‘-g’
1 ‘--general-numeric-sort’
1 ‘--sort=general-numeric’
1 Sort numerically, converting a prefix of each line to a long
1 double-precision floating point number. ⇒Floating point.
1 Do not report overflow, underflow, or conversion errors. Use the
1 following collating sequence:
1
1 • Lines that do not start with numbers (all considered to be
1 equal).
1 • NaNs (“Not a Number” values, in IEEE floating point
1 arithmetic) in a consistent but machine-dependent order.
1 • Minus infinity.
1 • Finite numbers in ascending numeric order (with -0 and +0
1 equal).
1 • Plus infinity.
1
1 Use this option only if there is no alternative; it is much slower
1 than ‘--numeric-sort’ (‘-n’) and it can lose information when
1 converting to floating point.
1
1 ‘-h’
1 ‘--human-numeric-sort’
1 ‘--sort=human-numeric’
1 Sort numerically, first by numeric sign (negative, zero, or
1 positive); then by SI suffix (either empty, or ‘k’ or ‘K’, or one
1 of ‘MGTPEZY’, in that order; ⇒Block size); and finally by
1 numeric value. For example, ‘1023M’ sorts before ‘1G’ because ‘M’
1 (mega) precedes ‘G’ (giga) as an SI suffix. This option sorts
1 values that are consistently scaled to the nearest suffix,
1 regardless of whether suffixes denote powers of 1000 or 1024, and
1 it therefore sorts the output of any single invocation of the ‘df’,
1 ‘du’, or ‘ls’ commands that are invoked with their
1 ‘--human-readable’ or ‘--si’ options. The syntax for numbers is
1 the same as for the ‘--numeric-sort’ option; the SI suffix must
1 immediately follow the number. Note also the ‘numfmt’ command,
1 which can be used to reformat numbers to human format _after_ the
1 sort, thus often allowing sort to operate on more accurate numbers.
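The suffix-before-value rule can be sketched with a few made-up sizes, including the ‘1023M’ and ‘1G’ pair mentioned above.

```shell
# Hypothetical size values; the suffix is compared before the number,
# so 1023M lands ahead of 1G.
sizes=$(printf '1G\n512\n1023M\n2K\n' | LC_ALL=C sort -h)
printf '%s\n' "$sizes"
```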
1
1 ‘-i’
1 ‘--ignore-nonprinting’
1 Ignore nonprinting characters. The ‘LC_CTYPE’ locale determines
1 character types. This option has no effect if the stronger
1 ‘--dictionary-order’ (‘-d’) option is also given.
1
1 ‘-M’
1 ‘--month-sort’
1 ‘--sort=month’
1 An initial string, consisting of any number of blanks, followed by
1 a month name abbreviation, is folded to UPPER case and compared in
1 the order ‘JAN’ < ‘FEB’ < ... < ‘DEC’. Invalid names compare low
1 to valid names. The ‘LC_TIME’ locale category determines the month
1 spellings. By default a blank is a space or a tab, but the
1 ‘LC_CTYPE’ locale can change this.
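A brief sketch of month ordering, assuming the C locale so that the English abbreviations apply. The input values are made up, and ‘xyz’ stands in for an invalid month name.

```shell
# 'xyz' is not a month name, so it compares low and comes first.
months=$(printf 'Mar\nxyz\nDec\nJan\n' | LC_ALL=C sort -M)
printf '%s\n' "$months"
```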
1
1 ‘-n’
1 ‘--numeric-sort’
1 ‘--sort=numeric’
1 Sort numerically. The number begins each line and consists of
1 optional blanks, an optional ‘-’ sign, and zero or more digits
1 possibly separated by thousands separators, optionally followed by
1 a decimal-point character and zero or more digits. An empty number
1 is treated as ‘0’. The ‘LC_NUMERIC’ locale specifies the
1 decimal-point character and thousands separator. By default a
1 blank is a space or a tab, but the ‘LC_CTYPE’ locale can change
1 this.
1
1 Comparison is exact; there is no rounding error.
1
1 Neither a leading ‘+’ nor exponential notation is recognized. To
1 compare such strings numerically, use the ‘--general-numeric-sort’
1 (‘-g’) option.
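The difference between ‘-n’ and ‘-g’ can be sketched with illustrative values that use a leading ‘+’ and exponential notation.

```shell
# Hypothetical input mixing a leading '+' and exponential notation.
input=$(printf '1e3\n20\n+5\n')

# -n reads "+5" as an empty number (0) and "1e3" as 1.
numeric=$(printf '%s\n' "$input" | LC_ALL=C sort -n)

# -g converts each prefix to floating point: 5, 20, 1000.
general=$(printf '%s\n' "$input" | LC_ALL=C sort -g)

printf -- '-n:\n%s\n-g:\n%s\n' "$numeric" "$general"
```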
1
1 ‘-V’
1 ‘--version-sort’
1 Sort by version name and number. It behaves like a standard sort,
1 except that each sequence of decimal digits is treated numerically
1 as an index/version number. (⇒Details about version sort.)
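A minimal sketch contrasting version sort with the default comparison; the file names are hypothetical.

```shell
input=$(printf 'file10\nfile2\nfile1\n')

# Default comparison is character by character, so "file10" precedes "file2".
plain=$(printf '%s\n' "$input" | LC_ALL=C sort)

# -V treats each digit sequence as a number, so 2 precedes 10.
by_version=$(printf '%s\n' "$input" | LC_ALL=C sort -V)

printf 'plain:\n%s\nversion:\n%s\n' "$plain" "$by_version"
```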
1
1 ‘-r’
1 ‘--reverse’
1 Reverse the result of comparison, so that lines with greater key
1 values appear earlier in the output instead of later.
1
1 ‘-R’
1 ‘--random-sort’
1 ‘--sort=random’
1 Sort by hashing the input keys and then sorting the hash values.
1 Choose the hash function at random, ensuring that it is free of
1 collisions so that differing keys have differing hash values. This
1 is like a random permutation of the inputs (⇒shuf invocation),
1 except that keys with the same value sort together.
1
1 If multiple random sort fields are specified, the same random hash
1 function is used for all fields. To use different random hash
1 functions for different fields, you can invoke ‘sort’ more than
1 once.
1
1 The choice of hash function is affected by the ‘--random-source’
1 option.
1
1 Other options are:
1
1 ‘--compress-program=PROG’
1 Compress any temporary files with the program PROG.
1
1 With no arguments, PROG must compress standard input to standard
1 output, and when given the ‘-d’ option it must decompress standard
1 input to standard output.
1
1 Terminate with an error if PROG exits with nonzero status.
1
1 White space and the backslash character should not appear in PROG;
1 they are reserved for future use.
1
1 ‘--files0-from=FILE’
1 Disallow processing files named on the command line, and instead
1 process those named in file FILE; each name being terminated by a
1 zero byte (ASCII NUL). This is useful when the list of file names
1 is so long that it may exceed a command line length limitation. In
1 such cases, running ‘sort’ via ‘xargs’ is undesirable because it
1 splits the list into pieces and makes ‘sort’ print sorted output
1 for each sublist rather than for the entire list. One way to
1 produce a list of ASCII NUL terminated file names is with GNU
1 ‘find’, using its ‘-print0’ predicate. If FILE is ‘-’ then the
1 ASCII NUL terminated file names are read from standard input.
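One way to combine ‘find -print0’ with ‘--files0-from’ is sketched here. The scratch directory and its contents are assumptions made for the example.

```shell
# Hypothetical scratch directory holding two small input files.
workdir=$(mktemp -d)
printf 'pear\napple\n' > "$workdir/one.txt"
printf 'fig\n' > "$workdir/two.txt"

# find emits NUL-terminated names; sort reads that list from standard
# input and sorts the contents of all named files as one group.
combined=$(find "$workdir" -type f -print0 | LC_ALL=C sort --files0-from=-)
printf '%s\n' "$combined"

rm -r "$workdir"
```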
1
1 ‘-k POS1[,POS2]’
1 ‘--key=POS1[,POS2]’
1 Specify a sort field that consists of the part of the line between
1 POS1 and POS2 (or the end of the line, if POS2 is omitted),
1 _inclusive_.
1
1 In its simplest form POS specifies a field number (starting with
1 1), with fields being separated by runs of blank characters, and by
1 default those blanks being included in the comparison at the start
1 of each field. To adjust the handling of blank characters see the
1 ‘-b’ and ‘-t’ options.
1
1 More generally, each POS has the form ‘F[.C][OPTS]’, where F is the
1 number of the field to use, and C is the number of the first
1 character from the beginning of the field. Fields and character
1 positions are numbered starting with 1; a character position of
1 zero in POS2 indicates the field’s last character. If ‘.C’ is
1 omitted from POS1, it defaults to 1 (the beginning of the field);
1 if omitted from POS2, it defaults to 0 (the end of the field).
1 OPTS are ordering options, allowing individual keys to be sorted
1 according to different rules; see below for details. Keys can span
1 multiple fields.
1
1 Example: To sort on the second field, use ‘--key=2,2’ (‘-k 2,2’).
1 See below for more notes on keys and more examples. See also the
1 ‘--debug’ option to help determine the part of the line being used
1 in the sort.
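The field and character forms of POS can be compared in a short sketch. The two input lines are made up so that the two keys give opposite orders.

```shell
input=$(printf 'ab 1\nba 2\n')

# Key is field 2 only: " 1" sorts before " 2".
by_field=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2,2)

# Key is the second character of field 1: "a" (from "ba") before "b".
by_char=$(printf '%s\n' "$input" | LC_ALL=C sort -k 1.2,1.2)

printf 'by field:\n%s\nby character:\n%s\n' "$by_field" "$by_char"
```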
1
1 ‘--debug’
1 Highlight the portion of each line used for sorting. Also issue
1 warnings about questionable usage to stderr.
1
1 ‘--batch-size=NMERGE’
1 Merge at most NMERGE inputs at once.
1
1 When ‘sort’ has to merge more than NMERGE inputs, it merges them in
1 groups of NMERGE, saving the result in a temporary file, which is
1 then used as an input in a subsequent merge.
1
1 A large value of NMERGE may improve merge performance and decrease
1 temporary storage utilization at the expense of increased memory
1 usage and I/O. Conversely a small value of NMERGE may reduce
1 memory requirements and I/O at the expense of temporary storage
1 consumption and merge performance.
1
1 The value of NMERGE must be at least 2. The default value is
1 currently 16, but this is implementation-dependent and may change
1 in the future.
1
1 The value of NMERGE may be bounded by a resource limit for open
1 file descriptors. The commands ‘ulimit -n’ or ‘getconf OPEN_MAX’
1 may display limits for your system; these limits may be modified
1 further if your program already has some files open, or if the
1 operating system has other limits on the number of open files. If
1 the value of NMERGE exceeds the resource limit, ‘sort’ silently
1 uses a smaller value.
1
1 ‘-o OUTPUT-FILE’
1 ‘--output=OUTPUT-FILE’
1 Write output to OUTPUT-FILE instead of standard output. Normally,
1 ‘sort’ reads all input before opening OUTPUT-FILE, so you can sort
1 a file in place by using commands like ‘sort -o F F’ and ‘cat F |
1 sort -o F’. However, it is often safer to output to an
1 otherwise-unused file, as data may be lost if the system crashes or
1 ‘sort’ encounters an I/O or other serious error while a file is
1 being sorted in place. Also, ‘sort’ with ‘--merge’ (‘-m’) can open
1 the output file before reading all input, so a command like ‘cat F
1 | sort -m -o F - G’ is not safe as ‘sort’ might start writing ‘F’
1 before ‘cat’ is done reading it.
1
1 On newer systems, ‘-o’ cannot appear after an input file if
1 ‘POSIXLY_CORRECT’ is set, e.g., ‘sort F -o F’. Portable scripts
1 should specify ‘-o OUTPUT-FILE’ before any input files.
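Sorting a file in place with ‘-o’ might look like the following sketch. The scratch file is hypothetical, and ‘-o’ is placed before the input file as recommended for portable scripts.

```shell
# Hypothetical scratch file with unsorted contents.
scratch=$(mktemp)
printf 'c\na\nb\n' > "$scratch"

# sort reads all of its input before opening the output file, so the
# input file can also be named as the output.
LC_ALL=C sort -o "$scratch" "$scratch"

in_place=$(cat "$scratch")
printf '%s\n' "$in_place"

rm "$scratch"
```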
1
1 ‘--random-source=FILE’
1 Use FILE as a source of random data used to determine which random
1 hash function to use with the ‘-R’ option. ⇒Random sources.
1
1 ‘-s’
1 ‘--stable’
1
1 Make ‘sort’ stable by disabling its last-resort comparison. This
1 option has no effect if no fields or global ordering options other
1 than ‘--reverse’ (‘-r’) are specified.
1
1 ‘-S SIZE’
1 ‘--buffer-size=SIZE’
1 Use a main-memory sort buffer of the given SIZE. By default, SIZE
1 is in units of 1024 bytes. Appending ‘%’ causes SIZE to be
1 interpreted as a percentage of physical memory. Appending ‘K’
1 multiplies SIZE by 1024 (the default), ‘M’ by 1,048,576, ‘G’ by
1 1,073,741,824, and so on for ‘T’, ‘P’, ‘E’, ‘Z’, and ‘Y’.
1 Appending ‘b’ causes SIZE to be interpreted as a byte count, with
1 no multiplication.
1
1 This option can improve the performance of ‘sort’ by causing it to
1 start with a larger or smaller sort buffer than the default.
1 However, this option affects only the initial buffer size. The
1 buffer grows beyond SIZE if ‘sort’ encounters input lines larger
1 than SIZE.
1
1 ‘-t SEPARATOR’
1 ‘--field-separator=SEPARATOR’
1 Use character SEPARATOR as the field separator when finding the
1 sort keys in each line. By default, fields are separated by the
1 empty string between a non-blank character and a blank character.
1 By default a blank is a space or a tab, but the ‘LC_CTYPE’ locale
1 can change this.
1
1 That is, given the input line ‘ foo bar’, ‘sort’ breaks it into
1 fields ‘ foo’ and ‘ bar’. The field separator is not considered to
1 be part of either the field preceding or the field following, so
1 with ‘sort -t " "’ the same input line has three fields: an empty
1 field, ‘foo’, and ‘bar’. However, fields that extend to the end of
1 the line, as ‘-k 2’, or fields consisting of a range, as ‘-k 2,3’,
1 retain the field separators present between the endpoints of the
1 range.
1
1 To specify ASCII NUL as the field separator, use the two-character
1 string ‘\0’, e.g., ‘sort -t '\0'’.
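The two field-splitting rules can give different results for the same key specification. This sketch uses made-up input in which one line begins with a space.

```shell
# Hypothetical input; the first line starts with a space.
input=$(printf ' a z\nb c\n')

# Default splitting: field 2 is " z" and " c" (blanks included),
# so "b c" comes first.
default_split=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2,2)

# With -t ' ' the first line has an empty first field, so its second
# field is "a", which sorts before "c".
space_split=$(printf '%s\n' "$input" | LC_ALL=C sort -t ' ' -k 2,2)

printf 'default:\n%s\nwith -t:\n%s\n' "$default_split" "$space_split"
```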
1
1 ‘-T TEMPDIR’
1 ‘--temporary-directory=TEMPDIR’
1 Use directory TEMPDIR to store temporary files, overriding the
1 ‘TMPDIR’ environment variable. If this option is given more than
1 once, temporary files are stored in all the directories given. If
1 you have a large sort or merge that is I/O-bound, you can often
1 improve performance by using this option to specify directories on
1 different disks and controllers.
1
1 ‘--parallel=N’
1 Set the number of sorts run in parallel to N. By default, N is set
1 to the number of available processors, but limited to 8, as there
1 are diminishing performance gains after that. Note also that using
1 N threads increases the memory usage by a factor of log N. Also
1 see ⇒nproc invocation.
1
1 ‘-u’
1 ‘--unique’
1
1 Normally, output only the first of a sequence of lines that compare
1 equal. For the ‘--check’ (‘-c’ or ‘-C’) option, check that no pair
1 of consecutive lines compares equal.
1
1 This option also disables the default last-resort comparison.
1
1 The commands ‘sort -u’ and ‘sort | uniq’ are equivalent, but this
1 equivalence does not extend to arbitrary ‘sort’ options. For
1 example, ‘sort -n -u’ inspects only the value of the initial
1 numeric string when checking for uniqueness, whereas ‘sort -n |
1 uniq’ inspects the entire line. ⇒uniq invocation.
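The difference between ‘sort -n -u’ and ‘sort -n | uniq’ can be sketched by counting output lines. The input is illustrative: two lines share the initial number 1 but differ afterwards.

```shell
input=$(printf '1 a\n1 b\n2 c\n')

# -u compares only the initial numeric string, so "1 a" and "1 b"
# count as duplicates and one of them is dropped.
unique_count=$(printf '%s\n' "$input" | LC_ALL=C sort -n -u | wc -l)

# uniq compares entire lines, so all three lines survive.
piped_count=$(printf '%s\n' "$input" | LC_ALL=C sort -n | uniq | wc -l)

echo "sort -n -u: $unique_count lines; sort -n | uniq: $piped_count lines"
```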
1
1 ‘-z’
1 ‘--zero-terminated’
1 Delimit items with a zero byte rather than a newline (ASCII LF).
1 I.e., treat input as items separated by ASCII NUL and terminate
1 output items with ASCII NUL. This option can be useful in
1 conjunction with ‘perl -0’ or ‘find -print0’ and ‘xargs -0’ which
1 do the same in order to reliably handle arbitrary file names (even
1 those containing blanks or other special characters).
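A minimal sketch of zero-terminated sorting. The items are made up, and `tr` is used only to make the result readable on a terminal.

```shell
# Two NUL-terminated items, one of which contains a blank.
items=$(printf 'b c\0a\0' | LC_ALL=C sort -z | tr '\0' '\n')
printf '%s\n' "$items"
```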
1
1 Historical (BSD and System V) implementations of ‘sort’ have differed
1 in their interpretation of some options, particularly ‘-b’, ‘-f’, and
1 ‘-n’. GNU sort follows the POSIX behavior, which is usually (but not
1 always!) like the System V behavior. According to POSIX, ‘-n’ no
1 longer implies ‘-b’. For consistency, ‘-M’ has been changed in the same
1 way. This may affect the meaning of character positions in field
1 specifications in obscure cases. The only fix is to add an explicit
1 ‘-b’.
1
1 A position in a sort field specified with ‘-k’ may have any of the
1 option letters ‘MbdfghinRrV’ appended to it, in which case no global
1 ordering options are inherited by that particular field. The ‘-b’
1 option may be independently attached to either or both of the start and
1 end positions of a field specification, and if it is inherited from the
1 global options it will be attached to both. If input lines can contain
1 leading or adjacent blanks and ‘-t’ is not used, then ‘-k’ is typically
1 combined with ‘-b’ or an option that implicitly ignores leading blanks
1 (‘Mghn’) as otherwise the varying numbers of leading blanks in fields
1 can cause confusing results.
1
1 If the start position in a sort field specifier falls after the end
1 of the line or after the end field, the field is empty. If the ‘-b’
1 option was specified, the ‘.C’ part of a field specification is counted
1 from the first nonblank character of the field.
1
1 On systems not conforming to POSIX 1003.1-2001, ‘sort’ supports a
1 traditional origin-zero syntax ‘+POS1 [-POS2]’ for specifying sort keys.
1 The traditional command ‘sort +A.X -B.Y’ is equivalent to ‘sort -k
1 A+1.X+1,B’ if Y is ‘0’ or absent, otherwise it is equivalent to ‘sort -k
1 A+1.X+1,B+1.Y’.
1
1 This traditional behavior can be controlled with the
1 ‘_POSIX2_VERSION’ environment variable (⇒Standards conformance);
1 it can also be enabled when ‘POSIXLY_CORRECT’ is not set by using the
1 traditional syntax with ‘-POS2’ present.
1
1 Scripts intended for use on standard hosts should avoid traditional
1 syntax and should use ‘-k’ instead. For example, avoid ‘sort +2’, since
1 it might be interpreted as either ‘sort ./+2’ or ‘sort -k 3’. If your
1 script must also run on hosts that support only the traditional syntax,
1 it can use a test like ‘if sort -k 1 </dev/null >/dev/null 2>&1; then
1 ...’ to decide which syntax to use.
1
1 Here are some examples to illustrate various combinations of options.
1
1 • Sort in descending (reverse) numeric order.
1
1 sort -n -r
1
1 • Run no more than 4 sorts concurrently, using a buffer size of 10M.
1
1 sort --parallel=4 -S 10M
1
1 • Sort alphabetically, omitting the first and second fields and the
1 blanks at the start of the third field. This uses a single key
1 composed of the characters beginning at the start of the first
1 nonblank character in field three and extending to the end of each
1 line.
1
1 sort -k 3b
1
1 • Sort numerically on the second field and resolve ties by sorting
1 alphabetically on the third and fourth characters of field five.
1 Use ‘:’ as the field delimiter.
1
1 sort -t : -k 2,2n -k 5.3,5.4
1
1 Note that if you had written ‘-k 2n’ instead of ‘-k 2,2n’, ‘sort’
1 would have used all characters beginning in the second field and
1 extending to the end of the line as the primary _numeric_ key. For
1 the large majority of applications, treating keys spanning more
1 than one field as numeric will not do what you expect.
1
1 Also note that the ‘n’ modifier was applied to the field-end
1 specifier for the first key. It would have been equivalent to
1 specify ‘-k 2n,2’ or ‘-k 2n,2n’. All modifiers except ‘b’ apply to
1 the associated _field_, regardless of whether the modifier
1 character is attached to the field-start and/or the field-end part
1 of the key specifier.
1
1 • Sort the password file on the fifth field and ignore any leading
1 blanks. Sort lines with equal values in field five on the numeric
1 user ID in field three. Fields are separated by ‘:’.
1
1 sort -t : -k 5b,5 -k 3,3n /etc/passwd
1 sort -t : -n -k 5b,5 -k 3,3 /etc/passwd
1 sort -t : -b -k 5,5 -k 3,3n /etc/passwd
1
1 These three commands have equivalent effect. The first specifies
1 that the first key’s start position ignores leading blanks and the
1 second key is sorted numerically. The other two commands rely on
1 global options being inherited by sort keys that lack modifiers.
1 The inheritance works in this case because ‘-k 5b,5b’ and ‘-k 5b,5’
1 are equivalent, as the location of a field-end lacking a ‘.C’
1 character position is not affected by whether initial blanks are
1 skipped.
1
1 • Sort a set of log files, primarily by IPv4 address and secondarily
1 by timestamp. If two lines’ primary and secondary keys are
1 identical, output the lines in the same order that they were input.
1 The log files contain lines that look like this:
1
1 4.150.156.3 - - [01/Apr/2004:06:31:51 +0000] message 1
1 211.24.3.231 - - [24/Apr/2004:20:17:39 +0000] message 2
1
1 Fields are separated by exactly one space. Sort IPv4 addresses
1 lexicographically, e.g., 212.61.52.2 sorts before 212.129.233.201
1 because 61 is less than 129.
1
1 sort -s -t ' ' -k 4.9n -k 4.5M -k 4.2n -k 4.14,4.21 file*.log |
1 sort -s -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n
1
1 This example cannot be done with a single ‘sort’ invocation, since
1 IPv4 address components are separated by ‘.’ while dates come just
1 after a space. So it is broken down into two invocations of
1 ‘sort’: the first sorts by timestamp and the second by IPv4
1 address. The timestamp is sorted by year, then month, then day,
1 and finally by hour-minute-second field, using ‘-k’ to isolate each
1 field. Except for hour-minute-second there’s no need to specify
1 the end of each key field, since the ‘n’ and ‘M’ modifiers sort
1 based on leading prefixes that cannot cross field boundaries. The
1 IPv4 addresses are sorted lexicographically. The second sort uses
1 ‘-s’ so that ties in the primary key are broken by the secondary
1 key; the first sort uses ‘-s’ so that the combination of the two
1 sorts is stable.
1
1 • Generate a tags file in case-insensitive sorted order.
1
1 find src -type f -print0 | sort -z -f | xargs -0 etags --append
1
1 The use of ‘-print0’, ‘-z’, and ‘-0’ in this case means that file
1 names that contain blanks or other special characters are not
1 broken up by the sort operation.
1
1 • Use the common DSU (Decorate Sort Undecorate) idiom to sort lines
1 according to their length.
1
1 awk '{print length, $0}' /etc/passwd | sort -n | cut -f2- -d' '
1
1 In general this technique can be used to sort data that the ‘sort’
1 command does not support, or is inefficient at, sorting directly.
1
1 • Shuffle a list of directories, but preserve the order of files
1 within each directory. For instance, one could use this to
1 generate a music playlist in which albums are shuffled but the
1 songs of each album are played in order.
1
1 ls */* | sort -t / -k 1,1R -k 2,2
1
1 ---------- Footnotes ----------
1
1 (1) If you use a non-POSIX locale (e.g., by setting ‘LC_ALL’ to
1 ‘en_US’), then ‘sort’ may produce output that is sorted differently than
1 you’re accustomed to. In that case, set the ‘LC_ALL’ environment
1 variable to ‘C’. Note that setting only ‘LC_COLLATE’ has two problems.
1 First, it is ineffective if ‘LC_ALL’ is also set. Second, it has
1 undefined behavior if ‘LC_CTYPE’ (or ‘LANG’, if ‘LC_CTYPE’ is unset) is
1 set to an incompatible value. For example, you get undefined behavior
1 if ‘LC_CTYPE’ is ‘ja_JP.PCK’ but ‘LC_COLLATE’ is ‘en_US.UTF-8’.
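The effect of setting ‘LC_ALL’ to ‘C’ can be sketched as follows. The input is illustrative; in the C locale comparison is by byte value, so uppercase ASCII letters sort before lowercase ones. Output in other locales varies and is not shown.

```shell
# Byte-value ordering: "A" and "B" precede "a" and "b".
c_order=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort)
printf '%s\n' "$c_order"
```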
1