coreutils: split invocation

5.3 ‘split’: Split a file into pieces.
======================================

‘split’ creates output files containing consecutive or interleaved
sections of INPUT (standard input if none is given or INPUT is ‘-’).
Synopsis:

     split [OPTION]... [INPUT [PREFIX]]

By default, ‘split’ puts 1000 lines of INPUT (or whatever is left
over for the last section) into each output file.
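
As a small illustration of the default behavior, the following sketch
splits a 2500-line file with no options at all. The file name
‘input.txt’ and the scratch directory are arbitrary choices made for
this example.

```shell
# Work in a scratch directory so the generated files are easy to inspect.
cd "$(mktemp -d)" || exit 1

seq 2500 > input.txt     # 2500 lines of sample input (hypothetical file name)
split input.txt          # no options: 1000 lines per file, default prefix 'x'

wc -l x??                # xaa and xab hold 1000 lines each, xac the remaining 500
```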

The output files’ names consist of PREFIX (‘x’ by default) followed
by a group of characters (‘aa’, ‘ab’, ... by default), such that
concatenating the output files in traditional sorted order by file name
produces the original input file (except with ‘-nr/N’). By default,
‘split’ initially creates files with two generated suffix characters
and increases this width by two when the next most significant position
reaches the last character (‘yz’, ‘zaaa’, ‘zaab’, ...). In this way
an arbitrary number of output files is supported, and the names sort as
described above even in the presence of an ‘--additional-suffix’
option. If the ‘-a’ option is specified and the output file names are
exhausted, ‘split’ reports an error without deleting the output files
that it did create.
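
The round trip described above can be checked with a sketch like the
following. The prefix ‘part-’ and the file names are assumptions made
for the example, not names that ‘split’ requires.

```shell
cd "$(mktemp -d)" || exit 1

seq 5000 > original.txt
split -l 700 original.txt part-    # creates part-aa through part-ah

# The shell expands the glob in sorted order, so the pieces are
# concatenated in the same sequence in which they were written.
cat part-* > rebuilt.txt
cmp original.txt rebuilt.txt && echo "identical"
```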

The program accepts the following options. Also see Common options.

‘-l LINES’
‘--lines=LINES’
Put LINES lines of INPUT into each output file. If ‘--separator’
is specified, then LINES determines the number of records.

For compatibility, ‘split’ also supports an obsolete option syntax
‘-LINES’. New scripts should use ‘-l LINES’ instead.
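
A minimal sketch of ‘-l’, reading from standard input; the prefix
‘chunk-’ is an arbitrary choice for the example.

```shell
cd "$(mktemp -d)" || exit 1

# Seven input lines, three per output file: the last file gets the remainder.
seq 7 | split -l 3 - chunk-

head chunk-*     # chunk-aa: 1 2 3, chunk-ab: 4 5 6, chunk-ac: 7
```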

‘-b SIZE’
‘--bytes=SIZE’
Put SIZE bytes of INPUT into each output file. SIZE is an integer
optionally followed by one of the following multiplicative suffixes;
a suffix given on its own stands for a count of 1:
     ‘b’   =>  512 ("blocks")
     ‘KB’  =>  1000 (KiloBytes)
     ‘K’   =>  1024 (KibiBytes)
     ‘MB’  =>  1000*1000 (MegaBytes)
     ‘M’   =>  1024*1024 (MebiBytes)
     ‘GB’  =>  1000*1000*1000 (GigaBytes)
     ‘G’   =>  1024*1024*1024 (GibiBytes)
and so on for ‘T’, ‘P’, ‘E’, ‘Z’, and ‘Y’.
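
The difference between the power-of-1024 and power-of-1000 suffixes
can be seen with a sketch like this one, which splits the same
3000-byte file twice. The file name ‘blob’ and the prefixes are
assumptions made for the example.

```shell
cd "$(mktemp -d)" || exit 1

head -c 3000 /dev/zero > blob    # 3000 bytes of sample data

split -b 1K  blob bin-           # 1024-byte pieces: 1024 + 1024 + 952
split -b 1KB blob dec-           # 1000-byte pieces: 1000 + 1000 + 1000

wc -c bin-* dec-*
```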

‘-C SIZE’
‘--line-bytes=SIZE’
Put into each output file as many complete lines of INPUT as
possible without exceeding SIZE bytes. Individual lines or records
longer than SIZE bytes are broken into multiple files. SIZE has
the same format as for the ‘--bytes’ option. If ‘--separator’ is
specified, then complete records, rather than lines, are kept together.
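
To illustrate, the sketch below splits three 5-byte lines with a
12-byte limit, so two complete lines fit in the first file and a third
would not. The file name and prefix are hypothetical.

```shell
cd "$(mktemp -d)" || exit 1

printf 'aaaa\nbbbb\ncccc\n' > lines.txt   # three lines of 5 bytes each

# Two lines (10 bytes) fit within 12 bytes; a third (15 bytes) would not.
split -C 12 lines.txt lb-

wc -c lb-*     # lb-aa holds 10 bytes, lb-ab holds 5
```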

‘--filter=COMMAND’
With this option, rather than simply writing to each output file,
write through a pipe to the specified shell COMMAND for each output
file. COMMAND should use the $FILE environment variable, which is
set to a different output file name for each invocation of the
command. For example, imagine that you have a 1TiB compressed file
that, if uncompressed, would be too large to reside on disk, yet
you must split it into individually-compressed pieces of a more
manageable size. To do that, you might run this command:

     xz -dc BIG.xz | split -b200G --filter='xz > $FILE.xz' - big-

Assuming a 10:1 compression ratio, that would create about fifty
20GiB files with names ‘big-aa.xz’, ‘big-ab.xz’, ‘big-ac.xz’, etc.
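
The same technique can be tried at a much smaller scale. This sketch
substitutes ‘gzip’ for ‘xz’ and uses ten lines of input; both choices,
and the prefix ‘part-’, are assumptions made only to keep the example
quick to run.

```shell
cd "$(mktemp -d)" || exit 1

# The single quotes keep $FILE unexpanded here, so that the shell started
# by split for each piece sees the name chosen for that piece.
seq 10 | split -l 4 --filter='gzip > "$FILE.gz"' - part-

ls part-*              # part-aa.gz  part-ab.gz  part-ac.gz
gzip -dc part-ac.gz    # the last piece holds the lines 9 and 10
```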

‘-n CHUNKS’
‘--number=CHUNKS’
Split INPUT into CHUNKS output files, where CHUNKS may be:

     N      generate N files based on current size of INPUT
     K/N    only output Kth of N to stdout
     l/N    generate N files without splitting lines or records
     l/K/N  likewise but only output Kth of N to stdout
     r/N    like ‘l’ but use round robin distribution
     r/K/N  likewise but only output Kth of N to stdout

Any excess bytes remaining after dividing the INPUT into N chunks
are assigned to the last chunk. Any excess bytes appearing after
the initial calculation are discarded (except when using ‘r’ mode).

All N files are created even if there are fewer than N lines, or
the INPUT is truncated.

For ‘l’ mode, chunks are approximately INPUT size / N. The INPUT
is partitioned into N equal sized portions, with the last assigned
any excess. If a line _starts_ within a partition it is written
completely to the corresponding file. Since lines or records are
not split even if they overlap a partition, the files written can
be larger or smaller than the partition size, and even empty if a
line/record is so long as to completely overlap the partition.

For ‘r’ mode, the size of INPUT is irrelevant, and so INPUT can be a
pipe, for example.
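
The K/N forms write a single chunk to standard output and create no
files. The sketch below uses inputs whose size divides evenly by N, so
that the placement of any excess bytes does not affect the result; the
file names are arbitrary.

```shell
cd "$(mktemp -d)" || exit 1

printf 'abcdefghi' > bytes.txt   # 9 bytes, three chunks of 3 bytes
split -n 2/3 bytes.txt           # prints the middle third: def
echo                             # bytes.txt has no trailing newline

seq -w 6 10 > k                  # 15 bytes: 06 07 08 09 10, one per line
split -n l/2/3 k                 # second of three line-aligned chunks: 08 09
split -n r/1/3 k                 # first of three round-robin chunks:  06 09
```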

‘-a LENGTH’
‘--suffix-length=LENGTH’
Use suffixes of length LENGTH. A LENGTH of 0 is the same as if (any
previous) ‘-a’ was not specified, and thus enables the default
behavior, which starts the suffix length at 2 and, unless ‘-n’ or
‘--numeric-suffixes=FROM’ is specified, automatically increases the
length by 2 as required.
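
The following sketch shows a fixed suffix length, and also what
happens when a fixed length leaves too few names for the input. The
prefixes ‘wide-’ and ‘n’ are arbitrary.

```shell
cd "$(mktemp -d)" || exit 1

seq 3 | split -a 4 -l 1 - wide-    # wide-aaaa, wide-aaab, wide-aaac
ls wide-*

# With -a 1 only 26 names exist (na through nz), so the 27th piece cannot
# be written. split exits with a nonzero status and keeps what it created.
if ! seq 30 | split -a 1 -l 1 - n 2>/dev/null; then
  echo "split failed after creating $(ls n? | wc -l) files"
fi
```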

‘-d’
‘--numeric-suffixes[=FROM]’
Use digits in suffixes rather than lower-case letters. The
numerical suffix counts from FROM if specified, 0 otherwise.

FROM is supported only with the long form of the option. It is used
either to set the initial suffix for a single run, or to set the
suffix offset for independently split inputs; consequently, the auto
suffix length expansion described above is disabled. Therefore you
may also want to use option ‘-a’ to allow suffixes beyond ‘99’.
Note that if option ‘--number’ is specified and the number of files is
less than FROM, a single run is assumed and the minimum suffix
length required is automatically determined.
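
A short sketch of both forms follows; the prefixes ‘a-’ and ‘b-’ are
assumptions made for the example.

```shell
cd "$(mktemp -d)" || exit 1

seq 6 | split -l 2 -d - a-                      # a-00, a-01, a-02
seq 6 | split -l 2 --numeric-suffixes=5 - b-    # b-05, b-06, b-07

ls a-* b-*
```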

‘-x’
‘--hex-suffixes[=FROM]’
Like ‘--numeric-suffixes’, but use hexadecimal numbers (in lower
case).
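
For instance, twelve one-line pieces are enough to reach the first
non-decimal digits in this sketch (the prefix ‘h-’ is arbitrary):

```shell
cd "$(mktemp -d)" || exit 1

# Suffixes run 00 through 09 and then continue with 0a and 0b.
seq 12 | split -l 1 -x - h-

ls h-*
```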

‘--additional-suffix=SUFFIX’
Append an additional SUFFIX to output file names. SUFFIX must not
contain a slash.
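
A minimal sketch, assuming the pieces should carry a ‘.txt’ extension:

```shell
cd "$(mktemp -d)" || exit 1

# The generated suffix (aa, ab, ...) comes first, then the additional one.
seq 4 | split -l 2 --additional-suffix=.txt - part-

ls part-*      # part-aa.txt  part-ab.txt
```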

‘-e’
‘--elide-empty-files’
Suppress the generation of zero-length output files. These can
occur with the ‘--number’ option if a file is (truncated to be)
shorter than the number requested, or if a line is so long as to
completely span a chunk. The output file sequence numbers always
run consecutively, even when this option is specified.
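
The effect can be observed by asking for more line-aligned chunks than
the input has lines. In this sketch the directory names ‘keep’ and
‘elide’ are arbitrary and serve only to separate the two runs.

```shell
cd "$(mktemp -d)" || exit 1

printf 'a\nb\n' > two-lines.txt
mkdir keep elide

split    -n l/4 two-lines.txt keep/x     # all 4 files exist, some are empty
split -e -n l/4 two-lines.txt elide/x    # zero-length files are not created

ls -l keep elide
```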

‘-t SEPARATOR’
‘--separator=SEPARATOR’
Use character SEPARATOR as the record separator instead of the
default newline character (ASCII LF). To specify ASCII NUL as the
separator, use the two-character string ‘\0’, e.g., ‘split -t '\0'’.
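
As a sketch, the commands below split colon-terminated and
NUL-terminated records; the sample data and the prefixes are
assumptions made for the example.

```shell
cd "$(mktemp -d)" || exit 1

# Two records per file, with ':' ending each record.
printf 'a:b:c:d:' | split -t : -l 2 - rec-
cat rec-aa; echo       # a:b:
cat rec-ab; echo       # c:d:

# One NUL-terminated record per file.
printf 'one\0two\0three\0' | split -t '\0' -l 1 - nul-
tr '\0' '\n' < nul-ab  # two
```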

‘-u’
‘--unbuffered’
Immediately copy input to output in ‘--number r/...’ mode, which is
a much slower mode of operation.

‘--verbose’
Write a diagnostic just before each output file is opened.
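
A small sketch of ‘--verbose’ follows. The exact wording and quoting
of the diagnostic may differ between versions, so the example captures
both output streams and only displays them; the prefix ‘v-’ is
arbitrary.

```shell
cd "$(mktemp -d)" || exit 1

# One diagnostic is expected per output file, each naming the file.
log=$(seq 4 | split --verbose -l 2 - v- 2>&1)
printf '%s\n' "$log"
```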

An exit status of zero indicates success, and a nonzero value
indicates failure.

Here are a few examples to illustrate how the ‘--number’ (‘-n’)
option works:

Notice how, by default, one line may be split across two or more
files:

     $ seq -w 6 10 > k; split -n3 k; head xa?
     ==> xaa <==
     06
     07
     ==> xab <==

     08
     0
     ==> xac <==
     9
     10

Use the "l/" modifier to suppress that:

     $ seq -w 6 10 > k; split -nl/3 k; head xa?
     ==> xaa <==
     06
     07

     ==> xab <==
     08
     09

     ==> xac <==
     10

Use the "r/" modifier to distribute lines in a round-robin fashion:

     $ seq -w 6 10 > k; split -nr/3 k; head xa?
     ==> xaa <==
     06
     09

     ==> xab <==
     07
     10

     ==> xac <==
     08

You can also extract just the Kth chunk. This extracts and prints
just the 7th "chunk" of 33:

     $ seq 100 > k; split -nl/7/33 k
     20
     21
     22
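
The result of the last example can be traced by hand using the
partitioning rule given for ‘l’ mode: the 292 bytes produced by
‘seq 100’ divide into 33 partitions of 8 bytes, so the 7th partition
covers byte offsets 48 through 55, and the lines 20, 21 and 22 begin at
offsets 48, 51 and 54. The sketch below repeats that arithmetic in
‘awk’. It is a hypothetical re-creation of the rule as described in
this section, not the code that ‘split’ itself uses.

```shell
# Hypothetical re-creation of 'split -n l/7/33' applied to 'seq 100',
# following the rule "a line belongs to the partition in which it starts".
chunk=$(seq 100 | awk -v k=7 -v n=33 '
  BEGIN { total = 0 }
  { start[NR] = total; text[NR] = $0; total += length($0) + 1 }  # +1 counts the newline
  END {
    size = int(total / n)               # 292 bytes / 33 = 8 bytes per partition
    lo = (k - 1) * size                 # partition 7 begins at offset 48
    hi = (k == n) ? total : k * size    # and ends just before offset 56
    for (i = 1; i <= NR; i++)
      if (start[i] >= lo && start[i] < hi) print text[i]
  }')
printf '%s\n' "$chunk"                  # prints 20, 21 and 22, one per line
```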