Info: (coreutils) split invocation

5.3 ‘split’: Split a file into pieces.
======================================

‘split’ creates output files containing consecutive or interleaved
sections of INPUT (standard input if none is given or INPUT is ‘-’).
Synopsis:

     split [OPTION] [INPUT [PREFIX]]

   By default, ‘split’ puts 1000 lines of INPUT (or whatever is left
over for the last section) into each output file.
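   A minimal sketch of the default behavior, assuming GNU coreutils
‘split’, ‘seq’ and ‘wc’; the scratch directory and the name
‘input.txt’ are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils; input.txt is an arbitrary name.
set -e
cd "$(mktemp -d)"      # scratch directory for the generated pieces

seq 2500 > input.txt   # 2500 lines of input
split input.txt        # default: 1000 lines per piece, prefix 'x'

wc -l xaa xab xac      # 1000, 1000, and the 500 left over
```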

   The output files’ names consist of PREFIX (‘x’ by default) followed
by a group of characters (‘aa’, ‘ab’, ... by default), such that
concatenating the output files in traditional sorted order by file name
produces the original input file (except with ‘-nr/N’).  By default
‘split’ initially creates files with two generated suffix characters,
and increases this width by two when the next most significant position
reaches the last character (‘yz’, ‘zaaa’, ‘zaab’, ...).  In this way an
arbitrary number of output files is supported, and the names sort as
described above, even in the presence of an ‘--additional-suffix’
option.  If the ‘-a’ option is specified and the output file names are
exhausted, ‘split’ reports an error without deleting the output files
that it did create.
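   The following sketch, assuming GNU coreutils, illustrates the suffix
scheme: 700 one-line pieces run through the 650 two-character suffixes
‘aa’ ... ‘yz’ (25 leading letters times 26) before widening to ‘zaaa’,
and a fixed ‘-a 1’ shows the exhaustion error.  The prefix ‘one-’ is a
hypothetical name chosen for the demo.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils; 'one-' is a hypothetical prefix.
set -e
cd "$(mktemp -d)"

seq 700 > input.txt
split -l 1 input.txt         # 700 one-line pieces

ls x* | wc -l                # 700
ls xyz xzaaa                 # last 2-character name, then the widened one
cat x* | cmp - input.txt     # sorted glob order reproduces the input

# A fixed suffix length of 1 allows only 26 names (a..z).
if split -a 1 -l 1 input.txt one-; then
  echo "unexpected success"
else
  echo "split failed: suffixes exhausted"
fi
ls one-* | wc -l             # the 26 pieces already written are kept
```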

   The program accepts the following options.  Also see ⇒Common options.

‘-l LINES’
‘--lines=LINES’
     Put LINES lines of INPUT into each output file.  If ‘--separator’
     is specified, then LINES determines the number of records.

     For compatibility ‘split’ also supports an obsolete option syntax
     ‘-LINES’.  New scripts should use ‘-l LINES’ instead.
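     A minimal sketch, assuming GNU coreutils, showing that ‘-l 4’ and
     the obsolete ‘-4’ spelling produce identical pieces; the prefixes
     ‘part-’ and ‘old-’ are arbitrary.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils; 'part-' and 'old-' are arbitrary prefixes.
set -e
cd "$(mktemp -d)"

seq 10 > input.txt
split -l 4 input.txt part-   # part-aa: 1-4, part-ab: 5-8, part-ac: 9-10
split -4 input.txt old-      # obsolete '-LINES' spelling of '-l 4'

cmp part-aa old-aa
cmp part-ac old-ac
cat part-ac                  # 9 and 10
```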

‘-b SIZE’
‘--bytes=SIZE’
     Put SIZE bytes of INPUT into each output file.  SIZE may be, or may
     be an integer optionally followed by, one of the following
     multiplicative suffixes:
          ‘b’  =>            512 ("blocks")
          ‘KB’ =>           1000 (KiloBytes)
          ‘K’  =>           1024 (KibiBytes)
          ‘MB’ =>      1000*1000 (MegaBytes)
          ‘M’  =>      1024*1024 (MebiBytes)
          ‘GB’ => 1000*1000*1000 (GigaBytes)
          ‘G’  => 1024*1024*1024 (GibiBytes)
     and so on for ‘T’, ‘P’, ‘E’, ‘Z’, and ‘Y’.
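     As an illustration, assuming GNU coreutils ‘head’ and ‘split’,
     splitting a 5000-byte file shows the difference between the binary
     suffix ‘K’ (1024) and the decimal suffix ‘KB’ (1000); the prefixes
     ‘k-’ and ‘kb-’ are arbitrary.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils; prefixes 'k-' and 'kb-' are arbitrary.
set -e
cd "$(mktemp -d)"

head -c 5000 /dev/zero > zeros.bin   # 5000-byte input
split -b 2K zeros.bin k-             # 2048 + 2048 + 904 bytes
split -b 2KB zeros.bin kb-           # 2000 + 2000 + 1000 bytes

wc -c k-* kb-*
```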

‘-C SIZE’
‘--line-bytes=SIZE’
     Put into each output file as many complete lines of INPUT as
     possible without exceeding SIZE bytes.  Individual lines or records
     longer than SIZE bytes are broken into multiple files.  SIZE has
     the same format as for the ‘--bytes’ option.  If ‘--separator’ is
     specified, complete records delimited by SEPARATOR are kept
     together instead of lines.
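     A small sketch, assuming GNU coreutils: with ‘-C 10’ two 5-byte
     lines fit in the first piece, while a single 17-byte line has to be
     broken across pieces.  File names and prefixes are arbitrary.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils; 'c-' and 'long-' are arbitrary prefixes.
set -e
cd "$(mktemp -d)"

printf 'aaaa\nbbbb\ncccc\n' > lines.txt  # three 5-byte lines
split -C 10 lines.txt c-                 # c-aa: first two lines, c-ab: third

printf '0123456789abcdef\n' > long.txt   # one 17-byte line
split -C 10 long.txt long-               # too long: 10 bytes, then 7

wc -c c-* long-*
```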

‘--filter=COMMAND’
     With this option, rather than simply writing to each output file,
     write through a pipe to the specified shell COMMAND for each output
     file.  COMMAND should use the $FILE environment variable, which is
     set to a different output file name for each invocation of the
     command.  For example, imagine that you have a 1TiB compressed file
     that, if uncompressed, would be too large to reside on disk, yet
     you must split it into individually compressed pieces of a more
     manageable size.  To do that, you might run this command:

          xz -dc BIG.xz | split -b200G --filter='xz > $FILE.xz' - big-

     Assuming a 10:1 compression ratio, that would create about fifty
     20GiB files with names ‘big-aa.xz’, ‘big-ab.xz’, ‘big-ac.xz’, etc.
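     A smaller sketch that runs anywhere GNU coreutils is installed:
     here ‘sort -rn’ stands in for a compressor, and the ‘.rev’
     extension is a hypothetical naming choice.  Note that ‘split’
     itself does not create ‘xaa’; the filter decides what to write.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils; 'sort -rn' stands in for any per-piece
# command, and the '.rev' extension is arbitrary.
set -e
cd "$(mktemp -d)"

seq 6 > input.txt
# Each 3-line piece is piped to the filter; $FILE holds the name split
# would have used (xaa, xab).  Single quotes keep the invoking shell
# from expanding $FILE itself.
split -l 3 --filter='sort -rn > "$FILE".rev' input.txt

cat xaa.rev    # 3 2 1
cat xab.rev    # 6 5 4
```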

‘-n CHUNKS’
‘--number=CHUNKS’

     Split INPUT into CHUNKS output files where CHUNKS may be:

          N      generate N files based on current size of INPUT
          K/N    only output Kth of N to stdout
          l/N    generate N files without splitting lines or records
          l/K/N  likewise but only output Kth of N to stdout
          r/N    like ‘l’ but use round robin distribution
          r/K/N  likewise but only output Kth of N to stdout

     Any excess bytes remaining after dividing the INPUT into N chunks
     are assigned to the last chunk.  Any excess bytes appearing after
     the initial calculation are discarded (except when using ‘r’ mode).

     All N files are created even if there are fewer than N lines, or
     the INPUT is truncated.
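     A sketch of the byte-based mode and of the "all N files" rule,
     assuming GNU coreutils.  Modes other than ‘r’ depend on the input
     size, so regular files are used; the names ‘ten.txt’, ‘two.txt’
     and the prefix ‘few-’ are arbitrary.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils; file names and 'few-' are arbitrary.
set -e
cd "$(mktemp -d)"

printf 'abcdefghij' > ten.txt   # 10 bytes, no trailing newline
split -n 3 ten.txt              # 10/3 = 3: xaa=abc, xab=def, xac=ghij
split -n 3/3 ten.txt; echo      # write only chunk 3 (ghij) to stdout

printf 'a\nb\n' > two.txt       # just 2 lines...
split -n l/4 two.txt few-       # ...but all 4 files are created
ls few-* | wc -l                # 4
```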

     For ‘l’ mode, chunks are approximately INPUT size / N.  The INPUT
     is partitioned into N equal-sized portions, with the last assigned
     any excess.  If a line _starts_ within a partition it is written
     completely to the corresponding file.  Since lines or records are
     not split even if they overlap a partition, the files written can
     be larger or smaller than the partition size, and even empty if a
     line/record is so long as to completely overlap the partition.

     For ‘r’ mode, the size of INPUT is irrelevant, so INPUT can be a
     pipe, for example.
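     For instance, assuming GNU coreutils, round-robin mode can read
     directly from a pipe, and ‘r/K/N’ prints just one of the streams:

```shell
#!/bin/sh
# Sketch assuming GNU coreutils; input comes from a pipe.
set -e
cd "$(mktemp -d)"

seq 5 | split -n r/2      # xaa gets lines 1, 3, 5; xab gets 2, 4
seq 5 | split -n r/2/2    # print only the 2nd stream: 2 and 4

cat xaa
```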

‘-a LENGTH’
‘--suffix-length=LENGTH’
     Use suffixes of length LENGTH.  If a LENGTH of 0 is specified, this
     is the same as if (any previous) ‘-a’ was not specified, and thus
     enables the default behavior: the suffix length starts at 2 and,
     unless ‘-n’ or ‘--numeric-suffixes=FROM’ is specified, is
     automatically increased by 2 as required.
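     A minimal sketch, assuming GNU coreutils: a fixed suffix length of
     3 yields names such as ‘xaaa’ instead of ‘xaa’.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils; input.txt is an arbitrary name.
set -e
cd "$(mktemp -d)"

seq 3 > input.txt
split -l 1 -a 3 input.txt   # xaaa, xaab, xaac
ls x*
```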

‘-d’
‘--numeric-suffixes[=FROM]’
     Use digits in suffixes rather than lower-case letters.  The
     numerical suffix counts from FROM if specified, 0 otherwise.

     FROM is supported only with the long form of the option.  It is
     used either to set the initial suffix for a single run or to set
     the suffix offset for independently split inputs; consequently, the
     automatic suffix length expansion described above is disabled.
     Therefore you may also want to use option ‘-a’ to allow suffixes
     beyond ‘99’.  Note that if option ‘--number’ is specified and the
     number of files is less than FROM, a single run is assumed and the
     minimum suffix length required is automatically determined.
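     A sketch, assuming GNU coreutils, of plain ‘-d’, of a FROM offset,
     and of combining FROM with ‘-a’; the prefixes ‘part-’ and ‘wide-’
     are arbitrary.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils; 'part-' and 'wide-' are arbitrary prefixes.
set -e
cd "$(mktemp -d)"

seq 3 > input.txt
split -l 1 -d input.txt                                # x00, x01, x02
split -l 1 --numeric-suffixes=5 input.txt part-        # part-05 .. part-07
split -l 1 --numeric-suffixes=5 -a 4 input.txt wide-   # wide-0005 .. wide-0007
ls
```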

‘-x’
‘--hex-suffixes[=FROM]’
     Like ‘--numeric-suffixes’, but use hexadecimal numbers (in lower
     case).
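     For example, assuming GNU coreutils, twelve one-line pieces are
     named ‘x00’ through ‘x09’ followed by ‘x0a’ and ‘x0b’:

```shell
#!/bin/sh
# Sketch assuming GNU coreutils; input.txt is an arbitrary name.
set -e
cd "$(mktemp -d)"

seq 12 > input.txt
split -l 1 -x input.txt   # x00 .. x09, x0a, x0b
ls x*
```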

‘--additional-suffix=SUFFIX’
     Append an additional SUFFIX to output file names.  SUFFIX must not
     contain a slash.
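     A sketch, assuming GNU coreutils: the extra suffix goes after the
     generated one, and a suffix containing a slash is rejected.  The
     prefix ‘part-’ and the suffix ‘sub/x’ are arbitrary.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils; 'part-' and 'sub/x' are arbitrary.
set -e
cd "$(mktemp -d)"

seq 5 > input.txt
split -l 2 --additional-suffix=.txt input.txt part-  # part-aa.txt .. part-ac.txt
ls part-*

if split --additional-suffix=sub/x input.txt; then
  echo "unexpected success"
else
  echo "rejected: suffix contains a slash"
fi
```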

‘-e’
‘--elide-empty-files’
     Suppress the generation of zero-length output files.  This can
     happen with the ‘--number’ option if a file is (truncated to be)
     shorter than the number requested, or if a line is so long as to
     completely span a chunk.  The output file sequence numbers always
     run consecutively, even when this option is specified.
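     A sketch, assuming GNU coreutils: a 2-line file split into 4 line
     chunks yields 4 files, 2 of them empty; with ‘-e’ only the 2
     non-empty pieces remain, still named consecutively.  The prefixes
     ‘all-’ and ‘some-’ are arbitrary.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils; 'all-' and 'some-' are arbitrary prefixes.
set -e
cd "$(mktemp -d)"

printf 'a\nb\n' > two.txt
split -n l/4 two.txt all-       # 4 files, 2 of them empty
split -e -n l/4 two.txt some-   # only some-aa ("a") and some-ab ("b")
ls all-* some-*
```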

‘-t SEPARATOR’
‘--separator=SEPARATOR’
     Use character SEPARATOR as the record separator instead of the
     default newline character (ASCII LF).  To specify ASCII NUL as the
     separator, use the two-character string ‘\0’, e.g.,
     ‘split -t '\0'’.
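     A sketch, assuming GNU coreutils and a POSIX ‘printf’: with ‘-t ,’
     the ‘-l’ count applies to comma-terminated records, and ‘'\0'’
     selects NUL-terminated records.  The file name ‘rec.txt’ and the
     prefix ‘z-’ are arbitrary.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils; rec.txt and 'z-' are arbitrary names.
set -e
cd "$(mktemp -d)"

printf 'a,b,c,d' > rec.txt
split -t , -l 2 rec.txt          # xaa: "a,b,"  xab: "c,d"

# NUL-separated records read from standard input ('-'), one per piece.
printf 'one\0two\0three\0' | split -t '\0' -l 1 - z-   # z-aa, z-ab, z-ac
ls z-*
```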

‘-u’
‘--unbuffered’
     Immediately copy input to output in ‘--number r/...’ mode, which is
     a much slower mode of operation.

‘--verbose’
     Write a diagnostic just before each output file is opened.
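     A sketch, assuming GNU coreutils; the exact wording of the
     diagnostic may differ between versions and locales, so only the
     number of messages (one per output file) is relied on here.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils; message wording may vary by locale.
set -e
cd "$(mktemp -d)"

seq 5 > input.txt
split --verbose -l 2 input.txt   # one diagnostic per file: xaa, xab, xac
```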

   An exit status of zero indicates success, and a nonzero value
indicates failure.

   Here are a few examples to illustrate how the ‘--number’ (‘-n’)
option works:

   Notice how, by default, one line may be split across two or more
files:

     $ seq -w 6 10 > k; split -n3 k; head xa?
     ==> xaa <==
     06
     07
     ==> xab <==

     08
     0
     ==> xac <==
     9
     10

   Use the "l/" modifier to suppress that:

     $ seq -w 6 10 > k; split -nl/3 k; head xa?
     ==> xaa <==
     06
     07

     ==> xab <==
     08
     09

     ==> xac <==
     10

   Use the "r/" modifier to distribute lines in a round-robin fashion:

     $ seq -w 6 10 > k; split -nr/3 k; head xa?
     ==> xaa <==
     06
     09

     ==> xab <==
     07
     10

     ==> xac <==
     08

   You can also extract just the Kth chunk.  This extracts and prints
just the 7th "chunk" of 33:

     $ seq 100 > k; split -nl/7/33 k
     20
     21
     22
1