tar: sparse

1 
1 8.1.2 Archiving Sparse Files
1 ----------------------------
1 
1 Files in the file system occasionally have "holes".  A "hole" in a file
1 is a section of the file's contents which was never written.  The
1 contents of a hole read as all zeros.  On many operating systems,
1 actual disk storage is not allocated for holes, but they are counted in
1 the length of the file.  If you archive such a file, 'tar' could create
1 an archive longer than the original.  To have 'tar' attempt to recognize
1 the holes in a file, use '--sparse' ('-S').  With this option, 'tar'
1 searches for holes in any file that uses less disk space than its
1 length would suggest.  It then records in the archive where the holes
1 (consecutive stretches of zeros) are, and archives only the "real
1 contents" of the file.  On extraction (where '--sparse' is not needed),
1 holes are recreated in such files wherever they were found.  Thus, if
1 you use '--sparse', 'tar' archives won't take more space than the
1 original.
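
   The "less disk space than would be expected from its length" test can
be illustrated with a short Python sketch.  It is not taken from 'tar'
itself: the helper names ('disk_usage', 'looks_sparse',
'make_file_with_hole') are hypothetical, and the code assumes a POSIX
system where 'st_blocks' is reported in 512-byte units.

```python
import os
import tempfile


def disk_usage(path):
    """Return (apparent_size, allocated_bytes) for PATH."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units, as on POSIX systems.
    return st.st_size, st.st_blocks * 512


def looks_sparse(path):
    """True if the file occupies less disk space than its length suggests."""
    size, allocated = disk_usage(path)
    return allocated < size


def make_file_with_hole(path, hole_size=1 << 20):
    """Write a short header, skip HOLE_SIZE bytes, then write a trailer."""
    with open(path, "wb") as f:
        f.write(b"head")
        f.seek(hole_size, os.SEEK_CUR)  # the skipped range is never written
        f.write(b"tail")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "holey.bin")
        make_file_with_hole(p)
        print(disk_usage(p), looks_sparse(p))
```

   Whether the skipped range is actually stored as a hole depends on the
file system, so 'looks_sparse' may return false even for a file written
this way.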
1 
1    GNU 'tar' uses two methods for detecting holes in sparse files.
1 These methods are described later in this subsection.
1 
1 '-S'
1 '--sparse'
1      This option instructs 'tar' to test each file for sparseness before
1      attempting to archive it.  If the file is found to be sparse, it is
1      treated specially, which reduces the amount of space used by its
1      image in the archive.
1 
1      This option is meaningful only when creating or updating archives.
1      It has no effect on extraction.
1 
1    Consider using '--sparse' when performing file system backups, to
1 avoid archiving the expanded forms of files stored sparsely in the
1 system.
1 
1    Even if your system has no sparse files currently, some may be
1 created in the future.  If you use '--sparse' while making file system
1 backups as a matter of course, you can be assured the archive will never
1 take more space on the media than the files take on disk (otherwise,
1 archiving a disk filled with sparse files might take hundreds of tapes).
1 ⇒Incremental Dumps.
1 
1    However, be aware that the '--sparse' option may present a serious
1 drawback.  Namely, to determine the positions of holes in a file, 'tar'
1 may have to read it before archiving it, so in total the file may be
1 read *twice*.  This may happen when your operating system or file
1 system does not support the 'SEEK_HOLE'/'SEEK_DATA' feature of 'lseek'
1 (see '--hole-detection', below).
1 
1    When using the 'POSIX' archive format, GNU 'tar' is able to store
1 sparse files in three distinct ways, called "sparse formats".  A sparse
1 format is identified by its "number", consisting, as usual, of two
1 decimal numbers delimited by a dot.  By default, format '1.0' is used.
1 If, for some reason, you wish to use an earlier format, you can select
1 it using the '--sparse-version' option.
1 
1 '--sparse-version=VERSION'
1      Select the format to store sparse files in.  Valid VERSION values
1      are: '0.0', '0.1' and '1.0'.  See ⇒Sparse Formats for a
1      detailed description of each format.
1 
1    Using the '--sparse-version' option implies '--sparse'.
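
   As an illustration, a script that drives 'tar' might assemble the
command line as in the following Python sketch.  The helper
'sparse_archive_command' and the file names are hypothetical; the
options are the ones described above, together with '--format=posix' to
select the 'POSIX' archive format.

```python
import subprocess

# Sparse format numbers listed for '--sparse-version'.
VALID_SPARSE_VERSIONS = ("0.0", "0.1", "1.0")


def sparse_archive_command(archive, paths, sparse_version="1.0"):
    """Build an argument list that archives PATHS into ARCHIVE."""
    if sparse_version not in VALID_SPARSE_VERSIONS:
        raise ValueError("unknown sparse format: %r" % (sparse_version,))
    return [
        "tar",
        "--create",
        "--file=" + archive,
        "--format=posix",
        # '--sparse-version' implies '--sparse', so '-S' is not repeated.
        "--sparse-version=" + sparse_version,
        *paths,
    ]


if __name__ == "__main__":
    cmd = sparse_archive_command("backup.tar", ["disk.img"], "0.1")
    print(" ".join(cmd))
    # To run it (assumes GNU tar is installed as 'tar'):
    # subprocess.run(cmd, check=True)
```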
1 
1 '--hole-detection=METHOD'
1      Force a specific hole detection method.  Before the real contents
1      of a sparse file are stored, 'tar' needs to gather knowledge about
1      file sparseness, because the file's map of holes must be stored in
1      the tar header before 'tar' starts archiving the file contents.
1      Currently, two methods of hole detection are implemented:
1 
1         * '--hole-detection=seek' Seeking the file for data and holes.
1           This method uses an extension of the 'lseek' system call
1           ('SEEK_HOLE' and 'SEEK_DATA') that reuses the file system's
1           knowledge about sparse file contents, so detection is usually
1           very fast.  To use this feature, your file system and
1           operating system must support it.  At the time of this writing
1           (2015) this feature, in spite of not being accepted by POSIX,
1           is fairly widely supported by different operating systems.
1 
1         * '--hole-detection=raw' Reading the whole sparse file byte by
1           byte before archiving it.  This method detects holes as
1           consecutive stretches of zeros.  Compared to the previous
1           method, it is usually much slower, although more portable.
1 
1    When no '--hole-detection' option is given, 'tar' uses the 'seek'
1 method, if supported by the operating system.
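
   The two approaches can be sketched in Python as follows.  This is an
illustration of the general technique, not the code 'tar' uses: the
function names are hypothetical, the 'raw' variant compares whole
512-byte blocks rather than single bytes, and falling back to 'raw' when
'lseek' fails is an assumption made for this sketch.  Each function
returns the map of data regions as a list of '(start, end)' offsets;
everything outside those regions is a hole.

```python
import errno
import os


def data_extents_seek(path):
    """Map data regions with lseek SEEK_DATA/SEEK_HOLE."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # only a hole remains
                    break
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return extents


def data_extents_raw(path, block_size=512):
    """Map data regions by reading the file and skipping all-zero blocks."""
    extents = []
    zero = bytes(block_size)
    with open(path, "rb") as f:
        offset = 0
        run_start = None  # start of the current run of non-zero blocks
        while True:
            block = f.read(block_size)
            if not block:
                break
            is_zero = block == zero[:len(block)]
            if not is_zero and run_start is None:
                run_start = offset
            elif is_zero and run_start is not None:
                extents.append((run_start, offset))
                run_start = None
            offset += len(block)
        if run_start is not None:
            extents.append((run_start, offset))
    return extents


def data_extents(path, method=None):
    """Use METHOD if given; otherwise prefer 'seek' and fall back to 'raw'."""
    if method == "raw":
        return data_extents_raw(path)
    if method == "seek":
        return data_extents_seek(path)
    if method is not None:
        raise ValueError("unknown method: %r" % (method,))
    if hasattr(os, "SEEK_DATA") and hasattr(os, "SEEK_HOLE"):
        try:
            return data_extents_seek(path)
        except OSError:
            pass  # assumption: fall back when the request is rejected
    return data_extents_raw(path)
```

   The 'seek' variant reports regions at the file system's own
granularity, so its result may differ from the 'raw' one: a block that
is allocated but happens to contain only zeros counts as data for
'seek' and as a hole for 'raw'.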
1 
1    Using the '--hole-detection' option implies '--sparse'.
1