tar: sparse
1
1 8.1.2 Archiving Sparse Files
1 ----------------------------
1
1 Files in the file system occasionally have "holes". A "hole" in a file
1 is a section of the file's contents which was never written. The
1 contents of a hole read as all zeros. On many operating systems,
1 actual disk storage is not allocated for holes, but they are counted in
1 the length of the file. If you archive such a file, 'tar' could create
1 an archive longer than the original. To have 'tar' attempt to recognize
1 the holes in a file, use '--sparse' ('-S'). With this option, 'tar'
1 searches for holes in any file that uses less disk space than would be
1 expected from its length. It then records in the archive where the
1 holes (consecutive stretches of zeros) are, and archives only the "real
1 contents" of the file. On extraction (where '--sparse' is not needed),
1 holes are re-created in any such file wherever they were found. Thus,
1 if you use '--sparse', 'tar' archives won't take more space than the
1 original.
1
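The difference between a file's length and its disk usage can be illustrated with a short Python sketch. It is not part of 'tar'; the helper names 'looks_sparse' and 'make_file_with_hole' are hypothetical, and the 512-byte unit for 'st_blocks' is an assumption that holds on typical POSIX systems. Whether the hole really saves disk space depends on the file system.

```python
import os
import tempfile

def looks_sparse(path):
    """Heuristic: True if the file occupies fewer bytes on disk than its length."""
    st = os.stat(path)
    # Assumption: st_blocks is counted in 512-byte units (typical on POSIX).
    return st.st_blocks * 512 < st.st_size

def make_file_with_hole(path, hole_size, tail=b"data"):
    """Create a file whose first hole_size bytes are never written."""
    with open(path, "wb") as f:
        f.seek(hole_size)  # skip ahead without writing anything
        f.write(tail)

with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.bin")
    make_file_with_hole(demo, 1024 * 1024)
    demo_size = os.path.getsize(demo)   # the hole counts toward the length
    with open(demo, "rb") as f:
        demo_head = f.read(16)          # the hole reads back as zero bytes
    demo_sparse = looks_sparse(demo)    # result depends on the file system
```

A check of this kind (disk usage smaller than length) corresponds to the condition under which '--sparse' makes 'tar' look for holes at all.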
1 GNU 'tar' uses two methods for detecting holes in sparse files.
1 These methods are described later in this subsection.
1
1 '-S'
1 '--sparse'
1 This option instructs 'tar' to test each file for sparseness before
1 attempting to archive it. If the file is found to be sparse, it is
1 treated specially, which reduces the amount of space used by its
1 image in the archive.
1
1 This option is meaningful only when creating or updating archives.
1 It has no effect on extraction.
1
1 Consider using '--sparse' when performing file system backups, to
1 avoid archiving the expanded forms of files stored sparsely in the
1 system.
1
1 Even if your system has no sparse files currently, some may be
1 created in the future. If you use '--sparse' while making file system
1 backups as a matter of course, you can be assured the archive will never
1 take more space on the media than the files take on disk (otherwise,
1 archiving a disk filled with sparse files might take hundreds of tapes).
1 ⇒Incremental Dumps.
1
1 However, be aware that the '--sparse' option may present a serious
1 drawback. Namely, in order to determine the positions of holes in a
1 file, 'tar' may have to read it before trying to archive it, so in
1 total the file may be read *twice*. This may happen when your OS or
1 your FS does not support the "SEEK_HOLE/SEEK_DATA" feature of "lseek"
1 (see '--hole-detection', below).
1
1 When using the 'POSIX' archive format, GNU 'tar' is able to store
1 sparse files in three distinct ways, called "sparse formats". A sparse
1 format is identified by its "number", consisting, as usual, of two
1 decimal numbers delimited by a dot. By default, format '1.0' is used.
1 If, for some reason, you wish to use an earlier format, you can select
1 it using the '--sparse-version' option.
1
1 '--sparse-version=VERSION'
1 Select the format to store sparse files in. Valid VERSION values
1 are: '0.0', '0.1' and '1.0'. ⇒Sparse Formats, for a
1 detailed description of each format.
1
1 Using the '--sparse-version' option implies '--sparse'.
1
1 '--hole-detection=METHOD'
1 Enforce a specific hole detection method. Before the real contents
1 of a sparse file are stored, 'tar' needs to gather knowledge about
1 the file's sparseness. This is because it needs to have the file's
1 map of holes stored in the tar header before it starts archiving the
1 file contents. Currently, two methods of hole detection are
1 implemented:
1
1 * '--hole-detection=seek' Seeking the file for data and holes.
1 It uses an extension of the 'lseek' system call ('SEEK_HOLE'
1 and 'SEEK_DATA') which can reuse the file system's knowledge
1 about sparse file contents, so detection is usually very
1 fast. To use this feature, your file system and operating
1 system must support it. At the time of this writing (2015),
1 this feature, although not accepted by POSIX, is fairly
1 widely supported by different operating systems.
1
1 * '--hole-detection=raw' Reading the whole sparse file
1 byte-by-byte before archiving it. This method detects holes as
1 consecutive stretches of zeros. Compared to the previous
1 method, it is usually much slower, although more portable.
1
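The idea behind the 'raw' method can be sketched in Python as follows. This is only an illustration, not the code GNU 'tar' uses: the function name 'find_data_regions' is hypothetical, the 512-byte block size is an assumption, and the sketch works on an in-memory byte string rather than a file for brevity.

```python
def find_data_regions(data, block_size=512):
    """Return (offset, length) pairs for the non-zero regions of data.

    Blocks consisting solely of zero bytes are treated as holes.
    """
    regions = []
    start = None                      # offset where the current data region began
    zero_block = bytes(block_size)    # a block of all zeros to compare against
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        is_zero = block == zero_block[:len(block)]
        if not is_zero and start is None:
            start = offset            # a data region begins here
        elif is_zero and start is not None:
            regions.append((start, offset - start))
            start = None              # the data region ended at this block
    if start is not None:
        regions.append((start, len(data) - start))
    return regions

# Two zero blocks, one data block, four zero blocks, then a short tail.
sample = bytes(1024) + b"x" * 512 + bytes(2048) + b"tail"
raw_map = find_data_regions(sample)   # [(1024, 512), (3584, 4)]
```

Because every byte must be examined to build the map, the file is read once for detection and once more for archiving, which is why this method is the slower of the two.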
1 When no '--hole-detection' option is given, 'tar' uses the 'seek'
1 method, if supported by the operating system.
1
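For comparison, here is a minimal Python sketch of seek-based detection with 'SEEK_DATA' and 'SEEK_HOLE'. It is an illustration under stated assumptions, not GNU 'tar''s implementation: the function name 'seek_data_regions' is hypothetical, and returning 'None' to signal "not supported, fall back to reading the file" is a convention chosen for this example.

```python
import errno
import os
import tempfile

def seek_data_regions(fd):
    """Return (offset, length) data regions found via SEEK_DATA/SEEK_HOLE.

    Returns None when the platform or file system lacks support, so that
    the caller can fall back to reading the file instead.
    """
    if not (hasattr(os, "SEEK_DATA") and hasattr(os, "SEEK_HOLE")):
        return None
    size = os.fstat(fd).st_size
    regions = []
    offset = 0
    try:
        while offset < size:
            try:
                data = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:   # no more data past offset
                    break
                raise
            hole = os.lseek(fd, data, os.SEEK_HOLE)  # end of this data run
            regions.append((data, hole - data))
            offset = hole
    except OSError as exc:
        if exc.errno in (errno.EINVAL, errno.ENOTSUP):
            return None                        # feature not supported here
        raise
    return regions

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.seek(1024 * 1024)    # leave a 1 MiB hole at the start
        f.write(b"data")
    fd = os.open(path, os.O_RDONLY)
    try:
        seek_map = seek_data_regions(fd)
    finally:
        os.close(fd)
```

On a file system that tracks holes, 'seek_map' contains only the region around the written bytes; on one that does not, the whole file is typically reported as a single data region. In neither case are the file contents read.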
1 Using the '--hole-detection' option implies '--sparse'.
1