tar: Sparse Recovery
1
1 8.3.10.2 Extracting Sparse Members
1 ..................................
1
1 Any 'tar' implementation will be able to extract sparse members from a
1 PAX archive. However, the extracted files will be "condensed", i.e.,
1 any zero blocks will be removed from them. When we restore such a
1 condensed file to its original form, by adding zero blocks (or "holes")
1 back to their original locations, we call this process "expanding" a
1 condensed sparse file.
1
1 To expand a file, you will need a simple auxiliary program called
1 'xsparse'. It is available in source form from the GNU 'tar' home page
1 (http://www.gnu.org/software/tar/utils/xsparse.html).
1
1 Let's begin with archive members in "sparse format version 1.0"(1),
1 which are the easiest to expand. The condensed file will contain both
1 the file map and the file data, so no additional data will be needed to restore
1 it. If the original file name was 'DIR/NAME', then the condensed file
1 will be named 'DIR/GNUSparseFile.N/NAME', where N is a decimal
1 number(2).
1
1 To expand a version 1.0 file, run 'xsparse' as follows:
1
1 $ xsparse cond-file
1
1 where 'cond-file' is the name of the condensed file. The utility will
1 deduce the name for the resulting expanded file using the following
1 algorithm:
1
1 1. If 'cond-file' does not contain any directories, '../cond-file'
1 will be used.
1
1 2. If 'cond-file' has the form 'DIR/T/NAME', where both T and NAME are
1 simple names, with no '/' characters in them, the output file name
1 will be 'DIR/NAME'.
1
1 3. Otherwise, if 'cond-file' has the form 'DIR/NAME', the output file
1 name will be 'NAME'.
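
The three rules above can be sketched in Python as follows. This is only an illustration of the algorithm as it is described here, not code taken from 'xsparse'; the function name 'guess_output_name' is made up for the example.

```python
def guess_output_name(cond_file: str) -> str:
    """Derive the expanded file name from the condensed file name.

    Hypothetical helper that mirrors the three documented rules.
    """
    # Split off at most the last two path components.
    parts = cond_file.rsplit("/", 2)
    if len(parts) == 1:
        # Rule 1: no directories at all.
        return "../" + cond_file
    if len(parts) == 3:
        # Rule 2: DIR/T/NAME becomes DIR/NAME.
        directory, _t, name = parts
        return directory + "/" + name
    # Rule 3: DIR/NAME becomes NAME.
    return parts[1]
```

For the file used in the examples of this section, 'guess_output_name("/home/gray/GNUSparseFile.6058/sparsefile")' returns '/home/gray/sparsefile'.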
1
1 If this algorithm does not suit your needs, you can explicitly
1 specify the output file name as the second argument to the
1 command:
1
1 $ xsparse cond-file out-file
1
1 It is often a good idea to run 'xsparse' in "dry run" mode first. In
1 this mode, the command does not actually expand the file, but verbosely
1 lists all actions it would take to do so. The dry run mode is
1 enabled by the '-n' command line option:
1
1 $ xsparse -n /home/gray/GNUSparseFile.6058/sparsefile
1 Reading v.1.0 sparse map
1 Expanding file '/home/gray/GNUSparseFile.6058/sparsefile' to
1 '/home/gray/sparsefile'
1 Finished dry run
1
1 To actually expand the file, you would run:
1
1 $ xsparse /home/gray/GNUSparseFile.6058/sparsefile
1
1 Like most UNIX utilities, the program keeps quiet unless it has
1 something important to tell you (for example, an error condition). If
1 you want it to produce verbose output, similar to that of the dry run
1 mode, use the '-v' option:
1
1 $ xsparse -v /home/gray/GNUSparseFile.6058/sparsefile
1 Reading v.1.0 sparse map
1 Expanding file '/home/gray/GNUSparseFile.6058/sparsefile' to
1 '/home/gray/sparsefile'
1 Done
1
1 Additionally, if your 'tar' implementation has extracted the
1 "extended headers" for this file, you can instruct 'xsparse' to use them
1 in order to verify the integrity of the expanded file. The option '-x'
1 sets the name of the extended header file to use. Continuing our
1 example:
1
1 $ xsparse -v -x /home/gray/PaxHeaders.6058/sparsefile \
1 /home/gray/GNUSparseFile.6058/sparsefile
1 Reading extended header file
1 Found variable GNU.sparse.major = 1
1 Found variable GNU.sparse.minor = 0
1 Found variable GNU.sparse.name = sparsefile
1 Found variable GNU.sparse.realsize = 217481216
1 Reading v.1.0 sparse map
1 Expanding file '/home/gray/GNUSparseFile.6058/sparsefile' to
1 '/home/gray/sparsefile'
1 Done
1
1 An "extended header" is a special 'tar' archive header that precedes
1 an archive member and contains a set of "variables", describing the
1 member properties that cannot be stored in the standard 'ustar' header.
1 While optional for expanding sparse version 1.0 members, the use of
1 extended headers is mandatory when expanding sparse members in older
1 sparse formats: v.0.0 and v.0.1. (The sparse formats are described in
1 detail in ⇒Sparse Formats.) So, for these formats, the question
1 is: how do you obtain the extended headers from the archive?
1
1 If you use a 'tar' implementation that does not support PAX format,
1 the extended headers for each member will be extracted as a separate file.
1 If we represent the member name as 'DIR/NAME', then the extended header
1 file will be named 'DIR/PaxHeaders.N/NAME', where N is an integer
1 number.
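
Given the name of a condensed file, the name of the matching extended header file can be derived mechanically. The sketch below assumes that both names carry the same number N, as they do in the example earlier in this section; the helper name 'header_file_for' is hypothetical and is not part of 'xsparse' or 'tar'.

```python
import re


def header_file_for(cond_file: str) -> str:
    """Map 'DIR/GNUSparseFile.N/NAME' to 'DIR/PaxHeaders.N/NAME'.

    Assumption: the extended header file uses the same N as the
    condensed file.
    """
    head, sep, name = cond_file.rpartition("/")
    # Rewrite only the last directory component, keeping N unchanged.
    new_head, count = re.subn(
        r"(^|/)GNUSparseFile\.(\d+)$", r"\1PaxHeaders.\2", head
    )
    if count != 1:
        raise ValueError("not a condensed sparse file name: " + cond_file)
    return new_head + sep + name
```

A name that does not have the 'GNUSparseFile.N' component raises 'ValueError' instead of returning a guess.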
1
1 Things become more difficult if your 'tar' implementation does
1 support PAX headers, because in this case you will have to manually
1 extract the headers. We recommend the following algorithm:
1
1 1. Consult the documentation of your 'tar' implementation for an
1 option that prints "block numbers" along with the archive listing
1 (analogous to GNU 'tar''s '-R' option). For example, 'star' has
1 '-block-number'.
1
1 2. Obtain a verbose listing using the block number option, and find
1 the block numbers of the sparse member in question and the member
1 immediately following it. For example, running 'star' on our
1 archive we obtain:
1
1 $ star -t -v -block-number -f arc.tar
1 ...
1 star: Unknown extended header keyword 'GNU.sparse.size' ignored.
1 star: Unknown extended header keyword 'GNU.sparse.numblocks' ignored.
1 star: Unknown extended header keyword 'GNU.sparse.name' ignored.
1 star: Unknown extended header keyword 'GNU.sparse.map' ignored.
1 block 56: 425984 -rw-r--r-- gray/users Jun 25 14:46 2006 GNUSparseFile.28124/sparsefile
1 block 897: 65391 -rw-r--r-- gray/users Jun 24 20:06 2006 README
1 ...
1
1 (as usual, ignore the warnings about unknown keywords.)
1
1 3. Let SIZE be the size of the sparse member in bytes, BS be its block
1 number and BN be the block number of the next member. Compute:
1
1 N = BN - BS - SIZE/512 - 2
1
1 This number gives the size of the extended header part in tar
1 "blocks". In our example, this formula gives: '897 - 56 - 425984 /
1 512 - 2 = 7'.
1
1 4. Use 'dd' to extract the headers:
1
1 dd if=ARCHIVE of=HNAME bs=512 skip=BS count=N
1
1 where ARCHIVE is the archive name, HNAME is the name of the file to
1 store the extended header in, and BS and N are the values obtained in
1 the previous steps.
1
1 In our example, this command will be:
1
1 $ dd if=arc.tar of=xhdr bs=512 skip=56 count=7
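
The arithmetic of step 3 and the copy of step 4 can also be sketched in Python. This is a minimal illustration that reproduces the worked example ('897 - 56 - 425984 / 512 - 2 = 7'); the function names are made up for the example, and rounding the member size up to whole blocks relies on the fact that 'tar' pads member data to 512-byte blocks.

```python
BLOCK = 512  # size of a tar block in bytes


def xhdr_blocks(bs: int, bn: int, size: int) -> int:
    """Number of blocks in the extended header part of a member.

    bs   -- block number of the sparse member (from the listing)
    bn   -- block number of the member that follows it
    size -- size of the sparse member in bytes (from the listing)
    """
    # Member data occupies whole blocks, so round the size up.
    data_blocks = (size + BLOCK - 1) // BLOCK
    return bn - bs - data_blocks - 2


def copy_blocks(archive: str, out: str, skip: int, count: int) -> None:
    """Copy COUNT blocks starting at block SKIP, like 'dd' with bs=512."""
    with open(archive, "rb") as src, open(out, "wb") as dst:
        src.seek(skip * BLOCK)
        dst.write(src.read(count * BLOCK))
```

With the numbers from the listing above, 'xhdr_blocks(56, 897, 425984)' returns 7, and 'copy_blocks("arc.tar", "xhdr", 56, 7)' writes the same bytes as the 'dd' command shown in step 4.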
1
1 Finally, you can expand the condensed file, using the obtained
1 header:
1
1 $ xsparse -v -x xhdr GNUSparseFile.28124/sparsefile
1 Reading extended header file
1 Found variable GNU.sparse.size = 217481216
1 Found variable GNU.sparse.numblocks = 208
1 Found variable GNU.sparse.name = sparsefile
1 Found variable GNU.sparse.map = 0,2048,1050624,2048,...
1 Expanding file 'GNUSparseFile.28124/sparsefile' to 'sparsefile'
1 Done
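
To inspect an extended header file by hand, the variables can be read with a few lines of Python. The sketch below assumes that the data starts at the first record and that records have the layout defined by the POSIX pax format, 'LEN KEY=VALUE' followed by a newline, where LEN is the decimal length of the whole record including the LEN field itself. This layout comes from the pax format rather than from the text above, and 'parse_pax_records' is a hypothetical helper, not part of 'xsparse'.

```python
def parse_pax_records(data: bytes) -> dict:
    """Parse pax extended header records into a dict of variables."""
    records = {}
    pos = 0
    # Stop at the end of the data or at the NUL padding of the last block.
    while pos < len(data) and data[pos:pos + 1].isdigit():
        space = data.index(b" ", pos)
        length = int(data[pos:space])
        if length <= space - pos + 1:
            raise ValueError("malformed record length at offset %d" % pos)
        # The record body runs from after the space to the end of the record.
        record = data[space + 1:pos + length]
        key, _, value = record.rstrip(b"\n").partition(b"=")
        records[key.decode()] = value.decode()
        pos += length
    return records
```

If the file begins with anything other than a record, for example a header block copied along with the data, the function returns an empty dictionary rather than guessing where the records start.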
1
1 ---------- Footnotes ----------
1
1 (1) ⇒PAX 1.
1
1 (2) Technically speaking, N is the "process ID" of the 'tar' process
1 which created the archive (⇒PAX keywords).
1