tar: Sparse Recovery

1 
1 8.3.10.2 Extracting Sparse Members
1 ..................................
1 
1 Any 'tar' implementation will be able to extract sparse members from a
1 PAX archive.  However, the extracted files will be "condensed", i.e.,
1 any zero blocks will be removed from them.  When we restore such a
1 condensed file to its original form, by adding zero blocks (or "holes")
1 back to their original locations, we call this process "expanding" a
1 condensed sparse file.
1 
1    To expand a file, you will need a simple auxiliary program called
1 'xsparse'.  It is available in source form from the GNU 'tar' home page
1 (http://www.gnu.org/software/tar/utils/xsparse.html).
1 
1    Let's begin with archive members in "sparse format version 1.0"(1),
1 which are the easiest to expand.  The condensed file will contain both
1 the file map and the file data, so no additional data will be needed to
1 restore it.  If the original file name was 'DIR/NAME', then the
1 condensed file will be named 'DIR/GNUSparseFile.N/NAME', where N is a
1 decimal number(2).
1 
1    To expand a version 1.0 file, run 'xsparse' as follows:
1 
1      $ xsparse cond-file
1 
1 where 'cond-file' is the name of the condensed file.  The utility will
1 deduce the name for the resulting expanded file using the following
1 algorithm:
1 
1   1. If 'cond-file' does not contain any directories, the output file
1      name will be '../cond-file'.
1 
1   2. If 'cond-file' has the form 'DIR/T/NAME', where both T and NAME are
1      simple names, with no '/' characters in them, the output file name
1      will be 'DIR/NAME'.
1 
1   3. Otherwise, if 'cond-file' has the form 'DIR/NAME', the output file
1      name will be 'NAME'.
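
The naming rules above can be sketched as a small shell function.  This
is only an illustration of the documented rules, not the code that
'xsparse' itself uses; the function name 'guess_out_name' is
hypothetical.

```shell
# guess_out_name COND-FILE: print the output name that the documented
# rules would produce.  Hypothetical helper, not part of xsparse.
guess_out_name() {
    cond=$1
    case $cond in
        */*/*)                    # rule 2: DIR/T/NAME -> DIR/NAME
            name=${cond##*/}      # NAME: everything after the last '/'
            rest=${cond%/*}       # DIR/T
            dir=${rest%/*}        # DIR
            printf '%s/%s\n' "$dir" "$name" ;;
        */*)                      # rule 3: DIR/NAME -> NAME
            printf '%s\n' "${cond##*/}" ;;
        *)                        # rule 1: no directory part
            printf '../%s\n' "$cond" ;;
    esac
}
```

With the path used in the examples of this section,
'guess_out_name /home/gray/GNUSparseFile.6058/sparsefile' prints
'/home/gray/sparsefile'.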
1 
1    In the unlikely case when this algorithm does not suit your needs,
1 you can explicitly specify the output file name as a second argument to
1 the command:
1 
1      $ xsparse cond-file out-file
1 
1    It is often a good idea to run 'xsparse' in "dry run" mode first.  In
1 this mode, the command does not actually expand the file, but verbosely
1 lists all the actions it would take to do so.  Dry run mode is enabled
1 by the '-n' command line option:
1 
1      $ xsparse -n /home/gray/GNUSparseFile.6058/sparsefile
1      Reading v.1.0 sparse map
1      Expanding file '/home/gray/GNUSparseFile.6058/sparsefile' to
1      '/home/gray/sparsefile'
1      Finished dry run
1 
1    To actually expand the file, you would run:
1 
1      $ xsparse /home/gray/GNUSparseFile.6058/sparsefile
1 
1 The program behaves the way most UNIX utilities do: it keeps quiet
1 unless it has something important to tell you (e.g., an error
1 condition).  If you wish it to produce verbose output, similar to that
1 from the dry run mode, use the '-v' option:
1 
1      $ xsparse -v /home/gray/GNUSparseFile.6058/sparsefile
1      Reading v.1.0 sparse map
1      Expanding file '/home/gray/GNUSparseFile.6058/sparsefile' to
1      '/home/gray/sparsefile'
1      Done
1 
1    Additionally, if your 'tar' implementation has extracted the
1 "extended headers" for this file, you can instruct 'xsparse' to use them
1 in order to verify the integrity of the expanded file.  The option '-x'
1 sets the name of the extended header file to use.  Continuing our
1 example:
1 
1      $ xsparse -v -x /home/gray/PaxHeaders.6058/sparsefile \
1        /home/gray/GNUSparseFile.6058/sparsefile
1      Reading extended header file
1      Found variable GNU.sparse.major = 1
1      Found variable GNU.sparse.minor = 0
1      Found variable GNU.sparse.name = sparsefile
1      Found variable GNU.sparse.realsize = 217481216
1      Reading v.1.0 sparse map
1      Expanding file '/home/gray/GNUSparseFile.6058/sparsefile' to
1      '/home/gray/sparsefile'
1      Done
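
Independently of '-x', the size of the expanded file can be compared by
hand with the 'GNU.sparse.realsize' value reported above.  The following
is a minimal sketch of such a check; 'check_size' is a hypothetical
helper, not part of 'xsparse'.

```shell
# check_size FILE EXPECTED: succeed if FILE is exactly EXPECTED bytes
# long.  Hypothetical helper, not part of xsparse.
check_size() {
    # Some wc implementations pad the count with blanks; strip them.
    actual=$(wc -c < "$1" | tr -d '[:space:]')
    [ "$actual" = "$2" ]
}
```

For the example above it could be called as
'check_size /home/gray/sparsefile 217481216'; the exit status is zero
only if the sizes match.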
1 
1    An "extended header" is a special 'tar' archive header that precedes
1 an archive member and contains a set of "variables", describing the
1 member properties that cannot be stored in the standard 'ustar' header.
1 While optional for expanding sparse version 1.0 members, the use of
1 extended headers is mandatory when expanding sparse members in the
1 older sparse formats, v.0.0 and v.0.1.  (The sparse formats are
1 described in detail in ⇒Sparse Formats.)  So, for these formats, the
1 question is: how do you obtain the extended headers from the archive?
1 
1    If you use a 'tar' implementation that does not support the PAX
1 format, the extended headers for each member will be extracted as a
1 separate file.  If we represent the member name as 'DIR/NAME', then the
1 extended header file will be named 'DIR/PaxHeaders.N/NAME', where N is
1 an integer.
1 
1    Things become more difficult if your 'tar' implementation does
1 support PAX headers, because in this case you will have to manually
1 extract the headers.  We recommend the following algorithm:
1 
1   1. Consult the documentation of your 'tar' implementation for an
1      option that prints "block numbers" along with the archive listing
1      (analogous to GNU 'tar''s '-R' option).  For example, 'star' has
1      '-block-number'.
1 
1   2. Obtain a verbose listing using the block number option, and find
1      the block numbers of the sparse member in question and of the
1      member immediately following it.  For example, running 'star' on
1      our archive we obtain:
1 
1           $ star -t -v -block-number -f arc.tar
1           ...
1           star: Unknown extended header keyword 'GNU.sparse.size' ignored.
1           star: Unknown extended header keyword 'GNU.sparse.numblocks' ignored.
1           star: Unknown extended header keyword 'GNU.sparse.name' ignored.
1           star: Unknown extended header keyword 'GNU.sparse.map' ignored.
1           block        56:  425984 -rw-r--r--  gray/users Jun 25 14:46 2006 GNUSparseFile.28124/sparsefile
1           block       897:   65391 -rw-r--r--  gray/users Jun 24 20:06 2006 README
1           ...
1 
1      (as usual, ignore the warnings about unknown keywords.)
1 
1   3. Let SIZE be the size of the sparse member, BS be its block number
1      and BN be the block number of the next member.  Compute:
1 
1           N = BN - BS - SIZE/512 - 2
1 
1      This number gives the size of the extended header part in tar
1      "blocks".  In our example, this formula gives: '897 - 56 - 425984 /
1      512 - 2 = 7'.
1 
1   4. Use 'dd' to extract the headers:
1 
1           dd if=ARCHIVE of=HNAME bs=512 skip=BS count=N
1 
1      where ARCHIVE is the archive name, HNAME is the name of the file to
1      store the extended header in, and BS and N are the values obtained
1      in the previous steps.
1 
1      In our example, this command will be:
1 
1           $ dd if=arc.tar of=xhdr bs=512 skip=56 count=7
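
The arithmetic of step 3 and the 'dd' command of step 4 can be combined
in a short shell sketch.  It follows the worked example ('897 - 56 -
425984 / 512 - 2'); the function name 'xhdr_blocks' is hypothetical, and
the numbers are those from the 'star' listing above.  Shell arithmetic
uses integer division, which is exact here because 425984 is a multiple
of 512.

```shell
# xhdr_blocks SIZE BS BN: number of 512-byte blocks taken by the
# extended header, following the worked example in step 3.
# Hypothetical helper, not part of tar or xsparse.
xhdr_blocks() {
    size=$1   # size of the sparse member, in bytes
    bs=$2     # block number of the sparse member
    bn=$3     # block number of the next member
    echo $(( $bn - $bs - $size / 512 - 2 ))
}

n=$(xhdr_blocks 425984 56 897)
# Print the dd command from step 4 instead of running it.
echo "dd if=arc.tar of=xhdr bs=512 skip=56 count=$n"
```

Printing the command rather than executing it makes it possible to
check the computed values against the listing before touching the
archive.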
1 
1    Finally, you can expand the condensed file, using the obtained
1 header:
1 
1      $ xsparse -v -x xhdr GNUSparseFile.28124/sparsefile
1      Reading extended header file
1      Found variable GNU.sparse.size = 217481216
1      Found variable GNU.sparse.numblocks = 208
1      Found variable GNU.sparse.name = sparsefile
1      Found variable GNU.sparse.map = 0,2048,1050624,2048,...
1      Expanding file 'GNUSparseFile.28124/sparsefile' to 'sparsefile'
1      Done
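
The 'GNU.sparse.map' value shown above is a comma-separated list of
offset,size pairs describing the data chunks of the original file (see
⇒Sparse Formats).  The following sketch prints such a map one pair per
line, which can make a long map easier to inspect.  'print_map' is a
hypothetical helper, and it assumes that it is given a complete map; the
map in the output above is elided, so only its first two visible pairs
can serve as sample input.

```shell
# print_map MAP: print one "offset<TAB>size" pair per line.
# Hypothetical helper; assumes MAP is a complete comma-separated
# list of offset,size pairs.
print_map() {
    printf '%s\n' "$1" | tr ',' '\n' | paste - -
}
```

For instance, 'print_map 0,2048,1050624,2048' prints two lines, one for
the chunk at offset 0 and one for the chunk at offset 1050624, each
followed by its size of 2048 bytes.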
1 
1    ---------- Footnotes ----------
1 
1    (1) ⇒PAX 1.
1 
1    (2) Technically speaking, N is the "process ID" of the 'tar' process
1 that created the archive (⇒PAX keywords).
1