tar: Blocking Factor

1 
1 9.4.2 The Blocking Factor of an Archive
1 ---------------------------------------
1 
1    The data in an archive is grouped into blocks of 512 bytes each.
1 Blocks are read and written in whole number multiples called "records".
1 The number of blocks in a record (i.e., the size of a record in units of
1 512 bytes) is called the "blocking factor".  The
1 '--blocking-factor=512-SIZE' ('-b 512-SIZE') option specifies the
1 blocking factor of an archive.  The default blocking factor is typically
1 20 (i.e., 10240 bytes), but can be specified at installation.  To find
1 out the blocking factor of an existing archive, use 'tar --list
1 --file=ARCHIVE-NAME'.  This may not work on some devices.
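The relationship between blocking factor and record size is plain
arithmetic.  As a minimal sketch (the function name 'record_bytes' is
made up for illustration and is not part of 'tar'):

```shell
# Hypothetical helper: print the record size in bytes for a blocking factor.
# A block is 512 bytes, and a record holds "blocking factor" blocks.
record_bytes() {
    echo $(( $1 * 512 ))
}

record_bytes 20     # the typical default blocking factor: 10240 bytes
record_bytes 126    # a larger factor: 64512 bytes
```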
1 
1    Records are separated by gaps, which waste space on the archive
1 media.  If you are archiving on magnetic tape, using a larger blocking
1 factor (and therefore larger records) provides faster throughput and
1 allows you to fit more data on a tape (because there are fewer gaps).
1 If you are archiving on cartridge, a very large blocking factor (say 126
1 or more) greatly increases performance.  A smaller blocking factor, on
1 the other hand, may be useful when archiving small files, to avoid
1 archiving lots of nulls as 'tar' fills out the archive to the end of the
1 record.  In general, the ideal record size depends on the size of the
1 inter-record gaps on the tape you are using, and the average size of the
1 files you are archiving.  ⇒create, for information on writing
1 archives.
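To get a feel for how much padding a given blocking factor adds, one can
round the unpadded size up to the next record boundary.  The sketch
below is only an illustration: the helper name 'padded_bytes' and the
2048-byte figure are assumptions chosen for the example.

```shell
# Hypothetical helper: round an unpadded archive size up to a whole record.
# $1 = unpadded size in bytes, $2 = blocking factor.
padded_bytes() {
    record=$(( $2 * 512 ))
    echo $(( ($1 + record - 1) / record * record ))
}

padded_bytes 2048 20   # 10240: 8192 bytes of nulls are added
padded_bytes 2048 4    # 2048: already a whole record, no padding
```

With small archives, the smaller blocking factor clearly wastes less
space; with large archives the difference becomes negligible.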
1 
1    Archives with blocking factors larger than 20 cannot be read by very
1 old versions of 'tar', or by some newer versions of 'tar' running on old
1 machines with small address spaces.  With GNU 'tar', the blocking factor
1 of an archive is limited only by the maximum record size of the device
1 containing the archive, or by the amount of available virtual memory.
1 
1    Also, on some systems, the device driver imposes constraints on the
1 blocking factor, and using an unsuitable one may yield unexpected
1 diagnostics.  For example, this has been reported:
1 
1      Cannot write to /dev/dlt: Invalid argument
1 
1 In such cases, it sometimes happens that the 'tar' bundled with the
1 system is aware of block size idiosyncrasies, while GNU 'tar' requires
1 an explicit specification of the block size, which it cannot guess.
1 This leads some people to conclude that GNU 'tar' is misbehaving
1 because, by comparison, 'the bundled 'tar' works OK'.  Adding '-b 256',
1 for example, might resolve the problem.
1 
1    If you use a non-default blocking factor when you create an archive,
1 you must specify the same blocking factor when you modify that archive.
1 Some archive devices will also require you to specify the blocking
1 factor when reading that archive; however, this is not typically the
1 case.  Usually, you can use '--list' ('-t') without specifying a
1 blocking factor--'tar' reports a non-default record size and then lists
1 the archive members as it would normally.  To extract files from an
1 archive with a non-standard blocking factor (particularly if you're not
1 sure what the blocking factor is), you can usually use the
1 '--read-full-records' ('-B') option while specifying a blocking factor
1 larger than the blocking factor of the archive (e.g., 'tar --extract
1 --read-full-records --blocking-factor=300').  ⇒list, for more
1 information on the '--list' ('-t') operation.  ⇒Reading, for a
1 more detailed explanation of the '--read-full-records' option.
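The following sketch walks through that sequence on an ordinary disk
file.  It assumes that GNU 'tar' is the 'tar' found on your PATH; the
directory and file names are made up for the example.

```shell
# Assumes GNU tar is the 'tar' on PATH; all file names are illustrative.
work=$(mktemp -d)
echo 'hello' > "$work/hello.txt"

# Create an archive with a non-default blocking factor of 40.
tar --create --blocking-factor=40 --file="$work/demo.tar" \
    --directory="$work" hello.txt

# Listing usually works without specifying a blocking factor.
tar --list --file="$work/demo.tar"

# Extract without knowing the exact factor: request full records and
# give a blocking factor larger than the one used at creation time.
mkdir "$work/out"
tar --extract --read-full-records --blocking-factor=300 \
    --file="$work/demo.tar" --directory="$work/out"
cat "$work/out/hello.txt"
```

On a real tape device the behavior may differ, since the driver may
impose its own constraints on record sizes.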
1 
1 '--blocking-factor=NUMBER'
1 '-b NUMBER'
1      Specifies the blocking factor of an archive.  Can be used with any
1      operation, but is usually not necessary with '--list' ('-t').
1 
1    Device blocking
1 
1 '-b BLOCKS'
1 '--blocking-factor=BLOCKS'
1      Set record size to BLOCKS*512 bytes.
1 
1      This option is used to specify a "blocking factor" for the archive.
1      When reading or writing the archive, 'tar' will do reads and
1      writes of the archive in records of BLOCKS*512 bytes.  This is true
1      even when the archive is compressed.  Some devices require that
1      all write operations be a multiple of a certain size, and so 'tar'
1      pads the archive out to the next record boundary.
1 
1      The default blocking factor is set when 'tar' is compiled, and is
1      typically 20.  Blocking factors larger than 20 cannot be read by
1      very old versions of 'tar', or by some newer versions of 'tar'
1      running on old machines with small address spaces.
1 
1      With a magnetic tape, larger records give faster throughput and fit
1      more data on a tape (because there are fewer inter-record gaps).
1      If the archive is in a disk file or a pipe, you may want to specify
1      a smaller blocking factor, since a large one will result in a large
1      number of null bytes at the end of the archive.
1 
1      When writing cartridge or other streaming tapes, a much larger
1      blocking factor (say 126 or more) will greatly increase
1      performance.  However, you must specify the same blocking factor
1      when reading or updating the archive.
1 
1      Apparently, Exabyte drives have a physical block size of 8K bytes.
1      If we choose our block size as a multiple of 8K bytes, then the
1      problem seems to disappear.  That is, we are using a blocking
1      factor of 112 right now (56K bytes, or 7 times 8K), and we haven't
1      had the problem since we switched...
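Whether a blocking factor yields records that are a whole multiple of a
device's physical block size can be checked with a little arithmetic.
This is only a sketch: the helper name 'fits_physical_block' is made up,
and the 8192-byte figure is simply the 8K bytes mentioned in the report
above.

```shell
# Hypothetical helper: succeed if records of ($1 * 512) bytes are a whole
# multiple of a physical block size of $2 bytes.
fits_physical_block() {
    [ $(( $1 * 512 % $2 )) -eq 0 ]
}

for factor in 20 112 128; do
    if fits_physical_block "$factor" 8192; then
        echo "$factor: $(( factor * 512 )) bytes, a multiple of 8192"
    else
        echo "$factor: $(( factor * 512 )) bytes, not a multiple of 8192"
    fi
done
```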
1 
1      With GNU 'tar' the blocking factor is limited only by the maximum
1      record size of the device containing the archive, or by the amount
1      of available virtual memory.
1 
1      However, deblocking or reblocking is virtually avoided in a special
1      case which often occurs in practice, but which requires all the
1      following conditions to be simultaneously true:
1         * the archive is subject to a compression option,
1         * the archive is not handled through standard input or output,
1           nor redirected nor piped,
1         * the archive is handled directly on a local disk, rather than
1           on any special device,
1         * '--blocking-factor' is not explicitly specified on the 'tar'
1           invocation.
1 
1      If the output goes directly to a local disk, and not through
1      stdout, then the last write is not extended to a full record size.
1      Otherwise, reblocking occurs.  Here are a few other remarks on this
1      topic:
1 
1         * 'gzip' will complain about trailing garbage if asked to
1           uncompress a compressed archive on tape.  There is an option
1           to turn the message off, but it breaks the regularity of
1           simply having to use 'PROG -d' for decompression.  It would be
1           nice if 'gzip' silently ignored any number of trailing zeros.
1           I'll ask Jean-loup Gailly, by sending a copy of this message
1           to him.
1 
1         * 'compress' does not show this problem, but as Jean-loup
1           pointed out to Michael, 'compress -d' silently adds garbage
1           after the result of decompression, which tar ignores because
1           it already recognized its end-of-file indicator.  So this bug
1           may be safely ignored.
1 
1         * 'gzip -d -q' will indeed be silent about the trailing zeros,
1           but will still return an exit status of 2, which 'tar' reports
1           in turn.  'tar' might ignore the exit status returned, but I
1           hate doing that, as it weakens the protection 'tar' offers
1           users against other possible problems at decompression time.
1           If 'gzip' silently skipped trailing zeros _and_ also avoided
1           setting the exit status in this innocuous case, that would
1           solve this situation.
1 
1         * 'tar' should become more robust and not stop reading a pipe
1           at the first null block encountered.  This inelegantly breaks
1           the pipe.  'tar' should rather drain the pipe before exiting.
1 
1 '-i'
1 '--ignore-zeros'
1      Ignore blocks of zeros in the archive (which normally mean EOF).
1 
1      The '--ignore-zeros' ('-i') option causes 'tar' to ignore blocks of
1      zeros in the archive.  Normally a block of zeros indicates the end
1      of the archive, but when reading a damaged archive, or one which
1      was created by concatenating several archives together, this option
1      allows 'tar' to read the entire archive.  This option is not on by
1      default because many versions of 'tar' write garbage after the
1      zeroed blocks.
1 
1      Note that this option causes 'tar' to read to the end of the
1      archive file, which may sometimes avoid problems when multiple
1      files are stored on a single physical tape.
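The concatenation case can be tried on ordinary disk files.  The sketch
below assumes that GNU 'tar' is the 'tar' found on your PATH; all file
names are made up for the example.

```shell
# Assumes GNU tar is the 'tar' on PATH; all file names are illustrative.
work=$(mktemp -d)
echo 'first' > "$work/a.txt"
echo 'second' > "$work/b.txt"
tar --create --file="$work/a.tar" --directory="$work" a.txt
tar --create --file="$work/b.tar" --directory="$work" b.txt

# Concatenating leaves the first archive's zero blocks in the middle.
cat "$work/a.tar" "$work/b.tar" > "$work/both.tar"

# Without the option, listing stops at the first block of zeros.
tar --list --file="$work/both.tar"

# With it, 'tar' skips the zero blocks and reaches the second archive.
members=$(tar --list --ignore-zeros --file="$work/both.tar")
echo "$members"
```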
1 
1 '-B'
1 '--read-full-records'
1      Reblock as we read (for reading 4.2BSD pipes).
1 
1      If '--read-full-records' is used, 'tar' will not panic if an
1      attempt to read a record from the archive does not return a full
1      record.  Instead, 'tar' will keep reading until it has obtained a
1      full record.
1 
1      This option is turned on by default when 'tar' is reading an
1      archive from standard input, or from a remote machine.  This is
1      because on BSD Unix systems, a read of a pipe will return however
1      much happens to be in the pipe, even if it is less than 'tar'
1      requested.  If this option were not used, 'tar' would fail as soon
1      as it read an incomplete record from the pipe.
1 
1      This option is also useful with the commands for updating an
1      archive.
1 
1    Tape blocking
1 
1    When handling various tapes or cartridges, you have to take care of
1 selecting a proper blocking, that is, the number of disk blocks you put
1 together as a single tape block on the tape, without intervening tape
1 gaps.  A "tape gap" is a small landing area on the tape with no
1 information on it, used for decelerating the tape to a full stop, and
1 for later regaining the reading or writing speed.  When the tape driver
1 starts reading a record, the record has to be read whole without
1 stopping, as a tape gap is needed to stop the tape motion without losing
1 information.
1 
1    Using higher blocking (putting more disk blocks per tape block) will
1 use the tape more efficiently as there will be fewer tape gaps.  But
1 reading such tapes may be more difficult for the system, as more memory
1 will be required to receive the whole record at once.  Further, if
1 there is a reading error on a huge record, it is less likely that the
1 system will succeed in recovering the information.  So, blocking should
1 be neither too low nor too high.  'tar' uses by default a blocking of
1 20 for historical reasons, and it does not really matter when reading or
1 writing to disk.  Current tape technology would easily accommodate
1 higher blockings.  Sun recommends a blocking of 126 for Exabytes and 96
1 for DATs.  We were told that for some DLT drives, the blocking should be
1 a multiple of 4Kb, preferably 64Kb ('-b 128') or 256 for decent
1 performance.  Other manufacturers may use different recommendations for
1 the same tapes.  This might also depend on the buffering techniques
1 used inside modern tape controllers.  Some impose a minimum blocking,
1 or a maximum blocking.  Others require blocking to be some power of
1 two.
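Converting a recommended record size into a blocking factor, or checking
a factor against a power-of-two requirement, is again simple arithmetic.
In this sketch the helper names 'factor_for_kib' and 'is_power_of_two'
are made up, and sizes are taken in units of 1024 bytes, which agrees
with the pairing of 64Kb and '-b 128' above.

```shell
# Hypothetical helper: blocking factor that gives records of $1 * 1024 bytes.
factor_for_kib() {
    echo $(( $1 * 1024 / 512 ))
}

# Hypothetical helper: succeed if the blocking factor is a power of two.
is_power_of_two() {
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

factor_for_kib 64        # 128, matching '-b 128'
for factor in 96 126 128 256; do
    if is_power_of_two "$factor"; then
        echo "$factor is a power of two"
    else
        echo "$factor is not a power of two"
    fi
done
```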
1 
1    So, there is no fixed rule for blocking.  But blocking at read time
1 should ideally be the same as blocking used at write time.  At one place
1 I know, with a wide variety of equipment, they found it best to use a
1 blocking of 32 to guarantee that their tapes are fully interchangeable.
1 
1    I was also told that, for recycled tapes, prior erasure (by the same
1 drive unit that will be used to create the archives) sometimes lowers
1 the error rates observed at rewriting time.
1 
1    I might also use '--number-blocks' instead of '--block-number', so
1 '--block' will then expand to '--blocking-factor' unambiguously.
1