tar: Blocking Factor
1
1 9.4.2 The Blocking Factor of an Archive
1 ---------------------------------------
1
1 The data in an archive is grouped into blocks, which are 512 bytes.
1 Blocks are read and written in whole number multiples called "records".
1 The number of blocks in a record (i.e., the size of a record in units of
1 512 bytes) is called the "blocking factor". The
1 '--blocking-factor=512-SIZE' ('-b 512-SIZE') option specifies the
1 blocking factor of an archive. The default blocking factor is typically
1 20 (i.e., 10240 bytes), but can be specified at installation. To find
1 out the blocking factor of an existing archive, use 'tar --list
1 --file=ARCHIVE-NAME'. This may not work on some devices.
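The relation between a blocking factor and a record size is plain multiplication by 512. A minimal sketch of the conversion in both directions follows; the function names are made up for this illustration and are not part of 'tar'.

```python
BLOCK_SIZE = 512  # bytes per tar block, as stated above


def record_size(blocking_factor: int) -> int:
    """Return the record size in bytes for a given blocking factor."""
    if blocking_factor < 1:
        raise ValueError("blocking factor must be a positive integer")
    return blocking_factor * BLOCK_SIZE


def blocking_factor_for(record_bytes: int) -> int:
    """Return the blocking factor that yields a record of record_bytes."""
    if record_bytes <= 0 or record_bytes % BLOCK_SIZE != 0:
        raise ValueError("record size must be a positive multiple of 512")
    return record_bytes // BLOCK_SIZE
```

For the typical default, 'record_size(20)' gives 10240, and a 64K record corresponds to 'blocking_factor_for(65536)', that is, 128.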
1
1 Records are separated by gaps, which waste space on the archive
1 media. If you are archiving on magnetic tape, using a larger blocking
1 factor (and therefore larger records) provides faster throughput and
1 allows you to fit more data on a tape (because there are fewer gaps).
1 If you are archiving on cartridge, a very large blocking factor (say 126
1 or more) greatly increases performance. A smaller blocking factor, on
1 the other hand, may be useful when archiving small files, to avoid
1 archiving lots of nulls as 'tar' fills out the archive to the end of the
1 record. In general, the ideal record size depends on the size of the
1 inter-record gaps on the tape you are using, and the average size of the
1 files you are archiving. ⇒create, for information on writing
1 archives.
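The cost of padding for small archives can be estimated with a little arithmetic. The sketch below assumes a deliberately simple layout: one 512-byte header per member, member data rounded up to whole blocks, two zero blocks as the end-of-archive marker, and no extended headers. The helper names are hypothetical, and real archives may differ in detail.

```python
BLOCK_SIZE = 512  # bytes per tar block


def round_up(n: int, multiple: int) -> int:
    """Round n up to the next whole multiple of `multiple`."""
    return -(-n // multiple) * multiple


def padded_archive_size(member_sizes, blocking_factor: int) -> int:
    """Estimate the on-media size of an archive, padded to a full record.

    Assumes one header block per member and a two-block end marker.
    """
    record = blocking_factor * BLOCK_SIZE
    payload = sum(BLOCK_SIZE + round_up(size, BLOCK_SIZE)
                  for size in member_sizes)
    payload += 2 * BLOCK_SIZE  # end-of-archive marker (two zero blocks)
    return round_up(payload, record)


def padding_bytes(member_sizes, blocking_factor: int) -> int:
    """Bytes of nulls added only to fill out the last record."""
    unpadded = padded_archive_size(member_sizes, 1)
    return padded_archive_size(member_sizes, blocking_factor) - unpadded
```

Under these assumptions a single 100-byte file needs 2048 bytes before padding; with a blocking factor of 20 the archive grows to 10240 bytes, and with 126 it grows to 64512 bytes.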
1
1 Archives with blocking factors larger than 20 cannot be read by very
1 old versions of 'tar', or by some newer versions of 'tar' running on old
1 machines with small address spaces. With GNU 'tar', the blocking factor
1 of an archive is limited only by the maximum record size of the device
1 containing the archive, or by the amount of available virtual memory.
1
1 Also, on some systems, not using adequate blocking factors, as
1 sometimes imposed by the device drivers, may yield unexpected
1 diagnostics. For example, this has been reported:
1
1 Cannot write to /dev/dlt: Invalid argument
1
1 In such cases, it sometimes happens that the 'tar' bundled by the system
1 is aware of block size idiosyncrasies, while GNU 'tar' requires an
1 explicit specification for the block size, which it cannot guess. This
1 leads some people to conclude that GNU 'tar' is misbehaving, because by
1 comparison, "the bundled 'tar' works OK". Adding '-b 256', for example,
1 might resolve the problem.
1
1 If you use a non-default blocking factor when you create an archive,
1 you must specify the same blocking factor when you modify that archive.
1 Some archive devices will also require you to specify the blocking
1 factor when reading that archive; however, this is not typically the
1 case. Usually, you can use '--list' ('-t') without specifying a
1 blocking factor--'tar' reports a non-default record size and then lists
1 the archive members as it would normally. To extract files from an
1 archive with a non-standard blocking factor (particularly if you're not
1 sure what the blocking factor is), you can usually use the
1 '--read-full-records' ('-B') option while specifying a blocking factor
1 larger than the blocking factor of the archive (e.g., 'tar --extract
1 --read-full-records --blocking-factor=300'). ⇒list, for more
1 information on the '--list' ('-t') operation. ⇒Reading, for a
1 more detailed explanation of that option.
1
1 '--blocking-factor=NUMBER'
1 '-b NUMBER'
1 Specifies the blocking factor of an archive. Can be used with any
1 operation, but is usually not necessary with '--list' ('-t').
1
1 Device blocking
1
1 '-b BLOCKS'
1 '--blocking-factor=BLOCKS'
1 Set record size to BLOCKS*512 bytes.
1
1 This option is used to specify a "blocking factor" for the archive.
1 When reading or writing the archive, 'tar' will do reads and
1 writes of the archive in records of BLOCKS*512 bytes. This is true
1 even when the archive is compressed. Some devices require that
1 all write operations be a multiple of a certain size, and so 'tar'
1 pads the archive out to the next record boundary.
1
1 The default blocking factor is set when 'tar' is compiled, and is
1 typically 20. Blocking factors larger than 20 cannot be read by
1 very old versions of 'tar', or by some newer versions of 'tar'
1 running on old machines with small address spaces.
1
1 With a magnetic tape, larger records give faster throughput and fit
1 more data on a tape (because there are fewer inter-record gaps).
1 If the archive is in a disk file or a pipe, you may want to specify
1 a smaller blocking factor, since a large one will result in a large
1 number of null bytes at the end of the archive.
1
1 When writing cartridge or other streaming tapes, a much larger
1 blocking factor (say 126 or more) will greatly increase
1 performance. However, you must specify the same blocking factor
1 when reading or updating the archive.
1
1 Apparently, Exabyte drives have a physical block size of 8K bytes.
1 If we choose our block size as a multiple of 8K bytes, then the
1 problem seems to disappear. That is, we are using a block size of 112
1 right now, and we haven't had the problem since we switched...
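Whether a blocking factor produces records that are a whole multiple of a device's physical block size is simple to check. A minimal sketch, assuming the physical block size is known in bytes and reading the 112 above as a blocking factor (both are assumptions, as is the function name):

```python
BLOCK_SIZE = 512  # bytes per tar block


def is_record_aligned(blocking_factor: int, device_block_bytes: int) -> bool:
    """True if a tar record is a whole multiple of the device block size."""
    if blocking_factor < 1 or device_block_bytes < 1:
        raise ValueError("both arguments must be positive")
    return (blocking_factor * BLOCK_SIZE) % device_block_bytes == 0
```

With an 8K (8192-byte) physical block, a blocking factor of 112 gives 57344-byte records, exactly seven physical blocks, while the default of 20 gives 10240 bytes, which is not a multiple of 8192.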
1
1 With GNU 'tar' the blocking factor is limited only by the maximum
1 record size of the device containing the archive, or by the amount
1 of available virtual memory.
1
1 However, deblocking or reblocking is virtually avoided in a special
1 case which often occurs in practice, but which requires all the
1 following conditions to be simultaneously true:
1 * the archive is subject to a compression option,
1 * the archive is not handled through standard input or output,
1 nor redirected nor piped,
1 * the archive is directly handled to a local disk, instead of
1 any special device,
1 * '--blocking-factor' is not explicitly specified on the 'tar'
1 invocation.
1
1 If the output goes directly to a local disk, and not through
1 stdout, then the last write is not extended to a full record size.
1 Otherwise, reblocking occurs. Here are a few other remarks on this
1 topic:
1
1 * 'gzip' will complain about trailing garbage if asked to
1 uncompress a compressed archive on tape. There is an option to
1 turn the message off, but it breaks the regularity of simply
1 having to use 'PROG -d' for decompression. It would be nice
1 if 'gzip' silently ignored any number of trailing zeros.
1 I'll ask Jean-loup Gailly, by sending a copy of this message
1 to him.
1
1 * 'compress' does not show this problem, but as Jean-loup
1 pointed out to Michael, 'compress -d' silently adds garbage
1 after the result of decompression, which tar ignores because
1 it already recognized its end-of-file indicator. So this bug
1 may be safely ignored.
1
1 * 'gzip -d -q' will be silent about the trailing zeros indeed,
1 but will still return an exit status of 2 which tar reports in
1 turn. 'tar' might ignore the exit status returned, but I hate
1 doing that, as it weakens the protection 'tar' offers users
1 against other possible problems at decompression time. If
1 'gzip' was silently skipping trailing zeros _and_ also
1 avoiding setting the exit status in this innocuous case, that
1 would solve this situation.
1
1 * 'tar' should become more robust and not stop reading a pipe
1 at the first null block encountered. Stopping there inelegantly
1 breaks the pipe. 'tar' should rather drain the pipe out before
1 exiting itself.
1
1 '-i'
1 '--ignore-zeros'
1 Ignore blocks of zeros in archive (means EOF).
1
1 The '--ignore-zeros' ('-i') option causes 'tar' to ignore blocks of
1 zeros in the archive. Normally a block of zeros indicates the end
1 of the archive, but when reading a damaged archive, or one which
1 was created by concatenating several archives together, this option
1 allows 'tar' to read the entire archive. This option is not on by
1 default because many versions of 'tar' write garbage after the
1 zeroed blocks.
1
1 Note that this option causes 'tar' to read to the end of the
1 archive file, which may sometimes avoid problems when multiple
1 files are stored on a single physical tape.
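The effect on concatenated archives can be tried without a tape drive. The sketch below uses Python's standard 'tarfile' module rather than GNU 'tar' itself; its 'ignore_zeros' argument plays a role analogous to '--ignore-zeros'. The member names and helper functions are made up for the illustration.

```python
import io
import tarfile


def make_archive(name: str, payload: bytes) -> bytes:
    """Build a one-member, uncompressed tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_members(data: bytes, ignore_zeros: bool):
    """List member names, optionally reading past blocks of zeros."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:",
                      ignore_zeros=ignore_zeros) as tf:
        return tf.getnames()


# Two complete archives, each with its own end-of-archive zero blocks,
# joined back to back as 'cat a.tar b.tar' would join them.
combined = make_archive("a.txt", b"first") + make_archive("b.txt", b"second")

stops_early = list_members(combined, ignore_zeros=False)
reads_all = list_members(combined, ignore_zeros=True)
```

Without the flag, reading stops at the zero blocks that end the first archive, so only 'a.txt' is listed; with it, both members are found.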
1
1 '-B'
1 '--read-full-records'
1 Reblock as we read (for reading 4.2BSD pipes).
1
1 If '--read-full-records' is used, 'tar' will not panic if an
1 attempt to read a record from the archive does not return a full
1 record. Instead, 'tar' will keep reading until it has obtained a
1 full record.
1
1 This option is turned on by default when 'tar' is reading an
1 archive from standard input, or from a remote machine. This is
1 because on BSD Unix systems, a read of a pipe will return however
1 much happens to be in the pipe, even if it is less than 'tar'
1 requested. If this option was not used, 'tar' would fail as soon
1 as it read an incomplete record from the pipe.
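The idea of reading until a full record has been obtained can be sketched as a small loop. This is only an illustration of the technique, not the code 'tar' uses; 'ShortReader' is a hypothetical stand-in for a pipe that returns fewer bytes than requested.

```python
import io


def read_full_record(read, record_size: int) -> bytes:
    """Call read(n) repeatedly until record_size bytes arrive or input ends."""
    chunks = []
    remaining = record_size
    while remaining > 0:
        chunk = read(remaining)
        if not chunk:  # end of input: return whatever was collected
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


class ShortReader:
    """Simulates a pipe that hands back at most max_chunk bytes per read."""

    def __init__(self, data: bytes, max_chunk: int):
        self._stream = io.BytesIO(data)
        self._max_chunk = max_chunk

    def read(self, n: int) -> bytes:
        return self._stream.read(min(n, self._max_chunk))


record = 20 * 512  # default record size, 10240 bytes
single_read = ShortReader(bytes(record), 4096).read(record)
full_read = read_full_record(ShortReader(bytes(record), 4096).read, record)
```

A single read of the simulated pipe returns only 4096 bytes, which would look like an incomplete record; the loop keeps reading and assembles all 10240 bytes.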
1
1 This option is also useful with the commands for updating an
1 archive.
1
1 Tape blocking
1
1 When handling various tapes or cartridges, you have to take care of
1 selecting a proper blocking, that is, the number of disk blocks you put
1 together as a single tape block on the tape, without intervening tape
1 gaps. A "tape gap" is a small landing area on the tape with no
1 information on it, used for decelerating the tape to a full stop, and
1 for later regaining the reading or writing speed. When the tape driver
1 starts reading a record, the record has to be read whole without
1 stopping, as a tape gap is needed to stop the tape motion without losing
1 information.
1
1 Using higher blocking (putting more disk blocks per tape block) will
1 use the tape more efficiently as there will be fewer tape gaps. But
1 reading such tapes may be more difficult for the system, as more memory
1 will be required to receive at once the whole record. Further, if there
1 is a reading error on a huge record, it is less likely that the system
1 will succeed in recovering the information. So, blocking should not be
1 too low, nor should it be too high. 'tar' uses by default a blocking of
1 20 for historical reasons, and it does not really matter when reading or
1 writing to disk. Current tape technology would easily accommodate
1 higher blockings. Sun recommends a blocking of 126 for Exabytes and 96
1 for DATs. We were told that for some DLT drives, the blocking should be
1 a multiple of 4Kb, preferably 64Kb ('-b 128') or 256 for decent
1 performance. Other manufacturers may use different recommendations for
1 the same tapes. This might also depend on the buffering techniques
1 used inside modern tape controllers. Some impose a minimum blocking,
1 or a maximum blocking. Others request blocking to be some power of
1 two.
1
1 So, there is no fixed rule for blocking. But blocking at read time
1 should ideally be the same as blocking used at write time. At one place
1 I know, with a wide variety of equipment, they found it best to use a
1 blocking of 32 to guarantee that their tapes are fully interchangeable.
1
1 I was also told that, for recycled tapes, prior erasure (by the same
1 drive unit that will be used to create the archives) sometimes lowers
1 the error rates observed at rewriting time.
1
1 I might also use '--number-blocks' instead of '--block-number', so
1 '--block' will then expand to '--blocking-factor' unambiguously.
1