tar: Blocking

1 
1 9.4 Blocking
1 ============
1 
1 "Block" and "record" terminology is rather confused, and it is also
1 confusing to the expert reader.  On the other hand, readers who are new
1 to the field have a fresh mind, and they may safely skip the next two
1 paragraphs, as the remainder of this manual uses those two terms in a
1 quite consistent way.
1 
1    John Gilmore, the writer of the public domain 'tar' from which GNU
1 'tar' was originally derived, wrote (June 1995):
1 
1      The nomenclature of tape drives comes from IBM, where I believe
1      they were invented for the IBM 650 or so.  On IBM mainframes, what
1      is recorded on tape are tape blocks.  The logical organization of
1      data is into records.  There are various ways of putting records
1      into blocks, including 'F' (fixed sized records), 'V' (variable
1      sized records), 'FB' (fixed blocked: fixed size records, N to a
1      block), 'VB' (variable size records, N to a block), 'VSB' (variable
1      spanned blocked: variable sized records that can occupy more than
1      one block), etc.  The 'JCL' 'DD RECFORM=' parameter specified this
1      to the operating system.
1 
1      The Unix man page on 'tar' was totally confused about this.  When I
1      wrote 'PD TAR', I used the historically correct terminology ('tar'
1      writes data records, which are grouped into blocks).  It appears
1      that the bogus terminology made it into POSIX (no surprise here),
1      and now François has migrated that terminology back into the
1      source code too.
1 
1    The term "physical block" means the basic transfer chunk from or to a
1 device, after which reading or writing may stop without anything being
1 lost.  In this manual, the term "block" usually refers to a disk
1 physical block, _assuming_ that each disk block is 512 bytes in length.
1 It is true that some disk devices have different physical blocks, but
1 'tar' ignores these differences in its own format, which is meant to be
1 portable, so a 'tar' block is always 512 bytes in length, and "block"
1 always means a 'tar' block.  The term "logical block" often represents
1 the basic chunk of allocation of many disk blocks as a single entity,
1 which the operating system treats somewhat atomically; this concept is
1 only barely used in GNU 'tar'.
1 
1    The term "physical record" is another way to speak of a physical
1 block; those two terms are somewhat interchangeable.  In this manual,
1 the term "record" usually refers to a tape physical block, _assuming_
1 that the 'tar' archive is kept on magnetic tape.  It is true that
1 archives may be put on disk or used with pipes, but nevertheless, 'tar'
1 tries to read and write the archive one "record" at a time, whatever the
1 medium in use.  One record is made up of an integral number of blocks,
1 and this operation of putting many disk blocks into a single tape block
1 is called "reblocking", or more simply, "blocking".  The term "logical
1 record" refers to the logical organization of many characters into
1 something meaningful to the application.  The term "unit record"
1 describes a small set of characters which are transmitted whole to or by
1 the application, and often refers to a line of text.  Those two last
1 terms are unrelated to what we call a "record" in GNU 'tar'.
1 
1    When writing to tapes, 'tar' writes the contents of the archive in
1 chunks known as "records".  To change the default blocking factor, use
1 the '--blocking-factor=512-SIZE' ('-b 512-SIZE') option.  Each record
1 will then be composed of 512-SIZE blocks.  (Each 'tar' block is 512
1 bytes.  See Standard.)  Each file written to the archive occupies a
1 whole number of blocks, and the archive as a whole is padded out to a
1 full record.  As a result, using a larger record size can result in
1 more wasted space for small archives.  On the other hand, a
1 larger record size can often be read and written much more efficiently.
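
   The relationship between blocking factor and record size is simple
arithmetic, and a small Python sketch may make it concrete.  The helper
names 'record_size' and 'padded_length' are hypothetical and are not
part of 'tar'; the sketch only assumes the 512-byte block size stated
above.

```python
BLOCK_SIZE = 512  # every tar block is 512 bytes


def record_size(blocking_factor: int) -> int:
    """Return the record size in bytes for a given blocking factor."""
    if blocking_factor < 1:
        raise ValueError("blocking factor must be at least 1")
    return blocking_factor * BLOCK_SIZE


def padded_length(nbytes: int, blocking_factor: int) -> int:
    """Round nbytes up to a whole number of records (hypothetical helper)."""
    size = record_size(blocking_factor)
    return -(-nbytes // size) * size  # ceiling division, then scale back up
```

   For instance, 'record_size(20)' gives the default 10240 bytes, and
'padded_length(10241, 20)' shows that one byte past a record boundary
costs a whole extra record.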
1 
1    Further complicating the problem is that some tape drives ignore the
1 blocking entirely.  For these, a larger record size can still improve
1 performance (because the software layers above the tape drive still
1 honor the blocking), but not as dramatically as on tape drives that
1 honor blocking.
1 
1    When reading an archive, 'tar' can usually figure out the record size
1 by itself.  When this is the case, and a non-standard record size was
1 used when the archive was created, 'tar' will print a message about a
1 non-standard blocking factor, and then operate normally(1).  On some
1 tape devices, however, 'tar' cannot figure out the record size itself.
1 On most of those, you can specify a blocking factor (with
1 '--blocking-factor') larger than the actual blocking factor, and then
1 use the '--read-full-records' ('-B') option.  (If you specify a blocking
1 factor with '--blocking-factor' and don't use the '--read-full-records'
1 option, then 'tar' will not attempt to figure out the record size
1 itself.)  On some devices, you must always specify the record size
1 exactly with '--blocking-factor' when reading, because 'tar' cannot
1 figure it out.  In any case, use '--list' ('-t') before doing any
1 extractions to see whether 'tar' is reading the archive correctly.
1 
1    'tar' blocks are all fixed size (512 bytes), and its scheme for
1 putting them into records is to put a whole number of them (one or more)
1 into each record.  'tar' records are all the same size; at the end of
1 the file there's a block containing all zeros, which is how you tell
1 that the remainder of the last record(s) is garbage.
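
   The layout described above can be observed with Python's standard
'tarfile' module.  Note the assumption: 'tarfile' is an independent
implementation, not GNU 'tar', but it uses the same 512-byte blocks and
the same default record of 20 blocks.  The helpers 'build_archive' and
'first_zero_block' are hypothetical names introduced for this sketch.

```python
import io
import tarfile


def build_archive(name: str, payload: bytes) -> bytes:
    """Write a one-member ustar archive into memory and return its bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as archive:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def first_zero_block(data: bytes) -> int:
    """Return the index of the first all-zero block, or -1 if none exists."""
    zero = bytes(tarfile.BLOCKSIZE)
    for offset in range(0, len(data), tarfile.BLOCKSIZE):
        if data[offset:offset + tarfile.BLOCKSIZE] == zero:
            return offset // tarfile.BLOCKSIZE
    return -1


data = build_archive("hello.txt", b"hello\n")
# The member takes one header block and one data block; everything after
# that is zero fill up to the end of the record.
print(len(data), first_zero_block(data))
```

   The archive length is a whole number of records, and the first
all-zero block marks where the meaningful content stops.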
1 
1    In a standard 'tar' file (no options), the block size is 512 and the
1 record size is 10240, for a blocking factor of 20.  The
1 '--blocking-factor' option sets the blocking factor, changing the
1 record size while leaving the block size at 512 bytes.  A factor of 20
1 was fine for ancient 800 or 1600 bpi reel-to-reel tape drives; most
1 tape drives these days prefer much bigger records in order to stream
1 and not waste tape.  When writing tapes for themselves, some people
1 tend to use a factor on the order of 2048, say, giving a record size of
1 around one megabyte.
1 
1    If you use a blocking factor larger than 20, older 'tar' programs
1 might not be able to read the archive, so we recommend 20 as the limit
1 to use in practice.  GNU 'tar', however, will support arbitrarily large
1 record sizes, limited only by the amount of virtual memory or the
1 physical characteristics of the tape device.
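
   Going the other way, from a desired record size to the value to pass
to '--blocking-factor', is a division by 512.  The following Python
sketch uses hypothetical helper names, and the constant 20 simply
encodes the portability limit recommended above; none of this is part
of 'tar' itself.

```python
BLOCK_SIZE = 512            # size of one tar block in bytes
PORTABLE_MAX_FACTOR = 20    # largest factor older tar programs may accept


def blocking_factor_for(record_bytes: int) -> int:
    """Return the blocking factor that yields the given record size."""
    if record_bytes <= 0 or record_bytes % BLOCK_SIZE != 0:
        raise ValueError("record size must be a positive multiple of 512")
    return record_bytes // BLOCK_SIZE


def is_portable(blocking_factor: int) -> bool:
    """Report whether the factor stays within the recommended limit."""
    return 1 <= blocking_factor <= PORTABLE_MAX_FACTOR
```

   A one-megabyte record ('1024 * 1024' bytes) corresponds to a factor
of 2048, which 'is_portable' reports as outside the recommended limit.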
1 
