9.4 Blocking
============

"Block" and "record" terminology is rather confused, and it is
confusing even to the expert reader. On the other hand, readers who are
new to the field have a fresh mind, and they may safely skip the next
two paragraphs, as the remainder of this manual uses these two terms in
a fairly consistent way.

John Gilmore, the writer of the public domain 'tar' from which GNU
'tar' was originally derived, wrote (June 1995):

     The nomenclature of tape drives comes from IBM, where I believe
     they were invented for the IBM 650 or so. On IBM mainframes, what
     is recorded on tape are tape blocks. The logical organization of
     data is into records. There are various ways of putting records
     into blocks, including 'F' (fixed sized records), 'V' (variable
     sized records), 'FB' (fixed blocked: fixed size records, N to a
     block), 'VB' (variable size records, N to a block), 'VSB' (variable
     spanned blocked: variable sized records that can occupy more than
     one block), etc. The 'JCL' 'DD RECFORM=' parameter specified this
     to the operating system.

     The Unix man page on 'tar' was totally confused about this. When I
     wrote 'PD TAR', I used the historically correct terminology ('tar'
     writes data records, which are grouped into blocks). It appears
     that the bogus terminology made it into POSIX (no surprise here),
     and now François has migrated that terminology back into the
     source code too.

The term "physical block" means the basic transfer chunk from or to a
device, after which reading or writing may stop without anything being
lost. In this manual, the term "block" usually refers to a disk
physical block, _assuming_ that each disk block is 512 bytes in length.
It is true that some disk devices have different physical block sizes,
but 'tar' ignores these differences in its own format, which is meant
to be portable, so a 'tar' block is always 512 bytes in length, and
"block" always means a 'tar' block. The term "logical block" often
refers to a unit of allocation made up of several disk blocks, which
the operating system treats as a single, somewhat atomic entity; this
concept is barely used in GNU 'tar'.

The term "physical record" is another way to speak of a physical
block; the two terms are somewhat interchangeable. In this manual, the
term "record" usually refers to a tape physical block, _assuming_ that
the 'tar' archive is kept on magnetic tape. It is true that archives
may be put on disk or used with pipes, but nevertheless, 'tar' tries to
read and write the archive one "record" at a time, whatever the medium
in use. One record is made up of an integral number of blocks, and this
operation of putting many disk blocks into a single tape block is
called "reblocking", or more simply, "blocking". The term "logical
record" refers to the logical organization of many characters into
something meaningful to the application. The term "unit record"
describes a small set of characters which are transmitted whole to or
by the application, and often refers to a line of text. These last two
terms are unrelated to what we call a "record" in GNU 'tar'.

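To make the block/record distinction concrete, here is a small Python
sketch that maps a byte offset within an archive to the record holding
it, the block within that record, and the byte within that block. The
function name 'locate' is hypothetical and not part of 'tar'; the sketch
only assumes, as described above, that a record is exactly
BLOCKING-FACTOR blocks of 512 bytes each.

```python
from __future__ import annotations

BLOCK_SIZE = 512  # a 'tar' block is always 512 bytes


def locate(offset: int, blocking_factor: int) -> tuple[int, int, int]:
    """Return (record, block within record, byte within block) for an offset.

    Hypothetical helper: a record is assumed to be exactly
    blocking_factor blocks of BLOCK_SIZE bytes each.
    """
    if blocking_factor < 1 or offset < 0:
        raise ValueError("need blocking_factor >= 1 and offset >= 0")
    record_size = blocking_factor * BLOCK_SIZE  # a whole number of blocks
    record, within_record = divmod(offset, record_size)
    block, byte = divmod(within_record, BLOCK_SIZE)
    return record, block, byte


if __name__ == "__main__":
    # Default blocking factor 20: each record holds 20 * 512 = 10240 bytes.
    print(locate(10239, 20))  # (0, 19, 511): last byte of the first record
    print(locate(10240, 20))  # (1, 0, 0): first byte of the second record
```

The same offset lands in a different record for a different blocking
factor, but always in the same 512-byte block, which is why the block is
the unit of the format and the record only the unit of I/O.
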
When writing to tapes, 'tar' writes the contents of the archive in
chunks known as "records". To change the default blocking factor, use
the '--blocking-factor=512-SIZE' ('-b 512-SIZE') option. Each record
will then be composed of 512-SIZE blocks. (Each 'tar' block is 512
bytes. *Note Standard::.) Members are stored in whole blocks, but the
archive as a whole is padded out to a whole number of records, so the
last record is usually only partly used. As a result, using a larger
record size can waste more space, most noticeably for small archives.
On the other hand, a larger record size can often be read and written
much more efficiently.

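The following Python sketch estimates the size of an archive of regular
files for a given blocking factor, to show where the space goes. It
assumes a plain POSIX ustar layout: one 512-byte header block per member,
member data rounded up to whole blocks, two zero blocks at the end, and
the whole archive padded to a multiple of the record size. The names
'archive_size' and 'blocks_for' are hypothetical, and real GNU 'tar'
output can differ, for example when extended headers are needed for
long file names.

```python
from __future__ import annotations

BLOCK_SIZE = 512  # fixed by the 'tar' format


def blocks_for(nbytes: int) -> int:
    """Number of 512-byte blocks needed to hold nbytes, rounded up."""
    return -(-nbytes // BLOCK_SIZE)


def archive_size(member_sizes: list[int], blocking_factor: int = 20) -> int:
    """Estimate the on-medium size of an archive of regular files.

    Assumes a plain ustar layout: one header block per member, data rounded
    up to whole blocks, two zero blocks at the end, and the archive padded
    to a whole number of records. Long names, extended headers and sparse
    files are not modeled.
    """
    record_size = blocking_factor * BLOCK_SIZE
    blocks = sum(1 + blocks_for(size) for size in member_sizes)  # header + data
    blocks += 2  # end-of-archive marker: two blocks of zeros
    used = blocks * BLOCK_SIZE
    return -(-used // record_size) * record_size  # round up to whole records


if __name__ == "__main__":
    for factor in (1, 20, 2048):
        # prints "1 2048", then "20 10240", then "2048 1048576"
        print(factor, archive_size([100], factor))
```

With a single 100-byte file, the same four blocks of real content occupy
2048, 10240, or 1048576 bytes depending on the blocking factor, which is
the space-versus-speed trade-off described above.
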
Further complicating the problem is that some tape drives ignore the
blocking entirely. For these, a larger record size can still improve
performance (because the software layers above the tape drive still
honor the blocking), but not as dramatically as on tape drives that
honor blocking.

When reading an archive, 'tar' can usually figure out the record size
by itself. When this is the case, and a non-standard record size was
used when the archive was created, 'tar' will print a message about a
non-standard blocking factor, and then operate normally(1). On some
tape devices, however, 'tar' cannot figure out the record size itself.
On most of those, you can specify a blocking factor (with
'--blocking-factor') larger than the actual blocking factor, and then
use the '--read-full-records' ('-B') option. (If you specify a blocking
factor with '--blocking-factor' and don't use the '--read-full-records'
option, then 'tar' will not attempt to figure out the record size
itself.) On some devices, you must always specify the record size
exactly with '--blocking-factor' when reading, because 'tar' cannot
figure it out. In any case, use '--list' ('-t') before doing any
extractions to see whether 'tar' is reading the archive correctly.

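As an illustration only, the Python sketch below shows the idea behind
reading full records: a single read may return fewer bytes than
requested (on a tape, one physical record; on a pipe, whatever data
happens to be available), while a reblocking reader keeps reading until
the record is complete or the input ends. 'read_record' and the
'Trickle' helper, which simulates a pipe delivering data in small
pieces, are hypothetical names; this is not GNU 'tar''s actual
implementation.

```python
from __future__ import annotations

import io
from typing import Callable

BLOCK_SIZE = 512


def read_record(read: Callable[[int], bytes], record_size: int, full: bool) -> bytes:
    """Read one record using read(n), which may return fewer than n bytes.

    full=False issues a single read and returns whatever it yields.
    full=True keeps reading until record_size bytes have arrived or the
    input is exhausted (an empty read), i.e. it reblocks short reads.
    """
    if not full:
        return read(record_size)
    chunks: list[bytes] = []
    remaining = record_size
    while remaining > 0:
        chunk = read(remaining)
        if not chunk:  # end of input: return the partial record
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


class Trickle:
    """Hypothetical pipe stand-in that returns at most `chunk` bytes per read."""

    def __init__(self, data: bytes, chunk: int) -> None:
        self._buf = io.BytesIO(data)
        self._chunk = chunk

    def read(self, n: int) -> bytes:
        return self._buf.read(min(n, self._chunk))


if __name__ == "__main__":
    record = bytes(20 * BLOCK_SIZE)  # one default-sized record of zeros
    print(len(read_record(Trickle(record, 4096).read, len(record), full=False)))  # 4096
    print(len(read_record(Trickle(record, 4096).read, len(record), full=True)))   # 10240
```

Without reblocking, the short first read would be mistaken for a whole
record; with it, the reader recovers the full 10240 bytes.
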
'tar' blocks are all fixed size (512 bytes), and its scheme for
putting them into records is to put a whole number of them (one or
more) into each record. 'tar' records are all the same size. At the end
of the archive there is a block containing all zeros (POSIX specifies
two such blocks), which is how you tell that the remainder of the last
record(s) is padding rather than archive data.

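The following Python sketch walks an archive header by header to find
the zero block that marks the end of the data. It assumes plain ustar
headers whose 12-byte size field at offset 124 is written in octal (very
large files may use other encodings, which are not handled), and it uses
Python's standard 'tarfile' module only to build a small test archive in
memory. The function name 'end_of_data' is hypothetical.

```python
import io
import tarfile

BLOCK_SIZE = 512
ZERO_BLOCK = bytes(BLOCK_SIZE)


def end_of_data(archive: bytes) -> int:
    """Return the byte offset of the first all-zero header block.

    Walks the archive header by header, skipping each member's data
    (rounded up to whole blocks), because member data may itself contain
    zero blocks. Assumes the 12-byte size field at offset 124 is octal.
    """
    offset = 0
    while offset + BLOCK_SIZE <= len(archive):
        header = archive[offset:offset + BLOCK_SIZE]
        if header == ZERO_BLOCK:
            return offset  # end-of-archive marker; the rest is padding
        size_field = header[124:136].strip(b"\0 ")
        size = int(size_field or b"0", 8)
        data_blocks = -(-size // BLOCK_SIZE)
        offset += (1 + data_blocks) * BLOCK_SIZE
    raise ValueError("no end-of-archive block found")


if __name__ == "__main__":
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        payload = b"hello"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    archive = buf.getvalue()
    print(len(archive), end_of_data(archive))  # 10240 1024
```

Python's 'tarfile' pads the archive to a 10240-byte record when it is
closed, so the example archive is 10240 bytes long even though only the
first two blocks hold the member; everything from offset 1024 onward is
zeros. Walking headers rather than scanning every block is what keeps a
member whose data happens to contain a zero block from being mistaken
for the end of the archive.
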
In a standard 'tar' file (no options), the block size is 512 and the
record size is 10240, for a blocking factor of 20. The
'--blocking-factor' option sets the blocking factor, changing the
record size while leaving the block size at 512 bytes. A factor of 20
was fine for ancient 800 or 1600 bpi reel-to-reel tape drives; most
tape drives these days prefer much bigger records in order to stream
and not waste tape. When writing tapes, some users prefer a factor on
the order of 2048, which gives a record size of about one megabyte
(2048 * 512 = 1048576 bytes).

If you use a blocking factor larger than 20, older 'tar' programs
might not be able to read the archive, so we recommend this as a limit
to use in practice. GNU 'tar', however, supports arbitrarily large
record sizes, limited only by the amount of virtual memory or the
physical characteristics of the tape device.