tar: Standard
1
1 Basic Tar Format
1 ================
1
1 While an archive may contain many files, the archive itself is a
1 single ordinary file. Like any other file, an archive file can be
1 written to a storage device such as a tape or disk, sent through a pipe
1 or over a network, saved on the active file system, or even stored in
1 another archive. An archive file is not easy to read or manipulate
1 without using the 'tar' utility or Tar mode in GNU Emacs.
1
1 Physically, an archive consists of a series of file entries
1 terminated by an end-of-archive entry, which consists of two 512-byte blocks
1 of zero bytes. A file entry usually describes one of the files in the
1 archive (an "archive member"), and consists of a file header and the
1 contents of the file. File headers contain file names and statistics,
1 checksum information which 'tar' uses to detect file corruption, and
1 information about file types.
1
1 Archives are permitted to have more than one member with the same
1 member name. One way this situation can occur is if more than one
1 version of a file has been stored in the archive. For information about
1 adding new versions of a file to an archive, see ⇒update.
1
1 In addition to entries describing archive members, an archive may
1 contain entries which 'tar' itself uses to store information. See
1 ⇒label, for an example of such an archive entry.
1
1 A 'tar' archive file contains a series of blocks. Each block
1 contains 'BLOCKSIZE' bytes. Although this format may be thought of as
1 being on magnetic tape, other media are often used.
1
1 Each file archived is represented by a header block which describes
1 the file, followed by zero or more blocks which give the contents of the
1 file. At the end of the archive file there are two 512-byte blocks
1 filled with binary zeros as an end-of-file marker. A reasonable system
1 should write such an end-of-file marker at the end of an archive, but must
1 not assume that such a marker exists when reading an archive. In
1 particular, GNU 'tar' always issues a warning if it does not encounter
1 it.
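
The block arithmetic described above can be sketched in C as follows. This is a minimal illustration, not code from the GNU 'tar' sources: the helper names 'data_blocks' and 'is_zero_block' are hypothetical, and the sketch assumes that the last data block of a member is padded to a full 512 bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCKSIZE 512

/* Number of BLOCKSIZE-byte blocks that follow the header block to hold
   SIZE bytes of file contents.  The last block is assumed to be padded.  */
uint64_t
data_blocks (uint64_t size)
{
  return size / BLOCKSIZE + (size % BLOCKSIZE != 0);
}

/* Return 1 if BLOCK holds only zero bytes, as the two blocks of the
   end-of-file marker do; return 0 otherwise.  */
int
is_zero_block (const unsigned char block[BLOCKSIZE])
{
  size_t i;

  for (i = 0; i < BLOCKSIZE; i++)
    if (block[i] != 0)
      return 0;
  return 1;
}
```

A reader following the advice above would treat two consecutive zero blocks as the end of the archive, and would also stop cleanly if the input simply ends without them.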
1
1 The blocks may be "blocked" for physical I/O operations. Each record
1 of N blocks (where N is set by the '--blocking-factor=512-SIZE' ('-b
1 512-SIZE') option to 'tar') is written with a single 'write ()'
1 operation. On magnetic tapes, the result of such a write is a single
1 record. When writing an archive, the last record of blocks should be
1 written at the full size, with blocks after the zero block containing
1 all zeros. When reading an archive, a reasonable system should properly
1 handle an archive whose last record is shorter than the rest, or which
1 contains garbage records after a zero block.
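
As a rough sketch of the record padding just described, the following C function computes how many bytes an archive occupies once its last record is written at full size. The name 'padded_archive_bytes' and its parameters are assumptions made for this example.

```c
#include <stdint.h>

#define BLOCKSIZE 512

/* Bytes occupied by an archive whose headers, data blocks, and
   end-of-archive blocks total USED_BLOCKS, when each record holds
   BLOCKING_FACTOR blocks and the last record is padded to full size.
   BLOCKING_FACTOR must be greater than zero.  */
uint64_t
padded_archive_bytes (uint64_t used_blocks, unsigned int blocking_factor)
{
  uint64_t records = used_blocks / blocking_factor
                     + (used_blocks % blocking_factor != 0);

  return records * blocking_factor * BLOCKSIZE;
}
```

For instance, an archive with one header block, one data block, and the two zero blocks uses 4 blocks; with a blocking factor of 20 it still occupies one full record of 10240 bytes.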
1
1 The header block is defined in C as follows. In the GNU 'tar'
1 distribution, this is part of file 'src/tar.h':
1
1
1 /* tar Header Block, from POSIX 1003.1-1990. */
1
1 /* POSIX header. */
1
1 struct posix_header
1 { /* byte offset */
1 char name[100]; /* 0 */
1 char mode[8]; /* 100 */
1 char uid[8]; /* 108 */
1 char gid[8]; /* 116 */
1 char size[12]; /* 124 */
1 char mtime[12]; /* 136 */
1 char chksum[8]; /* 148 */
1 char typeflag; /* 156 */
1 char linkname[100]; /* 157 */
1 char magic[6]; /* 257 */
1 char version[2]; /* 263 */
1 char uname[32]; /* 265 */
1 char gname[32]; /* 297 */
1 char devmajor[8]; /* 329 */
1 char devminor[8]; /* 337 */
1 char prefix[155]; /* 345 */
1 /* 500 */
1 };
1
1 #define TMAGIC "ustar" /* ustar and a null */
1 #define TMAGLEN 6
1 #define TVERSION "00" /* 00 and no null */
1 #define TVERSLEN 2
1
1 /* Values used in typeflag field. */
1 #define REGTYPE '0' /* regular file */
1 #define AREGTYPE '\0' /* regular file */
1 #define LNKTYPE '1' /* link */
1 #define SYMTYPE '2' /* reserved */
1 #define CHRTYPE '3' /* character special */
1 #define BLKTYPE '4' /* block special */
1 #define DIRTYPE '5' /* directory */
1 #define FIFOTYPE '6' /* FIFO special */
1 #define CONTTYPE '7' /* reserved */
1
1 #define XHDTYPE 'x' /* Extended header referring to the
1 next file in the archive */
1 #define XGLTYPE 'g' /* Global extended header */
1
1 /* Bits used in the mode field, values in octal. */
1 #define TSUID 04000 /* set UID on execution */
1 #define TSGID 02000 /* set GID on execution */
1 #define TSVTX 01000 /* reserved */
1 /* file permissions */
1 #define TUREAD 00400 /* read by owner */
1 #define TUWRITE 00200 /* write by owner */
1 #define TUEXEC 00100 /* execute/search by owner */
1 #define TGREAD 00040 /* read by group */
1 #define TGWRITE 00020 /* write by group */
1 #define TGEXEC 00010 /* execute/search by group */
1 #define TOREAD 00004 /* read by other */
1 #define TOWRITE 00002 /* write by other */
1 #define TOEXEC 00001 /* execute/search by other */
1
1 /* tar Header Block, GNU extensions. */
1
1 /* In GNU tar, SYMTYPE is for symbolic links, and CONTTYPE is for
1 contiguous files, so maybe disobeying the "reserved" comment in POSIX
1 header description. I suspect these were meant to be used this way, and
1 should not have really been "reserved" in the published standards. */
1
1 /* *BEWARE* *BEWARE* *BEWARE* that the following information is still
1 boiling, and may change. Even if the OLDGNU format description should be
1 accurate, the so-called GNU format is not yet fully decided. It is
1 surely meant to use only extensions allowed by POSIX, but the sketch
1 below repeats some ugliness from the OLDGNU format, which should rather
1 go away. Sparse files should be saved in such a way that they do *not*
1 require two passes at archive creation time. Huge files get some POSIX
1 fields to overflow, alternate solutions have to be sought for this. */
1
1 /* Descriptor for a single file hole. */
1
1 struct sparse
1 { /* byte offset */
1 char offset[12]; /* 0 */
1 char numbytes[12]; /* 12 */
1 /* 24 */
1 };
1
1 /* Sparse files are not supported in POSIX ustar format. For sparse files
1 with a POSIX header, a GNU extra header is provided which holds overall
1 sparse information and a few sparse descriptors. When an old GNU header
1 replaces both the POSIX header and the GNU extra header, it holds some
1 sparse descriptors too. Whether POSIX or not, if more sparse descriptors
1 are still needed, they are put into as many successive sparse headers as
1 necessary. The following constants tell how many sparse descriptors fit
1 in each kind of header able to hold them. */
1
1 #define SPARSES_IN_EXTRA_HEADER 16
1 #define SPARSES_IN_OLDGNU_HEADER 4
1 #define SPARSES_IN_SPARSE_HEADER 21
1
1 /* Extension header for sparse files, used immediately after the GNU extra
1 header, and used only if all sparse information cannot fit into that
1 extra header. There might even be many such extension headers, one after
1 the other, until all sparse information has been recorded. */
1
1 struct sparse_header
1 { /* byte offset */
1 struct sparse sp[SPARSES_IN_SPARSE_HEADER];
1 /* 0 */
1 char isextended; /* 504 */
1 /* 505 */
1 };
1
1 /* The old GNU format header conflicts with POSIX format in such a way that
1 POSIX archives may fool old GNU tar's, and POSIX tar's might well be
1 fooled by old GNU tar archives. An old GNU format header uses the space
1 used by the prefix field in a POSIX header, and cumulates information
1 normally found in a GNU extra header. With an old GNU tar header, we
1 never see any POSIX header nor GNU extra header. Supplementary sparse
1 headers are allowed, however. */
1
1 struct oldgnu_header
1 { /* byte offset */
1 char unused_pad1[345]; /* 0 */
1 char atime[12]; /* 345 Incr. archive: atime of the file */
1 char ctime[12]; /* 357 Incr. archive: ctime of the file */
1 char offset[12]; /* 369 Multivolume archive: the offset of
1 the start of this volume */
1 char longnames[4]; /* 381 Not used */
1 char unused_pad2; /* 385 */
1 struct sparse sp[SPARSES_IN_OLDGNU_HEADER];
1 /* 386 */
1 char isextended; /* 482 Sparse file: Extension sparse header
1 follows */
1 char realsize[12]; /* 483 Sparse file: Real size */
1 /* 495 */
1 };
1
1 /* OLDGNU_MAGIC uses both magic and version fields, which are contiguous.
1 Found in an archive, it indicates an old GNU header format, which will
1 hopefully become obsolescent. With OLDGNU_MAGIC, uname and gname are
1 valid, though the header is not truly POSIX conforming. */
1 #define OLDGNU_MAGIC "ustar  " /* 7 chars and a null */
1
1 /* The standards committee allows only capital A through capital Z for
1 user-defined expansion. Other letters in use include:
1
1 'A' Solaris Access Control List
1 'E' Solaris Extended Attribute File
1 'I' Inode only, as in 'star'
1 'N' Obsolete GNU tar, for file names that do not fit into the main header.
1 'X' POSIX 1003.1-2001 eXtended (VU version) */
1
1 /* This is a dir entry that contains the names of files that were in the
1 dir at the time the dump was made. */
1 #define GNUTYPE_DUMPDIR 'D'
1
1 /* Identifies the *next* file on the tape as having a long linkname. */
1 #define GNUTYPE_LONGLINK 'K'
1
1 /* Identifies the *next* file on the tape as having a long name. */
1 #define GNUTYPE_LONGNAME 'L'
1
1 /* This is the continuation of a file that began on another volume. */
1 #define GNUTYPE_MULTIVOL 'M'
1
1 /* This is for sparse files. */
1 #define GNUTYPE_SPARSE 'S'
1
1 /* This file is a tape/volume header. Ignore it on extraction. */
1 #define GNUTYPE_VOLHDR 'V'
1
1 /* Solaris extended header */
1 #define SOLARIS_XHDTYPE 'X'
1
1 /* Jörg Schilling star header */
1
1 struct star_header
1 { /* byte offset */
1 char name[100]; /* 0 */
1 char mode[8]; /* 100 */
1 char uid[8]; /* 108 */
1 char gid[8]; /* 116 */
1 char size[12]; /* 124 */
1 char mtime[12]; /* 136 */
1 char chksum[8]; /* 148 */
1 char typeflag; /* 156 */
1 char linkname[100]; /* 157 */
1 char magic[6]; /* 257 */
1 char version[2]; /* 263 */
1 char uname[32]; /* 265 */
1 char gname[32]; /* 297 */
1 char devmajor[8]; /* 329 */
1 char devminor[8]; /* 337 */
1 char prefix[131]; /* 345 */
1 char atime[12]; /* 476 */
1 char ctime[12]; /* 488 */
1 /* 500 */
1 };
1
1 #define SPARSES_IN_STAR_HEADER 4
1 #define SPARSES_IN_STAR_EXT_HEADER 21
1
1 struct star_in_header
1 {
1 char fill[345]; /* 0 Everything that is before t_prefix */
1 char prefix[1]; /* 345 t_name prefix */
1 char fill2; /* 346 */
1 char fill3[8]; /* 347 */
1 char isextended; /* 355 */
1 struct sparse sp[SPARSES_IN_STAR_HEADER]; /* 356 */
1 char realsize[12]; /* 452 Actual size of the file */
1 char offset[12]; /* 464 Offset of multivolume contents */
1 char atime[12]; /* 476 */
1 char ctime[12]; /* 488 */
1 char mfill[8]; /* 500 */
1 char xmagic[4]; /* 508 "tar" */
1 };
1
1 struct star_ext_header
1 {
1 struct sparse sp[SPARSES_IN_STAR_EXT_HEADER];
1 char isextended;
1 };
1
1
1 All characters in header blocks are represented by using 8-bit
1 characters in the local variant of ASCII. Each field within the
1 structure is contiguous; that is, there is no padding used within the
1 structure. Each character on the archive medium is stored contiguously.
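
The absence of padding can be checked at compile time. The sketch below repeats the 'struct posix_header' layout from the listing above and asserts a few of the documented byte offsets with C11 '_Static_assert'; the helper 'header_trailing_pad' is a hypothetical name added for illustration.

```c
#include <stddef.h>

/* Same field layout as 'struct posix_header' in the listing above.  */
struct posix_header
{
  char name[100];
  char mode[8];
  char uid[8];
  char gid[8];
  char size[12];
  char mtime[12];
  char chksum[8];
  char typeflag;
  char linkname[100];
  char magic[6];
  char version[2];
  char uname[32];
  char gname[32];
  char devmajor[8];
  char devminor[8];
  char prefix[155];
};

/* Every member is a char or a char array, so compilers are expected to
   insert no padding; these assertions fail the build if one does.  */
_Static_assert (offsetof (struct posix_header, chksum) == 148, "chksum");
_Static_assert (offsetof (struct posix_header, magic) == 257, "magic");
_Static_assert (offsetof (struct posix_header, prefix) == 345, "prefix");
_Static_assert (sizeof (struct posix_header) == 500, "header size");

/* Bytes left unused between the end of the structure and the end of
   the 512-byte header block.  */
size_t
header_trailing_pad (void)
{
  return 512 - sizeof (struct posix_header);
}
```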
1
1 Bytes representing the contents of files (after the header block of
1 each file) are not translated in any way and are not constrained to
1 represent characters in any character set. The 'tar' format does not
1 distinguish text files from binary files, and no translation of file
1 contents is performed.
1
1 The 'name', 'linkname', 'magic', 'uname', and 'gname' fields are
1 null-terminated character strings. All other fields are zero-filled
1 octal numbers in ASCII. Each numeric field of width W contains W minus 1
1 digits, and a null. (In the extended GNU format, the numeric fields can
1 take other forms.)
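
A numeric field of this kind might be decoded as in the following sketch. The function name 'parse_octal_field' is hypothetical. Accepting leading blanks and a blank terminator is an assumption made for tolerance toward other writers, and the other forms allowed by the extended GNU format are deliberately not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode the zero-filled octal number stored in FIELD, which is WIDTH
   bytes long.  Store the value in *OUT and return 0, or return -1 if
   the field holds no octal digits or holds unexpected characters.
   WIDTH is assumed to be at most 22 so the value fits in 64 bits.  */
int
parse_octal_field (const char *field, size_t width, uint64_t *out)
{
  uint64_t value = 0;
  size_t i = 0;
  int digits = 0;

  /* Assumption: tolerate blank padding before the digits.  */
  while (i < width && field[i] == ' ')
    i++;

  for (; i < width && field[i] >= '0' && field[i] <= '7'; i++)
    {
      value = (value << 3) | (uint64_t) (field[i] - '0');
      digits++;
    }

  /* Whatever follows the digits must be null or blank terminators.  */
  for (; i < width; i++)
    if (field[i] != '\0' && field[i] != ' ')
      return -1;

  if (digits == 0)
    return -1;
  *out = value;
  return 0;
}
```

With this sketch, an 8-byte 'mode' field holding "0000644" followed by a null decodes to octal 644.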
1
1 The 'name' field is the file name of the file, with directory names
1 (if any) preceding the file name, separated by slashes.
1
1 The 'mode' field provides nine bits specifying file permissions and
1 three bits to specify the Set UID, Set GID, and Save Text ("sticky")
1 modes. Values for these bits are defined above. When special
1 permissions are required to create a file with a given mode, and the
1 user restoring files from the archive does not hold such permissions,
1 the mode bit(s) specifying those special permissions are ignored. Modes
1 which are not supported by the operating system restoring files from the
1 archive will be ignored. Unsupported modes should be faked up when
1 creating or updating an archive; e.g., the group permission could be
1 copied from the _other_ permission.
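
To illustrate how the twelve mode bits fit together, the sketch below renders a decoded 'mode' value in the style of 'ls -l'. The bit values are those from the header listing above; the function 'mode_to_string' is a hypothetical helper, not part of 'tar'.

```c
#include <string.h>

/* Mode bits, as defined in the header listing above.  */
#define TSUID  04000
#define TSGID  02000
#define TSVTX  01000
#define TUEXEC 00100
#define TGEXEC 00010
#define TOEXEC 00001

/* Write a nine-character permission string for MODE into OUT,
   followed by a terminating null.  */
void
mode_to_string (unsigned int mode, char out[10])
{
  static const char letters[] = "rwxrwxrwx";
  int i;

  memcpy (out, "---------", 10);   /* nine dashes plus the null */

  /* Walk the nine permission bits from TUREAD (00400) to TOEXEC.  */
  for (i = 0; i < 9; i++)
    if (mode & (0400u >> i))
      out[i] = letters[i];

  /* The three special bits replace the execute positions.  */
  if (mode & TSUID)
    out[2] = (mode & TUEXEC) ? 's' : 'S';
  if (mode & TSGID)
    out[5] = (mode & TGEXEC) ? 's' : 'S';
  if (mode & TSVTX)
    out[8] = (mode & TOEXEC) ? 't' : 'T';
}
```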
1
1 The 'uid' and 'gid' fields are the numeric user and group ID of the
1 file owners, respectively. If the operating system does not support
1 numeric user or group IDs, these fields should be ignored.
1
1 The 'size' field is the size of the file in bytes; linked files are
1 archived with this field specified as zero.
1
1 The 'mtime' field represents the data modification time of the file
1 at the time it was archived. It represents the integer number of
1 seconds since January 1, 1970, 00:00 Coordinated Universal Time.
1
1 The 'chksum' field represents the simple sum of all bytes in the
1 header block. Each 8-bit byte in the header is added to an unsigned
1 integer, initialized to zero, the precision of which shall be no less
1 than seventeen bits. When calculating the checksum, the 'chksum' field
1 is treated as if it were all blanks.
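
The checksum rule above can be sketched as follows. The function name 'header_checksum' and the two constants are introduced for this example; the offset 148 and length 8 of the 'chksum' field come from the header listing.

```c
#include <stddef.h>

#define BLOCKSIZE     512
#define CHKSUM_OFFSET 148   /* byte offset of the 'chksum' field */
#define CHKSUM_LEN    8     /* width of the 'chksum' field */

/* Sum all bytes of HEADER as unsigned values, counting each byte of
   the 'chksum' field as a blank regardless of what it contains.  */
unsigned long
header_checksum (const unsigned char header[BLOCKSIZE])
{
  unsigned long sum = 0;
  size_t i;

  for (i = 0; i < BLOCKSIZE; i++)
    {
      if (i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_LEN)
        sum += ' ';
      else
        sum += header[i];
    }
  return sum;
}
```

The largest possible sum is 512 times 255, or 130560, which is why at least seventeen bits of precision are required.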
1
1 The 'typeflag' field specifies the type of file archived. If a
1 particular implementation does not recognize or permit the specified
1 type, the file will be extracted as if it were a regular file. As this
1 action occurs, 'tar' issues a warning to the standard error.
1
1 The 'atime' and 'ctime' fields are used in making incremental
1 backups; they store, respectively, the particular file's access and
1 status change times.
1
1 The 'offset' is used by the '--multi-volume' ('-M') option, when
1 making a multi-volume archive. The offset is the number of bytes into
1 the file at which to resume when the file continues on the next tape,
1 i.e., it records where a continued file picks up.
1
1 The following fields were added to deal with sparse files. A file is
1 "sparse" if it contains unallocated blocks which end up being
1 represented as zeros, i.e., no useful data. A test to see if a file is
1 sparse is to look at the number of blocks allocated for it versus the
1 number of characters in the file; if there are fewer blocks allocated
1 for the file than would normally be allocated for a file of that size,
1 then the file is sparse. This is the method 'tar' uses to detect a
1 sparse file, and once such a file is detected, it is treated differently
1 from non-sparse files.
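
The test described above amounts to a single comparison, sketched here as a pure function. The name 'looks_sparse' and its parameters are hypothetical. On POSIX systems the inputs would typically come from the 'st_size' and 'st_blocks' members filled in by 'stat', with a block unit of 512 bytes; that unit is an assumption which holds on many systems but not necessarily all.

```c
#include <stdint.h>

/* Return 1 if a file of SIZE_BYTES bytes has fewer blocks allocated
   than a fully allocated file of that size would need, else 0.
   BLOCK_UNIT is the size in bytes of one allocation unit as counted
   by ALLOCATED_BLOCKS, and must be greater than zero.  */
int
looks_sparse (uint64_t size_bytes, uint64_t allocated_blocks,
              uint64_t block_unit)
{
  uint64_t needed = size_bytes / block_unit
                    + (size_bytes % block_unit != 0);

  return allocated_blocks < needed;
}
```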
1
1 Sparse files are often 'dbm' files, or other database-type files
1 which have data at some points and emptiness in the greater part of the
1 file. Such files can appear to be very large when an 'ls -l' is done on
1 them, when in truth, there may be a very small amount of important data
1 contained in the file. It is thus undesirable to have 'tar' think that
1 it must back up this entire file, as great quantities of room are wasted
1 on empty blocks, which can lead to running out of room on a tape far
1 earlier than is necessary. Thus, sparse files are dealt with so that
1 these empty blocks are not written to the tape. Instead, what is
1 written to the tape is a description, of sorts, of the sparse file:
1 where the holes are, how big the holes are, and how much data is found
1 at the end of the hole. This way, the file takes up potentially far
1 less room on the tape, and when the file is extracted later on, it will
1 look exactly the way it looked beforehand. The following is a
1 description of the fields used to handle a sparse file:
1
1 The 'sp' field is an array of 'struct sparse'. Each 'struct sparse'
1 contains two 12-character strings which represent an offset into the
1 file and a number of bytes to be written at that offset. The offset is
1 absolute, and not relative to the offset in the preceding array element.
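
A reader might walk such an array as in the sketch below, which totals the data bytes actually stored and finds where the last data fragment ends. The function names are hypothetical, and treating a descriptor whose 'numbytes' field starts with a null as the first unused slot is an assumption of this example.

```c
#include <stddef.h>
#include <stdint.h>

struct sparse
{
  char offset[12];
  char numbytes[12];
};

/* Decode a 12-byte octal field; blanks before the digits are skipped.  */
static uint64_t
sparse_octal (const char field[12])
{
  uint64_t value = 0;
  size_t i = 0;

  while (i < 12 && field[i] == ' ')
    i++;
  for (; i < 12 && field[i] >= '0' && field[i] <= '7'; i++)
    value = value * 8 + (uint64_t) (field[i] - '0');
  return value;
}

/* Total number of data bytes described by the first COUNT descriptors.  */
uint64_t
sparse_stored_bytes (const struct sparse *sp, size_t count)
{
  uint64_t total = 0;
  size_t i;

  for (i = 0; i < count && sp[i].numbytes[0] != '\0'; i++)
    total += sparse_octal (sp[i].numbytes);
  return total;
}

/* Highest file position covered by any described data fragment.
   Offsets are absolute, so each entry is examined on its own.  */
uint64_t
sparse_data_end (const struct sparse *sp, size_t count)
{
  uint64_t end = 0;
  size_t i;

  for (i = 0; i < count && sp[i].numbytes[0] != '\0'; i++)
    {
      uint64_t e = sparse_octal (sp[i].offset)
                   + sparse_octal (sp[i].numbytes);
      if (e > end)
        end = e;
    }
  return end;
}
```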
1
1 The header can hold four of these 'struct sparse' at the moment; if
1 more are needed, they are not stored in the header.
1
1 The 'isextended' flag is set when an 'extended_header' (shown in the
1 listing above as 'struct sparse_header') is needed to deal with a file.
1 This flag can therefore only be set when dealing with a sparse file, and
1 only when the description of the file does not fit in the room allotted
1 for sparse structures in the header.
1
1 The 'extended_header' structure is used for sparse files which need
1 more sparse structures than can fit in the header. The header can fit 4
1 such structures; if more are needed, the flag 'isextended' gets set and
1 the next block is an 'extended_header'.
1
1 Each 'extended_header' structure contains an array of 21 sparse
1 structures, along with an 'isextended' flag like the one in the header.
1 There can be an indeterminate number of such 'extended_header's to
1 describe a sparse file.
1
1 'REGTYPE'
1 'AREGTYPE'
1 These flags represent a regular file. In order to be compatible
1 with older versions of 'tar', a 'typeflag' value of 'AREGTYPE'
1 should be silently recognized as a regular file. New archives
1 should be created using 'REGTYPE'. Also, for backward
1 compatibility, 'tar' treats a regular file whose name ends with a
1 slash as a directory.
1
1 'LNKTYPE'
1 This flag represents a file linked to another file, of any type,
1 previously archived. Such files are identified in Unix by each
1 file having the same device and inode number. The linked-to name
1 is specified in the 'linkname' field with a trailing null.
1
1 'SYMTYPE'
1 This represents a symbolic link to another file. The linked-to
1 name is specified in the 'linkname' field with a trailing null.
1
1 'CHRTYPE'
1 'BLKTYPE'
1 These represent character special files and block special files
1 respectively. In this case the 'devmajor' and 'devminor' fields
1 will contain the major and minor device numbers respectively.
1 Operating systems may map the device specifications to their own
1 local specification, or may ignore the entry.
1
1 'DIRTYPE'
1 This flag specifies a directory or sub-directory. The directory
1 name in the 'name' field should end with a slash. On systems where
1 disk allocation is performed on a directory basis, the 'size' field
1 will contain the maximum number of bytes (which may be rounded to
1 the nearest disk block allocation unit) which the directory may
1 hold. A 'size' field of zero indicates no such limiting. Systems
1 which do not support limiting in this manner should ignore the
1 'size' field.
1
1 'FIFOTYPE'
1 This specifies a FIFO special file. Note that the archiving of a
1 FIFO file archives the existence of this file and not its contents.
1
1 'CONTTYPE'
1 This specifies a contiguous file, which is the same as a normal
1 file except that, in operating systems which support it, all its
1 space is allocated contiguously on the disk. Operating systems
1 which do not allow contiguous allocation should silently treat this
1 type as a normal file.
1
1 'A' ... 'Z'
1 These are reserved for custom implementations. Some of these are
1 used in the GNU modified format, as described below.
1
1 Other values are reserved for specification in future revisions of
1 the P1003 standard, and should not be used by any 'tar' program.
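
The rules in the list above can be condensed into a small classifier. This is a sketch only: the enumeration and the function 'classify_member' are hypothetical names, and the range test for 'A' through 'Z' assumes an ASCII-compatible character set.

```c
#include <string.h>

enum member_kind
{
  KIND_REGULAR, KIND_HARD_LINK, KIND_SYMLINK, KIND_CHAR_DEVICE,
  KIND_BLOCK_DEVICE, KIND_DIRECTORY, KIND_FIFO, KIND_CONTIGUOUS,
  KIND_EXTENDED_HEADER, KIND_VENDOR, KIND_UNKNOWN
};

/* Classify a member from its 'typeflag' and 'name' fields.  Callers
   would extract KIND_UNKNOWN members as regular files, with a warning.  */
enum member_kind
classify_member (char typeflag, const char *name)
{
  size_t len = strlen (name);

  switch (typeflag)
    {
    case '0':                   /* REGTYPE */
    case '\0':                  /* AREGTYPE */
      /* Backward compatibility: a trailing slash means a directory.  */
      if (len > 0 && name[len - 1] == '/')
        return KIND_DIRECTORY;
      return KIND_REGULAR;
    case '1': return KIND_HARD_LINK;        /* LNKTYPE */
    case '2': return KIND_SYMLINK;          /* SYMTYPE */
    case '3': return KIND_CHAR_DEVICE;      /* CHRTYPE */
    case '4': return KIND_BLOCK_DEVICE;     /* BLKTYPE */
    case '5': return KIND_DIRECTORY;        /* DIRTYPE */
    case '6': return KIND_FIFO;             /* FIFOTYPE */
    case '7': return KIND_CONTIGUOUS;       /* CONTTYPE */
    case 'x':                               /* XHDTYPE */
    case 'g':                               /* XGLTYPE */
      return KIND_EXTENDED_HEADER;
    default:
      /* 'A' ... 'Z' are reserved for custom implementations.  */
      if (typeflag >= 'A' && typeflag <= 'Z')
        return KIND_VENDOR;
      return KIND_UNKNOWN;
    }
}
```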
1
1 The 'magic' field indicates that this archive was output in the P1003
1 archive format. If this field contains 'TMAGIC', the 'uname' and
1 'gname' fields will contain the ASCII representation of the owner and
1 group of the file respectively. If these names are found on the
1 restoring system, the corresponding user and group IDs are used rather
1 than the values in the 'uid' and 'gid' fields.
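
A reader could tell the header variants apart by inspecting the bytes at offset 257, as in this sketch. The enumeration and the function 'detect_flavor' are hypothetical. The sketch checks only the 'magic' field for the POSIX case and ignores 'version', which is a simplification.

```c
#include <string.h>

enum tar_flavor
{
  FLAVOR_OTHER,    /* no recognized magic, e.g. a pre-POSIX archive */
  FLAVOR_USTAR,    /* TMAGIC: "ustar" and a null */
  FLAVOR_OLDGNU    /* OLDGNU_MAGIC: "ustar", two blanks, and a null */
};

/* Inspect the magic and version fields (offsets 257 to 264) of a
   512-byte header block.  */
enum tar_flavor
detect_flavor (const unsigned char header[512])
{
  /* OLDGNU_MAGIC spans both the magic and the version field.  */
  if (memcmp (header + 257, "ustar  ", 8) == 0)
    return FLAVOR_OLDGNU;

  /* TMAGIC occupies the 6-byte magic field, null included.  */
  if (memcmp (header + 257, "ustar", 6) == 0)
    return FLAVOR_USTAR;

  return FLAVOR_OTHER;
}
```

Only when the result is not 'FLAVOR_OTHER' should 'uname' and 'gname' be trusted to hold owner and group names.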
1
1 For references, see ISO/IEC 9945-1:1990 or IEEE Std 1003.1-1990,
1 pages 169-173 (section 10.1) for 'Archive/Interchange File Format'; and
1 IEEE Std 1003.2-1992, pages 380-388 (section 4.48) and pages 936-940
1 (section E.4.48) for 'pax - Portable archive interchange'.
1