tar: Standard


Basic Tar Format
================

   While an archive may contain many files, the archive itself is a
single ordinary file.  Like any other file, an archive file can be
written to a storage device such as a tape or disk, sent through a pipe
or over a network, saved on the active file system, or even stored in
another archive.  An archive file is not easy to read or manipulate
without using the 'tar' utility or Tar mode in GNU Emacs.

   Physically, an archive consists of a series of file entries
terminated by an end-of-archive entry, which consists of two 512-byte
blocks of zero bytes.  A file entry usually describes one of the files
in the archive (an "archive member"), and consists of a file header and
the contents of the file.  File headers contain file names and
statistics, checksum information which 'tar' uses to detect file
corruption, and information about file types.

   Archives are permitted to have more than one member with the same
member name.  One way this situation can occur is if more than one
version of a file has been stored in the archive.  For information about
adding new versions of a file to an archive, see ⇒update.

   In addition to entries describing archive members, an archive may
contain entries which 'tar' itself uses to store information.  See
⇒label for an example of such an archive entry.

   A 'tar' archive file contains a series of blocks.  Each block
contains 'BLOCKSIZE' (512) bytes.  Although this format may be thought
of as being on magnetic tape, other media are often used.

   Each file archived is represented by a header block which describes
the file, followed by zero or more blocks which give the contents of the
file.  At the end of the archive file there are two 512-byte blocks
filled with binary zeros as an end-of-file marker.  A reasonable system
should write such an end-of-file marker at the end of an archive, but
must not assume that these blocks exist when reading an archive.  In
particular, GNU 'tar' always issues a warning if it does not encounter
them.
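
   As a minimal sketch (not code taken from GNU 'tar'), a reader can
test for the end-of-archive marker by checking that two consecutive
512-byte blocks are entirely zero.  The names 'is_zero_block' and
'at_end_of_archive' are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define BLOCKSIZE 512

/* Return true if every byte of the 512-byte BLOCK is zero.  */
static bool
is_zero_block (const unsigned char *block)
{
  for (size_t i = 0; i < BLOCKSIZE; i++)
    if (block[i] != 0)
      return false;
  return true;
}

/* Return true if BLOCKS starts with the two-block end-of-archive
   marker.  AVAILABLE is the number of whole blocks left in the buffer;
   with fewer than two, the marker cannot be confirmed.  */
static bool
at_end_of_archive (const unsigned char *blocks, size_t available)
{
  return available >= 2
         && is_zero_block (blocks)
         && is_zero_block (blocks + BLOCKSIZE);
}
```

Because the marker may be missing, a caller that reaches the end of its
input without seeing it would warn, as GNU 'tar' does, rather than fail.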

   The blocks may be "blocked" for physical I/O operations.  Each
record of N blocks (where N is set by the '--blocking-factor=512-SIZE'
('-b 512-SIZE') option to 'tar') is written with a single 'write ()'
operation.  On magnetic tapes, the result of such a write is a single
record.  When writing an archive, the last record of blocks should be
written at the full size, with the blocks after the end-of-archive
marker containing all zeros.  When reading an archive, a reasonable
system should properly handle an archive whose last record is shorter
than the rest, or which contains garbage records after a zero block.
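
   Padding the last record to full size is simple arithmetic.  The
sketch below, with hypothetical helper names, computes how many zero
blocks complete the final record.  For example, one 1000-byte file needs
1 header block, 2 data blocks, and the 2-block end-of-archive marker, 5
blocks in all; with 20 blocks per record (the usual GNU 'tar' default),
15 zero blocks are added, giving a single 10240-byte record.

```c
#include <stddef.h>

#define BLOCKSIZE 512

/* Zero blocks needed after USED_BLOCKS blocks (headers, contents and the
   end-of-archive marker) so that the archive ends on a record boundary.
   BLOCKING_FACTOR is the number of blocks per record and must be > 0.  */
static size_t
padding_blocks (size_t used_blocks, size_t blocking_factor)
{
  size_t rem = used_blocks % blocking_factor;
  return rem == 0 ? 0 : blocking_factor - rem;
}

/* Size in bytes of the archive once its last record is written at
   full size.  */
static size_t
padded_archive_bytes (size_t used_blocks, size_t blocking_factor)
{
  return (used_blocks + padding_blocks (used_blocks, blocking_factor))
         * BLOCKSIZE;
}
```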

   The header block is defined in C as follows.  In the GNU 'tar'
distribution, this is part of file 'src/tar.h':

     /* tar Header Block, from POSIX 1003.1-1990.  */

     /* POSIX header.  */

     struct posix_header
     {                              /* byte offset */
       char name[100];               /*   0 */
       char mode[8];                 /* 100 */
       char uid[8];                  /* 108 */
       char gid[8];                  /* 116 */
       char size[12];                /* 124 */
       char mtime[12];               /* 136 */
       char chksum[8];               /* 148 */
       char typeflag;                /* 156 */
       char linkname[100];           /* 157 */
       char magic[6];                /* 257 */
       char version[2];              /* 263 */
       char uname[32];               /* 265 */
       char gname[32];               /* 297 */
       char devmajor[8];             /* 329 */
       char devminor[8];             /* 337 */
       char prefix[155];             /* 345 */
                                     /* 500 */
     };

     #define TMAGIC   "ustar"        /* ustar and a null */
     #define TMAGLEN  6
     #define TVERSION "00"           /* 00 and no null */
     #define TVERSLEN 2

     /* Values used in typeflag field.  */
     #define REGTYPE  '0'            /* regular file */
     #define AREGTYPE '\0'           /* regular file */
     #define LNKTYPE  '1'            /* link */
     #define SYMTYPE  '2'            /* reserved */
     #define CHRTYPE  '3'            /* character special */
     #define BLKTYPE  '4'            /* block special */
     #define DIRTYPE  '5'            /* directory */
     #define FIFOTYPE '6'            /* FIFO special */
     #define CONTTYPE '7'            /* reserved */

     #define XHDTYPE  'x'            /* Extended header referring to the
                                        next file in the archive */
     #define XGLTYPE  'g'            /* Global extended header */

     /* Bits used in the mode field, values in octal.  */
     #define TSUID    04000          /* set UID on execution */
     #define TSGID    02000          /* set GID on execution */
     #define TSVTX    01000          /* reserved */
                                     /* file permissions */
     #define TUREAD   00400          /* read by owner */
     #define TUWRITE  00200          /* write by owner */
     #define TUEXEC   00100          /* execute/search by owner */
     #define TGREAD   00040          /* read by group */
     #define TGWRITE  00020          /* write by group */
     #define TGEXEC   00010          /* execute/search by group */
     #define TOREAD   00004          /* read by other */
     #define TOWRITE  00002          /* write by other */
     #define TOEXEC   00001          /* execute/search by other */

     /* tar Header Block, GNU extensions.  */

     /* In GNU tar, SYMTYPE is for symbolic links, and CONTTYPE is for
        contiguous files, thus possibly disobeying the "reserved" comment in
        the POSIX header description.  I suspect these were meant to be used
        this way, and should not have really been "reserved" in the published
        standards.  */

     /* *BEWARE* *BEWARE* *BEWARE* that the following information is still
        boiling, and may change.  Even if the OLDGNU format description should be
        accurate, the so-called GNU format is not yet fully decided.  It is
        surely meant to use only extensions allowed by POSIX, but the sketch
        below repeats some ugliness from the OLDGNU format, which should rather
        go away.  Sparse files should be saved in such a way that they do *not*
        require two passes at archive creation time.  Huge files cause some POSIX
        fields to overflow; alternate solutions have to be sought for this.  */

     /* Descriptor for a single file hole.  */

     struct sparse
     {                              /* byte offset */
       char offset[12];              /*   0 */
       char numbytes[12];            /*  12 */
                                     /*  24 */
     };

     /* Sparse files are not supported in POSIX ustar format.  For sparse files
        with a POSIX header, a GNU extra header is provided which holds overall
        sparse information and a few sparse descriptors.  When an old GNU header
        replaces both the POSIX header and the GNU extra header, it holds some
        sparse descriptors too.  Whether POSIX or not, if more sparse descriptors
        are still needed, they are put into as many successive sparse headers as
        necessary.  The following constants tell how many sparse descriptors fit
        in each kind of header able to hold them.  */

     #define SPARSES_IN_EXTRA_HEADER  16
     #define SPARSES_IN_OLDGNU_HEADER 4
     #define SPARSES_IN_SPARSE_HEADER 21

     /* Extension header for sparse files, used immediately after the GNU extra
        header, and used only if all sparse information cannot fit into that
        extra header.  There might even be many such extension headers, one after
        the other, until all sparse information has been recorded.  */

     struct sparse_header
     {                              /* byte offset */
       struct sparse sp[SPARSES_IN_SPARSE_HEADER];
                                     /*   0 */
       char isextended;              /* 504 */
                                     /* 505 */
     };

     /* The old GNU format header conflicts with POSIX format in such a way that
        POSIX archives may fool old GNU tar's, and POSIX tar's might well be
        fooled by old GNU tar archives.  An old GNU format header uses the space
        used by the prefix field in a POSIX header, and cumulates information
        normally found in a GNU extra header.  With an old GNU tar header, we
        never see any POSIX header nor GNU extra header.  Supplementary sparse
        headers are allowed, however.  */

     struct oldgnu_header
     {                              /* byte offset */
       char unused_pad1[345];        /*   0 */
       char atime[12];               /* 345 Incr. archive: atime of the file */
       char ctime[12];               /* 357 Incr. archive: ctime of the file */
       char offset[12];              /* 369 Multivolume archive: the offset of
                                        the start of this volume */
       char longnames[4];            /* 381 Not used */
       char unused_pad2;             /* 385 */
       struct sparse sp[SPARSES_IN_OLDGNU_HEADER];
                                     /* 386 */
       char isextended;              /* 482 Sparse file: Extension sparse header
                                        follows */
       char realsize[12];            /* 483 Sparse file: Real size*/
                                     /* 495 */
     };

     /* OLDGNU_MAGIC uses both magic and version fields, which are contiguous.
        Found in an archive, it indicates an old GNU header format, which will
        hopefully become obsolescent.  With OLDGNU_MAGIC, uname and gname are
        valid, though the header is not truly POSIX conforming.  */
     #define OLDGNU_MAGIC "ustar  "  /* 7 chars and a null */

     /* The standards committee allows only capital A through capital Z for
        user-defined expansion.  Other letters in use include:

        'A' Solaris Access Control List
        'E' Solaris Extended Attribute File
        'I' Inode only, as in 'star'
        'N' Obsolete GNU tar, for file names that do not fit into the main header.
        'X' POSIX 1003.1-2001 eXtended (VU version)  */

     /* This is a dir entry that contains the names of files that were in the
        dir at the time the dump was made.  */
     #define GNUTYPE_DUMPDIR 'D'

     /* Identifies the *next* file on the tape as having a long linkname.  */
     #define GNUTYPE_LONGLINK 'K'

     /* Identifies the *next* file on the tape as having a long name.  */
     #define GNUTYPE_LONGNAME 'L'

     /* This is the continuation of a file that began on another volume.  */
     #define GNUTYPE_MULTIVOL 'M'

     /* This is for sparse files.  */
     #define GNUTYPE_SPARSE 'S'

     /* This file is a tape/volume header.  Ignore it on extraction.  */
     #define GNUTYPE_VOLHDR 'V'

     /* Solaris extended header */
     #define SOLARIS_XHDTYPE 'X'

     /* Jörg Schilling star header */

     struct star_header
     {                              /* byte offset */
       char name[100];               /*   0 */
       char mode[8];                 /* 100 */
       char uid[8];                  /* 108 */
       char gid[8];                  /* 116 */
       char size[12];                /* 124 */
       char mtime[12];               /* 136 */
       char chksum[8];               /* 148 */
       char typeflag;                /* 156 */
       char linkname[100];           /* 157 */
       char magic[6];                /* 257 */
       char version[2];              /* 263 */
       char uname[32];               /* 265 */
       char gname[32];               /* 297 */
       char devmajor[8];             /* 329 */
       char devminor[8];             /* 337 */
       char prefix[131];             /* 345 */
       char atime[12];               /* 476 */
       char ctime[12];               /* 488 */
                                     /* 500 */
     };

     #define SPARSES_IN_STAR_HEADER      4
     #define SPARSES_IN_STAR_EXT_HEADER  21

     struct star_in_header
     {
       char fill[345];       /*   0  Everything that is before t_prefix */
       char prefix[1];       /* 345  t_name prefix */
       char fill2;           /* 346  */
       char fill3[8];        /* 347  */
       char isextended;      /* 355  */
       struct sparse sp[SPARSES_IN_STAR_HEADER]; /* 356  */
       char realsize[12];    /* 452  Actual size of the file */
       char offset[12];      /* 464  Offset of multivolume contents */
       char atime[12];       /* 476  */
       char ctime[12];       /* 488  */
       char mfill[8];        /* 500  */
       char xmagic[4];       /* 508  "tar" */
     };

     struct star_ext_header
     {
       struct sparse sp[SPARSES_IN_STAR_EXT_HEADER];
       char isextended;
     };

   All characters in header blocks are represented by using 8-bit
characters in the local variant of ASCII.  Each field within the
structure is contiguous; that is, there is no padding used within the
structure.  Each character on the archive medium is stored contiguously.
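
   Because every member is a 'char' array, a C compiler has no reason to
insert padding, and the byte offsets in the listing can be checked at
compile time.  This C11 sketch re-declares 'struct posix_header' so that
it stands alone; the 500-byte structure occupies the start of a 512-byte
block, and the remaining 12 bytes are not used by it.

```c
#include <stddef.h>

/* Copy of the POSIX header from the listing above.  */
struct posix_header
{
  char name[100];
  char mode[8];
  char uid[8];
  char gid[8];
  char size[12];
  char mtime[12];
  char chksum[8];
  char typeflag;
  char linkname[100];
  char magic[6];
  char version[2];
  char uname[32];
  char gname[32];
  char devmajor[8];
  char devminor[8];
  char prefix[155];
};

/* Compile-time checks that the offsets match the listing's comments,
   i.e. that no padding was introduced between fields.  */
_Static_assert (offsetof (struct posix_header, mode) == 100, "mode");
_Static_assert (offsetof (struct posix_header, chksum) == 148, "chksum");
_Static_assert (offsetof (struct posix_header, typeflag) == 156, "typeflag");
_Static_assert (offsetof (struct posix_header, magic) == 257, "magic");
_Static_assert (offsetof (struct posix_header, prefix) == 345, "prefix");
_Static_assert (sizeof (struct posix_header) == 500, "total size");
```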

   Bytes representing the contents of files (after the header block of
each file) are not translated in any way and are not constrained to
represent characters in any character set.  The 'tar' format does not
distinguish text files from binary files, and no translation of file
contents is performed.

   The 'name', 'linkname', 'magic', 'uname', and 'gname' fields are
null-terminated character strings.  All other fields are zero-filled
octal numbers in ASCII.  Each numeric field of width W contains W minus
1 digits and a null.  (In the extended GNU format, the numeric fields
can take other forms.)
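
   Here is a minimal sketch of writing and reading these numeric fields.
It assumes the plain octal form described here, not the other forms that
the extended GNU format may use, and the function names are
hypothetical.  The parser also skips leading blanks and stops at a
blank.  That leniency toward other archivers is an assumption that goes
beyond the rule stated above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Store VALUE in the numeric field FIELD of width W (W >= 1) as W-1
   zero-filled octal digits followed by a null.  Returns false if VALUE
   needs more than W-1 octal digits.  */
static bool
format_octal (char *field, size_t w, uintmax_t value)
{
  size_t i = w - 1;

  field[i] = '\0';
  while (i > 0)
    {
      field[--i] = (char) ('0' + (value & 7));
      value >>= 3;
    }
  return value == 0;
}

/* Parse the octal digits of a W-byte numeric field into *OUT.  Leading
   blanks are skipped, and a null or blank ends the number.  Returns
   false if any other non-octal byte is found.  */
static bool
parse_octal (const char *field, size_t w, uintmax_t *out)
{
  uintmax_t value = 0;
  size_t i = 0;

  while (i < w && field[i] == ' ')
    i++;
  for (; i < w && field[i] != '\0' && field[i] != ' '; i++)
    {
      if (field[i] < '0' || field[i] > '7')
        return false;
      value = value * 8 + (uintmax_t) (field[i] - '0');
    }
  *out = value;
  return true;
}
```

For example, a mode of 0644 in the 8-byte 'mode' field is stored as the
seven digits '0000644' followed by a null.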

   The 'name' field is the file name of the file, with directory names
(if any) preceding the file name, separated by slashes.

   The 'mode' field provides nine bits specifying file permissions and
three bits to specify the Set UID, Set GID, and Save Text ("sticky")
modes.  Values for these bits are defined above.  When special
permissions are required to create a file with a given mode, and the
user restoring files from the archive does not hold such permissions,
the mode bit(s) specifying those special permissions are ignored.  Modes
which are not supported by the operating system restoring files from the
archive will be ignored.  Unsupported modes should be faked up when
creating or updating an archive; e.g., the group permission could be
copied from the _other_ permission.
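
   As an illustration, the sketch below renders the twelve mode bits in
the familiar 'ls -l' style, using the constants from the header listing.
It assumes the 'mode' field has already been converted from octal text
to an integer, and 'mode_string' is a hypothetical name.

```c
/* Mode bits, values in octal, as in the header listing.  */
#define TSUID    04000
#define TSGID    02000
#define TSVTX    01000
#define TUREAD   00400
#define TUWRITE  00200
#define TUEXEC   00100
#define TGREAD   00040
#define TGWRITE  00020
#define TGEXEC   00010
#define TOREAD   00004
#define TOWRITE  00002
#define TOEXEC   00001

/* Write the nine permission letters plus a null into OUT.  A set-ID bit
   shows as 's' in the execute slot ('S' if execute is off); the sticky
   bit shows as 't' ('T') in the "other" execute slot.  */
static void
mode_string (unsigned mode, char out[10])
{
  out[0] = (mode & TUREAD) ? 'r' : '-';
  out[1] = (mode & TUWRITE) ? 'w' : '-';
  out[2] = (mode & TSUID) ? ((mode & TUEXEC) ? 's' : 'S')
                          : ((mode & TUEXEC) ? 'x' : '-');
  out[3] = (mode & TGREAD) ? 'r' : '-';
  out[4] = (mode & TGWRITE) ? 'w' : '-';
  out[5] = (mode & TSGID) ? ((mode & TGEXEC) ? 's' : 'S')
                          : ((mode & TGEXEC) ? 'x' : '-');
  out[6] = (mode & TOREAD) ? 'r' : '-';
  out[7] = (mode & TOWRITE) ? 'w' : '-';
  out[8] = (mode & TSVTX) ? ((mode & TOEXEC) ? 't' : 'T')
                          : ((mode & TOEXEC) ? 'x' : '-');
  out[9] = '\0';
}
```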

   The 'uid' and 'gid' fields are the numeric user and group IDs of the
file's owner and group, respectively.  If the operating system does not
support numeric user or group IDs, these fields should be ignored.

   The 'size' field is the size of the file in bytes; linked files are
archived with this field set to zero.
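
   Since file contents are stored in whole blocks, the 'size' field also
tells a reader how far to skip to reach the next header.  This sketch
assumes that the entry's contents immediately follow its header, as for
a regular file, and that 'size' has already been parsed from octal.  The
helper names are hypothetical.

```c
#include <stdint.h>

#define BLOCKSIZE 512

/* Number of 512-byte blocks holding SIZE bytes of contents; the last
   block is zero-padded.  Written to avoid overflow near UINTMAX_MAX.  */
static uintmax_t
data_blocks (uintmax_t size)
{
  return size / BLOCKSIZE + (size % BLOCKSIZE != 0);
}

/* Byte offset of the header that follows a header at HEADER_OFFSET
   whose 'size' field is SIZE.  */
static uintmax_t
next_header_offset (uintmax_t header_offset, uintmax_t size)
{
  return header_offset + BLOCKSIZE * (1 + data_blocks (size));
}
```

For a 1000-byte regular file whose header is at offset 0, the contents
take two blocks and the next header starts at byte 1536.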

   The 'mtime' field represents the data modification time of the file
at the time it was archived.  It represents the integer number of
seconds since January 1, 1970, 00:00 Coordinated Universal Time.
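
   Because 'mtime' counts seconds from that epoch, the C library can
convert it directly.  This sketch assumes that the value fits in
'time_t' and that 'time_t' counts seconds since the epoch, as POSIX
requires.  'format_mtime_utc' is a hypothetical helper, and a
multithreaded program would prefer 'gmtime_r'.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Format MTIME (already converted from octal) as "YYYY-MM-DD HH:MM:SS"
   in UTC.  Returns the number of characters written, or 0 on failure.  */
static size_t
format_mtime_utc (uintmax_t mtime, char *buf, size_t len)
{
  time_t t = (time_t) mtime;
  struct tm *tm = gmtime (&t);

  if (tm == NULL)
    return 0;
  return strftime (buf, len, "%Y-%m-%d %H:%M:%S", tm);
}
```

For example, an 'mtime' of 86400 formats as '1970-01-02 00:00:00'.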

   The 'chksum' field represents the simple sum of all bytes in the
header block.  Each 8-bit byte in the header is added to an unsigned
integer, initialized to zero, the precision of which shall be no less
than seventeen bits.  When calculating the checksum, the 'chksum' field
is treated as if it were all blanks.
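
   A sketch of the checksum computation follows.  'header_checksum' is a
hypothetical name, and the offset 148 comes from the 'chksum' entry in
the listing.  Seventeen bits suffice because the largest possible sum,
512 bytes of value 255, is 130560, which is less than 2^17 = 131072.
'unsigned long' is guaranteed to be wider than that.

```c
#include <stddef.h>

#define BLOCKSIZE     512
#define CHKSUM_OFFSET 148   /* offset of 'chksum' in the header */
#define CHKSUM_LEN    8     /* width of 'chksum' */

/* Sum all 512 header bytes as unsigned values, counting the 8-byte
   'chksum' field as if it held ASCII blanks.  */
static unsigned long
header_checksum (const unsigned char *header)
{
  unsigned long sum = 0;

  for (size_t i = 0; i < BLOCKSIZE; i++)
    {
      if (i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_LEN)
        sum += ' ';
      else
        sum += header[i];
    }
  return sum;
}
```

An all-zero block therefore has a checksum of 8 * 32 = 256, contributed
entirely by the eight blanks.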

   The 'typeflag' field specifies the type of file archived.  If a
particular implementation does not recognize or permit the specified
type, the file will be extracted as if it were a regular file.  When
this happens, 'tar' issues a warning to the standard error.

   The 'atime' and 'ctime' fields are used in making incremental
backups; they store, respectively, the particular file's access and
status change times.

   The 'offset' is used by the '--multi-volume' ('-M') option, when
making a multi-volume archive.  The offset is the number of bytes into
the file at which to restart when continuing the file on the next tape;
that is, it records the position at which a continued file resumes.

   The following fields were added to deal with sparse files.  A file is
"sparse" if it contains unallocated blocks which end up being
represented as zeros, i.e., no useful data.  A test to see if a file is
sparse is to look at the number of blocks allocated for it versus the
number of characters in the file; if there are fewer blocks allocated
for the file than would normally be allocated for a file of that size,
then the file is sparse.  This is the method 'tar' uses to detect a
sparse file, and once such a file is detected, it is treated differently
from non-sparse files.
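
   The block-count test can be sketched with 'stat'.  This version
assumes that 'st_blocks' counts 512-byte units, which is common on Unix
systems but not guaranteed by POSIX, and the helper names are
hypothetical.  The test is only a heuristic: file systems that compress
data can also report fewer blocks than the size suggests.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>

/* True if fewer bytes are allocated (ST_BLOCKS 512-byte units) than the
   file's apparent size ST_SIZE.  */
static bool
looks_sparse (uintmax_t st_size, uintmax_t st_blocks)
{
  return st_blocks * 512 < st_size;
}

/* Apply the heuristic to the result of stat() or fstat().  */
static bool
stat_looks_sparse (const struct stat *st)
{
  return looks_sparse ((uintmax_t) st->st_size, (uintmax_t) st->st_blocks);
}
```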

   Sparse files are often 'dbm' files, or other database-type files
which have data at some points and emptiness in the greater part of the
file.  Such files can appear to be very large when an 'ls -l' is done on
them, when in truth, there may be a very small amount of important data
contained in the file.  It is thus undesirable to have 'tar' think that
it must back up this entire file, as great quantities of room are wasted
on empty blocks, which can lead to running out of room on a tape far
earlier than is necessary.  For this reason, sparse files are handled so
that these empty blocks are not written to the tape.  Instead, what is
written to the tape is a description, of sorts, of the sparse file:
where the holes are, how big the holes are, and how much data is found
at the end of the hole.  This way, the file takes up potentially far
less room on the tape, and when the file is extracted later on, it will
look exactly the way it looked beforehand.  The following is a
description of the fields used to handle a sparse file:

   The 'sp' field is an array of 'struct sparse'.  Each 'struct sparse'
contains two 12-character strings which represent an offset into the
file and a number of bytes to be written at that offset.  The offset is
absolute, and not relative to the offset in the preceding array element.
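
   To make the 'sp' semantics concrete, here is a sketch that rebuilds a
sparse file's contents in memory.  It assumes that each entry's 'offset'
and 'numbytes' have already been parsed from octal into the hypothetical
'struct sparse_entry'.  For illustration only, it also assumes that the
archived data regions are stored back to back in map order.  Bytes not
covered by any entry are holes and come out as zeros.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One decoded 'struct sparse': NUMBYTES bytes of real data belong at the
   absolute OFFSET in the restored file.  */
struct sparse_entry
{
  uintmax_t offset;
  uintmax_t numbytes;
};

/* Fill OUT (REALSIZE bytes) from the N map entries in MAP, taking the
   data for each entry in turn from DATA.  Returns 0 on success, or -1
   if an entry would write past REALSIZE.  */
static int
expand_sparse (const struct sparse_entry *map, size_t n,
               const unsigned char *data, unsigned char *out,
               size_t realsize)
{
  memset (out, 0, realsize);          /* every hole reads as zeros */
  for (size_t i = 0; i < n; i++)
    {
      if (map[i].offset > realsize
          || map[i].numbytes > realsize - map[i].offset)
        return -1;
      /* Offsets are absolute, so each entry is placed independently.  */
      memcpy (out + (size_t) map[i].offset, data, (size_t) map[i].numbytes);
      data += map[i].numbytes;
    }
  return 0;
}
```

With entries {0, 3} and {8, 2} and stored data 'abcXY', a 10-byte file
comes out as 'abc', five zero bytes, then 'XY'.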

   An old GNU header can hold four of these 'struct sparse' entries
('SPARSES_IN_OLDGNU_HEADER'); if more are needed, the rest are stored
in extension headers that follow it.

   The 'isextended' flag is set when an 'extended_header' is needed to
deal with a file.  This flag can therefore be set only for a sparse
file, and only when the description of the file does not fit in the
room allotted for sparse structures in the header.

   The 'extended_header' structure (declared as 'struct sparse_header'
in the listing above) is used for sparse files which need more sparse
structures than can fit in the header.  The header can fit 4 such
structures; if more are needed, the flag 'isextended' gets set and the
next block is an 'extended_header'.

   Each 'extended_header' structure contains an array of 21 sparse
structures, along with a similar 'isextended' flag that the header had.
There can be an arbitrary number of such 'extended_header's to describe
a sparse file.
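
   The number of 'extended_header' blocks follows from the capacities
given in the listing ('SPARSES_IN_OLDGNU_HEADER' and
'SPARSES_IN_SPARSE_HEADER').  Here is a sketch for the old GNU header
case, with a hypothetical function name:

```c
#include <stddef.h>

#define SPARSES_IN_OLDGNU_HEADER 4
#define SPARSES_IN_SPARSE_HEADER 21

/* Extension blocks needed after an old GNU header for a file described
   by N sparse descriptors: four fit in the header itself, and each
   extension block holds 21 more.  */
static size_t
extension_headers_needed (size_t n)
{
  if (n <= SPARSES_IN_OLDGNU_HEADER)
    return 0;
  n -= SPARSES_IN_OLDGNU_HEADER;
  return (n + SPARSES_IN_SPARSE_HEADER - 1) / SPARSES_IN_SPARSE_HEADER;
}
```

For 26 descriptors, four fit in the header and the other 22 need two
extension blocks.  The header's 'isextended' flag is set, the first
extension block's flag is set, and the second's is clear.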

'REGTYPE'
'AREGTYPE'
     These flags represent a regular file.  In order to be compatible
     with older versions of 'tar', a 'typeflag' value of 'AREGTYPE'
     should be silently recognized as a regular file.  New archives
     should be created using 'REGTYPE'.  Also, for backward
     compatibility, 'tar' treats a regular file whose name ends with a
     slash as a directory.

'LNKTYPE'
     This flag represents a file linked to another file, of any type,
     previously archived.  Such files are identified in Unix by each
     file having the same device and inode number.  The linked-to name
     is specified in the 'linkname' field with a trailing null.

'SYMTYPE'
     This represents a symbolic link to another file.  The linked-to
     name is specified in the 'linkname' field with a trailing null.

'CHRTYPE'
'BLKTYPE'
     These represent character special files and block special files
     respectively.  In this case, the 'devmajor' and 'devminor' fields
     will contain the major and minor device numbers respectively.
     Operating systems may map the device specifications to their own
     local specification, or may ignore the entry.

'DIRTYPE'
     This flag specifies a directory or sub-directory.  The directory
     name in the 'name' field should end with a slash.  On systems where
     disk allocation is performed on a directory basis, the 'size' field
     will contain the maximum number of bytes (which may be rounded to
     the nearest disk block allocation unit) which the directory may
     hold.  A 'size' field of zero indicates no such limiting.  Systems
     which do not support limiting in this manner should ignore the
     'size' field.

'FIFOTYPE'
     This specifies a FIFO special file.  Note that the archiving of a
     FIFO file archives the existence of this file and not its contents.

'CONTTYPE'
     This specifies a contiguous file, which is the same as a normal
     file except that, in operating systems which support it, all its
     space is allocated contiguously on the disk.  Operating systems
     which do not allow contiguous allocation should silently treat this
     type as a normal file.

'A' ... 'Z'
     These are reserved for custom implementations.  Some of these are
     used in the GNU modified format, as shown in the header listing
     above.

   Other values are reserved for specification in future revisions of
the P1003 standard, and should not be used by any 'tar' program.
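
   Here is a sketch of how a reader might map 'typeflag' values to the
kinds above.  It includes the two compatibility rules: 'AREGTYPE' is
accepted as a regular file, and a regular file whose name ends in a
slash is treated as a directory.  'entry_kind' is a hypothetical name.
It does not handle the GNU letters from the header listing, and it
falls back to "regular file" for anything unrecognized, where a real
implementation would also warn.

```c
#include <string.h>

/* Type flag values from the header listing.  */
#define REGTYPE  '0'
#define AREGTYPE '\0'
#define LNKTYPE  '1'
#define SYMTYPE  '2'
#define CHRTYPE  '3'
#define BLKTYPE  '4'
#define DIRTYPE  '5'
#define FIFOTYPE '6'
#define CONTTYPE '7'

/* Describe the entry given its TYPEFLAG and null-terminated NAME.  */
static const char *
entry_kind (char typeflag, const char *name)
{
  size_t len = strlen (name);

  switch (typeflag)
    {
    case REGTYPE:
    case AREGTYPE:
      /* Backward compatibility: a trailing slash marks a directory.  */
      return (len > 0 && name[len - 1] == '/') ? "directory" : "regular file";
    case LNKTYPE:  return "link";
    case SYMTYPE:  return "symbolic link";
    case CHRTYPE:  return "character special";
    case BLKTYPE:  return "block special";
    case DIRTYPE:  return "directory";
    case FIFOTYPE: return "FIFO";
    case CONTTYPE: return "contiguous file";
    default:       return "regular file";   /* unrecognized type */
    }
}
```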

   The 'magic' field indicates that this archive was output in the P1003
archive format.  If this field contains 'TMAGIC', the 'uname' and
'gname' fields will contain the ASCII representation of the owner and
group of the file respectively.  If a user and group with those names
exist on the extracting system, their IDs are used rather than the
values in the 'uid' and 'gid' fields.
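
   The two magic values in the header listing can be told apart by
comparing bytes across the contiguous 'magic' and 'version' fields
(offsets 257 and 263).  The sketch below uses hypothetical names.
'FORMAT_OTHER' covers everything else, such as headers from programs
that predate the P1003 format.

```c
#include <string.h>

#define TMAGIC       "ustar"    /* ustar and a null */
#define TMAGLEN      6
#define TVERSION     "00"       /* 00 and no null */
#define TVERSLEN     2
#define OLDGNU_MAGIC "ustar  "  /* 7 chars and a null */
#define MAGIC_OFFSET 257        /* 'magic'; 'version' follows at 263 */

enum header_format { FORMAT_USTAR, FORMAT_OLDGNU, FORMAT_OTHER };

/* Classify a 512-byte header block by its 'magic' and 'version'.  */
static enum header_format
classify_header (const unsigned char *header)
{
  const unsigned char *magic = header + MAGIC_OFFSET;

  if (memcmp (magic, TMAGIC, TMAGLEN) == 0
      && memcmp (magic + TMAGLEN, TVERSION, TVERSLEN) == 0)
    return FORMAT_USTAR;
  /* OLDGNU_MAGIC spans both fields: 7 characters and a null.  */
  if (memcmp (magic, OLDGNU_MAGIC, sizeof OLDGNU_MAGIC) == 0)
    return FORMAT_OLDGNU;
  return FORMAT_OTHER;
}
```

According to the listing, 'uname' and 'gname' are meaningful for both
'FORMAT_USTAR' and 'FORMAT_OLDGNU' headers.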

   For reference, see ISO/IEC 9945-1:1990 or IEEE Std 1003.1-1990,
pages 169-173 (section 10.1) for 'Archive/Interchange File Format'; and
IEEE Std 1003.2-1992, pages 380-388 (section 4.48) and pages 936-940
(section E.4.48) for 'pax - Portable archive interchange'.