8.4 Comparison of 'tar' and 'cpio'
==================================

   The 'cpio' archive formats, like 'tar', have maximum file name
lengths.  The binary and old ASCII formats have a maximum file name
length of 256, and the new ASCII and CRC ASCII formats have a maximum
file name length of 1024.  GNU 'cpio' can read and write archives with
arbitrary file name lengths, but other 'cpio' implementations may crash
inexplicably when trying to read them.

   'tar', in the form in which it comes with BSD, handles symbolic
links.  'cpio', in the form in which it comes with System V prior to
SVR4, doesn't, and some vendors may have added symlinks to their systems
without enhancing 'cpio' to know about them.  Others may have enhanced
it in a way other than the way I did it at Sun, which was adopted by
AT&T (and which is, I think, also present in the 'cpio' that Berkeley
picked up from AT&T and put into a later BSD release; I think I gave
them my changes).

   (SVR4 does some unusual things with 'tar': its 'cpio' can read 'tar'
format input and write it on output, and it probably handles symbolic
links.  The SVR4 developers may not have bothered to enhance 'tar'
itself as a result.)

   'cpio' handles special files (device nodes and named pipes);
traditional 'tar' doesn't.
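As a rough illustration, the later POSIX ustar format (not the traditional V7 'tar' this paragraph refers to) added type flags for character devices, block devices, and FIFOs, along with fields for device numbers.  The Python sketch below uses the standard 'tarfile' module to write such header-only entries into an in-memory archive and read them back.  The member names and device numbers are arbitrary examples, and no real device nodes are created.

```python
import io
import tarfile

# Arbitrary example entries: (archive name, type flag, major, minor).
SPECIAL_FILES = [
    ("dev/null", tarfile.CHRTYPE, 1, 3),   # character device
    ("dev/sda", tarfile.BLKTYPE, 8, 0),    # block device
    ("run/fifo", tarfile.FIFOTYPE, 0, 0),  # named pipe (no device numbers)
]


def roundtrip_special_files(entries):
    """Write header-only special-file entries to a ustar archive in memory,
    then read back (name, type flag, major, minor) for each member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, kind, major, minor in entries:
            info = tarfile.TarInfo(name)
            info.type = kind
            info.devmajor, info.devminor = major, minor
            tar.addfile(info)  # special files carry no data, only a header
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return [(m.name, m.type, m.devmajor, m.devminor) for m in tar]


for row in roundtrip_special_files(SPECIAL_FILES):
    print(row)
```

Each entry is a single 512-byte header, because the device numbers travel in the header itself.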

   'tar' comes with V7, System III, System V, and BSD source; 'cpio'
comes only with System III, System V, and later BSD (4.3-tahoe and
later).

   'tar''s way of handling multiple hard links to a file can cope with
file systems that use 32-bit i-numbers (e.g., the BSD file system).
'cpio''s way requires some games.  In its "binary" format, i-numbers are
only 16 bits wide, and in its "portable ASCII" format they're 18 bits,
so an implementation would have to play games with the "file system ID"
field of the header to make sure that the file system ID/i-number pairs
of different files are always different.  I don't know which 'cpio's,
if any, play those games.  Those that don't might get confused, think
that two different files are the same file, and make hard links between
them.
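The following Python sketch makes the i-number problem concrete.  It assumes 16-bit i-numbers for the binary format and 18-bit ones (six octal digits) for the portable ASCII format, as described above.  'truncate' shows how two distinct 32-bit i-numbers on the same file system can collapse to one archived value.  'renumber' is a hypothetical example of the kind of "game" an archiver could play, spilling the extra bits into the file system ID field.  It is not taken from any real 'cpio'.

```python
BINARY_INO_BITS = 16  # old binary format: 16-bit i-number field
ODC_INO_BITS = 18     # "portable ASCII" format: six octal digits = 18 bits


def truncate(dev, ino, bits):
    """(dev, ino) pair as it would survive an i-number field `bits` wide."""
    return dev, ino & ((1 << bits) - 1)


def renumber(pairs, bits):
    """Hypothetical scheme: give each distinct (dev, ino) a sequence number n,
    store n's low bits as the i-number and its high bits as the device ID.
    Distinct files get distinct pairs; repeated pairs (hard links) still match."""
    mapping = {}
    for pair in pairs:
        if pair not in mapping:
            n = len(mapping)
            mapping[pair] = (n >> bits, n & ((1 << bits) - 1))
    return mapping


# Two different files on one file system whose i-numbers differ only above bit 17.
a = (0x0801, 5)
b = (0x0801, 5 + (1 << 18))
print(truncate(*a, ODC_INO_BITS) == truncate(*b, ODC_INO_BITS))  # True: they collide
print(renumber([a, b, a], ODC_INO_BITS))
```

The renumbering only helps if the reader treats the rewritten (file system ID, i-number) pair as opaque, which is the assumption behind this kind of game.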

   'tar''s way of handling multiple hard links to a file stores the
file's contents only once, and the name attached to that copy is the
_only_ one you can use to retrieve the file.  'cpio''s way (at least in
its binary and old ASCII formats) puts one copy of the contents on the
archive for every link, but you can retrieve the file using any of the
names.
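The sketch below uses Python's standard 'tarfile' module to show the 'tar' side of this.  When two archived names are hard links to the same file, the first is stored as a regular member carrying the data, and the second is stored as a link entry that only names the first.  It assumes a POSIX system where 'os.link' works inside a temporary directory.  The file names are arbitrary examples.

```python
import io
import os
import tarfile
import tempfile


def archive_hard_links():
    """Archive two hard links to one file; return per-member
    (name, is_regular, is_hard_link, link_target, stored_size)."""
    buf = io.BytesIO()
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first.txt")
        second = os.path.join(d, "second.txt")
        with open(first, "wb") as f:
            f.write(b"hello\n")
        os.link(first, second)  # a second name for the same inode
        with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
            tar.add(first, arcname="first.txt")    # stored with its data
            tar.add(second, arcname="second.txt")  # stored as a link entry
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return [(m.name, m.isreg(), m.islnk(), m.linkname, m.size)
                for m in tar.getmembers()]


for row in archive_hard_links():
    print(row)
```

The link entry has a stored size of 0 and carries no data of its own.  Its contents are only reachable through 'first.txt'.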

     What type of checksum (if any) is used, and how is it
     calculated?
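
   See the manual pages for the 'tar' and 'cpio' formats.  'tar' uses a
checksum that is the sum of all the bytes in a file's 512-byte 'tar'
header, with the eight bytes of the checksum field itself counted as
ASCII spaces.  The traditional 'cpio' formats use no checksum, and the
CRC ASCII format stores only a 32-bit sum of the file's data bytes, not
of its header.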
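Here is a minimal Python sketch of the 'tar' header checksum.  It assumes the POSIX ustar layout, with the checksum field at bytes 148 through 155 of the 512-byte header, stored as octal digits.  Some historical implementations summed the bytes as signed 'char's, so the sketch computes both sums.  GNU 'tar' and Python's 'tarfile' accept either.  The header is produced with the standard 'tarfile' module only to have something to check.

```python
import tarfile

BLOCK = 512
CHK_START, CHK_END = 148, 156  # 8-byte checksum field of a ustar header


def header_checksums(header):
    """Return (unsigned, signed) sums of a 512-byte tar header, counting the
    checksum field itself as eight ASCII spaces."""
    if len(header) != BLOCK:
        raise ValueError("a tar header block is exactly 512 bytes")
    blanked = header[:CHK_START] + b" " * (CHK_END - CHK_START) + header[CHK_END:]
    unsigned = sum(blanked)
    signed = sum(b - 256 if b > 127 else b for b in blanked)
    return unsigned, signed


def stored_checksum(header):
    """Decode the octal checksum field (digits followed by NUL and/or space)."""
    return int(header[CHK_START:CHK_END].rstrip(b"\0 ").decode("ascii"), 8)


info = tarfile.TarInfo("hello.txt")
info.size = 6
header = info.tobuf(format=tarfile.USTAR_FORMAT)
print(stored_checksum(header) in header_checksums(header))  # True
```

Because only the header is summed, damage to a file's data goes undetected by this checksum.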

     If anyone knows why 'cpio' was made when 'tar' was present at the
     unix scene,

   It wasn't.  'cpio' first showed up in PWB/UNIX 1.0; no generally
available version of UNIX had 'tar' at the time.  I don't know whether
any version that was generally available _within AT&T_ had 'tar', or,
if so, whether the people within AT&T who did 'cpio' knew about it.

   On restore, if a tape is corrupted, 'tar' will stop at that point,
while 'cpio' will skip over the damage and try to restore the rest of
the files.

   The main difference is just in the command syntax and header format.

   'tar' is a little more tape-oriented, in that every header and every
file's data is blocked to start on a 512-byte block boundary.
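The blocking can be worked out with simple arithmetic.  This Python sketch assumes the usual layout.  Each member is a 512-byte header followed by its data padded to a multiple of 512 bytes.  The archive ends with two zero-filled 512-byte blocks, and the whole archive is padded to a record of 20 blocks (10240 bytes, GNU 'tar''s default blocking factor).  It ignores long-name and extended headers.

```python
BLOCK = 512
RECORD = 20 * BLOCK  # default blocking factor of 20 => 10240-byte records


def round_up(n, unit):
    """Smallest multiple of `unit` that is >= n."""
    return (n + unit - 1) // unit * unit


def tar_layout(sizes):
    """Given member data sizes, return (header offsets, total archive size)."""
    offsets, pos = [], 0
    for size in sizes:
        offsets.append(pos)
        pos += BLOCK + round_up(size, BLOCK)  # header + padded data
    pos += 2 * BLOCK                          # end-of-archive marker
    return offsets, round_up(pos, RECORD)


offsets, total = tar_layout([700, 10])
print(offsets, total)                     # [0, 1536] 10240
print([off // BLOCK for off in offsets])  # [0, 3]
```

Since every header offset is a multiple of 512, 'dd if=archive.tar bs=512 skip=3' would start reading exactly at the second header in this example.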

     Are there any differences between the two in the ability to recover
     crashed archives?  (Is there any chance of recovering crashed
     archives at all?)

   Theoretically it should be easier under 'tar', since the blocking lets
you find a header with some variation of 'dd skip=NN'.  However, modern
'cpio's and their variants have an option to just search for the next
file header after an error, with a reasonable chance of resyncing.  In
practice, though, lots of tape driver software won't let you continue
past a media error, and a media error should be the only reason for
getting out of sync, unless a file changed size while you were writing
the archive.

     If anyone knows why 'cpio' was made when 'tar' was present at the
     unix scene, please tell me about this too.

   Probably because it is more media-efficient (it doesn't block
everything, and it uses only the space the headers need, whereas 'tar'
always uses 512 bytes per file header) and it knows how to archive
special files.
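To put rough numbers on the space argument, this Python sketch builds a single "new ASCII" ('newc', magic '070701') 'cpio' entry by hand.  It then compares that entry's size with a ustar 'tar' member for the same small file.  The sketch assumes the newc layout: a 110-byte header (a 6-byte magic plus thirteen 8-digit hex fields), followed by the NUL-terminated name and the data, each padded to a 4-byte boundary.  The file name, contents, and header field values are arbitrary, and trailers and end-of-archive blocks are ignored.

```python
import tarfile


def newc_entry(name, data, ino=1, mode=0o100644):
    """One cpio "new ASCII" entry: header, NUL-terminated name, data,
    with name and data each padded to a 4-byte boundary."""
    name_bytes = name.encode("ascii") + b"\0"   # namesize counts the NUL
    fields = [ino, mode, 0, 0, 1, 0, len(data),  # ino mode uid gid nlink mtime filesize
              0, 0, 0, 0, len(name_bytes), 0]    # dev/rdev major+minor, namesize, check
    header = b"070701" + b"".join(b"%08X" % f for f in fields)  # 6 + 13 * 8 = 110 bytes
    entry = header + name_bytes
    entry += b"\0" * (-len(entry) % 4)
    return entry + data + b"\0" * (-len(data) % 4)


def ustar_member_size(name, size):
    """Header block plus data rounded up to whole 512-byte blocks."""
    info = tarfile.TarInfo(name)
    info.size = size
    return len(info.tobuf(format=tarfile.USTAR_FORMAT)) + 512 * ((size + 511) // 512)


payload = b"z" * 100
print("cpio newc:", len(newc_entry("notes.txt", payload)))         # cpio newc: 220
print("ustar tar:", ustar_member_size("notes.txt", len(payload)))  # ustar tar: 1024
```

The gap matters most for many small files.  For large files, 'tar''s fixed cost of one header block plus under 512 bytes of padding becomes negligible.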

   You might want to look at the freely available alternatives.  The
major ones are 'afio', GNU 'tar', and 'pax', each of which has its own
extensions with some backward compatibility.

   Sparse files were 'tar'red as sparse files.  This is easy to test:
the resulting archive gets smaller, and GNU 'cpio' can no longer read
it.