gzip: Overview

1 
1 1 Overview
1 **********
1 
1 ‘gzip’ reduces the size of the named files using Lempel–Ziv coding
1 (LZ77).  Whenever possible, each file is replaced by one with the
1 extension ‘.gz’, while keeping the same ownership, modes, and access and
1 modification times.  (The default extension is ‘z’ for MSDOS, OS/2 FAT
1 and Atari.)  If no files are specified or if a file name is ‘-’, the
1 standard input is compressed to the standard output.  ‘gzip’ will only
1 attempt to compress regular files.  In particular, it will ignore
1 symbolic links.
1 
1    If the new file name is too long for its file system, ‘gzip’
1 truncates it.  ‘gzip’ attempts to truncate only the parts of the file
1 name longer than 3 characters.  (A part is delimited by dots.)  If the
1 name consists of small parts only, the longest parts are truncated.  For
1 example, if file names are limited to 14 characters, gzip.msdos.exe is
1 compressed to gzi.msd.exe.gz.  Names are not truncated on systems which
1 do not have a limit on file name length.
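
The truncation rule described above can be sketched in Python.  This is
only an illustration that agrees with the description and the example; it
is not taken from the ‘gzip’ sources.  The function name ‘shorten_name’,
its parameters, and the choice to trim the last eligible part first are
assumptions made for the sketch.

```python
def shorten_name(name, limit, suffix=".gz", min_part=3):
    """Trim dot-delimited parts of `name` until name + suffix fits in `limit`."""
    parts = name.split(".")
    floor = min_part
    while len(".".join(parts)) + len(suffix) > limit:
        # Only parts longer than the current floor may lose a character.
        eligible = [i for i, part in enumerate(parts) if len(part) > floor]
        if eligible:
            i = eligible[-1]           # assumption: trim the last eligible part
            parts[i] = parts[i][:-1]   # drop one trailing character
        elif floor > 1:
            floor -= 1                 # only small parts remain: trim the longest
        else:
            raise ValueError("name cannot be shortened enough")
    return ".".join(parts) + suffix


print(shorten_name("gzip.msdos.exe", 14))  # gzi.msd.exe.gz
```

Trimming one character at a time keeps every part as long as the limit
allows, which is why ‘exe’ survives intact in the example.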
1 
1    By default, ‘gzip’ keeps the original file name and timestamp in the
1 compressed file.  These are used when decompressing the file with the
1 ‘-N’ option.  This is useful when the compressed file name was truncated
1 or when the timestamp was not preserved after a file transfer.  However,
1 due to limitations in the current ‘gzip’ file format, fractional seconds
1 are discarded.  Also, timestamps must fall within the range 1970-01-01
1 00:00:01 through 2106-02-07 06:28:15 UTC, and hosts whose operating
1 systems use 32-bit timestamps are further restricted to timestamps no
1 later than 2038-01-19 03:14:07 UTC.  The upper bounds assume the typical
1 case where leap seconds are ignored.
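
The bounds quoted above follow from storing the timestamp as an unsigned
32-bit count of seconds since 1970-01-01 00:00:00 UTC, with the value 0
reserved to mean "no timestamp" (RFC 1952).  The following Python sketch
reproduces the arithmetic and reads the stored value back from a header.
The helper name ‘from_mtime’ is hypothetical, and the sketch uses Python's
standard ‘gzip’ module rather than the ‘gzip’ program.

```python
import gzip
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_mtime(seconds):
    """Convert a whole-second gzip timestamp to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


earliest = from_mtime(1)               # 0 is reserved for "no timestamp"
latest = from_mtime(2**32 - 1)         # largest unsigned 32-bit value
latest_32bit = from_mtime(2**31 - 1)   # largest signed 32-bit value

# The timestamp occupies bytes 4..7 of the header, little-endian.
stamp = int(datetime(2001, 2, 3, 4, 5, 6, tzinfo=timezone.utc).timestamp())
blob = gzip.compress(b"example", mtime=stamp)
(stored,) = struct.unpack("<I", blob[4:8])

print(earliest, latest, latest_32bit, stored == stamp)
```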
1 
1    Compressed files can be restored to their original form using ‘gzip
1 -d’ or ‘gunzip’ or ‘zcat’.  If the original name saved in the compressed
1 file is not suitable for its file system, a new name is constructed from
1 the original one to make it legal.
1 
1    ‘gunzip’ takes a list of files on its command line.  It replaces each
1 file whose name ends with ‘.gz’, ‘.z’, ‘-gz’, ‘-z’, or ‘_z’ (ignoring
1 case) and whose contents begin with the correct magic number with an
1 uncompressed file lacking the original extension.  ‘gunzip’ also
1 recognizes the special extensions ‘.tgz’ and ‘.taz’ as shorthands for
1 ‘.tar.gz’ and ‘.tar.Z’ respectively.  When compressing, ‘gzip’ uses the
1 ‘.tgz’ extension if necessary instead of truncating a file with a ‘.tar’
1 extension.
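
The suffix handling described above can be sketched as a name mapping in
Python.  The function ‘decompressed_name’ is hypothetical and looks only
at the file name; the real ‘gunzip’ also checks the magic number, and this
sketch does not claim to mirror its exact matching order.

```python
def decompressed_name(name):
    """Return the output name for a compressed file name, or None."""
    lower = name.lower()  # suffixes are matched ignoring case
    # Shorthand suffixes expand to '.tar'.
    for short in (".tgz", ".taz"):
        if lower.endswith(short) and len(name) > len(short):
            return name[: -len(short)] + ".tar"
    # Ordinary suffixes are simply removed.
    for suffix in (".gz", "-gz", ".z", "-z", "_z"):
        if lower.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)]
    return None  # no recognized suffix


for candidate in ("notes.txt.gz", "backup.tgz", "old.TAR.Z", "readme"):
    print(candidate, "->", decompressed_name(candidate))
```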
1 
1    ‘gunzip’ can currently decompress files created by ‘gzip’, ‘zip’,
1 ‘compress’ or ‘pack’.  The detection of the input format is automatic.
1 When using the first two formats, ‘gunzip’ checks a 32-bit CRC (cyclic
1 redundancy check).  For ‘pack’, ‘gunzip’ checks the uncompressed length.
1 The ‘compress’ format was not designed to allow consistency checks.
1 However, ‘gunzip’ is sometimes able to detect a bad ‘.Z’ file.  If you
1 get an error when uncompressing a ‘.Z’ file, do not assume that the ‘.Z’
1 file is correct simply because the standard ‘uncompress’ does not
1 complain.  Its silence generally means only that the standard
1 ‘uncompress’ does not check its input and happily generates garbage
1 output.  The SCO
1 ‘compress -H’ format (LZH compression method) does not include a CRC but
1 also allows some consistency checks.
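
For the ‘gzip’ format, the check works because each member ends with an
8-byte trailer holding the CRC-32 and the length (modulo 2^32) of the
uncompressed data, as laid out in RFC 1952.  The Python sketch below
repeats that check by hand for a single-member file using the standard
‘zlib’ module; the function name ‘is_consistent’ is an assumption, and
‘zlib’ already performs the same CRC check internally.

```python
import gzip
import struct
import zlib


def is_consistent(blob):
    """Check a single-member gzip blob against its CRC-32 and length trailer."""
    try:
        data = zlib.decompress(blob, wbits=31)  # wbits=31 selects gzip framing
    except zlib.error:
        return False                            # corrupt stream or CRC mismatch
    crc, isize = struct.unpack("<II", blob[-8:])
    return crc == zlib.crc32(data) and isize == len(data) % 2**32


good = gzip.compress(b"hello, world\n" * 100)
# Flip one bit in the stored CRC to simulate damage.
bad = good[:-8] + bytes([good[-8] ^ 1]) + good[-7:]

print(is_consistent(good), is_consistent(bad))  # True False
```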
1 
1    Files created by ‘zip’ can be uncompressed by ‘gzip’ only if they
1 have a single member compressed with the “deflation” method.  This
1 feature is only intended to help conversion of ‘tar.zip’ files to the
1 ‘tar.gz’ format.  To extract a ‘zip’ file with a single member, use a
1 command like ‘gunzip <foo.zip’ or ‘gunzip -S .zip foo.zip’.  To extract
1 ‘zip’ files with several members, use ‘unzip’ instead of ‘gunzip’.
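
The single-member restriction makes sense once the layout is considered:
a ‘zip’ member is a 30-byte local header, a file name, an optional extra
field, and then the compressed data, which for the deflation method is a
raw DEFLATE stream like the one inside a ‘.gz’ file.  The following
Python sketch extracts the first member on that basis.  It is an
illustration, not the ‘gunzip’ implementation; ‘first_member’ is a
hypothetical name, and the header layout is the one given in the ZIP
format description.

```python
import io
import struct
import zipfile
import zlib

LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")  # 30-byte local file header


def first_member(zip_bytes):
    """Inflate the first member of a zip archive if it uses deflation."""
    fields = LOCAL_HEADER.unpack(zip_bytes[: LOCAL_HEADER.size])
    signature, method = fields[0], fields[3]
    name_len, extra_len = fields[9], fields[10]
    if signature != 0x04034B50:
        raise ValueError("not a zip local file header")
    if method != 8:
        raise ValueError("member is not compressed with deflation")
    start = LOCAL_HEADER.size + name_len + extra_len
    inflater = zlib.decompressobj(wbits=-15)  # negative wbits: raw DEFLATE
    # Decoding stops at the end of the DEFLATE stream; the rest of the
    # archive is left in inflater.unused_data.
    return inflater.decompress(zip_bytes[start:])


payload = b"pretend this is a tar archive\n" * 50
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("foo.tar", payload)

print(first_member(buffer.getvalue()) == payload)  # True
```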
1 
1    ‘zcat’ is identical to ‘gunzip -c’.  ‘zcat’ uncompresses either a
1 list of files on the command line or its standard input and writes the
1 uncompressed data on standard output.  ‘zcat’ will uncompress files that
1 have the correct magic number whether they have a ‘.gz’ suffix or not.
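
Deciding by content rather than by name can be sketched as follows.  The
two magic bytes 0x1f 0x8b are the ones RFC 1952 assigns to the ‘gzip’
format; the helper names and the temporary file ‘data.bin’ are
hypothetical, and the sketch uses Python's ‘gzip’ module instead of
‘zcat’ itself.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def has_gzip_magic(path):
    """Report whether the file starts with the gzip magic number."""
    with open(path, "rb") as stream:
        return stream.read(2) == GZIP_MAGIC


def cat_if_gzip(path):
    """Return the uncompressed contents, whatever the file is called."""
    if not has_gzip_magic(path):
        raise ValueError(f"{path}: not in gzip format")
    with gzip.open(path, "rb") as stream:
        return stream.read()


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "data.bin")  # note: no '.gz' suffix
    with open(path, "wb") as stream:
        stream.write(gzip.compress(b"suffix does not matter\n"))
    recovered = cat_if_gzip(path)

print(recovered)
```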
1 
1    ‘gzip’ uses the Lempel–Ziv algorithm used in ‘zip’ and PKZIP.  The
1 amount of compression obtained depends on the size of the input and the
1 distribution of common substrings.  Typically, text such as source code
1 or English is reduced by 60–70%.  Compression is generally much better
1 than that achieved by LZW (as used in ‘compress’), Huffman coding (as
1 used in ‘pack’), or adaptive Huffman coding (‘compact’).
1 
1    Compression is always performed, even if the compressed file is
1 slightly larger than the original.  The worst-case expansion is a few
1 bytes for the ‘gzip’ file header, plus 5 bytes for every 32 KiB block, or
1 an expansion ratio of 0.015% for large files.  Note that the actual number
1 of used disk blocks almost never increases.  ‘gzip’ normally preserves
1 the mode, ownership and timestamps of files when compressing or
1 decompressing.
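
The quoted ratio is simply 5 bytes divided by 32768 bytes.  The Python
sketch below reproduces that figure and then measures the overhead on
random data, which is effectively incompressible.  It uses Python's
‘gzip’ module, whose deflate implementation may choose block sizes other
than 32 KiB, so the measured overhead is indicative only; the variable
names are assumptions.

```python
import gzip
import os

BLOCK_SIZE = 32 * 1024   # bytes per block, as stated in the text
BLOCK_OVERHEAD = 5       # extra bytes per block in the worst case

ratio_percent = BLOCK_OVERHEAD / BLOCK_SIZE * 100
print(f"worst-case ratio: {ratio_percent:.4f}%")

# Random bytes have no common substrings, so deflate cannot shrink them.
data = os.urandom(1024 * 1024)
overhead = len(gzip.compress(data)) - len(data)
print(f"measured overhead: {overhead} bytes "
      f"({overhead / len(data) * 100:.4f}%)")
```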
1 
1    The ‘gzip’ file format is specified in P. Deutsch, GZIP file format
1 specification version 4.3, Internet RFC 1952
1 (https://www.ietf.org/rfc/rfc1952.txt) (May 1996).  The ‘zip’ deflation
1 format is specified in P. Deutsch, DEFLATE Compressed Data Format
1 Specification version 1.3, Internet RFC 1951
1 (https://www.ietf.org/rfc/rfc1951.txt) (May 1996).
1