1 Overview
**********

‘gzip’ reduces the size of the named files using Lempel–Ziv coding
(LZ77). Whenever possible, each file is replaced by one with the
extension ‘.gz’, while keeping the same ownership, modes, and access and
modification times. (The default extension is ‘z’ for MSDOS, OS/2 FAT
and Atari.) If no files are specified or if a file name is ‘-’, the
standard input is compressed to the standard output. ‘gzip’ will only
attempt to compress regular files. In particular, it will ignore
symbolic links.
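
The selection rules above can be sketched in Python. The function
‘input_kind’ is a hypothetical name used for illustration, not part of
‘gzip’; it uses ‘os.lstat’ so that a symbolic link is examined itself
rather than followed, and it models only the default behavior described
in this paragraph.

```python
import os
import stat


def input_kind(name: str) -> str:
    """Classify a command-line operand according to the rules above.

    Returns "stdin" for the special name "-", "regular" for a regular file,
    and "skip" for anything else (symbolic links, directories, devices, ...).
    Hypothetical helper for illustration only.
    """
    if name == "-":
        return "stdin"  # "-" means: compress standard input to standard output
    st = os.lstat(name)  # lstat() inspects a symbolic link itself, not its target
    if stat.S_ISREG(st.st_mode):
        return "regular"
    return "skip"
```

A regular file named ‘notes.txt’ would then be replaced by
‘notes.txt.gz’, while a symbolic link pointing at it is left alone.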

If the new file name is too long for its file system, ‘gzip’
truncates it. ‘gzip’ attempts to truncate only the parts of the file
name longer than 3 characters. (A part is delimited by dots.) If the
name consists of small parts only, the longest parts are truncated. For
example, if file names are limited to 14 characters, gzip.msdos.exe is
compressed to gzi.msd.exe.gz. Names are not truncated on systems which
do not have a limit on file name length.
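
A minimal Python sketch of this truncation rule follows. It is modeled
on the description and example above, not taken from the ‘gzip’
sources: ‘shorten_name’ is a hypothetical helper, the ‘.gz’ suffix is
assumed to count toward the length limit, and the ‘.tar’ to ‘.tgz’
special case mentioned later in this section is left out.

```python
def shorten_name(name: str, max_len: int, suffix: str = ".gz") -> str:
    """Shorten NAME until NAME + SUFFIX fits in MAX_LEN characters.

    Each round removes one character from the last dot-separated part that
    is longer than 3 characters; if no part is that long, the threshold is
    lowered step by step so that the longer of the short parts are cut.
    Hypothetical helper modeled on the manual's description.
    """
    parts = name.split(".")
    while len(".".join(parts)) + len(suffix) > max_len:
        victim = None
        min_part = 3
        while victim is None and min_part > 0:
            for i, part in enumerate(parts):
                if len(part) > min_part:
                    victim = i  # remember the last qualifying part
            min_part -= 1
        if victim is None:
            raise ValueError(f"cannot shorten {name!r} to {max_len} characters")
        parts[victim] = parts[victim][:-1]  # drop one character
    return ".".join(parts) + suffix


print(shorten_name("gzip.msdos.exe", 14))  # gzi.msd.exe.gz
```

Removing one character per round, always from the last long part,
reproduces the manual's example: ‘msdos’ shrinks to ‘msd’ before ‘gzip’
is touched.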

By default, ‘gzip’ keeps the original file name and timestamp in the
compressed file. These are used when decompressing the file with the
‘-N’ option. This is useful when the compressed file name was truncated
or when the timestamp was not preserved after a file transfer. However,
due to limitations in the current ‘gzip’ file format, fractional seconds
are discarded. Also, timestamps must fall within the range 1970-01-01
00:00:01 through 2106-02-07 06:28:15 UTC, and hosts whose operating
systems use 32-bit timestamps are further restricted to timestamps no
later than 2038-01-19 03:14:07 UTC. The upper bounds assume the typical
case where leap seconds are ignored.
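
These bounds follow from RFC 1952, which stores the modification time
as an unsigned 32-bit count of seconds since 1970-01-01 00:00:00 UTC,
with the value 0 meaning that no timestamp is available. The Python
sketch below checks that arithmetic; ‘utc’ and ‘to_mtime’ are
hypothetical helpers, and truncating the fractional part is an
assumption about how the seconds are discarded.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# MTIME is an unsigned 32-bit field; 0 means "no timestamp", so usable values
# start at 1.
MTIME_MIN = 1
MTIME_MAX = 2**32 - 1
# Hosts with a signed 32-bit time_t cannot go past this value.
TIME32_MAX = 2**31 - 1


def utc(secs: int) -> str:
    """Format seconds since the epoch as a UTC date (leap seconds ignored)."""
    return (EPOCH + timedelta(seconds=secs)).strftime("%Y-%m-%d %H:%M:%S")


def to_mtime(timestamp: float) -> int:
    """Drop fractional seconds and check that the result fits in MTIME."""
    secs = int(timestamp)  # assumption: truncate toward zero
    if not MTIME_MIN <= secs <= MTIME_MAX:
        raise ValueError(f"{timestamp} cannot be stored in a gzip header")
    return secs


print(utc(MTIME_MIN))   # 1970-01-01 00:00:01
print(utc(TIME32_MAX))  # 2038-01-19 03:14:07
print(utc(MTIME_MAX))   # 2106-02-07 06:28:15
```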

Compressed files can be restored to their original form using
‘gzip -d’, ‘gunzip’, or ‘zcat’. If the original name saved in the
compressed file is not suitable for its file system, a new name is
constructed from the original one to make it legal.
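
The saved name and timestamp sit in the header at the start of each
compressed member. The Python sketch below reads them according to
RFC 1952. ‘read_name_and_mtime’ is a hypothetical helper that handles
only the fields it needs (it skips the optional extra field and ignores
the comment and header CRC), and Python's standard ‘gzip’ module is used
only to build a sample file.

```python
import gzip
import io
import struct
from typing import Optional, Tuple

# Flag bits of the FLG byte, as listed in RFC 1952.
FEXTRA = 0x04
FNAME = 0x08


def read_name_and_mtime(member: bytes) -> Tuple[Optional[str], int]:
    """Return the stored original file name (or None) and MTIME."""
    if member[:2] != b"\x1f\x8b":
        raise ValueError("bad magic number: not a gzip member")
    method, flags, mtime = struct.unpack_from("<BBI", member, 2)
    if method != 8:
        raise ValueError(f"unknown compression method {method}")
    pos = 10  # end of the fixed-size part of the header
    if flags & FEXTRA:
        (xlen,) = struct.unpack_from("<H", member, pos)
        pos += 2 + xlen  # skip the optional extra field
    name = None
    if flags & FNAME:
        end = member.index(b"\x00", pos)  # the name is zero-terminated
        name = member[pos:end].decode("latin-1")  # ISO 8859-1 per RFC 1952
    return name, mtime


buf = io.BytesIO()
with gzip.GzipFile(filename="notes.txt.gz", mode="wb", fileobj=buf,
                   mtime=1_000_000_000) as gz:
    gz.write(b"hello\n")
print(read_name_and_mtime(buf.getvalue()))  # ('notes.txt', 1000000000)
```

Python's ‘gzip’ module drops a trailing ‘.gz’ from the name it stores,
which is why the sample header records ‘notes.txt’.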

‘gunzip’ takes a list of files on its command line. Each file whose
name ends with ‘.gz’, ‘.z’, ‘-gz’, ‘-z’, or ‘_z’ (ignoring case) and
whose contents begin with the correct magic number is replaced by an
uncompressed file whose name lacks that extension. ‘gunzip’ also
recognizes the special extensions ‘.tgz’ and ‘.taz’ as shorthands for
‘.tar.gz’ and ‘.tar.Z’ respectively. When compressing, ‘gzip’ uses the
‘.tgz’ extension if necessary instead of truncating a file with a ‘.tar’
extension.
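
The name mapping described above can be sketched in Python as follows.
‘gunzip_output_name’ is a hypothetical helper; it only derives the
output name from the suffix and leaves the magic-number check to a
separate step.

```python
from typing import Optional

# Suffixes listed above, compared case-insensitively.
PLAIN_SUFFIXES = (".gz", "-gz", ".z", "-z", "_z")
# Shorthands whose uncompressed form is a '.tar' file.
TAR_SHORTHANDS = (".tgz", ".taz")


def gunzip_output_name(name: str) -> Optional[str]:
    """Return the name of the uncompressed file, or None if no suffix matches."""
    lower = name.lower()
    for short in TAR_SHORTHANDS:
        if lower.endswith(short) and len(name) > len(short):
            return name[: -len(short)] + ".tar"
    for suffix in PLAIN_SUFFIXES:
        if lower.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)]
    return None


print(gunzip_output_name("src.tgz"))       # src.tar
print(gunzip_output_name("NOTES.TXT.GZ"))  # NOTES.TXT
```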

‘gunzip’ can currently decompress files created by ‘gzip’, ‘zip’,
‘compress’ or ‘pack’. The detection of the input format is automatic.
For the first two formats, ‘gunzip’ checks a 32-bit CRC (cyclic
redundancy check). For ‘pack’, ‘gunzip’ checks the uncompressed length.
The ‘compress’ format was not designed to allow consistency checks.
However, ‘gunzip’ is sometimes able to detect a bad ‘.Z’ file. If you
get an error when uncompressing a ‘.Z’ file, do not assume that the ‘.Z’
file is correct simply because the standard ‘uncompress’ does not
complain. This generally means that the standard ‘uncompress’ does not
check its input and happily generates garbage output. The SCO
‘compress -H’ format (LZH compression method) does not include a CRC
but still allows some consistency checks.
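
The automatic detection and the CRC check can be illustrated with a
short Python sketch. The magic numbers in the table are the leading
bytes commonly used by these formats (the ‘gzip’ value is fixed by
RFC 1952); treat the table as illustrative rather than as a copy of the
logic in ‘gunzip’. The hypothetical helper ‘trailer_ok’ compares the
CRC-32 and the uncompressed length (modulo 2^32) that RFC 1952 stores in
the last 8 bytes of a member.

```python
import gzip
import struct
import zlib
from typing import Optional

# Leading bytes of the input formats discussed above (illustrative table).
MAGIC_NUMBERS = (
    (b"\x1f\x8b", "gzip"),
    (b"PK\x03\x04", "zip"),
    (b"\x1f\x9d", "compress (LZW)"),
    (b"\x1f\xa0", "compress -H (LZH)"),
    (b"\x1f\x1e", "pack"),
)


def detect_format(data: bytes) -> Optional[str]:
    """Guess the input format from its first bytes, or return None."""
    for magic, fmt in MAGIC_NUMBERS:
        if data.startswith(magic):
            return fmt
    return None


def trailer_ok(member: bytes, uncompressed: bytes) -> bool:
    """Check the CRC-32 and ISIZE stored in the last 8 bytes of a gzip member."""
    crc, isize = struct.unpack("<II", member[-8:])
    return crc == zlib.crc32(uncompressed) and isize == len(uncompressed) % 2**32


data = b"hello, world\n" * 100
member = gzip.compress(data)
print(detect_format(member))                 # gzip
print(trailer_ok(member, data))              # True
print(trailer_ok(member, data[:-1] + b"!"))  # False
```

Changing a single byte of the expected output is enough to make the
CRC comparison fail, which is the kind of corruption the ‘compress’
format cannot report.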

Files created by ‘zip’ can be uncompressed by ‘gzip’ only if they
have a single member compressed with the “deflation” method. This
feature is only intended to help convert ‘tar.zip’ files to the
‘tar.gz’ format. To extract a ‘zip’ file with a single member, use a
command like ‘gunzip <foo.zip’ or ‘gunzip -S .zip foo.zip’. To extract
‘zip’ files with several members, use ‘unzip’ instead of ‘gunzip’.
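
Whether an archive meets these two conditions can be checked with
Python's standard ‘zipfile’ module, as in the sketch below.
‘gunzip_can_extract’ and ‘make_zip’ are hypothetical helpers; ‘gunzip’
performs its own checks on the archive, which may differ in detail.

```python
import io
import zipfile


def gunzip_can_extract(zip_bytes: bytes) -> bool:
    """True if the archive has exactly one member, compressed by deflation."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        members = archive.infolist()
    return len(members) == 1 and members[0].compress_type == zipfile.ZIP_DEFLATED


def make_zip(names, method=zipfile.ZIP_DEFLATED) -> bytes:
    """Build a small in-memory archive to exercise the check."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as archive:
        for name in names:
            archive.writestr(name, b"example data\n" * 50)
    return buf.getvalue()


print(gunzip_can_extract(make_zip(["backup.tar"])))                      # True
print(gunzip_can_extract(make_zip(["a.txt", "b.txt"])))                  # False
print(gunzip_can_extract(make_zip(["backup.tar"], zipfile.ZIP_STORED)))  # False
```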

‘zcat’ is identical to ‘gunzip -c’. ‘zcat’ uncompresses either a
list of files on the command line or its standard input and writes the
uncompressed data to standard output. ‘zcat’ will uncompress files that
have the correct magic number whether they have a ‘.gz’ suffix or not.
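
A rough Python equivalent of this behavior is sketched below. The
function ‘zcat’ here is hypothetical and built on Python's standard
‘gzip’ module, which recognizes gzip data by its magic number rather
than by the file name and also reads several concatenated members;
unlike the real ‘zcat’, it does not understand the ‘compress’ or ‘pack’
formats.

```python
import gzip
import shutil
import sys
from typing import BinaryIO, Iterable


def zcat(paths: Iterable[str], out: BinaryIO) -> None:
    """Decompress each file (standard input for "-" or no operands) to OUT.

    The file name is never inspected, so a gzip file without a '.gz'
    suffix is handled like any other; non-gzip data raises an error.
    """
    for path in list(paths) or ["-"]:
        if path == "-":
            gz = gzip.GzipFile(fileobj=sys.stdin.buffer, mode="rb")
        else:
            gz = gzip.open(path, "rb")
        with gz:
            shutil.copyfileobj(gz, out)
```

For example, ‘zcat(sys.argv[1:], sys.stdout.buffer)’ would process the
command-line operands in the same way.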

‘gzip’ uses the same Lempel–Ziv algorithm as ‘zip’ and PKZIP. The
amount of compression obtained depends on the size of the input and the
distribution of common substrings. Typically, text such as source code
or English is reduced by 60–70%. Compression is generally much better
than that achieved by LZW (as used in ‘compress’), Huffman coding (as
used in ‘pack’), or adaptive Huffman coding (‘compact’).
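
The dependence on common substrings is easy to observe with Python's
‘gzip’ module, which produces the same DEFLATE-based format through
zlib rather than by running the ‘gzip’ program. The sample inputs are
made up for illustration, and the exact sizes printed will vary with the
data and library version.

```python
import gzip
import os

# Text with many repeated substrings, loosely resembling source code.
text = b"".join(b"int counter_%d = counter_%d + 1;\n" % (i, i - 1)
                for i in range(2000))
# The same number of random bytes offers no common substrings to exploit.
noise = os.urandom(len(text))

for label, data in (("repetitive text", text), ("random bytes", noise)):
    packed = gzip.compress(data, compresslevel=6)
    saved = 100 * (1 - len(packed) / len(data))
    print(f"{label}: {len(data)} -> {len(packed)} bytes ({saved:.1f}% smaller)")
```

The repetitive text shrinks to a small fraction of its size, while the
random bytes typically come out slightly larger than they went in.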

Compression is always performed, even if the compressed file is
slightly larger than the original. The worst-case expansion is a few
bytes for the ‘gzip’ file header, plus 5 bytes for every 32K block, an
expansion ratio of about 0.015% for large files. Note that the actual
number of used disk blocks almost never increases. ‘gzip’ normally
preserves the mode, ownership and timestamps of files when compressing
or decompressing.
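
The arithmetic behind the 0.015% figure can be checked with a short
Python sketch. ‘worst_case_size’ is a hypothetical helper that uses the
figure given above (5 bytes per 32K block) plus an assumed 18 bytes of
fixed overhead, that is, the 10-byte header and 8-byte trailer of
RFC 1952 with no stored file name.

```python
def worst_case_size(n: int, overhead: int = 18,
                    block: int = 32 * 1024, per_block: int = 5) -> int:
    """Estimated upper bound on the gzip size of N incompressible bytes."""
    blocks = max(1, -(-n // block))  # ceiling division; empty input still has a block
    return n + overhead + per_block * blocks


n = 2**30  # 1 GiB of incompressible input
size = worst_case_size(n)
print(size - n)                        # 163858
print(f"{100 * (size - n) / n:.4f}%")  # 0.0153%
```

For large inputs the fixed overhead becomes negligible and the ratio
approaches 5/32768, about 0.0153%.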

The ‘gzip’ file format is specified in P. Deutsch, GZIP file format
specification version 4.3, Internet RFC 1952, May 1996
(https://www.ietf.org/rfc/rfc1952.txt). The ‘zip’ deflation format is
specified in P. Deutsch, DEFLATE Compressed Data Format Specification
version 1.3, Internet RFC 1951, May 1996
(https://www.ietf.org/rfc/rfc1951.txt).