tar: controlling pattern-matching

1 
1 Controlling Pattern-Matching
1 ----------------------------
1 
1 For the purposes of this section, we call "exclusion members" all member
1 names obtained while processing '--exclude' and '--exclude-from'
1 options, and "inclusion members" those member names that were given on
1 the command line or read from the file specified with the '--files-from'
1 option.
1 
1    These two member lists are used in the following operations:
1 '--diff', '--extract', '--list', '--update'.
1 
1    There are no inclusion members in create mode ('--create' and
1 '--append'), since in this mode the names obtained from the command line
1 refer to _files_, not archive members.
1 
1    By default, inclusion members are compared with archive members
1 literally (1) and exclusion members are treated as globbing patterns.
1 For example:
1 
1      $ tar tf foo.tar
1      a.c
1      b.c
1      a.txt
1      [remarks]
1      # Member names are used verbatim:
1      $ tar -xf foo.tar -v '[remarks]'
1      [remarks]
1      # Exclude member names are globbed:
1      $ tar -xf foo.tar -v --exclude '*.c'
1      a.txt
1      [remarks]
1 
1    This behavior can be altered by using the following options:
1 
1 '--wildcards'
1      Treat all member names as wildcards.
1 
1 '--no-wildcards'
1      Treat all member names as literal strings.
1 
1    Thus, to extract files whose names end in '.c', you can use:
1 
1      $ tar -xf foo.tar -v --wildcards '*.c'
1      a.c
1      b.c
1 
1 Note that the pattern is quoted to prevent the shell from interpreting it.
1 
1    The effect of the '--wildcards' option is canceled by
1 '--no-wildcards'.  This can be used to pass some command-line arguments
1 verbatim and others as globbing patterns.  For example, the following
1 invocation:
1 
1      $ tar -xf foo.tar --wildcards '*.txt' --no-wildcards '[remarks]'
1 
1 instructs 'tar' to extract from 'foo.tar' all files whose names end in
1 '.txt' and the file named '[remarks]'.
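
The mixed invocation above can be reproduced end to end.  The following
is a minimal sketch, assuming GNU 'tar' and a POSIX shell.  The scratch
directory and the variable names ('work', 'literal_status', 'mixed') are
hypothetical, and '--list' is used in place of '--extract' so that
nothing is unpacked.

```shell
# Build a throwaway archive with the members used in this section.
work=$(mktemp -d) && cd "$work" || exit 1
touch a.c b.c a.txt '[remarks]'
tar -cf foo.tar a.c b.c a.txt '[remarks]'

# Default: inclusion members are literal, so '*.c' names no member
# and tar exits with a non-zero status.
tar -tf foo.tar '*.c' >/dev/null 2>&1
literal_status=$?

# '*.txt' is globbed; '[remarks]' follows --no-wildcards, so it is literal.
mixed=$(tar -tf foo.tar --wildcards '*.txt' --no-wildcards '[remarks]')
printf '%s\n' "$mixed"
```

Each member name takes the wildcard setting in effect at the point where
it appears on the command line, which is why the order of the arguments
matters here.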
1 
1    Normally, a pattern matches a name if an initial subsequence of the
1 name's components matches the pattern, where '*', '?', and '[...]' are
1 the usual shell wildcards, '\' escapes wildcards, and wildcards can
1 match '/'.
1 
1    Other than optionally stripping leading '/' from names
1 (⇒absolute), patterns and names are used as-is.  For example, trailing
1 '/' is not trimmed from a user-specified name before deciding whether to
1 exclude it.
1 
1    However, this matching procedure can be altered by the options listed
1 below.  These options accumulate.  For example:
1 
1      --ignore-case --exclude='makefile' --no-ignore-case --exclude='readme'
1 
1 ignores case when excluding 'makefile', but not when excluding 'readme'.
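
A complete command built from that option sequence might look like the
sketch below.  It assumes GNU 'tar'; the directory 'proj', its files, and
the variable names are made up for illustration.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
mkdir proj
touch proj/Makefile proj/README proj/main.c

# Each --exclude picks up the case setting in effect where it appears.
tar -cf proj.tar \
    --ignore-case --exclude='makefile' \
    --no-ignore-case --exclude='readme' \
    proj

# Directory entries are archived in no particular order, so sort the listing.
kept=$(tar -tf proj.tar | LC_ALL=C sort)
printf '%s\n' "$kept"
```

Here 'proj/Makefile' is left out because 'makefile' is matched without
regard to case, while 'proj/README' is archived because 'readme' is
matched case-sensitively.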
1 
1 '--anchored'
1 '--no-anchored'
1      If anchored, a pattern must match an initial subsequence of the
1      name's components.  Otherwise, the pattern can match any
1      subsequence.  Default is '--no-anchored' for exclusion members and
1      '--anchored' for inclusion members.
1 
1 '--ignore-case'
1 '--no-ignore-case'
1      When ignoring case, upper-case patterns match lower-case names and
1      vice versa.  When not ignoring case (the default), matching is
1      case-sensitive.
1 
1 '--wildcards-match-slash'
1 '--no-wildcards-match-slash'
1      When wildcards match slash (the default for exclusion members), a
1      wildcard like '*' in the pattern can match a '/' in the name.
1      Otherwise, '/' is matched only by '/'.
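
The effect of anchoring and of slash matching on exclusion patterns can
be observed by listing a small archive.  This is a sketch only, assuming
GNU 'tar'; the file layout and the variable names are hypothetical.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
mkdir -p a/b
touch notes a/notes a/c.txt a/b/c.txt
tar -cf demo.tar notes a

# Unanchored (the exclusion default): 'notes' matches at any depth.
unanchored=$(tar -tf demo.tar --exclude='notes' | LC_ALL=C sort)

# Anchored: only the top-level 'notes' is excluded; 'a/notes' survives.
anchored=$(tar -tf demo.tar --anchored --exclude='notes' | LC_ALL=C sort)

# '*' may cross '/' (the exclusion default): both .txt files under a/ go.
crossing=$(tar -tf demo.tar --exclude='a/*.txt' | LC_ALL=C sort)

# '*' stops at '/': only 'a/c.txt' is excluded; 'a/b/c.txt' survives.
confined=$(tar -tf demo.tar --no-wildcards-match-slash \
               --exclude='a/*.txt' | LC_ALL=C sort)

printf 'unanchored:\n%s\nanchored:\n%s\n' "$unanchored" "$anchored"
printf 'crossing:\n%s\nconfined:\n%s\n' "$crossing" "$confined"
```

All four listings place the matching options before the '--exclude' they
are meant to affect.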
1 
1    The '--recursion' and '--no-recursion' options (⇒recurse) also
1 affect how member patterns are interpreted.  If recursion is in effect,
1 a pattern matches a name if it matches any of the name's parent
1 directories.
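
With recursion in effect (the default), this rule is what makes a
directory name select or exclude everything below it.  A minimal sketch,
assuming GNU 'tar' and a hypothetical file layout:

```shell
work=$(mktemp -d) && cd "$work" || exit 1
mkdir -p a/b
touch notes a/notes a/b/c.txt
tar -cf demo.tar notes a

# Naming 'a/b' as an inclusion member also selects the members below it.
below=$(tar -tf demo.tar a/b | LC_ALL=C sort)

# Likewise, excluding 'a/b' drops the directory and its contents.
rest=$(tar -tf demo.tar --exclude='a/b' | LC_ALL=C sort)

printf 'selected:\n%s\nremaining:\n%s\n' "$below" "$rest"
```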
1 
1    The following table summarizes pattern-matching default values:
1 
1 Members                Default settings
1 --------------------------------------------------------------------------
1 Inclusion              '--no-wildcards --anchored
1                        --no-wildcards-match-slash'
1 Exclusion              '--wildcards --no-anchored
1                        --wildcards-match-slash'
1 
1 Wildcard matching confusion
1 ...........................
1 
1 The use of '--[no-]anchored' and '--[no-]wildcards-match-slash' has
1 proven to cause confusion.  The likely reasons are that the default
1 settings differ for inclusion and exclusion patterns (in general, avoid
1 relying on defaults where possible) and that, for both options, the
1 position on the command line matters: they should be placed before the
1 member name they are meant to affect.
1 
1 Consider the following directory structure:
1 
1      $ find path/ | sort
1      path/
1      path/file1
1      path/file2
1      path/subpath1
1      path/subpath1/file1
1      path/subpath1/file2
1      path/subpath2
1      path/subpath2/file1
1      path/subpath2/file2
1 
1 Archiving the whole directory 'path' except for all files named 'file1'
1 can be achieved with either of the following two commands:
1 
1      $ tar -cf a.tar --no-wildcards-match-slash --no-anchored path \
1            --exclude='*/file1'
1      $ tar -cf a.tar --wildcards-match-slash path --exclude='*/file1'
1 
1 Note that '--wildcards-match-slash' and '--no-anchored' may be omitted,
1 as they are the defaults for '--exclude'.  Usually, however, we want to
1 exclude one particular file (or rather a subset of the files sharing a
1 name).  Assume we want to exclude only the files named 'file1' at the
1 first subdirectory level.  The following command does not work (it
1 still excludes all files named 'file1'):
1 
1      $ tar -cf a.tar --no-wildcards-match-slash path \
1          --exclude='*/file1' | sort
1 
1 This is because '--no-anchored' is the default for exclusion patterns.
1 To fix this, put '--anchored' before the pathname:
1 
1      $ tar -cvf a.tar --no-wildcards-match-slash --anchored path \
1          --exclude='*/file1' | sort
1      path/
1      path/file2
1      path/subpath1/
1      path/subpath1/file1
1      path/subpath1/file2
1      path/subpath2/
1      path/subpath2/file1
1      path/subpath2/file2
1 
1 Similarly, you can exclude the second level by specifying '*/*/file1'.
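
The second-level case can be checked with a sketch like the one below.
It assumes GNU 'tar' and recreates the 'path' tree from this section in a
scratch directory; the variable names are hypothetical.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
mkdir -p path/subpath1 path/subpath2
touch path/file1 path/file2 \
      path/subpath1/file1 path/subpath1/file2 \
      path/subpath2/file1 path/subpath2/file2

# Anchored, with '*' confined to a single component: '*/*/file1' can
# only match a 'file1' whose name has exactly three components.
tar -cf a.tar --no-wildcards-match-slash --anchored \
    --exclude='*/*/file1' path

second_level=$(tar -tf a.tar | LC_ALL=C sort)
printf '%s\n' "$second_level"
```

The listing keeps 'path/file1' but omits 'path/subpath1/file1' and
'path/subpath2/file1'.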
1 
1    ---------- Footnotes ----------
1 
1    (1) Note that earlier GNU 'tar' versions used globbing for
1 inclusion members, which contradicted the UNIX98 specification and was
1 not documented.  ⇒Changes, for more information on this and other
1 changes.
1