2.12 Recursive Accept/Reject Options
====================================

‘-A ACCLIST --accept ACCLIST’
‘-R REJLIST --reject REJLIST’
     Specify comma-separated lists of file name suffixes or patterns to
     accept or reject (⇒Types of Files).  Note that if any of the
     wildcard characters, ‘*’, ‘?’, ‘[’ or ‘]’, appear in an element of
     ACCLIST or REJLIST, it will be treated as a pattern, rather than a
     suffix.  In this case, you have to enclose the pattern in quotes
     to prevent your shell from expanding it, as in ‘-A "*.mp3"’ or
     ‘-A '*.mp3'’.

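     For example, to accept only JPEG and PNG images during a recursive
     retrieval (the URL is illustrative):

          wget -r -A '*.jpg,*.png' https://example.com/gallery/
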
‘--accept-regex URLREGEX’
‘--reject-regex URLREGEX’
     Specify a regular expression to accept or reject the complete URL.

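     For instance, to skip any URL whose query string contains
     ‘action=edit’ (the URL is illustrative):

          wget -r --reject-regex 'action=edit' https://wiki.example.org/
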
‘--regex-type REGEXTYPE’
     Specify the regular expression type.  Possible types are ‘posix’
     and ‘pcre’.  Note that to use the ‘pcre’ type, Wget must be
     compiled with libpcre support.

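     For example, assuming a build with libpcre support, the following
     rejects URLs ending in ‘.tmp’ or ‘.bak’ (the URL is illustrative):

          wget -r --regex-type pcre --reject-regex '\.(tmp|bak)$' https://example.com/
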
‘-D DOMAIN-LIST’
‘--domains=DOMAIN-LIST’
     Set domains to be followed.  DOMAIN-LIST is a comma-separated list
     of domains.  Note that it does _not_ turn on ‘-H’.

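     For example, to span hosts but stay within two related domains
     (the domain names are illustrative):

          wget -r -H -D example.com,cdn.example.net https://example.com/
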
‘--exclude-domains DOMAIN-LIST’
     Specify the domains that are _not_ to be followed (⇒Spanning
     Hosts).

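     For instance, to span hosts while avoiding one subdomain (the
     names are illustrative):

          wget -r -H --exclude-domains ads.example.com https://example.com/
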
‘--follow-ftp’
     Follow FTP links from HTML documents.  Without this option, Wget
     will ignore all the FTP links.

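     For example (the URL is illustrative):

          wget -r --follow-ftp https://example.com/mirrors.html
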
‘--follow-tags=LIST’
     Wget has an internal table of HTML tag / attribute pairs that it
     considers when looking for linked documents during a recursive
     retrieval.  If a user wants only a subset of those tags to be
     considered, however, he or she should specify such tags in a
     comma-separated LIST with this option.

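     For example, to consider only ‘<a>’ and ‘<img>’ links during a
     recursive retrieval (the URL is illustrative):

          wget -r --follow-tags=a,img https://example.com/
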
‘--ignore-tags=LIST’
     This is the opposite of the ‘--follow-tags’ option.  To skip
     certain HTML tags when recursively looking for documents to
     download, specify them in a comma-separated LIST.

     In the past, this option was the best bet for downloading a single
     page and its requisites, using a command line like:

          wget --ignore-tags=a,area -H -k -K -r http://SITE/DOCUMENT

     However, the author of this option came across a page with tags
     like ‘<LINK REL="home" HREF="/">’ and came to the realization that
     specifying tags to ignore was not enough.  One can’t just tell Wget
     to ignore ‘<LINK>’, because then stylesheets will not be
     downloaded.  Now the best bet for downloading a single page and its
     requisites is the dedicated ‘--page-requisites’ option.

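     For example, the single-page download above can be rewritten with
     ‘--page-requisites’, keeping the same placeholders:

          wget -p -H -k -K http://SITE/DOCUMENT
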
‘--ignore-case’
     Ignore case when matching files and directories.  This influences
     the behavior of the ‘-R’, ‘-A’, ‘-I’, and ‘-X’ options, as well as
     globbing implemented when downloading from FTP sites.  For
     example, with this option, ‘-A "*.txt"’ will match ‘file1.txt’,
     but also ‘file2.TXT’, ‘file3.TxT’, and so on.  The quotes in the
     example are to prevent the shell from expanding the pattern.

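     For instance, to reject GIF images regardless of how the extension
     is capitalized (the URL is illustrative):

          wget -r --ignore-case -R '*.gif' https://example.com/
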
‘-H’
‘--span-hosts’
     Enable spanning across hosts when doing recursive retrieving
     (⇒Spanning Hosts).

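     For example, to download a page together with requisites that live
     on other hosts (the URL is illustrative):

          wget -p -H -k https://example.com/article.html
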
‘-L’
‘--relative’
     Follow relative links only.  Useful for retrieving a specific home
     page without any distractions, not even those from the same hosts
     (⇒Relative Links).

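     For example (the URL is illustrative):

          wget -r -L https://example.com/~user/home.html
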
‘-I LIST’
‘--include-directories=LIST’
     Specify a comma-separated list of directories you wish to follow
     when downloading (⇒Directory-Based Limits).  Elements of
     LIST may contain wildcards.

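     For example, to follow only paths under ‘/docs’ and ‘/download’
     (the URL is illustrative):

          wget -r -I /docs,/download https://example.com/
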
‘-X LIST’
‘--exclude-directories=LIST’
     Specify a comma-separated list of directories you wish to exclude
     from download (⇒Directory-Based Limits).  Elements of LIST
     may contain wildcards.

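     For example, to skip cache and temporary directories (the pattern
     and URL are illustrative; the quotes keep the shell from expanding
     the wildcards):

          wget -r -X '/cache*,/tmp*' https://example.com/
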
‘-np’
‘--no-parent’
     Do not ever ascend to the parent directory when retrieving
     recursively.  This is a useful option, since it guarantees that
     only the files _below_ a certain hierarchy will be downloaded.
     ⇒Directory-Based Limits, for more details.

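     For example, to mirror only the files below a manual directory,
     never ascending to the site root (the URL is illustrative):

          wget -r -np https://example.com/docs/manual/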