wget: Spanning Hosts

4.1 Spanning Hosts
==================

Wget’s recursive retrieval normally refuses to visit hosts other than
the one you specified on the command line. This is a reasonable
default; without it, every retrieval would have the potential to turn
your Wget into a small version of Google.

However, visiting different hosts, or “host spanning,” is sometimes a
useful option. Maybe the images are served from a different server.
Maybe you’re mirroring a site that consists of pages interlinked between
three servers. Maybe the server has two equivalent names, and the HTML
pages refer to both interchangeably.

Span to any host—‘-H’

The ‘-H’ option turns on host spanning, thus allowing Wget’s
recursive run to visit any host referenced by a link. Unless
sufficient recursion-limiting criteria are applied (a depth limit,
for example), these foreign hosts will typically link to yet more
hosts, and so on until Wget ends up sucking up much more data than
you intended.
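
As an illustration of pairing ‘-H’ with a recursion limit, the sketch
below caps the recursion depth with ‘-l’. The URL and the depth of 2 are
placeholder assumptions, not recommended values, and the command is only
printed so that it can be reviewed before it is run.

```shell
# Placeholder values, chosen for illustration only.
start_url="http://www.example.com/"
depth=2

# -r: recursive retrieval, -H: span hosts, -l: maximum recursion depth.
cmd="wget -r -H -l $depth $start_url"

# Print the command for review; paste the printed line to run it.
echo "$cmd"
```

A small depth keeps the set of foreign hosts bounded, at the cost of
missing pages that sit deeper in the link graph.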

Limit spanning to certain domains—‘-D’

The ‘-D’ option allows you to specify the domains that will be
followed, thus limiting the recursion only to the hosts that belong
to these domains. Obviously, this makes sense only in conjunction
with ‘-H’. A typical example would be downloading the contents of
‘www.example.com’, but allowing downloads from
‘images.example.com’, etc.:

    wget -rH -Dexample.com http://www.example.com/

You can specify more than one domain by separating them with
commas, e.g. ‘-Ddomain1.com,domain2.com’.
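
To make the idea of a host “belonging” to a domain concrete, here is a
rough shell model of the check: a host is accepted if it equals a listed
domain or ends with a dot followed by that domain. The function name
‘host_in_domains’ is hypothetical and not part of Wget, and Wget’s actual
matching may differ in detail (in case handling, for example).

```shell
# Hypothetical helper, not part of Wget: succeeds if host $1 belongs to
# one of the comma-separated domains in $2.
host_in_domains() {
    host=$1
    list=$2,                      # trailing comma simplifies the loop
    while [ -n "$list" ]; do
        domain=${list%%,*}        # first entry of the remaining list
        list=${list#*,}           # drop that entry
        [ -n "$domain" ] || continue
        case "$host" in
            "$domain"|*".$domain") return 0 ;;
        esac
    done
    return 1
}

host_in_domains images.example.com example.com,example.org && echo followed
host_in_domains www.other.net example.com,example.org || echo skipped
```

Under this model, ‘images.example.com’ would be followed with
‘-Dexample.com,example.org’, while ‘www.other.net’ would be skipped.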

Keep download off certain domains—‘--exclude-domains’

If there are domains you want to exclude specifically, you can do
it with ‘--exclude-domains’, which accepts the same type of
arguments as ‘-D’, but will _exclude_ all the listed domains. For
example, if you want to download from all the hosts in the ‘foo.edu’
domain, with the exception of ‘sunsite.foo.edu’, you can do it like
this:

    wget -rH -Dfoo.edu --exclude-domains sunsite.foo.edu \
        http://www.foo.edu/
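
For scripts, the same retrieval can be spelled with long options, which
some readers find easier to audit. The sketch below uses the ‘foo.edu’
placeholders from the text; the variable names are illustrative, and the
command is only printed rather than executed.

```shell
# Placeholder domains taken from the example in the text.
accept="foo.edu"            # followed, like -D
reject="sunsite.foo.edu"    # skipped; a comma-separated list also works

# --recursive is -r, --span-hosts is -H, --domains is -D.
cmd="wget --recursive --span-hosts --domains=$accept --exclude-domains=$reject http://www.foo.edu/"

# Print the command for review; paste the printed line to run it.
echo "$cmd"
```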