7.2 Advanced Usage
==================

   • You have a file that contains the URLs you want to download?  Use
     the ‘-i’ switch:

          wget -i FILE

     If you specify ‘-’ as file name, the URLs will be read from
     standard input.
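
     For example, assuming a hypothetical ‘urls.txt’ produced by some
     other tool, the list can be piped straight into Wget:

          cat urls.txt | wget -i -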

   • Create a five levels deep mirror image of the GNU web site, with
     the same directory structure the original has, with only one try
     per document, saving the log of the activities to ‘gnulog’:

          wget -r -t1 https://www.gnu.org/ -o gnulog

     No ‘-l’ option is needed, because five is the default maximum
     recursion depth; ‘-t1’ limits each document to a single try.

   • The same as the above, but convert the links in the downloaded
     files to point to local files, so you can view the documents
     off-line:

          wget --convert-links -r -t1 https://www.gnu.org/ -o gnulog

   • Retrieve only one HTML page, but make sure that all the elements
     needed for the page to be displayed, such as inline images and
     external style sheets, are also downloaded.  Also make sure the
     links in the downloaded page point to the downloaded files.

          wget -p --convert-links http://www.example.com/dir/page.html

     The HTML page will be saved to ‘www.example.com/dir/page.html’,
     and the images, stylesheets, etc., somewhere under
     ‘www.example.com/’, depending on where they were on the remote
     server.

   • The same as the above, but without the ‘www.example.com/’
     directory.  In fact, I don’t want to have all those random server
     directories anyway—just save _all_ those files under a
     ‘download/’ subdirectory of the current directory.

          wget -p --convert-links -nH -nd -Pdownload \
               http://www.example.com/dir/page.html

   • Retrieve the index.html of ‘www.lycos.com’, showing the original
     server headers:

          wget -S http://www.lycos.com/

   • Save the server headers with the file, perhaps for
     post-processing.

          wget --save-headers http://www.lycos.com/
          more index.html

   • Retrieve the first two levels of ‘wuarchive.wustl.edu’, saving
     them to ‘/tmp’.

          wget -r -l2 -P/tmp ftp://wuarchive.wustl.edu/

   • You want to download all the GIFs from a directory on an HTTP
     server.  You tried ‘wget http://www.example.com/dir/*.gif’, but
     that didn’t work because HTTP retrieval does not support
     globbing.  In that case, use:

          wget -r -l1 --no-parent -A.gif http://www.example.com/dir/

     More verbose, but the effect is the same.  ‘-r -l1’ means to
     retrieve recursively (⇒Recursive Download), with maximum depth of
     1.  ‘--no-parent’ means that references to the parent directory
     are ignored (⇒Directory-Based Limits), and ‘-A.gif’ means to
     download only the GIF files.  ‘-A "*.gif"’ would have worked too.
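
     For instance, the equivalent command with the wildcard pattern,
     quoted so the shell does not expand it, would be:

          wget -r -l1 --no-parent -A "*.gif" http://www.example.com/dir/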

   • Suppose you were in the middle of downloading when Wget was
     interrupted.  Now you do not want to clobber the files already
     present.  In that case, use ‘-nc’:

          wget -nc -r https://www.gnu.org/
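
     Note that if the interruption left partially-downloaded files
     behind, ‘-c’ (‘--continue’) resumes them instead of skipping
     them:

          wget -c -r https://www.gnu.org/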

   • If you want to encode your username and password into an HTTP or
     FTP URL, use the appropriate URL syntax (⇒URL Format).

          wget ftp://hniksic:mypassword@unix.example.com/.emacs

     Note, however, that this usage is not advisable on multi-user
     systems because it reveals your password to anyone who looks at
     the output of ‘ps’.
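
     A safer alternative on such systems is to let Wget prompt for the
     password interactively, so it never appears on the command line:

          wget --user=hniksic --ask-password ftp://unix.example.com/.emacs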

   • You would like the output documents to go to standard output
     instead of to files?

          wget -O - http://jagor.srce.hr/ http://www.srce.hr/

     You can also combine ‘-O -’ with ‘-i -’ and make pipelines to
     retrieve the documents from remote hotlists:

          wget -O - http://cool.list.com/ | wget --force-html -i -