wget: Overview

1 Overview
**********

GNU Wget is a free utility for non-interactive download of files from
the Web. It supports HTTP, HTTPS, and FTP protocols, as well as
retrieval through HTTP proxies.

This chapter is a partial overview of Wget’s features.

• Wget is non-interactive, meaning that it can work in the
background, while the user is not logged on. This allows you to
start a retrieval and disconnect from the system, letting Wget
finish the work. By contrast, most Web browsers require the
user’s constant presence, which can be a great hindrance when
transferring a lot of data.

• Wget can follow links in HTML, XHTML, and CSS pages, to create
local versions of remote web sites, fully recreating the directory
structure of the original site. This is sometimes referred to as
“recursive downloading.” While doing that, Wget respects the Robot
Exclusion Standard (‘/robots.txt’). Wget can be instructed to
convert the links in downloaded files to point at the local files,
for offline viewing.

• File name wildcard matching and recursive mirroring of directories
are available when retrieving via FTP. Wget can read the
time-stamp information given by both HTTP and FTP servers, and
store it locally. Thus Wget can see if the remote file has changed
since last retrieval, and automatically retrieve the new version if
it has. This makes Wget suitable for mirroring of FTP sites, as
well as home pages.

• Wget has been designed for robustness over slow or unstable network
connections; if a download fails due to a network problem, it will
keep retrying (up to a configurable number of attempts) until the
whole file has been retrieved. If the server supports regetting, it
will instruct the server to continue the download from where it
left off.

• Wget supports proxy servers, which can lighten the network load,
speed up retrieval and provide access behind firewalls. Wget uses
passive FTP by default; active FTP is available as an option.

• Wget supports IP version 6, the next generation of IP. IPv6 is
autodetected at compile-time, and can be disabled at either build
or run time. Binaries built with IPv6 support work well in both
IPv4-only and dual family environments.

• Built-in features offer mechanisms to tune which links you wish to
follow (see Following Links).

• The progress of individual downloads is traced using a progress
gauge. Interactive downloads are tracked using a
“thermometer”-style gauge, whereas non-interactive ones are traced
with dots, each dot representing a fixed amount of data received
(1KB by default). Either gauge can be customized to your
preferences.

• Most of the features are fully configurable, either through
command-line options or via the initialization file ‘.wgetrc’ (see
Startup File). Wget allows you to define “global” startup files
(‘/etc/wgetrc’ by default) for site settings. You can also specify
the location of a startup file with the ‘--config’ option. To
disable the reading of config files, use ‘--no-config’. If both
‘--config’ and ‘--no-config’ are given, ‘--no-config’ is ignored.

• Finally, GNU Wget is free software. This means that everyone may
use it, redistribute it and/or modify it under the terms of the GNU
General Public License, as published by the Free Software
Foundation (see the file ‘COPYING’ that came with GNU Wget, for
details).
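
The link-following behavior described above (extract links from a
page, stay on the starting host, and skip anything ‘/robots.txt’
forbids) can be sketched in Python with only the standard library.
This is a simplified illustration, not Wget’s implementation: the
names ‘LinkCollector’ and ‘links_to_follow’ are hypothetical, the
robots.txt rules are passed in as text rather than fetched, and only
‘href’ and ‘src’ attributes are examined (Wget also follows links in
CSS, for example).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect href/src attribute values from HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Valueless attributes arrive as None, so skip empty values.
            if name in ("href", "src") and value:
                self.links.append(value)


def links_to_follow(page_url: str, html: str, robots_txt: str,
                    agent: str = "Wget") -> list[str]:
    """Hypothetical sketch: return absolute, same-host links found in
    `html` that the given robots.txt text allows `agent` to fetch."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    collector = LinkCollector()
    collector.feed(html)

    host = urlparse(page_url).netloc
    result: list[str] = []
    for link in collector.links:
        absolute = urljoin(page_url, link)   # resolve relative links
        if urlparse(absolute).netloc != host:
            continue                         # do not leave the starting host
        if not robots.can_fetch(agent, absolute):
            continue                         # excluded by /robots.txt
        if absolute not in result:           # queue each URL only once
            result.append(absolute)
    return result


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    page = ('<a href="/a.html">A</a> <a href="private/x.html">X</a>'
            '<img src="img/logo.png"> <a href="http://other.example/b">B</a>')
    # prints ['http://example.com/a.html', 'http://example.com/img/logo.png']
    print(links_to_follow("http://example.com/index.html", page, robots))
```

Calling ‘links_to_follow’ on each fetched page and queueing the
results is the core step of a recursive crawl; Wget layers further
controls on top of it, such as a recursion depth limit (‘-l’), host
spanning (‘-H’), and link conversion for offline viewing (‘-k’).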
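
The time-stamp comparison and the regetting of partial downloads
described above come down to two small decisions, sketched below in
Python. This is a simplified illustration under stated assumptions:
it covers only HTTP (a ‘Last-Modified’ response header and a ‘Range’
request header), the function names are hypothetical, and Wget’s own
logic for ‘-N’ (‘--timestamping’) and ‘-c’ (‘--continue’) handles
more cases than this.

```python
import os
from email.utils import parsedate_to_datetime


def remote_is_newer(last_modified: str, local_path: str) -> bool:
    """Hypothetical helper: True if there is no local copy, or if the
    server's Last-Modified time is later than the local file's mtime."""
    if not os.path.exists(local_path):
        return True
    # HTTP dates are in GMT, so this yields a timezone-aware datetime.
    remote_time = parsedate_to_datetime(last_modified).timestamp()
    return remote_time > os.path.getmtime(local_path)


def resume_headers(local_path: str) -> dict:
    """Hypothetical helper: request headers for continuing a partial
    download. "bytes=N-" asks for everything from byte offset N onward,
    i.e. only the bytes not yet on disk."""
    size = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    return {"Range": f"bytes={size}-"} if size > 0 else {}
```

A server that honors the range answers with ‘206 Partial Content’,
while one that does not support regetting typically sends the whole
file with ‘200 OK’, so a client must check the status code before
appending the response to the partial file.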