wget: HTTP Options
1
1 2.7 HTTP Options
1 ================
1
1 ‘--default-page=NAME’
1 Use NAME as the default file name when it isn’t known (i.e., for
1 URLs that end in a slash), instead of ‘index.html’.
1
1 ‘-E’
1 ‘--adjust-extension’
1 If a file of type ‘application/xhtml+xml’ or ‘text/html’ is
1 downloaded and the URL does not end with the regexp
1 ‘\.[Hh][Tt][Mm][Ll]?’, this option will cause the suffix ‘.html’ to
1 be appended to the local filename. This is useful, for instance,
1 when you’re mirroring a remote site that uses ‘.asp’ pages, but you
1 want the mirrored pages to be viewable on your stock Apache server.
1 Another good use for this is when you’re downloading CGI-generated
1 materials. A URL like ‘http://site.com/article.cgi?25’ will be
1 saved as ‘article.cgi?25.html’.
1
1 Note that filenames changed in this way will be re-downloaded every
1 time you re-mirror a site, because Wget can’t tell that the local
1 ‘X.html’ file corresponds to remote URL ‘X’ (since it doesn’t yet
1 know that the URL produces output of type ‘text/html’ or
1 ‘application/xhtml+xml’).
1
1 As of version 1.12, Wget will also ensure that any downloaded files
1 of type ‘text/css’ end in the suffix ‘.css’, and the option was
1 renamed from ‘--html-extension’, to better reflect its new
1 behavior. The old option name is still acceptable, but should now
1 be considered deprecated.
1
1 As of version 1.19.2, Wget will also ensure that any downloaded
1 files with a ‘Content-Encoding’ of ‘br’, ‘compress’, ‘deflate’ or
1 ‘gzip’ end in the suffix ‘.br’, ‘.Z’, ‘.zlib’ and ‘.gz’
1 respectively.
1
1 At some point in the future, this option may well be expanded to
1 include suffixes for other types of content, including content
1 types that are not parsed by Wget.
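
The HTML part of the suffix rule above can be modeled in a few lines of shell. This is only an illustrative sketch: `adjusted_name` is a hypothetical helper, not part of Wget, and it deliberately ignores the ‘text/css’ and ‘Content-Encoding’ cases.

```shell
# Hypothetical helper: models the documented rule, not Wget's actual code.
# $1 = file name taken from the URL, $2 = Content-Type reported by the server.
adjusted_name() {
    name=$1 ctype=$2
    case $ctype in
        text/html|application/xhtml+xml)
            case $name in
                # Already ends in .htm or .html (any letter case): keep as is.
                *.[Hh][Tt][Mm][Ll]|*.[Hh][Tt][Mm]) ;;
                *) name=$name.html ;;
            esac ;;
    esac
    printf '%s\n' "$name"
}

adjusted_name 'article.cgi?25' text/html   # prints article.cgi?25.html
adjusted_name 'index.htm' text/html        # prints index.htm
adjusted_name 'logo.png' image/png         # prints logo.png
```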
1
1 ‘--http-user=USER’
1 ‘--http-password=PASSWORD’
1 Specify the username USER and password PASSWORD on an HTTP server.
1 According to the type of the challenge, Wget will encode them using
1 either the ‘basic’ (insecure), the ‘digest’, or the Windows ‘NTLM’
1 authentication scheme.
1
1 Another way to specify username and password is in the URL itself
1 (⇒URL Format). Either method reveals your password to
1 anyone who bothers to run ‘ps’. To prevent the passwords from
1 being seen, use ‘--use-askpass’ or store them in ‘.wgetrc’ or
1 ‘.netrc’, and make sure to protect those files from other users
1 with ‘chmod’. If the passwords are really important, do not leave
1 them lying in those files either—edit the files and delete them
1 after Wget has started the download.
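
One way to keep credentials off the command line is sketched below. It assumes a throwaway startup file selected through the `WGETRC` environment variable rather than your real ‘.wgetrc’; the user name, password and URL are placeholders.

```shell
# Create a private startup file; umask 077 keeps it unreadable to other users.
umask 077
rc=$(mktemp) || exit 1
cat > "$rc" <<'EOF'
http_user = alice
http_password = s3cret
EOF
chmod 600 "$rc"   # explicit, in case the file was created some other way

# Point Wget at the file (not run here, since it needs a real server):
#   WGETRC="$rc" wget https://example.com/private/report.pdf
# Remove the file once the download has started or finished:
#   rm -f "$rc"
```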
1
1 ‘--no-http-keep-alive’
1 Turn off the “keep-alive” feature for HTTP downloads. Normally,
1 Wget asks the server to keep the connection open so that, when you
1 download more than one document from the same server, they get
1 transferred over the same TCP connection. This saves time and at
1 the same time reduces the load on the server.
1
1 This option is useful when, for some reason, persistent
1 (keep-alive) connections don’t work for you, for example due to a
1 server bug or due to the inability of server-side scripts to cope
1 with the connections.
1
1 ‘--no-cache’
1 Disable server-side cache. In this case, Wget will send the remote
1 server an appropriate directive (‘Pragma: no-cache’) to get the
1 file from the remote service, rather than returning the cached
1 version. This is especially useful for retrieving and flushing
1 out-of-date documents on proxy servers.
1
1 Caching is allowed by default.
1
1 ‘--no-cookies’
1 Disable the use of cookies. Cookies are a mechanism for
1 maintaining server-side state. The server sends the client a
1 cookie using the ‘Set-Cookie’ header, and the client responds with
1 the same cookie upon further requests. Since cookies allow the
1 server owners to keep track of visitors and for sites to exchange
1 this information, some consider them a breach of privacy. The
1 default is to use cookies; however, _storing_ cookies is not on by
1 default.
1
1 ‘--load-cookies FILE’
1 Load cookies from FILE before the first HTTP retrieval. FILE is a
1 textual file in the format originally used by Netscape’s
1 ‘cookies.txt’ file.
1
1 You will typically use this option when mirroring sites that
1 require that you be logged in to access some or all of their
1 content. The login process typically works by the web server
1 issuing an HTTP cookie upon receiving and verifying your
1 credentials. The cookie is then resent by the browser when
1 accessing that part of the site, and so proves your identity.
1
1 Mirroring such a site requires Wget to send the same cookies your
1 browser sends when communicating with the site. This is achieved
1 by ‘--load-cookies’—simply point Wget to the location of the
1 ‘cookies.txt’ file, and it will send the same cookies your browser
1 would send in the same situation. Different browsers keep textual
1 cookie files in different locations:
1
1 Netscape 4.x.
1 The cookies are in ‘~/.netscape/cookies.txt’.
1
1 Mozilla and Netscape 6.x.
1 Mozilla’s cookie file is also named ‘cookies.txt’, located
1 somewhere under ‘~/.mozilla’, in the directory of your
1 profile. The full path usually ends up looking somewhat like
1 ‘~/.mozilla/default/SOME-WEIRD-STRING/cookies.txt’.
1
1 Internet Explorer.
1 You can produce a cookie file Wget can use by using the File
1 menu, Import and Export, Export Cookies. This has been tested
1 with Internet Explorer 5; it is not guaranteed to work with
1 earlier versions.
1
1 Other browsers.
1 If you are using a different browser to create your cookies,
1 ‘--load-cookies’ will only work if you can locate or produce a
1 cookie file in the Netscape format that Wget expects.
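
If no browser export is available, a small cookie file can be written by hand. The sketch below assumes the usual Netscape layout of seven tab-separated fields (domain, include-subdomains flag, path, secure flag, expiry as Unix time, name, value); the domain, cookie name and value are placeholders.

```shell
# Build a one-cookie file in the Netscape cookies.txt layout.
jar=$(mktemp) || exit 1
printf '# Netscape HTTP Cookie File\n' > "$jar"
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    example.com FALSE / FALSE 2147483647 session_id abc123 >> "$jar"

# Not run here (needs network access):
#   wget --load-cookies "$jar" -p http://example.com/interesting/article.php
```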
1
1 If you cannot use ‘--load-cookies’, there might still be an
1 alternative. If your browser supports a “cookie manager”, you can
1 use it to view the cookies used when accessing the site you’re
1 mirroring. Write down the name and value of the cookie, and
1 manually instruct Wget to send those cookies, bypassing the
1 “official” cookie support:
1
1 wget --no-cookies --header "Cookie: NAME=VALUE"
1
1 ‘--save-cookies FILE’
1 Save cookies to FILE before exiting. This will not save cookies
1 that have expired or that have no expiry time (so-called “session
1 cookies”), but also see ‘--keep-session-cookies’.
1
1 ‘--keep-session-cookies’
1 When specified, causes ‘--save-cookies’ to also save session
1 cookies. Session cookies are normally not saved because they are
1 meant to be kept in memory and forgotten when you exit the browser.
1 Saving them is useful on sites that require you to log in or to
1 visit the home page before you can access some pages. With this
1 option, multiple Wget runs are considered a single browser session
1 as far as the site is concerned.
1
1 Since the cookie file format does not normally carry session
1 cookies, Wget marks them with an expiry timestamp of 0. Wget’s
1 ‘--load-cookies’ recognizes those as session cookies, but it might
1 confuse other browsers. Also note that cookies so loaded will be
1 treated as other session cookies, which means that if you want
1 ‘--save-cookies’ to preserve them again, you must use
1 ‘--keep-session-cookies’ again.
1
1 ‘--ignore-length’
1 Unfortunately, some HTTP servers (CGI programs, to be more precise)
1 send out bogus ‘Content-Length’ headers, which makes Wget go wild,
1 as it thinks not all the document was retrieved. You can spot this
1 syndrome if Wget retries getting the same document again and again,
1 each time claiming that the (otherwise normal) connection has
1 closed on the very same byte.
1
1 With this option, Wget will ignore the ‘Content-Length’ header—as
1 if it never existed.
1
1 ‘--header=HEADER-LINE’
1 Send HEADER-LINE along with the rest of the headers in each HTTP
1 request. The supplied header is sent as-is, which means it must
1 contain a name and a value separated by a colon, and must not contain
1 newlines.
1
1 You may define more than one additional header by specifying
1 ‘--header’ more than once.
1
1 wget --header='Accept-Charset: iso-8859-2' \
1 --header='Accept-Language: hr' \
1 http://fly.srk.fer.hr/
1
1 Specification of an empty string as the header value will clear all
1 previous user-defined headers.
1
1 As of Wget 1.10, this option can be used to override headers
1 otherwise generated automatically. This example instructs Wget to
1 connect to localhost, but to specify ‘foo.bar’ in the ‘Host’
1 header:
1
1 wget --header="Host: foo.bar" http://localhost/
1
1 In versions of Wget prior to 1.10 such use of ‘--header’ caused
1 sending of duplicate headers.
1
1 ‘--compression=TYPE’
1 Choose the type of compression to be used. Legal values are
1 ‘auto’, ‘gzip’ and ‘none’.
1
1 If ‘auto’ or ‘gzip’ is specified, Wget asks the server to compress
1 the file using the gzip compression format. If the server
1 compresses the file and responds with the ‘Content-Encoding’ header
1 field set appropriately, the file will be decompressed
1 automatically.
1
1 If ‘none’ is specified, Wget will not ask the server to compress
1 the file and will not decompress any server responses. This is the
1 default.
1
1 Compression support is currently experimental. In case it is
1 turned on, please report any bugs to ‘bug-wget@gnu.org’.
1
1 ‘--max-redirect=NUMBER’
1 Specifies the maximum number of redirections to follow for a
1 resource. The default is 20, which is usually far more than
1 necessary. However, on those occasions where you want to allow
1 more (or fewer), this is the option to use.
1
1 ‘--proxy-user=USER’
1 ‘--proxy-password=PASSWORD’
1 Specify the username USER and password PASSWORD for authentication
1 on a proxy server. Wget will encode them using the ‘basic’
1 authentication scheme.
1
1 Security considerations similar to those with ‘--http-password’
1 pertain here as well.
1
1 ‘--referer=URL’
1 Include ‘Referer: URL’ header in HTTP request. Useful for
1 retrieving documents with server-side processing that assume they
1 are always being retrieved by interactive web browsers and only
1 come out properly when Referer is set to one of the pages that
1 point to them.
1
1 ‘--save-headers’
1 Save the headers sent by the HTTP server to the file, preceding the
1 actual contents, with an empty line as the separator.
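
Because the headers and the contents are separated by the first empty line, a saved file can be split again afterwards. The following is a minimal sketch for text responses only; `split_saved` is a hypothetical helper, and it assumes the header lines may end in a carriage return.

```shell
# Split a file written with --save-headers into a headers file and a body file.
# $1 = saved file, $2 = where to write the headers, $3 = where to write the body.
split_saved() {
    awk -v h="$2" -v b="$3" '
        BEGIN { printf "" > h; printf "" > b }      # start with empty outputs
        !in_body && /^\r?$/ { in_body = 1; next }   # first empty line: separator
        in_body { print > b; next }
        { print > h }
    ' "$1"
}

# Usage (not run here):
#   wget --save-headers -O page.raw http://example.com/
#   split_saved page.raw page.headers page.body
```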
1
1 ‘-U AGENT-STRING’
1 ‘--user-agent=AGENT-STRING’
1 Identify as AGENT-STRING to the HTTP server.
1
1 The HTTP protocol allows the clients to identify themselves using a
1 ‘User-Agent’ header field. This enables distinguishing the WWW
1 software, usually for statistical purposes or for tracing of
1 protocol violations. Wget normally identifies as ‘Wget/VERSION’,
1 VERSION being the current version number of Wget.
1
1 However, some sites have been known to impose the policy of
1 tailoring the output according to the ‘User-Agent’-supplied
1 information. While this is not such a bad idea in theory, it has
1 been abused by servers denying information to clients other than
1 (historically) Netscape or, more frequently, Microsoft Internet
1 Explorer. This option allows you to change the ‘User-Agent’ line
1 issued by Wget. Use of this option is discouraged, unless you
1 really know what you are doing.
1
1 Specifying an empty user agent with ‘--user-agent=""’ instructs Wget
1 not to send the ‘User-Agent’ header in HTTP requests.
1
1 ‘--post-data=STRING’
1 ‘--post-file=FILE’
1 Use POST as the method for all HTTP requests and send the specified
1 data in the request body. ‘--post-data’ sends STRING as data,
1 whereas ‘--post-file’ sends the contents of FILE. Other than that,
1 they work in exactly the same way. In particular, they _both_
1 expect content of the form ‘key1=value1&key2=value2’, with
1 percent-encoding for special characters; the only difference is
1 that one expects its content as a command-line parameter and the
1 other accepts its content from a file. In particular,
1 ‘--post-file’ is _not_ for transmitting files as form attachments:
1 those must appear as ‘key=value’ data (with appropriate
1 percent-coding) just like everything else. Wget does not currently
1 support ‘multipart/form-data’ for transmitting POST data; only
1 ‘application/x-www-form-urlencoded’. Only one of ‘--post-data’ and
1 ‘--post-file’ should be specified.
1
1 Please note that Wget does not require the content to be of the
1 form ‘key1=value1&key2=value2’, and neither does it test for it.
1 Wget will simply transmit whatever data is provided to it. Most
1 servers, however, expect the POST data to be in the above format when
1 processing HTML forms.
1
1 When sending a POST request using the ‘--post-file’ option, Wget
1 treats the file as a binary file and will send every character in
1 the POST request without stripping trailing newline or formfeed
1 characters. Any other control characters in the text will also be
1 sent as-is in the POST request.
1
1 Please be aware that Wget needs to know the size of the POST data
1 in advance. Therefore the argument to ‘--post-file’ must be a
1 regular file; specifying a FIFO or something like ‘/dev/stdin’
1 won’t work. It’s not quite clear how to work around this
1 limitation inherent in HTTP/1.0. Although HTTP/1.1 introduces
1 “chunked” transfer that doesn’t require knowing the request length
1 in advance, a client can’t use chunked unless it knows it’s talking
1 to an HTTP/1.1 server. And it can’t know that until it receives a
1 response, which in turn requires the request to have been completed
1 – a chicken-and-egg problem.
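
A common workaround for the regular-file requirement is to spool the data to a temporary file first, so that its size is known before the request starts. The sketch below assumes `mktemp` is available; `post_from_stdin` is a hypothetical helper and the URL in the usage note is a placeholder.

```shell
# Read the POST body from standard input, store it in a regular file,
# and hand that file to --post-file.
post_from_stdin() {
    url=$1
    body=$(mktemp) || return 1
    cat > "$body"                      # spool stdin so the size is known
    wget --post-file "$body" "$url"
    ret=$?
    rm -f "$body"
    return $ret
}

# Usage (not run here):
#   printf 'key1=value1&key2=value2' | post_from_stdin http://example.com/submit.php
```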
1
1 Note: As of version 1.15, if Wget is redirected after the POST
1 request is completed, its behaviour will depend on the response
1 code returned by the server. In case of a 301 Moved Permanently,
1 302 Moved Temporarily or 307 Temporary Redirect, Wget will, in
1 accordance with RFC 2616, continue to send a POST request. In case
1 a server wants the client to change the Request method upon
1 redirection, it should send a 303 See Other response code.
1
1 This example shows how to log in to a server using POST and then
1 proceed to download the desired pages, presumably only accessible
1 to authorized users:
1
1 # Log in to the server. This only needs to be done once.
1 wget --save-cookies cookies.txt \
1 --post-data 'user=foo&password=bar' \
1 http://example.com/auth.php
1
1 # Now grab the page or pages we care about.
1 wget --load-cookies cookies.txt \
1 -p http://example.com/interesting/article.php
1
1 If the server is using session cookies to track user
1 authentication, the above will not work because ‘--save-cookies’
1 will not save them (and neither will browsers) and the
1 ‘cookies.txt’ file will be empty. In that case use
1 ‘--keep-session-cookies’ along with ‘--save-cookies’ to force
1 saving of session cookies.
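
A variant of the login example above that also keeps session cookies might look like the sketch below. `login_and_fetch` is a hypothetical wrapper; the URLs and form fields in the usage note are the same placeholders as in the example above.

```shell
# $1 = cookie file, $2 = login URL, $3 = form data, $4 = page to fetch.
login_and_fetch() {
    jar=$1 login_url=$2 form=$3 page_url=$4
    # Log in and save all cookies, including session cookies.
    wget --save-cookies "$jar" --keep-session-cookies \
         --post-data "$form" "$login_url" &&
    # Fetch the page with the saved cookies.
    wget --load-cookies "$jar" -p "$page_url"
}

# Usage (not run here):
#   login_and_fetch cookies.txt http://example.com/auth.php \
#       'user=foo&password=bar' http://example.com/interesting/article.php
```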
1
1 ‘--method=HTTP-METHOD’
1 For the purpose of RESTful scripting, Wget allows sending of other
1 HTTP Methods without the need to explicitly set them using
1 ‘--header=Header-Line’. Wget will use whatever string is passed to
1 it after ‘--method’ as the HTTP Method to the server.
1
1 ‘--body-data=DATA-STRING’
1 ‘--body-file=DATA-FILE’
1 Must be set when additional data needs to be sent to the server
1 along with the Method specified using ‘--method’. ‘--body-data’
1 sends DATA-STRING as data, whereas ‘--body-file’ sends the contents of
1 DATA-FILE. Other than that, they work in exactly the same way.
1
1 Currently, ‘--body-file’ is _not_ for transmitting files as a
1 whole. Wget does not currently support ‘multipart/form-data’ for
1 transmitting data; only ‘application/x-www-form-urlencoded’. In
1 the future, this may be changed so that Wget sends the
1 ‘--body-file’ as a complete file instead of sending its contents to
1 the server. Please be aware that Wget needs to know the contents
1 of BODY Data in advance, and hence the argument to ‘--body-file’
1 should be a regular file. See ‘--post-file’ for a more detailed
1 explanation. Only one of ‘--body-data’ and ‘--body-file’ should be
1 specified.
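
As a rough illustration of ‘--method’ combined with the body options, the sketch below wraps two requests in hypothetical helper functions. The URLs and the file name in the usage note are placeholders, and whether a given server accepts these methods is outside Wget's control.

```shell
# Send the contents of a regular file as the body of a PUT request.
put_file() {
    url=$1 file=$2
    wget --method=PUT --body-file="$file" "$url"
}

# Send a DELETE request without a body.
delete_resource() {
    wget --method=DELETE "$1"
}

# Usage (not run here):
#   put_file http://example.com/api/items/42 item.txt
#   delete_resource http://example.com/api/items/42
```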
1
1 If Wget is redirected after the request is completed, Wget will
1 suspend the current method and send a GET request until the
1 redirection is completed. This is true for all redirection
1 response codes except 307 Temporary Redirect which is used to
1 explicitly specify that the request method should _not_ change.
1 Another exception is when the method is set to ‘POST’, in which
1 case the redirection rules specified under ‘--post-data’ are
1 followed.
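
The redirection rules for ‘--method’ and for POST can be summarized as a small decision function. This is a model of the behaviour described in this section, not Wget's code: `method_after_redirect` is a hypothetical name, treating 303 as a switch to GET follows the usual reading of that status code, and combinations the text does not cover are reported as "unspecified".

```shell
# $1 = request method in use, $2 = redirection status code.
# Prints the method expected on the follow-up request.
method_after_redirect() {
    method=$1 code=$2
    case $method in
        POST)
            case $code in
                301|302|307) echo POST ;;        # POST is repeated
                303)         echo GET ;;         # server asks for a method change
                *)           echo unspecified ;; # not covered by the text
            esac ;;
        *)
            case $code in
                307) echo "$method" ;;           # method must not change
                *)   echo GET ;;                 # every other redirection code
            esac ;;
    esac
}

method_after_redirect PUT 301    # prints GET
method_after_redirect PUT 307    # prints PUT
method_after_redirect POST 302   # prints POST
```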
1
1 ‘--content-disposition’
1
1 If this is set to on, experimental (not fully-functional) support
1 for ‘Content-Disposition’ headers is enabled. This can currently
1 result in extra round-trips to the server for a ‘HEAD’ request, and
1 is known to suffer from a few bugs, which is why it is not
1 currently enabled by default.
1
1 This option is useful for some file-downloading CGI programs that
1 use ‘Content-Disposition’ headers to describe what the name of a
1 downloaded file should be.
1
1 When combined with ‘--metalink-over-http’ and
1 ‘--trust-server-names’, a ‘Content-Type: application/metalink4+xml’
1 file is named using the ‘Content-Disposition’ filename field, if
1 available.
1
1 ‘--content-on-error’
1
1 If this is set to on, Wget will not skip the content when the
1 server responds with an HTTP status code that indicates an error.
1
1 ‘--trust-server-names’
1
1 If this is set, on a redirect, the local file name will be based on
1 the redirection URL. By default the local file name is based on the
1 original URL. When doing recursive retrieving this can be helpful
1 because in many web sites redirected URLs correspond to an
1 underlying file structure, while link URLs do not.
1
1 ‘--auth-no-challenge’
1
1 If this option is given, Wget will send Basic HTTP authentication
1 information (plaintext username and password) for all requests,
1 just like Wget 1.10.2 and prior did by default.
1
1 Use of this option is not recommended. It is intended only to
1 support a few obscure servers that never send HTTP authentication
1 challenges but accept unsolicited auth info, say, in addition to
1 form-based authentication.
1
1 ‘--retry-on-http-error=CODE[,CODE,...]’
1 Consider given HTTP response codes as non-fatal, transient errors.
1 Supply a comma-separated list of 3-digit HTTP response codes as
1 argument. Useful to work around special circumstances where
1 retries are required, but the server responds with an error code
1 normally not retried by Wget. Such errors might be 503 (Service
1 Unavailable) and 429 (Too Many Requests). Retries enabled by this
1 option are performed subject to the normal retry timing and retry
1 count limitations of Wget.
1
1 Using this option is intended to support special use cases only and
1 is generally not recommended, as it can force retries even in cases
1 where the server is actually trying to decrease its load. Please
1 use wisely and only if you know what you are doing.
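
For illustration, a cautious invocation might pair the option with an explicit retry limit and a pause between retries. The numbers below are arbitrary assumptions, `fetch_with_retries` is a hypothetical wrapper, and the URL in the usage note is a placeholder.

```shell
# Retry on 429 and 503, at most 5 tries, waiting up to 10 seconds between retries.
fetch_with_retries() {
    wget --tries=5 --waitretry=10 \
         --retry-on-http-error=429,503 "$1"
}

# Usage (not run here):
#   fetch_with_retries http://example.com/busy/resource
```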
1