wget: HTTP Time-Stamping Internals

5.2 HTTP Time-Stamping Internals
================================

Time-stamping in HTTP is implemented by checking the ‘Last-Modified’
header. If you wish to retrieve the file ‘foo.html’ through HTTP, Wget
will check whether ‘foo.html’ exists locally. If it doesn’t, ‘foo.html’
will be retrieved unconditionally.

If the file does exist locally, Wget will first check its local
time-stamp (similar to the way ‘ls -l’ checks it), and then send a
‘HEAD’ request to the remote server, requesting information about the
remote file.

The ‘Last-Modified’ header of the response is examined to find which
file was modified more recently (which makes it “newer”). If the remote
file is newer, it will be downloaded; otherwise, Wget will not retrieve
it.(1)
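
The decision described above can be sketched as follows. This is a
minimal illustration in Python, not Wget's actual implementation: the
function name ‘should_download’ and the ‘headers’ dictionary (standing
in for the parsed response to the ‘HEAD’ request) are assumptions made
for the example. The sketch also includes the size check from footnote
(1), and it chooses to download when no ‘Last-Modified’ header is
present, which is likewise an assumption.

```python
import os
from email.utils import parsedate_to_datetime


def should_download(local_path, headers):
    """Decide whether to fetch the remote file, following the steps above.

    `headers` is a plain dict standing in for the HEAD response headers
    (a hypothetical stand-in, not a Wget data structure).
    """
    # No local copy: retrieve unconditionally.
    if not os.path.exists(local_path):
        return True

    st = os.stat(local_path)

    # Footnote (1): a size mismatch forces a download regardless of
    # what the time-stamps say.
    length = headers.get("Content-Length")
    if length is not None and int(length) != st.st_size:
        return True

    last_modified = headers.get("Last-Modified")
    if last_modified is None:
        # Assumption for this sketch: without a time-stamp, download anyway.
        return True

    # Convert the HTTP date (e.g. "Sun, 09 Sep 2001 01:46:40 GMT")
    # to seconds since the epoch so it can be compared with st_mtime.
    remote_mtime = parsedate_to_datetime(last_modified).timestamp()

    # Download only when the remote file is newer than the local one.
    return remote_mtime > st.st_mtime
```

For example, with a local ‘foo.html’ of 5 bytes last modified on 9
September 2001, a response carrying the same size and a later
‘Last-Modified’ date makes the function return ‘True’; the same or an
earlier date makes it return ‘False’.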

When ‘--backup-converted’ (‘-K’) is specified in conjunction with
‘-N’, server file ‘X’ is compared to local file ‘X.orig’, if it exists,
rather than to local file ‘X’, which will always differ if it has been
converted by ‘--convert-links’ (‘-k’).
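
The choice of which local file to compare against can be sketched in
Python as below. The function name ‘timestamp_reference’ and its
‘backup_converted’ parameter are hypothetical, chosen only to mirror
the ‘-K’ option; this is an illustration of the rule stated above, not
Wget's own code.

```python
import os


def timestamp_reference(local_path, backup_converted):
    """Return the local file whose time-stamp and size should be compared.

    With -K in effect (backup_converted=True), prefer the unconverted
    backup `local_path + ".orig"` when it exists.
    """
    if backup_converted:
        orig = local_path + ".orig"
        if os.path.exists(orig):
            return orig
    # Without -K, or when no backup exists, compare against the file itself.
    return local_path
```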

Arguably, HTTP time-stamping should be implemented using the
‘If-Modified-Since’ request header.
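
To illustrate what such a conditional request might look like, the
following Python sketch builds (but does not send) a ‘GET’ request
whose ‘If-Modified-Since’ header carries the local file's time-stamp.
The function name ‘conditional_request’ is an assumption made for the
example; it says nothing about how Wget itself would construct the
request.

```python
import os
import urllib.request
from email.utils import formatdate


def conditional_request(url, local_path):
    """Build a GET request that is conditional on the local file's mtime."""
    request = urllib.request.Request(url)
    if os.path.exists(local_path):
        mtime = os.stat(local_path).st_mtime
        # HTTP dates are expressed in GMT,
        # e.g. "Sun, 09 Sep 2001 01:46:40 GMT".
        request.add_header("If-Modified-Since",
                           formatdate(mtime, usegmt=True))
    # With no local copy the request is left unconditional.
    return request
```

A server that honors the header answers ‘304 Not Modified’ when the
remote file has not changed since that date, so a single request
replaces the ‘HEAD’ request and the follow-up download.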

---------- Footnotes ----------

(1) As an additional check, Wget will look at the ‘Content-Length’
header, and compare the sizes; if they are not the same, the remote file
will be downloaded no matter what the time-stamp says.