gawkinet: Web page

2.7 Reading a Web Page
======================

Retrieving a web page from a web server is as simple as retrieving email
from an email server. We only have to use a similar, but not identical,
protocol and a different port. The name of the protocol is HyperText
Transfer Protocol (HTTP) and the port number is usually 80. As in the
preceding node, ask your administrator about the name of your local web
server or proxy web server and its port number for HTTP requests.
1
The following program employs a rather crude approach toward
retrieving a web page. It uses the prehistoric syntax of HTTP 0.9,
which many web servers have historically accepted, although a modern
server may reject it. The most noticeable thing about it is that the
program directs the request to the local proxy server, whose name you
insert in the special file name in place of 'PROXY'. The proxy in turn
contacts 'www.yahoo.com':

     BEGIN {
       RS = ORS = "\r\n"                       # network lines end in CR-LF
       # PROXY is a placeholder for the host name of your proxy server
       HttpService = "/inet/tcp/0/PROXY/80"
       print "GET http://www.yahoo.com"     |& HttpService
       # print the reply line by line until the server closes the connection
       while ((HttpService |& getline) > 0)
          print $0
       close(HttpService)
     }
1
Again, lines are separated by a redefined 'RS' and 'ORS'. The 'GET'
request that we send to the server is the only kind of HTTP request that
existed when the web was created in the early 1990s. HTTP calls this
'GET' request a "method"; it tells the service to transmit a web page
(here, the home page of the Yahoo! search engine). Version 1.0 added
the request methods 'HEAD' and 'POST'. Version 1.1(1) added further
request methods, among them 'OPTIONS', 'PUT', 'DELETE', and 'TRACE'.
You can fill in any valid web address, and the program prints the HTML
code of that page to your screen.
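
For comparison with the bare HTTP 0.9 request, a request in HTTP 1.0
syntax names the method, the address, and the protocol version, and is
terminated by an empty line. The following is only a sketch of that
layout: the shell function name 'http10_request' and the awk function
'request' are made up for this illustration, and nothing is sent over
the network.

```shell
# http10_request METHOD URL
# Prints the text of an HTTP 1.0 request for URL on standard output.
# (Hypothetical helper; it only builds the text and opens no connection.)
http10_request() {
    awk -v method="$1" -v url="$2" '
    # Request line, then an empty line; every line ends in CR-LF.
    function request(m, u) {
        return m " " u " HTTP/1.0\r\n\r\n"
    }
    BEGIN { printf "%s", request(method, url) }
    '
}
```

Calling 'http10_request HEAD http://www.yahoo.com/' prints a request that
asks for the header of the page only. Inside a gawk network program, the
same text could be written to the service with 'printf "%s"' and the '|&'
operator instead of being printed to the screen.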
1
Notice the similarity between the responses of the POP and HTTP
services. First, you get a header that is terminated by an empty line,
and then you get the body of the page in HTML. The lines of the header
also have the same form as in POP: the name of a parameter, then a
colon, and finally the value of that parameter.
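
The header layout just described is easy to take apart with awk itself.
The following sketch assumes that the complete response is available as
text on standard input and that it begins with a status line, as
responses in HTTP 1.0 and later do. The shell function name
'split_http_response' and the output labels are made up for this
illustration.

```shell
# split_http_response
# Reads an HTTP response on standard input and labels its parts.
# (Hypothetical helper; assumes the first line is a status line.)
split_http_response() {
    awk '
    { sub(/\r$/, "") }                          # tolerate CR-LF line endings
    NR == 1 { print "status: " $0; next }       # e.g. "HTTP/1.0 200 OK"
    !in_body && $0 == "" { in_body = 1; next }  # empty line ends the header
    !in_body {
        i = index($0, ":")                      # split at the first colon only
        if (i > 0) {
            name  = substr($0, 1, i - 1)
            value = substr($0, i + 1)
            sub(/^[ \t]+/, "", value)           # drop blanks after the colon
            print "header[" name "] = " value
        }
        next
    }
    { print "body: " $0 }                       # everything after the header
    '
}
```

Splitting at the first colon only matters because the value of a
parameter (a date, for example) may itself contain colons.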
1
Images ('.png' or '.gif' files) can also be retrieved this way, but
then you get binary data that should be redirected into a file. Another
application is calling a CGI (Common Gateway Interface) script on some
server. CGI scripts are used when the contents of a web page are not
constant, but are generated at the moment you send a request for the
page. For example, to get a detailed report about the current quotes of
Motorola stock shares, call a CGI script at Yahoo! with the following:

     get = "GET http://quote.yahoo.com/q?s=MOT&d=t"
     print get |& HttpService

You can also request weather reports this way.
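
The address of a CGI script is followed by a question mark and a list
of 'name=value' pairs joined by '&'. The following sketch builds such a
request line from its parts. The shell function name 'cgi_get_request'
and the awk function 'urlencode' are made up for this illustration; the
encoding handles ASCII input only, and nothing is sent over the network.

```shell
# cgi_get_request BASE NAME=VALUE...
# Prints a GET request line for the CGI script at BASE.
# (Hypothetical helper; ASCII input only.)
cgi_get_request() {
    awk '
    # Percent-encode every character outside the unreserved set.
    function urlencode(s,    i, c, out) {
        out = ""
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (c ~ /[A-Za-z0-9._~-]/)
                out = out c
            else
                out = out sprintf("%%%02X", ord[c])
        }
        return out
    }
    BEGIN {
        # Table that maps each ASCII character to its code.
        for (i = 1; i < 128; i++)
            ord[sprintf("%c", i)] = i
        query = ""
        for (i = 2; i < ARGC; i++) {
            eq = index(ARGV[i], "=")
            if (eq == 0)
                continue                  # skip arguments without "="
            pair = urlencode(substr(ARGV[i], 1, eq - 1)) "=" \
                   urlencode(substr(ARGV[i], eq + 1))
            query = query (query == "" ? "" : "&") pair
        }
        print "GET " ARGV[1] (query == "" ? "" : "?" query)
    }
    ' "$@"
}
```

With the arguments 'http://quote.yahoo.com/q s=MOT d=t', the function
prints the same request line as the one assigned to 'get' in the example
for Motorola stock quotes.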
1
---------- Footnotes ----------

(1) Version 1.0 of HTTP was defined in RFC 1945. HTTP 1.1 was
initially specified in RFC 2068. In June 1999, RFC 2068 was made
obsolete by RFC 2616, an update without any substantial changes.