gawkinet: Web page

2.7 Reading a Web Page
======================

Retrieving a web page from a web server is as simple as retrieving email
from an email server.  We only have to use a similar, but not identical,
protocol and a different port.  The name of the protocol is HyperText
Transfer Protocol (HTTP) and the port number is usually 80.  As in the
preceding node, ask your administrator about the name of your local web
server or proxy web server and its port number for HTTP requests.

   The following program employs a rather crude approach toward
retrieving a web page.  It uses the prehistoric syntax of HTTP 0.9,
which almost all web servers still support.  The most noticeable thing
about it is that the program directs the request to the local proxy
server, whose name you insert in the special file name in place of
'PROXY'.  The proxy server in turn contacts 'www.yahoo.com':

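     BEGIN {
       # HTTP terminates lines with CR LF, for input and output alike.
       RS = ORS = "\r\n"
       # Replace PROXY with the name of your proxy server.
       HttpService = "/inet/tcp/0/PROXY/80"
       # An HTTP 0.9 request is a single line: the method and the address.
       print "GET http://www.yahoo.com"     |& HttpService
       # Print the response until the server closes the connection.
       while ((HttpService |& getline) > 0)
          print $0
       close(HttpService)
     }
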
   Again, 'RS' and 'ORS' are redefined so that lines are separated by a
carriage return and a line feed.  The 'GET' request that we send to the
server is the only kind of HTTP request that existed when the web was
created in the early 1990s.  HTTP calls this 'GET' request a "method,"
which tells the service to transmit a web page (here the home page of
the Yahoo! search engine).  Version 1.0 added the request methods
'HEAD' and 'POST'.  Version 1.1(1) added the request methods 'OPTIONS',
'PUT', 'DELETE', and 'TRACE', among others.  You can fill in any valid
web address, and the program prints the HTML code of that page to your
screen.

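   For comparison, a request in the style of HTTP 1.0 might look like
the following sketch.  It assumes direct access to a web server rather
than a proxy, and the host name 'www.example.com' is only a placeholder
that is not part of the original program.  The request line names the
protocol version, optional header lines follow, and an empty line ends
the request:

```awk
BEGIN {
  RS = ORS = "\r\n"
  Host = "www.example.com"                 # placeholder host (assumption)
  HttpService = "/inet/tcp/0/" Host "/80"
  print "GET / HTTP/1.0"  |& HttpService   # method, path, protocol version
  print "Host: " Host     |& HttpService   # optional in HTTP 1.0
  print ""                |& HttpService   # empty line ends the request
  while ((HttpService |& getline) > 0)
    print $0
  close(HttpService)
}
```

   Replacing 'GET' with 'HEAD' in this sketch should make the server
return only the header of the response.
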
   Notice the similarity between the responses of the POP and HTTP
services.  First, you get a header that is terminated by an empty line,
and then you get the body of the page in HTML.  The lines of the headers
also have the same form as in POP.  There is the name of a parameter,
then a colon, and finally the value of that parameter.

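   The header format described above can be taken apart with a few
pattern-action rules.  The following sketch is an illustration only.  It
assumes that a complete response in the style of HTTP 1.0, beginning
with a status line, has been saved in a file, and the names 'InHeader',
'Header', and 'Body' are made up for this example:

```awk
# Usage sketch: gawk -f headers.awk response.txt
BEGIN { RS = "\r\n"; InHeader = 1 }

# The first line is the status line, for example "HTTP/1.0 200 OK".
NR == 1 { print "status: " $0; next }

# The first empty line terminates the header.
InHeader && $0 == "" { InHeader = 0; next }

# Header lines consist of a name, a colon, and a value.
InHeader {
  colon = index($0, ":")
  if (colon > 0) {
    name  = tolower(substr($0, 1, colon - 1))
    value = substr($0, colon + 1)
    sub(/^[ \t]+/, "", value)      # drop blanks after the colon
    Header[name] = value
  }
  next
}

# Everything after the empty line belongs to the body.
{ Body = Body $0 "\n" }

END {
  for (name in Header)
    print name " = " Header[name]
  printf "%s", Body
}
```

   Header names are folded to lowercase here because HTTP treats them
as case-insensitive.  The order in which the 'for' loop visits the
parameters is not specified.
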
   Images ('.png' or '.gif' files) can also be retrieved this way, but
then you get binary data that should be redirected into a file.  Another
application is calling a CGI (Common Gateway Interface) script on some
server.  CGI scripts are used when the contents of a web page are not
constant, but generated at the moment you send a request for the page.
For example, to get a detailed report about the current quotes of
Motorola stock, call a CGI script at Yahoo! with the following:

     get = "GET http://quote.yahoo.com/q?s=MOT&d=t"
     print get |& HttpService

   You can also request weather reports this way.

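   Saving binary data such as an image takes some care, because 'print'
appends 'ORS' to every record.  The following sketch avoids that by
using gawk's 'RT' variable, which holds the text that matched 'RS', so
that each record is written back with exactly the separator that was
read.  The URL and the output file name are hypothetical, 'PROXY' again
stands for the name of your proxy server, and the sketch assumes that
the server answers in the style of HTTP 0.9, with no header in front of
the data:

```awk
BEGIN {
  BINMODE = 3        # no end-of-line translation where the system does any
  RS = "\n"          # any separator works, because RT restores it exactly
  Url = "http://www.example.com/logo.png"   # hypothetical address
  OutFile = "logo.png"                      # hypothetical file name
  HttpService = "/inet/tcp/0/PROXY/80"      # replace PROXY as before
  printf "GET %s\r\n", Url |& HttpService
  # Write each record followed by the separator that terminated it.
  # RT is empty after a final record that has no separator.
  while ((HttpService |& getline Chunk) > 0)
    printf "%s%s", Chunk, RT > OutFile
  close(HttpService)
  close(OutFile)
}
```

   In a multibyte locale it may be necessary to run gawk with the '-b'
('--characters-as-bytes') option so that the data is treated as plain
bytes.  If the server does send a header, it has to be skipped before
the data is written.
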
   ---------- Footnotes ----------

   (1) Version 1.0 of HTTP was defined in RFC 1945.  HTTP 1.1 was
initially specified in RFC 2068.  In June 1999, RFC 2068 was made
obsolete by RFC 2616, an update without any substantial changes.