gawkinet: CGI Lib

1 
1 2.9.1 A Simple CGI Library
1 --------------------------
1 
1      HTTP is like being married: you have to be able to handle whatever
1      you're given, while being very careful what you send back.
1      Phil Smith III,
1      <http://www.netfunny.com/rhf/jokes/99/Mar/http.html>
1 
1    In *note A Web Service with Interaction: Interacting Service, we saw
1 the function 'CGI_setup()' as part of the web server "core logic"
1 framework.  The code presented there handles almost everything necessary
1 for CGI requests.  One thing it doesn't do is handle encoded characters
1 in the requests.  For example, an '&' is encoded as a percent sign
1 followed by the two-digit hexadecimal value of its character code:
1 '%26'.  These encoded values should be decoded.  Following is a simple
1 library to perform these tasks.  This code is used for all web server
1 examples throughout the rest of this Info file.  If you want to use it
1 for your own web server, store the source code in a file named
1 'inetlib.awk'.  Then you can include these functions in your code by
1 placing the following statement in your program (on the first line of
1 your script):
1 
1      @include inetlib.awk
1 
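1 But beware: with versions of 'gawk' older than 4.0, this mechanism works
1 only if you invoke your web server script with 'igawk' instead of the
1 usual 'awk' or 'gawk'.  'gawk' 4.0 and later handle '@include' directly,
1 but expect the file name as a double-quoted string
1 ('@include "inetlib.awk"').  Here is the code: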
1 
1      # CGI Library and core of a web server
1      # Global arrays
1      #   GETARG --- arguments to CGI GET command
1      #   MENU   --- menu items (path names)
1      #   PARAM  --- parameters of form x=y
1 
1      # Optional variable MyHost contains host address
1      # Optional variable MyPort contains port number
1      # Needs TopHeader, TopDoc, TopFooter
1      # Sets MyPrefix, HttpService, Status, Reason
1 
1      BEGIN {
1        if (MyHost == "") {
1           "uname -n" | getline MyHost
1           close("uname -n")
1        }
1        if (MyPort ==  0) MyPort = 8080
1        HttpService = "/inet/tcp/" MyPort "/0/0"
1        MyPrefix    = "http://" MyHost ":" MyPort
1        SetUpServer()
1        while ("awk" != "complex") {   # always true: serve forever
1          # header lines are terminated this way
1          RS = ORS    = "\r\n"
1          Status      = 200             # this means OK
1          Reason      = "OK"
1          Header      = TopHeader
1          Document    = TopDoc
1          Footer      = TopFooter
1          if        (GETARG["Method"] == "GET") {
1              HandleGET()
1          } else if (GETARG["Method"] == "HEAD") {
1              # not yet implemented
1          } else if (GETARG["Method"] != "") {
1              print "bad method", GETARG["Method"]
1          }
1          Prompt = Header Document Footer
1          print "HTTP/1.0", Status, Reason     |& HttpService
1          print "Connection: Close"            |& HttpService
1          print "Pragma: no-cache"             |& HttpService
1          len = length(Prompt) + length(ORS)
1          print "Content-length:", len         |& HttpService
1          print ORS Prompt                     |& HttpService
1          # ignore all the header lines
1          while ((HttpService |& getline) > 0)
1              continue
1          # stop talking to this client
1          close(HttpService)
1          # wait for new client request
1          HttpService |& getline
1          # do some logging
1          print systime(), strftime(), $0
1          CGI_setup($1, $2, $3)
1        }
1      }
1 
1      function CGI_setup(method, uri, version,    i, j)
1      {
1          delete GETARG
1          delete MENU
1          delete PARAM
1          GETARG["Method"] = method
1          GETARG["URI"] = uri
1          GETARG["Version"] = version
1 
1          i = index(uri, "?")
1          if (i > 0) {  # is there a "?" indicating a CGI request?
1              split(substr(uri, 1, i-1), MENU, "[/:]")
1              split(substr(uri, i+1), PARAM, "&")
1              for (i in PARAM) {
1                  PARAM[i] = _CGI_decode(PARAM[i])
1                  j = index(PARAM[i], "=")
1                  GETARG[substr(PARAM[i], 1, j-1)] = \
1                                               substr(PARAM[i], j+1)
1              }
1          } else { # there is no "?", no need for splitting PARAMs
1              split(uri, MENU, "[/:]")
1          }
1          # Decode characters in the path, but not those in the host
1          # name.  Subscripts are strings in awk, so 'i+0' makes the
1          # comparison numeric.
1          for (i in MENU)
1              if (i+0 > 4)
1                  MENU[i] = _CGI_decode(MENU[i])
1      }
1 
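   As a small illustration of the name/value step inside 'CGI_setup()'
above, the following sketch applies the same 'index()' and 'substr()'
calls to a few sample parameters.  The helper names 'param_name()' and
'param_value()' and the sample strings are assumptions made for this
example; they are not part of 'inetlib.awk'.

```awk
# Hypothetical helpers (not part of inetlib.awk): split one decoded
# "name=value" parameter at its first "=", as CGI_setup() does.
function param_name(param,    j)
{
    j = index(param, "=")          # 0 if there is no "=" at all
    return substr(param, 1, j-1)   # empty string when j is 0
}

function param_value(param,    j)
{
    j = index(param, "=")
    return substr(param, j+1)      # whole string when j is 0
}

BEGIN {
    n = split("p1=stuff|expr=a=b|flag", samples, "|")
    for (k = 1; k <= n; k++)
        printf "%-10s name=\"%s\" value=\"%s\"\n", samples[k],
               param_name(samples[k]), param_value(samples[k])
}
```

   Only the first '=' separates name from value, so 'expr=a=b' yields
the name 'expr' and the value 'a=b'.  A parameter without any '=', such
as 'flag', ends up with an empty name and the whole string as its value,
which means 'CGI_setup()' would store it in 'GETARG[""]'.
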
1    The library isolates the request-parsing details in a single
1 function, 'CGI_setup()'.  Decoding of encoded characters is pushed off
1 to a helper function, '_CGI_decode()'.  The leading underscore ('_') in
1 the function name is intended to indicate that it is an "internal"
1 function, although there is nothing to enforce this:
1 
1      function _CGI_decode(str,   hexdigs, i, pre, code1, code2,
1                                  val, result)
1      {
1         hexdigs = "123456789abcdef"
1 
1         i = index(str, "%")
1         if (i == 0) # no work to do
1            return str
1 
1         do {
1            pre = substr(str, 1, i-1)   # part before %xx
1            code1 = substr(str, i+1, 1) # first hex digit
1            code2 = substr(str, i+2, 1) # second hex digit
1            str = substr(str, i+3)      # rest of string
1 
1            code1 = tolower(code1)
1            code2 = tolower(code2)
1            val = index(hexdigs, code1) * 16 \
1                  + index(hexdigs, code2)
1 
1            result = result pre sprintf("%c", val)
1            i = index(str, "%")
1         } while (i != 0)
1         if (length(str) > 0)
1            result = result str
1         return result
1      }
1 
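   The digit arithmetic in '_CGI_decode()' above can be tried in
isolation.  The following is a minimal sketch; the helper name
'hexval()' is an assumption made for this example and does not appear in
'inetlib.awk'.

```awk
# Hypothetical helper (not part of inetlib.awk): value of one hex digit,
# using the same index() lookup as _CGI_decode().
function hexval(digit)
{
    # "0" is absent on purpose: index() returns 0 when nothing is found.
    return index("123456789abcdef", tolower(digit))
}

BEGIN {
    # "%26": 2 * 16 + 6 = 38, the ASCII code of "&"
    val = hexval("2") * 16 + hexval("6")
    printf "%%26 -> %d -> %c\n", val, val

    # "%25": 2 * 16 + 5 = 37, the ASCII code of "%"
    val = hexval("2") * 16 + hexval("5")
    printf "%%25 -> %d -> %c\n", val, val

    # A character that is not a hex digit is not found either,
    # so it silently counts as 0.
    printf "hexval(\"g\") = %d\n", hexval("g")
}
```

   Because a missing character and the digit '0' both produce zero, this
lookup cannot tell a malformed escape such as '%2g' from a valid one.
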
1    '_CGI_decode()' works by splitting the string apart around each
1 encoded character.  The two hex digits are converted to lowercase and
1 looked up in a string of hex digits.  Note that '0' is not in the string
1 on purpose; 'index()' returns zero when the character is not found,
1 which is exactly the value of the digit '0'.  (As a side effect, any
1 character that is not a hex digit also counts as zero, so malformed
1 escapes are not detected.)  Once the two digits are combined into a
1 numerical value, 'sprintf()' converts that value back into a real
1 character.  The following is a simple test harness for the above
1 functions:
1 
1      BEGIN {
1        CGI_setup("GET",
1        "http://www.gnu.org/cgi-bin/foo?p1=stuff&p2=stuff%26junk" \
1             "&percent=a %25 sign",
1        "1.0")
1        for (i in MENU)
1            printf "MENU[\"%s\"] = %s\n", i, MENU[i]
1        for (i in PARAM)
1            printf "PARAM[\"%s\"] = %s\n", i, PARAM[i]
1        for (i in GETARG)
1            printf "GETARG[\"%s\"] = %s\n", i, GETARG[i]
1      }
1 
1    And this is the result when we run it:
1 
1      $ gawk -f testserv.awk
1      -| MENU["4"] = www.gnu.org
1      -| MENU["5"] = cgi-bin
1      -| MENU["6"] = foo
1      -| MENU["1"] = http
1      -| MENU["2"] =
1      -| MENU["3"] =
1      -| PARAM["1"] = p1=stuff
1      -| PARAM["2"] = p2=stuff&junk
1      -| PARAM["3"] = percent=a % sign
1      -| GETARG["p1"] = stuff
1      -| GETARG["percent"] = a % sign
1      -| GETARG["p2"] = stuff&junk
1      -| GETARG["Method"] = GET
1      -| GETARG["Version"] = 1.0
1      -| GETARG["URI"] = http://www.gnu.org/cgi-bin/foo?p1=stuff&
1      p2=stuff%26junk&percent=a %25 sign
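
   The 'MENU' subscripts in the output above follow from how 'split()'
treats the separator regexp '[/:]': every single '/' or ':' ends a
field, so the '://' after the scheme produces two empty fields.  The
output order looks shuffled because a 'for (i in MENU)' loop visits the
elements in no particular order.  The following sketch prints the pieces
in numeric order instead; the helper name 'show_parts()' and the second
sample URI are assumptions made for this example.

```awk
# Hypothetical helper (not part of inetlib.awk): print the pieces that
# split() produces for a URI without a query part, in numeric order.
function show_parts(uri,    parts, n, k)
{
    n = split(uri, parts, "[/:]")   # same separator regexp as CGI_setup()
    for (k = 1; k <= n; k++)
        printf "parts[%d] = \"%s\"\n", k, parts[k]
    return n
}

BEGIN {
    show_parts("http://www.gnu.org/cgi-bin/foo")
    print ""
    show_parts("http://localhost:8080/cgi-bin/foo")
}
```

   In both cases the host name lands at subscript 4, which is why
'CGI_setup()' decodes only the elements after it.  When the URI carries
an explicit port, the port number occupies subscript 5 and the path
elements shift up by one.
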
1