gawkinet: CGI Lib
1
1 2.9.1 A Simple CGI Library
1 --------------------------
1
1 HTTP is like being married: you have to be able to handle whatever
1 you're given, while being very careful what you send back.
1 Phil Smith III,
1 <http://www.netfunny.com/rhf/jokes/99/Mar/http.html>
1
1 In *note A Web Service with Interaction: Interacting Service, we saw
1 the function 'CGI_setup()' as part of the web server "core logic"
1 framework. The code presented there handles almost everything necessary
1 for CGI requests. One thing it doesn't do is handle encoded characters
1 in the requests. For example, an '&' is encoded as a percent sign
1 followed by the hexadecimal value: '%26'. These encoded values should
1 be decoded. Following is a simple library to perform these tasks. This
1 code is used for all web server examples throughout the rest of
1 this Info file. If you want to use it for your own web server, store
1 the source code in a file named 'inetlib.awk'. Then you can include
1 these functions in your code by placing the following statement in
1 your program (on the first line of your script):
1
1 @include inetlib.awk
1
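1 But beware, this unquoted form is only possible if you invoke your web
1 server script with 'igawk' instead of the usual 'awk' or 'gawk'. With
1 'gawk' 4.0 or later, write the file name in double quotes instead:
1 '@include "inetlib.awk"'. Here is the code: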
1
1 # CGI Library and core of a web server
1 # Global arrays
1 # GETARG --- arguments to CGI GET command
1 # MENU --- menu items (path names)
1 # PARAM --- parameters of form x=y
1
1 # Optional variable MyHost contains host address
1 # Optional variable MyPort contains port number
1 # Needs TopHeader, TopDoc, TopFooter
1 # Sets MyPrefix, HttpService, Status, Reason
1
1 BEGIN {
1   if (MyHost == "") {
1     "uname -n" | getline MyHost
1     close("uname -n")
1   }
1   if (MyPort == 0) MyPort = 8080
1   HttpService = "/inet/tcp/" MyPort "/0/0"
1   MyPrefix = "http://" MyHost ":" MyPort
1   SetUpServer()
1   while ("awk" != "complex") {
1     # header lines are terminated this way
1     RS = ORS = "\r\n"
1     Status = 200  # this means OK
1     Reason = "OK"
1     Header = TopHeader
1     Document = TopDoc
1     Footer = TopFooter
1     if (GETARG["Method"] == "GET") {
1       HandleGET()
1     } else if (GETARG["Method"] == "HEAD") {
1       # not yet implemented
1     } else if (GETARG["Method"] != "") {
1       print "bad method", GETARG["Method"]
1     }
1     Prompt = Header Document Footer
1     print "HTTP/1.0", Status, Reason |& HttpService
1     print "Connection: Close" |& HttpService
1     print "Pragma: no-cache" |& HttpService
1     len = length(Prompt) + length(ORS)
1     print "Content-length:", len |& HttpService
1     print ORS Prompt |& HttpService
1     # ignore all the header lines
1     while ((HttpService |& getline) > 0)
1       continue
1     # stop talking to this client
1     close(HttpService)
1     # wait for new client request
1     HttpService |& getline
1     # do some logging
1     print systime(), strftime(), $0
1     CGI_setup($1, $2, $3)
1   }
1 }
1
1 function CGI_setup(method, uri, version,    i, j)
1 {
1   delete GETARG
1   delete MENU
1   delete PARAM
1   GETARG["Method"] = method
1   GETARG["URI"] = uri
1   GETARG["Version"] = version
1
1   i = index(uri, "?")
1   if (i > 0) {  # is there a "?" indicating a CGI request?
1     split(substr(uri, 1, i-1), MENU, "[/:]")
1     split(substr(uri, i+1), PARAM, "&")
1     for (i in PARAM) {
1       PARAM[i] = _CGI_decode(PARAM[i])
1       j = index(PARAM[i], "=")
1       GETARG[substr(PARAM[i], 1, j-1)] = \
1            substr(PARAM[i], j+1)
1     }
1   } else {  # there is no "?", no need for splitting PARAMs
1     split(uri, MENU, "[/:]")
1   }
1   for (i in MENU)  # decode characters in path
1     if (i + 0 > 4)  # but not those in host name; "+ 0" forces a numeric test
1       MENU[i] = _CGI_decode(MENU[i])
1 }
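The splitting step can be tried on its own, without starting a server.
The following is a minimal sketch, not part of 'inetlib.awk': the shell
wrapper 'split_uri' and the variable names inside it are hypothetical,
and it deliberately leaves the percent-encoded characters untouched so
that only the splitting at '?', '&' and '=' is visible. It uses plain
POSIX 'awk' features only.

```shell
# Hypothetical helper (not part of inetlib.awk): print the path and the
# name/value pairs of a request URI, one per line.
split_uri() {
  awk -v uri="$1" 'BEGIN {
    q = index(uri, "?")
    if (q == 0) {                    # no "?", so the URI is only a path
      print "path=" uri
      exit
    }
    print "path=" substr(uri, 1, q - 1)
    n = split(substr(uri, q + 1), pair, "&")
    for (k = 1; k <= n; k++) {       # counted loop keeps the original order
      eq = index(pair[k], "=")
      print "name=" substr(pair[k], 1, eq - 1) " value=" substr(pair[k], eq + 1)
    }
  }'
}

split_uri "/cgi-bin/foo?p1=stuff&p2=stuff%26junk"
```

The call prints 'path=/cgi-bin/foo', then 'name=p1 value=stuff' and
'name=p2 value=stuff%26junk'. The '%26' survives because this sketch
performs no decoding.
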
1
1 The library isolates request parsing in one function, 'CGI_setup()'. Decoding
1 of encoded characters is pushed off to a helper function,
1 '_CGI_decode()'. The use of the leading underscore ('_') in the
1 function name is intended to indicate that it is an "internal" function,
1 although there is nothing to enforce this:
1
1 function _CGI_decode(str,   hexdigs, i, pre, code1, code2,
1                             val, result)
1 {
1   hexdigs = "123456789abcdef"
1
1   i = index(str, "%")
1   if (i == 0)  # no work to do
1     return str
1
1   do {
1     pre = substr(str, 1, i-1)    # part before %xx
1     code1 = substr(str, i+1, 1)  # first hex digit
1     code2 = substr(str, i+2, 1)  # second hex digit
1     str = substr(str, i+3)       # rest of string
1
1     code1 = tolower(code1)
1     code2 = tolower(code2)
1     val = index(hexdigs, code1) * 16 \
1           + index(hexdigs, code2)
1
1     result = result pre sprintf("%c", val)
1     i = index(str, "%")
1   } while (i != 0)
1   if (length(str) > 0)
1     result = result str
1   return result
1 }
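The digit lookup used by '_CGI_decode()' can be checked in isolation.
This is a minimal sketch under stated assumptions: the shell wrapper
'hexval' is a hypothetical name, it takes only the two hex digits that
follow the percent sign, and it prints the numeric value instead of
the character.

```shell
# Hypothetical helper: numeric value of a two-digit hex escape such as "26".
hexval() {
  awk -v xx="$1" 'BEGIN {
    hexdigs = "123456789abcdef"      # "0" is absent: index() yields 0 for it
    xx = tolower(xx)
    print index(hexdigs, substr(xx, 1, 1)) * 16 + index(hexdigs, substr(xx, 2, 1))
  }'
}

hexval 26    # 2*16 + 6
hexval 0A    # 0*16 + 10
hexval 20    # 2*16 + 0
```

The three calls print 38 (the code for '&'), 10 and 32. A character
that is not a hex digit at all is also looked up as zero, so this
technique does not detect malformed escapes.
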
1
1 '_CGI_decode()' works by splitting the string apart around an encoded
1 character. The two digits are converted to lowercase characters and
1 looked up in a string of hex digits. Note that '0' is deliberately
1 absent from the string;
1 'index()' returns zero when it's not found, automatically giving the
1 correct value! Once the hexadecimal value is converted from characters
1 in a string into a numerical value, 'sprintf()' converts the value back
1 into a real character. The following is a simple test harness for the
1 above functions:
1
1 BEGIN {
1   CGI_setup("GET",
1     "http://www.gnu.org/cgi-bin/foo?p1=stuff&p2=stuff%26junk" \
1          "&percent=a %25 sign",
1     "1.0")
1   for (i in MENU)
1     printf "MENU[\"%s\"] = %s\n", i, MENU[i]
1   for (i in PARAM)
1     printf "PARAM[\"%s\"] = %s\n", i, PARAM[i]
1   for (i in GETARG)
1     printf "GETARG[\"%s\"] = %s\n", i, GETARG[i]
1 }
1
1 And this is the result when we run it:
1
1 $ gawk -f testserv.awk
1 -| MENU["4"] = www.gnu.org
1 -| MENU["5"] = cgi-bin
1 -| MENU["6"] = foo
1 -| MENU["1"] = http
1 -| MENU["2"] =
1 -| MENU["3"] =
1 -| PARAM["1"] = p1=stuff
1 -| PARAM["2"] = p2=stuff&junk
1 -| PARAM["3"] = percent=a % sign
1 -| GETARG["p1"] = stuff
1 -| GETARG["percent"] = a % sign
1 -| GETARG["p2"] = stuff&junk
1 -| GETARG["Method"] = GET
1 -| GETARG["Version"] = 1.0
1 -| GETARG["URI"] = http://www.gnu.org/cgi-bin/foo?p1=stuff&
1 p2=stuff%26junk&percent=a %25 sign
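
The elements in the output above appear in no particular order,
because 'for (i in array)' does not promise one. If index order
matters, one option is a counted loop over the value returned by
'split()'. The following is a minimal sketch; the shell wrapper
'list_menu' is a hypothetical name and only the path-splitting step is
reproduced.

```shell
# Hypothetical demo: list the pieces of a URI in index order.
list_menu() {
  awk -v uri="$1" 'BEGIN {
    n = split(uri, MENU, "[/:]")     # split() returns the number of pieces
    for (k = 1; k <= n; k++)         # counted loop, unlike "for (k in MENU)"
      printf "MENU[\"%s\"] = %s\n", k, MENU[k]
  }'
}

list_menu "http://www.gnu.org/cgi-bin/foo"
```

This prints the six 'MENU' elements from 'MENU["1"] = http' through
'MENU["6"] = foo', always in that order.
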
1