gawkinet: STOXPRED

1 
1 3.9 STOXPRED: Stock Market Prediction As A Service
1 ==================================================
1 
1      Far out in the uncharted backwaters of the unfashionable end of the
1      Western Spiral arm of the Galaxy lies a small unregarded yellow
1      sun.
1 
1      Orbiting this at a distance of roughly ninety-two million miles is
1      an utterly insignificant little blue-green planet whose
1      ape-descendent life forms are so amazingly primitive that they
1      still think digital watches are a pretty neat idea.
1 
1      This planet has -- or rather had -- a problem, which was this: most
1      of the people living on it were unhappy for pretty much of the
1      time.  Many solutions were suggested for this problem, but most of
1      these were largely concerned with the movements of small green
1      pieces of paper, which is odd because it wasn't the small green
1      pieces of paper that were unhappy.
1      Douglas Adams, 'The Hitch Hiker's Guide to the Galaxy'
1 
1    Valuable services on the Internet are usually _not_ implemented as
1 mobile agents.  There are much simpler ways of implementing services.
1 All Unix systems provide, for example, the 'cron' service.  Unix system
1 users can write a list of tasks to be done each day, each week, twice a
1 day, or just once.  The list is entered into a file named 'crontab'.
1 For example, to distribute a newsletter on a daily basis this way, use
1 'cron' for calling a script each day early in the morning.
1 
1      # run at 8 am on weekdays, distribute the newsletter
1      0 8 * * 1-5   $HOME/bin/daily.job >> $HOME/log/newsletter 2>&1
1 
1    The script first looks for interesting information on the Internet,
1 assembles it in a nice form and sends the results via email to the
1 customers.
1 
1    The following is an example of a primitive newsletter on stock market
1 prediction.  It is a report which first tries to predict the change of
1 each share in the Dow Jones Industrial Index for the particular day.
1 Then it mentions some especially promising shares as well as some shares
1 which look remarkably bad on that day.  The report ends with the usual
1 disclaimer which tells every child _not_ to try this at home and hurt
1 anybody.
1 
1      Good morning Uncle Scrooge,
1 
1      This is your daily stock market report for Monday, October 16, 2000.
1      Here are the predictions for today:
1 
1              AA      neutral
1              GE      up
1              JNJ     down
1              MSFT    neutral
1              ...
1              UTX     up
1              DD      down
1              IBM     up
1              MO      down
1              WMT     up
1              DIS     up
1              INTC    up
1              MRK     down
1              XOM     down
1              EK      down
1              IP      down
1 
1      The most promising shares for today are these:
1 
1              INTC            http://biz.yahoo.com/n/i/intc.html
1 
1      The stock shares to avoid today are these:
1 
1              EK              http://biz.yahoo.com/n/e/ek.html
1              IP              http://biz.yahoo.com/n/i/ip.html
1              DD              http://biz.yahoo.com/n/d/dd.html
1              ...
1 
1    The script as a whole is rather long.  In order to ease the pain of
1 studying other people's source code, we have broken the script up into
1 meaningful parts which are invoked one after the other.  The basic
1 structure of the script is as follows:
1 
1      BEGIN {
1        Init()
1        ReadQuotes()
1        CleanUp()
1        Prediction()
1        Report()
1        SendMail()
1      }
1 
1    The earlier parts store data into variables and arrays which are
1 subsequently used by later parts of the script.  The 'Init()' function
1 first checks if the script is invoked correctly (without any
1 parameters).  If not, it informs the user of the correct usage.  What
1 follows are preparations for the retrieval of the historical quote data.
1 The names of the 30 stock shares are stored in an array 'name' along
1 with the current date in 'day', 'month', and 'year'.
1 
1    All users who are separated from the Internet by a firewall and have
1 to direct their Internet accesses to a proxy must supply the name of the
1 proxy to this script with the '-v Proxy=NAME' option (and, if needed,
1 its port with '-v ProxyPort=NUMBER').  Without these options, the script
1 connects directly to 'chart.yahoo.com' on port 80, which should suffice
1 for most users.
1 
1      function Init() {
1        if (ARGC != 1) {
1          print "STOXPRED - daily stock share prediction"
1          print "IN:\n    no parameters, nothing on stdin"
1          print "PARAM:\n    -v Proxy=MyProxy -v ProxyPort=80"
1          print "OUT:\n    commented predictions as email"
1          print "JK 09.10.2000"
1          exit
1        }
1        # Remember ticker symbols from Dow Jones Industrial Index
1        StockCount = split("AA GE JNJ MSFT AXP GM JPM PG BA HD KO \
1          SBC C HON MCD T CAT HWP MMM UTX DD IBM MO WMT DIS INTC \
1          MRK XOM EK IP", name);
1        # Remember the current date as the end of the time series
1        day   = strftime("%d")
1        month = strftime("%m")
1        year  = strftime("%Y")
1        if (Proxy     == "")  Proxy     = "chart.yahoo.com"
1        if (ProxyPort ==  0)  ProxyPort = 80
1        YahooData = "/inet/tcp/0/" Proxy "/" ProxyPort
1      }
1 
1    There are two really interesting parts in the script.  One is the
1 function which reads the historical stock quotes from an Internet
1 server.  The other is the one that does the actual prediction.  In the
1 following function we see how the quotes are read from the Yahoo server.
1 The data which comes from the server is in CSV format (comma-separated
1 values):
1 
1      Date,Open,High,Low,Close,Volume
1      9-Oct-00,22.75,22.75,21.375,22.375,7888500
1      6-Oct-00,23.8125,24.9375,21.5625,22,10701100
1      5-Oct-00,24.4375,24.625,23.125,23.50,5810300
1 
1    Each line holds the values for one trading day; the columns are
1 separated by commas and contain the kind of data that is described in
1 the header (first) line.  At first, 'gawk' is instructed to separate
1 columns by commas ('FS = ","').  In the loop that follows, a connection
1 to the Yahoo server is first opened, then a download takes place, and
1 finally the connection is closed.  All this happens once for each ticker
1 symbol.  In the body of this loop, an Internet address is built up as a
1 string according to the rules of the Yahoo server.  The starting and
1 ending dates fall on the same calendar day, one year apart: the time
1 series begins one year ago and ends today.  All the action is initiated
1 within the 'printf' command which transmits the request for data to the
1 Yahoo server.
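
   The way such an address is put together can be tried out in
isolation.  The following is only a sketch: the helper 'QuoteURL()' and
the literal dates are assumptions made for illustration, the parameter
names are copied from the script, and the '&' separator in front of
'g=d' is assumed as well.

```awk
# Hypothetical helper, not part of the original script: build the
# request address for one ticker symbol.
function QuoteURL(symbol, day, month, year) {
  return "http://chart.yahoo.com/table.csv?s=" symbol \
         "&a=" month "&b=" day "&c=" (year - 1) \
         "&d=" month "&e=" day "&f=" year \
         "&g=d&q=q&y=0&z=" symbol "&x=.csv"
}

BEGIN {
  # Example values only; the script takes them from strftime().
  print QuoteURL("IBM", "16", "10", 2000)
}
```

   Keeping the string concatenation in one function makes it easy to
print the address and inspect it before any connection is opened.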
1 
1    In the inner loop, the server's data is first read and then scanned
1 line by line.  Only lines which have six columns and the name of a month
1 in the first column contain relevant data.  This data is stored in the
1 two-dimensional array 'quote'; one dimension being time, the other being
1 the ticker symbol.  During retrieval of the first stock's data, the
1 calendar names of the time instances are stored in the array 'days'
1 because we need them later.
1 
1      function ReadQuotes() {
1        # Retrieve historical data for each ticker symbol
1        FS = ","
1        for (stock = 1; stock <= StockCount; stock++) {
1          URL = "http://chart.yahoo.com/table.csv?s=" name[stock] \
1                "&a=" month "&b=" day   "&c=" year-1 \
1                "&d=" month "&e=" day   "&f=" year \
1                "&g=d&q=q&y=0&z=" name[stock] "&x=.csv"
1          printf("GET " URL " HTTP/1.0\r\n\r\n") |& YahooData
1          while ((YahooData |& getline) > 0) {
1            if (NF == 6 && $1 ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/) {
1              if (stock == 1)
1                days[++daycount] = $1;
1              quote[$1, stock] = $5
1            }
1          }
1          close(YahooData)
1        }
1        FS = " "
1      }
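
   To experiment with the line filter without a network connection, the
same test can be applied to a few sample lines.  This is a sketch under
assumptions: 'ParseLine()' is a hypothetical helper that splits a line
itself instead of relying on 'FS', and the input lines are the sample
data shown earlier.

```awk
# Offline sketch: apply the filter from ReadQuotes() to sample lines
# instead of a live connection.  ParseLine() is a hypothetical helper;
# it returns 1 if the line was accepted as quote data, 0 otherwise.
function ParseLine(line, stock,    f, n) {
  n = split(line, f, ",")
  if (n == 6 && f[1] ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/) {
    if (stock == 1)
      days[++daycount] = f[1]      # remember the date labels once
    quote[f[1], stock] = f[5]      # column 5 is the closing price
    return 1
  }
  return 0
}

BEGIN {
  ParseLine("Date,Open,High,Low,Close,Volume", 1)   # header: rejected
  ParseLine("9-Oct-00,22.75,22.75,21.375,22.375,7888500", 1)
  ParseLine("6-Oct-00,23.8125,24.9375,21.5625,22,10701100", 1)
  for (d = 1; d <= daycount; d++)
    print days[d], quote[days[d], 1]
}
```

   The header line has six columns too, but its first column contains
no month name, so only the two data lines end up in 'quote'.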
1 
1    Now that we _have_ the data, it can be checked once again to make
1 sure that no individual stock is missing or invalid, and that all the
1 stock quotes are aligned correctly.  Furthermore, we renumber the time
1 instances.  The most recent day gets day number 1 and all other days get
1 consecutive numbers.  All quotes are rounded to the nearest whole US
1 dollar.
1 
1      function CleanUp() {
1        # clean up time series; eliminate incomplete data sets
1        for (d = 1; d <= daycount; d++) {
1          for (stock = 1; stock <= StockCount; stock++)
1            if (! ((days[d], stock) in quote))
1                stock = StockCount + 10
1          if (stock > StockCount + 1)
1              continue
1          datacount++
1          for (stock = 1; stock <= StockCount; stock++)
1            data[datacount, stock] = int(0.5 + quote[days[d], stock])
1        }
1        delete quote
1        delete days
1      }
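
   The effect of this step is easier to see on a tiny data set.  The
sketch below uses made-up quotes for two stocks on three days, one of
which is incomplete.  It is not the original function: the helper
'Round()' and the explicit 'complete' flag (in place of the loop-counter
trick) are assumptions chosen for readability.

```awk
# Sketch of the CleanUp() idea on a tiny, made-up data set:
# two stocks, three days, one quote missing.
function Round(x) { return int(0.5 + x) }   # valid for non-negative x

BEGIN {
  StockCount = 2
  daycount = split("9-Oct-00 6-Oct-00 5-Oct-00", days, " ")
  quote["9-Oct-00", 1] = 22.375; quote["9-Oct-00", 2] = 101.5
  quote["6-Oct-00", 1] = 22                     # stock 2 is missing
  quote["5-Oct-00", 1] = 23.50;  quote["5-Oct-00", 2] = 99.4

  for (d = 1; d <= daycount; d++) {
    complete = 1
    for (stock = 1; stock <= StockCount; stock++)
      if (! ((days[d], stock) in quote))
        complete = 0
    if (! complete)
      continue                                  # skip incomplete days
    datacount++                                 # renumber remaining days
    for (stock = 1; stock <= StockCount; stock++)
      data[datacount, stock] = Round(quote[days[d], stock])
  }
  for (d = 1; d <= datacount; d++)
    print d, data[d, 1], data[d, 2]
}
```

   Only two of the three days survive, and they are renumbered 1 and 2
with the most recent day first.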
1 
1    Now we have arrived at the second really interesting part of the
1 whole affair.  What we present here is a very primitive prediction
1 algorithm: _If a stock fell yesterday, assume it will also fall today;
1 if it rose yesterday, assume it will rise today_.  (Feel free to replace
1 this algorithm with a smarter one.)  If a stock changed in the same
1 direction on two consecutive days, this is an indication which should be
1 highlighted.  Two-day advances are stored in 'hot' and two-day declines
1 in 'avoid'.
1 
1    The rest of the function is a sanity check.  It counts the number of
1 correct predictions in relation to the total number of predictions one
1 could have made in the year before.
1 
1      function Prediction() {
1        # Predict each ticker symbol by prolonging yesterday's trend
1        for (stock = 1; stock <= StockCount; stock++) {
1          if         (data[1, stock] > data[2, stock]) {
1            predict[stock] = "up"
1          } else if  (data[1, stock] < data[2, stock]) {
1            predict[stock] = "down"
1          } else {
1            predict[stock] = "neutral"
1          }
1          if ((data[1, stock] > data[2, stock]) && (data[2, stock] > data[3, stock]))
1            hot[stock] = 1
1          if ((data[1, stock] < data[2, stock]) && (data[2, stock] < data[3, stock]))
1            avoid[stock] = 1
1        }
1        # Do a plausibility check: how many predictions proved correct?
1        for (s = 1; s <= StockCount; s++) {
1          for (d = 1; d <= datacount-2; d++) {
1            if         (data[d+1, s] > data[d+2, s]) {
1              UpCount++
1            } else if  (data[d+1, s] < data[d+2, s]) {
1              DownCount++
1            } else {
1              NeutralCount++
1            }
1            if (((data[d, s]  > data[d+1, s]) && (data[d+1, s]  > data[d+2, s])) ||
1                ((data[d, s]  < data[d+1, s]) && (data[d+1, s]  < data[d+2, s])) ||
1                ((data[d, s] == data[d+1, s]) && (data[d+1, s] == data[d+2, s])))
1              CorrectCount++
1          }
1        }
1      }
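
   The rule and its plausibility check can also be tried on a single
short time series.  The following is a sketch, not the original code:
the helpers 'Trend()' and 'SuccessRate()' and the closing prices are
made up for illustration, and the series is assumed to be stored most
recent first, as in the array 'data'.

```awk
# Trend(): hypothetical helper that names the direction of a change.
function Trend(newer, older,    diff) {
  diff = newer - older
  if (diff > 0) return "up"
  if (diff < 0) return "down"
  return "neutral"
}

# SuccessRate(): replay the "tomorrow repeats today" rule over
# series[1..n] (most recent first); returns the percentage of days on
# which the predicted direction matched the actual one.
function SuccessRate(series, n,    d, total, correct) {
  for (d = 1; d <= n - 2; d++) {
    total++
    # prediction from the step d+2 -> d+1, checked against d+1 -> d
    if (Trend(series[d+1], series[d+2]) == Trend(series[d], series[d+1]))
      correct++
  }
  return total ? int(100 * correct / total) : 0
}

BEGIN {
  n = split("25 24 23 23 24 22", price, " ")   # made-up closing prices
  print "prediction for today:", Trend(price[1], price[2])
  print "success rate:", SuccessRate(price, n) "%"
}
```

   A smarter algorithm could be dropped in by changing only how the
prediction is derived; the replay loop that scores it can stay as it is.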
1 
1    At this point the hard work has been done: the array 'predict'
1 contains the predictions for all the ticker symbols.  It is up to the
1 function 'Report()' to find some nice words to introduce the desired
1 information.
1 
1      function Report() {
1        # Generate report
1        report =        "\nThis is your daily "
1        report = report "stock market report for "strftime("%A, %B %d, %Y")".\n"
1        report = report "Here are the predictions for today:\n\n"
1        for (stock = 1; stock <= StockCount; stock++)
1          report = report "\t" name[stock] "\t" predict[stock] "\n"
1        for (stock in hot) {
1          if (HotCount++ == 0)
1            report = report "\nThe most promising shares for today are these:\n\n"
1          report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
1            tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
1        }
1        for (stock in avoid) {
1          if (AvoidCount++ == 0)
1            report = report "\nThe stock shares to avoid today are these:\n\n"
1          report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
1            tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
1        }
1        report = report "\nThis sums up to " HotCount+0 " winners and " AvoidCount+0
1        report = report " losers. When using this kind\nof prediction scheme for"
1        report = report " the 12 months which lie behind us,\nwe get " UpCount
1        report = report " 'ups' and " DownCount " 'downs' and " NeutralCount
1        report = report " 'neutrals'. Of all\nthese " UpCount+DownCount+NeutralCount
1        report = report " predictions " CorrectCount " proved correct next day.\n"
1        report = report "A success rate of "\
1                   int(100*CorrectCount/(UpCount+DownCount+NeutralCount)) "%.\n"
1        report = report "Random choice would have produced a 33% success rate.\n"
1        report = report "Disclaimer: Like every other prediction of the stock\n"
1        report = report "market, this report is, of course, complete nonsense.\n"
1        report = report "If you are stupid enough to believe these predictions\n"
1        report = report "you should visit a doctor who can treat your ailment."
1      }
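
   The address of the news page is assembled twice in 'Report()', once
for each list.  As a sketch, the pattern could be wrapped in a small
function; the name 'NewsURL()' is an assumption and does not appear in
the original script.

```awk
# NewsURL(): hypothetical helper wrapping the address pattern used in
# Report(): the first letter of the symbol, then the whole symbol,
# both in lower case.
function NewsURL(symbol,    s) {
  s = tolower(symbol)
  return "http://biz.yahoo.com/n/" substr(s, 1, 1) "/" s ".html"
}

BEGIN {
  print NewsURL("INTC")
  print NewsURL("EK")
}
```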
1 
1    The function 'SendMail()' goes through the list of customers and
1 opens a pipe to the 'mail' command for each of them.  Each one receives
1 an email message with a proper subject heading and is addressed by
1 full name.
1 
1      function SendMail() {
1        # send report to customers
1        customer["uncle.scrooge@ducktown.gov"] = "Uncle Scrooge"
1        customer["more@utopia.org"           ] = "Sir Thomas More"
1        customer["spinoza@denhaag.nl"        ] = "Baruch de Spinoza"
1        customer["marx@highgate.uk"          ] = "Karl Marx"
1        customer["keynes@the.long.run"       ] = "John Maynard Keynes"
1        customer["bierce@devil.hell.org"     ] = "Ambrose Bierce"
1        customer["laplace@paris.fr"          ] = "Pierre Simon de Laplace"
1        for (c in customer) {
1          MailPipe = "mail -s 'Daily Stock Prediction Newsletter' " c
1          print "Good morning " customer[c] "," | MailPipe
1          print report "\n.\n" | MailPipe
1          close(MailPipe)
1        }
1      }
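
   Before sending real messages, it can help to look at the command
lines that would be run.  The sketch below is a dry run that prints each
command instead of opening a pipe to it.  The helper 'MailCommand()' is
hypothetical, and it assumes that the subject contains no single quote
and that the address needs no shell quoting.

```awk
# MailCommand(): hypothetical helper that assembles the command line.
# Note the blank between the quoted subject and the address.
function MailCommand(subject, address) {
  return "mail -s '" subject "' " address
}

BEGIN {
  customer["uncle.scrooge@ducktown.gov"] = "Uncle Scrooge"
  for (c in customer) {
    # Dry run: print the command instead of opening a pipe to it.
    print MailCommand("Daily Stock Prediction Newsletter", c)
    print "Good morning " customer[c] ","
  }
}
```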
1 
1    Be patient when running the script by hand.  Retrieving the data for
1 all the ticker symbols and sending the emails may take several minutes
1 to complete, depending upon network traffic and the speed of the
1 available Internet link.  The quality of the prediction algorithm is
1 likely to be disappointing.  Try to find a better one.  Should you find
1 one with a success rate of more than 50%, please tell us about it!  It
1 is only for the sake of curiosity, of course.  ':-)'
1