gawkinet: STOXPRED

3.9 STOXPRED: Stock Market Prediction As A Service
==================================================

     Far out in the uncharted backwaters of the unfashionable end of the
     Western Spiral arm of the Galaxy lies a small unregarded yellow
     sun.

     Orbiting this at a distance of roughly ninety-two million miles is
     an utterly insignificant little blue-green planet whose
     ape-descended life forms are so amazingly primitive that they
     still think digital watches are a pretty neat idea.

     This planet has -- or rather had -- a problem, which was this: most
     of the people living on it were unhappy for pretty much of the
     time.  Many solutions were suggested for this problem, but most of
     these were largely concerned with the movements of small green
     pieces of paper, which is odd because it wasn't the small green
     pieces of paper that were unhappy.
          Douglas Adams, 'The Hitch Hiker's Guide to the Galaxy'

Valuable services on the Internet are usually _not_ implemented as
mobile agents.  There are much simpler ways of implementing services.
All Unix systems provide, for example, the 'cron' service.  Users can
write a list of tasks to be done each day, each week, twice a day, or
just once.  The list is entered into a file named 'crontab'.  For
example, to distribute a newsletter on a daily basis this way, have
'cron' call a script early each morning.

     # run at 8 am on weekdays, distribute the newsletter
     0 8 * * 1-5   $HOME/bin/daily.job >> $HOME/log/newsletter 2>&1

The script first looks for interesting information on the Internet,
assembles it in a nice form, and sends the results via email to the
customers.

The following is an example of a primitive newsletter on stock market
prediction.  It is a report which first tries to predict the direction
in which each share in the Dow Jones Industrial Index will move on that
particular day.  Then it mentions some especially promising shares as
well as some shares which look remarkably bad on that day.  The report
ends with the usual disclaimer which tells every child _not_ to try
this at home and hurt anybody.

     Good morning Uncle Scrooge,

     This is your daily stock market report for Monday, October 16, 2000.
     Here are the predictions for today:

             AA      neutral
             GE      up
             JNJ     down
             MSFT    neutral
             ...
             UTX     up
             DD      down
             IBM     up
             MO      down
             WMT     up
             DIS     up
             INTC    up
             MRK     down
             XOM     down
             EK      down
             IP      down

     The most promising shares for today are these:

             INTC            http://biz.yahoo.com/n/i/intc.html

     The stock shares to avoid today are these:

             EK              http://biz.yahoo.com/n/e/ek.html
             IP              http://biz.yahoo.com/n/i/ip.html
             DD              http://biz.yahoo.com/n/d/dd.html
             ...

The script as a whole is rather long.  In order to ease the pain of
studying other people's source code, we have broken the script up into
meaningful parts which are invoked one after the other.  The basic
structure of the script is as follows:

     BEGIN {
       Init()
       ReadQuotes()
       CleanUp()
       Prediction()
       Report()
       SendMail()
     }

The earlier parts store data in variables and arrays which are
subsequently used by later parts of the script.  The 'Init()' function
first checks whether the script was invoked correctly (without any
parameters).  If not, it informs the user of the correct usage.  What
follows are preparations for the retrieval of the historical quote data.
The ticker symbols of the 30 stocks are stored in the array 'name', and
the current date is stored in the variables 'day', 'month', and 'year'.

Users who are separated from the Internet by a firewall and must
direct their Internet accesses to a proxy have to supply the name of
the proxy to this script with the '-v Proxy=NAME' option and, if the
proxy does not listen on port 80, its port with '-v ProxyPort=PORT'.
For all other users, the defaults should suffice: the script then
connects directly to 'chart.yahoo.com' on port 80.

     function Init() {
       if (ARGC != 1) {
         print "STOXPRED - daily stock share prediction"
         print "IN:\n    no parameters, nothing on stdin"
         print "PARAM:\n    -v Proxy=MyProxy -v ProxyPort=80"
         print "OUT:\n    commented predictions as email"
         print "JK 09.10.2000"
         exit
       }
       # Remember ticker symbols from Dow Jones Industrial Index
       StockCount = split("AA GE JNJ MSFT AXP GM JPM PG BA HD KO \
         SBC C HON MCD T CAT HWP MMM UTX DD IBM MO WMT DIS INTC \
         MRK XOM EK IP", name);
       # Remember the current date as the end of the time series
       day   = strftime("%d")
       month = strftime("%m")
       year  = strftime("%Y")
       if (Proxy     == "")  Proxy     = "chart.yahoo.com"
       if (ProxyPort ==  0)  ProxyPort = 80
       YahooData = "/inet/tcp/0/" Proxy "/" ProxyPort
     }

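The last three lines of 'Init()' decide where the TCP connection goes.
The following is a minimal sketch of that step in isolation.  The
helper 'ServiceName()' and the host name 'proxy.example.com' are
assumptions made for illustration; they do not appear in the script.

```awk
# ServiceName() is a hypothetical helper, not part of the original script.
# It returns the gawk special file name for a TCP client connection:
#   /inet/tcp/LOCALPORT/REMOTEHOST/REMOTEPORT
# A local port of 0 lets the system pick any free port.
function ServiceName(host, port) {
    if (host == "") host = "chart.yahoo.com"   # same default as Init()
    if (port == 0)  port = 80                  # same default as Init()
    return "/inet/tcp/0/" host "/" port
}

BEGIN {
    print ServiceName("", 0)                       # no proxy given
    print ServiceName("proxy.example.com", 8080)   # assumed proxy name
}
```

The first line printed is the name the script uses when neither
'-v Proxy=NAME' nor '-v ProxyPort=PORT' is given.
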
There are two really interesting parts in the script.  One is the
function which reads the historical stock quotes from an Internet
server.  The other is the one that does the actual prediction.  The
following function shows how the quotes are read from the Yahoo server.
The data which comes from the server is in CSV format (comma-separated
values):

     Date,Open,High,Low,Close,Volume
     9-Oct-00,22.75,22.75,21.375,22.375,7888500
     6-Oct-00,23.8125,24.9375,21.5625,22,10701100
     5-Oct-00,24.4375,24.625,23.125,23.50,5810300

Each line contains the values for one trading day; the columns are
separated by commas and contain the kind of data described in the
header (first) line.  First, 'gawk' is instructed to separate columns
by commas ('FS = ","').  In the loop that follows, a connection to the
Yahoo server is first opened, then a download takes place, and finally
the connection is closed.  All this happens once for each ticker
symbol.  In the body of this loop, an Internet address is built up as a
string according to the rules of the Yahoo server.  The starting and
ending dates have the same day and month; the starting date lies
exactly one year before the ending date, which is today.  All the
action is initiated by the 'printf' command which transmits the request
for data to the Yahoo server.

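The way the address is assembled can be tried on its own.  The
following is a minimal sketch with a hypothetical helper 'QuoteURL()'.
The meaning of the parameters ('a', 'b', 'c' for the starting month,
day, and year; 'd', 'e', 'f' for the ending date) is inferred from the
script, not from any server documentation, and every parameter is
preceded by an '&'.

```awk
# QuoteURL() is a hypothetical helper, not part of the original script.
# It builds the request address for one ticker symbol, covering the
# year that ends on the given date.
function QuoteURL(symbol, day, month, year) {
    return "http://chart.yahoo.com/table.csv?s=" symbol \
           "&a=" month "&b=" day "&c=" (year - 1) \
           "&d=" month "&e=" day "&f=" year \
           "&g=d&q=q&y=0&z=" symbol "&x=.csv"
}

BEGIN {
    # The date is passed in explicitly so the result is reproducible.
    print QuoteURL("IBM", "16", "10", 2000)
}
```

The parentheses around 'year - 1' are not strictly needed, since
subtraction binds more tightly than string concatenation, but they make
the intent obvious.
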
In the inner loop, the server's response is read and scanned line by
line.  Only lines which have six columns and the name of a month in the
first column contain relevant data.  This data is stored in the
two-dimensional array 'quote'; one dimension being time, the other
being the ticker symbol.  During retrieval of the first stock's data,
the calendar dates of the time instances are stored in the array 'days'
because we need them later.

     function ReadQuotes() {
       # Retrieve historical data for each ticker symbol
       FS = ","
       for (stock = 1; stock <= StockCount; stock++) {
         # Request one year of daily quotes, ending today
         URL = "http://chart.yahoo.com/table.csv?s=" name[stock] \
               "&a=" month "&b=" day   "&c=" year-1 \
               "&d=" month "&e=" day   "&f=" year \
               "&g=d&q=q&y=0&z=" name[stock] "&x=.csv"
         printf("GET %s HTTP/1.0\r\n\r\n", URL) |& YahooData
         while ((YahooData |& getline) > 0) {
           # Keep only data rows: six columns, a date in the first one
           if (NF == 6 && $1 ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/) {
             if (stock == 1)
               days[++daycount] = $1
             quote[$1, stock] = $5    # closing price
           }
         }
         close(YahooData)
       }
       FS = " "
     }

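The filtering rule in the inner loop can be tried without a network
connection.  The following is a minimal sketch that feeds two of the
sample CSV lines through a hypothetical helper 'ParseLine()'; the
helper and the array name 'closing' are assumptions made for
illustration.

```awk
# ParseLine() is a hypothetical helper for offline experiments; the
# script itself performs the same test inline while reading from the
# network.  It splits one CSV line and, if the line is a data row,
# stores the closing price (column 5) under the row's date (column 1).
function ParseLine(line, close_by_date,    f, n) {
    n = split(line, f, ",")
    if (n == 6 && f[1] ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/) {
        close_by_date[f[1]] = f[5]
        return 1
    }
    return 0          # CSV header, HTTP header, or other noise
}

BEGIN {
    ParseLine("Date,Open,High,Low,Close,Volume", closing)
    ParseLine("9-Oct-00,22.75,22.75,21.375,22.375,7888500", closing)
    ParseLine("6-Oct-00,23.8125,24.9375,21.5625,22,10701100", closing)
    for (date in closing)
        print date, closing[date]
}
```

The header line is rejected because its first column contains no month
name, so only the two data rows end up in 'closing'.  The order in
which 'for (date in closing)' visits them is not specified.
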
Now that we _have_ the data, it can be checked once again to make sure
that no individual stock is missing or invalid, and that all the stock
quotes are aligned correctly.  Days for which the quote of any stock is
missing are dropped.  Furthermore, we renumber the time instances.  The
most recent day gets day number 1 and all other days get consecutive
numbers.  All quotes are rounded to the nearest whole number of US
dollars.

     function CleanUp() {
       # clean up time series; eliminate incomplete data sets
       for (d = 1; d <= daycount; d++) {
         # Leave the loop early (with an oversized counter) as soon as
         # one stock has no quote for this day
         for (stock = 1; stock <= StockCount; stock++)
           if (! ((days[d], stock) in quote))
             stock = StockCount + 10
         if (stock > StockCount + 1)
           continue
         datacount++
         # Round each quote to the nearest whole dollar
         for (stock = 1; stock <= StockCount; stock++)
           data[datacount, stock] = int(0.5 + quote[days[d], stock])
       }
       delete quote
       delete days
     }

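To see what this step does, it helps to run it on a handful of made-up
quotes.  The following is a minimal sketch with two stocks and three
days; all values are invented for illustration.  It uses a flag
variable named 'complete' (an assumption, not in the script) instead of
the oversized loop counter.

```awk
# Toy data: two stocks, three days, most recent day first.
# Stock 2 has no quote for 6-Oct-00, so that day must be dropped.
BEGIN {
    StockCount = 2
    daycount = split("9-Oct-00 6-Oct-00 5-Oct-00", days, " ")
    quote["9-Oct-00", 1] = 22.375; quote["9-Oct-00", 2] = 40.5
    quote["6-Oct-00", 1] = 22
    quote["5-Oct-00", 1] = 23.50;  quote["5-Oct-00", 2] = 41.4

    for (d = 1; d <= daycount; d++) {
        complete = 1
        for (stock = 1; stock <= StockCount; stock++)
            if (! ((days[d], stock) in quote))
                complete = 0
        if (! complete)
            continue               # skip days with a missing quote
        datacount++                # renumber the remaining days
        for (stock = 1; stock <= StockCount; stock++)
            data[datacount, stock] = int(0.5 + quote[days[d], stock])
    }
    for (d = 1; d <= datacount; d++)
        print d, data[d, 1], data[d, 2]
}
```

Two days survive.  Day 1 holds the rounded quotes 22 and 41, and day 2
holds 24 and 41.
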
Now we have arrived at the second really interesting part of the whole
affair.  What we present here is a very primitive prediction algorithm:
_If a stock fell yesterday, assume it will also fall today; if it rose
yesterday, assume it will rise today_.  (Feel free to replace this
algorithm with a smarter one.)  If a stock changed in the same
direction on two consecutive days, this is an indication which should
be highlighted.  Two-day advances are stored in 'hot' and two-day
declines in 'avoid'.

The rest of the function is a sanity check.  It counts how many
predictions would have proved correct, relative to the total number of
predictions one could have made during the preceding year.

     function Prediction() {
       # Predict each ticker symbol by prolonging yesterday's trend
       for (stock = 1; stock <= StockCount; stock++) {
         if         (data[1, stock] > data[2, stock]) {
           predict[stock] = "up"
         } else if  (data[1, stock] < data[2, stock]) {
           predict[stock] = "down"
         } else {
           predict[stock] = "neutral"
         }
         if ((data[1, stock] > data[2, stock]) && (data[2, stock] > data[3, stock]))
           hot[stock] = 1
         if ((data[1, stock] < data[2, stock]) && (data[2, stock] < data[3, stock]))
           avoid[stock] = 1
       }
       # Do a plausibility check: how many predictions proved correct?
       for (s = 1; s <= StockCount; s++) {
         for (d = 1; d <= datacount-2; d++) {
           if         (data[d+1, s] > data[d+2, s]) {
             UpCount++
           } else if  (data[d+1, s] < data[d+2, s]) {
             DownCount++
           } else {
             NeutralCount++
           }
           if (((data[d, s]  > data[d+1, s]) && (data[d+1, s]  > data[d+2, s])) ||
               ((data[d, s]  < data[d+1, s]) && (data[d+1, s]  < data[d+2, s])) ||
               ((data[d, s] == data[d+1, s]) && (data[d+1, s] == data[d+2, s])))
             CorrectCount++
         }
       }
     }

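The trend rule is easy to try on a few numbers.  The following is a
minimal sketch with a hypothetical helper 'Trend()' that applies the
rule to three consecutive closing prices, most recent first.  The
helper and the prices are assumptions made for illustration.

```awk
# Trend() is a hypothetical helper, not part of the original script.
# c1 is the most recent close, c2 the one before it, c3 the one before
# that.  The result names the prediction and, where it applies, the
# two-day highlight.
function Trend(c1, c2, c3,    result) {
    if (c1 > c2)      result = "up"
    else if (c1 < c2) result = "down"
    else              result = "neutral"
    if (c1 > c2 && c2 > c3) result = result " (hot)"
    if (c1 < c2 && c2 < c3) result = result " (avoid)"
    return result
}

BEGIN {
    print Trend(24, 23, 22)   # rose on both days
    print Trend(22, 23, 23)   # fell on the last day only
    print Trend(22, 22, 25)   # unchanged on the last day
}
```

This prints 'up (hot)', 'down', and 'neutral'.  A smarter algorithm
would only have to replace the body of such a function.
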
At this point the hard work has been done: the array 'predict'
contains the predictions for all the ticker symbols.  It is up to the
function 'Report()' to find some nice words to introduce the desired
information.

     function Report() {
       # Generate report
       report = "\nThis is your daily "
       report = report "stock market report for " strftime("%A, %B %d, %Y") ".\n"
       report = report "Here are the predictions for today:\n\n"
       for (stock = 1; stock <= StockCount; stock++)
         report = report "\t" name[stock] "\t" predict[stock] "\n"
       for (stock in hot) {
         if (HotCount++ == 0)
           report = report "\nThe most promising shares for today are these:\n\n"
         report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
           tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
       }
       for (stock in avoid) {
         if (AvoidCount++ == 0)
           report = report "\nThe stock shares to avoid today are these:\n\n"
         report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
           tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
       }
       # Total number of predictions checked by Prediction()
       Total = UpCount + DownCount + NeutralCount
       report = report "\nThis sums up to " HotCount+0 " winners and " AvoidCount+0
       report = report " losers. When using this kind\nof prediction scheme for"
       report = report " the 12 months which lie behind us,\nwe get " UpCount+0
       report = report " 'ups' and " DownCount+0 " 'downs' and " NeutralCount+0
       report = report " 'neutrals'. Of all\nthese " Total
       report = report " predictions " CorrectCount+0 " proved correct next day.\n"
       # Avoid a division by zero if no data could be retrieved
       report = report "A success rate of " \
         (Total > 0 ? int(100*CorrectCount/Total) : 0) "%.\n"
       report = report "Random choice would have produced a 33% success rate.\n"
       report = report "Disclaimer: Like every other prediction of the stock\n"
       report = report "market, this report is, of course, complete nonsense.\n"
       report = report "If you are stupid enough to believe these predictions\n"
       report = report "you should visit a doctor who can treat your ailment."
     }

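The success rate quoted near the end of the report follows from the
counters that 'Prediction()' fills.  The following is a minimal sketch
of the same bookkeeping for a single made-up series.  The helpers
'Sign()' and 'SuccessRate()' are assumptions made for illustration.

```awk
# Sign() and SuccessRate() are hypothetical helpers, not part of the
# original script.  series[1] is the most recent close and n is the
# number of entries.  The prediction for day d repeats the move from
# day d+2 to day d+1; it counts as correct if the move from day d+1
# to day d had the same direction (up, down, or unchanged).
function Sign(x) { return (x > 0) - (x < 0) }

function SuccessRate(series, n,    d, total, correct) {
    for (d = 1; d <= n - 2; d++) {
        total++
        if (Sign(series[d] - series[d+1]) == Sign(series[d+1] - series[d+2]))
            correct++
    }
    if (total == 0)
        return -1                  # not enough data for one prediction
    return int(100 * correct / total)
}

BEGIN {
    n = split("25 24 23 23 24 22", closes, " ")
    print SuccessRate(closes, n) "%"
}
```

With these six prices, four predictions can be checked and one of them
is correct, so the sketch prints '25%'.
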
The function 'SendMail()' goes through the list of customers and opens
a pipe to the 'mail' command for each of them.  Each one receives an
email message with a proper subject heading and is addressed by full
name.

     function SendMail() {
       # send report to customers
       customer["uncle.scrooge@ducktown.gov"] = "Uncle Scrooge"
       customer["more@utopia.org"           ] = "Sir Thomas More"
       customer["spinoza@denhaag.nl"        ] = "Baruch de Spinoza"
       customer["marx@highgate.uk"          ] = "Karl Marx"
       customer["keynes@the.long.run"       ] = "John Maynard Keynes"
       customer["bierce@devil.hell.org"     ] = "Ambrose Bierce"
       customer["laplace@paris.fr"          ] = "Pierre Simon de Laplace"
       for (c in customer) {
         # The blank after the quoted subject separates it from the address
         MailPipe = "mail -s 'Daily Stock Prediction Newsletter' " c
         print "Good morning " customer[c] "," | MailPipe
         print report "\n.\n" | MailPipe
         close(MailPipe)
       }
     }

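Before sending real mail, it can be useful to look at the commands the
loop would run.  The following is a minimal sketch of such a dry run.
The helper 'MailCommand()' is an assumption made for illustration, and
it assumes that the subject contains no single quotes.

```awk
# MailCommand() is a hypothetical helper, not part of the original
# script.  It builds the shell command used for the pipe and keeps a
# blank between the quoted subject and the recipient address.
function MailCommand(subject, address) {
    return "mail -s '" subject "' " address
}

BEGIN {
    customer["uncle.scrooge@ducktown.gov"] = "Uncle Scrooge"
    customer["marx@highgate.uk"]           = "Karl Marx"
    for (c in customer) {
        # Dry run: print the command instead of opening a pipe to it.
        print MailCommand("Daily Stock Prediction Newsletter", c)
        print "  Good morning " customer[c] ","
    }
}
```

Replacing the first 'print' with a pipe, as 'SendMail()' does, turns
the dry run into the real thing.
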
Be patient when running the script by hand.  Retrieving the data for
all the ticker symbols and sending the emails may take several minutes
to complete, depending upon network traffic and the speed of the
available Internet link.  The quality of the prediction algorithm is
likely to be disappointing.  Try to find a better one.  Should you find
one with a success rate of more than 50%, please tell us about it!  It
is only for the sake of curiosity, of course.  ':-)'
