Saturday, February 05, 2011

NedNews: HTTP

At this point, I would like to set up my database schemas. I'm very familiar with Usenet, so I think I can lay out everything there. However, I am less familiar with the XML that comes in RSS feeds, so I want to sift through some of that before committing to a database layout.

An RSS feed is identified by a URL, and its content is XML delivered over HTTP. We will need to load packages to handle both for us:
package require http
package require tdom


tdom is the package I use for parsing the NewStars XML turn files. It has an excellent XPath system.
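
To get a feel for tdom's XPath support before pointing it at a real feed, here is a minimal sketch. The XML fragment and the variable names are made up for illustration. A real RSS or Atom feed has more structure, and Atom in particular uses an XML namespace, which this example sidesteps.

```tcl
package require tdom

# A hypothetical, hand-written RSS fragment (not taken from any real feed).
set xml {<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>First post</title><link>http://example.com/1</link></item>
  <item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>}

# Parse the text into a DOM tree and grab the root element.
set doc  [dom parse $xml]
set root [$doc documentElement]

# selectNodes evaluates an XPath expression; here it returns the <title>
# element of every <item>, and "text" pulls out each node's text content.
set titles {}
foreach node [$root selectNodes {//item/title}] {
    lappend titles [$node text]
}
puts $titles

# Free the DOM tree when finished with it.
$doc delete
```

The channel's own title is skipped because the XPath expression only matches titles that sit directly under an item.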

We'll need a URL to fetch the data from (this should come from a subscription list, but we will hack it in for now):
set url "http://semipublic.comp-arch.net/wiki/index.php?title=Special:RecentChanges&feed=atom"


This is the list of recent edits from Andy Glew's informative Comp.Arch wiki.

Now, let's fetch some data. Things are somewhat complicated by the asynchronous nature of HTTP (we fire off a request, then some period of time later, we get data). So, we will need a callback to handle the eventual arrival of data. Ideally, we would handle multiple outstanding requests, but let's deal with one right now...
set httpData ""


This is a global variable which will store the result of our http request (this is why we can only handle one request at a time).

And the callback:
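proc httpDone {token} {
    # Callback invoked by ::http::geturl once the request has finished.
    set fail 0
    switch [::http::status $token] {
        ok {
            # The transaction completed, but the server may still have refused us.
            if {[::http::ncode $token] != 200} {
                puts "HTTP server returned non-OK code [::http::ncode $token]"
                set fail 1
            }
        }

        eof {
            puts "HTTP server returned nothing"
            set fail 1
        }

        error {
            puts "HTTP error [::http::error $token]"
            set fail 1
        }

        default {
            # Covers the remaining states (timeout, reset), which carry no usable data.
            puts "HTTP request ended with status [::http::status $token]"
            set fail 1
        }
    }

    if {!$fail} {
        set ::httpData [::http::data $token]
    }
    ::http::cleanup $token
    puts "HTTP Done"
}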

The handler is complicated by how the HTTP package decided to handle errors and odd conditions. Ideally, it would roll bad "ok" states into "error" (and probably the "eof" condition as well), but there it is. Again, ideally, we would handle what errors we could (possibly with a retry) and notify the user in a better way (but I am the main user!).
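
One way to get closer to that ideal is to fold the separate cases into a single yes/no answer before doing anything else. The sketch below uses a hypothetical helper, httpSucceeded, which is not part of the http package. It takes the values that ::http::status and ::http::ncode would return and treats anything other than an "ok" status with a 200 code as a failure.

```tcl
# Hypothetical helper: collapse the http package's status/code pair into a boolean.
# status is what [::http::status $token] returns; ncode is [::http::ncode $token].
proc httpSucceeded {status ncode} {
    switch -- $status {
        ok      { return [expr {$ncode == 200}] }
        default { return 0 }
    }
}

# Inside a callback it could be used along these lines:
#   if {[httpSucceeded [::http::status $token] [::http::ncode $token]]} { ... }
puts [httpSucceeded ok 200]     ;# successful fetch
puts [httpSucceeded ok 404]     ;# transaction finished, but the server said no
puts [httpSucceeded eof ""]     ;# connection closed without a reply
```

This deliberately throws away the reason for the failure, so a callback that wants to print a useful message would still need to look at the status itself.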

Here we can see Tcl's switch statement. Pretty straightforward. The first argument provides the string that will be matched. The body consists of pairs: the first element of each pair is the string to match, and the second is the code to execute on a match. There is no provision for fall-through, although you can handle multiple patterns with a single body by giving "-" as the body of all but the last:
switch $status {
    eof -
    error { puts "Not good" }
}
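
To see the shared-body form in action, here is a small self-contained sketch. The describeStatus proc and its messages are invented for this example. The point is only that a body of - means "use the next pattern's body", and that default catches everything else.

```tcl
# Hypothetical example proc; the status names mirror those used by the http package.
proc describeStatus {status} {
    switch -- $status {
        ok      { return "Good" }
        eof     -
        error   { return "Not good" }
        default { return "Unknown" }
    }
}

foreach s {ok eof error timeout} {
    puts "$s: [describeStatus $s]"
}
```

The -- after switch marks the end of options, which matters if the string being matched could ever start with a dash.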

Now we are set to fire off a request:
set httpToken [::http::geturl $url -timeout 5000 -command httpDone]
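
Because of -command, geturl returns right away, and the callback only runs once the event loop gets a chance to dispatch it. wish runs the event loop all the time; a plain tclsh script would have to enter it explicitly, typically with vwait. The sketch below is an assumption about how one might do that. To keep it runnable without a network, a timer (after) stands in for the HTTP request, and the names fakeDone, result, and requestFinished are made up.

```tcl
# Hypothetical stand-in for the HTTP callback: it records a result and
# sets a flag variable that the main script is waiting on.
proc fakeDone {} {
    set ::result "pretend feed data"
    set ::requestFinished 1
}

set ::requestFinished 0

# Schedule the "reply" 100 ms from now; a real script would instead call
#   ::http::geturl $url -timeout 5000 -command httpDone
# and have the callback set ::requestFinished as its last step.
after 100 fakeDone

# Enter the event loop until the flag variable is written.
vwait ::requestFinished
puts "Got: $::result"
```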

I punch all this right into my wish session. Then I wait for some output... My first attempt used the "-channel" option, but it appears that requires an already open channel - so the command failed. I can rewrite my handler to not use the channel, paste it into my session, and reissue the command, without restarting (if you create a proc that matches an existing proc, the old one is overwritten).

Which reminds me. proc is just another command in Tcl. It takes three arguments: the name of the procedure, the arguments, and the body. Any Tcl command can be overridden this way (although it is best to rename the original to something else so you can still get to it). I use this to trap the exit functionality:
rename exit origExit
proc exit {{returnCode 0}} {
    # closeout some resources and cleanup some stuff
    # pass the caller's return code through to the real exit
    origExit $returnCode
}


The second time I get:
HTTP Done
