
Snatch - Overview

dtd    GUI    IE5 & OLE    subs    XML examples   

These docs are not quite finished yet, and the GUI code in particular is messy and filled with experiments. The XML examples will arrive before the docs.

Snatch now has scheduling primitives, a GUI debugger, drivers for using OLE on IE5, and several other important bits and pieces. The timing stuff means you can theoretically run Snatch as a background program that does cool things. When I find a more elegant way to check if I'm online I'll demo this.
Package: snatch13.zip (30KB) the whole banana

Grabbing Data off the Web with Perl and XML
(and then assembling the bits)

Synopsis:
Using an XML-style config file with Perl code embedded in it, build an HTML page.

<Fudge alert!>
The Perl code is outputting HTML. To avoid driving the programmer crazy, the Snatch XML does not demand escaping little things like "<". Thus it isn't using Real and True (tm) XML. Inserting &amp;s all over the place while you're coding would be horribly messy. So sue me.
</Fudge alert!>

Just as a place to begin, let's say a typical Snatch file will look something like this:
Page Layout Data
item
item
item

And an example item looks like this:
<item>
<name>population</name>
<method>LWP chunk</method>
<url>http://www.census.gov/cgi-bin/ipc/popclockw/</url>
<sub>
    m!<h1>(.+?)</h1>!si;
    $_="<h3>$1 Estimated Population</h3>";
</sub>
</item>
The <sub> elements contain Perl code to extract and format the snatched information. The item is an XML wrapper around this.
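As a rough sketch of how a <sub> body could be applied to fetched text: the snippet below assumes the fetched page is placed in $_ and the body is run through a string eval. The helper name run_sub and the sample HTML are made up for illustration and are not taken from the Snatch source.

```perl
use strict;
use warnings;

# Hypothetical helper (not from Snatch): run a <sub> body against fetched text.
# Assumption: the body reads and rewrites $_, as the population item does.
sub run_sub {
    my ($code, $content) = @_;
    local $_ = $content;            # the sub body works on $_
    eval $code;                     # string eval of the embedded Perl
    die "sub failed: $@" if $@;
    return $_;
}

# The <sub> body from the population item, held as plain text.
my $sub_body = q{
    m!<h1>(.+?)</h1>!si;
    $_ = "<h3>$1 Estimated Population</h3>";
};

# Made-up sample standing in for the fetched page.
my $sample = "<html><body><h1>6,000,000,000</h1></body></html>";

my $result = run_sub($sub_body, $sample);
print "$result\n";
```

Checking $@ after the eval is what makes a broken <sub> report an error instead of silently producing nothing.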

Page layout looks like this:
<page>
<outfile>/perl/gui/reporter.html</outfile>
<head>
    my $time=time;
    my $title="The News for ". localtime($time);
    $_= qq|<html\><head\>
        <title\>$title</title\></head\><body>
        |;
</head>
    <footer>$_= "time-$time. seconds\n"</footer>
</page>
Snatch will go get the data using LWP (or sockets, or whatever you want to code).

Anything not conveniently handled with regexes can call HTML::Parser or XML::Parser for assistance.
Output usually goes to an HTML page. STDOUT is redirected so that print statements go to the output file.
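One common way to get this effect in Perl is to duplicate the STDOUT handle, reopen STDOUT onto the output file, and restore it afterwards. The sketch below only illustrates that technique; it is not lifted from Snatch, and the temporary file stands in for whatever <outfile> names.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical output file; Snatch would use the <outfile> path instead.
my ($tmp_fh, $outfile) = tempfile(SUFFIX => '.html', UNLINK => 1);
close $tmp_fh;

# Save the real STDOUT, then point STDOUT at the output file.
open(my $saved, '>&', \*STDOUT) or die "cannot dup STDOUT: $!";
open(STDOUT, '>', $outfile)     or die "cannot open $outfile: $!";

print "<h3>Estimated Population</h3>\n";   # lands in the file, not the terminal

# Restore STDOUT once the page is written.
close STDOUT;
open(STDOUT, '>&', $saved) or die "cannot restore STDOUT: $!";

# Read the file back to confirm where the print went.
open(my $in, '<', $outfile) or die "cannot read $outfile: $!";
my $written = do { local $/; <$in> };
close $in;
print "file contains: $written";
```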

The main idea here is to keep the useful bits in one convenient location.

You can grab the whole item and plug it into another page with no problems.
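Because each item is self-contained, pulling one out of a config file can be done with a lenient regex rather than a strict XML parser (which would choke on the raw "<" inside <sub>). This is a minimal sketch under that assumption; parse_items, the second "weather" item, and the example.com URL are invented for illustration.

```perl
use strict;
use warnings;

# Made-up config text with two items; the "weather" item is hypothetical.
my $config = <<'END';
<item>
<name>population</name>
<method>LWP chunk</method>
<url>http://www.census.gov/cgi-bin/ipc/popclockw/</url>
<sub> m!<h1>(.+?)</h1>!si; $_="<h3>$1 Estimated Population</h3>"; </sub>
</item>
<item>
<name>weather</name>
<method>LWP chunk</method>
<url>http://example.com/weather</url>
<sub> $_ = "<p>weather goes here</p>"; </sub>
</item>
END

# Hypothetical helper: index items by <name>, keeping the raw text of each
# item so it can be pasted into another page definition unchanged.
sub parse_items {
    my ($text) = @_;
    my %items;
    while ($text =~ m!<item>(.*?)</item>!sg) {
        my $body = $1;
        my %field;
        for my $tag (qw(name method url sub)) {
            ($field{$tag}) = $body =~ m!<$tag>\s*(.*?)\s*</$tag>!s;
        }
        $items{ $field{name} } = { %field, raw => "<item>$body</item>" };
    }
    return \%items;
}

my $items = parse_items($config);
print "$_ => $items->{$_}{url}\n" for sort keys %$items;
```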
<sub> can contain any Perl that will run in an eval, which is apparently anything, and it's only a little harder to debug buried inside there. You can also define a lurl pair, which can reference a page on your hard disk until you get it right. This doesn't necessarily mean the code will work over a live HTTP connection, but it helps.
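The local-copy idea might look something like the sketch below. It assumes lurl simply holds a path to a saved page on disk; that reading, the helper name fetch_content, and the temporary file are assumptions for illustration, and the exact lurl syntax Snatch expects may differ. The live branch uses LWP::UserAgent, loaded only when it is actually needed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: prefer the local copy (lurl) when one is given,
# otherwise fetch the live URL with LWP::UserAgent.
sub fetch_content {
    my (%item) = @_;
    if (defined $item{lurl}) {
        open(my $fh, '<', $item{lurl}) or die "cannot read $item{lurl}: $!";
        local $/;                      # slurp the whole file
        return scalar <$fh>;
    }
    require LWP::UserAgent;            # loaded only for live fetches
    my $res = LWP::UserAgent->new(timeout => 30)->get($item{url});
    die "fetch failed: " . $res->status_line unless $res->is_success;
    return $res->decoded_content;
}

# Made-up saved page standing in for a copy on the hard disk.
my ($fh, $saved_page) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} "<h1>6,000,000,000</h1>\n";
close $fh;

my $content = fetch_content(
    url  => 'http://www.census.gov/cgi-bin/ipc/popclockw/',
    lurl => $saved_page,
);
print $content;
```

Dropping the lurl key switches the same item over to the live fetch, which is when header and encoding differences tend to show up.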
Note the HTML markup in the Perl code. This is actually a very good place to do that sort of thing and especially useful for tables.

BTW, my definition of well-formed XML is that IE5.0 will display it properly, which fits this rather loose approach. More on that at the DTD page.