Snatch - Working Code examples


Snatch XML Samples

These work for me. They should work for you also, assuming a dozen things don't go wrong.


The LWP method comes in chunk, all, and file variations. LWP chunk and plain LWP are equivalent because chunk is the default.
The data turns up in $_ ready for the <sub> to massage it.
Note: when chunking, make sure the <sub> returns false until you have what you want, or it won't work right!
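To make the chunking rule concrete, here is a small self-contained Perl model of it. It is not Snatch's code: the driver run_chunked and the sample chunks are hypothetical, and the model assumes (as the note implies) that each new chunk is appended to $_ and the <sub> is called again until it returns something true.

```perl
use strict;
use warnings;

# Hypothetical driver (not Snatch's): append each chunk to $_ and call the
# handler, stopping at the first true return value.
sub run_chunked {
    my ($handler, @chunks) = @_;
    local $_ = '';
    for my $chunk (@chunks) {
        $_ .= $chunk;
        my $result = $handler->();
        return $result if $result;    # handler has what it wants
    }
    return;                           # ran out of data without a result
}

# A handler in the spirit of the weather sub: false until the end marker arrives.
my $handler = sub {
    return unless m/<!--Begin-->\s*(.*?)\s*<!--End-->/s;
    return "<TR><TD>$1</TD></TR>";
};

# The end marker is split across two chunks, so only the third call succeeds.
my $row = run_chunked($handler, "junk <!--Begin--> Sun", "ny, 34C <!--E", "nd--> more");
print "$row\n";   # prints <TR><TD>Sunny, 34C</TD></TR>
```

If the handler returned true too early (say, as soon as it saw the begin marker), the driver would stop with half a forecast, which is exactly what the note warns about.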

This example gets the weather from Yahoo. I grab four of these for different locations to get a picture of the whole country. Each result comes wrapped in a <TR> row, ready to insert into a table.

Weather from Yahoo

<item>
<name>weatherBKK</name>
<method>LWP chunk</method>
<url></url>
<sub>
my $rebegin='<!--Begin Extended Forecast-->';  # Yahoo uses these markers,
my $reend='<!--End forecast table-->';         # making extraction easy
my $s;
if (m/\Q$rebegin\E\s*(.*?)\s*\Q$reend\E/is) {  # no match until the end marker has arrived
    $s=$1;
    $s=~ s!/?graphics/new_icons/!!sg;
    $s= "<TR><TD><b>Bangkok</b></TD><TD>" . $s . "</TD></TR>";
}
$s;   # stays false until the whole forecast is in
</sub>
</item>
<method>LWP all</method> is available for shorter pages. It grabs everything in one shot and returns it in a scalar. There is also <method>LWP file=/perl/somefile.htm</method>, which returns only the filename (or an error message if something went wrong). The Perl sub must open the file to do anything useful with it, unless you only want to store the file somewhere, like this example that fetches a GIF.

Bank Exchange Rates

<item>
<name>BangkokBank</name>
<method>LWP file=/localweb/images/bankrates.gif</method>
<url></url>
<sub>$_</sub>
</item>
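With LWP file the sub only gets a filename (or an error message) in $_, so it has to open the file itself. Below is a minimal sketch of that check-then-open step. The helper name read_snatched is hypothetical, and it assumes that anything in $_ that is not an existing file is the error message.

```perl
use strict;
use warnings;

# Hypothetical helper: given what LWP file left in $_, return the file's
# contents, or pass the error message through with an "Error: " prefix.
sub read_snatched {
    my ($name) = @_;
    return "Error: $name" unless -f $name;   # assumption: not a file means an error message
    open(my $fh, '<:raw', $name) or return "Error: cannot open $name: $!";
    local $/;                                # slurp mode: read the whole file at once
    my $data = <$fh>;
    close($fh);
    return $data;
}
```

Inside a <sub> you would call it as read_snatched($_) and then work on the returned data; the :raw layer keeps binary files such as the GIF intact.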

RSS News Feeds via LWP

RSS is very simple. An item looks like this:
     <title>O'Reilly Labs Review: Object Design's eXcelon 1.1</title>
     <description>Jon Udell takes a look at eXcelon...(SNIP)</description>
You can use Eisenzopf's XML::RSS to pull the data out, but our Small Parser handles this easily once you prepend a DTD declaring the title, link, and description elements to the top of the feed. The rest of the code uses a small here-doc to format each item as a table row.
RSS has more information available, but the essentials are simple and portable.
<item>
<name>MotherOfPerl</name>
<method>LWP all</method>
<url></url>
<sub><!--
my $dtd=<<"DTD";
<?xml version="1.0"?>
<!DOCTYPE rss [
<!ELEMENT item (title, link, description)>
]>
DTD
my $s= $dtd . $_;                # put the DTD in front of the feed
my @ll=parseXML($s,"item");      # one hash ref per <item>
die "Parse Failed" if not @ll;
my ($temp, $text, $bg, $bgcolor);
print "<BR><A HREF=\"\">MotherOfPerl RSS feed</A>\n";
print "<table>\n";
foreach $temp (@ll) {
    # alternate the row colours
    if ($bg) { $bgcolor='#CCFF99'; $bg=0; } else { $bgcolor='#CCFFCC'; $bg=1; }
    $text= <<"ARTICLE";
<tr bgcolor="$bgcolor">
<td><font size='-1'>$temp->{'title'}</font></td>
<td><font size='-1'><A HREF="$temp->{'link'}">$temp->{'title'}</a></font></td>
<td><font size='-1' color='teal'>$temp->{'description'}</font></td></tr>
ARTICLE
    print $text;
}
print "</table>\n";
undef;
--></sub>
</item>
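If you would rather not rely on parseXML (or XML::RSS), the job the MotherOfPerl sub does can be approximated with plain regexes. This is a swapped-in technique, not Snatch's parser: rss_items and rss_rows are hypothetical names, and the regexes ignore CDATA sections and entities, so treat it as a sketch for tidy feeds only.

```perl
use strict;
use warnings;

# Rough, hypothetical stand-in for parseXML($s, "item"): pull each <item>
# element and its title/link/description children out with regexes.
# Not a real XML parser: no CDATA, entities, or nested items.
sub rss_items {
    my ($xml) = @_;
    my @items;
    while ($xml =~ m{<item\b[^>]*>(.*?)</item>}gs) {
        my $body = $1;
        my %item;
        for my $tag (qw(title link description)) {
            ($item{$tag}) = $body =~ m{<$tag\b[^>]*>\s*(.*?)\s*</$tag>}s;
            $item{$tag} = '' unless defined $item{$tag};
        }
        push @items, \%item;
    }
    return @items;
}

# Format items as table rows with alternating background colours,
# starting with #CCFFCC as the sub above does.
sub rss_rows {
    my @rows;
    my $bg = 0;
    for my $item (@_) {
        my $bgcolor = $bg ? '#CCFF99' : '#CCFFCC';
        $bg = !$bg;
        push @rows, sprintf('<tr bgcolor="%s"><td><a href="%s">%s</a></td><td>%s</td></tr>',
                            $bgcolor, @$item{qw(link title description)});
    }
    return join("\n", @rows);
}
```

The DTD approach in the sub keeps the parsing in one place; the regex version trades correctness on messy feeds for having no dependencies at all.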

Bank Rates with Sockets

LWP doesn't handle everything. Here's some sample socket code.
<item>
<name>BankRates</name>
<sub><!--
use IO::Socket;
my $base="THB";
my @quotes=("USD","MYR","GBP","AUD");
$s='';
foreach (@quotes) {
    $s= $s . sprintf "%s %s ", $_, fxp($base,$_);
}
$_="<b>Thai Baht to $s</b>";

# Ask the rate server for one base/quote currency pair and return the rate.
sub fxp {
    my ($base,$quotecurrency) = @_;
    my ($request_string,$rate,$reply,$len);
    my $conn=IO::Socket::INET->new(
        PeerAddr => "",
        PeerPort => 5011,
        Proto    => 'tcp');
    die "Couldn't connect to host\n" unless $conn;
    $request_string="fxp/1.1\nbasecurrency: $base\nquotecurrency: $quotecurrency\n\n";
    $len = length($request_string);
    unless (syswrite($conn,$request_string,$len) == $len) {
        print " closed connection\n";
        $conn->close();
        die "No connection to\n";
    }
    while ($reply=<$conn>) {
        if ($reply=~/^\d+\.+\d*/) {       # first line that looks like a rate
            ($rate=$reply) =~ s/\s+$//;   # drop the line ending
            last;
        }
    }
    $conn->close();
    $rate;
}
--></sub>
</item>
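The exchange in fxp is just a few header lines out and a scan of the reply for the first line that looks like a number. Here are those two steps pulled out into hypothetical helpers (fxp_request, fxp_rate) so they can be tried without a server. The line format is copied from the code above, not from any fxp specification.

```perl
use strict;
use warnings;

# Hypothetical helper: build the request as the socket code does, a version
# line, two header lines, and a blank line to end the request.
sub fxp_request {
    my ($base, $quote) = @_;
    return "fxp/1.1\nbasecurrency: $base\nquotecurrency: $quote\n\n";
}

# Hypothetical helper: read lines from any filehandle and return the number
# at the start of the first line that looks like a rate (e.g. 35.12),
# without the line ending. Returns undef if no such line arrives.
sub fxp_rate {
    my ($fh) = @_;
    while (my $reply = <$fh>) {
        return $1 if $reply =~ /^(\d+\.\d*)/;
    }
    return;
}
```

Because fxp_rate takes any filehandle, it can read from the IO::Socket::INET connection or from an in-memory string when testing.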

Set the PC Clock

I like using my ISP for this one. It's usually faster, and close enough unless you need to sync to a particular host.
<item>
<name>SetTime</name>
<sub>
use Net::Time qw(inet_time);   # set the PC clock
my $host="";
my $t=inet_time($host,'udp');
die "No time from the time server\n" unless defined $t;
my ($sec, $min, $hour) = (localtime($t))[0,1,2];
my ($old_sec, $old_min, $old_hour) = (localtime(time))[0,1,2];
my $min_diff = $min - $old_min;
my $sec_diff = 0;
$hour = $old_hour;             # keep the local hour, adjust minutes and seconds only
if (abs($min_diff) < 30) {     # if more than 30 minutes out, set it by hand!
    $sec_diff = 60*(60*$hour + $min) + $sec - (60*(60*$old_hour + $old_min) + $old_sec);
    if (abs($sec_diff) > 3) {  # set if more than 3 seconds off either way
        my $time_new = "$hour:$min:$sec";
        my $rc=system("time $time_new");
    }
}
$timestr=sprintf("%s %02d:%02d:%02d %+d",substr($now,0,16),$hour,$min,$sec,$sec_diff);
</sub>
</item>
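The clock logic boils down to: keep the local hour, compare minutes and seconds, leave it alone if they are 30 or more minutes apart, and reset it if they are more than 3 seconds apart in either direction. Here is a pure-function sketch of that decision. The name clock_adjust is hypothetical, and the sketch assumes both times share the same hour, as the SetTime sub does. It lets you try edge cases without touching the system clock.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the SetTime logic. Takes network and local
# times as [sec, min] pairs (same hour assumed) and returns
# (offset_in_seconds, should_set). The offset is undef when the minutes are
# 30 or more apart, the case the sub leaves to be set by hand.
sub clock_adjust {
    my ($net, $loc) = @_;
    my ($sec, $min)         = @$net;
    my ($old_sec, $old_min) = @$loc;
    return (undef, 0) if abs($min - $old_min) >= 30;
    my $diff = 60 * ($min - $old_min) + ($sec - $old_sec);
    return ($diff, abs($diff) > 3 ? 1 : 0);   # more than 3 seconds off either way
}

# PC clock running 90 seconds fast: offset -90, so the clock gets set.
my ($offset, $set) = clock_adjust([0, 14], [30, 15]);
print "$offset $set\n";   # prints -90 1
```

Note that a PC clock running fast gives a negative offset, which is why the check needs abs() rather than a plain greater-than-3 test.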



Normally the footer code runs and the page is closed when Snatch runs out of items.
Sometimes you want to close the page manually. The flush method runs any <sub> code, finishes with whatever is in the footer, closes the output file, and puts STDOUT back. At that point Snatch needs another page definition, or you can do something else before linking off to the unknown; see the IE examples.
<item>
<name>flush</name>
<method>flush</method>
<sub><!--
print "<BR>", checkTimer('report')/1000, " seconds for the report<BR>\n";
undef;
--></sub>
</item>
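Closing the output file and putting STDOUT back suggests the usual Perl pattern of duplicating STDOUT, pointing it at the page, and restoring it afterwards. Below is a minimal sketch of that pattern. It assumes nothing about how Snatch actually does it, and with_stdout_to is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with STDOUT redirected to $file,
# then close the file and restore the original STDOUT.
sub with_stdout_to {
    my ($file, $code) = @_;
    open(my $saved, '>&', \*STDOUT) or die "Can't dup STDOUT: $!";
    open(STDOUT, '>', $file)        or die "Can't write $file: $!";
    my $ok  = eval { $code->(); 1 };
    my $err = $@;
    close(STDOUT);                                 # flush and close the page
    open(STDOUT, '>&', $saved) or die "Can't restore STDOUT: $!";
    close($saved);
    die $err unless $ok;                           # rethrow only after STDOUT is back
    return;
}
```

Restoring STDOUT before rethrowing any error means a failing sub cannot leave later output going into a closed page.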


Specifying Multiple XML Files

The link method puts the XML files into a list and runs them in turn. The %sys hash can be used to pass information along.
<item>
<name>do all these</name>
<method>link news.xml weather.xml</method>
</item>