<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0">
    <channel>
    <title>The Junkyard</title>
    <link>http://www.rork.nl/</link>
    <description></description>
    <language>en-us</language>           
    <generator>Nucleus CMS v3.64</generator>
    <copyright>&#169;</copyright>             
    <category>Weblog</category>
    <docs>http://backend.userland.com/rss</docs>
    <image>
        <url>http://www.rork.nl//nucleus/nucleus2.gif</url>
        <title>The Junkyard</title>
        <link>http://www.rork.nl/</link>
    </image>
    <item>
    <title>Oil Rush</title>
    <link>xml-rss2.php?itemid=117</link>
    <description><![CDATA[This week <a href = "http://unigine.com/" target = "_blank" title = "Unigine: real-time 3D engine (game, simulation, virtualization and VR)">Unigine</a> released its so-called naval strategy game <a href = "http://www.oilrush-game.com/" target = "_blank" title = "Oil Rush: naval strategy game">Oil Rush</a>. The game is about capturing and defending different types of platforms, which produce oil or units and can be used as hubs to launch new attacks. As you need these platforms to build units and defenses, the game focuses on quick action and defense.<br />
<br />
<div style="text-align: center"><img src="http://www.rork.nl/media/1/20120128-oilrush-overview.png" width="452" height="205" alt="Oil Rush - The battlefield" title="Oil Rush - The battlefield" /></div><b>Platforms</b><br />
Platforms are the most important items in Oil Rush: they produce oil or units and serve as navigation points. Unlike the RTS games I used to play, you can't build platforms in Oil Rush; you have to capture the existing ones. There are two main types of platforms: oil rigs and production platforms. Oil rigs pump up oil, which you can spend on defenses or on strategic advantages like increased production speed or radar. Production platforms make a fixed number of naval or air units, which are replenished if you lose them. To build a bigger army you simply need to control more production platforms.<br />
<br />
Defenses can be built around production platforms to protect them from enemy attacks, but oil rigs must be defended by units. As oil rigs are generally scarce and you need them to improve the protection of the production platforms, defending them seems rather important.<br />
<br />
<b>Navigation</b><br />
Your units are associated with platforms. You can select part of the units at a platform and move them to another platform, either to group with other units there or to attack it. It's possible to select a share of the units (25, 50 or 100%) and to select the unit type. There are naval and air units: air units can move freely over the map, while naval units are limited by geographical or artificial obstacles like mountain ranges or city ruins.<br />
<br />
Enemies will exchange shots in the middle of the sea, but large battles away from platforms seem impossible. Depending on the map, platforms can be at more or less strategic points, making it possible to attack platforms on the other side of the map without being intercepted. In the single player campaign platforms seem to be placed more strategically.<br />
<br />
<div style="text-align: center"><img src="http://www.rork.nl/media/1/20120128-oilrush-minimap.png" width="452" height="200" alt="Oil Rush - Navigation" title="Oil Rush - Navigation" /></div><br />
Although the graphical interface can be used to select and direct units, it's easiest to select the platforms on the minimap for this. When units are moving or attacking you can follow them in the 3D window. The combination allows you to both follow the battle and move your units around for reinforcement or defense. A hotkey is available to focus on the action.<br />
<br />
<b>First impression</b><br />
I only got to play a little over the last few days, but so far the game seems pretty good. The whole game seems to be about control and balance. The option to attack remote platforms makes it important to keep an eye on defense. The lack of defensive structures for oil rigs makes them vulnerable, while you need them to improve your defenses and make more units available to capture new platforms.<br />
<br />
However, this balance is easily turned against you: once you are low on production platforms it seems hard to regain the lead, as you have only a few units to defend your own platforms and to conquer new ones.<br />
<br />
The world you play in is really detailed: production rigs have people moving around and battles are very lively. This seems to create a drain on computer resources though; my graphics card (a Radeon HD3200, which is way below the recommended specification) has a hard time rendering the game on the lowest graphical settings.<br />
<br />
<div style="text-align: center"><img src="http://www.rork.nl/media/1/20120128-oilrush-battle.png" width="452" height="190" alt="Oil Rush - Battle" title="Oil Rush - Battle" /></div><br />
<b>Availability</b><br />
Oil Rush is available for Linux, Mac, PS3 and Windows and can be bought directly from <a href = "https://store.unigine.com/products/goods/oilrush/" target = "_blank" title = "Oil Rush | Unigine Online Store">Unigine</a>, Ubuntu Software Center, <a href = "http://www.desura.com/games/oil-rush" target = "_blank" title = "Oil Rush Windows, Linux game | Desura">Desura</a>, <a href = "http://store.steampowered.com/app/200390/" target = "_blank" title = "Oil Rush on Steam">Steam</a> and various other retailers.]]></description>
    <category>Games</category>
    <comments>xml-rss2.php?itemid=117</comments>
    <pubDate>Sat, 28 Jan 2012 12:16:46 +0100</pubDate>
</item><item>
    <title>destination Coop</title>
    <link>xml-rss2.php?itemid=116</link>
    <description><![CDATA[<a href = "http://www.destinationunreal.com/" target = "_blank" title = "destination Unreal">Destination Unreal</a> now hosts a new <a href = "http://www.destinationunreal.com/modules.php?name=Gameserver&type=detail&serverid=30" target = "_blank" title = "destination Unreal - Gameserver">Coop server</a> besides its array of MH servers. The server is set up by iDeFiX and currently hosts the <a href = "http://onp.ut-files.com/" target = "_blank" title = "Operation NaPali Help">Operation NaPali Campaign</a>.<br />
<br />
Coop is a gametype derived from the original Unreal series. You (and some friends) explore a strange and often hostile world on a story-based quest. Compared to other gametypes the pace of these games is much lower, the maps look much better, and fights are sparse but well thought out.<br />
<br />
If you like the old Unreal, coop, or just want to check out some new environments, you can find the server at <a href = "unreal://85.14.203.180:5555" target = "_parent" title = "#### destination Coop #### [www.destinationunreal.com]">85.14.203.180:5555</a><br />
<br />
<img src="http://www.rork.nl/media/1/20111227-np14mclanedrpest.jpg" width="452" height="254" alt="np14mclanedrpest" title="np14mclanedrpest" />]]></description>
    <category>Unreal Tournament</category>
    <comments>xml-rss2.php?itemid=116</comments>
    <pubDate>Tue, 27 Dec 2011 18:37:45 +0100</pubDate>
</item><item>
    <title>Useful perl modules for writing a basic webspider</title>
    <link>xml-rss2.php?itemid=113</link>
    <description><![CDATA[Currently I'm developing my own webspider; its primary use is to index sites I want to follow but that don't offer RSS feeds. It's currently in a simple state where it downloads a page and extracts the links. For retrieving the page in a polite way and parsing the contents I use a few modules I'd like to share.<br />
<br />
To retrieve the pages I use <a href = "https://metacpan.org/module/LWP::UserAgent" target = "_blank" title = "LWP::UserAgent - Web user agent class - metacpan.org">LWP::UserAgent</a> and <a href = "https://metacpan.org/module/HTTP::Request" target = "_blank" title = "HTTP::Request - HTTP style request message - metacpan.org">HTTP::Request</a>. I want the webspider to be polite, so I check the <a href = "http://www.rork.nl/index.php?itemid=5" title = "The Junkyard &raquo; Site indexing part 1: Robots.txt" target = "_blank">robots.txt</a> with <a href = "https://metacpan.org/module/WWW::RobotRules" target  = "_blank" title = "WWW::RobotRules - database of robots.txt-derived permissions - metacpan.org">WWW::RobotRules</a> and the <a href = "http://www.rork.nl/index.php?itemid=9" target = "_blank" title = "The Junkyard &raquo; Site indexing part 2: meta tags">robot meta tags</a> with <a href = "https://metacpan.org/module/HTML::TokeParser" target = "_blank" title = "HTML::TokeParser - Alternative HTML::Parser interface - metacpan.org">HTML::TokeParser</a>.<br />
<br />
<b>Retrieving the page</b><br />
Of course it's easy enough to use <a href = "https://metacpan.org/module/LWP::Simple" target = "_blank" title = "LWP::Simple - Simple procedural LWP interface">LWP::Simple</a> to retrieve a page, but with <a href = "https://metacpan.org/module/LWP::UserAgent" target = "_blank" title = "LWP::UserAgent - Web user agent class - metacpan.org">LWP::UserAgent</a> I can retrieve better error messages, use cookies and add identification information. Because the request is constructed with <a href = "https://metacpan.org/module/HTTP::Request" target = "_blank" title = "HTTP::Request - HTTP style request message - metacpan.org">HTTP::Request</a>, it can also send POST requests.<br />
<br />
Initialize the user agent and add information:<br />
<div class = "code"><br />
use LWP::UserAgent;<br />
<br />
my $ua = LWP::UserAgent->new();<br />
      $ua->agent('ExBot/alpha/' . $ua->_agent);<br />
      $ua->cookie_jar({ file => "$ENV{HOME}/.tmp/cookies.txt" });<br />
</div><br />
<br />
A standard GET request<br />
<div class = "code"><br />
use HTTP::Request;<br />
<br />
my $url = "http://www.example.com/?fu=bar";<br />
my $request = HTTP::Request->new("GET", $url);<br />
</div><br />
<br />
A POST request<br />
<div class = "code"><br />
use HTTP::Request;<br />
<br />
my $url = "http://www.example.com/";<br />
my $param = "fu=bar";<br />
my $request = HTTP::Request->new("POST", $url);<br />
      $request->content_type('application/x-www-form-urlencoded');<br />
      $request->content($param);<br />
</div><br />
<br />
Retrieving the page and checking for errors<br />
<div class = "code"><br />
my $response = $ua->request($request);<br />
<br />
if ($response->is_success()) {<br />
  print $response->content;<br />
}<br />
else {<br />
  print "An error occurred: " . $response->status_line() . "\n";<br />
}<br />
</div><br />
<br />
<b>Checking the robots.txt</b><br />
To check the robots.txt I have to download it and check the url I want to visit against it. To download it I use the code above, but I have to find it first. The location is easy to guess: http://&lt;domain&gt;/robots.txt. I use <a href = "https://metacpan.org/module/URI" target ="_blank" title = "URI - Uniform Resource Identifiers (relative and absolute) - metacpan.org">URI</a> to construct the url. Once I have the robots.txt I use <a href = "https://metacpan.org/module/WWW::RobotRules" target  = "_blank" title = "WWW::RobotRules - database of robots.txt-derived permissions - metacpan.org">WWW::RobotRules</a> to check the url I want to download against it.<br />
<br />
Constructing the url<br />
<div class = "code"><br />
use URI;<br />
<br />
my $url = "http://www.example.com/?fu=bar";<br />
# url of the page I want to index<br />
my $p_uri = URI->new($url);<br />
# url of the robots.txt<br />
my $r_uri = URI->new();<br />
<br />
$r_uri->scheme("http");<br />
$r_uri->host($p_uri->host());<br />
$r_uri->path("/robots.txt");<br />
</div><br />
<br />
Checking the url against the rules<br />
<div class = "code"><br />
use WWW::RobotRules;<br />
<br />
my $url = "http://www.example.com/?fu=bar";<br />
my $robots_txt = "user-agent: *\ndisallow: /tmp\n";<br />
<br />
if (defined($robots_txt)) {<br />
  my $rules = WWW::RobotRules->new("ExBot");<br />
        $rules->parse($r_uri->as_string, $robots_txt);<br />
 <br />
   print $rules->allowed($url);<br />
}<br />
else {<br />
  print "Error retrieving robots.txt\n";<br />
}<br />
</div><br />
A disadvantage of using <a href = "https://metacpan.org/module/WWW::RobotRules" target  = "_blank" title = "WWW::RobotRules - database of robots.txt-derived permissions - metacpan.org">WWW::RobotRules</a> is that it prints an error for every line that is not a valid robots.txt line, which might result in a lot of errors.<br />
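<br />
One possible way to keep those messages out of the output is sketched below. It rests on an assumption worth checking against the installed version: that WWW::RobotRules only emits these messages when the global warning flag ($^W, set by perl -w) is on. The robots.txt text and the urls are made up for illustration.<br />

```perl
use strict;
use warnings;
use WWW::RobotRules;

my $robots_url = "http://www.example.com/robots.txt";
# Made-up robots.txt; the second line is deliberately malformed.
my $robots_txt = "User-agent: *\nthis line is not valid\nDisallow: /tmp\n";

my $rules = WWW::RobotRules->new("ExBot/alpha");
{
    # Assumption: the parser only complains when $^W is true,
    # so switch it off for the duration of the parse only.
    local $^W = 0;
    $rules->parse($robots_url, $robots_txt);
}

# Check two hypothetical urls against the parsed rules.
for my $url ("http://www.example.com/index.html",
             "http://www.example.com/tmp/file.html") {
    print $url, ": ", ($rules->allowed($url) ? "allowed" : "blocked"), "\n";
}
```

Because of the local, the flag is restored as soon as the block ends, so warnings from the rest of the script are unaffected.<br />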
<br />
<b>Checking the robot metatags</b><br />
<a href = "https://metacpan.org/module/HTML::TokeParser" target = "_blank" title = "HTML::TokeParser - Alternative HTML::Parser interface - metacpan.org">HTML::TokeParser</a> steps through an HTML page from tag to tag. This might be inefficient if you want to retrieve specific information from a specific page, but if you want to analyze a specific tag on any page it's a really easy solution.<br />
<br />
Retrieve the robot meta tags<br />
<div class = "code"><br />
use HTML::TokeParser;<br />
<br />
my $stream = HTML::TokeParser->new(\$content);<br />
<br />
while (my $tag = $stream->get_tag("meta")) {<br />
  if (exists($tag->[1]{name}) and lc($tag->[1]{name}) eq "robots") {<br />
    if (exists($tag->[1]{content})) {<br />
      my $content = $tag->[1]{content};<br />
      if ($content =~ m/noindex/i) {<br />
        print "Not allowed to index this page\n";<br />
      }<br />
      if ($content =~ m/nofollow/i) {<br />
        print "Not allowed to follow links on this page\n";<br />
      }<br />
      last;<br />
    }<br />
  }<br />
 }<br />
</div><br />
<br />
<b>Other interesting modules</b><br />
These are the modules I currently use but there is some other interesting stuff out there. For example <a href = "https://metacpan.org/module/LWP::RobotUA" target = "_blank" title = "LWP::RobotUA - A class for well-behaved Web robots - metacpan.org">LWP::RobotUA</a> combines <a href = "https://metacpan.org/module/LWP::UserAgent" target = "_blank" title = "LWP::UserAgent - Web user agent class - metacpan.org">LWP::UserAgent</a> with automatically checking the robots.txt and adds timeouts between visiting pages. Another advanced module to retrieve pages is <a href = "https://metacpan.org/module/WWW::Mechanize" target = "_blank" title = "WWW::Mechanize - Handy web browsing in a Perl object - metacpan.org">WWW::Mechanize</a> and there are a number of modules for HTML parsing that I want to look into later.]]></description>
    <category>Perl</category>
    <comments>xml-rss2.php?itemid=113</comments>
    <pubDate>Sun, 6 Nov 2011 11:35:25 +0100</pubDate>
</item><item>
    <title>Targeted spam</title>
    <link>xml-rss2.php?itemid=111</link>
    <description><![CDATA[I think almost everybody is aware these days that major companies or websites like Google, Facebook, and Amazon are creating profiles of their customers in order to show targeted ads. I don't know if that's a bad thing; for example, if someone sends you a wish list you can just pick up the item from the advertisements next to the list.<br />
<br />
Another form of advertising is sending or posting plain spam. Since I removed the reCAPTCHA and changed to commitcontrol on my weblog I get to see all spam messages that are posted, and I also check my filtered emails, so I know what I'm spammed about. The email, and now the spam comments, made me wonder: is there something like targeted spam?<br />
<br />
Back in the day I got bombarded with email about a certain kind of pills, the occasional hot chick who wanted to date, software offers, Nigerian spam and stuff like that. Nowadays these messages are, I think, only a minor part of the daily spam and I receive a lot of job offers instead. I think this started when I was looking for a job and hence visited some vacancy sites. Maybe it's related, or maybe it's the general economy gone bad and many people receive these messages nowadays.<br />
<br />
However, for my holiday I had to buy some stuff, so I visited the websites of a local store and some brands quite frequently. Now I get offers for these types of goods in the comments on this website. And this stuff is totally unrelated to the topics I post about here (as is most spam).<br />
<br />
Coincidence or not, it does make you wonder: can spammers know what you like and which sites you visited? I think there are a couple of methods to find out this information: 1) just buy profiles with email addresses, 2) build a malicious site that checks for cookies of other websites (this shouldn't be possible normally), 3) build a malicious site that reads the browser history (this was possible through a bug, it might be fixed now), 4) use Google to find whatever information is known on blogs, Facebook, forums etc.<br />
<br />
Regarding the spam I receive, I think methods 1-3 are most likely, although 4 is quite possible too. There is only one way to test this though, and that is to feed (false) information into the system. So starting with the fourth method: I've already got a fine job, so please stop spamming offers.]]></description>
    <category>Spam</category>
    <comments>xml-rss2.php?itemid=111</comments>
    <pubDate>Tue, 25 Oct 2011 21:48:09 +0200</pubDate>
</item><item>
    <title>Nestor10 webspider FAQ</title>
    <link>xml-rss2.php?itemid=109</link>
    <description><![CDATA[Development on my own webspider continues and has now reached a stage where the basics seem to be working well and it's ready for wider use, so it is time to put up some basic information about it. I will write about the technology later; this article is intended for administrators who find the user-agent in their logs and are wondering about it.<br />
<br />
<b>What's Nestor10's full name?</b><br />
The naming scheme is: Nestor10/&lt;bot version&gt;/libwww-perl/&lt;backend version&gt;<br />
<br />
The current user agent is: Nestor10/alpha/libwww-perl/6.02<br />
<br />
<b>What is Nestor10's purpose?</b><br />
I currently use the webspider to collect news and information I want for both professional and personal use. This means that it will visit targeted sites for veterinarians and poultry health. My first goal for the bot was to provide personal RSS feeds for sites that don't supply these themselves. I'm currently working on a front-end to share this information with a small group of colleagues.<br />
<br />
<b>Is Nestor10 malicious? e.g. posting spam or collecting emails?</b><br />
No, Nestor10 only collects links to news items.<br />
<br />
<b>How often will Nestor10 visit my site?</b><br />
That depends on how often I execute the script; currently this is about daily. For most pages there's a minimum of 6 hours between visits, and non-news sites will be visited about weekly. If more than one page on a domain is indexed, the time between visits is at least 5 seconds.<br />
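<br />
For the curious, the timing rule above can be sketched roughly as follows. This is an illustration only, not Nestor10's actual code: the function name, the constants and the way timestamps are passed in are assumptions made for the example.<br />

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical intervals taken from the answer above.
use constant PAGE_INTERVAL   => 6 * 60 * 60;   # 6 hours between visits to one page
use constant DOMAIN_INTERVAL => 5;             # 5 seconds between pages on one domain

# Returns the number of seconds to wait before a page may be fetched.
# Timestamps are epoch seconds; pass undef when there was no earlier visit.
sub seconds_to_wait {
    my ($now, $last_page_visit, $last_domain_visit) = @_;
    my $page_wait   = defined $last_page_visit
                    ? $last_page_visit + PAGE_INTERVAL - $now     : 0;
    my $domain_wait = defined $last_domain_visit
                    ? $last_domain_visit + DOMAIN_INTERVAL - $now : 0;
    # Never negative: 0 means the page may be fetched right away.
    return max(0, $page_wait, $domain_wait);
}

print seconds_to_wait(time(), undef, time() - 2), "\n";
```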
<br />
<b>Does Nestor10 support robot exclusion protocols?</b><br />
No, in its alpha stage it doesn't, because the information it generates is currently for private use only. It will however scan for the robots.txt and robot meta tag. When the collected data is made available to a wider public the robot exclusion protocols will be supported.<br />
<br />
<b>What can I do to block Nestor10?</b><br />
Please send me an email specifying the url or website you want to have removed; my email address is sent in the webspider's headers. You can also add the <a href = "http://www.rork.nl/index.php?itemid=9" target = "_blank" title = "The Junkyard &raquo; Site indexing part 2: Meta tags">robot meta tags</a> or <a href = "http://www.rork.nl/index.php?itemid=5" target = "_blank" title = "The Junkyard &raquo; Site indexing part 1: Robots.txt">robots.txt</a>. Nestor10 will look for the following user-agent strings: nestor10 and *.<br />
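<br />
As an illustration, a robots.txt record along these lines should match the nestor10 string mentioned above. The "Disallow: /" path is only an example that shuts out the whole site; a narrower path works the same way.<br />

```text
User-agent: nestor10
Disallow: /
```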
<br />
<b>What are your plans for Nestor10?</b><br />
In time my plan is primarily to expand the number of sites Nestor10 indexes and to provide this information to others as an RSS-like service: only news headlines will be shown and linked to the original articles.]]></description>
    <category>Project 515</category>
    <comments>xml-rss2.php?itemid=109</comments>
    <pubDate>Sat, 22 Oct 2011 11:12:26 +0200</pubDate>
</item><item>
    <title>NP_LatestComments v1.85</title>
    <link>xml-rss2.php?itemid=107</link>
    <description><![CDATA[I just released <a href = "http://wakka.xiffy.nl/latestcomments" title = "Nucleus wiki - LatestComments">NP_LatestComments v1.85</a>, which is a modification of the mod previously built or modified by anand, moraes, admun, e-Musty, PiyoPiyoNaku. This plugin for <a href = "http://www.nucleuscms.org/" title = "Nucleus CMS">Nucleus CMS</a> shows the latest comments for all or just one blog. <br />
<br />
I added support for the <a href = "http://wakka.xiffy.nl/bbcode" target = "_blank" title = "bbcode [Wiki:NucleusCMS]">NP_BBCode plugin</a>, different odd/even comments and fixed a bug in breaking comments by word.<br />
<br />
For more information about using the plugin and downloads see the <a href = "http://wakka.xiffy.nl/latestcomments" title = "Nucleus wiki - LatestComments">wiki page</a>.<br />
<br />
I hope people will find these changes useful, comments can be left below (if I fixed that) or in the <a href="http://forum.nucleuscms.org/viewtopic.php?p=95986" title = "Nucleus Support Forum - NP_LatestComments">forum thread</a>.]]></description>
    <category>Nucleus CMS</category>
    <comments>xml-rss2.php?itemid=107</comments>
    <pubDate>Sat, 8 Oct 2011 13:36:29 +0200</pubDate>
</item><item>
    <title>DuckDuckGo</title>
    <link>xml-rss2.php?itemid=105</link>
    <description><![CDATA[In <a href = "http://www.rork.nl/index.php?itemid=102" target = "_parent" title = "The Junkyard &raquo; Best waves: februari 2011">February's "Best Waves"</a> I briefly wrote about the search engine <a href = "http://www.duckduckgo.com" target = "_blank" title = "DuckDuckGo">DuckDuckGo</a>. I've been using it on and off since, and the search engine has been developed and improved in the meantime. Some improvements are really welcome for my normal (programming) searches.<br />
<br />
The way it handles searches and prints information is quite different from the way Google does (or tries to do) it. Not only does it claim not to track your searches, it also tries to answer your question itself rather than linking to a page. There are a couple of really interesting additions that I'd like to write about.<br />
<br />
<b>Search results</b><br />
Every search engine is judged on its search results and DuckDuckGo's are pretty good. Usually it shows me the sites I'm looking for: sites that actually show the info and not sites that link to other sites. Moreover, if I look for a certain brand DuckDuckGo usually shows the brand's official website, sometimes even with an "official website" icon. If no or few results are shown, links to Google and Bing are provided.<br />
<br />
<b>Zero click info</b><br />
This is probably the most important reason that I like DuckDuckGo and, I think, one of the parts where most development is going on. With zero click info DuckDuckGo tries to give you the answer to your question rather than a link to the answer to your question. E.g. when I search for <a href = "https://duckduckgo.com/?q=php+mysql_connect" target = "_blank" title = "php mysql_connect (PHP) at DuckDuckGo">"php mysql_connect"</a> the syntax of this function is shown, followed by the normal search results; Google however just shows the search results. Zero click info has many sources like <a href = "https://duckduckgo.com/?q=albert+einstein" target = "_blank" title = "albert einstein at DuckDuckGo">Wikipedia</a>, <a href = "https://duckduckgo.com/?q=temperature+in+deventer" target = "_blank" title = "Temperature in deventer at DuckDuckGo">weather sites</a>, <a href = "https://duckduckgo.com/?q=define+manifesting" target = "_blank" title = "define manifesting at DuckDuckGo">dictionaries</a>, <a href = "https://duckduckgo.com/?q=php+mysql_connect" target = "_blank" title = "php mysql_connect (PHP) at DuckDuckGo">tech info sites</a> etc. and is triggered by certain search terms.<br />
<br />
<b>Bang tags</b><br />
<a href = "https://duckduckgo.com/bang.html" target = "_blank" title = "Duck Duck !Bang">Bang tags</a> are short words starting with a bang (!) that let you use other search engines or websites directly through the DuckDuckGo search form. For example "!google HTML::TokeParser" will look up the Perl module HTML::TokeParser directly on Google, while "!cpan HTML::TokeParser" will lead directly to the CPAN website. This is really useful in the search form in the navigation toolbar of my browser: I no longer have to change the search engine for specific information.<br />
<br />
Besides these useful services there are lots of <a href = "https://duckduckgo.com/goodies.html" target = "_blank" title = "DuckDuckGoodies">goodies</a> that are useful. I hope development of this search engine will continue and will show relevant zero click information about even more topics.]]></description>
    <category>General</category>
    <comments>xml-rss2.php?itemid=105</comments>
    <pubDate>Sat, 24 Sep 2011 11:28:25 +0200</pubDate>
</item><item>
    <title>DHL Delivery notifications</title>
    <link>xml-rss2.php?itemid=103</link>
    <description><![CDATA[Lately I've been working primarily with PHP, and I'd almost forgotten how much fun it is to write a small Perl script for practical extraction and reporting that is not entirely useless. The other day I was expecting a package from Germany which had a <a href = "http://www.dhl.com/en/mail/mail_essentials/track_trace.html" title = "DHL | Track & Trace">Track &amp; Trace</a> code. I regularly checked the page to see when the package could be expected, but then I thought of another approach. Instead of checking the page myself I wrote a small script that checks the page and uses the KDE notification system to keep me up to date.<br />
<br />
<div style="text-align: center"><img src="http://www.rork.nl/media/1/20110824-dhlnotify.png" width="460" height="142" alt="Track &amp; trace website" title="Track &amp; trace website" /></div><br />
The script uses three modules: <a href = "http://search.cpan.org/~gaas/libwww-perl-6.02/lib/LWP/Simple.pm" title = "LWP::Simple - search.cpan.org" target = "_blank">LWP::Simple</a> and <a href = "http://search.cpan.org/~gaas/HTML-Parser-3.68/lib/HTML/TokeParser.pm" title = "HTML::TokeParser - search.cpan.org" target = "_blank">HTML::TokeParser</a> for downloading and analyzing the page, and <a href = "http://search.cpan.org/~sacavilia/Desktop-Notify-0.03/lib/Desktop/Notify.pm" title = "Desktop::Notify = search.cpan.org" target = "_blank">Desktop::Notify</a> to pop up the notification. A log file is used to check whether the message has been delivered before. I use a cron job to execute the script every hour.<br />
<br />
<b>Grabbing and analyzing the page</b><br />
With LWP::Simple it's easy to get the page. Instead of hard coding the track &amp; trace identification, it's passed to the script as the first command line argument. This makes it easier to use the script again in the future. I also read the logfile for past messages.<br />
<div class = "code"><br />
#!/usr/bin/perl<br />
<br />
use strict;<br />
use warnings;<br />
use Desktop::Notify;<br />
use LWP::Simple;<br />
use HTML::TokeParser;<br />
<br />
my $sn = shift;<br />
<br />
my $logfile = "/home/rork/Scripts/dev/logs/dhl.log";<br />
<br />
my $url = "http://nolp.dhl.de/nextt-online-public/set_identcodes.do?lang=en&zip=00823&idc=" . $sn;<br />
<br />
# get() returns undef on failure and does not set $!<br />
my $page = get($url) or die "Can't retrieve $url\n";<br />
<br />
my %log;<br />
<br />
open(LOG, "<", $logfile) or die "Can't open $logfile for reading: $!";<br />
while(<LOG>) {<br />
  chomp($_);<br />
  my ($serial, $date, $action) = split(/\//, $_, 3);<br />
  $log{$serial}->{$date} = $action;<br />
}<br />
close(LOG);<br />
</div><br />
The HTML of the target page is stored in $page. Now check the source (either through your browser or by printing $page while developing the script) and see where the relevant information is stored, in this case in a table of the class 'full eventList'. The table consists of a header and a series of rows with date, city and status.<br />
<br />
Using HTML::TokeParser it's easy to iterate over the tables until it reaches the table with the correct class. Then skip the first row and get the next rows. The get_trimmed_text() method trims all the white space at the beginning and the end of the string.<br />
<div class = "code"><br />
my $stream = HTML::TokeParser->new(\$page);<br />
<br />
while (my $tag = $stream->get_tag("table")) {<br />
  next unless ($tag->[1]{class} and $tag->[1]{class} eq 'full eventList');<br />
  # skip the header<br />
  $tag = $stream->get_tag("tr");<br />
  while ($tag = $stream->get_tag("tr")) {<br />
    my ($date, $action) = ("", "");<br />
    $tag = $stream->get_tag("td");<br />
    if ($tag->[1]{class} and $tag->[1]{class} eq "event_date") {<br />
	$date = $stream->get_trimmed_text();<br />
    }<br />
    $tag = $stream->get_tag("td");<br />
    $tag = $stream->get_tag("td");<br />
    if ($tag->[1]{class} and $tag->[1]{class} eq "status lasttd") {<br />
	$tag = $stream->get_tag("div");<br />
	$action = $stream->get_trimmed_text();<br />
    }<br />
</div><br />
<b>Sending the notification</b><br />
Now I should have the date and action. If so, and the action has not been sent to the notifier before (I use the date as an identification here), I should send the notification and update the log.<br />
<div class = "code"><br />
    if ($date ne "" and $action ne "") {<br />
      # print "$sn/$date/$action\n";<br />
      if (!exists($log{$sn}->{$date})) {<br />
        my $notify = Desktop::Notify->new();<br />
        my $notification = $notify->create(body => $action);<br />
           $notification->show();<br />
<br />
<br />
	$log{$sn}->{$date} = $action;<br />
      }<br />
    }<br />
  }<br />
}<br />
</div><br />
Now save the log again. The DHL messages don't contain newlines; if they did, I would have to escape them.<br />
<div class = "code"><br />
open(LOG, ">", $logfile) or die "Can't open $logfile for writing: $!";<br />
foreach my $serial(keys %log) {<br />
  foreach my $date(keys %{$log{$serial}}) {<br />
    print LOG join("/", $serial, $date, $log{$serial}->{$date}) . "\n";<br />
  }<br />
}<br />
close(LOG);<br />
</div><br />
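The one-record-per-line log format above breaks if a message ever contains a newline. A minimal sketch of one way to escape such messages before writing and restore them after reading is shown here; the function names escape_action and unescape_action are made up for this example and are not part of the script.<br />

```perl
use strict;
use warnings;

# Hypothetical helpers: make a message safe to store on a single log line.
sub escape_action {
    my ($text) = @_;
    $text =~ s/\\/\\\\/g;   # double existing backslashes first
    $text =~ s/\n/\\n/g;    # then turn real newlines into a backslash plus "n"
    return $text;
}

# Reverse of escape_action: restore newlines and backslashes.
sub unescape_action {
    my ($text) = @_;
    $text =~ s/\\(.)/$1 eq 'n' ? "\n" : $1/ge;
    return $text;
}

my $message = "Delivered\nSigned for by the neighbours";
my $stored  = escape_action($message);
print "round trip ok\n" if unescape_action($stored) eq $message;
```

In the script this would mean calling escape_action() on the action when printing to the log and unescape_action() on it after the split() when reading.<br />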
<div style="text-align: center"><img src="http://www.rork.nl/media/1/20110824-dhlnotify2.png" width="460" height="176" alt="Update notification" title="Update notification" /></div><br />
<br />
<b>Setting up a cron job</b><br />
Desktop::Notify requires an X environment to run. Cron however doesn't use X, so when I first added the script to cron I got an error. This can be solved by specifying the display to use before the script: start the command with DISPLAY=:0. Note the number after the script name; this is the id of the package that has been sent.<br />
<div class = "code"><br />
0 * * * * DISPLAY=:0 perl -w /home/rork/Scripts/dev/dhl-notify.pl 467509123888 >/dev/null 2>>/dev/null<br />
</div>]]></description>
    <category>Perl</category>
    <comments>xml-rss2.php?itemid=103</comments>
    <pubDate>Wed, 24 Aug 2011 13:02:20 +0200</pubDate>
</item><item>
    <title>Best waves: februari 2011</title>
    <link>xml-rss2.php?itemid=102</link>
    <description><![CDATA[It's time to pick up the boards again and explore some new and unknown beaches. This month's waves contain a couple of beaches I think are worth exploring but also a couple of new boards to explore them with. Search engines, Linux and web development are all featured this month, but let's start with something different.<br />
<br />
<b>Writing</b><br />
I'm fairly satisfied with the articles I wrote this year, except for last month's best waves, which was too much a list of links and not enough content. So let's start off with <a href = "http://blog.tabini.ca/" title = "The Accidental Businessman" target = "_blank">Marco Tabini's</a> writings on <a href = "http://blog.tabini.ca/2011/02/writing-101/?utm_source=rss&utm_medium=rss&utm_campaign=writing-101" title = "On writing (better) | The Accidental Businessman" target = "_blank">writing</a>. In his post he emphasizes preparing before you start to write and adds a side note on punctuation. I think that is a pretty general rule of writing but one that is easily forgotten, for example when you write a monthly post like this.<br />
<br />
A totally different kind of writing is building a content management system. <a href = "http://bergie.iki.fi/" title = "Henri Bergius: Bergie's Home Page and Weblog" target = "_blank">Henri Bergius</a> wrote about <a href = "http://bergie.iki.fi/blog/decoupling_content_management/" title = "Henri Bergius: Weblog: Decoupling Content Management">decoupling content management</a>, describing a system and some tools for keeping data storage, administration and the user front end separate but interlocking. I think this is a very interesting approach to building content management systems and other tools. Most current tools ship everything in one package and leave little choice in the tools and database you have to use. Being able to choose your own templating system, a database you already have available and a nice, complete admin tool might come in very handy.<br />
<br />
<b>Search engines</b><br />
But let's go exploring. <a href = "http://freakaboutlinux.wordpress.com/" title = "Ryan Macnish" target = "_blank">Ryan Macnish</a> was looking for an <a href = "http://freakaboutlinux.wordpress.com/2011/02/24/alternative-search-engines/" title = "Alternative Search Engines &raquo; Ryan Macnish" target = "_blank">alternative search engine</a> and found two search engines to his liking: <a href = "http://duckduckgo.com/" title = "DuckDuckGo" target = "_blank">DuckDuckGo</a> and <a href = "http://blekko.com/" title = "Blekko | slashtag search">Blekko</a>. I'm currently testing DuckDuckGo as my default search engine and it seems to show relevant results such as official websites first, rather than comparison sites or paid ads. Besides that it shows additional information like a summary from <a href = "http://www.wikipedia.org/" title = "Wikipedia" target = "_blank">Wikipedia</a>, a link to a company's main website, and lists about the subject. I haven't started exploring features like <a href = "https://duckduckgo.com/bang.html" title = "Duck Duck !bang" target = "_blank">!bang tags</a> yet but I'm currently pretty satisfied with it. I didn't try Blekko, but it aims at removing <a href = "https://secure.wikimedia.org/wikipedia/en/wiki/Content_farms" title = "Content Farm - Wikipedia, the free encyclopedia" target = "_blank">content farms</a> from its search results.<br />
<br />
Coincidentally, <a href = "http://googleblog.blogspot.com/" title = "Official Google Blog" target = "_blank">Google</a> announced a <a href = "http://googleblog.blogspot.com/2011/02/finding-more-high-quality-sites-in.html" title = "Official Google Blog: Finding more high-quality sites in search" target = "_blank">change</a> in their algorithm to show the best results. This change also focuses on giving content farms a lower ranking and showing more original content. For now the changes are only available in the US, but they will be rolled out to the rest of the world. Their own research shows that a lot of the sites that rank lower were actually unwanted results, so I'm pretty eager to see this come to the Netherlands.<br />
<br />
<b>Linux</b><br />
There are three interesting articles on exploring Linux. The first is an explainer by <a href = "http://lifehacker.com/" title = "Lifehacker, tips and downloads for getting things done" target = "_blank">Lifehacker</a> about the most popular <a href = "http://lifehacker.com/#!explainer/5762081" title = "explainer - lifehacker" target = "_blank">desktop environments</a> for Linux: <a href = "http://www.kde.org/" title = "KDE - Experience Freedom" target = "_blank">KDE</a>, <a href = "http://www.gnome.org/" title = "GNOME: The free software desktop project" target = "_blank">GNOME</a> and <a href = "http://www.xfce.org/" title = "Xfce desktop environment" target = "_blank">Xfce</a>. The article covers testing, installation and the key features, design goals and target audience of these desktop environments. Personally I would have liked to see <a href = "http://lxde.org/" title = "LXDE.org | Lightweight X11 Desktop Environment" target = "_blank">LXDE</a> featured as well, which I use as a lightweight desktop if I need one, but I have to admit it's less developed than the other three.<br />
<br />
If you want to go beyond desktop environments and test a complete distribution, you might be interested in a howto from <a href = "http://www.webupd8.org/" title = "Web Upd8: Ubuntu / Linux blog" target = "_blank">Web Upd8</a>. It explains how to set up GRUB to <a href = "http://www.webupd8.org/2011/02/how-to-boot-iso-with-grub2-easy-way.html" title = "How To Boot An ISO With GRUB2 (The Easy Way!) ~ Web Upd8: Ubuntu / Linux blog" target = "_blank">boot an ISO</a>. <a href = "http://unetbootin.sourceforge.net/" title = "Unetbootin - Homepage and Downloads" target = "_blank">Unetbootin</a> is used to extract and start the ISO straight from GRUB2. I haven't tried this yet because I have plenty of partitions for testing distros, but it looks pretty interesting.<br />
<br />
Going beyond software towards testing new hardware, <a href = "http://www.omgubuntu.co.uk/ " title = "OMG! Ubuntu! | Everything Ubuntu. Daily. " target = "_blank">OMG! Ubuntu!</a> wrote about a <a href = "http://www.omgubuntu.co.uk/2011/02/canonical-launch-list-of-ubuntu-compatible-pc-components/" title = "Cannonical announce list of Ubuntu-Compatible PC Components" target = "_blank">list of Ubuntu compatible PC Components</a> that has been released. The <a href = "http://www.ubuntu.com/certification/catalog" title = "Component catalog | Ubuntu" target = "_blank">catalog</a> can be searched or browsed by brand or category. If you're looking for a new piece of hardware and want to be sure it's supported by Ubuntu, picking something from this list is a safe bet.<br />
<br />
<b>Tools</b><br />
<a href = "http://thinkmoult.com/" title = "thinkMoult - Seriously whoever reads this description" target = "_blank">Dion Moult</a> wrote about <a href = "http://thinkmoult.com/2011/02/21/syncing-kontact-with-android/" title = "thinkMoult - Syncing Kontact with Android" target = "_blank">syncing Kontact with Android</a>. As of now there are no tools to sync Android directly with <a href = "http://userbase.kde.org/Kontact" title = "Kontact - KDE UserBase" target = "_blank">Kontact</a>; however, it's possible to sync Android with a Google account and the Google account with Kontact. It's a bit of a detour but it works. The article covers syncing contacts, the agenda and the to-do list. I already set up agenda syncing months ago and I'm quite happy with that.<br />
<br />
<a href = "http://linuxgrandma.blogspot.com/" title = "Linux Grandma" target = "_blank">Valorie Zimmerman</a> wrote a short blog post about <a href = "http://linuxgrandma.blogspot.com/2011/02/pastebin-and-pastebinit.html" title = "Linux Grandma: Pastebin, and pastebinit" target = "_blank">pastebinit</a> and how to use it to paste terminal output directly to a pastebin service. The comments mention a number of similar tools.]]></description>
    <category>Best waves</category>
    <comments>xml-rss2.php?itemid=102</comments>
    <pubDate>Mon, 28 Feb 2011 13:37:18 +0100</pubDate>
</item><item>
    <title>Simple implementation of the robot meta tag in Nucleus CMS</title>
    <link>xml-rss2.php?itemid=99</link>
    <description><![CDATA[The first series I wrote on this weblog was about <a href = "http://www.rork.nl/index.php?catid=2" title = "The Junkyard &raquo Category: Web spiders" target = "_blank">Web spiders</a>, covering how to control them and my views on how a site should be indexed. After installation I set up robots.txt; however, due to the way URLs are built, this has some limitations. <a href = "http://www.rork.nl/index.php?itemid=9&catid=2" title = "The Junkyard &raquo Site indexing part 2: Meta tags" target = "_blank">Meta tags</a> would be an excellent addition to this, but I never got around to implementing them in the CMS until now.<br />
<br />
I expected that adding the meta tags would mean changing the Nucleus CMS core or writing a plugin, but it was much easier. Skins can use conditional statements, which among other things allow you to add different code for different skin types.<br />
<br />
Because the articles are my main content and the other pages are only important for navigation, the setup for the meta tags is really easy. Articles will have <i>follow,index</i> and all other pages <i>follow,noindex</i>. Because these pages use different skins, I use a conditional statement on skintype to set the meta tag content.<br />
<br />
<div class = "code"><br />
&lt;%if(skintype,item)%&gt;<br />
&lt;META NAME = "robots" CONTENT = "follow,index"&gt;<br />
&lt;%else%&gt;<br />
&lt;META NAME = "robots" CONTENT = "follow,noindex"&gt;<br />
&lt;%endif%&gt;<br />
</div><br />
<br />
In the default skin you can add this code to head.inc so that it is used on every HTML page.]]></description>
    <category>Nucleus CMS</category>
    <comments>xml-rss2.php?itemid=99</comments>
    <pubDate>Fri, 25 Feb 2011 12:23:09 +0100</pubDate>
</item>
  </channel>
</rss>
