Topic: DHL Delivery notifications
Posted on: August 24, 2011, 12:02:20 pm

Lately I've been working primarily with PHP, and I'd almost forgotten how much fun it is to write a small Perl script for practical extraction and reporting that is not entirely useless. The other day I was expecting a package from Germany that came with a Track & Trace code. I regularly checked the tracking page to see when the package could be expected, but then I thought of another approach: a small script that checks the page for me and uses the KDE notification system to keep me up to date.

The script uses three modules: LWP::Simple and HTML::TokeParser for downloading and analyzing the page, and Desktop::Notify to pop up the notification. A log file records which messages have already been shown, so each one is only announced once. I use a cron job to run the script every hour.

Grabbing and analyzing the page
With LWP::Simple it's easy to get the page. Instead of hard-coding the Track & Trace identification, it's passed to the script as the first command-line argument, which makes it easier to reuse the script in the future. I also read the log file to load past messages.


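use strict;
use warnings;
use Desktop::Notify;
use LWP::Simple;
use HTML::TokeParser;

# the Track & Trace code is the first command-line argument
my $sn = shift or die "Usage: $0 <tracking code>\n";

my $logfile = "/home/rork/Scripts/dev/logs/dhl.log";

# the tracking page URL goes in the quotes; the code is appended to it
my $url = "" . $sn;

# get() returns undef on failure and does not set $!
my $page = get($url);
die "Can't fetch $url\n" unless defined $page;

# messages seen so far: $log{serial}{date} = action
my %log;

open(LOG, "<", $logfile) or die "Can't open $logfile for reading: $!";
while (<LOG>) {
  chomp;
  my ($serial, $date, $action) = split(/\//, $_, 3);
  $log{$serial}->{$date} = $action;
}
close(LOG);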

The HTML of the target page is now stored in $page. Check the source (either through your browser or by printing $page while developing the script) to see where the relevant information is stored. In this case it's in a table with the class 'full eventList'. The table consists of a header and a series of rows with date, city and status.

Using HTML::TokeParser it's easy to iterate over the tables until you reach the one with the correct class, then skip the first row and read the rows that follow. The get_trimmed_text() method strips the white space at the beginning and the end of the text and collapses runs of white space inside it.

my $stream = HTML::TokeParser->new(\$page);

while (my $tag = $stream->get_tag("table")) {
  next unless ($tag->[1]{class} and $tag->[1]{class} eq 'full eventList');
  # skip the header
  $tag = $stream->get_tag("tr");
  while ($tag = $stream->get_tag("tr")) {
    my ($date, $action) = ("", "");
    # first cell: the date
    $tag = $stream->get_tag("td");
    if ($tag->[1]{class} and $tag->[1]{class} eq "event_date") {
      $date = $stream->get_trimmed_text();
    }
    # skip the city cell and move on to the status cell
    $tag = $stream->get_tag("td");
    $tag = $stream->get_tag("td");
    if ($tag->[1]{class} and $tag->[1]{class} eq "status lasttd") {
      $tag = $stream->get_tag("div");
      $action = $stream->get_trimmed_text();
    }
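
To try the parsing step without fetching the live page, something along these lines could be used. It is only a sketch: the HTML snippet, the date, city and status values, and the parse_events() helper are made up for illustration; only the class names come from the script above. Unlike the script, it also stops at the closing table tag so that rows from later tables are not picked up.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample markup; only the class names match the real page.
my $page = <<'HTML';
<table class="other"><tr><td>ignored</td></tr></table>
<table class="full eventList">
  <tr><th>Date</th><th>City</th><th>Status</th></tr>
  <tr>
    <td class="event_date"> Mon, 22.08.11 14:03 </td>
    <td class="city">Bremen</td>
    <td class="status lasttd"><div> Shipment processed </div></td>
  </tr>
</table>
HTML

# Returns a list of { date => ..., action => ... } hashes (hypothetical helper).
sub parse_events {
  my ($html) = @_;
  my @events;
  my $stream = HTML::TokeParser->new(\$html);
  while (my $tag = $stream->get_tag("table")) {
    next unless ($tag->[1]{class} and $tag->[1]{class} eq 'full eventList');
    $stream->get_tag("tr");    # skip the header row
    while ($tag = $stream->get_tag("tr", "/table")) {
      last if $tag->[0] eq "/table";    # end of the event table
      my ($date, $action) = ("", "");
      $tag = $stream->get_tag("td");
      if ($tag->[1]{class} and $tag->[1]{class} eq "event_date") {
        $date = $stream->get_trimmed_text();
      }
      $stream->get_tag("td");           # city cell, not used
      $tag = $stream->get_tag("td");
      if ($tag->[1]{class} and $tag->[1]{class} eq "status lasttd") {
        $stream->get_tag("div");
        $action = $stream->get_trimmed_text();
      }
      push @events, { date => $date, action => $action };
    }
  }
  return @events;
}

print "$_->{date}: $_->{action}\n" for parse_events($page);
```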

Sending the notification
Now I should have the date and the action. If so, and the action has not been sent to the notifier before (I use the date as the identifier here), I send the notification and update the log.

    if ($date ne "" and $action ne "") {
      # print "$sn/$date/$action\n";
      if (!exists($log{$sn}->{$date})) {
        my $notify = Desktop::Notify->new();
        my $notification = $notify->create(body => $action);
        $notification->show();

        $log{$sn}->{$date} = $action;
      }
    }
  } # end of the row loop
} # end of the table loop
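
The "notify only once" check can be tried on its own, without Desktop::Notify. The following is a rough sketch under a few assumptions: new_events() is a hypothetical helper that is not part of the script, the dates and status texts are invented, and a print stands in for the popup.

```perl
use strict;
use warnings;

# Returns the events that are not in the log yet and records them
# (hypothetical helper; each event is a [date, action] pair).
sub new_events {
  my ($log, $sn, @events) = @_;
  my @fresh;
  for my $event (@events) {
    my ($date, $action) = @$event;
    next if $date eq "" or $action eq "";    # incomplete row
    next if exists $log->{$sn}{$date};       # already announced
    $log->{$sn}{$date} = $action;
    push @fresh, $event;
  }
  return @fresh;
}

# Invented sample data: one event is already in the log, one is new.
my %log = ("467509123888" => { "Mon, 22.08.11 14:03" => "Shipment processed" });
my @events = (
  ["Mon, 22.08.11 14:03", "Shipment processed"],
  ["Tue, 23.08.11 09:15", "Out for delivery"],
);

for my $event (new_events(\%log, "467509123888", @events)) {
  print "would notify: $event->[1]\n";       # stand-in for the popup
}
```

Because the date is the key, two different status messages with the same timestamp would count as one event; that matches the script's behaviour.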

Now save the log again. The DHL messages don't contain newlines; if they did, I would have to escape them.

open(LOG, ">", $logfile) or die "Can't open $logfile for writing: $!";
foreach my $serial (keys %log) {
  foreach my $date (keys %{$log{$serial}}) {
    print LOG join("/", $serial, $date, $log{$serial}->{$date}) . "\n";
  }
}
close(LOG) or die "Can't close $logfile: $!";
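
If a message ever did contain a newline, one possible way to escape it is sketched below. The escape_field() and unescape_field() helpers are hypothetical and not part of the script; they turn a newline into a backslash followed by "n" and double any existing backslash, so that the line-based log format stays intact.

```perl
use strict;
use warnings;

# Make a field safe for a one-record-per-line log (hypothetical helper).
sub escape_field {
  my ($text) = @_;
  $text =~ s/\\/\\\\/g;    # double existing backslashes first
  $text =~ s/\n/\\n/g;     # then turn newlines into backslash + n
  return $text;
}

# Reverse of escape_field().
sub unescape_field {
  my ($text) = @_;
  $text =~ s/\\(.)/$1 eq 'n' ? "\n" : $1/ge;
  return $text;
}

# Round trip through the serial/date/action format used by the log.
my $action = "Delivery attempted\nRecipient not at home";
my $line   = join("/", "467509123888", "Tue, 23.08.11 09:15", escape_field($action));

my ($serial, $date, $stored) = split(/\//, $line, 3);
print "round trip ok\n" if unescape_field($stored) eq $action;
```

Only the action needs this treatment here, since it is the last field and the split is limited to three parts.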

Setting up a cron job
Desktop::Notify requires an X environment to run. Cron, however, doesn't run under X, so when I first added the script to cron I got an error. This can be solved by specifying the display to use before the command: start it with DISPLAY=:0. Note the number after the script name; this is the ID of the package that has been sent.

0 * * * * DISPLAY=:0 perl -w /home/rork/Scripts/dev/ 467509123888 >/dev/null 2>>/dev/null