When I saw Adam Barr’s “Monad and RSS” posts (parts I, II, and III), I thought of something I’ve wanted recently: a utility that parses my OPML file and tells me which feeds are dead or haven’t been updated in months.
Monad is the up-and-coming command-line shell for Windows. Adam’s samples made XML in Monad look easy, so I thought I could bang out a quick script…
… then I discovered the pain of umpteen different syndication formats (the script below copes with RSS, RDF, and Atom), each with its own quirky implementations. I am now amazed there are any working feed readers in existence. Why does simple stuff always get so difficult?
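To make the "umpteen formats" point concrete, here is a Python sketch (Python rather than Monad, purely as an illustration). It identifies the three flavors the script handles by their root element and pulls out the raw date strings in roughly the order the script tries them: <pubDate>, then <dc:date>, then <lastBuildDate> for RSS; <dc:date> for RDF; <issued> for Atom. The namespace URIs are the standard ones for RDF, RSS 1.0, Dublin Core, and Atom. The Atom 1.0 <updated> fallback and the helper names are additions that are not in the script.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the formats the Monad script distinguishes.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1_NS = "http://purl.org/rss/1.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ATOM_NS = ("http://purl.org/atom/ns#",       # Atom 0.3: entries carry <issued>
           "http://www.w3.org/2005/Atom")    # Atom 1.0: entries carry <updated>


def _texts(elements, tag):
    """Collect the non-empty text of child `tag` from each element."""
    found = (e.findtext(tag) for e in elements)
    return [t.strip() for t in found if t and t.strip()]


def date_strings(xml_text):
    """Return (format, raw date strings) for an RSS, RDF, or Atom document."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":                      # RSS 0.9x/2.0: no namespace
        items = root.findall("channel/item")
        dates = (_texts(items, "pubDate")
                 or _texts(items, f"{{{DC_NS}}}date")
                 or _texts(root.findall("channel"), "lastBuildDate"))
        return "rss", dates
    if root.tag == f"{{{RDF_NS}}}RDF":         # RSS 1.0: items sit directly under rdf:RDF
        items = root.findall(f"{{{RSS1_NS}}}item")
        return "rdf", _texts(items, f"{{{DC_NS}}}date")
    for ns in ATOM_NS:                         # Atom: root is <feed> in either namespace
        if root.tag == f"{{{ns}}}feed":
            entries = root.findall(f"{{{ns}}}entry")
            return "atom", (_texts(entries, f"{{{ns}}}issued")
                            or _texts(entries, f"{{{ns}}}updated"))
    return None, []                            # not a format we recognize
```

Turning those strings into dates is where the real trouble starts: RSS 2.0 uses RFC 822 dates, while <dc:date> and Atom use ISO 8601 style timestamps.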
Still, I have something that works “OK-ish”. The script tells me about dead feeds (ones that can’t be fetched or parsed) and deserted ones (nothing new in the last 30 days). There are some quirks. For instance, Ted Neward’s <pubDate> element refuses to parse into a DateTime in Monad. I blame it on the JSP extension in Ted’s URL.
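The post doesn't say what Ted's <pubDate> actually contained, so this is only a guess at one common culprit. RSS 2.0 calls for RFC 822 dates, which can carry zone abbreviations such as PDT that a general-purpose date parser may reject. In Python, email.utils.parsedate_to_datetime understands that format, including the North American zone names. This sketch (parse_rfc822 is a hypothetical helper name) normalizes the result to UTC and returns None instead of raising on garbage.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_rfc822(value):
    """Parse an RSS <pubDate> (RFC 822 style); return an aware UTC datetime or None."""
    try:
        parsed = parsedate_to_datetime(value.strip())
    except (TypeError, ValueError):   # older Pythons raise TypeError, newer ValueError
        return None
    if parsed.tzinfo is None:         # "-0000" or an unknown zone name: assume UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

For example, parse_rfc822("Mon, 09 May 2005 17:30:00 PDT") yields 00:30 UTC on 10 May 2005.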
The exception handling took some getting used to, but it’s great to have in a script. On the other hand, I kept turning things into strings when I didn’t want strings. I stumbled into the right syntax for namespace-qualified elements ($_.{dc:date} in the script), but I do like the way it works: I didn’t want to throw XmlNamespaceManager mumbo jumbo into a script.
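For contrast, Python's ElementTree offers the same two choices. You can pass an explicit prefix-to-URI map, which plays roughly the role XmlNamespaceManager plays in .NET. Or, since Python 3.8, you can use a {*} wildcard that matches an element by its local name in any namespace. Here is a small sketch with a made-up item:

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

ITEM = ET.fromstring(
    '<item xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<title>Hello</title><dc:date>2005-05-10T08:00:00Z</dc:date></item>"
)

# Option 1: an explicit prefix map, the moral equivalent of XmlNamespaceManager.
with_prefix = ITEM.findtext("dc:date", namespaces={"dc": DC_NS})

# Option 2 (Python 3.8+): match on the local name "date" in any namespace.
by_local_name = ITEM.findtext("{*}date")

# A plain "date" lookup finds nothing, because the element is namespace-qualified.
unqualified = ITEM.findtext("date")
```

The wildcard is closest in spirit to what the script does: name the element you want and let the library worry about the namespace.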
I’d like to see someone who actually knows Monad take the script and turn it into one of those five-line masterpieces full of pipe symbols and regular expressions.
$opmldoc = [xml]$(get-content $args[0])
$webclient = new-object System.Net.WebClient
# a feed with nothing newer than this is reported as stale
$cutoff = [DateTime]::get_Now().AddDays(-30)
foreach($feed in $opmldoc.opml.body.outline)
{
    # reset per-feed state so a failed download can't reuse the previous feed's data
    $raw = $null
    $date = $null
    $doc = $null
    trap [System.Net.WebException]
    {
        write-host "Web error fetching feed for " $feed.title
        write-host " Error: " $_.Exception.Status
        continue
    }
    trap [System.Exception]
    {
        write-host "Choked on: " $feed.title
        continue
    }
    $raw = $webclient.DownloadString($feed.xmlUrl)
    if($raw -ne $null)
    {
        # skip anything before the first "<" (the MSDN feed has goofy leading chars)
        $doc = [xml]$raw.SubString($raw.IndexOf("<"))
    }
    # see if this is RSS
    if($doc.rss -ne $null)
    {
        if($doc.rss.channel.item[0].pubDate -ne $null)
        {
            # uses <pubDate>
            # sort items by date and pick the most recent
            $date = [DateTime](
                $doc.rss.channel.item |
                sort-object @{ e = { [DateTime]$_.pubDate }; asc=$false}
            )[0].pubDate
        }
        if($doc.rss.channel.item[0].{dc:date} -ne $null)
        {
            # uses <dc:date>
            $date = [DateTime](
                $doc.rss.channel.item |
                sort-object @{ e = { [DateTime]$_.{dc:date} }; asc=$false}
            )[0].{dc:date}
        }
        # if we still don't have a date, try <lastBuildDate> (only if it exists,
        # so a missing element isn't cast to DateTime.MinValue and flagged as stale)
        if($date -eq $null -and $doc.rss.channel.lastBuildDate -ne $null)
        {
            $date = [DateTime]$doc.rss.channel.lastBuildDate
        }
    }
    # check for RDF (RSS 1.0)
    elseif($doc.{rdf:RDF} -ne $null)
    {
        $date = [DateTime](
            $doc.{rdf:RDF}.item |
            sort-object @{ e = { [DateTime]$_.{dc:date} }; asc=$false}
        )[0].{dc:date}
    }
    # check for Atom
    elseif($doc.feed -ne $null)
    {
        $date = [DateTime](
            $doc.feed.entry |
            sort-object @{ e = { [DateTime]$_.issued }; asc=$false}
        )[0].issued
    }
    if($date -eq $null)
    {
        write-host "Did not parse date from " $feed.title
    }
    elseif($date -lt $cutoff)
    {
        write-host "Stale feed alert!! : " $feed.title
    }
}
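One limitation worth knowing: the loop only visits the top-level <outline> elements under <body>. OPML files can group feeds into folders by nesting outlines, and those nested feeds would be skipped. Here is a Python sketch of the same report that walks outlines at any depth and applies the same 30-day cutoff. To keep it offline, the per-feed lookup is a hypothetical latest_date callback that returns the newest entry date as an aware datetime, or None. The function names are illustrative, not from the script.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone


def iter_feeds(opml_text):
    """Yield (title, xmlUrl) for every outline that has an xmlUrl, at any depth."""
    body = ET.fromstring(opml_text).find("body")
    if body is None:
        return
    for outline in body.iter("outline"):   # iter() also descends into folder outlines
        url = outline.get("xmlUrl")
        if url:                            # folders have no xmlUrl; skip them
            title = outline.get("title") or outline.get("text") or url
            yield title, url


def report(opml_text, latest_date, now=None, days=30):
    """Return the script's two kinds of alert as a list of messages.

    latest_date(url) is a hypothetical callback returning the newest entry
    date as an aware datetime, or None if no date could be parsed.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    messages = []
    for title, url in iter_feeds(opml_text):
        date = latest_date(url)
        if date is None:
            messages.append(f"Did not parse date from {title}")
        elif date < cutoff:
            messages.append(f"Stale feed alert!! : {title}")
    return messages
```

In a real run, latest_date would fetch each feed (with urllib.request.urlopen, for instance), pick the newest date, and catch network errors the way the script's trap blocks do.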