Of Orcs and Software Craftsmanship

Monday, January 27, 2014

Sometimes people will approach me and ask, “What is it like to be a software craftsman?” I’ll usually answer with: “Pffft, don’t ask me, I just cranked out 500 lines of script that are harder to read than Finnegans Wake.” At times, though, I like to imagine what it might be like to be a software craftsperson.

For example, a few Saturdays ago I was excited about a new product and found myself chipping away at some features that would require parsing XML files. XML like the following.

<?xml version="1.0" encoding="utf-8" ?>
<SomeDocument
  xmlns="urn:my.org-v159"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
  <Container>
    <Section id="10399">
      <!-- other stuff that might have a Section -->
    </Section>
  </Container>
</SomeDocument>

Except the real XML isn’t like the XML listed here.

The XML listed here is like an orc in an animated Disney movie. It has a funny pink nose and buck teeth making it look more vulnerable than dangerous.

The real XML is like an orc in a movie derived from a J.R.R. Tolkien novel. It is ugly, angry, and nearly incomprehensible when communicating. Its creator is a Standards Committee committed to enslaving humans, elves, software developers, and dwarves. It is also just one orc in an army of repugnant orcs that cover a small continent.

Using this simple example, though, let’s pretend the first goal is to retrieve the value of the id attribute in the Section element.

public class TheDocument
{
    public TheDocument(XDocument document)
    {
        Id = int.Parse(document
            .Element(_myOrg + "SomeDocument")
            .Element(_myOrg + "Container")
            .Element(_myOrg + "Section")
            .Attribute("id").Value);
    }

    public int Id { get; protected set; }      
    readonly XNamespace _myOrg = "urn:my.org-v159";
}

At this point a real craftsperson might realize that the id value is one piece of data in 376 total pieces of data that must be retrieved from the orc army. Since this one data point required 5 lines of code, the total processing would require 1,880 lines of code, which sounds excessive. Moreover, 1880 was a leap year, and leap years always create bugs in software, so the number itself is a bad omen and a sign that more work needs to be done. At least this is how I imagine a craftsperson to think.

I also imagine a real craftsman knows syntax and APIs pretty well, and in areas they don’t know well they will dig into documentation and figure out better ways of doing things. In this case, using a different query method and the explicit conversion operator of an XAttribute cuts the LoC per data point to 4.

Id = (int)document
    .Descendants(_myOrg + "Section")
    .First()
    .Attribute("id");
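For readers who haven't met it, the conversion in question is one of the explicit operators LINQ to XML defines on XAttribute. A minimal sketch, assuming an inline copy of the sample XML and hypothetical helper names (RequiredInt, OptionalInt), shows the behavior that matters once orcs start sending incomplete documents: (int) throws ArgumentNullException when the attribute is missing, while (int?) quietly returns null.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Sketch: how XAttribute's explicit conversion operators behave.
// The sample XML and the helper names are illustrative, not from the post.
public static class AttributeConversionDemo
{
    static readonly XNamespace MyOrg = "urn:my.org-v159";

    public const string SampleXml =
        "<SomeDocument xmlns='urn:my.org-v159'>" +
        "<Container><Section id='10399' /></Container>" +
        "</SomeDocument>";

    static XElement FirstSection(XDocument document)
    {
        // Descendants searches the whole tree; First throws if nothing matches.
        return document.Descendants(MyOrg + "Section").First();
    }

    // (int) parses the attribute text and throws ArgumentNullException
    // when the attribute itself is missing.
    public static int RequiredInt(XDocument document, string attributeName)
    {
        return (int)FirstSection(document).Attribute(attributeName);
    }

    // (int?) returns null for a missing attribute instead of throwing.
    public static int? OptionalInt(XDocument document, string attributeName)
    {
        return (int?)FirstSection(document).Attribute(attributeName);
    }
}
```

The nullable overloads are one way to model optional data with LINQ to XML alone, though they still don't say which part of the path went missing.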

However, 4 * 376 is 1,504, and 1504 was also a leap year. Coincidence? I don’t believe a craftsperson believes in coincidence, but they might believe in numerology, and regardless of their superstitions they certainly believe in error conditions. A craftsperson will understand the probability of receiving a proper XML document is low, because the Standards Committee, in an attempt to provide the ability to describe every possible variation of orc and sub-orc, built an XML schema so complex and full of malice that even the most sophisticated code generation tools will puke electronic bits on the floor when fed the first 5,000 lines of the associated schema file.

A craftsperson will know, then, that it is a good idea to look at the types of error messages the software might produce if, for example, the Section element doesn’t exist.

Unhandled Exception: System.InvalidOperationException: Sequence contains no elements

These are the types of error messages that make debugging software like debugging a 2 month old baby. You know the baby is unhappy because no living being makes these types of noises when content, but the baby can’t tell you exactly what is wrong; it can only communicate with primitive shrieks that keep you awake at night.

A real craftsperson, I imagine, knows the business as well as the domain and the technology. And a real craftsperson will realize these types of blaring baby error messages will occur commonly and will never be solved without the assistance of a developer armed with a stack trace and the source code. It is therefore in the best interest of the business to spend additional time crafting a better solution.

But how to solve the problem?

A real craftsperson, I imagine, might take a step back and start to think about alternative approaches. A real craftsperson has more than a few years of experience under their belt, and will remember stories from a past age when orcs first became a tangible nuisance. It was then that the elders traveled to the Standard Mountain and forged a mighty blade. Behold! Its name is XPath the Orc Slayer.

document.XPathEvaluate("number(mo:SomeDocument/mo:Container/mo:Section/@id)")

XPathEvaluate is a strange beast. While the documentation for other XML-related APIs drones on for paragraphs about the minutiae of an XML Infoset, the documentation for XPathEvaluate consists of a single terse sentence.

Evaluates an XPath expression.

Regular developers like me look at documentation like this and scoff. What sort of laziness is this? I already know this method evaluates an XPath expression because that’s the name of the method, for Lórien’s sake! But I’ve always had a suspicion that software craftspeople maintain a cabal and communicate through a series of secret signs. Documentation like this must be a secret sign leading to a powerful tool. Powerful tools can’t just fall from the sky like rain; they have to be hidden in plain sight so that only powerful people can find them, and only in times of darkness and dire need. Thus, a real software craftsperson will instantly recognize XPathEvaluate as a useful but too-generic tool that needs a little bit of gift wrapping to provide real value in an application.
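Much of what the terse documentation leaves unsaid lives in the return type: XPathEvaluate returns object, and its return-value documentation says that object can be a bool, a double, a string, or an IEnumerable of nodes, depending on the expression. A minimal sketch, assuming the sample XML inline and using the built-in XmlNamespaceManager as the resolver (the class XPathEvaluateDemo and its methods are hypothetical):

```csharp
using System;
using System.Collections;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

// Sketch: XPathEvaluate's result type depends on the expression.
// XmlNamespaceManager (a built-in IXmlNamespaceResolver) maps the "mo" prefix.
public static class XPathEvaluateDemo
{
    public const string SampleXml =
        "<SomeDocument xmlns='urn:my.org-v159'>" +
        "<Container><Section id='10399' /></Container>" +
        "</SomeDocument>";

    public static object Evaluate(string xpath)
    {
        XDocument document = XDocument.Parse(SampleXml);

        var resolver = new XmlNamespaceManager(new NameTable());
        resolver.AddNamespace("mo", "urn:my.org-v159");

        // number(), string() and boolean() come back as double, string and bool;
        // a plain location path comes back as a sequence of nodes.
        return document.XPathEvaluate(xpath, resolver);
    }

    // A location path yields an IEnumerable of nodes; take the first one.
    // A boxed double or bool is not IEnumerable, so this returns null for those.
    public static XObject FirstNode(string xpath)
    {
        var nodes = Evaluate(xpath) as IEnumerable;
        return nodes == null ? null : nodes.OfType<XObject>().FirstOrDefault();
    }
}
```

A wrapper that wants nodes therefore has to test for the IEnumerable case first, which is what the `as IEnumerable` check in XPathToValue does.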

First, a namespace resolver for the XPath expressions. A namespace resolver is basically a lookup table the XPath engine uses to map a prefix like mo to the real namespace URI.

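using System.Collections.Generic;
using System.Linq;
using System.Xml;

class MyOrgXmlNamespaceResolver : IXmlNamespaceResolver
{
    public MyOrgXmlNamespaceResolver()
    {
        _namespaceMap = new Dictionary<string, string>()
        {
            { "mo", "urn:my.org-v159"},
            { "xsi", "http://www.w3.org/2001/XMLSchema-instance"}
        };
    }

    public IDictionary<string, string> GetNamespacesInScope(XmlNamespaceScope scope)
    {
        return _namespaceMap;
    }

    // Prefix -> namespace URI; returns null for an unknown prefix,
    // which is the convention IXmlNamespaceResolver callers expect.
    public string LookupNamespace(string prefix)
    {
        string namespaceName;
        return _namespaceMap.TryGetValue(prefix, out namespaceName) ? namespaceName : null;
    }

    // Namespace URI -> prefix: return the Key (the prefix), not the Value.
    public string LookupPrefix(string namespaceName)
    {
        return _namespaceMap
            .Where(kvp => kvp.Value == namespaceName)
            .Select(kvp => kvp.Key)
            .FirstOrDefault();
    }

    private readonly IDictionary<string, string> _namespaceMap;
}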
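Why the resolver is needed at all: the sample document puts its elements in a default namespace (xmlns="urn:my.org-v159"), but XPath 1.0 has no default element namespace, so an unprefixed name in an expression only matches elements in no namespace. A minimal sketch with illustrative names, again leaning on XmlNamespaceManager, which implements IXmlNamespaceResolver:

```csharp
using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

// Sketch: in XPath 1.0 an unprefixed element name means "no namespace",
// so a document relying on a default xmlns can only be queried through
// a prefix that the resolver maps to that URI. Names are illustrative.
public static class DefaultNamespaceDemo
{
    public const string SampleXml =
        "<SomeDocument xmlns='urn:my.org-v159'>" +
        "<Container><Section id='10399' /></Container>" +
        "</SomeDocument>";

    // Looks for SomeDocument in *no* namespace, so nothing matches.
    public static XElement SelectWithoutPrefix()
    {
        XDocument document = XDocument.Parse(SampleXml);
        return document.XPathSelectElement("SomeDocument/Container/Section");
    }

    // The "mo" prefix is bound to the document's default namespace URI.
    public static XElement SelectWithPrefix()
    {
        XDocument document = XDocument.Parse(SampleXml);
        var resolver = new XmlNamespaceManager(new NameTable());
        resolver.AddNamespace("mo", "urn:my.org-v159");
        return document.XPathSelectElement(
            "mo:SomeDocument/mo:Container/mo:Section", resolver);
    }
}
```

A dedicated resolver class like the one above mostly buys a single place to keep the prefix table; XmlNamespaceManager offers the same lookups through AddNamespace.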

Also, a specialized exception class.

public class XmlParsingException : Exception
{
    public XmlParsingException(string xpath)
        :base(String.Format(NoLocate, xpath))
    {
    }

    public XmlParsingException(string xpath, Exception innerException)
        :base(String.Format(Problem, xpath), innerException)
    {            
    }

    const string NoLocate = "Could not locate {0}";
    const string Problem = "Problem locating {0}, see inner exception for details";
}

And finally, some syntactic sugar to dress XPathEvaluate like a tailor.

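using System;
using System.Collections;
using System.ComponentModel;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XmlHelpers
{
    public static T XPathToValue<T>(this XNode node, string xpath)
    {
        try
        {
            // XPathEvaluate returns object; a node-set comes back as an IEnumerable.
            var queryResult = node.XPathEvaluate(xpath, _namespaceResolver)
                as IEnumerable;

            if (queryResult != null)
            {
                var firstResult = queryResult.OfType<XObject>().FirstOrDefault();
                if (firstResult != null)
                {
                    string value = "";
                    if (firstResult is XAttribute)
                    {
                        value = ((XAttribute)firstResult).Value;
                    }
                    else if (firstResult is XElement)
                    {
                        value = ((XElement)firstResult).Value;
                    }
                    // Convert the raw text into the requested type (int, string, ...).
                    var converter = TypeDescriptor.GetConverter(typeof(T));
                    return (T)converter.ConvertFrom(value);
                }
            }
        }
        catch (Exception ex)
        {
            throw new XmlParsingException(xpath, ex);
        }

        // Nothing matched the expression.
        throw new XmlParsingException(xpath);
    }

    static readonly IXmlNamespaceResolver _namespaceResolver = new MyOrgXmlNamespaceResolver();
}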
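The last step of XPathToValue hands the extracted string to a TypeConverter. A small sketch of just that step (RawValueConverter and its methods are illustrative names, not part of the post) shows what the generic T buys and where it can bite:

```csharp
using System;
using System.ComponentModel;

// Sketch of the conversion step at the end of XPathToValue: look up a
// TypeConverter for T and convert the raw string taken from the XML.
public static class RawValueConverter
{
    public static T ConvertRaw<T>(string rawValue)
    {
        TypeConverter converter = TypeDescriptor.GetConverter(typeof(T));
        // ConvertFrom uses the current culture; ConvertFromInvariantString
        // is the culture-neutral alternative for XML data.
        return (T)converter.ConvertFrom(rawValue);
    }

    // Reports whether the conversion throws; XPathToValue wraps such
    // failures in an XmlParsingException.
    public static bool Fails<T>(string rawValue)
    {
        try
        {
            ConvertRaw<T>(rawValue);
            return false;
        }
        catch (Exception)
        {
            return true;
        }
    }
}
```

One design consequence: TypeConverter follows .NET parsing rules rather than XML Schema's, so an xs:boolean written as 1 fails here, whereas XmlConvert.ToBoolean accepts it.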

A real software craftsperson, I think, would start to worry because the amount of code here (as well as the cyclomatic complexity) is a bit much. After all, it took only 4 lines of brute force code to retrieve a single integer value from the orc army. But a software craftsperson, I imagine, is always looking for the tangible value in a piece of code, and there are two ways to see if this code provides any value. The first test of value is to consume the code.

public class TheDocument
{
    public TheDocument(XDocument document)
    {
        Id = document.XPathToValue<int>("mo:SomeDocument/mo:Container/mo:Section/@id");
    }

    public int Id { get; protected set; }
}

The consumption test passes. A developer can now focus on slaying orcs instead of XML APIs. The next test is to run the code, particularly with bad data, like a missing Section element.

Could not locate mo:SomeDocument/mo:Container/mo:Section/@id

Unlike the previous blaring baby error message (sequence contains no elements), this error is grown up. Also, it will allow a developer to switch from a mindset of “is this a bug in my code?” to “this is probably a bug in your XML!”, and upon further inspection, 99% of the time there will be a bug in the XML, and the developer can point out the error with an email like the following.

Version 3 revision 2402 of the specification (the only one we officially support) clearly states that the id attribute of the Section element must exist inside a Container of SomeDocument, except for the circumstances documented on pages 507-512, 693, 701, and the entirety of Appendix E. This is obvious to most people, but since this is the third time you’ve sent an XML file with this same error YOU ARE CLEARLY A MORON.

I’m told that some developers write emails like these because they are filled with hubris, but I don’t believe everything I hear.

However, I do believe that a software craftsperson will think the above approach has some merit, because the code creates an easier API than XPathEvaluate and reduces the number of blaring baby error messages. The orcs must be trembling, but I think a software craftsperson will continue to look at the big picture and realize the code still has two problems.

  1. It’s ugly
  2. It doesn’t support all the features needed to parse all the required information.

With regard to #2, the code will sometimes need to parse integers and strings from both attributes and elements. The code also needs to parse individual elements, and sometimes a collection of elements. To make things even trickier, some information is required and some is optional. All of this might point a real software craftsperson toward an extension method that serves as a simple gateway to an object that can encapsulate and manage more complexity.

public static class XmlHelpers
{
    public static XPathEvaluator XPath(this XNode node, string xpath)
    {
        return new XPathEvaluator(node, xpath);   
    }        
}

The XPathEvaluator is responsible for parsing different types of values, and throwing exceptions when required information isn’t present.

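using System;
using System.Collections;
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public class XPathEvaluator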
{
    public XPathEvaluator(XNode node, string xpath)
    {
        _node = node;
        _xpath = xpath;
        _required = true;
        _rawQueryResult = ExecuteExpression();
    }

    public XPathEvaluator Optional()
    {
        _required = false;
        return this;
    }

    public XElement Element()
    {
        var result = _rawQueryResult.OfType<XElement>().FirstOrDefault();
        Validate(result);
        return result;
    }

    public IList<XElement> Elements()
    {
        var result = _rawQueryResult.OfType<XElement>().ToList();
        Validate(result);
        return result;
    }

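    public T Value<T>()
    {
        T result = default(T);
        try
        {
            var firstEntry = _rawQueryResult.FirstOrDefault();
            if (firstEntry != null)
            {
                var rawResult = GetRawResult(firstEntry);
                Validate(rawResult);
                // An optional entry with no usable text yields default(T)
                // instead of asking the converter to convert null.
                if (rawResult == null)
                {
                    return result;
                }
                return ConvertResult<T>(rawResult);
            }
        }
        catch (XmlParsingException)
        {
            // Already descriptive; don't wrap it in a second XmlParsingException.
            throw;
        }
        catch (Exception ex)
        {
            throw new XmlParsingException(_xpath, ex);
        }
        if (_required)
        {
            throw new XmlParsingException(_xpath);
        }
        return result;
    }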

    private static T ConvertResult<T>(string rawResult)
    {
        var converter = TypeDescriptor.GetConverter(typeof (T));
        return (T) converter.ConvertFrom(rawResult);
    }

    private string GetRawResult(XObject firstEntry)
    {
        string rawResult = null;
        if (firstEntry != null)
        {
            if (firstEntry is XAttribute)
            {
                rawResult = ((XAttribute) firstEntry).Value;
            }
            else if (firstEntry is XElement)
            {
                rawResult = ((XElement) firstEntry).Value;
            }
        }
        return rawResult;
    }       

    void Validate(XElement element)
    {
        if (_required)
        {
            if (element == null)
            {
                throw new XmlParsingException(_xpath);
            }
        }
    }

    void Validate(IList<XElement> elements)
    {
        if (_required)
        {
            if (elements == null || !elements.Any())
            {
                throw new XmlParsingException(_xpath);
            }
        }
    }

    void Validate(string value)
    {
        if (_required)
        {
            if (value == null)
            {
                throw new XmlParsingException(_xpath);
            }
        }
    }

    IList<XObject> ExecuteExpression()
    {
        try
        {
            var result = (IEnumerable) _node.XPathEvaluate(_xpath, _namespaceResolver);
            return result.OfType<XObject>().ToList();
        }
        catch (Exception ex)
        {
            throw new XmlParsingException(_xpath, ex);
        }
    }

    readonly XNode _node;
    readonly string _xpath;
    bool _required;
    IEnumerable<XObject> _rawQueryResult;
    static IXmlNamespaceResolver _namespaceResolver = new MyOrgXmlNamespaceResolver();
}

This is quite a bit of code, but there are many orcs on the field of battle, and now developers can fight them with relatively simple code.

public class TheDocument
{
    public TheDocument(XDocument document)
    {
        Id = document.XPath("mo:SomeDocument/mo:Container/mo:Section/@id").Value<int>();
        Name = document.XPath("mo:SomeDocument/mo:Container/mo:Section/@name").Value<string>();
        Documentation = document.XPath("mo:SomeDocument/mo:Documentation").Optional().Value<string>();
        Container = document.XPath("mo:SomeDocument/mo:Container").Element();
        Extras = document.XPath("mo:SomeDocument/mo:Extra").Elements();
        Comments = document.XPath("mo:SomeDocument/mo:Comment").Optional().Elements();
    }

    public int Id { get; protected set; }
    public string Name { get; protected set; }
    public string Documentation { get; protected set; }
    public XElement Container { get; protected set; }
    public IList<XElement> Extras { get; protected set; }
    public IList<XElement> Comments { get; protected set; }
}

This is the type of thought process I imagine a software craftsperson might have, but I don’t know for sure.

It could all be rubbish.

And I might be an orc.


Comments
David Monday, January 27, 2014
"These are the types of error messages that make debugging a software like debugging a 2 month old baby" is the best analogy ever, because all the developers who wear another hat as a parent know what this is like. I haven't read a blog post that was this much fun in a long long time! Thanks very much :-)
R Monday, January 27, 2014
In my mind, a true craftsman would throw down his tools and storm out the building when asked to use XML.
Jim Monday, January 27, 2014
Enjoyed the Post, I will try out some of your craftsman tips on my next XML project.
jdogg13 Monday, January 27, 2014
What about using Linq-to-XML?
Stacy Monday, January 27, 2014
Man, if someone wrote this much code for that task for my company, I would fire them in a New York minute!
Scott Monday, January 27, 2014
@jdogg13 - Started with LINQ to XML, not easy enough. @Stacy - If you made me stick with the original solution requiring 1,880 lines of code, I'd quit faster than you could fire me.
Sam Judson Monday, January 27, 2014
A true software craftsman bookmarks this code and then rips it off next time he has to retrieve random crap from XML.
joshka Tuesday, January 28, 2014
TL;DR should have used Java ;)
Marcel Popescu Tuesday, January 28, 2014
Great article. I wonder if there's anybody at Microsoft who uses XML (from C#) because we've *all* had to write something like this. I think I wrote XML helpers at three different companies so far. Annoying.
Paul Vassu Thursday, January 30, 2014
Very nice solution and very entertaining read as well. Thanks for sharing it! I was about to plead for a static "helper" class approach, but I rest my case as I find your general solution better in any aspect. Only one remark/question though: _rawQueryResult should be also IList<..> because it does contain a List, right?
Anish Thursday, January 30, 2014
This is possibly one of the most entertaining articles I've ever read on the internet. Good work, and I imagine such a software craftsman will slay many orcs.
Leonardo Sunday, February 2, 2014
"The real XML is like an orc in a movie derived from a J.R.R Tolkien novel. It is ugly, angry, and nearly incomprehensible when communicating. Its creator is a Standards Committee committed to enslaving humans, elves, software developers, and dwarves." Were you referring to ARTS XML by any chance?
Scott Sunday, February 2, 2014
@Leonardo - HL7, actually :)
by K. Scott Allen