From LINQ to XPath and Back Again

Friday, June 5, 2009

Let’s say you wanted to select the parts for a Lenovo X60 laptop from the following XML.

<Root>
  <Manufacturer Name="Lenovo">
    <Model Name="X60">
      <Parts>
        <!-- ... -->
      </Parts>
    </Model>
    <Model Name="X200">
      <!-- ... -->
    </Model>
  </Manufacturer>
  <Manufacturer Name="..." />
  <Manufacturer Name="..." />
  <Manufacturer Name="..." />
  <Manufacturer Name="..." />
</Root>

If you know LINQ to XML, you might load up an XDocument and start the party with a brute force approach:

var parts = xml.Root
               .Elements("Manufacturer")
                   .Where(e => e.Attribute("Name").Value == "Lenovo")
               .Elements("Model")
                   .Where(e => e.Attribute("Name").Value == "X60")
               .Single()
               .Element("Parts");
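
For anyone who wants to run the query, here is a minimal self-contained sketch. It assumes the XML is parsed from an inline string; the Part element and the "Other" manufacturer are made-up placeholders, not data from the original document. It also casts the attribute to string instead of reading .Value, a small variation that treats an element without a Name attribute as a non-match rather than throwing a NullReferenceException.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Trimmed copy of the sample XML. The <Part> element and the "Other"
// manufacturer are hypothetical placeholders for illustration only.
var xml = XDocument.Parse(@"
<Root>
  <Manufacturer Name=""Lenovo"">
    <Model Name=""X60"">
      <Parts>
        <Part Name=""Battery"" />
      </Parts>
    </Model>
    <Model Name=""X200"" />
  </Manufacturer>
  <Manufacturer Name=""Other"" />
</Root>");

// Same shape as the query above. The (string) cast yields null when the
// Name attribute is absent, so the comparison is simply false.
var parts = xml.Root
               .Elements("Manufacturer")
                   .Where(e => (string)e.Attribute("Name") == "Lenovo")
               .Elements("Model")
                   .Where(e => (string)e.Attribute("Name") == "X60")
               .Single()   // throws if zero or several models match
               .Element("Parts");

Console.WriteLine(parts);
```

Single() still throws an InvalidOperationException when no model (or more than one) matches, which may or may not be the behavior you want.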

But the code is ugly and makes you long for the days when XPath ruled the planet. Fortunately, you can combine XPath with LINQ to XML. The System.Xml.XPath namespace includes some XPath-specific extension methods, like XPathSelectElement:

string xpath = "Manufacturer[@Name='Lenovo']/Model[@Name='X60']/Parts";
var parts = xml.Root.XPathSelectElement(xpath);
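
A runnable sketch of the same idea follows, again assuming a trimmed inline copy of the XML (the "NoSuchMaker" name is invented for the demonstration). The point it tries to show is one behavioral difference from the LINQ version: XPathSelectElement returns null when nothing matches, where Single() would throw.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;   // brings the XPathSelectElement extension into scope

// Trimmed copy of the sample XML, for illustration only.
var xml = XDocument.Parse(@"
<Root>
  <Manufacturer Name=""Lenovo"">
    <Model Name=""X60"">
      <Parts />
    </Model>
  </Manufacturer>
</Root>");

// The path is evaluated relative to the element it is called on (Root here).
string xpath = "Manufacturer[@Name='Lenovo']/Model[@Name='X60']/Parts";
XElement parts = xml.Root.XPathSelectElement(xpath);
Console.WriteLine(parts != null);    // True

// A path that matches nothing yields null instead of an exception.
// "NoSuchMaker" is a made-up name that is not in the sample data.
XElement missing = xml.Root.XPathSelectElement(
    "Manufacturer[@Name='NoSuchMaker']/Model[@Name='X60']/Parts");
Console.WriteLine(missing == null);  // True
```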

Now the query is a bit more readable (at least to some), but let’s see what we can do with extension methods.

static class ComputerManufacturerXmlExtensions
{
    public static XElement Manufacturer(this XElement element, string name)
    {
        return element.Elements("Manufacturer")
                      .Where(e => e.Attribute("Name").Value == name)
                      .Single();
    }

    public static XElement Model(this XElement element, string name)
    {
        return element.Elements("Model")
                      .Where(e => e.Attribute("Name").Value == name)
                      .Single();
    }

    public static XElement Parts(this XElement element)
    {
        return element.Element("Parts");
    }
}

Now the query is succinct:

var parts = xml.Root.Manufacturer("Lenovo").Model("X60").Parts();
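
Manufacturer and Model differ only in the element name, so one possible refactoring is to route both through a shared helper. The sketch below is just that, a sketch: ChildNamed is a hypothetical name (it is not part of LINQ to XML), the inline XML is a trimmed copy of the sample, and the string cast replaces .Value so that elements without a Name attribute are skipped rather than causing a NullReferenceException.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Trimmed copy of the sample XML, for illustration only.
var xml = XDocument.Parse(@"
<Root>
  <Manufacturer Name=""Lenovo"">
    <Model Name=""X60"">
      <Parts />
    </Model>
    <Model Name=""X200"" />
  </Manufacturer>
</Root>");

var parts = xml.Root.Manufacturer("Lenovo").Model("X60").Parts();
Console.WriteLine(parts != null);   // True

static class ComputerManufacturerXmlExtensions
{
    // Hypothetical shared helper: the single child element with the given
    // element name whose Name attribute equals the requested value.
    private static XElement ChildNamed(this XElement element, XName childName, string name)
    {
        return element.Elements(childName)
                      .Single(e => (string)e.Attribute("Name") == name);
    }

    public static XElement Manufacturer(this XElement element, string name)
    {
        return element.ChildNamed("Manufacturer", name);
    }

    public static XElement Model(this XElement element, string name)
    {
        return element.ChildNamed("Model", name);
    }

    public static XElement Parts(this XElement element)
    {
        return element.Element("Parts");
    }
}
```

The call site is unchanged; only the duplicated Where/Single logic moves into one place.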

Combine an XSD file with T4 code generation and you’ll have all the extension methods you’ll ever need for pretty XML queries...


Comments
Mike Ohlsen Friday, June 5, 2009
What would be the performance implications for running the 3 linq queries vs the one big one originally? I'm sure not much for this example, but if you had some large XML docs, would this be too costly? Or does the one big linq query actually cost the same?
scott Friday, June 5, 2009
@Mike - I haven't taken any measurements. I suspect it will be slower, too. If I get a chance over the weekend, I'll post some numbers.
J Wynia Friday, June 5, 2009
Thanks. This is the first article on extension methods that made me actually want to look at using them in my own projects.

I don't care if it's faster or not. Hardware's quite a bit cheaper than my team's time is and from what I've seen MORE than half of all things brought up as performance problems turn out not to be problems in the final result. That, combined with the fact that the actual performance problems on every project I've been on *weren't* brought up in advance by anyone says that worrying about performance before trying something is nearly always wasted time (and that time's expensive).
Ian Patrick Hughes Friday, June 5, 2009
Hmmmm. I would have to say that I am with J Wynia on this one. As developers we do (and should) obsess about the performance statistics of our applications.

I'll sacrifice the label "fastest approach" for code readability any day of the week and twice on Sundays. The expense of a few microseconds to the user is null, but developer time is expensive and best spent somewhere else besides grokking someone's "leanest" code.
Erick Thompson Saturday, June 6, 2009
Due to the way that XLINQ "pipes" operations, having the extension methods shouldn't be much slower than having a single LINQ statement. The additional Single calls might make it a tiny bit slower - it would probably be faster to use First instead.

Aside from that, the main difference is that you're going to get a hit from the extra stack frames, but that is too tiny to worry about.
jmorris Sunday, June 7, 2009
This is very useful, I have been using XDocument quite a bit lately and was missing XPath in a big way. XPath is @ss ugly but it's fairly intuitive and very useful.
Dustin Campbell Wednesday, June 10, 2009
Your extension method solution is excellent, but I thought it useful to point out the usefulness of the Visual Basic 9 XML literal syntax for scenarios like this one

(From m in xml.<Manufacturer> _
Where m.@Name = "Lenovo" _
From n in m.<Model> _
Where n.@Name = "X60" _
Select n.<Parts>).Single()
scott Wednesday, June 10, 2009
@Dustin

Always have to show off the "VB is easier" stuff, now don't you?
anon_anon Wednesday, November 18, 2009
If XPath performance is important to you, look at vtd-xml
http://vtd-xml.sf.net
by K. Scott Allen