What’s Wrong With This Code? (#20)

Wednesday, September 3, 2008 by scott

Mike had to model answers. Yes or no answers, date and time answers - all sorts of answers. One catch was that any answer could be “missing” or could be “empty”. Both values had distinct meanings in the domain. An interface definition fell out of the early iterative design work:

public interface IAnswer
{
    bool IsMissing { get; }
    bool IsEmpty { get; }
}

Mike was prepared to implement a DateTimeAnswer class, but first a test:

[TestMethod]
public void Can_Represent_Empty_DateTimeAnswer()
{
    DateTimeAnswer emptyAnswer = new DateTimeAnswer();
    Assert.IsTrue(emptyAnswer.IsEmpty);
}

After a little work, Mike had a class that could pass the test:

public class DateTimeAnswer : IAnswer
{       
    public bool IsEmpty
    {
        get { return Value == _emptyAnswer; }
    }

    public bool IsMissing
    {
        get { return false; } // todo 
    }

    public DateTime Value { get; set; }

    static DateTime _emptyAnswer = DateTime.MinValue;
    static DateTime _missingAnswer = DateTime.MaxValue;
}

After sitting back and looking at the code, Mike realized there were a couple of facets of the class he didn’t like:

  • A client of the class needed to know which values of DateTime were used internally to represent empty and missing answers.  
  • The class felt like it should produce immutable objects, and thus the set-able Value property felt wrong.

Mike returned to his test project, and changed his first test to agree with his idea of how the class should work. Mike figured adding a couple well known DateTimeAnswer objects (named Empty and Missing) would get rid of the magic DateTime values in client code.

[TestMethod]
public void Can_Represent_Empty_DateTimeAnswer()
{
    DateTimeAnswer emptyAnswer = DateTimeAnswer.Empty;
    Assert.IsTrue(emptyAnswer.IsEmpty);
}

Feeling pretty confident, Mike returned to his DateTimeAnswer class and added a constructor, changed the Value property to use a protected setter, implemented IsMissing, and published the two well known DateTimeAnswer objects based on his previous code:

public class DateTimeAnswer : IAnswer
{       
    public DateTimeAnswer (DateTime value)
    {
        Value = value;
    }

    public bool IsEmpty
    {
        get { return Value == _emptyAnswer; }
    }

    public bool IsMissing
    {
        get { return Value == _missingAnswer; }
    }

    public DateTime Value { get; protected set; }
    public static DateTimeAnswer Empty = new DateTimeAnswer(_emptyAnswer);
    public static DateTimeAnswer Missing = new DateTimeAnswer(_missingAnswer);
    static DateTime _emptyAnswer = DateTime.MinValue;
    static DateTime _missingAnswer = DateTime.MaxValue;    
}

Mike’s test passed. Mike was so confident about his class that he never wrote a test for IsMissing. It was just too easy – what could possibly go wrong? Imagine his surprise when someone else wrote the following test, and it failed!

[TestMethod]
public void Can_Represent_Missing_DateTimeAnswer()
{
    DateTimeAnswer missingAnswer = DateTimeAnswer.Missing;
    Assert.IsTrue(missingAnswer.IsMissing);
}

What went wrong?

Stupid LINQ Tricks

Tuesday, September 2, 2008 by scott

Over a month ago I did a presentation on LINQ and promised a few people I’d share the code from the session. Better late than never, eh?

We warmed up by building our own filtering operator to use in a query. The operator takes an Expression<Predicate<T>>, which we need to compile before invoking the predicate inside.

public static class MyExtensions
{
    public static IEnumerable<T> Where<T>(
                  this IEnumerable<T> sequence,
                  Expression<Predicate<T>> filter)
    {
        // Compile the expression tree once, not once per item.
        Predicate<T> predicate = filter.Compile();
        foreach (T item in sequence)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }
}

The following query uses our custom Where operator:

IEnumerable<Employee> employees = new List<Employee>()
{
    new Employee() { ID = 1, Name = "Scott" },
    new Employee() { ID = 2, Name = "Paul" }
};


Employee scott =
    employees.Where(e => e.Name == "Scott").First();

Of course, if we are just going to compile and invoke the expression there is little advantage to using an Expression<T>, but it generally turns into an “a-ha!” moment when you show someone the difference between an Expression<Predicate<T>> and a plain Predicate<T>. Try it yourself in a debugger.
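The difference is easy to see in a few lines. Here is a minimal sketch (the names are mine, for illustration): a Predicate<T> is compiled code you can only invoke, while an Expression<Predicate<T>> is a data structure you can inspect.

```csharp
using System;
using System.Linq.Expressions;

class ExpressionDemo
{
    static void Main()
    {
        // A plain delegate: compiled code, opaque at runtime.
        Predicate<int> predicate = n => n > 5;

        // The same lambda as an expression tree: data describing the code.
        Expression<Predicate<int>> tree = n => n > 5;

        Console.WriteLine(tree.Body);          // (n > 5)
        Console.WriteLine(tree.Body.NodeType); // GreaterThan

        // Compile() turns the tree back into an invokable delegate.
        Predicate<int> fromTree = tree.Compile();
        Console.WriteLine(fromTree(10));       // True
    }
}
```

This inspectability is what lets a LINQ provider translate the lambda into something else entirely, like a SQL WHERE clause.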

We also wrote a LINQ version of “Hello, World!” that reads text files from a temp directory (a.txt would contain “Hello,”, while b.txt would contain “World!”). A good demonstration of map-filter-reduce with C# 3.0.

var message = Directory.GetFiles(@"c:\temp\")
                       .Where(fname => fname.EndsWith(".txt"))
                       .Select(fname => File.ReadAllText(fname))
                       .Aggregate(
                           new StringBuilder(),
                           (sb, s) => sb.Append(s).Append(" "),
                           sb => sb.ToString()
                       );

Console.WriteLine(message);

Moving into NDepend territory, we also wrote a query to find the namespaces with the most types (for referenced assemblies only):

var groups = Assembly.GetExecutingAssembly()
                     .GetReferencedAssemblies()
                     .Select(aname => Assembly.Load(aname))
                     .SelectMany(asm => asm.GetExportedTypes())
                     .GroupBy(t => t.Namespace)
                     .OrderByDescending(g => g.Count())
                     .Take(10);

foreach (var group in groups)
{
    Console.WriteLine("{0} {1}", group.Key, group.Count());
    foreach (var type in group)
    {
        Console.WriteLine("\t" + type.Name);
    }
}

And finally, some LINQ to XML code that creates an XML document out of all the executing processes on the machine:

XNamespace ns = "http://odetocode.com/schemas/linqdemo";
XNamespace ext = "http://odetocode.com/schemas/extensions";

XDocument doc =
    new XDocument(
        new XElement(ns + "Processes",
            new XAttribute(XNamespace.Xmlns + "ext", ext),
            from p in Process.GetProcesses()
            select new XElement(ns + "Process",
               new XAttribute("Name", p.ProcessName),
               new XAttribute(ext + "PID", p.Id))));

Followed by a query for the process IDs of any mspaint instances:

var query =
   (from e in doc.Descendants(ns + "Process")
    where (string)e.Attribute("Name") == "mspaint"
    select (string)e.Attribute(ext + "PID"));

More on LINQ to come…

Visual Studio SP1 and The Metification of REST

Thursday, August 14, 2008 by scott

Metification – verb

  1. The act of adding metadata to a web service in order to facilitate tooling and discovery.
  2. The act of adding complexity to a web service in order to achieve tight coupling.

Pick one.

Service Pack 1 for Visual Studio 2008 has just arrived with new features, including version 1.0 of ADO.NET Data Services (a.k.a. Astoria). From the description (highlighting is mine):

ADO.NET Data Services … consists of a combination of patterns and libraries that enables any data store to be exposed as a flexible data service, naturally integrating with the Web, that can be consumed by Web clients within a corporate network or across the Internet. ADO.NET Data Services uses URIs to point to pieces of data and simple, well-known formats to represent that data, such as JSON and ATOM/APP. This results in data being exposed to Web clients as a REST-style resource collection, addressable with URIs that agents can interact with using standard HTTP verbs such as GET, POST, or DELETE.

Compared to the traditional SOAP approach, the REST-style is a different model for exposing functionality over a web service. Instead of defining messages and exposing operations that act on those messages, you expose resources and act on the resources using common HTTP verbs. I’ve lately been thinking of SOAP based web services as “verb oriented” (exposing GetOrder and UpdateCustomer), while REST style web services are “noun oriented” (exposing Orders and Customers). Both models have advantages and disadvantages, but I’ve felt that REST partners well with rich, Internet applications that need to retrieve a variety of resources  using the same filtering and paging parameters. Creating a heap of GetThisByThat operations is tedious. 

Nouns and verbs aren’t the only difference between REST and SOAP. One of the primary strengths of REST is its inherent simplicity. The simplicity not only facilitates broad interoperability, but also encourages an acceptance of REST from many who feel overwhelmed by the complexities of WS-*. There are no tools required for REST - all you need is the ability to send an HTTP request and read the response. WS-*, on the other hand, is great when you need a digitally signed message including double-secret user credentials routed through an asynchronous and distributed, two-phase commit transaction with extended buyer protection. Not everyone needs that flexibility, but you still pay the price for it when using the tooling and the API, and when configuring the service.

Although we could continue talking about differences in REST and SOAP, I wanted to talk about metadata, and Astoria.

Metification

REST proponents, as a rule of thumb, shun metadata – but not all forms of metadata. Metadata in prose or written documentation is fine. Metadata in a self-describing response format is fine. However, metadata for tooling is seen by many as pure evil. Part of the complexity in WS-* is in the quirky and convoluted folds of metadata formats like WSDL and XML Schema. REST has seen some attempts at standardized metadata (WADL, WSDL 2.0, XSD), but still resists all attempts for the most part. 

I like metadata. Maybe I’ve been in the .NET ecosystem for so long that I expect tooling, but I still remember the first time I tried to write a program for the Flickr web service (which is technically just POX). I was shocked when I couldn’t find a WSDL file. Then I was surprised at how easy it was to craft the correct URL for an HTTP request, and shred apart the XML response to find photographs. It was so easy that ... well, it was just too easy. It reminded me of writing data access code from scratch. Data access code is so predictable and repetitive that we have tools, frameworks, and code generators to take care of the job. But those tools, frameworks, and code generators rely on metadata defined by a database schema, so their job is relatively straightforward. REST is a bit different, unless you are working with Astoria on the server and a CLR client.
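For anyone who hasn’t experienced the POX style, here is a rough sketch of what “shredding the response” looks like. The URL and element names below are made up for illustration (they are not Flickr’s real API), and the response is canned instead of fetched over HTTP:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class PoxDemo
{
    static void Main()
    {
        // Crafting the request is just string building - no WSDL, no proxy.
        // (Hypothetical endpoint and parameters, for illustration only.)
        string url = "http://example.com/services/rest/?method=photos.search&tags=sunset";

        // A canned response standing in for the HTTP call.
        string response =
            @"<rsp stat=""ok"">
                <photos>
                  <photo id=""1"" title=""Sunset over DC"" />
                  <photo id=""2"" title=""Harbor at dusk"" />
                </photos>
              </rsp>";

        // 'Shredding' the XML with LINQ to XML.
        var titles = XDocument.Parse(response)
                              .Descendants("photo")
                              .Select(p => (string)p.Attribute("title"))
                              .ToList();

        titles.ForEach(Console.WriteLine);
    }
}
```

No tooling required - but also no metadata to tell a tool what the response will look like.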

Let’s say you have some DTOs for employees, orders, and other objects you want to send over the wire. You’ll need to decorate them with enough information for the service to understand the primary key.

[DataServiceKey("ID")]
public class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
}

[DataServiceKey("ID")]
public class Order
{
    // …
}

Next, define a class with public IQueryable<T> properties for each “entity set” (Employees and Orders). IQueryable<T> is easy to conjure up, and the class below represents a read-only data source with some fake in-memory data. If you need create, update, and delete functionality, the class will need to implement IUpdatable, too. Shawn Wildermuth wrote a three-part blog series about IUpdatable while implementing it for the NHibernate LINQ project.

public class AcmeData
{
    public IQueryable<Employee> Employees
    {
        get
        {
            return new List<Employee>
            {
                new Employee() /* ... */,
                new Employee() /* ... */,
                new Employee() /* ... */
                // ...
            }.AsQueryable();
        }
    }

    public IQueryable<Order> Orders
    {
        // ...
    }

    // ...
}

Then you need an .svc file…

<%@ ServiceHost Language="C#"
    Factory="System.Data.Services.DataServiceHostFactory,
             System.Data.Services, Version=3.5.0.0,
             Culture=neutral, PublicKeyToken=b77a5c561934e089"
    Service="AcmeDataService" %>

… and you’ll also need a code-behind file for the .svc (which is all set up for you by the ADO.NET data service template; you just add some configuration):

public class AcmeDataService : DataService<AcmeData>
{
    public static void InitializeService(IDataServiceConfiguration config)
    {
        config.SetEntitySetAccessRule("Employees", EntitySetRights.AllRead);
        // ... more rules
    }
}

At this point you can start testing the service using a web browser and looking at, for example, http://localhost/AcmeDataService.svc/Employees. What is more interesting is looking at http://localhost/AcmeDataService.svc/$metadata, because there you’ll find service metadata, which is where the magic starts.

To consume the service, right-click on a project in Visual Studio and select “Add Service Reference…”. Yes – the same “Add Service Reference” command you might have seen in the hit motion picture “SOAP and WSDL – an XML Love Story”. This feature blurs the lines between REST and WS-*. Enter the root URL of the service and Visual Studio will generate a proxy – but not the type of proxy you receive when using SOAP based web services. This proxy will derive from the DataServiceContext class, and you can use it like so:

var employees = new AcmeData(serviceRoot)
                    .Employees
                    .Where(e => e.Name == "Scott")
                    .OrderBy(e => e.Name)
                    .Skip(2)
                    .Take(2)
                    .ToList();

DataServiceContext does a little bit of magic to turn the LINQ query into the following HTTP request. It’s LINQ to REST:

GET /AcmeDataService.svc/Employees()
?$filter=Name%20eq%20'Scott'&$orderby=Name&$skip=2&$top=2 HTTP/1.1
User-Agent: Microsoft ADO.NET Data Services
Accept: application/atom+xml,application/xml

The data service will respond with some XML that the data context uses to create objects that look just like the server side DTOs.

I’m sure some are horrified at this metification of REST, but for scenarios where you need to talk between two CLR appdomains (think ASP.NET and Silverlight), this approach gives you the advantages of thinking about nouns in a RESTful model without writing all the glue code to wire up endpoints and parse XML. Beauty!

Optimizing LINQ Queries

Tuesday, July 15, 2008 by scott

I’ve been asked a few times about how to optimize LINQ code. The first step in optimizing LINQ code is to take some measurements and make sure you really have a problem. 


It turns out that optimizing LINQ code isn’t that different from optimizing regular C# code. You need to form a hypothesis, make changes, and measure, measure, measure every step of the way. Measurement is important, because sometimes the changes you need to make are not intuitive.
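A minimal harness is all it takes. The sketch below (the query being timed is a placeholder, not the census query discussed next) shows the essentials: warm up first, repeat the work, and force execution of the deferred query.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class MeasureDemo
{
    static void Main()
    {
        var numbers = Enumerable.Range(0, 100000).ToList();

        // Warm up once so JIT compilation doesn't pollute the first sample.
        numbers.Where(n => n % 2 == 0).Count();

        const int runs = 10;
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            // ToList forces execution; timing a deferred query measures nothing.
            var evens = numbers.Where(n => n % 2 == 0).ToList();
        }
        watch.Stop();

        Console.WriteLine("Average: {0} ms",
                          watch.ElapsedMilliseconds / (double)runs);
    }
}
```

Averaging several runs smooths out GC pauses and other noise - a single sample can easily mislead.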

Here is a specific example using LINQ to Objects.

Let’s say we have 100,000 of these in memory:

public class CensusRecord
{
    public string District { get; set; }
    public long Males { get; set; }
    public long Females { get; set; }
}

We need a query that will give us back a list of districts ordered by their male / female population ratio, and include the ratio in the query result. A first attempt might look like this:

var query =
    from r in _censusRecords
    orderby (double)r.Males / (double)r.Females descending
    select new
    {
        District = r.District,
        Ratio = (double)r.Males / (double)r.Females
    };

query = query.ToList();

It’s tempting to look at the query and think - “If we only calculate the ratio once, we can make the query faster and more readable! A win-win!”. We do this by introducing a new range variable with the let clause:

var query =
    from r in _censusRecords
    let ratio = (double)r.Males / (double)r.Females
    orderby ratio descending
    select new
    {
        District = r.District,
        Ratio = ratio
    };

query = query.ToList();

If you measure the execution time of each query on 100,000 objects, however, you’ll find the second query is about 14% slower than the first query, despite the fact that we are only calculating the ratio once. Surprising! See why we need to take measurements?

Look At Time and Space

The key to this specific issue is understanding how the C# compiler introduces the range variable ratio into the query processing. We know that C# translates declarative queries into a series of method calls. Imagine the method calls forming a pipeline for pumping objects. The first query we wrote would translate into the following:

var query =
    _censusRecords.OrderByDescending(r => (double)r.Males /
                                          (double)r.Females)
                  .Select(r => new { District = r.District,
                                     Ratio = (double)r.Males /
                                             (double)r.Females });

The second query, the one with the let clause, is asking LINQ to pass an additional piece of state through the object pipeline. In other words, we need to pump both a CensusRecord object and a double value (the ratio) into the OrderByDescending and Select methods. There is no magic involved - the only way to get both pieces of data through the pipeline is to instantiate a new object that will carry both pieces of data. When C# is done translating the second query, the result looks like this:

var query =
    _censusRecords.Select(r => new { Record = r,
                                     Ratio = (double)r.Males /
                                             (double)r.Females })
                  .OrderByDescending(r => r.Ratio)
                  .Select(r => new { District = r.Record.District,
                                     Ratio = r.Ratio });

[CLR Profiler results]

The above query requires two projections, which means 200,000 object instantiations. CLR Profiler says the let version of the query uses 60% more memory.

Now we have a better idea why performance decreased, and we can try a different optimization. We’ll write the query using method calls instead of a declarative syntax, and do a projection into the type we need first, and then order the objects.

var query =
    _censusRecords.Select(r => new { District = r.District,
                                     Ratio = (double)r.Males /
                                             (double)r.Females })
                  .OrderByDescending(r => r.Ratio);

This query will perform about 6% faster than the first query in the post, but consistently (and mysteriously) uses 5% more memory. Ah, tradeoffs.

Moral Of The Story?

The moral of the story is not to rewrite all your LINQ queries to save 5 milliseconds here and there. The first priority is always to build working, maintainable software. The moral of the story is that LINQ, like any technology, requires analysis and measurement to make optimization gains, because the path to better performance isn’t always obvious. Also remember that a query “optimized” for LINQ to Objects might make things worse when the same query uses a different provider, like LINQ to SQL.

Using an ORM? Think Objects!

Monday, July 14, 2008 by scott

I recently had some time on airplanes to read through Bitter EJB, POJOs in Action, and Better, Faster, Lighter Java. All three books were good, but the last one was my favorite, and was recommended to me by Ian Cooper. No, I’m not planning on trading in assemblies for jar files just yet. I read the books to get some insight and perspectives into specific trends in the Java ecosystem.

It’s impossible to summarize the books in one paragraph, but I’ll try anyway:

Some Java developers shun the EJB framework so they can focus on objects. Simple objects. Testable objects. Malleable objects. Plain old Java objects that solve business problems without being encumbered by infrastructure and technology concerns.

That’s the gist of the three books in 35 words. The books also talk about patterns, anti-patterns, domain driven design, lightweight frameworks, processes, and generally how to write software. You’d be surprised how much content is applicable to .NET. In fact, when reading through the books I began to think of .NET and Java as two parallel universes whose deviations could be explained by the accidental killing of one butterfly during a time-traveling safari.

The focus of this post is one particular deviation that really stood out.

From Objects To ORMs

The Java developers who focus on objects eventually have to deal with other concerns, like persistence. Their object focus naturally leads some of them to try object-relational mapping frameworks. ORMs like Hibernate not only provide these developers with productivity gains, but do so in a relatively transparent and non-intrusive manner. The two work well together right from the start: the developers understand the ORMs, and the ORMs seem to understand the developers.

From DataSets to ORMs

.NET includes DataSets, DataTables, and DataViews. There is an IDE with a Data menu, and a GUI toolbox with a Data tab full of Data controls and DataSources. It’s easy to stereotype mainstream .NET development as data-centric. When you introduce an ORM to a .NET developer who has never seen one, the typical questions are along the lines of:

How do I manage my identity values after an INSERT?

... and ...

Does this thing work with stored procedures?

Perfectly reasonable questions given the data-centric atmosphere of .NET, but you can almost feel the tension in these questions. And that is the deviation that stood out to me. On the airplane, I read about Java developers who focused on objects and went in search of ORMs. In .NET land, I’m seeing the ORMs going in search of the developer who is focused on data. The ORMs in particular are LINQ to SQL (currently shipping in Visual Studio) and the Entity Framework (shipping in SP1). Anyone expecting something like “ADO.NET 3.5” is in for a surprise. Persistent entities and DataSets are two different creatures, and require two different mind sets.

Will .NET Developers Focus On Objects Now?

It’s possible, but the tools make it difficult. The Entity Framework, for instance, presents developers with cognitive dissonance at several points. The documentation will tell you the goal of EF is to create a rich, conceptual object model, but the press releases proclaim that the Entity Framework simplifies data-centric development.  There will not be any plain old CLR objects (POCOs) in EF, and the object-focused implicit lazy-loading that comes standard in most ORMs isn’t available (you can read any property on this entity, um, except that one – you’ll have to load it first).

LINQ to SQL is different. LINQ to SQL is objects all the way down. You can use plain old CLR objects with LINQ to SQL if you dig beyond the surface. However, the surface is a shiny designer that looks just like the typed DataSet designer. LINQ to SQL also needs some additional mapping flexibility to truly separate the object model from the underlying database schema – hopefully we’ll see this in the next version.

What To Do?

If you are a .NET developer who is starting to use an ORM – any ORM – you owe it to yourself and your project to reset your defaults and think differently about the new paradigm. Forget what you know about DataSets and learn about the unit of work pattern. Forget what you know about data readers and learn how an ORM identity map works. Think objects first, data second. If you can’t think of data second, an ORM might not be the technology for you.
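The identity map idea fits in a few lines of code. The class below is an illustrative toy, not any particular ORM’s implementation: at most one in-memory object exists per key, so two lookups of the same key return the same object reference and hit the database only once.

```csharp
using System;
using System.Collections.Generic;

// A bare-bones identity map (a sketch of the pattern, not a real ORM's code).
public class IdentityMap<TKey, TEntity> where TEntity : class
{
    readonly Dictionary<TKey, TEntity> _map = new Dictionary<TKey, TEntity>();

    public TEntity Get(TKey key, Func<TKey, TEntity> loadFromDatabase)
    {
        TEntity entity;
        if (!_map.TryGetValue(key, out entity))
        {
            entity = loadFromDatabase(key); // only on a cache miss
            _map[key] = entity;
        }
        return entity;
    }
}

public class Program
{
    public static void Main()
    {
        var map = new IdentityMap<int, object>();
        int databaseHits = 0;
        Func<int, object> load = key => { databaseHits++; return new object(); };

        object first = map.Get(42, load);
        object second = map.Get(42, load);

        Console.WriteLine(ReferenceEquals(first, second)); // True
        Console.WriteLine(databaseHits);                   // 1
    }
}
```

Contrast this with a DataSet mindset, where fetching the same row twice gives you two independent copies of the data.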

LINQ Deep Dive at D.C. ALT.NET Next Week

Friday, July 11, 2008 by scott

Matt Podwysocki invited me to speak at the D.C. alt.net meeting next Thursday evening (July 24th). The topic is LINQ. Matt specifically requested a code-heavy presentation, so expect two slides followed by plenty of hot lambda and Expression<T> action.

Hopefully, Matt doesn’t blackout the neighborhood like he did at the nearby RockNUG meeting this week. The White House is two blocks away and the people inside get a little jumpy about blackouts.

 

DateTime:
7/24/2008 - 7PM-9PM

Location:
Cynergy Systems Inc.
1600 K St NW
Suite 300
Washington, DC 20006

Keeping LINQ Code Healthy

Wednesday, July 9, 2008 by scott

In the BI space I’ve seen a lot of SQL queries succumb to complexity. A data extraction query adds some joins, then some filters, then some nested SELECT statements, and it becomes an unhealthy mess in short order. It’s unfortunate, but standard SQL just isn’t a language geared for refactoring towards simplification (although UDFs and CTEs in T-SQL have helped).

I’ve really enjoyed writing LINQ queries this year, and I’ve found them easy to keep pretty.

For example, suppose you need to parse some values out of the following XML:

<ROOT>
  <data>
    <record>
      <field name="Country">Afghanistan</field>
      <field name="Year">1993</field>
      <field name="Value">16870000</field>
      <!-- ... -->
    </record>
    <!-- ... -->
  </data>
</ROOT>

A first crack might look like the following:

var entries =
    from r in doc.Descendants("record")
    select new
    {
        Country = r.Elements("field")
                   .Where(f => f.Attribute("name").Value == "Country")
                   .First().Value,
        Year = r.Elements("field")
                .Where(f => f.Attribute("name").Value == "Year")
                .First().Value,
        Value = double.Parse(r.Elements("field")
                              .Where(f => f.Attribute("name").Value == "Value")
                              .First().Value)
    };

The above is just a mass of method calls and string literals. But, add in a quick helper or extension method…

public static XElement Field(this XElement element, string name)
{
    return element.Elements("field")
                  .Where(f => f.Attribute("name").Value == name)
                  .First();
}

… and you can quickly turn the query around into something readable.

var entries =
    from r in doc.Descendants("record")
    select new
    {
        Country = r.Field("Country").Value,
        Year = r.Field("Year").Value,
        Value = double.Parse(r.Field("Value").Value)
    };

If only SQL code were as easy to break apart!
