And Equality for All ... Anonymous Types

Wednesday, March 26, 2008

Given this simple Employee class:

public class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
}

How many employees do you expect to see from the following query with a Distinct operator?

var employees = new List<Employee>
{
    new Employee { ID=1, Name="Barack" },
    new Employee { ID=2, Name="Hillary" },
    new Employee { ID=2, Name="Hillary" },
    new Employee { ID=3, Name="Mac" }
};

var query =
    (from employee in employees
     select employee).Distinct();

foreach (var employee in query)
{
    Console.WriteLine(employee.Name);
}

The answer is 4 – we'll see both Hillary objects. The docs for Distinct are clear – the method uses the default equality comparer to test for equality. Since Employee doesn't override Equals or GetHashCode, the default comparer falls back to reference equality and sees 4 distinct object references. One way to get around this would be to use the overloaded version of Distinct that accepts a custom IEqualityComparer<T>.
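As a rough sketch of that approach, a comparer for Employee might look something like the following. The EmployeeComparer name and the decision to compare on both ID and Name are assumptions made for illustration, not something from the original query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
}

// Hypothetical comparer: two employees are equal when both ID and Name match.
public class EmployeeComparer : IEqualityComparer<Employee>
{
    public bool Equals(Employee x, Employee y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return x.ID == y.ID && x.Name == y.Name;
    }

    public int GetHashCode(Employee obj)
    {
        // Equal objects must produce equal hash codes,
        // so hash exactly the fields that Equals compares.
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + obj.ID.GetHashCode();
            hash = hash * 31 + (obj.Name == null ? 0 : obj.Name.GetHashCode());
            return hash;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var employees = new List<Employee>
        {
            new Employee { ID = 1, Name = "Barack" },
            new Employee { ID = 2, Name = "Hillary" },
            new Employee { ID = 2, Name = "Hillary" },
            new Employee { ID = 3, Name = "Mac" }
        };

        // Pass the comparer to the Distinct overload instead of relying on the default.
        var query = employees.Distinct(new EmployeeComparer());

        Console.WriteLine(query.Count()); // 3
    }
}
```

With the comparer in place, the second Hillary is treated as a duplicate even though it is a separate object reference.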

Let's try the query again and project a new, anonymous type with the same properties as Employee.

var query =
    (from employee in employees
     select new { employee.ID, employee.Name }).Distinct();

That query only yields three objects – Distinct removes the duplicate Hillary! How'd it suddenly get so smart?

Turns out the C# compiler overrides Equals and GetHashCode for anonymous types. The implementation of the two overridden methods uses all the public properties on the type to compute an object's hash code and test for equality. If two objects of the same anonymous type have all the same values for their properties – the objects are equal. This is a safe strategy since anonymously typed objects are essentially immutable (all the properties are read-only). Fiddling with the hash code of a mutable type gets a bit dicey.
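A small experiment makes the behavior easy to see. This is only a sketch, and the variable names a, b, and c are made up for the example. It compares anonymously typed objects directly, without going through Distinct.

```csharp
using System;

public static class Program
{
    public static void Main()
    {
        // a and b are separate instances of the same anonymous type
        // (same property names, types, and order) with identical values.
        var a = new { ID = 2, Name = "Hillary" };
        var b = new { ID = 2, Name = "Hillary" };
        var c = new { ID = 3, Name = "Mac" };

        Console.WriteLine(ReferenceEquals(a, b));              // False: two distinct objects
        Console.WriteLine(a.Equals(b));                        // True: compiler-generated Equals compares property values
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True: hash codes are built from the same values
        Console.WriteLine(a.Equals(c));                        // False: property values differ
        Console.WriteLine(a == b);                             // False: == is still a reference comparison
    }
}
```

Note the last line: the compiler overrides Equals and GetHashCode, but it does not overload the == operator, so == on anonymous types still compares references.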

Interestingly – I stumbled on the Visual Basic version of anonymous types as I was writing this post and I see that VB allows you to define "Key" properties. In VB, only the values of Key properties are compared during an equality test. Key properties are read-only, while non-key properties on an anonymous type are mutable. That's a very C sharpish thing to do, VB team.