Leroy was shocked when the source code appeared. It was familiar yet strange, like an old lover's kiss. The code was five years old – an artifact of Leroy's first project. Leroy slowly scrolled through the code and pondered his next move. It wasn't a bug that was bothering Leroy – there were no race conditions or tricky numerical conversions. No performance problems or uncaught error conditions. It was all about design …
"Times have changed, and so I have, fortunately", Leroy thought to himself. "And so will this code…"
To be continued…
Aggregate is a standard LINQ operator for in-memory collections that allows us to build a custom aggregation. Although LINQ provides a few standard aggregation operators, like Count, Min, Max, and Average, if you want an inline implementation of, say, a standard deviation calculation, then the Aggregate extension method is one approach you can use (the other approach being that you could write your own operator).
Let's say we wanted to see the total number of threads running on a machine. We could get that number lambda style, or with a query comprehension, or with a custom aggregate.
This particular overloaded version of Aggregate follows a common pattern of "Initialize – Accumulate – Terminate". You can see this pattern in extensible aggregation strategies from Oracle to SQLCLR. The first parameter represents an initialization expression. We need to provide an initialized accumulator – in this case just an integer value of 0.
The second parameter is a Func<int, Process, int> expression that the aggregate method will invoke as it iterates across the sequence of inputs. For each process we get our accumulator value (an int), and a reference to the current process in the iteration stage (a Process), and we return a new accumulator value (an int).
The last parameter is the terminate expression. This is an opportunity to provide any final calculations. For our summation, we just need to return the value in the accumulator.
Now, let's compute a more thorough summary of running threads, including a standard deviation. Although we could get away with a simple double accumulator for stddev, we can also use a more sophisticated accumulator to encapsulate some calculations, facilitate unit tests, and make the syntax easier on the eye.
Put the accumulator to use like so:
Given this simple Employee class:
How many employees do you expect to see from the following query with a Distinct operator?
The answer is 4 – we'll see both Hillary objects. The docs for Distinct are clear – the method uses the default equality comparer to test for equality, and the default comparer sees 4 distinct object references. One way to get around this would be to use the overloaded version of Distinct that accepts a custom IEqualityComparer.
Let's try the query again and project a new, anonymous type with the same properties as Employee.
That query only yields three objects – Distinct removes the duplicate Hillary! How'd it suddenly get so smart?
Turns out the C# compiler overrides Equals and GetHashCode for anonymous types. The implementation of the two overridden methods uses all the public properties on the type to compute an object's hash code and test for equality. If two objects of the same anonymous type have all the same values for their properties – the objects are equal. This is a safe strategy since anonymously typed objects are essentially immutable (all the properties are read-only). Fiddling with the hash code of a mutable type gets a bit dicey.
Interestingly – I stumbled on the Visual Basic version of anonymous types as I was writing this post and I see that VB allows you to define "Key" properties. In VB, only the values of Key properties are compared during an equality test. Key properties are readonly, while non-key properties on an anonymous type are mutable. That's a very C sharpish thing to do, VB team.
The least intuitive LINQ operators for me are the join operators. After working with healthcare data warehouses for years, I've become accustomed to writing outer joins to circumvent data of the most … suboptimal kind. Foreign keys? What are those? Alas, I digress…
At first glance, LINQ appears to only offer a join operator with an 'inner join' behavior. That is, when joining a sequence of departments with a sequence of employees, we will only see those departments that have one or more employees.
After a bit more digging, you might come across the GroupJoin operator. We can use GroupJoin like a SQL left outer join. The "left" side of the join is the outer sequence. If we use departments as the outer sequence in a group join, we can then see the departments with no employees. Note: it is the into keyword in the next query that triggers the C# compiler to use a GroupJoin instead of a plain Join operator.
As you might suspect from the syntax, however, the query doesn't give us back a "flat" resultset like a SQL query. Instead, we have a hierarchy to traverse. The projection provides us a department name for each sequence of employees.
Flattening a sequence is a job for SelectMany. The trick is in knowing that adding an additional from clause translates to a SelectMany operator, and just like the outer joins of SQL, we need to project a null value when no employee exists for a given department – this is the job of DefaultIfEmpty.
One last catch – this query does work with LINQ to SQL, but if you are stubbing out a layer using in-memory collections, the query can easily throw a null reference exception. The last tweak would be to make sure you have a non-null employee object before asking for the Name property in the last select.
I was experimenting with the new SyndicationFeed class in 3.5 earlier this year and devised a mashup LINQ query:
The code is able to filter and sort RSS items from an arbitrary number of blogs with a 6 line query expression. I was thinking of this code when I ran across Scott Hanselman's Weekly Source Code 19 – LINQ and more What, Less How. Scott's reader David Nelson had the following observation:
I disagree with Siderite, in that I think the LINQ example is more readable than the iterative example; however, as has been pointed out, it leaves no room for error handling or AppDomain transitions. This is a problem with LINQ in general; in trying to make everything very compact, it leaves too little room to maneuver.
The LINQ query I'm using isn't production code. If just one blog is down and the XmlReader throws an exception, the entire operation is borked. One solution is to wrap the feed reading into a method that uses exception handling and returns an empty SyndicationFeed in case of an exception - then invoke the method from inside the query. Could anything else go wrong? Sure - one null PublishDate on an item and again we'd be borked. Bullet-proofing a LINQ query might take some work, especially when dealing with third party types.
As LINQ moves us into the "What" instead of the "How", it might be harder to see these types of error scenarios. LINQ is a fantastic technology, but like everything in software, it is a good idea to look the gift horse in the mouth.
The Lost Art of TSR Programming
Abstract: Return to the glory days of DOS 2.0 and INT 21h as we write a simple Terminate and Stay Resident application using the latest software development techniques. We will construct our x86 assembler code using test driven development and mock extended memory managers.
Why Am I Here On A Saturday?
Abstract: Because even if you weren't here, you'd still be at the computer. Don't think you'd be doing chores at home, like dusting off the entertainment center, because chores are boring.
Life of a Gnat
Abstract: This session has nothing to do with GNU software, but will describe (in excruciating detail) the journey of the common fungus gnat from egg to adulthood. Pictures of mating swarms may not be appropriate for younger attendees.
P.S. In all seriousness, the spring code camps are coming to the Mid-Atlantic and the topics are far better than the ones presented above.