Expression Magic

Thursday, November 13, 2008

In the last post we talked about needing some Expression<T> background. There is a lot of good information out there about Expression<T>, but if you haven’t heard – this class is pure magic. If you want a long version of the story, see “LINQ and C# 3.0”. For a short version of the story, read on.

.NET compilers like the C# and VB compilers are really good at converting code into an intermediate language (IL) that the CLR’s JIT compiler will transform into native code for the CPU to execute. So if you write the following …

Func<int, int> square = x => x * x;
var result = square(3); // yields 9

… and you open the assembly with a tool like Reflector, you’ll find the IL instructions created by the compiler.

ldarg.0
ldarg.0
mul
stloc.0

That essentially says: load the lambda’s single argument (x) twice, multiply the two values, then store the result. These instructions will be part of an anonymous method (lambda expressions are just a shorter syntax for writing an anonymous method), and you can invoke the method by applying the () operator to the square variable to compute a result. IL is the perfect representation for code that ultimately needs to execute, but it’s not a great representation of the developer’s original intent. As an example, consider the following.

var query = ctx.Animals
               .Where(animal => animal.Name == "Fido");

If ctx.Animals is an in-memory collection of objects, then compiling the code inside the Where method will generate efficient instructions to search for Fido. That’s good, but what if Fido lives in a database, behind a web service, or in some other remote location? Then a LINQ provider needs to translate the code inside the Where method into a web service call, a SQL query, or some other type of remote command. To do that, the provider has to understand what the programmer intended the code inside the Where method to do, and IL is not designed to express intent. We don’t want LINQ providers “decompiling” programs at runtime. Thus, we have Expression<T>.
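The split between “in memory” and “remote” shows up in the LINQ operators themselves. Here is a minimal sketch, assuming a hypothetical Animal class and an in-memory List<Animal> standing in for ctx.Animals (the post shows neither): Enumerable.Where accepts a Func<Animal, bool>, so the lambda is compiled to IL, while Queryable.Where accepts an Expression<Func<Animal, bool>>, so whatever provider sits behind an IQueryable<Animal> receives a tree it can take apart. WhereOverloads and its members are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity standing in for the elements of ctx.Animals.
public class Animal
{
    public string Name { get; set; } = "";
}

public static class WhereOverloads
{
    // An in-memory stand-in for ctx.Animals.
    private static readonly List<Animal> Animals = new List<Animal>
    {
        new Animal { Name = "Fido" },
        new Animal { Name = "Rex" }
    };

    // Enumerable.Where takes a Func<Animal, bool>: the lambda is compiled to IL.
    public static IEnumerable<Animal> InMemory() =>
        Animals.Where(animal => animal.Name == "Fido");

    // Queryable.Where takes an Expression<Func<Animal, bool>>: the lambda becomes a tree.
    public static IQueryable<Animal> AsQuery() =>
        Animals.AsQueryable().Where(animal => animal.Name == "Fido");

    // Recovers the predicate tree from a query, the way a provider could.
    public static LambdaExpression PredicateOf(IQueryable<Animal> query)
    {
        // The query's expression is a call to Queryable.Where(source, predicate).
        var call = (MethodCallExpression)query.Expression;

        // The predicate argument arrives wrapped in a Quote node.
        var quote = (UnaryExpression)call.Arguments[1];
        return (LambdaExpression)quote.Operand;
    }
}
```

AsQueryable wraps the list in a provider that executes the tree in memory; a remote provider would receive the same tree and translate it instead.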

Expression<Func<int, int>> square = x => x * x;
var result = square.Compile()(3); // yields 9

Wrapping our function inside an Expression produces something we can’t invoke directly: we have to call Compile on the expression before we can invoke it and capture a result. This is because, when a lambda is assigned to an Expression<T>, the .NET compilers don’t produce IL for the lambda’s body. Instead, they emit code that builds a data structure known as an abstract syntax tree (AST). ASTs aren’t the prettiest things to look at, but they are a better representation of the code if you need to figure out what the code is trying to do. The AST for the Fido search will tell us the code consists of a binary expression that tests for equality, that the left-hand side of the equality test is the animal’s Name property, and that the right-hand side is the string constant “Fido”. This is enough information for a remote LINQ provider to translate the expression into something like “WHERE Name = ‘Fido’”.
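To make that last step concrete, here is a toy translator: a minimal sketch, not how LINQ to SQL or any real provider works. It assumes a hypothetical Animal class with Name and Age properties (Age is invented for the example), and it understands only equality tests, &&, property access on the lambda parameter, and constants; anything else, including captured local variables, throws NotSupportedException. ToyWhereTranslator and its methods are illustrative names.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;

// Hypothetical entity; Age is invented for this example.
public class Animal
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
}

// Toy translator: supports only ==, &&, parameter property access, and constants.
public static class ToyWhereTranslator
{
    public static string Translate<T>(Expression<Func<T, bool>> predicate) =>
        "WHERE " + Visit(predicate.Body);

    private static string Visit(Expression node)
    {
        switch (node)
        {
            // a && b becomes (a AND b)
            case BinaryExpression b when b.NodeType == ExpressionType.AndAlso:
                return $"({Visit(b.Left)} AND {Visit(b.Right)})";

            // left == right becomes left = right
            case BinaryExpression b when b.NodeType == ExpressionType.Equal:
                return $"{Visit(b.Left)} = {Visit(b.Right)}";

            // animal.Name becomes Name (only members of the lambda parameter itself)
            case MemberExpression m when m.Expression is ParameterExpression:
                return m.Member.Name;

            // String constants are quoted, with embedded quotes doubled.
            case ConstantExpression c when c.Value is string s:
                return "'" + s.Replace("'", "''") + "'";

            // Other constants (e.g. numbers) are written with invariant formatting.
            case ConstantExpression c:
                return Convert.ToString(c.Value, CultureInfo.InvariantCulture);

            default:
                throw new NotSupportedException($"Unsupported node type: {node.NodeType}");
        }
    }
}
```

Calling ToyWhereTranslator.Translate<Animal>(animal => animal.Name == "Fido") returns WHERE Name = 'Fido'. A real provider would send "Fido" as a SQL parameter rather than splicing a quoted string into the command text.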

Code As Data

Expression<T> gives us the ability to treat a piece of code as data, which is a relatively old concept (hello, LISP!) and a fairly powerful one. It lets us walk through “code” at runtime and examine what it intends to do. This feature is what makes remote LINQ query providers like LINQ to SQL and the Entity Framework possible: they examine the AST and formulate commands that represent the original code in a different language (SQL). This translation is far from simple, but it would have been impractical if the compiler were generating IL instead of syntax trees.
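Because the tree is ordinary data, you can also assemble one by hand with the factory methods on the Expression class (roughly what the compiler arranges for x => x * x) and then walk it or compile it. A minimal sketch; SquareTree and its methods are illustrative names.

```csharp
using System;
using System.Linq.Expressions;

public static class SquareTree
{
    // Builds the tree for x => x * x from its parts instead of from lambda syntax.
    public static Expression<Func<int, int>> Build()
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        BinaryExpression body = Expression.Multiply(x, x);
        return Expression.Lambda<Func<int, int>>(body, x);
    }

    // Reads the tree back: the body is a Multiply node whose operands are both the parameter.
    public static bool IsSquareOfParameter(Expression<Func<int, int>> lambda) =>
        lambda.Body is BinaryExpression b
        && b.NodeType == ExpressionType.Multiply
        && b.Left == lambda.Parameters[0]
        && b.Right == lambda.Parameters[0];
}
```

SquareTree.Build().Compile()(3) yields 9, just like the compiler-built version, and IsSquareOfParameter returns true for both trees.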

Expression<T> also enables scenarios that have nothing to do with queries or databases. This is where Expression<T> gets exciting, because you can use it in places you may not have expected: a business layer, a mapping layer, or the flowcharts I was working on.

flowChartShape.RequiresField(casedata => casedata.SmokingCounseling);

You can think of the RequiresField method call as adding metadata to a flowchart shape; this metadata describes the property on some data object that the shape will use during execution. The metadata is strongly typed, supported by IntelliSense, and refactoring-friendly. We can use the metadata at runtime to determine which fields to enable in the UI, or which fields are missing and need the user’s attention. We’ll dig into this more in a future post.
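The post doesn’t show how RequiresField is implemented, so the following is only a guess at one way to do it. It assumes hypothetical CaseData and FlowChartShape classes, assumes SmokingCounseling is a bool?, and treats “missing” as “null”; none of that comes from the post. The key move is reading the property out of the lambda’s MemberExpression instead of passing the field name as a string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical data object: the post names only the SmokingCounseling property,
// and its type (bool?) is an assumption.
public class CaseData
{
    public bool? SmokingCounseling { get; set; }
}

// Hypothetical shape class: the post does not show how RequiresField is implemented.
public class FlowChartShape
{
    private readonly List<PropertyInfo> _requiredFields = new List<PropertyInfo>();

    public IEnumerable<string> RequiredFieldNames => _requiredFields.Select(p => p.Name);

    // Records the property named by a lambda such as casedata => casedata.SmokingCounseling.
    public FlowChartShape RequiresField<TValue>(Expression<Func<CaseData, TValue>> field)
    {
        if (field.Body is MemberExpression member
            && member.Expression is ParameterExpression
            && member.Member is PropertyInfo property)
        {
            _requiredFields.Add(property);
            return this;
        }

        throw new ArgumentException("Expected a property access such as c => c.Property.", nameof(field));
    }

    // Assumed "missing" rule: a required field is missing when its value is null.
    public IEnumerable<string> MissingFields(CaseData data) =>
        _requiredFields.Where(p => p.GetValue(data) == null).Select(p => p.Name);
}
```

Taking a generic Func<CaseData, TValue> rather than Func<CaseData, object> keeps the lambda body a plain MemberExpression; with object as the return type, value-type properties would arrive wrapped in a Convert node that the method would have to unwrap. Returning this lets several RequiresField calls be chained.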


OdeToCode by K. Scott Allen