Jeremy’s post “Don’t Check In Spike Code” reminds me of something I’ve advocated for years: dedicate a place in your source control repository where each developer can check in their “experimental” code. The rule of thumb is to throw away the code you write for a spike, but all code can be valuable, even if it isn’t production-worthy.
Here are the kinds of things I’ve checked into “experiments/sallen” over the years:
I’ve found that keeping a dedicated area for these types of check-ins offers some advantages:
If nothing else, you can look back at what you wrote when you were spiking on a new technology and laugh in that “oh man, we really didn’t know what we were doing” way.
There is a lot of humor in the Bad Variable Names entry on the c2 wiki. I like this confession from Alex:
The worst of which was my counter variable names. I now use i, j, k, and so on for local counts and things like activeRowCount for the more descriptive names. Before, in the early years mind you, it shames me to say, I would name my counters things like Dracula, Chocula*, MonteChristo. They are all counts after all. I apologize for my initial variable naming conventions and shall go beat my face now as punishment.
(*If you are not familiar with Count Chocula, he’s the mascot of a chocolate-flavored breakfast cereal of the same name. The cereal is popular in America, as is any corn-based cereal that mixes chocolate and sugar.)
One-letter variable names are considered bad, unless you need a loop counter (i) or a spatial coordinate (x, y, z) …
… or you’re writing a lambda expression:
var query = employees.Where(e => e.Name == "Tom");
This isn’t just C# – other languages also encourage short variable names in nested function definitions. The short names have been bothering me recently, particularly when the chained operators begin to work on different types.
var query = employees.GroupBy(e => e.Name)
.Where(g => g.Count() > 2)
.Select(g => g.Key);
LINQ joins are particularly unreadable. There is a piece of code I wrote in a presenter class many months ago, and every time I see the code I wince.
var controlsToEnable =
    results.SelectMany(r => r.RequiredFields)
           .Join(mapper.Entries,
                 ps => ps.PropertyName,
                 e => e.Left.PropertyName,
                 (ps, e) => e.Right)
           .Select(ps => ps.GetValue<IAnswerControl>(_view));
I’ve been thinking that one way to improve the readability is to use a custom Join operator. The types used in the query above are PropertySpecifier<T> and MapEntry<TLeft, TRight>. The idea is to join property specifiers against the left hand side of a mapping entry to retrieve the right hand side of the mapping entry (which is another property specifier). The extension method has a lot of generics noise.
public static IEnumerable<PropertySpecifier<TRight>> Join<TLeft, TRight>(
    this IEnumerable<PropertySpecifier<TLeft>> specifiers,
    IEnumerable<MapEntry<TLeft, TRight>> entries)
{
return specifiers.Join(entries,
propertySpecifier => propertySpecifier.PropertyName,
mapEntry => mapEntry.Left.PropertyName,
(propertySpecifier, mapEntry) => mapEntry.Right);
}
At least the custom Join cleans up the presenter class query, which I also think looks better with more descriptive variable names.
var controlsToEnable =
    results.SelectMany(result => result.RequiredFields)
           .Join(mapper.Entries)
           .Select(property => property.GetValue<IAnswerControl>(_view));
For the time being I'm going to avoid one-letter variable names in all but the simplest lambda expressions.
Once you know about the magic of Expression<T>, it’s hard not to make use of it.
Or perhaps, abuse it.
Here is an excerpt of a class I wrote that uses Expression<T> as a reflection helper.
public class PropertySpecifier<T>
{
    public PropertySpecifier(Expression<Func<T, object>> expression)
    {
        _expression = (expression.Body) as MemberExpression;
        if (_expression == null)
        {
            // raise an error
        }
    }

    public string PropertyName
    {
        get { return _expression.Member.Name; }
    }

    // ...

    MemberExpression _expression;
}
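The “// …” hides the dynamic accessor plumbing, like the GetValue<IAnswerControl> call that appears in the presenter query later. I won’t reproduce the production code here, but a minimal reflection-based sketch of such a getter might look like this (the real thing could just as well compile the expression into a delegate for speed):

public TValue GetValue<TValue>(T target)
{
    // _expression.Member is the PropertyInfo captured from the lambda
    var property = (System.Reflection.PropertyInfo)_expression.Member;
    return (TValue)property.GetValue(target, null);
}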
I use the class to attach metadata to my flowchart shapes about which pieces of data they require via a RequiresField extension method.
.WithShape("FibrinolyticTherapy")
.RequiresField(pd => pd.FibrinolyticAdministration)
The property specifier allows me to find, by name, the property the expression is touching, and also to build dynamic getters and setters for the property. For my flowcharts, once each shape is associated with its required data, I can query the flowchart evaluation results to find out what pieces of information the user still needs to enter. This information can be used to selectively enable UI controls.
var controlsToEnable =
    results.SelectMany(r => r.RequiredFields)
           .Join(mapper.Entries,
                 ps => ps.PropertyName,
                 e => e.Left.PropertyName,
                 (ps, e) => e.Right)
           .Select(ps => ps.GetValue<IAnswerControl>(_view));
Now, that particular piece of code is something I’m not too happy about, but more on that in a later post. This code is in a presenter class and joins the PropertySpecifier objects the flowchart requires with PropertySpecifier objects that reference a view class. The query essentially turns Expression<T> metadata into cold, hard Control references with no magic strings and strong typing all the way down. It’s also easy to write unit tests that use some reflection to ensure all properties are mapped, and mapped correctly.
All that is needed is a property map to store the associations between model data and view controls. This starts with a PropertyMapEntry class.
public class PropertyMapEntry<TLeft, TRight>
{
    public PropertySpecifier<TLeft> Left;
    public PropertySpecifier<TRight> Right;
}
And a PropertyMapper.
public class PropertyMapper<TLeft, TRight>
{
    public PropertyMapper()
    {
        Entries = new List<PropertyMapEntry<TLeft, TRight>>();
    }

    public List<PropertyMapEntry<TLeft, TRight>> Entries { get; set; }
}
And a fluent-ish API to build the map.
static PropertyMapper<PnemoniaCaseData, IPnemoniaWorksheet> mapper =
    new PropertyMapper<PnemoniaCaseData, IPnemoniaWorksheet>()
        .Property(cd => cd.ChestXRay.Length).MapsTo(v => v.ChestXRay)
        .Property(cd => cd.ClinicalTrial).MapsTo(v => v.ClinicalTrial)
        // ...
        ;
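Property and MapsTo aren’t shown above; here is a plausible sketch of the pair, assuming MapsTo completes the entry started by the preceding Property call (hypothetical code, not the exact implementation):

public static PropertyMapper<TLeft, TRight> Property<TLeft, TRight>(
    this PropertyMapper<TLeft, TRight> mapper,
    Expression<Func<TLeft, object>> expression)
{
    // start a new entry holding the left-hand specifier
    mapper.Entries.Add(new PropertyMapEntry<TLeft, TRight>
    {
        Left = new PropertySpecifier<TLeft>(expression)
    });
    return mapper;
}

public static PropertyMapper<TLeft, TRight> MapsTo<TLeft, TRight>(
    this PropertyMapper<TLeft, TRight> mapper,
    Expression<Func<TRight, object>> expression)
{
    // complete the entry started by the preceding Property call
    mapper.Entries[mapper.Entries.Count - 1].Right =
        new PropertySpecifier<TRight>(expression);
    return mapper;
}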
I’m pretty happy with how it worked out – at least it looks like this approach will make it into v1.0 of the shipping product. Yet I still wonder if I’m using Expression<T> for evil or for good. What do you think?
In the last post we talked about needing some Expression<T> background. There is a lot of good information out there about Expression<T>, but if you haven’t heard – this class is pure magic. If you want a long version of the story, see “LINQ and C# 3.0”. For a short version of the story, read on.
.NET compilers like the C# and VB compilers are really good at converting code into an intermediate language that the CLR’s JIT compiler will transform into native code for the CPU to execute. So if you write the following …
Func<int, int> square = x => x * x;
var result = square(3); // yields 9
… and you open the assembly with a tool like Reflector, you’ll find the IL instructions created by the compiler.
ldarg.0
ldarg.1
mul
stloc.0
That essentially says – load up two arguments, multiply them, then store the result. These instructions will be part of an anonymous method (as lambda expressions are just a shorter syntax for writing an anonymous method), and you can invoke the method by applying the () operator to the square variable to compute some result. IL is the perfect representation for code that ultimately needs to execute, but it’s not a great representation of the developer’s original intent. As an example, consider the following.
var query = ctx.Animals
               .Where(animal => animal.Name == "Fido");
If ctx.Animals is an in-memory collection of objects, then compiling the code inside the Where method will generate efficient instructions to search for Fido. That’s good – but what if Fido lives in a database, behind a web service, or in some other remote location? Then a LINQ provider needs to translate the code inside the Where method into a web service call, a SQL query, or some other type of remote command. The LINQ provider will need to understand the programmer’s intent of the code inside the Where method, and IL is not designed to express this intent. We don’t want LINQ providers “decompiling” programs at runtime. Thus, we have Expression<T>.
Expression<Func<int, int>> square = x => x * x;
var result = square.Compile()(3); // yields 9
Wrapping our function inside an Expression produces something we can’t invoke directly – we have to .Compile the expression before we can invoke it and capture a result. This is because the .NET compilers don’t produce IL when they come across an assignment to Expression<T> - instead they produce a data structure known as an abstract syntax tree (AST). ASTs aren’t the prettiest things to look at, but they are a better representation for the code if you need to figure out what the code is trying to do. The AST for the Fido search will tell us the code consists of a binary expression that tests for equality, and that the left hand side of the equality test is the animal’s name, and the right hand side is a string constant “Fido”. This is enough information for a remote LINQ provider to translate the expression into something like “WHERE Name = ‘Fido’”.
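To make that concrete, here is what pulling those pieces out of the tree looks like (assuming a simple Animal class with a string Name property):

public class Animal { public string Name { get; set; } }

Expression<Func<Animal, bool>> filter = animal => animal.Name == "Fido";

var body = (BinaryExpression)filter.Body;   // NodeType is ExpressionType.Equal
var left = (MemberExpression)body.Left;     // the member access: animal.Name
var right = (ConstantExpression)body.Right; // the constant: "Fido"

Console.WriteLine(left.Member.Name);        // prints "Name"
Console.WriteLine(right.Value);             // prints "Fido"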
Expression<T> gives us the ability to treat a piece of code as data, which is a relatively old concept (hello, LISP!) and fairly powerful. It gives us the ability to walk through “code” at runtime and examine what it intends to do. This feature facilitates all of the remote LINQ query providers, like LINQ to SQL and the Entity Framework, because they can examine the AST and formulate commands that represent the original code in a different language (SQL). This translation is far from simple, but it would have been impossible if the compiler was generating IL instead of syntax trees.
There are additional scenarios that Expression<T> enables that have nothing to do with queries or databases, and this is where Expression<T> gets exciting, because you can use it in places you might not expect: a business layer, a mapping layer, or the flowcharts I was working on.
flowChartShape.RequiresField(casedata => casedata.SmokingCounseling)
You can think of the RequiresField method call as attaching metadata to a flowchart shape; the metadata describes the property on some data object that the shape will use during execution. The metadata is strongly typed, IntelliSense-friendly, and refactor-friendly. We can use the metadata at runtime to determine which fields to enable in the UI, or which fields are missing that a user needs to address. We’ll dig into this more in a future post.
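In the meantime, here is a plausible sketch of RequiresField, assuming the flowchart builder keeps track of the last shape added, just like the other method-chaining helpers do (hypothetical code, not the exact implementation):

public static Flowchart<T, R> RequiresField<T, R>(
    this Flowchart<T, R> chart, Expression<Func<T, object>> expression)
{
    // remember which property the shape needs via a PropertySpecifier
    chart.LastShape().RequiredField = new PropertySpecifier<T>(expression);
    return chart;
}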
In a previous post, I talked about modeling flowcharts with C# code. The flowcharts are designed, documented, and standardized by a non-profit organization charged with measuring the quality of patient care inside of hospitals. They do so by looking at common cases that every hospital will see, like heart failure and pneumonia patients. The logic inside each flowchart can determine if the hospital followed the “best practices” for treating each type of case. Some of the logic becomes quite elaborate, particularly when evaluating the types of antibiotics a patient received, and the antibiotic timings, relative orderings, and routes of administration.
Sitting down with 100 pages of flowchart logic was intimidating, and realizing that new versions would come along every six months was enough to induce fear. I experimented with a few visual tools and rules engines, but nothing made the job easy or produced a maintainable solution.
From the beginning I also was thinking about a domain specific language. At one point I decided to sit down with pen and paper to write down what I saw in the flow charts and what I thought a DSL might look like:
a flowchart for pneumonia case data
has a shape named “Smoking History”
with an arrow that points to “Smoking Counseling” if the patient DOES smoke
and an arrow that points to the shape “…” if patient DOES NOT smoke
has a shape named “Smoking Counseling”
with an arrow that points to …
Not great, but it wasn’t too hard to read. It also made me realize that only a few simple abstractions were needed to make an executable flowchart. Flowcharts contain shapes, shapes contain arrows, and each arrow points to another shape and has an associated rule to evaluate and determine if the arrow should be followed. I showed the code for a couple of these classes in a previous post.
The only hard part then would be correctly assembling shape, arrow, and rule objects into the proper hierarchy for evaluation. One option was to parse instructions like the text that I had put down on paper, but I wanted to try something simpler first.
“Object graph” is a computer scientist’s term for a collection of related objects. I needed to arrange shapes, arrows, and rules into an object graph. We deal with object graphs all the time in software. ASPX files describe an object graph that is assembled at runtime to emit HTML. An even better example is XAML - the XML dialect for WPF, Silverlight, and WF. XAML is the direct representation of a hierarchical graph of CLR objects. I considered what a flowchart might look like in XML.
<Flowchart Name="Pnemonia Measure 6" CaseDataType="PnemoniaCaseData">
  <Shape Name="Smoking History">
    <Arrow PointsTo="Smoking Cessation" Rule="?" />
  </Shape>
  <Shape Name="Smoking Counseling">
    <!-- ... -->
  </Shape>
  <!-- ... -->
</Flowchart>
The one sticking point was how to represent the rule for an arrow. There are mechanisms available to represent expressions and code in XML (XAML’s x:Code directive element and serialized CodeDom objects à la WF rule sets are two examples), but code in XML is always tedious, cumbersome, and worst of all - impervious to refactoring operations.
Not to mention, I think many developers today are feeling “XML fatigue” – me included. Five years of XML hyperbole followed by five years of software vendors putting everything and anything between < and > will do that.
The other option was to assemble the flowchart using a fluent interface in C# – a.k.a an internal DSL. Chad Myers recently posted some excellent notes on internal DSLs, which he defines as “…bending your primary language of choice to create a special syntax that’s easier for the consumers of your API to use to accomplish some otherwise complicated task”. Chad also has notes on the method chaining pattern, which is the approach I took.
First, I needed an extension method to add a Shape to a Flowchart…
public static Flowchart<T, R> WithShape<T, R>(
    this Flowchart<T, R> chart, string shapeName)
{
    Shape<T, R> shape = new Shape<T, R> { Name = shapeName };
    chart.Shapes.Add(shape);
    return chart;
}
…and another extension method to add an Arrow to a Shape. The trick was realizing that the arrow would always apply to the last shape added to the flowchart.
public static Flowchart<T, R> WithArrowPointingTo<T, R>(
    this Flowchart<T, R> chart, string pointsTo)
{
    Arrow<T> arrow = new Arrow<T>();
    arrow.PointsTo = pointsTo;
    chart.LastShape().Arrows.Add(arrow);
    return chart;
}
Similarly, a Rule always goes to the last Arrow in the last Shape:
public static Flowchart<T, R> AndTheRule<T, R>(
    this Flowchart<T, R> chart, Func<T, bool> rule)
{
    chart.LastShape().LastArrow().Rule = rule;
    return chart;
}
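The builders above lean on two small conveniences that the container classes need to supply. I didn’t show them, but a minimal sketch would be:

public class Flowchart<T, R>
{
    public Flowchart() { Shapes = new List<Shape<T, R>>(); }
    public List<Shape<T, R>> Shapes { get; set; }

    // the method chaining pattern always works on the most recent shape
    public Shape<T, R> LastShape()
    {
        return Shapes[Shapes.Count - 1];
    }
}

// ... with a matching helper on Shape<T, R>:
public Arrow<T> LastArrow()
{
    return Arrows[Arrows.Count - 1];
}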
Using the API looks like:
new Flowchart<PnemoniaCaseData, MeasureResult>()
    // ...
    .WithShape("AdultSmokingHistory")
        .RequiresField(pd => pd.SmokingHistory)
        .WithArrowPointingTo("Rejected")
            .AndTheRule(pd => pd.SmokingHistory.IsMissing)
        .WithArrowPointingTo("Excluded")
            .AndTheRule(pd => pd.SmokingHistory == YesNoAnswer.No)
        .WithArrowPointingTo("AdultSmokingCounseling")
            .AndTheRule(pd => pd.SmokingHistory == YesNoAnswer.Yes)
    .WithShape("AdultSmokingCounseling")
        .RequiresField(pd => pd.SmokingCounseling)
        .WithArrowPointingTo("Rejected")
            .AndTheRule(pd => pd.SmokingCounseling.IsMissing)
        .WithArrowPointingTo("InDenominator")
            .AndTheRule(pd => pd.SmokingCounseling == YesNoAnswer.No)
        .WithArrowPointingTo("InNumerator")
            .AndTheRule(pd => pd.SmokingCounseling == YesNoAnswer.Yes)
    .WithShape("Rejected")
        .YieldingResult(MeasureResult.Rejected)
    .WithShape("InDenominator")
        .YieldingResult(MeasureResult.InMeasurePopulation)
    .WithShape("InNumerator")
        .YieldingResult(MeasureResult.InNumeratorPopulation);
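YieldingResult isn’t shown above either, but it presumably follows the same pattern, marking the last shape added as a terminal shape (a sketch under that assumption):

public static Flowchart<T, R> YieldingResult<T, R>(
    this Flowchart<T, R> chart, R result)
{
    // a terminal shape has no arrows, only a result to report
    chart.LastShape().Result = result;
    return chart;
}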
Overall I was happy with how quickly we were able to pound out flowcharts using the fluent API. The code is relatively easy to compare to the original flowcharts in the specifications (although with the larger flowcharts, nothing is too easy). It’s also easy to test, diff-able after check-ins, and still works after refactoring the underlying data model. The real measure of success, though, will be how well the code stands up to changes and revisions over time. The jury is still out.
In a future post I’d like to tell you about the .RequiresField method, as it drove a number of key scenarios in the solution. First, we’ll have to talk about the magic of Expression<T>…
Tim Mallalieu, PM of LINQ to SQL and LINQ to Entities, recently announced:
“…as of .NET 4.0 the Entity Framework will be our recommended data access solution for LINQ to relational scenarios.”
Tim later tried to clarify the announcement in a carefully worded post:
“We will continue to make some investments in LINQ to SQL based on customer feedback.”
Although Microsoft may not remove LINQ to SQL from the framework class library just yet, they do appear to be treating LINQ to SQL as an experiment that escaped from the lab – something to minimize until it’s erased from our memories. Many consider the decision inevitable, as the ADO.NET Entity Framework is considered the crowning achievement in Microsoft’s long quest to build a deep abstraction over relational databases.
On the surface the reasoning is sound. The technologies are both solutions in the same problem space, and have overlapping features. Both frameworks use metadata to map relational data to objects, and both implement change tracking, identity maps, and a unit of work class. Each framework also has distinguishing features, however. For instance, LINQ to SQL supports POCOs and implicit lazy loading, while the Entity Framework delivers support for multiple database providers and sophisticated mapping strategies.
The belief is that merging the distinguishing features of LINQ to SQL into the Entity Framework will produce a piece of software that makes everyone happy – but I’m not so sure.
On the surface, conservative governments and liberal governments are both solutions in the same problem space - governance. Both will implement a chief of staff and a head of state, but the implementation details don’t matter. It’s the philosophical differences between conservatives and liberals that determines the quality of life for the individual citizens of the government … and that’s as far as this analogy will go.
LINQ to SQL and the Entity Framework promote fundamentally different philosophies.
LINQ to SQL is geared for thinking about objects. There is an “I’ll make this easy for you” feeling you get from the technology, with its flexibility (such as mapping with XML or attributes) and escape hatches (such as ExecuteQuery<T>). The simplicity surrounding LINQ to SQL makes many people think of the technology as a toy, or only suitable for RAD prototyping. The truth is that LINQ to SQL just wants to give you objects and let you go about your business.
The Entity Framework is geared for thinking about data. The crown jewel of the framework is the theoretically grounded Entity Data Model. The Entity Framework’s intent is to promote the EDM as the canonical, consumable data model both inside an application (via technologies like LINQ to Entities, Entity SQL, and ASP.NET Dynamic Data) and outside an application (via technologies like ADO.NET Data Services). The Entity Framework wants the data model to be pervasive.
Some developers like objects, and some developers like data. I think it’s impossible for the Entity Framework to make both types of developers happy 100% of the time. Personally, I prefer the design of LINQ to SQL despite some obvious deficiencies. The Entity Framework wanted to distinguish itself from similar frameworks produced by OSS projects and third party companies, but in doing so I think they missed the key reasons that developers turn to these technologies in the first place. Version 2.0 promises to deliver some of these features, including the ability to do model first development instead of schema first development, but I’m still not convinced that the Entity Framework can ever refactor itself to deliver the simplicity, usability, and philosophy of LINQ to SQL.
There are many suggestions floating around on what to do with LINQ to SQL. David Hayden and Steve Streeting both suggest an open source approach, while Ian Cooper suggests a successor to both EF and L2S.
Here is another idea…
It occurred to me recently that I’ve spent an inordinate amount of time over the last 10 years mapping data. Perhaps this is why I am so suspicious of anyone promoting a canonical data model. Between 1998 and 2001 I was writing commercial software for the mortgage banking industry, and our biggest source of pain was getting mortgage information into our software. We were always on the lookout for a way to map data from spreadsheets, CSV files, XML files, and all the other quirky formats banks had to offer. From 2001 to present I’ve been writing commercial software for the healthcare industry. Again, there is mapping required to process data in the hospital’s format, the government’s format, the insurance company’s format, the third party vendor’s format, and 100 permutations of the canonical (there is that word again) “industry standard” formats.
The mapping never stops at the system boundaries, either. In the last week I’ve mapped domain objects to DTOs, DTOs to UI controls, and UI control status to the status display. Sprinkle in a little OLAP, a little XML, and some JSON web services with their own custom mappings, and I’ve started to think that all software is controlled by a series of carefully constructed hash tables.
The internals of LINQ to SQL could, I think, form the nice foundation for a generic data mapping framework that would be complementary to ORM type frameworks like the Entity Framework. The framework could take out some of the grunt work of object to object mapping, and perhaps through a provider type architecture offer additional mapping capabilities to and from various formats, like XHTML, JSON, CSV. Transforming, massaging, and pushing around data is a fact of life in most large applications and a framework that makes this job easier to implement and test in a uniform manner would be a boon for productivity.
In the end, I hope LINQ to SQL garners enough support to remain viable and evolve, but at this time it looks as if LINQ to SQL is facing an Order 66.
A few months ago I worked on a system that was based on a set of specifications that included some gnarly flowcharts (see pages 7 – 17 for an example). The good news was that the specs were concrete and readily available. The bad news was that the specs change every six months.
I explored a number of options for modeling the logic in the flowcharts, including WF, WF rules, and code generation from XML, but ultimately decided on staying as close as possible to the flowcharts with C# code. Maintainability and testability were the keys to survival. The end result of building a flowchart in code looks like the following:
var flowChart = new Flowchart<PnemoniaCaseData, MeasureResult>()
    // ...
    .WithShape("TransferFromAnotherED")
        .RequiresField(pd => pd.TransferFromAnotherED)
        .WithArrowPointingTo("Rejected")
            .AndTheRule(pd => pd.TransferFromAnotherED.IsMissing)
        .WithArrowPointingTo("Excluded")
            .AndTheRule(pd => pd.TransferFromAnotherED == YesNoAnswer.Yes)
        .WithArrowPointingTo("PointOfOriginForAdmissionOrVisit")
            .AndTheRule(pd => pd.TransferFromAnotherED == YesNoAnswer.No)
    .WithShape("PointOfOriginForAdmissionOrVisit")
        .RequiresField(pd => pd.PointOfOriginForAdmissionOrVisit)
    // ... lots more of the same
    .WithShape("Rejected")
        .YieldingResult(MeasureResult.Rejected)
    .WithShape("Excluded")
        .YieldingResult(MeasureResult.Excluded)
    .WithShape("InDenominator")
        .YieldingResult(MeasureResult.InMeasurePopulation)
    .WithShape("InNumerator")
        .YieldingResult(MeasureResult.InNumeratorPopulation);
The flowcharts, particularly the lengthier ones, were still tedious to build, but overall I think this approach gave us something we could use to crank out the 30-odd flowcharts in a maintainable, testable, and arguably more readable fashion than the other methods I considered. All the string literals tend to make me nervous, but they mostly match a corresponding shape in the specification, making an eyeball comparison easier. Typographical errors in the strings are easily caught with tests that use LINQ queries. For example, there should never be two shapes in the same flowchart with the same name:
var duplicateShapes = Shapes.GroupBy(s => s.Name).Where(g => g.Count() > 1);
And there should never be an arrow pointing to a shape name that doesn’t exist:
var names = Shapes.Select(s => s.Name);
var problemTransitions = Shapes.SelectMany(s => s.Arrows)
                               .Where(a => !names.Contains(a.PointsTo));
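Wrapped in a unit test, both checks reduce to empty-set assertions. A sketch, assuming NUnit and a hypothetical BuildFlowchart helper that constructs the chart under test:

[Test]
public void Flowchart_Has_No_Duplicate_Or_Dangling_Shapes()
{
    var shapes = BuildFlowchart().Shapes; // BuildFlowchart is hypothetical

    var duplicateShapes = shapes.GroupBy(s => s.Name)
                                .Where(g => g.Count() > 1);
    Assert.IsFalse(duplicateShapes.Any(), "duplicate shape names");

    var names = shapes.Select(s => s.Name).ToList();
    var problemTransitions = shapes.SelectMany(s => s.Arrows)
                                   .Where(a => !names.Contains(a.PointsTo));
    Assert.IsFalse(problemTransitions.Any(), "arrows pointing nowhere");
}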
The classes used to model the flowchart were relatively simple. For example, here is the Shape class that takes generic parameters to define the type of data it operates on, and the type of the result it can yield.
public class Shape<T, R>
{
    public Shape()
    {
        Arrows = new List<Arrow<T>>();
        Result = default(R);
    }

    public R Result { get; set; }
    public string Name { get; set; }
    public PropertySpecifier<T> RequiredField { get; set; }
    public List<Arrow<T>> Arrows { get; set; }
}
Every shape contains zero or more arrows:
public class Arrow<T>
{
    public string PointsTo { get; set; }
    public Func<T, bool> Rule { get; set; }
    public Action<T> Action { get; set; }
}
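The evaluation loop itself never made it into these posts, but the shape and arrow model implies one: start at the first shape, follow the first arrow whose rule passes, and stop when a shape has no arrows left to follow. A hypothetical sketch of a method the Flowchart class might expose:

public R Evaluate(T caseData)
{
    var shapesByName = Shapes.ToDictionary(s => s.Name);
    var current = Shapes[0];

    while (current.Arrows.Count > 0)
    {
        // follow the first arrow whose rule passes (throws if none do)
        var arrow = current.Arrows.First(a => a.Rule(caseData));
        if (arrow.Action != null)
        {
            arrow.Action(caseData);
        }
        current = shapesByName[arrow.PointsTo];
    }

    return current.Result; // set on terminal shapes via YieldingResult
}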
I learned a lot about the capabilities of LINQ, internal DSLs, and, to some degree, functional programming. If there is interest, it might be worth digging into some of these details in future posts…