Tim Mallalieu, PM of LINQ to SQL and LINQ to Entities, recently announced:
“…as of .NET 4.0 the Entity Framework will be our recommended data access solution for LINQ to relational scenarios.”
Tim later tried to clarify the announcement in a carefully worded post:
“We will continue to make some investments in LINQ to SQL based on customer feedback.”
Although Microsoft may not remove LINQ to SQL from the framework class library just yet, they do appear to be treating LINQ to SQL as an experiment that escaped from the lab – something to minimize until it’s erased from our memories. Many consider the decision inevitable, as the ADO.NET Entity Framework is considered the crowning achievement in Microsoft’s long quest to build a deep abstraction over relational databases.
On the surface the reasoning is sound. The technologies are both solutions in the same problem space, and have overlapping features. Both frameworks use metadata to map relational data to objects, and both implement change tracking, identity maps, and a unit of work class. Each framework also has distinguishing features, however. For instance, LINQ to SQL supports POCOs and implicit lazy loading, while the Entity Framework delivers support for multiple database providers and sophisticated mapping strategies.
The belief is that merging the distinguishing features of LINQ to SQL into the Entity Framework will produce a piece of software that makes everyone happy – but I’m not so sure.
On the surface, conservative governments and liberal governments are both solutions in the same problem space - governance. Both will implement a chief of staff and a head of state, but the implementation details don’t matter. It’s the philosophical differences between conservatives and liberals that determine the quality of life for the individual citizens of the government … and that’s as far as this analogy will go.
LINQ to SQL and the Entity Framework promote fundamentally different philosophies.
LINQ to SQL is geared for thinking about objects. There is an “I’ll make this easy for you” feeling you get from the technology with its flexibility (such as mapping with XML or attributes) and escape hatches (such as ExecuteQuery&lt;T&gt;). The simplicity surrounding LINQ to SQL makes many people think of the technology as a toy, or only suitable for RAD prototyping. The truth is that LINQ to SQL just wants to give you objects and let you go about your business.
The Entity Framework is geared for thinking about data. The crown jewel of the framework is the theoretically grounded Entity Data Model. The Entity Framework’s intent is to promote the EDM as the canonical, consumable data model both inside an application (via technologies like LINQ to Entities, Entity SQL, and ASP.NET Dynamic Data) and outside an application (via technologies like ADO.NET Data Services). The Entity Framework wants the data model to be pervasive.
Some developers like objects, and some developers like data. I think it’s impossible for the Entity Framework to make both types of developers happy 100% of the time. Personally, I prefer the design of LINQ to SQL despite some obvious deficiencies. The Entity Framework wanted to distinguish itself from similar frameworks produced by OSS projects and third party companies, but in doing so I think they missed the key reasons that developers turn to these technologies in the first place. Version 2.0 promises to deliver some of these features, including the ability to do model first development instead of schema first development, but I’m still not convinced that the Entity Framework can ever refactor itself to deliver the simplicity, usability, and philosophy of LINQ to SQL.
There are many suggestions floating around on what to do with LINQ to SQL. David Hayden and Steve Streeting both suggest an open source approach, while Ian Cooper suggests a successor to both EF and L2S.
Here is another idea…
It occurred to me recently that I’ve spent an inordinate amount of time over the last 10 years mapping data. Perhaps this is why I am so suspicious of anyone promoting a canonical data model. Between 1998 and 2001 I was writing commercial software for the mortgage banking industry, and our biggest source of pain was getting mortgage information into our software. We were always on the lookout for a way to map data from spreadsheets, CSV files, XML files, and all the other quirky formats banks had to offer. From 2001 to present I’ve been writing commercial software for the healthcare industry. Again, there is mapping required to process data in the hospital’s format, the government’s format, the insurance company’s format, the third party vendor’s format, and 100 permutations of the canonical (there is that word again) “industry standard” formats.
The mapping never stops at the system boundaries, either. In the last week I’ve mapped domain objects to DTOs, DTOs to UI controls, and UI control status to the UI control status display. Sprinkle in a little OLAP, a little XML, and some JSON web services with their own custom mappings and I’ve started to think that all software is controlled by a series of carefully constructed hash tables.
The internals of LINQ to SQL could, I think, form the nice foundation for a generic data mapping framework that would be complementary to ORM type frameworks like the Entity Framework. The framework could take out some of the grunt work of object to object mapping, and perhaps through a provider type architecture offer additional mapping capabilities to and from various formats, like XHTML, JSON, CSV. Transforming, massaging, and pushing around data is a fact of life in most large applications and a framework that makes this job easier to implement and test in a uniform manner would be a boon for productivity.
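To make the idea concrete, here is a minimal sketch of convention-based object-to-object mapping, the kind of grunt work such a framework could absorb. The Mapper class and MapTo method are hypothetical names invented for this example; a real framework would need far more (type conversion, nested objects, pluggable conventions):

```csharp
using System;
using System.Linq;

public static class Mapper
{
    // Copy values between any two objects by matching property names and
    // types -- the simplest possible mapping convention.
    public static TTarget MapTo<TTarget>(object source) where TTarget : new()
    {
        var target = new TTarget();
        var targetProps = typeof(TTarget).GetProperties();

        foreach (var sourceProp in source.GetType().GetProperties())
        {
            // find a writable target property with the same name and type
            var targetProp = targetProps.FirstOrDefault(
                p => p.Name == sourceProp.Name &&
                     p.PropertyType == sourceProp.PropertyType &&
                     p.CanWrite);

            if (targetProp != null)
            {
                targetProp.SetValue(target, sourceProp.GetValue(source, null), null);
            }
        }
        return target;
    }
}
```

A provider architecture could layer JSON, CSV, or XHTML sources on top of the same convention-matching core.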
In the end, I hope LINQ to SQL garners enough support to remain viable and evolve, but at this time it looks as if LINQ to SQL is facing an Order 66.
A few months ago I worked on a system that was based on a set of specifications that included some gnarly flowcharts (see pages 7 – 17 for an example). The good news was that the specs were concrete and readily available. The bad news was that the specs change every 6 months.
I explored a number of options for modeling the logic in the flowcharts, including WF, WF rules, and code generation from XML, but ultimately decided on staying as close as possible to the flowcharts with C# code. Maintainability and testability were the keys to survival. The end result of building a flowchart in code looks like the following:
var flowChart = new Flowchart<PnemoniaCaseData, MeasureResult>()
    // ...
    .WithShape("TransferFromAnotherED")
        .RequiresField(pd => pd.TransferFromAnotherED)
        .WithArrowPointingTo("Rejected")
            .AndTheRule(pd => pd.TransferFromAnotherED.IsMissing)
        .WithArrowPointingTo("Excluded")
            .AndTheRule(pd => pd.TransferFromAnotherED == YesNoAnswer.Yes)
        .WithArrowPointingTo("PointOfOriginForAdmissionOrVisit")
            .AndTheRule(pd => pd.TransferFromAnotherED == YesNoAnswer.No)
    .WithShape("PointOfOriginForAdmissionOrVisit")
        .RequiresField(pd => pd.PointOfOriginForAdmissionOrVisit)
    // ... lots more of the same
    .WithShape("Rejected")
        .YieldingResult(MeasureResult.Rejected)
    .WithShape("Excluded")
        .YieldingResult(MeasureResult.Excluded)
    .WithShape("InDenominator")
        .YieldingResult(MeasureResult.InMeasurePopulation)
    .WithShape("InNumerator")
        .YieldingResult(MeasureResult.InNumeratorPopulation);
The flowcharts, particularly the lengthier ones, were still tedious to build, but overall I think this approach gave us something that we could use to crank out the 30-odd flowcharts in a maintainable, testable, and arguably more readable fashion than some of the other methods I considered. All the string literals tend to make me nervous, but they mostly match a corresponding shape in the specification, making an eyeball comparison easier. Typographical errors in the strings are easily caught with tests that use LINQ queries. For example, there should never be two shapes in the same flowchart with the same name:
var duplicateShapes = Shapes.GroupBy(s => s.Name).Where(g => g.Count() > 1);
And there should never be an arrow pointing to a shape name that doesn’t exist:
var names = Shapes.Select(s => s.Name);
var problemTransitions = Shapes.SelectMany(s => s.Arrows)
                               .Where(a => !names.Contains(a.PointsTo));
The classes used to model the flowchart were relatively simple. For example, here is the Shape class that takes generic parameters to define the type of data it operates on, and the type of the result it can yield.
public class Shape<T, R>
{
    public Shape()
    {
        Arrows = new List<Arrow<T>>();
        Result = default(R);
    }

    public R Result { get; set; }
    public string Name { get; set; }
    public PropertySpecifier<T> RequiredField { get; set; }
    public List<Arrow<T>> Arrows { get; set; }
}
Every shape contains zero or more arrows:
public class Arrow<T>
{
    public string PointsTo { get; set; }
    public Func<T, bool> Rule { get; set; }
    public Action<T> Action { get; set; }
}
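The post doesn’t show the evaluation engine, so here is a simplified sketch of how a chart built from the Shape&lt;T,R&gt; and Arrow&lt;T&gt; classes above might be walked. FlowchartRunner is an invented name, and this is not the production code; it assumes the first shape in the list is the starting point:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FlowchartRunner
{
    // Start at the first shape, follow the first arrow whose rule matches
    // the data, and stop at a shape with no arrows, which yields the result.
    public static R Evaluate<T, R>(List<Shape<T, R>> shapes, T data)
    {
        var current = shapes.First();
        while (current.Arrows.Count > 0)        // terminal shapes have no arrows
        {
            var arrow = current.Arrows.First(a => a.Rule(data));
            if (arrow.Action != null)
            {
                arrow.Action(data);             // optional side effect
            }
            current = shapes.First(s => s.Name == arrow.PointsTo);
        }
        return current.Result;                  // e.g. MeasureResult.Excluded
    }
}
```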
Along the way I learned a lot about the capabilities of LINQ, internal DSLs, and, to some degree, functional programming. If there is some interest, I might dig into some of these details in future posts…
Someone asked me why LINQ operators return an IEnumerable<T> instead of something more useful, like a List<T>. In other words, in the following code:
List<Book> books = new List<Book>();
// ...
IEnumerable<Book> filteredBooks = books.Where(book => book.Title.StartsWith("R"));
... we started with a List<Book>, so why isn’t the Where operator smart enough to return a new List<Book>, or modify the existing list by removing books that don’t match the Where condition?
Let’s talk about modifying the original list.
I hope you’ll agree that it would be odd for a query to modify a data source. Imagine sending a SELECT statement to a database and finding out later your SELECT removed all but one record from a table. Although the Where operator is just a method call that could change the underlying list of books, it’s better to return something new and leave the original list intact. You won’t find any LINQ operators that modify their input, and this behavior produces many benefits. One obvious benefit is that you can think of the above code as a query, and it won’t surprise you by removing books from your original list.
What about creating a new List<T>?
It turns out that creating a new list can be quite expensive, and most of the time we don’t need a List<T> returned anyway. Think about the number of lists created in the following query (if each operator created a new list):
var filteredBooks = books.Where(book => book.Title.StartsWith("R"))
                         .OrderBy(book => book.Published.Year)
                         .Select(book => book.Title);
Here we would have three lists created (one each by the Where, OrderBy, and Select operators). We’d only need a single list of titles as the result, but since the operators cannot modify their underlying data source (for the reasons outlined above), they would be forced to each create a new list, and most of that work would be wasted as two of the lists are immediately discarded.
Imagine if you only needed to count the number of books whose title starts with the letter R – in that case you wouldn’t want any of these lists being created and destroyed when you computed the result. It would all be wasted work.
One of the principles of LINQ is to be lazy. A LINQ query won’t do any work unless you force the query to do the work. Even when a query does perform work – it does the least amount of work possible. If you wanted a List<Book> as a result, you’d have to force LINQ to create the list:
List<Book> filteredBooks = books.Where(book => book.Title.StartsWith("R"))
                                .ToList();
Instead of lists, LINQ works with a beautifully pure abstraction called IEnumerable<T>. It’s defined like so:
public interface IEnumerable<T> : IEnumerable
{
    IEnumerator<T> GetEnumerator();
}
The only thing you can do with an IEnumerable<T> is ask for an enumerator. An enumerator is something that knows how to visit each item in a collection. Some languages call these enumerator things “iterators”, because they iterate over a collection of objects, returning each object and moving to the next.
The in-memory LINQ operators, like Where, OrderBy, and Select, all work on inputs that implement IEnumerable<T>. That means the operators work on anything that can be enumerated over in a one-by-one fashion. The beauty is that there are so many data sources that these operators can work on, because IEnumerable<T> has such simple demands. Arrays are enumerable, lists are enumerable, dictionaries, trees, stacks, queues, files in a directory, elements in an XML document - all enumerable. Even a simple string is enumerable, since it is composed from a sequence of individual characters.
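To see how low the bar is, consider a contrived iterator that yields an endless sequence of numbers. The standard operators still work, because they only pull items one at a time:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // An infinite sequence -- the for loop never ends on its own.
    static IEnumerable<int> Naturals()
    {
        for (int i = 1; ; i++)
        {
            yield return i;
        }
    }

    static void Main()
    {
        var firstThreeEvens = Naturals()
            .Where(n => n % 2 == 0)
            .Take(3);               // Take stops pulling after three items

        foreach (var n in firstThreeEvens)
        {
            Console.WriteLine(n);   // 2, 4, 6
        }
    }
}
```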
These same operators return IEnumerable<T>, because it’s the lowest common denominator for everything you’d ever need from a query. Plus, it’s lazy. You want to count the results? The Count operator will get the enumerator and move through the results, one by one, to sum the total number of items. You want to make a concrete list from the results? The ToList operator will get the enumerator and move through the results, one by one, adding each item to a new list it creates. Do you want just the first item in the results? Then the enumerator does just a little bit of work to find that first item. In most cases it does not need to iterate the entire collection to find the first item. Enumerators are lazy, too.
The important point is that the enumerator itself doesn’t perform any useful work. It’s you, or the other LINQ operators, that use the enumerator to iterate through the result and produce something meaningful. In the odd case that you never need to look at the result - no enumeration work is performed at all. No lists are created. Pure laziness!
The beauty of IEnumerable<T> is that it only says “you can get something to enumerate this”. Returning something that offers the possibility of enumeration is very little work. And no work is needed unless you actually count the results, create a list from the results, or bind the results to a control for display.
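The laziness is easy to observe with a counter inside the predicate (a contrived example for this post):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var words = new List<string> { "red", "blue", "rust" };
        int evaluations = 0;

        // Building the query does no work at all...
        var query = words.Where(w => { evaluations++; return w.StartsWith("r"); });
        Console.WriteLine(evaluations);   // 0 -- nothing enumerated yet

        // ...enumeration forces the work...
        var list = query.ToList();
        Console.WriteLine(evaluations);   // 3 -- predicate ran once per word

        // ...and First stops as soon as it finds a match.
        var first = query.First();
        Console.WriteLine(evaluations);   // 4 -- only "red" was examined
    }
}
```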
The interface IEnumerable<T> is so wonderfully lazy it inspired me to write a short, short story. If you came to read about LINQ, skip the story as the words are entirely uninteresting and mostly devoid of meaning.
The scientist approached the big cat with a notepad and a pencil in her hands. She was worried, of course. The cat was a predator, and likely to be hungry at this hour of the day. “I need to know”, she asked the cat, “do you keep a list of things to do each day?”
The cat stirred. He was a snow leopard with dark rosettes blotted onto his thick, cream colored fur. The big cat’s eyes were only half open, but he turned and focused them on her.
“I don’t keep a list, dear lady”, he said, followed by a rumbling yawn. “I keep an enumeration”.
“An enumeration?”, she asked.
“Yes, an enumeration”, he replied. “Lists are like the gold bracelet on your wrist, dear lady. Very tangible – very concrete things, lists are. Keeping a list of everything I might want to do is a burden and chore. I’d need to carry your paper, and your pencil”, he said, with his eyes focusing on her hands.
The scientist’s pencil raced across her notebook as she transcribed every word the leopard spoke. She glanced at him as he began to stare, and instinctively pulled herself a little further away.
The leopard continued. “With a list I’d have to add things, and remove things, and constantly reorder the things I want to do. Too much work”, he said, shaking his head. “Do you know what I can do with an enumeration?”, he asked.
She paused at the leopard’s question, and pushed her hair back - she wore glasses when she worked. After some thought, she asked, “Enumerate it?”
“Yes, dear lady”, said the leopard. “I can enumerate it. I enumerate the possibilities one by one, and find the perfect fit for this moment in my life. If I’m thirsty, I’ll find water. If I’m sleepy, I’ll find a place to sleep.” He tilted his head slightly to the right. “If I’m hungry, I’ll find food”, he said.
She finished writing the leopard’s last words and glanced up. Was that a tooth showing? Was he hungry now?
The cat started speaking again.
“One of the wonderful things about enumerations is they theoretically last forever. Lists have a beginning and an end – an Omega for every Alpha. With enumerations, you can keep asking for the next thing, over and over and over again. I ask for them when I’m ready to do something. If I’m tired of doing, they’ll still be there tomorrow. You might say it’s unpredictable behavior, I say I’m just being lazy. Either way, I can’t help it, it’s in my genes”. His soft voice trailed off with a tired tone.
“You intend to live forever?”, she asked. The leopard snarled. Or smiled. She couldn’t quite tell.
“No, dear lady”, he said. “I said the enumeration theoretically lasts forever. One day I’m sure my enumeration will run out of things to give me, or maybe I’ll just be too tired to ask for the next thing, so I’ll sleep forever. I don’t know how it ends. Maybe I should ask you.”
She looked at him again. She felt uneasy now, being here with a leopard. He seemed nice enough, as leopards go, and he certainly gave her interesting topics for research, but he was still a leopard. A carnivore. He was not a beast to be trifled with. She could never let her guard down again.
“I don’t know how it ends either”, she said, and closed her notepad. She tucked her pencil behind her ear, backed away from the cage, and left the leopard alone with his enumeration.
Mike had to model answers. Yes or no answers, date and time answers - all sorts of answers. One catch was that any answer could be “missing” or could be “empty”. Both values had distinct meanings in the domain. An interface definition fell out of the early iterative design work:
public interface IAnswer
{
    bool IsMissing { get; }
    bool IsEmpty { get; }
}
Mike was prepared to implement a DateTimeAnswer class, but first a test:
[TestMethod]
public void Can_Represent_Empty_DateTimeAnswer()
{
    DateTimeAnswer emptyAnswer = new DateTimeAnswer();
    Assert.IsTrue(emptyAnswer.IsEmpty);
}
After a little work, Mike had a class that could pass the test:
public class DateTimeAnswer : IAnswer
{
    public bool IsEmpty
    {
        get { return Value == _emptyAnswer; }
    }

    public bool IsMissing
    {
        get { return false; } // todo
    }

    public DateTime Value { get; set; }

    static DateTime _emptyAnswer = DateTime.MinValue;
    static DateTime _missingAnswer = DateTime.MaxValue;
}
After sitting back and looking at the code, Mike realized there were a couple of facets of the class he didn’t like: the magic DateTime values would leak into client code, and IsMissing was still unimplemented.
Mike returned to his test project, and changed his first test to agree with his idea of how the class should work. Mike figured adding a couple of well known DateTimeAnswer objects (named Empty and Missing) would get rid of the magic DateTime values in client code.
[TestMethod]
public void Can_Represent_Empty_DateTimeAnswer()
{
    DateTimeAnswer emptyAnswer = DateTimeAnswer.Empty;
    Assert.IsTrue(emptyAnswer.IsEmpty);
}
Feeling pretty confident, Mike returned to his DateTimeAnswer class and added a constructor, changed the Value property to use a protected setter, implemented IsMissing, and published the two well known DateTimeAnswer objects based on his previous code:
public class DateTimeAnswer : IAnswer
{
    public DateTimeAnswer(DateTime value)
    {
        Value = value;
    }

    public bool IsEmpty
    {
        get { return Value == _emptyAnswer; }
    }

    public bool IsMissing
    {
        get { return Value == _missingAnswer; }
    }

    public DateTime Value { get; protected set; }

    public static DateTimeAnswer Empty = new DateTimeAnswer(_emptyAnswer);
    public static DateTimeAnswer Missing = new DateTimeAnswer(_missingAnswer);

    static DateTime _emptyAnswer = DateTime.MinValue;
    static DateTime _missingAnswer = DateTime.MaxValue;
}
Mike’s test passed. Mike was so confident about his class he never wrote a test for IsMissing. It was just too easy – what could possibly go wrong? Imagine his surprise when someone else wrote the following test, and it failed!
[TestMethod]
public void Can_Represent_Missing_DateTimeAnswer()
{
    DateTimeAnswer missingAnswer = DateTimeAnswer.Missing;
    Assert.IsTrue(missingAnswer.IsMissing);
}
What went wrong?
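Here’s a hint: in C#, static field initializers run in the order the fields are declared. A tiny demonstration of the ordering (InitOrder is a made-up class for this example):

```csharp
using System;

class InitOrder
{
    // Static fields initialize in declaration order, so A captures
    // the default value of B (0), not 42.
    public static int A = B;    // runs first: B is still default(int), which is 0
    public static int B = 42;   // runs second
}

class Program
{
    static void Main()
    {
        Console.WriteLine(InitOrder.A); // 0
        Console.WriteLine(InitOrder.B); // 42
    }
}
```

Now look again at where Empty and Missing are declared in DateTimeAnswer relative to _emptyAnswer and _missingAnswer, and remember what default(DateTime) happens to equal.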
Over a month ago I did a presentation on LINQ and promised a few people I’d share the code from the session. Better late than never, eh?
We warmed up by building our own filtering operator to use in a query. The operator takes an Expression<Predicate<T>>, which we need to compile before invoking the predicate inside.
public static class MyExtensions
{
    public static IEnumerable<T> Where<T>(
        this IEnumerable<T> sequence,
        Expression<Predicate<T>> filter)
    {
        Predicate<T> predicate = filter.Compile(); // compile once, not once per item
        foreach (T item in sequence)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }
}
The following query uses our custom Where operator:
IEnumerable<Employee> employees = new List<Employee>
{
    new Employee { ID = 1, Name = "Scott" },
    new Employee { ID = 2, Name = "Paul" }
};

Employee scott = employees.Where(e => e.Name == "Scott").First();
Of course, if we are just going to compile and invoke the expression there is little advantage to using an Expression<T>, but it generally turns into an “a-ha!” moment when you show someone the difference between an Expression<Predicate<T>> and a plain Predicate<T>. Try it yourself in a debugger.
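The difference is visible without a debugger, too. A minimal illustration: the delegate can only be invoked, while the expression is a data structure you can inspect (and, with a provider like LINQ to SQL, translate into something else entirely):

```csharp
using System;
using System.Linq.Expressions;

class Program
{
    static void Main()
    {
        // Same lambda text, two very different things:
        Predicate<int> del = n => n > 5;                  // compiled code
        Expression<Predicate<int>> expr = n => n > 5;     // an expression tree

        Console.WriteLine(del(10));             // True  -- all we can do is invoke it
        Console.WriteLine(expr.Body);           // (n > 5) -- we can walk the tree
        Console.WriteLine(expr.Compile()(10));  // True  -- or compile, then invoke
    }
}
```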
We also wrote a LINQ version of “Hello, World!” that reads text files from a temp directory (a.txt would contain “Hello,”, while b.txt would contain “World!”). A good demonstration of map-filter-reduce with C# 3.0.
var message = Directory.GetFiles(@"c:\temp\")
                       .Where(fname => fname.EndsWith(".txt"))
                       .Select(fname => File.ReadAllText(fname))
                       .Aggregate(
                           new StringBuilder(),
                           (sb, s) => sb.Append(s).Append(" "),
                           sb => sb.ToString());

Console.WriteLine(message);
Moving into NDepend territory, we also wrote a query to find the namespaces with the most types (for referenced assemblies only):
var groups = Assembly.GetExecutingAssembly()
                     .GetReferencedAssemblies()
                     .Select(aname => Assembly.Load(aname))
                     .SelectMany(asm => asm.GetExportedTypes())
                     .GroupBy(t => t.Namespace)
                     .OrderByDescending(g => g.Count())
                     .Take(10);

foreach (var group in groups)
{
    Console.WriteLine("{0} {1}", group.Key, group.Count());
    foreach (var type in group)
    {
        Console.WriteLine("\t" + type.Name);
    }
}
And finally, some LINQ to XML code that creates an XML document out of all the executing processes on the machine:
XNamespace ns = "https://odetocode.com/schemas/linqdemo";
XNamespace ext = "https://odetocode.com/schemas/extensions";

XDocument doc = new XDocument(
    new XElement(ns + "Processes",
        new XAttribute(XNamespace.Xmlns + "ext", ext),
        from p in Process.GetProcesses()
        select new XElement(ns + "Process",
            new XAttribute("Name", p.ProcessName),
            new XAttribute(ext + "PID", p.Id))));
Followed by a query for the process IDs of any mspaint instances:
var query = from e in doc.Descendants(ns + "Process")
            where (string)e.Attribute("Name") == "mspaint"
            select (string)e.Attribute(ext + "PID");
More on LINQ to come…