Units of Work

Friday, December 12, 2008

The Unit of Work (let’s call it UoW) is another common design pattern found in persistence frameworks, but it’s new to many .NET developers who are just starting to use LINQ to SQL or the Entity Framework as the “new ADO.NET”. First, the obligatory definition from Patterns of Enterprise Application Architecture (P of EAA):

A Unit of Work keeps track of everything you do during a business transaction that can affect the database. When you're done, it figures out everything that needs to be done to alter the database as a result of your work.

Please think of a “business conversation” if the phrase “business transaction” makes you think of database transactions with BEGIN and COMMIT keywords. Although a UoW may use a real database transaction, the idea is to think at a slightly higher level. When your software needs to fulfill a specific business need, like processing an expense report, the unit of work will record all the changes you make to the objects involved in processing the expense report and save those changes to the database. The unit of work is like a black box recorder.
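To make the “black box recorder” idea concrete, here is a minimal sketch of a hand-rolled unit of work. Everything in it is an assumption made for illustration: the ExpenseReport class, the method names (borrowed from the pattern's usual vocabulary), and the fact that Commit only returns a description of the work instead of talking to a database. It is not how any particular framework implements the pattern.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity, used only for this illustration.
public class ExpenseReport
{
    public int Id { get; set; }
    public decimal Total { get; set; }
}

// A toy unit of work: it records what happened during the conversation
// and works out what needs to be done when the conversation ends.
public class UnitOfWork
{
    private readonly List<ExpenseReport> _added = new List<ExpenseReport>();
    private readonly List<ExpenseReport> _modified = new List<ExpenseReport>();
    private readonly List<ExpenseReport> _removed = new List<ExpenseReport>();

    public void RegisterNew(ExpenseReport report)
    {
        _added.Add(report);
    }

    public void RegisterDirty(ExpenseReport report)
    {
        // A new object is inserted with its latest values anyway,
        // and one UPDATE per object is enough.
        if (!_added.Contains(report) && !_modified.Contains(report))
        {
            _modified.Add(report);
        }
    }

    public void RegisterRemoved(ExpenseReport report)
    {
        // An object that was never saved needs no DELETE; just forget it.
        if (_added.Remove(report)) return;

        _modified.Remove(report);
        if (!_removed.Contains(report)) _removed.Add(report);
    }

    // Returns a description of the work instead of executing SQL,
    // so the sketch stays self-contained.
    public IList<string> Commit()
    {
        var work = new List<string>();
        foreach (var r in _added) work.Add("INSERT ExpenseReport " + r.Id);
        foreach (var r in _modified) work.Add("UPDATE ExpenseReport " + r.Id);
        foreach (var r in _removed) work.Add("DELETE ExpenseReport " + r.Id);

        _added.Clear();
        _modified.Clear();
        _removed.Clear();
        return work;
    }
}
```

Registering the same report as dirty twice still produces a single UPDATE on Commit, which is the point of the pattern: the caller describes changes as they happen, and the unit of work decides what the database needs to hear about.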

DataSets and the UoW

If you’ve been using ADO.NET, then it turns out you’ve already been using a UoW. The ADO.NET DataSet offers a good UoW implementation. You can take some records from the database and shove them into a DataSet, then make changes to the data during the course of a business conversation. When the conversation is complete you can use a table adapter thingy to save the entire batch of changes into the database. The DataSet and adapter combination gives you concurrency checks and the ability to undo changes. You can even take a DataSet and send it on an epic journey outside of your app domain, outside your process, or even outside your machine, and wait for the DataSet to return at some future point in time. During its journey, the DataSet will faithfully continue to record changes the adapter can save. 
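The change tracking described above can be seen without a database at all. The following is a small sketch that uses only System.Data; the Movies table, its columns, and the class name are made up for the example, and the adapter step is left out because it needs a real connection.

```csharp
using System;
using System.Data;

public static class DataSetAsUnitOfWork
{
    // Stands in for "take some records from the database".
    public static DataTable LoadMovies()
    {
        var movies = new DataTable("Movies");
        movies.Columns.Add("ID", typeof(int));
        movies.Columns.Add("Title", typeof(string));
        movies.Rows.Add(100, "Original title");

        // Treat the current values as the baseline loaded from the database.
        movies.AcceptChanges();
        return movies;
    }

    public static void Run()
    {
        DataTable movies = LoadMovies();
        DataRow row = movies.Rows[0];

        // Make a change during the business conversation.
        row["Title"] = "Edited title";

        Console.WriteLine(row.RowState);                            // Modified
        Console.WriteLine(row["Title", DataRowVersion.Original]);   // Original title

        // The batch of changes an adapter would be asked to save.
        DataTable pending = movies.GetChanges();
        Console.WriteLine(pending.Rows.Count);                      // 1

        // Undo everything since the last AcceptChanges.
        movies.RejectChanges();
        Console.WriteLine(row["Title"]);                            // Original title
        Console.WriteLine(row.RowState);                            // Unchanged
    }
}
```

The original values kept alongside the current ones are also what make the concurrency checks possible: an adapter can compare them against the database before it writes.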

DataSets have drawbacks, however, and I’m not going to resurrect that tired conversation. Suffice to say there is a reason that many developers have, for many years, been using persistence frameworks that offer a higher level of abstraction.

Persistence Frameworks and the UoW

The UoW implementation varies widely among the different persistence frameworks. LLBLGen has a serializable class explicitly named UnitOfWork that supports field level versioning, while NHibernate hides a UoW implementation behind an ISession interface. In LINQ to SQL the UoW centers around the DataContext, while in the Entity Framework the UoW centers around the ObjectContext (let’s just refer to both as the Context objects). I’m not going to compare and contrast the different implementations – I’m just pointing out that most frameworks provide a UoW, but the features can vary.

It is the L2S and EF implementations that the rest of this post will discuss because they often trip up the ADO.NET developers who make assumptions about how the Context will behave. With ADO.NET DataSets, every query sent to the database will retrieve the latest data, but this isn’t always true with the Context objects.

[TestMethod]
public void Context_Is_Isolated_From_Outside_Changes()
{           
    using (var ctx = new MovieReviewsContext())
    {
        var movie = ctx.Movies.Where(m => m.ID == 100)
                              .First();
        // make a change
        movie.ReleaseDate = new DateTime(2007, 1, 1);

        SimulateOtherUsersWork();

        // decide we want to refresh from database
        movie = ctx.Movies.Where(m => m.ID == 100)
                          .First();

        // but we don't see the new data: the year is still our 2007, not 2008
        Assert.AreEqual(2007, movie.ReleaseDate.Year);
    }
}

public void SimulateOtherUsersWork()
{
    using (var ctx = new MovieReviewsContext())
    {
        var movie = ctx.Movies.Where(m => m.ID == 100)
                              .First();
        movie.ReleaseDate = new DateTime(2008, 1, 1);
        ctx.SaveChanges();
    }
}

Remember that the context objects implement an identity map. The second LINQ query does go to the database and fetch the Movie record with an ID of 100, but before materializing it the context consults its identity map and sees that it has already created an object to represent the Movie with an ID of 100. The context simply returns a reference to the existing object instead of creating a new Movie entity, and by default it does not overwrite the values on that object, meaning the context effectively ignores any changes that have been made in the database.
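If the identity map behavior is hard to picture, a stripped-down imitation may help. The ToyContext below is purely hypothetical: its “database” is a dictionary, Find stands in for a query, and nothing here reflects the real internals of the DataContext or ObjectContext. It only reproduces the observable behavior from the test above.

```csharp
using System;
using System.Collections.Generic;

public class Movie
{
    public int ID { get; set; }
    public DateTime ReleaseDate { get; set; }
}

// Toy context: the "database" is a dictionary of ID -> release date.
public class ToyContext
{
    private readonly IDictionary<int, DateTime> _database;
    private readonly Dictionary<int, Movie> _identityMap = new Dictionary<int, Movie>();

    public ToyContext(IDictionary<int, DateTime> database)
    {
        _database = database;
    }

    public Movie Find(int id)
    {
        // The query always reads the current row from the "database" ...
        DateTime stored = _database[id];

        // ... but if this ID is already tracked, the tracked instance wins
        // and the freshly read value is thrown away.
        Movie tracked;
        if (_identityMap.TryGetValue(id, out tracked)) return tracked;

        var movie = new Movie { ID = id, ReleaseDate = stored };
        _identityMap.Add(id, movie);
        return movie;
    }
}

public static class IdentityMapExample
{
    public static int Run()
    {
        var database = new Dictionary<int, DateTime> { { 100, new DateTime(2006, 1, 1) } };
        var ctx = new ToyContext(database);

        Movie movie = ctx.Find(100);
        movie.ReleaseDate = new DateTime(2007, 1, 1);   // our in-memory change

        database[100] = new DateTime(2008, 1, 1);       // another user's saved change

        movie = ctx.Find(100);                          // same instance comes back
        return movie.ReleaseDate.Year;                  // 2007
    }
}
```

A second ToyContext created over the same dictionary would see 2008, which is the same reason a fresh context per unit of work gives you fresh data.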

Some developers will think this is a bug (you mean I can’t get fresh data?), but the context objects are designed to live for a single unit of work - a single business conversation. During this time they also isolate your entities from changes that were persisted in different business conversations. Imagine passing an ExpenseReport for 1000 paperclips through business validation logic and submitting changes – only to have the ExpenseReport contents morph into 1000 voodoo dolls immediately after validation because of some query that was executed implicitly. Practically speaking, many applications will never run into the problem of picking up unintended changes, but this isolated behavior is on by default and the consequences catch more than a few people off guard.

Why Is This Important?

Using LINQ to SQL’s DataContext or the Entity Framework’s ObjectContext for more than one business conversation is usually the wrong thing to do. These classes are designed to support a single unit of work because of the way they track changes and use an identity map. If you keep them around too long and do too much work, then the identity map will grow too big, and the data will become too stale.

For desktop applications it is common to start a unit of work when a dialog opens, and complete the unit of work when the dialog closes (context per view). On the server side it is common to start a unit of work when an HTTP request begins processing, and close the unit of work when the HTTP request ends (context per request).

Of course, not every business conversation can complete in a single request or a single dialog box. Managing “long winded” conversations with EF or L2S is tricky, unfortunately, and it’s part of the debate you’ll see in blogs these days. We’ll have to address this issue in a different post.


Comments
Matt Brooks Friday, December 12, 2008
Good post. With regard to your second query marked 'decide we want to refresh from database'. If this is _absolutely_ what you need to do then I think it would be good to see some discussion about DataContext.Refresh(RefreshMode, object). I've seen very little discussion about this method (and its overloads) and I find the documentation a little unclear.
scott Friday, December 12, 2008
@Matt - yes, I might make that a future post. Refresh and MergeOptions for a query.
Rick Strahl Friday, December 12, 2008
@Scott - Great post. UoW would be great for L2S and EF if they really followed the patterns.

But instead the DataContext at least bleeds across instances. You can't edit the same instance even if they are retrieved from separate data contexts which kind of defeats the whole purpose of UoW don't you think?
Chip H. Friday, December 12, 2008
Sounds like what you're saying is that the "I" in ACID has been extended to the business logic. So instead of your database transaction having isolation from other people's changes, your Context object is the one that protects you.

Which raises an interesting question - how does the EF deal with concurrency issues?
scott Friday, December 12, 2008
@Chip - Concurrency checks are off by default in EF, but you can add them in. Concurrency checks for entities that span multiple units of work (the disconnected scenario) are a bit trickier - you have to keep the original values around somewhere. But, yeah - it's "acidy", perhaps not as "acidy" as an rdbms, though, so I didn't use that term explicitly.

@Rick - are you thinking of the detached/disconnected scenario? If so, yeah, it sucks eggs.
by K. Scott Allen