The Power of Programming With Attributes

Nothing can compare to the Real Power of programming with attributes. Why, just one pair of square brackets and woosh – my object can be serialized to XML. Woosh – my object can persist to a database table. Woosh – there goes my object over the wire in a digitally signed SOAP payload. One day I expect to see a new item template in Visual Studio – the "Add New All Powerful Attributed Class" template: *

[Table]
[DataObject]
[DataContract]
[Serializable]
[TwoKitchenSinks]
[CLSCompliant(true)]
[DefaultProperty("Name")]
[DefaultBindingProperty("Name")]
[DebuggerStepThroughAttribute]
[GuidAttribute("F0DD2CAA-2132-11DD-AC50-FE9355D89593")]
public class Person
{
    [Column]
    [DataMember]
    [XmlAttribute]
    [Browsable(true)]
    [ReadOnly(false)]
    [Category("Advanced")]
    [Description("The person's name")]
    public string Name { get; set; }

    // TODO: YOUR INSIGNIFICANT BIZ LOGIC GOES HERE...
}

Which begs the question – could there ever be a way to separate attributes from the class definition?**

* Put down the flamethrower and step away - I'm kidding.

**This part was a serious question.

posted by scott with 13 Comments

Two LINQ to SQL Myths

LINQ to SQL requires you to start with a database schema.

Not true – you can start with code and create mappings later. In fact, you can write a plain-old CLR object like this:

class Movie
{
    public int ID { get; set; }
    public string Title { get; set; }
    public DateTime ReleaseDate { get; set; }
}

… and later either create a mapping file (full of XML like <Table> and <Column>), or decorate the class with mapping attributes (like [Table] and [Column]). You can even use the mapping to create a fresh database schema via the CreateDatabase method of the DataContext class.

LINQ to SQL requires your classes to implement INotifyPropertyChanged and use EntitySet<T> for any associated collections.

Not true, although forgoing either does come with a price. INotifyPropertyChanged allows LINQ to SQL to track changes on your objects. If you don't implement this interface, LINQ to SQL can still discover changes for update scenarios, but it will take snapshots of all objects, which isn't free. Likewise, EntitySet<T> provides deferred loading and association management for one-to-one and one-to-many relationships between entities. You can build this yourself, but with EntitySet<T> being built on top of IList<T>, you'll probably be recreating the same wheel. There is nothing about EntitySet<T> that ties the class to LINQ to SQL (other than living inside the System.Data.Linq namespace).

LINQ to SQL has limitations and it's a v1 product, but don't think of LINQ to SQL as strictly a drag and drop technology.

posted by scott with 3 Comments

Mr. President the Programmer

Daily Standup Transcription 06 May 2008 1300 Zulu
Time In 00:02:34.66

"… so, yesterday I continued the refactorafication of some classes. The job isn't easy, but I'm going to work hard and continue the collaborativity with my programming partner. Together, we will eliminate the evil of legacy code operating inside the code base.

I know it's been slow going, but we did misundestimerate the threat of static ... static … statictistical dependencies in the code.

Now, if you'll excuse me, I need to get back to work for the great customers of this company."

Time Out 00:02:54.29

posted by scott with 1 Comments

There Is Always Risk In Portability

After my last post, someone asked me if the "portable" repository pattern was really a good idea. He was referring to the fact that the LINQ queries in the MVC Storefront and Background Motion applications would sometimes execute against in-memory collections (for unit testing), while the rest of the time the queries would execute against a relational database. Isn't there a huge risk in developers not knowing if the software really works with the database?

I don't think of the repository as a "portability" layer, although since it is an abstraction layered on top of the data access code it can provide some nice indirections, like the ability to switch the persistence store. Is this risky? Sure, there is always some element of risk in portability. Just ask anyone who has written code with a portable UI toolkit, or in HTML for that matter. You don't know what is going to happen until the 1s and 0s hit the silicon.

But …

That's not the job for unit tests. Ideally, you'll have some other tests to verify what happens when the "production" code runs.

Before continuing, I must say that in the last post I neglected to tell you that the brainy Mindscape team and Andrew Peters are responsible for the Background Motion web site, and the code that powers the site. Make sure to visit the site and marvel at the beauty of New Zealand, then drop into the Mindscape blogs. Everyone - let's hear it for New Zealand!

What Is This Risk You Speak Of?

You can write a LINQ query that works fine against in-memory collections, but that can fail spectacularly when you swap in a remote LINQ provider. Here is an obvious example:

var result =
    from a in dc.Addresses
    where !String.IsNullOrEmpty(a.PostalCode)
    select a;

This query is happy to execute using LINQ to Objects, but it fails with an exception if LINQ to SQL is sitting behind the sequence (NotSupportedException: Method 'Boolean IsNullOrEmpty(System.String)' has no supported translation to SQL).
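One way to sidestep this particular exception is to spell the check out with plain comparisons. The sketch below runs against an in-memory list; the Address class is a hypothetical stand-in for the mapped entity, and the expectation that a remote provider can translate the null and empty-string comparisons is an assumption you'd want an integration test to confirm.

```csharp
using System.Linq;

// Hypothetical stand-in for the mapped Address entity.
public class Address
{
    public string PostalCode { get; set; }
    public string State { get; set; }
}

public static class AddressQueries
{
    // Same intent as !String.IsNullOrEmpty(a.PostalCode), but written with
    // comparison operators instead of a static method call.
    public static IQueryable<Address> WithPostalCode(IQueryable<Address> addresses)
    {
        return from a in addresses
               where a.PostalCode != null && a.PostalCode != ""
               select a;
    }
}
```

Because the method takes and returns IQueryable<Address>, the same code path can be exercised with a list wrapped in AsQueryable() in a unit test and with a real table in an integration test.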

Those types of problems are easy to spot in automated integration testing because exceptions are relatively easy to track down. The real risk is in the queries that don't flame out in spectacular fashion, but execute successfully with slight variations. Here is one example:

var distinctPostalCodes =
       (
        from a in addresses
        orderby a.State ascending
        select new { a.PostalCode, a.State }
       ).Distinct();

This query wants to get a distinct list of zip codes and states for all our customers, and order the list by state. It works perfectly with LINQ to Objects, and executes successfully in LINQ to SQL. Just one tiny problem you might observe in the generated SQL:

SELECT DISTINCT [t0].[PostalCode], [t0].[State] FROM [dbo].[Address] AS [t0]

Notice the distinct (pardon the pun) lack of an ORDER BY clause. If the upper layers were expecting the results sorted by State then we have problems.

It turns out that LINQ to SQL throws out an inner OrderBy operator when the Distinct operator comes into play. This could be for several reasons, but the most likely reason is DISTINCT and ORDER BY have an uneasy relationship in ANSI SQL (it's not just MS SQL). You can read more about this on Jeff Smith's blog: SELECT DISTINCT and ORDER BY, and there is another good explanation here: Some Common Mis-conceptions about DISTINCT.

One also has to wonder if Distinct might reorder the results in its quest to remove duplicates - it's not explicitly documented that it doesn't. In this case, it's better to forgo the query comprehension syntax and make the pipeline of operators more explicit:

var distinctPostalCodes =
      addresses.Select(a => new { a.PostalCode, a.State })
               .Distinct()
               .OrderBy(a => a.State);

This forces LINQ to SQL to generate a safe query with the expected results.
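A test can pin the ordering down so a provider swap doesn't silently change it. Here is a minimal sketch that runs against in-memory data; the Address class, the PostalCodeQueries helper, and the "State PostalCode" string projection are all hypothetical names chosen for illustration, and the extra ThenBy is only there to make the result fully deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the mapped Address entity.
public class Address
{
    public string PostalCode { get; set; }
    public string State { get; set; }
}

public static class PostalCodeQueries
{
    // Distinct comes first and OrderBy comes last, so the sort is the final
    // word on ordering no matter how Distinct treats its input.
    public static List<string> DistinctByState(IQueryable<Address> addresses)
    {
        return addresses.Select(a => new { a.PostalCode, a.State })
                        .Distinct()
                        .OrderBy(a => a.State)
                        .ThenBy(a => a.PostalCode)
                        .Select(a => a.State + " " + a.PostalCode)
                        .ToList();
    }

    // A check an integration test could run against whichever provider
    // sits behind the sequence.
    public static bool IsSorted(IList<string> values)
    {
        for (int i = 1; i < values.Count; i++)
        {
            if (String.CompareOrdinal(values[i - 1], values[i]) > 0)
            {
                return false;
            }
        }
        return true;
    }
}
```

Running the same IsSorted check in the unit tests and the integration tests is one cheap way to notice when the two providers disagree.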

Is there risk? Sure – and it's not just in LINQ to SQL. Any multi-target technology runs the same risk. You just need an awareness and safety net (in the form of tests) to mitigate the risk.

posted by scott with 1 Comments

Contrasting Two MVC / LINQ to SQL Applications for the Web

There are two applications on CodePlex that are interesting to compare and contrast: the MVC Storefront and Background Motion.

MVC Storefront

MVC Storefront is Rob Conery's work. You can watch Rob lift up the grass skirt as he builds this application in a series of webcasts (currently up to Part 8). Rob is using the ASP.NET MVC framework and LINQ to SQL. The Storefront is a work in progress and Rob is actively soliciting feedback on what you want to see.

At its lowest level, the Storefront uses a repository type pattern.

public interface ICatalogRepository {
    IQueryable<Category> GetCategories();
    IQueryable<Product> GetProducts();
    IQueryable<ProductReview> GetReviews();
    IQueryable<ProductImage> GetProductImages();
}

The repository interface is implemented by a TestCatalogRepository (for testing), and a SqlCatalogRepository (when the application needs to get real work done). Rob uses the repositories to map the LINQ to SQL generated classes into his own model, like the following code that maps a ProductImage (the LINQ generated class with a [Table] attribute) into a ProductImage (the Storefront domain class).

public IQueryable<ProductImage> GetProductImages() {

    return from i in ReadOnlyContext.ProductImages
           select new ProductImage
           {
               ID = i.ProductImageID,
               ProductID = i.ProductID,
               ThumbnailPhoto = i.ThumbUrl,
               FullSizePhoto = i.FullImageUrl
           };
}

Notice the repository also allows IQueryable to "float up", which defers the query execution. The repositories are consumed by a service layer that the application uses to pull data. Here is an excerpt of the CatalogService.

public Category GetCategory(int id) {

    Category result = _repository.GetCategories()
        .WithCategoryID(id)
        .SingleOrDefault();

    return result;
}
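The excerpt doesn't show WithCategoryID, but a filter like it can be written as an extension method over IQueryable<Category>. The following is a guess at the shape, not Rob's actual code; the Category class with its ID and Name properties is a hypothetical stand-in.

```csharp
using System.Linq;

// Hypothetical stand-in for the Storefront's Category domain class.
public class Category
{
    public int ID { get; set; }
    public string Name { get; set; }
}

public static class CategoryFilters
{
    // Adds a Where clause to the deferred query; nothing executes until the
    // caller enumerates the result or calls an operator like SingleOrDefault.
    public static IQueryable<Category> WithCategoryID(
        this IQueryable<Category> query, int id)
    {
        return query.Where(c => c.ID == id);
    }
}
```

Since the filter only composes onto the IQueryable, it works the same way whether the repository hands back an in-memory list or a LINQ to SQL query.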

Controllers in the web application then consume the CatalogService.

public ActionResult Show(string productcode) {

    CatalogData data = new CatalogData();
    CatalogService svc = new CatalogService(_repository);

    data.Categories = svc.GetCategories();
    data.Product = svc.GetProduct(productcode);

    return RenderView("Show",data);
}

Another interesting abstraction in Rob's project is LazyList<T> - an implementation of IList<T> that wraps an IQueryable<T> to provide lazy loading of a collection. LINQ to SQL provides this behavior with EntitySet<T>, but because Rob is isolating his upper layers from LINQ to SQL, he needs a different strategy. I'm not a fan of the GetCategories method in CatalogService – that looks like a join that the repository should put together for the service, and the service layer itself doesn't appear to add a tremendous amount of value, but overall the code is easy to follow and tests are provided. Keep it up, Rob!
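To make the lazy loading idea concrete, here is a minimal sketch of a list that wraps an IQueryable<T> and runs the query on first access. It is an illustration of the technique rather than the Storefront's actual LazyList<T>; the IsLoaded property is my own addition for observing when the query runs.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class LazyList<T> : IList<T>
{
    private readonly IQueryable<T> _query;
    private List<T> _inner;

    public LazyList(IQueryable<T> query)
    {
        _query = query;
    }

    // Runs the wrapped query on first access, then reuses the materialized list.
    private List<T> Inner
    {
        get { return _inner ?? (_inner = _query.ToList()); }
    }

    // Hypothetical helper: true once the query has been executed.
    public bool IsLoaded
    {
        get { return _inner != null; }
    }

    public T this[int index]
    {
        get { return Inner[index]; }
        set { Inner[index] = value; }
    }

    public int Count { get { return Inner.Count; } }
    public bool IsReadOnly { get { return false; } }
    public void Add(T item) { Inner.Add(item); }
    public void Clear() { Inner.Clear(); }
    public bool Contains(T item) { return Inner.Contains(item); }
    public void CopyTo(T[] array, int arrayIndex) { Inner.CopyTo(array, arrayIndex); }
    public int IndexOf(T item) { return Inner.IndexOf(item); }
    public void Insert(int index, T item) { Inner.Insert(index, item); }
    public bool Remove(T item) { return Inner.Remove(item); }
    public void RemoveAt(int index) { Inner.RemoveAt(index); }
    public IEnumerator<T> GetEnumerator() { return Inner.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

Every member funnels through the Inner property, so the first touch of any kind (Count, the indexer, a foreach) triggers exactly one query execution.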

Background Motion

The Background Motion (BM) project carries significantly more architectural weight. Not saying this is better or worse, but you know any project using the Web Client Software Factory is not going to be short on abstractions and indirections.

Unlike the Storefront app, the BM app uses a model that is decorated with LINQ to SQL attributes like [Table] and [Column]. BM has a more traditional repository pattern and leverages both generics and expression trees to give the repository more functionality and flexibility.

public interface IRepository<T> where T : IIdentifiable
{
  int Count();
  int Count(Expression<Func<T, bool>> expression);
  void Add(T entity);
  void Remove(T entity);
  void Save(T entity);
  T FindOne(int id);
  T FindOne(Expression<Func<T, bool>> expression);
  bool TryFindOne(Expression<Func<T, bool>> expression, out T entity);
  IList<T> FindAll();
  IList<T> FindAll(Expression<Func<T, bool>> expression);
  IQueryable<T> Find();
  IQueryable<T> Find(int id);
  IQueryable<T> Find(Expression<Func<T, bool>> expression);
}

Notice the BM repositories will "float up" a deferred query into higher layers by returning an IQueryable, and allow higher layers to "push down" a specification in the form of an expression tree. Combining this technique with generics means you get a single repository implementation for all entities and minimal code. Here is the DLinqRepository implementation of IRepository<T>'s Find method.

public override IQueryable<T> Find(Expression<Func<T, bool>> expression)
{
  return DataContext.GetTable<T>().Where(expression);
}

Where FindOne can be used like so:

Member member = Repository<Member>.FindOne(m => m.HashCookie == cookieValue);
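For unit tests, the same "float up, push down" shape can be backed by a plain list. This is a small sketch of the idea, not the stub that ships with Background Motion; the Id property on IIdentifiable and the shape of Member are assumptions, and only three of the interface's methods are shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Assumed shape; the real IIdentifiable isn't shown in the post.
public interface IIdentifiable
{
    int Id { get; }
}

public class InMemoryRepository<T> where T : IIdentifiable
{
    private readonly List<T> _items = new List<T>();

    public void Add(T entity)
    {
        _items.Add(entity);
    }

    // "Float up": the caller receives a deferred query it can keep composing.
    public IQueryable<T> Find(Expression<Func<T, bool>> expression)
    {
        return _items.AsQueryable().Where(expression);
    }

    // "Push down": the caller's specification arrives as an expression tree.
    public T FindOne(Expression<Func<T, bool>> expression)
    {
        return Find(expression).SingleOrDefault();
    }

    public int Count(Expression<Func<T, bool>> expression)
    {
        return Find(expression).Count();
    }
}

// Hypothetical entity for the example.
public class Member : IIdentifiable
{
    public int Id { get; set; }
    public string HashCookie { get; set; }
}
```

The calling code passes the same lambdas it would hand to the LINQ to SQL implementation, which is what makes the swap possible (and also what creates the portability risk discussed in the post above).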

BM combines the repositories with a unit of work pattern and consumes both directly in the website controllers.

public IList<Contribution> GetMostRecentByContentType(int contentTypeId, int skip)
{
  using (UnitOfWork.Begin())
  {
    return ModeratedContributions
      .Where(c => c.ContentTypeId == contentTypeId)
      .OrderByDescending(c => c.AddedOn)
      .Skip(skip)
      .Take(Constants.PageSize).ToList();
  }
}

The Background Motion project provides stubbed implementations of all the required repositories and an in-memory unit of work class for unit testing, although the test names leave something to be desired. One of the interesting classes in the BM project is LinqContainsPredicateBuilder – a class whose Build method takes a collection of objects and a target property name. The Build method returns an expression tree that checks to see if the target property equals any of the values in the collection (think of the IN clause in SQL).
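The idea behind such a builder can be sketched with the System.Linq.Expressions API. This is my own illustration of the technique, not the BM source: the ContainsPredicateSketch name, the generic signature, and the Contribution stand-in are all assumptions, and a real provider may prefer a tree without the leading constant.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class ContainsPredicateSketch
{
    // Builds item => false || item.Prop == v1 || item.Prop == v2 ...
    public static Expression<Func<T, bool>> Build<T, TValue>(
        IEnumerable<TValue> values, string propertyName)
    {
        ParameterExpression item = Expression.Parameter(typeof(T), "item");
        MemberExpression property = Expression.Property(item, propertyName);

        // Start from "false" so an empty collection matches nothing.
        Expression body = Expression.Constant(false);
        foreach (TValue value in values)
        {
            Expression equals = Expression.Equal(
                property, Expression.Constant(value, typeof(TValue)));
            body = Expression.OrElse(body, equals);
        }

        return Expression.Lambda<Func<T, bool>>(body, item);
    }
}

// Hypothetical stand-in for the BM entity.
public class Contribution
{
    public int ContentTypeId { get; set; }
}
```

The resulting expression can be handed to Where on any IQueryable<T>, for example `contributions.Where(ContainsPredicateSketch.Build<Contribution, int>(ids, "ContentTypeId"))`. The property type has to match TValue for Expression.Equal to accept the comparison.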

If you want to see Background Motion in action, check out backgroundmotion.com!

posted by scott with 5 Comments

The XML Namespace Tax

While XML literal features in Visual Basic get all the love, the new XElement API for the CLR makes working with XML in C# a bit more fun, too. It's a prime cut of functional programming spiced with syntactic sugar.

One example is how the API works with XML namespaces. When namespaces are present, they demand attention in almost every XML operation you can perform. It's like a tax you need to pay that doesn't pay back any benefits. An old poll on xml-dev once asked people to list their "favorite five bad problems" with XML, to which Peter Hunsberger replied:

  1. Namespaces
  2. Namespaces
  3. Namespaces
  4. Namespaces
  5. Namespaces

And in a different message, Joe English hit the nail on the head:

I'd rather treat element type names and attribute names as simple, atomic strings. This is possible with a sane API, but most XML APIs aren't sane.

The API we had pre .NET 3.5 was a fill-out-this-form-in-triplicate-and-wait-quietly-in-line bureaucracy living inside System.Xml. The new API tries to be a bit saner:

XNamespace xmlns = "http://schemas.foo.com/widgets";

XDocument doc = new XDocument(
    new XElement(xmlns + "Widgets",
        new XElement(xmlns + "Widget", new XAttribute("ID", 1)),
        new XElement(xmlns + "Widget", new XAttribute("ID", 2))));

Which produces:

<Widgets xmlns="http://schemas.foo.com/widgets">
  <Widget ID="1" />
  <Widget ID="2" />
</Widgets>

It's the little things you don't notice at first that make the API easier. For instance, XNamespace has an implicit conversion from string, and overloads the + operator to combine itself with a string to form a full XName (which also has an implicit conversion from string).
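The tax still comes due when you query, just at a lower rate. Here is a short sketch (the WidgetXml class and its method names are mine) showing that element lookups need the namespace-qualified name, while the unqualified ID attribute does not.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class WidgetXml
{
    // The implicit string conversion lets a namespace be declared like a string.
    public static readonly XNamespace Ns = "http://schemas.foo.com/widgets";

    public static XDocument Build()
    {
        return new XDocument(
            new XElement(Ns + "Widgets",
                new XElement(Ns + "Widget", new XAttribute("ID", 1)),
                new XElement(Ns + "Widget", new XAttribute("ID", 2))));
    }

    // Ns + "Widget" produces the qualified XName the elements were created with.
    public static int[] WidgetIds(XDocument doc)
    {
        return doc.Root.Elements(Ns + "Widget")
                  .Select(w => (int)w.Attribute("ID"))
                  .ToArray();
    }
}
```

Asking for `doc.Root.Elements("Widget")` without the namespace compiles fine and quietly returns nothing, which is the kind of mistake the + operator makes easy to avoid.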

Someone spent some time designing this API for users instead of for a standards body, and it's much appreciated.

posted by scott with 7 Comments

Mocks - It's A Question Of When

Ross Neilson reminded me about a question I left hanging - "when should I use a mock object framework?"

If you have to ask "when", the answer is probably "not now". I feel that mock object frameworks are something you have to evolve into.

First, we can talk about mocks in general. Some people have a misconception that mock objects are only useful if you need to simulate interaction with a resource that is difficult to use in unit tests - like an object that communicates with an SMTP server. This isn't true. Colin Mackay has an article on mocks that outlines other common scenarios:

  • The real object has nondeterministic behavior
  • The real object is difficult to setup
  • The real object has behavior that is hard to trigger
  • The real object is slow
  • The real object is a user interface
  • The real object uses a call back
  • The real object does not yet exist

If we were to step back and generalize this list, we'd say test doubles are useful when you want to isolate code under test. Isolation is good. Let's say we are writing tests for a business component. We wouldn't want the tests to fail when someone checked in bad code for an auditing service the business component uses. We only want the test to fail when something is wrong with the business component itself. Providing a mock auditing service allows us to isolate the business component and control any stimuli the component may pick up from its auditing service. When you start feeling the pain of writing numerous test doubles by hand, you'll know you need a mock object framework.
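Before reaching for a framework, it helps to see what a hand-written double looks like. In this sketch the OrderProcessor component and the FakeAuditService are hypothetical names for illustration; IAuditService with a WriteMessage method is assumed to be the auditing abstraction the business component depends on.

```csharp
using System.Collections.Generic;

public interface IAuditService
{
    void WriteMessage(string message);
}

// Hand-written test double: records messages instead of calling a real
// auditing service, so a broken audit implementation can't fail this test.
public class FakeAuditService : IAuditService
{
    public readonly List<string> Messages = new List<string>();

    public void WriteMessage(string message)
    {
        Messages.Add(message);
    }
}

// Hypothetical business component under test.
public class OrderProcessor
{
    private readonly IAuditService _audit;

    public OrderProcessor(IAuditService audit)
    {
        _audit = audit;
    }

    public decimal Total { get; private set; }

    public void AddItem(decimal price)
    {
        Total += price;
        _audit.WriteMessage("Added item priced " + price);
    }
}
```

One fake like this is painless. Writing and maintaining a dozen of them, each with slightly different canned behavior, is the point where a framework starts to earn its keep.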

Mocks aren't just about isolation, however. Mocks also play a role in test driven development, which is what Colin's last bullet point alludes to. The authors of "Mock Roles, Not Objects" say that mocks are:

"… a technique for identifying types in a system based on the roles that objects play … In particular, we now understand that the most important benefit of Mock Objects is what we originally called interface discovery".

Using a mock object framework alongside TDD allows a continuous, top-down design of software. Many TDD fans know they need a mock object framework from day one.

But Aren't Mock Object Frameworks Complex?

This is another question I've been asked recently. Mock object frameworks are actually rather simple and expose a small API. There is complexity, though, just not in the framework itself. As I said earlier, I think there is a path you can follow where you evolve into using mock object frameworks. The typical project team using a mock object framework is experienced with TDD and inversion of control containers. Trying to get up to speed on all these topics at once can be overwhelming.

There is also some complexity in using mocks effectively. In Mocks and the Dangers of Overspecified Software, Ian Cooper says:

"When you change the implementation of a method under test, mocks can break because you now make additional or different calls to the dependent component that is being mocked. … The mocks began to make our software more resistant to change, more sluggish, and this increased the cost to refactoring. As change becomes more expensive, we risked becoming resistant to making it, and we risk starting to build technical debt. A couple of times the tests broke, as developers changed the domain, or changed how we were doing persistence, without changing the test first, because they were frustrated at how it slowed their development. The mocks became an impedance to progress."

Mock object frameworks make interaction based testing easy, but can also lead to the problems Ian outlines. Here are a couple more reads on this topic:

In summary – mock object frameworks aren't for everyone. You'll know when you need one!

posted by scott with 2 Comments

Microsoft versus Open Source Software - ALT.NET notes

At the Seattle alt.net conference, I co-sponsored a session with Justin Angel. The topic was "Choosing Microsoft versus Mature Open Source Alternatives". We wanted to hear the rationale people were using when making choices, like:

LINQ to SQL or Castle Active Record

Entity Framework or NHibernate

Subversion and assorted tools or Team Foundation Server

Not once do I remember price being a factor. Most of the fishbowl conversation revolved around risk. There are risks that technical people don't like, and risks that business people don't like. I tried to take all the major topics mentioned and fit them into the following table.

                   Choose Microsoft               Choose OSS

Business Risks                                    License issues
                                                  Lack of formal support
                                                  Hard to hire experts

Technical Risks    V1 and V2 won't always work    Small communities
                   Waiting on bug fixes           Lack of training material
                   Friction

Quick summary - Microsoft is a safe choice from the business perspective, but MSFT products can create an uphill struggle for developers. Brad Abrams and ScottGu both popped into the fishbowl to talk about Microsoft's change of direction in building closed source frameworks with "big bang" releases. ScottGu also reminded us that patent trolls create problems for everyone in the ecosystem.

ALT.NET Trivia

How much ALT.NET can you fit in a Hyundai?

According to Hertz, the Hyundai Elantra will accommodate 5 people and 3 pieces of luggage.

The Elantra I drove into Redmond accommodated 5 people (me, Jeremy Miller, Udi Dahan, Steven "I Love The Back Middle Seat" Harman, and Ayende), 6 pieces of luggage, and 3, maybe 4 laptop bags. It was tight. 

Who said developers can't optimize for space anymore?

posted by scott with 7 Comments

A Gentle Introduction to Mocking

At the last CMAP Code Camp I did a "code-only" presentation entitled "A Gentle Introduction to Mocking". We wrote down some requirements, opened Visual Studio, and started writing unit tests. Matt Podwysocki provided color commentary. Code download is here.

I started "accepting" mock objects as one tool in my unit testing toolbox about three years ago (see "The 5 Stages Of Mocking"). Times have changed quite a bit since then, and the tools have improved dramatically. During the presentation we used the following:

Rhino Mocks – the first mocking framework used in the presentation. Years ago, Oren and Rhino Mocks saved us from "string based" mock objects. Rhino Mocks can easily conjure up a strongly typed mock object. The strong typing results in fewer errors, and greatly enhances the refactoring experience.

moq – is the latest mocking framework in the .NET space and is authored by kzu and friends. moq uses lambda expressions and expression trees to define mock object behavior, and also provides strongly typed mocks. The recent addition of factories and mock verification means you can do traditional interaction style testing with moq, if that is the path you choose. The primary differentiator between the two frameworks is that moq does not use a record / playback paradigm.

Here is a test we wrote with Rhino Mocks:

[Fact]
public void Does_Not_Make_Deposit_When_Verification_Fails()
{
    MockRepository mocks = new MockRepository();
    IAuditService auditService = mocks.DynamicMock<IAuditService>();
    IVerificationService verificationService = mocks.CreateMock<IVerificationService>();

    decimal amount = 1000;
    BankAccount account = new BankAccount(auditService,
                                          verificationService);

    using (mocks.Record())
    {
        Expect.Call(verificationService.VerifyDeposit(account, amount))
              .Return(false);
    }

    using (mocks.Playback())
    {
        account.Deposit(amount);
    }

    account.Balance.ShouldEqual(0);
}

The same test using moq:

[Fact]
public void Does_Not_Make_Deposit_When_Verification_Fails()
{
    Mock<IAuditService> _auditMock = new Mock<IAuditService>();
    Mock<IVerificationService> _verificationMock = new Mock<IVerificationService>();

    decimal amount = 1000;
    BankAccount account = new BankAccount(_auditMock.Object, _verificationMock.Object);

    _auditMock.Expect(a => a.WriteMessage(It.IsAny<string>()));
    _verificationMock.Expect(v => v.VerifyDeposit(account, amount))
                     .Returns(false);

    account.Deposit(amount);
    account.Balance.ShouldEqual(0);
}

xUnit.net – although not featured in the presentation, xUnit.net drove all the unit tests. xUnit is a new framework authored by Jim Newkirk and Brad Wilson. The framework codifies some unit testing best practices and takes advantage of new features in the C# language and .NET framework. I like it.

One question that came up a few times was "when should I use a mock object framework"? Turns out I've been asked a lot of questions starting with when lately, so I'll answer that question in the next post.

posted by scott with 4 Comments

Following Principles

A dictionary definition of principle often uses the word "law", but principles in software development still require judgment. Sometimes the judgment requires some technical knowledge, like knowing the strengths and weaknesses of a particular technology. Other times the judgment requires some business knowledge, like the ability to anticipate where change is likely to occur.

Asking someone to make a sensible judgment about a principle is difficult when all you see is a snippet of code in a blog. The code is outside of its context. Take Leroy's BankAccount class. We don't really know what sort of business Leroy works for, or even what type of software Leroy is building. Nevertheless, let's apply a few principles to see what's bothering Leroy.

Remaining Single

Does Leroy's original BankAccount class violate the Single Responsibility Principle? I think so. The class is opening text files for logging, calculating interest, and oh, by the way, it needs to provide all the state and behavior for a financial account, too. Even without knowing the context, it seems reasonable to remove the auditing cruft into a separate class. After writing some tests, and implementing a concrete auditing class, Leroy's BankAccount might look like the following.

public class BankAccount
{
    public void Deposit(decimal amount)
    {
        Balance += amount;
        _log.Write("Deposited {0} on {1}", amount, DateTime.Now);
    }

    // ...

    AuditLog _log = new AuditLog();
}

Leroy has an almost infinite number of choices to make before coming up with the above implementation, though. Leroy could have derived BankAccount from an Auditable base class, or forced BankAccount to implement an IAuditable interface. But what guides Leroy to this particular solution in the universe of a million possibilities are other principles - like the Interface Segregation Principle, and Composition Over Inheritance.

An Addiction to Auditing

Leroy might still frown at his class, feeling he has violated the Dependency Inversion Principle. Without any additional information, we have to trust Leroy's judgment when he decides to make some additional changes.

public class BankAccount
{
    public BankAccount (IAuditLog auditLog)
    {
        _log = auditLog;
    }

    public void Deposit(decimal amount)
    {
        Balance += amount;
        _log.Write("Deposited {0} on {1}", amount, DateTime.Now);
    }

    // ...

    IAuditLog _log;
}

Perhaps Leroy already knew about some future changes in his auditing implementation, or perhaps Leroy just wanted to make his class more testable. Some of us view software as a massive heap of dependencies, and we fight to reduce the brittleness created by dependencies using inversion of control and dependency injection techniques. In some environments, this isn't needed. The principles to apply depend on the language you use, the tools you use, and ultimately depend on the problem the software is trying to solve.

Is There Still Something Wrong With This Code?

WWWTC has had a run of 19 episodes. I have some material for at least another 20. Problem is, most of the material deals with API trivia and edge cases you might never see. Interesting? To me, at least, but I'm thinking of introducing more squishy design type entries. I know a lot of people struggle to apply the latest frameworks and libraries, but design questions are always enlightening and produce the most spirited debate, giving us all something we can learn from.

posted by scott with 5 Comments

Testing Old Code Is Hard

WWWTC #19 presented a BankAccount class from a developer named Leroy and garnered some great feedback. A couple people spotted an actual bug in the interest calculation, which was unintentional. If only Leroy had written some tests for the code…

"Gee, if only I'd written some tests for this code", thought Leroy. Back when Leroy first wrote the code, he considered testing as a job for those irritating people on the other side of the office building. Now, Leroy was looking at changing the BankAccount class to add new features. He was wishing he'd discovered the joys of unit tests earlier than he did. He'd be able to review the existing tests and understand the behavior of the class in more detail, plus, he'd be able to make changes to the class and know immediately if he was breaking any functionality.

"Better late than never", Leroy thought. Writing tests at this point would give him a better understanding of the class and offer the safety net he needed for the upcoming changes. Leroy created a new class library with references to some xUnit assemblies and started in. After a bit of test running, he reached this point:

public class BankAccountTests
{
    [Fact]
    public void BalanceShouldEqualFirstDeposit()
    {
        decimal amount = 200.00m;
        BankAccount account = new BankAccount();
        account.Deposit(amount);

        account.Balance.ShouldEqual(amount);
    }

    // ...

    [Fact]
    public void DepositShouldCreateLogEntry()
    {
        // ...
    }
}

"Hmm – verifying the log entry is tricky", Leroy thought to himself. "It's too bad this BankAccount class is responsible for formatting and writing the log entry and all that banking logic. Maybe I should do something about that…"

To be continued…

posted by scott with 2 Comments

What's Wrong With This Code (#19)

Leroy was shocked when the source code appeared. It was familiar yet strange, like an old lover's kiss. The code was five years old – an artifact of Leroy's first project. Leroy slowly scrolled through the code and pondered his next move. It wasn't a bug that was bothering Leroy – there were no race conditions or tricky numerical conversions. No performance problems or uncaught error conditions. It was all about design …

public class BankAccount
{
    public void Deposit(decimal amount)
    {
        _balance += amount;
        LogTransaction("Deposited {0} on {1}", amount, DateTime.Now);
    }

    public void Withdraw(decimal amount)
    {
        _balance -= amount;
        LogTransaction("Withdrew {0} on {1}", amount, DateTime.Now);
    }

    public void AccumulateInterest(decimal baseRate)
    {
        decimal interest;

        if (_balance < 10000)
        {
            interest = _balance * baseRate;
        }
        else
        {
            interest = _balance * (baseRate + 0.01);
        }
        LogTransaction("Accumulated {0} interest on {1}", interest, DateTime.Now);
    }

    void LogTransaction(string message, params object[] parameters)
    {
        using(FileStream fs = File.Open("auditlog.txt", FileMode.OpenOrCreate))
        using(StreamWriter writer = new StreamWriter(fs))
        {
            writer.WriteLine(message, parameters);
        }
    }

    public decimal Balance
    {
        get { return _balance; }
        set { _balance = value; }
    }

    decimal _balance;
}

"Times have changed, and so I have, fortunately", Leroy thought to himself. "And so will this code…"

To be continued…

posted by scott with 32 Comments

Custom Aggregations In LINQ

Aggregate is a standard LINQ operator for in-memory collections that allows us to build a custom aggregation. Although LINQ provides a few standard aggregation operators, like Count, Min, Max, and Average, if you want an inline implementation of, say, a standard deviation calculation, then the Aggregate extension method is one approach you can use (the other approach being that you could write your own operator).

Let's say we wanted to see the total number of threads running on a machine. We could get that number lambda style, or with a query comprehension, or with a custom aggregate.

var processes = Process.GetProcesses();

int totalThreads = 0;

totalThreads = processes.Sum(p => p.Threads.Count);

totalThreads = (from process in processes
                select process.Threads.Count).Sum();

totalThreads =
     processes.Aggregate(
            0,                                  // initialize
            (acc, p) => acc += p.Threads.Count, // accumulate
            acc => acc                          // terminate
      );

This particular overloaded version of Aggregate follows a common pattern of "Initialize – Accumulate – Terminate". You can see this pattern in extensible aggregation strategies from Oracle to SQLCLR. The first parameter represents an initialization expression. We need to provide an initialized accumulator – in this case just an integer value of 0.

The second parameter is a Func<int, Process, int> expression that the aggregate method will invoke as it iterates across the sequence of inputs. For each process we get our accumulator value (an int), and a reference to the current process in the iteration stage (a Process), and we return a new accumulator value (an int).

The last parameter is the terminate expression. This is an opportunity to provide any final calculations. For our summation, we just need to return the value in the accumulator.
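The thread count example doesn't give the terminate step much to do, so here is a small sketch where it earns its keep. The AverageLength helper and the word list are made up for illustration; the only real API in play is the three-parameter Aggregate overload. The accumulator is an anonymous type carrying a running sum and count, and the terminate expression turns the pair into an average.

```csharp
using System;
using System.Linq;

public static class AggregateSketch
{
    // Hypothetical helper: average string length computed with
    // the Initialize - Accumulate - Terminate pattern.
    public static double AverageLength(string[] words)
    {
        return words.Aggregate(
            new { Sum = 0, Count = 0 },                 // initialize
            (acc, w) => new { Sum = acc.Sum + w.Length,
                              Count = acc.Count + 1 },  // accumulate
            acc => acc.Count == 0
                       ? 0.0
                       : (double)acc.Sum / acc.Count);  // terminate
    }

    public static void Main()
    {
        var words = new[] { "inner", "outer", "join" };
        Console.WriteLine(AverageLength(words));
    }
}
```

Each accumulate step returns a fresh anonymous object instead of mutating the old one, which keeps the lambda free of side effects at the cost of a few allocations.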

StdDev

Now, let's compute a more thorough summary of running threads, including a standard deviation. Although we could get away with a simple double accumulator for stddev, we can also use a more sophisticated accumulator to encapsulate some calculations, facilitate unit tests, and make the syntax easier on the eye.

class StdDevAccumulator<TSource>
{
    public StdDevAccumulator(IEnumerable<TSource> source,
                             Func<TSource, double> avgSelector)
    {
        SampleAvg = source.Average(avgSelector);
        SampleCount = source.Count();
    }

    public StdDevAccumulator<TSource> Accumulate(double value)
    {
        TotalDeviation += Math.Pow(value - SampleAvg, 2.0);
        return this;
    }

    public double ComputeResult()
    {
        if (SampleCount < 2)
        {
            return 0.0;
        }
        return Math.Sqrt(TotalDeviation / (SampleCount - 1));
    }

    public double SampleAvg { get; set; }
    public int    SampleCount { get; set; }
    public double TotalDeviation { get; set; }
}

Put the accumulator to use like so:

var processes = Process.GetProcesses();

var summary = new
{
    TotalProcesses = processes.Count(),
    TotalThreads = processes.Sum(p => p.Threads.Count),
    MinThreads = processes.Min(p => p.Threads.Count),
    MaxThreads = processes.Max(p => p.Threads.Count),
    StdDevThreads = processes.Aggregate(
        new StdDevAccumulator<Process>(processes, p => p.Threads.Count),
        (acc, p) => acc.Accumulate(p.Threads.Count),
        acc => acc.ComputeResult()
    )
};
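For comparison, the other approach mentioned at the top of the post (writing your own operator) might look something like the sketch below. The StdDev name and its signature are my own invention, not anything that ships with LINQ. It computes the same sample standard deviation as StdDevAccumulator, with n - 1 in the denominator and 0.0 for fewer than two values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StdDevExtensions
{
    // Hypothetical custom operator: sample standard deviation of the selected values.
    public static double StdDev<TSource>(this IEnumerable<TSource> source,
                                         Func<TSource, double> selector)
    {
        // Materialize once so the source is only enumerated a single time.
        List<double> values = source.Select(selector).ToList();
        if (values.Count < 2)
        {
            return 0.0;
        }

        double avg = values.Average();
        double totalDeviation = values.Sum(v => Math.Pow(v - avg, 2.0));
        return Math.Sqrt(totalDeviation / (values.Count - 1));
    }
}

public static class StdDevDemo
{
    public static void Main()
    {
        var samples = new[] { 1.0, 2.0, 3.0 };
        Console.WriteLine(samples.StdDev(x => x));
    }
}
```

With an operator like this in scope, the summary projection could read processes.StdDev(p => p.Threads.Count), much like the built-in Min and Max calls.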

posted by scott with 6 Comments

And Equality for All ... Anonymous Types

Given this simple Employee class:

public class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
}

How many employees do you expect to see from the following query with a Distinct operator?

var employees = new List<Employee>
{
    new Employee { ID=1, Name="Barack" },
    new Employee { ID=2, Name="Hillary" },
    new Employee { ID=2, Name="Hillary" },
    new Employee { ID=3, Name="Mac" }
};

var query =
    (from employee in employees
     select employee).Distinct();

foreach (var employee in query)
{
    Console.WriteLine(employee.Name);
}

The answer is 4 – we'll see both Hillary objects. The docs for Distinct are clear – the method uses the default equality comparer to test for equality, and the default comparer sees 4 distinct object references. One way to get around this would be to use the overloaded version of Distinct that accepts a custom IEqualityComparer.
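A rough sketch of the comparer route, assuming two employees are "the same" when both ID and Name match. EmployeeComparer is a hypothetical class name; the Distinct overload that takes an IEqualityComparer<T> is the real thing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
}

// Hypothetical comparer: treats employees with the same ID and Name as equal.
public class EmployeeComparer : IEqualityComparer<Employee>
{
    public bool Equals(Employee x, Employee y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.ID == y.ID && x.Name == y.Name;
    }

    public int GetHashCode(Employee employee)
    {
        if (employee == null) return 0;
        int nameHash = employee.Name == null ? 0 : employee.Name.GetHashCode();
        return employee.ID ^ nameHash;
    }
}

public static class DistinctDemo
{
    public static int CountDistinct(IEnumerable<Employee> employees)
    {
        return employees.Distinct(new EmployeeComparer()).Count();
    }

    public static void Main()
    {
        var employees = new List<Employee>
        {
            new Employee { ID = 1, Name = "Barack" },
            new Employee { ID = 2, Name = "Hillary" },
            new Employee { ID = 2, Name = "Hillary" },
            new Employee { ID = 3, Name = "Mac" }
        };
        Console.WriteLine(CountDistinct(employees));
    }
}
```

Equals and GetHashCode have to agree with each other: any two objects the comparer calls equal must also produce the same hash code, or Distinct will miss the duplicate.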

Let's try the query again and project a new, anonymous type with the same properties as Employee.

var query =
    (from employee in employees
     select new { employee.ID, employee.Name }).Distinct();

That query only yields three objects – Distinct removes the duplicate Hillary! How'd it suddenly get so smart?

Turns out the C# compiler overrides Equals and GetHashCode for anonymous types. The implementation of the two overridden methods uses all the public properties on the type to compute an object's hash code and test for equality. If two objects of the same anonymous type have all the same values for their properties – the objects are equal. This is a safe strategy since anonymously typed objects are essentially immutable (all the properties are read-only). Fiddling with the hash code of a mutable type gets a bit dicey.
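You can see the compiler-generated behavior without any query at all. In this small sketch (the class and method names are just for illustration), two separately created anonymous objects with the same property values are equal according to Equals and GetHashCode, even though they are different references.

```csharp
using System;

public static class AnonymousEqualityDemo
{
    // Compares two separately created anonymous objects by value.
    public static bool ValuesAreEqual()
    {
        var first = new { ID = 2, Name = "Hillary" };
        var second = new { ID = 2, Name = "Hillary" };
        return first.Equals(second)
            && first.GetHashCode() == second.GetHashCode();
    }

    // Compares the same two objects by reference.
    public static bool ReferencesAreEqual()
    {
        var first = new { ID = 2, Name = "Hillary" };
        var second = new { ID = 2, Name = "Hillary" };
        return object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        Console.WriteLine(ValuesAreEqual());
        Console.WriteLine(ReferencesAreEqual());
    }
}
```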

Interestingly – I stumbled on the Visual Basic version of anonymous types as I was writing this post and I see that VB allows you to define "Key" properties. In VB, only the values of Key properties are compared during an equality test. Key properties are readonly, while non-key properties on an anonymous type are mutable. That's a very C sharpish thing to do, VB team.

posted by scott with 1 Comments

Inner, Outer, Let's All Join Together With LINQ

The least intuitive LINQ operators for me are the join operators. After working with healthcare data warehouses for years, I've become accustomed to writing outer joins to circumvent data of the most … suboptimal kind. Foreign keys? What are those? Alas, I digress…

At first glance, LINQ appears to only offer a join operator with an 'inner join' behavior. That is, when joining a sequence of departments with a sequence of employees, we will only see those departments that have one or more employees.

var query =
  from department in departments
  join employee in employees
      on department.ID equals employee.DepartmentID
  select new { employee.Name, Department = department.Name };

After a bit more digging, you might come across the GroupJoin operator. We can use GroupJoin like a SQL left outer join. The "left" side of the join is the outer sequence. If we use departments as the outer sequence in a group join, we can then see the departments with no employees. Note: it is the into keyword in the next query that triggers the C# compiler to use a GroupJoin instead of a plain Join operator.

var query =
  from department in departments
  join employee in employees
      on department.ID equals employee.DepartmentID
      into employeeGroup
  select new { department.Name, Employees = employeeGroup };

As you might suspect from the syntax, however, the query doesn't give us back a "flat" resultset like a SQL query. Instead, we have a hierarchy to traverse. The projection provides us a department name for each sequence of employees.

foreach (var department in query)
{
    Console.WriteLine("{0}", department.Name);
    foreach (var employee in department.Employees)
    {
        Console.WriteLine("\t{0}", employee.Name);
    }
}

Flattening a sequence is a job for SelectMany. The trick is in knowing that adding an additional from clause translates to a SelectMany operator, and just like the outer joins of SQL, we need to project a null value when no employee exists for a given department – this is the job of DefaultIfEmpty.

var query =
  from department in departments
  join employee in employees
      on department.ID equals employee.DepartmentID
      into employeeGroups
  from employee in employeeGroups.DefaultIfEmpty()
  select new { DepartmentName = department.Name, EmployeeName = employee.Name };

One last catch – this query does work with LINQ to SQL, but if you are stubbing out a layer using in-memory collections, the query can easily throw a NullReferenceException, because DefaultIfEmpty hands back a null employee for any department with no employees. The last tweak would be to make sure you have a non-null employee object before asking for the Name property in the last select.
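Here is a sketch of that tweak against in-memory collections. The Department and Employee classes, the LeftOuterJoin helper, and the "(none)" placeholder are assumptions for the example; the query itself is the same join / into / DefaultIfEmpty shape with a null check in the projection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Department
{
    public int ID { get; set; }
    public string Name { get; set; }
}

public class Employee
{
    public string Name { get; set; }
    public int DepartmentID { get; set; }
}

public static class OuterJoinDemo
{
    // Hypothetical helper: flattens a left outer join into "Department: Employee" lines.
    public static List<string> LeftOuterJoin(IEnumerable<Department> departments,
                                             IEnumerable<Employee> employees)
    {
        var query =
            from department in departments
            join employee in employees
                on department.ID equals employee.DepartmentID
                into employeeGroups
            from employee in employeeGroups.DefaultIfEmpty()
            select new
            {
                DepartmentName = department.Name,
                // DefaultIfEmpty yields null when a department has no employees.
                EmployeeName = employee == null ? "(none)" : employee.Name
            };

        return query.Select(r => r.DepartmentName + ": " + r.EmployeeName).ToList();
    }

    public static void Main()
    {
        var departments = new List<Department>
        {
            new Department { ID = 1, Name = "Engineering" },
            new Department { ID = 2, Name = "Sales" }
        };
        var employees = new List<Employee>
        {
            new Employee { Name = "Scott", DepartmentID = 1 }
        };

        foreach (var line in LeftOuterJoin(departments, employees))
        {
            Console.WriteLine(line);
        }
    }
}
```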

posted by scott with 9 Comments