What's Wrong With This Code (#19)

Monday, March 31, 2008 by scott

Leroy was shocked when the source code appeared. It was familiar yet strange, like an old lover's kiss. The code was five years old – an artifact of Leroy's first project. Leroy slowly scrolled through the code and pondered his next move. It wasn't a bug that was bothering Leroy – there were no race conditions or tricky numerical conversions. No performance problems or uncaught error conditions. It was all about design …

using System;
using System.IO;

public class BankAccount
{
    public void Deposit(decimal amount)
    {
        _balance += amount;
        LogTransaction("Deposited {0} on {1}", amount, DateTime.Now);
    }

    public void Withdraw(decimal amount)
    {
        _balance -= amount;
        LogTransaction("Withdrew {0} on {1}", amount, DateTime.Now);
    }

    public void AccumulateInterest(decimal baseRate)
    {
        decimal interest;
        if (_balance < 10000)
        {
            interest = _balance * baseRate;
        }
        else
        {
            interest = _balance * (baseRate + 0.01m); // decimal literal: decimal + double does not compile
        }
        _balance += interest;
        LogTransaction("Accumulated {0} interest on {1}", interest, DateTime.Now);
    }

    void LogTransaction(string message, params object[] parameters)
    {
        // Append adds to the end of the log; OpenOrCreate would overwrite from the start of the file
        using (FileStream fs = File.Open("auditlog.txt", FileMode.Append))
        using (StreamWriter writer = new StreamWriter(fs))
        {
            writer.WriteLine(message, parameters);
        }
    }

    public decimal Balance
    {
        get { return _balance; }
        set { _balance = value; }
    }

    decimal _balance;
}

"Times have changed, and so I have, fortunately", Leroy thought to himself. "And so will this code…"

To be continued…

Custom Aggregations In LINQ

Saturday, March 29, 2008 by scott

Aggregate is a standard LINQ operator for in-memory collections that allows us to build a custom aggregation. LINQ provides a few standard aggregation operators, like Count, Sum, Min, Max, and Average, but if you want an inline implementation of, say, a standard deviation calculation, the Aggregate extension method is one approach you can use (the other approach is to write your own operator).

Let's say we wanted to see the total number of threads running on a machine. We could get that number lambda style, or with a query comprehension, or with a custom aggregate.

using System.Diagnostics;
using System.Linq;

var processes = Process.GetProcesses();
int totalThreads = 0;

// lambda style
totalThreads = processes.Sum(p => p.Threads.Count);

// query comprehension
totalThreads = (from process in processes
                select process.Threads.Count).Sum();

// custom aggregate
totalThreads =
    processes.Aggregate(
        0,                                  // initialize
        (acc, p) => acc + p.Threads.Count,  // accumulate
        acc => acc                          // terminate
    );

This particular overload of Aggregate follows a common pattern of "Initialize – Accumulate – Terminate". You can see this pattern in extensible aggregation strategies from Oracle to SQLCLR. The first parameter is the seed, which initializes the accumulator: in this case just an integer value of 0.

The second parameter is a Func<int, Process, int> that the Aggregate method invokes for each element as it iterates across the input sequence. For each process we receive the current accumulator value (an int) and a reference to the current process (a Process), and we return a new accumulator value (an int).

The last parameter is the terminate expression. This is an opportunity to provide any final calculations. For our summation, we just need to return the value in the accumulator.
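To see the terminate step do real work, here's a small sketch that computes an average with the same three-argument overload. The hard-coded threadCounts array is hypothetical (it stands in for the Process data so the result is repeatable), and the named-tuple accumulator needs C# 7 or later, which postdates this post.

```csharp
using System;
using System.Linq;

// Hypothetical thread counts standing in for p.Threads.Count values
int[] threadCounts = { 4, 12, 7, 1 };

double average = threadCounts.Aggregate(
    (Sum: 0, Count: 0),                                          // initialize: empty running sum and count
    (acc, n) => (acc.Sum + n, acc.Count + 1),                    // accumulate: add one element
    acc => acc.Count == 0 ? 0.0 : (double)acc.Sum / acc.Count);  // terminate: divide once at the end

Console.WriteLine(average);
```

The accumulator carries two values because an average can't be computed from a running average alone without also knowing the count; the division happens only once, in the terminate function.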

StdDev

Now, let's compute a more thorough summary of running threads, including a standard deviation. Although we could get away with a simple double accumulator for stddev, we can also use a more sophisticated accumulator to encapsulate some calculations, facilitate unit tests, and make the syntax easier on the eye.

using System;
using System.Collections.Generic;
using System.Linq;

class StdDevAccumulator<TSource>
{
    public StdDevAccumulator(IEnumerable<TSource> source,
                             Func<TSource, double> avgSelector)
    {
        SampleAvg = source.Average(avgSelector);
        SampleCount = source.Count();
    }

    public StdDevAccumulator<TSource> Accumulate(double value)
    {
        // running sum of squared deviations from the sample mean
        TotalDeviation += Math.Pow(value - SampleAvg, 2.0);
        return this;
    }

    public double ComputeResult()
    {
        if (SampleCount < 2)
        {
            return 0.0;
        }
        // sample standard deviation (divide by n - 1)
        return Math.Sqrt(TotalDeviation / (SampleCount - 1));
    }

    public double SampleAvg { get; set; }
    public int    SampleCount { get; set; }
    public double TotalDeviation { get; set; }
}

Put the accumulator to use like so:

var processes = Process.GetProcesses();

var summary = new
    {
        TotalProcesses = processes.Count(),
        TotalThreads = processes.Sum(p => p.Threads.Count),
        MinThreads = processes.Min(p => p.Threads.Count),
        MaxThreads = processes.Max(p => p.Threads.Count),
        StdDevThreads = processes.Aggregate(
                new StdDevAccumulator<Process>(processes, p => p.Threads.Count),
                (acc, p) => acc.Accumulate(p.Threads.Count),
                (acc)    => acc.ComputeResult()
        )
    };
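The StdDevAccumulator enumerates its source more than once (Average and Count in the constructor, then Aggregate itself). That's cheap for the array returned by GetProcesses, but if a single pass matters, a different technique, Welford's online algorithm, can carry the count, running mean, and sum of squared deviations (M2) in a tuple seed. This is a sketch swapped in for comparison, not the approach from the post, and the sample values are hypothetical. Like the accumulator above, it returns the sample standard deviation (dividing by n - 1).

```csharp
using System;
using System.Linq;

// Hypothetical per-process thread counts; any IEnumerable<int> would work
int[] threadCounts = { 2, 4, 4, 4, 5, 5, 7, 9 };

double stdDev = threadCounts.Aggregate(
    (Count: 0, Mean: 0.0, M2: 0.0),           // initialize: no samples yet
    (acc, x) =>                               // accumulate: Welford update, one pass
    {
        int count = acc.Count + 1;
        double delta = x - acc.Mean;
        double mean = acc.Mean + delta / count;
        double m2 = acc.M2 + delta * (x - mean);
        return (count, mean, m2);
    },
    acc => acc.Count < 2 ? 0.0 : Math.Sqrt(acc.M2 / (acc.Count - 1))); // terminate: sample std dev

Console.WriteLine(stdDev);
```

For this data the mean is 5 and the squared deviations sum to 32, so the result is the square root of 32 / 7.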

LINQ

Saturday, March 29, 2008 by scott

An Introduction To LINQ
C# 3.0 and LINQ

And Equality for All ... Anonymous Types

Wednesday, March 26, 2008 by scott

Given this simple Employee class:

public class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
}

How many employees do you expect to see from the following query with a Distinct operator?

var employees = new List<Employee>
{
    new Employee { ID=1, Name="Barack" },
    new Employee { ID=2, Name="Hillary" },
    new Employee { ID=2, Name="Hillary" },
    new Employee { ID=3, Name="Mac" }
};

var query =
        (from employee in employees
         select employee).Distinct();

foreach (var employee in query)
{
    Console.WriteLine(employee.Name);
}

The answer is 4 – we'll see both Hillary objects. The docs for Distinct are clear – the method uses the default equality comparer to test for equality, and the default comparer sees 4 distinct object references. One way to get around this would be to use the overloaded version of Distinct that accepts a custom IEqualityComparer.
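Here's a minimal sketch of that overload. EmployeeComparer is a hypothetical name; it treats two employees as equal when both ID and Name match, and its GetHashCode uses the same two fields so that equal objects produce equal hash codes. The sketch uses C# 9 top-level statements and HashCode.Combine, both much newer than this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var employees = new List<Employee>
{
    new Employee { ID = 1, Name = "Barack" },
    new Employee { ID = 2, Name = "Hillary" },
    new Employee { ID = 2, Name = "Hillary" },
    new Employee { ID = 3, Name = "Mac" }
};

// Distinct now asks EmployeeComparer instead of comparing object references
foreach (var employee in employees.Distinct(new EmployeeComparer()))
{
    Console.WriteLine(employee.Name); // Barack, Hillary, Mac
}

public class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
}

// Hypothetical comparer: employees are equal when ID and Name both match
public class EmployeeComparer : IEqualityComparer<Employee>
{
    public bool Equals(Employee x, Employee y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.ID == y.ID && x.Name == y.Name;
    }

    // Must agree with Equals: equal employees hash from the same fields
    public int GetHashCode(Employee obj) => HashCode.Combine(obj.ID, obj.Name);
}
```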

Let's try the query again and project a new, anonymous type with the same properties as Employee.

var query =
    (from employee in employees
     select new { employee.ID, employee.Name }).Distinct();

That query only yields three objects – Distinct removes the duplicate Hillary! How'd it suddenly get so smart?

Turns out the C# compiler overrides Equals and GetHashCode for anonymous types. The implementation of the two overridden methods uses all the public properties on the type to compute an object's hash code and test for equality. If two objects of the same anonymous type have all the same values for their properties – the objects are equal. This is a safe strategy since anonymously typed objects are essentially immutable (all the properties are read-only). Fiddling with the hash code of a mutable type gets a bit dicey.
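A quick way to watch those compiler-generated members at work: two anonymous objects with the same property values are separate references, yet Equals and GetHashCode treat them as equal. Note that the == operator is not overloaded for anonymous types, so it still compares references.

```csharp
using System;

var a = new { ID = 2, Name = "Hillary" };
var b = new { ID = 2, Name = "Hillary" };
var c = new { ID = 3, Name = "Mac" };

Console.WriteLine(object.ReferenceEquals(a, b));       // False: two separate objects
Console.WriteLine(a.Equals(b));                        // True: compares ID and Name values
Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True: hash built from the same values
Console.WriteLine(a.Equals(c));                        // False: property values differ
Console.WriteLine(a == b);                             // False: == still compares references
// a.Name = "Bill";  // would not compile: anonymous type properties are read-only
```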

Interestingly, I stumbled on the Visual Basic version of anonymous types as I was writing this post, and I see that VB allows you to define "Key" properties. In VB, only the values of Key properties are compared during an equality test. Key properties are read-only, while non-key properties on an anonymous type are mutable. That's a very C sharpish thing to do, VB team.

Inner Join, Outer Left Join, Let's All Join Together With LINQ

Tuesday, March 25, 2008 by scott

The least intuitive LINQ operators for me are the join operators. After working with healthcare data warehouses for years, I've become accustomed to writing outer joins to circumvent data of the most … suboptimal kind. Foreign keys? What are those? Alas, I digress…

At first glance, LINQ appears to offer only a join operator with 'inner join' behavior. That is, when joining a sequence of departments with a sequence of employees, we will only see those departments that have one or more employees.

var query =
  from department in departments
  join employee in employees
      on department.ID equals employee.DepartmentID 
  select new { employee.Name, Department = department.Name };
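A small, self-contained sketch of that inner-join behavior. The departments and employees arrays are hypothetical sample data built from anonymous types (the post doesn't show its classes), and Legal deliberately has no employees.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: Legal has no employees
var departments = new[]
{
    new { ID = 1, Name = "Engineering" },
    new { ID = 2, Name = "Sales" },
    new { ID = 3, Name = "Legal" }
};
var employees = new[]
{
    new { Name = "Ann", DepartmentID = 1 },
    new { Name = "Bob", DepartmentID = 1 },
    new { Name = "Cid", DepartmentID = 2 }
};

var query =
    from department in departments
    join employee in employees
        on department.ID equals employee.DepartmentID
    select new { employee.Name, Department = department.Name };

// Ann (Engineering), Bob (Engineering), Cid (Sales): Legal never appears
foreach (var row in query)
{
    Console.WriteLine("{0} ({1})", row.Name, row.Department);
}
```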

After a bit more digging, you might come across the GroupJoin operator. We can use GroupJoin like a SQL left outer join. The "left" side of the join is the outer sequence. If we use departments as the outer sequence in a group join, we can then see the departments with no employees. Note: it is the into keyword in the next query that triggers the C# compiler to use a GroupJoin instead of a plain Join operator.

var query =
  from department in departments 
  join employee in employees 
      on department.ID equals employee.DepartmentID
      into employeeGroup 
  select new { department.Name, Employees = employeeGroup };

As you might suspect from the syntax, however, the query doesn't give us back a "flat" resultset like a SQL query. Instead, we have a hierarchy to traverse. The projection provides us a department name for each sequence of employees.

foreach (var department in query)
{
    Console.WriteLine("{0}", department.Name);
    foreach (var employee in department.Employees)
    {
        Console.WriteLine("\t{0}", employee.Name);
    }
}
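For comparison, here is the method-syntax equivalent of the join ... into query: a direct GroupJoin call with the select folded into the result selector. This sketch reuses the same hypothetical sample data and counts each group, so the empty Legal department shows up with zero employees.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: Legal has no employees
var departments = new[]
{
    new { ID = 1, Name = "Engineering" },
    new { ID = 2, Name = "Sales" },
    new { ID = 3, Name = "Legal" }
};
var employees = new[]
{
    new { Name = "Ann", DepartmentID = 1 },
    new { Name = "Bob", DepartmentID = 1 },
    new { Name = "Cid", DepartmentID = 2 }
};

var query = departments.GroupJoin(
    employees,
    department => department.ID,          // outer key
    employee => employee.DepartmentID,    // inner key
    (department, employeeGroup) => new { department.Name, Employees = employeeGroup });

// Engineering: 2, Sales: 1, Legal: 0
foreach (var department in query)
{
    Console.WriteLine("{0}: {1} employee(s)", department.Name, department.Employees.Count());
}
```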

Flattening a sequence is a job for SelectMany. The trick is in knowing that an additional from clause translates to a SelectMany operator. And just as with an outer join in SQL, we need to project a null value when no employee exists for a given department; that is the job of DefaultIfEmpty.

var query =
  from department in departments
  join employee in employees
      on department.ID equals employee.DepartmentID
      into employeeGroups
  from employee in employeeGroups.DefaultIfEmpty()
  select new { DepartmentName = department.Name, EmployeeName = employee.Name };

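One last catch: this query works with LINQ to SQL, where it is translated into a SQL outer join and the database supplies a NULL for the missing employee. If you are stubbing out a layer using in-memory collections, however, DefaultIfEmpty hands the select clause a null employee for every department without employees, and reading employee.Name throws a NullReferenceException. The last tweak is to make sure you have a non-null employee object before asking for the Name property in the last select.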
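One way to apply that tweak, sketched against the same hypothetical in-memory data: test for null in the projection and substitute a placeholder. The "(none)" text is an arbitrary choice; a null EmployeeName would work too, as long as the property access is guarded.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: Legal has no employees
var departments = new[]
{
    new { ID = 1, Name = "Engineering" },
    new { ID = 2, Name = "Sales" },
    new { ID = 3, Name = "Legal" }
};
var employees = new[]
{
    new { Name = "Ann", DepartmentID = 1 },
    new { Name = "Bob", DepartmentID = 1 },
    new { Name = "Cid", DepartmentID = 2 }
};

var query =
    from department in departments
    join employee in employees
        on department.ID equals employee.DepartmentID
        into employeeGroups
    from employee in employeeGroups.DefaultIfEmpty()
    select new
    {
        DepartmentName = department.Name,
        // DefaultIfEmpty yields null for Legal, so guard before reading Name
        EmployeeName = employee == null ? "(none)" : employee.Name
    };

foreach (var row in query)
{
    Console.WriteLine("{0}\t{1}", row.DepartmentName, row.EmployeeName);
}
```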

Mashups with SyndicationFeed and LINQ

Monday, March 17, 2008 by scott

I was experimenting with the new SyndicationFeed class in .NET 3.5 earlier this year and devised a mashup LINQ query:

using System;
using System.Linq;
using System.ServiceModel.Syndication;
using System.Xml;

string[] feedUrls = { "https://odetocode.com/blogs/scott/rss.aspx",
                      "http://www.pluralsight.com/blogs/mainfeed.aspx",
                      "http://feeds.feedburner.com/ScottHanselman"
                    };

var items =
    from url in feedUrls
        let feed = SyndicationFeed.Load(XmlReader.Create(url))
    from item in feed.Items
    where item.PublishDate > DateTime.Now.AddDays(-30)
    orderby item.PublishDate descending
    select item;

// display the most recent 15 items
foreach (SyndicationItem item in items.Take(15))
{
    Console.WriteLine("{0} : {1}",
        item.PublishDate.Date.ToShortDateString(),
        item.Title.Text);
}

The code filters and sorts RSS items from an arbitrary number of blogs with a six-line query expression. I was thinking of this code when I ran across Scott Hanselman's Weekly Source Code 19 – LINQ and more What, Less How. Scott's reader David Nelson had the following observation:

I disagree with Siderite, in that I think the LINQ example is more readable than the iterative example; however, as has been pointed out, it leaves no room for error handling or AppDomain transitions. This is a problem with LINQ in general; in trying to make everything very compact, it leaves too little room to maneuver.

The LINQ query I'm using isn't production code. If just one blog is down and the XmlReader throws an exception, the entire operation is borked. One solution is to wrap the feed reading into a method that uses exception handling and returns an empty SyndicationFeed in case of an exception - then invoke the method from inside the query. Could anything else go wrong? Sure - one null PublishDate on an item and again we'd be borked. Bullet-proofing a LINQ query might take some work, especially when dealing with third party types.
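A minimal sketch of the wrapper described above, assuming the System.ServiceModel.Syndication types (in the System.ServiceModel.Web assembly on .NET 3.5, and a NuGet package of the same name on modern .NET). LoadFeedOrEmpty is a hypothetical name, and the broad catch is for brevity only; a production version would narrow the exception filter and log properly.

```csharp
using System;
using System.Linq;
using System.ServiceModel.Syndication;
using System.Xml;

// Hypothetical helper: a feed that fails to load contributes no items instead of aborting the query
SyndicationFeed LoadFeedOrEmpty(string url)
{
    try
    {
        using (XmlReader reader = XmlReader.Create(url))
        {
            return SyndicationFeed.Load(reader);
        }
    }
    catch (Exception ex) // broad catch for brevity; narrow it in real code
    {
        Console.Error.WriteLine("Skipping {0}: {1}", url, ex.Message);
        return new SyndicationFeed(); // empty feed, so "from item in feed.Items" yields nothing
    }
}

string[] feedUrls = { "https://odetocode.com/blogs/scott/rss.aspx" };

// Same shape as the original query; a dead feed can no longer sink the whole operation
var items =
    from url in feedUrls
    let feed = LoadFeedOrEmpty(url)
    from item in feed.Items
    where item.PublishDate > DateTime.Now.AddDays(-30)
    orderby item.PublishDate descending
    select item;
```

The query is still deferred, so no feed is fetched until items is enumerated (for example with the Take(15) loop from the original listing).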

As LINQ moves us into the "What" instead of the "How", it might be harder to see these types of error scenarios. LINQ is a fantastic technology, but like everything in software, it is a good idea to look the gift horse in the mouth.

Talks You Won’t See At the Local Code Camp

Tuesday, March 11, 2008 by scott

The Lost Art of TSR Programming
Abstract: Return to the glory days of DOS 2.0 and INT 21h as we write a simple Terminate and Stay Resident application using the latest software development techniques. We will construct our x86 assembler code using test driven development and mock extended memory managers.

Why Am I Here On A Saturday?
Abstract: Because even if you weren't here, you'd still be at the computer. Don't think you'd be doing chores at home, like dusting off the entertainment center, because chores are boring.

Life of a Gnat
Abstract: This session has nothing to do with GNU software, but will describe (in excruciating detail) the journey of the common fungus gnat from egg to adulthood. Pictures of mating swarms may not be appropriate for younger attendees.

P.S. In all seriousness, the spring code camps are coming to the Mid-Atlantic and the topics are far better than the ones presented above.