Abstractions, Patterns, and Interfaces

Thursday, November 15, 2012

Someone recently asked me how to go about building an application that processes customer information. The customer information might live in a database, but it also might live in a .csv file.

The interesting thing is I’m in the middle of building a tool for DBAs that one day will be a fancy WPF desktop application integrated with source control repositories and relational databases, but for the moment is a simple console application using only the file system for storage. Inside that tool I’ve faced many scenarios similar to the one in the question, and they are scenarios I’ve faced numerous times over the years.

There are 101 ways to solve the problem, but let’s work through one solution together.

Getting Started

We might start by writing some tests, or we might start by jumping in and trying to display a list of all customers in a data source, but either way we’ll eventually find ourselves with the following code, which contains the essence of the question:

var customers = // how do I get customers?

To make things more concrete, let’s take that line of code and put it in a class that will do something useful: a class that writes all customer names to the output of a console program.

public class CustomerDump
{        
    public void Render()
    {
        var customers = // how ?
        foreach (var customer in customers)
        {
            Console.WriteLine(customer.Name);
        }
    }
}

Although we might not know how to retrieve the customer data, we probably do know what data we need about each customer. We’ll go ahead and define a Customer class for objects to hold customer data.

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Location { get; set; }
}

Now we can work on the main question. The business has told us we need to be flexible with the customer data, so how will we go about retrieving customers?

Defining an Interface

Interfaces are wonderful for a language like C#. Interfaces give us everything we need to work with an object in a strongly-typed manner, but place the fewest constraints on the object implementing the interface. Interfaces make the C# compiler happy without forcing us to pay an inheritance tax for working with a class hierarchy. We’ll define an interface that describes exactly how we want to fetch customers and how we want the customers packaged for us to consume.

public interface ICustomerDataSource
{
    IList<Customer> FetchAllCustomers();
}

There are many subtleties to interface design. Even the simple interface here required us to make a number of decisions.

First, what is the name of the operation? Do we want to FetchAllCustomers? SelectAllCustomers? GetCustomers? I believe names are important at this level, but you don’t want to give too much away. A name like SelectAllCustomers is biased towards working with a relational database, and we know we’ll be working with more than just a SQL database.

Often the name is influenced by what we know about the project and the business. Fortunately, refactoring tools make names easy to change.

Another design decision is the return type. When you are trying to abstract away some operation, you have to decide if you’ll go for the lowest common denominator (anything can return IEnumerable), or something that might only be achieved by an advanced data source (like IQueryable). In this example we are forcing all implementations to return a list, which has some tradeoffs, but at least we know we’ll be getting a specific type of data structure. IEnumerable would be targeting the lowest common denominator and means the interface is easier to implement, but we might not have all the convenience features we need.

Once again, knowing a bit about the direction of the project and being in tune with the business needs will help in determining when to add flexibility and when to enforce constraints.
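To make the return type tradeoff a little more concrete, here is a minimal sketch. The ICustomerSequenceSource interface and the two "Generated" classes are hypothetical names invented for this illustration; they are not part of the application we are building.

```csharp
using System.Collections.Generic;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Location { get; set; }
}

// The list-returning contract defined above.
public interface ICustomerDataSource
{
    IList<Customer> FetchAllCustomers();
}

// Hypothetical alternative that targets the lowest common denominator.
public interface ICustomerSequenceSource
{
    IEnumerable<Customer> FetchAllCustomers();
}

// IEnumerable is easy to implement: an iterator can produce
// customers one at a time without building a collection first.
public class GeneratedCustomerSequence : ICustomerSequenceSource
{
    public IEnumerable<Customer> FetchAllCustomers()
    {
        for (var i = 0; i < 3; i++)
        {
            yield return new Customer
            {
                Id = i,
                Name = "Customer " + i,
                Location = "Unknown"
            };
        }
    }
}

// IList asks more of the implementer (everything must be materialized),
// but gives the caller Count and indexing without extra work.
public class GeneratedCustomerList : ICustomerDataSource
{
    public IList<Customer> FetchAllCustomers()
    {
        return new List<Customer>(
            new GeneratedCustomerSequence().FetchAllCustomers());
    }
}
```

A caller holding the IList version can write customers.Count or customers[0] directly, while a caller holding the IEnumerable version has to enumerate (or call ToList itself) to get the same conveniences.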

Implementing the Interface

One question we might have had in the back of our mind is how to provide an implementation of the data loading interface when some implementations might need parameters like a database connection string, while other implementations might need file system details, like the path to the .csv file with customers inside.

When designing an interface we need to put those thoughts in the back of our mind and focus entirely on the client’s needs first. Just watch how this unfolds as we build a class to read customer data from a .csv file.

class CustomerCsvDataSource : ICustomerDataSource
{
    public CustomerCsvDataSource(string path)
    {
        _path = path;
    }


    public IList<Customer> FetchAllCustomers()
    {
        return File.ReadAllLines(_path)
                   .Select(line => line.Split(','))
                   .Select((values, index) =>
                        new Customer
                            {
                                Id = index,
                                Name = values[0],
                                Location = values[1]
                            }).ToList();
    }


    readonly string _path;
}

This isn’t the most robust CSV parser in the world (it won’t deal with embedded commas, so we might want to get some help), but it does demonstrate a pattern I’ve been using over and over again recently. The class implements the interface, stores constructor parameters in read-only fields, exposes methods to implement the interface, and above all keeps things simple, small, and focused.
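If we wanted to handle embedded commas ourselves rather than get help from a library, one possible approach is a small splitter that understands double-quoted fields. The CsvLine class below is a hypothetical helper written for this illustration. It is only a sketch: it assumes each record fits on a single line and makes no attempt to cover every CSV dialect.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the original sample.
public static class CsvLine
{
    // Splits one line into fields. Commas inside double quotes are kept,
    // and a doubled quote ("") inside a quoted field becomes a literal quote.
    public static IList<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        var inQuotes = false;

        for (var i = 0; i < line.Length; i++)
        {
            var c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote
                    i++;                 // skip the second quote of the pair
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // field boundary
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // the last field has no trailing comma
        return fields;
    }
}
```

Inside FetchAllCustomers we could then swap line.Split(',') for CsvLine.Split(line) and leave the rest of the class alone, since the result can still be indexed with values[0] and values[1].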

Here is the pattern again, this time in a class that uses Mark Rendle’s Simple.Data to access SQL Server, but we could do the same thing with raw ADO.NET, the Entity Framework, or even MongoDB.

class CustomerDbDataSource : ICustomerDataSource
{
    public CustomerDbDataSource(string connectionString)
    {
        _connectionString = connectionString;
    }


    public IList<Customer> FetchAllCustomers()
    {
        var db = Database.OpenConnection(_connectionString);
        return db.Customers.All().ToList<Customer>();       
    }


    readonly string _connectionString;
}

We can see now that worrying about connection strings and file names while defining the interface was premature worrying. These were all implementation details the interface isn’t concerned with, as the interface only exposes the operations clients need, like the ability to fetch customers.

Instead, these classes are “programmed” with implementation specific instructions given by constructor parameters, and the instructions give them everything they need to do the work required by the interface. The classes never change the instructions (they are all saved in read-only fields), but they use the instructions to produce new results.

We have now reached the point where we have two different classes to deal with two different sources of data, but how do we use them?

Consuming the Interface

Returning to our CustomerDump class, one obvious approach to producing results is the following.

public class CustomerDump
{
    public void Render()
    {
        var dataSource = new CustomerCsvDataSource("customers.csv");
        var customers = dataSource.FetchAllCustomers();
        
        foreach (var customer in customers)
        {
            Console.WriteLine(customer.Name);
        }
    }
}

The above approach can work, but we’ve tied the CustomerDump class to the CSV data source by instantiating CustomerCsvDataSource directly. If we need CustomerDump to only work with a CSV data source, this is reasonable, but we know most of the application needs to work with different data sources so we’ll need to avoid this approach in most places.

Instead of CustomerDump choosing a data source and coupling itself to a specific class, we’ll force someone to give CustomerDump the data source to use.

public class CustomerDump
{       
    public CustomerDump(ICustomerDataSource dataSource)
    {
        _dataSource = dataSource;
    }
 
    public void Render()
    {            
        var customers = _dataSource.FetchAllCustomers();
        
        foreach (var customer in customers)
        {
            Console.WriteLine(customer.Name);
        }
    }
 
    readonly ICustomerDataSource _dataSource;
}

Now, any logic we have inside of CustomerDump can work with customers from anywhere, and we can add new data sources in the future. We’ve gained a lot of flexibility in an area where the business demands flexibility, and hopefully didn’t build a mountain of abstractions where none were required. All the pieces are small and focused, and the way they will fit together depends on the application you are building. Which leads to the next question: who is responsible for putting CustomerDump together?
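One payoff of "customers from anywhere" is that anywhere can include memory, which is handy in a unit test. The InMemoryCustomerDataSource class below is a hypothetical addition for illustration; the Customer, ICustomerDataSource, and CustomerDump definitions are repeated only so the sketch stands on its own.

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Location { get; set; }
}

public interface ICustomerDataSource
{
    IList<Customer> FetchAllCustomers();
}

// Hypothetical data source that serves customers it was given up front.
public class InMemoryCustomerDataSource : ICustomerDataSource
{
    public InMemoryCustomerDataSource(IEnumerable<Customer> customers)
    {
        _customers = new List<Customer>(customers);
    }

    public IList<Customer> FetchAllCustomers()
    {
        // Hand back a copy so callers cannot change our stored list.
        return new List<Customer>(_customers);
    }

    readonly List<Customer> _customers;
}

public class CustomerDump
{
    public CustomerDump(ICustomerDataSource dataSource)
    {
        _dataSource = dataSource;
    }

    public void Render()
    {
        var customers = _dataSource.FetchAllCustomers();

        foreach (var customer in customers)
        {
            Console.WriteLine(customer.Name);
        }
    }

    readonly ICustomerDataSource _dataSource;
}
```

CustomerDump neither knows nor cares that no file or database is involved. It follows the same pattern as the other data sources: implement the interface, and keep the constructor input in a read-only field.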

At the top level of every application built in this fashion you’ll have some bootstrapping code to arrange all the pieces and set them in motion. For a console mode application it might look like this:

static void Main(string[] args)
{
    // arrange
    var connectionString = @"server=(localdb)\v11.0;database=Customers";
    var dataSource = new CustomerDbDataSource(connectionString);
    var dump = new CustomerDump(dataSource);
    
    // execute    
    dump.Render();
}

Here we have hard-coded values again, but you can imagine hard-coded connection strings and class names getting intermingled or replaced with if/else statements and settings from the app.config file. As the application becomes more complex, we could turn to tools like MEF or StructureMap to manage the construction of the building blocks we need.
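As a rough sketch of what the if/else version might look like, the DataSourceFactory below picks a data source from a dictionary of settings. Everything about it is an assumption made for illustration: the factory itself, the dictionary standing in for values read from app.config, and the key names "CustomerSource", "CsvPath", and "ConnectionString". The database class here is only a stand-in so the sketch compiles without Simple.Data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Location { get; set; }
}

public interface ICustomerDataSource
{
    IList<Customer> FetchAllCustomers();
}

public class CustomerCsvDataSource : ICustomerDataSource
{
    public CustomerCsvDataSource(string path)
    {
        _path = path;
    }

    public IList<Customer> FetchAllCustomers()
    {
        return File.ReadAllLines(_path)
                   .Select(line => line.Split(','))
                   .Select((values, index) => new Customer
                       {
                           Id = index,
                           Name = values[0],
                           Location = values[1]
                       }).ToList();
    }

    readonly string _path;
}

// Stand-in for the Simple.Data-backed class; data access is omitted.
public class CustomerDbDataSource : ICustomerDataSource
{
    public CustomerDbDataSource(string connectionString)
    {
        ConnectionString = connectionString;
    }

    public string ConnectionString { get; private set; }

    public IList<Customer> FetchAllCustomers()
    {
        throw new NotImplementedException("Database access is omitted from this sketch.");
    }
}

// Hypothetical factory; the setting names are invented for this example.
public static class DataSourceFactory
{
    public static ICustomerDataSource Create(IDictionary<string, string> settings)
    {
        string kind;
        if (!settings.TryGetValue("CustomerSource", out kind))
        {
            throw new ArgumentException("Missing setting: CustomerSource");
        }

        if (kind == "csv")
        {
            return new CustomerCsvDataSource(settings["CsvPath"]);
        }
        if (kind == "db")
        {
            return new CustomerDbDataSource(settings["ConnectionString"]);
        }

        throw new ArgumentException("Unknown CustomerSource: " + kind);
    }
}
```

Main would then call DataSourceFactory.Create with whatever settings it loaded and pass the result to CustomerDump. The decision about which class to construct stays in the bootstrapping code, which is also the job a container would take over as things grow.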

Going Further

One of the biggest challenges in building well factored software is knowing when to stop adding abstractions. For example, we can say the CustomerDump class is currently tied too tightly to Console.Out. To remove the dependency we’ll instead inject a Stream for CustomerDump to use.

public CustomerDump(ICustomerDataSource dataSource,
                    Stream output)
{
    _dataSource = dataSource;
    _output = new StreamWriter(output);
}
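For completeness, here is one way the rest of the class might look once the constructor takes a Stream. This is a sketch under a few assumptions: Render flushes the writer so the text actually reaches the underlying stream, and the FixedCustomerDataSource class is a hypothetical stand-in included only so the example runs without a file or database.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Location { get; set; }
}

public interface ICustomerDataSource
{
    IList<Customer> FetchAllCustomers();
}

// Hypothetical stand-in that returns the customers it was constructed with.
public class FixedCustomerDataSource : ICustomerDataSource
{
    public FixedCustomerDataSource(IList<Customer> customers)
    {
        _customers = customers;
    }

    public IList<Customer> FetchAllCustomers()
    {
        return _customers;
    }

    readonly IList<Customer> _customers;
}

public class CustomerDump
{
    public CustomerDump(ICustomerDataSource dataSource,
                        Stream output)
    {
        _dataSource = dataSource;
        _output = new StreamWriter(output);
    }

    public void Render()
    {
        var customers = _dataSource.FetchAllCustomers();

        foreach (var customer in customers)
        {
            _output.WriteLine(customer.Name);
        }

        // StreamWriter buffers its output, so push it to the stream
        // before the caller inspects or closes that stream.
        _output.Flush();
    }

    readonly ICustomerDataSource _dataSource;
    readonly StreamWriter _output;
}
```

The bootstrapping code can pass Console.OpenStandardOutput() to keep the console behavior, while a test can pass a MemoryStream and inspect the bytes afterwards.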

Alternatively, we could say CustomerDump shouldn’t be responsible for both getting and formatting each customer as well as sending the result to the screen. In that case we’ll just have CustomerDump create the formatted string, and leave it to the caller to decide what to do with the result.

public string CreateDump()
{
    var builder = new StringBuilder();
    var customers = _dataSource.FetchAllCustomers();

    foreach (var customer in customers)
    {
        builder.AppendFormat("{0} : {1}", 
            customer.Name, customer.Location);
        builder.AppendLine(); // end each customer's entry with a line break
    }
    return builder.ToString();
}

Now we might look at the code and decide that getting and formatting are two different responsibilities, so we’ll need someone to pass the list of customers to format instead of having the method use the data source directly. And so on, and so on.
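That next step might look something like the sketch below, where a formatter receives the customers instead of fetching them. The CustomerFormatter class and its Format method are hypothetical names chosen for illustration, and putting each customer on its own line is an assumption about the output we want.

```csharp
using System.Collections.Generic;
using System.Text;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Location { get; set; }
}

// Hypothetical formatter: it knows how to turn customers into text
// and nothing about where the customers came from.
public static class CustomerFormatter
{
    public static string Format(IEnumerable<Customer> customers)
    {
        var builder = new StringBuilder();

        foreach (var customer in customers)
        {
            builder.AppendFormat("{0} : {1}",
                customer.Name, customer.Location);
            builder.AppendLine(); // one customer per line
        }

        return builder.ToString();
    }
}
```

The caller would fetch the list from an ICustomerDataSource and hand it to CustomerFormatter.Format, which leaves three small pieces (fetch, format, output) that can each be tested alone. Whether that split is worth it is exactly the question that follows.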

Where do we stop?

That’s where most samples break down because the right place to stop is the place where we have just enough abstraction to make things work and still meet our requirements for testability, maintainability, scalability, readability, extensibility, and all the other ilities we need. Samples like this can show you the patterns you can use to achieve specific results, but only in the context of a specific application do we know the results we need. We need to apply both YAGNI and SRP in the right places and at the right time.

OdeToCode by K. Scott Allen