Brute Force Might Work

Thursday, November 29, 2012 by K. Scott Allen
1 comment

Problem: A content heavy web site undergoes a massive reorganization. A dozen URL rewrite rules are added to issue permanent redirects and preserve permalinks across the two versions. The question is - will the rules really work?

Brute Force Approach: Take the last 7 days of web server logs, and write about 40 lines of C# code:

static void Main()
{
    var baseUrl = new Uri("http://localhost/");
    var files = Directory.GetFiles(@"..\..\Data", "*.log");

    Parallel.ForEach(files, file =>
        {
            var requests = File.ReadAllLines(file)
                .Where(line => !line.StartsWith("#"))
                .Select(line => line.Split(' '))
                .Select(parts => new
                                     {
                                         Url = parts[6],
                                         Status = parts[16],
                                         Verb = parts[5]
                                     })
                .Where(request => request.Verb == "GET" && 
                                  request.Status[0] != '4' &&
                                  request.Status[0] != '5');

            foreach (var request in requests)
            {
                try
                {
                    using (var client = new WebClient())
                    {
                        client.DownloadData(new Uri(baseUrl, request.Url));
                    }
                }
                catch (Exception ex)
                {
                    // .. log it (in a thread safe way)
                }
            }
        });
}

The little console mode application isn't foolproof, but it did uncover a number of problems and edge cases. As a bonus, it also turned into a good workload generator for SQL Server index re-tuning.
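The "log it (in a thread safe way)" comment deserves one concrete note: the Parallel.ForEach callbacks run on multiple threads at once, so any shared failure log needs a thread-safe collection. Here is a minimal sketch of one way to do it with ConcurrentQueue (the FailureLog name and the fake URL check are my own illustration, not from the original tool):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class FailureLog
{
    // ConcurrentQueue is safe to write from multiple threads without locks
    static readonly ConcurrentQueue<string> _failures =
        new ConcurrentQueue<string>();

    public static void Main()
    {
        var urls = new[] { "/good", "/bad", "/worse" };

        Parallel.ForEach(urls, url =>
        {
            try
            {
                // stand-in for client.DownloadData(...)
                if (url != "/good")
                    throw new InvalidOperationException("404");
            }
            catch (Exception ex)
            {
                _failures.Enqueue(url + " : " + ex.Message);
            }
        });

        // dump the failures once all workers have finished
        foreach (var failure in _failures)
            Console.WriteLine(failure);
    }
}
```

Draining the queue after Parallel.ForEach returns is safe because ForEach blocks until every iteration completes.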

Why All The Lambdas?

Tuesday, November 27, 2012 by K. Scott Allen
24 comments

"Why All The Lambdas?" is a question that comes up with ASP.NET MVC.

@Html.TextBoxFor(model => model.Rating)

Instead of lambdas, why don't we just pass a property value directly?

@Html.TextBoxFor(Model.Rating)

Aren't we simply trying to give a text input the value of the Rating property?


There is more going on here than meets the eye, so let's …

Start From The Top

If all we need is a text box with a value set to the Rating property, then all we need to do is write the following:

<input type="text" value="@Model.Rating" />

But the input is missing an important attribute: the name attribute, which allows the browser to associate the value of the input with a name when it sends information back to the server for processing. In ASP.NET MVC we typically want the name of an input to match the name of a property or action parameter so model binding will work (we want the name Rating, in this case). The name is easy enough to add, but we need to remember to keep it in sync if the property name ever changes.

<input type="text" value="@Model.Rating" name="Rating"/>

Creating the input manually isn't difficult, but there are still some open issues:

- If Model is null, the code will fail with an exception.

- We aren't providing any client side validation support.

- We have no ability to display an alternate value (like an erroneous value a user entered on a previous form submission – we want to redisplay the value, show an error message, and allow the user to fix simple typographical errors).

To address these scenarios we'd need to write more code and use if statements each time we displayed a text input. Since we want to keep our views simple, we'll package up the code inside an HTML helper.

Without Lambdas

Custom HTML helpers are easy to create.  Let's start with a naïve helper to create a text input.

// this code demonstrates a point, don't use it
public static IHtmlString MyTextBox<T>(
    this HtmlHelper<T> helper, object value, string name)
{
    var builder = new TagBuilder("input");
    builder.MergeAttribute("type", "text");
    builder.MergeAttribute("name", name);
    builder.MergeAttribute("value", value.ToString());

    return new HtmlString(
        builder.ToString(TagRenderMode.SelfClosing)
    );
}

The way you'd call the helper is to give it a value and a name:

@Html.MyTextBox(Model.Rating, "Rating")

There isn't much going on inside the helper, and it doesn't address any of the scenarios we thought about earlier. We'll still get an exception in the view if Model is null, so instead of forcing the view to give the helper a value (using Model.Rating), we'll have the view pass the name of the model property to use.

@Html.MyTextBox("Rating")

Now the helper itself can check for null models, so we don't need branching logic in the view.

// this code demonstrates a point, don't use it for real work
public static IHtmlString MyTextBox<T>(
    this HtmlHelper<T> helper, string propertyName)
{
    var builder = new TagBuilder("input");
    builder.MergeAttribute("type", "text");
    builder.MergeAttribute("name", propertyName);

    var model = helper.ViewData.Model;
    var value = "";

    if (model != null)
    {
        var modelType = typeof(T);
        var propertyInfo = modelType.GetProperty(propertyName);
        var propertyValue = propertyInfo.GetValue(model);
        value = propertyValue.ToString();
    }

    builder.MergeAttribute("value", value);

    return new HtmlString(
        builder.ToString(TagRenderMode.SelfClosing)
    );
}

The above helper is not only using the propertyName parameter to set the name of the input, but it also uses propertyName to dig a value out of the model using reflection. We can build robust and useful HTML helpers just by passing property names as strings (in fact many of the built-in helpers, like TextBox, accept string parameters to specify the property name). Giving the helper a piece of data (the property name) instead of giving the helper a raw value grants the helper more flexibility.

The problem is the string literal "Rating". Many people prefer strong typing, and this is where lambda expressions can help.

With Lambdas

Here is a new version of the helper using a lambda expression.

// this code demonstrates a point, don't use it for real work
public static IHtmlString MyTextBox<T>(
    this HtmlHelper<T> helper,
    Func<T, object> propertyGetter,
    string propertyName)
{
    var builder = new TagBuilder("input");
    builder.MergeAttribute("type", "text");
    builder.MergeAttribute("name", propertyName);

    var value = "";
    var model = helper.ViewData.Model;

    if (model != null)
    {
        value = propertyGetter(model).ToString();
    }

    builder.MergeAttribute("value", value);

    return new HtmlString(
        builder.ToString(TagRenderMode.SelfClosing)
    );
}

Notice the reflection code is gone, because we can use the lambda expression to retrieve the property value at the appropriate time.  But look at how we use the helper in a view, and we'll see it is a step backwards.


@Html.MyTextBox(m => m.Rating, "Rating")

Yes, we have some strong typing, but we have to specify the property name as a string. Although the helper can use the lambda expression to retrieve a property value, the lambda doesn't give the helper any data to work with – just executable code. The helper doesn't know the name of the property the code is using. This is the point where Expression<T> is useful.

With Expressions

This version of the helper will wrap the incoming lambda with Expression<T>. The Expression<T> data type is magical. Instead of giving the helper executable code, an expression will force the C# compiler to give the helper a data structure that describes the code (my article on C# and LINQ describes this in more detail).

The HTML helper can use the data structure to find all sorts of interesting things, like the property name, and given the name it can get the value in different ways.

// this code demonstrates a point, don't use it for real work
public static IHtmlString MyTextBox<T, TResult>(
    this HtmlHelper<T> helper,
    Expression<Func<T, TResult>> expression)
{
    var builder = new TagBuilder("input");
    builder.MergeAttribute("type", "text");

    // note – not always this simple
    var body = expression.Body as MemberExpression;
    var propertyName = body.Member.Name;
    builder.MergeAttribute("name", propertyName);

    var value = "";
    var model = helper.ViewData.Model;
    if (model != null)
    {
        var modelType = typeof(T);
        var propertyInfo = modelType.GetProperty(propertyName);
        var propertyValue = propertyInfo.GetValue(model);
        value = propertyValue.ToString();
    }

    builder.MergeAttribute("value", value);

    return new HtmlString(
        builder.ToString(TagRenderMode.SelfClosing)
    );
}

The end result is that the HTML helper gets enough information from the expression that all we need to do is pass a lambda to the helper:


@Html.MyTextBox(m => m.Rating)
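The "note – not always this simple" comment in the helper deserves a footnote. When the lambda's result is converted to object (or any boxing conversion), the compiler wraps the member access in a Convert node, so the cast to MemberExpression comes back null. Here is a hedged sketch of the extra unwrapping step (the ExpressionHelper name is my own, not from the post):

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionHelper
{
    // Digs the property name out of expressions like m => m.Rating,
    // unwrapping the Convert node the compiler adds for value types.
    public static string GetPropertyName<T>(Expression<Func<T, object>> expression)
    {
        var body = expression.Body as MemberExpression;
        if (body == null)
        {
            // int, DateTime, etc. arrive as Convert(m.Rating)
            var unary = expression.Body as UnaryExpression;
            if (unary != null)
            {
                body = unary.Operand as MemberExpression;
            }
        }
        if (body == null)
        {
            throw new ArgumentException("Expression is not a member access");
        }
        return body.Member.Name;
    }
}
```

A reference-type property (like a string) arrives as a plain MemberExpression and skips the unwrapping branch entirely.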

Summary

The lambda expressions (of type Expression<T>) allow a view author to use strongly typed code, while giving an HTML helper all the data it needs to do the job.

Starting with the data inside the expression, an HTML helper can also check model state to redisplay invalid values, add attributes for client-side validation, and change the type of input (from text to number, for example) based on the property type and data annotations.

That's why all the lambdas.

Flood Filling In A Canvas

Wednesday, November 21, 2012 by K. Scott Allen
3 comments

Canvasfill is a demo for a friend who wants to flood fill a clicked area in an HTML 5 canvas.

A couple notes:

JavaScript loads a PNG image into the canvas when the page loads.

var img = new Image();
img.onload = function () {
    canvas.width = img.width;
    canvas.height = img.height;
    context.drawImage(this, 0, 0);
};
img.src = "thermometer_01.png";

The image and the JavaScript must load from the same domain for the sample to work, otherwise you’ll run into security exceptions (unless you try to CORS enable the image, which doesn’t work everywhere).

The code uses a requestAnimationFrame polyfill from Paul Irish for efficient animations.

The code uses getImageData and putImageData to get and color a single pixel on each iteration.

image = context.getImageData(point.x, point.y, 1, 1);
var pixel = image.data;

This is not the most efficient approach to using the canvas, so if you need speed you’ll want to look at grabbing the entire array of pixels. With the current approach it is easier to “see” how the flood fill algorithm works since you can watch as pixels change colors in specific directions.

The flood fill algorithm itself is an extremely primitive queue-based (non-recursive) algorithm. It doesn’t deal well with anti-aliased images, for example, so you might need to look at more advanced algorithms if the image is not a blocky clip art image or a screen shot of Visual Studio 2012 with the default color scheme.
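The queue-based idea is easy to see outside of a canvas, too. Here is a toy sketch in C# over a 2D array of "colors" rather than pixel data (my own illustration of the algorithm's shape, not the demo's actual JavaScript):

```csharp
using System;
using System.Collections.Generic;

public static class FloodFill
{
    // Replaces the connected region of grid values containing (x, y)
    // with 'replacement', using an explicit queue instead of recursion.
    public static void Fill(int[,] grid, int x, int y, int replacement)
    {
        int target = grid[y, x];
        if (target == replacement) return;

        var queue = new Queue<Tuple<int, int>>();
        queue.Enqueue(Tuple.Create(x, y));

        while (queue.Count > 0)
        {
            var point = queue.Dequeue();
            int px = point.Item1, py = point.Item2;

            // skip points outside the grid or outside the target region
            if (px < 0 || py < 0 ||
                py >= grid.GetLength(0) || px >= grid.GetLength(1))
                continue;
            if (grid[py, px] != target) continue;

            grid[py, px] = replacement;

            // visit the four neighbors (no diagonals)
            queue.Enqueue(Tuple.Create(px + 1, py));
            queue.Enqueue(Tuple.Create(px - 1, py));
            queue.Enqueue(Tuple.Create(px, py + 1));
            queue.Enqueue(Tuple.Create(px, py - 1));
        }
    }
}
```

The canvas version does the same thing, except the "is this the target color" test compares the RGBA values returned by getImageData, which is exactly where anti-aliased edges cause trouble.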

Two ASP.NET MVC 4 Courses

Tuesday, November 20, 2012 by K. Scott Allen
11 comments

Now on Pluralsight:

The ASP.NET MVC 4 Fundamentals training course spends most of its time on new features for version 4 of the framework, including:

- Mobile display modes, display providers, and browser overriding

- Async programming with C# 5 and the async / await keywords

- The WebAPI

- Bundling and minification with the Web Optimization bits

The Building Applications with ASP.NET MVC 4 training course is a start to finish introduction to programming with ASP.NET MVC 4. Some of the demos in the 7+ hours of content include:

- Using controllers, action results, action filters and routing

- Razor views, partial views, and layout views

- Models, view models, data annotations, and validation

- Custom validation attributes and self-validating models

- Entity Framework 5 code-first programming

- Entity Framework migrations and seeding

- Security topics including mass assignment and cross site request forgeries

- Using JavaScript and jQuery to add paging, autocompletion, async form posts, and async searches

- Taking control of Simple Membership

- Using OAuth and OpenID

- Caching, localization, and diagnostics

- Error logging with ELMAH

- Unit testing with Visual Studio 2012

- Deploying to IIS

- Deploying to a Microsoft Windows Azure web site

Enjoy!

GroupBy With Maximum Size

Monday, November 19, 2012 by K. Scott Allen
3 comments

I recently needed to group some objects, which is easy with GroupBy, but I also needed to enforce a maximum group size, as demonstrated by the following test.

public void Splits_Group_When_GroupSize_Greater_Than_MaxSize()
{
    var items = new[] { "A1", "A2", "A3", "B4", "B5" };

    var result = items.GroupByWithMaxSize(i => i[0], maxSize: 2);

    Assert.True(result.ElementAt(0).SequenceEqual(new[] { "A1", "A2" }));
    Assert.True(result.ElementAt(1).SequenceEqual(new[] { "A3" }));
    Assert.True(result.ElementAt(2).SequenceEqual(new[] { "B4", "B5" }));
}
The following code is not the fastest or cleverest solution, but it does make all the tests turn green.

public static IEnumerable<IEnumerable<T>> GroupByWithMaxSize<T, TKey>(
    this IEnumerable<T> source, Func<T, TKey> keySelector, int maxSize)
{
    var originalGroups = source.GroupBy(keySelector);

    foreach (var group in originalGroups)
    {
        if (group.Count() <= maxSize)
        {
            yield return group;
        }
        else
        {
            var regroups = group.Select((item, index) => new { item, index })
                .GroupBy(g => g.index / maxSize);
            foreach (var regroup in regroups)
            {
                yield return regroup.Select(g => g.item);
            }
        }
    }
}

In this case I don’t need the Key property provided by IGrouping, so the return type is a generically beautiful IEnumerable<IEnumerable<T>>.

Abstractions, Patterns, and Interfaces

Thursday, November 15, 2012 by K. Scott Allen
10 comments

Someone recently asked me how to go about building an application that processes customer information. The customer information might live in a database, but it also might live in a .csv file.

The interesting thing is I’m in the middle of building a tool for DBAs that one day will be a fancy WPF desktop application integrated with source control repositories and relational databases, but for the moment is a simple console application using only the file system for storage. Inside I’ve faced many scenarios similar to the question being asked, and these are scenarios I’ve faced numerous times over the years.

There are 101 ways to solve the problem, but let’s work through one solution together.

Getting Started

We might start by writing some tests, or we might start by jumping in and trying to display a list of all customers in a data source, but either way we’ll eventually find ourselves with the following code, which contains the essence of the question:

var customers = // how do I get customers?

To make things more concrete, let's take that line of code and put it in a class that will do something useful: a class that writes all customer names to the output of a console program.

public class CustomerDump
{
    public void Render()
    {
        var customers = // how ?
        foreach (var customer in customers)
        {
            Console.WriteLine(customer.Name);
        }
    }
}


Although we might not know how to retrieve the customer data, we probably do know what data we need about each customer. We’ll go ahead and define a Customer class for objects to hold customer data.

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Location { get; set; }
}

Now we can work on the main question. The business has told us we need to be flexible with the customer data, so how will we go about retrieving customers?

Defining an Interface

Interfaces are wonderful for a language like C#. Interfaces give us everything we need to work with an object in a strongly-typed manner, but place the least number of constraints on the object implementing the interface. Interfaces make the C# compiler happy without forcing us to pay an inheritance tax for working with a class hierarchy. We’ll define an interface that describes exactly how we want to fetch customers and how we want the customers packaged for us to consume.

public interface ICustomerDataSource
{
    IList<Customer> FetchAllCustomers();
}

There are many subtleties to interface design. Even the simple interface here required us to make a number of decisions.

First, what is the name of the operation? Do we want to FetchAllCustomers? SelectAllCustomers? GetCustomers? I believe names are important at this level, but you don’t want to give too much away. A name like SelectAllCustomers is biased towards working with a relational database, and we know we’ll be working with more than just a SQL database.

Often the name is influenced by what we know about the project and the business. Fortunately, refactoring tools make names easy to change.

Another design decision is the return type. When you are trying to abstract away some operation, you have to decide if you'll go for the lowest common denominator (anything can return IEnumerable), or something that might only be achieved by an advanced data source (like IQueryable). In this example we are forcing all implementations to return a list, which has some tradeoffs, but at least we know we'll be getting a specific type of data structure. IEnumerable would be targeting the lowest common denominator; it makes the interface easier to implement, but we might not have all the convenience features we need.

Once again, knowing a bit about the direction of the project and being in tune with the business needs will help in determining when to add flexibility and when to enforce constraints. 

Implementing the Interface

One question we might have had in the back of our mind is how to provide an implementation of the data loading interface when some implementations might need parameters like a database connection string, while other implementations might need file system details, like the path to the .csv file with the customers inside.

When designing an interface we need to put those thoughts in the back of our mind and focus entirely on the client's needs first. Just watch how this unfolds as we build a class to read customer data from a .csv file.

class CustomerCsvDataSource : ICustomerDataSource
{
    public CustomerCsvDataSource(string path)
    {
        _path = path;
    }

    public IList<Customer> FetchAllCustomers()
    {
        return File.ReadAllLines(_path)
            .Select(line => line.Split(','))
            .Select((values, index) =>
                new Customer
                {
                    Id = index,
                    Name = values[0],
                    Location = values[1]
                }).ToList();
    }

    readonly string _path;
}


This isn’t the most robust CSV parser in the world (it won’t deal with embedded commas, so we might want to get some help), but it does demonstrate a pattern I’ve been using over and over again recently: a class implements an interface, stores constructor parameters in read-only fields, exposes methods to implement the interface, and above all keeps things simple, small, and focused.

Here is the pattern again, this time in a class that uses Mark Rendle’s Simple.Data to access SQL Server, but we could do the same thing with raw ADO.NET, the Entity Framework, or even MongoDB.

class CustomerDbDataSource : ICustomerDataSource
{
    public CustomerDbDataSource(string connectionString)
    {
        _connectionString = connectionString;
    }

    public IList<Customer> FetchAllCustomers()
    {
        var db = Database.OpenConnection(_connectionString);
        return db.Customers.All().ToList<Customer>();
    }

    readonly string _connectionString;
}


We can see now that worrying about connection strings and file names while defining the interface was premature worrying. These were all implementation details the interface isn’t concerned with, as the interface only exposes the operations clients need, like the ability to fetch customers.

Instead, these classes are “programmed” with implementation specific instructions given by constructor parameters, and the instructions give them everything they need to do the work required by the interface. The classes never change the instructions (they are all saved in read-only fields), but they use the instructions to produce new results.

We have now reached the point where we have two different classes to deal with two different sources of data, but how do we use them?

Consuming the Interface

Returning to our CustomerDump class, one obvious approach to producing results is the following.

public class CustomerDump
{
    public void Render()
    {
        var dataSource = new CustomerCsvDataSource("customers.csv");
        var customers = dataSource.FetchAllCustomers();

        foreach (var customer in customers)
        {
            Console.WriteLine(customer.Name);
        }
    }
}

The above approach can work, but we’ve tied the CustomerDump class to the CSV data source by instantiating CustomerCsvDataSource directly. If we need CustomerDump to only work with a CSV data source, this is reasonable, but we know most of the application needs to work with different data sources so we’ll need to avoid this approach in most places.

Instead of CustomerDump choosing a data source and coupling itself to a specific class, we’ll force someone to give CustomerDump the data source to use.

public class CustomerDump
{
    public CustomerDump(ICustomerDataSource dataSource)
    {
        _dataSource = dataSource;
    }

    public void Render()
    {
        var customers = _dataSource.FetchAllCustomers();

        foreach (var customer in customers)
        {
            Console.WriteLine(customer.Name);
        }
    }

    readonly ICustomerDataSource _dataSource;
}

Now, any logic we have inside of CustomerDump can work with customers from anywhere, and we can add new data sources in the future. We’ve gained a lot of flexibility in an area where the business demands flexibility, and hopefully didn’t build a mountain of abstractions where none were required. All the pieces are small and focused, and the way they fit together depends on the application you are building. Which leads to the next question – who is responsible for putting CustomerDump together?

At the top level of every application built in this fashion you’ll have some bootstrapping code to arrange all the pieces and set them in motion. For a console mode application it might look like this:

static void Main(string[] args)
{
    // arrange
    var connectionString = @"server=(localdb)\v11.0;database=Customers";
    var dataSource = new CustomerDbDataSource(connectionString);
    var dump = new CustomerDump(dataSource);

    // execute
    dump.Render();
}

Here we have hard-coded values again, but you can imagine hard-coded connection strings and class names getting intermingled or replaced with if/else statements and settings from the app.config file. As the application becomes more complex, we could turn to tools like MEF or StructureMap to manage the construction of the building blocks we need.
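Before reaching for a container, the if/else-plus-settings stage the paragraph describes might look like the following sketch. The setting keys ("customerSource", "customerPath", "connectionString") are my invention, the article's classes are stubbed minimally so the sketch compiles on its own, and a real app would read ConfigurationManager.AppSettings instead of taking a dictionary:

```csharp
using System.Collections.Generic;

// the article's interface and classes, stubbed so the sketch is self-contained
public interface ICustomerDataSource { }

public class CustomerCsvDataSource : ICustomerDataSource
{
    public CustomerCsvDataSource(string path) { Path = path; }
    public readonly string Path;
}

public class CustomerDbDataSource : ICustomerDataSource
{
    public CustomerDbDataSource(string connectionString)
    {
        ConnectionString = connectionString;
    }
    public readonly string ConnectionString;
}

public static class Bootstrap
{
    // choose a data source from configuration; all setting keys are hypothetical
    public static ICustomerDataSource CreateDataSource(
        IDictionary<string, string> settings)
    {
        if (settings["customerSource"] == "csv")
        {
            return new CustomerCsvDataSource(settings["customerPath"]);
        }
        return new CustomerDbDataSource(settings["connectionString"]);
    }
}
```

The bootstrapping code in Main would then ask Bootstrap for a data source instead of newing one up directly, which is the seam a container like MEF or StructureMap eventually takes over.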

Going Further

One of the biggest challenges in building well factored software is knowing when to stop adding abstractions. For example, we can say the CustomerDump class is currently tied too tightly to Console.Out. To remove the dependency we’ll instead inject a Stream for CustomerDump to use.

public CustomerDump(ICustomerDataSource dataSource,
                    Stream output)
{
    _dataSource = dataSource;
    _output = new StreamWriter(output);
}

Alternatively, we could say CustomerDump shouldn’t be responsible for both getting and formatting each customer as well as sending the result to the screen. In that case we’ll just have CustomerDump create the formatted string, and leave it to the caller to decide what to do with the result.

public string CreateDump()
{
    var builder = new StringBuilder();
    var customers = _dataSource.FetchAllCustomers();

    foreach (var customer in customers)
    {
        builder.AppendFormat("{0} : {1}",
            customer.Name, customer.Location);
    }
    return builder.ToString();
}

Now we might look at the code and decide that getting and formatting are two different responsibilities, so we’ll need someone to pass the list of customers to format instead of having the method use the data source directly. And so on, and so on.

Where do we stop?

That’s where most samples break down because the right place to stop is the place where we have just enough abstraction to make things work and still meet our requirements for testability, maintainability, scalability, readability, extensibility, and all the other ilities we need. Samples like this can show you the patterns you can use to achieve specific results, but only in the context of a specific application do we know the results we need. We need to apply both YAGNI and SRP in the right places and at the right time.

Two Great New Conferences

Monday, November 12, 2012 by K. Scott Allen
0 comments

DevIntersection – Las Vegas

DevIntersection is the final stop in the .NET Rocks! road trip and runs from December 9th to the 12th in Las Vegas. In addition to conference sessions I’ll be doing an ASP.NET MVC 4 workshop on December 13th. Register by November 15th to receive a Windows 8 tablet!



Warm Crocodile – Copenhagen

The Warm Crocodile Developer Conference is a 2 day conference in Copenhagen, Denmark with a great selection of speakers and sessions. In the words of the organizers – “We want to create a brand, a conference brand that promises to deliver on set of great things, both people and content, but also, and just as much fun and partying and networking.”

I hope to see you there!
