Brute Force Might Work

Thursday, November 29, 2012

Problem: A content-heavy web site undergoes a massive reorganization. A dozen URL rewrite rules are added to issue permanent redirects and preserve permalinks across the two versions. The question is: will the rules really work?

Brute Force Approach: Take the last 7 days of web server logs, and write about 40 lines of C# code:

using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

static void Main()
{
    var baseUrl = new Uri("http://localhost/");
    var files = Directory.GetFiles(@"..\..\Data", "*.log");

    Parallel.ForEach(files, file =>
    {
        var requests = File.ReadAllLines(file)
            .Where(line => !line.StartsWith("#"))
            .Select(line => line.Split(' '))
            .Select(parts => new
            {
                Url = parts[6],
                Status = parts[16],
                Verb = parts[5]
            })
            .Where(request => request.Verb == "GET" &&
                              request.Status[0] != '4' &&
                              request.Status[0] != '5');

        foreach (var request in requests)
        {
            try
            {
                var client = new WebClient();
                client.DownloadData(new Uri(baseUrl, request.Url));
            }
            catch (Exception ex)
            {
                // .. log it (in a thread safe way)
            }
        }
    });
}

The little console-mode application isn't foolproof, but it did uncover a number of problems and edge cases. As a bonus, it also turned into a good workload generator for SQL Server index re-tuning.
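
The catch block in the listing leaves the logging as a comment. One possible way to fill it in, as a rough sketch rather than what the original program did, is to push each failure onto a ConcurrentQueue and print the results once the parallel loop has finished. The Failure class, the FailureLogDemo name, and the simulated failures are all made up for illustration; a real run would enqueue inside the catch around DownloadData instead.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical record type; not part of the original program.
class Failure
{
    public string Url { get; set; }
    public string Message { get; set; }
}

static class FailureLogDemo
{
    // Runs a stand-in workload in parallel and returns the collected failures.
    public static ConcurrentQueue<Failure> Run(int requestCount)
    {
        // ConcurrentQueue lets many threads Enqueue without explicit locking.
        var failures = new ConcurrentQueue<Failure>();

        Parallel.For(0, requestCount, i =>
        {
            try
            {
                // Assumption for the demo: every third "request" fails.
                if (i % 3 == 0)
                    throw new InvalidOperationException("simulated failure");
            }
            catch (Exception ex)
            {
                failures.Enqueue(new Failure { Url = "/page/" + i, Message = ex.Message });
            }
        });

        return failures;
    }

    static void Main()
    {
        var failures = Run(30);

        // Report on a single thread, after all the workers are done.
        foreach (var failure in failures.OrderBy(item => item.Url))
            Console.WriteLine(failure.Url + " -> " + failure.Message);

        Console.WriteLine(failures.Count + " failures");
    }
}
```

Collecting first and printing afterwards keeps Console output out of the hot loop, and the queue can just as easily be written to a file at the end.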

Matt, Friday, November 30, 2012
Great idea. Love it.
