Problem: A content-heavy web site undergoes a massive reorganization. A dozen URL rewrite rules are added to issue permanent redirects and preserve permalinks across the two versions of the site. The question is: will the rules really work?
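For context, a rule of that kind might look something like the sketch below. The post does not show the actual rules, so the rewrite module (IIS URL Rewrite in web.config), the match pattern, and the target URL here are all hypothetical.

<!-- Hypothetical sketch of one permanent-redirect rule (IIS URL Rewrite) -->
<rewrite>
  <rules>
    <rule name="OldArticlePermalink" stopProcessing="true">
      <match url="^articles/(\d+)\.aspx$" />
      <action type="Redirect" url="posts/{R:1}" redirectType="Permanent" />
    </rule>
  </rules>
</rewrite>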
Brute Force Approach: Take the last 7 days of web server logs, and write about 40 lines of C# code:
using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

class RedirectChecker
{
    static void Main()
    {
        var baseUrl = new Uri("http://localhost/");
        var files = Directory.GetFiles(@"..\..\Data", "*.log");

        Parallel.ForEach(files, file =>
        {
            // Parse the W3C log entries: skip comment lines and keep only
            // GET requests that did not already fail (the field positions
            // depend on how the server's logging is configured).
            var requests = File.ReadAllLines(file)
                .Where(line => !line.StartsWith("#"))
                .Select(line => line.Split(' '))
                .Select(parts => new
                {
                    Url = parts[6],
                    Status = parts[16],
                    Verb = parts[5]
                })
                .Where(request => request.Verb == "GET" &&
                                  request.Status[0] != '4' &&
                                  request.Status[0] != '5');

            // Replay every request against the reorganized site; a broken
            // rewrite rule shows up as a WebException on the download.
            foreach (var request in requests)
            {
                try
                {
                    using (var client = new WebClient())
                    {
                        client.DownloadData(new Uri(baseUrl, request.Url));
                    }
                }
                catch (Exception ex)
                {
                    // .. log it (in a thread safe way)
                }
            }
        });
    }
}
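One thing the replay loop above cannot confirm is the redirect itself: WebClient follows redirects automatically, so it proves the old URL still resolves to something, but not that the server issued a permanent (301) redirect or that it lands on the intended permalink. A minimal sketch of that extra check, assuming the same using directives as above and a hypothetical oldPath / expectedLocation pair taken from a rewrite rule, is to turn off AllowAutoRedirect on an HttpWebRequest and inspect the response directly:

static bool IsPermanentRedirect(Uri baseUrl, string oldPath, string expectedLocation)
{
    var request = (HttpWebRequest)WebRequest.Create(new Uri(baseUrl, oldPath));
    request.AllowAutoRedirect = false;   // inspect the redirect instead of following it

    using (var response = (HttpWebResponse)request.GetResponse())
    {
        // Note: the Location header may come back absolute or site-relative,
        // depending on how the rewrite rule is written.
        return response.StatusCode == HttpStatusCode.MovedPermanently &&
               response.Headers["Location"] == expectedLocation;
    }
}

A handful of old URL / expected URL pairs, one per rewrite rule, would be enough to drive this check alongside the log replay.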
The little console-mode application isn't foolproof, but it did uncover a number of problems and edge cases. As a bonus, it also turned into a good workload generator for SQL Server index re-tuning.