I’m looking at 5-7 files of data in CSV format. The columns and format can change every 3 to 6 months. What I wanted was a strongly typed wrapper / data transfer object for each of the CSV file formats I had to work with, and I thought this would be a good opportunity to try a T4 template.
If you haven’t heard of T4 templates before, T4 is a technology worth your time to investigate. Hanselman has a pile of links in his post T4 (Text Template Transformation Toolkit) Code Generation – Best Kept Visual Studio Secret.
Each CSV file I’m working with has a header row:
PID,CaseID,BirthDate,DischargeDate,Status …
1001,2001,1/1/1970,6/1/2009,Foo …
… and so on for ~ 100 columns.
I wanted a template that could look at every CSV file it finds in the same directory as itself and create a class with a string property for each column. The template would derive the name of the class from the filename. Here’s a starting point:
<#@ template language="c#v3.5" hostspecific="true" #>
<#@ import namespace="System.IO" #>
<#@ import namespace="System.Collections.Generic" #>
using System;

namespace Acme.Foo.CsvImport {
<#
    // Emit one class per CSV file found next to this template.
    foreach(var fileName in GetCsvFileNames())
    {
        var fields = GetFields(fileName);
        var className = GenerateClassName(fileName);
#>
    public class <#= className #> : ImportRecord
    {
        public <#= className #>(string data)
        {
            var fields = data.Split(',');
<#      for(int i = 0; i < fields.Length; i++) { #>
            <#= fields[i] #> = fields[<#= i #>];
<#      } #>
        }
<#      foreach(var field in fields) { #>
        public string <#= field #> { get; set; }
<#      } #>
    }
<#  } #>
} // end namespace
<#+
    // Class name: first two characters of the file name plus a suffix.
    private string GenerateClassName(string fileName)
    {
        return Path.GetFileName(fileName).Substring(0,2)
            + "_ImportRecord";
    }

    // Reads the header row and returns the column names.
    private string[] GetFields(string fileName)
    {
        using (var stream = new StreamReader(
            File.OpenRead(fileName)))
        {
            return stream.ReadLine().Split(',');
        }
    }

    // Finds every CSV file in the template's own directory.
    private IEnumerable<string> GetCsvFileNames()
    {
        var path = Host.ResolvePath("");
        return Directory.GetFiles(path, "*.csv");
    }
#>
The template itself needs some polishing, and the generated code needs some error handling, but this was enough to prove how easy it is to spit out type definitions from a CSV header:
public class AA_ImportRecord : ImportRecord
{
    public AA_ImportRecord(string data)
    {
        var fields = data.Split(',');
        PID = fields[0];
        CaseID = fields[1];
        BirthDate = fields[2];
        // ...
    }
    public string PID { get; set; }
    public string CaseID { get; set; }
    public string BirthDate { get; set; }
    // ...
}
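On the error handling point: as generated, a row with too few values fails with an IndexOutOfRangeException, and a row with extra values is silently truncated. Here is a minimal hand-written sketch of what the template could emit instead, using only the first three columns of the sample header. The empty ImportRecord base class and the ExpectedFieldCount constant are assumptions for illustration (the real base class isn't shown in this post), and Split(',') still won't cope with quoted values that contain commas.

```csharp
using System;

// Hypothetical stand-in for the ImportRecord base class, which this post does not show.
public abstract class ImportRecord { }

// Sketch of a generated class with a field-count check added to the constructor.
public class AA_ImportRecord : ImportRecord
{
    // Assumed constant: the template would emit the header's column count here.
    private const int ExpectedFieldCount = 3;

    public AA_ImportRecord(string data)
    {
        if (data == null) throw new ArgumentNullException("data");

        var fields = data.Split(',');
        if (fields.Length != ExpectedFieldCount)
        {
            // Fail loudly when the row does not match the header it was generated from.
            throw new FormatException(string.Format(
                "Expected {0} fields but found {1}.",
                ExpectedFieldCount, fields.Length));
        }

        PID = fields[0];
        CaseID = fields[1];
        BirthDate = fields[2];
    }

    public string PID { get; set; }
    public string CaseID { get; set; }
    public string BirthDate { get; set; }
}
```

With a check like this, a file whose columns changed since the template last ran fails on the first row with a message that says so, rather than quietly loading values into the wrong properties.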
What I Learned
- If you want to use the C# compiler that ships with .NET 3.5 (and with it LINQ and the other C# 3.0 features), make sure to use the correct template and assembly directives. Oleg Sych has the details: Understanding T4: <#@ template #> directive.
- When the template runs, it is compiled into a class derived from the TextTransformation class. Oleg’s post that I linked to above also has details on this magic. Use the <#+ #> syntax to add members (a.k.a. helper functions) to the class definition for this object.
- The Host property is of type ITextTemplatingEngineHost, and it is available because the template directive sets hostspecific="true". You can use the Host to log errors, find referenced assemblies, and more. In the template I had to use Host.ResolvePath to get the absolute path to the directory where the template file lives.
- The biggest issue I had was remembering to type <# and #> instead of <% and %>, and <#= #> instead of <%= %>. I created at least 100 errors because my fingers are trained for ASP.NET views instead of text templates.
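As an example of the kind of helper that could live in a <#+ #> block: the template uses each header cell directly as a property name, so a column called "Discharge Date" or "2009Total" would generate code that doesn't compile. Below is a rough sketch of a sanitizer for that. The HeaderNames class and ToIdentifier method are hypothetical names of my own, not part of the template above, and the sketch does not deal with C# keywords or duplicate column names.

```csharp
using System;
using System.Text;

public static class HeaderNames
{
    // Turns a raw CSV header cell into a string usable as a C# property name.
    // Hypothetical helper, shown as a static class so it compiles on its own.
    public static string ToIdentifier(string header, int columnIndex)
    {
        var builder = new StringBuilder();
        foreach (char c in (header ?? "").Trim())
        {
            // Keep letters, digits, and underscores; replace anything else.
            builder.Append(char.IsLetterOrDigit(c) || c == '_' ? c : '_');
        }

        // Empty header cell: fall back to a positional name.
        if (builder.Length == 0)
        {
            return "Column" + columnIndex;
        }

        // Identifiers cannot start with a digit.
        if (char.IsDigit(builder[0]))
        {
            builder.Insert(0, '_');
        }

        return builder.ToString();
    }
}
```

Inside a template you would paste just the method into the class feature block and run each value returned by GetFields through it before emitting the properties.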