A Simpler MapReduce with MongoDB and C#

Thursday, March 29, 2012

Continuing from the previous post about MapReduce with MongoDB, we can clean up the code a bit using an extension method that takes a single string parameter.

var db = server.GetDatabase("mytest");
var collection = db.GetCollection("movies");
var results = collection.MapReduce("CategorySummary");

foreach (var result in results.GetResults())
{
    Console.WriteLine(result.ToJson());
}

The calling code no longer needs to embed JavaScript in C# strings or specify the MapReduce options. Instead, the "CategorySummary" parameter tells the extension method everything it needs to know.

First, it can load the map, reduce, and finalize functions from JavaScript files on the file system, using the parameter to build the path to the files. Keeping the script code in .js files makes the functions easier to author, maintain, and test.

Second, it can automatically build the output options, and it assumes you want to send the MapReduce results to an output collection with the same name as the parameter. The code looks a bit like the following:

using System.IO;
using MongoDB.Driver;
using MongoDB.Driver.Builders;

public static class MapReduceExtensions
{
    public static MapReduceResult MapReduce(
        this MongoCollection collection, 
        string name)
    {
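        // By convention, the results go to an output collection named after the operation.
        var options = new MapReduceOptionsBuilder();
        options.SetOutput(name);
        // The finalize function is optional; use it only when finalize.js exists.
        if (FinalizeExists(name))
        {
            options.SetFinalize(ReadFinalizeFile(name));
        }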
        return collection.MapReduce(ReadMapFile(name), 
                                    ReadReduceFile(name), 
                                    options);
    }

    static string ReadMapFile(string name)
    {
        return File.ReadAllText(Path.Combine(ROOTDIR, name, "map.js"));
    }

    // ...... (remaining helpers, such as ReadReduceFile and ReadFinalizeFile, omitted)

    static bool FinalizeExists(string name)
    {
        return File.Exists(Path.Combine(ROOTDIR, name, "finalize.js"));
    }

    // Root folder for the script files, resolved relative to the working directory.
    private const string ROOTDIR = "mapreduce";
}
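
Because the scripts live in plain .js files, they can also be exercised without a MongoDB server. The sketch below is only an illustration of that idea: the function bodies, the Category and Minutes field names, and the runMapReduce helper are all assumptions made up for this example, and in a real project each function would sit alone in its own file (map.js, reduce.js, finalize.js) under the CategorySummary folder.

```javascript
// Hypothetical contents of map.js, reduce.js, and finalize.js for
// "CategorySummary", gathered into one file so they can run without a server.
// The Category and Minutes field names are assumptions for illustration.

function map() {
    // MongoDB invokes map with `this` bound to the current document.
    emit(this.Category, { count: 1, totalMinutes: this.Minutes });
}

function reduce(key, values) {
    // The result has the same shape as the emitted values,
    // so reduce can safely be re-applied to its own output.
    var result = { count: 0, totalMinutes: 0 };
    values.forEach(function (value) {
        result.count += value.count;
        result.totalMinutes += value.totalMinutes;
    });
    return result;
}

function finalize(key, value) {
    // Runs once per key, after all reducing is done.
    value.averageMinutes = value.totalMinutes / value.count;
    return value;
}

// Minimal in-memory stand-in for the server (hypothetical helper,
// not part of MongoDB or the C# driver).
function runMapReduce(docs, mapFn, reduceFn, finalizeFn) {
    var groups = {};
    // Expose emit as a global for the duration of the map phase.
    globalThis.emit = function (key, value) {
        (groups[key] = groups[key] || []).push(value);
    };
    try {
        docs.forEach(function (doc) { mapFn.call(doc); });
    } finally {
        delete globalThis.emit;
    }
    return Object.keys(groups).sort().map(function (key) {
        var values = groups[key];
        // Like MongoDB, skip reduce when a key has only one emitted value.
        var reduced = values.length === 1 ? values[0] : reduceFn(key, values);
        return { _id: key, value: finalizeFn ? finalizeFn(key, reduced) : reduced };
    });
}

// Made-up sample documents.
var movies = [
    { Title: "A", Category: "Drama", Minutes: 100 },
    { Title: "B", Category: "Drama", Minutes: 120 },
    { Title: "C", Category: "Comedy", Minutes: 90 }
];

var summary = runMapReduce(movies, map, reduce, finalize);
console.log(JSON.stringify(summary));
```

Running the file with Node prints one summary document per category, which makes it possible to check the arithmetic in reduce and finalize before the scripts ever reach the database.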

The idea is to make things like MapReduce as easy as possible by hiding some infrastructure and applying easy-to-learn conventions.