Debugging Map Reduce in MongoDB

Wednesday, February 11, 2015

There isn’t much insight into the execution of a map reduce script in MongoDB, but I’ve found three techniques to help. Of course the preferred technique for map reduce is to use declarative aggregation operators, but there are some problems that naturally lend themselves to copious amounts of imperative code. That’s the kind of debugging i needed to do recently.

Log File Debugging

In a Mongo script you can use print and printjson to send strings and objects into standard output. During a map reduce these functions don’t produce output on stdout, unfortunately, but the output will appear in the log file if the verbosity is set high enough. Starting mongod with a –vvvv flag works for me.

Log file output can be useful in some situations, but in general, digging through a log file created in high verbosity mode is difficult.

Attached To The Output Debugging

The best way I’ve found to debug map reduce scripts running inside Mongo is to attach logging data directly to the output of the map and reduce functions.

To debug map functions, this means you’ll emit an object that might have an array attached, like the following.

{
  "name": "Scott",
  "total" : 15,
  "events" : [
    "Step A worked",
    "Flag B is false",
    "More debugging here"
  ]
}

Inside the map function you can push debugging strings and objects into the events array. Of course the reduce function will have to preserve this debugging information, possibly by aggregating the arrays. However, if you are debugging the map function I’d suggest simplifying the process by not reducing and simply letting emitted objects pass through to the output collection. Another technique to do this is to emit using a key of new ObjectId, so each emitted object is in its own bucket.

As an aside, my favorite tool for poking around in Mongo data is Robomongo (works on OSX and Windows). Robomongo is shell oriented, so you can use all the Mongo commands you already know and love. 

robomongo for mongodb

Robomongo’s one shortcoming is in trying to edit system stored JavaScript. For that task I use MongoVue (Windows only, requires a  license to unlock some features).

Browser Debugging

By far the best debugging experience is to move a map or reduce function, along with some data, into a browser. The browser has extremely capable debugging tools where you can step through code and inspect variables, but there are a few things you’ll need to do in preparation.

1. Define any globals that the map function needs. At a minimum, this would be an emit function, which might be as simple as the following.

var result = null;

window.emit = function(id, value) {
    result = value;
};

2. Have a plan to manage ObjectId types on the client. With the C# driver I use the following ActionResult derived class to get raw documents to the browser with custom JSON settings to transform ObjectId fields into legal JSON. 

public class BsonResult : ActionResult
{
    private readonly BsonDocument _document;

    public BsonResult(BsonDocument document)
    {
        _document = document;
    }

    public override void ExecuteResult(ControllerContext context)
    {
        var settings = new JsonWriterSettings();
        settings.OutputMode = JsonOutputMode.Strict;

        var response = context.HttpContext.Response;
        response.ContentType = "application/json";                      
        response.Write(_document.ToJson(settings));
    }        
}

Note that using JsonOutputMode.Strict will give you a string that a browser can parse using JSON.parse, but it will change fields of type ObjectId into full fledged objects ({ “$oid”: “5fac…ffff”}). This behavior will create a problem if the map script ever tries to compare ObjectId fields by value (object1.id === object2.id will always be false). If the ObjectId creates a problem, the best plan, I think, is to walk through the document using reflection in the browser and change the fields into simple strings with the value of the ID.

Hope that helps!


Comments
Comments are closed.

My Pluralsight Courses

K.Scott Allen OdeToCode by K. Scott Allen
What JavaScript Developers Should Know About ECMAScript 2015
The Podcast!