Using Node.js To ETL Mongo Documents for Azure Table Storage

Tuesday, February 4, 2014

In a previous post we saw how to work with the Windows Azure package for Node.js and interact with blob storage.

Another Azure adventure I’ve been tinkering with is moving MongoDb documents into Azure Table Storage. There are a couple challenges. Like Mongo, Table Storage is unstructured, but there are some limitations (not that Mongo is without limitations, but the limitations are different).

First, table storage requires every entity stored inside to include a partition key and row key. The two keys form a unique ID for an entity inside of table storage. Another limitation is that table storage only supports a few primitive data types including bool, string, date, double, and int, but there is no ability to store a collection of items directly. Thus, moving a BSON document into table storage requires some transformations to make collections work.

Here is a brute force node script to move data from a Mongo data source into table storage. The script uses the core mongodb package for Mongo since the data work is relatively low level work.

var azure = require('azure');
var MongoClient = require('mongodb').MongoClient;

var tableCreated = function(error) {
    if(error) throw error;
    MongoClient.connect('mongodb://localhost/PatientDb', mongoConnected);
}

var mongoConnected = function(error, db){
    if(error) throw error;

    var collection = db.collection("Patients");
    var patientCursor = collection.find();
    patientCursor.each(function(error, patient){
        if(error) throw error;
        if(patient) {
            transformPatient(patient);    
            loadPatient(patient);
        }
        else{
            db.close();
        }
    });
};

var transformPatient = function(patient){    
    patient.PartitionKey = '' + (patient.HospitalId || '0');
    patient.RowKey = '' + patient._id;
    patient.Ailments = JSON.stringify(patient.Ailments);
    patient.Medications = JSON.stringify(patient.Medications);
    delete patient._id;    
};

var loadPatient = function(patient){    
    tableService.insertEntity('patients', patient, function(error){
        if(error) throw error;
        console.log("Loaded " + patient.RowKey);
    });
};

var retryOperations = new azure.ExponentialRetryPolicyFilter();
var tableService = azure.createTableService()
                        .withFilter(retryOperations);
tableService.createTableIfNotExists('patients', tableCreated);

The solution I’m currently tinkering with is transforming collections into strings using JSON. It will take some more time and experiments to know if this approach is workable for the real application.


Comments
Comments are now closed.
by K. Scott Allen K.Scott Allen
My Pluralsight Courses
The Podcast!