Thoughts On JavaScript Generators

Monday, February 17, 2014

In a previous post we used a Node.js script to copy a MongoDB collection into Azure Table Storage. The program flow is hard to follow because the asynch APIs of both drivers require us to butcher the procedural flow of steps into callback functions. For example, consider the simple case of executing three procedural steps:

  • Open a connection to MongoDB
  • Close the connection to MongoDB
  • Print “Finished!”

In code, the steps aren’t as straightforward:

var mongo = require('mongodb').MongoClient;

var main = function(){
    mongo.connect('mongodb://localhost', mongoConnected);
};

var mongoConnected = function(error, db){
    if(error) console.dir(error);
    
    db.close();
    console.log('Finished!');
};

main();

Some amount of mental effort is required to see that the mongoConnected function is a continuation of main. Callbacks are painful.

Make Me No Promises

broken promisesWhen anyone complains about the pain of callback functions, the usual recommendation is to use promises. Promises have been around for a number of years and there are a variety of promise libraries available for both Node and client scripting environments, with Q being a popular package for node.

Promises are useful, particularly when they destroy a pyramid of doom. But promises can’t un-butcher the code sliced into functions to meet the demands of an asynch API. Also, promises require shims to transform a function expecting a callback into a function returning a promise, which is what Q.nfcall will do in the following code.

var Q = require('Q');
var mongo = require('mongodb').MongoClient;

var main = function(){
    Q.nfcall(mongo.connect, 'mongodb://localhost2')
     .then(mongoConnected)
     .catch(function(error){
        console.dir(error);
     });
};

var mongoConnected = function(db){    
    db.close();
    console.log('Finished!');
};

main();

I’d say promises don’t improve the read-ability or write-ability of the code presented so far (though we can debate the usefulness if multiple calls to then are required). This is the current state of JavaScript, but JavaScript is evolving.

Generators

ECMAScript 6 (a.k.a Harmony) introduces the yield keyword. Anyone with some programming experience in C# or Python (and a number of other languages, except Ruby) will already be familiar with how yield can suspend execution of a function and return control (and a value) to the caller. At some later point, execution can return to the point just after yield occurred. In ES6, functions using the yield keyword are known as generator functions and have a special syntax (function*), as the following code demonstrates.

"use strict";

var numberGenerator = function*(){
    yield 1;
    yield 2;
    console.log("About to go to 10");
    yield 10;
};

for(let i of numberGenerator()){
    console.log(i);
};

A couple of other notes about the above code:

  1. The new let keyword gives us proper block scoping of variables.
  2. The new for-of statement allows looping over iterable objects. Arrays will be iterable, as will generator functions. 
  3. You’ll need to run Node version > 0.11 (currently “unstable”) with the –harmony flag for the code to execute.

The code will print:

1
2
About to go to 10
10

Instead of using for-of, a low level approach to working with generator functions is to work with the iterator they return. The following code will produce the same output.

let sequence = numberGenerator();
let result = sequence.next();
while(!result.done){   
    console.log(result.value);
    result = sequence.next();
}

However, what is more interesting about working with iterator objects at this level is how you can pass a value to the next method, and the value will become the result of the last yield expression inside the generator. The ability to modify the internal state of the generator is fascinating. Consider the following code, which generates 1, 2, and 20.

var numberGenerator = function*(){
    let result = yield 1;
    result = yield 2 * result;    
    result = yield 10 * result;    
};

let sequence = numberGenerator();
let result = sequence.next();
while(!result.done){  
    console.log(result.value); 
    result = sequence.next(result.value);
}

The yield keyword has interesting semantics because not only does the word itself imply the giving up of control (as in ‘I yield to temptation’), but also the production of some value (‘My tree will yield fruit’), and now we also have the ability to communicate back into the yield expression to produce a value from the caller. 

Imagine then using yield in a generator function to produce promise objects. The generator function can yield one or more promises and suspend execution to let the caller wait for the resolution of the promise. The caller can iterate over multiple promises from the generator and process each one in turn to push a result into the yield expressions.  What would it take to turn a dream into a reality?

Generators and Promises

It turns out that Q already includes an API for working with Harmony generator functions, specifically the the spawn method will immediately execute a generator. The spawn method allows us to un-butcher the code with three simple steps.

"use strict";

var Q = require('Q');
var mongo = require('mongodb').MongoClient;

var main = function*(){
    try {
        var db = yield Q.nfcall(mongo.connect, 'mongodb://localhost');
        db.close();
        console.log("Finished!");
    }
    catch(error) {
        console.dir(error);
    }
};

Q.spawn(main);

Not only does spawn un-butcher the code, but it also allows for simpler error handling as a try catch can now surround all the statements of interest. To gain a deeper understanding of spawn you can write your own. The following code is simple and makes two assumptions. First, it assumes there are no errors, and secondly it assumes all the yielded objects are promises.

var spawn = function(generator){
    
    var process = function(result){        
        
        result.value.then(function(value){
            if(!result.done) {
                process(sequence.next(value));   
            }
        });
    };

    let sequence = generator();
    let next = sequence.next();
    process(next);
};

spawn(main);

While the syntax is better, it is unfortunate that promises aren’t baked into all modules, as the shim syntax is ugly and changes the method signature. Another approach is possible, however.

Generators Without Promises

The suspend module for Node can work with or without promises, but for APIs without promises the solution is rather clever. The resume method can act as a callback factory that will allow suspend to work with yield.

var mongo = require('mongodb').MongoClient;
var suspend = require('suspend');
var resume = suspend.resume;

var main = function*(){
    try {
        let db = yield mongo.connect('mongodb://localhost', resume());
        db.close();        
        console.log("Finished!");
    }
    catch(error) {
        console.dir(error);
    }    
};

suspend.run(main);

And now the original ETL script looks like the following.

var azure = require('azure');
var mongo = require('mongodb').MongoClient;
var suspend = require('suspend');
var resume = suspend.resume;

var storageAccount = '...';
var storageAccessKey = '...';

var main = function *() {
    
    let retryOperations = new azure.ExponentialRetryPolicyFilter();
    let tableService = azure.createTableService(storageAccount, storageAccessKey)
                            .withFilter(retryOperations);
    yield tableService.createTableIfNotExists("patients", resume());

    let db = yield mongo.connect('mongodb://localhost/PatientDb', resume());
    let collection = db.collection('Patients');
    let patientCursor = collection.find();

    patientCursor.each(transformAndInsert(db, tableService));
};

var transformAndInsert = function(db, table){
    return function(error, patient){
        if (error) throw error;
        if (patient) {
            transform(patient);
            suspend.run(function*() {
                yield table.insertEntity('patients', patient, resume());
            });
            console.dir(patient);
        } else {
            db.close();
        }
    };
};

var transform = function(patient){    
    patient.PartitionKey = '' + (patient.HospitalId || '0');
    patient.RowKey = '' + patient._id;
    patient.Ailments = JSON.stringify(patient.Ailments);
    patient.Medications = JSON.stringify(patient.Medications);
    delete patient._id;    
};

suspend.run(main);

And this approach I like.


Comments
gravatar Chris Jones Monday, February 17, 2014
Q is awesome, make it so easy work with async API's and to make modular re-usable objects that can be put together using Q as glue. Because promises are object you can just return them from a function, which can in turn compose multiple promises. Doing this lets you build high level function (connect, close) that hide the noise of low level calls. Gist here --> https://gist.github.com/cajones/9052737
gravatar scott Monday, February 17, 2014
@Chris yes, that's the type code I'm trying to avoid, or at least explore alternatives.
gravatar Rob Monday, February 17, 2014
Scott have you seen this libarary? http://taskjs.org/
gravatar scott Monday, February 17, 2014
@Rob: I did look at it quickly, but if you recommend it I'll take another look.
gravatar Oran Monday, February 17, 2014
Sorry to jump on the "have you seen" bandwagon, but have you seen the Bluebird promises library? You can do Promise.promisifyAll(mongo), which adds *Async versions of the callback methods, allowing you to use it as "yield mongo.connectAsync('...');" inside a call to Promise.spawn(). Bluebird is orders of magnitude faster than Q, so well worth checking out. See this for details: http://spion.github.io/posts/why-i-am-switching-to-promises.html
gravatar Ryan Cromwell Thursday, February 20, 2014
Ruby does have yield https://gist.github.com/cromwellryan/9107304
gravatar scott Thursday, February 20, 2014
@Ryan: Yes, I specifically called out Ruby because Ruby's yield is different (not saying it's bad, but invoking a procedure object will feel upside down to someone who is used to other languages that use yield as a keyword, I think).
gravatar Ajay Monday, February 24, 2014
Thanks, nice article.
gravatar Brian Reischl Tuesday, March 4, 2014
The result looks quite similar to the C# async & await keywords. The implementation is not completely different either - the structure seems (very) vaguely similar to the state machine that async/await creates under the covers.
Comments are now closed.
by K. Scott Allen K.Scott Allen
My Pluralsight Courses
The Podcast!