Some Basic Azure Table Storage Abstractions

Thursday, February 27, 2014 by K. Scott Allen

When working with any persistence layer you want to keep the infrastructure code separate from the business and UI logic, and working with Windows Azure Table Storage is no different. The WindowsAzure.Storage package provides a smooth API for working with tables, but not smooth enough to allow it into all areas of an application.

What I’m looking for is an API as simple to use as the following.

var storage = new WidgetStorage();
var widgets = storage.GetAllForFacility("TERRITORY2", "FACILITY3");
foreach (var widget in widgets)
{                
    Console.WriteLine(widget.Name);
}

The above code requires a little bit of work to abstract away connection details and query mechanics. First up is a base class for typed table storage access.

public class TableStorage<T> where T: ITableEntity, new()
{
    public TableStorage(string tableName, string connectionName = "StorageConnectionString")
    {
        var storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting(connectionName));
        var tableClient = storageAccount.CreateCloudTableClient();
        
        Table = tableClient.GetTableReference(tableName);
        Table.CreateIfNotExists();
    }

    public virtual string Insert(T entity)
    {
        var operation = TableOperation.Insert(entity);
        var result = Table.Execute(operation);
        return result.Etag;
    }

    // update, merge, delete, insert many ...
   
    protected CloudTable Table;
}

The base class can retrieve connection strings and abstract away TableOperation and BatchOperation work. It’s easy to extract an interface definition if you want to work with an abstract type. Meanwhile, derived classes can layer query operations into the mix.

public class WidgetStorage : TableStorage<Widget>
{
    public WidgetStorage()
        : base(tableName: "widgets")
    {

    }

    public IEnumerable<Widget> GetAll()
    {
        var query = new AllWidgets();
        return query.ExecuteOn(Table);
    }

    // ...

    public IEnumerable<Widget> GetAllForFacility(string territory, string facility)
    {
        var query = new AllWidgetsInFacility(territory, facility);
        return query.ExecuteOn(Table);
    }       
}

The actual query definitions I like to keep as separate classes.

public class AllWidgetsInFacility : StorageQuery<Widget>
{
    public AllWidgetsInFacility(string territory, string facility)
    {
        Query =
            Query.Where(InclusiveRangeFilter(
                key: "PartitionKey",
                from: territory + "-" + facility,
                to: territory + "-" + facility + "."));
    }
}

Separate query classes allow a base class to focus on query execution, including the management of continuation tokens, timeout and retry policies, as well as query helper methods using TableQuery. The base class also allows for easy testability via the virtual ExecuteOn method.

public class StorageQuery<T> where T:TableEntity, new()
{
    protected TableQuery<T> Query;
        
    public StorageQuery()
    {
        Query = new TableQuery<T>();
    }

    public virtual IEnumerable<T> ExecuteOn(CloudTable table)
    {
        // Execute the query one segment at a time, following the
        // continuation token until there are no more results.
        TableContinuationToken token = null;
        do
        {
            var segment = table.ExecuteQuerySegmented(Query, token);
            foreach (var result in segment)
            {
                yield return result;
            }
            token = segment.ContinuationToken;
        } while (token != null);
    }

    protected string InclusiveRangeFilter(string key, string from, string to)
    {
        var low = TableQuery.GenerateFilterCondition(key, QueryComparisons.GreaterThanOrEqual, from);
        var high = TableQuery.GenerateFilterCondition(key, QueryComparisons.LessThanOrEqual, to);
        return TableQuery.CombineFilters(low, TableOperators.And, high);
    }       
}

As an aside, one of the most useful posts on Azure Table storage is now almost 3 years old but contains many good nuggets of information. See: How to get (the) most out of Windows Azure Tables.

Easy Animations For AngularJS With Animate.css

Tuesday, February 25, 2014 by K. Scott Allen

Animations in AngularJS can be slightly tricky. First you need to learn about the classes that Angular adds to an element during an animated event, and then you have to write the correct CSS to perform an animation. There are also special cases to consider such as style rules that require !important and Angular’s rule of cancelling nested animations.

There is a detailed look at animations on yearofmoo, but the basic premise is that Angular will add and remove CSS classes to DOM elements that are entering, leaving, showing, hiding, and moving.

First, Angular adds a class to prepare the animation. For example, when a view is about to become active, Angular adds an ng-enter class. This class represents a preparation phase where a stylesheet can apply the transition rule and identify which properties to transition and how long the transition should last, as well as the initial state of the element. Opacity 0 is a good starting point for a fade animation.

div[ng-view].ng-enter {
    transition: all 0.5s linear;
    opacity: 0;
}

Next, Angular will apply a class to activate the animation, in this case .ng-enter-active.

div[ng-view].ng-enter-active {
    opacity: 1;
}

Angular will inspect the computed styles on an element to see how long the transition lasts, and automatically remove .ng-enter and .ng-enter-active when the animation completes. There is not much required for a simple animation like this.

With Animate.css

Animate.css is to transitions what Bootstrap is to layout, which means it comes with a number of pre-built and easy to use styles. Animate uses keyframe animations, which specify the start, end, and in-between points of what an element should look like. Although Animate is not tied to Angular, keyframes make Angular animations easier because there is no need to specify the “preparation” phase. Also, complicated animations roll up into a single keyframe name.

So, for example, the previous 7 lines of CSS for animating the entrance of a view become the following 4 lines of code, which not only fade in an element, but give it a natural bounce.

div[ng-view].ng-enter {
    -webkit-animation: fadeInRight 0.5s;
    animation: fadeInRight 0.5s;
}

The ng-hide and ng-show directives need a little more work to function correctly. These animations use “add” and “remove” classes, and adding !important is the key to overriding the default ng-hide style of display:none.

.ng-hide-remove {
    -webkit-animation: bounceIn 2.5s;
    animation: bounceIn 2.5s;
}

.ng-hide-add {
    -webkit-animation: flipOutX 2.5s;
    animation: flipOutX 2.5s;
    display: block !important;
}

Hope that helps!

Symmetric Encryption Benchmarks with C#

Monday, February 24, 2014 by K. Scott Allen

It all started with a question: which approach will be faster, encrypting just the pieces of data in a file that need to be encrypted, or encrypting the entire file?

Take, for example, a books.xml file where the <author> element value must be encrypted. Is it faster to encrypt the entire file, or just the individual author elements, or will it not matter because disk IO is a huge bottleneck?

These are the types of scenarios that are easy to hypothesize over, but it’s also easy to whip up some code to produce quantitative answers and put the answers in colorful charts that can be printed on glossy paper and hung on an office wall where a visitor’s natural reaction will be to ask “what’s this?”, at which point you can bombard your visitor with minutiae about symmetric encryption initialization vectors and after 10 minutes they will want to leave without remembering that they had first come into your office to ask about that nasty work item #9368, which you still haven’t fixed.

That’s called victory.

But as for the code, first we need a method to time some arbitrary action over a number of iterations.

 private static void Time(Action action, string description)
 {
     var stopwatch = new Stopwatch();
     stopwatch.Start();

     for (int i = 0; i < IterationCount; i++)
     {
         action();
     }

     stopwatch.Stop();
     Console.WriteLine("{0} took {1}ms", description, stopwatch.ElapsedMilliseconds);
 }

Then some code to encrypt an entire file.

private static void EncryptFileTest()
{
    var provider = new AesCryptoServiceProvider();            
    var encryptor = provider.CreateEncryptor(provider.Key, provider.IV);
    using (var destination = File.Create("..\\..\\temp.dat"))
    {                
        using (var cryptoStream = new CryptoStream(destination, encryptor, CryptoStreamMode.Write))
        {
            var data = File.ReadAllBytes(FileName);
            cryptoStream.Write(data, 0, data.Length);
        }                                
    }            
}

Then some code to encrypt just the author fields.

private static void EncryptFieldsTest()
{
    var provider = new AesCryptoServiceProvider();
    var encryptor = provider.CreateEncryptor(provider.Key, provider.IV);
    var document = XDocument.Load(FileName);
    var names = document.Descendants("author");

    foreach (var element in names)
    {
        using (var destination = new MemoryStream())
        {
            using (var cryptoStream = new CryptoStream(destination, encryptor, CryptoStreamMode.Write))
            {
                using (var cryptoWriter = new StreamWriter(cryptoStream))
                {
                    cryptoWriter.Write(element.Value);
                }
                element.Value = Convert.ToBase64String(destination.ToArray());
            }
        }
    }

    document.Save("..\\..\\temp.xml");
}

The results of the benchmark on the small books.xml file (28 KB) showed that encrypting individual fields generally came out 3-25% faster than encrypting an entire file.

[chart: field vs. whole-file encryption timings]

Such wide variances made me suspect that disk I/O was too unpredictable, so I also ran tests where the timing took place on in-memory operations only, and all disk I/O took place before the encryption work, as in the following code.

var bytes = File.ReadAllBytes(FileName);
var document = XDocument.Load(FileName);
var elements = document.Descendants("author").ToList();

Time(() => EncryptFileTest(bytes), "EncryptFile");
Time(() => EncryptFieldsTest(elements), "EncryptFields");

Now the results started to show that encrypting one big thing was regularly 20% faster than encrypting lots of little things.

[chart: in-memory benchmark results]

The larger the input data, the faster it became to encrypt all at once.

[chart: results for larger input files]

Then after playing with various parameters, like different provider modes, an amazing thing happened. I switched from AesCryptoServiceProvider (which provides an interface to the native CAPI libraries) to AesManaged (which is a managed implementation of AES and not FIPS compliant, but that’s a topic for another post). Encrypting the entire file was 6x slower with managed code compared to CAPI, which wasn’t the surprising part. The surprising part was that encrypting fields with AesManaged was much faster than encrypting the entire file with AesManaged. In fact, encrypting fields with AesManaged was almost twice as fast as encrypting fields with AesCryptoServiceProvider, and almost as fast as encrypting the entire file with a CSP.

[chart: AesManaged vs. AesCryptoServiceProvider results]

After double checking to make sure this wasn’t a fluke, I came to three conclusions.

1. Once again, benchmarks prove more useful than a hypothesis, because the numbers are often counterintuitive.

2. It must be much more efficient to reuse an AesManaged provider to create multiple crypto streams than to reuse an AES CSP.

3. There is still enough variability that testing against sample data like books.xml won’t cut it; I’ll need to work against real files (which might easily hit 500MB, maybe 1GB, but I hope not).

This is the point where people smarter than me will tell me everything I’ve done wrong.

Thoughts On JavaScript Generators

Monday, February 17, 2014 by K. Scott Allen

In a previous post we used a Node.js script to copy a MongoDB collection into Azure Table Storage. The program flow is hard to follow because the async APIs of both drivers require us to butcher the procedural flow of steps into callback functions. For example, consider the simple case of executing three procedural steps:

  • Open a connection to MongoDB
  • Close the connection to MongoDB
  • Print “Finished!”

In code, the steps aren’t as straightforward:

var mongo = require('mongodb').MongoClient;

var main = function(){
    mongo.connect('mongodb://localhost', mongoConnected);
};

var mongoConnected = function(error, db){
    if(error) console.dir(error);
    
    db.close();
    console.log('Finished!');
};

main();

Some amount of mental effort is required to see that the mongoConnected function is a continuation of main. Callbacks are painful.
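The pain compounds as the flow grows. Here is a sketch of the same idea extended with two more steps; connect, fetchDocs, and save are hypothetical synchronous stand-ins for real driver calls (made up so the sketch runs with no dependencies), but the shape is what matters: every additional sequential step pushes the logic one level deeper.

```javascript
// Hypothetical stand-ins for async driver APIs. They invoke their
// callbacks synchronously so this sketch runs without any dependencies.
var connect = function(url, callback) { callback(null, { url: url }); };
var fetchDocs = function(db, callback) { callback(null, ['doc1', 'doc2']); };
var save = function(docs, callback) { callback(null, docs.length); };

var log = [];

connect('mongodb://localhost', function(error, db){
    if(error) return console.dir(error);
    fetchDocs(db, function(error, docs){
        if(error) return console.dir(error);
        save(docs, function(error, count){
            if(error) return console.dir(error);
            log.push('Saved ' + count + ' documents');
            log.push('Finished!');
        });
    });
});

console.log(log.join('\n'));
```

Five steps would nest five levels deep, and the error handling repeats at every level.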

Make Me No Promises

When anyone complains about the pain of callback functions, the usual recommendation is to use promises. Promises have been around for a number of years, and there are a variety of promise libraries available for both Node and client scripting environments, with Q being a popular package for Node.

Promises are useful, particularly when they destroy a pyramid of doom. But promises can’t un-butcher the code sliced into functions to meet the demands of an async API. Also, promises require shims to transform a function expecting a callback into a function returning a promise, which is what Q.nfcall does in the following code.

var Q = require('q');
var mongo = require('mongodb').MongoClient;

var main = function(){
    Q.nfcall(mongo.connect, 'mongodb://localhost')
     .then(mongoConnected)
     .catch(function(error){
        console.dir(error);
     });
};

var mongoConnected = function(db){    
    db.close();
    console.log('Finished!');
};

main();

I’d say promises don’t improve the readability or writability of the code presented so far (though we can debate the usefulness when multiple calls to then are required). This is the current state of JavaScript, but JavaScript is evolving.

Generators

ECMAScript 6 (a.k.a. Harmony) introduces the yield keyword. Anyone with some programming experience in C# or Python (and a number of other languages, though not Ruby) will already be familiar with how yield can suspend execution of a function and return control (and a value) to the caller. At some later point, execution can return to the point just after the yield occurred. In ES6, functions using the yield keyword are known as generator functions and have a special syntax (function*), as the following code demonstrates.

"use strict";

var numberGenerator = function*(){
    yield 1;
    yield 2;
    console.log("About to go to 10");
    yield 10;
};

for(let i of numberGenerator()){
    console.log(i);
};

A couple of other notes about the above code:

  1. The new let keyword gives us proper block scoping of variables.
  2. The new for-of statement allows looping over iterable objects. Arrays will be iterable, as will generator functions. 
  3. You’ll need to run Node version 0.11 (currently “unstable”) or later with the --harmony flag for the code to execute.

The code will print:

1
2
About to go to 10
10

Instead of using for-of, a low level approach to working with generator functions is to work with the iterator they return. The following code will produce the same output.

let sequence = numberGenerator();
let result = sequence.next();
while(!result.done){   
    console.log(result.value);
    result = sequence.next();
}

However, what is more interesting about working with iterator objects at this level is how you can pass a value to the next method, and the value will become the result of the last yield expression inside the generator. The ability to modify the internal state of the generator is fascinating. Consider the following code, which generates 1, 2, and 20.

var numberGenerator = function*(){
    let result = yield 1;
    result = yield 2 * result;    
    result = yield 10 * result;    
};

let sequence = numberGenerator();
let result = sequence.next();
while(!result.done){  
    console.log(result.value); 
    result = sequence.next(result.value);
}

The yield keyword has interesting semantics because not only does the word itself imply the giving up of control (as in ‘I yield to temptation’), but also the production of some value (‘My tree will yield fruit’), and now we also have the ability to communicate back into the yield expression to produce a value from the caller. 

Imagine then using yield in a generator function to produce promise objects. The generator function can yield one or more promises and suspend execution to let the caller wait for the resolution of the promise. The caller can iterate over multiple promises from the generator and process each one in turn to push a result into the yield expressions.  What would it take to turn a dream into a reality?

Generators and Promises

It turns out that Q already includes an API for working with Harmony generator functions; specifically, the spawn method will immediately execute a generator. The spawn method allows us to un-butcher the code for the three simple steps.

"use strict";

var Q = require('q');
var mongo = require('mongodb').MongoClient;

var main = function*(){
    try {
        var db = yield Q.nfcall(mongo.connect, 'mongodb://localhost');
        db.close();
        console.log("Finished!");
    }
    catch(error) {
        console.dir(error);
    }
};

Q.spawn(main);

Not only does spawn un-butcher the code, but it also allows for simpler error handling, as a try/catch can now surround all the statements of interest. To gain a deeper understanding of spawn you can write your own. The following code is simple and makes two assumptions: first, that there are no errors, and second, that all the yielded objects are promises.

var spawn = function(generator){

    var process = function(result){
        if(result.done) return;

        result.value.then(function(value){
            process(sequence.next(value));
        });
    };

    let sequence = generator();
    let next = sequence.next();
    process(next);
};

spawn(main);

While the syntax is better, it is unfortunate that promises aren’t baked into all modules, as the shim syntax is ugly and changes the method signature. Another approach is possible, however.

Generators Without Promises

The suspend module for Node can work with or without promises, but for APIs without promises the solution is rather clever. The resume method can act as a callback factory that will allow suspend to work with yield.

var mongo = require('mongodb').MongoClient;
var suspend = require('suspend');
var resume = suspend.resume;

var main = function*(){
    try {
        let db = yield mongo.connect('mongodb://localhost', resume());
        db.close();        
        console.log("Finished!");
    }
    catch(error) {
        console.dir(error);
    }    
};

suspend.run(main);
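For the curious, the core idea behind resume can be sketched in a handful of lines. This is a hypothetical simplification, not the actual suspend implementation: a module-level variable tracks the generator paused at a yield, and resume manufactures a Node-style callback that feeds its result back into that generator. The fakeAsync helper and its task queue are made up so the sketch is self-contained.

```javascript
"use strict";

// The generator currently paused at a yield, waiting for a callback.
let current = null;

// A callback factory: the returned function resumes the paused generator,
// making the yield expression evaluate to the callback's value.
function resume() {
  const iterator = current;
  return function (error, value) {
    if (error) {
      iterator.throw(error); // surfaces inside the generator's try/catch
      return;
    }
    current = iterator;      // restore before re-entering the generator
    iterator.next(value);    // the yield expression evaluates to `value`
  };
}

function run(generatorFn) {
  const iterator = generatorFn();
  current = iterator;
  iterator.next();           // run the generator to its first yield
}

// A made-up async API; callbacks are queued and drained below so the
// example runs synchronously and needs no real driver.
const tasks = [];
function fakeAsync(input, callback) {
  tasks.push(function () { callback(null, input * 2); });
}

const results = [];
run(function* () {
  const a = yield fakeAsync(21, resume());
  const b = yield fakeAsync(a, resume());
  results.push(a, b);
});

// Drain the pending "async" callbacks (standing in for the event loop).
while (tasks.length) tasks.shift()();
```

After the drain loop, results holds [42, 84]: the first callback resumes the generator with 42, and the second with 84. The real module also handles multiple concurrent generators and non-callback values, but the two-way conversation with yield is the heart of it.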

And now the original ETL script looks like the following.

var azure = require('azure');
var mongo = require('mongodb').MongoClient;
var suspend = require('suspend');
var resume = suspend.resume;

var storageAccount = '...';
var storageAccessKey = '...';

var main = function *() {
    
    let retryOperations = new azure.ExponentialRetryPolicyFilter();
    let tableService = azure.createTableService(storageAccount, storageAccessKey)
                            .withFilter(retryOperations);
    yield tableService.createTableIfNotExists("patients", resume());

    let db = yield mongo.connect('mongodb://localhost/PatientDb', resume());
    let collection = db.collection('Patients');
    let patientCursor = collection.find();

    patientCursor.each(transformAndInsert(db, tableService));
};

var transformAndInsert = function(db, table){
    return function(error, patient){
        if (error) throw error;
        if (patient) {
            transform(patient);
            suspend.run(function*() {
                yield table.insertEntity('patients', patient, resume());
            });
            console.dir(patient);
        } else {
            db.close();
        }
    };
};

var transform = function(patient){    
    patient.PartitionKey = '' + (patient.HospitalId || '0');
    patient.RowKey = '' + patient._id;
    patient.Ailments = JSON.stringify(patient.Ailments);
    patient.Medications = JSON.stringify(patient.Medications);
    delete patient._id;    
};

suspend.run(main);

And this approach I like.

Building A Simple File Server With OWIN and Katana

Monday, February 10, 2014 by K. Scott Allen

I have a scenario where I want to serve up HTML, JavaScript, and CSS files over HTTP from a .NET desktop application. This is the type of scenario Katana makes easy. Here is an example using a console application.

First, use NuGet to install a couple of packages into the project.

install-package Microsoft.Owin.StaticFiles
install-package Microsoft.Owin.SelfHost

The source for the entire application is 17 lines of code, including using statements.

using System;
using Microsoft.Owin.Hosting;
using Owin;

namespace ConsoleApplication5
{
    class Program
    {
        static void Main(string[] args)
        {
            var url = "http://localhost:8080";
            WebApp.Start(url, builder => builder.UseFileServer(enableDirectoryBrowsing:true));            
            Console.WriteLine("Listening at " + url);
            Console.ReadLine();
        }
    }
}

The FileServer middleware will serve files from the same directory as the executable.


If you don’t want to use the default location, you can provide your own IFileSystem and serve files from anywhere. Katana currently provides two implementations of IFileSystem – one to serve embedded resources and one to serve files from the physical file system. You can construct a PhysicalFileSystem for an arbitrary location on the hard drive.

static void Main(string[] args)
{
    var url = "http://localhost:8080";
    var root = args.Length > 0 ? args[0] : ".";
    var fileSystem = new PhysicalFileSystem(root);

    var options = new FileServerOptions
    {
        EnableDirectoryBrowsing = true, 
        FileSystem = fileSystem                             
    };

    WebApp.Start(url, builder => builder.UseFileServer(options));            
    Console.WriteLine("Listening at " + url);
    Console.ReadLine();
}

The FileServer middleware is actually a composite wrapper around three other pieces of middleware – DefaultFiles (to select a default.html file, if present, when a request arrives for a directory), DirectoryBrowser (to list the contents of a directory if no default file is found), and StaticFile (to reply with the contents of a file in the file system). All three pieces of middleware are configurable through the UseFileServer extension method.

For example, the static file middleware will only serve files with a known content type. Although the list of known content types is extensive, you might need to serve files with uncommon extensions, in which case you can plug a custom IContentTypeProvider into the static files middleware.

public class CustomContentTypeProvider : FileExtensionContentTypeProvider
{
    public CustomContentTypeProvider()
    {
        Mappings.Add(".nupkg", "application/zip");
    }
}

Now the final program looks like the following.

static void Main(string[] args)
{
    var url = "http://localhost:8080";
    var root = args.Length > 0 ? args[0] : ".";
    var fileSystem = new PhysicalFileSystem(root);
    var options = new FileServerOptions();
    
    options.EnableDirectoryBrowsing = true;
    options.FileSystem = fileSystem;         
    options.StaticFileOptions.ContentTypeProvider = new CustomContentTypeProvider();

    WebApp.Start(url, builder => builder.UseFileServer(options));            
    Console.WriteLine("Listening at " + url);
    Console.ReadLine();
}

Using Node.js To ETL Mongo Documents for Azure Table Storage

Tuesday, February 4, 2014 by K. Scott Allen

In a previous post we saw how to work with the Windows Azure package for Node.js and interact with blob storage.

Another Azure adventure I’ve been tinkering with is moving MongoDB documents into Azure Table Storage. There are a couple of challenges. Like Mongo, Table Storage is unstructured, but there are some limitations (not that Mongo is without limitations, but the limitations are different).

First, table storage requires every entity stored inside to include a partition key and row key. The two keys form a unique ID for an entity inside of table storage. Another limitation is that table storage only supports a few primitive data types including bool, string, date, double, and int, but there is no ability to store a collection of items directly. Thus, moving a BSON document into table storage requires some transformations to make collections work.

Here is a brute force node script to move data from a Mongo data source into table storage. The script uses the core mongodb package, since this is relatively low-level data work.

var azure = require('azure');
var MongoClient = require('mongodb').MongoClient;

var tableCreated = function(error) {
    if(error) throw error;
    MongoClient.connect('mongodb://localhost/PatientDb', mongoConnected);
};

var mongoConnected = function(error, db){
    if(error) throw error;

    var collection = db.collection("Patients");
    var patientCursor = collection.find();
    patientCursor.each(function(error, patient){
        if(error) throw error;
        if(patient) {
            transformPatient(patient);    
            loadPatient(patient);
        }
        else{
            db.close();
        }
    });
};

var transformPatient = function(patient){    
    patient.PartitionKey = '' + (patient.HospitalId || '0');
    patient.RowKey = '' + patient._id;
    patient.Ailments = JSON.stringify(patient.Ailments);
    patient.Medications = JSON.stringify(patient.Medications);
    delete patient._id;    
};

var loadPatient = function(patient){    
    tableService.insertEntity('patients', patient, function(error){
        if(error) throw error;
        console.log("Loaded " + patient.RowKey);
    });
};

var retryOperations = new azure.ExponentialRetryPolicyFilter();
var tableService = azure.createTableService()
                        .withFilter(retryOperations);
tableService.createTableIfNotExists('patients', tableCreated);

The solution I’m currently tinkering with is transforming collections into strings using JSON. It will take some more time and experiments to know if this approach is workable for the real application.
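The round trip is easy to sanity check without touching Azure at all. The patient document below is made-up sample data, and transform mirrors the function from the script above: collections become strings on the way into table storage, and JSON.parse restores them on the way out.

```javascript
"use strict";

// Hypothetical sample document shaped like the Mongo patients above.
const patient = {
  _id: "patient-42",
  HospitalId: 7,
  Ailments: [{ Name: "allergy" }],
  Medications: [{ Name: "aspirin" }]
};

// Mirrors the transform function from the ETL script.
function transform(p) {
  p.PartitionKey = '' + (p.HospitalId || '0');
  p.RowKey = '' + p._id;
  p.Ailments = JSON.stringify(p.Ailments);
  p.Medications = JSON.stringify(p.Medications);
  delete p._id;
}

transform(patient);

// Every property is now a primitive table storage can accept.
console.log(typeof patient.Ailments); // "string"

// Reading an entity back: parse the stringified collections.
const ailments = JSON.parse(patient.Ailments);
console.log(ailments[0].Name); // "allergy"
```

The obvious costs are that the collections are opaque to table storage queries and count against the per-property size limits, which is part of what the experiments need to settle.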

Azure Blobs with Node.js and Express

Monday, February 3, 2014 by K. Scott Allen

The Node.js packages for Azure are fun to work with. Here’s a little walkthrough for building an Express app to display the blobs in an Azure storage container.

1. Create an application with Express.

2. Install the Windows Azure Client Library for node (npm install azure)

3. When Express created the application, it should have added a route to send a request for the root of the site to routes.index. In other words, the app.js file will look like the following.

var express = require('express')
  , routes = require('./routes')
  , path = require('path');

var app = express();

app.set('port', process.env.PORT || 3000);
app.set('views', __dirname + '/views');
app.set('view engine', 'jade');
app.use(app.router);

app.get('/', routes.index);

app.listen(app.get('port'), function(){
  console.log('Listening on ' + app.get('port'));
});

4. The index function is where the code consumes the Azure API and connects to the blob service, fetches a list of all the blobs in a container, and renders a view to display the blobs.

var azure = require('azure');

exports.index = function(request, response) {

    var accessKey = 'yourKey';
    var storageAccount= 'yourName';
    var container = 'yourContainerName';

    var blobService = azure.createBlobService(storageAccount, accessKey);
    blobService.listBlobs(container, function(error, blobs) {        
        response.render('index', {
             error: error, 
             container: container, 
             blobs: blobs
        });        
    });
};

You can pass the storage account name and the access key to createBlobService, or pass no parameters and define the environment variables AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_ACCESS_KEY.

5. Finally, the Jade view to dump blob information into the browser.

extends layout

block content
    h1 Blob listing for #{container}

    if error
        h3= error

    if blobs
        h3= container
        table
            tr
                th Name
                th Properties
            - each blob in blobs
                tr
                    td= blob.name
                    td
                        ul
                            - each value, name in blob.properties
                                if value
                                    li= name + ":" + value


Coming soon – Node.js and Azure Table Storage
