OdeToCode IC Logo

Counting Array Entries in a Cosmos DB Document

Thursday, August 30, 2018

I’ve been trying to figure out the most efficient approach to counting matches in a Cosmos DB array, without redesigning collections and documents.

To explain the scenario, imagine a document based on the following class.

class Patient
    [JsonProperty(PropertyName = "id")]
    public string Id { get; set; }

    public int[] Attributes { get; set; }   

I need a query to return patient documents with a count of how many values in the Attributes property match the values in an array I provide as a parameter.

That’s very abstract, so imagine the numbers in the array each represent a patient’s favorite food. 1 is pasta, 2 is beef, 3 is pork, 4 is chicken, 5 is eggplant, 6 is cauliflower, and so on. An Attributes value of [1, 4, 6] means a patient likes pasta, chicken, and cauliflower.

Now I need to issue a query to see what patients think of a meal that combines pasta, chicken, and eggplant (a [1, 4, 5]).

Cosmos provides a number of aggregation and array operators, including an ARRAY_CONTAINS, but to make multiple queries with dynamic parameter values, I thought a user-defined function might be easier.

In Cosmos, we implement UDFs as JavaScript functions. Here’s a UDF that takes two arrays and counts the number of items in the arrays that intersect, or match, so intersectionCount([2,5], [1,2,3,5,7,10]) returns 2.

function intersectionCount(array1, array2) {
    var count = array1.reduce((accumulator, value) => {
        if (array2.indexOf(value) > -1) {
            return accumulator + 1;
        return accumulator;
    }, 0);
    return count;

One way to use the UDF is to query the collection and return the count of matches with each document.

SELECT p, udf.intersectionCount([4,5], p.Attributes) 
FROM Patients p

I can also use the UDF in a WHERE clause.

FROM Patients p
WHERE udf.intersectionCount([1,3], p.Attributes) > 1

The UDF makes the queries easy, but might not be the best approach for performance. You’ll need to evaluate the impact of this approach using your own data and application behavior.