Azure’s DocumentDB has an appealing scalability model, but you must pay attention to the limits and quotas from the start. Of particular interest to me is the maximum request size for a document, which is currently 512 KB. When DocumentDB first appeared the limit was a paltry 16 KB, so 512 KB feels roomy, but how much real application data can that hold?
Let’s say you need to store a collection of addresses for a hospital patient.
public class Patient
{
    public string Id { get; set; }
    public IList<Address> Addresses { get; set; }
}

public class Address
{
    public string Description { get; set; }
    public string City { get; set; }
    public string Country { get; set; }
}
In theory the list of address objects is an unbounded collection that could exceed the maximum request size and generate runtime errors. But in practice, how many addresses could a single person have? There is the home address, the business address, perhaps a temporary vacation address. You don’t want to complicate the design of the application to support unlimited addresses, so instead you might enforce a reasonable limit in the application logic and tell customers that having more than five addresses on file is not supported.
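As a minimal sketch of enforcing that limit in application logic, the Patient class could own the list and reject additions past a cap. The names `MaxAddresses` and `AddAddress` are hypothetical, and exposing a read-only list changes the shape of the original class, so serializer settings may need adjusting to round-trip it.

```csharp
using System;
using System.Collections.Generic;

public class Address
{
    public string Description { get; set; }
    public string City { get; set; }
    public string Country { get; set; }
}

public class Patient
{
    // Hypothetical cap; 5 is the limit suggested in the text.
    public const int MaxAddresses = 5;

    private readonly List<Address> _addresses = new List<Address>();

    public string Id { get; set; }

    // Callers can read the addresses but must add through AddAddress.
    public IReadOnlyList<Address> Addresses => _addresses;

    public void AddAddress(Address address)
    {
        if (address == null) throw new ArgumentNullException(nameof(address));

        if (_addresses.Count >= MaxAddresses)
        {
            throw new InvalidOperationException(
                $"A patient can have at most {MaxAddresses} addresses on file.");
        }

        _addresses.Add(address);
    }
}
```

Putting the check in one method keeps the rule out of the rest of the code base, and the exception message can be surfaced to the customer as the "not supported" notice.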
Here’s a slightly trickier scenario.
public class Patient
{
    public string Id { get; set; }
    public IList<Medication> Medications { get; set; }
}

public class Medication
{
    public string Code { get; set; }
    public DateTime Ordered { get; set; }
    public DateTime Administered { get; set; }
}
Each medication entry consists of an 8-character code and two DateTime properties, which gives us a fixed size for every medication a patient receives. Again, the potential problem is the total number of medications a patient might receive.
The first question, then, is how many Medication objects can a 512 KB request support?
The answer, estimated with a calculator and verified with code, is just over 6,000.
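One way to check an estimate like this in code is to serialize a single sample entry, measure its UTF-8 size, and divide the request limit by that size. The sketch below is not the original verification code. It assumes System.Text.Json as the serializer, reads 512 KB as 512 * 1024 bytes, uses a placeholder medication code, and ignores the overhead of the patient Id and the surrounding JSON. The count it prints depends on how the serializer formats dates and property names.

```csharp
using System;
using System.Text;
using System.Text.Json;

public class Medication
{
    public string Code { get; set; }
    public DateTime Ordered { get; set; }
    public DateTime Administered { get; set; }
}

public static class RequestSizeEstimate
{
    // Assumption: the 512 KB limit is 512 * 1024 bytes.
    public const int MaxRequestBytes = 512 * 1024;

    public static int BytesPerMedication()
    {
        var sample = new Medication
        {
            Code = "ABCD1234", // placeholder 8-character code
            Ordered = new DateTime(2015, 1, 1, 8, 0, 0, DateTimeKind.Utc),
            Administered = new DateTime(2015, 1, 1, 9, 30, 0, DateTimeKind.Utc)
        };

        string json = JsonSerializer.Serialize(sample);

        // Add 1 byte for the comma that separates entries in the JSON array.
        return Encoding.UTF8.GetByteCount(json) + 1;
    }

    // Integer division: how many whole entries fit under the limit.
    public static int MaxMedications() => MaxRequestBytes / BytesPerMedication();

    public static void Main()
    {
        Console.WriteLine(
            $"{BytesPerMedication()} bytes per entry, about {MaxMedications()} entries per request");
    }
}
```

Because every entry has the same size, measuring one entry is enough. A variable-length field would call for measuring a worst-case sample instead.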
The second question, then: is 6,000 a safe number?
To answer the second question I found it useful to analyze some real data. The data showed that the odds of busting the request size are roughly 1 in 100,000, which is just over 4 standard deviations. Generally a 4 sigma number is good enough to say “it won’t happen”, but what’s interesting when operating at scale is that with 1 million patients you should expect to observe the 4 sigma event not once, but 10 times.
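The arithmetic behind that claim is short enough to sketch. Assuming each patient independently has a 1 in 100,000 chance of exceeding the limit (independence is an assumption of this sketch, not something the data establishes), the expected number of oversized documents among 1 million patients is 10, and the chance of seeing at least one is roughly 99.995%.

```csharp
using System;

public static class RareEventMath
{
    public static void Main()
    {
        double p = 1.0 / 100_000;   // per-patient odds quoted in the text
        int patients = 1_000_000;

        // Expected number of patients whose document exceeds the limit.
        double expected = patients * p;

        // Probability that at least one patient exceeds the limit,
        // assuming patients are independent of one another.
        double atLeastOne = 1 - Math.Pow(1 - p, patients);

        Console.WriteLine($"Expected oversized documents: {expected:F1}");
        Console.WriteLine($"Chance of at least one: {atLeastOne:P3}");
    }
}
```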
From the business perspective, the result is unacceptable, so back to the drawing board.
We used to say that you spend 80% of your time on 20% of the problem. At scale there is the possibility of spending 80% of your time on 0.000007% of the problem.