OdeToCode IC Logo

The Law of the JSON

Monday, February 4, 2019

From Wikipedia: The law of the instrument is a cognitive bias that involves an over-reliance on a familiar tool.

The software industry is good at adopting standard data formats, then hyping and ultimately abusing those formats by pushing them into every conceivable nook and cranny of the trade.

Take XML, for example. What started as a simple, usable, textual data format became the lingua franca for enterprise web services. Along the way, we also used XML to build configuration files, replace make files, and implement more than a dozen significant UI frameworks.

Ten years ago, I was working with Microsoft’s triumvirate crown jewels of XML technology – WCF, WPF, and WF (Windows Workflow). Workflow is a good example of how XML transformed itself into a bloviating monster of complexity.

Example A: How to represent the integer value 5,000 in WF markup:

<ns0:CodePrimitiveExpression>
    <ns0:CodePrimitiveExpression.Value>
    <ns1:Int32 xmlns:ns1="clr-namespace:System;Assembly=mscorlib, 
                  Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">
           5000
       </ns1:Int32>
    </ns0:CodePrimitiveExpression.Value>
</ns0:CodePrimitiveExpression>

Use XML, they said. It’s human readable, they said.

The Curse of Ubiquity

A few weeks ago, I became interested in cloud governance. Not an exciting topic, but when a company turns developers loose in the cloud, someone must enforce some basic rules about what services to use, and ensure the settings are in place to make the services as secure as possible. For Azure, I looked at using Azure Policy. It was obvious that I’d need to write some custom rules for Policy. To see how rule authoring works, I looked at a built-in rule that checks if data encryption is on for an Azure SQL instance:

{
  "if": {
    "allOf": [
      {
        "field": "type",
        "equals": "Microsoft.Sql/servers/databases"
      },
      {
        "field": "name",
        "notEquals": "master"
      }
    ]
  },
  "then": {
    "effect": "[parameters('effect')]",
    "details": {
      "type": "Microsoft.Sql/servers/databases/transparentDataEncryption",
      "name": "current",
      "existenceCondition": {
        "allOf": [
          {
            "field": "Microsoft.Sql/transparentDataEncryption.status",
            "equals": "enabled"
          }
        ]
      }
    }
  }
}

So, in the last 10 years, programming has evolved from writing syntax trees in XML to writing syntax trees in JSON. Actually, software has evolved from doing everything in XML to doing everything in JSON - web services, configuration files, build systems, database storage, and more. JSON’s ubiquity is surprising given that JSON doesn’t have any of the same flexibility and extensibility features as XML.

Or, as @kellabyte recently said:

Yes, I know. I lived through the AJAX days when everyone said JSON was the future. Reason being that browsers and mobile devices didn't come with XML parsers by default.

Web browsers and mobile devices today carry AI for facial recognition, support headsets for augmented reality, stream 4K video, and execute GPU accelerated 3D animations.

But parse XML documents? Nobody wants to walk in that minefield.

JSON is the easy choice.

The Easy Choice

I can understand the argument for easy. I imagine trying to design Azure Policy and thinking about implementation details for custom rules.

Can we force everyone to use Powershell? No!

JavaScript? How to sandbox the execution?

Wait, I got it – describe the code in JSON! Everyone knows JSON. All the tools support JSON. The answer must be JSON.

I’ve been down a similar road before. Years ago, I needed to create a custom ETL tool to move data between relational databases as fast as possible. At the time, Microsoft’s offering from SQL Server was SSIS 1 (SQL Server Integration Services). I spent a couple of days with SSIS and decided it was not an appropriate choice for our specific scenario. The XML was complex, hard to version, hard to test, hard to debug, slow to execute, yielded unhelpful error messages, and made a team wholly dependent on UI tools that ship with SQL Server. Not to mention, SSIS wouldn’t easily scale to meet the thousands of packages and package variations we needed to support. I had been down that road before with Microsoft’s previous ETL tools (DTS – data transformation services) and vowed never again.

Once I decided to build my own ETL tool, I needed to decide on the language for the tool. My first attempt, which survived in the wild for a brief time, relied on Ruby and a fluent API. The second attempt tried to simplify things. I needed SQL, but a way to surround the SQL with metadata.

Why not use XML? Everyone knows XML, and all the tools support XML. XML is easy!

The result used "packages" that looked something like the following 2:

<Package Type="Arracher.Core.Packages.BulkLoadPackage">
  <Name>Load Movies</Name>
  <Source>LiveDb</Source>
  <Destination>RapidDb</Destination>
  <DestinationTable>tbl_movies</DestinationTable>  
  <Query>
    <![CDATA[        
    DECLARE @StartDate datetime
    SET @StartDate = '@{StartDate}'
               
    SELECT CONVERT(varchar(1),NULL) As title, release_date  as releasedon          
    FROM dbo.movies WHERE release_date > @StartDate          
    ]]>
  </Query>
</Package>

Is it the best tool in the world? No.

But, packages are easy to version, easy to diff, easy to edit, author, and test, and best of all – a dba can open the file, copy out the SQL, tweak the SQL, and paste the SQL back in without any special editing tools and a minimal amount of fuss. The SQL can be as complicated as SQL can be.

The tool works, I believe, because the XML doesn’t get in the way. That’s the problem you can run into when you embed one language in another – one of the languages will get in the way.

Embedding

Azure Policy tries to embed Boolean rules inside of JSON, but to me, the JSON only gets in the way. It’s like embedding SQL code inside of C# or Java code – some language combinations are hard on the eyes, which makes the result hard to write, and impossible to read. You can’t just skim the code to get an idea of what might happen.

With policy, the more complex the expression, the more unreadable the code. The solution is error prone, hard to test, and therefore not scalable.

Here’s @kellabyte again, this time on embedding in JSON:

This is one of the reasons why LINQ is so compelling in C#. There’s no embedding – I have C# code inside of C# code, but some of the C# code might just translate itself into a different language and execute in a different process on some distant machine.

Despite what you might think of ORMs, or the hidden performance costs of LINQ, the feature still amazes me every time I see a query. I had the same sense of excitement the first time I ran across Gremlin-Python, too.

Summary

Truthfully, I wrote this post to organize my thoughts around Azure Policy.

Do I want to take a pass on what comes in the box with Azure? I think so.

Can I rationalize a custom solution? I think so.

I can invent my own tool for governance audits and make the rules easier to author, change, and test, as well as be more flexible.

And just for fun, I’ll write the tool in Go.

I’ll rationalize that decision in another post.

[1] The current offering for ETL (keep in mind ETL is a legacy term for a legacy form of processing) from Microsoft is Azure Data Factory (ADF). We author ADF packages in JSON. ADF v2 supports calling into SSIS XML packages. This is like candy built from data formatting standards - a hard JSON shell surrounds a creamy-sweet XML interior.

[2] I named the tool Arracher – French for “rip out”, or extract.


Comments
Gravatar Chris Dickerson Monday, February 4, 2019
I work with C# and LabVIEW, and in both camps, XML support has been waning for years. Microsoft hasn't touched their .NET XML handling (serialization, anyone?) since around... 2003? Newtonsoft's handling of JSON and conversion of XML to JSON and back, validation, etc. is awesome.Oh... and actively supported. I've even gone as far as changed out C# config files to JSON. The customer rarely changes an apps config file, and reading it is a single line of code to deserialize it. I can't speak to Azure and JSON, I just store data there. :)
Gravatar cocowalla Monday, February 4, 2019
I really like the Azure platform, but ARM templates... what were they thinking trying to cram conditionals and logic into JSON! When you get round to doing for loops, your eyes will really begin to bleed, check the 1st example in this page: https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-create-multiple "name": "[concat(copyIndex(),'storage', uniqueString(resourceGroup().id))]", Argh! My eyes!
Gravatar scott Monday, February 4, 2019
Oh, ARM templates. [shudder]!
Gravatar Jon Monday, February 4, 2019
Why not use .csx with an expected output object? I'm not sure how you would do this. But I would think it would be possible.
Gravatar Jon Monday, February 4, 2019
For that tweet. Why not do something like this: ```json { "query": "SELECT * FROM dbo.whatever WHERE (@)something = 42 ", "arguments": { "something": 46} } ``` I don't know all the logic it entails but that is pretty simple and can be generated.
Gravatar Brad Wood Tuesday, February 5, 2019
Oh man, I can most certainly agree on 2 points: * WF / WPF. XAML specifically. What a disaster. Countless hours wasted trying to declare some object or make a super obscure binding statement work. * SSIS error messages. World class worthless! SSIS is the most miserable technology I've ever worked with. But as for JSON, it's only going increase in use cases; C# 8 is adding much better JSON support.
Gravatar Jesse Tuesday, February 5, 2019
Sooo... these formats are good and useful until Microsoft needs to make them Enterprise Ready™, is that what I'm getting?
Gravatar Alex Tuesday, February 5, 2019
The Azure CLI, which I must say is cleaner and leaner than the powershell alternative is written in Python.. Instead of creating these crazy tree representation of code in data transfer files, how about Python 2.x scripts? Its not outside their experience. I do like python for scripting, it has decent tool support, fully cross platform and no endless braces, whats not to like?
Gravatar Jason Wednesday, February 6, 2019
Also, if you've worked with Kubernetes before, you know there is A LOT of yaml. It's all about finding the right tool for the job IMHO.
Gravatar Alastairtree Wednesday, February 6, 2019
Been spending a lot of time working with ARM templates and yeah JSON is not a good tool for the job. If you have not already you should have a look at terraforn and Hashicorps configuration language HCL for an interesting take on an alternative text format for declaring state to JSON that is also JSON interoperble. It's weird at first but actually very nice on the author after a while, bit like markdown in its simple author/reader friendly style rather than computer friendly style of XML/JSON. The HCL HEREDOC syntax is also a cute solution for embedding one language inside another.
Your Comment