From Wikipedia: The law of the instrument is a cognitive bias that involves an over-reliance on a familiar tool.
The software industry is good at adopting standard data formats, then hyping and ultimately abusing those formats by pushing them into every conceivable nook and cranny of the trade.
Take XML, for example. What started as a simple, usable, textual data format became the lingua franca for enterprise web services. Along the way, we also used XML to build configuration files, replace make files, and implement more than a dozen significant UI frameworks.
Ten years ago, I was working with Microsoft’s triumvirate crown jewels of XML technology – WCF, WPF, and WF (Windows Workflow). Workflow is a good example of how XML transformed itself into a bloviating monster of complexity.
Example A: How to represent the integer value 5,000 in WF markup:
<ns0:CodePrimitiveExpression>
  <ns0:CodePrimitiveExpression.Value>
    <ns1:Int32 xmlns:ns1="clr-namespace:System;Assembly=mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">
      5000
    </ns1:Int32>
  </ns0:CodePrimitiveExpression.Value>
</ns0:CodePrimitiveExpression>
Use XML, they said. It’s human readable, they said.
A few weeks ago, I became interested in cloud governance. Not an exciting topic, but when a company turns developers loose in the cloud, someone must enforce some basic rules about what services to use, and ensure the settings are in place to make the services as secure as possible. For Azure, I looked at using Azure Policy. It was obvious that I’d need to write some custom rules for Policy. To see how rule authoring works, I looked at a built-in rule that checks if data encryption is on for an Azure SQL instance:
{
  "if": {
    "allOf": [
      { "field": "type", "equals": "Microsoft.Sql/servers/databases" },
      { "field": "name", "notEquals": "master" }
    ]
  },
  "then": {
    "effect": "[parameters('effect')]",
    "details": {
      "type": "Microsoft.Sql/servers/databases/transparentDataEncryption",
      "name": "current",
      "existenceCondition": {
        "allOf": [
          { "field": "Microsoft.Sql/transparentDataEncryption.status", "equals": "enabled" }
        ]
      }
    }
  }
}
So, in the last 10 years, programming has evolved from writing syntax trees in XML to writing syntax trees in JSON. Actually, software has evolved from doing everything in XML to doing everything in JSON - web services, configuration files, build systems, database storage, and more. JSON’s ubiquity is surprising given that JSON doesn’t have any of the same flexibility and extensibility features as XML.
Or, as @kellabyte recently said:
How is it developers killed the use of XML for JSON, a format that has no ability to add comments?
— Kelly Sommers (@kellabyte) January 28, 2019
What developers decided this was a good idea? A format which half the lines in the editor have 1 character in them.
Yes, I know. I lived through the AJAX days when everyone said JSON was the future – the reasoning being that JavaScript could consume JSON natively, while XML demanded clunkier parsing APIs.
Web browsers and mobile devices today carry AI for facial recognition, support headsets for augmented reality, stream 4K video, and execute GPU accelerated 3D animations.
But parse XML documents? Nobody wants to walk in that minefield.
JSON is the easy choice.
I can understand the argument for easy. I imagine trying to design Azure Policy and thinking about implementation details for custom rules.
Can we force everyone to use PowerShell? No!
JavaScript? How to sandbox the execution?
Wait, I got it – describe the code in JSON! Everyone knows JSON. All the tools support JSON. The answer must be JSON.
I’ve been down a similar road before. Years ago, I needed to create a custom ETL tool to move data between relational databases as fast as possible. At the time, Microsoft’s offering for SQL Server was SSIS [1] (SQL Server Integration Services). I spent a couple of days with SSIS and decided it was not an appropriate choice for our specific scenario. The XML was complex, hard to version, hard to test, hard to debug, slow to execute, yielded unhelpful error messages, and made a team wholly dependent on the UI tools that ship with SQL Server. Not to mention, SSIS wouldn’t easily scale to meet the thousands of packages and package variations we needed to support. I had been down that road before with Microsoft’s previous ETL tool (DTS – Data Transformation Services) and vowed never again.
Once I decided to build my own ETL tool, I needed to decide on the language for the tool. My first attempt, which survived in the wild for a brief time, relied on Ruby and a fluent API. The second attempt tried to simplify things. I needed SQL, but a way to surround the SQL with metadata.
Why not use XML? Everyone knows XML, and all the tools support XML. XML is easy!
The result used "packages" that looked something like the following [2]:
<Package Type="Arracher.Core.Packages.BulkLoadPackage">
  <Name>Load Movies</Name>
  <Source>LiveDb</Source>
  <Destination>RapidDb</Destination>
  <DestinationTable>tbl_movies</DestinationTable>
  <Query>
    <![CDATA[
      DECLARE @StartDate datetime
      SET @StartDate = '@{StartDate}'

      SELECT CONVERT(varchar(1), NULL) AS title,
             release_date AS releasedon
      FROM dbo.movies
      WHERE release_date > @StartDate
    ]]>
  </Query>
</Package>
Is it the best tool in the world? No.
But packages are easy to version, easy to diff, and easy to edit, author, and test. Best of all, a DBA can open the file, copy out the SQL, tweak it, and paste it back in with no special editing tools and a minimal amount of fuss. The SQL can be as complicated as SQL can be.
The tool works, I believe, because the XML doesn’t get in the way. That’s the problem you can run into when you embed one language in another – one of the languages will get in the way.
Azure Policy tries to embed Boolean rules inside of JSON, but to me, the JSON only gets in the way. It’s like embedding SQL code inside of C# or Java code – some language combinations are hard on the eyes, which makes the result hard to write, and impossible to read. You can’t just skim the code to get an idea of what might happen.
With policy, the more complex the expression, the more unreadable the code. The solution is error prone, hard to test, and therefore not scalable.
Here’s @kellabyte again, this time on embedding in JSON:
You’re right JSON fans. Of course this is a beautiful way to write a query language in JSON over HTTP. What was I thinking? pic.twitter.com/AP9eTZmGDT
— Kelly Sommers (@kellabyte) January 29, 2019
This is one of the reasons why LINQ is so compelling in C#. There’s no embedding – I have C# code inside of C# code, but some of the C# code might just translate itself into a different language and execute in a different process on some distant machine.
Despite what you might think of ORMs, or the hidden performance costs of LINQ, the feature still amazes me every time I see a query. I had the same sense of excitement the first time I ran across Gremlin-Python, too.
Truthfully, I wrote this post to organize my thoughts around Azure Policy.
Do I want to take a pass on what comes in the box with Azure? I think so.
Can I rationalize a custom solution? I think so.
I can invent my own tool for governance audits and make the rules easier to author, change, and test – and more flexible, too.
And just for fun, I’ll write the tool in Go.
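Purely as a sketch of what rules-as-code could look like – the `Resource` and `Rule` types here are hypothetical, not a design I’ve committed to – the JSON policy from earlier collapses into one readable, unit-testable function:

```go
package main

import "fmt"

// Resource is a hypothetical, simplified view of a cloud resource.
type Resource struct {
	Type      string
	Name      string
	Encrypted bool
}

// Rule is a description plus a predicate - trivially testable.
type Rule struct {
	Name  string
	Check func(Resource) bool
}

// sqlEncryptionRule restates the earlier JSON policy as plain code:
// skim it and you know exactly what it does.
var sqlEncryptionRule = Rule{
	Name: "SQL databases must use transparent data encryption",
	Check: func(r Resource) bool {
		if r.Type != "Microsoft.Sql/servers/databases" || r.Name == "master" {
			return true // rule doesn't apply to this resource
		}
		return r.Encrypted
	},
}

func main() {
	db := Resource{Type: "Microsoft.Sql/servers/databases", Name: "sales"}
	fmt.Printf("%s: pass=%v\n", sqlEncryptionRule.Name, sqlEncryptionRule.Check(db))
}
```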
I’ll rationalize that decision in another post.
[1] The current offering for ETL (keep in mind ETL is a legacy term for a legacy form of processing) from Microsoft is Azure Data Factory (ADF). We author ADF packages in JSON. ADF v2 supports calling into SSIS XML packages. This is like candy built from data formatting standards - a hard JSON shell surrounds a creamy-sweet XML interior.
[2] I named the tool Arracher – French for “rip out”, or extract.