Tuesday, November 30, 2004 by scott

Richard Morin wrote in the Washington Post:

“…perhaps people will start ignoring the bloggers, who proved once more that their spectacular lack of judgment is matched only by their abundant arrogance.”

This line from the article “Surveying The Damage” spent time bashing bloggers who released early exit poll data on the U.S. presidential race.

I read this line thinking Richard Morin had blogging envy and was taking a few shots at the competition. Then I listened to two IT Conversation podcasts and started to agree. The podcasts were the Election 2004 session from BloggerCon III, and the Gillmor Gang’s interview with eBay tech evangelist Jeff McManus. There was more than one voice in these shows demonstrating a self-important arrogance.

If bloggers can convince politicians to blog, perhaps some good could result. If bloggers want to ram blogging and podcasting down everyone’s throats as the solution to all political, marketing, and business problems, then I don’t think they will get very far.

Poking fun at people who “just don’t get it” doesn’t work either.

Persistence and favorable results do work.

I think blogs are working extremely well in the tech community. I’ve made a number of friends I’ve never met (but hope to), and there is a great community atmosphere. It's passion, not arrogance, which makes it fun.

BI and SQL Server 2005

Monday, November 29, 2004 by scott
Just to finish up the story I started a few days ago (and get the albatross off my neck, so to speak), here is how the exploration of SQL 2005 DTS and OLAP features turned out. The end result was a cube of all domestic commercial flights for January 2003 (about 560,000 records) with delay, cancellation, flight time, taxi time information and more.

There were a few disappointments, like the handful of unhandled exceptions that crashed the Business Intelligence Development Studio. I have a feeling some of the problems were related to the source control features of the beta, as most of the weirdness would occur after a checkout operation (I’m using SourceGear Vault at home – Vault is free for a single user). I’ve never had a problem with Vault, so I’m guessing all the problems are in the BIDS beta.

The most frustrating problem was getting the OLAP designer to pick up schema changes in the underlying fact table. I had to delete the cube, dimensions, and data source views and start over.

The good news is, 2005 not only fixes many of the frustrations in 2000, but adds a good many features.

It requires considerably less work to get a basic cube up and running thanks to the ability of the IDE to analyze relationships and figure out a base set of dimensions, levels, and measures. Perhaps 160 mouse clicks down to 6.

The brain dead cube browser in 2000 has been replaced with what appears to be a version of the pivot table in Office Web Components.

The GUI remains responsive during cube processing (although it does not appear to show the number of records processed).

Everything appears to save as XML, the project, the structures, the packages. The source control integration and deployment options appear that they will be CM and deployment friendly without hand rolling tools.

The performance of the import and of Analysis Services processing was very good considering this is all running on a Virtual PC with 512 MB of RAM.

It appears everything is looking pretty good, hopefully the documentation and features will shore up by beta 3. There are so many options and buttons it is hard to stay oriented at times. In one week there will be a flurry of SQL Server 2005 webcasts, unfortunately only 1 appears to be related to Analysis Services. Hopefully there will be more coming down the road.

Give Me More Text

Saturday, November 27, 2004 by scott
To continue with my previous post (Give Me Text), I wanted to show a little UI from the SQL Server 2005 Business Intelligence Studio. There are obvious improvements in the feature set formerly known as Data Transformation Services. The toolbox is brimming with tasks and transformations you can drag and drop into the Data Flow designer. Stuff like “Pivot”, “Aggregate”, and “Fuzzy Grouper”.

There are parts, however, that remind me why I still prefer programming with keyboards and text files to GUIs. GUIs do not scale. In order to perform multiple operations I need to perform multiple actions. Like the following dialog: 

The problem here is it took at least 34 mouse clicks to change the behavior of each column. from “Fail Component” to “Redirect Row” (to send erroneous records from a flat file to an error file instead of halting). UI designers do this to me all the time, and it’s the reason I get frustrated with them.

Text files scale because you can select one operation and have it applied to multiple locations. UIs do not do this too well – they make you click, click, clickety drag click. It’s why I try to avoid the DTS designer in 2000. The OLAP designer in 2000 is even worse.

The good news in 2005 is that there really is a text file underneath, and perhaps with some more documentation on the XML format these tasks might become easier.

It might also be interesting to see what UI accessibility improvements come along with Avalon and Longhorn. Perhaps they can provide some sort of meta layer where I can group similar controls easily and modify them with one action.

Even better would be if I could speak to the computer and ask it to “change all error drop down selections to redirect rows”. The magical accessibility layer can find drop down controls under the Error column and set everything without me moving a mouse.

That would be cool, but I’m still not giving up my text files.

Give Me Text

Saturday, November 27, 2004 by scott
Every now and then someone will talk about the sad fact that we still use text files and text editors to build software. I used to think about this when I was in the DTS or OLAP cube designers of SQL Server 2000. I’d think about how writing software with a GUI wasn’t all it was cracked up to be. UI designers can be tedious, and in the Microsoft world they usually save data in opaque blobs.

Because of the blob factor, both DTS and OLAP resist attempts at serious configuration management. The deployment strategy for an OLAP cube is to drag the cube with a mouse from one server to the next in the MMC. I kept looking for the “generate script” option in the MMC – there is none*. One of my first tasks with OLAP was to write a utility to walk the object model exposed by OLAP’s COM interface and serialize the structure into an XML file.

XML is easier on the eyes than a structured storage file, in the same way it looks prettier on the wire when compared to a DCOM packet. You can check XML into source control, diff it, parse it, query it, and transform it. XML is friendly and comfortable. Using an XML format paid off for us recently when we needed to localize cubes for a client in another country. Add a little XPath, a little config file, and mes cubes parlent français.

In playing with the Business Intelligence Development Studio of SQL Server 2005, I am happy to see both Integration Services (DTS) packages and OLAP cubes save into files with an XML format. I’m even happier to see the “Add Solution To Source Control” option. It’s good to see progress. It’s good to see text!

* On the SQL 2000 Resource CD Microsoft released the MetaDataScripter. You can point this tool at an OLAP database and it will generate VBScript that can create the cube. For non-trivial databases though, the resulting vbs script can easily exceed 1MB. It is text, but it is not too friendly.

Array Size

Saturday, November 27, 2004 by scott
During my occasional forays into the land of Dim, Sub, and IsNothing, I’ve needed to remind myself that the number in an array definition represents the upper bound.

Dim months(11) As String 'an array with upper bound 11 has 12 elements

It feels weird, but as long as I kept telling myself the number represented the upper bound, and not the length of the array, everything seemed OK.

Then one day I needed an array of 0 elements.

Dim foo(-1) As String 'an array of 0 elements is declared with 0 – 1 = -1. 

I can’t get this statement out of my head. The illusion is broken. I now think of the number not as the upper bound, but as the length of the array minus 1.

Suddenly, writing array code in VB requires more thought and attention to detail with all of these subtractions going on.


In Search Of Raw Data

Thursday, November 25, 2004 by scott
Perhaps if Samuel Taylor Coleridge were alive today, he’d write verse like the following:

Data, data, every where,
And all the disks did shrink;
Data, data, everwhere,
Nor any bit to think.

Then again, if Coleridge were alive today maybe he’d be spending all of his time watching reenactments of Mongol military campaigns on cable TV’s History Channel and never write a single poem. Who knows? At least he had the foresight to end lines with semi-colons.

I’ve wanted to experiment with the new DTS and Analysis Services offerings in SQL Server 2005 for some time. I’ve spent a fair amount of time with the current versions and have come to love and loath certain features. I decided the first order of business would be to find an interesting set of data to work with from start to finish. Some google searches for raw data in CSV format turned up several possibilities.

The Federal Justice Statistics Resource Center collects information on crime, but inappropriate use of the data violates three federal regulations. Next.

The Baseball Archive is highly touted as having the most complete set of baseball statistics anywhere. These stats are already sliced and diced to death. Next.

Carnegie Mellon has StatLib – a collection of various datasets and statistical software. There were some interesting possibilities here.

The Center For Disease Control and Prevention publishes tons of data, but not all in easy formats, and I’m already hip deep in mortality data during the day. Next.

A Jounalist’s Database of Databases has a collection of links to national and international sources of data – this is where I found the following entry.

The Bureau Of Transportation Statistics has plenty of information about planes, trains, and automobiles. The web interface allows you to pick which fields to export. One data set that caught my eye was the on-time performance of domestic flights. It’s a simple CSV import and might make for an interesting OLAP cube. One month of data looks like it has about 1 million records.

Perhaps if Coleridge were alive today, and had a bad experience at the airport, he’d write something like this:

Day after day, day after day,
We stuck, nor breath nor motion ;
As idle as a painted jet,
On this tarmac of commotion.

Remoting Over TDS

Sunday, November 21, 2004 by scott

This week saw the release of a long awaited white paper: Using CLR Integration in SQL Server 2005. This white paper is the first step in what I am sure will be an ongoing stream of best practice information for using the common language runtime inside the database engine.

One area in need of some additional guidance will be the design and usage of user defined types (UDTs). In SQL 2005, if you need a columnar data type outside of the normal varchar, int, and datetime etc. types, you can write your own. As the paper outlines, UDTs are not a way to achieve object-relational extensibility, so we won’t want to create a Customer UDT. Instead, UDTs will augment the basic scalar data types in SQL Server. The engine can index a column of UDTs and enforce RI constraints against them.

All of the examples I’ve seen to date deal with UDTs from TSQL and CLR code executing inside database. Wally McClure found the following error message when using a SELECT statement from the query tool:

Could not load file or assembly "Assembly Name, Version=xxxx.xxxx.xxxx.xxxx, Cultere=neutral, PublicKeyToken=null' or one of its dependencies. HRESULT: 0x80070002 The system cannot find the file specified

Niels Berglund explains why this happens. If you use a ToString on the column in the SELECT statement you’ll get back strings, but with no methods invoked you fetch actual instances of the UDT type. The query tool’s appdomain has to load the assembly for this type. Unless the assembly is properly deployed the load will fail.

The interesting thought here is that there is a new .NET remoting channel in town: the tabular data stream protocol used natively by SQL Server. Along with this capability comes all of the intricacies of sharing type across a boundary, like versioning and GAC deployments. Whoa.