OdeToCode IC Logo

BI and SQL Server 2005

Monday, November 29, 2004 by scott
Just to finish up the story I started a few days ago (and get the albatross off my neck, so to speak), here is how the exploration of SQL 2005 DTS and OLAP features turned out. The end result was a cube of all domestic commercial flights for January 2003 (about 560,000 records) with delay, cancellation, flight time, taxi time information and more.

There were a few disappointments, like the handful of unhandled exceptions that crashed the Business Intelligence Development Studio. I have a feeling some of the problems were related to the source control features of the beta, as most of the weirdness would occur after a checkout operation (I’m using SourceGear Vault at home – Vault is free for a single user). I’ve never had a problem with Vault, so I’m guessing all the problems are in the BIDS beta.

The most frustrating problem was getting the OLAP designer to pick up schema changes in the underlying fact table. I had to delete the cube, dimensions, and data source views and start over.

The good news is, 2005 not only fixes many of the frustrations in 2000, but adds a good many features.

It requires considerably less work to get a basic cube up and running thanks to the ability of the IDE to analyze relationships and figure out a base set of dimensions, levels, and measures. Perhaps 160 mouse clicks down to 6.

The brain dead cube browser in 2000 has been replaced with what appears to be a version of the pivot table in Office Web Components.

The GUI remains responsive during cube processing (although it does not appear to show the number of records processed).

Everything appears to save as XML, the project, the structures, the packages. The source control integration and deployment options appear that they will be CM and deployment friendly without hand rolling tools.

The performance of the import and of Analysis Services processing was very good considering this is all running on a Virtual PC with 512 MB of RAM.

It appears everything is looking pretty good, hopefully the documentation and features will shore up by beta 3. There are so many options and buttons it is hard to stay oriented at times. In one week there will be a flurry of SQL Server 2005 webcasts, unfortunately only 1 appears to be related to Analysis Services. Hopefully there will be more coming down the road.

OdeToCode Links For November 28

Monday, November 29, 2004 by otcnews

Give Me More Text

Saturday, November 27, 2004 by scott
To continue with my previous post (Give Me Text), I wanted to show a little UI from the SQL Server 2005 Business Intelligence Studio. There are obvious improvements in the feature set formerly known as Data Transformation Services. The toolbox is brimming with tasks and transformations you can drag and drop into the Data Flow designer. Stuff like “Pivot”, “Aggregate”, and “Fuzzy Grouper”.

There are parts, however, that remind me why I still prefer programming with keyboards and text files to GUIs. GUIs do not scale. In order to perform multiple operations I need to perform multiple actions. Like the following dialog: 

The problem here is it took at least 34 mouse clicks to change the behavior of each column. from “Fail Component” to “Redirect Row” (to send erroneous records from a flat file to an error file instead of halting). UI designers do this to me all the time, and it’s the reason I get frustrated with them.

Text files scale because you can select one operation and have it applied to multiple locations. UIs do not do this too well – they make you click, click, clickety drag click. It’s why I try to avoid the DTS designer in 2000. The OLAP designer in 2000 is even worse.

The good news in 2005 is that there really is a text file underneath, and perhaps with some more documentation on the XML format these tasks might become easier.

It might also be interesting to see what UI accessibility improvements come along with Avalon and Longhorn. Perhaps they can provide some sort of meta layer where I can group similar controls easily and modify them with one action.

Even better would be if I could speak to the computer and ask it to “change all error drop down selections to redirect rows”. The magical accessibility layer can find drop down controls under the Error column and set everything without me moving a mouse.

That would be cool, but I’m still not giving up my text files.

Give Me Text

Saturday, November 27, 2004 by scott
Every now and then someone will talk about the sad fact that we still use text files and text editors to build software. I used to think about this when I was in the DTS or OLAP cube designers of SQL Server 2000. I’d think about how writing software with a GUI wasn’t all it was cracked up to be. UI designers can be tedious, and in the Microsoft world they usually save data in opaque blobs.

Because of the blob factor, both DTS and OLAP resist attempts at serious configuration management. The deployment strategy for an OLAP cube is to drag the cube with a mouse from one server to the next in the MMC. I kept looking for the “generate script” option in the MMC – there is none*. One of my first tasks with OLAP was to write a utility to walk the object model exposed by OLAP’s COM interface and serialize the structure into an XML file.

XML is easier on the eyes than a structured storage file, in the same way it looks prettier on the wire when compared to a DCOM packet. You can check XML into source control, diff it, parse it, query it, and transform it. XML is friendly and comfortable. Using an XML format paid off for us recently when we needed to localize cubes for a client in another country. Add a little XPath, a little config file, and mes cubes parlent français.

In playing with the Business Intelligence Development Studio of SQL Server 2005, I am happy to see both Integration Services (DTS) packages and OLAP cubes save into files with an XML format. I’m even happier to see the “Add Solution To Source Control” option. It’s good to see progress. It’s good to see text!

* On the SQL 2000 Resource CD Microsoft released the MetaDataScripter. You can point this tool at an OLAP database and it will generate VBScript that can create the cube. For non-trivial databases though, the resulting vbs script can easily exceed 1MB. It is text, but it is not too friendly.

Array Size

Saturday, November 27, 2004 by scott
During my occasional forays into the land of Dim, Sub, and IsNothing, I’ve needed to remind myself that the number in an array definition represents the upper bound.

Dim months(11) As String 'an array with upper bound 11 has 12 elements

It feels weird, but as long as I kept telling myself the number represented the upper bound, and not the length of the array, everything seemed OK.

Then one day I needed an array of 0 elements.

Dim foo(-1) As String 'an array of 0 elements is declared with 0 – 1 = -1. 

I can’t get this statement out of my head. The illusion is broken. I now think of the number not as the upper bound, but as the length of the array minus 1.

Suddenly, writing array code in VB requires more thought and attention to detail with all of these subtractions going on.

 

In Search Of Raw Data

Thursday, November 25, 2004 by scott
Perhaps if Samuel Taylor Coleridge were alive today, he’d write verse like the following:

Data, data, every where,
And all the disks did shrink;
Data, data, everwhere,
Nor any bit to think.

Then again, if Coleridge were alive today maybe he’d be spending all of his time watching reenactments of Mongol military campaigns on cable TV’s History Channel and never write a single poem. Who knows? At least he had the foresight to end lines with semi-colons.

I’ve wanted to experiment with the new DTS and Analysis Services offerings in SQL Server 2005 for some time. I’ve spent a fair amount of time with the current versions and have come to love and loath certain features. I decided the first order of business would be to find an interesting set of data to work with from start to finish. Some google searches for raw data in CSV format turned up several possibilities.

The Federal Justice Statistics Resource Center collects information on crime, but inappropriate use of the data violates three federal regulations. Next.

The Baseball Archive is highly touted as having the most complete set of baseball statistics anywhere. These stats are already sliced and diced to death. Next.

Carnegie Mellon has StatLib – a collection of various datasets and statistical software. There were some interesting possibilities here.

The Center For Disease Control and Prevention publishes tons of data, but not all in easy formats, and I’m already hip deep in mortality data during the day. Next.

A Jounalist’s Database of Databases has a collection of links to national and international sources of data – this is where I found the following entry.

The Bureau Of Transportation Statistics has plenty of information about planes, trains, and automobiles. The web interface allows you to pick which fields to export. One data set that caught my eye was the on-time performance of domestic flights. It’s a simple CSV import and might make for an interesting OLAP cube. One month of data looks like it has about 1 million records.

Perhaps if Coleridge were alive today, and had a bad experience at the airport, he’d write something like this:

Day after day, day after day,
We stuck, nor breath nor motion ;
As idle as a painted jet,
On this tarmac of commotion.

What's Cool for November 22

Monday, November 22, 2004 by otcnews