OdeToCode IC Logo

Versioning Databases – The Baseline

Friday, February 1, 2008

Continuing from the last post (Three Rules for Database Work), I wanted to drill into some database versioning strategies that have worked well for me.

Caveats and Considerations

As a preface, let me say there are many different strategies that can work. I'm not presenting the one true way. My goal is to roll out database changes in a consistent, testable, reproducible manner. I'm used to working with largish databases that are packaged to install behind a customer's firewall. I don't want to use naïve update scripts that run for 6 hours. Concerns like this bias my thinking and strategies. Every application should manage database change, but you'll have to decide on the simplest strategy that works for your environment.

The Baseline

The first step in versioning a database is to generate a baseline schema. This is the starting point for versioning a database. After you've published the baseline to an authoritative source, any changes to the schema require a schema change script (a topic for the next post, because I have a feeling this is going to be long for a blog post).

You are probably not going to baseline the database on day 1 of a project. I'd suggest letting the early design of the schema settle in a bit so you are not creating a huge number of change scripts. This might sound like I'm suggesting you need to do some big, up front schema design – but that's not true. You can get pretty far into an application these days with in-memory data, fakes, stubs, mocks, and unit tests. Once the model in your code starts to stabilize, you can start thinking of the schema required to persist all the data. If you are ORMing, you can even generate the first schema from your model.

On the other hand, maybe your project and database have already been around a couple years. That's ok – you can baseline today (tomorrow at the latest), and manage change moving forward.

But How?

If you want to do things the hard way, then open a new file in a text editor and write all the SQL commands that will create every table, constraint, function, view, index, and every other object in your database. You'll also want to include commands that populate lookup tables with static data and include any bootstrap data needed by the application. Test the new script against a fresh database server and if successful, commit the file to source control. Consider your schema baselined!

Nobody really does this step the hard way, though. Most of us use tools that we point to a database, and the tools generate one or more scripts for us. Some people like to generate everything into one big script file. Others like to generate one script file for every database object. SQL Server Management Studio provides both option when you elect to script a database. I've seen both approaches work, but the "one file per object" approach feels cumbersome on a day to day basis, and unwieldy if the number of objects grow into the thousands.

Personally, I like to take a hybrid approach. I like to keep all of the SQL needed to create tables, constraints, defaults, and primary indexes in a single file. Any views, stored procedures, and functions are scripted one per file.

If you go the multiple file approach, make sure to write a batch file, shell script, application, or some other form of automation that can automatically locate and run all of the script files required to install the database. Human intervention in this process is a step backwards.

Also, many tools like to include CREATE DATABASE commands and/or include database names in the scripts they generate. You'll want to purge any reference to a hardcoded database name. You want the name to be configurable (a default name is fine), and you probably want to support multiple databases for your application on the same database instance (mostly for testing).

Whichever approach you take (one single file, or multiple files), you now have scripts that can recreate the database schema on any developer machine, test machine, or production machine. Everyone uses the exact same schema. Congratulations! You've just increased the quality of your software because the database can be reliably reproduced in any environment.

I Almost Forgot the Most Important Ingredient

At some point in the future, the schema will have to change. Before you baseline the database you need to add a table to record these schema changes. The following table is the kind of table I'd use to track every change to a database.

CREATE TABLE [dbo].[SchemaChanges](
   [ID] [int]
   [MajorReleaseNumber] [varchar](2)
   [MinorReleaseNumber] [varchar](2)
   [PointReleaseNumber] [varchar](4)
   [ScriptName] [varchar](50)
   [DateApplied] [datetime]

CONSTRAINT [PK_SchemaChangeLog]

The first baseline schema script should, as the last step, officially install version 1.0 of the database:

INSERT INTO [SchemaChangeLog]
       ,'initial install'