Versioning Databases – The Baseline

Friday, February 1, 2008

Continuing from the last post (Three Rules for Database Work), I wanted to drill into some database versioning strategies that have worked well for me.

Caveats and Considerations

As a preface, let me say there are many different strategies that can work. I'm not presenting the one true way. My goal is to roll out database changes in a consistent, testable, reproducible manner. I'm used to working with largish databases that are packaged to install behind a customer's firewall. I don't want to use naïve update scripts that run for 6 hours. Concerns like this bias my thinking and strategies. Every application should manage database change, but you'll have to decide on the simplest strategy that works for your environment.

The Baseline

The first step in versioning a database is to generate a baseline schema. This is the starting point for versioning a database. After you've published the baseline to an authoritative source, any changes to the schema require a schema change script (a topic for the next post, because I have a feeling this is going to be long for a blog post).

You are probably not going to baseline the database on day 1 of a project. I'd suggest letting the early design of the schema settle in a bit so you are not creating a huge number of change scripts. This might sound like I'm suggesting you need to do some big, up front schema design – but that's not true. You can get pretty far into an application these days with in-memory data, fakes, stubs, mocks, and unit tests. Once the model in your code starts to stabilize, you can start thinking of the schema required to persist all the data. If you are ORMing, you can even generate the first schema from your model.

On the other hand, maybe your project and database have already been around a couple years. That's ok – you can baseline today (tomorrow at the latest), and manage change moving forward.

But How?

If you want to do things the hard way, then open a new file in a text editor and write all the SQL commands that will create every table, constraint, function, view, index, and every other object in your database. You'll also want to include commands that populate lookup tables with static data and include any bootstrap data needed by the application. Test the new script against a fresh database server and if successful, commit the file to source control. Consider your schema baselined!

Nobody really does this step the hard way, though. Most of us use tools that we point to a database, and the tools generate one or more scripts for us. Some people like to generate everything into one big script file. Others like to generate one script file for every database object. SQL Server Management Studio provides both option when you elect to script a database. I've seen both approaches work, but the "one file per object" approach feels cumbersome on a day to day basis, and unwieldy if the number of objects grow into the thousands.

Personally, I like to take a hybrid approach. I like to keep all of the SQL needed to create tables, constraints, defaults, and primary indexes in a single file. Any views, stored procedures, and functions are scripted one per file.

If you go the multiple file approach, make sure to write a batch file, shell script, application, or some other form of automation that can automatically locate and run all of the script files required to install the database. Human intervention in this process is a step backwards.

Also, many tools like to include CREATE DATABASE commands and/or include database names in the scripts they generate. You'll want to purge any reference to a hardcoded database name. You want the name to be configurable (a default name is fine), and you probably want to support multiple databases for your application on the same database instance (mostly for testing).

Whichever approach you take (one single file, or multiple files), you now have scripts that can recreate the database schema on any developer machine, test machine, or production machine. Everyone uses the exact same schema. Congratulations! You've just increased the quality of your software because the database can be reliably reproduced in any environment.

I Almost Forgot the Most Important Ingredient

At some point in the future, the schema will have to change. Before you baseline the database you need to add a table to record these schema changes. The following table is the kind of table I'd use to track every change to a database.

CREATE TABLE [dbo].[SchemaChanges](
   [ID] [int]
IDENTITY(1,1) NOT NULL,
   [MajorReleaseNumber] [varchar](2)
NOT NULL,
   [MinorReleaseNumber] [varchar](2)
NOT NULL,
   [PointReleaseNumber] [varchar](4)
NOT NULL,
   [ScriptName] [varchar](50)
NOT NULL,
   [DateApplied] [datetime]
NOT NULL,

    
CONSTRAINT [PK_SchemaChangeLog]
        
PRIMARY KEY CLUSTERED ([SchemaChangeID] ASC)
)

The first baseline schema script should, as the last step, officially install version 1.0 of the database:

INSERT INTO [SchemaChangeLog]
       ([MajorReleaseNumber]
       ,[MinorReleaseNumber]
       ,[PointReleaseNumber]
       ,[ScriptName]
       ,[DateApplied])
VALUES
       ('01'
       ,'00'
       ,'0000'
       ,'initial install'
       ,GETDATE())

Comments
Holger H. Friday, February 1, 2008
Hi Scott,
I didn't find in SQLWB 2005 any option to script a database to individual files. Sorry to be so blind, but: where do I have to look?

Thanks,
Holger

scott Friday, February 1, 2008
@Holger:

Sure - in the Object Explorer window, right-click a database, go to Tasks in the context menu, and then click on "Generate Scripts". There are a plethora of options to set there.

Sorry for not providing more detail. I'm trying to bang out these posts without writing the equivalent of a book :)
Sean Friday, February 1, 2008
Thanks for the schema table tip! That's a piece I was missing.

When I started doing this I was using SSMS as well but made the switch to Red Gate SQL Compare a while back. Much smoother process now.
Shawn Friday, February 1, 2008
In regards to creating a baseline script, if it's a green field project here's what has worked well for me in the past.

Create a fully automated build script for the DB which drops the DB if it exists and then builds it from scratch including any "dev test" test data and static data.

Everytime a dev updates from source control, they rebuild their local DB. This requires us to only keep a create script around and no alter scripts (remember this is green field).

Once release one is out the door, we start writing alter scripts.

I've found this very easy to setup and maintain since we can make drastic changes to the model without the need to write (sometimes) time consuming alter scripts, at least in the beginning of the project. This also has the benefit of being able to use the built in SQL Server Management tools and then export the script when done if you aren't using an OR/M tool to create your script.
scott Saturday, February 2, 2008
@Sean:

That's a good approach. I wish every project was a green field...
Alvin Saturday, February 2, 2008
Scott, do you know any tool (or script) that can script out data?
Jesse Saturday, February 2, 2008
Here's an open source tool I wrote that uses SMO to generate scripts from the cmd line. It's useful if you want to automate things.

http://www.codeplex.com/scriptdb
scott Saturday, February 2, 2008
Alvin: The RedGate tools can script data.

Also, if you use MS SQL Server see: weblogs.asp.net/...
scott Saturday, February 2, 2008
Plus there is Jess's tool :)
Zubair.NET! Sunday, February 3, 2008
@Alvin,

Here's another tool on codeproject.com that makes my life alot easier.

www.codeproject.com/.../SQLStatementGenerator.aspx
Gonzalo Sunday, February 3, 2008
Nice posts about versioning; but how do you deal with scripts that need to be back-ported to maintenance branches?

Say you have your trunk or main code-line on version "02.01.0001", and you add a new DB script to fix a bug. You commit, everything is fine.

Then, you need to fix the same bug on a previous branch, say "01.09.0050", you merge everything in your branch, commit, and everything is OK.

But, what happens when the branched version "01.09.0050" is updated to "02.01.0001" or higher version? How do you keep an 'identity' of the script to avoid executing it again?

The only possible way is to have idempotent scripts?
scott Monday, February 4, 2008
@Gonzalo:

It's tricky as I point out here: odetocode.com/.../11746.aspx, and I can't say I have the perfect answer. Your scenario (merge mainline into branch) isn't one my process will handle smoothly. The scripts in this case will have to be idempotent, which I usually don't worry about. It might mean you have to modify the published script in the trunk to make it idempotent, which is really evil, I know. You can call me bad names now :)

Will Gage Monday, February 4, 2008
First of all, thanks for the insights! I like the general approach that you're taking, but I'd argue that requiring a separate database to manage your database update scripts might be overkill. I much prefer the idea that you can build your whole app out of just what's in the source control system, without other external dependencies, and I think that's possible here. Rather than a table which maintains your list of update scripts, you can simply have a directory which contains update scripts that adhere to a file naming convention: ${id}.${majorReleaseNumber}.${minorReleaseNumber}.${pointReleaseNumber}.sql . The date information should be maintained by the version control system, but otherwise, the file naming convention captures all of the information you were keeping in a table. Now all your developers need to do to get a list of changes is: ls.
Bob Archer Monday, February 4, 2008
We use a tool called DbGhost. It is designed for managing DB change and source controlling the schema/data scripts.

We only generate change scripts for each release version... not for each build. For the interium build versions each dev has a copy of Dbghost and can use it to update thier local db from script updated from subversion. Or, they can grab the "template" db's backup to get up to date if they'd like.

This works very well for us. DbGhost also has a command line interface which makes it easy to run from NAnt or whatever build tool you use.
Paul Tuesday, February 5, 2008

This might sound like I'm suggesting you need to do some big, up front schema design – but that's not true.

I disagree, this is something you should be doing! Remember your app will probably not be the only thing accessing the data. Having a well-designed schema will help not just your app, but all other apps that will come along that need access to this data.
David Keaveny Friday, February 8, 2008
Any thoughts on how to handle SQL object dependencies? I can see that with your hybrid approach, you can script the tables with their indexes and constraints etc and presumably SQL Server will sort out dependencies, especially if you use schema binding.

So what happens if you have your views, UDFs and SPs in one-file-per-object and you have some batch file to execute all the scripts - how do you ensure that stored procedure B, which is depended on by stored procedure A, is created first?
scott Friday, February 8, 2008
@David:

Actually, there is another text file checked in that lists all the sproc and view files to apply to a database. It's more work to maintain the list, but the tool will then apply the sql files in order, so you can control the dependancies. I'm told Oracle can do this automagically...
Dev2010 Thursday, January 7, 2010
To David,

One approach that works where I work is to prefix the name of each database object file with a number, ascending in order of object dependency.

The execution build script just retrieves files name ascending and everything gets run in the correct order relative to dependencies. It has the added benefit that most file browsing tools allow you to sort by name easily, quickly allowing you to verify that dependencies are satisfied.
gravatar Dejan Thursday, September 9, 2010
You need to change a SQL script

here is an error

CONSTRAINT [PK_SchemaChangeLog]
PRIMARY KEY CLUSTERED ([SchemaChangeID] ASC)

Comments are now closed.
by K. Scott Allen K.Scott Allen
My Pluralsight Courses
The Podcast!