Versioning Databases – Change Scripts

Saturday, February 2, 2008

After considering the three rules and creating a baseline, an entire team can work with a database whose definition lives safely in a source control repository. The day will come, however, when the team needs to change the schema. Each change creates a new version of the database. In my plan, the baseline scripts created a schema change log to track these changes.

schema change log 1

By "change", I mean a change to a table, index, key, constraint, or any other object that requires DDL, with the exception of views, stored procedures, and functions. I treat those objects differently and we'll cover those in the next post. I also include any changes to static data and bootstrap data in change scripts.

Alternatives

Before jumping into an example, I just wanted to point out there are many ways to manage database changes and migrations. Phil Haack describes his idempotent scripts in Bulletproof Sql Change Scripts Using INFORMATION_VIEWS. Elliot Smith and Rob Nichols are database agnostic with Ruby Migrations. If you know of other good articles, please post links in the comments. Again, the goal is to manage change in the simplest manner possible for your project.

Example, Please

The team just baselined their database that includes a Customers table, but now wants to add a new column to store a Customer's shoe size. They also realized their OrderState table, a lookup table that include the values 'Open', and 'Shipped', now needs a new value of 'Canceled'. They would need to create a schema change script that looks like the following. Notice I can include as many changes as needed into a single script.

File: sc.01.00.0001.sql

ALTER TABLE Customer
  
ADD ShoeSize int NOT NULL DEFAULT 0
GO

INSERT INTO OrderStates
   (OrderStateID, OrderState)
   
VALUES(3, 'Cancelled')
GO

Whoever writes this change script will test it thoroughly and against a variety of test data, then commit the change script into source control. The schema change is officially published. The schema change will start to appear in developer workspaces as they update from source control, and on test machines as new builds are pushed into QA and beyond.

It's good to automate database updates as ruthlessly as possible. Ideally, a developer, tester, or installer can run a tool that looks at the schema version of the database in the SchemaChangeLog table, and compares that version to the available updates it finds in the form of schema change scripts that reside on the file system. Sorting files by name will suffice using the techniques explained here. Command line tools work the best, because you can use them in build scripts, developer machines, and install packages. You can always wrap the tool in a GUI to make it easy to use. Here's the output I recently saw from a schema update tool (some entries removed for brevity):

Connected to server .
Connected to database xyz
Host: SALLEN2 User: xyz\sallen
Microsoft SQL Server 2005 - 9.00.3054.00 (Intel X86)

Schema history:
05.00.0000 initial-install on Dec 13 2007 11:26AM
05.00.0001 sc.05.00.0001.sql on Dec 13 2007 11:26AM
05.00.0002 sc.05.00.0002.sql on Dec 13 2007 11:26AM
...
05.00.0012 sc.05.00.0012.sql on Dec 13 2007 11:26AM
05.01.0000 sc.05.01.0000.sql on Dec 13 2007 11:26AM
05.01.0001 sc.05.01.0001.sql on Dec 13 2007 11:26AM
...
05.01.0019 sc.05.01.0019.sql on Dec 13 2007 11:26AM
Current version:
05.01.0019

The following updates are available:
sc.05.01.0020.sql
sc.05.01.0021.sql

Once a developer runs the schema change in this post, they should see the following in the schema change log:

Schema Change Rules and Tips

Once a script is published into source control, it cannot be changed! Once someone updates their database with an update script, they should never have to run the same script on that same database.

The responsibility of updating the SchemaChangeLog table should reside with the tool that applies the updates. This way, the tool can ensure the script runs to completion before the recording the update into the change log.

Always backup a production database before applying a change script. If a change script happens to fail with an error, you can at least get the database back into a known state.

Some people will wonder what happens if multiple developers are all working on change scripts at the same time. Fortunately, the source control repository is a great mediator. I can't add an update script with the same name as your update script – I'll have to rev my version number if you beat me to the commit by a few seconds.

Of course, changes are not just about schema changes. You also have to write the code to migrate data. For instance, if for some reason we moved the ShoeSize column from the Customers table to the CustomerDetails table, the update script will also need to move and preserve the data. Data manipulation is often the trickiest part of change scripts and where they need the most testing.

Summary

Managing change with your database yields a great many benefits. Since the schema change scripts are in source control, you can recreate your database as it looked at any point in time. Is a customer reporting a bug on build 3.1.5.6723? Pull the source code tagged or labeled with that version number and run the baseline, then all schema change scripts included in the tag. You now have the same database and have a much better chance to recreate the bug. Also, changes move from development to test, and ultimately into production in a consistent, orderly, and reproducible manner.

I've skipped over quite a few nitty-gritty details, but I hope you get the general idea. I've used this approach for years and its worked well. Still, I feel the approach is a bit long in the tooth and I'm looking for ways to improve. Feedback is appreciated. What do you think?

Coming up next – managing views and stored procedures.


Comments
Jason Stangroome Saturday, February 2, 2008
I like this method for versioning databases but I have encountered two issues trying to use it myself:

1. When you add a column to a table, it really needs to be scripted as an update script for existing databases and also changed in the "create entire database" script so that deploying a new database doesn't require the entire history of update scripts to be applied too. This synchronisation is tricky to enforce.

2. Personally, I am very comfortable writing a change script to rename an existing column, change it from varchar to int also, and maintain all the existing data too. Not everyone on the team is. Most auto-scripting tools won't get these more complex changes right. Delegating one guy on the team as the DB guy isn't good when he's on leave or otherwise unavailable.

How have you dealt with these situations?

Regards,
scott Saturday, February 2, 2008
@Jason. The quick answers are:

1. I only update the baseline periodically to avoid too much overhead in this step. A new database will just have to run a few update scripts now and then.

2. I agree - this is where the tools can produce code I can't use. Most of the update scripts that deal with data migrations are hand written. Leaving one person in charge of DB work is always a problem waiting to happen. Someone else needs to step up and get comfortable with SQL.

I think this might be a good topic for yet another blog post.
Steve Campbell Saturday, February 2, 2008
I prefer to keep the database structure in a declarative form (xml), and then use code-generation (via the build scripts) to create the final SQL.

This has the benefit of each generated script being a complete baseline. Also, it eliminates a lot of programmer error. The cost is that each script needs to support being run multiple times, i.e. only create new tables and columns when they do not already exist, etc.

Some (mostly data manipulation) SQL still needs to be hand-written of course. We have versioned SQL scripts for those, similar to yours. The build appends the individual SQL files together into a single final SQL file, which can then be run (or re-run) safely.
Mickaël Sauvée Monday, February 4, 2008
I agree with Steve Campbell, I prefer have a declarative form of the schema in a xml file (or a set of files).
I prefer also manage schema as source, meaning store in source cotnrol the full version of the schema and rely on the source control to get the diff, then use tools to generate the scripts to path my database. It's obviously an heavier solution than your, but it's more reliable (we experiments both approach in the past).
I also avoid as much as possible to mix data and meta data (so not save the schema in the database) and separate demo data from schema.

Your approach as the huge benefit to be simple, but may not be reliable enough for complex system.
German B. Monday, February 4, 2008
Hi Scott,
You have covered overlapping version numbers by different developers, but what about the possible conflicts between their respective DB changes? I think that, if you got into the repository just before me, I should first get your change and make sure it works for me, however I'd be testing on my own schema which already has my changes. And if my changes are to be checked-in, they will be a version following yours, and not preceding yours like it was in my schema. How critical do you think these situations can get, and how do you think they should be dealt with?
scott Monday, February 4, 2008
@German:

In practice I've never seen this cause a problem, but I admit there is the potential for a problem.

To avoid this I've seen many teams that say all schema changes have to go through a single person on a team. Perhaps there is a DBA involved and he/she will be the one to test/verify/publish the change script. If you have a team that communicates and works together well I don't think you need to get to that extreme, though. On bigger teams it would be the safer option.
flipdoubt Thursday, February 7, 2008
@Scott: Great article. I came up with an eerily similar solution before reading this article. If I can ask to verify an example listing of unique files you would check into source control, would your repository (we use SVN) look something like this:

/db
/db/baseline.sql
/db/sc.05.00.0012.sql
/db/sc.05.01.0000.sql
/db/sc.05.01.0001.sql

In my vision, (a) you never revise the baseline to keep the starting point simple and (b) you check in change scripts "side-by-side", so you never revise or delete these either.
scott Friday, February 8, 2008
@flipdoubt: That's correct.

I do revise the baseline as a new release approaches. The new baseline would change the major or minor version number of the 'initial install' so that only schema change scripts after that version are applied to a new database created with the new baseline.

I never revise or delete change scripts. Well, I have revised change scripts, but thats rare and only because a serious error wasn't caught until after check-in (bad, bad, bad, I know).
Rick Glos Saturday, February 9, 2008
I'd really recommend not using change scripts. To truly be 1st class, you just store the latest and greatest. Just like your C# source code. You don't store the changes you made to your source code, ala - delete line 53, then add int i = 0 on line 54, etc...

You'll wind up managing tons of little change scripts. Then you'll have problems like our team did where db objects get changed multiple times in development and someone isn't sure who's first anymore, changes start colliding, etc - just like German mentions. It was a nightmare. We've switched to generating change scripts and never looked back. What benefit does the customer get when you change the same proc 5 or more times because your changed it over your internal development cycle between releases? None.

Use a tool; roll your own, Red Gate SQL Compare (what we use), VS DB Pro, etc, to generate the change script from point A to point B. (This can all be automated from the command line, in fact our nightly CI build and deploy build server generates the change script by compare the master schema so that all the developer has to worry about is changing the single script for their object [ie. table, proc, view, etc])

Store the DB version in a simple stored procedure that returns a value. Checkout the SQL Reporting ReportServer db as a perfect example, GetDBVersion.

It really simplified the process.
scott Saturday, February 9, 2008
@Rick:

Changing a stored proc doesn't require a change script, so that's no overhead at all.

I agree tools like SQL Compare can really speed up this process. But, like I said when I started this series I found the tools unusable because the diff scripts they generated will run for 32 hours on a large production database. The developers have to understand the ramifications of trying to insert a new column between existing columns (SQL Compare and SSMS will copy the entire contents of a table to do this). We can afford some downtime, but not an entire day's worth.
Ash Tewari Tuesday, February 19, 2008
Scott, great set articles. I am providing a free tool to implement this system - www.tewari.info/...
gravatar vivek Nema Friday, November 20, 2009
Hi,

Nice concep, nice idea, good approah.
Will you be able to share your script to me.?

looking forward for your reply

regards, vivek
Comments are now closed.
by K. Scott Allen K.Scott Allen
My Pluralsight Courses
The Podcast!