Versioning Databases – Views, Stored Procedures, and the Like

Saturday, February 2, 2008

What started as a short brain dump is turning into a longer series of posts, thanks to all the feedback and questions. In this post, I want to explain some of my thoughts on controlling objects like database views, stored procedures, functions, and triggers.

But first...

I haven't actually used a trigger in years. This isn't to say that triggers aren't valuable, but I've tended to shy away. Jon Galloway has posted a good example of what you can do with triggers.

Secondly, stored procedures have fallen out of favor in my eyes. I came from the WinDNA school of indoctrination that said stored procedures should be used all the time. Today, I see stored procedures as an API layer for the database. This is good if you need an API layer at the database level, but I see lots of applications incurring the overhead of creating and maintaining an extra API layer they don't need. In those applications stored procedures are more of a burden than a benefit.

One File per Object

My strategy is to script every view, stored procedure, and function into a separate file, then commit the files to source control. If someone needs to add a new view, they script the view into a file and commit the file to source control. If someone needs to modify a view, they modify the view's script file and commit again. If you need to delete a view from the database, delete the file from source control. It's a relatively simple workflow.
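
For example, a script file for a view might look like the one below. The object names here are made up for illustration; the guard at the top makes the file safe to run against any database, whether or not the view already exists.

    -- File: Views\vw_ActiveCustomers.sql
    -- Hypothetical one-object script file (SQL Server 2005 syntax).
    IF OBJECT_ID('dbo.vw_ActiveCustomers', 'V') IS NOT NULL
        DROP VIEW dbo.vw_ActiveCustomers
    GO

    CREATE VIEW dbo.vw_ActiveCustomers
    AS
        SELECT CustomerID, Name, Email
        FROM dbo.Customers
        WHERE IsActive = 1
    GO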

The magic happens when a developer, tester, or installer updates from source control and runs a tool that updates their local database. The tool uses a three-step process:

  1. The tool applies new schema changes by comparing the available schema change files to the SchemaChangeLog records in the database.
  2. The tool will DROP all stored procedures, views, and functions in the database (a sketch of such a drop script appears after this list).
  3. The tool will run all of the scripts needed to add the views, stored procedures, and functions back into the database. 
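
I won't show the tool's internals here, but step 2 can be sketched in T-SQL. This is only an illustration built on the SQL Server 2005 catalog views, not code from my tool, and it assumes none of the objects use SCHEMABINDING (which would make drop order matter):

    -- Sketch: generate and execute a DROP statement for every view,
    -- stored procedure, and function in the database.
    DECLARE @sql nvarchar(max)
    SET @sql = N''

    SELECT @sql = @sql +
        CASE o.type
            WHEN 'V' THEN N'DROP VIEW '
            WHEN 'P' THEN N'DROP PROCEDURE '
            ELSE N'DROP FUNCTION '
        END
        + QUOTENAME(SCHEMA_NAME(o.schema_id)) + N'.' + QUOTENAME(o.name)
        + N';' + CHAR(13)
    FROM sys.objects o
    WHERE o.type IN ('V', 'P', 'FN', 'IF', 'TF')  -- views, procs, scalar/inline/table functions
      AND o.is_ms_shipped = 0                     -- leave system objects alone

    EXEC sp_executesql @sql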

Say What?

People often wonder why I want to drop all of these objects and then re-add them. After all, you could lose work if you update your database before scripting out a view you were in the middle of creating.

The simple reason is to find problems as early as possible. If someone commits a schema change and the change removes a column used by a view, you'll find out there is an error early – hopefully before the build escapes for testing. Likewise, if someone checks in a view but forgets to publish a schema change the view needs, someone else is going to show up at their desk a few minutes later asking why they are breaking the software.

A second reason is to avoid some rare errors I've seen. Some databases have a tendency to screw up execution plans when the schema changes underneath a view. Dropping everything and starting over avoids this problem, which is really hard to track down.
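
As an aside, SQL Server also offers sp_refreshview, which rebinds a single view's metadata after the underlying tables change. Dropping and recreating everything is just the blunt, thorough version of the same idea:

    -- Refresh the metadata of one (non-schema-bound) view;
    -- the view name is a made-up example.
    EXEC sp_refreshview 'dbo.vw_ActiveCustomers'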

Of course, this does mean a database will need a little down time to apply updates. I realize not everybody has that luxury.

Summary

Through the power of source control and automation, every database from development to production can keep its schema in sync. If you need to go back in time to see what the database looked like on July 20, 2007, or in build 1.58 of the application, you can do that with this strategy, too.

Next question: what happens if there is a branch in the source control tree? That's a topic for the next post.


Comments
Jon Galloway Sunday, February 3, 2008
I've been down on triggers, too, especially for enforcement of business logic. One place I worked hit the 8K character limit on triggers in SQL 6.5 and solved the problem by renaming all variables to two characters. It was almost impossible to predict what would happen when you tried to insert a new order in the orders table. In the case of logging, though, it finally seemed like the right fit.
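
The shape I mean is just a snapshot trigger along these lines (the table and column names are invented for illustration):

    -- Minimal audit-logging trigger sketch; dbo.Orders and
    -- dbo.OrdersAudit are hypothetical tables.
    CREATE TRIGGER trg_Orders_Audit
    ON dbo.Orders
    AFTER UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON
        -- "deleted" holds the pre-change rows for both updates and deletes
        INSERT INTO dbo.OrdersAudit (OrderID, Status, ChangedAt)
        SELECT OrderID, Status, GETDATE()
        FROM deleted
    END
    GO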

You didn't mention how you're running all the scripts, did you? Are you using an existing tool like the Database Publishing Wizard, or something homegrown?
Wayne Monday, February 4, 2008
Great post, in fact I was pondering DB versioning and how to best approach it. Thanks for the ideas!
John B Monday, February 4, 2008
There are an awful lot of ORMs that take the same approach to pushing database schema changes: wipe the slate clean, and start over again. That's all fine and dandy in a development environment, but eventually, *someone* will actually need to use the application. And at that point, you've got legacy data to worry about. Dropping everything and starting all over simply won't do.

I can appreciate why it seems to be the conventional approach (it's easy!). Surely there has to be a non-destructive approach that is equally effective? That's a discussion I'd like to join in on.
scott Monday, February 4, 2008
@Jon:

I have a home grown tool. I've often thought of making a more generic, open source version. Wanna help? :)

@John: That's why I have a problem with many of the tools out there. I have legacy data to protect. The method I've outlined never destroys data. In fact, the hand-crafted update scripts avoid that scenario and migrate large tables in an effective manner.

The only things I destroy are derived objects (views, stored procedures), and that's just to ensure consistency all around.
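
To make that concrete, a hand-crafted schema change script in this style might read like the following. The names are hypothetical; the point is that the table is altered and backfilled in place, never dropped:

    -- Hypothetical schema change script: the tool runs it once and
    -- records it in the SchemaChangeLog table.
    ALTER TABLE dbo.Customers
        ADD IsActive bit NOT NULL
        CONSTRAINT DF_Customers_IsActive DEFAULT (1)
    GO

    -- Backfill the new column from existing data instead of
    -- recreating the table.
    UPDATE dbo.Customers
    SET IsActive = 0
    WHERE LastOrderDate < '20060101'
    GO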

Mike Desjardins Monday, February 4, 2008
Amen to re-building everything, every time. When I proposed doing this at my previous job, it was met with a lot of resistance. It ended up being a godsend for keeping things consistent and easily identifying what versions were in production/staging/test at any given time.
Wilfred Monday, February 4, 2008
I've been trying to get build automation working in combination with version control. There's still (at least) one thing I can't get my head around.

What about the following scenario:
1) baseline 1 is checked in
2) database view A is changed and checked in
3) a migration script is checked in that relies on the current version of view A
4) database view A is changed again (after these changes it is no longer compatible with the migration script of step 3)

A developer could have revision 3 on his local database. If he updates from version control and builds, the changes of revision 4 would be applied. I love this "incremental build" strategy, as doing a clean baseline and roll forward can be very time consuming in our 1,000+ table schema.

But if a developer does a clean checkout and runs the build script, I can't take the baseline of step 1 and roll forward. If I want to do that, I need two versions of view A, and that doesn't seem to fit the typical approach of version control. But this has to work, as an installation in production will be something very similar.

The other alternative is that the developer making change 4 is also responsible for fixing the broken migration script of step 3. But how is the developer going to notice this? Build automation on a central server could do the trick. It could run an automated build and roll-forward on any check-in. The automated build would break and could notify the developer of revision 4.

Do you have any experience or advice for situations like these?
scott Tuesday, February 5, 2008
@Wilfred:

I'm not understanding your scenario.
If the view is always compiled after all schema change scripts are applied, then it doesn't matter whether you are rolling forward from the baseline or from an existing database to the latest schema change, as the new view won't get recompiled until all the schema changes are in place (I don't consider modifying the view a schema change).
Wilfred Tuesday, February 5, 2008
Sorry for not explaining it any better.

Let's try again :-)

Let's say I do a baseline install and then roll forward to the latest revision. I first get the baseline, which contains view A. Revision 2 causes nothing in the roll forward. Then comes the migration script of revision 3. If that runs with the original version of view A, the migration script might fail. When the developer created the migration script of revision 3, he programmed against the new view structure from revision 2.

I can't install the new objects before applying the migration scripts, either. That would install the latest revision of view A (the one from revision 4) and then run the migration scripts. That might also fail, since the view definition is then newer than the migration script.

The example might seem a bit far-fetched, but this is a very common problem for us with changing database code (packages, procedures, triggers, and business rules). Migration scripts rely on a specific version of this database code to function. With the view and the single migration script I'm just trying to give a simplified example. (Apparently that didn't work ;-) )
Gabor Kecskemeti Tuesday, February 5, 2008
Scott,

Your reason to drop everything and recreate them is to find errors as early as possible. I agree with the goal, but I don't see why dropping and recreating everything is necessary.

You could just as simply apply only the changes. (Your update tool would automatically maintain the version number of each object in a table as it applies them, or if your version control uses global revision numbers like Subversion, just log that single revision number.) After that, the tool would try to recompile all invalid objects, then list every object that is still invalid. You can easily find which objects are invalid from the data dictionary. You get the same result, and catch all the errors just the same, in much, much less time.
Gabor Kecskemeti Tuesday, February 5, 2008
@Wilfred:

Simply have a rule that all migration scripts should access the tables directly, and not use views.
For the same reason, if you use triggers, you should disable all triggers on the table you are running DML against before running the migration script, and re-enable them after. This should be a stored procedure or script that gets the triggers from the data dictionary, so even triggers that didn't exist when you wrote the migration script get disabled. (But it's better to simply not use triggers at all.)
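
On SQL Server 2005 you don't even need the data dictionary for this; a migration script can bracket its DML like so (the table name is invented):

    -- Suspend every trigger on the table, including ones created
    -- after this script was written, then put them back.
    DISABLE TRIGGER ALL ON dbo.Orders
    GO

    UPDATE dbo.Orders
    SET Status = 'MIGRATED'
    WHERE Status IS NULL
    GO

    ENABLE TRIGGER ALL ON dbo.Orders
    GO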
scott Tuesday, February 5, 2008
@Wilfred:

Ah, I see what you mean now. Like Gabor, I have a policy to only touch tables in a migration script. If a sproc or function is necessary in the script, the script is responsible for creating and then dropping the sproc/function.
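
In other words, a migration script that needs procedural logic carries its own, along these lines (a hypothetical sketch, not a real script from my tool):

    -- The migration creates its helper, uses it, and cleans up,
    -- so it never depends on a versioned sproc. All names invented.
    CREATE PROCEDURE dbo.tmp_MigrateOrderTotals
    AS
        UPDATE o
        SET o.Total = x.LineTotal
        FROM dbo.Orders o
        JOIN (SELECT OrderID, SUM(Price * Quantity) AS LineTotal
              FROM dbo.OrderLines
              GROUP BY OrderID) x ON x.OrderID = o.OrderID
    GO

    EXEC dbo.tmp_MigrateOrderTotals
    GO

    DROP PROCEDURE dbo.tmp_MigrateOrderTotals
    GO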

@Gabor: Good one. I'll have to think about this. Maintaining the list of dependencies might be a chore. I know MSSQL has a sproc you can call to list dependencies, but in 2000 it wasn't 100% accurate and seemed to miss some sprocs.
Scott H. Wednesday, February 6, 2008
I've tried recreating all sprocs in a similar way before, but have always had problems with dependencies among them. Inevitably, one sproc is being created while a sproc/function it depends on is not yet created.

How does your tool manage dependencies among views, sprocs, and functions?
scott Wednesday, February 6, 2008
@Scott H:

Instead of just pulling in all view/sproc scripts from the filesystem, I can have a text file that lists the view/sproc scripts to run in order. Taking this step means a little more work to maintain the list of scripts, but then the dependencies are manageable.
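
One low-tech way to express that list is a master script run through sqlcmd, using its :r include directive and ordering the files so dependencies come first (the file names are hypothetical):

    -- RunAll.sql: executed with  sqlcmd -S myserver -d mydb -i RunAll.sql
    -- :r pulls each file in inline; list the files so that objects
    -- referenced by later scripts are created first.
    :r Views\vw_ActiveCustomers.sql
    :r Views\vw_CustomerOrders.sql
    :r Procs\usp_GetCustomerOrders.sql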
Gabor Kecskemeti Wednesday, February 6, 2008
@Scott: I use Oracle, and there dependency tracking is automatic (and accurate, of course). When you modify an object (be it a table, view, proc, etc.), it automatically changes the status of all dependent objects to "invalid", which means they need to be recompiled. There is also a built-in proc which will try to recompile all invalid objects in a schema. So what I suggested can actually be very easily automated.

Oh, and because of this you don't need to track dependencies. If a procedure you try to create/modify depends on another one that is not yet created or is an older version, the procedure will still be created, but it will be invalid. After you've created everything (in whatever order), you just recompile, and that's it.
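
Concretely, the whole check-and-recompile step on Oracle amounts to:

    -- List whatever is still broken after a deployment...
    SELECT object_name, object_type
    FROM   user_objects
    WHERE  status = 'INVALID';

    -- ...then ask Oracle to recompile only the invalid objects
    -- in the current schema.
    BEGIN
      DBMS_UTILITY.COMPILE_SCHEMA(schema => USER, compile_all => FALSE);
    END;
    /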
PeterBente Thursday, October 22, 2009
I like the idea of having stored procedures in source control, just like the code files. But how do you make sure that views, sprocs, etc. are created in the correct order?

Thanks