What started as a short brain dump is tuning in to a longer series of posts thanks to all the feedback and questions. In this post, I want to explain some of my thoughts on controlling objects like database views, stored procedures, functions, and triggers.
But first...
I haven't actually used a trigger in years. This isn't to say that triggers aren't valuable, but I've tended to shy away. Jon Galloway has posted a good example of what you can do with triggers.
Secondly, stored procedures have fallen out of favor in my eyes. I came from the WinDNA school of indoctrination that said stored procedures should be used all the time. Today, I see stored procedures as an API layer for the database. This is good if you need an API layer at the database level, but I see lots of applications incurring the overhead of creating and maintaining an extra API layer they don't need. In those applications stored procedures are more of a burden than a benefit.
One File per Object
My strategy is to script every view, stored procedure, and function into a separate file, then commit the files to source control. If someone needs to add a new view, they script the view into a file and commit the file to source control. If someone needs to modify a view, they modify the view's script file and commit again. If you need to delete a view from the database, delete the file from source control. It's a relatively simple workflow.
The magic happens when a developer, tester, or installer updates from source control and runs a tool that updates their local database. The tool uses a three step process:
Say What?
People often wonder why I want to drop and destroy all these objects and then re-add them. You could lose work if you update your database and haven't scripted out a view you were in the middle of creating.
The simple reason is to find problems as early as possible. If someone commits a schema change and the change removes a column used by a view, you'll find out there is an error early – hopefully before the build escapes for testing. Likewise, if someone checks in a view but forgets to publish a schema change the view needs, someone else is going to show up at their desk a few minutes later asking why they are breaking the software.
A second reason is to avoid some rare errors I've seen. Some databases have a tendency to screw up execution plans when the schema changes underneath a view. Dropping everything and starting over avoids this problem, which is really hard to track down.
Of course, this does mean a database will need a little down time to apply updates. I realize not everybody has that luxury.
Summary
Through the power of source control and automation, every database from development to production can keep a schema in synch. If you need to go back in time to see what the database looked like on July 20, 2007, or in build 1.58 of the application, you can do that with this strategy, too.
Next question: what happens if there is a branch in the source control tree? That's a topic for the next post.
Comments
You didn't mention how you're running all the scripts, did you? Are you using an existing tool like the Database Publishing Wizard, or something homegrown?
I can appreciate why it seems to be the conventional approach (it's easy!). Surely there has to be a non-destructive approach that is equally as effective? That's a discussion I'd like to join in on.
I have a home grown tool. I've often thought of making a more generic, open source version. Wanna help? :)
@John: That's why I have a problem with many of the tools out there. I have legacy data to protect. The method I've outlined never destroys data. In fact, the hand crafted updated scripts avoid that scenario and migrate large tables in an effective manner.
The only thing I destroy are derived objects (view, stored procedures), and thats jus to ensure consistency all around.
What about the following scenario:
1) baseline 1 is checked in
2) database view A is changed an checked in
3) a migration script is checked in that relies on the current version of view A
4) database view A is changed again (after these changes it is no longer compatible with the migration script of 3)
a developer could have revision 3 on his local database. If he updates from version control and builds, the changes of revision 4 would be applied. I love this "incremental build" strategy, as doing a clean baseline and roll forward can be very time consuming in our 1000+ tables schema.
But, if a developer does a clean checkout and runs the build-script, I can't take the baseline of step 1 and roll-forward. If I want to do that, I need two versions of view A and that doesn't seem to fit the typical approach of version control. But this has to work, as an installation in production will be something very similar.
The other alternative is that the developer making change 4 is also responsible of fixing the broken migration script of step 3. But how is the developer going to notice this? Build automation on a central server could do the trick. It could run an automated build and roll-forward on any checkin. The automated build would break and could notify the developer of revision 4.
Do you have any experience or advice for situations like these?
I'm not undertanding your scenario.
If the view is always compiled after all schema change scripts are applied, than it doesn't matter if you are rolling forward from the baseline or from an existing database to the latest schema change as the new view won't get recompiled until all the schema changes are in place (I don't consider modifying the view a schema change).
Let's try again :-)
Let's say I do a baseline install and then roll forward to the latest revision. I first get the baseline which contains view A. Revision 2 causes nothing in the roll forward. Then comes the migration script of revision 3. If that runs with the original verion of view A, the migration script might fail. When the developer created the migration script of revision 3 he programmed against the new view structure from revision 2.
I can't decide to install the new objects, before applying the migration scripts. That would install the latest revision of view A (the one from revision 4) and then run the migration scripts. That might also fail since the view definition is then newer than the migration script.
The example might seem a bit far-fetched, but this is a very common problem for is with changing database code (package, procedures, triggers, and business rules). Migration scripts rely on a specific version of this database code to function. With the view, and the single migration script I'm just trying to give a simplified example. (apparantly that didn't work ;-) )
Your reason to drop everything and recreate them is to find errors as early as possible. I agree with the goal, but I don't see why dropping and recreating everything is necessary.
You could just as simply apply only the changes. (Your update tool would automatically maintain the version number of each object in a table as it applies them, or if your version control uses global revision numbers like SubVersion, then just log that single revision number.) After that the tool would try to recompile all invalid objects, and than list every object which is invalid even after that. You can easily find which objects are invalid from the data dictionary. You get the same result, catch all errors just the same, in much much less time.
Simply have a rule that all migration scripts should access the tables directly, and not use views.
For the same reason, if you use triggers, you should disable all triggers on the table you are running DMLs against before running the migration script, and re-enable them after. This should be a stored procedure or script, which gets the triggers from the data dictionary, so even triggers which didn't exist at the time when you wrote the migration script get disabled. (But it's better to simply not use triggers at all.)
Ah, I see what you mean now. Like Gabor, I have a policy to only touch tables in a migration script. If a sproc or function would be nessecary in the script, the script will be responsible for creating and then dropping the sproc/function.
@Gabor: Good one. I'll have to think about this. Maintaining the list of dependencies might be a chore. I know MSSQL has a sproc you can call to list dependencies, but in 2000 it wasn't 100% accurate and seemed to miss some sprocs.
How does your tool manages dependencies among views, sprocs, and functions?
Instead of just pulling in all view/sproc scripts from the filesystem I can have a text file that lists the view/sproc scripts to run. When you have to take this step it means there is a little more work involved to maintain the list of scripts to run, but then the dependancies are managable.
Oh, and because of this you don't need to track dependencies. If a procedure you try to create/modify depends on another one which is not yet created or it's an older version, the procedure will still be created, but it will be invalid. After you created everything (in whatever order) you just recompile them, and that's it.
Thanks