Data integrity rules fall into one of three categories: entity, referential, and domain. We want to briefly describe these terms to provide a complete discussion.
Entity Integrity
Entity integrity ensures each row in a table is a uniquely identifiable entity. You can apply entity integrity to a table by specifying a PRIMARY KEY constraint. For example, the ProductID column of the Products table is a primary key for the table.
Referential Integrity
Referential integrity ensures the relationships between tables remain preserved as data is inserted, deleted, and modified. You can apply referential integrity using a FOREIGN KEY constraint. The ProductID column of the Order Details table has a foreign key constraint applied referencing the Products table. The constraint prevents an Order Detail record from using a ProductID that does not exist in the Products table. Also, you cannot remove a row from the Products table if an order detail references the ProductID of the row.
Entity and referential integrity together form key integrity.
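The foreign key behavior described above can be sketched as follows. The table and constraint names (Products_FK, OrderDetails_FK, FK_OrderDetails_Products) are hypothetical and chosen only for illustration; they are not part of the Northwind database.

```sql
-- Hypothetical parent table.
CREATE TABLE Products_FK (
    ProductID int PRIMARY KEY,
    ProductName nvarchar(40) NOT NULL
)

-- Hypothetical child table; ProductID must match a row in Products_FK.
CREATE TABLE OrderDetails_FK (
    OrderID int NOT NULL,
    ProductID int NOT NULL,
    Quantity smallint NOT NULL,
    CONSTRAINT FK_OrderDetails_Products
        FOREIGN KEY (ProductID) REFERENCES Products_FK (ProductID)
)

INSERT INTO Products_FK VALUES (1, 'Chai')

-- Accepted: product 1 exists in the parent table.
INSERT INTO OrderDetails_FK VALUES (100, 1, 5)

-- Rejected: there is no product 2 to reference.
INSERT INTO OrderDetails_FK VALUES (101, 2, 5)

-- Rejected: an order detail row still references product 1.
DELETE FROM Products_FK WHERE ProductID = 1
```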
Domain Integrity
Domain integrity ensures the data values inside a database follow defined rules for values, range, and format. A database can enforce these rules using a variety of techniques, including CHECK constraints, UNIQUE constraints, and DEFAULT constraints. These are the constraints we will cover in this article, but be aware there are other options available to enforce domain integrity. Even the selection of the data type for a column enforces domain integrity to some extent. For instance, the selection of datetime for a column data type is more restrictive than a free format varchar field.
The following list gives a sampling of domain integrity constraints.
- A product name cannot be NULL.
- A product name must be unique.
- The date of an order must not be in the future.
- The product quantity in an order must be greater than zero.
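As a rough preview, the four rules in the list could be declared along these lines. The tables Products_3 and Orders_3 are hypothetical, and each constraint type is explained in the sections that follow.

```sql
-- Hypothetical tables; names are for illustration only.
CREATE TABLE Products_3 (
    ProductID int PRIMARY KEY,
    -- NOT NULL: a product name cannot be NULL.
    -- UNIQUE:   a product name must be unique.
    ProductName nvarchar(40) NOT NULL UNIQUE
)

CREATE TABLE Orders_3 (
    OrderID int PRIMARY KEY,
    ProductID int NOT NULL,
    -- The date of an order must not be in the future.
    OrderDate datetime NOT NULL CHECK (OrderDate <= GETDATE()),
    -- The product quantity in an order must be greater than zero.
    Quantity int NOT NULL CHECK (Quantity > 0)
)
```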
Unique Constraints
As we have already discussed, a unique constraint uses an index to ensure a column (or set of columns) contains no duplicate values. By creating a unique constraint, instead of just a unique index, you are telling the database you really want to enforce a rule, and are not just providing an index for query optimization. The database will not allow someone to drop the index without first dropping the constraint.
From a SQL point of view, there are three methods available to add a unique constraint to a table. The first method is to create the constraint inside of CREATE TABLE as a column constraint. A column constraint applies to only a single column. The following SQL will create a unique constraint on a new table: Products_2.
CREATE TABLE Products_2 ( ProductID int PRIMARY KEY, ProductName nvarchar (40) CONSTRAINT IX_ProductName UNIQUE )
This command will actually create two unique indexes. One is the unique, clustered index given by default to the primary key of a table. The second is the unique index using the ProductName column as a key and enforcing our constraint.
A different syntax allows you to create a table constraint. Unlike a column constraint, a table constraint is able to enforce a rule across multiple columns. A table constraint is a separate element in the CREATE TABLE command. We will see an example of using multiple columns when we build a special CHECK constraint later in the article. Notice there is now a comma after the ProductName column definition.
CREATE TABLE Products_2 ( ProductID int PRIMARY KEY, ProductName nvarchar (40), CONSTRAINT IX_ProductName UNIQUE(ProductName) )
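To illustrate a table constraint that spans more than one column, here is a minimal sketch. The Suppliers_2 table and the IX_Company_City constraint are hypothetical names used only for this example.

```sql
-- Hypothetical table: the combination of CompanyName and City must be unique,
-- although each column on its own may repeat.
CREATE TABLE Suppliers_2 (
    SupplierID int PRIMARY KEY,
    CompanyName nvarchar(40) NOT NULL,
    City nvarchar(15) NOT NULL,
    CONSTRAINT IX_Company_City UNIQUE (CompanyName, City)
)

INSERT INTO Suppliers_2 VALUES (1, 'Acme', 'London')  -- accepted
INSERT INTO Suppliers_2 VALUES (2, 'Acme', 'Paris')   -- accepted: different city
INSERT INTO Suppliers_2 VALUES (3, 'Acme', 'London')  -- rejected: duplicate pair
```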
The final way to create a constraint via SQL is to add a constraint to an existing table using the ALTER TABLE command, as shown in the following command:
CREATE TABLE Products_2 ( ProductID int PRIMARY KEY, ProductName nvarchar (40) )
ALTER TABLE Products_2 ADD CONSTRAINT IX_ProductName UNIQUE (ProductName)
If duplicate data values exist in the table when the ALTER TABLE command runs, you can expect an error message similar to the following:
Server: Msg 1505, Level 16, State 1, Line 1
CREATE UNIQUE INDEX terminated because a duplicate key was found for index ID 2. Most significant primary key is 'Hamburger'.
Server: Msg 1750, Level 16, State 1, Line 1
Could not create constraint. See previous errors.
The statement has been terminated.
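One way to avoid this error is to look for duplicates before adding the constraint. The query below is a simple sketch against the Products_2 table from the example above; it lists every ProductName value that appears more than once.

```sql
-- Find values that would violate a unique constraint on ProductName.
SELECT ProductName, COUNT(*) AS Occurrences
FROM Products_2
GROUP BY ProductName
HAVING COUNT(*) > 1
```

Once the duplicates are cleaned up, the ALTER TABLE command should succeed.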
Check Constraints
Check constraints contain an expression the database will evaluate when you modify or insert a row. If the expression evaluates to false, the database will not save the row. Building a check constraint is similar to building a WHERE clause. You can use many of the same operators (>, <, <=, >=, <>, =) in addition to BETWEEN, IN, LIKE, and IS NULL. You can also build expressions around AND and OR operators. You can use check constraints to implement business rules and to tighten the values and formats allowed for a particular column.
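The sketch below shows a few of these operators inside check constraints. The Contacts_2 table, its columns, and the specific rules are hypothetical and exist only to illustrate the syntax.

```sql
-- Hypothetical table demonstrating LIKE, IN, and BETWEEN in CHECK constraints.
CREATE TABLE Contacts_2 (
    ContactID int PRIMARY KEY,
    -- LIKE with character ranges: exactly five digits.
    PostalCode char(5) CHECK (PostalCode LIKE '[0-9][0-9][0-9][0-9][0-9]'),
    -- IN: restrict the column to a fixed list of values.
    Region char(2) CHECK (Region IN ('WA', 'OR', 'CA')),
    -- BETWEEN: inclusive range from 0 to 0.50.
    Discount decimal(4,2) CHECK (Discount BETWEEN 0 AND 0.50)
)

INSERT INTO Contacts_2 VALUES (1, '98052', 'WA', 0.10)  -- accepted
INSERT INTO Contacts_2 VALUES (2, '9805A', 'WA', 0.10)  -- rejected: PostalCode format
INSERT INTO Contacts_2 VALUES (3, '98052', 'NY', 0.10)  -- rejected: Region not in list
```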
We can use the same three techniques we learned earlier to create a check constraint using SQL. The first technique places the constraint after the column definition, as shown below. Note the constraint name is optional for unique and check constraints.
CREATE TABLE Products_2 ( ProductID int PRIMARY KEY, UnitPrice money CHECK(UnitPrice > 0 AND UnitPrice < 100) )
In the above example we are restricting the UnitPrice column to values greater than 0 and less than 100. Let’s try to insert a value outside of this range with the following SQL.
INSERT INTO Products_2 VALUES(1, 101)
The database will not save the values and should respond with the following error.
Server: Msg 547, Level 16, State 1, Line 1
INSERT statement conflicted with COLUMN CHECK constraint 'CK__Products___UnitP__2739D489'. The conflict occurred in database 'Northwind', table 'Products_2', column 'UnitPrice'.
The statement has been terminated.
The following sample creates the constraint as a table constraint, separate from the column definitions.
CREATE TABLE Products_2 ( ProductID int PRIMARY KEY, UnitPrice money, CONSTRAINT CK_UnitPrice2 CHECK(UnitPrice > 0 AND UnitPrice < 100) )
Remember, with a table constraint you can reference multiple columns. The constraint in the following example will ensure we have either a telephone number or a fax number for every customer.
CREATE TABLE Customers_2 ( CustomerID int, Phone varchar(24), Fax varchar(24), CONSTRAINT CK_PhoneOrFax CHECK(Fax IS NOT NULL OR Phone IS NOT NULL) )
You can also add check constraints to a table after a table exists using the ALTER TABLE syntax. The following constraint will ensure an employee date of hire is always in the past by using the system function GETDATE.
CREATE TABLE Employees_2 ( EmployeeID int, HireDate datetime )
ALTER TABLE Employees_2 ADD CONSTRAINT CK_HireDate CHECK(HireDate < GETDATE())
Check Constraints and Existing Values
As with UNIQUE constraints, adding a CHECK constraint after a table is populated runs a chance of failure, because the database will check existing data for conformance. You cannot skip this check when adding a UNIQUE constraint, but you can skip it when adding a CHECK constraint by using the WITH NOCHECK syntax in SQL.
CREATE TABLE Employees_2 ( EmployeeID int, Salary money )
INSERT INTO Employees_2 VALUES(1, -1)
ALTER TABLE Employees_2 WITH NOCHECK ADD CONSTRAINT CK_Salary CHECK(Salary > 0)
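To see what WITH NOCHECK does and does not relax, consider the following sketch. It assumes the Employees_2 table and the CK_Salary constraint from the example above; the extra rows are made up for illustration.

```sql
-- The constraint applies to new data even though existing data was not checked.
INSERT INTO Employees_2 VALUES (2, -5)     -- rejected: violates CK_Salary
INSERT INTO Employees_2 VALUES (3, 1000)   -- accepted

-- The pre-existing row with Salary = -1 is still in the table.
SELECT EmployeeID, Salary FROM Employees_2
```

In other words, WITH NOCHECK only skips validation of rows already in the table at the time the constraint is added.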
Check Constraints and NULL Values
Earlier in this section we mentioned how the database will only stop a data modification when a check constraint returns false. We did not mention, however, how the database allows the modification to take place if the result is logically unknown. A logically unknown expression happens when a NULL value is present in an expression. For example, let’s use the following insert statement on the last table created above.
INSERT INTO Employees_2 (EmployeeID, Salary) VALUES(2, NULL)
Even with the constraint on salary (Salary > 0) in place, the INSERT is successful. A NULL value makes the expression logically unknown. A CHECK constraint will only fail an INSERT or UPDATE if the expression in the constraint explicitly returns false. An expression that returns true, or one that is logically unknown, will let the command succeed.
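If NULL values should be rejected as well, one option is to test for NULL explicitly inside the expression. The following is a minimal sketch; the Employees_3 table and the CK_Salary_Positive constraint are hypothetical names.

```sql
-- Hypothetical table: the check now fails for NULL as well as for non-positive values.
CREATE TABLE Employees_3 (
    EmployeeID int,
    Salary money,
    CONSTRAINT CK_Salary_Positive CHECK (Salary IS NOT NULL AND Salary > 0)
)

-- Rejected: "Salary IS NOT NULL" is false, and false AND unknown evaluates to false.
INSERT INTO Employees_3 (EmployeeID, Salary) VALUES (1, NULL)

-- Accepted: both parts of the expression are true.
INSERT INTO Employees_3 (EmployeeID, Salary) VALUES (2, 500)
```

Declaring the column as NOT NULL achieves the same effect and is usually the simpler choice.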
Restrictions On Check Constraints
Although check constraints are by far the easiest way to enforce domain integrity in a database, they do have some limitations, namely:
- A check constraint cannot reference a different row in a table.
- A check constraint cannot reference a column in a different table.
NULL Constraints
Although not a constraint in the strictest definition, the decision to allow NULL values in a column or not is a type of rule enforcement for domain integrity.
Using SQL you can use NULL or NOT NULL on a column definition to explicitly set the nullability of a column. In the following example table, the FirstName column will accept NULL values while LastName always requires a non-NULL value. Primary key columns require a NOT NULL setting, and default to this setting if not specified.
CREATE TABLE Employees_2 ( EmployeeID int PRIMARY KEY, FirstName varchar(50) NULL, LastName varchar(50) NOT NULL )
If you do not explicitly set a column to allow or disallow NULL values, the database uses a number of rules to determine the "nullability" of the column, including current configuration settings on the server. I recommend you always define a column explicitly as NULL or NOT NULL in your scripts to avoid problems when moving between different server environments.
Given the above table definition, the following two INSERT statements can succeed.
INSERT INTO Employees_2 VALUES(1, 'Geddy', 'Lee')
INSERT INTO Employees_2 VALUES(2, NULL, 'Lifeson')
However, the following INSERT statement should fail with the error shown below.
INSERT INTO Employees_2 VALUES(3, 'Neil', NULL)
Server: Msg 515, Level 16, State 2, Line 1
Cannot insert the value NULL into column 'LastName', table 'Northwind.dbo.Employees_2'; column does not allow nulls. INSERT fails.
The statement has been terminated.
You can declare columns in a unique constraint to allow NULL values. However, the constraint checking considers NULL values as equal, so on a single column unique constraint, the database allows only one row to have a NULL value.
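The single-NULL behavior can be sketched as follows. The Badges_2 table and the IX_BadgeNumber constraint are hypothetical names used only for this example.

```sql
-- Hypothetical table: BadgeNumber is nullable but must be unique.
CREATE TABLE Badges_2 (
    EmployeeID int PRIMARY KEY,
    BadgeNumber int NULL CONSTRAINT IX_BadgeNumber UNIQUE
)

INSERT INTO Badges_2 VALUES (1, NULL)  -- accepted: first NULL in the column
INSERT INTO Badges_2 VALUES (2, 1001)  -- accepted
INSERT INTO Badges_2 VALUES (3, NULL)  -- rejected: a second NULL counts as a duplicate
```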
Default Constraints
Default constraints apply a value to a column when an INSERT statement does not specify the value for the column. Although default constraints do not enforce a rule like the other constraints we have seen, they do provide the proper values to keep domain integrity intact. A default can assign a constant value, the value of a system function, or NULL to a column. You can use a default on any column except IDENTITY columns and columns of type timestamp.
The following example demonstrates how to place the default value inline with the column definition. We also mix in some of the other constraints we have seen in this article to show you how you can put everything together.
CREATE TABLE Orders_2 ( OrderID int IDENTITY NOT NULL , EmployeeID int NOT NULL , OrderDate datetime NULL DEFAULT(GETDATE()), Freight money NULL DEFAULT (0) CHECK(Freight >= 0), ShipAddress nvarchar (60) NULL DEFAULT('NO SHIPPING ADDRESS'), EnteredBy nvarchar (60) NOT NULL DEFAULT(SUSER_SNAME()) )
We can examine the behavior of the defaults with the following INSERT statement, placing values only in the EmployeeID and Freight fields.
INSERT INTO Orders_2 (EmployeeID, Freight) VALUES(1, NULL)
If we then query the table to see the row we just inserted, we should see the following results.
OrderID:1 EmployeeID:1 OrderDate:2003-01-02 Freight: NULL ShipAddress: NO SHIPPING ADDRESS EnteredBy: sa
Notice the Freight column did not receive the default value of 0. Specifying a NULL value is not the equivalent of leaving the column value unspecified. The database does not use the default, and NULL is placed in the column instead.
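For comparison, here are two ways to get the default into the Freight column. This sketch assumes the Orders_2 table defined above; the EmployeeID values are arbitrary.

```sql
-- Freight is left out of the column list, so the default of 0 is applied.
INSERT INTO Orders_2 (EmployeeID) VALUES (2)

-- The DEFAULT keyword explicitly asks for the column's default value.
INSERT INTO Orders_2 (EmployeeID, Freight) VALUES (3, DEFAULT)

-- Both new rows should show a Freight of 0.
SELECT OrderID, EmployeeID, Freight FROM Orders_2
```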
Maintaining Constraints
In this section we will examine how to delete an existing constraint. We will also take a look at a special capability to temporarily disable constraints for special processing scenarios.
Dropping Constraints
First, let’s remove the check on UnitPrice in the Products table.
ALTER TABLE Products DROP CONSTRAINT CK_Products_UnitPrice
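Dropping a constraint requires its name, which may be system generated if none was given when the constraint was created (like the CK__Products___UnitP__2739D489 name seen in an earlier error message). As a sketch, the sp_helpconstraint system stored procedure can list the constraints on a table:

```sql
-- List the constraints defined on the Products table, including their names.
EXEC sp_helpconstraint 'Products'
```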
If all you need to do is drop a constraint to allow a one time circumvention of the rules enforcement, a better solution is to temporarily disable the constraint, as we explain in the next section.
Disabling Constraints
Special situations often arise in database development where it is convenient to temporarily relax the rules. For example, it is often easier to load initial values into a database one table at a time, without worrying about foreign key constraints and checks until all of the tables have finished loading. After the import is complete, you can turn constraint checking back on and know the database is once again protecting the integrity of the data.
Note: The only constraints you can disable are the FOREIGN KEY constraint, and the CHECK constraint. PRIMARY KEY, UNIQUE, and DEFAULT constraints are always active.
Disabling a constraint using SQL is done through the ALTER TABLE command. The following statements disable the CHECK constraint on the UnitsOnOrder column, and the FOREIGN KEY constraint on the CategoryID column.
ALTER TABLE Products NOCHECK CONSTRAINT CK_UnitsOnOrder
ALTER TABLE Products NOCHECK CONSTRAINT FK_Products_Categories
If you need to disable all of the constraints on a table, manually navigating through the interface or writing a SQL command for each constraint may prove to be a laborious process. There is an easy alternative using the ALL keyword, as shown below:
ALTER TABLE Products NOCHECK CONSTRAINT ALL
You can re-enable just the CK_UnitsOnOrder constraint with the following statement:
ALTER TABLE Products CHECK CONSTRAINT CK_UnitsOnOrder
When a disabled constraint is re-enabled, the database does not check to ensure any of the existing data meets the constraints. We will touch on this subject shortly. To turn on all constraints for the Products table, use the following command:
ALTER TABLE Products CHECK CONSTRAINT ALL
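The statements above re-enable constraints without validating existing rows. As a hedged aside, ALTER TABLE also accepts a WITH CHECK option ahead of CHECK CONSTRAINT, which asks the database to validate existing data as part of re-enabling. A minimal sketch:

```sql
-- Re-enable the constraint and validate the rows already in the table.
-- If an existing row violates CK_UnitsOnOrder, this statement should fail.
ALTER TABLE Products WITH CHECK CHECK CONSTRAINT CK_UnitsOnOrder
```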
Manually Checking Constraints
With the ability to disable and re-enable constraints, and the ability to add constraints to a table using the WITH NOCHECK option, you can certainly run into a condition where the referential or domain integrity of your database is compromised. For example, let’s imagine we ran the following INSERT statement after disabling the CK_UnitsOnOrder constraint:
INSERT INTO Products (ProductName, UnitsOnOrder) VALUES('Scott''s Stuffed Shells', -1)
The above insert statement inserts a -1 into the UnitsOnOrder column, a clear violation of the CHECK constraint in place on the column. When we re-enable the constraint, SQL Server will not complain as the data is not checked. Fortunately, SQL Server provides a Database Console Command you can run from any query tool to check all enabled constraints in a database or table. With CK_UnitsOnOrder re-enabled, we can use the following command to check for constraint violations in the Products table.
dbcc checkconstraints(Products)
To check an entire database, omit the parentheses and parameter from the DBCC command. The above command will give the following output and find the violated constraint in the Products table.
Table     Constraint       Where
--------- ---------------- --------------------
Products  CK_UnitsOnOrder  UnitsOnOrder = '-1'
You can use the information in the DBCC output to track down the offending row.
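For instance, the text in the Where column can be pasted into the WHERE clause of a query against the table named in the output. The column list below is an assumption based on the Northwind Products table.

```sql
-- Locate the rows reported by DBCC CHECKCONSTRAINTS.
SELECT ProductID, ProductName, UnitsOnOrder
FROM Products
WHERE UnitsOnOrder = '-1'
```

From there you can correct or remove the offending rows.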
Summary
Over the course of this article we learned how to use various constraints to ensure the data in our database stays intact and matches our expectations. The proper use of constraints is the prevention needed to avoid data integrity problems as your application grows.