The term "data integrity" can mean different things to different people, but the most difficult and pervasive problem facing organizations these days is the semantic integrity of the data. As organizations store and process more and more data from various disparate sources, ensuring that the data is accurate is a colossal, but sometimes ignored, undertaking. Making sure that your data is correct requires proper design, processes that match your business requirements, good communication skills, and constant vigilance.
Semantic data integrity requires a deep understanding of the meaning of data and relationships that need to be maintained between different types of data. The DBMS provides options, controls and procedures to define and assure the semantic integrity of the data stored within its databases. Examples include triggers and referential integrity, as well as the primary topic of this month’s column, check constraints.
Check Constraints are Supported in all Major DBMSs
Check constraints are supported in all of the major DBMS products, including DB2, Oracle, and SQL Server. But they are too-often ignored and not implemented in production databases. Properly defined, check constraints can provide enhanced data integrity without requiring procedural logic. A constraint is basically a restriction placed upon the data values that can be stored in a column or columns of a table.
Check constraints place specific data value restrictions on the contents of a column through the specification of a Boolean expression. The expression is explicitly defined in the table DDL and is formulated in much the same way that SQL WHERE clauses are formulated. Any attempt to modify the column data (i.e., during INSERT and UPDATE processing) will cause the expression to be evaluated. If the modification conforms to the Boolean expression, the modification is permitted to continue. If not, the statement will fail with a constraint violation.
Check constraints are easy to code because they are written using recognizable SQL syntax. The check constraint syntax consists of two components: a constraint name and a check condition. The constraint name is an SQL long identifier. The check condition defines the actual constraint logic and it can be defined using any of the basic predicates (>, <, =, <>, <=, >=), as well as BETWEEN, IN, LIKE, and NULL. Furthermore, AND and OR can be used to string conditions together. But the check condition cannot referto any other table.
Let’s take a quick look at an example of table with check constraints defined for it:
CREATE TABLE EMP
(EMPNO INTEGER
CONSTRAINT CHECK_EMPNO
CHECK (EMPNO BETWEEN 100 and 25000),
EMP_ADDRESS VARCHAR(70),
EMP_TYPE CHAR(8)
CHECK (EMP_TYPE IN (‘TEMP’, ‘FULLTIME’, ‘CONTRACT’)),
EMP_DEPT CHAR(3) NOT NULL WITH DEFAULT,
SALARY DECIMAL(7,2) NOT NULL
CONSTRAINT CHECK_SALARY
CHECK (SALARY < 50000.00),
COMMISSION DECIMAL(7,2),
BONUS DECIMAL(7,2)
) IN DB.TS;
This CREATE statement contains three different check constraints:
- The name of the first check constraint, defined on the EMPNO column, is CHECK_EMPNO. This constraint ensures that the EMPO column can contain values that range from 100 to 25000 (instead of the domain of all valid integers).
- The second check constraint for this table is on the EMP_TYPE column. This is an example of an unnamed constraint. Though this is possible, it is not recommended. It is best to always provide an explicit constraint name in order to ease identification and administration. This specific constraint restricts the values that can be placed into EMP_TYPE as: 'TEMP', 'FULLTIME', and 'CONTRACT'; no other values would be accepted.
- The last check constraint on this table is named CHECK_SALARY. It effectively ensures that no employee can be entered with a salary of more than $50,000. (Who wants to work there?)
You can see how coding check constraints can help to ensure the validity of data because the constraints place controls on the actual values that are permitted to be stored. The primary benefit of check constraints is the ability to enforce business rules directly in the database without requiring additional application logic. Once defined, the business rule is physically implemented and cannot be bypassed.
Fewer Data Integrity Problems Mean Higher Quality Databases
DBA and database designers should take advantage of the mechanisms provided by the DBMS to ensure data integrity. When DBMS-provided methods are used, fewer data integrity problems are likely to be found. Fewer data integrity problems mean higher quality databases and more proficient end users. You have to know what integrity rules are proper for the DBMS to enforce. But once defined, many of those rules can be enforced by the DBMS, thereby improving data quality … and your business processes.