In this issue's column I'll be providing a fundamental introduction to database and database management concepts. Many of you may think that they understand the basic concepts and fundamentals of database technology. But quite a few of you likely do not, so please do not skip over this.
First of all, what is a database? DB2 is not a database; neither are Informix, Oracle and SQL Server. Each of these is a DBMS, or Database Management System. You can use DB2 (or Informix or SQL Server) to create a database, but DB2, in and of itself, is not a database.
A database is a large structured set of persistent data. So a phone book is a database. But within the world of IT, a database usually is associated with software. A simple database might be a single file containing many records, each of which contains the same set of fields where each field is a certain data type and length. In short, a database is an organized store of data wherein the data is accessible by named data elements.
A Database Management System (DBMS) is a software package designed to create, store, and manage databases. The DBMS software enables users to share data. It provides a systematic method of creating, updating, retrieving and storing information in a database. DBMSs also are generally responsible for data integrity, data access control, and automated rollback, restart and recovery.
So, DB2, Oracle, et al. are database management systems. Your payroll application uses the payroll database, which is implemented using a DBMS. This distinction is important because it minimizes confusion and improves clarity.
So why do we use a DBMS? The main advantage of using a DBMS is to impose a logical, structured organization on the data. A DBMS delivers economy of scale for processing large amounts of data because it is optimized for such operations.
A DBMS can be distinguished by the model of data upon which it is based. A data model is a collection of concepts used to describe data. There are two fundamental components of a data model: its structure, that is, the way data is stored, and its operations, that is the way that data can be manipulated. There are four major data models used by the most popular DBMS products:
1. Hierarchical
2. Network (or CODASYL)
3. Relational
4. Object-oriented
I refer to each of these as a data model for simplicity sakes. Actually, only the relational and network models have any true, formal data model specification. The others are more loosely defined. Different models of data lead to different logical and structural data organizations. The relational model is the most popular data model. It offers the easiest abstraction, while providing powerful data manipulation and access capabilities.
Of course, there is no requirement to stop at four models. There are always newer "data models" popping up, such as the NoSQL movement.
Using a DBMS provides a central store of data that can be accessed by multiple users, from multiple locations. Data can be shared among multiple applications, instead of new iterations of the same data being propagated and stored in new files for every new application. Sadly, it is true that many DBMSes are not used this way, instead data is duplicated over and over in multiple databases throughout the corporation. But this is not a DBMS problem, it is a usage problem.
Central storage and management of data within a DBMS provides:
- Data abstraction and independence
- Data security
- A locking mechanism for concurrent access with ACID properties (ACID is an acronym for atomicity, consistency, isolation, and durability)
- An efficient handler to balance the needs of multiple applications using the same data
- The ability to swiftly recover from crashes and errors
- Robust data integrity capabilities
- Simple access using a standard API
- Uniform administration procedures for data
The most important characteristic of a DBMS is its ability to maintain and query large amounts of data while assuring data integrity and consistency. It offers transparent recovery from failures, concurrent access, and data independence. In fact, most modern computer applications rely on DBMS and database technology to manage data. Understanding the topic is of benefit to all IT professionals.
Summary
While this review of DBMS fundamentals is brief, I urge you to research the topic more fully. If you require additional details on the basic operations and qualities of DBMSs and databases, I recommend Chris Date's 8th edition of An Introduction to Database Systems for an academic and theoretical approach to the material, and Joe Celko's Data & Databases: Concepts In Practice for a good practical overview of the topic.