Have we come full circle in database technologies? At this year’s San Jose Strata conference, Doug Turnbull (@softwaredoug) provided a recap of the last 30 to 40 years of database history, and posed that question.
Looking back, we can see three distinct waves of database technologies fairly clearly. Methods for collecting, organizing, and processing data existed before the electronic computer, of course. Libraries—especially those using comprehensive indexing systems—represent a form of data store. And even before the electronic computer, information was still being processed mechanically: Punch card technology was in widespread use at the end of the 19th century and the tabulating machines of that era gave rise to the IBM of modern times.
The electronic computer enabled the first revolution in database technologies by allowing high-speed and random access to huge amounts of information. In the first database systems—what we now call pre-relational databases—the physical structure of the data on disk determined the way in which the data could be accessed and navigated. Relationships between data items—between customers and orders for instance—were predefined in the very structure of the database. There was little capability for ad hoc query, and only sophisticated programmers could hope to extract data from such a system.
During the late 1960s, Edgar Codd—an ex-RAF (Royal Air Force) pilot and mathematician who was working at IBM—became convinced that there was a more theoretically rigorous way to represent data on databases and to allow these databases to be accessed without complex programming. Thus arose the relational database model, which ushered in the era of RDBMS dominance. For a generation of database professionals—more than 25 years—the relational database has reigned supreme.
There are some fairly sophisticated aspects to relational theory. However, at the most fundamental level, the relational database describes how data should be represented logically in a way which eliminates inconsistency and redundancy. In a relational system, underlying physical storage, such as the order of records on disk, the presence of indexes, or the way in which related data is linked, does not affect the way in which a user or application might query or access this data.
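A small experiment can make this independence from physical storage concrete. The sketch below uses Python's built-in sqlite3 module with hypothetical customers and orders tables (the table names, columns, and data are invented for illustration). It runs the same logical query before and after an index is added and compares the answers.

```python
import sqlite3


def order_totals(conn):
    """Total order amount per customer, expressed purely in logical terms."""
    return conn.execute(
        """
        SELECT c.name, SUM(o.amount)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


# Hypothetical schema and data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount INTEGER NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 2, 40), (11, 1, 25), (12, 2, 5);
    """
)

before_index = order_totals(conn)

# Change the physical design only: the query text above stays the same.
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

after_index = order_totals(conn)
print(before_index == after_index)
```

The index may change how the engine finds the rows, but the query and its answer are unaffected, which is the separation of logical and physical design that the relational model argues for.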
Relational databases were also characterized by an ACID (atomic, consistent, isolated, durable) transaction model, which ensured that all users had the same view of the data at any instant. This insistence on strict consistency became a drawback in the era of web-scale applications. Brewer’s CAP theorem demonstrated that a distributed database cannot guarantee both availability and strict consistency while a network partition is in effect. Should the network connection between two geographical locations be lost, an ACID-compliant database must fail in one of those two regions in order to maintain strict consistency. In clustered relational databases, this is referred to as the “split brain” problem.
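The trade-off can be sketched with a toy model. The class below is not how any real clustered database works. It is a deliberately minimal illustration, with hypothetical names (StrictStore, SiteUnavailable, the "east" and "west" sites), of a store that keeps one authoritative copy of the data and makes the other region unavailable when the link between the two is lost.

```python
class SiteUnavailable(Exception):
    """Raised when a site cannot serve a request without risking inconsistency."""


class StrictStore:
    """Toy two-site store that gives up availability to stay consistent."""

    def __init__(self, primary="east"):
        self.primary = primary
        self.data = {}       # single authoritative copy, held at the primary
        self.link_up = True  # state of the network link between the sites

    def _check(self, site):
        # A non-primary site must reach the primary for every request.
        if site != self.primary and not self.link_up:
            raise SiteUnavailable(f"{site} cannot reach {self.primary}")

    def write(self, site, key, value):
        self._check(site)
        self.data[key] = value

    def read(self, site, key):
        self._check(site)
        return self.data.get(key)


store = StrictStore()
store.write("west", "balance", 100)   # link is up: both sites can be used

store.link_up = False                 # simulate a network partition
east_view = store.read("east", "balance")
try:
    store.read("west", "balance")
    west_failed = False
except SiteUnavailable:
    west_failed = True

print(east_view, west_failed)
```

During the simulated partition the primary region keeps answering with the correct value, while the other region refuses requests rather than risk returning a different view of the data.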
NoSQL databases, especially those based on Amazon’s Dynamo, were explicitly designed to allow the database to continue operating in the presence of such a network partition by sacrificing strict consistency. Instead, such a database offers eventual consistency.
However, as Turnbull pointed out in his Strata talk, while these NoSQL systems successfully respond to the limitations of CAP theorem, they reinvent some of the issues that led to the genesis of the relational model. In particular, the data model again reflects the physical storage rather than the logical organization of data. And, again, we have created databases which can only be queried effectively by experienced programmers.
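To make eventual consistency concrete, here is a minimal sketch of two replicas that both keep accepting writes during a partition and reconcile afterward. It assumes a simple last-write-wins rule keyed on a logical timestamp, which is only one possible reconciliation strategy and not necessarily what any particular NoSQL system uses. The Replica class, the sync function, and the sample data are all hypothetical.

```python
class Replica:
    """Toy replica that stores a (timestamp, value) pair for each key."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, timestamp):
        # Last-write-wins: keep the incoming value only if it is newer.
        # Equal timestamps are left unresolved in this sketch.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries so both replicas end up with the newest version."""
    for source, target in ((a, b), (b, a)):
        for key, (timestamp, value) in list(source.data.items()):
            target.write(key, value, timestamp)


east, west = Replica("east"), Replica("west")

# During the partition each replica accepts writes on its own.
east.write("cart", ["book"], timestamp=1)
west.write("cart", ["book", "pen"], timestamp=2)
during = (east.read("cart"), west.read("cart"))

# Once the link is restored, the replicas converge.
sync(east, west)
after = (east.read("cart"), west.read("cart"))

print(during)
print(after)
```

While the partition lasts, the two regions can return different answers for the same key. After synchronization they agree again, which is the sense in which consistency is "eventual." Note that the application programmer now has to understand and reason about this behavior, which echoes Turnbull's point about who can use such systems effectively.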
There’s no doubt that the new wave of nonrelational systems represents an important and necessary revolution in database technology. We need to keep innovating rather than remain wedded to the technologies of the past. At the same time, ignoring the lessons of history is never a good idea.
The slides from Doug Turnbull’s talk are available at www.slideshare.net/o19s/codd-tobrewer.