For the first time in over 20 years, there appear to be cracks forming in the relational model’s dominance of the database management systems market. The relational database management system (RDBMS) of today is increasingly being seen as an obstacle to the IT architectures of tomorrow, and - for the first time - credible alternatives to the relational database are emerging. While it would be reckless to predict the demise of the relational database as a critical component of IT architectures, it is certainly feasible to imagine the relational database as just one of several choices for data storage in next-generation applications.
The Last DBMS Revolution
The relational database architecture became dominant throughout the 1980s in conjunction with the rise of minicomputer and client-server architectures. Client-server applications were revolutionary in terms of ease of use, functionality, development and deployment costs. The relational database also made it easier for business to access and leverage DBMS data. Business Intelligence, reporting tools and the data warehouse entrenched the value of data and helped the relational database achieve almost total dominance by the mid-1990s.
The Failed OODBMS Revolution
However, from an application developer’s point of view, the relational model was not ideal. The RDBMS came to prominence during the same period as object-oriented (OO) programming. While the relational database represented data as a set of tables with a regular row-column structure, OO programs represented data as objects that not only had associated behaviors but also complex internal structure. The disconnect between these two representations created an “impedance mismatch” that reduced application cohesiveness.
In an attempt to resolve this disconnect, the object-oriented database management system (OODBMS) was developed. In an OODBMS, application data is represented by persistent objects that match the objects used in the programming language.
However, the OODBMS failed to have a significant impact. The OO model was programmer-centric and did not address business intelligence needs. Eventually, the Internet boom rendered the issue moot and the industry standardized on the more mature RDBMS. As a workaround, many application frameworks developed object-relational mapping (ORM) schemes that allowed object-oriented access to relational data.
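As a rough illustration of the ORM idea, the sketch below uses the SQLAlchemy library for Python; the Customer class and its columns are hypothetical, and any mature ORM framework follows the same basic pattern of mapping classes to tables and objects to rows.

```python
# Minimal ORM sketch: a class is mapped onto a relational table, so the
# application works with objects while the framework generates the SQL.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Customer(Base):                      # hypothetical domain object
    __tablename__ = "customers"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite:///:memory:")   # any relational backend will do
Base.metadata.create_all(engine)

session = sessionmaker(bind=engine)()
session.add(Customer(name="Acme"))             # object in, row out
session.commit()

acme = session.query(Customer).filter_by(name="Acme").first()
print(acme.name)                               # row comes back as an object
```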
Enter Utility Computing
The Internet gold rush and the global Y2K effort resulted in virtually unconstrained spending on computer hardware, software and staffing. However, since the bursting of the bubble, the IT industry has been under unrelenting pressure to reduce costs.
It had been clear for some time that the allocation and utilization of computing resources was inherently inefficient. Because each application used dedicated hardware, the hardware had to be sized to match peak application processing requirements. Off-peak, these resources were wasted.
The utility computing concept introduced the idea of allowing computing resources to be allocated on demand, in much the same way a power company makes electricity available on demand to consumers. Such an approach could reduce costs both through economies of scale and by averaging out peak demands between applications.
Virtualization, grid computing and the Internet as a universal wide area network have combined to deliver an emerging realization of the utility vision in the shape of a computing “cloud.”
In a cloud computing configuration, application resources - or even the application itself - are made available from virtualized resources located somewhere on the Internet (i.e., in the cloud).
Figure 1: Grids, virtual servers and the cloud
RDBMS Gets in the Way Again
Most components of modern applications can be deployed to a virtualized or grid environment without significant disruption. Web servers and application servers cluster naturally, and resources can be added or removed from these layers simply by starting or stopping members of the cluster.
Unfortunately, it’s much harder to cluster databases. In a traditional database cluster, data must either be replicated across the cluster members or partitioned between them. In either case, adding a machine to the cluster requires data to be copied or moved to the new node. Because this data shipping is time-consuming and expensive, databases cannot be provisioned dynamically and efficiently on demand.
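A toy sketch of why adding a node forces data movement, assuming the simplest possible scheme of hash-partitioning rows by key modulo the node count (real clusters use more sophisticated partitioning, but the underlying problem is the same):

```python
# Toy illustration: rows are hash-partitioned across nodes by key. Growing the
# cluster from 4 to 5 nodes reassigns most keys, and every reassigned row must
# be physically shipped to its new home before the new node is useful.
def partition(key: str, node_count: int) -> int:
    return hash(key) % node_count

keys = [f"row-{i}" for i in range(100_000)]

before = {k: partition(k, 4) for k in keys}    # original 4-node cluster
after = {k: partition(k, 5) for k in keys}     # one node added

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of rows would have to move")   # roughly 80%
```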
Oracle’s attempt to build a grid database - Oracle Real Application Clusters (RAC) - is in theory capable of meeting the challenges of the cloud. However, RAC is seen as too proprietary, expensive and high-maintenance by most of those trying to establish computing clouds.
Cloud Databases
For those seeking to create public computing clouds (such as Amazon) or those trying to establish massively parallel, redundant and economical data-driven applications (such as Google), relational databases became untenable. These vendors needed a way of managing data that was almost infinitely scalable, inherently reliable and cost-effective.
Google’s solution was to develop BigTable, a relatively simple storage management system that provides fast access to petabytes of data, potentially redundantly distributed across thousands of machines.
Physically, BigTable resembles a B-tree index-organized table in which branch and leaf nodes are distributed across multiple machines. Like a B-tree, nodes “split” as they grow and - since nodes are distributed - this allows for very high scalability across large numbers of machines.
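The following is a highly simplified sketch of that splitting behavior, assuming key-ordered “tablets” that divide in two once they pass a size threshold; the names and the threshold are illustrative, not BigTable’s actual implementation.

```python
# Simplified sketch: rows are kept in key order inside "tablets", and a tablet
# that grows past a threshold splits in two so the halves can be served by
# different machines.
from bisect import insort

TABLET_LIMIT = 4                  # illustrative; real tablets hold ~100+ MB

class Tablet:
    def __init__(self, rows=None):
        self.rows = rows or []    # sorted list of (row_key, value) pairs

    def insert(self, key, value):
        insort(self.rows, (key, value))

    def split(self):
        mid = len(self.rows) // 2
        return Tablet(self.rows[:mid]), Tablet(self.rows[mid:])

tablets = [Tablet()]              # the whole key space starts in one tablet

def route(key):
    """Pick the tablet responsible for a key (linear scan for brevity)."""
    for t in reversed(tablets):
        if not t.rows or t.rows[0][0] <= key:
            return t
    return tablets[0]

for i in range(20):
    key = f"row-{i:03d}"
    tablet = route(key)
    tablet.insert(key, f"value-{i}")
    if len(tablet.rows) > TABLET_LIMIT:        # the "node" splits as it grows
        pos = tablets.index(tablet)
        tablets[pos:pos + 1] = list(tablet.split())

print(f"{len(tablets)} tablets after 20 inserts")
```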
Figure 2: Cloud databases distribute data across many hosts
Data elements in BigTable are identified by a primary key, a column name and (optionally) a timestamp. Lookups via the primary key are predictable and relatively fast. BigTable provides the data storage mechanism for Google App Engine - Google’s cloud-based application environment.
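A rough sketch of that addressing model, treating the store as a map from (row key, column, timestamp) to value in which a read returns the newest version by default; the row and column names below are purely illustrative.

```python
# Rough sketch of the BigTable addressing model: each cell is identified by
# (row key, column name, timestamp), and a read returns the newest version
# unless a specific timestamp is requested.
from itertools import count

_clock = count(1)        # stand-in for real write timestamps
store = {}               # {(row_key, column): {timestamp: value}}

def put(row_key, column, value, ts=None):
    versions = store.setdefault((row_key, column), {})
    versions[ts if ts is not None else next(_clock)] = value

def get(row_key, column, ts=None):
    versions = store.get((row_key, column), {})
    if not versions:
        return None
    if ts is not None:
        return versions.get(ts)
    return versions[max(versions)]       # newest version wins by default

put("user#1001", "name", "Alice")
put("user#1001", "name", "Alicia")       # a later write becomes the current value
print(get("user#1001", "name"))          # -> Alicia
print(get("user#1001", "name", ts=1))    # -> Alice (older version, by timestamp)
```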
Amazon’s SimpleDB is conceptually similar to BigTable and forms a key part of the Amazon Web Services (AWS) cloud computing environment. Microsoft’s SQL Server Data Services (SSDS) provides a similar capability.
The chasm between the database management capabilities of these cloud databases and mainstream relational databases is huge. Consequently, it’s easy to dismiss their long term potential.
However, for applications already built on ORM-based frameworks, these cloud databases can easily provide core data management functionality. Furthermore, they can provide this functionality with compelling scalability and economic advantages. In short, they exhibit the familiar signature of a disruptive technology - one that provides adequate functionality together with a compelling economic advantage.
Challenges for the Cloud Database
Cloud databases still have significant technical drawbacks. These include:
* Transactional support and referential integrity. Applications using cloud databases are largely responsible for maintaining the integrity of transactions and relationships between “tables.”
* Complex data access. The ORM pattern - and cloud databases - excel at single-row operations: get a row, save a row, and so on. However, most non-trivial applications also have to perform joins and other multi-row operations, which the application must implement itself (a sketch of such a hand-coded join follows this list).
* Business Intelligence. Application data has value not only in terms of powering applications, but also as information which drives business intelligence. The dilemma of the pre-relational database - in which valuable business data was locked inside of impenetrable application data stores - is not something to which business will willingly return.
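To make the point concrete, here is a sketch of a join pushed into application code, since a key-value cloud store offers no JOIN of its own; the “customers” and “orders” tables, held here as plain dictionaries keyed by primary key, are hypothetical.

```python
# With no server-side JOIN, the application fetches rows by key and stitches
# them together itself; it is also responsible for keeping customer_id
# references valid, since the store enforces no referential integrity.
customers = {
    "c1": {"name": "Acme Corp"},
    "c2": {"name": "Globex"},
}
orders = {
    "o1": {"customer_id": "c1", "total": 250.0},
    "o2": {"customer_id": "c2", "total": 99.0},
    "o3": {"customer_id": "c1", "total": 10.0},
}

# Equivalent of: SELECT c.name, o.total FROM orders o JOIN customers c ON ...
report = [
    (customers[order["customer_id"]]["name"], order["total"])
    for order in orders.values()
]
print(report)   # [('Acme Corp', 250.0), ('Globex', 99.0), ('Acme Corp', 10.0)]
```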
Cloud databases could displace the relational database for a significant segment of next-generation, cloud-enabled applications. However, business is unlikely to be enthusiastic about an architecture that prevents application data from being leveraged for BI and decision support purposes. An architecture that delivered the scalability and other advantages of cloud databases without sacrificing information management would therefore be very appealing. In the next part of this article, we’ll look at an intriguing proposal that seems to deliver just that.