Over the past decade, we’ve seen a wave of diversification followed by consolidation in database technologies. Relational databases such as Oracle, MySQL, and SQL Server completely dominated database technology until a relatively sudden explosion of new “NoSQL” databases in the 2008–2010 time frame. These new databases rejected the ACID transaction model, the SQL language, and the relational model in order to achieve greater scalability and/or developer productivity.
But in the last 5 years, we’ve seen a blurring of the distinction between many of these upstart databases and the traditional SQL databases. NoSQL databases such as MongoDB have added features typically associated with relational databases (transactions, SQL connectors, and the like), while SQL databases have introduced support for JSON document models. Databases such as PostgreSQL and MongoDB are increasingly converging on a common set of features. They will probably always differ in their core strengths, but the gap is narrowing. However, one category of NoSQL databases seems to be bucking the convergence trend: graph databases.
The Importance of Relationships
In a relational or document database, the primary item of interest is an object. In a SQL database, the object might be represented by a row or a joined collection of rows, while in a document database such as MongoDB, the object of interest is represented as a JSON document. However, in some contexts, the relationships between objects are every bit as important as those objects themselves. It is in this domain that graph databases shine.
Graphs are familiar from everyday social networks such as Facebook and Twitter. Finding “friends of friends” is a typical graph database use case. However, graph databases are also important in many scientific and engineering tasks, such as exploring the “explosion of parts” (the bill of materials) in complex machinery, genomic analysis, and computer network analysis.
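The “friends of friends” query is easiest to picture as a traversal. The following is a minimal Python sketch over a small, hypothetical in-memory social graph; `FRIENDS` and `people_at_distance` are illustrative names, not part of any database's API. It runs a breadth-first search and returns everyone whose shortest path from a starting person is exactly a given number of hops. A graph database would typically express the same idea declaratively in its query language rather than as hand-written loops.

```python
from collections import deque

# Hypothetical toy social graph: each person maps to the set of their direct friends.
FRIENDS: dict[str, set[str]] = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol", "frank"},
    "erin": {"carol"},
    "frank": {"dave"},
}


def people_at_distance(graph: dict[str, set[str]], start: str, hops: int) -> set[str]:
    """Return everyone whose shortest path from `start` is exactly `hops` edges (BFS)."""
    seen = {start}
    queue = deque([(start, 0)])
    found: set[str] = set()
    while queue:
        person, dist = queue.popleft()
        if dist == hops:
            found.add(person)
            continue  # stop expanding once the target depth is reached
        for friend in graph.get(person, set()):
            if friend not in seen:  # first visit in BFS order is the shortest path
                seen.add(friend)
                queue.append((friend, dist + 1))
    return found


if __name__ == "__main__":
    # Friends of friends = people exactly two hops away.
    print(sorted(people_at_distance(FRIENDS, "alice", 2)))  # ['dave', 'erin']
```

Because `seen` is updated when a person is first enqueued, a mutual friend such as `dave` (reachable through both `bob` and `carol`) is reported only once, at his shortest distance.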
Like relational databases, but unlike many non-relational systems, graph databases rest on a strong theoretical foundation. Graph theory is a long-established branch of mathematics with many practical applications in medicine, physics, and sociology, as well as in computer science. And some graph standards, such as RDF (Resource Description Framework), date all the way back to the 1990s.
However, graph databases probably entered the mainstream with the emergence of Neo4j, a Java-based graph database that was originally an embedded database for Java applications. Today, Neo4j is available as an embedded database, a standalone server, a fault-tolerant and scalable cluster, or a fully managed cloud service. Neo4j recently completed a $325 million investment round, the largest ever for a private database company, giving Neo4j a valuation of $2 billion.
Graph Database Providers
There are many other graph database contenders. TigerGraph, a far more recent technology, aims to compete with Neo4j by providing faster graph processing. It has also launched a cloud database service and raised $105 million in funding this year.
Moreover, graph capabilities have been added to mainstream database systems. DataStax, which commercializes the Apache Cassandra database, is a major contributor to the open source Gremlin graph traversal language and is investing heavily in graph capabilities within its flagship database product. Oracle offers graph capabilities using the Neo4j-compatible Cypher language. MongoDB and Microsoft’s Azure Cosmos DB also offer graph capabilities.
However, graph capabilities layered on top of a non-graph data model can only go so far. Generally, real-time graph analytics on large datasets require a database that uses a graph data model at the most fundamental level. Consequently, graph databases seem likely to continue as a distinct niche in the database market.
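One common explanation for this claim is what Neo4j calls index-free adjacency: in a graph-native store, each node record points directly at its neighbours, so a hop follows a stored reference, whereas a relational or document layout typically resolves each hop with an index lookup or join whose cost grows with the size of the edge table. The Python sketch below is a toy illustration of that difference under simplifying assumptions. The `EDGES` data and all function and class names are hypothetical, and a sorted list searched with `bisect` stands in for a B-tree index. It is not how any particular product is implemented.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field

# Hypothetical directed edges, as rows of a relational "edges(source, target)" table.
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]

# Relational-style layout: rows sorted on `source`, standing in for a B-tree index.
EDGE_TABLE = sorted(EDGES)
SOURCE_COL = [src for src, _ in EDGE_TABLE]


def reachable_via_index(start: str) -> set[str]:
    """Every hop probes the sorted edge table with binary search: O(log E) per lookup."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        lo, hi = bisect_left(SOURCE_COL, node), bisect_right(SOURCE_COL, node)
        for _, target in EDGE_TABLE[lo:hi]:
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can go in a set
class Node:
    """Graph-native layout: each node record holds direct references to its neighbours."""
    name: str
    out: list["Node"] = field(default_factory=list)


def build_graph(edges: list[tuple[str, str]]) -> dict[str, Node]:
    """Load the edge rows once into linked node records."""
    nodes: dict[str, Node] = {}
    for src, dst in edges:
        source = nodes.setdefault(src, Node(src))
        target = nodes.setdefault(dst, Node(dst))
        source.out.append(target)
    return nodes


def reachable_via_pointers(start: Node) -> set[str]:
    """Every hop follows stored references; no edge index is searched while traversing."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in node.out:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return {n.name for n in seen}


if __name__ == "__main__":
    graph = build_graph(EDGES)
    print(sorted(reachable_via_index("a")))            # ['a', 'b', 'c', 'd', 'e']
    print(sorted(reachable_via_pointers(graph["a"])))  # ['a', 'b', 'c', 'd', 'e']
```

Both traversals return the same nodes; what differs is the cost of each hop. With five edges the difference is invisible, but the sketch suggests why deep, multi-hop traversals over very large edge tables tend to favour a store whose physical layout is itself a graph.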
Graph databases are not going to displace the traditional relational systems or take market share from the larger NoSQL databases such as MongoDB or Cassandra. But they represent a valid, distinct, and growing niche in the overall database landscape.