In distributed databases, no principle is quite as famous as the CAP Theorem (also known as Brewer’s Theorem). The CAP Theorem states that a distributed database can support at most two of the following three desirable characteristics:
- Consistency: Every user sees the same view of the data at any given moment.
- Availability: The database remains operational and responds to requests even when some nodes fail.
- Partition Tolerance: The database continues to operate even when a network failure splits it into isolated segments.
The CAP Theorem sounds complex, but it is easy to understand in practice. If I have a database with nodes in the U.S. and in Australia, what happens if the network between the two countries fails? In distributed databases, this is often called the “split-brain” scenario. Oracle’s RAC clustered database chooses consistency in this scenario: the smaller of the two partitions will shut down (probably Australia). Amazon’s Dynamo chooses availability: both Australia and the U.S. stay online, but Australians and Americans may see slightly different views of the database until the network is restored. You simply cannot serve identical views of an active database to two locations when there is no network connectivity between them.
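To make the trade-off concrete, the following toy sketch plays out both behaviors. It is plain Python, not any vendor’s API; the Site class and write function are invented for illustration. A consistency-first system takes the minority side of the partition offline, while an availability-first system keeps both sides serving writes and lets their copies drift apart until the network heals.

```python
class Site:
    """One geographic replica of the database."""
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.online = True

def write(site, key, value, reachable_peers=()):
    """Write locally and replicate to whichever peers are still reachable."""
    if not site.online:
        raise RuntimeError(f"{site.name} is offline: it chose consistency over availability")
    site.data[key] = value
    for peer in reachable_peers:
        peer.data[key] = value

us, au = Site("US"), Site("Australia")
write(us, "item", "v1", reachable_peers=[au])   # before the partition, both sites agree

# Consistency-first (CP) behavior: the smaller partition shuts itself down.
au.online = False
try:
    write(au, "item", "v2")
except RuntimeError as err:
    print(err)

# Availability-first (AP) behavior: both sides stay up but can no longer replicate.
au.online = True
write(us, "item", "v2")          # no reachable peers, so the update stays local
write(au, "item", "v3")
print(us.data, au.data)          # the two sites now disagree until the network heals
```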
More than anything else, the implications of the CAP Theorem led to the schism in modern database management systems. With the rise of global applications with extremely high uptime requirements, it became unthinkable to sacrifice availability for perfect consistency. Almost in unison, leading Web 2.0 companies such as Amazon, Google, and Facebook introduced new database services that were only “eventually” consistent but globally and highly available.
Google has tried to resolve this database schism with its Spanner SQL database. It does not claim to have overthrown the CAP Theorem; instead, it has adopted a “you don’t need to worry about that” philosophy.
The CAP Theorem assumes that network partitions are inevitable in a wide area network. And on the universal wide area network of the internet, this is undoubtedly true—you simply can’t assume network availability when the network is built from so many varied service providers. But Spanner runs exclusively on Google’s global network. That network has enough redundancy to eliminate hardware failure as a likely cause of a network partition, and Google has adopted procedures designed to minimize the possibility of a network failure caused by human error.
Google doesn’t say a network partition is impossible, only that it has made partitions far less probable than other failure scenarios. Therefore, for most applications, network partitions should no longer be a major concern. Nevertheless, it is worth noting that should a network partition occur, Spanner chooses consistency over availability, which means it has more in common with traditional databases such as Oracle than with next-generation databases such as Dynamo.
Another novel feature of Spanner is its TrueTime system. Distributed databases go to a lot of effort to return consistent information from replicas maintained across the system. Locks are the primary mechanism for preventing inconsistent information from being created in the database, while snapshots are the primary mechanism for returning consistent information. Queries don’t see changes to data that occur while they are executing because they read from a consistent “snapshot” of the data. Maintaining snapshots in distributed databases can be tricky: usually a large amount of inter-node communication is required to agree on the ordering of transactions and queries.
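The sketch below shows the snapshot idea in miniature. It uses a hypothetical single-node VersionedStore, not Spanner’s actual implementation: writes are kept with timestamps, and a query reads “as of” the timestamp at which it started, so later writes never leak into its results. The hard part in a distributed database is agreeing on those timestamps across nodes, which is exactly what the inter-node communication mentioned above is for.

```python
import itertools

class VersionedStore:
    """A toy multi-version store: every write keeps its timestamp."""
    def __init__(self):
        self._clock = itertools.count(1)   # stand-in for a transaction timestamp
        self._versions = {}                # key -> list of (timestamp, value), in write order

    def write(self, key, value):
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))
        return ts

    def snapshot(self):
        """Fix the timestamp that a query will read 'as of'."""
        return next(self._clock)

    def read(self, key, snapshot_ts):
        """Return the newest value written at or before the snapshot timestamp."""
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

store = VersionedStore()
store.write("balance", 100)
snap = store.snapshot()              # a long-running query starts here
store.write("balance", 250)          # a concurrent update commits while it runs
print(store.read("balance", snap))   # -> 100: the query still sees its snapshot
```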
Google Spanner simplifies the snapshot mechanism by using GPS receivers and atomic clocks installed in its data centers. GPS provides an externally validated timestamp, while the atomic clocks keep high-resolution time between GPS “fixes.” The result is that every Spanner server in the world keeps a clock synchronized to within a small, known error bound. This allows Spanner to order transactions and queries precisely without requiring inter-node communication.
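A rough simulation of that idea is sketched below. It is modeled loosely on the TrueTime interval API described in Google’s Spanner paper: time is reported as an interval whose width reflects clock uncertainty, and a “commit wait” delays acknowledging a transaction until its timestamp is definitely in the past everywhere. The EPSILON value and the helper functions here are invented for illustration; they are not Spanner’s code.

```python
import time

# Assumed clock uncertainty, in seconds; the real system reports this bound itself.
EPSILON = 0.004

def tt_now():
    """Current time as an uncertainty interval (earliest, latest)."""
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def tt_after(ts):
    """True once ts is guaranteed to have passed on every clock."""
    earliest, _ = tt_now()
    return earliest > ts

def commit(writes):
    """Assign a commit timestamp, then 'commit wait' out the clock uncertainty."""
    commit_ts = tt_now()[1]            # no earlier than the latest possible "now"
    while not tt_after(commit_ts):     # wait until the timestamp is safely in the past
        time.sleep(EPSILON / 2)
    return commit_ts                   # later transactions get strictly larger timestamps

ts1 = commit({"k": "v1"})
ts2 = commit({"k": "v2"})
assert ts2 > ts1   # timestamps alone order the transactions, with no inter-node chatter
```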
Google engineering was once described as “Ph.D.s driving tanks.” Spanner is a typical Google combination of smart architecture, pragmatism, and brute-force engineering. It minimizes the impact of the CAP Theorem on distributed systems by providing a database that—almost—lets you “have it all.”