By John Maxwell, Senior Product Marketing Manager, AWS Database Services
Overview
NoSQL databases, also referred to as non-relational databases, continue to grow in popularity both on-premises and in the cloud. Of the 395 database management systems tracked by the site DB-Engines, more than half are non-relational. These include key-value, document, graph, time series, and many other database models that fall into the NoSQL category.
As more organizations modernize their applications or develop new applications from scratch, they adopt a microservices-based architecture that allows them to take advantage of the benefits of NoSQL databases. Both legacy commercial databases and open-source relational databases can struggle to handle the load of performance-intensive, internet-scale applications. In many cases, the performance of relational database workloads degrades as they grow. For example, the relational concept of JOINs can have a negative impact on application performance.
SQL JOINs are commonly grouped into four types: INNER JOIN, OUTER JOIN, CROSS JOIN, and SELF JOIN. When a SQL query requires a JOIN, the system must run an algorithm to choose an access pattern and request certain records from each table involved. Depending on the size of the tables, the system may perform a hash join, which hashes the join-key values found in the tables to make matching faster. This uses a lot of CPU and can result in a CPU bottleneck. A JOIN can also slow down a workload because of the I/O latency of accessing data randomly. Lastly, JOINs require a higher level of SQL competence than a NoSQL model, where a table can simply be created and then accessed via an API.
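To make the hash join cost concrete, the following is a minimal Python sketch of the build-and-probe idea described above. It is an illustration under stated assumptions, not how any particular database engine implements its joins; the `hash_join` function and the `customers` and `orders` sample rows are hypothetical.

```python
from collections import defaultdict


def hash_join(left_rows, right_rows, key):
    """Inner-join two lists of dicts on `key` with a build/probe hash join."""
    # Build phase: hash every row of the left input by its join key.
    buckets = defaultdict(list)
    for row in left_rows:
        buckets[row[key]].append(row)

    # Probe phase: look up each right row's key and emit merged matches.
    joined = []
    for row in right_rows:
        for match in buckets.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data, for illustration only.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
orders = [
    {"customer_id": 1, "order_id": 100},
    {"customer_id": 1, "order_id": 101},
    {"customer_id": 3, "order_id": 102},
]

result = hash_join(customers, orders, "customer_id")
for row in result:
    print(row)
```

Every row of both inputs is hashed or probed once, which is where the CPU cost comes from as the tables grow. Only the two orders for customer 1 appear in the result, because an inner join drops rows without a match on the other side.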
This discussion focuses on the features that organizations should look for when selecting a NoSQL, or non-relational, database for their applications.
Fully-Managed Databases
As noted in Gartner’s “DBMS Market Transformation 2021: The Big Picture,” the shift of databases to the cloud is one of the largest macro-level changes to happen to the database market in decades. However, cloud-native and cloud-hosted are very different types of “cloud databases.” Cloud-native databases, like Amazon DynamoDB, were born in the cloud and are fully managed. Fully managed cloud databases remove the administrative overhead of installing the database engine, applying software patches, setting up backups, and other tasks associated with installing and maintaining a database. Cloud-hosted databases are those that an organization moves from on-premises to compute services like Amazon EC2. The organization no longer has to buy and provision hardware or maintain datacenters, but it remains responsible for the software upkeep of its databases and associated utilities.
Cloud-native database services are dominating the growth in the overall database market and should surpass on-premises databases in the coming years. The biggest benefit is the removal of the administrative overhead associated with maintaining databases.
Serverless Changes the Game
Just as virtualization forever changed the x86 server market by allowing organizations to make full use of their servers, serverless is changing the cloud. With serverless, organizations simply pay for the resources they use. Serverless database services allow an organization to scale up or down dynamically (instantly in most cases) without the need to provision for peak resources. This not only negates the need for complex capacity planning, but has a major positive impact on TCO (Total Cost of Ownership) since a customer no longer has to pay for resources they aren’t using. The term “scaling to zero” applies to database services where the customer pays nothing for the service when it is not in use, depending on how a particular database service charges.
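The TCO argument above comes down to simple arithmetic, sketched below in Python. Everything in this sketch is an assumption made for illustration: the workload shape, the unit prices, and the `provisioned_cost` and `on_demand_cost` helpers are hypothetical and do not reflect the pricing of any real database service.

```python
def provisioned_cost(peak_rps, hours, price_per_rps_hour):
    """Cost when capacity is sized for peak load and billed for every hour."""
    return peak_rps * hours * price_per_rps_hour


def on_demand_cost(total_requests, price_per_million):
    """Cost when only the requests actually served are billed."""
    return total_requests / 1_000_000 * price_per_million


# Hypothetical workload: 1,000 requests/second at peak,
# but only 50 requests/second on average over a 30-day month.
hours = 24 * 30
peak_rps = 1_000
avg_rps = 50
total_requests = avg_rps * 3600 * hours

# Hypothetical unit prices, chosen only to make the arithmetic visible.
fixed = provisioned_cost(peak_rps, hours, price_per_rps_hour=0.001)
usage = on_demand_cost(total_requests, price_per_million=1.00)
idle = on_demand_cost(0, price_per_million=1.00)  # "scaling to zero"

print(f"provisioned for peak: {fixed:.2f}")
print(f"pay per request:      {usage:.2f}")
print(f"no traffic at all:    {idle:.2f}")
```

With these made-up numbers, provisioning for peak costs about 720 units for the month, while paying per request costs about 129.6 units for the same traffic and nothing when there is no traffic. The gap depends entirely on how far average load sits below peak load, so a steadily busy workload would narrow or even reverse it.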
Serverless represents the future of not only NoSQL, but all databases, because it essentially provides an API for applications to use with no administrative overhead, complex capacity planning, or complicated billing. With serverless databases, capacity is on demand with virtually unlimited scalability, and the cost model is one of paying only for the resources used.
Horizontal Scaling
Traditionally, databases have been scaled vertically by increasing the compute resources available to the database, such as processors and memory. The issue with vertical scaling is the physical limit of servers (and associated virtual machines). Horizontal scaling, also known as scale-out, is what organizations should look for in order to meet workload demand today and tomorrow. With horizontal scaling, the database distributes the data and workload across multiple nodes, a technique referred to as sharding. Relational databases can have a hard time with sharding due to the difficulty of spreading data across nodes and managing those nodes. NoSQL databases, such as Amazon DynamoDB, can deliver performance at scale through horizontal scaling with no management overhead. A proof point for DynamoDB horizontal scaling is Amazon Prime Day 2022, where it maintained single-digit millisecond response times while peaking at 105.2 million requests per second.
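As a rough illustration of sharding, the Python sketch below routes each key to one of several in-memory "nodes" by hashing the key. It is a generic toy under stated assumptions and does not describe the internals of DynamoDB or any other service; the `shard_for` function and the `ShardedStore` class are hypothetical names.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store that spreads items across in-memory 'nodes'."""

    def __init__(self, num_shards: int):
        # Each dict stands in for a separate database node.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"user#{i}", {"id": i})

# Each read or write touches exactly one shard, so load spreads across nodes.
sizes = [len(shard) for shard in store.shards]
print(sizes)
print(store.get("user#42"))
```

A simple modulo scheme like this one reshuffles most keys whenever the number of shards changes, which hints at why managing shards by hand is difficult and why a managed service that handles data placement is attractive.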
Availability and Reliability
Organizations can spend a large amount of their IT capital expenditure budget to make sure their databases provide the availability and reliability to support business-critical applications. However, even with regionally replicated databases, it can be hard to provide 24/forever availability due to software patching and inherent operating system and hardware issues.
For cloud-native database services, organizations should look for services with 99.99% or higher availability. Since cloud-native database services are by definition fully managed, their architecture, infrastructure, and operational excellence allow them to support internet-scale workloads better than most on-premises, self-administered databases. For example, Amazon DynamoDB provides active-active, multi-region replication at global scale, supporting both local reads and writes. This not only provides failover in seconds, but also provides application portability if a workload is moved to another region.
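An availability percentage is easier to judge when converted into allowed downtime. The short Python sketch below does that conversion; the `allowed_downtime_minutes` helper is a hypothetical name, and the calculation assumes a 365-day year with downtime measured against total elapsed time.

```python
def allowed_downtime_minutes(availability_pct: float, days: float = 365.0) -> float:
    """Return the downtime budget, in minutes, for a given availability level."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


for pct in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(pct)
    print(f"{pct}% availability allows about {minutes:.1f} minutes of downtime per year")
```

At 99.99%, the budget is roughly 52.6 minutes per year, and each additional "nine" cuts it by a factor of ten.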
Data Integrity and Security
ACID stands for the properties of transactions in a database management system: Atomicity, Consistency, Isolation, and Durability. These properties ensure that a transaction leaves a database in a consistent state even in the event of unexpected errors. For many enterprise workloads, the underlying database must be ACID compliant.
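Atomicity, the "all or nothing" part of ACID, can be shown with a small Python sketch. It uses SQLite only because it ships with Python's standard library. SQLite is a relational engine, so this illustrates the transactional guarantee itself rather than any NoSQL API, and the `accounts` table, `transfer` function, and `balances` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; apply both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint if src cannot cover the amount.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


failed = transfer(conn, "alice", "bob", 500)  # alice only has 100
after_failed = balances(conn)
succeeded = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)
print(failed, after_failed)
print(succeeded, after_success)
```

In the failed transfer, the credit to the destination account is applied first and then undone when the debit violates the constraint, so neither balance changes. That rollback is the behavior an ACID-compliant database guarantees without the application having to clean up after itself.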
Additionally, most organizations should look for database services that provide physical data encryption. The most common is “encryption at rest” where the data is encrypted on the disk drives that store the database. Encryption of data in motion is also an important consideration for data transmitted over a network.
Summary
Given the rise of NoSQL databases, the bar must be set high to support business-critical and internet-scale applications. The requirements outlined in this discussion are just that: a minimum set of features to support modern applications. Organizations should also look at additional features that may be important to them, like change tracking, the ability to choose cheaper storage for infrequently accessed data, and integration with analytics applications without the need to perform ETL (Extract, Transform, Load). The good news is that organizations have many choices when it comes to selecting a NoSQL database. The hard part is making sure the database can deliver the required features, performance, and availability.
AWS provides eight non-relational database services that allow organizations to find the best-fit database for their application. For example, AWS’ flagship non-relational database service, Amazon DynamoDB, meets all of the criteria outlined in this article and more. Most AWS database services offer a free tier, including Amazon DynamoDB.