By Vijay Karumajji, Database Solutions Architect, Amazon Web Services
Introduction
Developers have a strong affinity for deploying their applications on open-source database engines, such as MySQL, MariaDB, and PostgreSQL. Amazon Relational Database Service (Amazon RDS) offers these engines as a managed service and manages complex and time-consuming administrative tasks, such as database engine software installation and upgrades, storage management, replication for high availability and read throughput, and backups for disaster recovery. Amazon RDS makes it easier to set up, operate, and scale database deployments on the cloud. AWS is continuously innovating to deliver price-performance improvements to customers to enhance application performance, and it has recently introduced Amazon RDS Optimized Reads for RDS for MySQL, RDS for MariaDB, and RDS for PostgreSQL. It also released Amazon RDS Optimized Writes for RDS for MySQL and RDS for MariaDB. With Optimized Reads and Optimized Writes you can improve query processing and write throughput by up to 2x at no additional cost. In this article, we will review how these features improve price-performance and explore the different kinds of applications that stand to gain from them.
What are Optimized Reads and Optimized Writes?
Amazon RDS Optimized Reads enabled instances achieve faster query processing by placing temporary tables generated by MySQL, MariaDB, and PostgreSQL on local Non-Volatile Memory Express (NVMe) based solid state drive (SSD) block level storage that is physically connected to the host server. This feature can significantly speed up complex queries, including those that involve sorts, hash aggregations, high-load joins, and Common Table Expressions (CTEs), resulting in up to a 2x faster improvement in execution time on these Amazon RDS engines. Optimized Reads is automatically enabled when you select an instance type with local storage, like m5d or r6gd.
In MySQL and MariaDB you are protected from data loss due to unexpected events, such as a power failure, using a built-in feature called the “doublewrite buffer.” RDS for MySQL and RDS for MariaDB support Optimized Writes, which ensure protection from data loss without needing to write data twice while maintaining ACID properties. Optimized Writes uses the AWS Nitro System, a combination of dedicated hardware and lightweight hypervisor, to ensure 16KiB pages can be reliably and durably written directly to data files in one step, providing customers up to 2x improvement in write transaction throughput.
How do Optimized Reads and Optimized Writes Work?
Optimized Reads
Your application can create two types of temporary tables for optimal performance, internal (or implicit) temporary tables and user-created (or explicit) temporary tables. Database engines, such as MySQL, MariaDB, or PostgreSQL, create internal temporary tables to handle operations such as sorting aggregation, derived tables, or common table expressions (CTEs). When you use the CREATE TEMPORARY TABLE statement, you create a user-created (or explicit) temporary table. A SQL operation requires these intermediate results as a temporary dataset to compute the actual end result. At times, depending on the volume of data, building this intermediate result set requires scanning through hundreds of millions of rows from multiple tables, which can be both time- and resource-consuming.
When applications generate intermediate result sets, database engines create temporary tables to in-memory. This might be unsuccessful because of incompatible data types (BLOB or TEXT) or configuration limits (tmp_table_size or max_heap_table_size). In those scenarios, the database converts the table to an on-disk temporary table instead of holding it in-memory. Depending on the database engine, these disk-based internal temporary tables use either Aria or MyISAM. In addition, queries that perform grouping and sorting operations use files’ on disk storage to process the results. These files are removed soon after the completion of the associated queries. For more information, see Internal Temporary Table Use in MySQL and Aria storage engine.
MySQL 8.0 uses the TempTable storage engine as the default storage engine for in-memory internal temporary tables. Depending on the workload, you can customize the TempTable storage engine behavior. For example, the temptable_max_ram variable defines the maximum amount of RAM that can be used by the TempTable storage before it allocates space for memory-mapped files or before using InnoDB on-disk internal temporary tables. The variable tmp_table_size defines the maximum size of any individual in-memory internal temporary table. For more information about TempTable storage engine behavior on Amazon RDS for MySQL, see Use the TempTable storage engine on Amazon RDS for MySQL and Amazon Aurora MySQL.
In PostgreSQL, temporary objects are assigned to a temporary namespace that drops automatically at the end of the session. The temporary namespace while dropping removes any objects that are session-dependent, including schema-qualified objects, such as tables, functions, operators, or even extensions. In RDS for PostgreSQL, the temp_tablespaces parameter is configured for this temporary work area where the temporary objects are stored.
On Amazon RDS instances that do not have Optimized Reads enabled, the overflow path will change to use the on-disk temporary table’s both persistent (User-Created Temporary Tables) and non-persistent (Internal Temporary Tables) objects. These objects are stored in Amazon Elastic Block Store (Amazon EBS). Depending on the amount of storage provisioned and type of storage used, the performance of your workload may vary. To achieve optimal performance for your workload, you may need to consider using high throughput disk storage.
Optimized Reads is automatically enabled on instances with local storage. Amazon RDS instances utilize an EC2 instance store to store temporary objects. With Optimized Writes enables instances, any temporary objects, like tables and files, are stored on the local NVMe-based SSD block-level storage that’s physically connected to the host server (instance store). This storage optimizes your workload for low latency, high random I/O performance, and high sequential read throughput. These instance store volumes provide higher IOPS and performance, regardless of the storage configurations used for persistent Amazon EBS storage. Hence, Optimized Reads provides up to twice the read performance for workloads that heavily rely on a temporary work area.
The workloads that generate temporary objects in Amazon RDS for query processing can take advantage of the instance store for faster query processing. Optimized Reads are supported on RDS for MySQL versions 8.0.28 and higher; RDS for MariaDB versions 10.4.25, 10.5.16, 10.6.10, and higher; and RDS for PostgreSQL versions 13.10, 14.7, 15.2, and higher. Across all three engines, Optimized Reads supports instance classes that have the special notation “d.” For example, db.r5d, db.r6gd, db.m5d, db.x2idn, and so on. The instance class you choose for your applications will determine the size of the EC2 instance store. You can implement Optimized Reads by choosing this new option during instance creation or by modifying the DB instance class of your existing Amazon RDS instance to an Optimized Reads-enabled instance class. As a result, Amazon RDS can automatically improve performance for the temporary work area by using local storage instead of Amazon Elastic Block Store (Amazon EBS) for database workloads. See the AWS Documentation to review the instance store volume size allocated for each instance class.