Database management system (DBMS) configuration tuning is an essential aspect of any data-intensive application effort. But it is historically difficult because DBMSs have hundreds of configuration "knobs" that control everything in the system, such as the amount of memory to use for caches and how often the DBMS writes data to storage.
The problem with configuration knobs is that they are not standardized (i.e., two DBMSs use a different name for the same knob), not independent (i.e., changing one knob can impact others), and not universal (i.e., what works for one application may be suboptimal for another). Worse, information about the effects of the knobs typically comes only from expensive, time-consuming trial-and-error experiences.
Another significant issue is that because there are hundreds of knobs per DBMS, it is beyond the capacity and reasoning of database administrators (DBAs) to tune all of them for each application. DBMS tuning guides strongly suggest that a DBA only changes one knob at a time. This advice is wise but woefully slow given for DBMS with a large number of tunable knobs. It is also not entirely helpful because changing one knob may affect the benefits of another. It is difficult enough for humans to understand the impact of one knob, let alone the interactions between multiple ones. That is why academia and industry are putting so much effort toward finding ways to automate database configuration tuning.
6 Things You Must Know to Avoid Surprises
Because tuning database configuration knobs is so complex, the saying "you don't know what you don't know" applies in spades—the entire tuning process can be full of surprises. Here are a few things to know and what surprises will arise when you start tuning any DBMS.
- Tuning is not a one-time activity. Non-stop tuning is essential because databases constantly change: data sizes, hardware capacity, new queries, and applications with evolving demands and workloads. This is why it is crucial always to be tuning your database and to automate the process as much as possible.
- Never rely on default settings. Part of what makes DBMSs so enigmatic is that their performance and scalability are highly dependent on their configurations. Further exacerbating this problem is that the out-of-the-box default configurations of DBMS knobs are notoriously bad. DBMS vendors intentionally set the default configuration to conservative values to ensure that the DBMS runs everywhere to consume the least resources possible and avoid security vulnerabilities. For example, the default configuration for MySQL 5.7 assumes that it is deployed on a machine that only has 512 MB of RAM. Know that the defaults always produce suboptimal results, even if they've been "pre-tuned" by a cloud database provider.
- Database knobs are interdependent. It is difficult to know how changing one knob interacts with and affects the others. Without an automated way to figure out dependencies, it takes a lot of time to try out knobs and determine their impact. The time requirements of such trial-and-error tuning to optimize a single database can consume days or even weeks, assuming that the application's workload is stable, which is usually not the case.