In the days of cloud computing and agile development, it might seem that being a DBA is somewhat less appealing as a career choice. However, the role of DBA is changing; it is not going away. In fact, it is becoming even more important. That said, if DBAs are to explain their value to the rest of the business, they must be seen as more than the company’s data librarians. Rather than belonging in a musty old library, DBAs should be responsible for creating the foundations to turn their organizations into high-performance businesses fit for the digital era. This means addressing a number of technical requirements, but above all, it boils down to how fast an organization can use its data to gain business-impacting insights. To achieve this, DBAs must retool their query optimizing skills to become data performance pros!
Database Complexity
This is no small challenge. The pressure to deliver applications at an ever-faster rate has given developers the freedom to “do whatever it takes” to build at speed and ignore the constraints of traditional corporate IT processes. This, in turn, has given rise to a whole new variety of database architectures such as document databases, which allow developers to compile data more quickly.
The document model works well when there are many variables that the developer wants to address. For example, consider a customer that is taking his or her car to the garage for service. The customer record will contain information that is structured, such as billing information and the make and model of the car. Some of it may not be, because the mechanic may have taken ad hoc notes or taken pictures of the repair work to explain what has been done to fix it. This could sit in a document database, enabling the developer to have all the information about the customer represented in a simple way, and information could be added to the database very quickly as the status of the car changes. The same approach could be used for medical records, where the information about the patient is always accessed through the record, but lots of different data types, such as diagnosis information, X-ray material, courses of treatment being recommended, and general notes from the physicians, are continuously added.
All this sounds great if you are an application developer, but it does make it harder to search and analyze data. Compare this to the old days when DBAs controlled the database structure to get the best out of it. The DBA was the data expert, who understood there are trade-offs and how to optimize systems to get the best performance overall. The DBA would make sure the database used the right indexes, normalized the schema, and optimized the overall performance of queries. For instance, while spreading the information about a customer or patient across several tables in a normalized schema may make the application developer’s job less straightforward when creating the record, a query to find everyone with an overdue bill could be highly optimized.
Databases Demand a Hybrid, Multi-Model Approach
Today, with so many types of data modeling options, it is harder to optimize data access across the different formats and access patterns. It can be very convenient to use document models to organize all the information relating to a particular entity in one place, to avoid storing unnecessary information, and to be able to attach ad hoc information directly to that record. At the same time, the lack of consistent structure and content across all entities of a given type can make it harder for the system to optimize standard query performance.
The solution for this challenge is to adopt a hybrid or multi-model approach to represent and search data. For instance, this is why PostgreSQL provides native support for storing JSON documents alongside its traditional relational table structures. This way, data that is highly variable can be represented as a JSON document and can be easily integrated with more structured data in the PostgreSQL database.
This approach gives DBAs greater flexibility to move data between document and relational models. It also reflects situations where objects can have specific immutable features, such as size, price, and what type of object it is. Returning to the garage example, the standard customer record could be in a relational table while the log of work history for a particular vehicle could be a document which includes the mechanic’s notes and photos.
Another benefit of this approach is that you can start with a very ad hoc, document-based model and evolve it over time. Once you have gained enough experience on what data is required and how the data is most frequently accessed, you can then clean up the database, moving data from the document model to a relational one to provide a more structured view of the data.
Of course, document and relational models aren’t the only options provided by data management systems today. For some use cases, it is convenient to be able to search using time-series, geo-based representations. Time-series approaches allow the database to run queries based on when certain events have happened. For example, the mechanic could search for how often customers replace their tires within months of having their oil changed. That may seem silly, but as we move closer to autonomous vehicles, it will enable the mechanic to analyze information in real time as event stream data. Likewise, using a geographic index to analyze where these events occur could also yield valuable insight. These same approaches could be used to analyze high-volume transactions in stock training or assessing IoT data to identify threats or peak performance demands. Even simple key-value lookups can be useful to optimize certain types of data access, for example, when caching information for frequent access.
A True Pro Knows When Not to Optimize
It should be noted, however, that being a true database performance expert is not just be about data modeling and optimizing queries. It is also about knowing when to apply those skills. There are many situations in which it is a waste of time to optimize queries. For example, the garage may only need to run a query about its office stationery requirements once each quarter, as it may only entail searching a small amount of data requiring a limited timeframe to complete the command. This type of optimization tiers what data an organization needs to query and when it is best to run the query.
The advent of cloud computing adds another dimension. If a database is hosted in a public cloud where resources are flexible, it may not make sense to optimize queries that are quite complex and require a lot of resources. For example, if a report is only run at the end of the quarter, perhaps the DBA can simply scale up the hardware to support this demand. The cost of a few hours of additional capacity would be dramatically lower than the cost to rework the queries, much less to re-architect the data model to squeeze out a bit more performance.
Ultimately, all of this knowledge about query optimization lies with the DBA, because that person’s remit has always been about the data. It presents a great opportunity for DBAs to be integral to the performance of applications in the digital business era, but it does require an understanding of the whole toolbox of optimization techniques available. The DBA’s role is broadening too, because it is not just about optimizing queries as in days gone by; it is also about what that optimization means for the business. What are you optimizing for? Are you optimizing for the best experience for the end user? Technically, this translates into ensuring a user can run a query in 1 second not 30 seconds, which requires an in-depth understanding of the impact on latency of running a particular search.
Application developers are going to continue to choose the database model that is most convenient for them, so it is incumbent on the DBA to optimize access to the data for users. That is more easily achieved if the DBA is using a multi-modal database that can integrate different data types, but also can be tuned for structured queries. It allows for far greater management and optimization of the data in a single location. The DBAs that become obsessive about such performance will be able to demonstrate the visible impact they are having on the business needs of their organizations. By its simplest measure, users will experience far more responsive, efficient, and accurate data searches, but more importantly, they will be able to do their jobs more effectively, collaborate more closely, and understand their customers more deeply as the data they are using is more complete. The DBAs that can meet this challenge will demonstrate how strategic their roles are to creating high-performance IT infrastructures for the digital business era.