When I worked for an aerospace company, a running joke was that we were a data storage company that happened to take satellite images. The amount of data we had stored in that prior life was measured in petabytes. That’s a lot of data. The problem was that it wasn’t all stored in a homogeneous way. Some of it was in an Oracle database, some in a PostgreSQL database, some in flat files, some was IRIS-processed imagery, and so on. With the number of formats and technologies involved, it was determined that we needed a data abstraction layer so that applications had one interface to work with, and our aptly named “data services layer” was born. Fast-forward about 15 years, and I am seeing a renewed push for data abstraction layers.
What is driving the need for a data abstraction layer? For years, the data business (certainly with regard to relational databases) has been primarily “owned” by a few big vendors: Oracle, SQL Server, DB2, Sybase, Informix, and so on. Recently, I was looking at a laundry list of relational and non-relational database systems being watched by Gartner, and quite a few were not familiar to me at all. Surprising? Not really. Our data capture and retention requirements continue to grow at a very fast rate, which brings new entrants into the SQL and NoSQL market all the time. However, not all data is created equal. Companies recognize that disparate data can and should be treated differently, which means the ways we persist that data can be extremely varied. Now, add applications that need to access all that data across a very heterogeneous landscape, and we get to the point where we are reinventing the data access wheel every time someone needs to spin up another application or introduce another data source.
Positives
Think of data services as a “Data API”: applications call one interface and don’t have to worry about how the data is actually stored behind the scenes. This gives application developers a way to get data without the need to do the following:
• Format each request in the form the specific data repository expects.
• Include all drivers and libraries necessary to “talk” to the correct data repository.
• Know where all of the data lives (which, by the way, can change often).
• Update applications every time a new data source technology is introduced.
You now call the data services layer and get your requests satisfied “automagically.” Problem solved for everyone, no?
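To make the idea concrete, here is a minimal sketch of what such a facade might look like. The class, method, and dataset names (DataService, PostgresSource, imagery_metadata, and so on) are hypothetical illustrations, not a description of any particular product:

```python
# Minimal sketch of a "Data API" facade over heterogeneous sources.
# Applications call get_records(); they never see which backend
# (PostgreSQL, Oracle, flat files, ...) actually holds the data.
from abc import ABC, abstractmethod


class DataSource(ABC):
    @abstractmethod
    def fetch(self, query: dict) -> list[dict]:
        """Return rows matching the query, normalized to plain dicts."""


class PostgresSource(DataSource):
    def fetch(self, query: dict) -> list[dict]:
        # Driver-specific connection and SQL details live here, not in the app.
        ...


class FlatFileSource(DataSource):
    def fetch(self, query: dict) -> list[dict]:
        # File parsing details live here.
        ...


class DataService:
    """The single interface applications talk to."""

    def __init__(self) -> None:
        # Routing table: dataset name -> backing source. Moving data to a new
        # backend means updating this mapping, not every application.
        self._routes: dict[str, DataSource] = {
            "imagery_metadata": PostgresSource(),
            "telemetry_archive": FlatFileSource(),
        }

    def get_records(self, dataset: str, query: dict) -> list[dict]:
        return self._routes[dataset].fetch(query)
```

An application would only ever call something like service.get_records("imagery_metadata", {...}); if that dataset later moves to a different repository, only the routing inside the service changes.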
Negatives
To be fair, we now need to look at the negative side of implementing a data services layer into your data center stack. If the data services layer goes down, the impact can be catastrophic, meaning it can bring the factory to a screeching halt. If you configure all of your data sources independently, chances are that only a portion of your factory will feel the impact when a source goes down. Another drawback is in troubleshooting performance issues. If the data service treats requests as a single unit of work, it can cause significant delays while waiting for one data source out of many to fulfill its part of the request. A very good understanding of dependencies and fairly verbose logging capabilities are likely required to track data source timings and delivery fulfillment.
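As a hedged illustration of that last point, here is a rough sketch of per-source timing and timeouts; the function names and thread-pool approach are assumptions for the example, not a description of any specific implementation:

```python
# Hypothetical sketch: fan a request out to several sources, time each one,
# and cap how long we wait, so a single slow source shows up in the logs
# instead of the whole unit of work simply looking slow to the caller.
import logging
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Any, Callable

log = logging.getLogger("data_service")


def _timed_fetch(name: str, fetch: Callable[[dict], Any], query: dict) -> Any:
    start = time.monotonic()
    rows = fetch(query)
    log.info("source=%s elapsed=%.3fs", name, time.monotonic() - start)
    return rows


def fetch_all(sources: dict[str, Callable[[dict], Any]], query: dict,
              timeout_s: float = 5.0) -> dict[str, Any]:
    """Query every source concurrently; slow sources are logged and skipped."""
    results: dict[str, Any] = {}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(_timed_fetch, name, fetch, query)
                   for name, fetch in sources.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout_s)
            except FutureTimeout:
                log.warning("source=%s exceeded %.1fs timeout", name, timeout_s)
                results[name] = None
    return results
```

Even a simple log line per source, as above, is often the difference between knowing which repository is dragging a composite request down and only knowing that "the data service is slow."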
In our case, when data services went down or were slow, it was a bad day for all. Fortunately, our implementation was robust by design, and we had an excellent development team at the ready.
Good or Bad?
The question of whether it is good or bad is difficult to answer in a very general way. However, there are certain data points that can help determine whether a data services layer could help streamline data access:
• How homogeneous are your company’s data sources? The more homogeneity, the less the need for adding a data services layer.
• How many different applications are you developing that need to access a variety of data sources? The larger the number, the more often you will be reinventing the wheel when it comes to data source access.
• How many data sources, in raw numbers, does your application layer need to access? A large number of connection configurations can point to efficiencies gained with a data services layer.
• How is authentication handled, and how frequently are updates required? If all of your authentication runs through AD/LDAP or another means of centralized credential and password management, this may be less of a worry. Updating every application with new credentials for its data sources whenever there is a change, however, can be a significant administrative commitment (see the sketch after this list).
• How sensitive is your factory to catastrophic outages versus partial functionality loss? If you get dinged against SLAs or SLOs for any outage, then a data services layer may make some sense. If you get partial credit when only part of the functionality goes down, then you may not want to introduce the added data services layer.
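On the authentication point, here is a hypothetical sketch of credential resolution living in one place inside the data services layer; the environment variable names are made up, and in practice the central store might be AD/LDAP, a vault, or a managed secrets service:

```python
# Hypothetical sketch: resolve data source credentials in one central place,
# so a password rotation is a single change in the data services layer rather
# than an update to every application that touches that source.
import os


def get_source_credentials(source_name: str) -> dict:
    """Look up credentials for a data source from centrally managed settings."""
    prefix = source_name.upper()
    return {
        "user": os.environ[f"{prefix}_DB_USER"],
        "password": os.environ[f"{prefix}_DB_PASSWORD"],
    }
```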
What is right for your IT organization?