Integrating multiple disparate data sources and types requires that either the solution provider or the end user have a high degree of expertise in relational databases and SQL queries.
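As a rough sketch of what that expertise involves, the example below correlates data from two hypothetical sources with a time-window join in SQLite; all table names, column names, and values are invented for illustration only.

```python
import sqlite3

# Two invented tables standing in for two disparate monitoring sources.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cpu_samples (host TEXT, sampled_at TEXT, cpu_pct REAL);
    CREATE TABLE ticket_events (host TEXT, opened_at TEXT, severity TEXT);
    INSERT INTO cpu_samples VALUES ('web01', '2016-03-01 12:00', 93.0),
                                   ('web01', '2016-03-01 13:00', 41.0);
    INSERT INTO ticket_events VALUES ('web01', '2016-03-01 12:05', 'critical');
""")

# Correlating the two sources means expressing a time-window join -- the kind
# of relational skill the paragraph above refers to.
rows = conn.execute("""
    SELECT t.host, t.opened_at, t.severity, c.sampled_at, c.cpu_pct
    FROM ticket_events AS t
    JOIN cpu_samples AS c
      ON c.host = t.host
     AND c.sampled_at BETWEEN datetime(t.opened_at, '-1 hour') AND t.opened_at
""").fetchall()
print(rows)  # the CPU sample in the hour before the critical ticket
```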
Web/API-based data access
API-based data access is typically used where data is generated by web-based technologies or by highly customized/bespoke applications whose measurements and values may not be highly structured (e.g., text-based log files).
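As a simple illustration of what "not highly structured" means in practice, the sketch below parses a made-up text log line into fields; the log format and field names are invented for this example.

```python
import re

# A hypothetical log line of the kind a web-based tool might expose.
LINE = "2016-03-01T12:00:05Z host=web01 metric=cpu_util value=87.5"

# Split the timestamp from the key=value pairs; this pattern matches only
# the invented format above, not any real tool's output.
pattern = re.compile(r"^(?P<ts>\S+)\s+(?P<rest>.*)$")

def parse(line):
    m = pattern.match(line)
    fields = dict(kv.split("=", 1) for kv in m.group("rest").split())
    fields["ts"] = m.group("ts")
    return fields

print(parse(LINE))
# {'host': 'web01', 'metric': 'cpu_util', 'value': '87.5', 'ts': '2016-03-01T12:00:05Z'}
```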
Benefits of the Web/API-based access approach:
The primary benefit is that the initial integration work required to use an API to access the information being queried is very easy, at least in those cases where the data sources are supported out of the box by the relevant vendor (a minimal example follows this list).
Many web-based IT toolsets offer web service APIs for integration, increasing the potential for breadth of coverage.
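A minimal sketch of why the initial hookup is easy, assuming a hypothetical tool that exposes a REST endpoint returning JSON; the URL, parameters, and field names are all invented for illustration.

```python
import requests  # third-party, widely available: pip install requests

# Hypothetical endpoint; a real tool would document its own URL and parameters.
BASE = "https://monitoring.example.com/api/v1"

resp = requests.get(f"{BASE}/metrics",
                    params={"host": "web01", "metric": "cpu_util"})
resp.raise_for_status()

# A few lines of code and the data is flowing -- hence "easy initial integration".
for point in resp.json().get("datapoints", []):
    print(point["timestamp"], point["value"])
```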
Challenges and downsides of the Web/API-based access approach:
Performance: The most significant downside, especially for use cases that analyze large quantities of historical data spanning multiple data sources, is that such APIs simply do not scale to support rapid and complex analytical queries. They are architecturally limited in the amount of data any given query can return, because most toolsets offering web service APIs are built on top of write-optimized data marts fed by high-volume real-time and near-real-time toolsets. Allowing more than a subset of data to be returned by any given API call would slow down or stall the concurrent real-time population of the underlying data mart, losing data. As a result, the vast majority of monitoring toolsets that offer API access to their data stores restrict the amount of data that can be queried at any one time (a sketch of the resulting pagination pattern follows this paragraph). This restriction in turn makes it either impossible, or very difficult and time-consuming, to retrieve the full set of historical information needed to conduct comprehensive analysis across the scope of metrics and sources being targeted.
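The pagination pattern behind that limitation looks roughly like the sketch below: each call returns at most a capped page of rows, so pulling a long historical range means many HTTP round trips. The endpoint, parameters, and page cap are hypothetical.

```python
import requests

BASE = "https://monitoring.example.com/api/v1"
PAGE_SIZE = 1000  # an assumed per-request cap imposed by the API

def fetch_history(host, metric, start, end):
    """Page through a capped API: one HTTP round trip per PAGE_SIZE rows."""
    page, rows = 1, []
    while True:
        resp = requests.get(f"{BASE}/metrics", params={
            "host": host, "metric": metric,
            "start": start, "end": end,
            "page": page, "per_page": PAGE_SIZE,
        })
        resp.raise_for_status()
        batch = resp.json().get("datapoints", [])
        rows.extend(batch)
        if len(batch) < PAGE_SIZE:  # a short page signals the end
            return rows
        page += 1

# A year of one-minute samples is ~525,600 rows: ~526 calls for ONE metric on
# ONE host. Multiply by hosts, metrics, and sources, and round-trip latency
# dominates -- the scaling problem described above.
```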
Difficult and time-consuming to customize: These APIs are typically difficult to customize when the out-of-the-box integration does not meet requirements. Customization demands coding skills, a far scarcer resource than SQL expertise, which is comparatively plentiful. In some simple cases a vendor may provide graphical code generators for query building, but where they exist these graphical approaches tend to be self-limiting because of their simplicity. A further downside is the ongoing operational maintenance of the integrations themselves, in terms of both time and effort/cost. The reality is that most data sources change over time: the metrics, the structures by which the metrics are queried, and the APIs themselves, which the vendors supplying these tools periodically update and change. So while the initial integration using these APIs may be simple and quick, it becomes very difficult to maintain currency across large numbers of different sources and across the changes that inevitably occur to those sources, because every single change must be re-implemented manually through a GUI, and any coding beyond the out-of-the-box integration must likewise be updated by hand (the sketch below illustrates the kind of per-source mapping code that has to track such changes).
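To see the maintenance burden concretely, consider the per-source, per-version field mappings that hand-coded integrations accumulate: every vendor rename forces a manual edit. All source names, versions, and fields here are invented.

```python
# One hand-maintained mapping per source and API version; each entry must be
# revisited whenever a vendor changes its schema.
FIELD_MAPS = {
    ("toolA", "v1"): {"ts": "timestamp", "val": "value"},
    ("toolA", "v2"): {"ts": "time", "val": "reading"},  # vendor renamed both fields
    ("toolB", "v1"): {"ts": "sampled_at", "val": "cpu_pct"},
}

def normalize(source, version, record):
    """Translate one raw API record into a common shape."""
    m = FIELD_MAPS[(source, version)]
    return {"ts": record[m["ts"]], "value": record[m["val"]]}

print(normalize("toolA", "v2", {"time": "2016-03-01T12:00Z", "reading": 87.5}))
# {'ts': '2016-03-01T12:00Z', 'value': 87.5}
```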
Pros and Cons of Data Architectural Choices
There are advantages and disadvantages to both types of data storage architecture, as well as to both types of data access. For comprehensive IT operational analytics, and especially for use cases that span not only a wide variety of technical data sources but also business data and other non-time-series sources, it is clear that both types of data storage and data query approach may need to be utilized. Prospective customers should consider this when evaluating analytics solutions for their environment, and favor those approaches and architectures that can work flexibly across all types in support of their important use cases.
Most importantly, customers should consider very carefully the ramifications of choosing an approach that requires “lock-in” duplication of their data into a proprietary centralized data store. As is becoming increasingly obvious across the industry, open approaches are typically best: they preserve customer investments in existing solutions and protect against vendor lock-in, with its attendant high costs and reduced flexibility and adaptability.
TeamQuest Director of Market Development Dave Wagner has more than three decades of systems hardware and software experience including product management, marketing, sales, project management, customer support, and training. He is responsible for the successful business integration of TeamQuest Surveyor. He has authored many articles on the topic of capacity management and has presented at CMG, AFCOM, Pink, and other industry events.
For more information on Dave and his company, please visit www.teamquest.com or visit him on Twitter and YouTube.