There’s a phrase being thrown around the networking industry, a mantra if you will: Latency is the new outage. Experience used to be measured in uptime and downtime. Is Google down right now? Is AWS experiencing an outage in Asia? Can anybody log on to Salesforce? Those days are over. Being up and running is not enough. Applications need to be fast, and every retailer, from Target to the mom-and-pop store down the street, is expected to have an omnichannel digital presence with experiences that rival Amazon, a nearly half-a-trillion-dollar global brand.
To meet these expectations, developers need observability into how their code behaves, from the business level down to a single person’s experience. They need to know about any performance bottlenecks and how each user is affected. Developers then need to dive into their code quickly to triage these issues. Every second matters. Every transaction matters.
Legacy APM Tools are Ill-Equipped for Modern Applications
The problem is that legacy application performance management (APM) tools were built for a world that doesn’t exist anymore. APM solutions give IT operations teams visibility into monolithic, static, off-the-shelf applications. Granted, the ability to view code in an application you didn’t develop is a superpower. When something breaks, IT can use APM to detect a networking error or a known bottleneck. They run some tests, work with engineering to troubleshoot the issue, and the application comes back online, maybe a few minutes later, maybe an hour later, sometimes days later.
That world is gone. Applications are not shrink-wrapped. They aren’t off the shelf. The modern app is homegrown and built on open architectures. It’s dynamic, it’s distributed, and it lives across multiple clusters and clouds. It’s likely made up of dozens or even hundreds of microservices, and it can be spun up and scaled quickly to meet evolving user and market demands.
Increasingly, it is developers, not IT operations, who are responsible for running their code in production. Part of the reason is that the architectural flexibility of a multi-cloud world keeps legacy APM solutions from seeing the full application experience. Services are spread across various cloud providers and on-premises infrastructure, so cloud architects often cannot tell whether modern applications are meeting rising user expectations. Sure, cloud vendors provide their own operations tools, but you can’t use AWS tools to manage or secure workloads in Azure. Rather than rely on IT to piece together visibility into distributed workloads, developers are building end-to-end observability into their applications across microservices architectures.
Writing Observability Directly into Code
Observability is now part of the software development lifecycle. Developers understand that they are responsible for the performance of their code. They, not the IT administrator, are on the other end of the pager that goes off at midnight when there’s an experience issue, and they need to know exactly what is going on with their application. They’ve moved beyond the legacy APM world, which relied on an outside probe to identify bottlenecks in off-the-shelf software. Instead, developers now build observability directly into their code, which lets them trace workloads in real time as they flow through an application across disparate microservices.
Developers can do this through logging, metrics and distributed tracing. Logging lets developers write event data to a stream that can be parsed and analyzed as needed, but storing and searching large volumes of logs can be costly and slow. Fortunately, most modern frameworks and languages, such as Spring, .NET, Go and Ruby, make it easy to emit metrics, essentially time-stamped numeric values, and express them succinctly as a time series for statistical analysis, anomaly detection and other troubleshooting methods. Distributed tracing lets developers follow a transaction as it flows through an application, often end to end across a microservices architecture.
These monitoring capabilities can be baked directly into the code, which essentially allows software to annotate itself. And because they are built into the code, they follow the software through the entire development cycle until it reaches production, all without having to reconfigure anything between stages.
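As a rough illustration of the metrics idea, the sketch below keeps a metric as a plain list of (timestamp, value) samples and flags outliers with a simple z-score test, one of the most basic forms of anomaly detection. The `MetricSeries` class, the `checkout_latency_ms` name and the three-standard-deviation threshold are assumptions made for this example. Real metrics libraries and backends each have their own APIs and far more robust detection methods.

```python
from __future__ import annotations

import statistics
import time
from dataclasses import dataclass, field


@dataclass
class MetricSeries:
    """Hypothetical in-memory time series: one metric name, many (timestamp, value) samples."""

    name: str
    samples: list[tuple[float, float]] = field(default_factory=list)

    def record(self, value: float, timestamp: float | None = None) -> None:
        # A metric point is just a timestamp plus a numeric value.
        ts = time.time() if timestamp is None else timestamp
        self.samples.append((ts, value))

    def anomalies(self, z_threshold: float = 3.0) -> list[tuple[float, float]]:
        """Return samples more than z_threshold sample standard deviations from the mean."""
        values = [value for _, value in self.samples]
        if len(values) < 2:
            return []
        mean = statistics.mean(values)
        stdev = statistics.stdev(values)
        if stdev == 0:
            return []  # a perfectly flat series has no outliers by this measure
        return [(ts, v) for ts, v in self.samples if abs(v - mean) / stdev > z_threshold]


# Usage: twenty ordinary checkout latencies, then one spike.
latency = MetricSeries("checkout_latency_ms")
for second in range(20):
    latency.record(120.0 + second % 3, timestamp=float(second))
latency.record(900.0, timestamp=20.0)

print(latency.anomalies())  # [(20.0, 900.0)]
```

A single large spike also inflates the standard deviation it is measured against. This is one reason production systems tend to prefer rolling windows or robust statistics such as the median absolute deviation.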
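One way to picture code that “annotates itself” is a decorator that wraps each function call in a timing span and links it to its caller through a shared trace ID. The sketch below is a minimal, single-process illustration that uses only the Python standard library. The `traced` decorator, the `Span` fields and the in-memory `FINISHED_SPANS` list are hypothetical. In practice, teams typically use an instrumentation library such as OpenTelemetry instead of writing this themselves.

```python
from __future__ import annotations

import contextvars
import functools
import time
import uuid
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical span record: one timed unit of work within a trace."""

    trace_id: str
    span_id: str
    parent_id: str | None
    name: str
    start: float
    end: float = 0.0

    @property
    def duration_ms(self) -> float:
        return (self.end - self.start) * 1000


# Finished spans are kept in memory here; a real system would export them to a tracing backend.
FINISHED_SPANS: list[Span] = []

# The span currently executing in this context, or None at the start of a request.
_current_span: contextvars.ContextVar[Span | None] = contextvars.ContextVar("current_span", default=None)


def traced(func):
    """Wrap func so that every call records a Span linked to its caller's span."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        parent = _current_span.get()
        span = Span(
            # Reuse the caller's trace ID, or start a new trace at the top level.
            trace_id=parent.trace_id if parent is not None else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex[:16],
            parent_id=parent.span_id if parent is not None else None,
            name=func.__qualname__,
            start=time.perf_counter(),
        )
        token = _current_span.set(span)
        try:
            return func(*args, **kwargs)
        finally:
            span.end = time.perf_counter()
            _current_span.reset(token)
            FINISHED_SPANS.append(span)

    return wrapper


# Usage: two instrumented functions, one calling the other.
@traced
def charge_card(amount: int) -> str:
    time.sleep(0.01)  # stand-in for a call to a payment service
    return "charged"


@traced
def checkout(cart: list[int]) -> str:
    return charge_card(sum(cart))


checkout([10, 20])
for span in FINISHED_SPANS:
    print(span.name, span.parent_id is None, round(span.duration_ms, 1))
```

The child span inherits the parent’s `trace_id`, so every call made while handling one checkout can be stitched back together later. To cross service boundaries, the trace context would have to travel with the request, for example in the W3C Trace Context `traceparent` HTTP header. This sketch stays within one process.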
One Tool to Rule Them All
Developers need a unified observability platform to manage and stitch together these intrinsic monitoring capabilities. Many developers use a hodgepodge of open source tools to do this, but gaining observability across a true microservices architecture can require up to a dozen different solutions, each of which has to be installed, scaled and patched independently. This is fine when everything is performing as expected, but when things go wrong and everyone jumps into a war room to figure things out, the complexity of stitching together a cornucopia of disparate open source tools becomes too much.
Don’t get me wrong: open source is a great way to transition from legacy APM solutions to built-in observability in a microservices environment. However, as organizations grow and continue down their cloud transformation journeys, developers need an enterprise-grade observability platform that lives in the cloud and isn’t tied to on-premises availability. When the data center goes down or the internet uplink fails, it’s critical that monitoring and observability remain available, so you know what is going on and how to fix the issue.
Retailers, hospital networks and car-share platforms (to name a few) shouldn’t have to build, host and manage their own observability platform, just as they shouldn’t have to build, host and manage their own video conferencing or collaboration platform; they rely on Zoom and Slack for that. They don’t have the time or resources to fine-tune a homegrown platform, run it efficiently and ensure it can handle a large-scale increase in traffic. Instead, organizations should offload that pain to a third-party vendor whose core competency is delivering the best possible observability platform, with all the performance, reliability and features organizations need.
What Should This Unified Observability Platform Look Like?
Centralized managed solution: Given the choice between hiring dozens or hundreds of engineers to build, maintain and update their own centralized observability platform or achieving the same result through a Software-as-a-Service (SaaS) vendor, most organizations would choose the latter. It just makes sense from a capability, cost and operations standpoint.
Observability across data types: At the end of the day, you are collecting data about everything happening across your infrastructure, but you need to pull out the exact piece that applies to the problem you’re trying to solve. That requires metrics for understanding long-term trends, logging for examining unstructured data, and a distributed tracing system that captures the timespans of each request. Developers should be able to render application topologies, do cost analysis and correlate from top-level business metrics down to a single request, so they can understand a single user’s experience and why it was affected.
Data integration and context: This unified observability platform needs to integrate all these data sources and provide context across them. Developers should be able to analyze logs against distributed tracing timespans, then combine those insights with metrics for long-term statistical analysis, giving them both a holistic and a drill-down view of application experience. Organizations need to break down these traditionally siloed data sets to give developers real-time, data-driven insight into application experience.
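To make the idea of analyzing logs against tracing timespans concrete, here is a small sketch that drills down from the slowest span for an operation to the log lines emitted during that span. It makes two assumptions. First, every log record carries the trace ID of the request that produced it. Second, spans and logs share a millisecond clock. The `Span`, `LogRecord` and `slowest_span_with_logs` names and the sample data are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical span: one timed operation within a trace."""

    trace_id: str
    name: str
    start_ms: float
    end_ms: float


@dataclass
class LogRecord:
    """Hypothetical log line tagged with the trace that emitted it."""

    trace_id: str
    timestamp_ms: float
    message: str


def logs_for_span(span: Span, logs: list[LogRecord]) -> list[LogRecord]:
    """Logs from the same trace whose timestamps fall inside the span's time window."""
    return [
        log
        for log in logs
        if log.trace_id == span.trace_id and span.start_ms <= log.timestamp_ms <= span.end_ms
    ]


def slowest_span_with_logs(
    spans: list[Span], logs: list[LogRecord], name: str
) -> tuple[Span, list[LogRecord]]:
    """Find the slowest span for one operation and attach the logs emitted during it.

    Raises ValueError (from max) if no span has the given name.
    """
    candidates = [s for s in spans if s.name == name]
    slowest = max(candidates, key=lambda s: s.end_ms - s.start_ms)
    return slowest, logs_for_span(slowest, logs)


# Usage with made-up sample data: three charge_card spans from different requests, one slow.
spans = [
    Span("t1", "charge_card", 0, 80),
    Span("t2", "charge_card", 10, 910),
    Span("t3", "charge_card", 5, 95),
]
logs = [
    LogRecord("t1", 40, "card charged"),
    LogRecord("t2", 50, "payment gateway timeout, retrying"),
    LogRecord("t2", 950, "response sent"),  # same trace, but after the span ended
]

span, related = slowest_span_with_logs(spans, logs, "charge_card")
print(span.trace_id, [log.message for log in related])
# t2 ['payment gateway timeout, retrying']
```

The time-window filter ties a log line to a specific span, not only to the request as a whole. For that reason, the `t2` log written after the span ended is excluded.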
Application architecture is undergoing a massive shift from monoliths to microservices, and monitoring needs to catch up. Code is increasingly instrumented, giving developers visibility into whether applications are performing as intended. A unified observability platform delivered through the cloud can orchestrate monitoring across metrics, logging and distributed tracing, helping developers understand what’s going on in their applications so that they can deliver fast, reliable and consistent experiences to users.