Observe: Splunk O11y Cloud as a Native OpenTelemetry Backend

Traditionally, when monitoring applications and infrastructure, telemetry data such as logs, metrics, and traces was processed and prepared for problem analysis by separate, highly specialized backends.
Each of these backends often required its own proprietary agent to monitor an application or infrastructure component. From an application developer's point of view, it was not possible to emit custom technical metrics dynamically from the application code (and thus enable even more targeted analyses), because vendor lock-in directly in the source code is not a viable option.
The following overview outlines traditional monitoring of applications and infrastructure:

Another major disadvantage of this architecture is that the different telemetry signals can be correlated only with great effort, if at all, which makes it hard to obtain a comprehensive picture of the state of your own application and infrastructure.
The business challenge:
Organizations monitor parts of their infrastructure separately, which leads to isolated data silos. Due to the lack of correlation, the root cause is not found, or not found quickly enough.
OpenTelemetry
OpenTelemetry is designed to solve these problems. It has its origins in the OpenTracing and OpenCensus projects and is now one of the most actively developed projects in the CNCF ecosystem.
OpenTelemetry is a collection of APIs, SDKs, and tools. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior.
OpenTelemetry itself builds on open standards, such as the W3C Trace Context or the W3C Baggage specification.
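To illustrate the kind of standard OpenTelemetry builds on: the W3C Trace Context specification defines a `traceparent` HTTP header of the form `version-traceid-parentid-flags`, which services pass along so that spans from different services end up in the same trace. A minimal sketch of parsing such a header in Python (the header value below is an arbitrary example, not real data):

```python
# Minimal sketch: parsing a W3C Trace Context "traceparent" header.
# Format per the W3C spec: version-traceid-parentid-traceflags,
# e.g. "00-<32 hex chars>-<16 hex chars>-<2 hex chars>".
import re

TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-"
    r"(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-"
    r"(?P<flags>[0-9a-f]{2})$"
)

def parse_traceparent(header: str) -> dict:
    """Return the four traceparent fields, or raise ValueError."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"invalid traceparent header: {header!r}")
    fields = match.groupdict()
    # An all-zero trace-id or parent-id is invalid per the spec.
    if fields["trace_id"] == "0" * 32 or fields["parent_id"] == "0" * 16:
        raise ValueError("trace-id and parent-id must not be all zeros")
    # The lowest flag bit indicates whether the trace is sampled.
    fields["sampled"] = bool(int(fields["flags"], 16) & 0x01)
    return fields

# Example value (IDs are arbitrary, for illustration only):
header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
fields = parse_traceparent(header)
print(fields["trace_id"], fields["sampled"])
```

In practice, the OpenTelemetry SDKs read and propagate this header for you; the sketch only shows what travels over the wire.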
An optional, but in practice essential component is the OpenTelemetry Collector. It enables vendor-independent reception, processing, and forwarding of telemetry data. We therefore clearly recommend using this component in your own observability architecture.
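As a sketch of what such a Collector deployment might look like, the following minimal configuration (endpoints are placeholders, not a verified production setup) receives OTLP data, batches it, and forwards it via OTLP. The exporter could later be swapped for a vendor-specific one without touching any instrumented application:

```yaml
# Minimal, illustrative OpenTelemetry Collector configuration.
receivers:
  otlp:                    # receive metrics, logs, and traces via OTLP
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}                # batch telemetry before exporting

exporters:
  otlp:
    endpoint: backend.example.com:4317   # placeholder backend endpoint

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

This decoupling of instrumentation from the backend is exactly what makes the Collector the vendor-independence layer of the architecture.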
The following diagram outlines a possible architecture:

Splunk Observability Cloud
Say goodbye to blind spots, guesswork and swivel-chair monitoring with all of your metrics, logs and traces automatically correlated in one place.
Most people will be familiar with Splunk through its most powerful log processing and analysis tool, Splunk Core. But Splunk now offers a much larger product portfolio. The individual products are linked to each other via the Splunk Platform, giving customers maximum flexibility in terms of functionality as well as licensing options.
One of the products of the Splunk Platform is the Splunk Observability Cloud (O11y Cloud), which is a native OpenTelemetry backend. Thanks to native OpenTelemetry support, proprietary agents are a thing of the past. Splunk also offers its own distributions of the OpenTelemetry Agents/Profiler and the OpenTelemetry Collector, so that enterprise support for these components is included in the license.
Furthermore, the O11y Cloud shines with a low entry barrier and a gentle learning curve. This is also a significant distinction from Splunk Core: Splunk Core offers an extremely powerful query language, but requires more effort to learn. The O11y Cloud ships with many pre-built dashboards and a user experience geared towards DevOps teams as standard. The two tools thus complement each other perfectly.
Many areas of the Splunk Observability Cloud can also be automatically set up via Terraform with the Splunk Observability Cloud Terraform Provider. This allows observability to be optimally integrated into CI/CD and enables true “shift left”.
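To sketch what this looks like in practice (the metric name and attribute values below are illustrative assumptions, not a verified configuration), a detector could be declared with the signalfx provider that backs the Splunk Observability Cloud Terraform Provider:

```hcl
terraform {
  required_providers {
    signalfx = {
      source = "splunk-terraform/signalfx"
    }
  }
}

# Illustrative detector: alert when CPU stays above 90% for 5 minutes.
resource "signalfx_detector" "high_cpu" {
  name = "High CPU utilization"

  program_text = <<-EOT
    signal = data('cpu.utilization').mean(by=['host']).publish(label='CPU')
    detect(when(signal > 90, lasting='5m')).publish('CPU high')
  EOT

  rule {
    detect_label = "CPU high"
    severity     = "Warning"
  }
}
```

Because detectors, dashboards, and the like live in version control this way, observability changes go through the same review and pipeline as the application code itself.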
The following example provides a final overview of how telemetry data is linked in the O11y Cloud and how a root cause analysis is carried out with just a few user interface interactions.
From Metric to Root Cause
The problem analysis usually starts when an alert is triggered because a metric has fallen below or exceeded a threshold. Determining threshold values, as well as correctly classifying whether a reading is actually an exceptional situation, is of course AI-supported in the O11y Cloud.
The classic image of the iceberg is very fitting for root cause analysis: the triggered alert is only the visible tip, while the actual cause lies beneath the surface.

If you go directly to the Application Performance Monitoring (APM) section of the O11y Cloud, you already get overviews of various metrics over time. In the example shown, high latency is detected in the printing-service. Clicking in the chart immediately opens a pop-up from which you can navigate directly to the associated traces:

The pop-up already shows that the O11y Cloud has detected a problem. In the detailed view of the trace, the cause becomes immediately apparent (the author-service returns a server error):

The lower area of the screenshot is noteworthy: via the buttons “Infrastructure” and “Logs” you can jump directly to Infrastructure Monitoring or to the Log Observer. This feature is called Related Content and, thanks to the correlation of the different telemetry signals, contributes significantly to efficient problem analysis.
If you follow the link to Infrastructure Monitoring, you immediately get an overview of the nodes on which the printing-service is running. From this overview, you can now “zoom” down to the container level, so that targeted, detailed information can be called up for each infrastructure layer.

If you follow the link to the Log Observer, a search for the corresponding trace ID is started automatically. This displays all log entries for that trace ID across all participating services.
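The idea behind this correlation can be sketched in a few lines: if every service writes structured logs that carry the trace ID, filtering all services' logs by a single trace ID yields the complete picture of one request. The log lines and field names below are made up for illustration; the Log Observer does this over its indexed data rather than in-memory lists:

```python
import json

# Made-up structured log lines from three different services.
raw_logs = [
    '{"service": "frontend", "trace_id": "abc123", "message": "request received"}',
    '{"service": "printing-service", "trace_id": "abc123", "message": "calling author-service"}',
    '{"service": "author-service", "trace_id": "abc123", "message": "HTTP 500: internal error"}',
    '{"service": "frontend", "trace_id": "def456", "message": "unrelated request"}',
]

def logs_for_trace(lines: list[str], trace_id: str) -> list[dict]:
    """Return all log entries that belong to the given trace."""
    entries = (json.loads(line) for line in lines)
    return [e for e in entries if e.get("trace_id") == trace_id]

# All three services' entries for one request, in order:
for entry in logs_for_trace(raw_logs, "abc123"):
    print(entry["service"], "-", entry["message"])
```

The unrelated `def456` entry is filtered out, so the failing author-service call stands out immediately.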

The business challenge:
Organizations monitor parts of their infrastructure separately, which leads to isolated data silos. Due to the lack of correlation, the root cause is not found, or not found quickly enough.
Our solution:
Observability integrates monitoring data from various sources; problems are quickly isolated and the root cause is efficiently identified.
To stay with the image of the iceberg: the example sketched here touches on only a small part of what the O11y Cloud can do. We would be very happy to present further functions of the O11y Cloud in detail in a demo session 🚀
