This article introduces the concept of software observability using OpenTelemetry. It describes OpenTelemetry and its components in detail, along with the types of signals relevant to telemetry: logs, metrics, and traces. Reading further, you will learn how to generate these signals and how to use the OpenTelemetry Collector.
Telemetry refers to the data collected from logs, metrics, and traces that provide empirical value as measurements and give insight into a system.
Telemetry data can be enabled for a variety of systems, some of which may include:
Telemetry data, when properly analyzed, is useful for:
Telemetry data should be analyzed and properly visualized to uncover valuable insights about a system. While there are a variety of tools and vendors to analyze the data, OpenTelemetry provides a unified and standard framework for capturing these telemetry data in a vendor-agnostic way, which can then be used with any tool to gain insights into software performance.
This article will take a deep dive into OpenTelemetry and its uses. Keep in mind that it is a suite of tools built for techniques already used by cloud-native software developers, such as centralized logging, metrics and dashboards, traces, and data analysis.
OpenTelemetry is an open-source project that merges OpenTracing and OpenCensus. OpenTracing provided a vendor-neutral API for distributed tracing, while OpenCensus was a set of libraries responsible for collecting metrics and distributed traces in applications; both are now merged into OpenTelemetry. The Cloud Native Computing Foundation (CNCF) hosts the OpenTelemetry project with the goal of standardizing the collection and transmission of telemetry data to a backend service.
OpenTelemetry as a project comprises several specifications, APIs, SDKs for programming languages, libraries, and integration support. Being an open-source project, OpenTelemetry is built and maintained by more than 300 companies, with over 100,000 contributions.
OpenTelemetry comes with various functionalities that make it a strong successor to both OpenTracing and OpenCensus, as well as the most widely adopted observability framework among developers. These functionalities include:
OpenTelemetry provides enough flexibility to select your preferred backend tool for observability by simply installing agents, without having to rewrite your code.
OpenTelemetry is made up of a broad ecosystem of tools, each with its own configuration method and relevance to a project. An individual project comprises signals, pipelines, resources, and context propagation.
In OpenTelemetry, signals play an important role in projects and are initially defined as traces, metrics, logs, and baggage. A signal in OpenTelemetry includes the following properties:
We will explore signals in further detail below.
Pipelines are mechanisms to generate, process, and export telemetry data that signals capture to data stores for storage and analysis.
To ensure no meaningful data is lost during transmission, pipeline components are created early in the application code. The components are:
These components must be in the following order:
Receiver(s) → Processor(s) → Exporter(s)
We will learn more about these components in the next section.
Resources are attributes applied to different signals to identify the source of the telemetry data. They provide information that is helpful for analyzing telemetry data and for correlating different events that occur on the same resource.
Resource attributes remain unchanged throughout the application's lifetime as they identify whether the data source is a machine, container, or function. Resource attributes include the following:
Context propagation is a core component of OpenTelemetry that enables the transmission of valuable contextual data between two services separated by a logical boundary. It enables distributed tracing to combine requests made across multiple systems.
Context objects are key-value stores passed across API boundaries. As a result, implementing contexts is a language-dependent operation. Programming languages like Python and Go have their own built-in context mechanisms, such as the ContextVar module and the context package. OpenTelemetry API specifications recommend that the context API implementations leverage these existing mechanisms.
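As a minimal sketch of what this looks like in Go, using the OpenTelemetry propagation package, trace context can be injected into outgoing HTTP headers and extracted on the receiving side (the /inventory endpoint and localhost address below are purely illustrative):

package main

import (
	"context"
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
)

func main() {
	// Register the W3C Trace Context propagator as the global propagator.
	otel.SetTextMapPropagator(propagation.TraceContext{})

	// Client side: inject the current context into the outgoing request headers.
	req, _ := http.NewRequestWithContext(context.Background(), http.MethodGet, "http://localhost:8080/inventory", nil)
	otel.GetTextMapPropagator().Inject(req.Context(), propagation.HeaderCarrier(req.Header))

	// Server side: extract the propagated context from the incoming request headers.
	http.HandleFunc("/inventory", func(w http.ResponseWriter, r *http.Request) {
		ctx := otel.GetTextMapPropagator().Extract(r.Context(), propagation.HeaderCarrier(r.Header))
		_ = ctx // spans started from ctx join the caller's trace
	})
}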
A complete OpenTelemetry client architecture requires the installation of a collection of software libraries. These libraries include:
In the context of OpenTelemetry, signals are independent observability tools built on top of data-sharing mechanisms (i.e., context propagation). As of now, there are four supported signals: traces, metrics, logs, and baggage. This section will expand on each of these pillars.
A log is a record of events in a request cycle that is written to an output. Log outputs are typically written to files on disk or to a remote service to improve searchability and aggregation.
The anatomy of a log contains the following:
Here’s an example for a payload:
169.10.0.19 - - [11/Nov/2022 11:42:35] "GET /inventory HTTP/1.1" 200 -
To provide more context, logs are typically combined with other signal types through correlation.
The tracing signal of OpenTelemetry is responsible for distributed tracing in the system. A trace is a series of event data that is generated at different points of a system and then bundled together through a unique identifier.
The identifier is transported to all other components involved in the request’s cycle, allowing their operations to access the event data from the data source. The complete anatomy of a trace in a span context includes these elements:
traceID: a unique identifier of the associated trace
spanID: a unique identifier of the current method in the trace

For context, a span can be described as a method call or a subset of a block of code called within a method.
For example, in a shopping application with the below workflow, each independent workflow can be thought of as a trace.
With OpenTelemetry distributed tracing, they can all be linked together by propagating trace context (traceparent headers) so that you get the complete picture of the workflow.
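For illustration, a propagated traceparent header follows the W3C Trace Context format of version, trace ID, parent span ID, and trace flags; the value below is the sample from the specification:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01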
Metrics provide developers and operators with information about the state of a running application or system. Over time, metric data is collected and aggregated to identify patterns and trends in the system, then converted into a graph with different tools and visualizations.
There is a wide range of metric types as metrics can capture data from both low-level systems (CPU cycles) and high-level details (the number of items sold in one day).
For recorded data to be considered a metric, it has to have the following properties as part of its anatomy:
Metrics can be combined with traces to yield more depth and context about the events happening in a system. In OpenTelemetry, this combination is made possible with exemplars, included in an exemplar field during their definition. This field contains:
Exemplars in OpenTelemetry let a metric measurement carry a reference to the specific trace and span that were active when it was recorded. This can be useful for troubleshooting and identifying patterns in the application's behavior.
Here is a simplified example with the OpenTelemetry Go SDK, which starts a span, attaches its trace and span IDs to a log message, and leaves exemplar sampling of metric measurements to the metric SDK:
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

func main() {
	// Get a tracer from the global tracer provider
	tracer := otel.Tracer("exemplar-demo")

	// Start a new span
	ctx, span := tracer.Start(context.Background(), "my_span")
	defer span.End()

	// Set some attributes on the span
	span.SetAttributes(
		attribute.String("key", "value"),
	)

	// Read the trace_id and span_id from the span's context so they can
	// be attached to a log message and correlated with the trace later
	sc := span.SpanContext()
	log.Printf("My log message trace_id=%s span_id=%s",
		sc.TraceID().String(), sc.SpanID().String())

	// Any metric measurement recorded with this ctx while the span is
	// active (for example, counter.Add(ctx, 1)) can carry an exemplar,
	// because the metric SDK samples the active trace and span IDs.
	_ = ctx
}
In this example, we created a new span and set some attributes on it. We then read the trace_id and span_id from the span's context and attached them to a log message. Any metric measurement recorded with the same context while the span is active can carry an exemplar, since the metric SDK samples the active trace and span IDs when exemplars are enabled.
This allows us to link the log message, metric, or other data back to the specific trace and span that it is associated with.
Instrumentation libraries for third-party tooling require minimal effort from users with OpenTelemetry. There are several instrumentation libraries for OpenTelemetry available in popular programming languages. Most offer support for both automatic and manual instrumentation, and for exporting data in these languages.
OpenTelemetry instrumentation must be applied across the application's infrastructure, including HTTP clients and servers, application framework libraries, and other necessary components such as databases and queueing systems, to get the complete picture of the system.
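As a brief sketch of what this can look like for HTTP servers and clients in Go, the otelhttp contrib package wraps handlers and transports so that requests are traced automatically; the /inventory route, handler name, and port below are illustrative:

package main

import (
	"log"
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

func main() {
	// Wrap a plain handler so every incoming request gets a server span.
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	http.Handle("/inventory", otelhttp.NewHandler(handler, "inventory"))

	// Wrap the default transport so outgoing requests are traced and
	// carry the propagated trace context.
	client := http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}
	_ = client

	log.Fatal(http.ListenAndServe(":8080", nil))
}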
This section will describe how to configure and make use of instrumentation libraries. Instrumenting an application with any of the instrumentation libraries follows the same general process: install the packages, configure the SDK (including a sampler such as the AlwaysOnSampler), instrument the code, and export the telemetry data.
The libraries supported by OpenTelemetry are all listed in the OpenTelemetry registry. However, the next section will refer to the Go instrumentation library.
The Go support library for OpenTelemetry currently covers the following statuses:
We will be using the Prometheus and Go clients for OpenTelemetry for metrics instrumentation. Import the following libraries:
import (
"context"
"fmt"
"log"
"math/rand"
"net/http"
"os"
"os/signal"
"time"
"github.com/prometheus/client_golang/prometheus/promhttp"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/exporters/prometheus"
"go.opentelemetry.io/otel/metric/instrument"
"go.opentelemetry.io/otel/sdk/metric"
)
For this, we will create a demo application that receives web requests, processes them, and responds. It will track the request rate, error rate, and processing latency distribution. Let’s create an exporter to handle reading and collection:
func init() {
	rand.Seed(time.Now().UnixNano())
}

// The rest of the code in this section goes inside func main.
func main() {
ctx := context.Background()

exporter, err := prometheus.New()
if err != nil {
	log.Fatal(err)
}
The metrics instrumentation can then be added by registering the exporter as a reader on a meter provider. The meter provider also holds information about the service, which is used when querying the metrics.
provider := metric.NewMeterProvider(metric.WithReader(exporter))
meter := provider.Meter("github.com/…/prometheus")
We start the Prometheus server in a goroutine and define the attributes that will be attached to the measurements:
go serveMetrics() // prometheus server function created below
attrs := []attribute.KeyValue{
attribute.Key("A").String("B"),
attribute.Key("C").String("D"),
}
The counter and gauge logic:
counter, err := meter.SyncFloat64().Counter("foo", instrument.WithDescription("a counter for demonstration"))
if err != nil {
	log.Fatal(err)
}
counter.Add(ctx, 5, attrs...)

gauge, err := meter.AsyncFloat64().Gauge("bar", instrument.WithDescription("a gauge for demonstration"))
if err != nil {
	log.Fatal(err)
}
err = meter.RegisterCallback([]instrument.Asynchronous{gauge}, func(ctx context.Context) {
	n := -10. + rand.Float64()*(90.) // [-10, 100)
	gauge.Observe(ctx, n, attrs...)
})
if err != nil {
	log.Fatal(err)
}
Afterwards, we define a histogram, similar to Prometheus histograms, for tracking a latency-style distribution, and record a few sample observations before blocking until the program is interrupted.
histogram, err := meter.SyncFloat64().Histogram("baz", instrument.WithDescription("a histogram for visitor tracking"))
if err != nil {
	log.Fatal(err)
}
histogram.Record(ctx, 23, attrs...)
histogram.Record(ctx, 7, attrs...)
histogram.Record(ctx, 101, attrs...)
histogram.Record(ctx, 105, attrs...)

ctx, _ = signal.NotifyContext(ctx, os.Interrupt)
<-ctx.Done()
}
Finally, here is the logic for the serveMetrics function, which contains the router and endpoint:
func serveMetrics() {
log.Printf("serving metrics at localhost:2223/metrics")
http.Handle("/metrics", promhttp.Handler())
err := http.ListenAndServe(":2223", nil)
if err != nil {
fmt.Printf("error serving http: %v", err)
return
}
}
Setting up distributed tracing to generate data involves configuring a number of components. The components in the tracing pipeline, which play an important role in instrumenting the code, include:
TracerProvider: determines how spans are to be generated
SpanProcessor: describes how spans are to be exported
SpanExporter: describes where the spans are to be exported

The TracerProvider interface exposes a method that allows us to obtain a tracer.
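As a brief sketch (assuming a global tracer provider has already been registered), obtaining a tracer and starting a span looks like this:

package main

import (
	"context"

	"go.opentelemetry.io/otel"
)

func main() {
	// Obtain a tracer from the globally registered TracerProvider.
	tracer := otel.GetTracerProvider().Tracer("component-main")

	// Use the tracer to start a span and end it when the work is done.
	_, span := tracer.Start(context.Background(), "operation")
	defer span.End()
}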
For tracing, Jaeger can be used. Let’s look at an example application where tracing is done with Jaeger. Start by importing the following libraries:
import (
"context"
"log"
"time"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/exporters/jaeger"
"go.opentelemetry.io/otel/sdk/resource"
tracesdk "go.opentelemetry.io/otel/sdk/trace"
semconv "go.opentelemetry.io/otel/semconv/v1.12.0"
)
We will create a tracerProvider function that returns a TracerProvider configured to use the Jaeger exporter, which sends spans to the provided URL. The TracerProvider also makes use of a Resource containing information about the application. The code looks like this:
func tracerProvider(url string) (*tracesdk.TracerProvider, error) {
	exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint(url)))
	if err != nil {
		return nil, err
	}
	trcprov := tracesdk.NewTracerProvider(
		tracesdk.WithBatcher(exporter),
		tracesdk.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String("trace-demo"),
			attribute.String("environment", "production"),
			attribute.Int64("ID", 1),
		)),
	)
	return trcprov, nil
}
The main function registers the tracer provider globally so that any instrumentation imported elsewhere in the application can use it:
func main() {
	trcprov, err := tracerProvider("http://localhost:14268/api/traces")
	if err != nil {
		log.Fatal(err)
	}
	otel.SetTracerProvider(trcprov)
	// ...
}
You can include code to shut down the tracer provider and flush any remaining telemetry data when the application exits:
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
defer func(ctx context.Context) {
	ctx, cancel = context.WithTimeout(ctx, time.Second*5)
	defer cancel()
	if err := trcprov.Shutdown(ctx); err != nil {
		log.Fatal(err)
	}
}(ctx)

tr := trcprov.Tracer("component-main")
ctx, span := tr.Start(ctx, "foo")
defer span.End()
This completes the tracing setup, from the exporter through to the creation of a span. In the next section, let’s look at the OpenTelemetry Collector.
Generating OpenTelemetry data is not all there is to using OpenTelemetry; making the most out of the data is equally important. There must be a configured backend to send the telemetry data to, with pipelines for logs, metrics, and traces. There must also be a configured service in place that makes it easier to switch backends when necessary. In this section, we will look at the OpenTelemetry Collector.
Telemetry data is sent from the application to the OpenTelemetry Collector, so the OpenTelemetry Protocol (OTLP) exporter should be installed in the program. For Go, the exporter is available in these modules:
go.opentelemetry.io/otel/exporters/otlp/otlptrace
go.opentelemetry.io/otel/exporters/otlp/otlpmetric
# OpenTelemetry logging is not yet implemented in Go
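As a minimal sketch of wiring the OTLP trace exporter over gRPC to a collector (assuming the collector is listening on localhost:4317 without TLS), the setup could look like this:

package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	tracesdk "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	ctx := context.Background()

	// Create an OTLP exporter that sends spans to the collector over gRPC.
	exporter, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("localhost:4317"),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatal(err)
	}

	// Register a tracer provider that batches spans to the exporter.
	tp := tracesdk.NewTracerProvider(tracesdk.WithBatcher(exporter))
	defer func() {
		if err := tp.Shutdown(ctx); err != nil {
			log.Fatal(err)
		}
	}()
	otel.SetTracerProvider(tp)
}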
The Jaeger and Prometheus exporters are found in the respective modules below:
go.opentelemetry.io/otel/exporters/jaeger
go.opentelemetry.io/otel/exporters/prometheus
The OpenTelemetry Collector receives telemetry data in various formats and from different services. It processes the data and then exports it to any number of configured destinations.
Deploying the Collector can be intensive. Instead of dedicating resources to run it on your own, you can take advantage of an OTLP-supported vendor platform like Site24x7.
To install the OpenTelemetry Collector, follow the official guide for your development environment.
As seen in the client architecture for OpenTelemetry, the collector comprises receivers, processors, exporters, and pipelines. The collector was created as a fork of the OpenCensus project, and it supports several open-source protocols for data input and output.
The components of the collector implement a Component interface, which makes it relatively easy to extend the collector by adding additional components to it:
type Component interface {
	Start(ctx context.Context, host Host) error
	Shutdown(ctx context.Context) error
}
The components are configured in a single file, along with extensions and services, which are not considered actual components because they are not part of the pipeline and do not require access to the telemetry data. An example configuration for all components will look like this:
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  otlp:
    endpoint: otelcol:4317

extensions:
  health_check:
  pprof:
  zpages:

service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
The section below will explore these components in detail.
As the name implies, a receiver component receives data in several supported formats and then converts the data to OpenTelemetry’s internally accepted data format. In a technical sense, the receiver defines a listener to listen for the protocols on a port in the collector. For example, the Jaeger receiver has support for these protocols:
The table below lists the supported receiver formats for each signal type:
Receivers | Logs | Metrics | Traces
---|---|---|---
Host metrics | | ✓ |
Jaeger | | | ✓
Kafka | ✓ | ✓ | ✓
OpenCensus | | ✓ | ✓
OpenTelemetry Protocol | ✓ | ✓ | ✓
Prometheus | | ✓ |
Zipkin | | | ✓
It is possible to reuse receivers across multiple pipelines and to configure multiple receivers in the same pipeline.
The LogsReceiver, MetricsReceiver, and TracesReceiver interfaces all embed the same Receiver interface, which in turn embeds the Component interface:
type Receiver interface {
Component
}
type LogsReceiver interface {
Receiver
}
// equally for other receivers.
Processors are responsible for additional tasks like filtering unwanted telemetry data and injecting additional attributes to the data before it is passed to the exporter. There are different types of processors, each with different capabilities. For example, instead of exporting data as and when you receive them, you can insert a batch processor to queue the data and export them at defined intervals. This saves valuable resources if your exporter needs to transmit such data over the internet.
Aside from the Component interface, consumer and processor interfaces are also embedded in a processor definition to provide a function that consumes the signal and to give the processor a chance to modify the data, respectively. The implementation looks similar to the following:
type Capabilities struct {
ModifiesData bool
}
type baseConsumer interface {
Capabilities() Capabilities
}
type Metrics interface {
baseConsumer
ConsumeMetrics(ctx context.Context, md pdata.Metrics) error
}
type MetricsProcessor interface {
Processor
consumer.Metrics
}
The exporter component receives data as it appears in the internal collector format, then marshals the data into the required output format and sends it to the configured destination(s). The exporter’s interface also has a consumer and embeds the Exporter interface:
type LogsExporter interface {
Exporter
consumer.Logs
}
Some of the available exporters and their supported signals are shown in the table below:
Exporters | Logs | Metrics | Traces
---|---|---|---
File | ✓ | ✓ | ✓
Jaeger | | | ✓
Kafka | ✓ | ✓ | ✓
OpenCensus | | ✓ | ✓
OpenTelemetry Protocol | ✓ | ✓ | ✓
Prometheus | | ✓ |
Zipkin | | | ✓
Logging | ✓ | ✓ | ✓
OpenTelemetry has become one of the most active projects in CNCF, second only to Kubernetes. The shifting needs of DevOps and expanding landscape of observability can no longer be satisfied by a single tool.
So, instead of adding different tools and then figuring out how to interpret and use the data from them to improve your business, you can use OpenTelemetry to future-proof your business's observability needs by standardizing the data you want to monitor and selecting the right tools to visualize and analyze it later.