This article introduces the concept of software observability using OpenTelemetry. It describes OpenTelemetry and its components in detail, along with the types of signals relevant to telemetry: logs, metrics, and traces. Reading further, you will learn how to generate these signals and how to use the OpenTelemetry Collector.
Telemetry refers to the data collected from logs, metrics, and traces that provide empirical value as measurements and give insight into a system.
Telemetry data can be enabled for a variety of systems, some of which may include:
Telemetry data, when properly analyzed, is useful for:
Telemetry data should be analyzed and properly visualized to uncover valuable insights about a system. While there are a variety of tools and vendors to analyze the data, OpenTelemetry provides a unified and standard framework for capturing these telemetry data in a vendor-agnostic way, which can then be used with any tool to gain insights into software performance.
This article will take a deep dive into OpenTelemetry and its uses. Keep in mind that it is a suite of tools built for techniques already used by cloud-native software developers, such as centralized logging, metrics and dashboards, traces, and data analysis.
OpenTelemetry is an open-source project that merges OpenTracing and OpenCensus. OpenTracing provided a vendor-neutral API for distributed tracing, while OpenCensus was a set of libraries responsible for collecting metrics and distributed traces in applications; both are now merged into OpenTelemetry. The Cloud Native Computing Foundation (CNCF) hosts the OpenTelemetry project with the goal of standardizing the collection and transmission of telemetry data to a backend service.
OpenTelemetry as a project comprises several specifications, APIs, SDKs for programming languages, libraries, and integration support. Being an open-source project, OpenTelemetry is built and maintained by more than 300 companies, with over 100,000 contributions.
OpenTelemetry comes with various functionalities that make it a strong successor to both OpenTracing and OpenCensus, as well as the most widely adopted observability framework among developers. These functionalities include:
OpenTelemetry provides enough flexibility to select your preferred backend tool for observability by simply installing agents, without having to rewrite your code.
OpenTelemetry is made up of a broad ecosystem of tools, each with its own configuration method and relevance to a project. An individual project comprises signals, pipelines, resources, and context propagation.
In OpenTelemetry, signals play an important role in projects and are initially defined as traces, metrics, logs, and baggage. A signal in OpenTelemetry includes the following properties:
We will explore signals in further detail below.
Pipelines are mechanisms to generate, process, and export telemetry data that signals capture to data stores for storage and analysis.
To ensure no meaningful data is lost during transmission, pipeline components are created early in the application code. The components are:
These components must be in the following order:
Receiver(s) → Processor(s) → Exporter(s)
We will learn more about these components in the next section.
Resources are attributes applied to different signals to identify the source of the telemetry data. They provide information that is helpful for analyzing telemetry data and for correlating different events that occur on the same resource.
Resource attributes remain unchanged throughout the application's lifetime as they identify whether the data source is a machine, container, or function. Resource attributes include the following:
Context propagation is a core component of OpenTelemetry that enables the transmission of valuable contextual data between two services separated by a logical boundary. It enables distributed tracing to combine requests made across multiple systems.
Context objects are key-value stores passed across API boundaries. As a result, implementing contexts is a language-dependent operation. Programming languages like Python and Go have their own built-in context mechanisms, such as the ContextVar module and the context package. OpenTelemetry API specifications recommend that the context API implementations leverage these existing mechanisms.
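As a minimal sketch of what this looks like in Go, using the OpenTelemetry propagation package, trace context can be injected into outgoing HTTP headers and extracted on the receiving side (the /inventory endpoint and localhost address below are purely illustrative):

package main

import (
	"context"
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
)

func main() {
	// Register the W3C Trace Context propagator as the global propagator.
	otel.SetTextMapPropagator(propagation.TraceContext{})

	// Client side: inject the current context into the outgoing request headers.
	req, _ := http.NewRequestWithContext(context.Background(), http.MethodGet, "http://localhost:8080/inventory", nil)
	otel.GetTextMapPropagator().Inject(req.Context(), propagation.HeaderCarrier(req.Header))

	// Server side: extract the propagated context from the incoming request headers.
	http.HandleFunc("/inventory", func(w http.ResponseWriter, r *http.Request) {
		ctx := otel.GetTextMapPropagator().Extract(r.Context(), propagation.HeaderCarrier(r.Header))
		_ = ctx // spans started from ctx join the caller's trace
	})
}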
A complete OpenTelemetry client architecture requires the installation of a collection of software libraries. These libraries include:
In the context of OpenTelemetry, signals are independent observability tools built on top of data-sharing mechanisms (i.e., context propagation). As of now, there are four supported signals: traces, metrics, logs, and baggage. This section will expand on each of these pillars.
A log is a record of events in a request cycle that is written to an output. Log outputs are typically written to files on disk or to a remote service to improve searchability and aggregation.
The anatomy of a log contains the following:
Here’s an example for a payload:
169.10.0.19 - - [11/Nov/2022 11:42:35] "GET /inventory HTTP/1.1" 200 -
To provide more context, logs are typically combined with other signal types through correlation.
The tracing signal of OpenTelemetry is responsible for distributed tracing in the system. A trace is a series of event data that is generated at different points of a system and then bundled together through a unique identifier.
The identifier is transported to all other components involved in the request’s cycle, allowing their operations to access the event data from the data source. The complete anatomy of a trace in a span context includes these elements:
traceID: a unique identifier of the associated trace
spanID: a unique identifier of the current method in the trace

For context, a span can be described as a method call or a subset of a block of code called within a method.
For example, in a shopping application with the below workflow, each independent workflow can be thought of as a trace.
With OpenTelemetry distributed tracing, they can all be linked together by propagating trace context (traceparent headers) so that you get the complete picture of the workflow.
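For illustration, a propagated traceparent header follows the W3C Trace Context format of version, trace ID, parent span ID, and trace flags; the value below is the sample from the specification:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01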
Metrics provide developers and operators with information about the state of a running application or system. Over time, metric data is collected and aggregated to identify patterns and trends in the system, then converted into a graph with different tools and visualizations.
There is a wide range of metric types as metrics can capture data from both low-level systems (CPU cycles) and high-level details (the number of items sold in one day).
For recorded data to be considered a metric, it has to have the following properties as part of its anatomy:
Metrics can be combined with traces to yield more depth and context about the events happening in a system. In OpenTelemetry, this combination is made possible with exemplars, included in an exemplar field during their definition. This field contains:
Exemplars in OpenTelemetry let a metric measurement carry a reference to the specific trace and span that were active when it was recorded. This can be useful for troubleshooting and identifying patterns in the application's behavior.
Here is a simplified example with the OpenTelemetry Go SDK, which starts a span, attaches its trace and span IDs to a log message, and leaves exemplar sampling of metric measurements to the metric SDK:
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

func main() {
	// Get a tracer from the global tracer provider
	tracer := otel.Tracer("exemplar-demo")

	// Start a new span
	ctx, span := tracer.Start(context.Background(), "my_span")
	defer span.End()

	// Set some attributes on the span
	span.SetAttributes(
		attribute.String("key", "value"),
	)

	// Read the trace_id and span_id from the span's context so they can
	// be attached to a log message and correlated with the trace later
	sc := span.SpanContext()
	log.Printf("My log message trace_id=%s span_id=%s",
		sc.TraceID().String(), sc.SpanID().String())

	// Any metric measurement recorded with this ctx while the span is
	// active (for example, counter.Add(ctx, 1)) can carry an exemplar,
	// because the metric SDK samples the active trace and span IDs.
	_ = ctx
}
In this example, we created a new span and set some attributes on it. We then read the trace_id and span_id from the span's context and attached them to a log message. Any metric measurement recorded with the same context while the span is active can carry an exemplar, since the metric SDK samples the active trace and span IDs when exemplars are enabled.
This allows us to link the log message, metric, or other data back to the specific trace and span that it is associated with.
Instrumentation libraries for third-party tooling require minimal effort from users with OpenTelemetry. There are several instrumentation libraries for OpenTelemetry available in popular programming languages. Most offer support for both automatic and manual instrumentation, and for exporting data in these languages.
OpenTelemetry instrumentation must be applied across the application's infrastructure, including HTTP clients and servers, application framework libraries, and other necessary components such as databases and queueing systems, to get the complete picture of the system.
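As a brief sketch of what this can look like for HTTP servers and clients in Go, the otelhttp contrib package wraps handlers and transports so that requests are traced automatically; the /inventory route, handler name, and port below are illustrative:

package main

import (
	"log"
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

func main() {
	// Wrap a plain handler so every incoming request gets a server span.
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	http.Handle("/inventory", otelhttp.NewHandler(handler, "inventory"))

	// Wrap the default transport so outgoing requests are traced and
	// carry the propagated trace context.
	client := http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}
	_ = client

	log.Fatal(http.ListenAndServe(":8080", nil))
}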
This section will describe how to configure and make use of instrumentation libraries. Instrumenting an application with any of the instrumentation libraries follows the same general process: install the packages, configure the SDK (including a sampler such as the AlwaysOnSampler), instrument the code, and export the telemetry data.
The libraries supported by OpenTelemetry are all listed in the OpenTelemetry registry. However, the next section will refer to the Go instrumentation library.
The Go support library for OpenTelemetry currently covers the following statuses:
We will be using the Prometheus and Go clients for OpenTelemetry for metrics instrumentation. Import the following libraries:
import (
"context"
"fmt"
"log"
"math/rand"
"net/http"
"os"
"os/signal"
"time"
"github.com/prometheus/client_golang/prometheus/promhttp"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/exporters/prometheus"
"go.opentelemetry.io/otel/metric/instrument"
"go.opentelemetry.io/otel/sdk/metric"
)
For this, we will create a demo application that receives web requests, processes them, and responds. It will track the request rate, error rate, and processing latency distribution. Let’s create an exporter to handle reading and collection:
func init() {
	rand.Seed(time.Now().UnixNano())
}

// The rest of the code in this section goes inside func main.
func main() {
ctx := context.Background()

exporter, err := prometheus.New()
if err != nil {
	log.Fatal(err)
}
The metrics instrumentation can then be added by registering the exporter as a reader on a meter provider. The meter provider also holds information about the service, which is used when querying the metrics.
provider := metric.NewMeterProvider(metric.WithReader(exporter))
meter := provider.Meter("github.com/…/prometheus")
We start the Prometheus server in a goroutine and define the attributes that will be attached to the measurements:
go serveMetrics() // prometheus server function created below
attrs := []attribute.KeyValue{
attribute.Key("A").String("B"),
attribute.Key("C").String("D"),
}
The counter and gauge logic:
counter, err := meter.SyncFloat64().Counter("foo", instrument.WithDescription("a counter for demonstration"))
if err != nil {
	log.Fatal(err)
}
counter.Add(ctx, 5, attrs...)

gauge, err := meter.AsyncFloat64().Gauge("bar", instrument.WithDescription("a gauge for demonstration"))
if err != nil {
	log.Fatal(err)
}
err = meter.RegisterCallback([]instrument.Asynchronous{gauge}, func(ctx context.Context) {
	n := -10. + rand.Float64()*(90.) // [-10, 100)
	gauge.Observe(ctx, n, attrs...)
})
if err != nil {
	log.Fatal(err)
}
Afterwards, we define a histogram, similar to Prometheus histograms, for tracking a latency-style distribution, and record a few sample observations before blocking until the program is interrupted.
histogram, err := meter.SyncFloat64().Histogram("baz", instrument.WithDescription("a histogram for visitor tracking"))
if err != nil {
	log.Fatal(err)
}
histogram.Record(ctx, 23, attrs...)
histogram.Record(ctx, 7, attrs...)
histogram.Record(ctx, 101, attrs...)
histogram.Record(ctx, 105, attrs...)

ctx, _ = signal.NotifyContext(ctx, os.Interrupt)
<-ctx.Done()
}
Finally, here is the logic for the serveMetrics function, which contains the router and endpoint:
func serveMetrics() {
log.Printf("serving metrics at localhost:2223/metrics")
http.Handle("/metrics", promhttp.Handler())
err := http.ListenAndServe(":2223", nil)
if err != nil {
fmt.Printf("error serving http: %v", err)
return
}
}
Setting up distributed tracing to generate data involves configuring a number of components. The components in the tracing pipeline, which play an important role in instrumenting the code, include:
TracerProvider: determines how spans are to be generated
SpanProcessor: describes how spans are to be exported
SpanExporter: describes where the spans are to be exported

The TracerProvider interface exposes a method that allows us to obtain a tracer.
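As a brief sketch (assuming a global tracer provider has already been registered), obtaining a tracer and starting a span looks like this:

package main

import (
	"context"

	"go.opentelemetry.io/otel"
)

func main() {
	// Obtain a tracer from the globally registered TracerProvider.
	tracer := otel.GetTracerProvider().Tracer("component-main")

	// Use the tracer to start a span and end it when the work is done.
	_, span := tracer.Start(context.Background(), "operation")
	defer span.End()
}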
For tracing, Jaeger can be used. Let’s look at an example application where tracing is done with Jaeger. Start by importing the following libraries:
import (
"context"
"log"
"time"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/exporters/jaeger"
"go.opentelemetry.io/otel/sdk/resource"
tracesdk "go.opentelemetry.io/otel/sdk/trace"
semconv "go.opentelemetry.io/otel/semconv/v1.12.0"
)
We will create a tracerProvider function that returns a TracerProvider configured to use the Jaeger exporter, which sends spans to the provided URL. The TracerProvider also makes use of a Resource containing information about the application. The code looks like this:
func tracerProvider(url string) (*tracesdk.TracerProvider, error) {
	exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint(url)))
	if err != nil {
		return nil, err
	}
	trcprov := tracesdk.NewTracerProvider(
		tracesdk.WithBatcher(exporter),
		tracesdk.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String("trace-demo"),
			attribute.String("environment", "production"),
			attribute.Int64("ID", 1),
		)),
	)
	return trcprov, nil
}
The main function registers the tracer provider globally so that any instrumentation imported elsewhere in the application can use it:
func main() {
	trcprov, err := tracerProvider("http://localhost:14268/api/traces")
	if err != nil {
		log.Fatal(err)
	}
	otel.SetTracerProvider(trcprov)
	// ...
}
You can include code to shut down the tracer provider and flush any remaining telemetry data when the application exits:
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
defer func(ctx context.Context) {
	ctx, cancel = context.WithTimeout(ctx, time.Second*5)
	defer cancel()
	if err := trcprov.Shutdown(ctx); err != nil {
		log.Fatal(err)
	}
}(ctx)

tr := trcprov.Tracer("component-main")
ctx, span := tr.Start(ctx, "foo")
defer span.End()
This completes the tracing setup, from the exporter through to the creation of a span. In the next section, let’s look at the OpenTelemetry Collector.
Generating OpenTelemetry data is not all there is to using OpenTelemetry; making the most out of the data is equally important. There must be a configured backend to send the telemetry data to, with pipelines for logs, metrics, and traces. There must also be a configured service in place that makes it easier to switch backends when necessary. In this section, we will look at the OpenTelemetry Collector.
Telemetry data is sent from the application to the OpenTelemetry Collector, so the OpenTelemetry Protocol (OTLP) exporter should be installed in the program. For Go, the exporter is available in these modules:
go.opentelemetry.io/otel/exporters/otlp/otlptrace
go.opentelemetry.io/otel/exporters/otlp/otlpmetric
# OpenTelemetry logging is not yet implemented in Go
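As a minimal sketch of wiring the OTLP trace exporter over gRPC to a collector (assuming the collector is listening on localhost:4317 without TLS), the setup could look like this:

package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	tracesdk "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	ctx := context.Background()

	// Create an OTLP exporter that sends spans to the collector over gRPC.
	exporter, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("localhost:4317"),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatal(err)
	}

	// Register a tracer provider that batches spans to the exporter.
	tp := tracesdk.NewTracerProvider(tracesdk.WithBatcher(exporter))
	defer func() {
		if err := tp.Shutdown(ctx); err != nil {
			log.Fatal(err)
		}
	}()
	otel.SetTracerProvider(tp)
}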
The Jaeger and Prometheus exporters are found in the respective modules below:
go.opentelemetry.io/otel/exporters/jaeger
go.opentelemetry.io/otel/exporters/prometheus
The OpenTelemetry Collector receives telemetry data in various formats and from different services. It processes the data and then exports it to any number of configured destinations.
Deploying the Collector can be intensive. Instead of dedicating resources to run it on your own, you can take advantage of an OTLP-supported vendor platform like Site24x7.
To install the OpenTelemetry Collector, follow the official guide for your development environment.
As seen in the client architecture for OpenTelemetry, the collector comprises receivers, processors, exporters, and pipelines. The collector was created as a fork of the OpenCensus project, and it supports several open-source protocols for data input and output.
The components of the collector implement a Component interface, which makes it relatively easy to extend the collector by adding additional components to it:
type Component interface {
	Start(ctx context.Context, host Host) error
	Shutdown(ctx context.Context) error
}
The components are configured in a single file, along with extensions and services, which are not considered actual components because they are not part of the pipeline and do not require access to the telemetry data. An example configuration for all components will look like this:
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  otlp:
    endpoint: otelcol:4317

extensions:
  health_check:
  pprof:
  zpages:

service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
The section below will explore these components in detail.
As the name implies, a receiver component receives data in several supported formats and then converts the data to OpenTelemetry’s internally accepted data format. In a technical sense, the receiver defines a listener to listen for the protocols on a port in the collector. For example, the Jaeger receiver has support for these protocols:
The table below lists the supported receiver formats for each signal type:
Receivers | Logs | Metrics | Traces
---|---|---|---
Host metrics | | ✓ |
Jaeger | | | ✓
Kafka | ✓ | ✓ | ✓
OpenCensus | | ✓ | ✓
OpenTelemetry Protocol | ✓ | ✓ | ✓
Prometheus | | ✓ |
Zipkin | | | ✓
It is possible to reuse receivers across multiple pipelines and to configure multiple receivers in the same pipeline.
The LogsReceiver, MetricsReceiver, and TracesReceiver interfaces all embed the same Receiver interface, which in turn embeds the Component interface:
type Receiver interface {
Component
}
type LogsReceiver interface {
Receiver
}
// equally for other receivers.
Processors are responsible for additional tasks like filtering unwanted telemetry data and injecting additional attributes to the data before it is passed to the exporter. There are different types of processors, each with different capabilities. For example, instead of exporting data as and when you receive them, you can insert a batch processor to queue the data and export them at defined intervals. This saves valuable resources if your exporter needs to transmit such data over the internet.
Aside from the Component interface, consumer and processor interfaces are also embedded in a processor definition to provide a function that consumes the signal and to give the processor a chance to modify the data, respectively. The implementation looks similar to the following:
type Capabilities struct {
ModifiesData bool
}
type baseConsumer interface {
Capabilities() Capabilities
}
type Metrics interface {
baseConsumer
ConsumeMetrics(ctx context.Context, md pdata.Metrics) error
}
type MetricsProcessor interface {
Processor
consumer.Metrics
}
The exporter component receives data as it appears in the internal collector format, then marshals the data into the required output format and sends it to the configured destination(s). The exporter’s interface also has a consumer and embeds the Exporter interface:
type LogsExporter interface {
Exporter
consumer.Logs
}
Some of the available exporters and their supported signals are shown in the table below:
Exporters | Logs | Metrics | Traces
---|---|---|---
File | ✓ | ✓ | ✓
Jaeger | | | ✓
Kafka | ✓ | ✓ | ✓
OpenCensus | | ✓ | ✓
OpenTelemetry Protocol | ✓ | ✓ | ✓
Prometheus | | ✓ |
Zipkin | | | ✓
Logging | ✓ | ✓ | ✓
OpenTelemetry has become one of the most active projects in CNCF, second only to Kubernetes. The shifting needs of DevOps and expanding landscape of observability can no longer be satisfied by a single tool.
So, instead of adding different tools and then figuring out how to interpret and use the data from them to improve your business, you can use OpenTelemetry to future-proof your business's observability needs by standardizing the data you want to monitor and selecting the right tools to visualize and analyze it later.