
OpenTelemetry (GLIDE 2.0)

Observability is consistently one of the top feature requests by customers. Valkey GLIDE 2.0 introduces support for OpenTelemetry (OTel), enabling developers to gain deep insights into client-side performance and behavior in distributed systems. OTel is an open source, vendor-neutral framework that provides APIs, SDKs, and tools for generating, collecting, and exporting telemetry data—such as traces, metrics, and logs. It supports multiple programming languages and integrates with various observability backends like Prometheus, Jaeger, and AWS CloudWatch.

GLIDE’s OpenTelemetry integration is designed to be both powerful and easy to adopt. Once an OTel collector endpoint is configured, GLIDE begins emitting default metrics and traces automatically; no per-command instrumentation is required. This simplifies the path to observability best practices and minimizes disruption to existing workflows.

GLIDE emits several built-in metrics out of the box. These metrics can be used to build dashboards, configure alerts, and monitor performance trends:

  • Timeouts: Number of requests that exceeded their timeout duration.
  • Retries: Count of operations retried due to transient errors or topology changes.
  • Moved Errors: Number of MOVED responses received, indicating key reallocation in the cluster.

These metrics are emitted to your configured OpenTelemetry collector and can be viewed in any supported backend (Prometheus, CloudWatch, etc.).

GLIDE creates a trace span for each Valkey command, giving detailed visibility into client-side performance. Each trace captures:

  • The entire command lifecycle: from creation to completion or failure.
  • A nested send_command span, measuring communication time with the Valkey server.
  • A status tag indicating success or error for each span, helping you identify failure patterns.

Because the send_command span is nested inside the command span, comparing the two lets developers separate client-side queuing latency from server communication delays, making it easier to troubleshoot performance issues.
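To make the comparison concrete, the sketch below splits a command's total latency into the time spent inside its nested send_command span and the remaining client-side time, using only span start and end timestamps. The SpanTiming type and the splitLatency helper are hypothetical stand-ins for whatever fields your tracing backend exposes; they are not part of GLIDE.

```typescript
// Hypothetical, simplified span timing; real exported spans carry many more fields.
interface SpanTiming {
  startMs: number; // span start time in milliseconds
  endMs: number;   // span end time in milliseconds
}

interface LatencyBreakdown {
  totalMs: number;       // whole command lifecycle (outer command span)
  sendCommandMs: number; // time inside the nested send_command span
  clientSideMs: number;  // remainder: time spent in the client outside send_command
}

// Split an outer command span into send_command time and the remaining client-side time.
function splitLatency(command: SpanTiming, sendCommand: SpanTiming): LatencyBreakdown {
  if (sendCommand.startMs < command.startMs || sendCommand.endMs > command.endMs) {
    throw new Error("send_command span must lie within the command span");
  }
  const totalMs = command.endMs - command.startMs;
  const sendCommandMs = sendCommand.endMs - sendCommand.startMs;
  return { totalMs, sendCommandMs, clientSideMs: totalMs - sendCommandMs };
}

// A command whose outer span lasted 2.5 ms, 2 ms of which was spent in send_command.
const breakdown = splitLatency(
  { startMs: 100, endMs: 102.5 },
  { startMs: 100.25, endMs: 102.25 },
);
console.log(breakdown); // prints: { totalMs: 2.5, sendCommandMs: 2, clientSideMs: 0.5 }
```

A large clientSideMs relative to sendCommandMs suggests the delay is in the client (for example, queuing), while a large sendCommandMs points at the network or the server.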

Tracing does not yet cover the following commands:

  • The SCAN family of commands (SCAN, SSCAN, HSCAN, ZSCAN)
  • Lua scripting commands (EVAL, EVALSHA)

Support for these commands will be added in a future version as we continue to expand tracing coverage.

Even with these exceptions, GLIDE 2.0 provides comprehensive insights across the vast majority of standard operations, making it easy to adopt observability best practices with minimal effort.

To begin collecting telemetry data with GLIDE 2.0:

  • Set up an OpenTelemetry Collector to receive trace and metric data.
  • Initialize GLIDE’s OpenTelemetry support with your collector’s endpoint.

Alternatively, for development or debugging, you can configure GLIDE to write telemetry data directly to a local file, without requiring a running collector.

GLIDE does not export data directly to third-party services—instead, it sends data to your collector, which routes it to your backend (e.g., CloudWatch, Prometheus, Jaeger).

You can configure the OTel collector endpoint using one of the following protocols:

  • http:// or https:// - Send data via HTTP(S)
  • grpc:// - Use gRPC for efficient telemetry transmission
  • file:// - Write telemetry data to a local file (ideal for local dev/debugging)
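As an illustration of the scheme rules above, a hypothetical exporterKindFor helper (not a GLIDE API; GLIDE parses endpoints internally) can map an endpoint string to its transport by inspecting the URL scheme:

```typescript
// Transport kinds corresponding to the documented endpoint schemes.
type ExporterKind = "http" | "grpc" | "file";

// Hypothetical helper: map an endpoint string to the transport its scheme implies.
function exporterKindFor(endpoint: string): ExporterKind {
  // new URL() throws a TypeError for strings that are not absolute URLs.
  const protocol = new URL(endpoint).protocol; // e.g. "https:"
  switch (protocol) {
    case "http:":
    case "https:":
      return "http";
    case "grpc:":
      return "grpc";
    case "file:":
      return "file";
    default:
      throw new Error(`Unsupported OpenTelemetry endpoint scheme: ${protocol}`);
  }
}

console.log(exporterKindFor("http://localhost:4318/v1/traces")); // prints: http
console.log(exporterKindFor("grpc://localhost:4317"));           // prints: grpc
console.log(exporterKindFor("file:///tmp/glide-otel/"));         // prints: file
```

Ports 4318 and 4317 are the conventional OTLP/HTTP and OTLP/gRPC ports; your collector may be configured to listen elsewhere.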

When initializing OpenTelemetry, you can customize behavior using the openTelemetryConfig object.

openTelemetryConfig.traces
  • endpoint (required): The trace collector endpoint.
  • samplePercentage (optional): Percentage (0–100) of commands to sample for tracing. Default: 1. For production, a low sampling rate (1–5%) is recommended to balance performance and insight.
openTelemetryConfig.metrics
  • endpoint (required): The metrics collector endpoint.
openTelemetryConfig.flushIntervalMs (optional)
  • Time in milliseconds between flushes to the collector. Default: 5000.
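Conceptually, samplePercentage acts as a per-command probability of producing a trace. The sketch below models that idea with a hypothetical shouldTrace function that takes an injectable random source so its behavior can be checked deterministically; it is a conceptual model, not GLIDE's internal sampler. It also shows why a low rate is usually enough in production: at 20,000 commands per second, 1% still yields roughly 200 traced commands per second.

```typescript
// Conceptual model of percentage-based sampling (not GLIDE's internal code).
// `random` must return a number in [0, 1); it defaults to Math.random but can be
// replaced with a deterministic source for testing.
function shouldTrace(samplePercentage: number, random: () => number = Math.random): boolean {
  if (samplePercentage < 0 || samplePercentage > 100) {
    throw new RangeError("samplePercentage must be between 0 and 100");
  }
  return random() * 100 < samplePercentage;
}

// Expected number of traced commands per second for a given throughput and rate.
function expectedTracesPerSecond(commandsPerSecond: number, samplePercentage: number): number {
  return (commandsPerSecond * samplePercentage) / 100;
}

// 0% never traces and 100% always traces, regardless of the random draw.
console.log(shouldTrace(0, () => 0));        // prints: false
console.log(shouldTrace(100, () => 0.9999)); // prints: true

// At 20,000 commands/s, the default 1% rate yields about 200 traces/s; 5% yields about 1,000.
console.log(expectedTracesPerSecond(20_000, 1)); // prints: 200
console.log(expectedTracesPerSecond(20_000, 5)); // prints: 1000
```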

If using file:// as the endpoint:

  • The path must begin with file://.
  • If a directory is provided (or the path has no file extension), data is written to signals.json in that directory.
  • If a filename is included, it is used as-is.
  • The parent directory must already exist.
  • Data is appended, not overwritten.

The configuration is validated when OpenTelemetry.init() is called, and an invalid configuration throws an error synchronously:

  • flushIntervalMs must be a positive integer.
  • samplePercentage must be between 0 and 100.
  • File exporter paths must start with file:// and have an existing parent directory.
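The file-path and validation rules above can be summarized in a small sketch. validateOtelConfig and resolveFileExporterPath are hypothetical helpers written against locally declared types that mirror the documented fields; they are not GLIDE's API, and GLIDE applies its own checks inside OpenTelemetry.init(). The sketch assumes an absolute file:/// URL and treats a path as a directory if it already exists as one or has no extension.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";
import { fileURLToPath, pathToFileURL } from "url";

// Local stand-ins mirroring the documented configuration fields (not GLIDE's own types).
interface TracesConfig { endpoint: string; samplePercentage?: number }
interface MetricsConfig { endpoint: string }
interface OtelConfig { traces?: TracesConfig; metrics?: MetricsConfig; flushIntervalMs?: number }

// Resolve a file:// endpoint to the file that would receive telemetry.
function resolveFileExporterPath(endpoint: string): string {
  if (!endpoint.startsWith("file://")) {
    throw new Error(`file exporter endpoint must start with file://: ${endpoint}`);
  }
  const rawPath = fileURLToPath(endpoint);
  const isDirectory = fs.existsSync(rawPath) && fs.statSync(rawPath).isDirectory();
  // A directory, or a path without an extension, means "write signals.json inside it".
  const target = (isDirectory || path.extname(rawPath) === "")
    ? path.join(rawPath, "signals.json")
    : rawPath;
  // The directory that will contain the output file must already exist.
  const parent = path.dirname(target);
  if (!fs.existsSync(parent) || !fs.statSync(parent).isDirectory()) {
    throw new Error(`parent directory does not exist: ${parent}`);
  }
  return target;
}

// Check the documented rules; throws synchronously on the first violation.
function validateOtelConfig(config: OtelConfig): void {
  if (!config.traces && !config.metrics) {
    throw new Error("at least one of traces or metrics must be configured");
  }
  const flush = config.flushIntervalMs;
  if (flush !== undefined && (!Number.isInteger(flush) || flush <= 0)) {
    throw new Error("flushIntervalMs must be a positive integer");
  }
  const pct = config.traces?.samplePercentage;
  // Written as a negated range check so that NaN is rejected too.
  if (pct !== undefined && !(pct >= 0 && pct <= 100)) {
    throw new Error("samplePercentage must be between 0 and 100");
  }
  for (const endpoint of [config.traces?.endpoint, config.metrics?.endpoint]) {
    if (endpoint !== undefined && endpoint.startsWith("file://")) {
      resolveFileExporterPath(endpoint); // throws if the parent directory is missing
    }
  }
}

// Example: a metrics-only development setup writing into the OS temp directory.
const devEndpoint = pathToFileURL(os.tmpdir()).href;
const devConfig: OtelConfig = { metrics: { endpoint: devEndpoint }, flushIntervalMs: 1000 };
validateOtelConfig(devConfig); // passes: the temp directory exists and the interval is valid
console.log(resolveFileExporterPath(devEndpoint)); // e.g. /tmp/signals.json on Linux
```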

```typescript
import { OpenTelemetry, OpenTelemetryConfig, OpenTelemetryTracesConfig, OpenTelemetryMetricsConfig } from "@valkey/valkey-glide";

// Define traces configuration
const tracesConfig: OpenTelemetryTracesConfig = {
  endpoint: "http://localhost:4318/v1/traces",
  samplePercentage: 10, // Optional, defaults to 1%
};

// Define metrics configuration
const metricsConfig: OpenTelemetryMetricsConfig = {
  endpoint: "http://localhost:4318/v1/metrics",
};

// Complete OpenTelemetry configuration
const openTelemetryConfig: OpenTelemetryConfig = {
  traces: tracesConfig, // Optional: can omit if only metrics are needed
  metrics: metricsConfig, // Optional: can omit if only traces are needed
  flushIntervalMs: 1000, // Optional, defaults to 5000 ms
};

// Initialize OpenTelemetry (can only be called once per process)
OpenTelemetry.init(openTelemetryConfig);
```

✅ In this example, both traces and metrics are configured, but you can configure only one of them if you wish. At least one must be provided.