OpenTelemetry (GLIDE 2.0)
Observability is consistently one of the top feature requests from customers. Valkey GLIDE 2.0 introduces support for OpenTelemetry (OTel), enabling developers to gain deep insights into client-side performance and behavior in distributed systems. OTel is an open source, vendor-neutral framework that provides APIs, SDKs, and tools for generating, collecting, and exporting telemetry data such as traces, metrics, and logs. It supports multiple programming languages and integrates with various observability backends like Prometheus, Jaeger, and AWS CloudWatch.
How It Works
GLIDE’s OpenTelemetry integration is designed to be both powerful and easy to adopt. Once an OTel collector endpoint is configured, GLIDE begins emitting default metrics and traces automatically, with no additional code changes required. This simplifies the path to observability best practices and minimizes disruption to existing workflows.
Metrics Overview
GLIDE emits several built-in metrics out of the box. These metrics can be used to build dashboards, configure alerts, and monitor performance trends:
- Timeouts: Number of requests that exceeded their timeout duration.
- Retries: Count of operations retried due to transient errors or topology changes.
- Moved Errors: Number of MOVED responses received, indicating key reallocation in the cluster.
These metrics are emitted to your configured OpenTelemetry collector and can be viewed in any supported backend (Prometheus, CloudWatch, etc.).
Tracing Integration
GLIDE creates a trace span for each Valkey command, giving detailed visibility into client-side performance. Each trace captures:
- The entire command lifecycle: from creation to completion or failure.
- A nested send_command span, measuring communication time with the Valkey server.
- A status tag indicating success or error for each span, helping you identify failure patterns.
This distinction helps developers separate client-side queuing latency from server communication delays, making it easier to troubleshoot performance issues.
GLIDE 2.0 provides comprehensive insights across the vast majority of standard operations, making it easy to adopt observability best practices with minimal effort.
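To illustrate the distinction between client-side latency and server communication time, the sketch below subtracts the duration of the nested send_command span from the duration of the full command span. The `SpanBreakdown` class and the sample durations are hypothetical and exist only for this example; they are not part of GLIDE or the OpenTelemetry SDK, and real durations would come from the traces in your backend.

```java
import java.time.Duration;

// Hypothetical helper for this example; not part of GLIDE or the OpenTelemetry SDK.
final class SpanBreakdown {

    // Estimates the time a command spent outside server communication by
    // subtracting the nested send_command duration from the full command span.
    static Duration clientSideOverhead(Duration commandSpan, Duration sendCommandSpan) {
        if (sendCommandSpan.compareTo(commandSpan) > 0) {
            throw new IllegalArgumentException("nested span cannot outlast its parent span");
        }
        return commandSpan.minus(sendCommandSpan);
    }

    public static void main(String[] args) {
        // Illustrative durations, as they might be read from an exported trace.
        Duration commandSpan = Duration.ofMillis(12);
        Duration sendCommandSpan = Duration.ofMillis(9);
        Duration overhead = clientSideOverhead(commandSpan, sendCommandSpan);
        System.out.println("client-side overhead (ms): " + overhead.toMillis());
    }
}
```

A large overhead relative to the send_command duration would point toward client-side queuing rather than the server or the network.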
Getting Started
To begin collecting telemetry data with GLIDE 2.0:
- Set up an OpenTelemetry Collector to receive trace and metric data.
- Configure the GLIDE client with the endpoint to your collector.
- Alternatively, you can configure GLIDE to export telemetry data directly to a local file for development or debugging purposes, without requiring a running collector.
GLIDE does not export data directly to third-party services—instead, it sends data to your collector, which routes it to your backend (e.g., CloudWatch, Prometheus, Jaeger).
Supported Collector Protocols
You can configure the OTel collector endpoint using one of the following protocols:
- http:// or https:// - Send data via HTTP(S)
- grpc:// - Use gRPC for efficient telemetry transmission
- file:// - Write telemetry data to a local file (ideal for local dev/debugging)
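As a rough illustration of how the scheme prefix selects the transport, the sketch below classifies an endpoint string with `java.net.URI`. The `EndpointKind` class and its `Kind` enum are hypothetical names chosen for this example; they are an assumption about how such a check could look and do not reflect GLIDE's internal parsing.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical classifier for this example; not GLIDE's own endpoint parsing.
final class EndpointKind {

    enum Kind { HTTP, GRPC, FILE }

    // Maps the URI scheme of an endpoint to one of the three documented protocols.
    static Kind of(String endpoint) {
        String scheme = URI.create(endpoint).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("Endpoint has no scheme: " + endpoint);
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "http":
            case "https":
                return Kind.HTTP;
            case "grpc":
                return Kind.GRPC;
            case "file":
                return Kind.FILE;
            default:
                throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
    }

    public static void main(String[] args) {
        System.out.println(of("http://localhost:4318/v1/traces"));
        System.out.println(of("grpc://localhost:4317"));
        System.out.println(of("file:///tmp/otel/traces.json"));
    }
}
```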
Optional Parameters
When initializing OpenTelemetry, you can customize behavior using the OpenTelemetryConfig object.
Note: Both traces and metrics are optional—but at least one must be provided in the OpenTelemetryConfig. If neither is set, OpenTelemetry will not emit any data.
Tracing
openTelemetryConfig.traces
- endpoint (required): The trace collector endpoint.
- samplePercentage (optional): Percentage (0–100) of commands to sample for tracing. Default: 1.
  - For production, a low sampling rate (1–5%) is recommended to balance performance and insight.
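One common way to implement percentage-based sampling is to draw a random number for each command and compare it with the configured percentage. The sketch below assumes that approach; how GLIDE makes its sampling decision is not described here and may differ. `TraceSampler` is a hypothetical class written for this example.

```java
import java.util.Random;

// Hypothetical sampler for this example; GLIDE's sampling logic may differ.
final class TraceSampler {

    private final int samplePercentage;
    private final Random random;

    TraceSampler(int samplePercentage, Random random) {
        if (samplePercentage < 0 || samplePercentage > 100) {
            throw new IllegalArgumentException("samplePercentage must be between 0 and 100");
        }
        this.samplePercentage = samplePercentage;
        this.random = random;
    }

    // nextInt(100) is uniform over 0..99, so this returns true for
    // roughly samplePercentage percent of calls.
    boolean shouldSample() {
        return random.nextInt(100) < samplePercentage;
    }

    public static void main(String[] args) {
        TraceSampler sampler = new TraceSampler(1, new Random());
        int sampled = 0;
        int total = 100_000;
        for (int i = 0; i < total; i++) {
            if (sampler.shouldSample()) {
                sampled++;
            }
        }
        System.out.println("sampled " + sampled + " of " + total + " commands");
    }
}
```

At the default of 1, only about one command in a hundred produces a span, which is why a low rate keeps tracing overhead small in production.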
Metrics
openTelemetryConfig.metrics
- endpoint (required): The metrics collector endpoint.
Flush Interval
openTelemetryConfig.flushIntervalMs
- (optional): Time in milliseconds between flushes to the collector. Default: 5000.
File Exporter Details
If using file:// as the endpoint:
- The path must begin with file://.
- If a directory is provided (or no file extension), data is written to signals.json in that directory.
- If a filename is included, it will be used as-is.
- The parent directory must already exist.
- Data is appended, not overwritten.
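The path rules above can be sketched as a small resolver. This is an illustration only: `FileExporterPath` is a hypothetical class, and treating "has a dot in the last path segment" as "has a file extension" is an assumption made for the example, not a statement about how GLIDE detects extensions.

```java
// Hypothetical resolver for this example; not GLIDE's own implementation.
final class FileExporterPath {

    private static final String PREFIX = "file://";
    private static final String DEFAULT_FILE = "signals.json";

    // Returns the file that telemetry would be written to for a file:// endpoint.
    static String resolve(String endpoint) {
        if (!endpoint.startsWith(PREFIX)) {
            throw new IllegalArgumentException("File endpoint must begin with " + PREFIX);
        }
        String path = endpoint.substring(PREFIX.length());
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);

        // Assumption: a dot in the last segment means a filename with an extension.
        if (lastSegment.contains(".")) {
            return path;
        }
        // Directory, or a name without an extension: fall back to signals.json inside it.
        return path.endsWith("/") ? path + DEFAULT_FILE : path + "/" + DEFAULT_FILE;
    }

    public static void main(String[] args) {
        System.out.println(resolve("file:///tmp/otel"));
        System.out.println(resolve("file:///tmp/otel/traces.json"));
    }
}
```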
Validation Rules
- flushIntervalMs must be a positive integer.
- samplePercentage must be between 0 and 100.
- File exporter paths must start with file:// and have an existing parent directory.
- Invalid configuration will throw an error synchronously when calling OpenTelemetry.init().
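For readers who want to check a configuration before passing it to the client, the rules above can be approximated with a standalone validator. This is a sketch under assumptions: `OtelConfigValidator` is a hypothetical class, the use of `IllegalArgumentException` is a choice made for the example, and the exact error type and checks performed by OpenTelemetry.init() may differ.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check for this example; not GLIDE's own validation.
final class OtelConfigValidator {

    private static final String FILE_PREFIX = "file://";

    static void validate(long flushIntervalMs, int samplePercentage, String endpoint) {
        if (flushIntervalMs <= 0) {
            throw new IllegalArgumentException("flushIntervalMs must be a positive integer");
        }
        if (samplePercentage < 0 || samplePercentage > 100) {
            throw new IllegalArgumentException("samplePercentage must be between 0 and 100");
        }
        if (endpoint.startsWith(FILE_PREFIX)) {
            // The parent directory of the target file must already exist.
            Path target = Paths.get(endpoint.substring(FILE_PREFIX.length()));
            Path parent = target.getParent();
            if (parent == null || !Files.isDirectory(parent)) {
                throw new IllegalArgumentException("Parent directory does not exist: " + parent);
            }
        } else if (!endpoint.startsWith("http://")
                && !endpoint.startsWith("https://")
                && !endpoint.startsWith("grpc://")) {
            throw new IllegalArgumentException("Unsupported endpoint: " + endpoint);
        }
    }

    public static void main(String[] args) {
        // Documented defaults: flush every 5000 ms, sample 1% of commands.
        validate(5000L, 1, "http://localhost:4318/v1/traces");
        System.out.println("configuration accepted");
    }
}
```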
Full Example (Java)
    import glide.api.OpenTelemetry;

    OpenTelemetry.init(
        OpenTelemetry.OpenTelemetryConfig.builder()
            .traces(
                OpenTelemetry.TracesConfig.builder()
                    .endpoint("http://localhost:4318/v1/traces")
                    .samplePercentage(10) // Optional, defaults to 1. Can also be changed at runtime via setSamplePercentage().
                    .build()
            )
            .metrics(
                OpenTelemetry.MetricsConfig.builder()
                    .endpoint("http://localhost:4318/v1/metrics")
                    .build()
            )
            .flushIntervalMs(1000L) // Optional, defaults to 5000
            .build()
    );