Valkey 8.0: Delivering Enhanced Performance and Reliability

2024-08-02 · Ping Xie, Madelyn Olson

The Valkey community is proud to unveil the first release candidate of Valkey 8.0, a major update designed to enhance performance, reliability, and observability for all Valkey installations. In this blog post, we'll dive a bit deeper into each of these areas and talk about the exciting features we've built for this release.

Performance

Valkey 8.0 features significant improvements to the existing I/O threading system, allowing the main thread and I/O threads to operate concurrently. This release also includes a number of improvements to offload work to the I/O threads and introduces efficient batching of commands. Altogether, Valkey 8.0 is designed to handle up to 1.2 million Queries Per Second (QPS) on AWS's r7g platform, compared to the previous limit of 380K QPS. We'll dive deeper into these numbers in an upcoming blog.
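To put the two throughput figures side by side, here is a small back-of-the-envelope calculation. Only the two QPS numbers come from this post; the function name is made up for illustration, and the result says nothing about any particular workload.

```python
# The two figures below are the ones quoted in this post.
PREVIOUS_QPS = 380_000   # previous limit
NEW_QPS = 1_200_000      # Valkey 8.0 design target on AWS's r7g platform


def speedup(new_qps: int, old_qps: int) -> float:
    """Return how many times higher new_qps is than old_qps (illustrative helper)."""
    return new_qps / old_qps


ratio = speedup(NEW_QPS, PREVIOUS_QPS)
print(f"{ratio:.2f}x the previous throughput ({(ratio - 1) * 100:.0f}% increase)")
```

In other words, the quoted target is a little over three times the previous limit.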

NOTE: Not all improvements are available in the release candidate, but they will be available in the GA release of Valkey 8.0.

  • Asynchronous I/O Threading: Enables parallel processing of commands and I/O operations, maximizing throughput and minimizing bottlenecks.
  • Intelligent Core Utilization: Distributes I/O tasks across multiple cores based on real-time usage, reducing idle time and improving energy efficiency.
  • Command Batching: Optimizes memory access patterns by prefetching frequently accessed data to minimize CPU cache misses, reducing memory accesses required for dictionary operations.

For more details on these improvements, you can refer to #758 and #763.

Reliability

Cluster scaling operations via slot migrations have historically been delicate. Valkey 8.0 improves reliability and minimizes disruptions with the following enhancements:

  • Automatic Failover for Empty Shards: New shards that start empty, owning no slots, now benefit from automatic failover. This ensures high availability from the start of the scaling process.
  • Replication of Slot Migration States: All CLUSTER SETSLOT commands are now replicated synchronously to replicas before execution on the primary. This reduces the chance of unavailability if the primary fails, as the replicas have the most up-to-date information about the state of the shard. New replicas also automatically inherit the state from the primary without additional input from an operator.
  • Slot Migration State Recovery: In the event of a failover, Valkey 8.0 automatically updates the slot migration states on source and target nodes. This ensures requests are continuously routed to the correct primary in the target shard, maintaining cluster integrity and availability.
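The ordering described in the list above (replicate the slot state first, apply it on the primary second) can be pictured with a toy model. This is only a sketch under simplifying assumptions: the `Node` and `Shard` classes, their methods, and the plain `"MIGRATING"` string are hypothetical and are not Valkey's implementation, which lives in C and is described in #445.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node that remembers a migration state per slot."""
    name: str
    slot_states: dict = field(default_factory=dict)

    def apply_setslot(self, slot: int, state: str) -> bool:
        self.slot_states[slot] = state
        return True  # stands in for an acknowledgement sent back to the primary


@dataclass
class Shard:
    """Hypothetical shard: one primary plus its replicas."""
    primary: Node
    replicas: list

    def setslot(self, slot: int, state: str) -> None:
        # Step 1: send the state change to every replica and wait for the acks.
        acks = [replica.apply_setslot(slot, state) for replica in self.replicas]
        if not all(acks):
            raise RuntimeError("replica did not acknowledge; primary stays unchanged")
        # Step 2: only now apply the change on the primary.
        self.primary.apply_setslot(slot, state)

    def failover(self) -> Node:
        """Promote the first replica and return the failed primary."""
        failed = self.primary
        self.primary = self.replicas.pop(0)
        return failed


shard = Shard(Node("primary"), [Node("replica-1"), Node("replica-2")])
shard.setslot(42, "MIGRATING")
shard.failover()
print(shard.primary.name, shard.primary.slot_states)
```

Because the replicas hear about the change before the primary acts on it, the promoted replica in this toy model already knows that slot 42 is mid-migration.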

For more details on these improvements, you can refer to #445.

Replication

Valkey 8.0 introduces a dual-channel replication scheme, allowing the RDB and the replica backlog to be transferred simultaneously, accelerating synchronization.

  • Reduced Memory Load: By streaming replication data to the replica during the full sync, the primary node experiences significantly less memory pressure. The replica now manages the Client Output Buffer (COB) tracking, reducing the likelihood of COB overruns and enabling larger COB sizes on the replica side.
  • Reduced Parent Process Load: A dedicated connection for RDB transfer frees the primary's parent process from handling this data, allowing it to focus on client queries and improving overall responsiveness.

Performance tests show improvements in write latency during sync, and in scenarios with heavy read commands, the sync time can be cut by up to 50%. This translates to a more responsive system, even during synchronization.
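As a rough mental model of the dual-channel idea, the sketch below treats the snapshot and the stream of later writes as two separate inputs that the replica combines. All names here are hypothetical, and the real mechanism (connections, RDB loading, buffer limits) is far more involved; see #60 for the actual design.

```python
def finish_full_sync(snapshot: dict, buffered_writes: list) -> dict:
    """Replica side of the toy model: load the snapshot, then replay buffered writes."""
    dataset = dict(snapshot)             # channel 1: point-in-time snapshot (RDB-like)
    for key, value in buffered_writes:   # channel 2: writes received during the transfer
        dataset[key] = value
    return dataset


# Primary state at the moment the sync starts.
primary = {"a": 1, "b": 2}
snapshot = dict(primary)

# Writes that arrive while the snapshot is still in flight. In this model they
# are streamed right away and buffered on the replica, not held on the primary.
replica_buffer = []
for key, value in [("b", 20), ("c", 3)]:
    primary[key] = value
    replica_buffer.append((key, value))

replica = finish_full_sync(snapshot, replica_buffer)
print(replica == primary)
```

The point of the sketch is where the buffer lives: the writes accumulate on the replica, which is why the primary sees less memory pressure during a full sync.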

For more details on these improvements, you can refer to #60.

Observability

Valkey 8.0 introduces a comprehensive per-slot metrics infrastructure, providing detailed visibility into the performance and resource usage of individual slots. This granular data helps inform decisions about resource allocation, load balancing, and performance optimization.

  • Key Count: Returns the number of keys in each slot, making it easier to identify the slots with the largest number of keys.
  • CPU Usage: Tracks CPU time consumed by operations on each slot, identifying areas of high utilization and potential bottlenecks.
  • Network Input/Output Bytes: Monitors data transmission and reception by each slot, offering insights into network load and bandwidth utilization.
  • Minimal Overhead: Initial benchmarks show that enabling detailed metrics incurs a negligible overhead of approximately 0.7% in QPS.
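To make the metrics listed above concrete, here is a minimal sketch of how per-slot counters could be accumulated and ranked. The `SlotStats` fields, the `record` and `top_slots` helpers, and the sample numbers are all assumptions made for illustration; they are not the server's internal data structures or its command output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SlotStats:
    """Hypothetical per-slot counters mirroring the metrics listed above."""
    key_count: int = 0
    cpu_usec: int = 0
    network_bytes_in: int = 0
    network_bytes_out: int = 0


stats = defaultdict(SlotStats)


def record(slot: int, cpu_usec: int, bytes_in: int, bytes_out: int, new_keys: int = 0) -> None:
    """Add the cost of one command to the counters of the slot it touched."""
    entry = stats[slot]
    entry.key_count += new_keys
    entry.cpu_usec += cpu_usec
    entry.network_bytes_in += bytes_in
    entry.network_bytes_out += bytes_out


def top_slots(metric: str, limit: int) -> list:
    """Return the slot numbers with the highest value for the given metric."""
    ranked = sorted(stats.items(), key=lambda item: getattr(item[1], metric), reverse=True)
    return [slot for slot, _ in ranked[:limit]]


# Made-up sample traffic: two commands on slot 7, one large write on slot 9000.
record(7, cpu_usec=120, bytes_in=64, bytes_out=16, new_keys=1)
record(7, cpu_usec=80, bytes_in=32, bytes_out=8)
record(9000, cpu_usec=50, bytes_in=500, bytes_out=10, new_keys=1)

print(top_slots("cpu_usec", 1), top_slots("network_bytes_in", 1))
```

Ranking by different metrics surfaces different hot spots: in the sample data, slot 7 uses the most CPU time while slot 9000 receives the most bytes.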

For more details on these improvements, you can refer to #712, #720, and #771.

Efficiency

Valkey 8.0 introduces two improvements that reduce the memory overhead of keys, allowing users to store more data without any application changes. The first change is that keys are now embedded in the main dictionary, eliminating separate key pointers and significantly reducing memory overhead. This results in a 9-10% reduction in overall memory usage for scenarios with 16-byte keys and 8- or 16-byte values, along with performance improvements.

This release also introduces a new per-slot dictionary for Valkey cluster. It replaces a linked list that allowed operators to list all the keys in a slot for slot migration. The new architecture splits the main dictionary by slot, reducing the memory overhead by 16 bytes per key-value pair without degrading performance.
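The idea of splitting the keyspace by slot can be sketched in a few lines. The key-to-slot mapping below (CRC16 of the key modulo 16384, with hash tags) is background from the cluster specification, not something introduced by this release. The `SlottedKeyspace` class is a hypothetical toy, not the server's dictionary implementation.

```python
from collections import defaultdict

SLOT_COUNT = 16384


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16 (polynomial 0x1021, initial value 0), as used for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Map a key to its slot, hashing only a non-empty {hash tag} if one is present."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % SLOT_COUNT


class SlottedKeyspace:
    """Toy keyspace with one dictionary per slot (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.slots = defaultdict(dict)

    def set(self, key: bytes, value: bytes) -> None:
        self.slots[key_slot(key)][key] = value

    def keys_in_slot(self, slot: int) -> list:
        # No side structure is needed: the slot's own dictionary holds the keys.
        return sorted(self.slots.get(slot, {}))


keyspace = SlottedKeyspace()
keyspace.set(b"{user1000}.following", b"42")
keyspace.set(b"{user1000}.followers", b"7")
print(keyspace.keys_in_slot(key_slot(b"user1000")))
```

With one dictionary per slot, listing the keys of a slot during migration is a walk over that slot's dictionary, so no extra per-key bookkeeping is needed to group keys by slot.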

For more details on these improvements, you can refer to #541 and Redis#11695.

Additional Highlights

  • Dual IPv4 and IPv6 Stack Support: Seamlessly operate in mixed IP environments for enhanced compatibility and flexibility. See #736 for details.
  • Improved Pub/Sub Efficiency: Lightweight cluster messages streamline communication and reduce overhead for faster, more efficient Pub/Sub operations. See #654 for details.
  • Valkey Over RDMA (Experimental): Unlock significant performance improvements with direct memory access between clients and Valkey servers, delivering up to a 275% increase in throughput. See #477 for details.
  • Numerous Smaller Performance/Reliability Enhancements: Many under-the-hood improvements ensure a smoother, more stable experience across the board. See the release notes for details.

Conclusion

Valkey 8.0 is a major update that offers improved performance, reliability, and observability. Whether you are an experienced Valkey/Redis user or exploring it for the first time, this release provides significant advancements in in-memory data storage. You can try out these enhancements today by downloading from source or using one of our container images. We would love to hear your thoughts on these new features and what you hope to see in the future from the Valkey project.

Important Note: The Valkey Over RDMA feature is currently experimental and might change or be removed in future versions.

We look forward to seeing what you achieve with Valkey 8.0!

About the authors


Ping Xie

Ping Xie is a software engineer at Google Cloud. He is currently an active member of the Technical Steering Committee for Valkey. Ping values collaboration and is dedicated to delivering high-quality solutions in the distributed database area.


Madelyn Olson

Madelyn Olson is a software engineer at AWS and a member of the Valkey Technical Steering Committee. When she is not busy writing databases, she enjoys hiking in the serene nature of the Pacific Northwest.