 
Valkey 8.1: Continuing to Deliver Enhanced Performance and Reliability
The Valkey community is excited to unveil the new release of Valkey 8.1, a minor version update designed to further enhance performance, reliability, observability and usability over Valkey 8.0 for all Valkey installations.
In this blog, we'll dive a bit deeper into some of the new features in Valkey 8.1 and how they can benefit your applications.
Performance
Valkey 8.1 introduces several performance improvements that reduce latency, increase throughput, and lower memory usage.
The New Hashtable
The main changes responsible for several performance improvements is the new hashtable implementation that is used both as the main key-value store in Valkey and the implementations for the Hash, Set, and Sorted Set data types.
The new hashtable implementation is a complete rewrite of the previous hashtable. The new design adopts several modern design techniques to reduce the number of allocations to store each object, which reduces the number of random memory accesses while also saving memory.
The result is we observed a roughly 20 byte reduction per key-value pair for keys without a TTL, and up to a 30 byte reduction for key-value pairs with a TTL. The new implementation also helps improve the Valkey server throughput by roughly 10% compared to 8.0 version for pipeline workloads when I/O threading is not used.
You can learn more about the design and results in the dedicated blog post about the implementation.
Iterator Prefetching
Iterating over the key set keys is done in various scenarios, for example when a Valkey node needs to send all the keys and values to a newly connected replica.
In Valkey 8.1 the iteration functionality is improved by using memory prefetching techniques.
This means that when an element is going to be returned to the caller, the bucket and its elements have already been loaded into CPU cache when the previous bucket was being iterated.
This makes the iterator 3.5x faster than without prefetching, thus reducing the time it takes to send the data to a newly connected replica.
Commands like KEYS and also benefit from this optimization.
I/O Threads Improvements
Following up the I/O threads improvements added in 8.0, more operations have been offloaded to the I/O thread pool in the 8.1 release, improving the throughput and latency of some operations.
In the new release, TLS connections are now able to offload the TLS negotiation to I/O threads. This change improves the rate of accepting new connections by around 300%.
Other sources of overhead in the TLS connection handling were identified, namely in the calls to SSL_pending() and ERR_clear_error() functions, which were being called in the main event thread. By offloading these functions to the I/O threads pool, a throughput improvement was achieved in some operations. For instance, it was observed a 10% improvement in SET operations throughput, and a 22% improvement in GET operations throughput.
Replication traffic efficiency was also improved in 8.1 by offloading the reading of replication stream on the replicas to the I/O thread pool which means they can serve more read traffic. On the primaries, replication stream writes are now offloaded to the I/O thread pool.
Replication Improvements
Full syncs with TLS enabled are up to 18% faster by removing redundant CRC checksumming when using diskless replication.
The fork copy-on-write memory overhead is reduced by up to 47%
Sorted set and hyperloglog and bitcount optimizations
ZRANK command, which serves a popular usecase in operating Leaderboards, was optimized to perform up to 45% faster, depending on the sorted set size.
ZADD and other commands that involve floating point numbers are optimized by fast_float to parse floats using SIMD instructions.
This optimization requires a C++ compiler, and is currently an opt-in feature at compile time.
The probabilistic hyperloglog is another great data type, used for counting unique elements in very large datasets whilst using only 12KB of memory regardless of the amount of elements. By using the modern CPUs Advanced Vector Extensions of x86, Valkey 8.1 can achieve a 12x speed for the operations like PFMERGE and PFCOUNT on hyperloglog data types.
Similarly, the BITCOUNT operation has been improved up to 514% using AVX2 on x86.
Active Defrag Improvements
Active Defrag has been improved to eliminate latencies greater than 1ms (https://github.com/valkey-io/valkey/pull/1242). Defrag cycle time has been reduced to 500us (with increased frequency), resulting in much more predictable latencies, with a dramatic reduction in tail latencies.
An anti-starvation protection has also been introduced in the presence of long-running commands. If a slow command delays the defrag cycle, the defrag process will run proportionately longer to ensure that the configured CPU is achieved. Given the presence of slow commands, the proportional extra time is insignificant to latency.
Observability
There are also several improvements to the observability of the system behavior in Valkey 8.1.
Log Improvements
Valkey 8.1 brings new options to the format of the log file entries as well as the way timestamps are recorded in the log file. This makes it easier to consume the log files by log collecting systems.
The format of the log file entries is controlled by the log-format parameter, where the default is the existing format :
- legacy: the default, traditional log format
- logfmt: a structured log format; see https://www.brandur.org/logfmt
The formatting of the timestamp of the log file entries is controlled by the log-timestamp-format parameter, where the default is the existing format:
- legacy: default format
- iso8601: ISO 8601 extended date and time with time zone, of the form yyyy-mm-ddThh:mm:ss.sss±hh:mm
- milliseconds: milliseconds since the epoch
Note: using both the logfmt and iso8601 format uses around 60% more space, so disk space should be considered when implementing these.
Extending the Slowlog to Commandlog
Valkey has long had the capability to record slow commands at execution time based on the threshold set with the slowlog-log-slower-than parameter to keep the last slowlog-max-len entries. A useful tool in troubleshooting, it didn't take into account the overall round-trip to the application or the impact on network usage. With the addition of the new COMMANDLOG feature in Valkey 8.1, the recording of large requests and replies is now giving users great visibility in end-to-end latency.
Improved Latency Insights
Valkey has a built-in latency monitoring framework which samples latency-sensitive code paths such as for example fork when enabled through latency-monitor-threshold latency monitor.
The new feature adds to additional metrics to the LATENCY LATEST command that reports on the latest latency events that have been collected. The additional information in Valkey 8.1 reports on the total of the recorded latencies as well as the number of recorded spikes for this event. These additional fields allow users to better understand how often these latency events are occurring and the total impact they are causing to the system.
Extensibility
Valkey is already well known by its extensibility features. The sophisticated module system allows to extend the core system with new features developed as external modules.
Programmability
In Valkey 8.1 the module system API was extended with the support for developing new scripting engines as external modules.
This new API opens the door for the development of new language and runtime alternatives to the Lua base scripts supported by the Valkey core when using EVAL and FCALL commands.
In future releases of Valkey, we expect the emergence of new scripting engines. A good candidate is a scripting engine based on WASM, allowing EVAL scripts to be written in other languages than Lua and to be executed in a more secure sandbox environment.
There are also benefits for existing Lua scripts, since new Lua runtimes can be easily plugged in that provide better security properties and/or better performance.
Developers that intend to build new scripting engines for Valkey should check the Module API documentation.
Additional Highlights
Conditional Updates
This new functionality allows Valkey users to perform conditional updates using the SET command if the given comparison-value matches the key’s current value. This is a not only a quality-of-life improvement for developers as they no longer need to add this condition to their application code, it also saves a roundtrip to first get a value and then compare it before a SET. When using the optional GET as part of the SET IFEQ, the existing value is returned regardless whether it matches the comparison-value.
Conclusion
Valkey 8.1 continues the path of innovation and improvements, transparently bringing more performance and reliability to the user. We look forward to hearing what you achieve with Valkey 8.1! More detail can be found in release notes for the 8.1 GA release.
THANK YOU
We appreciate the efforts of all who contributed code to this release!
- Alan Scherger (flyinprogrammer),
- Amit Nagler (naglera),
- Basel Naamna (xbasel),
- Ben Totten (bentotten),
- Binbin (enjoy-binbin),
- Caiyi Wu (Codebells),
- Danish Mehmood (danish-mehmood),
- Eran Ifrah (eifrah-aws),
- Guillaume Koenig (knggk),
- Harkrishn Patro (hpatro),
- Jacob Murphy (murphyjacob4),
- Jim Brunner (JimB123),
- Josef Šimánek (simi),
- Jungwoo Song (bluayer),
- Karthick Ariyaratnam (karthyuom),
- Karthik Subbarao (KarthikSubbarao),
- Lipeng Zhu (lipzhu),
- Madelyn Olson (madolson),
- Masahiro Ide (imasahiro),
- Melroy van den Berg (melroy89),
- Mikhail Koviazin (mkmkme),
- Nadav Gigi (NadavGigi),
- Nadav Levanoni (nadav-levanoni),
- Nikhil Manglore (Nikhil-Manglore),
- Parth Patel (parthpatel),
- Pierre (pieturin),
- Ping Xie (PingXie),
- Qu Chen (QuChen88),
- Rain Valentine (SoftlyRaining),
- Ran Shidlansik (ranshid),
- Ray Cao (RayaCoo),
- Ricardo Dias (rjd15372),
- Romain Geissler (Romain-Geissler-1A),
- Roman Gershman (romange),
- Roshan Khatri (roshkhatri),
- Rueian (rueian),
- Sarthak Aggarwal (sarthakaggarwal97),
- Seungmin Lee (sungming2),
- Shai Zarka (zarkash-aws),
- Shivshankar (Shivshankar-Reddy),
- Simon Baatz (gmbnomis),
- Sinkevich Artem (ArtSin),
- Stav Ben-Tov (stav-bentov),
- Stefan Mueller (muelstefamzn),
- Tal Shachar (talxsha),
- Thalia Archibald (thaliaarchi),
- Uri Yagelnik (uriyage),
- Vadym Khoptynets (poiuj),
- Viktor Szépe (szepeviktor),
- Viktor Söderqvist (zuiderkwast),
- Vu Diep (vudiep411),
- Wen Hui (hwware),
- Xuyang WANG (Nugine),
- Yanqi Lv (lyq2333),
- Yury Fridlyand (Yury-Fridlyand),
- Zvi Schneider (zvi-code),
- bodong.ybd (yangbodong22011),
- chx9,
- kronwerk,
- otheng (otheng03),
- secwall,
- skyfirelee (artikell),
- xingbowang (xingbowang),
- zhaozhao.zz (soloestoy),
- zhenwei pi(pizhenwei),
- zixuan zhao (azuredream),
- 烈香 (hengyoush),
- 风去幽墨 (fengquyoumo)
