Valkey-Search

Valkey-Search (BSD-3-Clause), provided as a Valkey module, is a high-performance Vector Similarity Search engine optimized for AI-driven workloads. It delivers single-digit millisecond latency and high QPS, capable of handling billions of vectors with over 99% recall.

Valkey-Search allows users to create indexes and perform similarity searches, incorporating complex filters. It supports Approximate Nearest Neighbor (ANN) search with HNSW and exact matching using K-Nearest Neighbors (KNN). Users can index data using either Valkey Hash or Valkey-JSON data types.

While Valkey-Search currently focuses on Vector Search, its goal is to extend Valkey into a full-fledged search engine, supporting full-text search and additional indexing options.

Use-Cases Where Valkey-Search Shines

Valkey-Search's ability to search billions of vectors with millisecond latencies makes it ideal for real-time applications such as:

Personalized Recommendations – Deliver instant, highly relevant recommendations based on real-time user interactions.
Fraud Detection & Security – Identify anomalies and suspicious activity with ultra-fast similarity matching.
Conversational AI & Chatbots – Enhance response accuracy and relevance by leveraging rapid vector-based retrieval.
Image & Video Search – Enable multimedia search through real-time similarity detection.
GenAI & Semantic Search – Power advanced AI applications with efficient vector retrieval for natural language understanding.

Supported Commands

FT.CREATE
FT.DROPINDEX
FT.INFO
FT._LIST
FT.SEARCH

For a detailed description of the supported commands, examples and configuration options, see the Command Reference.

Scaling

Valkey-Search supports both Standalone and Cluster modes. Query processing and ingestion scale linearly with CPU cores in both modes. For large storage requirements, users can leverage Cluster mode for horizontal scaling of the keyspace.

If replica lag is acceptable, users can achieve horizontal query scaling by directing clients to read from replicas.

Hybrid Queries

Valkey-Search supports hybrid queries, combining vector similarity search with filtering on indexed fields, such as Numeric and Tag indexes.

Tag index

Tags are text fields that are interpreted as a list of tags delimited by a separator character. Generally, tags are small sets of values with finite possible values like color, book genre, city name, or author.

Only indexed fields can be used as a tag filter.
TAG fields are tokenized by a separator character, which is a comma "," by default but configurable during index creation.
Only prefix, exact pre filters can be performed on a tag field. Suffix, infix queries are not supported.
By default, tags are case insensitive. For example, "Blue" and "BLUE" both will be indexed as "blue" and will yield the same result in a hybrid query.
Empty strings are neither indexed or queried.
During indexing and querying, any trailing whitespace is removed.

Syntax

Below are some examples of building filter query on a field named: color.

Here { and } are part of syntax and | is used as a OR operator to support multiple tags, general syntax is:

@<field_name>:{<tag>}
or
@<field_name>:{<tag1> | <tag2>}
or
@<field_name>:{<tag1> | <tag2> | ...}

For example, the following query will return documents with blue OR black OR green color.

@color:{blue | black | green}

As another example, the following query will return documents containing "hello world" or "hello universe"

@color:{hello world | hello universe}

Numeric Index

Numeric indexes allow for filtering queries to only return values that are in between a given start and end value.

Both inclusive and exclusive queries are supported.
For open ended queries, +inf, -inf can be used to express start and end ranges.

As an example, the following query will return books published between 2021 and 2024 (Both inclusive). The equivalent mathematical expression is 2021 <= year <= 2024.

"@year:[2021 2024]"

While The following query will return books published between 2021 (exclusive) and 2024 (inclusive). The equivalent mathematical expression is 2021 < year <= 2024:

"@year:[(2021 2024]"

The following query will return books published before 2024 (inclusive). The equivalent mathematical expression is year <= 2024:

@year:[(-inf 2024]

The following query will return books published after 2015 (exclusive). The equivalent mathematical expression is year >= 2015:

@year:[2015 +inf]

Query planner

A query that utilizes a filter expression to filter results is called a hybrid query. Any combination of tag and numeric indexes can form a hybrid query.

Pre-filtering: Pre-filtering relies on secondary indexes (e.g. tag, numeric) to first find the matches to the filter expression regardless of vector similarity. Once the filtered results are calculated a brute-force search is performed to sort by vector similarity.
Inline-filtering: Inline-filtering performs the vector search algorithm (e.g. HNSW), ignoring found vectors which don't match the filter.

Pre-filtering is faster when the filtered search space is much smaller than the original search space. When the filtered search space is large, inline-filtering becomes faster. The query planner for Valkey-Search automatically chooses between the two strategies based on the provided filter.

Monitoring

To check the server's overall search metrics, you can use the INFO SEARCH or INFO MODULES commands.

The following metrics are added to the INFO command's output:

search_used_memory_human: A human-friendly readable version of the search_used_memory_bytes metric
search_used_memory_bytes: The total bytes of memory that all indexes occupy
search_number_of_indexes: Index schema total count
search_number_of_attributes: Total count of attributes for all indexes
search_total_indexed_documents: Total count of all keys for all indexes
search_background_indexing_status (String) The status of the indexing process. NO_ACTIVITY indicates idle indexing
search_failure_requests_count: A count of all failed requests, including syntax errors
search_successful_requests_count: A count of all successful requests
search_hnsw_create_exceptions_count: Count of HNSW creation unexpected errors
search_hnsw_search_exceptions_count: Count of HNSW search unexpected errors
search_hnsw_remove_exceptions_count: Count of HNSW removal unexpected errors
search_hnsw_add_exceptions_count: Count of HNSW addition unexpected errors
search_hnsw_modify_exceptions_count: Count of HNSW modification unexpected errors
search_modify_subscription_skipped_count: Count of skipped subscription modifications
search_remove_subscription_successful_count: Count of successful subscription removals
search_remove_subscription_skipped_count: Count of skipped subscription removals
search_remove_subscription_failure_count: Count of failed subscription removals
search_add_subscription_successful_count: Count of successfully added subscriptions
search_add_subscription_failure_count: Count of failures of adding subscriptions
search_add_subscription_skipped_count: Count of skipped subscription adding processes
search_modify_subscription_failure_count: Count of failed subscription modifications
search_modify_subscription_successful_count: Count of successful subscription modifications

Configuration

Static configuration

The following list of configurations can be passed to the loadmodule command:

--reader-threads: (Optional) Controls the amount of threads executing queries. (Default: number of physical CPU cores on the host machine)
--writer-threads: (Optional) Controls the amount of threads processing index mutations. (Default: number of physical CPU cores on the host machine)
--use-coordinator: (Optional) Cluster mode enabler. Default: false.
--hnsw-block-size: (Optional) Specifies the allocation block size used by the HNSW graph for storing new vectors. Larger block sizes may improve performance by enhancing CPU cache efficiency, but come at the cost of increased memory usage due to pre-allocation for potential future growth. (Default: 10K)
--log-level Controls the log verbosity level. Possible values are: debug, verbose, notice and warning. (Default: Valkey's log level)

Runtime configuration

The following list of configurations can be modified at runtime using the CONFIG SET command:

search.hnsw-block-size:: Specifies the allocation block size used by the HNSW graph for storing new vectors. Larger block sizes may improve performance by enhancing CPU cache efficiency, but come at the cost of increased memory usage due to pre-allocation for potential future growth. (Default: 10K)