Vector Search Plug-in User Guide

Last Updated: 2020-09-21

The Elasticsearch vector retrieval plug-in is developed by the Baidu Elasticsearch team. It provides fast vector retrieval and vector computation.

Background

In recent years, vector retrieval based on text (document) embeddings and feature vectors has been widely applied in recommender systems and image similarity search. You can use Word2vec and other tools to map images, audio, natural language, and other complex data to feature vectors, and then search those vectors with a vector retrieval algorithm. To process vector data, the Baidu Elasticsearch vector retrieval plug-in provides two vector retrieval algorithms: linear and hnsw.

Algorithm	Meaning	Applicable Scenario	Disadvantages	Supported distance algorithms
linear	Linear computation over all vector data	The recall rate is 100% and the query time is proportional to the data volume. It is usually used as a baseline for comparing results.	Efficiency is low with large data volumes. CPU-intensive. All data is held in memory.	Cosine distance (cosine), Euclidean distance (l2), dot product (dot_prod)
hnsw	Approximate computation based on the hnsw algorithm	The data volume per machine is small, and the requirements on recall rate and query speed are high.	The data expansion ratio is high. The index must be built after the data is written. All data is held in memory.	Cosine distance (cosine), Euclidean distance (l2)
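To make the linear algorithm concrete, here is a minimal Python sketch of a brute-force scan using the three measures named in the table. It only illustrates the general technique: the function names are hypothetical, and how the plug-in turns these measures into Elasticsearch scores is not stated in this guide.

```python
import math
from typing import List, Sequence, Tuple


def dot_prod(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 1.0 means the vectors point the same way."""
    norm = math.sqrt(dot_prod(a, a)) * math.sqrt(dot_prod(b, b))
    return dot_prod(a, b) / norm if norm else 0.0


def linear_search(query: Sequence[float], vectors: List[Sequence[float]],
                  k: int, space: str = "cosine") -> List[Tuple[int, float]]:
    """Compare the query with every stored vector and return the k best
    (position, value) pairs. Cost grows linearly with len(vectors)."""
    if space == "l2":
        scored = [(i, l2(query, v)) for i, v in enumerate(vectors)]
        scored.sort(key=lambda pair: pair[1])  # smaller distance is better
    elif space in ("cosine", "dot_prod"):
        measure = cosine if space == "cosine" else dot_prod
        scored = [(i, measure(query, v)) for i, v in enumerate(vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # larger is better
    else:
        raise ValueError(f"unsupported space: {space!r}")
    return scored[:k]
```

Because every vector is visited, the result is exact (100% recall), which is why the linear algorithm serves as the baseline when measuring hnsw recall.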

Cluster Preparation

	Recommendation	Description
Cluster selection	At least 16 GB of memory	Vector retrieval places high demands on cluster memory. If the data volume exceeds 10 GB, a 16-core package with 64 GB or more is recommended, such as the computation 2, computation 3, or storage 3 type.
Single machine data volume	Should not exceed one third of the node's total memory.	Vector retrieval places high demands on cluster memory.
Write traffic limit	For a computation 2 type node (16 cores, 64 GB), keep the write traffic of a single node within 4000 tps.	Building a vector index is CPU-intensive, so avoid heavy write traffic. Because all the data is loaded into system memory during queries, avoid heavy writes while queries are running.
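The one-third rule in the table can be checked with a few lines of Python. This is only a sizing sketch: the helper names are hypothetical, and the 4 bytes per vector component is an assumption (32-bit floats) that gives a lower bound, since it ignores index overhead such as the hnsw data expansion mentioned earlier.

```python
def fits_on_node(data_gb: float, node_memory_gb: float) -> bool:
    """True when the data volume is at most one third of the node memory."""
    return data_gb * 3 <= node_memory_gb


def raw_vector_size_gb(num_vectors: int, dims: int, bytes_per_value: int = 4) -> float:
    """Raw size of the vector values alone, in GiB.
    bytes_per_value=4 assumes 32-bit floats (an assumption, not from the guide)."""
    return num_vectors * dims * bytes_per_value / 1024 ** 3


if __name__ == "__main__":
    size = raw_vector_size_gb(10_000_000, 32)
    print(f"raw vectors: {size:.2f} GiB, fits on a 64 GB node: {fits_on_node(size, 64)}")
```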

Method of Application

Before writing data, configure the knn parameters according to the vector dimensions and performance requirements of your business, select the distance algorithm, and create the knn index. Once the index is created you can write data, and then run vector retrieval queries using the query modes described below.

Create the knn index

The knn index must be created in advance.

The following example creates an index named test-index with two vector fields, field1 and field2. You can choose your own index and field names.

PUT /test-index 
{ 
    "settings": { 
      "index": { 
         "codec": "bpack_knn_hnsw", 
         "bpack.knn.hnsw.space": "cosine", 
         "bpack.knn.hnsw.m": 16,
         "bpack.knn.hnsw.ef_construction": 512 
      } 
   }, 
   "mappings": { 
      "properties": { 
         "field1": { 
            "type": "bpack_vector", 
            "dims": 2 
         }, 
         "field2": { 
            "type": "bpack_knn_vector", 
            "dims": 2 
         } 
      } 
   } 
}

Parameter	Description
index.codec	Set to `bpack_knn_hnsw` to support both the hnsw and linear algorithms. Otherwise only the linear algorithm is supported.
type	The vector retrieval plug-in provides two new vector field types, `bpack_vector` and `bpack_knn_vector`. `bpack_vector` is a common vector field and supports the linear algorithm. `bpack_knn_vector` is a vector search field and supports both the linear and hnsw algorithms.
dims	Vector dimension. 2 to 2048 dimensions are supported.

The meaning of the bpack.knn.hnsw parameters in settings is described under "Index level parameter" below.
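When the index is created from application code, the request body can be assembled and checked before it is sent. The following Python sketch builds the same body as the example above and enforces the field types and the 2 to 2048 dimension range from the table. The helper name is hypothetical, and sending the request is left to whatever HTTP client you use.

```python
import json
from typing import Dict, Tuple

VECTOR_TYPES = ("bpack_vector", "bpack_knn_vector")


def build_knn_index_body(fields: Dict[str, Tuple[str, int]],
                         space: str = "cosine", m: int = 16,
                         ef_construction: int = 512) -> dict:
    """Build the body for PUT /<index>. `fields` maps name -> (type, dims)."""
    properties = {}
    for name, (field_type, dims) in fields.items():
        if field_type not in VECTOR_TYPES:
            raise ValueError(f"{name}: unknown vector type {field_type!r}")
        if not 2 <= dims <= 2048:
            raise ValueError(f"{name}: dims must be between 2 and 2048, got {dims}")
        properties[name] = {"type": field_type, "dims": dims}
    return {
        "settings": {
            "index": {
                "codec": "bpack_knn_hnsw",
                "bpack.knn.hnsw.space": space,
                "bpack.knn.hnsw.m": m,
                "bpack.knn.hnsw.ef_construction": ef_construction,
            }
        },
        "mappings": {"properties": properties},
    }


if __name__ == "__main__":
    body = build_knn_index_body({"field1": ("bpack_vector", 2),
                                 "field2": ("bpack_knn_vector", 2)})
    print(json.dumps(body, indent=2))
```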

Write and query data

Write data

Write data to the _doc endpoint of the test-index index created above. For example:

POST /test-index/_doc/ 
{ 
    "field1" : [6.5, 2.5], 
    "field2" : [6.5, 2.5], 
    "price" : 10 
}

Here field1 is the bpack_vector field and field2 is the bpack_knn_vector field defined above; price stands for any other ordinary field.
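For more than a handful of documents, the standard Elasticsearch _bulk API is usually preferable to single-document requests. The sketch below only builds the newline-delimited request body for POST /test-index/_bulk. The helper name is hypothetical, and it is an assumption that vector fields behave like any other field under bulk indexing; keep the write traffic limit from the cluster preparation table in mind.

```python
import json
from typing import Iterable


def build_bulk_body(docs: Iterable[dict]) -> str:
    """Build an NDJSON body for POST /<index>/_bulk.
    Each document is preceded by an index action line; the body must end
    with a newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # let Elasticsearch assign the _id
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [
        {"field1": [6.5, 2.5], "field2": [6.5, 2.5], "price": 10},
        {"field1": [1.5, 4.0], "field2": [1.5, 4.0], "price": 12},
    ]
    print(build_bulk_body(docs), end="")
```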

Once the data is indexed, you can query it as follows:

linear query

The linear algorithm can query both bpack_knn_vector and bpack_vector fields. The following example queries field1, which is of type bpack_vector.

POST /test-index/_search 
{ 
   "query": { 
      "script_score": { 
         "query": { 
            "match_all": {} 
         }, 
         "script": { 
            "source": "bpack_knn_script", 
            "lang": "knn", 
            "params": { 
               "space": "cosine", 
               "field": "field1", 
               "vector": [3.5, 2.5] 
            } 
         } 
      } 
   }, 
   "size": 100 
} 

Or:

POST /test-index/_search 
{ 
  "query": { 
    "function_score": { 
      "boost_mode": "replace", 
      "script_score": { 
        "script": { 
          "source": "bpack_knn_script", 
          "lang": "knn", 
          "params": { 
            "space": "cosine", 
            "field": "field1", 
            "vector": [3.5, 2.5] 
          } 
        } 
      } 
    } 
  }, 
  "size": 100 
}

The query parameters are as follows:

Parameter	Description	Default value
source	Selects the computing method. Set it to `bpack_knn_script`.	Required
space	Distance algorithm. The linear algorithm supports three distance algorithms: cosine distance (cosine), dot product (dot_prod), and Euclidean distance (l2).	cosine
field	Vector field name.	Required
vector	Float array. Its length must match the dims specified in the field mapping when the index was created.	Required
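A small builder can catch the most common mistake, a query vector whose length differs from the mapped dims, before the request is sent. This Python sketch reproduces the script_score body from the first example above. The function name and the dims argument are hypothetical; the caller is assumed to know the dims of the target field.

```python
from typing import Dict, List

ALLOWED_SPACES = {"cosine", "dot_prod", "l2"}


def build_linear_query(field: str, vector: List[float], dims: int,
                       space: str = "cosine", size: int = 100) -> Dict:
    """Build a script_score request body for a linear vector query."""
    if space not in ALLOWED_SPACES:
        raise ValueError(f"unsupported space: {space!r}")
    if len(vector) != dims:
        raise ValueError(f"vector has {len(vector)} values, mapping expects {dims}")
    return {
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "bpack_knn_script",
                    "lang": "knn",
                    "params": {"space": space, "field": field, "vector": vector},
                },
            }
        },
        "size": size,
    }
```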

hnsw query

To query with hnsw, the index must set index.codec to bpack_knn_hnsw, and the vector field being queried must be mapped with type bpack_knn_vector. The following example queries field2, which is of type bpack_knn_vector.

POST /test-index/_search 
{ 
    "size" : 10, 
    "query": { 
        "knn": { 
            "field2": { 
                "vector": [3, 4], 
                "k": 2, 
                "ef": 512 
            } 
        } 
    } 
}

The query parameters are as follows:

Parameter	Description	Default value
vector	Float array. Its length must match the dims specified in the field mapping when the index was created; otherwise the results may be wrong.	Required
k	Number of nearest neighbors to return from the hnsw search. Must be a positive integer.	Required
ef	Size of the dynamic candidate list used during the search. A higher value gives higher accuracy but slower queries. The value range is [2,1024].	512
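The constraints in this table are easy to check on the client side. The following Python sketch builds the knn query body shown above and validates vector length, k, and ef against the documented rules. The function name and the dims argument are hypothetical.

```python
from typing import Dict, List


def build_hnsw_query(field: str, vector: List[float], dims: int, k: int,
                     ef: int = 512, size: int = 10) -> Dict:
    """Build a knn request body for an hnsw query on a bpack_knn_vector field."""
    if len(vector) != dims:
        raise ValueError(f"vector has {len(vector)} values, mapping expects {dims}")
    if not isinstance(k, int) or k <= 0:
        raise ValueError("k must be a positive integer")
    if not 2 <= ef <= 1024:
        raise ValueError("ef must be in the range [2, 1024]")
    return {
        "size": size,
        "query": {"knn": {field: {"vector": vector, "k": k, "ef": ef}}},
    }
```

Raising ef trades query speed for accuracy, so it is worth tuning against a recall measurement rather than setting it to the maximum.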

Parameter Optimization

Index level parameter

Index-level settings must be specified when the index is created; any setting that is omitted takes its default value. These settings are static, so they cannot be changed after the index is created. The parameters are described below:

Parameter	Description	Default value
bpack.knn.hnsw.m	Number of two-way links created for each new element during graph construction. The reasonable range of m is 2-100. It mainly affects memory, storage consumption, and accuracy. A higher m means more memory and storage, slower index building, and higher accuracy. It is recommended to take values according to (vector dimension * 1.5) to guarantee performance. Values of 12-48 satisfy most scenarios.	16
bpack.knn.hnsw.space	Distance algorithm used for vector retrieval. hnsw supports two distance algorithms: cosine distance (cosine) and Euclidean distance (l2).	cosine
bpack.knn.hnsw.ef_construction	Size of the dynamic candidate list used while building the index. A higher value gives higher query accuracy but slower index building. The value range is [2,1024].	512
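Because these settings cannot be changed once the index exists, it can help to resolve and validate them before the create request is issued. The Python sketch below merges overrides with the documented defaults and checks the documented ranges. The helper name is hypothetical, and treating the "reasonable range" of m as a hard limit is an assumption of this sketch.

```python
from typing import Dict, Optional

HNSW_DEFAULTS = {
    "bpack.knn.hnsw.m": 16,
    "bpack.knn.hnsw.space": "cosine",
    "bpack.knn.hnsw.ef_construction": 512,
}


def resolve_hnsw_settings(overrides: Optional[Dict] = None) -> Dict:
    """Merge overrides with the documented defaults and validate the result."""
    settings = {**HNSW_DEFAULTS, **(overrides or {})}
    unknown = set(settings) - set(HNSW_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    if not 2 <= settings["bpack.knn.hnsw.m"] <= 100:
        raise ValueError("m should be between 2 and 100")
    if settings["bpack.knn.hnsw.space"] not in ("cosine", "l2"):
        raise ValueError("hnsw supports only cosine and l2")
    if not 2 <= settings["bpack.knn.hnsw.ef_construction"] <= 1024:
        raise ValueError("ef_construction must be in the range [2, 1024]")
    return settings
```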

Cluster level parameters

Common parameters

Parameter	Description	Default value
bpack.knn.hnsw.index_thread_qty	Number of threads HNSW may use to build graphs. By default, nmslib sets this value to the number of cores n. However, Elasticsearch can itself create n indexing threads. If each of them calls nmslib to build graphs and each call spawns n threads, up to n^2 threads may run at once and drive CPU utilization to 100%. The default is therefore 1. The value range is [1,32].	1
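The n^2 argument in the description is simple multiplication, shown here as a Python sketch. The function name is hypothetical, and the model (every Elasticsearch indexing thread building a graph at the same moment) is the worst case described above, not a measurement.

```python
def worst_case_build_threads(es_index_threads: int, index_thread_qty: int) -> int:
    """Upper bound on concurrent graph-building threads when every
    Elasticsearch indexing thread calls the graph builder at once."""
    return es_index_threads * index_thread_qty


if __name__ == "__main__":
    cores = 16
    # nmslib's own default (one thread per core) on a 16-core node
    print(worst_case_build_threads(cores, cores))
    # the plug-in default of 1
    print(worst_case_build_threads(cores, 1))
```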

Cache settings

Settings of cache parameters of linear algorithm

Parameter	Description	Default value
bpack.knn.memory.cache.limit	Maximum capacity of the cache. When loading data into the cache would exceed this limit, eviction is triggered. The value can be a percentage of the JVM memory, or an absolute size with a unit, such as "10kb", "10mb", or "3g". Fractional values such as "1.5g" are not recommended.	10%
bpack.knn.memory.cache.expiry.time	Idle time after which data that has not been accessed is cleared from the cache. It is expressed in TimeUnit format, such as "10s", "10m", or "3h". Fractional values such as "1.5h" are not allowed. A value above 30 minutes is generally recommended so that cached data can be hit by subsequent queries; with too small a value, data is cleared too quickly.	30m

Settings of cache parameters of hnsw algorithm

Parameter	Description	Default value
bpack.knn.cache.item.expiry.time	Idle time after which data that has not been accessed is cleared from the cache. It is expressed in TimeUnit format, such as "10s", "10m", or "3h". Fractional values such as "1.5h" are not allowed. A value above 30 minutes is generally recommended so that cached data can be hit by subsequent queries; with too small a value, data is cleared too quickly.	180m

Settings of Circuit Breaker

The hnsw algorithm consumes a lot of off-heap (out-of-core) memory, that is, memory outside the Elasticsearch JVM. If it consumes too much, the page cache available to Elasticsearch/Lucene becomes insufficient and cluster performance declines. To avoid this, you can configure the Circuit Breaker to limit off-heap memory consumption. Currently, when memory usage reaches the configured breaker limit, the eviction mechanism is triggered and evicts the cache items that are used least.

Parameter	Description	Default value
bpack.knn.memory.circuit_breaker.limit	Maximum capacity of the hnsw cache. When the hnsw cache exceeds this limit, eviction is triggered and the circuit_breaker_triggered status is set to true (it can be read through the statistics API). The value can be a percentage of the server memory remaining after the Elasticsearch JVM is excluded, or an absolute size with a unit, such as "10kb", "10mb", or "3g". Fractional values such as "1.5g" are not recommended. For example, if a machine has 100 GB of memory and the Elasticsearch JVM uses 32 GB, the default limit is 60% * (100 - 32) = 40.8 GB.	60%
bpack.knn.circuit_breaker.unset.percentage	Threshold at which the Circuit Breaker is reset. When the cache capacity falls below bpack.knn.circuit_breaker.unset.percentage, the Circuit Breaker is released and the circuit_breaker_triggered status is set to false (it can be read through the statistics API).	75
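The percentage form of bpack.knn.memory.circuit_breaker.limit is relative to the memory left after the JVM, which is easy to misread. This small Python sketch reproduces the arithmetic of the example in the table; the function name is hypothetical.

```python
def breaker_limit_gb(total_memory_gb: float, jvm_memory_gb: float,
                     limit_percent: float) -> float:
    """Absolute breaker limit: a percentage of the memory that remains
    after the Elasticsearch JVM is excluded."""
    return (total_memory_gb - jvm_memory_gb) * limit_percent / 100


if __name__ == "__main__":
    # 100 GB machine, 32 GB JVM, default limit of 60%
    print(f"{breaker_limit_gb(100, 32, 60):.1f} GB")
```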

Example

PUT /_cluster/settings 
{ 
    "persistent" : { 
        "bpack.knn.hnsw.index_thread_qty" : 1, 
        "bpack.knn.cache.item.expiry.time": "15m", 
        "bpack.knn.memory.cache.limit": "1g", 
        "bpack.knn.memory.cache.expiry.time":"10m", 
        "bpack.knn.memory.circuit_breaker.limit" : "55%", 
        "bpack.knn.circuit_breaker.unset.percentage": 23 
    } 
}

View the related statistical information of hnsw algorithm

Query the statistics as follows:

GET /_bpack/_knn/stats 
GET /_bpack/_knn/nodeId1,nodeId2/stats/statName1,statName2

An example result:

{ 
   "_nodes": { 
      "total": 1, 
      "successful": 1, 
      "failed": 0 
   }, 
   "cluster_name": "my-application",
   "circuit_breaker_triggered": false, 
   "nodes": { 
      "HYMrXXsBSamUkcAjhjeN0w": { 
         "eviction_count" : 0,
         "miss_count" : 1,
         "graph_memory_usage_kb" : 1,
         "cache_capacity_reached" : false, 
         "load_exception_count" : 0,
         "hit_count" : 0,
         "load_success_count" : 1,
         "total_load_time_nanos" : 2878745
      } 
   } 
}

Cluster status parameter:

Parameter	Description
circuit_breaker_triggered	Indicates whether the circuit breaker is triggered. If any node in the cluster evicts items from the cache because it has reached the cache capacity, the circuit breaker is triggered. When the number of items in the cache drops below bpack.knn.circuit_breaker.unset.percentage, the circuit breaker is released.

Node status parameter:

Parameter	Description
eviction_count	Number of evictions from the Guava cache (evictions caused by index deletion are not counted).
hit_count	Number of cache hits on the node.
miss_count	Number of cache misses on the node.
graph_memory_usage_kb	Total size of the cached data in memory on the node, in KB.
cache_capacity_reached	Whether the cache capacity of this node has been reached.
load_exception_count	Number of exceptions that occurred while loading into the cache.
load_success_count	Number of successful loads into the cache.
total_load_time_nanos	Total time spent loading into the cache, in nanoseconds.
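The raw counters are easier to read once they are turned into ratios. The following Python sketch summarizes a stats response of the shape shown above. The function names are hypothetical, and dividing the total load time by successful plus failed loads is an assumption about how the counter is accumulated.

```python
from typing import Dict


def cache_hit_ratio(node_stats: Dict) -> float:
    """Fraction of cache lookups that were hits (0.0 when there were none)."""
    total = node_stats["hit_count"] + node_stats["miss_count"]
    return node_stats["hit_count"] / total if total else 0.0


def summarize_stats(stats: Dict) -> Dict[str, Dict[str, float]]:
    """Per-node summary: hit ratio, cache size in MB, average load time in ms."""
    summary = {}
    for node_id, node in stats["nodes"].items():
        loads = node["load_success_count"] + node["load_exception_count"]
        summary[node_id] = {
            "hit_ratio": cache_hit_ratio(node),
            "graph_memory_mb": node["graph_memory_usage_kb"] / 1024,
            "avg_load_ms": node["total_load_time_nanos"] / loads / 1e6 if loads else 0.0,
        }
    return summary


if __name__ == "__main__":
    sample = {
        "circuit_breaker_triggered": False,
        "nodes": {
            "HYMrXXsBSamUkcAjhjeN0w": {
                "eviction_count": 0, "miss_count": 1, "graph_memory_usage_kb": 1,
                "cache_capacity_reached": False, "load_exception_count": 0,
                "hit_count": 0, "load_success_count": 1,
                "total_load_time_nanos": 2878745,
            }
        },
    }
    print(summarize_stats(sample))
```

A persistently low hit ratio together with a rising eviction_count suggests the cache limit is too small for the working set.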

Performance Comparison

Memory configuration: 30G
CPU configuration: 56 logical cores, 2 physical CPUs, 14 cores per CPU
Elasticsearch nodes: single node

The performance comparison results are as below:

Data size	Index parameters	Cluster parameters	Top30 recall rate	Average query time of hnsw	Average query time of linear
1 million 32-dimensional vectors, 1 shard	"bpack.knn.hnsw.space": "cosine", "bpack.knn.hnsw.m": 16, "bpack.knn.hnsw.ef_construction": 300	"bpack.knn.cache.item.expiry.time": "1h", "bpack.knn.memory.cache.limit": "15g", "bpack.knn.memory.cache.expiry.time":"1h", "bpack.knn.memory.circuit_breaker.limit" : "70%"	99.97%	12.96ms	134.96ms
10 million 32-dimensional vectors, 1 shard	"bpack.knn.hnsw.space": "cosine", "bpack.knn.hnsw.m": 16, "bpack.knn.hnsw.ef_construction": 600	"bpack.knn.cache.item.expiry.time": "1h", "bpack.knn.memory.cache.limit": "15g", "bpack.knn.memory.cache.expiry.time":"1h", "bpack.knn.memory.circuit_breaker.limit" : "70%"	99.97%	24.69ms	1209.13ms
10 million 32-dimensional vectors, 16 shards	"bpack.knn.hnsw.space": "cosine", "bpack.knn.hnsw.m": 48, "bpack.knn.hnsw.ef_construction": 600	"bpack.knn.cache.item.expiry.time": "1h", "bpack.knn.memory.cache.limit": "15g", "bpack.knn.memory.cache.expiry.time":"1h", "bpack.knn.memory.circuit_breaker.limit" : "70%"	99.99%	20.26ms	609.56ms
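The latency columns are easier to compare as ratios. This Python sketch derives the hnsw speedup over linear from the figures in the table (roughly 10x, 49x, and 30x); nothing here is a new measurement.

```python
from typing import List, Tuple

# (data set, hnsw average ms, linear average ms), copied from the table above
RESULTS: List[Tuple[str, float, float]] = [
    ("1 million vectors, 1 shard", 12.96, 134.96),
    ("10 million vectors, 1 shard", 24.69, 1209.13),
    ("10 million vectors, 16 shards", 20.26, 609.56),
]


def speedup(hnsw_ms: float, linear_ms: float) -> float:
    """How many times faster the hnsw query is than the linear query."""
    return linear_ms / hnsw_ms


if __name__ == "__main__":
    for name, hnsw_ms, linear_ms in RESULTS:
        print(f"{name}: hnsw is {speedup(hnsw_ms, linear_ms):.1f}x faster")
```

Going from 1 million to 10 million vectors on one shard, the linear time grows about ninefold while the hnsw time roughly doubles, which matches the linear algorithm's query time being proportional to data volume.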

Algorithm Summary

Applicable scenarios of the linear algorithm:
- Small data volume (a single shard usually holds fewer than 1 million vectors).
- A normal search filter is executed first, and vector retrieval is then computed on the filtered result set.
- The recall rate is 100%, but queries are slower than with hnsw.
Applicable scenarios of the hnsw algorithm:
- Relatively large data volume (tens of millions of vectors in the cluster).
- Vector retrieval and other filters are applied together. It is recommended to increase the hnsw query parameter k appropriately so that enough documents satisfying the filter conditions take part in the computation.
- High query performance is required, with a recall rate above 90%.
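The "filter first, then score" pattern for the linear algorithm can be expressed by replacing match_all in the script_score query with a filter. The Python sketch below builds such a request body. It assumes the plug-in's script scores only the documents matched by the inner query, as a standard script_score query does, and that price is mapped as a numeric field; the function name is hypothetical.

```python
from typing import Dict, List


def build_filtered_linear_query(field: str, vector: List[float],
                                filter_query: Dict, space: str = "cosine",
                                size: int = 100) -> Dict:
    """Linear vector query restricted to documents matching filter_query."""
    return {
        "query": {
            "script_score": {
                # only documents passing the filter reach the vector script
                "query": {"bool": {"filter": [filter_query]}},
                "script": {
                    "source": "bpack_knn_script",
                    "lang": "knn",
                    "params": {"space": space, "field": field, "vector": vector},
                },
            }
        },
        "size": size,
    }


if __name__ == "__main__":
    import json
    body = build_filtered_linear_query(
        "field1", [3.5, 2.5], {"range": {"price": {"lte": 10}}})
    print(json.dumps(body, indent=2))
```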

Best Practices

It is recommended to run forceMerge regularly during off-peak hours after writing, to reduce query latency.
When querying with the linear algorithm, set "bpack.knn.memory.cache.limit" according to the data volume. For example, if a node holds 10G of data and the default value is used (for the computation 2 type, 30G * 10% = 3G), the cache cannot hold the data. Bulk queries may then trip the Elasticsearch circuit breaker and report a CircuitBreakingException.
Building a vector index over a large data volume can be slow. Before writing data, you can adjust "bpack.knn.hnsw.index_thread_qty" according to the number of shards and the CPU cores of the node. For example, with 10 million vectors, 1 node, 2 shards, and a 16-core CPU, setting "bpack.knn.hnsw.index_thread_qty" to 4-6 improves building efficiency (setting it to 8 can fully load the CPU, which is risky in a production environment).

Note that a high "bpack.knn.hnsw.index_thread_qty" can start too many threads during building. On a heavily loaded cluster, adjusting this parameter is not recommended, to avoid saturating the cluster. If writing and building vectors is slow, you can speed up building by temporarily reducing the cluster load (fewer other writes and queries) and raising "bpack.knn.hnsw.index_thread_qty", and then set it back to 1 once building finishes.

When writing 10 million vectors (about 10G, for example) to 1 node with 1 shard on a computation 2 type node (16-core CPU and 64G memory), the following settings are recommended:

PUT /_cluster/settings 
{ 
    "persistent" : { 
        "bpack.knn.hnsw.index_thread_qty" : 1, 
        "bpack.knn.cache.item.expiry.time": "1h", 
        "bpack.knn.memory.cache.limit": "12g", 
        "bpack.knn.memory.cache.expiry.time":"1h", 
        "bpack.knn.memory.circuit_breaker.limit" : "70%" 
    } 
}

Analysis:

"bpack.knn.hnsw.index_thread_qty" : 1: Setting it to 1 is generally recommended. If index building is too slow, adjust it according to the recommendations above.

"bpack.knn.cache.item.expiry.time": "1h": Set the timeout according to your own business.

"bpack.knn.memory.cache.limit": "12g": The data volume is about 10G, and the cache should hold all of it.

"bpack.knn.memory.cache.expiry.time":"1h": Set the timeout according to your own business.

"bpack.knn.memory.circuit_breaker.limit" : "70%": The default JVM memory of Elasticsearch on the computation 2 type is 30G, so the limit is 70% * (64 - 30) = 23.8G, which can hold the off-heap memory occupied by the data.

FAQ

Q: How is the recall rate defined?

A: Run the same query vector through both query modes (linear and hnsw) and compare the returned documents. The recall rate is the number of documents that appear in both result sets divided by the number of documents returned. We use the recall rate to characterize query accuracy.
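The definition above translates directly into code. This Python sketch takes the document IDs returned by the linear query as the exact result and measures how many of them the hnsw query also returned. The function name is hypothetical, and both result lists are assumed to have been requested with the same size.

```python
from typing import Hashable, Iterable


def recall_rate(exact_ids: Iterable[Hashable], approx_ids: Iterable[Hashable]) -> float:
    """Share of the exact (linear) results that also appear in the
    approximate (hnsw) results."""
    exact = set(exact_ids)
    if not exact:
        return 0.0
    return len(exact & set(approx_ids)) / len(exact)


if __name__ == "__main__":
    linear_hits = ["a", "b", "c", "d"]
    hnsw_hits = ["a", "b", "c", "x"]
    print(recall_rate(linear_hits, hnsw_hits))
```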

Q: After a successful write, why does the number of indexed documents not increase or not reach the written volume, and why may queries fail?

A: The vector index is built during refresh or flush. Although the write has completed, the vector index building tasks may still be running in the background.

Q: How do I install the vector retrieval plug-in?

A: Newly created 7.4.2 clusters come with the vector retrieval plug-in preinstalled. For an existing cluster that does not have the plug-in, contact customer service for help with installation.
