Vector Search Plug-in User Guide
The Elasticsearch vector retrieval plug-in, developed by the Baidu Elasticsearch team, supports vector retrieval, vector computation, and related requirements.
Background
In recent years, vector retrieval based on text (document) embeddings, feature vectors, and similar representations has been widely applied to similarity search in recommendation systems and image search. Users can apply Word2vec and other tools to map images, audio, natural language, and other complex data to feature vectors, and then search those vectors with a vector retrieval algorithm to process the complex data. For vector data, the Baidu Elasticsearch vector retrieval plug-in provides two vector retrieval algorithms: linear and hnsw.
Algorithm | Meaning | Applicable scenario | Disadvantages | Supported distance algorithms |
---|---|---|---|---|
linear | Linear (exhaustive) computation over all vector data | The recall rate is 100%. Query time is proportional to the data volume. Usually used as a baseline for comparing results. | Low efficiency with large data volumes; CPU-intensive; all data is held in memory. | Cosine distance (cosine), Euclidean distance (l2), dot product (dot_prod) |
hnsw | Approximate computation based on the hnsw algorithm | Relatively small data volume per node; high recall requirements; high query-speed requirements. | Higher data expansion ratio; an index must be built after the data is written; all data is held in memory. | Cosine distance (cosine), Euclidean distance (l2) |
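To make the three distance algorithms and the "linear" strategy concrete, the following Python sketch ranks documents by an exhaustive scan. It is an illustration only: how the plug-in converts each distance into an Elasticsearch `_score` is not documented here, so the sketch ranks by the raw metric (higher cosine similarity or dot product is better, smaller l2 distance is better). All function and variable names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def dot_product(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# space name -> (metric function, True if a larger value means "more similar")
METRICS = {
    "cosine": (cosine_similarity, True),
    "dot_prod": (dot_product, True),
    "l2": (l2_distance, False),
}

def linear_top_k(query: Vector, docs: Dict[str, List[float]],
                 space: str, k: int) -> List[Tuple[str, float]]:
    """Exhaustive scan: every document is scored, so cost grows linearly with len(docs)."""
    metric, larger_is_better = METRICS[space]
    scored = [(doc_id, metric(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda item: item[1], reverse=larger_is_better)
    return scored[:k]

docs = {"a": [6.5, 2.5], "b": [1.0, 9.0], "c": [3.0, 4.0]}
for space in ("cosine", "dot_prod", "l2"):
    print(space, [doc_id for doc_id, _ in linear_top_k([3.5, 2.5], docs, space, k=2)])
# cosine ['a', 'c']
# dot_prod ['a', 'b']
# l2 ['c', 'a']
```

Because every document is visited, the result is exact (100% recall), which is why linear results are useful as the reference when measuring hnsw recall.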
Cluster Preparation
Item | Recommendation | Description |
---|---|---|
Cluster selection | At least 16G of memory | Vector retrieval places high demands on cluster memory. If the data volume exceeds 10G, a 16-core package with at least 64G of memory is recommended, such as the computation type 2, computation type 3, or storage type 3 packages. |
Data volume per node | Should not exceed one third of the node's total memory. | Vector retrieval places high demands on cluster memory. |
Write traffic limit | Taking a computation type 2 node (16 cores, 64G) as an example, keep the write traffic per node within 4000 TPS. | Building the vector index is CPU-intensive, so avoid writing at high throughput. Because all data is loaded into memory during queries, also avoid heavy writes while queries are running. |
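The sizing guidelines above reduce to simple arithmetic. The sketch below encodes the one-third-of-memory rule and the 4000 TPS write guideline from the table. `raw_vector_bytes` assumes 4-byte float32 values and gives only a lower bound, since the hnsw graph, other fields, and the data expansion ratio all add to the real footprint. All names are hypothetical.

```python
def max_data_per_node_gb(node_memory_gb: float) -> float:
    # Guideline: the data volume on one node should not exceed 1/3 of its memory.
    return node_memory_gb / 3

def fits_memory_guideline(data_gb: float, node_memory_gb: float) -> bool:
    return data_gb <= max_data_per_node_gb(node_memory_gb)

def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    # ASSUMPTION: 4 bytes per vector component; a lower bound, not a real footprint.
    return num_vectors * dims * bytes_per_value

def min_write_seconds(num_docs: int, max_tps: int = 4000) -> float:
    # Minimum ingest time for one node if writes are capped at max_tps.
    return num_docs / max_tps

print(max_data_per_node_gb(64))          # about 21.3 (GB) for a 64G node
print(fits_memory_guideline(10, 64))     # True
print(raw_vector_bytes(10_000_000, 32))  # 1280000000
print(min_write_seconds(10_000_000))     # 2500.0
```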
Usage
Before writing data, configure the knn parameters according to your vector dimensions and performance requirements, select the distance algorithm, and create the required knn index. Once the index is created, you can write data and then run vector retrieval queries using the query modes described below.
Create the knn index
Create the knn index in advance, before writing any data. The following example creates an index named test-index with two fields, field1 and field2. You can customize the index name and field names as needed.
PUT /test-index
{
"settings": {
"index": {
"codec": "bpack_knn_hnsw",
"bpack.knn.hnsw.space": "cosine",
"bpack.knn.hnsw.m": 16,
"bpack.knn.hnsw.ef_construction": 512
}
},
"mappings": {
"properties": {
"field1": {
"type": "bpack_vector",
"dims": 2
},
"field2": {
"type": "bpack_knn_vector",
"dims": 2
}
}
}
}
Parameter | Description |
---|---|
index.codec | Set to bpack_knn_hnsw to support both the hnsw and linear algorithms; otherwise, only the linear algorithm is supported. |
type | The plug-in provides two new vector field types, bpack_vector and bpack_knn_vector. bpack_vector is a common vector field that supports the linear algorithm; bpack_knn_vector is a vector search field that supports both the linear and hnsw algorithms. |
dims | Vector dimension; 2 to 2048 dimensions are supported. |
The meanings of the bpack.knn.hnsw parameters in settings are described in the index-level parameter table under Parameter Optimization below.
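When index bodies are generated by an application, the constraints in the table can be checked before the request is sent. The following Python sketch builds the same body as the PUT /test-index example and rejects unknown vector types or dims outside 2 to 2048. The helper name and its `fields` format are hypothetical; only the JSON keys come from this guide.

```python
import json
from typing import Dict, Optional, Tuple

VECTOR_TYPES = {"bpack_vector", "bpack_knn_vector"}

def build_create_index_body(fields: Dict[str, Tuple[str, int]],
                            use_hnsw: bool = True,
                            hnsw_params: Optional[dict] = None) -> dict:
    """Build a PUT /<index> body; `fields` maps field name -> (vector type, dims)."""
    body: dict = {}
    if use_hnsw:
        # The bpack_knn_hnsw codec enables hnsw queries (linear queries still work).
        index_settings = {"codec": "bpack_knn_hnsw"}
        for key, value in (hnsw_params or {}).items():
            index_settings[f"bpack.knn.hnsw.{key}"] = value
        body["settings"] = {"index": index_settings}
    properties = {}
    for name, (field_type, dims) in fields.items():
        if field_type not in VECTOR_TYPES:
            raise ValueError(f"{name}: unsupported vector type {field_type!r}")
        if not 2 <= dims <= 2048:
            raise ValueError(f"{name}: dims must be in [2, 2048], got {dims}")
        properties[name] = {"type": field_type, "dims": dims}
    body["mappings"] = {"properties": properties}
    return body

body = build_create_index_body(
    {"field1": ("bpack_vector", 2), "field2": ("bpack_knn_vector", 2)},
    hnsw_params={"space": "cosine", "m": 16, "ef_construction": 512},
)
print(json.dumps(body, indent=2))  # same content as the PUT /test-index example
```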
Write and query data
Write data
The following example writes a document to the test-index index created above:
POST /test-index/_doc/
{
"field1" : [6.5, 2.5],
"field2" : [6.5, 2.5],
"price" : 10
}
Here, field1 is the bpack_vector field and field2 is the bpack_knn_vector field defined when the index was created; price is an ordinary field.
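For larger loads, documents are typically sent through the standard Elasticsearch `_bulk` API (`POST /test-index/_bulk` with `Content-Type: application/x-ndjson`). This guide does not describe bulk writes for the plug-in, so the sketch assumes vector fields use the same JSON format as the single-document example above. It only builds the NDJSON body and checks vector lengths against the mapping's dims; sending it, and keeping each node under the recommended write rate, is left to your client. The helper name is hypothetical.

```python
import json
from typing import Iterable, Mapping

def build_bulk_body(docs: Iterable[Mapping], vector_dims: Mapping[str, int]) -> str:
    """Return an NDJSON body for POST /<index>/_bulk (the index comes from the URL)."""
    lines = []
    for doc in docs:
        for field, dims in vector_dims.items():
            vector = doc.get(field)
            if vector is not None and len(vector) != dims:
                raise ValueError(f"{field}: expected {dims} values, got {len(vector)}")
        lines.append(json.dumps({"index": {}}))  # action line
        lines.append(json.dumps(doc))            # document source line
    if not lines:
        raise ValueError("no documents to index")
    return "\n".join(lines) + "\n"                # a bulk body must end with a newline

body = build_bulk_body(
    [{"field1": [6.5, 2.5], "field2": [6.5, 2.5], "price": 10},
     {"field1": [1.0, 9.0], "field2": [1.0, 9.0], "price": 20}],
    vector_dims={"field1": 2, "field2": 2},
)
print(body)
```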
After the index is built, you can query the data as follows:
linear query
The linear algorithm can query fields of both the bpack_knn_vector and bpack_vector types. The following example queries field1, which is of type bpack_vector.
POST /test-index/_search
{
"query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "bpack_knn_script",
"lang": "knn",
"params": {
"space": "cosine",
"field": "field1",
"vector": [3.5, 2.5]
}
}
}
},
"size": 100
}
Or
POST /test-index/_search
{
"query": {
"function_score": {
"boost_mode": "replace",
"script_score": {
"script": {
"source": "bpack_knn_script",
"lang": "knn",
"params": {
"space": "cosine",
"field": "field1",
"vector": [3.5, 2.5]
}
}
}
}
},
"size": 100
}
The query parameters are as follows:
Parameter | Description | Default value |
---|---|---|
source | Computation method; set it to bpack_knn_script. | Required |
space | Distance algorithm. The linear algorithm supports three distance algorithms: cosine distance (cosine), dot product (dot_prod), and Euclidean distance (l2). | cosine |
field | Vector field name. | Required |
vector | A float array whose length must match the dims specified in the field mapping when the index was created. | Required |
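The script_score form above can also be generated programmatically. The sketch below builds the same body, validates the space and the vector length, and optionally replaces `match_all` with a filter query. With script_score, the script only scores documents that match the inner query, which is how the "filter first, then compute vectors" pattern in the Algorithm Summary is expressed. The helper name is hypothetical, and the `range` filter on price is just an example of a standard Elasticsearch query.

```python
from typing import Optional, Sequence

LINEAR_SPACES = {"cosine", "dot_prod", "l2"}

def build_linear_query(field: str, vector: Sequence[float], dims: int,
                       space: str = "cosine", size: int = 100,
                       filter_query: Optional[dict] = None) -> dict:
    """Body for POST /<index>/_search using the script_score form from this guide."""
    if space not in LINEAR_SPACES:
        raise ValueError(f"linear queries support {sorted(LINEAR_SPACES)}, got {space!r}")
    if len(vector) != dims:
        raise ValueError(f"vector has {len(vector)} values, mapping dims is {dims}")
    return {
        "query": {
            "script_score": {
                # Only documents matching this inner query are scored by the script.
                "query": filter_query if filter_query is not None else {"match_all": {}},
                "script": {
                    "source": "bpack_knn_script",
                    "lang": "knn",
                    "params": {"space": space, "field": field, "vector": list(vector)},
                },
            }
        },
        "size": size,
    }

# Score only documents whose price is at most 10.
print(build_linear_query("field1", [3.5, 2.5], dims=2,
                         filter_query={"range": {"price": {"lte": 10}}}))
```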
hnsw query
To query with hnsw, the index must set index.codec to bpack_knn_hnsw, and the queried vector field must be mapped with type bpack_knn_vector. The following example queries field2, which is of type bpack_knn_vector.
POST /test-index/_search
{
"size" : 10,
"query": {
"knn": {
"field2": {
"vector": [3, 4],
"k": 2,
"ef": 512
}
}
}
}
The query parameters are as follows:
Parameter | Description | Default value |
---|---|---|
vector | A float array whose length must match the dims specified in the field mapping when the index was created; otherwise, the results may be incorrect. | Required |
k | Number of nearest neighbors to retrieve with the hnsw algorithm; must be a positive integer. | Required |
ef | Size of the dynamic nearest-neighbor candidate list during search. Higher values give higher query accuracy but slower queries. Value range: [2, 1024]. | 512 |
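A matching sketch for the knn query: it rebuilds the body of the hnsw example above and enforces the constraints from the table (vector length equals dims, k is a positive integer, ef is in [2, 1024]). How `size` and `k` interact is not specified in this guide, so the sketch passes both through unchanged. The helper name is hypothetical.

```python
from typing import Sequence

def build_hnsw_query(field: str, vector: Sequence[float], dims: int, k: int,
                     ef: int = 512, size: int = 10) -> dict:
    """Body for POST /<index>/_search using the knn query from this guide."""
    if len(vector) != dims:
        raise ValueError(f"vector has {len(vector)} values, mapping dims is {dims}")
    if isinstance(k, bool) or not isinstance(k, int) or k <= 0:
        raise ValueError("k must be a positive integer")
    if not 2 <= ef <= 1024:
        raise ValueError("ef must be in [2, 1024]")
    return {"size": size,
            "query": {"knn": {field: {"vector": list(vector), "k": k, "ef": ef}}}}

print(build_hnsw_query("field2", [3, 4], dims=2, k=2))
```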
Parameter Optimization
Index-level parameters
These index settings can be specified when the index is created; if they are omitted, default values are used. The settings are static, which means they cannot be modified after the index is created. The parameters are described below:
Parameter | Description | Default value |
---|---|---|
bpack.knn.hnsw.m | Number of bidirectional links created for each new element during graph construction. A reasonable range for m is 2-100. It mainly affects memory consumption, storage consumption, and accuracy: a higher m means more memory and storage, slower index building, and higher accuracy. As a starting point, set m to about 1.5 × the vector dimension; values of 12-48 satisfy most scenarios. | 16 |
bpack.knn.hnsw.space | Distance algorithm used for vector retrieval. hnsw supports two distance algorithms: cosine distance (cosine) and Euclidean distance (l2). | cosine |
bpack.knn.hnsw.ef_construction | Size of the dynamic nearest-neighbor candidate list during index building. Higher values give higher query accuracy but slower index building. Value range: [2, 1024]. | 512 |
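Because these settings cannot be changed after the index is created, it can help to compute and check them up front. This sketch applies the "about 1.5 × vector dimension" rule of thumb for m, keeps it inside the 2-100 range from the table, and validates space and ef_construction. Treating 2-100 as a hard limit is this sketch's own assumption (the table calls it a reasonable range), and for very low-dimensional vectors the formula gives small values, so the 12-48 range is a useful sanity check. Names are hypothetical.

```python
from typing import Optional

def recommended_m(dims: int) -> int:
    """Rule of thumb from the table: m is about 1.5 x dims, kept in [2, 100]."""
    return max(2, min(100, int(dims * 1.5)))

def hnsw_index_settings(dims: int, space: str = "cosine", m: Optional[int] = None,
                        ef_construction: int = 512) -> dict:
    """Return the bpack.knn.hnsw.* keys for the index settings block."""
    if space not in ("cosine", "l2"):
        raise ValueError("hnsw supports only cosine and l2")
    m = recommended_m(dims) if m is None else m
    if not 2 <= m <= 100:
        raise ValueError("m should be in [2, 100]")
    if not 2 <= ef_construction <= 1024:
        raise ValueError("ef_construction must be in [2, 1024]")
    return {
        "bpack.knn.hnsw.space": space,
        "bpack.knn.hnsw.m": m,
        "bpack.knn.hnsw.ef_construction": ef_construction,
    }

for dims in (32, 64, 128):
    print(dims, recommended_m(dims))  # 32 -> 48, 64 -> 96, 128 -> 100 (clamped)
```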
Cluster level parameters
Common parameters
Parameter | Description | Default value |
---|---|---|
bpack.knn.hnsw.index_thread_qty | Number of threads HNSW may use to build graphs. By default, nmslib sets this value to the number of cores n. However, Elasticsearch may create n threads to build indexes; if each indexing thread calls nmslib to build a graph and each of them starts n threads, n^2 threads may run simultaneously and drive CPU utilization to 100%. The default is therefore 1. Value range: [1, 32]. | 1 |
Cache settings
Cache parameters for the linear algorithm
Parameter | Description | Default value |
---|---|---|
bpack.knn.memory.cache.limit | Maximum capacity of the cache. When loading data into the cache would exceed this limit, eviction is triggered. The value can be a percentage of the JVM heap, or a size with a unit such as "10kb", "10mb", or "3g". Fractional values such as "1.5g" are not recommended. | 10% |
bpack.knn.memory.cache.expiry.time | Data that has not been accessed for this duration is removed from the cache. The value uses the time-unit format, such as "10s", "10m", or "3h", and cannot be fractional (for example, "1.5h" is not allowed). We generally recommend at least 30 minutes so that cached data can still be hit by subsequent queries; a value that is too small causes data to be evicted quickly. | 30m |
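To see what a given setting means in bytes or seconds, the sketch below resolves cache-limit values (a percentage of the JVM heap, or a size such as "10mb" or "3g") and expiry durations (such as "30m"). It rejects fractional values outright, which is stricter than "not recommended" for sizes; the unit table is an assumption limited to the units mentioned in this guide, not the plug-in's parser. Names are hypothetical.

```python
import re

# Size units accepted by this sketch (binary multiples).
_SIZE_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "g": 1024 ** 3, "gb": 1024 ** 3}
# Time units used in this guide: seconds, minutes, hours.
_TIME_UNITS = {"s": 1, "m": 60, "h": 3600}

def resolve_cache_limit_bytes(value: str, jvm_heap_bytes: int) -> int:
    """Resolve a cache limit given as a percentage of the JVM heap or as a sized value."""
    value = value.strip().lower()
    if value.endswith("%"):
        percent = value[:-1]
        if not percent.isdigit():
            raise ValueError(f"unsupported percentage: {value!r}")
        return jvm_heap_bytes * int(percent) // 100
    match = re.fullmatch(r"(\d+)([a-z]+)", value)
    if match is None or match.group(2) not in _SIZE_UNITS:
        raise ValueError(f"unsupported size (fractions are rejected): {value!r}")
    return int(match.group(1)) * _SIZE_UNITS[match.group(2)]

def resolve_expiry_seconds(value: str) -> int:
    """Resolve an expiry duration such as '30m' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value.strip().lower())
    if match is None:
        raise ValueError(f"unsupported duration (fractions are rejected): {value!r}")
    return int(match.group(1)) * _TIME_UNITS[match.group(2)]

GIB = 1024 ** 3
print(resolve_cache_limit_bytes("10%", 30 * GIB) / GIB)  # 3.0, i.e. the default on a 30G heap
print(resolve_expiry_seconds("30m"))                    # 1800
```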
Cache parameters for the hnsw algorithm
Parameter | Description | Default value |
---|---|---|
bpack.knn.cache.item.expiry.time | Data that has not been accessed for this duration is removed from the cache. The value uses the time-unit format, such as "10s", "10m", or "3h", and cannot be fractional (for example, "1.5h" is not allowed). We generally recommend at least 30 minutes so that cached data can still be hit by subsequent queries; a value that is too small causes data to be evicted quickly. | 180m |
Circuit breaker settings
The hnsw algorithm consumes a large amount of off-heap memory (memory outside the JVM heap). If it consumes too much, the page cache available to Elasticsearch/Lucene becomes insufficient and cluster performance declines. To avoid this, configure the circuit breaker to limit off-heap memory consumption. Currently, when memory usage reaches the configured limit, the eviction mechanism is triggered to evict infrequently used cache items.
Parameter | Description | Default value |
---|---|---|
bpack.knn.memory.circuit_breaker.limit | Maximum capacity of the hnsw cache. When the hnsw cache exceeds this limit, eviction is triggered and circuit_breaker_triggered is set to true (visible through the statistics API). The value can be a percentage of the server memory remaining after the Elasticsearch JVM heap is excluded, or a size with a unit such as "10kb", "10mb", or "3g". Fractional values such as "1.5g" are not recommended. For example, if a machine has 100GB of memory and the Elasticsearch JVM uses 32GB, the default limit is 60% × (100 - 32) = 40.8GB. | 60% |
bpack.knn.circuit_breaker.unset.percentage | Threshold for releasing the circuit breaker. When cache usage falls below bpack.knn.circuit_breaker.unset.percentage, the circuit breaker is released and circuit_breaker_triggered is set to false (visible through the statistics API). | 75 |
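The trigger and release thresholds form a hysteresis loop: the breaker trips above the limit and stays tripped until usage falls well below it. The toy model below reproduces the 40.8GB example from the table and walks through that loop. This guide does not state what the unset percentage is relative to; the sketch ASSUMES it is a percentage of the circuit breaker limit. Names are hypothetical.

```python
def circuit_breaker_limit_gb(total_ram_gb: float, jvm_heap_gb: float,
                             limit_percent: float = 60.0) -> float:
    """Percentage form of the limit: applied to the RAM left after the JVM heap."""
    return (total_ram_gb - jvm_heap_gb) * limit_percent / 100

class CircuitBreakerModel:
    """Toy trigger/release model.
    ASSUMPTION: unset_percent is a percentage of the breaker limit."""

    def __init__(self, limit_gb: float, unset_percent: float = 75.0) -> None:
        self.limit_gb = limit_gb
        self.release_below_gb = limit_gb * unset_percent / 100
        self.triggered = False

    def observe(self, hnsw_cache_gb: float) -> bool:
        if hnsw_cache_gb > self.limit_gb:
            self.triggered = True   # over the limit: eviction starts, breaker trips
        elif self.triggered and hnsw_cache_gb < self.release_below_gb:
            self.triggered = False  # usage fell far enough: breaker released
        return self.triggered

limit = circuit_breaker_limit_gb(100, 32)  # 40.8 GB, as in the table
breaker = CircuitBreakerModel(limit)
print([breaker.observe(gb) for gb in (30, 42, 35, 30)])  # [False, True, True, False]
```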
Example
PUT /_cluster/settings
{
"persistent" : {
"bpack.knn.hnsw.index_thread_qty" : 1,
"bpack.knn.cache.item.expiry.time": "15m",
"bpack.knn.memory.cache.limit": "1g",
"bpack.knn.memory.cache.expiry.time":"10m",
"bpack.knn.memory.circuit_breaker.limit" : "55%",
"bpack.knn.circuit_breaker.unset.percentage": 23
}
}
View hnsw statistics
Query the statistics with either of the following requests:
GET /_bpack/_knn/stats
GET /_bpack/_knn/nodeId1,nodeId2/stats/statName1,statName2
Example response:
{
"_nodes": {
"total": 1,
"successful": 1,
"failed": 0
},
"cluster_name": "my-application",
"circuit_breaker_triggered": false,
"nodes": {
"HYMrXXsBSamUkcAjhjeN0w": {
"eviction_count" : 0,
"miss_count" : 1,
"graph_memory_usage_kb" : 1,
"cache_capacity_reached" : false,
"load_exception_count" : 0,
"hit_count" : 0,
"load_success_count" : 1,
"total_load_time_nanos" : 2878745
}
}
}
Cluster-level statistics:
Parameter | Description |
---|---|
circuit_breaker_triggered | Whether the circuit breaker is triggered. It is triggered if any node in the cluster evicts items from the cache because the cache capacity has been reached. When cache usage falls below bpack.knn.circuit_breaker.unset.percentage, the circuit breaker is released. |
Node-level statistics:
Parameter | Description |
---|---|
eviction_count | Number of evictions from the Guava cache (evictions caused by index deletion are not counted). |
hit_count | Number of cache hits on the node. |
miss_count | Number of cache misses on the node. |
graph_memory_usage_kb | Total size of the graphs cached in this node's memory, in KB. |
cache_capacity_reached | Whether this node's cache capacity has been reached. |
load_exception_count | Number of exceptions that occurred while loading data into the cache. |
load_success_count | Number of successful loads into the cache. |
total_load_time_nanos | Total time spent loading data into the cache, in nanoseconds. |
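The raw counters are easier to act on as rates. The sketch below parses the example response and derives a per-node hit rate and average load time. These derived metrics are this sketch's own definitions; the average load time follows Guava's averageLoadPenalty convention (total load time divided by successful plus failed loads). Names are hypothetical.

```python
import json

EXAMPLE = json.loads("""
{
  "circuit_breaker_triggered": false,
  "nodes": {
    "HYMrXXsBSamUkcAjhjeN0w": {
      "eviction_count": 0, "miss_count": 1, "graph_memory_usage_kb": 1,
      "cache_capacity_reached": false, "load_exception_count": 0,
      "hit_count": 0, "load_success_count": 1, "total_load_time_nanos": 2878745
    }
  }
}
""")

def summarize_knn_stats(stats: dict) -> dict:
    summary = {}
    for node_id, node in stats["nodes"].items():
        lookups = node["hit_count"] + node["miss_count"]
        loads = node["load_success_count"] + node["load_exception_count"]
        summary[node_id] = {
            # Share of cache lookups served from the cache (None before any lookup).
            "hit_rate": node["hit_count"] / lookups if lookups else None,
            # Average time per load attempt, in milliseconds.
            "avg_load_ms": node["total_load_time_nanos"] / loads / 1e6 if loads else None,
            "graph_memory_mb": node["graph_memory_usage_kb"] / 1024,
            "cache_capacity_reached": node["cache_capacity_reached"],
        }
    return summary

print("circuit breaker triggered:", EXAMPLE["circuit_breaker_triggered"])
for node_id, info in summarize_knn_stats(EXAMPLE).items():
    print(node_id, info)
```

A persistently low hit rate together with a growing eviction_count suggests the cache limit or expiry time is too small for the working set.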
Performance Comparison
- Memory: 30G
- CPU: 56 logical cores (2 physical CPUs with 14 cores each)
- Elasticsearch: single node
The results are as follows:
Data size | Index parameters | Cluster parameters | Top-30 recall rate | Average hnsw query time | Average linear query time |
---|---|---|---|---|---|
1 million 32-dimensional vectors, 1 shard | "bpack.knn.hnsw.space": "cosine", "bpack.knn.hnsw.m": 16, "bpack.knn.hnsw.ef_construction": 300 | "bpack.knn.cache.item.expiry.time": "1h", "bpack.knn.memory.cache.limit": "15g", "bpack.knn.memory.cache.expiry.time": "1h", "bpack.knn.memory.circuit_breaker.limit": "70%" | 99.97% | 12.96ms | 134.96ms |
10 million 32-dimensional vectors, 1 shard | "bpack.knn.hnsw.space": "cosine", "bpack.knn.hnsw.m": 16, "bpack.knn.hnsw.ef_construction": 600 | "bpack.knn.cache.item.expiry.time": "1h", "bpack.knn.memory.cache.limit": "15g", "bpack.knn.memory.cache.expiry.time": "1h", "bpack.knn.memory.circuit_breaker.limit": "70%" | 99.97% | 24.69ms | 1209.13ms |
10 million 32-dimensional vectors, 16 shards | "bpack.knn.hnsw.space": "cosine", "bpack.knn.hnsw.m": 48, "bpack.knn.hnsw.ef_construction": 600 | "bpack.knn.cache.item.expiry.time": "1h", "bpack.knn.memory.cache.limit": "15g", "bpack.knn.memory.cache.expiry.time": "1h", "bpack.knn.memory.circuit_breaker.limit": "70%" | 99.99% | 20.26ms | 609.56ms |
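The latency columns can be compared directly. The short script below computes how many times faster hnsw was than linear in each row, using only the averages reported in the table.

```python
# (label, average hnsw ms, average linear ms), copied from the table above
RESULTS = [
    ("1M x 32-dim, 1 shard", 12.96, 134.96),
    ("10M x 32-dim, 1 shard", 24.69, 1209.13),
    ("10M x 32-dim, 16 shards", 20.26, 609.56),
]

def speedup(hnsw_ms: float, linear_ms: float) -> float:
    return linear_ms / hnsw_ms

for label, hnsw_ms, linear_ms in RESULTS:
    print(f"{label}: hnsw is {speedup(hnsw_ms, linear_ms):.1f}x faster")
# 1M x 32-dim, 1 shard: hnsw is 10.4x faster
# 10M x 32-dim, 1 shard: hnsw is 49.0x faster
# 10M x 32-dim, 16 shards: hnsw is 30.1x faster
```

Going from 1 million to 10 million vectors on one shard raised linear latency about 9x (134.96 to 1209.13 ms), in line with its cost being proportional to data volume, while hnsw latency grew less than 2x (12.96 to 24.69 ms).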
Algorithm Summary
- Applicable scenarios of the linear algorithm:
  - Small data volume (usually under 1 million documents per shard).
  - The normal search filter conditions are applied first, and vector retrieval computation is then performed on the filtered result set.
  - The recall rate is 100%, but query performance is slower than with hnsw.
- Applicable scenarios of the hnsw algorithm:
  - Relatively large data volume (tens of millions of documents in the cluster).
  - Vector retrieval computation and other filters are applied at the same time. In this case, it is recommended to increase the hnsw query parameter k appropriately so that enough documents satisfying the filter conditions take part in the computation.
  - High query-performance requirements, where a recall rate above 90% is acceptable.
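When hnsw is combined with other filters, only some of the k candidates may pass the filter, which is why the summary recommends a larger k. One hypothetical way to size it is to divide the number of results you need by the fraction of documents expected to pass the filter, then add a safety margin. This assumes that passing the filter is independent of vector similarity, which often does not hold in practice; the formula, names, and default margin are illustrative assumptions, not plug-in behavior.

```python
import math

def oversampled_k(wanted: int, filter_selectivity: float, margin: float = 1.5) -> int:
    """Hypothetical heuristic for choosing k when hnsw results are also filtered.

    filter_selectivity: estimated fraction of documents that pass the filter, in (0, 1].
    ASSUMPTION: passing the filter is independent of vector similarity.
    """
    if wanted < 1:
        raise ValueError("wanted must be positive")
    if not 0 < filter_selectivity <= 1:
        raise ValueError("filter_selectivity must be in (0, 1]")
    return math.ceil(wanted / filter_selectivity * margin)

print(oversampled_k(10, 0.25))              # 60
print(oversampled_k(10, 0.25, margin=1.0))  # 40
```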
Best Practices
- It is recommended to run a force merge (forceMerge) regularly during off-peak hours after writing, to reduce query latency.
- When querying with the linear algorithm, set "bpack.knn.memory.cache.limit" according to the data volume. For example, if a node holds 10G of data and the default "bpack.knn.memory.cache.limit" is used (on a computation type 2 node, 30G × 10% = 3G), the cache cannot hold the data. Bulk queries may then trip the Elasticsearch circuit breaker and report a CircuitBreakingException.
- Building a vector index for a large data volume can be slow. Before writing data, you can adjust "bpack.knn.hnsw.index_thread_qty" according to the number of shards and CPU cores on each node. For example, with 10 million documents on 1 node that has 2 shards and a 16-core CPU, setting "bpack.knn.hnsw.index_thread_qty" to 4-6 improves building efficiency (setting it to 8 can fully load the CPU, which puts a production environment at risk).
  Note that a high "bpack.knn.hnsw.index_thread_qty" may start too many threads during building. On a heavily loaded cluster, do not raise this parameter, so that the cluster is not fully loaded. If writing and building vectors is slow, you can speed up building by temporarily reducing the cluster load (fewer other writes and queries) and raising "bpack.knn.hnsw.index_thread_qty", then setting it back to 1 after building completes.
- When writing 10 million documents (about 10G) to 1 node with 1 shard on a computation type 2 node (16-core CPU, 64G memory), the recommended settings are:
  PUT /_cluster/settings
  {
    "persistent" : {
      "bpack.knn.hnsw.index_thread_qty" : 1,
      "bpack.knn.cache.item.expiry.time": "1h",
      "bpack.knn.memory.cache.limit": "12g",
      "bpack.knn.memory.cache.expiry.time": "1h",
      "bpack.knn.memory.circuit_breaker.limit" : "70%"
    }
  }
  Analysis:
  - "bpack.knn.hnsw.index_thread_qty" : 1: Generally set this to 1; if index building is too slow, adjust it by following the recommendation above.
  - "bpack.knn.cache.item.expiry.time": "1h": Set the timeout according to your business needs.
  - "bpack.knn.memory.cache.limit": "12g": The data volume is about 10G, so the cache should be able to hold all of it.
  - "bpack.knn.memory.cache.expiry.time": "1h": Set the timeout according to your business needs.
  - "bpack.knn.memory.circuit_breaker.limit" : "70%": The default JVM heap of Elasticsearch on a computation type 2 node is 30G, so the limit is 70% × (64 - 30) = 23.8G, which can hold the off-heap memory used by the data.
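The thread-count recommendation in the practices above can be expressed as arithmetic: with 2 shards on a 16-core node, 8 threads per shard would occupy all 16 cores, while 4-6 leaves headroom. The hypothetical helper below turns that one example into a formula (cores × target utilization ÷ shards, clamped to the documented range [1, 32]). It ASSUMES each shard being written builds graphs with index_thread_qty threads; it is a starting point derived from the example, not a rule published by the plug-in, and the value should still be reset to 1 after building.

```python
import math

def suggest_index_thread_qty(cpu_cores: int, shards_on_node: int,
                             target_utilization: float = 0.5) -> int:
    """Hypothetical heuristic: keep shards_on_node * qty near target_utilization * cpu_cores.
    ASSUMPTION: each shard being written builds graphs with qty threads."""
    if cpu_cores < 1 or shards_on_node < 1:
        raise ValueError("cpu_cores and shards_on_node must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    qty = math.floor(cpu_cores * target_utilization / shards_on_node)
    return max(1, min(32, qty))  # documented range of the setting is [1, 32]

for utilization in (0.5, 0.75, 1.0):
    print(utilization, suggest_index_thread_qty(16, 2, utilization))
# 0.5 4
# 0.75 6
# 1.0 8   (all 16 cores busy: the case the guide warns against)
```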
FAQ
- Q: How is the recall rate defined?
  A: Query with the same vector using both query modes (hnsw and linear) and compare the returned documents. The recall rate is the number of documents that appear in both results divided by the number of documents returned. We use the recall rate to characterize query accuracy.
- Q: Why does the number of indexed documents not grow, or not reach the number of documents written, and why might queries fail even though writes have succeeded?
  A: The vector index is built during refresh or flush. Even after a write completes, vector index building tasks may still be running in the background.
- Q: How do I install the vector retrieval plug-in?
  A: Newly created 7.4.2 clusters come with the vector retrieval plug-in preinstalled. For an existing cluster, contact customer service for help installing it.
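The recall definition in the first answer can be computed as shown below: run the same vector through a linear (exact) query and an hnsw query with the same number of results, then measure the overlap. The document IDs are hypothetical placeholders for the `_id` values in each response's hits; in practice the rate is averaged over many query vectors.

```python
from typing import Sequence

def recall_rate(hnsw_ids: Sequence[str], linear_ids: Sequence[str]) -> float:
    """Share of the exact (linear) results that the hnsw query also returned."""
    if not linear_ids:
        raise ValueError("linear_ids must not be empty")
    return len(set(hnsw_ids) & set(linear_ids)) / len(linear_ids)

linear_top4 = ["d1", "d2", "d3", "d4"]  # exact top-4 from a bpack_knn_script query
hnsw_top4 = ["d1", "d2", "d4", "d9"]    # approximate top-4 from a knn query
print(recall_rate(hnsw_top4, linear_top4))  # 0.75
```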