          Vector Search Plug-in User Guide

          The Elasticsearch vector retrieval plug-in is developed by the Baidu Elasticsearch team. It supports fast vector retrieval, vector computation, and related tasks.

          Background

          In recent years, vector retrieval based on text (document) embeddings, feature vectors, and similar representations has been widely applied in recommender systems and image similarity retrieval. Tools such as Word2vec map images, audio, natural language, and other complex data to feature vectors; a vector retrieval algorithm then searches those vectors, which makes the complex data searchable. To process vector data, the Baidu Elasticsearch vector retrieval plug-in provides two vector retrieval algorithms: the linear algorithm and the hnsw algorithm.

          | Algorithm | Meaning | Applicable scenario | Disadvantages | Supported distance algorithms |
          |---|---|---|---|---|
          | linear | Linear computation over all vector data. | The recall rate is 100%. The query time is proportional to the data volume. Usually used as a baseline for comparing results. | Low efficiency with large data volumes. CPU-intensive. All data is held in memory. | Cosine distance (cosine), Euclidean distance (l2), Dot product (dot_prod) |
          | hnsw | Approximate computation based on the hnsw algorithm. | The data volume on a single machine is small. High recall rate is required. High query speed is required. | The data expansion ratio is higher. An index must be built after the data is written. All data is held in memory. | Cosine distance (cosine), Euclidean distance (l2) |
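The three distance measures and the idea of a linear scan can be illustrated with a minimal, standalone sketch. It is not the plug-in's implementation: the function names are made up for illustration, and ranking by cosine similarity here does not claim to reproduce how the plug-in converts a distance into a document score.

```python
import math

def dot_prod(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    return dot_prod(a, b) / (math.sqrt(dot_prod(a, a)) * math.sqrt(dot_prod(b, b)))

def linear_search(query, vectors, k):
    """Brute-force scan: score every stored vector, return the ids of the k most similar."""
    ranked = sorted(vectors.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical sample data: document id -> 2-dimensional vector.
vectors = {"a": [6.5, 2.5], "b": [1.0, 5.0], "c": [3.0, 4.0]}
top2 = linear_search([3.5, 2.5], vectors, k=2)
print(top2)
```

Because every vector is scored, the result is exact, which is why the linear algorithm has a 100% recall rate and a query time that grows with the data volume.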

          Cluster Preparation

          | Item | Recommendation | Description |
          |---|---|---|
          | Cluster selection | At least 16G of memory. | Vector retrieval places high demands on cluster memory. If the data volume exceeds 10G, a 16-core package with 64G of memory or more is recommended, such as the computation 2 type, computation 3 type, or storage 3 type. |
          | Single machine data volume | It is recommended not to exceed one third of the total node memory. | Vector retrieval places high demands on cluster memory. |
          | Writing traffic limit | Taking a computation 2 type node (16-core and 64G) as an example, it is recommended to keep the write traffic of a single node within 4000tps. | Building a vector index is a CPU-intensive task, so avoid writing data at a high rate. Because all the data is loaded into system memory during queries, also avoid heavy writes while queries are running. |
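As a rough aid for applying the one-third rule, the sketch below estimates the raw size of a vector set and checks it against node memory. It assumes 4 bytes per vector component (32-bit floats) and ignores index overhead such as the hnsw data expansion mentioned earlier, so treat the result as a lower bound. The function names are illustrative only.

```python
GIB = 1024 ** 3

def raw_vector_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw size of the vectors alone. ASSUMPTION: 4-byte float components."""
    return num_vectors * dims * bytes_per_value

def fits_on_node(data_bytes, node_memory_bytes):
    """Rule from the table: per-node data should not exceed one third of node memory."""
    return data_bytes * 3 <= node_memory_bytes

# 10 million 32-dimensional vectors on a 64G node.
data = raw_vector_bytes(10_000_000, 32)
ok_64g = fits_on_node(data, 64 * GIB)
print(data, ok_64g)
```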

          Method of Application

          Before writing data, configure the knn parameters according to the vector dimensions and the performance requirements of your business, select the distance algorithm, and create the knn index. After the index is created, you can write data. Once the data is written, you can run vector retrieval queries in the query modes described below.

          Create the knn index

          Create the knn index in advance as follows.

          The following example creates an index named test-index with two vector fields, field1 and field2. You can choose your own index and field names.

          PUT /test-index 
          { 
              "settings": { 
                "index": { 
                   "codec": "bpack_knn_hnsw", 
                   "bpack.knn.hnsw.space": "cosine", 
                   "bpack.knn.hnsw.m": 16,
                   "bpack.knn.hnsw.ef_construction": 512 
                } 
             }, 
             "mappings": { 
                "properties": { 
                   "field1": { 
                      "type": "bpack_vector", 
                      "dims": 2 
                   }, 
                   "field2": { 
                      "type": "bpack_knn_vector", 
                      "dims": 2 
                   } 
                } 
             } 
          } 
          | Parameter | Description |
          |---|---|
          | index.codec | Set to bpack_knn_hnsw to support both the hnsw algorithm and the linear algorithm. Otherwise, only the linear algorithm is supported. |
          | type | The vector retrieval plug-in provides two new vector field types. bpack_vector is a common vector field and supports the linear algorithm. bpack_knn_vector is a vector search field and supports both the linear algorithm and the hnsw algorithm. |
          | dims | Vector dimension. 2 to 2048 dimensions are supported. |

          The bpack.knn.hnsw parameters in settings are described under "Index level parameter" in the Parameter Optimization section.
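If the index is created from application code, the request body can be assembled and checked before it is sent. The sketch below only builds the JSON body from the example; the helper name and the validation rules are illustrative, and the ranges are the ones stated in this guide (the 2-100 range for m is described there as "reasonable", not as a hard limit).

```python
import json

HNSW_SPACES = {"cosine", "l2"}
VECTOR_TYPES = {"bpack_vector", "bpack_knn_vector"}

def build_knn_index_body(fields, space="cosine", m=16, ef_construction=512):
    """Build the PUT /<index> body. `fields` maps field name -> (type, dims)."""
    if space not in HNSW_SPACES:
        raise ValueError("hnsw supports only cosine and l2")
    if not 2 <= m <= 100:
        raise ValueError("m is outside the range this guide calls reasonable (2-100)")
    if not 2 <= ef_construction <= 1024:
        raise ValueError("ef_construction must be in [2, 1024]")
    properties = {}
    for name, (field_type, dims) in fields.items():
        if field_type not in VECTOR_TYPES:
            raise ValueError("unknown vector field type: " + field_type)
        if not 2 <= dims <= 2048:
            raise ValueError("dims must be between 2 and 2048")
        properties[name] = {"type": field_type, "dims": dims}
    return {
        "settings": {"index": {
            "codec": "bpack_knn_hnsw",
            "bpack.knn.hnsw.space": space,
            "bpack.knn.hnsw.m": m,
            "bpack.knn.hnsw.ef_construction": ef_construction,
        }},
        "mappings": {"properties": properties},
    }

body = build_knn_index_body({
    "field1": ("bpack_vector", 2),
    "field2": ("bpack_knn_vector", 2),
})
print(json.dumps(body, indent=2))
```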

          Write and query data

          Write data

          Write documents to the test-index index created above through the _doc endpoint. For example:

          POST /test-index/_doc/ 
          { 
              "field1" : [6.5, 2.5], 
              "field2" : [6.5, 2.5], 
              "price" : 10 
          } 

          Here, field1 is the bpack_vector field defined above, field2 is the bpack_knn_vector field, and price stands for any other ordinary field.
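For more than a handful of documents, the standard Elasticsearch _bulk API (not covered in this guide) is the usual way to write. The sketch below builds a newline-delimited bulk body and checks each vector length against the mapped dims first. The helper name is illustrative, and it is an assumption that the plug-in's vector fields accept bulk writes like any other field.

```python
import json

def build_bulk_body(index, docs, vector_dims):
    """Build an NDJSON body for POST /_bulk. `vector_dims` maps vector field -> dims."""
    lines = []
    for doc in docs:
        for field, dims in vector_dims.items():
            if len(doc[field]) != dims:
                raise ValueError("%s must have %d values" % (field, dims))
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                            # document source line
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline

docs = [
    {"field1": [6.5, 2.5], "field2": [6.5, 2.5], "price": 10},
    {"field1": [1.0, 5.0], "field2": [1.0, 5.0], "price": 25},
]
bulk_body = build_bulk_body("test-index", docs, {"field1": 2, "field2": 2})
print(bulk_body)
```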

          After the data is indexed, you can query it as follows:

          linear query

          The linear algorithm can query fields of type bpack_knn_vector as well as fields of type bpack_vector. The following example queries field1, which is of type bpack_vector.

          POST /test-index/_search 
          { 
             "query": { 
                "script_score": { 
                   "query": { 
                      "match_all": {} 
                   }, 
                   "script": { 
                      "source": "bpack_knn_script", 
                      "lang": "knn", 
                      "params": { 
                         "space": "cosine", 
                         "field": "field1", 
                         "vector": [3.5, 2.5] 
                      } 
                   } 
                } 
             }, 
             "size": 100 
          } 
          Or 
          POST /test-index/_search 
          { 
            "query": { 
              "function_score": { 
                "boost_mode": "replace", 
                "script_score": { 
                  "script": { 
                    "source": "bpack_knn_script", 
                    "lang": "knn", 
                    "params": { 
                      "space": "cosine", 
                      "field": "field1", 
                      "vector": [3.5, 2.5] 
                    } 
                  } 
                } 
              } 
            }, 
            "size": 100 
          } 

          The query parameters are as follows:

          | Parameter | Description | Default value |
          |---|---|---|
          | source | The computing method. Set it to bpack_knn_script here. | None (required) |
          | space | Distance algorithm. The linear algorithm supports three distance algorithms: Cosine distance (cosine), Dot product (dot_prod), and Euclidean distance (l2). | cosine |
          | field | Vector field name. | None (required) |
          | vector | Query vector as a float array. The array length must match the dims specified in the field mapping when the index was created. | None (required) |
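The parameter table above can be turned into a small request builder. This is a sketch: the helper name is made up, and it only reproduces the script_score form of the linear query shown earlier while rejecting a space value the table does not list.

```python
LINEAR_SPACES = {"cosine", "dot_prod", "l2"}

def build_linear_query(field, vector, space="cosine", size=100):
    """Build the POST /<index>/_search body for a linear (script_score) query."""
    if space not in LINEAR_SPACES:
        raise ValueError("space must be one of cosine, dot_prod, l2")
    return {
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "bpack_knn_script",
                    "lang": "knn",
                    "params": {"space": space, "field": field, "vector": list(vector)},
                },
            }
        },
        "size": size,
    }

linear_query = build_linear_query("field1", [3.5, 2.5])
print(linear_query)
```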

          hnsw query

          To query with hnsw, the index must set index.codec to bpack_knn_hnsw, and the vector field being queried must be mapped as type bpack_knn_vector. The following example queries field2, which is of type bpack_knn_vector.

          POST /test-index/_search 
          { 
              "size" : 10, 
              "query": { 
                  "knn": { 
                      "field2": { 
                          "vector": [3, 4], 
                          "k": 2, 
                          "ef": 512 
                      } 
                  } 
              } 
          } 

          The query parameters are as follows:

          | Parameter | Description | Default value |
          |---|---|---|
          | vector | Query vector as a float array. The array length must match the dims specified in the field mapping when the index was created; otherwise the results may be wrong. | None (required) |
          | k | Number of nearest neighbors to retrieve in the hnsw algorithm. Must be a positive integer. | None (required) |
          | ef | Size of the dynamic list of nearest neighbors scanned during the search. The higher the value, the more accurate but the slower the query. The value range is [2,1024]. | 512 |
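Because a vector of the wrong length may return wrong results rather than an error, it can help to validate the request on the client. The sketch below builds the knn query body from the example and enforces the constraints in the table above. The helper name and the optional dims argument are illustrative.

```python
def build_hnsw_query(field, vector, k, ef=512, size=10, dims=None):
    """Build the POST /<index>/_search body for an hnsw (knn) query."""
    if not isinstance(k, int) or k <= 0:
        raise ValueError("k must be a positive integer")
    if not 2 <= ef <= 1024:
        raise ValueError("ef must be in [2, 1024]")
    if dims is not None and len(vector) != dims:
        raise ValueError("vector length must equal the mapped dims")
    return {
        "size": size,
        "query": {"knn": {field: {"vector": list(vector), "k": k, "ef": ef}}},
    }

hnsw_query = build_hnsw_query("field2", [3, 4], k=2, dims=2)
print(hnsw_query)
```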

          Parameter Optimization

          Index level parameter

          Index-level settings can only be specified when an index is created. If a setting is not provided, its default value is used. These settings are static, which means they can't be modified after the index has been created. The parameters are described below:

          | Parameter | Description | Default value |
          |---|---|---|
          | bpack.knn.hnsw.m | Number of two-way links created for each new element during index building. The reasonable range of m is 2-100. It mainly affects memory use, storage consumption, and accuracy: a higher m means more memory and storage, slower index building, and higher accuracy. It is recommended to choose the value based on (vector dimension * 1.5) to guarantee performance. Values of 12-48 satisfy most scenarios. | 16 |
          | bpack.knn.hnsw.space | Distance algorithm used for vector retrieval. hnsw supports two distance algorithms: Cosine distance (cosine) and Euclidean distance (l2). | cosine |
          | bpack.knn.hnsw.ef_construction | Size of the dynamic list of nearest neighbors scanned during index building. The higher the value, the more accurate the query but the slower the index building. The value range is [2,1024]. | 512 |

          Cluster level parameters

          Common parameters

          | Parameter | Description | Default value |
          |---|---|---|
          | bpack.knn.hnsw.index_thread_qty | Number of threads HNSW may use to build graphs. By default, nmslib sets this value to the number of cores n. However, Elasticsearch itself can create n indexing threads. If each indexing thread calls nmslib to build graphs and nmslib starts n threads each time, up to n^2 threads may run simultaneously and drive CPU utilization to 100%. The default is therefore 1. The value range is [1,32]. | 1 |

          Cache settings

          Cache parameters of the linear algorithm

          | Parameter | Description | Default value |
          |---|---|---|
          | bpack.knn.memory.cache.limit | Maximum capacity of the cache. When loading data into the cache would exceed this limit, eviction is triggered. The value can be a percentage, which is taken as a percentage of jvm memory, or a value with a storage unit, such as "10kb", "10mb", or "3g". Fractional values such as "1.5g" are not recommended. | 10% |
          | bpack.knn.memory.cache.expiry.time | Data that is not accessed for this duration is cleared from the cache. The value uses the TimeUnit format, such as "10s", "10m", or "3h". Fractional values such as "1.5h" are not allowed. In general, set this value to more than 30 minutes so that later queries can still hit the cache; if the value is too small, cached data is cleared too quickly. | 30m |

          Cache parameters of the hnsw algorithm

          | Parameter | Description | Default value |
          |---|---|---|
          | bpack.knn.cache.item.expiry.time | Data that is not accessed for this duration is cleared from the cache. The value uses the TimeUnit format, such as "10s", "10m", or "3h". Fractional values such as "1.5h" are not allowed. In general, set this value to more than 30 minutes so that later queries can still hit the cache; if the value is too small, cached data is cleared too quickly. | 180m |

          Settings of Circuit Breaker

          The hnsw algorithm consumes a large amount of off-heap memory. If it consumes too much, the page cache available to Elasticsearch/Lucene becomes insufficient and cluster performance declines. To avoid this, you can configure the Circuit Breaker to limit off-heap memory consumption. Currently, when memory usage reaches the configured breaker limit, the eviction mechanism is triggered to evict rarely used cache items.

          | Parameter | Description | Default value |
          |---|---|---|
          | bpack.knn.memory.circuit_breaker.limit | Maximum capacity of the hnsw cache. When the hnsw cache exceeds this limit, eviction is triggered and the circuit_breaker_triggered status is set to true (it can be read through the statistics API). The value can be a percentage, which is taken as a percentage of the server memory remaining after the Elasticsearch jvm is excluded, or a value with a storage unit, such as "10kb", "10mb", or "3g". Fractional values such as "1.5g" are not recommended. For example, if a machine has 100GB of memory and the Elasticsearch jvm uses 32GB, the default limit is 60% * (100 - 32) = 40.8GB. | 60% |
          | bpack.knn.circuit_breaker.unset.percentage | Percentage threshold at which the Circuit Breaker is reset. When the cache usage falls below bpack.knn.circuit_breaker.unset.percentage, the Circuit Breaker is no longer triggered and the circuit_breaker_triggered status is set to false (it can be read through the statistics API). | 75 |
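The arithmetic in the limit example can be reproduced with a few lines. The first function follows the formula given in the table. The second treats the unset percentage as a fraction of the breaker limit, which is an interpretation of the description rather than something this guide states explicitly. Function names are illustrative.

```python
def breaker_limit_gb(total_memory_gb, jvm_heap_gb, limit_percent=60):
    """Breaker limit: a percentage of the memory left after the Elasticsearch jvm."""
    return (total_memory_gb - jvm_heap_gb) * limit_percent / 100

def breaker_unset_gb(limit_gb, unset_percent=75):
    """ASSUMPTION: the unset percentage is applied to the breaker limit."""
    return limit_gb * unset_percent / 100

limit = breaker_limit_gb(100, 32)   # the 100GB machine with a 32GB jvm from the table
unset_at = breaker_unset_gb(limit)
print(round(limit, 1), round(unset_at, 1))
```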

          Example

          PUT /_cluster/settings 
          { 
              "persistent" : { 
                  "bpack.knn.hnsw.index_thread_qty" : 1, 
                  "bpack.knn.cache.item.expiry.time": "15m", 
                  "bpack.knn.memory.cache.limit": "1g", 
                  "bpack.knn.memory.cache.expiry.time":"10m", 
                  "bpack.knn.memory.circuit_breaker.limit" : "55%", 
                  "bpack.knn.circuit_breaker.unset.percentage": 23 
              } 
          } 
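When these settings are generated by a script, the value formats described above can be checked before the request is sent. The sketch below is deliberately stricter than the guide: it accepts only the units the guide names (kb, mb, g, % and s, m, h) and rejects fractional values outright, even where the guide merely advises against them. The helper name is illustrative.

```python
import re

SIZE_OR_PERCENT = re.compile(r"^\d+(kb|mb|g|%)$")  # whole numbers only, e.g. "1g", "55%"
DURATION = re.compile(r"^\d+(s|m|h)$")             # whole numbers only, e.g. "15m"

def build_cluster_settings(thread_qty=1, hnsw_expiry="180m", linear_cache_limit="10%",
                           linear_expiry="30m", breaker_limit="60%", breaker_unset=75):
    """Build the PUT /_cluster/settings body after basic format checks."""
    if not 1 <= thread_qty <= 32:
        raise ValueError("index_thread_qty must be in [1, 32]")
    for value in (hnsw_expiry, linear_expiry):
        if not DURATION.match(value):
            raise ValueError("invalid duration: " + value)
    for value in (linear_cache_limit, breaker_limit):
        if not SIZE_OR_PERCENT.match(value):
            raise ValueError("invalid size or percentage: " + value)
    return {"persistent": {
        "bpack.knn.hnsw.index_thread_qty": thread_qty,
        "bpack.knn.cache.item.expiry.time": hnsw_expiry,
        "bpack.knn.memory.cache.limit": linear_cache_limit,
        "bpack.knn.memory.cache.expiry.time": linear_expiry,
        "bpack.knn.memory.circuit_breaker.limit": breaker_limit,
        "bpack.knn.circuit_breaker.unset.percentage": breaker_unset,
    }}

settings = build_cluster_settings(hnsw_expiry="15m", linear_cache_limit="1g",
                                  linear_expiry="10m", breaker_limit="55%", breaker_unset=23)
print(settings)
```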

          Query the statistics with the following requests:

          GET /_bpack/_knn/stats 
          GET /_bpack/_knn/nodeId1,nodeId2/stats/statName1,statName2 

          The result example is as below:

          {
             "_nodes": {
                "total": 1,
                "successful": 1,
                "failed": 0
             },
             "cluster_name": "my-application",
             "circuit_breaker_triggered": false,
             "nodes": {
                "HYMrXXsBSamUkcAjhjeN0w": {
                   "eviction_count": 0,
                   "miss_count": 1,
                   "graph_memory_usage_kb": 1,
                   "cache_capacity_reached": false,
                   "load_exception_count": 0,
                   "hit_count": 0,
                   "load_success_count": 1,
                   "total_load_time_nanos": 2878745
                }
             }
          }

          Cluster status parameter:

          | Parameter | Description |
          |---|---|
          | circuit_breaker_triggered | Whether the circuit breaker is triggered. If any node in the cluster evicts items from the cache because the cache capacity has been reached, the circuit breaker is triggered. When the cache usage falls below bpack.knn.circuit_breaker.unset.percentage, the circuit breaker is reset. |

          Node status parameters:

          | Parameter | Description |
          |---|---|
          | eviction_count | Number of evictions that have occurred in the Guava cache (evictions caused by index deletion are not counted). |
          | hit_count | Number of cache hits on the node. |
          | miss_count | Number of cache misses on the node. |
          | graph_memory_usage_kb | Total size of the graphs cached in the memory of the local machine, in KB. |
          | cache_capacity_reached | Whether the cache capacity of this node has been reached. |
          | load_exception_count | Number of exceptions that occurred while loading into the cache. |
          | load_success_count | Number of successful loads into the cache. |
          | total_load_time_nanos | Total time spent loading into the cache, in nanoseconds. |
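A few derived figures make the raw counters easier to read. The sketch below works on a parsed stats response shaped like the result example. The derived metric names are made up, and dividing the total load time by successful plus failed loads is an assumption about what the counter covers.

```python
def summarize_knn_stats(stats):
    """Derive per-node hit ratio, graph memory in MB, and average load time in ms."""
    summary = {}
    for node_id, node in stats["nodes"].items():
        lookups = node["hit_count"] + node["miss_count"]
        # ASSUMPTION: total_load_time_nanos covers successful and failed loads.
        loads = node["load_success_count"] + node["load_exception_count"]
        summary[node_id] = {
            "hit_ratio": node["hit_count"] / lookups if lookups else 0.0,
            "graph_memory_mb": node["graph_memory_usage_kb"] / 1024,
            "avg_load_ms": node["total_load_time_nanos"] / loads / 1e6 if loads else 0.0,
        }
    return summary

# Values copied from the result example.
sample = {
    "circuit_breaker_triggered": False,
    "nodes": {"HYMrXXsBSamUkcAjhjeN0w": {
        "eviction_count": 0, "miss_count": 1, "graph_memory_usage_kb": 1,
        "cache_capacity_reached": False, "load_exception_count": 0,
        "hit_count": 0, "load_success_count": 1, "total_load_time_nanos": 2878745,
    }},
}
report = summarize_knn_stats(sample)
print(report)
```

A hit ratio that stays low after warm-up, together with a rising eviction_count, suggests that the cache limit or expiry time is too small for the working set.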

          Performance Comparison

          • Memory configuration: 30G
          • CPU configuration: 56 logical cores (2 physical CPUs with 14 cores each)
          • Elasticsearch Node: Single node

          The performance comparison results are as below:

          | Data size | Index parameters | Cluster parameters | Top30 recall rate | Average time consumption of hnsw | Average time consumption of linear |
          |---|---|---|---|---|---|
          | 1 million 32-dimensional vectors, 1 shard | "bpack.knn.hnsw.space": "cosine", "bpack.knn.hnsw.m": 16, "bpack.knn.hnsw.ef_construction": 300 | "bpack.knn.cache.item.expiry.time": "1h", "bpack.knn.memory.cache.limit": "15g", "bpack.knn.memory.cache.expiry.time": "1h", "bpack.knn.memory.circuit_breaker.limit": "70%" | 99.97% | 12.96ms | 134.96ms |
          | 10 million 32-dimensional vectors, 1 shard | "bpack.knn.hnsw.space": "cosine", "bpack.knn.hnsw.m": 16, "bpack.knn.hnsw.ef_construction": 600 | "bpack.knn.cache.item.expiry.time": "1h", "bpack.knn.memory.cache.limit": "15g", "bpack.knn.memory.cache.expiry.time": "1h", "bpack.knn.memory.circuit_breaker.limit": "70%" | 99.97% | 24.69ms | 1209.13ms |
          | 10 million 32-dimensional vectors, 16 shards | "bpack.knn.hnsw.space": "cosine", "bpack.knn.hnsw.m": 48, "bpack.knn.hnsw.ef_construction": 600 | "bpack.knn.cache.item.expiry.time": "1h", "bpack.knn.memory.cache.limit": "15g", "bpack.knn.memory.cache.expiry.time": "1h", "bpack.knn.memory.circuit_breaker.limit": "70%" | 99.99% | 20.26ms | 609.56ms |

          Algorithm Summary

          • Applicable scenarios of the linear algorithm:

            • Small data volume (a single shard usually holds fewer than 1 million vectors);
            • A normal search filter is executed first, and the vector retrieval computation is then run on the filtered result set;
            • A 100% recall rate is required, and query performance slower than hnsw is acceptable.
          • Applicable scenarios of the hnsw algorithm:

            • Relatively large data volume (cluster data volume in the tens of millions);
            • Vector retrieval and other filtering are performed together. In this case, increase the hnsw query parameter k appropriately so that enough documents satisfying the filter conditions take part in the computation;
            • High query performance is required, and a recall rate of 90% or above is sufficient.
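The "filter first, then compute" pattern for the linear algorithm can be sketched by replacing match_all in the script_score query with a filter. This assumes the plug-in's script scores only the documents matched by the inner query, as a standard Elasticsearch script_score query does. The price filter and the helper name are illustrative.

```python
def build_filtered_linear_query(filter_clauses, field, vector, space="cosine", size=100):
    """Linear query whose inner query restricts the documents that get scored."""
    return {
        "query": {
            "script_score": {
                # Only documents passing the filter reach the vector computation.
                "query": {"bool": {"filter": list(filter_clauses)}},
                "script": {
                    "source": "bpack_knn_script",
                    "lang": "knn",
                    "params": {"space": space, "field": field, "vector": list(vector)},
                },
            }
        },
        "size": size,
    }

filtered_query = build_filtered_linear_query(
    [{"range": {"price": {"lte": 20}}}], "field1", [3.5, 2.5])
print(filtered_query)
```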

          Best Practices

          • After writing, it is recommended to run forceMerge regularly during off-peak business hours to reduce query latency.
          • When querying with the linear algorithm, set the "bpack.knn.memory.cache.limit" parameter according to the data volume. For example, if a node holds 10G of data and the default value of "bpack.knn.memory.cache.limit" is used (for the computation 2 type, the default is 30G*10%=3G), the cache cannot hold the data. Bulk queries may then trigger the Elasticsearch circuit breaker, and a circuitBreakingException error is reported.
          • When a vector index is built for a large data volume, the build may be slow. Before writing data, you can adjust "bpack.knn.hnsw.index_thread_qty" according to the number of shards and the CPU cores of the node. For example, with 10 million vectors, 1 node, 2 shards, and a 16-core CPU on the node, setting "bpack.knn.hnsw.index_thread_qty" to 4-6 improves build efficiency (setting it to 8 can fully load the CPU and put the production environment at risk).

            Note that a higher "bpack.knn.hnsw.index_thread_qty" may start too many threads during the build. In a cluster that is already heavily loaded, adjusting this parameter is not recommended, to avoid fully loading the cluster. If writing and building vectors is slow, you can speed up the build by temporarily reducing the cluster load (fewer other writes and queries) and raising "bpack.knn.hnsw.index_thread_qty", then set it back to 1 after the build finishes.

          • When the written data volume is 10 million vectors (about 10G, for example) on 1 node with 1 shard, and the node is a computation 2 type (16-core CPU and 64G memory), the recommended settings are:

            PUT /_cluster/settings 
            { 
                "persistent" : { 
                    "bpack.knn.hnsw.index_thread_qty" : 1, 
                    "bpack.knn.cache.item.expiry.time": "1h", 
                    "bpack.knn.memory.cache.limit": "12g", 
                    "bpack.knn.memory.cache.expiry.time":"1h", 
                    "bpack.knn.memory.circuit_breaker.limit" : "70%" 
                } 
            } 

          Analysis:

          1. "bpack.knn.hnsw.index_thread_qty" : 1: Generally, it is recommended to set it as 1; when the index building is too slow, you may appropriately adjust this parameter by reference to the recommendations above.
          2. "bpack.knn.cache.item.expiry.time": "1h": You can set the timeout according to your own business.
          3. "bpack.knn.memory.cache.limit": "12g": The data volume is about 10G. The cache should accommodate all the data.
          4. "bpack.knn.memory.cache.expiry.time":"1h": You can set the timeout according to your own business.
          5. "bpack.knn.memory.circuit_breaker.limit" : "70%": The default jvm memory of a computation 2 type Elasticsearch node is 30G, so the limit is 70%*(64-30)=23.8G, which is enough to hold the off-heap memory occupied by the data.
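The thread-count guidance in the best practices above (16 cores and 2 shards: 4-6 threads is reasonable, 8 saturates the CPU) can be expressed as a rough heuristic. This is an illustrative rule of thumb and not a formula from the plug-in: it assumes each shard on the node builds its graph with index_thread_qty threads, and the 75% headroom factor is chosen only to reproduce the example.

```python
def saturating_thread_qty(cpu_cores, shards_on_node):
    """Thread count per shard at which all cores would be busy."""
    return max(1, cpu_cores // shards_on_node)

def suggested_thread_qty(cpu_cores, shards_on_node, headroom=0.75):
    """HYPOTHETICAL heuristic: stay at or below 75% of saturation, within [1, 32]."""
    capped = int(saturating_thread_qty(cpu_cores, shards_on_node) * headroom)
    return min(32, max(1, capped))

# The example from the best practices: a 16-core node holding 2 shards.
full_load = saturating_thread_qty(16, 2)
suggested = suggested_thread_qty(16, 2)
print(full_load, suggested)
```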

          FAQ

          • Q: How is the recall rate defined?

            A: Query with the same vector in both query modes and compare the documents recalled by each. The recall rate for that vector is the ratio of the documents that appear in both results to the documents recalled. We use the recall rate to characterize query accuracy.

          • Q: After a write succeeds, why might the number of indexed documents not increase or not reach the written volume, and why might queries fail?

            A: The vector index is built during the refresh or flush period. Although the write has completed, vector index building tasks may still be running in the background.

          • Q: How to install a vector retrieval plug-in?

            A: Newly created 7.4.2 clusters come with the vector retrieval plug-in. For an existing cluster that needs the plug-in, contact customer service for help with installation.
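The recall-rate definition in the first FAQ answer can be sketched as follows. The sketch takes the linear result as the reference set, on the grounds that the linear algorithm has a 100% recall rate. That choice, the function name, and the document ids are assumptions for illustration.

```python
def recall_rate(linear_ids, hnsw_ids):
    """Share of the linear (exact) results that the hnsw query also returned."""
    reference = set(linear_ids)
    if not reference:
        return 0.0
    return len(reference & set(hnsw_ids)) / len(reference)

# Hypothetical document ids returned for the same query vector.
linear_hits = ["d1", "d2", "d3", "d4"]
hnsw_hits = ["d1", "d3", "d4", "d9"]
rate = recall_rate(linear_hits, hnsw_hits)
print(rate)
```

Averaging this value over many query vectors gives a figure comparable to the Top30 recall rate reported in the performance comparison, where 30 documents are requested from each mode.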
