FAQs About the Elasticsearch System
The primary purpose of this document is as follows:
- To collect users' frequently asked questions.
- All operations in this document are performed in Sense, so you need to install the Sense plug-in in the Chrome browser.
How to check the plug-ins installed in ES
You can use the following API to list the plug-ins installed on each node.
GET /_cat/plugins
Error caused by full thread pool queue
The exceptions thrown by the ES in this scenario are as follows:
rejected execution of org.elasticsearch.transport.TransportService$4@c8998f4 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@553aee29 [Running, pool size = 4, active threads = 4, queued tasks = 50, completed tasks = 0]]
ES has many thread pools; index, search, and bulk are the three typical ones. If the system is under heavy pressure and the backend threads cannot keep up, the tasks submitted by users pile up in the thread pool queues. Once a queue reaches its upper limit, ES throws the corresponding exception. When such an error occurs, take the following two steps:
- Check the system's CPU and IO utilization. If both remain high, the system has hit a resource bottleneck, and you cannot avoid this error by tuning parameters alone. On Baidu AI Cloud, you can view the ES CPU utilization in the ES console or through ES commands. You can see the CPU utilization and load of each ES node with the following command:
GET /_cat/nodes?v
- If resources are not the problem, check the configuration of the relevant thread pool. For the error above, check the bulk thread pool configuration by executing the following command in Sense:
GET /_cluster/settings
The result is as follows:
"thread_pool": {
"bulk": {
"type": "fixed",
"size": "4",
"queue_size": "50"
}
}
The result indicates that the thread pool handling bulk tasks has 4 execution threads and a queue size of 50. In our experience this value is relatively small, so you can adjust it with the following operation:
For ES 5.5.0 clusters:
PUT /_cluster/settings
{
"persistent": {
"thread_pool.bulk.size": 32,
"thread_pool.bulk.queue_size": 300
}
}
For ES 6.5.3+ clusters:
PUT /_cluster/settings
{
"persistent": {
"thread_pool.write.size": 32,
"thread_pool.write.queue_size": 300
}
}
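To confirm whether the new settings have taken effect and whether tasks are still being rejected, you can watch the thread pool statistics with the _cat API; a minimal sketch (for 6.x+ clusters, replace bulk with write):
GET /_cat/thread_pool/bulk?v&h=node_name,name,size,queue_size,active,queue,rejected
If the rejected count keeps growing after the change, the cluster is resource-bound rather than mis-configured.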
Too Many Open Files error
If this error appears in the ES log, there are generally too many open files. ES recommends that the file handle limit be at least 65536. You can raise it by modifying /etc/security/limits.conf or by using the ulimit command. In ES, each shard is a separate Lucene index writer; each shard consists of multiple segments, and each segment contains multiple files. Thus, the number of open files = number of shards * number of segments * number of files per segment. We therefore suggest keeping the number of shards on a physical node to about 1,000 at most. In addition, Lucene can effectively reduce the number of files per segment by using the compound file format.
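To check the limit that the ES process is actually running with, you can query the process statistics; a minimal sketch (the filter_path parameter only trims the output):
GET /_nodes/stats/process?filter_path=**.open_file_descriptors,**.max_file_descriptors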
How large should an ES shard be?
Each ES shard is a Lucene index, and a Lucene index can store at most about 2 billion documents, so a shard can hold at most about 2 billion documents. In addition, the size of a shard should be between 10 GB and 50 GB. If a shard is too large, queries are slow and repairing a replica takes longer; if shards are too small, an index ends up with too many shards and queries fan out too widely.
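To see how large your shards currently are, you can list them with the _cat API:
GET /_cat/shards?v&h=index,shard,prirep,docs,store
The store column shows the on-disk size of each shard, which you can compare against the 10 GB to 50 GB guideline above.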
What to do when the cluster is red or yellow?
If the cluster is red, a primary shard in the cluster is unassigned; if it is yellow, a replica has failed to be assigned. We recommend using the following API to see why the shard cannot be assigned to a node.
GET _cluster/allocation/explain
In our experience, ES fails to assign a shard in the following circumstances:
- No node has enough storage space to hold the shard.
- If the shard is a replica, its primary shard may be unassigned or still initializing.
When a shard stays unassigned for a long time: ES retries the assignment of an unassigned shard 5 times, and if all attempts fail, it stops trying. In this case, you need to manually ask the cluster to retry handling the unassigned shard:
POST /_cluster/reroute?retry_failed=true
How to cancel a slow query?
A query sent by a user may make the cluster very slow and drive CPU utilization very high, so users sometimes want to cancel queries that occupy too many resources. Since ES 5.0, ES provides a command to cancel queries. ES encapsulates every execution unit as a task. You can view the list of tasks being executed on a node through the task API, and you can also cancel tasks with it. For example, to list all running tasks of the search type, use the following API:
GET /_tasks?actions=*search
Cancel all search tasks being executed:
POST _tasks/_cancel?actions=*search
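You can also cancel a single task by its ID instead of all search tasks; a minimal sketch, where node_id:task_number is a placeholder for an ID taken from the task list above:
POST /_tasks/node_id:task_number/_cancel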
For more usage, see the documentation on the official website.
PageCache plays a vital role in queries
We recommend giving ES as much page cache as possible if conditions permit; it can significantly improve query speed. If the page cache is insufficient, ES has to read the disk for every query (to fetch documents and posting lists), and the system slows down. You can use iostat to view the system's IO statistics, or search for "io_stats" in the information returned by GET _nodes/stats. If the IOPS stays high and the system's IO stays high, a possible cause is that the page cache is too small.
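For reference, a quick way to look at both signals; the first line is a shell command on the node, the second is a Sense request (the filter_path parameter only trims the output):
iostat -x 1 10
GET /_nodes/stats/fs?filter_path=**.io_stats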
Disable the privilege verification
Sometimes the original ES service of the business system does not use privilege verification, while the ES service on the cloud does. If you do not want to change code when migrating the business system, you can disable privilege verification for a smooth migration. The operation method is as follows:
PUT /_cluster/settings
{
"persistent": {
"simpleauth.enable":false
}
}
Supported client types
At present, the products on Baidu AI Cloud only support the HTTP-based RESTful API and do not support the TCP-based transport client API. The main reason is that the transport client is tightly bound to the running cluster version; during a cluster upgrade, the frontend services would have to be upgraded as well.
Does ES support reading and writing data with Spark and Hadoop?
Yes. You need to download the es-hadoop package from the ES official website and put it into Spark or Hadoop so that you can read and write ES data with Spark or Hadoop.
Several situations that cause JVM FULL GC
FullGC caused by Scroll
When users use scroll for paging queries or to export data, they often set a very long scroll timeout, e.g., 1 day. In this case, the ES backend keeps the corresponding search context for the scroll, and each search context holds a Lucene searcher. Because the searcher is not released, files that Lucene has already merged cannot be deleted, and some LeafReader and FST structures remain in the JVM for a long time. As search contexts accumulate, FullGC occurs. You can use the following 2 APIs to view and clear these contexts.
GET /_nodes/stats/indices/search
DELETE /_search/scroll/_all
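To avoid the problem in the first place, request a short keep-alive and renew it on every scroll call so that abandoned contexts expire quickly; a minimal sketch, with index_name as a placeholder:
POST /index_name/_search?scroll=1m
{
"size": 1000,
"query": { "match_all": {} }
}
Each follow-up POST /_search/scroll request should pass "scroll": "1m" again together with the returned scroll_id.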
FullGC caused by query
The user sets a very large from+size for the result set when querying, e.g., size=Integer.MAX_VALUE. ES currently allocates a priority queue according to the requested from+size. Under high concurrency, many such large queues cause memory allocation to fail, resulting in FULL GC or even OOM.
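One way to guard against oversized from+size requests is the index.max_result_window setting, which rejects such requests instead of allocating huge queues; a minimal sketch, with index_name as a placeholder (10000 is the ES default):
PUT /index_name/_settings
{
"index.max_result_window": 10000
}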
FullGC caused by aggregation
If the field has many distinct values when the user runs something like a terms aggregation, it generates many buckets, such as tens of millions of buckets. These buckets all live in memory and cause FullGC.
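If the aggregation only needs the top values, setting an explicit, small size on the terms aggregation keeps the bucket count bounded; a minimal sketch, with index_name and field_name as placeholders:
GET /index_name/_search
{
"size": 0,
"aggs": {
"top_values": {
"terms": { "field": "field_name", "size": 100 }
}
}
}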
How to improve import performance?
Reduce the number of replicas and extend the refresh interval
PUT /index_name/_settings
{
"index.number_of_replicas": 0,
"index.refresh_interval": "10s"
}
The multi-replica mechanism of ES sends the original JSON document to multiple replicas during writing, and each replica performs tokenization, index building, and other operations separately. Since importing is CPU-intensive, reducing the number of replicas to 0 lowers the CPU utilization. After the import is complete, restore the number of replicas; the replicas are then rebuilt quickly by copying physical files directly.
The refresh interval controls how long data stays in memory before being flushed out into a segment. ES merges the flushed-out segments, and if merging cannot keep up, ES may throttle or even block writing. Extending the refresh interval produces larger segments, reduces the merge frequency, and improves import performance.
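After the import finishes, remember to restore the settings; a minimal sketch, assuming the index originally used 1 replica and the default 1s refresh interval:
PUT /index_name/_settings
{
"index.number_of_replicas": 1,
"index.refresh_interval": "1s"
}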
Increase the import speed limit for the index
PUT /_cluster/settings
{
"persistent" : {
"indices.store.throttle.max_bytes_per_sec" : "200mb"
}
}
ES limits the speed of writing data to disk to avoid occupying too much disk IO. If the cluster imports a large volume of data and serves only a few queries, you can raise this limit.
Cluster configuration problem
- Oracle JDK version 1.8 or above is required.
- Set the maximum number of open files by modifying /etc/security/limits.conf:
* soft nofile 65536
* hard nofile 65536
- Increase the mmap count by modifying /etc/sysctl.conf:
vm.max_map_count=262144
Then execute: sysctl -p
Cluster restart problem
In some cases (such as modifying the configuration file), you need to restart the cluster. You can restart the nodes one by one or restart the whole cluster at once. Restarting ES may cause data to be redistributed. The following describes how to restart the service in these two cases.
Restart of the whole cluster
- Set the whole cluster to the read-only state
PUT /_cluster/settings
{
"persistent": {
"cluster.blocks.read_only":true
}
}
- Flush all data in the node memory to the disk.
POST /_flush/synced
- Restart all es nodes
- After the cluster is green, modify the cluster to the writable state.
PUT /_cluster/settings
{
"persistent": {
"cluster.blocks.read_only":false
}
}
Restart one by one
This method does not interrupt the service during the restart and is applicable to online services.
- Disable shard allocation first, so that shards are not redistributed while a single ES node is stopped.
PUT /_cluster/settings
{
"transient" : {
"cluster.routing.allocation.enable" : "none"
}
}
- Stop a single node, modify the configuration or replace the jar package, and then start the node.
- Start the shard redistribution.
PUT /_cluster/settings
{
"transient" : {
"cluster.routing.allocation.enable" : "all"
}
}
- After the cluster becomes green (you can check with the health API shown below), repeat Steps 1-3 until the configuration of all nodes has been modified.
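Between iterations, you can confirm that the cluster has returned to green with the health API:
GET /_cluster/health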
Disable _field_names
The _field_names field is an internal metadata field of Elasticsearch. It indexes the name of each field in the document (except fields whose value is null), mainly to support the Elasticsearch exists query. Elasticsearch only indexes this field and does not store it. Since version 6.3, this field only indexes fields that have doc_values and norms disabled. If your business does not use the exists query, we recommend disabling this field; doing so slightly reduces the storage occupied by the inverted index and improves page cache utilization.
PUT index
{
"mappings": {
"_doc": {
"_field_names": {
"enabled": false
}
}
}
}
Several cases where the data import gets slower and slower
The imported data contains updates
An update in ES actually reads the existing document, modifies it, and writes it back. As more and more data is written, these read-and-write operations become slower and slower.
Control the number of indexes on nodes
By default, the ES cluster tries to balance the number of indexes and shards across all nodes, which may cause too many shards of one index to be concentrated on a few nodes. In this case, you can limit how many shards of an index each node in the cluster stores:
PUT {index name}/_settings
{
"index.routing.allocation.total_shards_per_node": 10
}
Control the number of shards and replicas of the index
Unless you modify the parameters, an index has 5 shards, and each shard has 2 copies in total (including the primary shard). You can control this with the index parameters:
PUT /{index name}
{
"settings": {
"number_of_shards": 20,
"number_of_replicas": 2
}
}
number_of_shards: The number of shards. It cannot be modified after the index is created, so you must specify it at creation time.
number_of_replicas: The number of replicas, excluding the primary shard.
The recovery speed may be slow when the cluster is in a recovery state
You can view the recovering index shards through
GET /_recovery?active_only=true
By default, 4 shards are recovered simultaneously on one node: 2 when the node acts as a recovery source and 2 when it acts as a target. When the number of shards is huge, recovery may be slow. By default, the recovery speed is limited to 40 MB/s. In this case, you can adjust the cluster settings:
curl -XPUT "host:port/_cluster/settings" -H 'Content-Type: application/json' -d'
{
"transient": {
"cluster.routing.allocation.node_concurrent_recoveries": 8,
"indices.recovery.max_bytes_per_sec": "120mb"
}
}'
indices.recovery.max_bytes_per_sec: Means the maximum bandwidth for node recovery. This setting should be smaller than the current network bandwidth to avoid affecting other network services.
cluster.routing.allocation.node_concurrent_recoveries: Means the maximum number of concurrent recoveries when the node is a source node or target node.
How to recover after the disk is full
When the disk usage of an ES data node reaches a certain threshold (95%), ES blocks further writes and adds a block to all indexes. When the user continues to write, they receive the following error.
cluster_block_exception [FORBIDDEN/12/index read-only / allow delete (api)];
In this case, the user must release the disk space to solve the problem. There are two ways to release the disk space:
- Delete the unused Index
- Reduce the number of replicas of the index, for example, from 2 to 1 (the sketch below helps identify the largest indexes).
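To decide which index to delete or shrink, the _cat APIs show the disk usage per node and the size of each index; a minimal sketch:
GET /_cat/allocation?v
GET /_cat/indices?v&h=index,pri,rep,store.size,pri.store.size
Comparing store.size with pri.store.size shows how much space the replicas of each index account for.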
After disk space is released, ES does not remove the block automatically, so the user still cannot write data. You need to execute the following command:
curl -XPUT "host:port/_all/_settings" -H 'Content-Type: application/json' -d '
{
"index.blocks.read_only_allow_delete": null
}'