Assess the Cluster Resources
Last Updated: 2020-07-20
Evaluate the disk storage capacity
When estimating the disk storage capacity, consider mainly the following aspects:
- Index expansion: When data is written to ES, ES stores the original data by default and builds an index on each field, so the data volume in ES expands relative to the original data. However, the data stored in ES is compressed, which in our observation offsets the index expansion, so index expansion generally does not affect capacity. The expansion rate varies with the ES configuration parameters, the most relevant being `_all` and `_source`. At present, `_all` is disabled by default and `_source` is enabled. If you do not need to retrieve the original document content, you can also disable `_source` to reduce disk usage.
- ES internal consumption: ES periodically merges segments in the background to reduce the segment count and reclaim the space occupied by deleted documents. During a merge, the original segments are retained while the merged data is written anew, so the data volume temporarily doubles; however, ES does not merge all segments at once, only a selected subset.
- Emergency consumption: When a node goes down, ES redistributes its data to the other nodes, so each node needs spare space to receive that data. The space reserved for merges can also serve this emergency consumption, so points 2 and 3 can be accounted for together. A disk usage rate of 70% is already relatively high; an alarm is triggered when usage exceeds 70%.
- Number of replicas: each replica stores a full copy of the data.
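As noted in the first point, `_source` can be disabled when the original documents do not need to be retrieved. A minimal sketch of such an index mapping (the index and field names here are made up for illustration; disabling `_source` itself is a standard ES mapping option):

```python
import json

# Hypothetical index body; "logs-demo" and its fields are illustrative.
index_body = {
    "mappings": {
        "_source": {"enabled": False},  # do not store the original documents
        "properties": {
            "message": {"type": "text"},
            "level": {"type": "keyword"},
        },
    }
}

# This body would be sent with: PUT /logs-demo (via curl or an ES client).
print(json.dumps(index_body, indent=2))
```

Note that with `_source` disabled, the original document can no longer be returned by queries, and features that rely on it (such as reindex and update) become unavailable.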
Considering the four points above, the required disk space is: disk space = original data volume × 1.0 × number of data copies / 0.7
Except for testing, we suggest keeping at least 2 data copies. In this case, disk space ≈ original data volume × 2.8. In most cases, you can estimate the number of nodes from this disk space figure combined with the specification of the selected package.
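The formula above can be sketched as a small helper (the function name and defaults are illustrative; `copies` is taken to mean the total number of data copies, matching the ×2.8 figure above):

```python
def estimated_disk_space(original_gb, copies=2, expansion=1.0, max_usage=0.7):
    """Disk space = original data volume x expansion x number of copies / 0.7.

    original_gb: size of the source data in GB
    copies:      total data copies kept in the cluster
    expansion:   index expansion rate (1.0 assumed, per the text above)
    max_usage:   target maximum disk usage rate (70%)
    """
    return original_gb * expansion * copies / max_usage

# 1,000 GB of source data with 2 copies needs roughly 2,857 GB of disk,
# i.e. the "x2.8" rule of thumb (2 / 0.7 is about 2.86).
print(estimated_disk_space(1000))
```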
Evaluate the shards
- Each shard maintains an independent Lucene index; indexes are not merged across shards. A Lucene index keeps a large amount of information in JVM memory, such as the FST, the DocValues index metadata, and the FieldInfos, and this information consumes a large share of the JVM heap. Because of the JVM's pointer-compression feature, ES suggests keeping the JVM heap below 30 GB. Subject to this restriction, we suggest keeping the number of shards on a node below 500, and never exceeding 1,000.
- Oversized shards migrate very slowly between nodes under abnormal conditions, so recovery after a shutdown takes a long time, and queries against a big shard have much higher latency. To achieve sound query performance, keep the shard size between 10 GB and 30 GB, and never exceed 100 GB.
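The two sizing rules above can be sketched together (function names and the exact thresholds chosen as defaults are illustrative, taken from the guidance in this section):

```python
import math

def plan_primary_shards(index_size_gb, target_shard_gb=30):
    """Pick a primary shard count so each shard stays at or below ~30 GB."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))

def shards_per_node_within_limit(total_shards, nodes, soft_limit=500):
    """Check the per-node shard count against the suggested limit of 500."""
    per_node = math.ceil(total_shards / nodes)
    return per_node <= soft_limit

# A 200 GB index -> 7 primary shards of roughly 28.6 GB each.
print(plan_primary_shards(200))
```

For example, 900 total shards spread over 3 nodes gives 300 shards per node, which stays under the suggested per-node limit of 500.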