Basic Concept

Last Updated: 2020-09-22

Index

Elasticsearch stores data in one or more indexes. An index is a collection of documents with similar characteristics. In relational database terms, an index is comparable to a database or a schema in SQL. An index is identified by its name, which must be all lowercase, and you reference this name to create, search, update, and delete the documents in it. You can create as many indexes as you need in an Elasticsearch cluster.
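As an illustration of how the index name serves as the handle for every operation, the following minimal Python sketch checks the lowercase rule and builds the REST request line for a few common operations. The helper names (validate_index_name, request_line) and the index name user-logs are hypothetical. The sketch only builds strings and does not contact a cluster, and it checks only the lowercase rule, not every naming restriction a real cluster applies.

```python
def validate_index_name(name: str) -> str:
    """Return the name if it satisfies the lowercase rule, else raise ValueError.

    Only the lowercase rule is checked here; a real cluster enforces
    further restrictions on index names.
    """
    if not name or name != name.lower():
        raise ValueError(f"index name must be non-empty and lowercase: {name!r}")
    return name


def request_line(operation: str, index: str) -> str:
    """Build the HTTP method and path for an index-level operation."""
    index = validate_index_name(index)
    templates = {
        "create": "PUT /{index}",
        "search": "GET /{index}/_search",
        "delete": "DELETE /{index}",
    }
    return templates[operation].format(index=index)


if __name__ == "__main__":
    # Hypothetical index name, used for illustration only.
    for op in ("create", "search", "delete"):
        print(request_line(op, "user-logs"))
```

Validating the name before building the path means a bad name is caught early instead of surfacing as an error from the cluster.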

Type

A type is a logical partition within an index, and its meaning depends on user requirements, so the user can define one or more types within an index. Generally, a type is a predefined category for documents that share the same set of fields. For example, within one index you can define a type for user data, a type for log data, and a type for comment data. In relational database terms, a type is comparable to a table. However, Elasticsearch is phasing out the concept of type, and going forward an index contains only one type.

Document

The document is the atomic unit of indexing and search in Lucene. It is a container of one or more fields and is expressed in JSON format. Each field has a name and one or more values; a field with multiple values is usually called a multi-valued field. Each document can contain a different set of fields, but documents of the same type should have some similarities.
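A document of this shape might look like the following sketch, written as a Python dictionary and serialized to JSON. Every field name and value here is illustrative, and multi_valued_fields is a hypothetical helper that simply lists the entries holding more than one value.

```python
import json

# Hypothetical comment document; all names and values are illustrative.
comment_doc = {
    "author": "alice",                      # one name, one value
    "body": "Shards make scaling easier.",
    "tags": ["search", "scaling"],          # one name, several values
    "likes": 3,
}


def multi_valued_fields(doc: dict) -> list:
    """Return the names of entries that hold more than one value."""
    return [
        name
        for name, value in doc.items()
        if isinstance(value, list) and len(value) > 1
    ]


if __name__ == "__main__":
    print(json.dumps(comment_doc, indent=2))
    print(multi_valued_fields(comment_doc))
```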

Mapping

In Elasticsearch, all documents are analyzed before they are stored. Users can determine, according to their requirements, how text is split into tokens, which tokens are filtered out, which text needs additional processing, and so on. Elasticsearch also provides additional features, such as sorting by the contents of a field as needed. The mapping is where these choices are defined for each field. When no mapping is given for a field, Elasticsearch can automatically determine the field type based on its value.
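To make the idea of deriving a type from a value concrete, here is a simplified Python sketch. It is not Elasticsearch's actual implementation: infer_field_type and infer_mapping are hypothetical helpers, the rules cover only a few JSON value kinds, and details such as date detection in strings are deliberately left out.

```python
def infer_field_type(value) -> str:
    """Guess a type name from a JSON value (simplified illustration only)."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array is typed by its first element; an empty array gives no hint.
        return infer_field_type(value[0]) if value else "unknown"
    return "unknown"


def infer_mapping(doc: dict) -> dict:
    """Build a mapping-like structure with one inferred type per entry."""
    return {
        "properties": {
            name: {"type": infer_field_type(value)} for name, value in doc.items()
        }
    }


if __name__ == "__main__":
    # Hypothetical document used only to exercise the sketch.
    sample = {"author": "alice", "likes": 3, "active": True, "tags": ["a", "b"]}
    print(infer_mapping(sample))
```

An explicit mapping is still the safer choice when the analysis of a text field matters, because inference can only see the value, not the intended use.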

Cluster

An Elasticsearch cluster is a set of one or more nodes that together store the entire data set and provide federated indexing and search capabilities across all nodes. A cluster formed by multiple nodes has redundancy, so it can keep services available when one or more nodes fail. A cluster is identified by its unique name, which defaults to elasticsearch. A node determines which cluster to join by the cluster name, and a node can belong to only one cluster. Redundancy aside, an Elasticsearch cluster with only one node still provides all storage and search features.

Node

A host running a single Elasticsearch instance is called a node. It is a member of a cluster, can store data, and participates in the cluster's indexing and search operations. Like a cluster, a node is identified by its name, which in early Elasticsearch versions defaults to a random Marvel character name generated at startup. You can set any name you want, but for ease of management it should be as recognizable as possible. A node determines which cluster to join by the cluster name configured for it.
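The relationship between node names and cluster names can be pictured with a small Python sketch. It is a toy model, not how discovery works internally: each node is a dictionary keyed by the cluster.name and node.name settings, the node and cluster names are made up, and group_nodes_by_cluster is a hypothetical helper. A node with no cluster name configured falls back to the default name elasticsearch.

```python
from collections import defaultdict

DEFAULT_CLUSTER_NAME = "elasticsearch"


def group_nodes_by_cluster(nodes: list) -> dict:
    """Map each cluster name to the sorted names of the nodes configured for it."""
    clusters = defaultdict(list)
    for node in nodes:
        # Each node carries exactly one cluster name, so it lands in one group.
        cluster_name = node.get("cluster.name", DEFAULT_CLUSTER_NAME)
        clusters[cluster_name].append(node["node.name"])
    return {name: sorted(members) for name, members in clusters.items()}


if __name__ == "__main__":
    # Hypothetical node configurations, for illustration only.
    nodes = [
        {"node.name": "es-data-1", "cluster.name": "logs-prod"},
        {"node.name": "es-data-2", "cluster.name": "logs-prod"},
        {"node.name": "dev-box"},  # no cluster name set: uses the default
    ]
    print(group_nodes_by_cluster(nodes))
```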

Shard

The shard mechanism of Elasticsearch distributes the data of an index across multiple nodes. It divides an index into multiple underlying physical Lucene indexes, each of which is called a shard, to split the storage of index data. Each shard is internally a fully functional, independent index, so it can be stored on any host in the cluster.

There are two types of shards: primary shards and replica shards. Primary shards store the documents. When creating an index, you can specify the number of primary shards; if you do not, 5 primary shards are created automatically by default (1 in Elasticsearch 7.0 and later). You can set this number by configuration before the index is created, but once the index is created, the number of primary shards can no longer be changed.

A replica shard is a copy of a primary shard. It provides data redundancy and improves search performance. By default, one replica shard is configured for each primary shard, but you can configure more. Unlike the number of primary shards, the number of replica shards can be changed dynamically, so it can be increased or decreased as needed after the index is created. Because an Elasticsearch cluster consists of multiple nodes, all of these shards can be distributed across the nodes.
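The shard and replica counts can be sketched in Python as follows. The settings keys number_of_shards and number_of_replicas are the ones used in an index creation request body; everything else is an assumption made for illustration. In particular, route_to_shard uses CRC32 as a stand-in hash (Elasticsearch uses its own hash function), so the shard numbers it produces will not match a real cluster. It is included only to show why a document's shard depends on the primary shard count.

```python
import zlib


def index_settings(primaries: int = 5, replicas: int = 1) -> dict:
    """Build a settings body for index creation (defaults follow the text)."""
    return {
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas,  # replicas per primary shard
        }
    }


def total_shard_copies(primaries: int, replicas: int) -> int:
    """Each primary shard plus its replicas: primaries * (1 + replicas)."""
    return primaries * (1 + replicas)


def route_to_shard(routing_key: str, primaries: int) -> int:
    """Pick a primary shard as hash(key) modulo the primary shard count.

    CRC32 is a stand-in hash chosen for this sketch, not the real one.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % primaries


if __name__ == "__main__":
    print(index_settings())
    print(total_shard_copies(5, 1))
    # The same key can land on a different shard if the primary count changes,
    # which is one way to see why that count is fixed once the index exists.
    for key in ("doc-1", "doc-2", "doc-3"):
        print(key, route_to_shard(key, 5), route_to_shard(key, 6))
```

Changing the replica count does not affect routing, since replicas are copies of primaries, which fits with it being adjustable after creation.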
