Backup and Recovery

Last Updated: 2020-11-17

This document describes how to create and restore snapshots with Baidu Elasticsearch (BES) and how to store snapshot data in Baidu Object Storage (BOS). A snapshot and a backup are essentially the same concept; the term "snapshot" simply emphasizes that the copy reflects the data at a specific point in time.

Create a snapshot

Creating a snapshot involves two steps:

Create a BOS-based warehouse.
Create a data snapshot.

Create a BOS-based warehouse

Before creating the warehouse (the Elasticsearch snapshot repository), create the corresponding bucket in BOS and make sure the user has the required privileges on it. The user is identified by the "access_key" and "secret_key" of Baidu AI Cloud. You can choose any storage type for the bucket; standard storage is recommended.
Make sure the BOS bucket is in the same region as your Elasticsearch cluster.

The following request creates a warehouse named "es_repo". You can choose another name to suit your business requirements.

PUT /_snapshot/es_repo
{
    "type": "bos",
    "settings": {
        "access_key": "your access_key",
        "secret_key": "your secret_key",
        "endpoint": "s3.bj.bcebos.com",
        "bucket": "es-repo",
        "base_path": ""
    }
}

The parameters have the following meanings:

Parameter	Description
type	The type of warehouse. Enter `bos` here
access_key	The "access_key" of Baidu AI Cloud, available in the Baidu AI Cloud console
secret_key	The "secret_key" of Baidu AI Cloud, available in the Baidu AI Cloud console
endpoint	The BOS service domain name for the region of the bucket
bucket	The BOS bucket. The user identified by the keys must have read and write privileges on it
base_path	The starting path of the warehouse within the bucket, which is the root directory by default
chunk_size	The size above which a large file is split into chunks during upload. The chunk size is 1 GB by default; the minimum is 5 MB and the maximum is 5 TB
max_snapshot_bytes_per_sec	The maximum snapshot speed per node, which is `40mb/s` by default
max_restore_bytes_per_sec	The maximum restoration speed per node, which is `40mb/s` by default

The BOS service domain name for each region is as follows:

Region	Access Endpoint
BJ	s3.bj.bcebos.com
GZ	s3.gz.bcebos.com
SU	s3.su.bcebos.com
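As a rough illustration, the registration request can also be assembled programmatically. The Python sketch below only builds the method, path, and body shown in this document; the helper name build_repo_request is hypothetical, and sending the request to the cluster is left to whichever HTTP client you use.

```python
import json

# Region -> BOS endpoint, copied from the table above.
BOS_ENDPOINTS = {
    "BJ": "s3.bj.bcebos.com",
    "GZ": "s3.gz.bcebos.com",
    "SU": "s3.su.bcebos.com",
}


def build_repo_request(repo_name, region, bucket, access_key, secret_key, base_path=""):
    """Return (method, path, body) for registering a BOS-based warehouse.

    Hypothetical helper: it mirrors the PUT /_snapshot/es_repo example and
    does not contact the cluster.
    """
    if region not in BOS_ENDPOINTS:
        raise ValueError(f"unknown region {region!r}; expected one of {sorted(BOS_ENDPOINTS)}")
    body = {
        "type": "bos",
        "settings": {
            "access_key": access_key,
            "secret_key": secret_key,
            "endpoint": BOS_ENDPOINTS[region],
            "bucket": bucket,
            "base_path": base_path,
        },
    }
    return "PUT", f"/_snapshot/{repo_name}", body


if __name__ == "__main__":
    method, path, body = build_repo_request(
        "es_repo", "BJ", "es-repo", "your access_key", "your secret_key"
    )
    print(method, path)
    print(json.dumps(body, indent=4))
```

Looking the endpoint up from the region keeps the bucket region and the endpoint from drifting apart, which matters because the bucket must be in the same region as the cluster.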

To modify the parameters after the warehouse has been created, use the "POST" method. If the data to be uploaded is very large, you can limit the chunk size used during the snapshot process with "chunk_size": a file that exceeds this size is uploaded to BOS in chunks.

POST /_snapshot/es_repo
{
    "type": "bos",
    "settings": {
        "access_key": "your access_key",
        "secret_key": "your secret_key",
        "endpoint": "s3.bj.bcebos.com",
        "bucket": "es-repo",
        "chunk_size": "1g",
        "base_path": ""
    }
}
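Before sending a "chunk_size" value, it can help to check it against the limits listed in the parameter table (5 MB minimum, 5 TB maximum). The following Python sketch does that on the client side; it assumes Elasticsearch-style size suffixes with binary multiples (1 kb = 1024 bytes), and the function names are hypothetical.

```python
import re

# Assumption: sizes use binary multiples, as in Elasticsearch byte-size values.
_UNITS = {
    "b": 1,
    "k": 1024, "kb": 1024,
    "m": 1024**2, "mb": 1024**2,
    "g": 1024**3, "gb": 1024**3,
    "t": 1024**4, "tb": 1024**4,
}

MIN_CHUNK = 5 * 1024**2  # 5 MB, the minimum stated in the parameter table
MAX_CHUNK = 5 * 1024**4  # 5 TB, the maximum stated in the parameter table


def parse_size(text):
    """Convert a size string such as '1g' or '40mb' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]+)\s*", text.lower())
    if not match or match.group(2) not in _UNITS:
        raise ValueError(f"cannot parse size {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def check_chunk_size(text):
    """Return the chunk size in bytes, or raise ValueError if it is out of range."""
    size = parse_size(text)
    if not MIN_CHUNK <= size <= MAX_CHUNK:
        raise ValueError(f"chunk_size {text!r} is outside the 5 MB to 5 TB range")
    return size


if __name__ == "__main__":
    print(check_chunk_size("1g"))
```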

List all warehouse information

GET /_snapshot

View the specific warehouse information

GET /_snapshot/{warehouse name you set}

Snapshot

One warehouse can contain multiple snapshots. Each snapshot covers a set of indexes: a single index, several indexes, or all indexes. You can specify which indexes to include when creating a snapshot; if you do not specify any, all open indexes in the cluster are snapshotted. Give each snapshot a unique and meaningful name. For example, snapshot_2018_07_01 denotes a snapshot created on July 1, 2018, which makes it easy to pick the right snapshot during restoration.

Initiate a snapshot:

PUT /_snapshot/es_repo/snapshot_2018_07_01?wait_for_completion=true

This request snapshots all open indexes in the cluster to the es_repo warehouse and names the snapshot snapshot_2018_07_01. By default, a snapshot request returns immediately after the snapshot is initialized, and the snapshot process runs in the background of your cluster.

The wait_for_completion parameter controls whether the request returns after the snapshot is initialized or after the snapshot is complete. It is false by default, which means the request returns after initialization. Because the example above sets it to true, that request returns only after the snapshot is complete.

When a snapshot is initialized, information about all previous snapshots is loaded into memory. As a result, even if "wait_for_completion" is set to "false", the request may take a few seconds or even a few minutes to return when the warehouse is large.
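The date-based naming convention and the wait_for_completion flag can be combined in a small helper. The Python sketch below only builds the request path; the function name snapshot_path is hypothetical, and the naming scheme is the snapshot_YYYY_MM_DD pattern used in this document.

```python
from datetime import date


def snapshot_path(repo, day, wait_for_completion=False):
    """Build the path of a snapshot request named after the given date."""
    name = f"snapshot_{day:%Y_%m_%d}"
    path = f"/_snapshot/{repo}/{name}"
    if wait_for_completion:
        # Ask the cluster to respond only after the snapshot is complete.
        path += "?wait_for_completion=true"
    return path


if __name__ == "__main__":
    # Send the printed path with the PUT method to start the snapshot.
    print(snapshot_path("es_repo", date(2018, 7, 1), wait_for_completion=True))
```

Leaving wait_for_completion at its default keeps the client from blocking on large snapshots; the progress can then be checked separately.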

By default, a snapshot covers all open and started indexes in the cluster. You can specify the indexes to be snapshotted in the request body:

PUT /_snapshot/es_repo/snapshot_2018_07_01
 {
   "indices": "index1,index2",
   "ignore_unavailable": true,
   "include_global_state": false
 }

Parameter	Description
indices	The list of indexes to include in the snapshot. `multi index syntax` is supported.
ignore_unavailable	When set to "true", indexes listed in "indices" that do not exist are ignored. It is not set by default, in which case a nonexistent index causes an error.
include_global_state	When set to "false", the `cluster global state` is not included in the snapshot.

The cluster global state is the cluster-wide metadata maintained by BES. For more information, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-state.html

"Snapshot" has the following characteristics:

"Snapshot" is incremental, and one "snapshot" indicates the point-in-time view of the index ("Records" added after "snapshot" creation is not visible in this "snapshot").
In addition to all primary shards of the indexes, a snapshot can include the global cluster metadata (including persistent cluster settings and templates).
Only one snapshot can run in a cluster at any time. A new snapshot request cannot be executed until the running snapshot is complete; requests issued in the meantime may be rejected.

Query a snapshot

After a snapshot has been created, you can obtain its information by sending a GET request that contains the warehouse name and the snapshot name.

Basic format: GET /_snapshot/{your_repo_name}/{your_snapshot_name}, for example:

GET /_snapshot/es_repo/snapshot_2018_07_01

The response contains all information related to the snapshot. The sample below was returned for a snapshot named snapshot_2018_07_19:

{
   "snapshots": [
      {
         "snapshot": "snapshot_2018_07_19",
         "uuid": "TWKo55e7TSy1Sq4WLxMVrQ",
         "version_id": 5050099,
         "version": "5.5.0",
         "indices": [
            "snapindex"
         ],
         "state": "SUCCESS",
         "start_time": "2018-07-19T10:53:17.543Z",
         "start_time_in_millis": 1531997597543,
         "end_time": "2018-07-19T10:53:21.795Z",
         "end_time_in_millis": 1531997601795,
         "duration_in_millis": 4252,
         "failures": [],
         "shards": {
            "total": 1,
            "failed": 0,
            "successful": 1
         }
      }
   ]
}
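A response of this shape is easy to condense into a one-line health summary per snapshot. The Python sketch below is a minimal example that assumes the response has already been parsed into a dictionary with the fields shown above; the function name summarize_snapshots is hypothetical.

```python
def summarize_snapshots(response):
    """Return one summary dict per snapshot in a GET /_snapshot/... response."""
    rows = []
    for snap in response["snapshots"]:
        shards = snap["shards"]
        rows.append({
            "name": snap["snapshot"],
            "state": snap["state"],
            "indices": list(snap["indices"]),
            "seconds": snap["duration_in_millis"] / 1000,
            # Healthy only if the snapshot succeeded and no shard failed.
            "ok": snap["state"] == "SUCCESS" and shards["failed"] == 0,
        })
    return rows


if __name__ == "__main__":
    # Trimmed copy of the sample response above.
    sample = {
        "snapshots": [{
            "snapshot": "snapshot_2018_07_19",
            "indices": ["snapindex"],
            "state": "SUCCESS",
            "duration_in_millis": 4252,
            "failures": [],
            "shards": {"total": 1, "failed": 0, "successful": 1},
        }]
    }
    for row in summarize_snapshots(sample):
        print(row)
```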

The GET request also supports wildcards, so you can obtain the information of multiple snapshots at once. This is where a meaningful naming scheme helps:

GET /_snapshot/es_repo/snapshot_order_*

Use the _all parameter to list all snapshots in a warehouse:

GET /_snapshot/es_repo/_all

Stop or delete a snapshot

BES on Baidu AI Cloud does not provide a separate API for stopping a snapshot; stopping and deleting a snapshot use the same request. If you find that a snapshot is running incorrectly or is taking a very long time, you can stop the snapshot running in the background by deleting it:

DELETE /_snapshot/es_repo/snapshot_2018_07_01

Delete the snapshot from the warehouse:

DELETE /_snapshot/es_repo/snapshot_2018_07_01

You can also delete a warehouse directly:

DELETE /_snapshot/es_repo

Notice: When you delete a snapshot or warehouse, ES only removes the cluster's reference to it. You need to handle the physical files yourself. Once you have confirmed that none of the snapshots is needed any longer, delete the warehouse metadata in Elasticsearch and then log in to the Baidu AI Cloud BOS console to delete the warehouse files manually. Additionally, do not manually delete individual snapshot files in a BOS warehouse: doing so makes the snapshot unavailable and causes irrecoverable data loss at restoration time.

View snapshot progress

You can view the progress information of a snapshot through the status interface.

GET /_snapshot/es_repo/snapshot_2018_07_19/_status

The following is the detailed statistical information returned by the status interface:

{
   "snapshots": [
      {
         "snapshot": "snapshot_2018_07_19",
         "repository": "es_repo",
         "uuid": "TWKo55e7TSy1Sq4WLxMVrQ",
         "state": "SUCCESS",  ..................  [A]
         "shards_stats": {
            "initializing": 0,
            "started": 0,
            "finalizing": 0,
            "done": 1,
            "failed": 0,
            "total": 1
         },
         "stats": {
            "number_of_files": 16,
            "processed_files": 16,
            "total_size_in_bytes": 18639,
            "processed_size_in_bytes": 18639,
            "start_time_in_millis": 1531997598051,
            "time_in_millis": 2782
         },
         "indices": {
            "snapindex": {
               "shards_stats": {
                  "initializing": 0,
                  "started": 0,
                  "finalizing": 0,
                  "done": 1,  ..................... [B]
                  "failed": 0,
                  "total": 1
               },
               "stats": {
                  "number_of_files": 16,
                  "processed_files": 16,
                  "total_size_in_bytes": 18639,
                  "processed_size_in_bytes": 18639,
                  "start_time_in_millis": 1531997598051,
                  "time_in_millis": 2782
               },
               "shards": {
                  "0": {
                     "stage": "DONE",............... [C]
                     "stats": {
                        "number_of_files": 16,
                        "processed_files": 16,
                        "total_size_in_bytes": 18639,
                        "processed_size_in_bytes": 18639,
                        "start_time_in_millis": 1531997598051,
                        "time_in_millis": 2782
                     }
                  }
               }
            }
         }
      }
   ]
}

The response contains all information about the snapshot, such as the start time, total size, total number of files, and number of files already processed. It also records in detail the current state of every snapshotted index and of every shard in each index.

[A] indicates that this snapshot is complete: its state is SUCCESS. A running snapshot displays IN_PROGRESS.
[B] indicates that the snapshots of all shards of this index are complete.
[C] indicates that the snapshot of this particular shard (shard 0) of the index is complete.

The "stage" value reported for each shard has the following meanings:

Stage Value	Meaning
INIT	The snapshot has not started yet and is initializing
STARTED	The index files are being copied
FINALIZE	The snapshot metadata is being written to the remote warehouse
DONE	The snapshot completed successfully
FAILURE	The snapshot failed. The possible causes are reported by the `status` API
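The status response lends itself to a simple progress calculation and a polling loop. The Python sketch below is one possible approach, not a definitive implementation: it relies on the field names of the sample response above (which may differ in other Elasticsearch versions), treats IN_PROGRESS as the running state named in this document, and adds INIT and STARTED as an assumption. The fetch_status argument is a hypothetical hook that you supply to call the status interface.

```python
import time

# "IN_PROGRESS" is the running state named in this document; "INIT" and
# "STARTED" are included as an assumption about other in-flight states.
RUNNING_STATES = {"IN_PROGRESS", "INIT", "STARTED"}


def snapshot_progress(status):
    """Return (state, percent_of_bytes_processed, shards_not_done)."""
    snap = status["snapshots"][0]
    stats = snap["stats"]
    total = stats["total_size_in_bytes"]
    processed = stats["processed_size_in_bytes"]
    percent = 100.0 if total == 0 else 100.0 * processed / total
    not_done = [
        (index, shard_id)
        for index, info in snap["indices"].items()
        for shard_id, shard in info["shards"].items()
        if shard["stage"] != "DONE"
    ]
    return snap["state"], percent, not_done


def wait_until_finished(fetch_status, interval=5.0, timeout=3600.0, sleep=time.sleep):
    """Poll fetch_status() until the snapshot leaves the running states.

    fetch_status must return the parsed JSON of
    GET /_snapshot/{repo}/{snapshot}/_status.
    """
    deadline = time.monotonic() + timeout
    while True:
        state, percent, _ = snapshot_progress(fetch_status())
        if state not in RUNNING_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"snapshot still {state} at {percent:.1f}%")
        sleep(interval)
```

Passing the fetch function in as an argument keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.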

Restore a snapshot

You can restore a snapshot through the following command:

POST /_snapshot/es_repo/snapshot_2018_07_19/_restore

By default, all indexes in the specified snapshot are restored. To restore only certain indexes, or to control whether the global cluster state is restored, add indices and include_global_state to the request body:

POST /_snapshot/es_repo/snapshot_2018_07_19/_restore
{
  "indices": "snapindex",
  "ignore_unavailable": true,
  "include_global_state": true,
  "rename_pattern": "snap(.+)",
  "rename_replacement": "restore$1"
}

You can use rename_pattern and rename_replacement to rename indexes during restoration. Most index settings can also be overridden during restoration, as follows:

POST /_snapshot/repo/snapshot_wyf_2018_01_29/_restore
{
  "indices": "wyf",
  "index_settings": {
    "index.number_of_replicas": 0
  },
  "ignore_index_settings": [
    "index.refresh_interval"
  ]
}
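To preview what a rename_pattern and rename_replacement pair would produce before running a restore, the substitution can be imitated locally. The Python sketch below assumes the rename behaves like a regular-expression replace in which $1 refers to the first capture group; the function name renamed_index is hypothetical, and the result is only an approximation of what the cluster does.

```python
import re


def renamed_index(index, rename_pattern, rename_replacement):
    """Apply a restore-style rename to an index name."""
    # Convert "$1"-style group references to Python's "\g<1>" syntax.
    replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    return re.sub(rename_pattern, replacement, index)


if __name__ == "__main__":
    # Pattern and replacement taken from the restore request earlier in this section.
    print(renamed_index("snapindex", "snap(.+)", "restore$1"))
```

An index name that does not match the pattern is returned unchanged, so checking a few names this way shows which indexes would keep their original names.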

Note: Some settings, such as index.number_of_shards, cannot be changed during restoration. You can also restore a snapshot to another cluster. The version of the target cluster must be the same as or newer than that of the snapshotted cluster, and at most one major version newer. For example, you can restore a 1.x snapshot to a 2.x cluster, but not to a 5.x cluster.
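The version rule can be expressed as a small pre-flight check. The Python sketch below is an illustration under stated assumptions: it encodes only the rule described here (same or newer version, at most one major release line ahead), it lists the Elasticsearch major lines explicitly because the numbering jumps from 2.x to 5.x, and the function names are hypothetical. The cluster remains the final authority on whether a restore is accepted.

```python
# Assumption: consecutive Elasticsearch major release lines (there is no 3.x or 4.x).
MAJOR_LINES = [1, 2, 5, 6, 7, 8]


def parse_version(text):
    """Turn a version string such as '5.5.0' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def can_restore(snapshot_version, cluster_version):
    """Check the rule: same or newer version, at most one major line ahead."""
    snap = parse_version(snapshot_version)
    target = parse_version(cluster_version)
    if target < snap:
        return False
    # list.index raises ValueError for a major version not listed above.
    step = MAJOR_LINES.index(target[0]) - MAJOR_LINES.index(snap[0])
    return step <= 1


if __name__ == "__main__":
    print(can_restore("1.7.0", "2.4.0"))
    print(can_restore("1.7.0", "5.5.0"))
```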

Like the snapshot request, the restore request returns immediately after checking the snapshot information and verifying the index information in the snapshot, and the restoration runs in the background of the cluster. To block the request until the restoration is complete, append the wait_for_completion parameter to the request:

POST /_snapshot/es_repo/snapshot_2018_07_19/_restore?wait_for_completion=true

Monitor the snapshot restoration

Restoring data from a BOS warehouse uses the internal recovery mechanism of Elasticsearch. In terms of internal implementation, restoring data from the warehouse is equivalent to recovering data from one node to another. Elasticsearch has three internal recovery types: existing_store recovery, peer recovery, and snapshot recovery.

You can view the restoration progress through the recovery API:

GET /{index}/_recovery

GET snapindex/_recovery

The interface returns a response such as the following:

{
   "snapindex": {
      "shards": [
         {
            "id": 0,
            "type": "SNAPSHOT", ........................ [A]
            "stage": "DONE", ........................... [B]
            "primary": true,
            "start_time_in_millis": 1532065843418,
            "stop_time_in_millis": 1532065845773,
            "total_time_in_millis": 2354,
            "source": { ................................ [C]
               "repository": "es_repo",
               "snapshot": "snapshot_2018_07_19",
               "version": "5.5.0",
               "index": "snapindex"
            },
            "target": {
               "id": "8wR8Z38USImEeSO0SZ1_hA",
               "host": "192.168.16.5",
               "transport_address": "192.168.16.5:9300",
               "ip": "192.168.16.5",
               "name": "8wR8Z38"
            },
            "index": {
               "size": {
                  "total_in_bytes": 18668,
                  "reused_in_bytes": 0,
                  "recovered_in_bytes": 18668,
                  "percent": "100.0%" .................. [D]
               },
               "files": {
                  "total": 16,
                  "reused": 0,
                  "recovered": 16,
                  "percent": "100.0%"
               },
               "total_time_in_millis": 2148,
               "source_throttle_time_in_millis": 0,
               "target_throttle_time_in_millis": 0
            },
            "translog": {
               "recovered": 0,
               "total": 0,
               "percent": "100.0%",
               "total_on_start": 0,
               "total_time_in_millis": 158
            },
            "verify_index": {
               "check_index_time_in_millis": 0,
               "total_time_in_millis": 0
            }
         }
      ]
   }
}

[A] The type field indicates that the shard is restored from a snapshot in the remote warehouse.
[B] The stage field indicates that this restoration is complete.
[C] The source field identifies the warehouse and snapshot from which the shard is restored.
[D] The percent field indicates the completion percentage of the restoration.
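The annotated fields are enough to build a compact progress report for a restoration. The Python sketch below assumes the recovery response has already been parsed into a dictionary of the shape shown above; the function name restore_progress is hypothetical.

```python
def restore_progress(recovery, index):
    """List (shard id, stage, percent, source snapshot) for snapshot-based recoveries."""
    rows = []
    for shard in recovery[index]["shards"]:
        # Skip recoveries that are not restorations from a warehouse.
        if shard["type"] != "SNAPSHOT":
            continue
        rows.append((
            shard["id"],
            shard["stage"],
            shard["index"]["size"]["percent"],
            shard["source"]["snapshot"],
        ))
    return rows


if __name__ == "__main__":
    # Trimmed copy of the sample response above.
    sample = {
        "snapindex": {
            "shards": [{
                "id": 0,
                "type": "SNAPSHOT",
                "stage": "DONE",
                "source": {"repository": "es_repo", "snapshot": "snapshot_2018_07_19"},
                "index": {"size": {"percent": "100.0%"}},
            }]
        }
    }
    for row in restore_progress(sample, "snapindex"):
        print(row)
```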

Cancel the running restoration

You can cancel the "index" restoration by deleting the "index" being restored:

DELETE /snapindex

Notes for snapshot

Only one snapshot can run in a cluster at a time.
You cannot create a snapshot while another snapshot is being deleted.

Notes for restoration

The index to be restored must either not exist in the cluster or be in the closed state.
Restoring an index overwrites its existing files. Even files that are identical are deleted and then created again.
The restore process skips translog recovery and creates a new translog.
If the restoration target is not the snapshotted ES cluster but a new cluster, you need to create a "repo" in the new cluster and set the "read_only" parameter to "true".
