          Elasticsearch

          Backup Recovery

          This document describes how to create and restore snapshots with Baidu Elasticsearch (BES), storing the snapshot data in Baidu Object Storage (BOS). A snapshot is essentially a backup; the term emphasizes that it captures the data as of a specific point in time.

          Create a snapshot

          Creating a snapshot involves two steps:

          • Create a BOS-based warehouse (the snapshot repository).
          • Create a data snapshot.

          Create a BOS-based warehouse

          • Before creating the warehouse, create the corresponding bucket in BOS and make sure the user has the required privileges on it. The user is identified by the "access_key" and "secret_key" of Baidu AI Cloud. You can choose any storage type for the bucket; standard storage is recommended.
          • Make sure the BOS bucket is in the same region as your Elasticsearch cluster.
          • In the example below, "es_repo" is the warehouse name. You can choose another name to suit your business requirements.

            PUT /_snapshot/es_repo
            {
                "type": "bos",
                "settings": {
                    "access_key": "your access_key",
                    "secret_key": "your secret_key",
                    "endpoint": "s3.bj.bcebos.com",
                    "bucket": "es-repo",
                    "base_path": ""
                }
            }

          Meaning of the relevant parameters:

          Parameter                    Description
          type                         Type of the warehouse. Enter bos here.
          access_key                   The "access_key" of Baidu AI Cloud. You can find it in the Baidu AI Cloud console.
          secret_key                   The "secret_key" of Baidu AI Cloud. You can find it in the Baidu AI Cloud console.
          endpoint                     BOS service domain name of the region in which the bucket is located.
          bucket                       Name of the BOS bucket. The user identified by the keys must have read and write privileges on it.
          base_path                    Start position of the warehouse inside the bucket. Defaults to the root directory.
          chunk_size                   Files larger than this size are split into chunks during the snapshot. Default 1 GB, minimum 5 MB, maximum 5 TB.
          max_snapshot_bytes_per_sec   Maximum snapshot speed per node. Default 40mb/s.
          max_restore_bytes_per_sec    Maximum restoration speed per node. Default 40mb/s.

          BOS service domain name for each region:

          Region   Access Endpoint
          BJ       s3.bj.bcebos.com
          GZ       s3.gz.bcebos.com
          SU       s3.su.bcebos.com
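
          The registration request can also be assembled programmatically. The following is a minimal Python sketch, not part of BES: the helper name build_repo_request is hypothetical, the endpoint mapping is copied from the region table, and the function only builds the request without sending it.

```python
import json

# Region-to-endpoint mapping copied from the region table in this document.
BOS_ENDPOINTS = {
    "BJ": "s3.bj.bcebos.com",
    "GZ": "s3.gz.bcebos.com",
    "SU": "s3.su.bcebos.com",
}


def build_repo_request(repo_name, region, bucket, access_key, secret_key,
                       base_path="", chunk_size=None):
    """Return (method, path, body) for registering a BOS warehouse.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    settings = {
        "access_key": access_key,
        "secret_key": secret_key,
        "endpoint": BOS_ENDPOINTS[region],  # raises KeyError for unknown regions
        "bucket": bucket,
        "base_path": base_path,
    }
    # chunk_size is optional; leave it out to keep the default.
    if chunk_size is not None:
        settings["chunk_size"] = chunk_size
    body = {"type": "bos", "settings": settings}
    return "PUT", f"/_snapshot/{repo_name}", body


method, path, body = build_repo_request(
    "es_repo", "BJ", "es-repo", "your access_key", "your secret_key")
print(method, path)
print(json.dumps(body, indent=4))
```

          Keeping the endpoint lookup in one place makes it harder to pair a bucket with the endpoint of a different region.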

          To modify the parameters after the warehouse has been created, use the "POST" method. If the data to be uploaded is very large, you can limit the chunk size used during the snapshot with "chunk_size": files larger than this size are uploaded to BOS in chunks.

          POST /_snapshot/es_repo
          {
              "type": "bos",
              "settings": {
                  "access_key": "your access_key",
                  "secret_key": "your secret_key",
                  "endpoint": "s3.bj.bcebos.com",
                  "bucket": "es-repo",
                  "chunk_size": "1g",
                  "base_path": ""
              }
          }

          List all warehouse information

          GET /_snapshot

          View the specific warehouse information

          GET /_snapshot/{warehouse name you set}

          Snapshot

          A warehouse can contain multiple snapshots. Each snapshot covers a set of indexes: a single index, a subset of the indexes, or all indexes. You can specify which indexes to include when creating a snapshot; if you do not, all open indexes in the cluster are snapshotted. Give each snapshot a unique and meaningful name. For example, snapshot_2018_07_01 denotes a snapshot created on July 1, 2018, which makes it easy to pick the right snapshot during restoration.
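
          A date-stamped name of this form can be generated rather than typed by hand. The following Python sketch assumes the naming scheme from the example above; the helper snapshot_name and its prefix parameter are hypothetical.

```python
from datetime import date


def snapshot_name(day, prefix="snapshot"):
    """Build a date-stamped snapshot name such as snapshot_2018_07_01."""
    return f"{prefix}_{day:%Y_%m_%d}"


print(snapshot_name(date(2018, 7, 1)))  # snapshot_2018_07_01
```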

          Initiate a snapshot:

          PUT /_snapshot/es_repo/snapshot_2018_07_01?wait_for_completion=true

          This request snapshots all open indexes in the cluster to the es_repo warehouse and names the snapshot snapshot_2018_07_01.

          The wait_for_completion parameter controls when the request returns. It is false by default, in which case the request returns as soon as the snapshot is initialized and the snapshot runs in the background of the cluster. Because the request above sets it to true, it returns only after the snapshot is complete.

          When a snapshot is initialized, information about all previous snapshots in the warehouse is loaded into memory. As a result, even with "wait_for_completion" set to "false", the request may take a few seconds or even a few minutes to return for a large warehouse.

          By default, a snapshot includes all open and started indexes in the cluster. You can specify the indexes to be snapshotted in the request body:

          PUT /_snapshot/es_repo/snapshot_2018_07_01
          {
            "indices": "index1,index2",
            "ignore_unavailable": true,
            "include_global_state": false
          }

          Parameter              Description
          indices                List of indexes to include in the snapshot. Multi-index syntax is supported.
          ignore_unavailable     When set to "true", indexes listed in "indices" that do not exist are ignored. When it is not set, a missing index causes an error.
          include_global_state   When set to "false", the cluster global state is not included in the snapshot.

          The cluster global state is the cluster-wide metadata maintained by BES. For more information, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-state.html
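
          The snapshot request and its optional parameters can be put together as follows. This is a minimal Python sketch under stated assumptions: build_snapshot_request is a hypothetical helper that only returns the method, path, and body, and options left unset are omitted so that the defaults described above apply.

```python
from urllib.parse import urlencode


def build_snapshot_request(repo, snapshot, indices=None,
                           ignore_unavailable=None,
                           include_global_state=None,
                           wait_for_completion=False):
    """Return (method, path, body) for creating a snapshot (hypothetical helper)."""
    path = f"/_snapshot/{repo}/{snapshot}"
    if wait_for_completion:
        # Ask the cluster to answer only once the snapshot has finished.
        path += "?" + urlencode({"wait_for_completion": "true"})
    body = {}
    if indices:
        # The request body expects a comma-separated list of index names.
        body["indices"] = ",".join(indices)
    if ignore_unavailable is not None:
        body["ignore_unavailable"] = ignore_unavailable
    if include_global_state is not None:
        body["include_global_state"] = include_global_state
    return "PUT", path, body


request = build_snapshot_request(
    "es_repo", "snapshot_2018_07_01",
    indices=["index1", "index2"],
    ignore_unavailable=True,
    include_global_state=False,
)
print(request)
```

          An empty body corresponds to the default behavior of snapshotting all open indexes.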

          Snapshots have the following characteristics:

          • Snapshots are incremental, and each snapshot is a point-in-time view of the index (records added after the snapshot was created are not visible in it).
          • In addition to all primary shards of the selected indexes, a snapshot can include the global cluster metadata (persistent cluster settings and templates).
          • Only one snapshot can run in a cluster at any time. A new snapshot request cannot be executed until the running snapshot is complete; a request issued before then is rejected.

          Query a snapshot

          After a snapshot has been created, you can obtain information about it by sending a GET request with the warehouse name and the snapshot name.

          Basic format: GET /_snapshot/{your_repo_name}/{your_snapshot_name}, for example:

          GET /_snapshot/es_repo/snapshot_2018_07_19

          The response contains all information related to the snapshot:

          {
             "snapshots": [
                {
                   "snapshot": "snapshot_2018_07_19",
                   "uuid": "TWKo55e7TSy1Sq4WLxMVrQ",
                   "version_id": 5050099,
                   "version": "5.5.0",
                   "indices": [
                      "snapindex"
                   ],
                   "state": "SUCCESS",
                   "start_time": "2018-07-19T10:53:17.543Z",
                   "start_time_in_millis": 1531997597543,
                   "end_time": "2018-07-19T10:53:21.795Z",
                   "end_time_in_millis": 1531997601795,
                   "duration_in_millis": 4252,
                   "failures": [],
                   "shards": {
                      "total": 1,
                      "failed": 0,
                      "successful": 1
                   }
                }
             ]
          }
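
          A response of this shape is easy to check in a script. The following Python sketch parses an abbreviated copy of the sample response; summarize_snapshots is a hypothetical helper, and the rule that a snapshot is "healthy" only when its state is SUCCESS and no shard failed is an assumption chosen for illustration.

```python
import json

# Abbreviated copy of the sample response shown in this document.
SAMPLE = """
{
  "snapshots": [
    {
      "snapshot": "snapshot_2018_07_19",
      "indices": ["snapindex"],
      "state": "SUCCESS",
      "duration_in_millis": 4252,
      "failures": [],
      "shards": {"total": 1, "failed": 0, "successful": 1}
    }
  ]
}
"""


def summarize_snapshots(response):
    """Return one summary dict per snapshot in a GET /_snapshot/... response."""
    summaries = []
    for snap in response["snapshots"]:
        shards = snap["shards"]
        summaries.append({
            "name": snap["snapshot"],
            "state": snap["state"],
            # Assumption: healthy means SUCCESS with no failed shards.
            "healthy": snap["state"] == "SUCCESS" and shards["failed"] == 0,
            "duration_ms": snap["duration_in_millis"],
            "indices": snap["indices"],
        })
    return summaries


summary = summarize_snapshots(json.loads(SAMPLE))
print(summary[0]["name"], summary[0]["healthy"], summary[0]["duration_ms"])
```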

          The GET request supports wildcards in the snapshot name, so you can retrieve information about several snapshots at once. This is where a meaningful naming scheme pays off:

          GET /_snapshot/es_repo/snapshot_order_*

          Use _all in place of the snapshot name to list all snapshots in a warehouse:

          GET /_snapshot/es_repo/_all

          Stop or delete a snapshot

          The BES service provided by Baidu AI Cloud has no separate API for stopping a snapshot; stopping and deleting a snapshot are the same operation. If a snapshot is running incorrectly or taking a very long time, you can stop it by deleting it while it runs in the background:

          DELETE /_snapshot/es_repo/snapshot_2018_07_01

          The same request deletes a completed snapshot from the warehouse:

          DELETE /_snapshot/es_repo/snapshot_2018_07_01

          You can also delete a warehouse directly:

          DELETE /_snapshot/es_repo

          Notice: When you delete a snapshot or a warehouse, ES only removes the cluster's reference to the location of that snapshot or warehouse. You are responsible for handling the physical files yourself. Once you have confirmed that none of the snapshots is needed any longer, delete the warehouse metadata in Elasticsearch and then log in to the Baidu AI Cloud BOS console to delete the warehouse data manually. Do not manually delete individual snapshot files in a BOS warehouse: doing so makes the snapshot unusable and causes irrecoverable data loss on restoration.

          View snapshot progress

          You can view the progress information of a snapshot through the status interface.

          GET /_snapshot/es_repo/snapshot_2018_07_19/_status

          The following is the detailed statistical information returned by the status interface:

          {
             "snapshots": [
                {
                   "snapshot": "snapshot_2018_07_19",
                   "repository": "es_repo",
                   "uuid": "TWKo55e7TSy1Sq4WLxMVrQ",
                   "state": "SUCCESS",  ..................  [A]
                   "shards_stats": {
                      "initializing": 0,
                      "started": 0,
                      "finalizing": 0,
                      "done": 1,
                      "failed": 0,
                      "total": 1
                   },
                   "stats": {
                      "number_of_files": 16,
                      "processed_files": 16,
                      "total_size_in_bytes": 18639,
                      "processed_size_in_bytes": 18639,
                      "start_time_in_millis": 1531997598051,
                      "time_in_millis": 2782
                   },
                   "indices": {
                      "snapindex": {
                         "shards_stats": {
                            "initializing": 0,
                            "started": 0,
                            "finalizing": 0,
                            "done": 1,  ..................... [B]
                            "failed": 0,
                            "total": 1
                         },
                         "stats": {
                            "number_of_files": 16,
                            "processed_files": 16,
                            "total_size_in_bytes": 18639,
                            "processed_size_in_bytes": 18639,
                            "start_time_in_millis": 1531997598051,
                            "time_in_millis": 2782
                         },
                         "shards": {
                            "0": {
                               "stage": "DONE",............... [C]
                               "stats": {
                                  "number_of_files": 16,
                                  "processed_files": 16,
                                  "total_size_in_bytes": 18639,
                                  "processed_size_in_bytes": 18639,
                                  "start_time_in_millis": 1531997598051,
                                  "time_in_millis": 2782
                               }
                            }
                         }
                      }
                   }
                }
             ]
          }

          The response contains all information about the snapshot, such as the start time, the total size, the total number of files, and the number of files already processed. It also records in detail the current state of every index in the snapshot and of every shard of each index.

          • [A] indicates that this snapshot is complete (state SUCCESS). A snapshot that is still running shows IN_PROGRESS.
          • [B] indicates that the snapshots of all shards of this index are complete.
          • [C] indicates that the snapshot of this particular shard of the index is complete.

          The shard-level "stage" field can take the following values:

          Value      Meaning
          INIT       The snapshot has not started yet and is initializing.
          STARTED    The snapshot is copying the index files.
          FINALIZE   The metadata of the snapshot is being written to the remote warehouse.
          DONE       The snapshot completed successfully.
          FAILURE    The snapshot failed. The possible causes are reported by the status API.
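
          The byte counters in the status response can be turned into a progress figure. The following Python sketch assumes the field names of the sample response above (other Elasticsearch versions may name them differently); snapshot_progress is a hypothetical helper, and the in-progress response used in the usage example is made up for illustration.

```python
def snapshot_progress(status_response):
    """Return (snapshot, state, percent) tuples from a _status response.

    percent is None while the total size is still unknown (reported as 0).
    """
    results = []
    for snap in status_response["snapshots"]:
        stats = snap["stats"]
        total = stats["total_size_in_bytes"]
        processed = stats["processed_size_in_bytes"]
        percent = round(100.0 * processed / total, 1) if total else None
        results.append((snap["snapshot"], snap["state"], percent))
    return results


# Made-up response for a snapshot that is halfway through.
in_progress = {
    "snapshots": [{
        "snapshot": "snapshot_2018_07_19",
        "state": "IN_PROGRESS",
        "stats": {"total_size_in_bytes": 18639,
                  "processed_size_in_bytes": 9320},
    }]
}
print(snapshot_progress(in_progress))
```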

          Restore a snapshot

          You can restore a snapshot through the following command:

          POST /_snapshot/es_repo/snapshot_2018_07_19/_restore

          By default, all indexes in the specified snapshot get restored. You can specify the index and global cluster state by adding indices and include_global_state to the request body:

          POST /_snapshot/es_repo/snapshot_2018_07_19/_restore
          {
            "indices": "snapindex",
            "ignore_unavailable": true,
            "include_global_state": true,
            "rename_pattern": "snap(.+)",
            "rename_replacement": "restore$1"
          }

          You can use rename_pattern and rename_replacement to rename an index during restoration. Most index settings can also be overridden or reset during restoration, as follows:

          POST /_snapshot/repo/snapshot_wyf_2018_01_29/_restore
          {
            "indices": "wyf",
            "index_settings": {
              "index.number_of_replicas": 0
            },
            "ignore_index_settings": [
              "index.refresh_interval"
            ]
          }
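
          Before running a restore with rename_pattern and rename_replacement, it can help to preview the resulting index names. The following Python sketch only approximates the renaming locally: it assumes the pattern is applied as a regular-expression replacement on each index name, converts the $1 style group references to Python's syntax, and does not cover every difference between Java and Python regular expressions. The helper preview_rename is hypothetical.

```python
import re


def preview_rename(index_names, rename_pattern, rename_replacement):
    """Approximate locally how a restore would rename each index."""
    # Convert Java-style group references ($1) to Python's \g<1> form.
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    return {
        name: re.sub(rename_pattern, py_replacement, name)
        for name in index_names
    }


renamed = preview_rename(["snapindex", "logs"], "snap(.+)", "restore$1")
print(renamed)
```

          Index names that do not match the pattern are left unchanged in this preview.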

          Note: Some settings, such as index.number_of_shards, cannot be changed during restoration. You can, however, restore a snapshot to a different cluster. The version of the target cluster must be the same as or higher than that of the cluster where the snapshot was taken, and at most one major version higher. For example, you can restore a 1.x snapshot to 2.x, but not to 5.x.
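
          The version rule can be expressed as a small check. The following Python sketch is an illustration only: the helper can_restore is hypothetical, and the list of major versions is an assumption reflecting that Elasticsearch release numbering went from 2.x directly to 5.x, so "one major version higher" is taken to mean the next released major version.

```python
# Assumed sequence of released major versions (3.x and 4.x were skipped).
RELEASED_MAJORS = [1, 2, 5, 6, 7, 8]


def parse_version(version):
    """Turn '5.5.0' into (5, 5, 0)."""
    return tuple(int(part) for part in version.split("."))


def can_restore(snapshot_version, cluster_version):
    """Check the rule: same version or higher, at most one major version up."""
    src = parse_version(snapshot_version)
    dst = parse_version(cluster_version)
    if dst < src:
        return False  # the target cluster is older than the snapshot
    if src[0] == dst[0]:
        return True   # same major version
    # list.index raises ValueError for a major version not in the list.
    steps = RELEASED_MAJORS.index(dst[0]) - RELEASED_MAJORS.index(src[0])
    return steps == 1


print(can_restore("1.7.0", "2.4.0"))  # True
print(can_restore("1.7.0", "5.5.0"))  # False
```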

          Like a snapshot request, a restore request returns immediately after the snapshot information has been checked and the index information in the snapshot has been verified. The restoration itself runs in the background of the cluster. To make the request block until the restoration is complete, append the wait_for_completion parameter:

          POST /_snapshot/es_repo/snapshot_2018_07_19/_restore?wait_for_completion=true

          Monitor the snapshot restoration

          Restoring data from a BOS warehouse uses the internal recovery mechanism of Elasticsearch. In terms of internal implementation, restoring data from the warehouse is equivalent to recovering data from one node to another. The internal recovery types of Elasticsearch are existing_store recovery, peer recovery, and snapshot recovery.

          You can view the restoration progress through the recovery API:

          GET /{index}/_recovery

          GET /snapindex/_recovery

          This interface returns the following responses:

          {
             "snapindex": {
                "shards": [
                   {
                      "id": 0,
                      "type": "SNAPSHOT", ........................ [A]
                      "stage": "DONE", ........................... [B]
                      "primary": true,
                      "start_time_in_millis": 1532065843418,
                      "stop_time_in_millis": 1532065845773,
                      "total_time_in_millis": 2354,
                      "source": { ................................ [C]
                         "repository": "es_repo",
                         "snapshot": "snapshot_2018_07_19",
                         "version": "5.5.0",
                         "index": "snapindex"
                      },
                      "target": {
                         "id": "8wR8Z38USImEeSO0SZ1_hA",
                         "host": "192.168.16.5",
                         "transport_address": "192.168.16.5:9300",
                         "ip": "192.168.16.5",
                         "name": "8wR8Z38"
                      },
                      "index": {
                         "size": {
                            "total_in_bytes": 18668,
                            "reused_in_bytes": 0,
                            "recovered_in_bytes": 18668,
                            "percent": "100.0%" .................. [D]
                         },
                         "files": {
                            "total": 16,
                            "reused": 0,
                            "recovered": 16,
                            "percent": "100.0%"
                         },
                         "total_time_in_millis": 2148,
                         "source_throttle_time_in_millis": 0,
                         "target_throttle_time_in_millis": 0
                      },
                      "translog": {
                         "recovered": 0,
                         "total": 0,
                         "percent": "100.0%",
                         "total_on_start": 0,
                         "total_time_in_millis": 158
                      },
                      "verify_index": {
                         "check_index_time_in_millis": 0,
                         "total_time_in_millis": 0
                      }
                   }
                ]
             }
          }

          • [A] The type field indicates that the data is restored from a snapshot in the remote warehouse.
          • [B] The stage field indicates that this restoration is complete.
          • [C] The source field identifies the warehouse and the snapshot that the data is restored from.
          • [D] The percent field indicates the completion percentage of the restoration.
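
          The same fields can be read by a script that waits for a restoration to finish. The following Python sketch assumes the response layout shown above; restore_progress is a hypothetical helper, and the response used in the usage example is an abbreviated copy of the sample.

```python
def restore_progress(recovery_response):
    """Summarize snapshot-based recoveries from a GET /{index}/_recovery response."""
    report = {}
    for index_name, data in recovery_response.items():
        # Only look at shards that are being restored from a snapshot.
        shards = [s for s in data["shards"] if s["type"] == "SNAPSHOT"]
        report[index_name] = {
            # Done only if there is at least one shard and all reached DONE.
            "done": bool(shards) and all(s["stage"] == "DONE" for s in shards),
            "percent_by_shard": {
                s["id"]: s["index"]["size"]["percent"] for s in shards
            },
        }
    return report


# Abbreviated copy of the sample response.
sample = {
    "snapindex": {
        "shards": [{
            "id": 0,
            "type": "SNAPSHOT",
            "stage": "DONE",
            "index": {"size": {"percent": "100.0%"}},
        }]
    }
}
progress = restore_progress(sample)
print(progress)
```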

          Cancel the running restoration

          You can cancel the "index" restoration by deleting the "index" being restored:

          DELETE /snapindex

          Notes for snapshot

          • A cluster can run only one snapshot at a time.
          • You cannot create a snapshot while another snapshot is being deleted.

          Notes for restoration

          • The target index of a restore must either not exist or be in the closed state.
          • A restore overwrites the existing files of the index. Even if a file is identical, the old file is deleted and a new file is created.
          • The restore process skips translog recovery and creates a new translog.
          • If the restoration target is not the ES cluster that was snapshotted but a new cluster, you need to create a "repo" in the new cluster and set its "read_only" parameter to "true".