Backup & Restore Elasticsearch Data on a Kubernetes Cluster

This guide is based on Elasticsearch version 2.3.1.

Overview

Description

There are three methods to back up and restore Elasticsearch data:

1- Elasticsearch snapshot and restore API.

2- Elasticsearch reindexing API.

3- Elasticsearch data folder restore and backup method.

This guide is based on the 3rd method for the following reasons:

  • Method 1 cannot be used here. When the elasticsearch:2.3.1 Docker container runs, it loads its data from the /usr/share/elasticsearch/data/ directory, and once the Elasticsearch server has started it does not detect any change to the /etc/elasticsearch/elasticsearch.yml file. The snapshot API requires the snapshot repository path to be specified in that file, as in the example below:
path.repo: ["/datasource"]

The directory specified in the above configuration is used to store the snapshots, which can later be used to recover the data.

  • Method 2 is useful only in a multi-node Elasticsearch deployment, where if one node goes down the data can be backed up from the other nodes.
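To make the requirement of the first method concrete, here is a minimal Python sketch of the request that would register a shared-filesystem snapshot repository once the repository path is whitelisted in elasticsearch.yml (the Elasticsearch setting for this is named path.repo). The repository name my_backup and the URL http://localhost:9200 are assumptions for illustration only; the function builds the request without sending it, so it can be inspected without a running cluster.

```python
import json
import urllib.request


def build_register_repo_request(es_url, repo_name, location):
    """Build (but do not send) a PUT /_snapshot/<repo_name> request.

    `location` must be a directory that is listed under path.repo in
    elasticsearch.yml, otherwise Elasticsearch rejects the repository.
    """
    body = {"type": "fs", "settings": {"location": location}}
    return urllib.request.Request(
        url=f"{es_url}/_snapshot/{repo_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical values: adjust the URL and repository name to your setup.
    req = build_register_repo_request(
        "http://localhost:9200", "my_backup", "/datasource"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To send it against a running cluster: urllib.request.urlopen(req)
```

Because the configuration file is only read at startup, this request would fail inside the stock container unless the setting is already present when the server starts, which is the limitation described above.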

Working

1. Init Container

An init container is used to restore the data because it has a single purpose and must stop once that purpose is fulfilled.

It is recommended to use the init container only when the backup (AWS S3 bucket) and the current data are in sync.
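The restore step of the init container could look roughly like the Python sketch below. It assumes the backup archive has already been downloaded from the bucket to a local path, and it adds one safety rule that is not part of the original guide: the archive is extracted only when the data directory is empty, so a backup never overwrites existing data. The function name restore_data and the file names in the demo are hypothetical.

```python
import os
import tarfile
import tempfile


def restore_data(archive_path, data_dir):
    """Extract a .tar.gz backup into data_dir, but only if data_dir is empty.

    Returns True when the archive was extracted, False when it was skipped.
    """
    os.makedirs(data_dir, exist_ok=True)
    if os.listdir(data_dir):
        # Existing data found: skip so the backup does not overwrite it.
        return False
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(data_dir)
    return True


if __name__ == "__main__":
    # Demo with temporary directories instead of the real data path.
    source = tempfile.mkdtemp()
    with open(os.path.join(source, "segment.dat"), "w") as handle:
        handle.write("example")
    archive_path = os.path.join(tempfile.mkdtemp(), "backup.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source, arcname=".")

    target = os.path.join(tempfile.mkdtemp(), "data")
    print(restore_data(archive_path, target))  # first run extracts
    print(restore_data(archive_path, target))  # second run is skipped
```

In the real init container, data_dir would be /usr/share/elasticsearch/data/ on a volume shared with the Elasticsearch container, and the process exits as soon as the function returns.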

2. Sidecar Container

During the backup process, the sidecar container compresses the data (into an archive named in yyyy.mm.dd-HH-MM-SS.tar.gz format) and then pushes it to the AWS S3 bucket.
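As a rough illustration of the backup step, the following Python sketch builds the timestamped archive name, compresses a data directory, and hands the archive to an upload function. The names archive_name and backup_data are hypothetical, and the upload is passed in as a callable so the sketch runs without AWS credentials; with boto3 installed, one option would be to pass a function that calls boto3.client("s3").upload_file(path, bucket, key).

```python
import os
import tarfile
import tempfile
from datetime import datetime


def archive_name(now):
    """Return the archive file name in yyyy.mm.dd-HH-MM-SS.tar.gz format."""
    return now.strftime("%Y.%m.%d-%H-%M-%S") + ".tar.gz"


def backup_data(data_dir, out_dir, upload, now=None):
    """Compress data_dir into out_dir and pass the archive path to upload()."""
    now = now or datetime.now()
    archive_path = os.path.join(out_dir, archive_name(now))
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname="." stores paths relative to data_dir, so a restore can
        # extract the archive straight into the data directory.
        archive.add(data_dir, arcname=".")
    upload(archive_path)
    return archive_path


if __name__ == "__main__":
    # Demo with temporary directories; out_dir must not be inside data_dir.
    data_dir = tempfile.mkdtemp()
    with open(os.path.join(data_dir, "segment.dat"), "w") as handle:
        handle.write("example")
    uploaded = []
    path = backup_data(data_dir, tempfile.mkdtemp(), uploaded.append)
    print("created and handed to uploader:", os.path.basename(path))
```

A sidecar would typically call backup_data in a loop with a sleep between runs; the interval is a deployment choice that this guide does not specify.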

Final Thoughts

DevSecOps Engineer https://irtizaali.com/
