Backup & Restore Elasticsearch data on Kubernetes cluster

This whole guide is based on Elasticsearch version 2.3.1.

Overview

This story provides a guideline on how to back up and restore the data of a single-node elasticsearch:2.3.1 instance deployed on a Kubernetes cluster.

Description

There are multiple ways to back up and restore Elasticsearch data. They are listed below:

1- Elasticsearch snapshot and restore API.

2- Elasticsearch reindexing API.

3- Elasticsearch data folder restore and backup method.

This guide uses the 3rd method for the following reasons:

  • Method 1 cannot be used here. It requires a snapshot repository to be specified in the /etc/elasticsearch/elasticsearch.yml file, but once the Elasticsearch server has started it does not detect any change to that file. When the elasticsearch:2.3.1 Docker container runs, it simply loads its data from the /usr/share/elasticsearch/data/ directory. The repository setting looks like the example given below:
path.repo: ["/datasource"]

The directory specified in the above configuration is used to store the snapshots, and those snapshots can later be used to recover the data.

  • Method 2 is useful only in a multi-node Elasticsearch deployment, where if one node goes down we can back up the data from the other nodes.

Working

The Elasticsearch manifest provided in this repository uses two additional containers. Their details are given below:

1. Init Container

The init container’s job is to restore the data from the AWS S3 bucket: it downloads the backup, untars it, and copies the data into the volume /usr/share/elasticsearch/data, which is shared between the containers.

An init container is used for restoring data because the task has a single, specific purpose, and the container must stop once that purpose is fulfilled.

It is recommended to use the init container only when the backup in the AWS S3 bucket and the current data are in sync.
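The untar-and-copy step described above can be sketched in Python as follows. This is only an illustration under assumptions, not the script used by the init container in this repository: the function name `restore_backup` is hypothetical, and the download from the S3 bucket is left out, so the sketch assumes the archive is already available on local disk.

```python
import tarfile
from pathlib import Path


def restore_backup(archive_path: str, data_dir: str) -> None:
    """Extract a .tar.gz backup into data_dir (hypothetical helper)."""
    dest = Path(data_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Refuse archive members that would land outside the data directory.
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
```

In the init container, `data_dir` would be the shared volume /usr/share/elasticsearch/data, so that Elasticsearch finds the restored files when its own container starts.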

2. Sidecar Container

The sidecar container’s job is to back up the data available in the /usr/share/elasticsearch/data directory. It runs continuously side by side with the Elasticsearch container and backs up the data at the interval specified by the user.

During the backup process, it compresses the data into an archive named in the yyyy.mm.dd-HH-MM-SS.tar.gz format and then pushes it to the AWS S3 bucket.
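As a rough sketch of the compression step, the snippet below builds an archive name in the yyyy.mm.dd-HH-MM-SS.tar.gz format and packs a data directory into it. The function names `archive_name` and `create_backup` are hypothetical and not taken from the sidecar in this repository, and the push to S3 and the interval loop are omitted.

```python
import tarfile
from datetime import datetime
from pathlib import Path


def archive_name(now: datetime) -> str:
    """Build a name in the yyyy.mm.dd-HH-MM-SS.tar.gz format."""
    return now.strftime("%Y.%m.%d-%H-%M-%S") + ".tar.gz"


def create_backup(data_dir: str, out_dir: str, now: datetime) -> Path:
    """Compress data_dir into a timestamped archive inside out_dir.

    out_dir is assumed to live outside data_dir, so the archive
    does not end up packing itself.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / archive_name(now)
    with tarfile.open(target, "w:gz") as tar:
        # arcname="." stores paths relative to the data directory.
        tar.add(data_dir, arcname=".")
    return target
```

For example, `archive_name(datetime(2020, 1, 2, 3, 4, 5))` returns `2020.01.02-03-04-05.tar.gz`. Because every field is zero-padded and ordered from year down to second, sorting the archive names alphabetically also sorts them by time, which makes it easy to pick the most recent backup in the bucket.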

Final Thoughts

Do let me know if you find any issue in this guideline. Thank you.

DevSecOps Engineer https://irtizaali.com/
