Skip to content

A simple Elasticsearch cleaner based on a Python script to clean up all indices and data streams from the Elasticsearch cluster which are older than the specified number of days.

License

Notifications You must be signed in to change notification settings

abdullahkhawer/simple-elasticsearch-cleaner

Repository files navigation

Simple Elasticsearch Cleaner

Introduction

A Simple Elasticsearch Cleaner based on a Python script to delete all indices and data streams from the Elasticsearch cluster which are older than the specified number of days. It is basically a curator but with the ability to delete only, removing both indices and their associated data streams.

Below you can find an example of the execution:

simple-elasticsearch-cleaner % python simple_elasticsearch_cleaner.py 30
Script Execution Started!
Calculating the date that was 30 day(s) ago...
Connecting to Elasticsearch...
Connected to Elasticsearch successfully.
Fetching the list of all the data streams...
Fetching the list of all the indices...
Finding and deleting all the indices older than 30 days '2024.07.14'...
Deleted index '__REDACTED__' which was created on '2024.07.12'.
Deleted index '__REDACTED__' which was created on '2024.07.13'.
Deleted index '__REDACTED__' which was created on '2024.07.12'.
Deleted index '__REDACTED__' which was created on '2024.07.13'.
Finding and deleting all the data streams older than 30 days '2024.07.14'...
Skipping data stream '__REDACTED__' as it is a managed data stream.
Skipping data stream '__REDACTED__' as it is a managed data stream.
Deleted data stream '__REDACTED__' which was older than '2024.07.14'.
Deleted data stream '__REDACTED__' which was older than '2024.07.14'.
Skipping data stream '__REDACTED__' as it is a managed data stream.
Closing the connection with Elasticsearch...
Connection with Elasticsearch closed successfully.
Script Execution Completed!

Note: In the actual execution, you will see the actual values instead of __REDACTED__ values.

Usage Notes

Following are the 4 ways to use it:

  • Python script.
  • Docker container running that Python script.
  • Helm chart to create a Helm release to create a cron job running that Docker container in a pod on a Kubernetes cluster.
  • Terraform module to deploy that Helm chart as a Helm release on a Kubernetes cluster.

Terraform Module

You can use its Terraform module to create a Helm release using its Helm chart to create a cron job running that Docker container in a pod on a Kubernetes cluster.

For more details, check out its README.

Helm Chart

You can use its Helm chart to create a Helm release to create a cron job running that Docker container in a pod on a Kubernetes cluster.

Its repository on Artifact Hub can be found here.

For more details, check out its README.

Docker Image

You can use its Docker image which is publicly available to run its Docker container to run its Python script.

The Dockerfile used to build the image can be found here.

For more details, check out its README.

Python Script

You can use its Python script to run it directly which is the main script to be executed in order to clean the old indices and data streams.

Prerequisites

Following are the prerequisites to be met once before you begin:

  • Following packages should be installed on your system:
    • pip
    • python
    • Using pip, install the following via requirements.txt file:
      • urllib3
      • elasticsearch

Execution Instructions

Once all the prerequisites are met, set the following environment variables:

  • ELASTICSEARCH_HOST
    • Description: Host of Elasticsearch cluster.
    • Example: https://localhost
    • Requirement: REQUIRED
  • ELASTICSEARCH_PORT
    • Description: Port of Elasticsearch cluster.
    • Example: 9200
    • Requirement: REQUIRED
  • ELASTICSEARCH_USER
    • Description: Username of the user of the Elasticsearch cluster.
    • Example: admin
    • Requirement: REQUIRED
  • ELASTICSEARCH_PASSWORD
    • Description: Password of the user of the Elasticsearch cluster.
    • Example: 123456789
    • Requirement: REQUIRED
  • NUMBER_OF_DAYS
    • Description: Number of days to delete the indices and data streams older than that.
    • Example: 30
    • Requirement: REQUIRED

And then simply run the following command:

  • python simple_elasticsearch_cleaner.py

License

This project is licensed under the Apache License - see the LICENSE file for details. This LICENSE file applies to all the code in this repository including docker, helm-charts and terraform subdirectories.

Any contributions, improvements and suggestions will be highly appreciated. 😊