Elasticsearch: does disk-based allocation or tiered storage (hot-warm) take precedence?

I'm doing a baseline and POC of an Elasticsearch cluster. I am currently working with 4 servers, 3 of which will hold the Elasticsearch cluster but are limited in storage (they probably couldn't hold the expected data per day).

If I set up tiered storage with 3 hot nodes and 1 warm node, and the 3 hot servers are almost full, will Elasticsearch reallocate shards to the warm server to balance the data?

What I would like to know is this: when disk-based shard allocation is enabled and the 3 servers are almost full, will it redirect shard allocation to the 4th server? This server has enough storage but also hosts Logstash and Kibana. I would just like to use it for backup and to store the data that overflows from the cluster.

Note: data retention is only for a day.

1 answer

  • answered 2019-10-08 20:11 ibexit

    I don't think that will be possible out of the box, as there is no concept of overflow nodes. The number of shards is fixed at the moment of index creation. All shards will be allocated, and incoming data will be distributed across all of them. If one of the primary or replica shards is placed on the 4th node, that's fine as far as Elasticsearch is concerned.
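
    To illustrate why a nearly full disk does not steer new documents elsewhere, here is a minimal sketch of hash-based routing. It assumes the simplified rule "shard = hash(routing key) mod number of primary shards". `zlib.crc32` is only a stand-in for Elasticsearch's own hash function, and `pick_shard` and `spread` are hypothetical helper names, not part of any Elasticsearch API.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (by default the document id) to a shard number.

    Assumption: crc32 stands in for Elasticsearch's internal hash.
    Disk usage plays no part in this decision.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def spread(doc_ids, num_primary_shards):
    """Count how many of the given documents land on each shard."""
    counts = [0] * num_primary_shards
    for doc_id in doc_ids:
        counts[pick_shard(doc_id, num_primary_shards)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical document ids; every shard receives a share of the writes,
    # including a shard that happens to live on a nearly full node.
    print(spread([f"doc-{i}" for i in range(1000)], 3))
```

    Because the target shard depends only on the routing key and the shard count, a shard sitting on a full node keeps receiving its share of writes until the index is blocked or the shard is relocated.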

    But I see a (probably ugly) workaround: configure your main index to allocate only on the first three nodes. Create a watcher that monitors the disk usage of the cluster. If a desired usage threshold is reached, issue a reindex request and move the oldest data of the main index into an overflow index configured to allocate only on the 4th node. But please be aware that there will be no replicas if you have only one node for the overflow index.
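
    A rough sketch of that workaround follows. It only builds the request bodies and does not talk to a cluster. The index names (`logs-main`, `logs-overflow`), node names, the `@timestamp` field, and the 80% threshold are all assumptions for illustration. The helper functions are hypothetical, but the setting keys and endpoints (`index.routing.allocation.include._name`, `index.number_of_replicas`, `POST /_reindex`, `POST /<index>/_delete_by_query`) are standard Elasticsearch ones. Note that reindex copies documents, so a delete-by-query with the same cutoff is needed to actually free space on the main index.

```python
def allocation_settings(node_names, replicas=None):
    """Index settings that restrict shard allocation to the named nodes."""
    settings = {"index.routing.allocation.include._name": ",".join(node_names)}
    if replicas is not None:
        settings["index.number_of_replicas"] = replicas
    return settings


def disk_usage_exceeded(used_bytes, total_bytes, threshold=0.80):
    """True once the used fraction reaches the (assumed) threshold."""
    return used_bytes / total_bytes >= threshold


def older_than(timestamp_field, cutoff):
    """Range query matching documents strictly older than an absolute cutoff."""
    return {"range": {timestamp_field: {"lt": cutoff}}}


def overflow_plan(used_bytes, total_bytes, cutoff,
                  main_index="logs-main", overflow_index="logs-overflow",
                  timestamp_field="@timestamp"):
    """Return the (method, path, body) requests to send, or [] if below threshold."""
    if not disk_usage_exceeded(used_bytes, total_bytes):
        return []
    query = older_than(timestamp_field, cutoff)
    return [
        # Copy the oldest documents into the overflow index on the 4th node.
        ("POST", "/_reindex",
         {"source": {"index": main_index, "query": query},
          "dest": {"index": overflow_index}}),
        # Reindex does not delete the source documents, so remove them afterwards.
        ("POST", f"/{main_index}/_delete_by_query", {"query": query}),
    ]


# Main index stays on the three hot nodes; overflow index lives on the 4th,
# with zero replicas because a single node cannot hold a replica of its own shard.
MAIN_SETTINGS = allocation_settings(["es-hot-1", "es-hot-2", "es-hot-3"])
OVERFLOW_SETTINGS = allocation_settings(["es-node-4"], replicas=0)
```

    The settings dictionaries would go to `PUT /<index>/_settings` (or into the index creation request), and the plan would be run by whatever does the monitoring, for example a Watcher action or a cron job. An absolute cutoff timestamp is used here instead of date math such as `now-12h` so that the reindex and the delete match exactly the same documents.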