This post opens a new sub-category of our Elasticsearch Advanced Usage Examples, Elasticsearch Best Practices, with one of the most important practices to apply to all your indices: index aliases.

Index aliasing is one of the most important techniques for a production-ready Elasticsearch cluster. Aliases make maintenance easier, enable index life cycles, allow reindexing without downtime, and so on.

TL;DR

  • An alias behaves like an index for search and indexing requests.
  • You can query and ingest through an alias with the normal API calls (writes need the alias to point to a single index or to have a write index).
  • An alias can be set on several indices (logs-0001, logs-0002 can both have the same ‘logs’ alias).
  • Aliases can be changed whenever you want.
  • Aliases help you maintain indices with no downtime.

What is an index alias in Elasticsearch?

As the name suggests, an index alias is an additional name you can give to one or several indices.

You can declare it with the _aliases API.

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "my-logs-*",
        "alias": "my-logs"
      }
    }
  ]
}

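Now, querying my-logs is equivalent to querying my-logs-* or even my-logs-1,my-logs-2,… Note that the wildcard is resolved when the alias is added: indices created later are not attached to the alias automatically.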

GET my-logs/_search
GET my-logs-*/_search
GET my-logs-1,my-logs-2,my-logs-3/_search

You can also remove an alias from an index at any time:

POST _aliases
{
  "actions": [
    {
      "remove": {
        "index": "my-logs-archive",
        "alias": "my-logs"
      }
    }
  ]
}

After that, documents in the my-logs-archive index are no longer returned by requests on the “my-logs” alias.

Why are index aliases so important?

Let’s talk about some use cases. 

Scoping requests

You want to scope your requests easily and make your code more readable.
Let’s say you have a lot of posts and only want to query the posts of the current year.

POST /_aliases
{
    "actions" : [
        { "add" : { "index" : "posts-2020", "alias" : "current_year" }}  
    ]
}

At the end of the year, you just have to remove the alias from posts-2020 and add it to your future posts-2021 index, without changing anything in your application.

You can also add a filter to an index alias to get a similar result from a single index.

POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "logs",
        "alias": "todays_logs",
        "filter": {
          "range": {
            "@timestamp": {
              "gte": "now-1d"
            }
          }
        }
      }
    }
  ]
}
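
If the alias is created from application code, the filter can be parameterized. Here is a minimal Python sketch, assuming a hypothetical helper named filtered_alias_action. It only builds the JSON body shown above and does not talk to Elasticsearch. Sending it to POST /_aliases is left to whatever HTTP client you use.

```python
import json


def filtered_alias_action(index, alias, field="@timestamp", since="now-1d"):
    """Build an _aliases request body that adds a filtered alias.

    Hypothetical helper for illustration: it only builds the payload,
    it does not send anything to Elasticsearch.
    """
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": alias,
                    # Only documents matching this query are visible through the alias.
                    "filter": {"range": {field: {"gte": since}}},
                }
            }
        ]
    }


body = filtered_alias_action("logs", "todays_logs")
print(json.dumps(body, indent=2))
```

With the default arguments, the printed body matches the request above. Changing since (for example to "now-7d") gives a different window without touching the queries that use the alias.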

Reindex with index aliases

The most important use case is index maintenance.

When you work with Elasticsearch, sooner or later you will have to reindex data, for example to change the mapping or the number of shards.

Let’s say we were oversharding our logs indices, with 5 shards per index.
All our current logs indices have the ‘logs’ alias.

To reduce memory pressure, we have to create new indices:

PUT logs_v2_2020-01
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {...}
}

Now let’s reindex:

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "logs_2020-01"
  },
  "dest": {
    "index": "logs_v2_2020-01"
  }
}

At this stage nothing has changed. 

You can continue to query ‘logs’ the way you used to.

Then, when reindexing has finished, just switch the alias so that your requests use the new index, with no downtime:

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "logs_v2_2020-01",
        "alias": "logs"
      }
    },
    {
      "remove": {
        "index": "logs_2020-01",
        "alias": "logs"
      }
    }
  ]
}
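
When several monthly indices have to be migrated, the swap can be generated instead of written by hand. The sketch below is an assumption-laden illustration: swap_alias_body is a hypothetical helper, and logs_2020-02 / logs_v2_2020-02 are made-up index names added to show a second pair. It only builds the request body. All actions in a single _aliases request are applied atomically, which is what makes the switch invisible to searches.

```python
import json


def swap_alias_body(alias, old_to_new):
    """Build one _aliases request that moves `alias` from old indices to new ones.

    Hypothetical helper: `old_to_new` maps each old index name to the
    reindexed index that replaces it.
    """
    actions = []
    for old_index, new_index in old_to_new.items():
        # Attach the alias to the reindexed copy...
        actions.append({"add": {"index": new_index, "alias": alias}})
        # ...and detach it from the original in the same request.
        actions.append({"remove": {"index": old_index, "alias": alias}})
    return {"actions": actions}


body = swap_alias_body(
    "logs",
    {
        "logs_2020-01": "logs_v2_2020-01",
        "logs_2020-02": "logs_v2_2020-02",  # hypothetical second month
    },
)
print(json.dumps(body, indent=2))
```

Only send the generated body once the corresponding reindex tasks have completed, otherwise the alias would point to partially filled indices.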

You can now delete your old index. 

DELETE logs_2020-01

Repeat the operation for each index to correct your data structure without impacting production.

Aliases for Index Life cycle and Hot Warm architecture

Another common use case, when a company ingests a lot of time-based documents, is to implement a life cycle policy.

A life cycle policy can save you gigabytes of storage and should be implemented for all time series data, unless your company has an unlimited budget.

We will not detail here how to implement it; you can learn the process in the official documentation.

In summary, each time your index grows beyond a size limit, or becomes older than an age limit, a new index is created.

The old index can then be shrunk and moved to less expensive hardware.

All these actions are completely transparent for the end user (apart from access times on old indices, which depend on your settings).

This is based on the power of aliases. 

  1. All indices share the same alias.
  2. Newly created indices get the alias with the "is_write_index": true parameter.
  3. Therefore, all newly ingested logs are automatically written to this new index.
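
To make the three points above concrete, here is a toy in-memory model in Python. It is not Elasticsearch code: the AliasModel class is a hypothetical simplification that only tracks which indices an alias searches and which single index receives writes. In a real cluster the rollover moves the write flag from the old index to the new one; the model collapses that into one call.

```python
class AliasModel:
    """Toy in-memory model of an alias; a simplification, not Elasticsearch code."""

    def __init__(self):
        self.indices = []        # every index behind the alias (all are searched)
        self.write_index = None  # the single index that receives writes

    def add(self, index, is_write_index=False):
        if index not in self.indices:
            self.indices.append(index)
        if is_write_index:
            # Only one index can be the write index, so the newest flag wins.
            self.write_index = index

    def search_targets(self):
        return list(self.indices)

    def write_target(self):
        if self.write_index is None:
            raise ValueError("alias has no write index")
        return self.write_index


logs = AliasModel()
logs.add("logs-0001", is_write_index=True)
logs.add("logs-0002", is_write_index=True)  # simulated rollover to a new index

print(logs.search_targets())
print(logs.write_target())
```

Searches on the alias still cover both indices, while new documents only go to the most recent one. That is why the application never needs to know the name of the current index.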

Behind the scenes, other really cool Elasticsearch features shrink and move the data of the old index if you configure them to. But that will be covered in another dedicated post.

Conclusion

It’s really quick to add an index alias to your new indices, and it is easy to use in your code.

It’s a very powerful feature that will spare every Elasticsearch maintainer a lot of pain.

So always add an index alias: it may save you days of work, and at worst it will have no impact.