In the previous article about scoring, Spoon Consulting looked at how Elasticsearch scores results by default.

Now let’s see how you can shape the ranking of your results to fit your use cases.

TL;DR

  • Terms can be reused to “boost” better results
  • Filters can be used to scope a query without influencing the score
  • Mixing must/should/filter in one Elasticsearch boolean query gives a lot of flexibility
  • Boosts give weight to fields – in part 2
  • Multi-match makes it easy to search the same value everywhere – in part 3
  • Function score lets you define custom influences – in part 4

In this article we will focus on the bool query, which combines several conditions together.

Boolean queries in Elasticsearch

Boolean queries are just a special query syntax to combine several conditions and filters.

As the documentation says:

The bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document.

https://www.elastic.co/

POST _search
{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "user.id" : "kimchy" }
      },
      "filter": {
        "term" : { "tags" : "production" }
      },
      "must_not" : {
        "range" : {
          "age" : { "gte" : 10, "lte" : 20 }
        }
      },
      "should" : [
        { "term" : { "tags" : "env1" } },
        { "term" : { "tags" : "deployed" } }
      ],
      "minimum_should_match" : 1,
      "boost" : 1.0
    }
  }
}

And it opens up a lot of possibilities.
Thanks to this mechanism, we can combine several conditions in the same query, and the results will naturally be ordered with the strictest matches first.

Let’s detail some cases.

Filter versus match

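Filters are purely boolean: a filter clause decides whether a document is kept and contributes nothing to the score.
But… that does not mean your scores stay the same once you start using filters.
The statistics behind the IDF part of the score (the total number of documents and the number of documents containing the term) are computed on the whole index shard, not on the filtered subset, so adding a filter does not change the score of the documents that remain. What does change the score is moving a condition from must to filter, because that condition no longer adds anything to the total.

Note: frequently used filters can be cached, so using a filter instead of a match whenever the score of that condition does not matter is good practice for performance.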
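
As a minimal sketch of this practice, the Python snippet below builds a bool query in which the full-text condition stays in must (it is scored) and an exact-value condition goes in filter (it only narrows the result set). The helper name build_query and the post_status field are assumptions made for illustration; they are not part of the index used in this article.

```python
import json


def build_query(text, status):
    """Build a bool query where `text` is scored and `status` only filters.

    `post_status` is a hypothetical keyword field used for illustration.
    """
    return {
        "query": {
            "bool": {
                # Scored clause: its score is added to _score.
                "must": [{"match": {"post_content": text}}],
                # Non-scored clause: yes/no only, eligible for caching.
                "filter": [{"term": {"post_status": status}}],
            }
        }
    }


body = build_query("elastic", "publish")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a _search request; only the must clause takes part in the ranking.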

Several query matches

A more interesting and powerful feature is combining several matches together, which gives a lot of flexibility.

In a search request we have to find a balance between precision and recall.

  • Precision is the share of relevant documents among all the documents found. If all returned documents are relevant, precision is 100% (or 1)
  • Recall is the share of relevant documents found among all the relevant documents that exist

The perfect request returns all the relevant documents, without missing any, and nothing else.
But usually, more precision comes with less recall, and more recall with less precision.
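
To make the two definitions concrete, here is a small Python sketch that computes both ratios from a list of returned ids and a list of relevant ids. The function name and the id lists are made up for the example; in practice the list of relevant documents comes from a human judgment, not from Elasticsearch.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for two collections of document ids."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant  # relevant documents that were actually found
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ids: 4 documents returned, 3 documents judged relevant.
returned = ["4153", "4019", "3973", "3995"]
relevant = ["4153", "3995", "3962"]

precision, recall = precision_recall(returned, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four returned documents are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67).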

With several matches in an Elasticsearch boolean query, it’s possible to return the most precise documents together with less precise ones, ordered by relevance.

Remember:

The bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document.

https://www.elastic.co/

Let’s look at an example.

Let’s begin with a classic match, wrapped in a boolean query.
It’s wide enough to be sure that visitors will find content more or less related to what they are looking for.
It’s a strategy to return less qualified content rather than nothing when the site doesn’t have content that precisely matches the user’s query.
But we can do much better to make sure relevant documents come first… if they exist.

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "post_title": "elastic"
          }
        }
      ]
    }
  },
  "_source": [
    "post_title",
    "post_date"
  ]
}

And the basic response will look like this:

{
  "hits" : {
    "max_score" : 2.449155,
    "hits" : [
      {
        "_index" : "post-1",
        "_type" : "_doc",
        "_id" : "4153",
        "_score" : 2.449155,
        "_source" : {
          "post_title" : "Elastic Stack in details",
          "post_date" : "2020-06-03 14:19:40"
        }
      },
      {
        "_index" : "post-1",
        "_type" : "_doc",
        "_id" : "4019",
        "_score" : 1.738766,
        "_source" : {
          "post_title" : "Kibana Map on the Elastic stack 7.6+",
          "post_date" : "2020-04-10 10:55:34"
        }
      },
      {
        "_index" : "post-1",
        "_type" : "_doc",
        "_id" : "3973",
        "_score" : 1.1412597,
        "_source" : {
          "post_title" : "Live Business Intelligence (BI) with Salesforce, Elastic Search, Heroku and Kibana - Part 2 - Dashboards",
          "post_date" : "2019-09-17 15:39:34"
        }
      },
      {
        "_index" : "post-2",
        "_type" : "_doc",
        "_id" : "3995",
        "_score" : 1.1386931,
        "_source" : {
          "post_title" : "ElasticSearch for your web site, the easy easy way",
          "post_date" : "2019-10-14 10:39:20"
        }
      },
      {
        "_index" : "post-1",
        "_type" : "_doc",
        "_id" : "3995",
        "_score" : 1.1360463,
        "_source" : {
          "post_title" : "ElasticSearch for your web site, the easy easy way",
          "post_date" : "2019-10-14 10:39:20"
        }
      }
    ]
  }
}

We now have all the documents related to what our user wants.
Now let’s add precision with should clauses:

{
  "size": 5, 
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "post_content": "elastic"
          }
        }
      ],
      "should": [
        {"match": {"post_title": {"query": "elastic search", "operator": "and"}}}, # when the content matches 'elastic', it is better if the title matches both 'elastic' AND 'search'
        {"match_phrase": {"post_title": "elastic search"}}, # better still if the title contains 'elastic' then 'search', in this order, with no other word in between
        {"match_phrase": {"post_content": "elastic search"}} # and even better if the content also contains 'elastic' then 'search', in this order, with no other word in between
      ]
    }
  },
  "_source": [
    "post_title",
    "post_date"
  ]
}

Accordingly, results are quite different:

{
  "hits" : {
    "max_score" : 8.827201,
    "hits" : [
      {
        "_index" : "post-2",
        "_type" : "_doc",
        "_id" : "3995",
        "_score" : 8.827201,
        "_source" : {
          "post_title" : "ElasticSearch for your web site, the easy easy way",
          "post_date" : "2019-10-14 10:39:20"
        }
      },
      {
        "_index" : "post-2",
        "_type" : "_doc",
        "_id" : "3962",
        "_score" : 7.8231583,
        "_source" : {
          "post_title" : "Live Business Intelligence (BI) with Salesforce, Heroku, Elastic Search and Kibana - Part 1 - Indexing Datas",
          "post_date" : "2019-09-05 13:34:16"
        }
      },
      {
        "_index" : "post-1",
        "_type" : "_doc",
        "_id" : "3973",
        "_score" : 7.378922,
        "_source" : {
          "post_title" : "Live Business Intelligence (BI) with Salesforce, Elastic Search, Heroku and Kibana - Part 2 - Dashboards",
          "post_date" : "2019-09-17 15:39:34"
        }
      },
      {
        "_index" : "post-2",
        "_type" : "_doc",
        "_id" : "3973",
        "_score" : 7.3731894,
        "_source" : {
          "post_title" : "Live Business Intelligence (BI) with Salesforce, Elastic Search, Heroku and Kibana - Part 2 - Dashboards",
          "post_date" : "2019-09-17 15:39:34"
        }
      },
      {
        "_index" : "post-1",
        "_type" : "_doc",
        "_id" : "3995",
        "_score" : 6.5617056,
        "_source" : {
          "post_title" : "ElasticSearch for your web site, the easy easy way",
          "post_date" : "2019-10-14 10:39:20"
        }
      }
    ]
  }
}

It’s also possible to give priority to the most recent posts.
Let’s add this to the previous query:

{
  "size": 5, 
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "post_content": "elastic"
          }
        }
      ],
      "should": [
        {"match": {"post_title": {"query": "elastic search", "operator": "and"}}},
        {"match_phrase": {"post_title": "elastic search"}},  
        {"match_phrase": {"post_content": "elastic search"} },
        {
          "distance_feature": { # adds a boost that depends on the distance from the origin
            "field": "post_date", # reference field
            "pivot": "30d", # distance from the origin at which the score receives half of the boost value
            "origin": "now", # reference point (can be now-1w or any other date)
            "boost": 4 # boosts are covered in detail in another article
          }
        }
      ]
    }
  },
  "_source": [
    "post_title",
    "post_date"
  ]
}
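
To get a feel for how this recency clause behaves, the sketch below applies the formula given in the Elasticsearch documentation for distance_feature, score = boost * pivot / (pivot + distance), to a few post dates. The function name recency_score and the fixed reference date standing in for "now" are assumptions for the example; this is an illustration of the documented formula, not of what Lucene does internally.

```python
from datetime import datetime, timedelta


def recency_score(post_date, origin, pivot=timedelta(days=30), boost=4.0):
    """Documented distance_feature formula: boost * pivot / (pivot + distance)."""
    distance = abs((origin - post_date).total_seconds())
    pivot_s = pivot.total_seconds()
    return boost * pivot_s / (pivot_s + distance)


# Arbitrary reference date standing in for "now".
origin = datetime(2020, 10, 21)

fresh = recency_score(origin, origin)                          # published "now"
at_pivot = recency_score(origin - timedelta(days=30), origin)  # exactly one pivot old
one_year = recency_score(origin - timedelta(days=365), origin)

print(fresh, at_pivot, round(one_year, 3))
```

A post published at the origin gets the full boost, a post that is exactly one pivot old gets half of it, and the contribution keeps shrinking with age without ever reaching zero.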

And again, results are modified:

{
  "hits" : {
    "max_score" : 11.222102,
    "hits" : [
      {
        "_index" : "post-1",
        "_type" : "_doc",
        "_id" : "4332",
        "_score" : 11.222102,
        "_source" : {
          "post_title" : "Paginating term aggregation",
          "post_date" : "2020-09-25 07:46:12"
        }
      },
      {
        "_index" : "post-1",
        "_type" : "_doc",
        "_id" : "4340",
        "_score" : 11.085687,
        "_source" : {
          "post_title" : "Scoring TF/IDF with Elasticsrearch",
          "post_date" : "2020-10-05 12:48:39"
        }
      },
      {
        "_index" : "post-2",
        "_type" : "_doc",
        "_id" : "3995",
        "_score" : 8.853431,
        "_source" : {
          "post_title" : "ElasticSearch for your web site, the easy easy way",
          "post_date" : "2019-10-14 10:39:20"
        }
      },
      {
        "_index" : "post-1",
        "_type" : "_doc",
        "_id" : "4318",
        "_score" : 7.923232,
        "_source" : {
          "post_title" : "Spoon Consulting present Elastic uses cases for the online Devcon MRU 2020",
          "post_date" : "2020-09-11 13:25:46"
        }
      },
      {
        "_index" : "post-2",
        "_type" : "_doc",
        "_id" : "3962",
        "_score" : 7.849333,
        "_source" : {
          "post_title" : "Live Business Intelligence (BI) with Salesforce, Heroku, Elastic Search and Kibana - Part 1 - Indexing Datas",
          "post_date" : "2019-09-05 13:34:16"
        }
      }
    ]
  }
}

For the bravest among you, here are the details of the explain output for the previous query, for document 3995 of index post-1.

{
  "_index" : "post-1",
  "_type" : "_doc",
  "_id" : "3995",
  "matched" : true,
  "explanation" : {
    "value" : 7.7534475,
    "description" : "sum of:",
    "details" : [
      {
        "value" : 0.5208409,
        "description" : "weight(post_content:elast in 3) [PerFieldSimilarity], result of:",
        "details" : [
          {
            "value" : 0.5208409,
            "description" : "score(freq=9.0), computed as boost * idf * tf from:",
            "details" : [
              {
                "value" : 2.2,
                "description" : "boost",
                "details" : [ ]
              },
              {
                "value" : 0.2876821,
                "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                "details" : [
                  {
                    "value" : 4,
                    "description" : "n, number of documents containing term",
                    "details" : [ ]
                  },
                  {
                    "value" : 5,
                    "description" : "N, total number of documents with field",
                    "details" : [ ]
                  }
                ]
              },
              {
                "value" : 0.8229426,
                "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                "details" : [
                  {
                    "value" : 9.0,
                    "description" : "freq, occurrences of term within document",
                    "details" : [ ]
                  },
                  {
                    "value" : 1.2,
                    "description" : "k1, term saturation parameter",
                    "details" : [ ]
                  },
                  {
                    "value" : 0.75,
                    "description" : "b, length normalization parameter",
                    "details" : [ ]
                  },
                  {
                    "value" : 536.0,
                    "description" : "dl, length of field (approximate)",
                    "details" : [ ]
                  },
                  {
                    "value" : 294.8,
                    "description" : "avgdl, average length of field",
                    "details" : [ ]
                  }
                ]
              }
            ]
          }
        ]
      },
      {
        "value" : 2.2720926,
        "description" : "sum of:",
        "details" : [
          {
            "value" : 1.1360463,
            "description" : "weight(post_title:elast in 3) [PerFieldSimilarity], result of:",
            "details" : [
              {
                "value" : 1.1360463,
                "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                "details" : [
                  {
                    "value" : 2.2,
                    "description" : "boost",
                    "details" : [ ]
                  },
                  {
                    "value" : 1.3862944,
                    "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                    "details" : [
                      {
                        "value" : 1,
                        "description" : "n, number of documents containing term",
                        "details" : [ ]
                      },
                      {
                        "value" : 5,
                        "description" : "N, total number of documents with field",
                        "details" : [ ]
                      }
                    ]
                  },
                  {
                    "value" : 0.3724928,
                    "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                    "details" : [
                      {
                        "value" : 1.0,
                        "description" : "freq, occurrences of term within document",
                        "details" : [ ]
                      },
                      {
                        "value" : 1.2,
                        "description" : "k1, term saturation parameter",
                        "details" : [ ]
                      },
                      {
                        "value" : 0.75,
                        "description" : "b, length normalization parameter",
                        "details" : [ ]
                      },
                      {
                        "value" : 8.0,
                        "description" : "dl, length of field",
                        "details" : [ ]
                      },
                      {
                        "value" : 5.2,
                        "description" : "avgdl, average length of field",
                        "details" : [ ]
                      }
                    ]
                  }
                ]
              }
            ]
          },
          {
            "value" : 1.1360463,
            "description" : "weight(post_title:search in 3) [PerFieldSimilarity], result of:",
            "details" : [
              {
                "value" : 1.1360463,
                "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                "details" : [
                  {
                    "value" : 2.2,
                    "description" : "boost",
                    "details" : [ ]
                  },
                  {
                    "value" : 1.3862944,
                    "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                    "details" : [
                      {
                        "value" : 1,
                        "description" : "n, number of documents containing term",
                        "details" : [ ]
                      },
                      {
                        "value" : 5,
                        "description" : "N, total number of documents with field",
                        "details" : [ ]
                      }
                    ]
                  },
                  {
                    "value" : 0.3724928,
                    "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                    "details" : [
                      {
                        "value" : 1.0,
                        "description" : "freq, occurrences of term within document",
                        "details" : [ ]
                      },
                      {
                        "value" : 1.2,
                        "description" : "k1, term saturation parameter",
                        "details" : [ ]
                      },
                      {
                        "value" : 0.75,
                        "description" : "b, length normalization parameter",
                        "details" : [ ]
                      },
                      {
                        "value" : 8.0,
                        "description" : "dl, length of field",
                        "details" : [ ]
                      },
                      {
                        "value" : 5.2,
                        "description" : "avgdl, average length of field",
                        "details" : [ ]
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      },
      {
        "value" : 2.2720926,
        "description" : """weight(post_title:"elast search" in 3) [PerFieldSimilarity], result of:""",
        "details" : [
          {
            "value" : 2.2720926,
            "description" : "score(freq=1.0), computed as boost * idf * tf from:",
            "details" : [
              {
                "value" : 2.2,
                "description" : "boost",
                "details" : [ ]
              },
              {
                "value" : 2.7725887,
                "description" : "idf, sum of:",
                "details" : [
                  {
                    "value" : 1.3862944,
                    "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                    "details" : [
                      {
                        "value" : 1,
                        "description" : "n, number of documents containing term",
                        "details" : [ ]
                      },
                      {
                        "value" : 5,
                        "description" : "N, total number of documents with field",
                        "details" : [ ]
                      }
                    ]
                  },
                  {
                    "value" : 1.3862944,
                    "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                    "details" : [
                      {
                        "value" : 1,
                        "description" : "n, number of documents containing term",
                        "details" : [ ]
                      },
                      {
                        "value" : 5,
                        "description" : "N, total number of documents with field",
                        "details" : [ ]
                      }
                    ]
                  }
                ]
              },
              {
                "value" : 0.3724928,
                "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                "details" : [
                  {
                    "value" : 1.0,
                    "description" : "phraseFreq=1.0",
                    "details" : [ ]
                  },
                  {
                    "value" : 1.2,
                    "description" : "k1, term saturation parameter",
                    "details" : [ ]
                  },
                  {
                    "value" : 0.75,
                    "description" : "b, length normalization parameter",
                    "details" : [ ]
                  },
                  {
                    "value" : 8.0,
                    "description" : "dl, length of field",
                    "details" : [ ]
                  },
                  {
                    "value" : 5.2,
                    "description" : "avgdl, average length of field",
                    "details" : [ ]
                  }
                ]
              }
            ]
          }
        ]
      },
      {
        "value" : 1.49668,
        "description" : """weight(post_content:"elast search" in 3) [PerFieldSimilarity], result of:""",
        "details" : [
          {
            "value" : 1.49668,
            "description" : "score(freq=9.0), computed as boost * idf * tf from:",
            "details" : [
              {
                "value" : 2.2,
                "description" : "boost",
                "details" : [ ]
              },
              {
                "value" : 0.82667863,
                "description" : "idf, sum of:",
                "details" : [
                  {
                    "value" : 0.2876821,
                    "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                    "details" : [
                      {
                        "value" : 4,
                        "description" : "n, number of documents containing term",
                        "details" : [ ]
                      },
                      {
                        "value" : 5,
                        "description" : "N, total number of documents with field",
                        "details" : [ ]
                      }
                    ]
                  },
                  {
                    "value" : 0.5389965,
                    "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                    "details" : [
                      {
                        "value" : 3,
                        "description" : "n, number of documents containing term",
                        "details" : [ ]
                      },
                      {
                        "value" : 5,
                        "description" : "N, total number of documents with field",
                        "details" : [ ]
                      }
                    ]
                  }
                ]
              },
              {
                "value" : 0.8229426,
                "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                "details" : [
                  {
                    "value" : 9.0,
                    "description" : "phraseFreq=9.0",
                    "details" : [ ]
                  },
                  {
                    "value" : 1.2,
                    "description" : "k1, term saturation parameter",
                    "details" : [ ]
                  },
                  {
                    "value" : 0.75,
                    "description" : "b, length normalization parameter",
                    "details" : [ ]
                  },
                  {
                    "value" : 536.0,
                    "description" : "dl, length of field (approximate)",
                    "details" : [ ]
                  },
                  {
                    "value" : 294.8,
                    "description" : "avgdl, average length of field",
                    "details" : [ ]
                  }
                ]
              }
            ]
          }
        ]
      },
      {
        "value" : 1.1917417,
        "description" : "Distance score, computed as weight * pivotDistance / (pivotDistance + abs(value - origin)) from:",
        "details" : [
          {
            "value" : 16.0,
            "description" : "weight",
            "details" : [ ]
          },
          {
            "value" : 2592000000,
            "description" : "pivotDistance",
            "details" : [ ]
          },
          {
            "value" : 1603257046032,
            "description" : "origin",
            "details" : [ ]
          },
          {
            "value" : 1571049560000,
            "description" : "current value",
            "details" : [ ]
          }
        ]
      }
    ]
  }
}
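
If you want to check that this output really follows the formulas it prints, the Python sketch below recomputes two of the values from the explain details above (the post_title:elast term weight and the distance score) and then sums the five top-level clauses. The function names are ours; every number is copied from the explain output. Small differences in the last digits are expected because Lucene works with 32-bit floats.

```python
import math


def bm25_term_score(boost, n, N, freq, k1, b, dl, avgdl):
    """boost * idf * tf, using the formulas printed in the explain output."""
    idf = math.log(1 + (N - n + 0.5) / (n + 0.5))
    tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
    return boost * idf * tf


def distance_score(weight, pivot, origin, value):
    """weight * pivotDistance / (pivotDistance + abs(value - origin))."""
    return weight * pivot / (pivot + abs(value - origin))


# Values copied from the "weight(post_title:elast in 3)" details.
title_elast = bm25_term_score(
    boost=2.2, n=1, N=5, freq=1.0, k1=1.2, b=0.75, dl=8.0, avgdl=5.2
)

# Values copied from the "Distance score" details (timestamps in milliseconds).
recency = distance_score(16.0, 2592000000, 1603257046032, 1571049560000)

# The five top-level clause values; the bool query simply adds them up.
clauses = [0.5208409, 2.2720926, 2.2720926, 1.49668, 1.1917417]
total = sum(clauses)

print(round(title_elast, 4), round(recency, 4), round(total, 4))
```

The total matches the 7.7534475 reported at the top of the explain output, which is the more-matches-is-better sum of the must clause and the four should clauses.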

Conclusion

In the next articles we will study other, complementary techniques to tweak the score of your Elasticsearch queries to fit your needs.

But a boolean query already gives a lot of power and flexibility to define what is relevant for your application: it returns the most precise documents first while keeping less relevant documents at the end of the results.

Spoon Consulting is a certified partner of Elastic

As a certified partner of Elastic, Spoon Consulting offers high-level consulting for all kinds of companies.
