Elasticsearch Bool Query: must vs filter, Caching, and Scoring


Bool queries combine must, filter, should, and must_not clauses. must clauses contribute to relevance scoring; filter clauses don't, and Elasticsearch can cache their results so that repeated filters execute faster. Using must for exact-match conditions wastes scoring computation and makes the clause ineligible for the filter cache.


The four clauses

| Clause | Behavior | Scoring | Caching |
|---|---|---|---|
| must | Document must match; contributes to _score | Yes | No |
| filter | Document must match; does not affect _score | No | Yes |
| should | Document may match; matching boosts _score | Yes | No |
| must_not | Document must not match | No | Yes |

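must and filter both require a match, but only must participates in scoring. filter and must_not run in filter context, so Elasticsearch can cache their results at the segment level; a repeated filter can then reuse the cached result instead of scanning inverted index entries again. Caching is selective: Elasticsearch decides which filters are worth caching based on how often they recur and how large the segment is.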

Practical patterns

Search with scoring + binary filter:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "body": "database replication" } }
      ],
      "filter": [
        { "term": { "status": "published" } },
        { "range": { "published_at": { "gte": "2024-01-01" } } }
      ]
    }
  }
}

match runs with scoring: under BM25, documents where the terms of "database replication" appear more often relative to field length score higher, and rarer terms carry more weight. status = published and published_at >= 2024-01-01 are binary conditions that don't affect relevance, so they go in filter, where they are eligible for caching.
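The same split between scored and unscored clauses can be sketched as a small Python helper that assembles the request body. This is only an illustration: `build_search_query` is a hypothetical name, the field names are taken from the example above, and the result is a plain dict that would still need to be sent to Elasticsearch by whatever client you use.

```python
import json


def build_search_query(text, status, published_since):
    """Build a bool query: one scored full-text clause plus unscored filters.

    Hypothetical helper for illustration; field names follow the example above.
    """
    return {
        "query": {
            "bool": {
                # Query context: this clause contributes to _score.
                "must": [{"match": {"body": text}}],
                # Filter context: yes/no conditions with no effect on _score.
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"published_at": {"gte": published_since}}},
                ],
            }
        }
    }


body = build_search_query("database replication", "published", "2024-01-01")
print(json.dumps(body, indent=2))
```

Keeping the user's search text in must and every yes/no condition in filter makes it hard to put a clause in the wrong context by accident.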

Multi-condition search with optional boost:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "kubernetes" } }
      ],
      "should": [
        { "match": { "tags": "cloud-native" } },
        { "match": { "tags": "devops" } }
      ],
      "minimum_should_match": 0
    }
  }
}

should clauses are optional but boost _score when they match. minimum_should_match: 0 means none are required — they only influence ranking. Set minimum_should_match: 1 to require at least one should clause to match.
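When minimum_should_match is omitted, its default depends on the other clauses: it is 1 if the bool query has at least one should clause and no must or filter clauses, and 0 otherwise. The sketch below encodes that rule for a bool query expressed as a Python dict; `effective_minimum_should_match` is a hypothetical helper, not part of any client library.

```python
def effective_minimum_should_match(bool_query):
    """Return the minimum_should_match value that applies to a bool query dict.

    Hypothetical helper: an explicit setting wins; otherwise the default is
    1 when there are should clauses but no must or filter clauses, else 0.
    """
    if "minimum_should_match" in bool_query:
        return bool_query["minimum_should_match"]
    if not bool_query.get("should"):
        return 0
    has_required = bool(bool_query.get("must")) or bool(bool_query.get("filter"))
    return 0 if has_required else 1


with_must = {
    "must": [{"match": {"title": "kubernetes"}}],
    "should": [{"match": {"tags": "devops"}}],
}
only_should = {"should": [{"match": {"tags": "devops"}}]}

print(effective_minimum_should_match(with_must))    # should is a pure boost
print(effective_minimum_should_match(only_should))  # at least one should must match
```

In the query above, the explicit 0 therefore matches the default, because a must clause is present.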

Exclusion:

{
  "query": {
    "bool": {
      "must": { "match": { "content": "python" } },
      "must_not": { "term": { "draft": true } }
    }
  }
}

filter clauses are cacheable but unscored: a full-text condition in filter loses its score, and a yes/no condition in must wastes scoring work


Elasticsearch maintains a filter cache at the Lucene segment level. When a cached filter runs again on the same segment, it reads the cached bitset instead of re-executing. must clauses do not use this cache because their output includes a per-document score, not just a yes/no match. Putting a full-text match in filter is valid, but it only decides whether a document matches; its score is discarded. Putting a term or range in must spends scoring work on a yes/no condition that carries no useful ranking signal.

Prerequisites

  • Inverted index
  • Elasticsearch relevance scoring
  • BM25

Key Points

  • filter results can be cached per segment; once a filter is cached, repeating it is much cheaper than the first execution.
  • must contributes to _score; filter does not. Use must only when the match quality matters for ranking.
  • term, range, exists, and geo queries in filter context can benefit from caching; match queries in filter do not cache well because they vary by query string.
  • must_not also runs in filter context, so exclusions are as cheap as inclusions.
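These points can be turned into a rough review pass over a bool query. The sketch below is a heuristic under stated assumptions, not a rule Elasticsearch enforces: `review_bool` and the two query-type sets are hypothetical, and there are legitimate cases (for example, deliberately scoring a term clause) that it would flag anyway.

```python
# Query types treated as yes/no conditions (assumption for this sketch).
NON_SCORING_TYPES = {"term", "terms", "range", "exists"}
# Query types treated as full-text, where the score usually matters.
FULL_TEXT_TYPES = {"match", "match_phrase", "multi_match"}


def as_list(clauses):
    """Bool clauses may be a single object or a list; normalize to a list."""
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]


def review_bool(bool_query):
    """Return notes about clauses that look like they sit in the wrong context."""
    notes = []
    for clause in as_list(bool_query.get("must")):
        for query_type in clause:
            if query_type in NON_SCORING_TYPES:
                notes.append(f"must.{query_type}: yes/no condition, consider filter")
    for clause in as_list(bool_query.get("filter")):
        for query_type in clause:
            if query_type in FULL_TEXT_TYPES:
                notes.append(f"filter.{query_type}: full-text query, score is discarded")
    return notes


suspect = {
    "must": [
        {"match": {"body": "database replication"}},
        {"term": {"status": "published"}},
    ],
    "filter": {"match": {"title": "replication"}},
}
for note in review_bool(suspect):
    print(note)
```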

Nested bool queries

Bool queries compose arbitrarily:

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              { "match": { "title": "elasticsearch" } },
              { "match": { "title": "opensearch" } }
            ],
            "minimum_should_match": 1
          }
        }
      ],
      "filter": [
        { "term": { "category": "search" } }
      ]
    }
  }
}

This finds documents about elasticsearch or opensearch in the search category. The inner bool with should handles the OR logic; the outer filter enforces the category without affecting score.
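Because a nested bool is just another clause, the OR group can be factored into a small helper. The names `any_of` and `title_in_category` below are hypothetical, chosen only for this sketch; the dict they produce is the same query body as above.

```python
def any_of(*clauses):
    """Wrap clauses in a bool that requires at least one of them to match."""
    return {"bool": {"should": list(clauses), "minimum_should_match": 1}}


def title_in_category(titles, category):
    """Scored OR over title matches, restricted by an unscored category filter."""
    return {
        "query": {
            "bool": {
                "must": [any_of(*[{"match": {"title": t}} for t in titles])],
                "filter": [{"term": {"category": category}}],
            }
        }
    }


nested = title_in_category(["elasticsearch", "opensearch"], "search")
```

The explicit minimum_should_match: 1 in the inner bool is what makes it behave as OR; it does not rely on the default.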

📝constant_score: force filter-only execution for term lookups

When you only need to match documents (no ranking needed), constant_score wraps a filter and assigns a fixed _score of 1.0 to all matches:

{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "user_id": "u_12345" }
      },
      "boost": 1.0
    }
  }
}

This can be faster than a bool with a single must term query because it skips score computation and runs the term query in filter context, where it is eligible for caching. Use constant_score for ID lookups, tag filters, and any query where ranking is meaningless, such as fetching all documents for a user to display in a list.
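For comparison, a bool query that contains only filter clauses also skips scoring, but it returns a _score of 0 for every hit, whereas constant_score returns the boost value. A minimal sketch of both shapes, with hypothetical helper names:

```python
def constant_score(filter_clause, boost=1.0):
    """Filter-only query; every hit gets _score equal to boost."""
    return {"query": {"constant_score": {"filter": filter_clause, "boost": boost}}}


def filter_only_bool(filter_clause):
    """Filter-only bool query; every hit gets _score 0."""
    return {"query": {"bool": {"filter": [filter_clause]}}}


lookup = {"term": {"user_id": "u_12345"}}
by_constant_score = constant_score(lookup)
by_bool_filter = filter_only_bool(lookup)
```

Either form suits an ID lookup; the choice mainly matters if something downstream reads _score.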

A search query returns results ranked by relevance. You add a date range condition (last 30 days) to narrow results. Should you put the date range in must or filter?

Difficulty: easy

The date range is a binary condition — either the document is within the last 30 days or it isn't. You want results sorted by relevance to the search terms, not by date.

  • A. must, because the date range is required for a document to appear in results
    Incorrect. must does require a match, but it also runs the clause in query context. A range query has no 'degree of match', so in must it adds nothing useful to ranking, and it gives up the caching that filter context allows. must is correct when the match quality matters for ranking.
  • B. filter, because it's a binary condition with no ranking signal, and filter results are cacheable
    Correct! Date ranges are binary: in range or not. There's no 'degree of match' that should influence relevance ranking. Putting the range in filter means: (1) it doesn't affect the relevance score from your actual search terms, (2) the range filter can be cached at the segment level, so repeated searches with the same 30-day window can reuse it, and (3) the query correctly separates 'what must match' from 'how relevant is the match'.
  • C. should, so that recent documents score higher but older ones still appear
    Incorrect. should makes a clause optional and adds score when matched. If you want to boost recent documents without excluding older ones, should works, but the question specifies narrowing results to the last 30 days, which requires exclusion. For a boost-without-exclusion use case, should is appropriate.
  • D. must_not, to exclude documents outside the date range
    Incorrect. must_not excludes documents that match the condition. But the date range condition selects documents inside the range, so must_not would exclude documents inside the range and keep everything outside it. That's the opposite of the intended behavior.

Hint: Is matching a date range a 'how relevant' question or a 'does it qualify' question?
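A sketch of the filter-based version of this query, assuming the date field is called published_at and the text field body as in the earlier examples. The helper names are hypothetical. The range uses Elasticsearch date math, where `now-30d/d` means 30 days ago rounded down to the start of the day; a rounded bound stays the same for a whole day, which generally makes the filter more cache-friendly than a bare `now-30d` that changes on every request.

```python
def last_n_days_filter(field, days):
    """Range clause for the last `days` days, rounded down to the day."""
    return {"range": {field: {"gte": f"now-{days}d/d"}}}


def recent_search(text, days=30):
    """Relevance-ranked search over `body`, narrowed by an unscored date filter."""
    return {
        "query": {
            "bool": {
                # Scored: ranking comes only from the search terms.
                "must": [{"match": {"body": text}}],
                # Unscored: documents either qualify or they don't.
                "filter": [last_n_days_filter("published_at", days)],
            }
        }
    }


query = recent_search("database replication")
```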