Elasticsearch Bool Query: must vs filter, Caching, and Scoring
Bool queries combine must, filter, should, and must_not clauses. must clauses contribute to relevance scoring; filter clauses do not, which lets Elasticsearch skip score computation for them and cache frequently used filters. Using must for exact-match conditions wastes scoring work and bypasses the filter cache.
The four clauses
| Clause | Behavior | Scoring | Caching |
|---|---|---|---|
| must | Document must match; contributes to _score | Yes | No |
| filter | Document must match; does not affect _score | No | Yes |
| should | Document may match; matching boosts _score | Yes | No |
| must_not | Document must not match | No | Yes |
must and filter both require a match, but only must participates in scoring. Clauses in filter and must_not run in filter context, and Elasticsearch can cache their results per segment, so a repeated filter can be answered from the cache instead of scanning inverted index entries again.
Practical patterns
Search with scoring + binary filter:
{
  "query": {
    "bool": {
      "must": [
        { "match": { "body": "database replication" } }
      ],
      "filter": [
        { "term": { "status": "published" } },
        { "range": { "published_at": { "gte": "2024-01-01" } } }
      ]
    }
  }
}
match runs with scoring: documents in which the terms "database" and "replication" appear more often, relative to field length, score higher. status = published and published_at >= 2024-01-01 are binary conditions that don't affect relevance, so they go in filter, where they can be cached.
Multi-condition search with optional boost:
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "kubernetes" } }
      ],
      "should": [
        { "match": { "tags": "cloud-native" } },
        { "match": { "tags": "devops" } }
      ],
      "minimum_should_match": 0
    }
  }
}
should clauses are optional but boost _score when they match. minimum_should_match: 0 means none are required — they only influence ranking. Set minimum_should_match: 1 to require at least one should clause to match.
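As a sketch of the stricter variant mentioned above, the same query with minimum_should_match set to 1 returns only kubernetes titles that also carry at least one of the two tags. The field names and tag values are carried over from the example above, not taken from a real mapping. When a bool query already has a must or filter clause, minimum_should_match defaults to 0, so the explicit 0 in the previous example mainly documents intent.

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "kubernetes" } }
      ],
      "should": [
        { "match": { "tags": "cloud-native" } },
        { "match": { "tags": "devops" } }
      ],
      "minimum_should_match": 1
    }
  }
}
```

Documents matching both tags still rank above those matching one, since each matching should clause adds to _score.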
Exclusion:
{
  "query": {
    "bool": {
      "must": { "match": { "content": "python" } },
      "must_not": { "term": { "draft": true } }
    }
  }
}
Gotcha (Elasticsearch): filter clause results are cached; putting scoring conditions in filter discards their scores and caches poorly
Elasticsearch maintains a filter cache at the Lucene segment level. When the same filter query runs again on the same segment, it hits the bitset cache instead of re-executing. must clauses bypass this cache entirely because their output depends on query context (the score computation). Putting a full-text match in filter works syntactically (documents match or don't), but its score is discarded. Putting a term or range in must wastes scoring computation on a binary condition that has no ranking signal.
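For contrast, here is a sketch of the misplacement this gotcha describes, using the same illustrative fields (body, status, published_at) as the first example under Practical patterns. It returns the same set of documents as the must + filter version, but the term and range clauses now run in query context: they take part in score computation and are not served from the filter cache.

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "body": "database replication" } },
        { "term": { "status": "published" } },
        { "range": { "published_at": { "gte": "2024-01-01" } } }
      ]
    }
  }
}
```

Moving the last two clauses into a filter array restores the intended split: one scoring clause, two binary ones.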
Prerequisites
- Inverted index
- Elasticsearch relevance scoring
- BM25
Key Points
- filter is cached per segment; repeated identical filters are essentially free after the first execution.
- must contributes to _score; filter does not — use must only when the match quality matters for ranking.
- term, range, exists, and geo queries in filter context benefit from caching; match queries in filter do not cache well because they vary by query string.
- must_not also uses the filter cache — exclusions are as cheap as inclusions in filter context.
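The key points can be illustrated with a filter-only sketch. The field names author_id, thumbnail_url, published_at, and draft are hypothetical. The range uses date math rounded to the day (now-30d/d), on the assumption that a bound based on an unrounded now changes with every request and so rarely repeats exactly; rounding keeps the bounds identical across requests made on the same day, which gives the cache a chance to help.

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "author_id": "a_42" } },
        { "exists": { "field": "thumbnail_url" } },
        { "range": { "published_at": { "gte": "now-30d/d" } } }
      ],
      "must_not": [
        { "term": { "draft": true } }
      ]
    }
  }
}
```

Because no clause runs in query context here, every hit gets the same _score, so any meaningful ordering has to come from an explicit sort.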
Nested bool queries
Bool queries compose arbitrarily:
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              { "match": { "title": "elasticsearch" } },
              { "match": { "title": "opensearch" } }
            ],
            "minimum_should_match": 1
          }
        }
      ],
      "filter": [
        { "term": { "category": "search" } }
      ]
    }
  }
}
This finds documents about elasticsearch or opensearch in the search category. The inner bool with should handles the OR logic; the outer filter enforces the category without affecting score.
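When the OR logic itself should not influence ranking, the same nesting works inside filter. A minimal sketch, assuming a hypothetical keyword field named engine alongside the category field from the example above, and an arbitrary title search:

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "cluster tuning" } }
      ],
      "filter": [
        { "term": { "category": "search" } },
        {
          "bool": {
            "should": [
              { "term": { "engine": "elasticsearch" } },
              { "term": { "engine": "opensearch" } }
            ],
            "minimum_should_match": 1
          }
        }
      ]
    }
  }
}
```

Here only the title match contributes to _score; the inner bool acts as a yes/no test. For several exact values on a single field, one terms query is a more compact way to express the same condition.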
📝 constant_score: force filter-only execution for term lookups
When you only need to match documents (no ranking needed), constant_score wraps a filter and assigns a fixed _score of 1.0 to all matches:
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "user_id": "u_12345" }
      },
      "boost": 1.0
    }
  }
}
This is faster than a bool with a single must term query because it completely skips score computation and uses the filter cache. Use constant_score for ID lookups, tag filters, and any query where ranking is meaningless — e.g., fetching all documents for a user to display in a list.
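A bool query containing only a filter clause is a closely related form. The sketch below reuses the user_id value from the example above; it should match the same documents as the constant_score version, but with a _score of 0 instead of 1.0, since no clause runs in query context.

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "user_id": "u_12345" } }
      ]
    }
  }
}
```

Either form avoids per-document relevance scoring. constant_score is the one to pick when a uniform, nonzero score is needed, for example when the clause sits inside a larger scoring query.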
Question (easy): A search query returns results ranked by relevance. You add a date range condition (last 30 days) to narrow results. Should you put the date range in must or filter?
The date range is a binary condition: either the document is within the last 30 days or it isn't. You want results sorted by relevance to the search terms, not by date.
A. must, because the date range is required for a document to appear in results
Incorrect. must does require a match, but it also contributes to _score. A date range in must feeds a binary condition into the score calculation, where it adds work but no meaningful ranking signal. must is correct when the match quality matters for ranking.
B. filter, because it's a binary condition with no ranking signal, and filter results are cached
Correct! Date ranges are binary: in range or not. There's no 'degree of match' that should influence relevance ranking. Putting the range in filter means: (1) it doesn't distort the relevance score from your actual search terms, (2) the range filter is cached at the segment level, so repeated searches with the same 30-day window hit the cache, and (3) the query correctly separates 'what must match' from 'how relevant is the match'.
C. should, so that recent documents score higher but older ones still appear
Incorrect. should makes a clause optional and adds score when matched. If you want to boost recent documents without excluding older ones, should works, but the question specifies narrowing results to the last 30 days, which requires exclusion. For a boost-without-exclusion use case, should is appropriate.
D. must_not, to exclude documents outside the date range
Incorrect. must_not excludes documents that match the condition. But the date range condition selects documents inside the range, so must_not would exclude documents inside the range and keep everything outside it. That's the opposite of the intended behavior.
Hint: Is matching a date range a 'how relevant' question or a 'does it qualify' question?