Docs: Rewrote the filtered query docs to be clearer
Closes #1688
clintongormley committed Jun 19, 2014
1 parent 8ccfca3 commit adf6e79
Showing 1 changed file with 134 additions and 29 deletions.
163 changes: 134 additions & 29 deletions docs/reference/query-dsl/queries/filtered-query.asciidoc
@@ -1,54 +1,159 @@
[[query-dsl-filtered-query]]
=== Filtered Query

A query that applies a filter to the results of another query. This
query maps to Lucene `FilteredQuery`.
The `filtered` query is used to combine another query with any
<<query-dsl-filters,filter>>. Filters are usually faster than queries because:

* they don't have to calculate the relevance `_score` for each document --
the answer is just a boolean ``Yes, the document matches the filter'' or
``No, the document does not match the filter''.
* the results from most filters can be cached in memory, making subsequent
executions faster.

TIP: Exclude as many documents as you can with a filter, then query just the
documents that remain.

[source,js]
--------------------------------------------------
{
"filtered" : {
"query" : {
"term" : { "tag" : "wow" }
},
"filter" : {
"range" : {
"age" : { "from" : 10, "to" : 20 }
}
}
"filtered": {
"query": {
"match": { "tweet": "full text search" }
},
"filter": {
"range": { "created": { "gte": "now-1d/d" }}
}
}
}
--------------------------------------------------

The `filtered` query can be used wherever a `query` is expected. For instance,
to use the above example in a search request:

[source,js]
--------------------------------------------------
curl -XGET localhost:9200/_search -d '
{
"query": {
"filtered": { <1>
"query": {
"match": { "tweet": "full text search" }
},
"filter": {
"range": { "created": { "gte": "now-1d/d" }}
}
}
}
}
'
--------------------------------------------------
<1> The `filtered` query is passed as the value of the `query`
parameter in the search request.

The filter object can hold only filter elements, not queries. Filters
can be much faster compared to queries since they don't perform any
scoring, especially when they are cached.
==== Filtering without a query

If a `query` is not specified, it defaults to the
<<query-dsl-match-all-query,`match_all` query>>. This means that the
`filtered` query can be used to wrap just a filter, so that it can be used
wherever a query is expected.

[source,js]
--------------------------------------------------
curl -XGET localhost:9200/_search -d '
{
"query": {
"filtered": { <1>
"filter": {
"range": { "created": { "gte": "now-1d/d" }}
}
}
}
}
'
--------------------------------------------------
<1> No `query` has been specified, so this request applies just the filter,
returning all documents created since yesterday.
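
As a rough illustration of the default described above, the following
JavaScript sketch fills in a missing `query` with `match_all`. The helper name
`withDefaultQuery` is hypothetical and not part of any Elasticsearch client; it
only models the documented default and assumes nothing about how the server
implements it.

```javascript
// Hypothetical helper: models the documented default, where a `filtered`
// query without a `query` clause behaves as if `match_all` had been given.
// An explicit `query` in the input takes precedence over the default.
function withDefaultQuery(filtered) {
  return Object.assign({ query: { match_all: {} } }, filtered);
}

// A filter-only `filtered` clause, as in the request above.
const filterOnly = {
  filter: { range: { created: { gte: "now-1d/d" } } }
};

// The explicit form of the same request body.
const explicitBody = { query: { filtered: withDefaultQuery(filterOnly) } };
console.log(JSON.stringify(explicitBody, null, 2));
```

Under this model, both request bodies should select the same documents; the
explicit form only spells out what the shorter one implies.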

==== Multiple filters

Multiple filters can be applied by wrapping them in a
<<query-dsl-bool-filter,`bool` filter>>, for example:

[source,js]
--------------------------------------------------
{
"filtered": {
"query": { "match": { "tweet": "full text search" }},
"filter": {
"bool": {
"must": { "range": { "created": { "gte": "now-1d/d" }}},
"should": [
{ "term": { "featured": true }},
{ "term": { "starred": true }}
],
"must_not": { "term": { "deleted": true }}
}
}
}
}
--------------------------------------------------

Similarly, multiple queries can be combined with a
<<query-dsl-bool-query,`bool` query>>.
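
For instance, a `bool` query can sit in the `query` position while a filter
narrows the candidate documents. The sketch below builds such a request body
in JavaScript; the `title` field and the choice of clauses are illustrative
assumptions, not taken from the examples above.

```javascript
// Request body combining two scoring queries (via a `bool` query) with a
// non-scoring range filter. Field names are illustrative.
const boolBody = {
  query: {
    filtered: {
      query: {
        bool: {
          // Must match the tweet text; a title match only boosts the score.
          must: { match: { tweet: "full text search" } },
          should: { match: { title: "full text search" } }
        }
      },
      filter: { range: { created: { gte: "now-1d/d" } } }
    }
  }
};

console.log(JSON.stringify(boolBody, null, 2));
```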

==== Filter strategy

The filtered query lets you configure how the filter is intersected with the query:
You can control how the filter and query are executed with the `strategy`
parameter:

[source,js]
--------------------------------------------------
{
"filtered" : {
"query" : {
// query definition
},
"filter" : {
// filter definition
},
"query" : { ... },
"filter" : { ... },
"strategy": "leap_frog"
}
}
--------------------------------------------------

IMPORTANT: This is an _expert-level_ setting. Most users can simply ignore it.

The `strategy` parameter accepts the following options:

[horizontal]
`leap_frog_query_first`:: Look for the first document matching the query, and then alternately advance the query and the filter to find common matches.
`leap_frog_filter_first`:: Look for the first document matching the filter, and then alternately advance the query and the filter to find common matches.
`leap_frog`:: Same as `leap_frog_query_first`.
`query_first`:: If the filter supports random access, then search for documents using the query, and then consult the filter to check whether there is a match. Otherwise fall back to `leap_frog_query_first`.
`random_access_${threshold}`:: If the filter supports random access and if there is at least one matching document among the first `threshold` ones, then apply the filter first. Otherwise fall back to `leap_frog_query_first`. `${threshold}` must be greater than or equal to `1`.
`random_access_always`:: Apply the filter first if it supports random access. Otherwise fall back to `leap_frog_query_first`.

The default strategy is to use `query_first` on filters that are not advanceable such as geo filters and script filters, and `random_access_100` on other filters.
`leap_frog_query_first`::

Look for the first document matching the query, and then alternately
advance the query and the filter to find common matches.

`leap_frog_filter_first`::

Look for the first document matching the filter, and then alternately
advance the query and the filter to find common matches.

`leap_frog`::

Same as `leap_frog_query_first`.

`query_first`::

If the filter supports random access, then search for documents using the
query, and then consult the filter to check whether there is a match.
Otherwise fall back to `leap_frog_query_first`.

`random_access_${threshold}`::

If the filter supports random access and if there is at least one matching
document among the first `threshold` ones, then apply the filter first.
Otherwise fall back to `leap_frog_query_first`. `${threshold}` must be
greater than or equal to `1`.

`random_access_always`::

Apply the filter first if it supports random access. Otherwise fall back
to `leap_frog_query_first`.

The default strategy is to use `query_first` on filters that are not
advanceable such as geo filters and script filters, and `random_access_100` on
other filters.
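
To make the difference between the strategies more concrete, here is a
simplified JavaScript model of the two basic approaches. It is a sketch only:
it assumes the query and the filter each yield sorted arrays of document IDs,
and that a random-access filter can be modelled as a `Set`. The function names
`leapFrog` and `queryFirst` are hypothetical, and this is not how Lucene
implements these strategies internally.

```javascript
// Leap-frog: both inputs are sorted arrays of document IDs. Whichever side
// is behind is advanced until it catches up with the other; IDs on which
// both sides land are common matches.
function leapFrog(queryDocs, filterDocs) {
  const matches = [];
  let q = 0;
  let f = 0;
  while (q < queryDocs.length && f < filterDocs.length) {
    if (queryDocs[q] === filterDocs[f]) {
      matches.push(queryDocs[q]);
      q++;
      f++;
    } else if (queryDocs[q] < filterDocs[f]) {
      // Advance the query to the first ID >= the filter's current ID.
      while (q < queryDocs.length && queryDocs[q] < filterDocs[f]) q++;
    } else {
      // Advance the filter to the first ID >= the query's current ID.
      while (f < filterDocs.length && filterDocs[f] < queryDocs[q]) f++;
    }
  }
  return matches;
}

// Query first with random access: iterate the query's matches and consult
// the filter (modelled here as a Set of IDs) for each one.
function queryFirst(queryDocs, filterSet) {
  return queryDocs.filter((doc) => filterSet.has(doc));
}

const queryDocs = [1, 4, 7, 9, 12];
const filterDocs = [2, 4, 9, 10, 12];
console.log(leapFrog(queryDocs, filterDocs));
console.log(queryFirst(queryDocs, new Set(filterDocs)));
```

Both functions return the same set of matches; the strategies differ only in
which side drives the iteration, which is why the choice affects performance
rather than results.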
