Description
FWIW, I posted this on the Elasticsearch forum but got no response thus far, so I am posting it here too, since it definitely is analysis combo related.
I'm seeing duplicate concatenated values when using the combo analyzer for _all using a multi-field defined in a dynamic template.
e.g. Instead of seeing "Foo Bar" when listing the _all terms aggregation, I'm seeing "Foo Bar Foo Bar" for the token because my mulit-field defines 2 sub-fields. If the multi-field is defined with 4 sub-fields, then "Foo Bar" is concatenated 4 times.
My set up is below.
Elasticsearch 1.0.0 on CentOs 6.4 with Java 1.7.0_51.
$ES_HOME/config/default-mapping.json:
{
"default": {
"_all": {
"enabled": true,
"analyzer": "combo",
"store": false
},
"dynamic_templates": {
"string_multifield_template": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"include_in_all": false,
"fields": {
"{name}": {
"index": "not_analyzed",
"store": true,
"type": "string"
},
"lowercase": {
"analyzer": "lowercase",
"index": "analyzed",
"store": false,
"type": "string"
}
}
}
}
}
}
}
$ES_HOME/config/elasticsearch.yml:
...
index.analysis.analyzer.lowercase.type: custom
index.analysis.analyzer.lowercase.tokenizer: keyword
index.analysis.analyzer.lowercase.filter [ lowercase ]
index.analysis.analyzer.combo.type: custom
index.analysis.analyzer.combo.sub_analyzers: [ keyword, lowercase ]
index.analysis.analyzer.combo.deduplication: true
index.analysis.analyzer.combo.tokenstream_reuse: false
...
The aggregation query I use is the following:
{
"aggs": {
"_all": {
"terms": {
"field": "_all"
}
}
}
}