dynamic template multi-field w/ _all combo analyzer issue? #13

Open
asanderson opened this issue Feb 25, 2014 · 0 comments

FWIW, I posted this on the Elasticsearch forum but have had no response so far, so I am posting it here too, since it is definitely related to the combo analyzer.

I'm seeing duplicate concatenated values when using the combo analyzer for _all using a multi-field defined in a dynamic template.

E.g., instead of seeing "Foo Bar" when listing the _all terms aggregation, I'm seeing "Foo Bar Foo Bar" as the token, because my multi-field defines 2 sub-fields. If the multi-field is defined with 4 sub-fields, then "Foo Bar" is concatenated 4 times.
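
To make the symptom concrete, here is a minimal Python sketch of a check for it. The helper `repeated_unit` is hypothetical (it is not part of Elasticsearch or the combo analyzer), and it assumes the copies are joined by a single space, as in the "Foo Bar Foo Bar" example above.

```python
def repeated_unit(term, sep=" "):
    """Return (unit, count) if `term` is `unit` repeated `count` >= 2 times,
    joined by `sep`; otherwise return (term, 1).

    Hypothetical helper for spotting the duplicated _all terms described
    in this issue. The single-space separator is an assumption.
    """
    parts = term.split(sep)
    n = len(parts)
    # Try the shortest candidate unit first, so the smallest repeat wins.
    for size in range(1, n // 2 + 1):
        if n % size != 0:
            continue
        unit = parts[:size]
        if unit * (n // size) == parts:
            return sep.join(unit), n // size
    return term, 1


# Example: the term reported with 2 sub-fields in the multi-field.
print(repeated_unit("Foo Bar Foo Bar"))
```

Note that a value which legitimately repeats itself (such as "Foo Foo") would also be flagged, so this is only useful as a quick diagnostic against known input values.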

My setup is below.

Elasticsearch 1.0.0 on CentOS 6.4 with Java 1.7.0_51.

$ES_HOME/config/default-mapping.json:
{
  "_default_": {
    "_all": {
      "enabled": true,
      "analyzer": "combo",
      "store": false
    },
    "dynamic_templates": {
      "string_multifield_template": {
        "match": "*",
        "match_mapping_type": "string",
        "mapping": {
          "include_in_all": false,
          "fields": {
            "{name}": {
              "index": "not_analyzed",
              "store": true,
              "type": "string"
            },
            "lowercase": {
              "analyzer": "lowercase",
              "index": "analyzed",
              "store": false,
              "type": "string"
            }
          }
        }
      }
    }
  }
}

$ES_HOME/config/elasticsearch.yml:
...
index.analysis.analyzer.lowercase.type: custom
index.analysis.analyzer.lowercase.tokenizer: keyword
index.analysis.analyzer.lowercase.filter: [ lowercase ]

index.analysis.analyzer.combo.type: custom
index.analysis.analyzer.combo.sub_analyzers: [ keyword, lowercase ]
index.analysis.analyzer.combo.deduplication: true
index.analysis.analyzer.combo.tokenstream_reuse: false
...

The aggregation query I use is the following:
{
  "aggs": {
    "_all": {
      "terms": {
        "field": "_all"
      }
    }
  }
}
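
For anyone trying to reproduce this, the query above can be sent with a short Python sketch like the one below. It uses only the standard library. The node address `http://localhost:9200` and the index name `myindex` are assumptions for illustration, not values from my setup, and the function names are hypothetical.

```python
import json
import urllib.request

# Same terms aggregation on _all as in the query above.
AGG_QUERY = {"aggs": {"_all": {"terms": {"field": "_all"}}}}


def build_request(base_url, index):
    """Build a POST request for the _search endpoint of `index`.

    `base_url` and `index` are supplied by the caller; nothing is sent here.
    """
    url = "%s/%s/_search" % (base_url.rstrip("/"), index)
    body = json.dumps(AGG_QUERY).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_all_terms(base_url, index):
    """Send the request and return the (term, doc_count) pairs for _all."""
    with urllib.request.urlopen(build_request(base_url, index)) as resp:
        result = json.loads(resp.read().decode("utf-8"))
    buckets = result["aggregations"]["_all"]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


# Usage (requires a running node; address and index are assumed):
# for term, count in fetch_all_terms("http://localhost:9200", "myindex"):
#     print(term, count)
```

With the mapping above, the duplicated terms show up as bucket keys in the returned list.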
