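elasticsearch
1. Docker installation
  1.1. Elasticsearch installation
  1.2. IK analyzer installation
    1.2.1. Online installation
    1.2.2. Offline installation
    1.2.3. Testing the analyzer
2. Basic HTTP commands on Linux
  2.1. Basic operations
  2.2. Creating an index and adding documents
  2.3. Querying
3. Kibana command-line operations
  3.1. Creating an index
  3.2. Chinese word segmentation
    3.2.1. ik_max_word
    3.2.2. ik_smart
    3.2.3. Best practice
  3.3. Inserting data manually
  3.4. Querying
    3.4.1. Field types
    3.4.2. filter and query
  3.5. Adding a field to an index
  3.6. Changing a field type to multi_field
  3.7. Other notes
  3.8. Reindexing and changing the mapping
    3.8.1. Step 1: Create the new index
    3.8.2. Step 2: Copy the data
    3.8.3. Step 3: Switch the alias
    3.8.4. Step 4: Delete the old index
4. shard & replica
  4.1. primary shard
  4.2. replica shard
5. Inverted index structure
6. Spring integration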

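elasticsearch

Elasticsearch is built on Lucene. It hides Lucene's complexity behind simple, easy-to-use RESTful and Java APIs (APIs for other languages are also available).

A distributed document store
A distributed search and analytics engine
Distributed by design, supporting PB-scale data

Getting-started guide: http://www.ruanyifeng.com/blog/2017/08/elasticsearch.html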

https://www.cnblogs.com/yufeng218/p/12128538.html

Kibana command-line operations

Creating an index

PUT sw_test.trade_contract_v1
{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 1
    },
    "analysis": {
      "analyzer": {
        "default": {
          "type": "ik_max_word"
        },
        "default_search": {
          "type": "ik_smart"
        }
      }
    }
  },
  "aliases": {
    "sw_test.trade_contract": {}
  },
  "mappings": {
    "properties": {
      "created_time": {
        "type": "date"
      },
      "modified_time": {
        "type": "date"
      },
      "status": {
        "type": "keyword"
      },
      "contract_id": {
        "type": "keyword"
      },
      "invoice_no": {
        "type": "keyword"
      },
      "export_country": {
        "type": "keyword"
      },
      "exp_currency": {
        "type": "keyword"
      },
      "amount": {
        "type": "double"
      },
      "deleted": {
        "type": "boolean"
      },
      "extra": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "product_name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "product_quantity": {
        "type": "long"
      },
      "product_quantity_unit": {
        "type": "keyword"
      },
      "note": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

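Chinese word segmentation

See analysis-ik, which has two modes: ik_max_word and ik_smart.

ik_max_word

Splits text at the finest granularity. For example, “中华人民共和国人民大会堂” is split into 中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 共和国, 大会堂, 大会, 会堂, and other terms.

ik_smart

Splits text at the coarsest granularity. For example, “中华人民共和国人民大会堂” is split into 中华人民共和国 and 人民大会堂.

Best practice

Use ik_max_word at index time and ik_smart at search time. That is, tokenize document content as exhaustively as possible when indexing, and tokenize the query coarsely so that searches return more precise results.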

{
  "mappings": {
    "properties": {
      "firm_name": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}
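The difference between the two granularities can be sketched with a toy dictionary segmenter. This is only an illustration under stated assumptions: the `DICTIONARY` set and both functions are hypothetical, and IK's real dictionary and disambiguation logic are considerably more involved.

```python
# Toy dictionary (hypothetical); IK ships a much larger one.
DICTIONARY = {
    "中华人民共和国", "中华人民", "中华", "华人", "人民共和国",
    "人民", "共和国", "人民大会堂", "大会堂", "大会", "会堂",
}


def fine_grained(text, dictionary):
    """Emit every dictionary word found at any offset (ik_max_word-like)."""
    tokens = []
    for start in range(len(text)):
        for end in range(len(text), start, -1):
            if text[start:end] in dictionary:
                tokens.append(text[start:end])
    return tokens


def coarse_grained(text, dictionary):
    """Greedy longest match from left to right (ik_smart-like)."""
    tokens = []
    pos = 0
    while pos < len(text):
        for end in range(len(text), pos, -1):
            if text[pos:end] in dictionary:
                tokens.append(text[pos:end])
                pos = end
                break
        else:
            # No dictionary word starts here: emit the single character.
            tokens.append(text[pos])
            pos += 1
    return tokens


text = "中华人民共和国人民大会堂"
print(fine_grained(text, DICTIONARY))
print(coarse_grained(text, DICTIONARY))
```

The fine-grained output contains many overlapping terms, which is why it suits indexing, while the coarse output keeps only the longest words, which suits queries.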

Inserting data manually

POST sw_test.trade_contract/_doc
{
    "amount":3344.00,
    "contract_id":"01990100018000018070300001149",
    "created_time":"2021-01-01",
    "custom_id":"123123",
    "deleted": false,
    "exp_currency":"USD",
    "export_country":"澳大利亚",
    "export_country_code":"CN",
    "extra":"小小",
    "invoice_no":"qa_order_20210526194123288",
    "modified_time":"2021-01-01",
    "note":"1212",
    "product_name":"apple",
    "product_quantity":1200,
    "product_quantity_unit":"kg",
    "status":"Closed",
    "test_add": "时代峰峻卡上的福建省地方"
}

Querying

POST sw_test.trade_contract/_search
{
   "query": { 
    "bool": { 
      "must": [
        { "match": { "exp_currency":   "USD" }},
        { "match": { "product_name": "apple" }}
      ],
      "filter": [ 
        { "term":  { "export_country": "澳大利亚" }},
        { "range": { "created_time": { "gte": "2015-01-01",
                "lte" : "2022-01-01" }}}
      ],
      "should": [
        { "match": { "test_add": "卡上的" }}
      ]
    }
  }
}

filter and query

Bool Query

Query and Filter context

Difference between filter and must_not

Basically, filter = must but without scoring.

filter:

It runs in filter context.
It does not affect the score of the result.
Documents that match appear in the result.
Exact-match based, not partial match.

must_not:

It also runs in filter context,
so it does not affect the score of the result.
Documents that match this condition will NOT appear in the result.
Exact-match based.

bool clause   Analogous to   Context          Affects score   Appears in results   Exact match
must          AND            query context    Y               Y                    Y
filter        AND            filter context   N               Y                    Y
should        OR             query context    Y               Y                    N
must_not      AND NOT        filter context   N               N                    Y
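How the four clauses interact can be sketched with a small in-memory evaluator. This is a toy model, not Elasticsearch's implementation: clauses are plain `(field, value)` equality checks, the score is simply the number of matching scoring clauses rather than a relevance score, `should` is treated as optional, and the function names and sample documents are hypothetical.

```python
def clause_matches(doc, clause):
    """A clause is a (field, value) pair; it matches on exact equality."""
    field, value = clause
    return doc.get(field) == value


def bool_search(docs, must=(), filter_=(), should=(), must_not=()):
    """Return (score, doc) pairs, highest score first."""
    hits = []
    for doc in docs:
        if not all(clause_matches(doc, c) for c in must):
            continue  # must: required, contributes to the score
        if not all(clause_matches(doc, c) for c in filter_):
            continue  # filter: required, contributes nothing to the score
        if any(clause_matches(doc, c) for c in must_not):
            continue  # must_not: excludes the document, no scoring
        # should: optional here, but each match raises the score
        score = len(must) + sum(1 for c in should if clause_matches(doc, c))
        hits.append((score, doc))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


DOCS = [
    {"id": 1, "exp_currency": "USD", "export_country": "澳大利亚", "status": "Closed"},
    {"id": 2, "exp_currency": "USD", "export_country": "澳大利亚", "status": "Open"},
    {"id": 3, "exp_currency": "EUR", "export_country": "澳大利亚", "status": "Closed"},
]

results = bool_search(
    DOCS,
    must=[("exp_currency", "USD")],
    filter_=[("export_country", "澳大利亚")],
    should=[("status", "Closed")],
)
for score, doc in results:
    print(score, doc["id"])
```

Document 3 fails the `must` clause and is dropped; documents 1 and 2 both pass the filter, but only document 1 gains the extra point from `should`.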

Adding a field to an index

POST sw_test.trade_contract_v1/_mapping
{
  "properties": {
     "test_add":{
        "type":"text"
     }
  }
}

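Changing a field type to multi_field

When creating a mapping, you can set ignore_above on a keyword field to limit the string length.
Strings longer than ignore_above are still kept in _source, but they are not indexed by the keyword field.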

PUT /sw_test.trade_contract_v1/_mapping/
{
  "properties": {
     "test_add":{
        "type":"text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
     }
  }
}
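The effect of ignore_above on the keyword sub-field can be sketched as a simple length check. This is a toy model with a hypothetical function name; it only mimics the rule that over-long strings are skipped by the keyword sub-field, while the text field and _source are unaffected.

```python
def keyword_terms(value, ignore_above=256):
    """Return the terms a keyword sub-field would index for one string value."""
    if len(value) > ignore_above:
        return []       # too long: the keyword sub-field skips this value
    return [value]      # a keyword field indexes the whole string as one term


print(keyword_terms("apple"))
print(keyword_terms("x" * 300))
```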

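Other notes

object fields are mapped automatically; there is no need to add them by hand.
int, long, date, and similar types are mapped automatically, so adding them by hand is optional.
string values are automatically mapped as a multi_field that uses the default analyzer; defining these fields in the ES mapping by hand is recommended.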

Reindexing and changing the mapping

Index Aliases; How to change the Elasticsearch mapping structure with zero business downtime

Step 1: Create the new index

PUT sw_test.trade_contract_v2

Step 2: Copy the data

POST _reindex
{
    "source": {
        "index": "sw_test.trade_contract_v1"
    },
    "dest": {
        "index": "sw_test.trade_contract_v2"
    }
}

Step 3: Switch the alias

POST /_aliases
{
    "actions": [
        { "remove": { "index": "sw_test.trade_contract_v1", "alias": "sw_test.trade_contract" }},
        { "add":    { "index": "sw_test.trade_contract_v2", "alias": "sw_test.trade_contract" }}
    ]
}
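Because the remove and add actions travel in a single _aliases request, the alias moves from the old index to the new one atomically. A minimal sketch of building that request body in code follows; the helper name `alias_swap_actions` is hypothetical, and sending the request is left to whichever HTTP or Elasticsearch client is in use.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias` to `new_index`."""
    # Names are stripped so stray whitespace cannot slip into the request.
    alias, old_index, new_index = (s.strip() for s in (alias, old_index, new_index))
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = alias_swap_actions(
    "sw_test.trade_contract",
    "sw_test.trade_contract_v1",
    "sw_test.trade_contract_v2",
)
print(json.dumps(body, indent=2))
```

Clients that read and write through the alias never see a moment where it points at no index, which is what makes the reindex invisible to the application.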

Step 4: Delete the old index

DELETE sw_test.trade_contract_v1

shard & replica

PUT sw_test.trade_contract_v1
{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 1
    },
    ....
}

References:

shards-and-replicas-in-elasticsearch
es-glossary

When you create an index (an index is also created automatically when you index the first document) you can define how many shards it will be composed of.
If you don't specify a number, it will have the default number of primary shards: 5 in Elasticsearch 6.x and earlier (the default is 1 since 7.0). Taking 5 as the example, what does it mean?

It means that elasticsearch will create 5 primary shards that will contain your data:

 ____    ____    ____    ____    ____
| 1  |  | 2  |  | 3  |  | 4  |  | 5  |
|____|  |____|  |____|  |____|  |____|

Every time you index a document, elasticsearch will decide which primary shard is supposed to hold that document and will index it there.
Primary shards are not a copy of the data, they are the data! Having multiple shards does help take advantage of parallel processing on a single machine,
but the whole point is that if we start another elasticsearch instance on the same cluster, the shards will be distributed in an even way over the cluster.

Node 1 will then hold for example only three shards:

 ____    ____    ____ 
| 1  |  | 2  |  | 3  |
|____|  |____|  |____|

Since the remaining two shards have been moved to the newly started node:

 ____    ____
| 4  |  | 5  |
|____|  |____|

Why does this happen? Because elasticsearch is a distributed search engine and this way you can make use of multiple nodes/machines to manage big amounts of data.

Every elasticsearch index is composed of at least one primary shard since that's where the data is stored. Every shard comes at a cost, though, therefore if you have a single node and no foreseeable growth, just stick with a single primary shard.
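How a document ends up on one particular primary shard can be sketched as a hash of its routing value (the document ID by default) taken modulo the number of primary shards. This is a simplified sketch: `zlib.crc32` is only a stand-in for Elasticsearch's own hash function, and `pick_shard` is a hypothetical name.

```python
import zlib


def pick_shard(routing, num_primary_shards):
    """Map a routing value (by default the document ID) to a shard number."""
    # crc32 is a stand-in hash, chosen because it is stable across runs.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


for doc_id in ["contract-1", "contract-2", "contract-3"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 5))
```

Since the shard number depends on the shard count, the same ID can map to a different shard if the count changes, which is one reason the number of primary shards is chosen when the index is created.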

replica shard

Another type of shard is a replica. The default is 1, meaning that every primary shard will be copied to another shard that will contain the same data.
Replicas are used to increase search performance and for fail-over.
A replica shard is never going to be allocated on the same node where the related primary is
(it would pretty much be like putting a backup on the same disk as the original data).

Back to our example, with 1 replica we'll have the whole index on each node, since 2 replica shards will be allocated on the first node, and they will contain exactly the same data as the primary shards on the second node:

Node1

 ____    ____    ____    ____    ____
| 1  |  | 2  |  | 3  |  | 4R |  | 5R |
|____|  |____|  |____|  |____|  |____|

Same for the second node, which will contain a copy of the primary shards on the first node:

Node2

 ____    ____    ____    ____    ____
| 1R |  | 2R |  | 3R |  | 4  |  | 5  |
|____|  |____|  |____|  |____|  |____|

With a setup like this, if a node goes down, you still have the whole index. The replica shards will automatically become primaries, and the cluster will work properly despite the node failure, as follows:

 ____    ____    ____    ____    ____
| 1  |  | 2  |  | 3  |  | 4  |  | 5  |
|____|  |____|  |____|  |____|  |____|
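The failover described above can be sketched as a small simulation. The `cluster` dictionary mirrors the two-node layout in the diagrams, and `fail_node` is a hypothetical helper; real shard allocation and promotion are decided by the cluster's master node and involve far more state than this.

```python
# Shard number -> role, per node, matching the two-node example.
cluster = {
    "node1": {1: "primary", 2: "primary", 3: "primary", 4: "replica", 5: "replica"},
    "node2": {1: "replica", 2: "replica", 3: "replica", 4: "primary", 5: "primary"},
}


def fail_node(cluster, failed):
    """Return the cluster after `failed` goes down, promoting orphaned replicas."""
    survivors = {
        name: dict(shards) for name, shards in cluster.items() if name != failed
    }
    lost_primaries = [s for s, role in cluster[failed].items() if role == "primary"]
    for shard in lost_primaries:
        for shards in survivors.values():
            if shards.get(shard) == "replica":
                shards[shard] = "primary"  # promote the surviving copy
                break
    return survivors


print(fail_node(cluster, "node2"))
```

After node2 fails, node1 holds all five shards as primaries, so the whole index stays available.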

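Inverted index structure

Elasticsearch builds a separate inverted index for every field. For example, “张三”, “北京市”, and 22 are all terms, and [1, 3] is a posting list. A posting list is an array that stores the IDs of all documents containing a given term.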
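The per-field structure can be sketched in a few lines. The sample documents below are hypothetical and whole field values are used as terms; a real text field would first be run through an analyzer, and Lucene stores posting lists in a compressed form rather than plain lists.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Build field -> term -> posting list (sorted document IDs)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id in sorted(docs):
        for field, value in docs[doc_id].items():
            index[field][value].append(doc_id)
    return index


# Hypothetical documents, keyed by document ID.
DOCS = {
    1: {"name": "张三", "city": "北京市", "age": 22},
    2: {"name": "李四", "city": "上海市", "age": 30},
    3: {"name": "王五", "city": "北京市", "age": 22},
}

index = build_inverted_index(DOCS)
print(index["city"]["北京市"])
print(index["age"][22])
```

Looking up a term in one field's index returns its posting list directly, without scanning the documents.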

Spring integration

Spring Data and Elasticsearch version compatibility

Spring Data official documentation
