Elasticsearch Basic Syntax

Background: I have been working with Elasticsearch (ES) quite a lot recently, so I am recording some notes here.

1. List all indices

GET /_cat/indices?v

2. List the nodes in the ES cluster

GET /_cat/nodes?v

3. Check whether the cluster started successfully

curl 'http://10.2.7.24:9200/?pretty'

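4. Fetch 10 documents from an index (the default page size, in no particular order); the response also reports the index's total document count in hits.total

GET /hy_enterprise/_search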

5. Use a match query to find documents whose field matches a given value

GET /hy_enterprise/enterprise/_search
{
    "query" : {
        "match" : {
            "name_ik" : "北京市朝阳区安贞街道陈静信息咨询部"
        }
    }
}

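6. Multi-condition query: find enterprises in Shanghai whose registered capital is between 0 and 100 inclusive (in units of 10,000)

-- The range condition sits in the filter clause, so it runs in non-scoring mode; _score is determined only by the match clause in must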

GET /hy_enterprise/enterprise/_search
{
    "query" : {
        "bool": {
            "must": {
                "match" : {
                    "province_code" : "310000" 
                }
            },
            "filter": {
                "range" : {
                    "reg_capi_num" : {
                        "gte" : 0,
                        "lte" : 100
                    }
                }
            }
        }
    }
}

7. Full-text search: find documents whose addresses field contains the term '杨浦'

GET /hy_enterprise/enterprise/_search
{
  "query":{
    "match": {
      "addresses": "杨浦"
    }
  }
}

8. Phrase matching: find documents whose addresses field contains the exact phrase '盘县保基乡大坝地村'

GET /hy_enterprise/enterprise/_search
{
  "query":{
    "match_phrase": {
      "addresses": "盘县保基乡大坝地村"
    }
  }
}

9. Highlighting: show the matching text fragments in the search results so that users can see why a document matched the query

GET /hy_enterprise/enterprise/_search
{
    "query" : {
        "match_phrase" : {
            "addresses" : "盘县保基乡大坝地村"
        }
    },
    "highlight": {
        "fields" : {
            "addresses" : {}
        }
    }
}

10. Aggregations

Group by the status field and count the documents in each bucket

GET /hy_enterprise/enterprise/_search
{
  "aggs": {
    "all_status": {
      "terms": { "field": "status" }
    }
  }
}

First find all documents with province_code = '310000', then group them by status

GET /hy_enterprise/enterprise/_search
{
  "query": {
    "match": {
      "province_code": "310000"
    }
  }, 
  "aggs": {
    "all_status": {
      "terms": { "field": "status" }
    }
  }
}

Group by the status field, then compute the average of reg_capi_num within each bucket

GET /hy_enterprise/enterprise/_search
{
  "aggs": {
    "all_status": {
      "terms": { "field": "status" },
      "aggs": {
        "avg_reg_capi_num": {
          "avg": { "field": "reg_capi_num" }
        }
      }
    }
  }
}
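
To make the result of a terms aggregation with a nested avg easier to picture, here is a minimal Python sketch that performs the same grouping on an in-memory list. The sample documents and the helper name terms_with_avg are made up for illustration; a real terms aggregation also returns only the top buckets (10 by default), which this sketch ignores.

```python
from collections import defaultdict

# Hypothetical sample documents: the field names follow the queries above,
# the values are invented for illustration.
docs = [
    {"status": "active", "reg_capi_num": 100.0},
    {"status": "active", "reg_capi_num": 50.0},
    {"status": "closed", "reg_capi_num": 30.0},
]


def terms_with_avg(docs, group_field, value_field):
    """Group docs by group_field, then report doc_count and the average
    of value_field for each bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[group_field]].append(doc[value_field])
    return {
        key: {"doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in buckets.items()
    }


result = terms_with_avg(docs, "status", "reg_capi_num")
print(result)
```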

11. Check cluster health

GET /_cluster/health

12. Create an index

-- The index is named blogs and has 3 primary shards with 1 replica each
PUT /blogs
{
   "settings" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 1
   }
}

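13. Dynamically adjust the number of replicas. Read operations (searching and retrieving documents) can be handled by either a primary shard or a replica shard, so the more replica shards you have, the higher the read throughput, provided there are enough nodes to host them.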

PUT /blogs/_settings
{
   "number_of_replicas" : 2
}

14. Every document in ES has a unique _id

-- Index a document with an explicit _id of 123
PUT /website/blog/123
{
  "title": "My first blog entry",
  "text":  "Just trying this out...",
  "date":  "2014/01/01"
}
Result:
{
   "_index":    "website",
   "_type":     "blog",
   "_id":       "123",
   "_version":  1,
   "created":   true
}

-- Omit the _id so that ES generates one automatically
POST /website/blog/
{
  "title": "My second blog entry",
  "text":  "Still trying this out...",
  "date":  "2014/01/01"
}
Result:
{
   "_index":    "website",
   "_type":     "blog",
   "_id":       "AVFgSgVHUP18jI2wRx0w",
   "_version":  1,
   "created":   true
}

15. Get a document by _id

-- Retrieve the complete document
GET /hy_enterprise/enterprise/9d4c4418-aaf0-11ea-bd08-00163e1254b5

-- Retrieve only selected fields of the document
GET /hy_enterprise/enterprise/9d4c4418-aaf0-11ea-bd08-00163e1254b5?_source=name_smart_ik,org_type
Result:
{
  "_index": "hy_enterprise_2",
  "_type": "enterprise",
  "_id": "9d4c4418-aaf0-11ea-bd08-00163e1254b5",
  "_version": 5,
  "found": true,
  "_source": {
    "name_smart_ik": "盘县保基乡财政所",
    "org_type": "机关及事业单位"
  }
}

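16. Check whether a document exists in an index

-- The response has no body; the status is 200 OK if the document exists and 404 Not Found if it does not
curl -I 'http://10.2.7.24:9200/hy_enterprise/enterprise/9d4c4418-aaf0-11ea-bd08-00163e1254b5'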

17. Delete a document

DELETE /website/blog/123

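18. Specify a version number when reading or writing a document. If the version matches, the request succeeds; if not, ES returns an error that reports the document's current version (useful in some scenarios, such as optimistic concurrency control)

GET /hy_enterprise/enterprise/9d4c4418-aaf0-11ea-bd08-00163e1254b5?version=1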

18. Add fields to an existing document

-- A partial update that adds the tags and views fields
POST /website/blog/1/_update
{
   "doc" : {
      "tags" : [ "testing" ],
      "views": 0
   }
}

19. Retrieve multiple documents in one request

-- Retrieve specific fields of documents that live in different indices
GET /_mget
{
   "docs" : [
      {
         "_index" : "hy_enterprise",
         "_type" :  "enterprise",
         "_id" :    "9d4c4418-aaf0-11ea-bd08-00163e1254b5",
         "_source":"name_smart_ik"
      },
      {
         "_index" : "hy_articles",
         "_type" :  "articles",
         "_id" :    "60c072eadfcebd277502f631",
         "_source": "source"
      }
   ]
}

-- Retrieve multiple documents from the same index
GET /hy_articles/articles/_mget
{
   "ids" : [ "60c072eadfcebd277502f631", "60be6765dfcebd2775016c9d" ]
}

20. Pagination

GET /hy_enterprise/_search?size=5&from=5
-- size: the number of results to return; defaults to 10
-- from: the number of initial results to skip; defaults to 0
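
The relationship between a page number and the from/size parameters can be sketched in a few lines of Python. The helper name page_params and the 1-based page convention are assumptions made for this illustration; they are not part of the ES API.

```python
def page_params(page, page_size=10):
    """Return the from/size search parameters for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    # Skip all results that belong to earlier pages.
    return {"from": (page - 1) * page_size, "size": page_size}


# Page 2 with 5 results per page corresponds to size=5&from=5 above.
print(page_params(2, 5))
```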

21. View an index's mapping

GET /ads_search_enterprise_inc/_mapping

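22. Compound queries

-- A bool clause that combines several conditions; this fragment belongs inside the "query" element of a search request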
{
    "bool": {
        "must":     { "match": { "tweet": "elasticsearch" }},
        "must_not": { "match": { "name":  "mary" }},
        "should":   { "match": { "tweet": "full text" }},
        "filter":   { "range": { "age" : { "gt" : 30 }} }
    }
}

23. range query

-- gt: greater than
-- gte: greater than or equal to
-- lt: less than
-- lte: less than or equal to
GET hy_enterprise/enterprise/_search
{
  "query": {
    "range": {
      "reg_capi_num": {
        "gte": 0,
        "lte": 100
      }
    }
  }
}

24. A multi_match query runs the same match query on multiple fields

GET hy_enterprise/enterprise/_search
{
  "query": {
    "multi_match": {
      "query": "310000",
      "fields": ["province_code","district_code"]
    }
  }
}

25. A term query is used for exact-value matching. The exact values can be numbers, dates, booleans, or not_analyzed strings

GET hy_enterprise/enterprise/_search
{
  "query": {
    "term": {
      "category_new": {
        "value": "0115601"
      }
    }
  }
}
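-- A term query does not analyze the input text, so it searches for exactly the value given

26. terms query

-- A terms query works like a term query but lets you specify multiple values to match. A document matches if the field contains any one of the specified values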
GET hy_enterprise/enterprise/_search
{
  "query": {
    "terms": {
      "FIELD": [
        "search",
        "full_text"
      ]
    }
  }
}

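27. Combining queries

must: the document must match these clauses to be included.
must_not: the document must not match these clauses to be included.
should: if a document matches any of these clauses, its _score increases; otherwise there is no effect. They are mainly used to refine each document's relevance score.
filter: the document must match, but the clauses run in non-scoring filter mode. They contribute nothing to the score and only include or exclude documents according to the filter criteria.

-- If there are no must clauses, at least one should clause has to match. If there is at least one must clause, however, no should clause is required to match.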
GET hy_enterprise/enterprise/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "province_code": "310000"
        }}
      ],
      "must_not": [
        {"match": {
          "l1_domains.code": "C0000"
        }}
      ],
      "should": [
        {"range": {
          "reg_capi_num": {
            "gte": 0,
            "lte": 100
          }
        }}
      ]
    }
  }
}

GET hy_enterprise/enterprise/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "province_code": "310000"
        }}
      ],
      "filter": {
        "range": {
          "reg_capi_num": {
            "gte": 0,
            "lte": 100
          }
        }
      }
    }
  }
}

28. Sorting

-- Results are sorted by the first criterion first; the second criterion is used only for results whose first sort value is identical, and so on.
GET hy_enterprise/enterprise/_search
{
  "query": {
    "bool": {
      "filter": {"term": {
        "province_code": "310000"
      }}
    }
  },
  "sort": [
    {
      "start_date": {
        "order": "desc"
      }
    },
    {
      "_score":{
        "order": "desc"
      }
    }
  ]
}

29. Delete multiple indices

DELETE /index_one,index_two
DELETE /index_*

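30. Dynamically change an index's replica count

-- number_of_shards: the number of primary shards per index. The default is 5 (in ES versions before 7.0). It cannot be changed after the index is created.
-- number_of_replicas: the number of replica shards per primary shard. The default is 1. It can be changed at any time on a live index.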
PUT /my_temp_index/_settings
{
    "number_of_replicas": 1
}

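31. Custom analyzer

- Strip HTML with the html_strip character filter.
- Replace & with " and " using a custom mapping character filter.
- Tokenize with the standard tokenizer.
- Lowercase the terms with the lowercase token filter.
- Remove the words on a custom stopword list with a custom stop token filter.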
PUT /my_index
{
    "settings": {
        "analysis": {
            "char_filter": {
                "&_to_and": {
                    "type":       "mapping",
                    "mappings": [ "&=> and "]
            }},
            "filter": {
                "my_stopwords": {
                    "type":       "stop",
                    "stopwords": [ "the", "a" ]
            }},
            "analyzer": {
                "my_analyzer": {
                    "type":         "custom",
                    "char_filter":  [ "html_strip", "&_to_and" ],
                    "tokenizer":    "standard",
                    "filter":       [ "lowercase", "my_stopwords" ]
            }}
}}}
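
To see what such an analyzer does to a piece of text, the five steps can be approximated in plain Python. This is only a rough sketch: the regular expressions stand in for the real html_strip character filter and standard tokenizer, which handle many more cases (HTML entities, Unicode word boundaries), and the function name my_analyzer simply mirrors the analyzer name used above.

```python
import re

# Same stopword list as the my_stopwords filter above.
STOPWORDS = {"the", "a"}


def my_analyzer(text):
    """Approximate the custom analyzer: char filters, tokenizer, token filters."""
    # 1. html_strip character filter (approximated by dropping anything tag-like)
    text = re.sub(r"<[^>]+>", "", text)
    # 2. &_to_and mapping character filter
    text = text.replace("&", " and ")
    # 3. standard tokenizer (approximated by extracting runs of word characters)
    tokens = re.findall(r"\w+", text)
    # 4. lowercase token filter
    tokens = [token.lower() for token in tokens]
    # 5. my_stopwords token filter
    return [token for token in tokens if token not in STOPWORDS]


print(my_analyzer("The quick & brown fox"))
```

On a running cluster, the actual output of the analyzer can be inspected with the _analyze API of the index.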

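32. Index aliases

-- Point the alias my_index at my_index_v1, i.e. my_index becomes an alias of the index my_index_v1
PUT /my_index_v1/_alias/my_index

-- Check which indices an alias points to
GET /*/_alias/my_index

-- Switch indices with zero downtime: the remove and add actions in a single _aliases request are applied atomically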
POST /_aliases
{
    "actions": [
        { "remove": { "index": "my_index_v1", "alias": "my_index" }},
        { "add":    { "index": "my_index_v2", "alias": "my_index" }}
    ]
}

33. Exact-value lookup

-- A constant_score query runs the term query in non-scoring mode, so every hit in the result has a _score of 1
GET hy_enterprise/enterprise/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "province_code": "310000"
        }
      }
    }
  }
}

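-- For a term query to match a long exact string (one that is not tokenized automatically), set index to not_analyzed on the field when creating the index (ES 2.x and earlier syntax; from 5.0 onward the keyword field type serves this purpose)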
PUT /my_store 
{
    "mappings" : {
        "products" : {
            "properties" : {
                "productID" : {
                    "type" : "string",
                    "index" : "not_analyzed" 
                }
            }
        }
    }
}
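-- The values of the productID field are not analyzed (tokenized)

34. Combining filters

- Bool filter
- must: all of these clauses must match; equivalent to AND.
- must_not: none of these clauses may match; equivalent to NOT.
- should: at least one of these clauses must match; equivalent to OR.
-- The filtered query used in the examples of this section is ES 1.x syntax; it was deprecated in 2.0 and removed in 5.0, where a bool query with a filter clause is used instead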

GET /my_store/products/_search
{
   "query" : {
      "filtered" : { 
         "filter" : {
            "bool" : {
              "should" : [
                 { "term" : {"price" : 20}}, 
                 { "term" : {"productID" : "XHDK-A-1293-#fJ3"}} 
              ],
              "must_not" : {
                 "term" : {"price" : 30} 
              }
           }
         }
      }
   }
}

GET /my_store/products/_search
{
   "query" : {
      "filtered" : {
         "filter" : {
            "bool" : {
              "should" : [
                { "term" : {"productID" : "KDKE-B-9947-#kL5"}}, 
                { "bool" : { 
                  "must" : [
                    { "term" : {"productID" : "JODL-X-1937-#pV7"}}, 
                    { "term" : {"price" : 30}} 
                  ]
                }}
              ]
           }
         }
      }
   }
}

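35. Improving precision with match

-- By default the operator of a match query is or. It can be set to and, in which case a document matches only if it contains all of the query terms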
GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "title": {      
                "query":    "BROWN DOG!",
                "operator": "and"
            }
        }
    }
}
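-- title must contain both brown and dog for the document to match

-- Controlling precision: the match query supports the minimum_should_match parameter, which specifies how many terms must match for a document to be considered relevant. It can be set to an absolute number, but more commonly it is set to a percentage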
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "title": {
        "query":                "quick brown dog",
        "minimum_should_match": "75%"
      }
    }
  }
}
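
How a percentage turns into a required number of terms can be sketched as follows. The helper name required_terms is made up for this illustration, and the sketch covers only non-negative percentage values, which are rounded down; for the three-term query above, 75% therefore means two of the three terms.

```python
def required_terms(num_terms, minimum_should_match):
    """Number of query terms that must match for a value such as '75%'.

    Illustrative only: handles non-negative percentages, rounded down.
    """
    percent = int(minimum_should_match.rstrip("%"))
    required = num_terms * percent // 100  # integer division rounds down
    # A hit still has to match at least one term.
    return max(1, required)


# "quick brown dog" has three terms: 75% of 3 is 2.25, rounded down to 2.
print(required_terms(3, "75%"))
```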

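36. Controlling how many should clauses must match in a bool query

-- Just as we can control the precision of a match query, the minimum_should_match parameter controls how many should clauses need to match. It can be either an absolute number or a percentage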
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "brown" }},
        { "match": { "title": "fox"   }},
        { "match": { "title": "dog"   }}
      ],
      "minimum_should_match": 2 
    }
  }
}

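37. Relevance (scoring rules)

Every document has a relevance score, represented by a positive floating-point field called _score. The higher the _score, the more relevant the document.

A query clause generates a _score for each document. How the score is calculated depends on the type of query clause, because different clauses serve different purposes: a fuzzy query scores how similar the spelling of the found word is to the search term, and a terms query takes into account the percentage of terms that were found. What we usually mean by relevance, though, is the algorithm that calculates how similar the contents of a full-text field are to a full-text query string.

The similarity algorithm described here is term frequency/inverse document frequency, or TF/IDF (the default in ES versions before 5.0; later versions default to BM25). It takes the following factors into account:

Term frequency: how often does the term appear in the field? The more often, the more relevant. A field that mentions the term 5 times is more relevant than one that mentions it only once.
Inverse document frequency: how often does each term appear in the index? The more often, the less relevant. Terms that appear in many documents carry less weight than terms that appear in few.
Field-length norm: how long is the field? The longer it is, the less relevant. A term appearing in a short title field carries more weight than the same term appearing in a long content field.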
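
The three factors can be written down as small Python functions. The formulas below follow the classic Lucene TF/IDF similarity as it is commonly documented for older ES versions; treat them as an illustrative sketch rather than the exact scoring code of any particular release. The function names and the simplified single-term product in field_weight are assumptions made for this example.

```python
import math


def tf(freq):
    """Term frequency: grows with the number of occurrences in the field."""
    return math.sqrt(freq)


def idf(num_docs, doc_freq):
    """Inverse document frequency: shrinks as more documents contain the term."""
    return 1 + math.log(num_docs / (doc_freq + 1))


def field_norm(num_terms):
    """Field-length norm: shrinks as the field gets longer."""
    return 1 / math.sqrt(num_terms)


def field_weight(freq, num_docs, doc_freq, num_terms):
    """Simplified weight of a single term in a single field."""
    return tf(freq) * idf(num_docs, doc_freq) * field_norm(num_terms)


# A rare term in a short field outweighs a common term in a long field.
print(field_weight(freq=1, num_docs=1000, doc_freq=10, num_terms=4))
print(field_weight(freq=1, num_docs=1000, doc_freq=500, num_terms=100))
```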

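38. Boosting query clauses

Suppose we want to give more weight to documents that mention "Elasticsearch" or "Lucene". More weight here means that documents containing "Elasticsearch" or "Lucene" receive a higher relevance _score than documents that do not contain these words, so they appear higher in the result set. The relative weight of any query clause can be controlled by specifying a boost. The default boost is 1, and a value greater than 1 increases the relative weight of that clause.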

GET /_search
{
    "query": {
        "bool": {
            "must": {
                "match": {  
                    "content": {
                        "query":    "full text search",
                        "operator": "and"
                    }
                }
            },
            "should": [
                { "match": {
                    "content": {
                        "query": "Elasticsearch",
                        "boost": 3 
                    }
                }},
                { "match": {
                    "content": {
                        "query": "Lucene",
                        "boost": 2 
                    }
                }}
            ]
        }
    }
}

Last edited: 2022.03.11 10:51:50
