Elasticsearch 7.6.1 Configuration

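Full-text search:
Scans every word of a document and builds an index entry for each word, recording where it occurs and how many times.

Much faster than a fuzzy (LIKE) query in a database.

Search results are ranked by relevance.

Keywords are case-insensitive.

Handles text only; it does not interpret meaning.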
Installation guide (blog post): https://www.cnblogs.com/tangyin/p/10830142.html
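
The full-text search idea described above (index every word with its positions and counts, ignore case) can be sketched as a tiny inverted index. This is only an illustration in plain Python, not how Elasticsearch is implemented; the function names and sample documents are made up for the example.

```python
import re
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased word to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Lowercasing at index time is what makes lookups case-insensitive.
        for position, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word].setdefault(doc_id, []).append(position)
    return index

def lookup(index, keyword):
    """Return {doc_id: occurrence_count} for one keyword."""
    postings = index.get(keyword.lower(), {})
    return {doc_id: len(positions) for doc_id, positions in postings.items()}

docs = {1: "Elasticsearch is a search engine", 2: "Search with a SEARCH engine"}
index = build_inverted_index(docs)
print(lookup(index, "Search"))
```

A lookup touches only the entry for one word, which is why it is cheaper than scanning every row the way a LIKE query does.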

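ES structure

Index: a collection of documents with similar characteristics (comparable to a database).

Type: a partition of an index (comparable to a table, but since ES 6 an index can hold only one type; in ES 7 types are deprecated and _doc is used).

Mapping: constrains the data structure (comparable to a database schema; it defines the fields of an index and their types).

Document: the basic unit of information that can be indexed, in JSON format (comparable to a row in a table).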

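Kibana: the visualization tool for ES
ES address: http://localhost:9200/
Kibana address: http://localhost:5601

Usually deployed together as the ELK stack:
Elasticsearch (search engine) + Logstash (data processing pipeline) + Kibana (visualization)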

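1 Basic Operations

1.1 Index operations

PUT /index_name            create an index
GET /_cat/indices          list all indices
GET /_cat/indices?v        list all indices with column headers (verbose output)
GET /index_name            view a single index
DELETE /index_name         delete the given index
DELETE /*                  delete all indices
HEAD /index_name           check whether the index exists
POST /index_name/_close    close an index (a closed index cannot be read or written and has very little runtime overhead, but its data still occupies disk space)
POST /index_name/_open     reopen a closed index
GET alias1                 view the index behind the alias alias1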

//Define the index mapping
PUT /test
{
  "mappings": {
    "properties": {
      "title":{"type": "text"},
      "age":{"type": "integer"}
    }
  }
}

//Index settings
PUT blog
{
  "settings" : {
    "number_of_shards" : 2,   //number of primary shards
    "number_of_replicas" : 2  //number of replica copies of each primary shard
  }
}
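
To make the two settings above concrete, the following sketch computes how many shard copies the cluster has to place in total. The helper name is hypothetical; the arithmetic relies only on the fact that every primary shard gets number_of_replicas copies.

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Primaries plus number_of_replicas copies of each primary."""
    return number_of_shards * (1 + number_of_replicas)

# Settings from the example: 2 primaries, 2 replicas each.
print(total_shard_copies(2, 2))
# After lowering number_of_replicas to 1.
print(total_shard_copies(2, 1))
```

Since a replica is never placed on the same node as its primary, a single-node test cluster leaves every replica unassigned, which is a common reason for a yellow cluster status.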

//Update the number of replicas
PUT blog/_settings
{
  "number_of_replicas": 1
}

//Add an alias
POST _aliases
{
  "actions": [
    {
      "add": {
        "indices": ["index1", "index2", "test"], //several indices; use "index": "index1" instead to alias a single index
        "alias": "alias1"
      }
    }
  ]
}

//Remove an alias
POST _aliases
{
  "actions": [
    {
      "remove": {
        "index": "test",
        "alias": "alias1"
      }
    }
  ]
}

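1.2 Document operations

GET blog/_doc/1     fetch the document with the given id
HEAD blog/_doc/1    check whether the document exists

Create a document with a specified id

_index: name of the index that holds the document
_doc: name of the type that holds the document (fixed in ES 7)
_id: document id (prefer auto-generated ids over choosing your own)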

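//Fails with a version conflict (409) if the document already exists
PUT test3/_create/1
{ "name": "aa" }

//Overwrites the document if it already exists
PUT test3/_doc/1
{ "name": "aa" }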

Create a document without specifying an id

//The id is generated automatically
POST index3/_doc
{
  "name":"aa"
}

Update a document
Full replacement: the old document is deleted and a new one is indexed in its place


POST /test/_doc/1
{
  "name":"a"
}

Partial update: only the listed fields are replaced

POST /test/_update/1
{
  "doc":{
    "name":"c"
  }
}
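
The difference between creating, fully replacing, and partially updating a document can be modelled with a plain dictionary. This is a toy sketch, not the ES client: the class and method names are invented, and the merge here is shallow, whereas a real partial update also merges nested objects.

```python
class TinyIndex:
    """Toy stand-in for one index: a dict keyed by document id."""

    def __init__(self):
        self.docs = {}

    def create(self, doc_id, source):
        # Like PUT <index>/_create/<id>: refuse to touch an existing document.
        if doc_id in self.docs:
            raise KeyError(f"document {doc_id} already exists")
        self.docs[doc_id] = dict(source)

    def index(self, doc_id, source):
        # Like PUT or POST <index>/_doc/<id>: replace the whole document.
        self.docs[doc_id] = dict(source)

    def update(self, doc_id, partial):
        # Like POST <index>/_update/<id> with "doc": change the given fields only.
        self.docs[doc_id].update(partial)

idx = TinyIndex()
idx.create("1", {"name": "aa", "age": 3})
idx.update("1", {"name": "c"})
after_update = dict(idx.docs["1"])   # name changed, age kept
idx.index("1", {"name": "a"})
after_index = dict(idx.docs["1"])    # age is gone: the whole document was replaced
print(after_update, after_index)
```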

1.3 Bulk operations

Each operation in a bulk request is independent: one failing does not affect the others.
Create, delete, update

POST /_bulk
{ "create": { "_index": "blog", "_id": "1" }}
{ "title": "bulk test 1", "author": "hu" }
{ "delete": { "_index": "blog", "_id": "1" }}
{ "create": { "_index": "blog", "_id": "3" }}
{ "title": "bulk test 3", "author": "hu" }
{ "update": { "_index": "blog", "_id": "3", "retry_on_conflict": 3 }}
{ "doc": { "title": "updated title" }}
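
Outside the Kibana console, the bulk body has to be sent as newline-delimited JSON: one action line, optionally followed by one source line, with a trailing newline at the end. A minimal sketch of building such a body, assuming a hypothetical helper name and sample data:

```python
import json

def build_bulk_body(operations):
    """Serialize (action, source) pairs as newline-delimited JSON.

    `source` is None for actions such as delete that carry no body line.
    """
    lines = []
    for action, source in operations:
        lines.append(json.dumps(action, ensure_ascii=False))
        if source is not None:
            lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ({"create": {"_index": "blog", "_id": "1"}}, {"title": "bulk test 1", "author": "hu"}),
    ({"delete": {"_index": "blog", "_id": "1"}}, None),
    ({"update": {"_index": "blog", "_id": "3"}}, {"doc": {"title": "updated title"}}),
])
print(body, end="")
```

Using json.dumps for every line avoids hand-written mistakes such as a trailing comma, which would make the whole request fail to parse.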

Bulk get

GET blog/_mget
{
    "ids" : ["1", "2","3"]
}

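1.4 Query operations

1.4.1 Simple URL-based queries

GET /blog/_search?q=*&sort=price:desc&size=2&from=0
(Matches all documents and returns the first 2, sorted by price in descending order. size defaults to 10 when omitted; once a sort is specified, relevance scores are no longer computed; from is the offset of the first hit to return.)

Anatomy of the response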

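{
  "took" : 51,          //the query took 51 ms
  "timed_out" : false,  //whether the query timed out
  "_shards" : {  //an index is split into shards that are searched separately, then the results are merged
    "total" : 1,  //1 shard in total
    "successful" : 1,  //1 succeeded
    "skipped" : 0,  //0 skipped
    "failed" : 0  //0 failed
  },
  "hits" : {  //the matches, ranked by relevance
    "total" : {
      "value" : 3,  //3 matches in total
      "relation" : "eq"  //the count is exact
    },
    "max_score" : 1.0,  //highest relevance score among the hits
    "hits" : []  //array holding the matching documents
  }
}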
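
As a rough illustration of reading these fields in client code, the sketch below parses a response and pulls out the parts most callers check first. The response text is a hypothetical sample shaped like the example above, and the summarize helper is invented for this note.

```python
import json

# Hypothetical response text, shaped like the annotated example.
raw = """
{
  "took": 51,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 3, "relation": "eq"},
    "max_score": 1.0,
    "hits": [{"_id": "1", "_score": 1.0, "_source": {"name": "aa"}}]
  }
}
"""

def summarize(response):
    """Extract the fields most callers look at first."""
    shards = response["_shards"]
    return {
        "took_ms": response["took"],
        "complete": not response["timed_out"] and shards["failed"] == 0,
        "total": response["hits"]["total"]["value"],
        "sources": [hit["_source"] for hit in response["hits"]["hits"]],
    }

summary = summarize(json.loads(raw))
print(summary)
```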

1.4.2 DSL (domain-specific language): request-body queries

1.4.2.1 Basic query

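GET /test/_search
{
  "query": {"match_all": {} },   //match all documents
  "sort": [     //sort by age, descending
    {
      "age": {
        "order": "desc"
      }
    }
  ],
  "size": 3,       //3 hits per page
  "from": 0,    //offset of the first hit (0 starts at the first result)
  "_source": ["age"]   //return only the age field
}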
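
Because from is an offset into the result list rather than a page number, paging code usually converts one into the other. A small sketch, with hypothetical helper names, that builds the same request body for an arbitrary page:

```python
def page_to_from(page, size):
    """Convert a 1-based page number to the 0-based `from` offset."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be at least 1")
    return (page - 1) * size

def search_body(page, size):
    """Request body for one page of results, sorted by age descending."""
    return {
        "query": {"match_all": {}},
        "sort": [{"age": {"order": "desc"}}],
        "from": page_to_from(page, size),
        "size": size,
    }

print(search_body(2, 3))
```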

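1.4.2.2 term query

Notes on term:

ES queries work on analyzed terms. Fields of type keyword, integer, date and the like are not analyzed: the whole value is stored as a single term. Fields of type text are analyzed, by default with the standard analyzer, which produces one term per English word and one term per Chinese character. So unless the analyzer is changed, a term query on a text field can only match a single Chinese character.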

GET /test/_search
{
  "query": {
    "term": {
      "context": {
        "value": "驱"
      }
    }
  }
}
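
Why a term query for a single character matches while a two-character word does not can be seen with a rough imitation of the default analysis. The tokenizer below is only an approximation written for this note (one token per CJK character, one per run of ASCII letters and digits, lowercased); it is not the real standard analyzer.

```python
import re

# Rough stand-in for the standard analyzer: one token per CJK character,
# one token per run of ASCII letters/digits, everything lowercased.
TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")

def standard_like_tokens(text):
    return [token.lower() for token in TOKEN.findall(text)]

def term_matches(field_text, term):
    """A term query compares the term, unanalyzed, against the stored tokens."""
    return term in standard_like_tokens(field_text)

text = "Quick 驱动 test"
print(standard_like_tokens(text))
print(term_matches(text, "驱"))     # single character: stored as a token
print(term_matches(text, "驱动"))   # two characters: no such token
print(term_matches(text, "Quick"))  # stored token is lowercase
```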

1.4.2.3 Range query (gte: greater than or equal, lte: less than or equal)

GET /test/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 14,
        "lte": 16
      }
    }
  }
}

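1.4.2.4 Prefix query

The prefix is matched against the beginning of a single term.

Note: English text is lowercased when it is indexed, so write the prefix in lowercase.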

GET /test/_search
{
  "query": {
    "prefix": {
      "context": {
       "value": "阿"
      }
    }
  }
}

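1.4.2.5 Wildcard query

* matches any number of characters and ? matches exactly one character; both are matched against individual terms.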

GET /test/_search
{
  "query": {
    "wildcard": {
      "context": {
       "value": "j*"
      }
    }
  }
}
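
The matching rule can be tried out locally with Python's fnmatch, which uses the same meaning for * and ?. This is an approximation for illustration only: fnmatch additionally understands [seq] character classes, which the wildcard query does not, and the term list is made up.

```python
from fnmatch import fnmatchcase

def wildcard_terms(terms, pattern):
    """Return the terms matched by a pattern using * and ?."""
    return [term for term in terms if fnmatchcase(term, pattern)]

terms = ["jack", "java", "j", "ajax"]
print(wildcard_terms(terms, "j*"))
print(wildcard_terms(terms, "ja??"))
```

The pattern has to cover the whole term, so "ajax" is not matched by j* even though it contains a j.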

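1.4.2.6 Fuzzy query (tolerates spelling mistakes)

fuzziness: the maximum number of single-character edits (a missing, extra, or wrong character) allowed between the keyword and a matching term. The largest value is 2 and the default is AUTO, which allows no edits for keywords of 1-2 characters, one edit for 3-5 characters, and two edits for longer keywords. Keep the value small relative to the keyword length: as the allowed edits approach the length of the keyword, the match becomes too loose to be useful.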

GET /test/_search
{
  "query": {
    "fuzzy": {
      "context": {
        "value": "iac"
        , "fuzziness": "auto"
      }
    }
  }
}
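
To get a feel for what an edit budget means, here is a small sketch of edit distance together with the AUTO thresholds. It is a simplification under stated assumptions: it uses the classic Levenshtein distance, whereas ES by default also counts swapping two adjacent characters as a single edit, and all function names are invented for this note.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]

def auto_fuzziness(term):
    """Edits allowed by AUTO with its default thresholds (3 and 6)."""
    if len(term) < 3:
        return 0
    return 1 if len(term) < 6 else 2

def fuzzy_match(term, candidate):
    return edit_distance(term, candidate) <= auto_fuzziness(term)

print(auto_fuzziness("iac"))
print(fuzzy_match("iac", "iab"))
print(fuzzy_match("iac", "jack"))
```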

1.4.2.7 Bool query

A bool query combines multiple conditions.

must is like && (AND)

should is like || (OR)

must_not is like ! (NOT)

GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        //clauses that must all match go here
      ],
      "must_not": [
        //clauses that must not match go here
      ]
    }
  }
}
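
The way the three clause types combine can be sketched with ordinary predicates. This models matching only, not scoring, and ignores the filter clause; the function and the sample documents are hypothetical. It follows the default rule that should clauses are required only when there is no must clause.

```python
def bool_match(doc, must=(), should=(), must_not=()):
    """Each clause is a predicate taking the document dict."""
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # With no must clauses, at least one should clause has to match.
    if should and not must:
        return any(clause(doc) for clause in should)
    return True

docs = [{"name": "a", "age": 14}, {"name": "b", "age": 20}, {"name": "c", "age": 15}]
age_14_to_16 = lambda d: 14 <= d["age"] <= 16
named_c = lambda d: d["name"] == "c"
aged_20 = lambda d: d["age"] == 20

and_not = [d["name"] for d in docs if bool_match(d, must=[age_14_to_16], must_not=[named_c])]
either = [d["name"] for d in docs if bool_match(d, should=[named_c, aged_20])]
print(and_not, either)
```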

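1.4.2.8 Multi-field analyzed queries

The query text is analyzed into terms, each term is searched in every listed field, and the results are combined.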

GET /test/_search
{
  "query": {
    "multi_match": {
      "analyzer": "",  //placeholder: name of the analyzer to apply to the query text (optional)
      "query": "其味",
      "fields": ["title","context"]
    }
  }
}

GET /test/_search
{
  "query": {
    "query_string": {
      "analyzer": "",  //placeholder: name of the analyzer to apply to the query text (optional)
      "query": "",
      "fields": []
    }
  }
}

1.4.2.9 Highlighting

GET /test/_search
{
  "query": {
    "term": {
      "context": {
        "value": "驱"
      }
    }
  }
  , "highlight": {
    "fields": {"context":{}}, 
    "pre_tags":["<span style='color:red'>"]
    , "post_tags": ["</span>"]
    , "require_field_match": "false"  //also highlight fields other than the one queried
  }
}

1.5 ES architecture diagram

[Figure: ES architecture diagram; image not included]

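1.6 Analyzers

1.6.1 What an analyzer does

Breaks text into terms and extracts the keywords, dropping stop words and filler words.

1.6.2 Analyzers built into ES

Standard analyzer (standard, the default): splits English into words, keeps a run of digits as one token, and splits Chinese into single characters.

Simple analyzer (simple): splits English into words, discards digits, and does not split Chinese.

1.6.3 Testing an analyzer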

GET _analyze
{
  "analyzer": "standard", 
  "text": ["我是中国人11"]
}
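
For comparison with the standard analyzer request above, the behaviour described for the simple analyzer can be imitated in a few lines. This is an approximation for illustration, not the real implementation: it treats any run of letters as one token, lowercases it, and uses everything else (digits included) as a separator.

```python
import re

# Runs of letters only; digits, punctuation and whitespace act as separators.
LETTERS = re.compile(r"[^\W\d_]+")

def simple_like_tokens(text):
    return [token.lower() for token in LETTERS.findall(text)]

print(simple_like_tokens("我是中国人11"))
print(simple_like_tokens("Hello World2 ES7"))
```

Chinese characters count as letters, so the whole run stays one token, which makes this analyzer a poor fit for searching Chinese text.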

1.6.4 The IK analyzer (from GitHub)

1.6.4.1 Installation and testing

Installation:

https://github.com/medcl/elasticsearch-analysis-ik
After downloading and unpacking, build it with: mvn clean compile package

//Test
GET _analyze
{
  "analyzer": "ik_smart", 
  "text": ["我是中国人"]
}


PUT /test
{
  "mappings": {
    "properties": {
      "title":{"type": "text","analyzer": "ik_smart"},
      "context":{"type": "text","analyzer": "ik_smart"}
    }
  }
}

Analysis modes

ik_max_word: finest-grained, emits every possible word
ik_smart: coarsest-grained, "smart" segmentation

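1.6.4.2 Extension words and stop words

The directory \plugins\analysis-ik\config contains the file IKAnalyzer.cfg.xml. Open it:

[Figure: screenshot of IKAnalyzer.cfg.xml; image not included]

<entry key="ext_dict">custom.dic</entry> points to your own .dic file (copy an existing one and edit it)
Remote dictionary: <entry key="remote_ext_dict">URL</entry>
Access to a remote dictionary at an IP address may be blocked by the Java security policy;
to allow it, edit jre/lib/security/java.policy
and add: permission java.net.SocketPermission "host:port", "connect,resolve";

Whether the remote file has changed is determined from the Last-Modified and ETag response headers.
After a dictionary update, only newly indexed data uses the new words; terms already stored in existing indices are not changed unless the data is reindexed.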
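
The change check based on Last-Modified and ETag can be sketched as a comparison of the headers from two successive requests. This only illustrates the idea and makes no claim about the plugin's actual code; the function name and the sample header values are hypothetical.

```python
def dictionary_changed(previous, current):
    """Compare the validators from two responses for the same dictionary URL.

    Both arguments are dicts of response headers; a missing header reads as None.
    """
    for name in ("Last-Modified", "ETag"):
        if previous.get(name) != current.get(name):
            return True
    return False

first = {"Last-Modified": "Mon, 01 Jan 2024 00:00:00 GMT", "ETag": '"v1"'}
same = dict(first)
edited = {"Last-Modified": "Tue, 02 Jan 2024 00:00:00 GMT", "ETag": '"v2"'}

print(dictionary_changed(first, same))
print(dictionary_changed(first, edited))
```

A server that hosts the remote dictionary therefore has to send at least one of these two headers and change it whenever the word list changes.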

2 Using ES from Java

2.1 Documentation

Documentation: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-getting-started-initialization.html