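ES Study Notes
Environment
- CentOS 7 x64
- ES v7.3.0
- Logstash 7.3.0
Notes:
- Do not start ES as the root user; for security reasons, ES refuses to start under the root account.
- Give the machine plenty of memory. Even on a virtual machine, more than 3 GB is recommended, otherwise startup may fail.
- ES beginner tutorial: http://www.ruanyifeng.com/blog/2017/08/elasticsearch.html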
1. ES download and installation
Official downloads: https://www.elastic.co/cn/downloads/elasticsearch
- Download the version you need
- Unpack it
- Run bin/elasticsearch (or bin\elasticsearch.bat on Windows)
- Check that the installation works: curl http://localhost:9200/ (on Windows, you can open http://localhost:9200/ directly in a browser)
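The same check can be done from Go. This is only a minimal sketch and not part of the original notes: parseVersion and esVersion are hypothetical helper names, and it assumes the root endpoint returns a JSON object with a version.number field, as ES 7.x does. If no node is listening, main just prints the error.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// parseVersion extracts version.number from the JSON returned by the ES root endpoint.
func parseVersion(body []byte) (string, error) {
	var info struct {
		Version struct {
			Number string `json:"number"`
		} `json:"version"`
	}
	if err := json.Unmarshal(body, &info); err != nil {
		return "", err
	}
	if info.Version.Number == "" {
		return "", fmt.Errorf("no version.number in response")
	}
	return info.Version.Number, nil
}

// esVersion fetches baseURL and returns the version the node reports.
func esVersion(baseURL string) (string, error) {
	c := &http.Client{Timeout: 3 * time.Second}
	resp, err := c.Get(baseURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return parseVersion(body)
}

func main() {
	v, err := esVersion("http://localhost:9200/")
	if err != nil {
		fmt.Println("ES is not reachable:", err)
		return
	}
	fmt.Println("ES version:", v)
}
```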
Chinese word segmentation plugin
Install command
sudo ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.3.0/elasticsearch-analysis-ik-7.3.0.zip
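The index, type and document concepts in ES
For readers familiar with MySQL:
- an index is like a database
- a type is like a table [in current ES versions each index can have only one type, and types are deprecated]
- a document is like a row in a table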
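2 Querying from the command line
2.1 List all indices
curl is a command-line tool that sends network requests, then fetches and extracts the data and prints it to standard output (stdout). Think of it as a command-line version of Postman.
http://www.ruanyifeng.com/blog/2011/09/curl.html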
curl -X GET 'http://localhost:9200/_cat/indices?v'
2.2 Create an index
curl -X PUT 'localhost:9200/weather'
2.3 Delete an index
curl -X DELETE 'localhost:9200/weather'
2.4 Add a document with POST (ES generates the id); localhost:9200/{index}/{type}
curl -H "Content-Type: application/json" -X POST 'localhost:9200/weather/20180101' -d '
{
"date": "老六",
"title": "golang",
"desc": "go开发工程师"
}'
2.5 Add a document with PUT (you choose the id)
localhost:9200/{index}/{type}/{id}
curl -H "Content-Type: application/json" -X PUT 'localhost:9200/weather/20180101/1' -d '
{
"date": "张三1",
"title": "工程师1",
"desc": "晴天1"
}'
2.6 View a document
Fetch the document with index = weather, type = 20180101, id = 1
curl 'localhost:9200/weather/20180101/1'
Delete
curl -XDELETE 'localhost:9200/weather/20180101/1'
Update [updates the document if it exists, creates it otherwise]
curl -H "Content-Type: application/json" -XPUT 'localhost:9200/weather/20180101/1' -d '
{
"date":"lisi",
"title":"title2",
"desc":"dadfsaf"
}'
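For comparison, the PUT request above can be prepared with Go's standard library. This is a sketch under stated assumptions: docURL and newPutDocRequest are hypothetical helper names, and the request is only built here, not sent, so the example runs without a cluster.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// docURL builds {base}/{index}/{type}/{id}, escaping each path segment.
func docURL(base, index, typ, id string) string {
	return fmt.Sprintf("%s/%s/%s/%s", base,
		url.PathEscape(index), url.PathEscape(typ), url.PathEscape(id))
}

// newPutDocRequest prepares (but does not send) the PUT that creates or replaces a document.
func newPutDocRequest(base, index, typ, id string, doc interface{}) (*http.Request, error) {
	body, err := json.Marshal(doc)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPut, docURL(base, index, typ, id), bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// ES rejects request bodies without an explicit content type.
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	doc := map[string]string{"date": "lisi", "title": "title2", "desc": "dadfsaf"}
	req, err := newPutDocRequest("http://localhost:9200", "weather", "20180101", "1", doc)
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// To actually send it against a running node: resp, err := http.DefaultClient.Do(req)
}
```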
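Return all documents
curl 'localhost:9200/weather/20180101/_search'
# 20180101 is the type. Since an index can now have only one type, the type can be omitted and the query still returns all documents
curl 'localhost:9200/weather/_search'
# Response
# Official explanation of the fields: https://www.elastic.co/guide/cn/elasticsearch/guide/current/empty-search.html
# Note took: the time the query took, in milliseconds
# _score indicates how well a document matches; every hit has a _score, and hits are sorted by it in descending order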
{
"took":4,
"timed_out":false,
"_shards":{
"total":1,
"successful":1,
"skipped":0,
"failed":0
},
"hits":{
"total":{
"value":3,
"relation":"eq"
},
"max_score":1,
"hits":[
{
"_index":"weather",
"_type":"20180101",
"_id":"IbOjrmwBucBWZ7yGLubW",
"_score":1,
"_source":{
"date":"张三",
"title":"工程师",
"desc":"晴天"
}
},
{
"_index":"weather",
"_type":"20180101",
"_id":"JLOtrmwBucBWZ7yGrubi",
"_score":1,
"_source":{
"date":"张三",
"title":"工程师",
"desc":"晴天"
}
},
{
"_index":"weather",
"_type":"20180101",
"_id":"1",
"_score":1,
"_source":{
"date":"lisi",
"title":"title2",
"desc":"dadfsaf"
}
}
]
}
}
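A response like the one above can be decoded in Go with plain structs. This is a sketch only: searchResponse and parseSearch are hypothetical names, just the fields discussed here are mapped, and _source is assumed to contain only string values, as in the sample documents.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchResponse maps the parts of a _search response used in these notes.
type searchResponse struct {
	Took     int  `json:"took"` // query time in milliseconds
	TimedOut bool `json:"timed_out"`
	Hits     struct {
		Total struct {
			Value    int    `json:"value"`
			Relation string `json:"relation"`
		} `json:"total"`
		Hits []struct {
			ID     string            `json:"_id"`
			Score  float64           `json:"_score"`
			Source map[string]string `json:"_source"` // assumes string-only fields
		} `json:"hits"`
	} `json:"hits"`
}

// parseSearch decodes a raw _search response body.
func parseSearch(body []byte) (searchResponse, error) {
	var r searchResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	// Shortened copy of the response shown above (one hit kept).
	body := []byte(`{"took":4,"timed_out":false,"hits":{"total":{"value":3,"relation":"eq"},
	"max_score":1,"hits":[{"_index":"weather","_type":"20180101","_id":"1","_score":1,
	"_source":{"date":"lisi","title":"title2","desc":"dadfsaf"}}]}}`)
	r, err := parseSearch(body)
	if err != nil {
		fmt.Println("decode:", err)
		return
	}
	fmt.Println("took(ms):", r.Took, "total:", r.Hits.Total.Value, r.Hits.Total.Relation)
	for _, h := range r.Hits.Hits {
		fmt.Println(h.ID, h.Score, h.Source["title"])
	}
}
```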
3 More complex searches
3.1 Find documents containing “晴”; match means the desc field must contain “晴”
curl -H 'Content-Type: application/Json' 'localhost:9200/weather/20180101/_search' -d '
{
"query": {"match": {"desc":"晴"}}
}'
3.2 Too many results and you only want the first few? size: the number of hits to return
curl -H 'Content-Type: application/Json' 'localhost:9200/weather/20180101/_search' -d '
{
"query": {"match": {"desc":"晴"}},
"size" : 1
}'
3.3 Want something other than the first ten results? from: the offset, default 0; this example returns one document starting at offset 1
curl -H 'Content-Type: application/Json' 'localhost:9200/weather/20180101/_search' -d '
{
"query": {"match": {"desc":"晴"}},
"from": 1,
"size" : 1
}'
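from and size together give simple pagination: for a 1-based page number, from = (page - 1) * size. A minimal Go sketch of building such a request body follows; pageQuery is a hypothetical helper, and the 1-based page convention is an assumption of this example, not something ES defines.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pageQuery builds a match query body for one page of results.
// Pages are 1-based here, so page 1 maps to from = 0.
func pageQuery(field, text string, page, size int) ([]byte, error) {
	if page < 1 || size < 1 {
		return nil, fmt.Errorf("page and size must be >= 1, got page=%d size=%d", page, size)
	}
	body := map[string]interface{}{
		"query": map[string]interface{}{
			"match": map[string]interface{}{field: text},
		},
		"from": (page - 1) * size,
		"size": size,
	}
	return json.Marshal(body)
}

func main() {
	// Page 2 with one hit per page is the same request as the curl example above.
	b, err := pageQuery("desc", "晴", 2, 1)
	if err != nil {
		fmt.Println("build query:", err)
		return
	}
	fmt.Println(string(b))
}
```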
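3.4 Logical OR
The space between 工 and 师 splits the query text into two terms; match combines terms with OR by default, so a document matching either term is returned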
curl -H 'Content-Type: application/Json' 'localhost:9200/weather/20180101/_search' -d '
{
"query": {"match": {"title": "工 师"}}
}'
# The following does NOT work: match accepts only one field. Querying several fields requires a bool query, see below
curl -H 'Content-Type: application/Json' 'localhost:9200/weather/20180101/_search' -d '
{
"query": {"match": {"desc": "工程师", "title":"golang" }}
}'
3.5 Logical AND: must, should, must_not, filter
must:
curl -H 'Content-Type: application/Json' 'localhost:9200/weather/20180101/_search' -d '
{
"query":{
"bool": {
"must": [
{"match":{"desc":"js"}},
{"match":{"desc":"工程师"}}
]
}
}
}'
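# The query below returns documents where must holds (desc contains "工程师") and must_not holds (title does not contain "golang")
# must_not clauses do not affect the score; they only exclude documents.
# Every must clause has to match, and no must_not clause may match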
curl -H 'Content-Type: application/Json' 'localhost:9200/weather/20180101/_search' -d '
{
"query":{
"bool": {
"must": {"match": {"desc":"工程师"}},
"must_not":{"match": {"title":"golang"}}
}
}
}'
# Result
{
"took":2,
"timed_out":false,
"_shards":{
"total":1,
"successful":1,
"skipped":0,
"failed":0
},
"hits":{
"total":{
"value":2,
"relation":"eq"
},
"max_score":1.7263287,
"hits":[
{
"_index":"weather",
"_type":"20180101",
"_id":"3ffesWwBCUKd2dtSgQyB",
"_score":1.7263287,
"_source":{
"date":"王五",
"title":"JavaScript",
"desc":"js开发工程师"
}
},
{
"_index":"weather",
"_type":"20180101",
"_id":"3PfesWwBCUKd2dtSIAzZ",
"_score":1.5912248,
"_source":{
"date":"李四",
"title":"java",
"desc":"扎瓦开发工程师"
}
}
]
}
}
# should: a document does not have to contain the terms 扎瓦 or js, but if it does, it is considered more relevant:
curl -H 'Content-Type: application/Json' 'localhost:9200/weather/20180101/_search' -d '
{
"query":{
"bool": {
"must": {"match": {"desc":"工程师"}},
"must_not":{"match": {"title":"golang"}},
"should": [
{"match": {"desc":"扎瓦"}},
{"match": {"desc":"js"}}
]
}
}
}'
# Result with should: the same documents are returned, but their scores are higher
{
"took":21,
"timed_out":false,
"_shards":{
"total":1,
"successful":1,
"skipped":0,
"failed":0
},
"hits":{
"total":{
"value":2,
"relation":"eq"
},
"max_score":3.9487753,
"hits":[
{
"_index":"weather",
"_type":"20180101",
"_id":"3PfesWwBCUKd2dtSIAzZ",
"_score":3.9487753,
"_source":{
"date":"李四",
"title":"java",
"desc":"扎瓦开发工程师"
}
},
{
"_index":"weather",
"_type":"20180101",
"_id":"3ffesWwBCUKd2dtSgQyB",
"_score":3.0051887,
"_source":{
"date":"王五",
"title":"JavaScript",
"desc":"js开发工程师"
}
}
]
}
}
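The bool queries above can also be assembled programmatically. The following Go sketch builds the same must / must_not / should body with the standard library only; clause, match and boolQuery are hypothetical names for this example and are not the API of any ES client.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clause is one leaf query, e.g. {"match": {"desc": "工程师"}}.
type clause map[string]interface{}

// match builds a single-field match clause.
func match(field, text string) clause {
	return clause{"match": map[string]string{field: text}}
}

// boolQuery mirrors the bool query sections used in these notes.
type boolQuery struct {
	Must    []clause `json:"must,omitempty"`     // all must match, affects score
	MustNot []clause `json:"must_not,omitempty"` // none may match, no score effect
	Should  []clause `json:"should,omitempty"`   // optional, raises score when matched
}

// body wraps the bool query in the top-level {"query": {"bool": ...}} envelope.
func (b boolQuery) body() ([]byte, error) {
	return json.Marshal(map[string]interface{}{
		"query": map[string]interface{}{"bool": b},
	})
}

func main() {
	q := boolQuery{
		Must:    []clause{match("desc", "工程师")},
		MustNot: []clause{match("title", "golang")},
		Should:  []clause{match("desc", "扎瓦"), match("desc", "js")},
	}
	b, err := q.body()
	if err != nil {
		fmt.Println("build query:", err)
		return
	}
	fmt.Println(string(b))
}
```

Each section is emitted as a JSON array, which ES accepts in place of a single object, and empty sections are omitted.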
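match performs analyzed (tokenized) full-text matching, term matches the exact value, and range matches a range of values
The filtered query has been replaced by the bool query
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html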
curl -H 'Content-Type: application/Json' 'localhost:9200/weather/20180101/_search' -d '
{
"query": {
"bool": {
"must": {
"match":{
"desc": "工程师"
}
},
"filter": {
"term": {
"title":"golang"
}
}
}
}
}
'
curl -H 'Content-Type: application/Json' 'localhost:9200/weather/20180101/_search' -d '
{
"query": {
"term": {
"title":"golang"
}
}
}'
curl -H 'Content-Type: application/Json' 'localhost:9200/weather/20180101/_search' -d '
{
"query": {
"bool": {
"filter": {
"match_phrase": {
"desc": {"query": "工程师"}
}
}
}
}
}
'
So far, two API endpoints have been used: _cat and _search
4 Syncing MySQL to ES
4.1 Installing Logstash
Official downloads: https://www.elastic.co/cn/downloads/logstash
- Download
- Unpack
- Write a simple configuration file
- Run
4.2 Configuration
# jdbc.conf
# Input section
input {
stdin {}
jdbc {
# MySQL JDBC driver
jdbc_driver_library => "/home/ccnn/logstash-7.3.0/config/mysql-connector-java-5.1.47.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
# MySQL connection string, including the database name
jdbc_connection_string => "jdbc:mysql://localhost:3306/test"
# MySQL user name and password
jdbc_user => "root"
jdbc_password => "12345678"
# Polling schedule in cron syntax (fields: minute, hour, day of month, month, day of week); all * means run once every minute
schedule => "* * * * *"
# Enable paging
jdbc_paging_enabled => "true"
# Page size
jdbc_page_size => "50000"
# File containing the SQL statement; alternatively use statement => 'select * from t_employee' directly
statement_filepath => "/home/ccnn/logstash-7.3.0/config/t_depart.sql"
# Elasticsearch type name
type => "t_employee"
}
}
# Filter section (optional)
filter {
json {
source => "message"
remove_field => ["message"]
}
}
# Output section
output {
elasticsearch {
# Elasticsearch index name
index => "octopus"
# Use the type set in the input as the type name under the Elasticsearch index
document_type => "%{type}" # <- use the type from each input
# Elasticsearch host and port
hosts => "localhost:9200"
# Use the id column of the MySQL row as the Elasticsearch document id
document_id => "%{id}"
}
stdout {
codec => json_lines
}
}
# Note: remove the comments from this file before using it, otherwise it will report an error
# jdbc.sql
select * from t_employee
4.3 Start the service with the configuration file
cd logstash-7.3.0
# Check that the configuration file syntax is valid
bin/logstash -f config/jdbc.conf --config.test_and_exit
# Start
bin/logstash -f config/jdbc.conf --config.reload.automatic
Registering as a Linux service
- Omitted
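4.4 What about syncing multiple tables?
Use one jdbc input per table and route each type to its own index in the output (the full configuration is in 4.5).
# Check that the indices have been created
curl localhost:9200/_cat/indices
# Check whether index_employee contains data
curl localhost:9200/index_employee/_search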
4.5 Incremental updates for multiple tables
# Add a :sql_last_value filter to the SQL; this requires the table to have an update-timestamp column
# t_depart.sql
select * from t_depart where updated_at > :sql_last_value
# t_employee.sql
select * from t_employee where updated_at > :sql_last_value
# jdbc.conf
input {
jdbc {
jdbc_connection_string => "jdbc:mysql://192.168.1.150:3306/test"
jdbc_user => "root"
jdbc_password => "root"
jdbc_driver_library => "/home/ccnn/logstash-7.3.0/config/mysql-connector-java-5.1.47.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_paging_enabled => "true"
jdbc_page_size => "50000"
statement_filepath => "/home/ccnn/logstash-7.3.0/config/t_depart.sql"
schedule => "* * * * *"
type => "depart"
}
jdbc {
jdbc_connection_string => "jdbc:mysql://192.168.1.150:3306/test"
jdbc_user => "root"
jdbc_password => "root"
jdbc_driver_library => "/home/ccnn/logstash-7.3.0/config/mysql-connector-java-5.1.47.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_paging_enabled => "true"
jdbc_page_size => "50000"
parameters => {"number" => "200"}
statement_filepath => "/home/ccnn/logstash-7.3.0/config/t_employee.sql"
schedule => "* * * * *"
type => "employee"
}
}
filter {
json {
source => "message"
remove_field => ["message"]
}
}
output {
if[type] == "depart" {
elasticsearch {
hosts => ["localhost:9200"]
index => "index_depart"
document_id => "%{id}"
}
}
if[type] == "employee" {
elasticsearch {
hosts => ["localhost:9200"]
index => "index_employee"
document_id => "%{id}"
}
}
stdout {
codec => json_lines
}
}
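To make the incremental filter concrete, here is a conceptual model in Go of what updated_at > :sql_last_value selects on each scheduled run. This is not Logstash code: row, tracker, poll and demo are hypothetical names, the rows are in-memory stand-ins for a MySQL table, and the sketch assumes the plugin's default behavior, in which :sql_last_value starts at the Unix epoch and is then set to the time of the previous run.

```go
package main

import (
	"fmt"
	"time"
)

// row stands in for one table row with an update timestamp.
type row struct {
	ID        int
	UpdatedAt time.Time
}

// tracker remembers the value that plays the role of :sql_last_value.
type tracker struct {
	last time.Time
}

// poll returns the rows updated after the tracked value,
// then records now as the value for the next run.
func (t *tracker) poll(table []row, now time.Time) []row {
	var changed []row
	for _, r := range table {
		if r.UpdatedAt.After(t.last) {
			changed = append(changed, r)
		}
	}
	t.last = now
	return changed
}

// demo runs two polls and returns how many rows each one picked up.
func demo() (first, second int) {
	base := time.Date(2019, 8, 1, 12, 0, 0, 0, time.UTC)
	table := []row{{1, base}, {2, base.Add(30 * time.Second)}}
	t := &tracker{last: time.Unix(0, 0)} // nothing synced yet

	first = len(t.poll(table, base.Add(time.Minute))) // initial run: every row is new

	table[0].UpdatedAt = base.Add(90 * time.Second)       // row 1 changes after the first run
	second = len(t.poll(table, base.Add(2*time.Minute))) // next run: only the changed row
	return first, second
}

func main() {
	first, second := demo()
	fmt.Println("rows in first run:", first)
	fmt.Println("rows in second run:", second)
}
```

Because the output uses document_id => "%{id}", a row that is picked up again overwrites its existing document instead of creating a duplicate.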
5. Basic operations with the Go package olivere/elastic
These notes use the third-party package https://gopkg.in/olivere/elastic.v7
go get gopkg.in/olivere/elastic.v7
import "gopkg.in/olivere/elastic.v7"
package main

import (
	"context"
	"encoding/json"
	"log"

	"gopkg.in/olivere/elastic.v7"
)

var client *elastic.Client
var host = "http://localhost:9200"

func main() {
	Init()
	log.Println("test go ", client)
	ctx := context.Background()
	e1 := Employee{1, "周瑞发", "password", 30}

	// Create: index, type and id mean the same as in ES
	put1, err := client.Index().Index("go-test").Type("emp").Id("2").BodyJson(e1).Do(ctx)
	if err != nil {
		log.Fatalln("index failed: ", err)
	}
	log.Println("indexed: ", put1.Id, put1.Index, put1.Type)

	// Update one field of the document
	res0, err := client.Update().Index("go-test").Type("emp").Id("2").Doc(map[string]interface{}{"Name": "刘德华刘德华"}).Do(ctx)
	if err != nil {
		log.Fatalln("update failed: ", err)
	}
	log.Println("updated: ", res0.Result)

	// Get by id
	res2, err := client.Get().Index("go-test").Type("emp").Id("2").Do(ctx)
	if err != nil {
		log.Println("get failed: ", err)
	} else {
		log.Println("get result: ", res2.Id, res2.Index, res2.Type, res2.Fields)
		if res2.Found {
			log.Println(string(res2.Source))
		}
	}

	// Search all documents in the index.
	// Note: a freshly indexed document may not be searchable until the index refreshes.
	res5, err := client.Search("go-test").Type("emp").Do(ctx)
	if err != nil {
		log.Fatalln("search failed: ", err)
	}
	log.Println("TotalHits ", res5.TotalHits())

	// match query
	matchQuery := elastic.NewMatchQuery("Password", "password")
	rsp, err := client.Search("go-test").Type("emp").Query(matchQuery).Do(ctx)
	if err != nil {
		log.Println("match query failed: ", err)
	} else {
		for _, hit := range rsp.Hits.Hits {
			var emp Employee
			if err := json.Unmarshal(hit.Source, &emp); err != nil {
				log.Panicln(err)
			}
			log.Println("match query hit: ", emp)
		}
	}

	// Delete the document created above (the original deleted id "1", which was never created)
	res1, err := client.Delete().Index("go-test").Type("emp").Id("2").Do(ctx)
	if err != nil {
		log.Fatalln("delete failed: ", err)
	}
	log.Println("deleted: ", res1.Result)
}

func Init() {
	log.Println("init start ")
	var err error
	// Create the client
	client, err = elastic.NewClient(elastic.SetURL(host))
	if err != nil {
		// Without a client nothing below can work, so stop here
		log.Fatalln(err)
	}
	// Check that the connection works
	info, code, err := client.Ping(host).Do(context.Background())
	if err != nil {
		log.Println(err)
		return
	}
	log.Println(info, code)
	// Get the ES version
	esversion, err := client.ElasticsearchVersion(host)
	if err != nil {
		log.Println(err)
		return
	}
	log.Println("ES VERSION : ", esversion)
}

type Employee struct {
	Id       int
	Name     string
	Password string
	Age      int
}
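One detail of the program above is worth spelling out: Employee has no JSON struct tags, so encoding/json uses the Go field names as keys, which is why the match query targets the field "Password" with a capital P. The sketch below shows the difference; TaggedEmployee and toJSON are hypothetical additions for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Employee is the struct from the notes: no tags, so JSON keys equal the field names.
type Employee struct {
	Id       int
	Name     string
	Password string
	Age      int
}

// TaggedEmployee is a hypothetical variant with explicit lower-case JSON keys.
type TaggedEmployee struct {
	Id       int    `json:"id"`
	Name     string `json:"name"`
	Password string `json:"password"`
	Age      int    `json:"age"`
}

// toJSON marshals v and returns the JSON text, or the error message on failure.
func toJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(toJSON(Employee{1, "周瑞发", "password", 30}))
	fmt.Println(toJSON(TaggedEmployee{1, "周瑞发", "password", 30}))
}
```

With the tagged variant, the query field would be "password" instead, and the update map would need the key "name" rather than "Name".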