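Elasticsearch: Elasticsearch is a real-time, distributed search and analytics engine that lets you explore your data at a speed and scale that were not practical before. It is used for full-text search, structured search, analytics, and any combination of the three.
Elasticsearch can be described as:
- A distributed, real-time document store in which every field can be indexed and searched
- A distributed, real-time analytics and search engine
- A system that scales to hundreds of server nodes and handles petabytes of structured or unstructured data
You can download Elasticsearch from the official download page, or get the latest version from GitHub, since it is fully open source.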
After downloading, extract the archive and start Elasticsearch:
tar -xvzf elasticsearch-<version>.tar.gz
cd elasticsearch-<version>
./bin/elasticsearch
Then open http://127.0.0.1:9200 in a browser. You should see a response similar to this:
{
"name" : "a9W1rx3",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "rwt9_Vd4RZy6jPdJZ1N6nw",
"version" : {
"number" : "5.1.1",
"build_hash" : "5395e21",
"build_date" : "2016-12-06T12:36:15.409Z",
"build_snapshot" : false,
"lucene_version" : "6.3.0"
},
"tagline" : "You Know, for Search"
}
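The same check can be scripted. The sketch below assumes the node is listening on 127.0.0.1:9200 without authentication, as in the steps above. `fetch_cluster_info` and `version_tuple` are hypothetical helper names, and only the Python standard library is used.

```python
import json
import urllib.request


def fetch_cluster_info(base_url="http://127.0.0.1:9200"):
    """Request the root endpoint and decode the JSON body (needs a running node)."""
    with urllib.request.urlopen(base_url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def version_tuple(info):
    """Turn the "version.number" string, e.g. "5.1.1", into a tuple of ints."""
    return tuple(int(part) for part in info["version"]["number"].split("."))


# Offline demonstration using fields copied from the response shown above.
sample = {
    "cluster_name": "elasticsearch",
    "version": {"number": "5.1.1", "lucene_version": "6.3.0"},
    "tagline": "You Know, for Search",
}
print(version_tuple(sample))
```

With a node running, `version_tuple(fetch_cluster_info())` should report the installed version instead of the sample one.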
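- Install Kibana, the web-based visualization client, from its official download page
After downloading, extract the archive and start Kibana:
tar -xvzf kibana-<version>.tar.gz
cd kibana-<version>
./bin/kibana
Open http://127.0.0.1:5601 in a browser and, once the Kibana UI has loaded, select Dev Tools.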
- Test: enter the following requests in the Dev Tools console to index three employee documents
PUT /megacorp/employee/1
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
PUT /megacorp/employee/2
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
PUT /megacorp/employee/3
{
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
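Outside the console, the same documents can be indexed over HTTP. This is a sketch rather than a recommended client: it assumes the unauthenticated local node from the setup steps, uses only the standard library, and `build_index_request` is a hypothetical helper. The `/index/type/id` path mirrors the console requests above, which target the 5.x release shown earlier.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:9200"  # assumption: local node, no authentication

EMPLOYEES = {
    1: {"first_name": "John", "last_name": "Smith", "age": 25,
        "about": "I love to go rock climbing", "interests": ["sports", "music"]},
    2: {"first_name": "Jane", "last_name": "Smith", "age": 32,
        "about": "I like to collect rock albums", "interests": ["music"]},
    3: {"first_name": "Douglas", "last_name": "Fir", "age": 35,
        "about": "I like to build cabinets", "interests": ["forestry"]},
}


def build_index_request(doc_id, doc, index="megacorp", doc_type="employee"):
    """Build (but do not send) a PUT request that stores doc under doc_id."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/{doc_type}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


requests_to_send = [build_index_request(i, d) for i, d in EMPLOYEES.items()]
for req in requests_to_send:
    print(req.get_method(), req.full_url)
    # To actually index the document against a running node:
    # urllib.request.urlopen(req, timeout=5).read()
```

The requests are built separately from sending so they can be inspected first; uncomment the `urlopen` line once a node is running.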
- Retrieve a document by its ID
GET /megacorp/employee/2
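- Lightweight (query-string) search: the first request returns the first 10 documents, which is the default page size; the second returns only documents whose last_name is Smith
GET /megacorp/employee/_search
GET /megacorp/employee/_search?q=last_name:Smith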
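When such URLs are built in code, the query string has to be percent-encoded. A minimal sketch, assuming the local node from the setup steps; `lite_search_url` is a hypothetical helper, not part of any Elasticsearch client.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:9200"  # assumption: local node from the setup steps


def lite_search_url(index, doc_type, field=None, value=None):
    """Build a _search URL; with field and value set, add q=field:value."""
    url = f"{BASE_URL}/{index}/{doc_type}/_search"
    if field is not None and value is not None:
        # urlencode percent-escapes the colon and any spaces in the query.
        url += "?" + urlencode({"q": f"{field}:{value}"})
    return url


print(lite_search_url("megacorp", "employee"))
print(lite_search_url("megacorp", "employee", "last_name", "Smith"))
```

The colon appears as `%3A` in the generated URL; the server decodes it back to `last_name:Smith`.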
- Search with the query DSL: the same last_name search, expressed as a JSON request body
GET /megacorp/employee/_search
{
"query" : {
"match" : {
"last_name" : "Smith"
}
}
}
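A query DSL body is plain JSON, so it can be assembled as a dictionary and serialized. The sketch below assumes the local node from the setup steps; `match_query` and `search_request` are hypothetical helpers, and the request is built without being sent.

```python
import json
import urllib.request


def match_query(field, text):
    """Return a query DSL body equivalent to the match request above."""
    return {"query": {"match": {field: text}}}


def search_request(body, base_url="http://127.0.0.1:9200",
                   path="/megacorp/employee/_search"):
    """Build (but do not send) a search request carrying a JSON body.

    POST is used because some HTTP clients refuse to send a body with GET;
    the _search endpoint accepts both methods.
    """
    return urllib.request.Request(
        url=base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = match_query("last_name", "Smith")
req = search_request(body)
print(req.get_method(), req.full_url)
print(json.dumps(body, indent=2))
```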
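- A more complex query: match employees whose last_name is Smith, then keep only those older than 30. Note the comma that separates the must clause from the filter clause
GET /megacorp/employee/_search
{
"query":{
"bool":{
"must":{
"match":{
"last_name": "Smith"
}
},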
"filter":{
"range":{
"age":{
"gt": 30
}
}
}
}
}
}
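To make the must/filter split concrete, the sketch below builds the same body in Python and then mimics the selection over the three sample employees. The local selection is only an illustration of the logic under simplified assumptions (exact, case-insensitive name comparison); it is not how Elasticsearch evaluates the query, and `bool_query` and `local_select` are hypothetical names.

```python
def bool_query(must, filter_clause):
    """Combine a scoring clause (must) with a non-scoring clause (filter)."""
    return {"query": {"bool": {"must": must, "filter": filter_clause}}}


body = bool_query(
    must={"match": {"last_name": "Smith"}},
    filter_clause={"range": {"age": {"gt": 30}}},
)

# Plain-Python stand-in for what the request selects from the sample data.
EMPLOYEES = [
    {"first_name": "John", "last_name": "Smith", "age": 25},
    {"first_name": "Jane", "last_name": "Smith", "age": 32},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35},
]


def local_select(docs, last_name, min_age_exclusive):
    """Keep docs with the given last name and an age strictly above the bound."""
    return [d["first_name"] for d in docs
            if d["last_name"].lower() == last_name.lower()
            and d["age"] > min_age_exclusive]


print(local_select(EMPLOYEES, "Smith", 30))
```

Only Jane satisfies both conditions: John is a Smith but is 25, and Douglas is over 30 but is not a Smith.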
- Full-text search: find employees whose about field mentions rock climbing
GET /megacorp/employee/_search
{
"query":{
"match":{
"about": "rock climbing"
}
}
}
This returns two hits:
{
"took": 11,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.53484553,
"hits": [
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_score": 0.53484553,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [
"sports",
"music"
]
}
},
{
"_index": "megacorp",
"_type": "employee",
"_id": "2",
"_score": 0.26742277,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [
"music"
]
}
}
]
}
}
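By default, results are sorted by relevance score, that is, by how well each document matches the query. The second employee's about field also mentions "rock", so that document is returned too, but the first employee's about field contains both "rock" and "climbing", so it scores higher and is listed first.
Relevance is a central concept in Elasticsearch and one that sets it apart from traditional relational databases, where a record either matches or does not.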
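The ordering can be illustrated with a deliberately simplified score that just counts how many query terms a document contains. This is a toy model introduced here for illustration, not Elasticsearch's scoring, which also weighs factors such as term frequency and field length; `tokenize`, `toy_score`, and `rank` are hypothetical names.

```python
def tokenize(text):
    """Lowercase and split on whitespace; a crude stand-in for an analyzer."""
    return text.lower().split()


def toy_score(query, text):
    """Count how many distinct query terms occur in the text."""
    doc_terms = set(tokenize(text))
    return sum(1 for term in set(tokenize(query)) if term in doc_terms)


ABOUT = {
    "John": "I love to go rock climbing",
    "Jane": "I like to collect rock albums",
    "Douglas": "I like to build cabinets",
}


def rank(query, docs):
    """Return (name, score) pairs with a score above zero, best match first."""
    scored = [(name, toy_score(query, text)) for name, text in docs.items()]
    matches = [pair for pair in scored if pair[1] > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


print(rank("rock climbing", ABOUT))
```

John matches both terms and Jane matches one, so both are returned with John first, while Douglas matches neither and is left out. A relational equality test would have no such notion of a partial match.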
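- Phrase search: finding individual words in a field works well, but sometimes you want to match an exact sequence of words, that is, a phrase. The match_phrase query below matches only documents that contain the phrase "rock climbing"
GET /megacorp/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"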
}
}
}
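The difference from a plain match query is that the terms must be adjacent and in order. The sketch below shows that idea with a simple whitespace tokenizer; it is a toy model of phrase matching, not the way Elasticsearch implements it, and `contains_phrase` is a hypothetical name.

```python
def tokenize(text):
    """Lowercase and split on whitespace; a crude stand-in for an analyzer."""
    return text.lower().split()


def contains_phrase(phrase, text):
    """True if the phrase's terms appear in the text consecutively, in order."""
    words, target = tokenize(text), tokenize(phrase)
    if not target:
        return False
    return any(words[i:i + len(target)] == target
               for i in range(len(words) - len(target) + 1))


ABOUT = {
    "John": "I love to go rock climbing",
    "Jane": "I like to collect rock albums",
    "Douglas": "I like to build cabinets",
}

phrase_matches = [name for name, text in ABOUT.items()
                  if contains_phrase("rock climbing", text)]
print(phrase_matches)
```

Jane's text contains "rock" but not followed by "climbing", so only John's document matches the phrase.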
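- Highlighted search: many applications highlight snippets of text in each search result so that users can see why the document matched. Add a highlight parameter to the request
GET /megacorp/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
},
"highlight": {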
"fields" : {
"about" : {}
}
}
}
The response looks like this:
{
...
"hits": {
"total": 1,
"max_score": 0.23013961,
"hits": [
{
...
"_score": 0.23013961,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
},
"highlight": {
"about": [
"I love to go <em>rock</em> <em>climbing</em>"
]
}
}
]
}
}
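This query returns the same hit as the phrase search, but the hit now also carries a section called highlight. It contains the snippet of the about field that matched, with the matching words wrapped in <em></em> HTML tags.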
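On the client side, the highlight section can be read like any other part of the decoded response. A minimal sketch, assuming a hit shaped like the one above (abbreviated here to the relevant fields); `highlighted_terms` and `strip_tags` are hypothetical helpers.

```python
import re

# One hit, abbreviated from the response shown above.
sample_hit = {
    "_source": {"first_name": "John", "about": "I love to go rock climbing"},
    "highlight": {"about": ["I love to go <em>rock</em> <em>climbing</em>"]},
}


def highlighted_terms(hit, field):
    """Return the words wrapped in <em>...</em> across a hit's fragments."""
    terms = []
    for fragment in hit.get("highlight", {}).get(field, []):
        terms.extend(re.findall(r"<em>(.*?)</em>", fragment))
    return terms


def strip_tags(fragment):
    """Remove the <em> markers to recover the plain text of a fragment."""
    return re.sub(r"</?em>", "", fragment)


print(highlighted_terms(sample_hit, "about"))
print(strip_tags(sample_hit["highlight"]["about"][0]))
```

Hits without a highlight section simply yield an empty list, so the helper can be applied to every hit in a response.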