Elasticsearch

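A distributed, real-time document store in which every field is indexed and searchable
A distributed, real-time analytics search engine
Scales to hundreds of servers and handles petabytes of structured or unstructured data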

Interacting with Elasticsearch

Node client
Transport client, over port 9300

A RESTful API over HTTP that uses JSON as the data exchange format

Basic usage

Search everything (the first ten documents are returned)

    GET /megacorp/employee/_search
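
As a rough sketch of how such a call might be issued from a program, the Python snippet below only builds the HTTP request with the standard library and never sends it. The base URL `http://localhost:9200` and the helper name `build_search_request` are assumptions made for illustration; neither appears in these notes.

```python
import json
import urllib.request

# Assumption: a node reachable at this address; the notes do not give a host or port.
BASE_URL = "http://localhost:9200"

def build_search_request(index, doc_type, body=None):
    """Build (but do not send) a _search request for the given index and type."""
    url = f"{BASE_URL}/{index}/{doc_type}/_search"
    data = None
    headers = {}
    if body is not None:
        # The query DSL travels as a JSON document in the request body.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "GET" if data is None else "POST"
    return urllib.request.Request(url, data=data, headers=headers, method=method)

req = build_search_request("megacorp", "employee")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would perform the search against a running node.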

Using the Elasticsearch query DSL

Some commonly used DSL statements

List all indices

GET /_cat/indices?v

Create an index

PUT /bookdb_index
{
  "settings": {"number_of_shards": 1}
}

Bulk-upload documents

POST /bookdb_index/book/_bulk
    { "index": { "_id": 1 }}
    { "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"], "summary" : "A distributed real-time search and analytics engine", "publish_date" : "2015-02-07", "num_reviews": 20, "publisher": "oreilly" }
    { "index": { "_id": 2 }}
    { "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": ["grant ingersoll", "thomas morton", "drew farris"], "summary" : "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "publish_date" : "2013-01-24", "num_reviews": 12, "publisher": "manning" }
    { "index": { "_id": 3 }}
    { "title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"], "summary" : "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date" : "2015-12-03", "num_reviews": 18, "publisher": "manning" }
    { "index": { "_id": 4 }}
    { "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"], "summary" : "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date" : "2014-04-05", "num_reviews": 23, "publisher": "manning" }
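
The bulk body above is newline-delimited JSON: one action line followed by one source line per document. A minimal Python sketch of how such a body could be assembled is shown here; the helper name `build_bulk_body` and the shortened sample documents are illustrative assumptions.

```python
import json

def build_bulk_body(docs):
    """Return an NDJSON bulk body: an action line, then a source line, per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API expects the final line to end with a newline as well.
    return "\n".join(lines) + "\n"

# Shortened versions of two of the sample documents.
docs = [
    (1, {"title": "Elasticsearch: The Definitive Guide", "publisher": "oreilly"}),
    (4, {"title": "Solr in Action", "publisher": "manning"}),
]
bulk_body = build_bulk_body(docs)
print(bulk_body, end="")
```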

Basic query (a): find records whose title matches "in action"

GET /bookdb_index/book/_search
{
    "query": {
       "multi_match": {
         "query": "in action",
         "fields": ["title"]
       }
    }
}

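Basic query (b): choosing the returned fields, highlighting, and so on

In the example below, size limits the number of results returned, from sets the starting offset, _source selects the fields to return, and highlight turns on highlighting of the matched text.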

POST /bookdb_index/book/_search
{
    "query": {
        "match" : {
            "title" : "in action"
        }
    },
    "size": 2,
    "from": 0,
    "_source": [ "title", "summary", "publish_date" ],
    "highlight": {
        "fields" : {
            "title" : {}
        }
    }
}

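For multi-word queries, match lets you use the and operator in place of the default or operator. You can also set the minimum_should_match option to tune how relevant the returned results must be. See the later examples for details.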
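
To make the two options concrete, here is a small Python sketch that builds the request body for a match query with an explicit operator and an optional minimum_should_match. The helper name `match_query` is an assumption for illustration; `operator` and `minimum_should_match` are the match query's own parameters.

```python
import json

def match_query(field, text, operator="or", minimum_should_match=None):
    """Build a search body for a match query on one field."""
    clause = {"query": text, "operator": operator}
    if minimum_should_match is not None:
        clause["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {field: clause}}}

# Require both words to appear in the title.
strict_body = match_query("title", "in action", operator="and")
# Keep the default "or", but require at least two of the three words.
loose_body = match_query("summary", "scalable search engine", minimum_should_match=2)

print(json.dumps(strict_body, indent=2))
print(json.dumps(loose_body, indent=2))
```

Either dictionary can be sent as the JSON body of a `POST /bookdb_index/book/_search` request.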

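Boosting

Because we are searching across multiple fields, we may want to boost the score of one particular field. In the example below, we boost the score of the summary field by a factor of 3 to increase its importance, which raises the relevance of document 4.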

POST /bookdb_index/book/_search
{
    "query": {
        "multi_match" : {
            "query" : "elasticsearch guide",
            "fields": ["title", "summary^3"]
        }
    },
    "_source": ["title", "summary", "publish_date"]
}

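Bool query

To get more relevant or more specific results, the AND/OR/NOT operators can be used to refine a query. This is implemented as a bool query, which accepts the following parameters:

a. must is equivalent to AND
b. must_not is equivalent to NOT
c. should is equivalent to OR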

POST /bookdb_index/book/_search
{
    "query": {
        "bool": {
            "must": [
                { "bool" : { "should": [
                      { "match": { "title": "Elasticsearch" }},
                      { "match": { "title": "Solr" }} ] } },
                { "match": { "authors": "clinton gormley" }}
            ],
            "must_not": { "match": {"authors": "radu gheorge" }}
        }
    }
}
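
The AND/OR/NOT logic of this query can be traced in plain Python against three of the sample documents. This is only an emulation under simplifying assumptions: the `tokens` helper is a rough stand-in for an analyzer, no relevance scoring is done, and the function names are hypothetical.

```python
import re

books = {
    1: {"title": "Elasticsearch: The Definitive Guide",
        "authors": ["clinton gormley", "zachary tong"]},
    3: {"title": "Elasticsearch in Action",
        "authors": ["radu gheorge", "matthew lee hinman", "roy russo"]},
    4: {"title": "Solr in Action",
        "authors": ["trey grainger", "timothy potter"]},
}

def tokens(value):
    """Lowercase and split on non-alphanumerics (a rough stand-in for an analyzer)."""
    text = " ".join(value) if isinstance(value, list) else value
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def match(doc, field, query):
    """True if any query token occurs in the field (the default OR behaviour)."""
    return bool(tokens(query) & tokens(doc[field]))

def bool_query(doc):
    must = [
        # Nested should: either title may match (OR).
        match(doc, "title", "Elasticsearch") or match(doc, "title", "Solr"),
        match(doc, "authors", "clinton gormley"),
    ]
    must_not = [match(doc, "authors", "radu gheorge")]
    # Every must clause has to hold (AND) and no must_not clause may hold (NOT).
    return all(must) and not any(must_not)

hits = [doc_id for doc_id, doc in books.items() if bool_query(doc)]
print(hits)
```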

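Fuzzy query

When running match and multi_match queries, fuzzy matching can be enabled to catch spelling mistakes. Fuzziness is specified as an edit distance from the original word.

Note: for terms longer than 5 characters, a fuzziness of AUTO is equivalent to specifying 2. However, 80% of misspellings have an edit distance of 1, so setting fuzziness to 1 may improve overall search performance.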

POST /bookdb_index/book/_search
{
    "query": {
        "multi_match" : {
            "query" : "comprihensiv guide",
            "fields": ["title", "summary"],
            "fuzziness": "AUTO"
        }
    },
    "_source": ["title", "summary", "publish_date"],
    "size": 1
}
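
To see why "comprihensiv" can still find "comprehensive", the sketch below computes a plain Levenshtein distance and the edit budget that AUTO allows for a term of a given length. This is a simplification: Elasticsearch by default also counts a swap of two adjacent characters as a single edit, which this version does not. The function names are illustrative.

```python
def auto_fuzziness(term):
    """Edits allowed by AUTO with its default length thresholds."""
    length = len(term)
    if length <= 2:
        return 0   # very short terms must match exactly
    if length <= 5:
        return 1
    return 2

def edit_distance(a, b):
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete from a
                           cur[j - 1] + 1,       # insert into a
                           prev[j - 1] + cost))  # substitute or keep
        prev = cur
    return prev[-1]

typed, indexed = "comprihensiv", "comprehensive"
distance = edit_distance(typed, indexed)
print(distance, auto_fuzziness(typed), distance <= auto_fuzziness(typed))
```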

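Wildcard query

A wildcard query lets you specify a pattern to match instead of a whole term.

? matches any single character
* matches zero or more characters

For example, to find all records with an author whose name starts with the letter 't':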

POST /bookdb_index/book/_search
{
    "query": {
        "wildcard" : {
            "authors" : "t*"
        }
    },
    "_source": ["title", "authors"],
    "highlight": {
        "fields" : {
            "authors" : {}
        }
    }
}
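
The two wildcard characters can be tried out locally with Python's `fnmatch` module, which happens to use the same `?` and `*` meanings (it also understands `[...]` sets, which is beyond what is described here). The term list assumes the authors field is analyzed into separate lowercase words; both that assumption and the helper name are illustrative.

```python
from fnmatch import fnmatchcase

# Assumption: each author name is split into lowercase terms at index time.
author_terms = ["clinton", "gormley", "zachary", "tong", "thomas", "morton",
                "trey", "grainger", "timothy", "potter"]

def wildcard_terms(pattern, terms):
    """Return the terms that the ?/* pattern matches in full."""
    return [term for term in terms if fnmatchcase(term, pattern)]

starts_with_t = wildcard_terms("t*", author_terms)
t_two_any_g = wildcard_terms("t??g", author_terms)
print(starts_with_t)
print(t_two_any_g)
```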

Regexp query

A regexp query lets you search with more complex patterns than a wildcard query allows:

POST /bookdb_index/book/_search
{
    "query": {
        "regexp" : {
            "authors" : "t[a-z]*y"
        }
    },
    "_source": ["title", "authors"],
    "highlight": {
        "fields" : {
            "authors" : {}
        }
    }
}
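
The pattern from this example can be checked against individual author terms with Python's `re` module. Two caveats: the regexp query uses Lucene's regular expression syntax, which differs from Python's in places, and the pattern has to match a whole term, which is why `fullmatch` is used. The term list and helper name are assumptions for illustration.

```python
import re

# Assumption: each author name is split into lowercase terms at index time.
author_terms = ["clinton", "gormley", "zachary", "tong", "thomas", "morton",
                "trey", "grainger", "timothy", "potter"]

def regexp_terms(pattern, terms):
    """Return the terms that the pattern matches from start to end."""
    compiled = re.compile(pattern)
    return [term for term in terms if compiled.fullmatch(term)]

regexp_hits = regexp_terms("t[a-z]*y", author_terms)
print(regexp_hits)
```

Unlike the wildcard `t*`, this pattern keeps only the terms that also end in "y".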

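Match phrase query

A match phrase query requires every term in the query string to be present in the document, in the same order as in the query string, and adjacent to one another. By default the terms must be directly next to each other, but a slop value can specify how far apart the terms may be while still counting as a match.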

POST /bookdb_index/book/_search
{
    "query": {
        "multi_match" : {
            "query": "search engine",
            "fields": ["title", "summary"],
            "type": "phrase",
            "slop": 3
        }
    },
    "_source": [ "title", "summary", "publish_date" ]
}
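
The effect of slop on a two-word phrase can be approximated in a few lines of Python. This sketch only handles terms that appear in query order and treats slop as the number of tokens allowed in between; Elasticsearch's real matching also allows reordering at a higher slop cost and does scoring, none of which is modelled here. The helper names are hypothetical.

```python
import re

def positions(text):
    """Map each lowercase token to the positions where it occurs."""
    index = {}
    for pos, token in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        index.setdefault(token, []).append(pos)
    return index

def phrase_match(text, first, second, slop=0):
    """True if `second` follows `first` with at most `slop` tokens in between."""
    index = positions(text)
    return any(
        0 <= b - a - 1 <= slop
        for a in index.get(first, [])
        for b in index.get(second, [])
    )

summary_1 = "A distributed real-time search and analytics engine"
summary_4 = ("Comprehensive guide to implementing a scalable search engine "
             "using Apache Solr")

exact_1 = phrase_match(summary_1, "search", "engine", slop=0)
sloppy_1 = phrase_match(summary_1, "search", "engine", slop=3)
exact_4 = phrase_match(summary_4, "search", "engine", slop=0)
print(exact_1, sloppy_1, exact_4)
```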

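Match phrase prefix query

A match phrase prefix query provides search-as-you-type matching, in other words a basic query-time autocomplete, without preparing your data in any way. Like the match_phrase query, it accepts a slop parameter (to relax word order and relative positions) and a max_expansions parameter (to limit the number of terms the prefix expands to, which reduces resource use).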

POST /bookdb_index/book/_search
{
    "query": {
        "match_phrase_prefix" : {
            "summary": {
                "query": "search en",
                "slop": 3,
                "max_expansions": 10
            }
        }
    },
    "_source": [ "title", "summary", "publish_date" ]
}

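Note: query-time search-as-you-type carries a significant performance cost. Index-time search-as-you-type is a better solution. For more information, see the Completion Suggester API or the use of edge n-gram filters.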

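Query string

The query_string query offers a concise shorthand syntax for running multi_match, bool, boosting, fuzzy, wildcard, regexp, and range queries. In the example below, we run a fuzzy search for the terms "search algorithm" in books where one of the authors is "grant ingersoll" or "tom morton", searching all fields but boosting summary by a factor of 2.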

POST /bookdb_index/book/_search
{
    "query": {
        "query_string" : {
            "query": "(saerch~1 algorithm~1) AND (grant ingersoll)  OR (tom morton)",
            "fields": ["_all", "summary^2"]
        }
    },
    "_source": [ "title", "summary", "authors" ],
    "highlight": {
        "fields" : {
            "summary" : {}
        }
    }
}

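Simple query string

The simple_query_string query is a version of the query_string query that is better suited to exposing a single search box to users: it replaces AND/OR/NOT with +/|/- respectively, and it discards invalid parts of the query instead of throwing an exception when the user makes a mistake.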

POST /bookdb_index/book/_search
{
    "query": {
        "simple_query_string" : {
            "query": "(saerch~1 algorithm~1) + (grant ingersoll)  | (tom morton)",
            "fields": ["_all", "summary^2"]
        }
    },
    "_source": [ "title", "summary", "authors" ],
    "highlight": {
        "fields" : {
            "summary" : {}
        }
    }
}

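Term / terms query

All the examples above are full-text searches. Sometimes we care more about structured search, where we want exact matches returned. The term and terms queries help here. In the example below, we find all books in the index published by Manning.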

POST /bookdb_index/book/_search
{
    "query": {
        "term" : {
            "publisher": "manning"
        }
    },
    "_source" : ["title","publish_date","publisher"]
}

Term query - sorted
Term query results (like any other query results) can easily be sorted, and multi-level sorting is allowed:

POST /bookdb_index/book/_search
{
    "query": {
        "term" : {
            "publisher": "manning"
        }
    },
    "_source" : ["title","publish_date","publisher"],
    "sort": [
        { "publish_date": {"order":"desc"}},
        { "title": { "order": "desc" }}
    ]
}
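
The same exact match and two-level ordering can be reproduced on the sample data in plain Python, which may help when checking what the query should return. This is a local emulation, not how Elasticsearch sorts internally; the variable names are illustrative.

```python
books = [
    {"title": "Elasticsearch: The Definitive Guide",
     "publish_date": "2015-02-07", "publisher": "oreilly"},
    {"title": "Taming Text: How to Find, Organize, and Manipulate It",
     "publish_date": "2013-01-24", "publisher": "manning"},
    {"title": "Elasticsearch in Action",
     "publish_date": "2015-12-03", "publisher": "manning"},
    {"title": "Solr in Action",
     "publish_date": "2014-04-05", "publisher": "manning"},
]

# A term query compares the stored term exactly, so "Manning" would not match here.
manning = [book for book in books if book["publisher"] == "manning"]

# Sort by publish_date descending, then by title descending as a tie-breaker.
# ISO dates (YYYY-MM-DD) sort correctly as plain strings.
ordered = sorted(manning,
                 key=lambda book: (book["publish_date"], book["title"]),
                 reverse=True)
titles = [book["title"] for book in ordered]
print(titles)
```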

Range query

Another example of a structured query is the range query. In this example, we look for books published in 2015.

POST /bookdb_index/book/_search
{
    "query": {
        "range" : {
            "publish_date": {
                "gte": "2015-01-01",
                "lte": "2015-12-31"
            }
        }
    },
    "_source" : ["title","publish_date","publisher"]
}
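
For comparison, the inclusive bounds gte and lte behave like the following check on the sample publish dates. This is a plain-Python illustration with made-up helper names, not Elasticsearch's date handling.

```python
from datetime import date

# Publish dates of the four sample documents, keyed by _id.
publish_dates = {
    1: date(2015, 2, 7),
    2: date(2013, 1, 24),
    3: date(2015, 12, 3),
    4: date(2014, 4, 5),
}

def in_range(value, gte, lte):
    """Inclusive on both ends, like gte/lte in a range query."""
    return gte <= value <= lte

published_2015 = [
    doc_id for doc_id, published in publish_dates.items()
    if in_range(published, date(2015, 1, 1), date(2015, 12, 31))
]
print(published_2015)
```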

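Filtered query

A filtered query lets you filter the results of a query. In our example, we search for books whose title or summary contains the term Elasticsearch, but we only want those with 20 or more reviews.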

POST /bookdb_index/book/_search
{
    "query": {
        "bool": {
            "must" : {
                "multi_match": {
                    "query": "elasticsearch",
                    "fields": ["title","summary"]
                }
            },
            "filter": {
                "range" : {
                    "num_reviews": {
                        "gte": 20
                    }
                }
            }
        }
    },
    "_source" : ["title","summary","publisher", "num_reviews"]
}
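
The combination of a text match and a numeric filter can be walked through on the sample documents as follows. This is a simplified emulation: tokenization is a rough stand-in for an analyzer, and relevance scoring is left out entirely. The summaries are shortened and the helper names are hypothetical.

```python
import re

books = {
    1: {"title": "Elasticsearch: The Definitive Guide",
        "summary": "A distributed real-time search and analytics engine",
        "num_reviews": 20},
    2: {"title": "Taming Text: How to Find, Organize, and Manipulate It",
        "summary": "organize text using approaches such as full-text search",
        "num_reviews": 12},
    3: {"title": "Elasticsearch in Action",
        "summary": "build scalable search applications using Elasticsearch",
        "num_reviews": 18},
    4: {"title": "Solr in Action",
        "summary": "Comprehensive guide to implementing a scalable search engine",
        "num_reviews": 23},
}

def contains_term(book, term, fields=("title", "summary")):
    """True if the lowercase term is a token of any of the given fields."""
    return any(term.lower() in re.findall(r"[a-z0-9]+", book[field].lower())
               for field in fields)

def search(term, min_reviews=None):
    """Apply the text match (must), then the numeric condition (filter)."""
    hits = [doc_id for doc_id, book in books.items() if contains_term(book, term)]
    if min_reviews is not None:
        hits = [doc_id for doc_id in hits if books[doc_id]["num_reviews"] >= min_reviews]
    return hits

unfiltered = search("elasticsearch")
filtered = search("elasticsearch", min_reviews=20)
print(unfiltered, filtered)
```

In Elasticsearch itself the filter clause only includes or excludes documents and does not contribute to the relevance score.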

A First Look at Elasticsearch