The difference between term and match in ES (Elasticsearch)

`term` and `match`: a summary

In real projects, term and match are the two most commonly used queries, yet it is easy to mix them up. This post summarizes how they differ.

Using `term`

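First, the definition. term means an exact match, i.e. a precise query: the search value is not analyzed (split into tokens) before the search. It is compared as-is against the tokens stored in the index.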
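To make "the search value is not analyzed" concrete, here is a toy in-memory model in Python. It is not Elasticsearch's implementation. The analyzer is an assumed simplification of the default standard analyzer (lowercase, then split on non-alphanumeric characters). The names `analyze`, `index` and `term_query` are hypothetical. The two titles are the ones used in the example that follows.

```python
import re

# Simplified stand-in for the default standard analyzer (an assumption):
# lowercase the text, then keep runs of ASCII letters/digits as tokens.
def analyze(text):
    return re.findall(r"[a-z0-9]+", text.lower())

docs = {
    "7": {"title": "love China"},
    "8": {"title": "love HuBei"},
}

# Inverted index for the title field: token -> set of doc ids.
# Text is analyzed here, at index time.
index = {}
for doc_id, doc in docs.items():
    for token in analyze(doc["title"]):
        index.setdefault(token, set()).add(doc_id)

def term_query(value):
    # term does NOT analyze its input: the raw value must equal an indexed token.
    return sorted(index.get(value, set()))

print(term_query("love"))        # ['7', '8']
print(term_query("love China"))  # []  (no single token "love China" exists)
print(term_query("China"))       # []  (indexed token is lowercase "china")
print(term_query("china"))       # ['7']
```

The asymmetry is the point of the sketch. Documents are analyzed when they are indexed, but a term value is not, so the value must already look like an indexed token.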

An example makes this clearer. First, store some data:

{
    "title": "love China",
    "content": "people very love China",
    "tags": ["China", "love"]
}
{
    "title": "love HuBei",
    "content": "people very love HuBei",
    "tags": ["HuBei", "love"]
}

Now query with term:

{
  "query": {
    "term": {
      "title": "love"
    }
  }
}

The result: both documents above are returned:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "8",
        "_score": 0.6931472,
        "_source": {
          "title": "love HuBei",
          "content": "people very love HuBei",
          "tags": ["HuBei","love"]
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 0.6931472,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": ["China","love"]
        }
      }
    ]
  }
}

Every document whose title contains the keyword love is returned. But I only want an exact match on love China. Let's see whether the following query finds it:

{
  "query": {
    "term": {
      "title": "love China"
    }
  }
}

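Running it returns no data. The value "love China" is not analyzed, so term looks for a single indexed token "love China". No such token exists, because the title was split into separate words at index time. In practice, one term clause matches exactly one token. What if I want term-style matching on several words? Use terms: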

{
  "query": {
    "terms": {
      "title": ["love", "China"]
    }
  }
}

The result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "8",
        "_score": 0.6931472,
        "_source": {
          "title": "love HuBei",
          "content": "people very love HuBei",
          "tags": ["HuBei","love"]
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 0.6931472,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": ["China","love"]
        }
      }
    ]
  }
}

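Both documents are returned. Why? The values in the terms array are combined with OR, so a document matches if it contains any one of them. To require both words at the same time, use bool with must: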

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "title": "love"
          }
        },
        {
          "term": {
            "title": "china"
          }
        }
      ]
    }
  }
}
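The OR-versus-AND difference can be modelled as set union versus set intersection over an inverted index. The following toy Python sketch is not how Elasticsearch executes or scores queries. The analyzer is an assumed simplification of the standard analyzer, and `terms` and `bool_must` are hypothetical helper names.

```python
import re

def analyze(text):
    # Simplified standard-analyzer stand-in (assumption): lowercase + split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

titles = {"7": "love China", "8": "love HuBei"}
index = {}  # token -> set of doc ids
for doc_id, title in titles.items():
    for token in analyze(title):
        index.setdefault(token, set()).add(doc_id)

def term(value):
    # Exact lookup of the unanalyzed value.
    return index.get(value, set())

def terms(values):
    # terms: a doc matches if ANY value matches (union = OR).
    result = set()
    for v in values:
        result |= term(v)
    return sorted(result)

def bool_must(values):
    # bool.must with one term clause per value: EVERY clause must match (intersection = AND).
    result = None
    for v in values:
        hits = term(v)
        result = hits if result is None else result & hits
    return sorted(result or set())

print(terms(["love", "China"]))      # ['7', '8']  ("love" alone matches both)
print(bool_must(["love", "china"]))  # ['7']
print(bool_must(["love", "China"]))  # []  ("China" is not an indexed token)
```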

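Note that we used lowercase china above. If we search with uppercase China instead, nothing is found. Why? The title field was analyzed when the documents were stored, using the default analyzer (the standard analyzer). Let's look at how it analyzes the text: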

Analyzers

GET test/_analyze
{
  "text" : "love China"
}

The result:

{
  "tokens": [
    {
      "token": "love",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "china",
      "start_offset": 5,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

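The text is analyzed into two tokens, love and china. term matches these tokens exactly as stored, without any transformation of the search value, so a query for China (uppercase) fails. A later section covers analyzers in detail.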
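To see where the token, offset and position fields in the _analyze output come from, here is a rough Python imitation. It is an assumed simplification and not the real standard tokenizer: it only splits on ASCII letter/digit runs and always reports the type `<ALPHANUM>`, whereas the real tokenizer follows Unicode word-break rules and also emits other types such as `<NUM>`.

```python
import re

def analyze(text):
    # Toy imitation of the standard analyzer: find alphanumeric runs, lowercase them,
    # and record character offsets and token positions like the _analyze output.
    tokens = []
    for position, m in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        tokens.append({
            "token": m.group().lower(),   # lowercase filter
            "start_offset": m.start(),    # offsets refer to the ORIGINAL text
            "end_offset": m.end(),
            "type": "<ALPHANUM>",         # simplification: always ALPHANUM here
            "position": position,         # token order, used later by phrase queries
        })
    return tokens

for t in analyze("love China"):
    print(t)
```

For "love China" this yields love (0 to 4, position 0) and china (5 to 10, position 1), the same tokens the _analyze call shows. The offsets point at "China" in the original text, but the stored token is lowercase, which is why term with China misses.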

Using `match`

First, match against love China.

GET test/doc/_search
{
  "query": {
    "match": {
      "title": "love China"
    }
  }
}

The result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.3862944,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 1.3862944,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": [
            "China",
            "love"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "8",
        "_score": 0.6931472,
        "_source": {
          "title": "love HuBei",
          "content": "people very love HuBei",
          "tags": [
            "HuBei",
            "love"
          ]
        }
      }
    ]
  }
}

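Both documents are returned. Why? match analyzes the search text first and then matches the resulting tokens. The title tokens are love, china for one document and love, hubei for the other. The query love China is analyzed into love and china, and these tokens are combined with OR, so a document matches as long as it contains any one of them. The document with both tokens (id 7) gets the higher score. What if love and China must both appear, adjacent and in that order? Use match_phrase.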
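The difference from term is that match runs the query text through the same analyzer before looking tokens up. The following is a toy Python sketch with an assumed simplified analyzer, and it ignores scoring entirely. The `operator` argument mirrors the match query's `operator` option (default `or`, alternative `and`). The helper names are hypothetical.

```python
import re

def analyze(text):
    # Simplified standard-analyzer stand-in (assumption): lowercase + split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

titles = {"7": "love China", "8": "love HuBei"}
index = {}  # token -> set of doc ids
for doc_id, title in titles.items():
    for token in analyze(title):
        index.setdefault(token, set()).add(doc_id)

def match(query, operator="or"):
    # match analyzes the query text first ("love China" -> ["love", "china"]),
    # then looks up each token. "or": any token may match; "and": all must match.
    hit_sets = [index.get(tok, set()) for tok in analyze(query)]
    if not hit_sets:
        return []
    if operator == "or":
        combined = set.union(*hit_sets)
    else:
        combined = set.intersection(*hit_sets)
    return sorted(combined)

print(match("love China"))                  # ['7', '8']
print(match("love China", operator="and"))  # ['7']
```

With `operator="and"` both tokens must be present, but unlike match_phrase their order and adjacency are not checked.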

Using `match_phrase`

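match_phrase is a phrase search: all of the analyzed tokens must appear in the document, and their positions must be adjacent and in the same order as in the query.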

GET test/doc/_search
{
  "query": {
    "match_phrase": {
      "title": "love china"
    }
  }
}

The result:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.3862944,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 1.3862944,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": [
            "China",
            "love"
          ]
        }
      }
    ]
  }
}

This time the result meets our requirement: only one document is returned.
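The adjacency requirement is why the token positions in the _analyze output matter. Here is a toy Python sketch of a phrase match over a positional inverted index. It assumes a simplified analyzer, models only the default case of zero slop, and is not Elasticsearch's implementation. `match_phrase` here is a hypothetical helper, not the client API.

```python
import re

def analyze(text):
    # Simplified standard-analyzer stand-in (assumption): lowercase + split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

titles = {"7": "love China", "8": "love HuBei"}

# Positional inverted index: token -> {doc_id: [positions]}.
index = {}
for doc_id, title in titles.items():
    for pos, token in enumerate(analyze(title)):
        index.setdefault(token, {}).setdefault(doc_id, []).append(pos)

def match_phrase(query):
    # Every query token must occur in the doc at consecutive positions,
    # in query order (slop 0 only).
    tokens = analyze(query)
    if not tokens:
        return []
    hits = []
    for doc_id, starts in index.get(tokens[0], {}).items():
        for start in starts:
            if all(start + i in index.get(tok, {}).get(doc_id, [])
                   for i, tok in enumerate(tokens)):
                hits.append(doc_id)
                break
    return sorted(hits)

print(match_phrase("love china"))  # ['7']
print(match_phrase("China love"))  # []  (right words, wrong order)
print(match_phrase("love"))        # ['7', '8']
```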
