A Brief Introduction to Basic Elasticsearch Concepts

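Neil Zhu (Jianshu ID: Not_GOD) is the founder and Chief Scientist of University AI (UAI), working to advance the adoption of artificial intelligence worldwide. He sets and carries out UAI's medium- and long-term growth strategy and goals, and has led the team as it grew quickly into a highly specialized force in the AI field.
As an industry leader, he and UAI founded TASA (the earliest AI society in China), DL Center (a global value network for deep learning knowledge), and AI growth (industry think-tank training) in 2014, helping to build up China's AI talent pool. He has also taken part in or organized a range of international AI summits and events with considerable impact, has written some 600,000 characters of technical content on AI, and produced the translation of 《神经网络与深度学习》 (Neural Networks and Deep Learning), billed as the world's first introductory book on deep learning. His content has been widely reposted and serialized by specialist WeChat public accounts and media outlets. He has been invited to design AI study plans and teach courses on frontier AI topics at leading Chinese universities, and these were well received by students and faculty.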

Glossary of terms

analysis

Analysis is the process of converting full text to terms. Depending on which analyzer is used, the phrases FOO BAR, Foo-Bar, and foo,bar will probably all result in the terms foo and bar. These terms are what is actually stored in the index. A full text query (not a term query) for FoO:bAR will also be analyzed to the terms foo and bar, and will thus match the terms stored in the index. It is this process of analysis, at both index time and search time, that allows Elasticsearch to perform full text queries. See also text and term.
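To make the idea of analysis concrete, here is a minimal sketch of a toy analyzer. It is not one of Elasticsearch's analyzers: the `analyze` function and its rule (lowercase, then split on anything that is not a letter or digit) are assumptions chosen only to reproduce the behaviour described above.

```python
import re

def analyze(text):
    """Toy analyzer: lowercase the text, then split on non-alphanumeric characters.

    This is an illustrative stand-in, not Elasticsearch's actual analysis chain.
    """
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]

# All four phrases from the glossary entry reduce to the same two terms.
for phrase in ["FOO BAR", "Foo-Bar", "foo,bar", "FoO:bAR"]:
    print(phrase, "->", analyze(phrase))
```

Because the same function is applied to both the indexed text and the query text, the query FoO:bAR ends up looking for exactly the terms that were stored.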

cluster

A cluster consists of one or more nodes that share the same cluster name. Each cluster has a single master node, which is chosen automatically by the cluster and is replaced by another node if the current master fails.

document

A document is a JSON document stored in Elasticsearch. It is like a row in a table in a relational database. Each document is stored in an index and has a type and an ID. A document is a JSON object (known in other languages as a hash, hashmap, or associative array) that contains zero or more fields, or key-value pairs. The original JSON document that is indexed is stored in the _source field, which is returned by default when getting or searching for a document.

id

The ID of a document identifies that document. The index/type/id combination of a document must be unique. If no ID is provided, one is auto-generated. See also routing.
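The uniqueness of index/type/id can be pictured with a small in-memory stand-in. This is only a sketch: the `store` dictionary, the `index_document` and `get_document` helpers, and the use of `uuid4` for generated IDs are all assumptions made for illustration, and Elasticsearch's own ID generation works differently.

```python
import uuid

# Toy in-memory stand-in for a cluster, keyed by (index, type, id).
store = {}

def index_document(index, doc_type, source, doc_id=None):
    """Store a document; generate an ID when none is supplied (hypothetical helper)."""
    if doc_id is None:
        doc_id = uuid.uuid4().hex  # stand-in only, not Elasticsearch's ID scheme
    store[(index, doc_type, doc_id)] = source
    return doc_id

def get_document(index, doc_type, doc_id):
    """Return the stored document wrapped with its identifying metadata."""
    return {
        "_index": index,
        "_type": doc_type,
        "_id": doc_id,
        "_source": store[(index, doc_type, doc_id)],
    }

explicit_id = index_document("blog", "post", {"title": "Hello"}, doc_id="1")
generated_id = index_document("blog", "post", {"title": "No ID given"})
# Same index/type/id: the key is unique, so this replaces the first document.
index_document("blog", "post", {"title": "Hello again"}, doc_id="1")

print(get_document("blog", "post", "1")["_source"])
print(len(store))
```

The third call does not create a new entry, because the (index, type, id) key already exists in this toy store.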

field

A document contains a list of fields, or key-value pairs. The value can be a simple (scalar) value (e.g. a string, integer, or date), or a nested structure such as an array or an object. A field is similar to a column in a table in a relational database. The mapping for each field has a field type (not to be confused with the document type), which indicates the type of data that can be stored in that field, e.g. integer, string, or object. The mapping also allows you to define, among other things, how the value of a field should be analyzed.

index

An index is like a database in a relational database. It has a mapping, which defines multiple types. An index is a logical namespace that maps to one or more primary shards and can have zero or more replica shards.
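As a rough illustration of how an index relates to its shards, the sketch below builds a settings body as a plain Python dictionary and counts the shard copies it implies. The values 5 and 1 are the defaults quoted in the primary shard and replica shard entries; the `total_shard_copies` helper is hypothetical, and the exact request used to apply such settings depends on the client and version in use.

```python
import json

# Settings body for a hypothetical index, using the defaults quoted in this glossary.
index_settings = {
    "settings": {
        "number_of_shards": 5,     # primary shards; fixed once the index is created
        "number_of_replicas": 1,   # replicas per primary; can be changed later
    }
}

def total_shard_copies(body):
    """Each primary shard exists once, plus once per configured replica."""
    settings = body["settings"]
    return settings["number_of_shards"] * (1 + settings["number_of_replicas"])

print(json.dumps(index_settings, indent=2))
print(total_shard_copies(index_settings))  # 5 primaries + 5 replicas = 10
```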

mapping

A mapping is like a schema definition in a relational database. Each index has a mapping, which defines each type within the index, plus a number of index-wide settings. A mapping can either be defined explicitly, or it will be generated automatically when a document is indexed.
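An explicit mapping might look roughly like the following, written here as a Python dictionary. This is a sketch under assumptions: it follows the type-based layout this glossary describes, the type name `tweet` and all field names are invented, and the `field_types` helper is hypothetical. Check the layout against the Elasticsearch version actually in use before relying on it.

```python
# Hypothetical explicit mapping for one type ("tweet") in the type-based layout.
mapping = {
    "mappings": {
        "tweet": {
            "properties": {
                "user": {"type": "string", "index": "not_analyzed"},    # kept as one exact term
                "message": {"type": "string", "analyzer": "standard"},  # analyzed as full text
                "retweets": {"type": "integer"},
                "posted_at": {"type": "date"},
            }
        }
    }
}

def field_types(mapping_body, type_name):
    """Return {field name: field type} for one document type (hypothetical helper)."""
    properties = mapping_body["mappings"][type_name]["properties"]
    return {name: spec["type"] for name, spec in properties.items()}

print(field_types(mapping, "tweet"))
```

The field type (`string`, `integer`, `date`) belongs to each field, while `tweet` is the document type, which is the distinction the field entry warns about.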

node

A node is a running instance of Elasticsearch that belongs to a cluster. Multiple nodes can be started on a single server for testing purposes, but usually you should have one node per server. At startup, a node uses unicast (or multicast, if specified) to discover an existing cluster with the same cluster name and tries to join that cluster.

primary shard

Each document is stored in a single primary shard. When you index a document, it is indexed first on the primary shard, then on all replicas of that primary shard. By default, an index has 5 primary shards. You can specify fewer or more primary shards to scale the number of documents that your index can handle. Once the index is created, you cannot change its number of primary shards. See also routing.

replica shard

Each primary shard can have zero or more replicas. A replica is a copy of the primary shard, and has two purposes:

  • improve failover: a replica shard can be promoted to a primary shard if the primary fails
  • improve performance: get and search requests can be handled by primary or replica shards

By default, each primary shard has one replica, but the number of replicas can be changed dynamically on an existing index. A replica shard will never be started on the same node as its primary shard.

routing

When you index a document, it is stored on a single primary shard. That shard is chosen by hashing the routing value. By default, the routing value is derived from the ID of the document or, if the document has a specified parent document, from the ID of the parent document (to ensure that child and parent documents are stored on the same shard). This value can be overridden by specifying a routing value at index time, or a routing field in the mapping.
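The routing rule can be sketched in a few lines. Two things here are assumptions made for illustration: `zlib.crc32` stands in for whatever hash function Elasticsearch really uses, and the hash is reduced to a shard number by taking it modulo the number of primary shards. The helper names `routing_for` and `pick_shard` are hypothetical.

```python
import zlib

def routing_for(doc_id, parent_id=None, routing=None):
    """Pick the routing value: explicit routing, else parent ID, else the document's own ID."""
    if routing is not None:
        return routing
    if parent_id is not None:
        return parent_id
    return doc_id

def pick_shard(routing_value, number_of_primary_shards):
    """Hash the routing value onto one primary shard (crc32 is a stand-in hash)."""
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards

shards = 5
parent_shard = pick_shard(routing_for("parent-1"), shards)
child_shard = pick_shard(routing_for("child-7", parent_id="parent-1"), shards)

# The child is routed by its parent's ID, so both land on the same shard.
print(parent_shard == child_shard)
```

Under this scheme the shard number depends on the number of primary shards, which is one way to see why that number is fixed when the index is created: changing it would send existing routing values to different shards.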

shard

A shard is a single Lucene instance. It is a low-level "worker" unit that is managed automatically by Elasticsearch. An index is a logical namespace that points to primary and replica shards. Other than defining the number of primary and replica shards that an index should have, you never need to refer to shards directly. Instead, your code should deal only with an index. Elasticsearch distributes shards among all nodes in the cluster, and can move shards automatically from one node to another when a node fails or new nodes are added.
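The way shards are spread across nodes can be pictured with a deliberately naive round-robin placement. This is not Elasticsearch's allocation algorithm; the `allocate` function and the node names are invented, and the only property it is meant to show is that a replica is never placed on the same node as its primary.

```python
def allocate(nodes, number_of_primary_shards):
    """Toy round-robin placement with one replica per primary shard.

    Illustrative only: the real allocator weighs many more factors.
    """
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to keep a replica off its primary's node")
    placement = {}
    for shard in range(number_of_primary_shards):
        primary = nodes[shard % len(nodes)]
        replica = nodes[(shard + 1) % len(nodes)]  # always the next node, never the same one
        placement[shard] = {"primary": primary, "replica": replica}
    return placement

# Placement on two nodes, then recomputed after a third node joins.
for nodes in (["node-a", "node-b"], ["node-a", "node-b", "node-c"]):
    print(nodes)
    for shard, copies in allocate(nodes, 5).items():
        print("  shard", shard, copies)
```

Application code never performs this step itself; it only names the index, and the placement is handled behind it.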

source field

By default, the JSON document that you index is stored in the _source field and is returned by all get and search requests. This gives you access to the original object directly from search results, rather than requiring a second step to retrieve the object from an ID. Note: the exact JSON string that you indexed is returned to you, even if it contains invalid JSON. The contents of this field do not indicate anything about how the data in the object has been indexed.

term

A term is an exact value that is indexed in Elasticsearch. The terms foo, Foo, and FOO are NOT equivalent. Terms (i.e. exact values) can be searched for using term queries. See also text and analysis.

text

Text (or full text) is ordinary unstructured text, such as this paragraph. By default, text is analyzed into terms, which are what is actually stored in the index. Text fields need to be analyzed at index time in order to be searchable as full text, and keywords in full text queries must be analyzed at search time to produce (and search for) the same terms that were generated at index time. See also term and analysis.
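The difference between a term query and a full text query shows up clearly on a toy inverted index. Everything in this sketch is an assumption made for illustration: the `analyze` rule (lowercase and split on non-alphanumerics), the in-memory `inverted_index`, and the query helpers, which match a document if any of its terms matches.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: lowercase, then split on non-alphanumeric characters."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]

# term -> set of document IDs containing that term
inverted_index = defaultdict(set)

def index_text(doc_id, text):
    """Analyze the text at index time and store the resulting terms."""
    for term in analyze(text):
        inverted_index[term].add(doc_id)

def term_query(term):
    """Look up the exact term as given; no analysis is applied."""
    return set(inverted_index.get(term, set()))

def full_text_query(text):
    """Analyze the query text first, then match any of the resulting terms."""
    hits = set()
    for term in analyze(text):
        hits |= inverted_index.get(term, set())
    return hits

index_text(1, "Foo-Bar")
index_text(2, "quick brown fox")

print(term_query("Foo"))            # set(): the stored term is "foo", not "Foo"
print(term_query("foo"))            # {1}
print(full_text_query("FoO:bAR"))   # {1}
```

The term query for Foo finds nothing because only the analyzed form was stored, while the full text query succeeds because its input goes through the same analysis as the indexed text.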

type

A type is like a table in a relational database. Each type has a list of fields that can be specified for documents of that type. The mapping defines how each field in the document is analyzed.
