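Neil Zhu (Jianshu ID: Not_GOD) is the founder and Chief Scientist of University AI (UAI), working to advance the adoption of artificial intelligence worldwide. He sets and carries out UAI's medium- and long-term growth strategy and goals, and has led the team as it grew into a specialist force in the AI field.
In 2014, he and UAI founded TASA (China's earliest AI society), DL Center (a global value network for deep learning knowledge), and AI growth (industry think-tank training), contributing to the development of AI talent in China. He has also taken part in or organized international AI summits and events, written some 600,000 characters of technical content on AI, and produced the translation of the introductory deep learning book 《神经网络与深度学习》 (Neural Networks and Deep Learning). His content has been widely reposted and serialized by specialist public accounts and media outlets. He has been invited to design AI curricula and teach courses on frontier AI topics at leading Chinese universities, where they were well received by students and teachers.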
Glossary of terms
analysis
Analysis is the process of converting full text to terms. Depending on which analyzer is used, the phrases FOO BAR, Foo-Bar, and foo,bar will probably all result in the terms foo and bar. These terms are what is actually stored in the index. A full text query (not a term query) for FoO:bAR will also be analyzed to the terms foo and bar, and will thus match the terms stored in the index. It is this process of analysis (both at index time and at search time) that allows Elasticsearch to perform full text queries. See also text and term.
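As a rough illustration of the analysis step, the toy analyzer below lowercases text and splits it on anything that is not a letter or digit. The function name `analyze` and the regular expression are assumptions made for this sketch; real Elasticsearch analyzers are configurable and considerably more elaborate.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase the text and split on non-alphanumeric characters."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


# All four phrases from the glossary entry reduce to the same two terms.
for phrase in ["FOO BAR", "Foo-Bar", "foo,bar", "FoO:bAR"]:
    print(phrase, "->", analyze(phrase))
```

Because the same function runs on both the indexed text and the query text, differently cased or punctuated inputs end up as identical terms, which is what lets a full text query match.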
cluster
A cluster consists of one or more nodes that share the same cluster name. Each cluster has a single master node, which is chosen automatically by the cluster and which can be replaced if the current master node fails.
document
A document is a JSON document which is stored in Elasticsearch. It is like a row in a table in a relational database. Each document is stored in an index and has a type and an id. A document is a JSON object (also known in other languages as a hash, hashmap, or associative array) which contains zero or more fields, or key-value pairs. The original JSON document that is indexed is stored in the _source field, which is returned by default when getting or searching for a document.
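A sketch of how a document relates to its `_source`, using only Python's standard library. The index name `blog`, the type `post`, the ID `1`, and the field values are hypothetical, and the response layout only approximates the general shape of a get response in the typed Elasticsearch versions this glossary describes.

```python
import json

# The JSON document as a client would send it: zero or more fields (key-value pairs).
document = {"title": "Hello", "views": 3, "tags": ["intro", "es"]}

# Illustrative shape of a get response: metadata plus the original JSON under _source.
get_response = {
    "_index": "blog",   # hypothetical index name
    "_type": "post",    # hypothetical document type
    "_id": "1",         # hypothetical document ID
    "_source": document,
}

print(json.dumps(get_response, indent=2))
```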
id
The ID of a document identifies the document. The index/type/id combination of a document must be unique. If no ID is provided, one is generated automatically. See also routing.
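The uniqueness rule can be pictured with a small in-memory stand-in. The `store` dictionary and the `index_document` helper are hypothetical names for this sketch, and `uuid4` is only a placeholder for whatever ID generator Elasticsearch actually uses.

```python
import uuid

# Hypothetical in-memory stand-in: one entry per unique (index, type, id) triple.
store = {}


def index_document(index, doc_type, source, doc_id=None):
    """Store a document, generating an ID when the caller does not supply one."""
    if doc_id is None:
        doc_id = uuid.uuid4().hex  # placeholder generator, not Elasticsearch's own
    store[(index, doc_type, doc_id)] = source
    return doc_id


first = index_document("blog", "post", {"title": "v1"}, doc_id="1")
second = index_document("blog", "post", {"title": "v2"}, doc_id="1")  # same triple: replaces v1
generated = index_document("blog", "post", {"title": "untitled"})     # ID created automatically
print(len(store), generated)
```

Indexing twice under the same triple leaves a single entry, while omitting the ID always yields a new one.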
field
A document contains a list of fields, or key-value pairs. The value can be a simple (scalar) value (e.g. a string, integer, or date), or a nested structure like an array or an object. A field is similar to a column in a table in a relational database. The mapping for each field has a field type (not to be confused with document type) which indicates the type of data that can be stored in that field, e.g. integer, string, or object. The mapping also allows you to define (amongst other things) how the value for a field should be analyzed.
index
An index is like a database in a relational database system. It has a mapping which defines multiple types. An index is a logical namespace which maps to one or more primary shards and can have zero or more replica shards.
mapping
A mapping is like a schema definition in a relational database. Each index has a mapping, which defines each type within the index, plus a number of index-wide settings. A mapping can either be defined explicitly, or it will be generated automatically when a document is indexed.
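To make the idea of an explicit mapping concrete, here is a sketch of an index-creation body built as a Python dictionary. The type name `post` and all field names are hypothetical. The layout (a `settings` block plus per-type `properties`) follows the typed mapping style of the Elasticsearch versions this glossary describes, so treat it as an illustration rather than something to paste into a current release.

```python
import json

index_body = {
    # Index-wide settings.
    "settings": {"number_of_shards": 5, "number_of_replicas": 1},
    "mappings": {
        "post": {  # hypothetical document type
            "properties": {
                # Field type plus the analyzer applied to the field's value.
                "title": {"type": "string", "analyzer": "standard"},
                "views": {"type": "integer"},
                "published": {"type": "date"},
                # A nested structure is mapped as an object with its own properties.
                "author": {
                    "type": "object",
                    "properties": {"name": {"type": "string"}},
                },
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```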
node
A node is a running instance of Elasticsearch which belongs to a cluster. Multiple nodes can be started on a single server for testing purposes, but usually you should have one node per server. At startup, a node will use unicast (or multicast, if specified) to discover an existing cluster with the same cluster name and will try to join that cluster.
primary shard
Each document is stored in a single primary shard. When you index a document, it is indexed first on the primary shard, then on all replicas of the primary shard. By default, an index has 5 primary shards. You can specify fewer or more primary shards to scale the number of documents that your index can handle. You cannot change the number of primary shards in an index once the index is created. See also routing.
replica shard
Each primary shard can have zero or more replicas. A replica is a copy of the primary shard, and has two purposes:
increase failover: a replica shard can be promoted to a primary shard if the primary fails
increase performance: get and search requests can be handled by primary or replica shards
By default, each primary shard has one replica, but the number of replicas can be changed dynamically on an existing index. A replica shard will never be started on the same node as its primary shard.
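The rule that a replica never shares a node with its primary can be sketched with a toy placement function. The name `allocate`, the rotation scheme, and the node names are assumptions for illustration; this is not Elasticsearch's allocation algorithm.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Place shard copies on nodes; copies of the same shard never share a node."""
    placement = {}   # (shard_number, copy_number) -> node; copy 0 is the primary
    unassigned = []  # copies that could not be placed without breaking the rule
    for shard in range(num_primaries):
        used = set()
        for copy in range(num_replicas + 1):
            # Rotate the starting node so shards spread across the cluster.
            candidates = [nodes[(shard + copy + i) % len(nodes)] for i in range(len(nodes))]
            free = [node for node in candidates if node not in used]
            if free:
                placement[(shard, copy)] = free[0]
                used.add(free[0])
            else:
                unassigned.append((shard, copy))
    return placement, unassigned


placement, unassigned = allocate(5, 1, ["node-1", "node-2"])
print(placement)
print(unassigned)
```

With a single node, every replica ends up in `unassigned`, since the only available node already holds the primary.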
routing
When you index a document, it is stored on a single primary shard. That shard is chosen by hashing the routing value. By default, the routing value is derived from the ID of the document or, if the document has a specified parent document, from the ID of the parent document (to ensure that child and parent documents are stored on the same shard). This value can be overridden by specifying a routing value at index time, or a routing field in the mapping.
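A minimal sketch of routing, assuming the shard is picked as a hash of the routing value modulo the number of primary shards. The helper names are hypothetical, and `zlib.crc32` is only a stand-in hash chosen so the example is deterministic; it is not the hash function Elasticsearch uses.

```python
import zlib


def pick_shard(routing_value, num_primary_shards):
    """Map a routing value to a primary shard number."""
    # crc32 is a stand-in hash for this sketch, not Elasticsearch's own hash.
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


def shard_for(doc_id, num_primary_shards, parent_id=None, routing=None):
    """Explicit routing wins, then the parent's ID, then the document's own ID."""
    if routing is not None:
        routing_value = routing
    elif parent_id is not None:
        routing_value = parent_id
    else:
        routing_value = doc_id
    return pick_shard(routing_value, num_primary_shards)


parent_shard = shard_for("parent-1", 5)
child_shard = shard_for("child-1", 5, parent_id="parent-1")
print(parent_shard, child_shard)  # the two numbers are equal: same routing value
```

Since the shard number depends on the modulus, changing the number of primary shards would send existing routing values to different shards, which is consistent with that number being fixed once the index is created.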
shard
A shard is a single Lucene instance. It is a low-level “worker” unit which is managed automatically by Elasticsearch. An index is a logical namespace which points to primary and replica shards. Other than defining the number of primary and replica shards that an index should have, you never need to refer to shards directly. Instead, your code should deal only with an index. Elasticsearch distributes shards amongst all nodes in the cluster, and can move shards automatically from one node to another in the case of node failure or the addition of new nodes.
source field
By default, the JSON document that you index will be stored in the _source field and will be returned by all get and search requests. This gives you access to the original object directly from search results, rather than requiring a second step to retrieve the object from an ID. Note: the exact JSON string that you indexed will be returned to you, even if it contains invalid JSON. The contents of this field do not indicate anything about how the data in the object has been indexed.
term
A term is an exact value that is indexed in Elasticsearch. The terms foo, Foo, and FOO are NOT equivalent. Terms (i.e. exact values) can be searched for using term queries. See also text and analysis.
text
Text (or full text) is ordinary unstructured text, such as this paragraph. By default, text will be analyzed into terms, which is what is actually stored in the index. Text fields need to be analyzed at index time in order to be searchable as full text, and keywords in full text queries must be analyzed at search time to produce (and search for) the same terms that were generated at index time. See also term and analysis.
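The difference between a term query and a full text query can be sketched with a tiny inverted index. Everything here (`analyze`, `index_text`, `term_query`, `match_query`, and the lowercase-and-split analyzer) is a hypothetical simplification, not the Elasticsearch API.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase and split on non-alphanumeric characters."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


inverted_index = defaultdict(set)  # term -> IDs of the documents containing it


def index_text(doc_id, text):
    """Analyze the text at index time and store the resulting terms."""
    for term in analyze(text):
        inverted_index[term].add(doc_id)


def term_query(term):
    """Exact lookup: the query value is NOT analyzed."""
    return set(inverted_index.get(term, set()))


def match_query(text):
    """Full text lookup: analyze the query, then collect documents with any term."""
    hits = set()
    for term in analyze(text):
        hits |= inverted_index.get(term, set())
    return hits


index_text(1, "Foo-Bar")
print(term_query("Foo"))   # empty: the exact value "Foo" was never stored
print(term_query("foo"))   # finds document 1
print(match_query("FOO"))  # finds document 1, because the query is analyzed first
```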
type
A type is like a table in a relational database. Each type has a list of fields that can be specified for documents of that type. The mapping defines how each field in the document is analyzed.