Elasticsearch tutorial (2)

Basic Concepts

A few concepts are core to Elasticsearch. Understanding them from the outset will make the rest of the learning process much easier.


Near Realtime (NRT)

Elasticsearch is a near-real-time search platform. This means there is a slight latency (normally one second) between the time you index a document and the time it becomes searchable.
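The delay described above can be pictured with a toy model. The sketch below is not how Elasticsearch is implemented; it only assumes that newly indexed documents sit in a buffer until a periodic "refresh" makes them visible to search. The class name ToyNearRealtimeIndex and its methods are made up for illustration.

```python
class ToyNearRealtimeIndex:
    """Toy model: indexed documents become searchable only after a refresh."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet visible to search
        self._searchable = []  # visible to search

    def index(self, doc):
        # A newly indexed document lands in the buffer first.
        self._buffer.append(doc)

    def refresh(self):
        # Assumption: this step runs periodically (about once per second).
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, term):
        return [d for d in self._searchable if term in d["text"]]


idx = ToyNearRealtimeIndex()
idx.index({"text": "hello world"})
before = idx.search("hello")  # the refresh has not run yet
idx.refresh()
after = idx.search("hello")   # the document is now visible
print(len(before), len(after))  # prints "0 1"
```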

Cluster

A cluster is a collection of one or more nodes (servers) that together hold your entire data and provide federated indexing and search capabilities across all nodes. A cluster is identified by a unique name, which by default is "elasticsearch". This name is important because a node can only be part of a cluster if the node is set up to join the cluster by its name.

Make sure that you don't reuse the same cluster names in different environments; otherwise you might end up with nodes joining the wrong cluster. For instance, you could use logging-dev, logging-stage, and logging-prod for the development, staging, and production clusters.
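As a small illustration of the naming advice, the sketch below derives one cluster name per environment and renders the matching cluster.name line for elasticsearch.yml. The helper names and the "logging" project prefix are only assumptions for the example; nothing here talks to a real cluster.

```python
ENVIRONMENTS = ("dev", "stage", "prod")


def cluster_name(project: str, environment: str) -> str:
    """Build an environment-specific cluster name such as 'logging-dev'."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    return f"{project}-{environment}"


def config_line(project: str, environment: str) -> str:
    """Render the cluster.name setting as it would appear in elasticsearch.yml."""
    return f"cluster.name: {cluster_name(project, environment)}"


names = [cluster_name("logging", env) for env in ENVIRONMENTS]
print(names)
print(config_line("logging", "prod"))
```

Keeping the environment in the name means a node configured for one environment cannot join another environment's cluster by accident.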

Note that it is valid and perfectly fine to have a cluster with only a single node in it. Furthermore, you may also have multiple independent clusters, each with its own unique cluster name.

Node

A node is a single server that is part of your cluster, stores your data, and participates in the cluster's indexing and search capabilities. Just like a cluster, a node is identified by a name, which by default is a random Universally Unique Identifier (UUID) that is assigned to the node at startup. You can define any node name you want if you do not want the default. This name is important for administration purposes, where you want to identify which servers in your network correspond to which nodes in your Elasticsearch cluster.

A node can be configured to join a specific cluster by the cluster name. By default, each node is set up to join a cluster named elasticsearch, which means that if you start up a number of nodes on your network and they can discover each other, they will automatically form and join a single cluster named elasticsearch.
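The joining rule can be sketched as a simple grouping by cluster name. The code below is a simplified model, assuming all nodes can discover each other; the node dictionaries and the form_clusters helper are hypothetical and do not mirror any Elasticsearch API.

```python
from collections import defaultdict

DEFAULT_CLUSTER_NAME = "elasticsearch"


def form_clusters(nodes):
    """Group node names by the cluster name each node is configured with."""
    clusters = defaultdict(list)
    for node in nodes:
        # A node without an explicit cluster name falls back to the default.
        name = node.get("cluster_name", DEFAULT_CLUSTER_NAME)
        clusters[name].append(node["name"])
    return dict(clusters)


nodes = [
    {"name": "node-1"},
    {"name": "node-2"},
    {"name": "node-3", "cluster_name": "logging-prod"},
]
clusters = form_clusters(nodes)
print(clusters)
```

Here node-1 and node-2 end up together in the default cluster, while node-3 joins only the cluster it was configured for.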

Index

An index is a collection of documents that have somewhat similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and yet another index for order data. An index is identified by a name (that must be all lowercase), and this name is used to refer to the index when performing indexing, search, update, and delete operations against the documents in it.
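The lowercase rule is easy to check before sending a request. This sketch covers only the rule stated above; Elasticsearch applies further naming restrictions that are deliberately left out, and the function name is made up for the example.

```python
def is_lowercase_index_name(name: str) -> bool:
    """Check only the rule from the text: an index name must be all lowercase."""
    return bool(name) and name == name.lower()


candidates = ["customers", "product_catalog", "Orders"]
valid = [name for name in candidates if is_lowercase_index_name(name)]
print(valid)
```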

In a single cluster, you can define as many indexes as you want.

Type

Warning: Deprecated in 6.0.0. See Removal of mapping types.

A type used to be a logical category/partition of your index that allowed you to store different types of documents in the same index, e.g. one type for users, another type for blog posts. It is no longer possible to create multiple types in an index, and the whole concept of types will be removed in a later version. See Removal of mapping types for more.

Document

A document is a basic unit of information that can be indexed. For example, you can have a document for a single customer, another document for a single product, and yet another for a single order. This document is expressed in JSON (JavaScript Object Notation), which is a ubiquitous internet data interchange format.
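For a concrete picture, here is what a single customer document might look like when serialized to JSON. The field names and values are invented for the example; they are not part of any required schema.

```python
import json

# A hypothetical customer document; the fields are illustrative only.
customer = {"name": "Jane Doe", "age": 32}

# Serialize to the JSON text that would be sent as the document body.
body = json.dumps(customer, sort_keys=True)
print(body)

# Parsing the JSON text gives back the same structure.
restored = json.loads(body)
```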

Within an index/type, you can store as many documents as you want. Note that although a document physically resides in an index, a document actually must be indexed/assigned to a type inside an index.

Shards & Replicas

An index can potentially store a large amount of data that can exceed the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node, or may be too slow to serve search requests from a single node alone.

To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully functional and independent "index" that can be hosted on any node in the cluster.

Sharding is important for two primary reasons:

    It allows you to horizontally split/scale your content volume.

    It allows you to distribute and parallelize operations across shards (potentially on multiple nodes), thus increasing performance/throughput.

The mechanics of how a shard is distributed and how its documents are aggregated back into search requests are completely managed by Elasticsearch and are transparent to you as the user.
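Although Elasticsearch handles this for you, a rough sketch helps show the idea of splitting documents across shards and gathering results back. The code below is a simplified model: it assumes a document is placed by hashing its ID modulo the shard count, and it uses zlib.crc32 purely as a stand-in hash (Elasticsearch uses its own hash function). All function names are hypothetical.

```python
import zlib


def shard_for(doc_id: str, number_of_shards: int) -> int:
    """Pick a shard by hashing the document ID (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def build_shards(docs, number_of_shards):
    """Distribute documents across shards; each shard holds a subset."""
    shards = [[] for _ in range(number_of_shards)]
    for doc in docs:
        shards[shard_for(doc["id"], number_of_shards)].append(doc)
    return shards


def search(shards, term):
    """Query every shard (this loop could run in parallel) and merge the hits."""
    hits = []
    for shard in shards:
        hits.extend(d for d in shard if term in d["text"])
    return sorted(hits, key=lambda d: d["id"])


docs = [
    {"id": "1", "text": "red shoes"},
    {"id": "2", "text": "blue shoes"},
    {"id": "3", "text": "red hat"},
]
shards = build_shards(docs, number_of_shards=5)
hits = search(shards, "red")
print([d["id"] for d in hits])
```

In this sketch the shard is derived from the shard count, so changing the count would send the same ID to a different shard.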

In a network/cloud environment where failures can be expected at any time, it is very useful and highly recommended to have a failover mechanism in case a shard/node somehow goes offline or disappears for whatever reason. To this end, Elasticsearch allows you to make one or more copies of your index's shards into what are called replica shards, or replicas for short.

Replication is important for two primary reasons:

    It provides high availability in case a shard/node fails. For this reason, it is important to note that a replica shard is never allocated on the same node as the original/primary shard that it was copied from.

    It allows you to scale out your search volume/throughput, since searches can be executed on all replicas in parallel.
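The rule that a replica never shares a node with its primary can be illustrated with a toy allocator. The round-robin placement below is an assumption made for the example, not Elasticsearch's actual allocation algorithm, and the allocate function is hypothetical.

```python
def allocate(number_of_shards, nodes):
    """Place each primary on a node and its replica on a different node.

    With a single node there is nowhere valid to put a replica,
    so it stays unassigned (None).
    """
    allocation = []
    for shard in range(number_of_shards):
        primary = nodes[shard % len(nodes)]
        if len(nodes) > 1:
            # The next node in the list is always different from the primary's.
            replica = nodes[(shard + 1) % len(nodes)]
        else:
            replica = None
        allocation.append({"shard": shard, "primary": primary, "replica": replica})
    return allocation


two_nodes = allocate(5, ["node-1", "node-2"])
one_node = allocate(5, ["node-1"])
for entry in two_nodes:
    print(entry)
```

With two nodes, losing either one still leaves a full copy of every shard on the other.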

To summarize, each index can be split into multiple shards. An index can also be replicated zero (meaning no replicas) or more times. Once replicated, each index will have primary shards (the original shards that were replicated from) and replica shards (the copies of the primary shards). The number of shards and replicas can be defined per index at the time the index is created. After the index is created, you may change the number of replicas dynamically at any time, but you cannot change the number of shards after the fact.
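As a sketch of how these two numbers are expressed, the code below builds the JSON bodies one might send when creating an index and when later changing its replica count. The setting names number_of_shards and number_of_replicas are Elasticsearch index settings; the helper functions are made up for illustration, and no request is actually sent.

```python
import json


def create_index_body(number_of_shards: int, number_of_replicas: int) -> dict:
    """Body for creating an index; both values are fixed at creation time here."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }


def update_replicas_body(number_of_replicas: int) -> dict:
    """Body for a later settings update; only the replica count is changed."""
    return {"index": {"number_of_replicas": number_of_replicas}}


create_body = create_index_body(number_of_shards=3, number_of_replicas=2)
update_body = update_replicas_body(number_of_replicas=1)
print(json.dumps(create_body))
print(json.dumps(update_body))
```

The update body intentionally has no shard count, reflecting that only the number of replicas can be changed after creation.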

By default, each index in Elasticsearch is allocated 5 primary shards and 1 replica, which means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica) for a total of 10 shards per index.
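The arithmetic behind the total is simply the primaries plus one full copy of them per replica. A minimal sketch, using the defaults quoted above as the function defaults (the function name is hypothetical):

```python
def total_shards(primaries: int = 5, replicas: int = 1) -> int:
    """Each replica is one complete copy of all primary shards."""
    return primaries * (1 + replicas)


print(total_shards())      # 5 primaries + 5 replica shards
print(total_shards(3, 2))  # 3 primaries + 2 complete copies
```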

Each Elasticsearch shard is a Lucene index. There is a maximum number of documents you can have in a single Lucene index. As of LUCENE-5843, the limit is 2,147,483,519 (= Integer.MAX_VALUE - 128) documents. You can monitor shard sizes using the _cat/shards API.
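The limit can be checked with a little arithmetic, and it also gives a lower bound on how many primary shards a very large index would need. The sketch below only does the math; min_primary_shards is a hypothetical helper, and real sizing depends on much more than the document count.

```python
INTEGER_MAX_VALUE = 2**31 - 1             # Java's Integer.MAX_VALUE
MAX_DOCS_PER_SHARD = INTEGER_MAX_VALUE - 128


def min_primary_shards(total_docs: int) -> int:
    """Smallest shard count that keeps every shard under the Lucene limit."""
    # Integer ceiling division avoids floating-point rounding.
    return max(1, -(-total_docs // MAX_DOCS_PER_SHARD))


print(MAX_DOCS_PER_SHARD)
print(min_primary_shards(1_000_000_000))
print(min_primary_shards(5_000_000_000))
```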

    
