Elasticsearch in Practice (1)

Background

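The customer relationship module currently needs to search a large amount of data. Queries against a traditional relational database are slow and cannot meet the requirements. To speed up retrieval and make the module friendlier and easier to use, we are considering a NoSQL store.

Module data structure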

crm.png

Person

Customer data records. Common fields: name, email address, contact address, age, gender, and so on.
Registration

Customer activity registration records: the name of the activity attended, start and end time, registration number, and so on.
Revenue

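Summary of fees for the activities a customer attended, including the total event fees, the total spent on event merchandise, and so on.
Social referral

Summary of the revenue generated by a customer's social activity, including the number of shares, the total revenue brought in by shares, and the customer's sharing level.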

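Search requirements

A fixed set of search fields is provided, for example: name, age, gender, activity name, customer level.
Users can choose the match expression. For example, for time or count (numeric) fields they can choose greater than, less than, or between.
Users can choose match all or match any. When match all is selected, they can mark some of the conditions as exclusions.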
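One way these requirements could map onto an Elasticsearch bool query is sketched below: match all becomes `must`, match any becomes `should`, and excluded conditions become `must_not`. The function names, the `(field, op, value)` tuple shape, the field names, and the choice to treat between as inclusive are all assumptions made for illustration, not part of the module.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

# Hypothetical shape of one UI condition: (field, operator, value)
Condition = Tuple[str, str, Any]


def condition_to_clause(field: str, op: str, value: Any) -> Dict[str, Any]:
    """Translate a single condition into one query clause."""
    if op == "match":
        return {"match": {field: value}}
    if op == "gt":
        return {"range": {field: {"gt": value}}}
    if op == "lt":
        return {"range": {field: {"lt": value}}}
    if op == "between":
        low, high = value
        # Assumption: "between" includes both endpoints.
        return {"range": {field: {"gte": low, "lte": high}}}
    raise ValueError("unsupported operator: " + op)


def build_query(conditions: Iterable[Condition],
                mode: str = "all",
                excluded: Optional[Iterable[Condition]] = None) -> Dict[str, Any]:
    """Build a bool query body from the selected conditions."""
    clauses = [condition_to_clause(*c) for c in conditions]
    bool_query: Dict[str, Any] = {}
    if mode == "all":
        bool_query["must"] = clauses
    elif mode == "any":
        bool_query["should"] = clauses
        # Require at least one of the "any" conditions to match.
        bool_query["minimum_should_match"] = 1
    else:
        raise ValueError("mode must be 'all' or 'any'")
    if excluded:
        bool_query["must_not"] = [condition_to_clause(*c) for c in excluded]
    return {"query": {"bool": bool_query}}


if __name__ == "__main__":
    import json
    body = build_query(
        [("name", "match", "Alice"), ("age", "between", (20, 30))],
        mode="all",
        excluded=[("gender", "match", "male")],
    )
    print(json.dumps(body, indent=2))
```

Because the query body is plain data, conditions can be added or removed at request time without changing any schema, which is the dynamic behavior the requirements ask for.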

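Elasticsearch

Looking at the search requirements above, the query conditions can change dynamically, and the searched fields are spread across several tables in the relational database. Some of the data also needs like %condition% matching. Full-text search is a better fit for this kind of scenario.

Elasticsearch is a distributed, easily scalable, real-time search engine. It scales horizontally well and comes with good monitoring tools. It has a good reputation and is widely used in the industry, so we decided to evaluate it for our business scenario.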

Elasticsearch basics

Basic concepts

cluster
node (master node and secondary nodes)
shard (primary shard and replica shard)
number of replicas
document
index
type
id
lock
version
create
update
get
delete
bulk api (a good batch size to start with is roughly 5-15 MB in physical size)
routing

shard = hash(routing) % number_of_primary_shards
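The routing formula above can be illustrated with a small sketch. Note the assumption: `zlib.crc32` is only a stand-in for whatever hash function Elasticsearch uses internally, and `pick_shard` is a hypothetical name, so the shard numbers it returns will not match a real cluster. The point is the modulo step.

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Map a routing value to a primary shard number.

    Assumption: CRC32 stands in for Elasticsearch's real hash function.
    """
    if number_of_primary_shards < 1:
        raise ValueError("number_of_primary_shards must be at least 1")
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


if __name__ == "__main__":
    # The same routing value always lands on the same shard...
    print(pick_shard("person-42", 5))
    # ...but a different shard count can send it somewhere else.
    print(pick_shard("person-42", 6))
```

One consequence of the modulo is that changing `number_of_primary_shards` would map existing routing values to different shards, so documents already indexed could no longer be located by the formula.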

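Consistency

A write operation requires a quorum, that is, a majority, of shard copies to be available (a shard copy can be either the primary shard or a replica shard).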

int( (primary + number_of_replicas) / 2 ) + 1
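As a quick worked example of the quorum formula, the sketch below evaluates it for a few replica counts. The function name `quorum` is an illustrative assumption, and `primary` defaults to 1 because each shard has exactly one primary copy.

```python
def quorum(number_of_replicas: int, primary: int = 1) -> int:
    """Number of shard copies that must be available for a write."""
    return int((primary + number_of_replicas) / 2) + 1


if __name__ == "__main__":
    for replicas in range(4):
        copies = 1 + replicas
        print("replicas=%d copies=%d quorum=%d"
              % (replicas, copies, quorum(replicas)))
```

With one primary and two replicas there are three copies in total, and the formula gives 2, so a write can proceed while one copy is unavailable.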

Install Elasticsearch

Pull images

docker pull docker.elastic.co/elasticsearch/elasticsearch:5.2.2
docker pull docker.elastic.co/kibana/kibana:5.2.2

Create file elastic_kibana.yml:

version: '2'
services:

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.2.2
    environment:
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    mem_limit: 1g
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200

  kibana:
    image: docker.elastic.co/kibana/kibana:5.2.2
    links:
      - elasticsearch
    ports:
      - 5601:5601

volumes:
  esdata1:
    driver: local

Start up

  docker-compose -f elastic_kibana.yml up

Access Kibana:

http://localhost:5601/app/kibana#/management?_g=()
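To confirm that Elasticsearch itself is reachable on the port published in the compose file, a minimal check against the `_cluster/health` endpoint might look like the sketch below. The helper names are hypothetical, and the optional basic-auth argument is an assumption for the case where the image has security enabled; which credentials apply, if any, depends on your setup.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, Optional, Tuple


def health_request(base_url: str = "http://localhost:9200",
                   auth: Optional[Tuple[str, str]] = None) -> urllib.request.Request:
    """Build a GET request for the cluster health endpoint."""
    req = urllib.request.Request(base_url.rstrip("/") + "/_cluster/health")
    if auth is not None:
        user, password = auth
        token = base64.b64encode((user + ":" + password).encode("utf-8"))
        req.add_header("Authorization", "Basic " + token.decode("ascii"))
    return req


def is_usable(health: Dict[str, Any]) -> bool:
    """Treat green and yellow as usable for local experiments."""
    return health.get("status") in ("green", "yellow")


def fetch_health(base_url: str = "http://localhost:9200",
                 auth: Optional[Tuple[str, str]] = None,
                 timeout: float = 5.0) -> Dict[str, Any]:
    """Call the endpoint and decode the JSON response."""
    with urllib.request.urlopen(health_request(base_url, auth), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    health = fetch_health()
    print(health.get("status"), is_usable(health))
```

On a single-node setup like this one, a yellow status is common once indices exist, because replica shards have no second node to be allocated to.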

Next, we will model the data for Elasticsearch.