Implementing Complex Big-Data Search with Elasticsearch

what who

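Elasticsearch is more than Lucene and full-text search. It is also:
• A distributed real-time document store in which every field is indexed and searchable
• A distributed real-time analytics search engine
• Able to scale out to hundreds of servers and handle petabytes of structured or unstructured data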

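  • Some of its characteristics
    First: it stores JSON, which makes it a document store
    Second: it uses an inverted index
    Third: it has no transactions

  • Some of its drawbacks
    First: there is a delay of 1 to 2 seconds before written data is persisted and becomes searchable (near real-time rather than real-time)
    Second: a mapping cannot be changed freely. Changing even a single field's type means rebuilding the whole index
    There is a workaround, though: use an alias. Point an index alias at the index and have clients query the alias. When the mapping has to change, create a new index, load all the data into it, then switch the alias to the new index in one step and delete the old index, for a smooth transition.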
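The alias workaround can be pictured with a tiny in-memory model. The sketch below is not the Elasticsearch API; it only mimics an alias table with a map, and the index names book_index_v1 and book_index_v2 are made up for illustration. The point it tries to show is that clients only ever see the alias, so repointing it is the single step that moves traffic.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for a cluster's alias table; NOT the Elasticsearch API.
class AliasSwapSketch {
    // alias name -> concrete index name
    private final Map<String, String> aliases = new HashMap<>();

    /** Points the alias at newIndex and returns the index it pointed at before (or null). */
    String repoint(String alias, String newIndex) {
        return aliases.put(alias, newIndex);
    }

    /** Resolves the name a client queries to the concrete index behind it. */
    String resolve(String alias) {
        return aliases.get(alias);
    }

    public static void main(String[] args) {
        AliasSwapSketch cluster = new AliasSwapSketch();
        cluster.repoint("book_index_alias", "book_index_v1"); // initial setup
        // ... a mapping change is needed: build book_index_v2 and load the data into it ...
        String old = cluster.repoint("book_index_alias", "book_index_v2");
        System.out.println("now serving from " + cluster.resolve("book_index_alias")
                + ", safe to delete " + old);
    }
}
```

In a real cluster the add and remove of the alias would be sent as one alias update so that readers never see a moment without a target.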

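  • A few basic concepts worth mentioning
    Suppose the first thing we want to do is store employee data, with each document representing one employee. In Elasticsearch, the act of storing data is called indexing, but before indexing we need to decide where the data should live.
    In Elasticsearch a document belongs to a type, and types live inside an index. A simple comparison with a traditional relational database: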

Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fields

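An Elasticsearch cluster can contain multiple indices (databases).
Each index can contain multiple types (tables).
Each type contains multiple documents (rows), each one a JSON object.
Each document contains multiple fields (columns), each one a property of the JSON object.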

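The word "index" carries several meanings in Elasticsearch. As a noun, an index is like a database in a traditional relational database: it is where related documents are stored. The plural of index is indices or indexes.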
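To make the inverted index listed among the characteristics above more concrete, here is a minimal sketch, assuming a crude tokenizer (lower-case, split on non-alphanumerics) in place of a real Lucene analyzer. It maps each term to the ids of the documents that contain it, which is why looking up a term does not require scanning the documents themselves. The class name InvertedIndexSketch is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Tokenizes the text and records docId under every term it contains. */
    void index(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the ids of all documents containing the term (empty if none). */
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.index(1, "Elasticsearch: The Definitive Guide");
        idx.index(3, "Elasticsearch in Action");
        idx.index(4, "Solr in Action");
        System.out.println(idx.search("action"));        // prints [3, 4]
        System.out.println(idx.search("elasticsearch")); // prints [1, 3]
    }
}
```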

where when

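  • When should you use ES?

Search, log analysis (ELK), and so on.

Our business scenario: the volume of order data is huge, so it is stored in sharded databases and tables with openId as the sharding key. That covers every customer-facing query, because each request carries an openId and looks up the orders of a single user. Back-office operations staff, however, need to see all orders rather than one user's, and they filter on all kinds of dimensions. Since the orders are spread across different databases and tables, scanning every shard in the DB and then paginating is not realistic. Elasticsearch is an ideal fit for this scenario.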
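The routing problem in this scenario can be sketched in a few lines. This is an illustration only: the shard count of 4, the hash-modulo routing rule, and the class name ShardRoutingSketch are assumptions, not details from the actual order system. It shows why a query that carries an openId touches one shard while a back-office query without one has to fan out to all of them.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

class ShardRoutingSketch {
    static final int SHARDS = 4; // hypothetical number of order shards

    /** Maps an openId to the shard holding that user's orders (assumed hash-modulo rule). */
    static int shardFor(String openId) {
        return Math.floorMod(openId.hashCode(), SHARDS);
    }

    /** Customer-facing query: the openId is known, so exactly one shard is visited. */
    static Set<Integer> shardsForUserQuery(String openId) {
        return Collections.singleton(shardFor(openId));
    }

    /** Back-office query: no openId in the filter, so every shard must be scanned. */
    static Set<Integer> shardsForAdminQuery() {
        Set<Integer> all = new TreeSet<>();
        for (int i = 0; i < SHARDS; i++) {
            all.add(i);
        }
        return all;
    }

    public static void main(String[] args) {
        System.out.println("user query hits shards:  " + shardsForUserQuery("openid-123"));
        System.out.println("admin query hits shards: " + shardsForAdminQuery());
    }
}
```

Sorting and paginating across the results of every shard is the part that becomes impractical in the database, and it is the part a search engine over a copy of the orders takes over.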

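  • How does it compare with Solr from the Apache ecosystem?
    [Figure: solr.png, comparison of Elasticsearch and Solr]

Summary:
1. When only searching data that already exists, Solr is faster.
2. When indexing in real time, Solr suffers from I/O blocking and its query performance degrades; Elasticsearch has a clear advantage here.
3. As the data volume grows, Solr's search efficiency drops, while Elasticsearch shows no obvious change.
4. Solr's architecture is not suited to real-time search applications.
5. Solr supports more data formats, whereas Elasticsearch only supports JSON.
6. Solr outperforms Elasticsearch in traditional search applications, but is markedly less efficient for real-time search.
7. Solr is a solid solution for traditional search applications, but Elasticsearch is better suited to newer real-time search applications.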

how

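  • ES releases new versions very quickly, so first get to know the ES API technology stack

First: study "Elasticsearch: The Definitive Guide".
Second: which version should you use?

The initialization code changed from 1.7 to 2.x and changed again from 2.x to 5.x. There is already a 6.x line and the latest release is now 7.x, but 5.x is the recommended choice!
Note: data from 2.x can be migrated directly to 5.x, and data from 5.x can be migrated directly to 6.x, but data from 2.x cannot be migrated directly to 6.x.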

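ES 2.x
Pros:

  1. On the Java stack, spring-boot-starter-data-elasticsearch can start ES in in-memory mode, so unit tests work out of the box
  2. It is the mainstream version running in production today and is fairly stable

Cons:

  1. The version is old: new features are unavailable and performance is worse than 5.x
  2. Migrating data during a later upgrade is troublesome
  3. Versions of the surrounding tools are confusing; you have to look up the matching version of Kibana and other tools yourself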

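ES 5.x
Pros:

  1. The version is relatively new and performs well. Officially, indexing throughput improves by 25% to 80%; new data structures for numeric and geo fields bring a large performance gain; and search was refactored in 5.x, greatly improving search and aggregation
  2. The surrounding tools are complete and the version numbers are friendlier: Elastic unified the version numbers of the ELK stack in the 5.x era
  3. Upgrading to 6.x is also relatively easy

Cons:

  1. Officially, in-memory mode is no longer supported and the Node Client no longer works. To unit test in in-memory style you have to configure the ES version, the spring-data-elasticsearch version, the HTTP access switch and so on by hand, and access ES through the REST API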

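Third: how do you use the client?

On the Java stack there are currently three options: Node Client, Transport Client, and the REST API.
Note that the Node Client is officially deprecated, and the official plan is to stop supporting the Transport Client from 7.x onward,
so everything will eventually converge on the REST API in 7.x. Today the Transport Client is the most widely used; the REST API has the best compatibility; and the Node Client is not recommended unless you are running unit tests in in-memory mode.
The API examples in this post still use the Transport Client.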

Client setup for Elasticsearch 2.x:

 public static Client getClient() throws UnknownHostException {
        String clusterName = "elasticsearch";
        List<String> clusterNodes = Arrays.asList("http://172.16.0.29:9300");
        Settings settings = Settings.settingsBuilder().put("cluster.name", clusterName).build();  
        TransportClient client = TransportClient.builder().settings(settings).build();
        for (String node : clusterNodes) {
            URI host = URI.create(node);
            client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(host.getHost()), host.getPort()));
        }
        return client;
    }

Client setup for Elasticsearch 5.x:

public static Client getClient() throws UnknownHostException {
        String clusterName = "shopmall-es";
        List<String> clusterNodes = Arrays.asList("http://172.16.32.69:9300","http://172.16.32.48:9300");
        Settings settings = Settings.builder().put("cluster.name", clusterName).build();
        TransportClient client = new PreBuiltTransportClient(settings);
        for (String node : clusterNodes) {
            URI host = URI.create(node);
            client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(host.getHost()), host.getPort()));
        }
        return client;
    }

  • Time to write some code. First, add the required dependencies:
 <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>transport</artifactId>
            <version>5.3.2</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>5.3.2</version>
        </dependency>

        <dependency>
            <groupId>com.google.code.gson</groupId>
            <artifactId>gson</artifactId>
            <version>2.8.2</version>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
            <version>2.11.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.11.1</version>
        </dependency>
/**
 * @Title:
 * @Author: hangyu
 * @Date: 2019/4/23
 * @Description
 * @Version:1.0
 */
public class Book {
    public static SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd");
    private String id;
    private String title;
    private List<String> authors;
    private String summary;
    private String publish_date;
    private Integer num_reviews;
    private String publisher;

    public Book(String id, String title, List<String> authors, String summary, String publish_date, Integer num_reviews, String publisher) {
        this.id = id;
        this.title = title;
        this.authors = authors;
        this.summary = summary;
        this.publish_date = publish_date;
        this.num_reviews = num_reviews;
        this.publisher = publisher;
    }

    public static SimpleDateFormat getSimpleDateFormat() {
        return simpleDateFormat;
    }

    public static void setSimpleDateFormat(SimpleDateFormat simpleDateFormat) {
        Book.simpleDateFormat = simpleDateFormat;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public List<String> getAuthors() {
        return authors;
    }

    public void setAuthors(List<String> authors) {
        this.authors = authors;
    }

    public String getSummary() {
        return summary;
    }

    public void setSummary(String summary) {
        this.summary = summary;
    }

    public String getPublish_date() {
        return publish_date;
    }

    public void setPublish_date(String publish_date) {
        this.publish_date = publish_date;
    }

    public Integer getNum_reviews() {
        return num_reviews;
    }

    public void setNum_reviews(Integer num_reviews) {
        this.num_reviews = num_reviews;
    }

    public String getPublisher() {
        return publisher;
    }

    public void setPublisher(String publisher) {
        this.publisher = publisher;
    }
}
/**
 * @Title:
 * @Author: hangyu
 * @Date: 2019/4/23
 * @Description
 * @Version:1.0
 */
public class DataUtil {
    public static SimpleDateFormat dateFormater = new SimpleDateFormat("yyyy-MM-dd");

    /**
     * Returns mock data
     */
    public static List<Book> batchData() {
        List<Book> list = new LinkedList<>();
        Book book1 = new Book("1", "Elasticsearch: The Definitive Guide", Arrays.asList("clinton gormley", "zachary tong"),
                "A distributed real-time search and analytics engine", "2015-02-07", 20, "oreilly");
        Book book2 = new Book("2", "Taming Text: How to Find, Organize, and Manipulate It", Arrays.asList("grant ingersoll", "thomas morton", "drew farris"),
                "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
                "2013-01-24", 12, "manning");
        Book book3 = new Book("3", "Elasticsearch in Action", Arrays.asList("radu gheorge", "matthew lee hinman", "roy russo"),
                "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
                "2015-12-03", 18, "manning");
        Book book4 = new Book("4", "Solr in Action", Arrays.asList("trey grainger", "timothy potter"), "Comprehensive guide to implementing a scalable search engine using Apache Solr",
                "2014-04-05", 23, "manning");

        list.add(book1);
        list.add(book2);
        list.add(book3);
        list.add(book4);

        return list;
    }

    public static Date parseDate(String dateStr) {
        try {
            return dateFormater.parse(dateStr);
        } catch (ParseException e) {
        }
        return null;
    }
}
/**
 * @Title:
 * @Author: hangyu
 * @Date: 2019/4/23
 * @Description
 * @Version:1.0
 */
public class Constants {

    // Field names

    public static String ID = "id";
    public static String TITLE = "title";
    public static String AUTHORS = "authors";
    public static String SUMMARY = "summary";
    public static String PUBLISHDATE = "publish_date";
    public static String PUBLISHER = "publisher";
    public static String NUM_REVIEWS = "num_reviews";

    // Fields to return from _source

    public static String[] fetchFieldsTSPD = {ID, TITLE, SUMMARY, PUBLISHDATE};
    public static String[] fetchFieldsTA = {ID, TITLE, AUTHORS};


    // Highlighting

    public static HighlightBuilder highlightS = new HighlightBuilder().field(SUMMARY);
}
/**
 * @Title:
 * @Author: hangyu
 * @Date: 2019/4/24
 * @Description
 * @Version:1.0
 */
public class Response<T> {

    private ResponseCode responseCode;

    private T data;

    public Response(ResponseCode responseCode, T data) {
        this.responseCode = responseCode;
        this.data = data;
    }

    public Response(ResponseCode responseCode) {
        this.responseCode = responseCode;
    }
}
/**
 * @Title:
 * @Author: hangyu
 * @Date: 2019/4/24
 * @Description
 * @Version:1.0
 */
public enum ResponseCode {

    ESTIMEOUT(1, "timed out"),

    FAILEDSHARDS(2, "shard execution failed"),

    OK(0, "success");

    private Integer code;

    private String desc;

    ResponseCode(Integer code, String desc) {
        this.code = code;
        this.desc = desc;
    }

    public Integer getCode() {
        return code;
    }

    public void setCode(Integer code) {
        this.code = code;
    }

    public String getDesc() {
        return desc;
    }

    public void setDesc(String desc) {
        this.desc = desc;
    }
}
/**
 * @Title:
 * @Author: hangyu
 * @Date: 2019/4/23
 * @Description
 * @Version:1.0
 */
public class CommonQueryUtils {

    // "yyyy" is the calendar year; "YYYY" would be the week-based year
    public static Gson gson = new GsonBuilder().setDateFormat("yyyy-MM-dd").create();

    /**
     * Parses the data returned by ES and wraps it in Book objects
     */
    public static List<Book> parseResponse(SearchResponse searchResponse) {
        List<Book> list = new LinkedList<>();
        // Print the total number of hits
        System.out.println("parseResponse count is "+searchResponse.getHits().getTotalHits());

        for (SearchHit hit : searchResponse.getHits().getHits()) {
            // Parse the _source JSON directly with gson
            Book book = gson.fromJson(hit.getSourceAsString(), Book.class);

            list.add(book);
        }
        return list;
    }

    /**
     * After parsing the data, builds the Response object
     */
    public static Response<List<Book>> buildResponse(SearchResponse searchResponse) {
        // Handle timeouts
        if (searchResponse.isTimedOut()) {
            return new Response<>(ResponseCode.ESTIMEOUT);
        }
        // Parse the data returned by ES
        List<Book> list = parseResponse(searchResponse);
        // Some shards failed
        if (searchResponse.getFailedShards() > 0) {
            return new Response<>(ResponseCode.FAILEDSHARDS, list);
        }
        return new Response<>(ResponseCode.OK, list);
    }
}
Take a short break~
  • Now the key logic begins
import java.net.InetAddress;

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

/**
 * @Title:
 * @Author: hangyu
 * @Date: 2019/4/23
 * @Description
 * @Version:1.0
 */
public class EsConfig {

    // 9200 is the HTTP port; the transport client API connects on 9300
    private static String clusterNodes = "127.0.0.1:9300";

    // The cluster name must already be configured in elasticsearch.yml
    private static String clusterName = "es-book-test";

    public static Client client() {
        Settings settings = Settings.builder().put("cluster.name", clusterName)
                                    .put("client.transport.sniff", true).build();

        TransportClient client = null;
        try {
             client = new PreBuiltTransportClient(settings);
            if (clusterNodes != null && !"".equals(clusterNodes)) {
                for (String node : clusterNodes.split(",")) {
                    String[] nodeInfo = node.split(":");
                    client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(nodeInfo[0]), Integer.parseInt(nodeInfo[1])));
                }
            }
        } catch (Exception e) {
            System.out.println("e"+e);
        }

        return client;
    }
}
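The host:port parsing inside EsConfig.client() could also be pulled into a small helper that fails fast on malformed entries. What follows is only a sketch built on JDK classes: NodeListParserSketch and the second address in the usage are hypothetical, and the resulting InetSocketAddress values would still have to be wrapped in the transport address type the client expects.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

class NodeListParserSketch {
    /**
     * Parses a comma-separated "host:port,host:port" string.
     * Addresses are left unresolved so no DNS lookup happens while parsing.
     */
    static List<InetSocketAddress> parse(String clusterNodes) {
        List<InetSocketAddress> result = new ArrayList<>();
        if (clusterNodes == null || clusterNodes.trim().isEmpty()) {
            return result;
        }
        for (String node : clusterNodes.split(",")) {
            String[] parts = node.trim().split(":");
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected host:port but got '" + node + "'");
            }
            // parseInt rejects non-numeric ports; createUnresolved rejects out-of-range ones
            int port = Integer.parseInt(parts[1]);
            result.add(InetSocketAddress.createUnresolved(parts[0], port));
        }
        return result;
    }

    public static void main(String[] args) {
        for (InetSocketAddress address : parse("127.0.0.1:9300, 10.0.0.2:9301")) {
            System.out.println(address.getHostString() + " -> " + address.getPort());
        }
    }
}
```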
/**
 * @Title:
 * @Author: hangyu
 * @Date: 2019/4/23
 * @Description
 * @Version:1.0
 */
public class DDLAndBulk {

    private static String bookIndex = "book_index";

    private static String bookIndexAlias = "book_index_alias";

    private static String bookType = "book_type";

    public static Gson gson = new GsonBuilder().setDateFormat("yyyy-MM-dd").create();

    /**
     * Creates the index and applies its settings (the mapping is put later, in bulk())
     */
    public static void createIndex() {
        int settingShards = 1;
        int settingReplicas = 0;

        Client client = EsConfig.client();
        // Check whether the index exists; if it does, delete it
        IndicesExistsResponse indicesExistsResponse = client.admin().indices().prepareExists(bookIndex).get();

        if (indicesExistsResponse.isExists()) {
            System.out.println("Index " + bookIndex + " exists!");
            // Delete the index to avoid ResourceAlreadyExistsException[index [bookdb_index/yL05ZfXFQ4GjgOEM5x8tFQ] already exists
            DeleteIndexResponse deleteResponse = client.admin().indices().prepareDelete(bookIndex).get();
            if (deleteResponse.isAcknowledged()){
                System.out.println("Index " + bookIndex + " deleted");
            }else {
                System.out.println("Failed to delete index " + bookIndex);
            }


        } else {
            System.out.println("Index " + bookIndex + " does not exist!");
        }

        // Step 1: create the index with its settings
        CreateIndexResponse response = client.admin().indices().prepareCreate(bookIndex)
                                             .setSettings(Settings.builder()
                                                                  .put("index.number_of_shards", settingShards)
                                                                  .put("index.number_of_replicas", settingReplicas))
                                             .get();

        // Inspect the result
        GetSettingsResponse getSettingsResponse = client.admin().indices()
                                                        .prepareGetSettings(bookIndex).get();
        System.out.println("Index settings:");
        for (ObjectObjectCursor<String, Settings> cursor : getSettingsResponse.getIndexToSettings()) {
            String index = cursor.key;
            Settings settings = cursor.value;
            Integer shards = settings.getAsInt("index.number_of_shards", null);
            Integer replicas = settings.getAsInt("index.number_of_replicas", null);
            System.out.println("index:" + index + ", shards:" + shards + ", replicas:" + replicas);
        }
    }

    /**
     * Bulk-inserts data
     */
    public static void bulk() {
        List<Book> list = DataUtil.batchData();

        Client client = EsConfig.client();

        BulkRequestBuilder bulkRequestBuilder = client.prepareBulk();

        // Step 2: create the type and its mapping. This can be skipped: without an explicit mapping, ES infers the field types from the source data
        if (!client.admin().indices().prepareTypesExists(bookIndex).setTypes(bookType).get().isExists()){
            client.admin().indices().preparePutMapping(bookIndex).setType(bookType).setSource(readFileTOString("es-book-mapping.json")).get()
                       .isAcknowledged();
            // Between steps 2 and 3 there is room for one small extra step that lets the mapping evolve later: create an index alias
            createAlias(bookIndex, bookIndexAlias);
        }

        // Add the index operations to the bulk request
        list.forEach(book -> {
            // Step 3: insert the data. PS: step 3 can also create the type implicitly, skipping the explicit mapping and letting ES detect the field types
            // In newer APIs setSource(Object...) needs an even number of arguments (field/value pairs); for a JSON string use setSource(json, XContentType.JSON)
            bulkRequestBuilder.add(client.prepareIndex(bookIndexAlias, bookType, book.getId()).setSource(gson.toJson(book), XContentType.JSON));
        });

        BulkResponse responses = bulkRequestBuilder.get();
        if (responses.hasFailures()) {
            // Some bulk items failed
            for (BulkItemResponse res : responses) {
                System.out.println(res.getFailure());
            }
        }
    }

    /**
     * Creates the alias
     */
    private static boolean createAlias(String indexName, String indexAlias) {
        Client client = EsConfig.client();

        // Collect the indices the alias currently points to
        List<String> oldIndexName = new ArrayList<String>();
        GetAliasesResponse getAliases = client.admin().indices().prepareGetAliases(indexAlias).get();
        for (ObjectCursor<String> objectCursor : getAliases.getAliases().keys()) {
            if (!indexName.equals(objectCursor.value)) {
                oldIndexName.add(objectCursor.value);
            }
        }
        // Add the alias to the new index
        IndicesAliasesResponse r = client.admin().indices().prepareAliases().addAlias(indexName, indexAlias)
                                              .execute().actionGet();
        if (!r.isAcknowledged()) {
            throw new RuntimeException("[ES Check] indexName:" + indexName + ", failed to create alias:" + indexAlias);
        }
        if (oldIndexName.size() > 0) {
            System.out.println("[ES Check] indexAlias:"+indexAlias+", found existing alias bindings, oldIndexName:"+oldIndexName);
            // Remove the old bindings
            IndicesAliasesResponse r2 = client.admin().indices().prepareAliases()
                                                   .removeAlias(oldIndexName.toArray(new String[] {}), indexAlias).get();// .isAcknowledged();
            if (!r2.isAcknowledged()) {
                throw new RuntimeException("[ES Check] indexAlias:" + indexAlias + ", failed to remove old alias bindings:" + oldIndexName);
            } else {
                System.out.println("[ES Check] indexAlias:"+indexAlias+", removed old alias bindings, oldIndexName:"+oldIndexName);
            }
        }

        return true;
    }

    public static String readFileTOString(String name) {

        InputStream inputStream = getResourceAsStream(name);

        if (null == inputStream){
            return null;
        }
        StringBuilder sb = new StringBuilder("");

        BufferedReader reader = null;
        try {
            reader = new BufferedReader(new InputStreamReader(inputStream));
            String tempString = null;
            // Read one line at a time; readLine() returns null at end of file
            while ((tempString = reader.readLine()) != null) {
                sb.append(tempString);
            }
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e1) {
                }
            }
        }

        return sb.toString();
    }

    public static InputStream getResourceAsStream(String name) {

        InputStream resourceStream = null;

        // Try the current Thread context classloader
        ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
        resourceStream = classLoader.getResourceAsStream(name);
        if (resourceStream == null) {
            // Finally, try the classloader for this class
            classLoader = DDLAndBulk.class.getClassLoader();
            resourceStream = classLoader.getResourceAsStream(name);
        }

        return resourceStream;
    }

    public static void main(String[] args) {
        createIndex();
        bulk();
    }

}
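The sample above sends all four books in one bulk request. With larger data sets a common precaution, which the code above does not take, is to split the documents into fixed-size batches and send one bulk request per batch. The helper below is a sketch of just the batching step using plain JDK collections; BulkChunkingSketch and the batch size in the usage are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BulkChunkingSketch {
    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> chunk(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        for (List<Integer> batch : chunk(ids, 4)) {
            // In the real code each batch would become one bulk request
            System.out.println("would send bulk of " + batch.size() + " docs: " + batch);
        }
    }
}
```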
{
    "book_type": {
        "properties": {
            "id": {
                "type": "long"
            },
            "title": {
                "type": "string",
                "index": "analyzed"
            },
            "authors": {
                "type": "string",
                "index": "not_analyzed"
            },
            "summary": {
                "type": "string",
                "index": "analyzed"
            },
            "publish_date": {
                "type": "date",
                "index": "not_analyzed"
            },
            "num_reviews": {
                "type": "integer",
                "index": "not_analyzed"
            },
            "publisher": {
                "type": "string",
                "index": "not_analyzed"
            }
        }
    }
}
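The difference between "analyzed" and "not_analyzed" in this mapping decides what a term query can match. The sketch below imitates it in plain Java, assuming a very rough analyzer (lower-case, split on non-alphanumerics); it is an illustration of the idea, not how Elasticsearch is implemented. A not_analyzed field is indexed as one whole value, so the term "radu gheorge" matches the authors field, while on an analyzed field only the individual tokens would be matchable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class AnalyzedVsNotAnalyzedSketch {
    /** Rough stand-in for an analyzer: lower-case, then split on non-alphanumerics. */
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Term query on an analyzed field: the term must equal one of the indexed tokens. */
    static boolean termMatchesAnalyzed(String fieldValue, String term) {
        return analyze(fieldValue).contains(term);
    }

    /** Term query on a not_analyzed field: the term must equal the whole stored value. */
    static boolean termMatchesNotAnalyzed(String fieldValue, String term) {
        return fieldValue.equals(term);
    }

    public static void main(String[] args) {
        String author = "radu gheorge";
        System.out.println(termMatchesNotAnalyzed(author, "radu gheorge")); // true
        System.out.println(termMatchesAnalyzed(author, "radu gheorge"));    // false
        System.out.println(termMatchesAnalyzed(author, "radu"));            // true
    }
}
```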
/**
 * @Title:
 * @Author: hangyu
 * @Date: 2019/4/24
 * @Description
 * @Version:1.0
 */
public class BasicMatchQueryService {

    private static Client client = EsConfig.client();

    private static String bookIndexAlias = "book_index_alias";

    private static String bookType = "book_type";


    public static void main(String[] args) {
        //multiBatch();
        //match();
        boolPage();
        //boolPageMatch();
        //fuzzy();
        //wildcard();
        //phrase();
        //phrasePrefix();
    }
    /**
     * Runs an ES query, printing the query before the request and the result after it
     */
    private static SearchResponse requestGet(String queryName, SearchRequestBuilder requestBuilder) {
        System.out.println(queryName + " 构建的查询:" + requestBuilder.toString());
        SearchResponse searchResponse = requestBuilder.get();
        System.out.println(queryName + " 搜索结果:" + searchResponse.toString());
        return searchResponse;
    }

    /**
     * 1.1 Full-text search for "guide"
     * Test: http://localhost:8080/basicmatch/multimatch?query=guide
     */
    public static Response<List<Book>> multiBatch() {
        MultiMatchQueryBuilder queryBuilder = new MultiMatchQueryBuilder("guide");

        SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
                                                    .setTypes(bookType).setQuery(queryBuilder);

        SearchResponse searchResponse = requestGet("multiBatch", requestBuilder);

        return CommonQueryUtils.buildResponse(searchResponse);
    }

    /**
     * 1.2 Search a specific field
     * Test: http://localhost:8080/basicmatch/match?title=in action&from=0&size=4
     */
    public static void match() {
        MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder(Constants.TITLE, "in Action");
        // Highlighting
        HighlightBuilder highlightBuilder = new HighlightBuilder().field(Constants.TITLE).fragmentSize(200);

        SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
                                                    .setTypes(bookType).setQuery(matchQueryBuilder)
                                                    .setFrom(0).setSize(4)
                                                    .highlighter(highlightBuilder)
                                                    // Fields to return from _source
                                                    .setFetchSource(Constants.fetchFieldsTSPD, null);

        SearchResponse searchResponse = requestGet("match", requestBuilder);

    }

    /**
     * Exact matching
     * @return
     */
    public static Response<List<Book>> boolPage() {
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();

        RangeQueryBuilder rangeQueryBuilder = new RangeQueryBuilder(Constants.NUM_REVIEWS)
                .gte(15).lte(50);

        boolQueryBuilder.should().add(QueryBuilders.termQuery(Constants.PUBLISHER, "manning"));
        boolQueryBuilder.should().add(QueryBuilders.termQuery(Constants.PUBLISHER, "oreilly"));

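        // term: exact match; range: range match
        // should means OR, must means AND, mustNot means AND NOT
        // Note: because this query also has a filter clause, the should clauses are optional and only affect scoring unless minimum_should_match is set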
        boolQueryBuilder.mustNot(QueryBuilders.termQuery(Constants.AUTHORS, "radu gheorge")).filter().add(rangeQueryBuilder);
        //boolQueryBuilder.must(rangeQueryBuilder).mustNot(QueryBuilders.termQuery(Constants.AUTHORS, "radu gheorge"));

        SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType).setQuery(boolQueryBuilder)
                .setFrom(0).setSize(10).addSort("id", SortOrder.DESC);

        SearchResponse searchResponse = requestGet("bool", requestBuilder);

        return CommonQueryUtils.buildResponse(searchResponse);
    }

    /**
     * Full-text matching (full-text search on text fields)
     * @return
     */
    public static Response<List<Book>> boolPageMatch() {
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();

        // matchQuery: match on analyzed terms; matchPhraseQuery: phrase match
        boolQueryBuilder.must(QueryBuilders.matchQuery(Constants.SUMMARY,"engine using"))
                        .mustNot(QueryBuilders.matchPhraseQuery(Constants.SUMMARY, "analytics engine"));

        SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType).setQuery(boolQueryBuilder)
                                                    .setFrom(0).setSize(10).addSort(SortBuilders.scoreSort());

        SearchResponse searchResponse = requestGet("bool", requestBuilder);

        return CommonQueryUtils.buildResponse(searchResponse);
    }

    /**
     * Fuzzy search
     * @return
     */
    public static Response<List<Book>> fuzzy() {
        MultiMatchQueryBuilder queryBuilder = new MultiMatchQueryBuilder("elasticseares")
                .field("title").field("summary")
                .fuzziness(Fuzziness.AUTO);

        SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
                                                    .setTypes(bookType).setQuery(queryBuilder)
                                                    .setFetchSource(Constants.fetchFieldsTSPD, null)
                                                    .setSize(2);

        SearchResponse searchResponse = requestGet("fuzzy", requestBuilder);

        return CommonQueryUtils.buildResponse(searchResponse);
    }

    /**
     * Wildcard search: find all records with an author whose name starts with "t"
     */
    public static Response<List<Book>> wildcard() {
        WildcardQueryBuilder wildcardQueryBuilder = new WildcardQueryBuilder(Constants.AUTHORS, "t*");
        HighlightBuilder highlightBuilder = new HighlightBuilder().field(Constants.AUTHORS, 200);

        SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
                                                    .setTypes(bookType).setQuery(wildcardQueryBuilder)
                                                    .setFetchSource(Constants.fetchFieldsTA, null)
                                                    .highlighter(highlightBuilder);

        SearchResponse searchResponse = requestGet("wildcard", requestBuilder);

        return CommonQueryUtils.buildResponse(searchResponse);
    }

    /**
     * Regular expression search
     * @return
     */
    public static Response<List<Book>> regexp() {
        String regexp = "t[a-z]*n";
        RegexpQueryBuilder queryBuilder = new RegexpQueryBuilder(Constants.AUTHORS, regexp);
        HighlightBuilder highlightBuilder = new HighlightBuilder().field(Constants.AUTHORS);

        SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
                                                    .setQuery(queryBuilder).setTypes(bookType).highlighter(highlightBuilder)
                                                    .setFetchSource(Constants.fetchFieldsTA, null);

        SearchResponse searchResponse = requestGet("regexp", requestBuilder);

        return CommonQueryUtils.buildResponse(searchResponse);
    }

    /**
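     * Phrase matching: both words must appear in the same field value and within slop positions of each other; they need not be adjacent. Matching just one of the analyzed terms is not enough
     *
     *  "summary":"Comprehensive guide to implementing a scalable search engine using Apache Solr",
     *      "summary":"A distributed real-time search and analytics engine",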
     * @return
     */
    public static Response<List<Book>> phrase() {
        MultiMatchQueryBuilder queryBuilder = new MultiMatchQueryBuilder("search engine")
                .field(Constants.SUMMARY)
                .type(MultiMatchQueryBuilder.Type.PHRASE).slop(3);

        SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType)
                                                    .setQuery(queryBuilder)
                                                    .setFetchSource(Constants.fetchFieldsTSPD, null);


        SearchResponse searchResponse = requestGet("phrase", requestBuilder);

        return CommonQueryUtils.buildResponse(searchResponse);
    }

    /**
     * Phrase prefix search
     * @return
     */
    public static Response<List<Book>> phrasePrefix() {
        MatchPhrasePrefixQueryBuilder queryBuilder = new MatchPhrasePrefixQueryBuilder(Constants.SUMMARY, "search en")
                .slop(3).maxExpansions(10);

        SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType)
                                                    .setQuery(queryBuilder).setFetchSource(Constants.fetchFieldsTSPD, null);

        SearchResponse searchResponse = requestGet("phrasePrefix", requestBuilder);

        return CommonQueryUtils.buildResponse(searchResponse);
    }
}
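The slop(3) setting used in phrase() can be illustrated without a cluster. The sketch below is a deliberate simplification: it handles only two terms in their original order and counts the tokens between them, whereas real sloppy phrase matching also allows reordering at extra cost. The tokenizer (lower-case, split on non-alphanumerics) and the class name PhraseSlopSketch are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class PhraseSlopSketch {
    /** Rough tokenizer: lower-case, then split on non-alphanumerics. */
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    /** True if first is followed by second with at most slop other tokens in between. */
    static boolean matches(String text, String first, String second, int slop) {
        List<String> t = tokens(text);
        for (int i = 0; i < t.size(); i++) {
            if (!t.get(i).equals(first)) {
                continue;
            }
            // j - i - 1 is the number of tokens sitting between the two terms
            for (int j = i + 1; j < t.size() && j - i - 1 <= slop; j++) {
                if (t.get(j).equals(second)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String solr = "Comprehensive guide to implementing a scalable search engine using Apache Solr";
        String guide = "A distributed real-time search and analytics engine";
        System.out.println(matches(solr, "search", "engine", 0));  // adjacent terms
        System.out.println(matches(guide, "search", "engine", 3)); // two tokens in between
        System.out.println(matches(guide, "search", "engine", 1)); // slop too small
    }
}
```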

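Next steps: look into the new features of 6.x and even 7.x, and how to implement all of this with the REST API.

  • Next, look at how to use it together with Kibana
  • How to use it together with Logstash
  • Advanced Elasticsearch
  • One last bit of knowledge: fuzzy
    Fuzzy search tolerates misspellings in the search text: it corrects the text and then tries to match the corrected form against the data in the index
    surprize --> a misspelling of surprise (s typed as z)
    surprize --> surprise: z -> s, one correction is enough to match, so it is within a fuzziness of 2
    surprize --> surprised: z -> s plus a trailing d, two corrections, so it also matches within a fuzziness of 2
    surprize --> surprising: z -> s, e -> i, then add n and g, four corrections in total, which exceeds a fuzziness of 2, so it can never be matched

In testing, fuzzy corrects at most two edits.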
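The edit counts above can be checked with a small edit-distance routine. This sketch uses plain Levenshtein distance (insert, delete, substitute); Elasticsearch additionally treats a swap of two adjacent characters as a single edit by default, which does not change the results for these words. The autoFuzziness helper reflects the documented behaviour of fuzziness AUTO (0 edits for 1 to 2 characters, 1 for 3 to 5, 2 for longer terms); the class and method names are hypothetical.

```java
class FuzzinessSketch {
    /** Classic Levenshtein distance computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                curr[j] = Math.min(substitution, Math.min(insertion, deletion));
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    /** Number of edits fuzziness AUTO allows for a term of the given length. */
    static int autoFuzziness(int termLength) {
        if (termLength <= 2) {
            return 0;
        }
        return termLength <= 5 ? 1 : 2;
    }

    public static void main(String[] args) {
        String query = "surprize";
        for (String indexed : new String[] {"surprise", "surprised", "surprising"}) {
            int distance = editDistance(query, indexed);
            boolean matches = distance <= autoFuzziness(query.length());
            System.out.println(query + " -> " + indexed + ": " + distance + " edits, matches=" + matches);
        }
    }
}
```

The same routine gives a distance of 2 between "elasticseares" and "elasticsearch", which is why the fuzzy() example earlier still finds the Elasticsearch titles.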
