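In many applications we need to persist application objects (entities) to a database or a search engine. Lucene is a powerful search library that is often used to store and search large volumes of text. Lucene does not store objects directly, however; it works with documents made up of fields. We therefore need a mechanism that converts entity objects into documents Lucene understands, and back again.
My goal with the lucene-candy component is to provide a simple, intuitive way to map between Lucene indexes and the entity objects in our application. This mapping makes storing and retrieving data more efficient and lets developers spend less attention on Lucene internals.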

Usage example

The component uses annotations to map indexes to entity classes, for example:

@Data
@Index(value = "jdf_config_info")
public class ConfigInfo extends BaseEntity {

    /**
     * Configuration name
     */
    @Field(value = "config_name")
    private String configName;

    /**
     * Scope
     */
    @Field(value = "scope")
    private Integer scope;

    /**
     * Configuration code
     */
    @Field(value = "config_code")
    private String configCode;

    /**
     * Configuration value
     */
    @Field(value = "config_value")
    private String configValue;

    /**
     * Valid flag
     */
    @Field(value = "valid_flag", type = DataTypeEnum.INT)
    private Integer validFlag;

    /**
     * Whether synchronization is allowed
     */
    @Field(value = "sync_flag", type = DataTypeEnum.INT)
    private Integer syncFlag;

    /**
     * Remark
     */
    @Field(value = "remark")
    private String remark;

}

@Index(value = "jdf_config_info") maps the entity class ConfigInfo to the index "jdf_config_info".
@Field(value = "config_name") maps the property configName to the document field "config_name".

Implementation

Defining the annotations

As the sample above shows, two annotations need to be defined first: @Index and @Field.

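import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.ANNOTATION_TYPE})
public @interface Index {

    /**
     * Name of the index the entity maps to
     */
    String value() default "";

    /**
     * Analyzer used when writing to the index
     *
     * @return AnalyzerEnum
     */
    AnalyzerEnum analyzer() default AnalyzerEnum.STANDARD_ANALYZER;
}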

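import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.ANNOTATION_TYPE})
public @interface Field {

    /**
     * Document field name
     *
     * @return String
     */
    String value() default "";

    /**
     * Data type
     *
     * @return DataTypeEnum
     */
    DataTypeEnum type() default DataTypeEnum.STRING;

    /**
     * Whether the field is the unique identifier
     *
     * @return boolean
     */
    boolean isId() default false;

    /**
     * Whether the field is indexed; only takes effect for string fields
     *
     * @return boolean
     */
    boolean indexed() default false;

    /**
     * Whether the value is stored
     *
     * @return boolean
     */
    boolean stored() default true;

    /**
     * Whether the field can be used for grouping
     *
     * @return boolean
     */
    boolean groupBy() default false;

    /**
     * Fill strategies
     *
     * @return FieldFillEnum
     */
    FieldFillEnum[] fills() default {};
}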

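@Field describes what a property means on the Lucene side:

value: the name of the corresponding document field;
type: the data type. String, int, long, date and BigDecimal are supported so far; see cn.juque.lucenecandy.core.enums.DataTypeEnum;
isId: when true, the field is automatically assigned a UUID and serves as the unique identifier of each document;
indexed: whether the field is indexed; only takes effect for string fields;
stored: whether the value is stored. When false, the corresponding property is not populated in query results;
groupBy: whether the field can be used for grouping;
fills: the fill strategy, which specifies the automatic fill logic for the property. Nothing is filled by default; see cn.juque.lucenecandy.core.enums.FieldFillEnum.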
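
To make the role of these attributes concrete, here is a minimal sketch of how annotation attributes can be read at runtime with plain JDK reflection. DocField and Config are hypothetical, trimmed-down stand-ins written for this illustration; they are not the lucene-candy classes, and the component itself reads the attributes through Hutool's AnnotationUtil, as shown later in the article.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

class FieldAnnotationSketch {

    // Hypothetical, trimmed-down stand-in for lucene-candy's @Field annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface DocField {
        String value() default "";
        boolean stored() default true;
    }

    // Hypothetical entity used only for this illustration.
    static class Config {
        @DocField(value = "config_name")
        private String configName;

        @DocField(value = "remark", stored = false)
        private String remark;

        // Not annotated, so it is not mapped to any document field.
        private String transientNote;
    }

    /** Maps each annotated property name to the document field name it declares. */
    static Map<String, String> fieldNames(Class<?> type) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            DocField annotation = field.getAnnotation(DocField.class);
            if (annotation != null) {
                result.put(field.getName(), annotation.value());
            }
        }
        return result;
    }

    /** Maps each annotated property name to its declared stored flag. */
    static Map<String, Boolean> storedFlags(Class<?> type) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            DocField annotation = field.getAnnotation(DocField.class);
            if (annotation != null) {
                result.put(field.getName(), annotation.stored());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fieldNames(Config.class));
        System.out.println(storedFlags(Config.class));
    }
}
```

The annotation must be retained at runtime (RetentionPolicy.RUNTIME), otherwise getAnnotation returns null and the property would look unmapped.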

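Defining the base class

cn.juque.lucenecandy.core.base.BaseEntity defines the fields that every index must contain. These preset fields exist to support the lucene-candy API. By convention, preset field names start with an underscore, so business fields defined when using lucene-candy should avoid names that start with an underscore. The preset fields defined so far are:

_id: unique identifier;
_create_time: creation time;
_update_time: update time;
_version: version number;
_visible: whether the document is visible.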
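
The naming convention can be checked mechanically. The sketch below is a hypothetical helper, not part of lucene-candy: it takes the document field names an entity declares and reports those that start with an underscore, which is the prefix reserved for the preset fields listed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReservedFieldNameSketch {

    // The preset field names listed in the article.
    static final List<String> PRESET_FIELDS =
            Arrays.asList("_id", "_create_time", "_update_time", "_version", "_visible");

    /** True if the name is one of the preset fields. */
    static boolean isPreset(String fieldName) {
        return PRESET_FIELDS.contains(fieldName);
    }

    /** Returns the business field names that use the reserved underscore prefix. */
    static List<String> findViolations(List<String> businessFieldNames) {
        List<String> violations = new ArrayList<>();
        for (String name : businessFieldNames) {
            if (name.startsWith("_")) {
                violations.add(name);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("config_name", "_version", "_owner");
        System.out.println(findViolations(names));
    }
}
```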

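Parsing index information

After the application starts, lucene-candy loads and parses the index information of every class that extends BaseEntity, so each entity must extend BaseEntity for its index to work properly.
cn.juque.lucenecandy.runner.LuceneCandyRunner handles the global initialization of lucene-candy; it triggers the parsing of index information and loads the result into a cache. Note that index information is not parsed again once the application has started. For the caching logic see cn.juque.lucenecandy.cache.IndexInfoCache, which defines a Map holding all index information: the key is the class name of the entity mapped to the index, and the value is the index information. The cache is initialized as follows: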

    /**
     * Refresh the cache
     */
    @Override
    public synchronized void refresh() {
        Set<Class<?>> baseClassSet = ClassUtil.scanPackageBySuper("cn.juque.lucenecandy.core.base", BaseEntity.class);
        Set<Class<?>> classSet = ClassUtil.scanPackageBySuper(null, BaseEntity.class);
        List<Class<?>> classList = CollUtil.newArrayList(baseClassSet);
        classList.addAll(classSet);
        classSet = CollUtil.newHashSet(classList);
        for (Class<?> aClass : classSet) {
            String index = AnnotationUtil.getAnnotationValue(aClass, Index.class, StrConstant.VALUE);
            if (CharSequenceUtil.isEmpty(index)) {
                continue;
            }
            // Initialize the index mapping information
            IndexBO indexBO = new IndexBO();
            indexBO.setIndexName(index);
            indexBO.setClassName(ClassUtil.getClassName(aClass, false));
            Field[] fields = ReflectUtil.getFields(aClass);
            Map<String, FieldBO> fieldMap = new HashMap<>(fields.length);
            for (Field field : fields) {
                String fieldName = field.getName();
                Annotation annotation = AnnotationUtil.getAnnotation(field, cn.juque.lucenecandy.core.annotation.Field.class);
                if (Objects.isNull(annotation)) {
                    continue;
                }
                String indexFieldName = AnnotationUtil.getAnnotationValue(field, cn.juque.lucenecandy.core.annotation.Field.class, StrConstant.VALUE);
                DataTypeEnum dataType = AnnotationUtil.getAnnotationValue(field, cn.juque.lucenecandy.core.annotation.Field.class, StrConstant.TYPE);
                FieldFillEnum[] fills = AnnotationUtil.getAnnotationValue(field, cn.juque.lucenecandy.core.annotation.Field.class, StrConstant.FILLS);
                boolean isId = AnnotationUtil.getAnnotationValue(field, cn.juque.lucenecandy.core.annotation.Field.class, StrConstant.IS_ID);
                boolean store = AnnotationUtil.getAnnotationValue(field, cn.juque.lucenecandy.core.annotation.Field.class, StrConstant.STORED);
                boolean indexed = AnnotationUtil.getAnnotationValue(field, cn.juque.lucenecandy.core.annotation.Field.class, StrConstant.INDEXED);
                boolean groupBy = AnnotationUtil.getAnnotationValue(field, cn.juque.lucenecandy.core.annotation.Field.class, StrConstant.GROUP_BY);
                FieldBO fieldBO = new FieldBO();
                fieldBO.setFieldName(indexFieldName);
                fieldBO.setIsId(isId);
                fieldBO.setDataType(dataType);
                fieldBO.setStore(store ? org.apache.lucene.document.Field.Store.YES : org.apache.lucene.document.Field.Store.NO);
                fieldBO.setIndexed(indexed);
                fieldBO.setGroupBy(groupBy);
                fieldBO.setFills(CollUtil.newHashSet(fills));
                fieldMap.put(fieldName, fieldBO);
            }
            indexBO.setFieldMap(fieldMap);
            MAP.put(ClassUtil.getClassName(aClass, false), indexBO);
            log.debug("完成索引信息初始化:{}", ClassUtil.getClassName(aClass, false));
        }
    }
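
The core of this routine can be sketched with the JDK alone. The example below uses hypothetical stand-ins (DocIndex, DocField, Base, Config) rather than the real lucene-candy and Hutool classes, and takes the candidate classes as arguments instead of scanning the classpath. It shows two points that are easy to miss: classes without an index name are skipped, and fields are collected from the whole class hierarchy, so the preset fields inherited from the base class end up in the field map too. As far as I know, Hutool's ReflectUtil.getFields also includes superclass fields, which is what the explicit loop imitates.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IndexInfoCacheSketch {

    // Hypothetical stand-ins for @Index and @Field.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface DocIndex { String value() default ""; }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface DocField { String value() default ""; }

    // Hypothetical base class carrying one preset field.
    static class Base {
        @DocField("_id")
        private String id;
    }

    @DocIndex("jdf_config_info")
    static class Config extends Base {
        @DocField("config_name")
        private String configName;
    }

    // No @DocIndex, so it must not appear in the cache.
    static class NotAnIndex extends Base { }

    /** Collects declared fields of the class and of all its superclasses. */
    static List<Field> allFields(Class<?> type) {
        List<Field> fields = new ArrayList<>();
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            fields.addAll(Arrays.asList(c.getDeclaredFields()));
        }
        return fields;
    }

    /** Builds class name -> (property name -> document field name). */
    static Map<String, Map<String, String>> build(Class<?>... types) {
        Map<String, Map<String, String>> cache = new HashMap<>();
        for (Class<?> type : types) {
            DocIndex index = type.getAnnotation(DocIndex.class);
            if (index == null || index.value().isEmpty()) {
                continue; // not mapped to an index
            }
            Map<String, String> fieldMap = new HashMap<>();
            for (Field field : allFields(type)) {
                DocField annotation = field.getAnnotation(DocField.class);
                if (annotation != null) {
                    fieldMap.put(field.getName(), annotation.value());
                }
            }
            cache.put(type.getName(), fieldMap);
        }
        return cache;
    }

    public static void main(String[] args) {
        System.out.println(build(Config.class, NotAnIndex.class));
    }
}
```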

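Mapping an entity to a document

cn.juque.lucenecandy.helper.DocumentHelper encapsulates the mapping operations between entity classes and documents. An entity is mapped to a document as follows: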

public <T extends BaseEntity> Document toDocument(T entity) {
        IndexBO indexBO = this.indexInfoCache.get(ClassUtil.getClassName(entity, false), false);
        Map<String, FieldBO> fieldMap = indexBO.getFieldMap();
        Document document = new Document();
        for (Map.Entry<String, FieldBO> entry : fieldMap.entrySet()) {
            String fieldName = entry.getKey();
            FieldBO bo = entry.getValue();
            Object value = ReflectUtil.getFieldValue(entity, fieldName);
            if (Objects.isNull(value)) {
                continue;
            }
            switch (bo.getDataType()) {
                case STRING:
                    this.toDocumentStr(document, bo, value);
                    break;
                case DATE:
                    this.toDocumentDate(document, bo, value);
                    break;
                case LONG:
                    this.toDocumentLong(document, bo, value);
                    break;
                case INT:
                    this.toDocumentInt(document, bo, value);
                    break;
                case DECIMAL:
                    this.toDocumentDecimal(document, bo, value);
                    break;
                default:
                    break;
            }
        }
        return document;
    }

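Consequently, supporting a new data type also means adding a mapping for it here; otherwise the value falls through to the default branch and is silently left out of the document.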
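
One way to make a forgotten mapping visible is to look the conversion up in a registry and fail loudly when nothing is registered. The sketch below is a hypothetical alternative design, not what DocumentHelper does: DataType stands in for DataTypeEnum, DOUBLE plays the role of a newly added type, and values are converted to plain strings instead of Lucene fields so that the example runs without Lucene. Writing dates as epoch milliseconds is an assumption inferred from how toEntity parses them.

```java
import java.math.BigDecimal;
import java.util.Date;
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;

class TypeWriterRegistrySketch {

    // Hypothetical stand-in for DataTypeEnum; DOUBLE is the "new" type.
    enum DataType { STRING, INT, LONG, DATE, DECIMAL, DOUBLE }

    private static final Map<DataType, Function<Object, String>> WRITERS =
            new EnumMap<>(DataType.class);

    static {
        WRITERS.put(DataType.STRING, Object::toString);
        WRITERS.put(DataType.INT, value -> Integer.toString(((Number) value).intValue()));
        WRITERS.put(DataType.LONG, value -> Long.toString(((Number) value).longValue()));
        // Assumption: dates are written as epoch milliseconds.
        WRITERS.put(DataType.DATE, value -> Long.toString(((Date) value).getTime()));
        WRITERS.put(DataType.DECIMAL, value -> ((BigDecimal) value).toPlainString());
        // DOUBLE is deliberately not registered, to show the failure mode.
    }

    /** Converts a value for the given type, or throws if no mapping is registered. */
    static String write(DataType type, Object value) {
        Function<Object, String> writer = WRITERS.get(type);
        if (writer == null) {
            throw new IllegalStateException("No document mapping registered for " + type);
        }
        return writer.apply(value);
    }

    public static void main(String[] args) {
        System.out.println(write(DataType.LONG, 42L));
        System.out.println(write(DataType.DECIMAL, new BigDecimal("12.50")));
    }
}
```

With this shape, adding a data type without its mapping raises an exception the first time such a value is written, instead of quietly dropping the field.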

private void toDocumentLong(Document document, FieldBO bo, Object value) {
        // Parse once and reuse the value for all three Lucene fields
        long longValue = Long.parseLong(value.toString());
        document.add(new NumericDocValuesField(bo.getFieldName(), longValue));
        if (org.apache.lucene.document.Field.Store.YES.equals(bo.getStore())) {
            document.add(new StoredField(bo.getFieldName(), longValue));
        }
        document.add(new LongPoint(bo.getFieldName(), longValue));
    }

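The code above converts a long value into the corresponding document fields. It involves three Lucene APIs: NumericDocValuesField, StoredField and LongPoint. The implementation may look somewhat redundant, so let us first look at how the three differ:

NumericDocValuesField: used for sorting, aggregation and similar operations during search; its value is not returned as a stored field;
StoredField: stores the value so that it can be returned with search results;
LongPoint: used for range queries during search; its value is not returned as a stored field.
As these differences show, NumericDocValuesField and LongPoint only make the value sortable and searchable, and the original value can be read back only through StoredField. Writing the same value into all three fields therefore trades extra index space for convenience at search time. The other type conversions follow the same logic.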

Mapping a document to an entity

public <T extends BaseEntity> T toEntity(Document document, Class<T> tClass) {
        T entity;
        try {
            entity = ReflectUtil.newInstance(tClass);
        } catch (Exception e) {
            log.error(e.getMessage(), e);
            throw new AppException("【" + tClass.getName() + "】instance error");
        }
        IndexBO indexBO = this.indexInfoCache.get(ClassUtil.getClassName(entity, false), false);
        Map<String, FieldBO> fieldMap = indexBO.getFieldMap();
        for (Map.Entry<String, FieldBO> entry : fieldMap.entrySet()) {
            String fileName = entry.getKey();
            FieldBO bo = entry.getValue();
            String indexFileName = bo.getFieldName();
            String value = document.get(indexFileName);
            if (CharSequenceUtil.isEmpty(value)) {
                continue;
            }
            DataTypeEnum dataType = bo.getDataType();
            switch (dataType) {
                case STRING:
                    ReflectUtil.setFieldValue(entity, fileName, value);
                    break;
                case INT:
                    ReflectUtil.setFieldValue(entity, fileName, Integer.parseInt(value));
                    break;
                case LONG:
                    ReflectUtil.setFieldValue(entity, fileName, Long.parseLong(value));
                    break;
                case DATE:
                    ReflectUtil.setFieldValue(entity, fileName, new Date(Long.parseLong(value)));
                    break;
                case DECIMAL:
                    ReflectUtil.setFieldValue(entity, fileName, new BigDecimal(value));
                    break;
                default:
                    break;
            }
        }
        return entity;
    }
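
The type conversion at the heart of this method can be isolated and exercised without a Lucene index. The sketch below is a simplified illustration under stated assumptions: DataType is a hypothetical stand-in for DataTypeEnum, and the input string plays the role of the value returned by document.get. Unlike the method above, it throws on an unknown type instead of skipping it.

```java
import java.math.BigDecimal;
import java.util.Date;

class StoredValueParserSketch {

    // Hypothetical stand-in for DataTypeEnum.
    enum DataType { STRING, INT, LONG, DATE, DECIMAL }

    /** Converts the stored string back into the Java type of the property. */
    static Object parse(DataType type, String stored) {
        switch (type) {
            case STRING:
                return stored;
            case INT:
                return Integer.parseInt(stored);
            case LONG:
                return Long.parseLong(stored);
            case DATE:
                // The stored value is interpreted as epoch milliseconds.
                return new Date(Long.parseLong(stored));
            case DECIMAL:
                return new BigDecimal(stored);
            default:
                throw new IllegalArgumentException("Unsupported data type: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(DataType.INT, "42"));
        System.out.println(parse(DataType.DECIMAL, "12.50"));
    }
}
```

Integer.parseInt, Long.parseLong and the BigDecimal constructor all throw NumberFormatException on malformed input, so a field whose declared type does not match what was written to the index fails at read time.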

This completes the implementation of the mapping from entity to index and from index back to entity.

lucene-candy series: implementing the mapping between entities and indexes