POI: digits-only cells are read as double when getting cell contents

Background

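I ran into a problem in a project while importing an Excel file. One column holds strings made up entirely of digits (the goods tax classification code), for example 12345. When POI parsed the file, the cell was reported as a numeric type, so the Java code received 12345.0 as the cell content, and the regular expression that only allows letters and digits rejected it.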

Investigation

The cell in the imported file is formatted as General, so nothing is wrong there.

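While debugging I found that, at parse time, the cell is treated as numeric: cell.getCellType() == Cell.CELL_TYPE_NUMERIC is true. (Excel stores digits typed into a General cell as a number, so that is what POI reports.)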

    public static Object[] convertArrayByRow(Row row) {
        int cols = row.getLastCellNum();
        Object[] arr = new Object[cols];
        for (int i = 0; i < cols; i++) {
            Cell cell = row.getCell(i);
            if (cell == null) {
                continue;
            }
            if (cell.getCellType() == Cell.CELL_TYPE_STRING) {
                arr[i] = cell.getStringCellValue();
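            } else if (cell.getCellType() == Cell.CELL_TYPE_NUMERIC) {
                // A digits-only cell lands here and comes back as a double, e.g. 12345.0
                arr[i] = cell.getNumericCellValue();
            }
            // Any other cell type leaves arr[i] as null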
        }
        return arr;
    }

The value returned by cell.getNumericCellValue() is then 12345.0.
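The failure can be reproduced without POI. The sketch below is only an illustration: the pattern `^[A-Za-z0-9]+$` and the class and method names are assumptions, since the actual validation rule from the project is not shown here. It turns the double into text the way Java does by default and runs the check on it.

```java
import java.util.regex.Pattern;

class NumericCellRegexDemo {
    // Assumed rule: letters and digits only (the project's real pattern is not shown)
    static final Pattern ALNUM = Pattern.compile("^[A-Za-z0-9]+$");

    static boolean isValidCode(String value) {
        return ALNUM.matcher(value).matches();
    }

    public static void main(String[] args) {
        double fromCell = 12345;                   // stands in for cell.getNumericCellValue()
        String asText = String.valueOf(fromCell);  // Java renders this double as "12345.0"
        System.out.println(asText + " valid? " + isValidCode(asText));
        System.out.println("12345 valid? " + isValidCode("12345"));
    }
}
```

The dot in "12345.0" is neither a letter nor a digit, so the first check fails while the plain "12345" passes.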

Solution

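I searched online and this is indeed a known issue with the POI API. The commonly suggested fix:

Call cell.setCellType(Cell.CELL_TYPE_STRING) before reading the value, which manually switches the cell to the string type. The catch is that you cannot tell whether the cell's original content really was the number 12345.0 or whether a digits-only string was handed to you as a double. Blindly switching every cell to the string type can therefore cause problems.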
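The ambiguity can be made concrete without POI. This is a minimal sketch, not the approach used in this post: it swaps in java.math.BigDecimal to render a double as plain text, and the class and method names are made up for illustration. Once the value is a double, 12345 and 12345.0 produce the same text, so only the caller can know whether text or a number was intended.

```java
import java.math.BigDecimal;

class PlainNumberText {
    // Renders a finite double without a trailing ".0" and without scientific notation.
    // BigDecimal.valueOf throws NumberFormatException for NaN or infinite values.
    static String toPlainText(double value) {
        return BigDecimal.valueOf(value).stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(toPlainText(12345));    // whole number
        System.out.println(toPlainText(12345.0));  // same double, same text
        System.out.println(toPlainText(12.5));     // fractional part is kept
    }
}
```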
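My solution is to decide at parse time based on the type of the bean field the cell will be converted to. After parsing the Excel row I already use reflection to read the column's type from an annotation, so I can branch on that directly. Here is the code: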

    public static <T> T convertBeanFromRow(Row row, Class<T> clazz) {
        T entity;
        try {
            int cols = row.getLastCellNum();
            Object[] arr = new Object[cols];

            entity = clazz.newInstance();
            Field[] fields = clazz.getDeclaredFields();
            // The bean expects more columns than the row actually has: parsing fails, return null
            if (fields.length > arr.length) {
                return null;
            }
            for (Field field : fields) {
                // Only fields annotated with @ExcelCell are mapped to a column
                if (!field.isAnnotationPresent(ExcelCell.class)) {
                    continue;
                }

                field.setAccessible(true);
                ExcelCell anno = field.getAnnotation(ExcelCell.class);
                Class<?> cellType = anno.type();
                int col = anno.col();

                Cell cell = row.getCell(col);
                if (cell == null) {
                    continue;
                }
                if (cell.getCellType() == Cell.CELL_TYPE_STRING) {
                    arr[col] = cell.getStringCellValue();
                } else if (cell.getCellType() == Cell.CELL_TYPE_NUMERIC) {
                    if (cellType.isAssignableFrom(String.class)) {
                        // The bean wants a String: read the numeric cell as text
                        cell.setCellType(Cell.CELL_TYPE_STRING);
                        arr[col] = cell.getStringCellValue();
                    } else {
                        // The bean wants a number: keep the double
                        arr[col] = cell.getNumericCellValue();
                    }
                }

                // anno.type() is never null (annotation members always have a value),
                // so the value is always converted to the declared type
                field.set(entity, numericByStr(cellType, arr[col]));
            }
            return entity;
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    }

The core idea: when the cell type is numeric, make one more check:

                else if (cell.getCellType() == Cell.CELL_TYPE_NUMERIC) {
                    if (cellType.isAssignableFrom(String.class)){
                        cell.setCellType(Cell.CELL_TYPE_STRING);
                        arr[col] = cell.getStringCellValue();
                    } else {
                        arr[col] = cell.getNumericCellValue();
                    }
                }
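The same decision can be sketched without POI so it can be run on its own. Everything here is an assumption made for illustration: rawNumeric stands in for cell.getNumericCellValue(), targetType stands in for anno.type(), and the BigDecimal rendering replaces the setCellType call used in the post.

```java
import java.math.BigDecimal;

class CellValueByTargetType {
    // Returns text when the target field is a String, otherwise the original double.
    static Object read(double rawNumeric, Class<?> targetType) {
        if (targetType.isAssignableFrom(String.class)) {
            // Stand-in for "read the numeric cell as a string"
            return BigDecimal.valueOf(rawNumeric).stripTrailingZeros().toPlainString();
        }
        return rawNumeric; // autoboxed to Double
    }

    public static void main(String[] args) {
        System.out.println(read(12345, String.class));
        System.out.println(read(12345, Double.class));
    }
}
```

Note that isAssignableFrom(String.class) is also true for Object.class and CharSequence.class, so a column declared with one of those types would take the string branch as well. If only String should qualify, comparing with targetType == String.class is stricter.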

Where does cellType come from? From here:

    ExcelCell anno = field.getAnnotation(ExcelCell.class);
    Class<?> cellType = anno.type();

    @Documented
    @Inherited
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface ExcelCell {
        int col();
        Class<?> type() default String.class;
    }

    @Data
    public class GoodsTaxClassificationCodeRequest implements Serializable {
        /**
         * Goods name
         */
        @ExcelCell(col = 0)
        String goodsName;

        /**
         * Goods ID
         */
        @ExcelCell(col = 1)
        String goodsId;

        /**
         * Tax classification code
         */
        @ExcelCell(col = 2)
        String taxClassificationCode;

    }

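The type in the annotation defaults to String. If a column is numeric, declare the type explicitly on the corresponding field of the request class.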
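A small self-contained sketch of what declaring the type explicitly could look like. The PriceRow bean, its price field, and the declaredCellType helper are hypothetical and not part of the project. The annotation is redeclared as a nested type only so the example compiles on its own, and whether numericByStr handles Double.class is an assumption.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

class ExcelCellTypeDemo {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface ExcelCell {
        int col();
        Class<?> type() default String.class;
    }

    // Hypothetical bean: one column left at the String default, one declared numeric
    static class PriceRow {
        @ExcelCell(col = 0)
        String goodsId;

        @ExcelCell(col = 1, type = Double.class)
        Double price;
    }

    // Looks up the type declared in @ExcelCell for the named field
    static Class<?> declaredCellType(Class<?> beanClass, String fieldName) {
        try {
            Field field = beanClass.getDeclaredField(fieldName);
            return field.getAnnotation(ExcelCell.class).type();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No such field: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(declaredCellType(PriceRow.class, "goodsId"));
        System.out.println(declaredCellType(PriceRow.class, "price"));
    }
}
```

With this arrangement, a digits-only code column needs no extra configuration, and only genuinely numeric columns carry an explicit type.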

Afterthoughts

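After writing this up I noticed a possible gap: if someone enters 12345.0 in this column, the value is read as a string, the .0 is dropped, and validation has no way to reject it. I then tried it in Excel. With the cell formatted as General you cannot actually enter 12345.0: as soon as you click elsewhere it turns into 12345. So the case I was worried about does not arise.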

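In general, string values like this are easier to handle when they contain at least one letter. Business cases with digits-only strings are unavoidable, though.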

Reference: https://stackoverflow.com/questions/1072561/how-can-i-read-numeric-strings-in-excel-cells-as-string-not-numbers