目录树结构处理优化过程小记

一、问题描述

生产环境树结构目录处理响应过慢，单次请求都要到4秒多，更别说压测了。考虑后续可能会引发生产事故，所以决定优化一下。

二、场景描述

实际场景是查询一个课程的目录结构，目录是一个树结构，数据存储是用id-pid的方式存储的，pid的根是-1。测试环境目录层级比较少，未发现响应慢。生产环境有一个课程目录比较多，而且有多层。所以出现了响应过慢的情况。

三、原始代码

public List<CourseCatalog> getByCourseId(Long courseId) {
  //获取一级目录
  List<CourseCatalog> catalogList = this.list(new QueryWrapper<CourseCatalog>().lambda()
                                              .eq(CourseCatalog::getCourseId, courseId).eq(CourseCatalog::getParentId, -1).orderByAsc(CourseCatalog::getSort));
  //处理一级目录文件
  getFile(catalogList);
  //处理子目录
  getChild(catalogList);
  return catalogList;
}

private void getFile(List<CourseCatalog> courseCatalogList) {
  if (CollectionUtil.isEmpty(courseCatalogList)) {
    return;
  }
  courseCatalogList.forEach(courseCatalog -> {
    List<CourseFile> courseFileList = courseFileService.getByCatalogId(courseCatalog.getId());
    courseCatalog.setFile(courseFileList);
  });
}

private List<CourseCatalog> getChild(List<CourseCatalog> catalogList) {
  if (CollectionUtil.isEmpty(catalogList)) {
    return null;
  }
  catalogList.forEach(courseCatalog -> {
    List<CourseCatalog> courseCatalogList = this.getByParentId(courseCatalog.getId());
    if (CollectionUtil.isNotEmpty(courseCatalogList)) {
      //处理目录下文件
      getFile(courseCatalogList);
      courseCatalog.setChild(courseCatalogList);
      getChild(courseCatalog.getChild());
    }
  });
  return catalogList;
}

原始代码是通过遍历递归的方式处理每个子目录，经过对原始代码的分析发现有如下可以优化的点：

getFile()和getChild实际上可以合并成一个方法，减少遍历次数；
每个目录都可以单独处理，与遍历顺序无关，可以采用Java8的并行流；

四、改造后代码——递归方式

private void getChild(List<CourseCatalog> catalogList) {
  if (CollUtil.isEmpty(catalogList)) {
    return;
  }

  catalogList.parallelStream().forEach(catalog -> {
    // 处理课程文件列表
    List<CourseFile> courseFileList = courseFileService.getByCatalogId(catalog.getId());
    catalog.setFile(courseFileList);

    List<CourseCatalog> courseCatalogList = this.getByParentId(catalog.getId());
    if (CollUtil.isNotEmpty(courseCatalogList)) {
      // 递归处理子目录
      handleChild(courseCatalogList);
      catalog.setChild(courseCatalogList);
    }
  });
}

实现相对简单，但随着递归次数不断增加，可能会在处理大数据量时导致栈溢出风险。因此，可以考虑使用迭代的方式来实现遍历逻辑。

五、改造后代码——栈

每次将当前目录的子目录入栈，遍历到下一级目录时就将下一级目录入栈，回溯时弹出栈顶元素。可以按照以下步骤实现代码：

创建一个栈来保存遍历的目录。
首先将顶层目录加入栈中。
进入迭代循环，每次从栈顶弹出一个目录，处理它的文件和子目录，将子目录入栈。
直到遍历完整个树形目录结构为止。

private void getChild(List<CourseCatalog> catalogList) {
  if (CollUtil.isEmpty(catalogList)) {
    return;
  }

  // 1. 创建一个栈来保存遍历的目录
  Deque<CourseCatalog> stack = new LinkedList<>();
  // 2. 将顶层目录加入栈中
  catalogList.forEach(stack::push);

  // 3. 进入迭代循环，每次从栈顶弹出一个目录，处理它的文件和子目录，将子目录入栈
  while (!stack.isEmpty()) {
    CourseCatalog catalog = stack.pop();
    List<CourseFile> courseFileList = courseFileService.getByCatalogId(catalog.getId());
    catalog.setFile(courseFileList);

    List<CourseCatalog> courseCatalogList = this.getByParentId(catalog.getId());
    // 将子目录入栈
    if (CollUtil.isNotEmpty(courseCatalogList)) {
      courseCatalogList.forEach(stack::push);
      catalog.setChild(courseCatalogList);
    }
  }
}

六、总结

将原代码换成并行递归遍历后，响应时间变成了2秒；
换成栈遍历后，又回到了4秒；
数据表加索引，pid和catelogId，响应时间变成了几百毫秒；

这么看，原代码加索引应该可以变快，不过在整个过程中还是有收获的。

目前多次查询会导致性能瓶颈，后续再深入优化，可以考虑减少遍历循环内的查询次数或加缓存等；

目录树结构处理优化过程小记——2023.05.23