Hive 递归查询组织架构 递归遍历树

需求

hive中有一张表存储所有省市地区信息或者公司组织信息, 查询所有最末枝的地区/组织,如下所示, 查询结果应该是沙河镇, 马池口镇, 中关村,上地,济南市

北京市
   昌平区
       沙河镇
       马池口镇
    海淀区
        中关村
        上地
山东省
    济南市
......

创建hive 模拟表

create table area (id int, name string, parent_id int);

插入数据

with area_info as
 (select 1 as id, '北京市'     as name, 0 as parent_id union all
  select 2 as id, '山东省'     as name, 0 as parent_id union all
  select 3 as id, '昌平区'     as name, 1 as parent_id union all
  select 4 as id, '海淀区'     as name, 1 as parent_id union all
  select 5 as id, '沙河镇'     as name, 3 as parent_id union all
  select 6 as id, '马池口镇'    as name, 3 as parent_id union all
  select 7 as id, '中关村'     as name, 4 as parent_id union all
  select 8 as id, '上地'       as name, 4 as parent_id union all
  select 9 as id, '烟台市'     as name, 2 as parent_id union all
  select 10 as id, '即墨区'    as name, 9 as parent_id union all
  select 11 as id, '牟平区'    as name, 9 as parent_id union all
  select 12 as id, '济南市'    as name, 2 as parent_id
)
insert overwrite table area select * from area_info;

重点是SQL

通过Hive的CTE实现组织树的遍历

  1. 假如有N层架构, 就要写N-1个CTE子句
  2. 需要注意保留没有子节点的父节点
with p1 as (select t1.* from area t1 --子
join area t2 --父
on t1.parent_id = t2.id 
where t2.name in ('北京市', '山东省')),
p2 as (
--所有子节点
select t3.* from area t3 --子
 join p1 --父
on t3.parent_id = p1.id
union all
--没有子节点的父节点
select p1.* from area t3 --子
 right join p1 --父
on t3.parent_id = p1.id
where t3.id is null
)
select * from p2 ;

结果

p2.id p2.name p2.parent_id
7 中关村 4
8 上地 4
5 沙河镇 3
6 马池口镇 3
10 即墨区 9
11 牟平区 9
12 济南市 2

以上, 如有更好的写法, 请评论, 大家一起交流。

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容