01. SQLboy 一些常见的SQL处理应用

本篇为视频篇的md文稿,说明:
本篇不归纳总结,以3道SQL题为例,主要介绍一下在数据仓库中一些常见的SQL处理应用

一. 窗口应用:用户连续消费的最大月份
二. 函数应用:提取Json中的key值
三. 分组应用:混合排序

一. 窗口应用:用户连续消费的最大月份

有表log_consume,字段 uid order_time order_price

数据:

A 2023-05-01 100

A 2023-06-02 100

A 2023-06-03 100

B 2023-01-01 100

B 2023-03-01 100

B 2023-04-01 100

B 2023-05-01 100

需求: 现在要统计用户连续消费的最大月份数

  1. 首先对uid,order_time(年-月)去重
  2. 窗口,偏移量函数(lag)对order_time开窗,对uid分组,偏移量为1 as lag_order_time
  3. 对两个时间点进行差计算, 如果结果为1,说明就是连续的(考虑跨年),if判断
  4. sun窗口,对时间差进行开窗,对uid进行分组
  5. 输出结果
with x as(
SELECT 
uid,
order_month,
(year(order_month) - year(lag_order_month) ) * 12 + month(order_month) - month(lag_order_month) as month_sum
FROM (
SELECT 
uid,
order_month,
lag(order_month,1,order_month) over(partion by uid order by order_month asc) as lag_order_month
FROM (
SELECT uid,concat(substr(order_time,1,7),'-1') as order_month
FROM log_consume
GROUP BY concat(substr(order_time,1,7),'-1')
    ) t1 
    ) t2
    ) 

SELECT 
uid ,k,group_concat(order_month)
FROM 
(
SELECT 
uid,
order_month,
sum(tag) over(partion by uid order by order_month asc) as k
FROM
(
SELECT 
uid,
order_month,
IF(month_sum = 1 ,0,1) as tag
FROM 
x
) t1 
    ) t2 
    GROUP BY uid ,k

二. 函数应用:提取Json中的key值

[{“website”:“baidu.com”,“name”:“百度”},{“website”:“google.com”,“name”:“谷歌”}]

  1. get_json_object
  2. json_tuple
select 
good_friend,friend_1 ,friend_2, friend_3
from 
table
lateral view json_tuple(str,'friend') as good_friend
lateral view json_tuple(good_friend,'friend_1','friend_2','friend_3') tb as friend_1 ,friend_2, friend_3
  1. explode,replace,posexplode
主要的思路正则替换,炸裂函数
SELECT 
pos + 1 as rn 
val 
FROM 
(
regexp_replace(translate('[{“website”:“baidu.com”,“name”:“百度”},{“website”:“google.com”,“name”:“谷歌”}]','[}{}""',''),'\,','\:') as str
) t1
lateral view posexplode(str,':') t2 as pas ,val

三. 分组应用:混合排序

sum() over() 
sum() over(partion by ....)
sum() over(partion by  order by ....)

sum() over() as rn 标签值
  1. 按city分类
  2. 按score降序,此时只考虑为A B C 不考虑A-1 A-2
  3. 如果存在A-1 A-2 之类,按照id升序

排序前的数据

id city name score

1 a A 100

2 a A-1 80

3 b C 70

4 a A-2 90

5 b D 85

6 b B 75

7 b E 30

8 b B-1 50

9 a F 95

10 b G 65

排序后的数据

id city name score

1 a A 100

2 a A-1 80

4 a A-2 90

9 a F 95

5 b D 85

6 b B 75

8 b B-1 50

3 b C 70

10 b G 65

7 b E 30

with x as (
select 
id , city, name, score  ,if(instr(name,'-'),substr(1,1),name) as base_name
from 
order_data
) t1

select 
x.id , x.city, x.name, x.score
from 
order_data join x on order_data.name = x.base_name
order by x.city,order_data.score,x.id
©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容