[2022-10-07-mysql学习]行列互转问题

行列互转问题

行转列

从数据的行拆出数据的列,即把观测的属性(行)拆成一个变量(列)。
附注
1.其实我一开始觉得这是列转行,因为观测的属性由列排布变成了行,但一般都称这种问题为行转列。
2.我对概念的理解可能有问题。

示例数据

https://leetcode.cn/problems/reformat-department-table/

id revenue month
1 8000 Jan
2 9000 Jan
3 1000 Feb
1 7000 Feb
1 6000 Mar

输出每个部门每个月的薪资,即结果表格结构为:

id Jan_Revenue ... Dec_Revenue

允许空值。

错误解答

select id,case when month='Jan' then revenue end as Jan_Revenue,case when month='Feb' then revenue end as Feb_Revenue,case when month='Mar' then revenue end as Mar_Revenue,case when month='Apr' then revenue end as Apr_Revenue,case when month='May' then Revenue end as May_Revenue,case when month='Jun' then revenue end as Jun_Revenue,case when month='Jul' then revenue end as Jul_Revenue,case when month='Aug' then revenue end as Aug_Revenue,case when month='Sep' then revenue end as Sep_Revenue,case when month='Oct' then revenue end as Oct_Revenue,case when month='Nov' then revenue end as Nov_Revenue,case when month='Dec' then revenue end as Dec_Revenue
from department;

运行结果(部分):

id Jan_Revenue Feb_Revenue ... Dec_Revenue
1 8000 NULL ... NULL
2 9000 NULL ... NULL
3 NULL 10000 ... NULL
1 NULL 7000 ... NULL
1 NULL NULL ... NULL

对于每个用户分别输出结果,不符合要求。
考虑合并结果,一个自然的想法是聚合函数(注意:sum(NULL)=0):

select id,sum(case when month='Jan' then revenue end) as Jan_Revenue,
sum(case when month='Feb' then revenue end )as Feb_Revenue,
sum(case when month='Mar' then revenue end) as Mar_Revenue,
sum(case when month='Apr' then revenue end) as Apr_Revenue,
sum(case when month='May' then Revenue end) as May_Revenue,
sum(case when month='Jun' then revenue end) as Jun_Revenue,
sum(case when month='Jul' then revenue end) as Jul_Revenue,
sum(case when month='Aug' then revenue end) as Aug_Revenue,
sum(case when month='Sep' then revenue end) as Sep_Revenue,
sum(case when month='Oct' then revenue end) as Oct_Revenue,
sum(case when month='Nov' then revenue end) as Nov_Revenue,
sum(case when month='Dec' then revenue end) as Dec_Revenue
from department
group by id;

运行结果:

id Jan_Revenue Feb_Revenue ... Dec_Revenue
1 8000 7000 ... NULL
2 9000 NULL ... NULL
3 NULL 10000 ... NULL

这一结果符合要求。
另外的写法:

select id,sum(if(month='Jan',revenue,null)) as Jan_Revenue,
sum(if(month='Feb',revenue,null))as Feb_Revenue,
sum(if(month='Mar',revenue,null)) as Mar_Revenue,
sum(if(month='Apr',revenue,null)) as Apr_Revenue,
sum(if(month='May',revenue,null)) as May_Revenue,
sum(if(month='Jun',revenue,null)) as Jun_Revenue,
sum(if(month='Jul',revenue,null)) as Jul_Revenue,
sum(if(month='Aug',revenue,null)) as Aug_Revenue,
sum(if(month='Sep',revenue,null)) as Sep_Revenue,
sum(if(month='Oct',revenue,null)) as Oct_Revenue,
sum(if(month='Nov',revenue,null)) as Nov_Revenue,
sum(if(month='Dec',revenue,null)) as Dec_Revenue
from department
group by id;

与case when的写法思路相同,稍微简洁一些。

思路总结

Step1. 用case when或if筛选出将要转为变量的属性。
Step2.利用分组聚合函数sum汇总行。
语句框架:

select xx,sum(case when yy='属性1'then zz end),...
from tb
group by xx;

select xx,sum(if(yy='属性1',zz,null)),...
from tb
group by xx;

列转行

列转行是行转列的逆过程。
把上述结果再转回原表格:

#为了方便,把行专列的结果存成一个临时表
with tb as(
    select id,sum(if(month='Jan',revenue,null)) as Jan_Revenue,
sum(if(month='Feb',revenue,null))as Feb_Revenue,
sum(if(month='Mar',revenue,null)) as Mar_Revenue,
sum(if(month='Apr',revenue,null)) as Apr_Revenue,
sum(if(month='May',revenue,null)) as May_Revenue,
sum(if(month='Jun',revenue,null)) as Jun_Revenue,
sum(if(month='Jul',revenue,null)) as Jul_Revenue,
sum(if(month='Aug',revenue,null)) as Aug_Revenue,
sum(if(month='Sep',revenue,null)) as Sep_Revenue,
sum(if(month='Oct',revenue,null)) as Oct_Revenue,
sum(if(month='Nov',revenue,null)) as Nov_Revenue,
sum(if(month='Dec',revenue,null)) as Dec_Revenue
from department
group by id)
#列转行
select id,Jan_Revenue as Revenue,'Jan' as month
from tb
union all
select id,Feb_Revenue as Revenue,'Feb' as month
from tb
union all
select id,Mar_Revenue as Revenue,'Mar' as month
from tb
union all
select id,Apr_Revenue as Revenue,'Apr' as month
from tb
union all
select id,May_Revenue as Revenue,'May' as month
from tb
union all
select id,Jun_Revenue as Revenue,'Jun' as month
from tb
union all
select id,Jul_Revenue as Revenue,'Jul' as month
from tb
union all
select id,Aug_Revenue as Revenue,'Aug' as month
from tb
union all
select id,Sep_Revenue as Revenue,'Sep' as month
from tb
union all
select id,Oct_Revenue as Revenue,'Oct' as month
from tb
union all
select id,Nov_Revenue as Revenue,'Nov' as month
from tb
union all
select id,Dec_Revenue as Revenue,'Dec' as month
from tb;

运行结果:

id revenue month
1 8000 Jan
2 9000 Jan
3 NULL Jan
1 7000 Feb
2 NULL Feb
3 10000 Feb
.. .. ..

其中含有很多NULL,不是原来的表,考虑去除。

with tb as(
    select id,sum(if(month='Jan',revenue,null)) as Jan_Revenue,
sum(if(month='Feb',revenue,null))as Feb_Revenue,
sum(if(month='Mar',revenue,null)) as Mar_Revenue,
sum(if(month='Apr',revenue,null)) as Apr_Revenue,
sum(if(month='May',revenue,null)) as May_Revenue,
sum(if(month='Jun',revenue,null)) as Jun_Revenue,
sum(if(month='Jul',revenue,null)) as Jul_Revenue,
sum(if(month='Aug',revenue,null)) as Aug_Revenue,
sum(if(month='Sep',revenue,null)) as Sep_Revenue,
sum(if(month='Oct',revenue,null)) as Oct_Revenue,
sum(if(month='Nov',revenue,null)) as Nov_Revenue,
sum(if(month='Dec',revenue,null)) as Dec_Revenue
from department
group by id)
#列转行
select *
from(
select id,Jan_Revenue as Revenue,'Jan' as month
from tb
union all
select id,Feb_Revenue as Revenue,'Feb' as month
from tb
union all
select id,Mar_Revenue as Revenue,'Mar' as month
from tb
union all
select id,Apr_Revenue as Revenue,'Apr' as month
from tb
union all
select id,May_Revenue as Revenue,'May' as month
from tb
union all
select id,Jun_Revenue as Revenue,'Jun' as month
from tb
union all
select id,Jul_Revenue as Revenue,'Jul' as month
from tb
union all
select id,Aug_Revenue as Revenue,'Aug' as month
from tb
union all
select id,Sep_Revenue as Revenue,'Sep' as month
from tb
union all
select id,Oct_Revenue as Revenue,'Oct' as month
from tb
union all
select id,Nov_Revenue as Revenue,'Nov' as month
from tb
union all
select id,Dec_Revenue as Revenue,'Dec' as month
from tb
)tb2
where revenue is not null;

运行结果:

id revenue month
1 8000 Jan
2 9000 Jan
1 7000 Feb
3 10000 Feb
1 6000 Mar

返回了原表。

思路总结

Step1 分别取出各观测各属性的数据,生成属性外其它列与属性列(示例中的month)。
Step1 用union all合并观测。
语句框架:

select xx,yy as 'name1',属性1 as 'name2'
from tb
union all
select xx,yy as 'name1',属性2 as 'name2' 
from tb
union all
...
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
【社区内容提示】社区部分内容疑似由AI辅助生成,浏览时请结合常识与多方信息审慎甄别。
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

相关阅读更多精彩内容

友情链接更多精彩内容