行列互转问题
行转列
从数据的行拆出数据的列,即把观测的属性(行)拆成一个变量(列)。
附注
1.其实我一开始觉得这是列转行,因为观测的属性由列排布变成了行,但一般都称这种问题为行转列。
2.我对概念的理解可能有问题。
示例数据
| id | revenue | month |
|---|---|---|
| 1 | 8000 | Jan |
| 2 | 9000 | Jan |
| 3 | 1000 | Feb |
| 1 | 7000 | Feb |
| 1 | 6000 | Mar |
输出每个部门每个月的薪资,即结果表格结构为:
| id | Jan_Revenue | ... | Dec_Revenue |
|---|
允许空值。
错误解答
select id,case when month='Jan' then revenue end as Jan_Revenue,case when month='Feb' then revenue end as Feb_Revenue,case when month='Mar' then revenue end as Mar_Revenue,case when month='Apr' then revenue end as Apr_Revenue,case when month='May' then Revenue end as May_Revenue,case when month='Jun' then revenue end as Jun_Revenue,case when month='Jul' then revenue end as Jul_Revenue,case when month='Aug' then revenue end as Aug_Revenue,case when month='Sep' then revenue end as Sep_Revenue,case when month='Oct' then revenue end as Oct_Revenue,case when month='Nov' then revenue end as Nov_Revenue,case when month='Dec' then revenue end as Dec_Revenue
from department;
运行结果(部分):
| id | Jan_Revenue | Feb_Revenue | ... | Dec_Revenue |
|---|---|---|---|---|
| 1 | 8000 | NULL | ... | NULL |
| 2 | 9000 | NULL | ... | NULL |
| 3 | NULL | 10000 | ... | NULL |
| 1 | NULL | 7000 | ... | NULL |
| 1 | NULL | NULL | ... | NULL |
对于每个用户分别输出结果,不符合要求。
考虑合并结果,一个自然的想法是聚合函数(注意:sum(NULL)=0):
select id,sum(case when month='Jan' then revenue end) as Jan_Revenue,
sum(case when month='Feb' then revenue end )as Feb_Revenue,
sum(case when month='Mar' then revenue end) as Mar_Revenue,
sum(case when month='Apr' then revenue end) as Apr_Revenue,
sum(case when month='May' then Revenue end) as May_Revenue,
sum(case when month='Jun' then revenue end) as Jun_Revenue,
sum(case when month='Jul' then revenue end) as Jul_Revenue,
sum(case when month='Aug' then revenue end) as Aug_Revenue,
sum(case when month='Sep' then revenue end) as Sep_Revenue,
sum(case when month='Oct' then revenue end) as Oct_Revenue,
sum(case when month='Nov' then revenue end) as Nov_Revenue,
sum(case when month='Dec' then revenue end) as Dec_Revenue
from department
group by id;
运行结果:
| id | Jan_Revenue | Feb_Revenue | ... | Dec_Revenue |
|---|---|---|---|---|
| 1 | 8000 | 7000 | ... | NULL |
| 2 | 9000 | NULL | ... | NULL |
| 3 | NULL | 10000 | ... | NULL |
这一结果符合要求。
另外的写法:
select id,sum(if(month='Jan',revenue,null)) as Jan_Revenue,
sum(if(month='Feb',revenue,null))as Feb_Revenue,
sum(if(month='Mar',revenue,null)) as Mar_Revenue,
sum(if(month='Apr',revenue,null)) as Apr_Revenue,
sum(if(month='May',revenue,null)) as May_Revenue,
sum(if(month='Jun',revenue,null)) as Jun_Revenue,
sum(if(month='Jul',revenue,null)) as Jul_Revenue,
sum(if(month='Aug',revenue,null)) as Aug_Revenue,
sum(if(month='Sep',revenue,null)) as Sep_Revenue,
sum(if(month='Oct',revenue,null)) as Oct_Revenue,
sum(if(month='Nov',revenue,null)) as Nov_Revenue,
sum(if(month='Dec',revenue,null)) as Dec_Revenue
from department
group by id;
与case when的写法思路相同,稍微简洁一些。
思路总结
Step1. 用case when或if筛选出将要转为变量的属性。
Step2.利用分组聚合函数sum汇总行。
语句框架:
select xx,sum(case when yy='属性1'then zz end),...
from tb
group by xx;
select xx,sum(if(yy='属性1',zz,null)),...
from tb
group by xx;
列转行
列转行是行转列的逆过程。
把上述结果再转回原表格:
#为了方便,把行专列的结果存成一个临时表
with tb as(
select id,sum(if(month='Jan',revenue,null)) as Jan_Revenue,
sum(if(month='Feb',revenue,null))as Feb_Revenue,
sum(if(month='Mar',revenue,null)) as Mar_Revenue,
sum(if(month='Apr',revenue,null)) as Apr_Revenue,
sum(if(month='May',revenue,null)) as May_Revenue,
sum(if(month='Jun',revenue,null)) as Jun_Revenue,
sum(if(month='Jul',revenue,null)) as Jul_Revenue,
sum(if(month='Aug',revenue,null)) as Aug_Revenue,
sum(if(month='Sep',revenue,null)) as Sep_Revenue,
sum(if(month='Oct',revenue,null)) as Oct_Revenue,
sum(if(month='Nov',revenue,null)) as Nov_Revenue,
sum(if(month='Dec',revenue,null)) as Dec_Revenue
from department
group by id)
#列转行
select id,Jan_Revenue as Revenue,'Jan' as month
from tb
union all
select id,Feb_Revenue as Revenue,'Feb' as month
from tb
union all
select id,Mar_Revenue as Revenue,'Mar' as month
from tb
union all
select id,Apr_Revenue as Revenue,'Apr' as month
from tb
union all
select id,May_Revenue as Revenue,'May' as month
from tb
union all
select id,Jun_Revenue as Revenue,'Jun' as month
from tb
union all
select id,Jul_Revenue as Revenue,'Jul' as month
from tb
union all
select id,Aug_Revenue as Revenue,'Aug' as month
from tb
union all
select id,Sep_Revenue as Revenue,'Sep' as month
from tb
union all
select id,Oct_Revenue as Revenue,'Oct' as month
from tb
union all
select id,Nov_Revenue as Revenue,'Nov' as month
from tb
union all
select id,Dec_Revenue as Revenue,'Dec' as month
from tb;
运行结果:
| id | revenue | month |
|---|---|---|
| 1 | 8000 | Jan |
| 2 | 9000 | Jan |
| 3 | NULL | Jan |
| 1 | 7000 | Feb |
| 2 | NULL | Feb |
| 3 | 10000 | Feb |
| .. | .. | .. |
其中含有很多NULL,不是原来的表,考虑去除。
with tb as(
select id,sum(if(month='Jan',revenue,null)) as Jan_Revenue,
sum(if(month='Feb',revenue,null))as Feb_Revenue,
sum(if(month='Mar',revenue,null)) as Mar_Revenue,
sum(if(month='Apr',revenue,null)) as Apr_Revenue,
sum(if(month='May',revenue,null)) as May_Revenue,
sum(if(month='Jun',revenue,null)) as Jun_Revenue,
sum(if(month='Jul',revenue,null)) as Jul_Revenue,
sum(if(month='Aug',revenue,null)) as Aug_Revenue,
sum(if(month='Sep',revenue,null)) as Sep_Revenue,
sum(if(month='Oct',revenue,null)) as Oct_Revenue,
sum(if(month='Nov',revenue,null)) as Nov_Revenue,
sum(if(month='Dec',revenue,null)) as Dec_Revenue
from department
group by id)
#列转行
select *
from(
select id,Jan_Revenue as Revenue,'Jan' as month
from tb
union all
select id,Feb_Revenue as Revenue,'Feb' as month
from tb
union all
select id,Mar_Revenue as Revenue,'Mar' as month
from tb
union all
select id,Apr_Revenue as Revenue,'Apr' as month
from tb
union all
select id,May_Revenue as Revenue,'May' as month
from tb
union all
select id,Jun_Revenue as Revenue,'Jun' as month
from tb
union all
select id,Jul_Revenue as Revenue,'Jul' as month
from tb
union all
select id,Aug_Revenue as Revenue,'Aug' as month
from tb
union all
select id,Sep_Revenue as Revenue,'Sep' as month
from tb
union all
select id,Oct_Revenue as Revenue,'Oct' as month
from tb
union all
select id,Nov_Revenue as Revenue,'Nov' as month
from tb
union all
select id,Dec_Revenue as Revenue,'Dec' as month
from tb
)tb2
where revenue is not null;
运行结果:
| id | revenue | month |
|---|---|---|
| 1 | 8000 | Jan |
| 2 | 9000 | Jan |
| 1 | 7000 | Feb |
| 3 | 10000 | Feb |
| 1 | 6000 | Mar |
返回了原表。
思路总结
Step1 分别取出各观测各属性的数据,生成属性外其它列与属性列(示例中的month)。
Step1 用union all合并观测。
语句框架:
select xx,yy as 'name1',属性1 as 'name2'
from tb
union all
select xx,yy as 'name1',属性2 as 'name2'
from tb
union all
...