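Reshaping and Pivoting
There are a number of basic operations for rearranging tabular data. These functions are also called reshape or pivot operations.
Reshaping and pivoting transform the structure of a table or vector so that it is suitable for further analysis.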
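Reshaping with Hierarchical Indexing
Hierarchical indexing provides a consistent way to rearrange the data in a DataFrame. There are two main operations:
- stack: columns to rows; "rotates" the column index of the data into the row index (used less often, typically to turn a DataFrame into a hierarchically indexed Series)
- unstack: rows to columns; "rotates" the row index of the data into the column index (used often, typically to turn a hierarchically indexed Series into a DataFrame)
The two are inverse operations of each other.
A Series has no stack method.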
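To make the inverse relationship concrete, here is a minimal sketch that stacks a small frame and unstacks it again. The name `frame` and its labels and values are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Illustrative frame; the labels and values are assumptions for this sketch.
frame = pd.DataFrame(
    np.arange(6).reshape((2, 3)),
    index=pd.Index(['r1', 'r2'], name='row'),
    columns=pd.Index(['a', 'b', 'c'], name='col'),
)

stacked = frame.stack()       # column labels become the innermost row level (a Series)
restored = stacked.unstack()  # innermost row level goes back to the columns (a DataFrame)

print(type(stacked).__name__)
print(stacked.index.names)
print(restored)
```

The stacked object carries both original axis names as index level names, which is what lets unstack rebuild the original layout.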
In [1]:
import numpy as np
import pandas as pd
In [2]:
# A DataFrame whose row index and column index both have a name
data = pd.DataFrame(
np.arange(6).reshape((2, 3)),
index=pd.Index(['Ohio', 'Colorado'], name='state'),
columns=pd.Index(['one', 'two', 'three'], name='number')
)
data
Out[2]:
| number | one | two | three |
|---|---|---|---|
| state | |||
| Ohio | 0 | 1 | 2 |
| Colorado | 3 | 4 | 5 |
In [3]:
data.index
Out[3]:
Index(['Ohio', 'Colorado'], dtype='object', name='state')
In [4]:
data.columns
Out[4]:
Index(['one', 'two', 'three'], dtype='object', name='number')
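stack: columns to rows; rotates the column index of the data into the row index
The column index is rotated counterclockwise to become a row index level.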
In [5]:
data
Out[5]:
| number | one | two | three |
|---|---|---|---|
| state | |||
| Ohio | 0 | 1 | 2 |
| Colorado | 3 | 4 | 5 |
In [7]:
result = data.stack()
result
Out[7]:
state number
Ohio one 0
two 1
three 2
Colorado one 3
two 4
three 5
dtype: int32
In [8]:
type(result)
Out[8]:
pandas.core.series.Series
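Rows to columns: unstack rotates a level of a Series' hierarchical index clockwise into the column index
A hierarchically indexed Series can be rearranged into a DataFrame with unstack.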
In [10]:
result
Out[10]:
state number
Ohio one 0
two 1
three 2
Colorado one 3
two 4
three 5
dtype: int32
In [11]:
result.unstack()
Out[11]:
| number | one | two | three |
|---|---|---|---|
| state | |||
| Ohio | 0 | 1 | 2 |
| Colorado | 3 | 4 | 5 |
In [12]:
data
Out[12]:
| number | one | two | three |
|---|---|---|---|
| state | |||
| Ohio | 0 | 1 | 2 |
| Colorado | 3 | 4 | 5 |
In [13]:
data.unstack()
Out[13]:
number state
one Ohio 0
Colorado 3
two Ohio 1
Colorado 4
three Ohio 2
Colorado 5
dtype: int32
In [17]:
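# result.stack()  # raises AttributeError: a Series has no columns, so it has no stack method
By default, unstack operates on the innermost level (and so does stack). Pass a level number or a level name to unstack a different level.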
In [18]:
result
Out[18]:
state number
Ohio one 0
two 1
three 2
Colorado one 3
two 4
three 5
dtype: int32
In [24]:
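result.unstack()         # default: rotate the innermost level
result.unstack(0)        # rotate level 0, the outermost level
result.unstack(1)        # rotate level 1 (the innermost level here)
result.unstack('state')  # a level name works as well; only this last expression is displayed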
Out[24]:
| state | Ohio | Colorado |
|---|---|---|
| number | ||
| one | 0 | 3 |
| two | 1 | 4 |
| three | 2 | 5 |
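
The level choices above can be compared directly. The sketch below rebuilds `data` and `result` as in the earlier cells and checks that selecting a level by number or by name gives the same frame; the variable names `by_position`, `by_name`, and `innermost` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(6).reshape((2, 3)),
    index=pd.Index(['Ohio', 'Colorado'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'),
)
result = data.stack()

by_position = result.unstack(0)    # outermost level, selected by number
by_name = result.unstack('state')  # the same level, selected by name
innermost = result.unstack(-1)     # -1 is the default level, i.e. the innermost

print(by_position.equals(by_name))
print(innermost.equals(result.unstack()))
print(by_name.columns.name)
```

Selecting by name tends to be more robust than selecting by position when the order of the index levels may change.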
Transpose
The row index and the column index swap positions.
In [25]:
data
Out[25]:
| number | one | two | three |
|---|---|---|---|
| state | |||
| Ohio | 0 | 1 | 2 |
| Colorado | 3 | 4 | 5 |
In [26]:
data.unstack()
Out[26]:
number state
one Ohio 0
Colorado 3
two Ohio 1
Colorado 4
three Ohio 2
Colorado 5
dtype: int32
In [27]:
data.unstack().unstack()
Out[27]:
| state | Ohio | Colorado |
|---|---|---|
| number | ||
| one | 0 | 3 |
| two | 1 | 4 |
| three | 2 | 5 |
In [28]:
data.T  # shortcut for transposing
Out[28]:
| state | Ohio | Colorado |
|---|---|---|
| number | ||
| one | 0 | 3 |
| two | 1 | 4 |
| three | 2 | 5 |
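
As a hedged check that unstacking twice matches the transpose for this frame, the sketch below compares the two results cell by cell. The name `twice` is an assumption introduced for this example.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(6).reshape((2, 3)),
    index=pd.Index(['Ohio', 'Colorado'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'),
)

# First unstack: DataFrame -> Series indexed by (number, state).
# Second unstack: the innermost level (state) moves to the columns.
twice = data.unstack().unstack()

# Compare by label so the check does not depend on row or column order.
same_cells = all(
    twice.loc[n, s] == data.T.loc[n, s]
    for n in data.columns
    for s in data.index
)
print(same_cells)
print(twice.index.name, twice.columns.name)
```

In practice `data.T` is the clearer way to express this; the double unstack is mainly useful for understanding how the levels move.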
If not every level value can be found in each subgroup, unstack introduces missing data.
In [29]:
s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])
data2
Out[29]:
one a 0
b 1
c 2
d 3
two c 4
d 5
e 6
dtype: int64
In [30]:
data2.unstack()
Out[30]:
| | a | b | c | d | e |
|---|---|---|---|---|---|
| one | 0.0 | 1.0 | 2.0 | 3.0 | NaN |
| two | NaN | NaN | 4.0 | 5.0 | 6.0 |
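
The values above are shown as floats because NaN cannot be stored in an integer column. If a placeholder value is acceptable, `unstack` accepts a `fill_value` argument. The sketch below is a minimal illustration; the choice of 0 as the fill value and the names `with_nan` and `filled` are assumptions made for this example.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])

with_nan = data2.unstack()            # missing combinations become NaN
filled = data2.unstack(fill_value=0)  # missing combinations become 0 instead

print(with_nan.isna().sum().sum())    # number of cells that had no source value
print(filled)
```

A fill value hides the difference between "no data" and a real 0, so it is only appropriate when that distinction does not matter for the analysis.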
By default, stack filters out missing data, so the operation is invertible.
In [32]:
data2.unstack().stack()
Out[32]:
one a 0.0
b 1.0
c 2.0
d 3.0
two c 4.0
d 5.0
e 6.0
dtype: float64
In [33]:
data2.unstack().stack(dropna=False)  # keep the missing values
Out[33]:
one a 0.0
b 1.0
c 2.0
d 3.0
e NaN
two a NaN
b NaN
c 4.0
d 5.0
e 6.0
dtype: float64
Pivoting: a combined exercise
When unstack is applied to a DataFrame, the level being rotated becomes the lowest (innermost) level of the result.
In [34]:
df = pd.DataFrame({'left': result, 'right': result + 5}, columns=pd.Index(['left', 'right'], name='side'))
df
Out[34]:

[Output image not preserved: image.png]
In [36]:
df.unstack()  # the innermost row level becomes the innermost column level
Out[36]:

[Output image not preserved: image.png]
In [38]:
df.unstack('state')  # select the level explicitly by name
Out[38]:

[Output image not preserved: image.png]
In [41]:
df.unstack('state').stack()  # two rotations: the row index levels swap positions
Out[41]:

[Output image not preserved: image.png]
In [43]:
df.unstack('state').stack('side')
Out[43]:

[Output image not preserved: image.png]
In [42]:
df
Out[42]:

[Output image not preserved: image.png]
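Since the rendered outputs of this exercise are not available as text, the following sketch rebuilds `df` as in the cells above and inspects the level names after each rotation. The names `by_state` and `swapped` are assumptions introduced for this example.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(6).reshape((2, 3)),
    index=pd.Index(['Ohio', 'Colorado'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'),
)
result = data.stack()
df = pd.DataFrame(
    {'left': result, 'right': result + 5},
    columns=pd.Index(['left', 'right'], name='side'),
)

# Rotate 'state' from the rows to the columns; it lands below 'side'.
by_state = df.unstack('state')
print(by_state.columns.names)

# Rotate 'side' from the columns back to the rows; it lands below 'number'.
swapped = by_state.stack('side')
print(swapped.index.names, swapped.columns.name)
```

In both steps the rotated level ends up as the innermost level on the axis it moves to, which matches the rule stated at the start of this exercise.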