Series

How the pandas data types relate: 0-D scalar -> 1-D Series -> 2-D DataFrame -> 3-D DataFrame with a hierarchical index

[Figure: wd.png]

import numpy as np
import pandas as pd

Creation

From a list
From a dictionary
Other ways

Creating a Series from a list

a = pd.Series([2,3,4,5,6])
a

0    2
1    3
2    4
3    5
4    6
dtype: int64

# Custom index
b = pd.Series([2,3,4,5,6], index=['a', 'b', 'c', 'd', 'e'])
b

a    2
b    3
c    4
d    5
e    6
dtype: int64

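A Series can store values of different types

It can, but storing a single type is recommended: a Series behaves like a column in a table, and a column should hold one type of data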

c = pd.Series([18, 28, 38])
c

0    18
1    28
2    38
dtype: int64

c2 = pd.Series(['张三', 18, 85.5, True])
c2

0      张三
1      18
2    85.5
3    True
dtype: object
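The difference between the two outputs above can be checked directly. The sketch below is illustrative only: the names `ages` and `mixed` are not from the original, and `map(type)` is used simply to reveal what each element is stored as.

```python
import pandas as pd

# Homogeneous integers get a numeric dtype.
ages = pd.Series([18, 28, 38])
print(ages.dtype)                  # int64

# Mixed values fall back to the generic object dtype;
# every element keeps its original Python type.
mixed = pd.Series(['张三', 18, 85.5, True])
print(mixed.dtype)                 # object
print(mixed.map(type).tolist())    # str, int, float, bool
```

With an object dtype, pandas can no longer treat the column as one block of numbers, which is one practical reason to keep a Series to a single type.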

Creating a Series from a dictionary

# The dictionary keys become the index
d = pd.Series({
    'name': '张三',
    'age': 18,
    'gender': True,
})

d

name        张三
age         18
gender    True
dtype: object

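# An index passed at creation takes precedence over the dictionary keys:
# keys not listed are dropped, and listed labels missing from the dictionary become NaN
d2 = {
    'name': '张三',
    'age': 18,
    'gender': True,
}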

d = pd.Series(d2, index=['name', 'age', 'score'])
d

name      张三
age       18
score    NaN
dtype: object
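The `NaN` in the output above marks a label that was requested but has no matching dictionary key. As a minimal sketch (the names `info` and `s` are illustrative, and the dictionary is shortened), `isna()` and `dropna()` can be used to find and discard such entries:

```python
import pandas as pd

info = {'name': '张三', 'age': 18}

# 'score' is requested in the index but absent from the dictionary.
s = pd.Series(info, index=['name', 'age', 'score'])

print(s.isna())      # True only for 'score'
print(s.dropna())    # keeps 'name' and 'age'
```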

Other ways

# From a scalar
pd.Series(5)
pd.Series(5, index=[1,2,3,4,5])

1    5
2    5
3    5
4    5
5    5
dtype: int64

# From a sequence

range(5)
list(range(5))
for i in range(5):
    print(i)
    
pd.Series(range(5))

0
1
2
3
4

0    0
1    1
2    2
3    3
4    4
dtype: int64

# From NumPy's sequence function

np.arange(5)
np.arange(2, 5)
np.arange(9, 5, -1)

array([9, 8, 7, 6])

pd.Series(
    np.arange(4),
    index=np.arange(9, 5, -1)
)

9    0
8    1
7    2
6    3
dtype: int32
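The `int32` in the output above is probably a platform default for `np.arange` on the author's machine (an assumption; other systems commonly show `int64`). If a specific width matters, a sketch like the following pins it explicitly through the `dtype` argument:

```python
import numpy as np
import pandas as pd

s = pd.Series(
    np.arange(4, dtype=np.int64),   # request int64 regardless of platform
    index=np.arange(9, 5, -1),      # labels 9, 8, 7, 6
)

print(s.dtype)             # int64
print(s.index.tolist())    # [9, 8, 7, 6]
```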

Querying

class1 = pd.Series([95, 25, 59, 90, 61], index=['ming', 'hua', 'hong', 'huang', 'bai'])
class1

ming     95
hua      25
hong     59
huang    90
bai      61
dtype: int64

Querying the shape

The shape of 1-D data is simply the number of values it holds

class1.shape
class1.shape[0]
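The two expressions above produce no visible output in the original. A short sketch of what they return, alongside the related `len()`, `size` and `ndim` (these three are additions for comparison, not part of the original):

```python
import pandas as pd

class1 = pd.Series([95, 25, 59, 90, 61],
                   index=['ming', 'hua', 'hong', 'huang', 'bai'])

print(class1.shape)      # (5,)  a one-element tuple
print(class1.shape[0])   # 5
print(len(class1))       # 5
print(class1.size)       # 5
print(class1.ndim)       # 1
```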

Querying values and index

A Series consists of two parts, the values and the index, each backed by an ndarray

# Query the values
class1.values

array([95, 25, 59, 90, 61], dtype=int64)

type(class1.values)

numpy.ndarray

# Query the index
class1.index

Index(['ming', 'hua', 'hong', 'huang', 'bai'], dtype='object')

class1.index.values  # the index is also an array underneath

array(['ming', 'hua', 'hong', 'huang', 'bai'], dtype=object)

# Query a single index label or value

class1.values[2], class1.index[2], class1.index.values[2]

(59, 'hong', 'hong')
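Besides the `.values` attribute used above, pandas also offers `to_numpy()` on both Series and Index, which the pandas documentation recommends for getting an ndarray. A minimal sketch (the names `values` and `labels` are illustrative):

```python
import numpy as np
import pandas as pd

class1 = pd.Series([95, 25, 59, 90, 61],
                   index=['ming', 'hua', 'hong', 'huang', 'bai'])

values = class1.to_numpy()          # the data as an ndarray
labels = class1.index.to_numpy()    # the index labels as an ndarray

print(type(values))
print(values[2], labels[2])         # 59 hong
```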

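Querying values

Look up values by index
- index lookup
- slice lookup
Look up the index by condition
- boolean lookup

Index lookup

Both index lookups and slice lookups retrieve values by index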

class1

ming     95
hua      25
hong     59
huang    90
bai      61
dtype: int64

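Looking up a single value

# A Series has two index systems: the default (positional) index and the custom (label) index
class1['hong']  # custom index
# Default index. Recent pandas versions deprecate integer positions in [] when
# the index holds labels; class1.iloc[2] is the explicit equivalent.
class1[2]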
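The two index systems can also be addressed explicitly. The sketch below uses the standard `.loc` (label) and `.iloc` (position) accessors, which the original does not use; they remove any ambiguity about which of the two systems a key refers to.

```python
import pandas as pd

class1 = pd.Series([95, 25, 59, 90, 61],
                   index=['ming', 'hua', 'hong', 'huang', 'bai'])

print(class1.loc['hong'])   # 59, looked up by label
print(class1.iloc[2])       # 59, looked up by position
```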

Looking up multiple values

class1[['hua', 'huang']]

hua      25
huang    90
dtype: int64

class1[[1, 3]]

hua      25
huang    90
dtype: int64

# class1[[1, 'hong']]  # the two index systems cannot be mixed

Slice lookup

class1

ming     95
hua      25
hong     59
huang    90
bai      61
dtype: int64

# Default index: the start is included, the end is excluded
class1[:3]  
class1[2:]
class1[1:4]

hua      25
hong     59
huang    90
dtype: int64

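# Custom index: both the start and the end are included
# Reason: custom labels have no inherent order, so it is hard to tell which label comes just before or after another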
class1['hua':'huang']

hua      25
hong     59
huang    90
dtype: int64
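The endpoint rule above can be verified side by side. This sketch again assumes the explicit `.iloc` and `.loc` accessors (not used in the original) so that each slice is unambiguously positional or label-based:

```python
import pandas as pd

class1 = pd.Series([95, 25, 59, 90, 61],
                   index=['ming', 'hua', 'hong', 'huang', 'bai'])

by_pos = class1.iloc[1:4]               # positions 1, 2, 3; end excluded
by_label = class1.loc['hua':'huang']    # 'hua' through 'huang'; end included

print(by_pos.index.tolist())     # ['hua', 'hong', 'huang']
print(by_pos.equals(by_label))   # True
```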

# Step
class1[::2]

ming    95
hong    59
bai     61
dtype: int64

# Reverse order
class1[::-1]

bai      61
huang    90
hong     59
hua      25
ming     95
dtype: int64

Boolean lookup

Look up the index from values

Look up the index from a condition

class1

ming     95
hua      25
hong     59
huang    90
bai      61
dtype: int64

# Plain Python approach
for i in class1:
#     print(i)
    if i < 60:
        print(i)

25
59

# Boolean lookup
class1[[True, True, False, False, False]]

ming    95
hua     25
dtype: int64

class1 < 60

ming     False
hua       True
hong      True
huang    False
bai      False
dtype: bool

# Boolean lookup; the boolean mask is generated automatically by the comparison
class1[class1 < 60]

hua     25
hong    59
dtype: int64
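Boolean masks like the one above can be combined. A minimal sketch, with illustrative variable names: `&`, `|` and `~` are the element-wise and/or/not operators, and each comparison needs its own parentheses because these operators bind more tightly than `<` or `>=`.

```python
import pandas as pd

class1 = pd.Series([95, 25, 59, 90, 61],
                   index=['ming', 'hua', 'hong', 'huang', 'bai'])

# Two conditions joined with "and".
middle = class1[(class1 >= 59) & (class1 < 90)]

# Negating a mask.
passing = class1[~(class1 < 60)]

# Recovering only the index labels that satisfy a condition.
failing_names = class1[class1 < 60].index.tolist()

print(middle.to_dict())     # {'hong': 59, 'bai': 61}
print(passing.to_dict())    # {'ming': 95, 'huang': 90, 'bai': 61}
print(failing_names)        # ['hua', 'hong']
```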

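Vectorized operations

Vector operations act on all elements at once (conceptually in parallel), with no explicit loop

Add 5 points for every student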

x = {
    'ming': 95,
    'hua': 25,
    'hong': 59,
    'huang': 90,
    'bai': 61,
}

x

{'ming': 95, 'hua': 25, 'hong': 59, 'huang': 90, 'bai': 61}

Plain Python dictionary operation

# Plain Python: iterate over the dictionary and compute each value
# Slow and inefficient
for i in x:
    print(i, x[i]+5)

ming 100
hua 30
hong 64
huang 95
bai 66

pandas vectorized operation

class1 = pd.Series([95, 25, 59, 90, 61], index=['ming', 'hua', 'hong', 'huang', 'bai'])
class1

ming     95
hua      25
hong     59
huang    90
bai      61
dtype: int64

# Vectorized operation: no loop needed, fast and efficient
class1 + 5

ming     100
hua       30
hong      64
huang     95
bai       66
dtype: int64
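That the loop and the vectorized expression give the same result can be checked directly. In this sketch the names `looped` and `vectorized` are illustrative; note also that `class1 + 5` returns a new Series and leaves `class1` unchanged.

```python
import pandas as pd

x = {'ming': 95, 'hua': 25, 'hong': 59, 'huang': 90, 'bai': 61}
class1 = pd.Series(x)

# Plain Python: visit every entry explicitly.
looped = {name: score + 5 for name, score in x.items()}

# pandas: one expression applied to the whole Series.
vectorized = class1 + 5

print(vectorized.to_dict() == looped)   # True
print(class1['ming'])                   # 95, the original is untouched
```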

Applying functions as vectorized operations

x

{'ming': 95, 'hua': 25, 'hong': 59, 'huang': 90, 'bai': 61}

class1

ming     95
hua      25
hong     59
huang    90
bai      61
dtype: int64

Computing the total score

# Plain Python approach
y = 0
for i in x:
    y = y + x[i]
    
y

# pandas vectorized approach
class1.sum()  # pandas method
np.sum(class1)  # NumPy function

Computing the average score

# Plain Python approach
y = 0
for i in x:
    y = y + x[i]
    
y/len(x)

66.0

# pandas vectorized approach
class1.sum() / class1.shape[0]   # shape[0] reads the length of the first dimension
class1.mean()
np.mean(class1)

66.0
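Sum and mean are two of several built-in aggregations. As a short sketch, the same pattern extends to `max()`, `min()`, and to `idxmax()` / `idxmin()`, which return the index label of the extreme value rather than the value itself (these four are additions, not part of the original):

```python
import pandas as pd

class1 = pd.Series([95, 25, 59, 90, 61],
                   index=['ming', 'hua', 'hong', 'huang', 'bai'])

print(class1.sum())      # 330
print(class1.mean())     # 66.0
print(class1.max())      # 95
print(class1.idxmax())   # ming
print(class1.min())      # 25
print(class1.idxmin())   # hua
```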

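NumPy-array-like and Python-dict-like operations

pandas data supports most NumPy operations (pandas is built on NumPy, so they carry over)
It also supports some native Python list and dict operations (only those that pandas implements)
- the in keyword
- the .get() method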

x

{'ming': 95, 'hua': 25, 'hong': 59, 'huang': 90, 'bai': 61}

class1

ming     95
hua      25
hong     59
huang    90
bai      61
dtype: int64

NumPy-array-like operations

np.mean(class1)  # NumPy function

66.0

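Python-dict-like operations

in keyword: checks whether an index label exists
get method: looks up an index label, returning its value if it exists and the given default otherwise

# dict approach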
'ming' in x
'hei' in x

False

# pandas approach
'ming' in class1
'hei' in class1

False

# dict approach
x.get('ming', 60)
x.get('hei', 60)

# pandas approach
class1.get('ming', 60)
class1.get('hei', 60)
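The `get` calls above show no output in the original. The sketch below prints the results and also makes one point explicit: like a dictionary, `in` tests the index labels, not the values. Checking for a value via `to_numpy()` is an addition for contrast.

```python
import pandas as pd

class1 = pd.Series([95, 25, 59, 90, 61],
                   index=['ming', 'hua', 'hong', 'huang', 'bai'])

print('ming' in class1)            # True: 'ming' is an index label
print(95 in class1)                # False: 95 is a value, not a label
print(95 in class1.to_numpy())     # True: membership among the values

print(class1.get('ming', 60))      # 95
print(class1.get('hei', 60))       # 60, the supplied default
print(class1.get('hei'))           # None when no default is given
```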

Modifying

class1

ming     95
hua      25
hong     59
huang    90
bai      61
dtype: int64

Modifying values

class1['hua']  # look up

class1['hua'] = 35
class1

ming     95
hua      35
hong     59
huang    90
bai      61
dtype: int64

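# Modify multiple values

class1[['hua', 'hong']]

class1[['hua', 'hong']] = [32, 59.5]
class1['hua', 'hong'] = [33, 59.6]  # assignment also works with single brackets here; the list form above is unambiguous
class1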
class1

ming     95.0
hua      33.0
hong     59.6
huang    90.0
bai      61.0
dtype: float64
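In the output above the dtype changed from `int64` to `float64` because a fractional score was assigned. A hedged sketch of a more explicit variant: cast to float first, then assign through `.loc` with a list of labels. The name `scores` and the use of `astype` and `.loc` are choices made for this example, not the original's approach.

```python
import pandas as pd

scores = pd.Series([95, 25, 59, 90, 61],
                   index=['ming', 'hua', 'hong', 'huang', 'bai'])

# Cast up front so fractional scores fit without an implicit dtype change.
scores = scores.astype('float64')

# Assign several labels at once.
scores.loc[['hua', 'hong']] = [33, 59.6]

print(scores.dtype)       # float64
print(scores['hua'])      # 33.0
print(scores['hong'])     # 59.6
```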

Modifying the index

class1.index  # the index
class1.index[0]  # a single index label
# class1.index[0] = 'xiaoming'  # raises an error: an Index cannot be modified in place

'ming'

class1.index.values  # the index of a Series is an array underneath

array(['ming', 'hua', 'hong', 'huang', 'bai'], dtype=object)

class1.index.values[0]

class1.index.values[0] = 'xiaoming'  # wrong approach: mutates the underlying index array directly, see below
class1

xiaoming    95.0
hua         33.0
hong        59.6
huang       90.0
bai         61.0
dtype: float64

class1['hua']
class1['ming']
# class1['xiaoming']  # raises an error: after mutating the underlying array directly, the new label cannot be looked up

33.0

The right way: use the rename method to change individual index labels

class11 = class1.rename({'xiaoming': '小明', 'hong': '小红'})
class11

小明       95.0
hua      33.0
小红       59.6
huang    90.0
bai      61.0
dtype: float64

class11['小明']

95.0

class1  # rename did not modify the original

xiaoming    95.0
hua         33.0
hong        59.6
huang       90.0
bai         61.0
dtype: float64
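Two related ways of changing labels, sketched here on a shortened Series named `s` (illustrative): `rename` also accepts a function that is applied to every label, and assigning a new list to `.index` replaces all labels at once, which is allowed because it swaps in a whole new Index instead of mutating the old one.

```python
import pandas as pd

s = pd.Series([95, 25, 59], index=['ming', 'hua', 'hong'])

# rename with a function: returns a new Series, s itself is unchanged.
renamed = s.rename(str.upper)
print(renamed.index.tolist())    # ['MING', 'HUA', 'HONG']
print(s.index.tolist())          # ['ming', 'hua', 'hong']

# Replace every label in one assignment; the length must match.
s.index = ['a', 'b', 'c']
print(s['a'])                    # 95
```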

Hierarchical index on a Series (for reference)

A hierarchical index adds a dimension to a Series

class1

xiaoming    95.0
hua         33.0
hong        59.6
huang       90.0
bai         61.0
dtype: float64

class2 = pd.Series([95, 25, 59, 90, 61], index=[['ming', 'ming', 'hong', 'huang', 'bai'], [2,4,6,8,10]])
class2

ming   2     95
       4     25
hong   6     59
huang  8     90
bai    10    61
dtype: int64

Querying a hierarchical index

# Look up the outermost level first
class2['ming']

2    95
4    25
dtype: int64

type(class2['ming'])

pandas.core.series.Series

# Then look up the inner level
class2['ming'][2]

class2['ming', 2]  # recommended: comma-separated, each level is one dimension

type(class2['ming', 2])

numpy.int64
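The same lookups can be written with `.loc`, which always interprets keys as labels (here the inner labels are integers, so this avoids any confusion with positions). A minimal sketch, assuming the same `class2` as above; `nlevels` and `get_level_values` are added only to inspect the index.

```python
import pandas as pd

class2 = pd.Series(
    [95, 25, 59, 90, 61],
    index=[['ming', 'ming', 'hong', 'huang', 'bai'], [2, 4, 6, 8, 10]],
)

print(class2.index.nlevels)                        # 2
print(class2.index.get_level_values(0).tolist())   # outer labels, one per row

print(class2.loc['ming'].tolist())                 # [95, 25]
print(class2.loc[('ming', 4)])                     # 25
```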

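Converting a hierarchically indexed Series to a DataFrame

The unstack() method pivots the inner index level of a Series into the column index of a DataFrame

Rows to columns: the rows of the data are "rotated" into columns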

class2

ming   2     95
       4     25
hong   6     59
huang  8     90
bai    10    61
dtype: int64

class2.unstack()

[Image: image.png, the DataFrame returned by class2.unstack()]
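Since the result above survives only as an image, here is a small sketch of what `unstack()` produces for `class2`. Outer labels become rows, inner labels become columns, and combinations that do not occur in the Series are filled with `NaN` (or with `fill_value` if one is given). The names `wide` and `filled` are illustrative.

```python
import pandas as pd

class2 = pd.Series(
    [95, 25, 59, 90, 61],
    index=[['ming', 'ming', 'hong', 'huang', 'bai'], [2, 4, 6, 8, 10]],
)

wide = class2.unstack()                 # missing combinations become NaN
filled = class2.unstack(fill_value=0)   # or a chosen placeholder

print(wide.shape)                       # (4, 5): 4 outer labels, 5 inner labels
print(wide.loc['ming', 2])              # 95.0
print(pd.isna(wide.loc['hong', 2]))     # True
print(filled.loc['hong', 2])            # 0
```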
