title: An R Markdown document converted from "04.ipynb"
output: html_document
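I. The Pandas library
Pandas is a data analysis library built on top of NumPy.
Unlike NumPy, which is best suited to homogeneous numeric arrays, Pandas is designed for tabular or heterogeneous data, and it provides a large set of mathematical functions and computation methods.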
import numpy as np
import pandas as pd
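The difference between homogeneous and heterogeneous data can be seen in a small sketch. The names `arr` and `scores` and the values are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# A NumPy array has a single dtype for all elements; mixing numbers and
# strings converts every element to a string type.
arr = np.array([[80, "A"], [70, "B"]])
print(arr.dtype.kind)   # 'U' means unicode string

# A DataFrame keeps a separate dtype for each column.
scores = pd.DataFrame({"Maths": [80, 70], "Music": ["A", "B"]})
print(scores.dtypes)
```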
II. Pandas data structures: Series and DataFrame
1. Series: index and values
a = pd.Series([1, 2, 3, 4, 5])
a
a.index
a.values
a = pd.Series([1, 2, 3, 4, 5], index = ['a', 'b', 'c', 'd', 'e'])
a
a.index
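# Building a Series from an existing Series reorders the values by the given labels
a_reindex = pd.Series(a, index = ['e', 'b', 'c', 'd', 'a'])
a_reindex
# Labels that do not exist in `a` ('f', 'g') are filled with NaN
a_reindex = pd.Series(a, index = ['e', 'b', 'c', 'd', 'a','f', 'g'])
a_reindex
a
# reindex returns a new Series; `a` itself is unchanged
b = a.reindex(['e', 'b', 'c', 'd', 'a','f', 'g'])
b
a
# rename also returns a new Series; the result is discarded here, so `a` keeps its labels
a.rename(index={'a':'h','b':'i','c':'j','d':'k','e':'l'})
a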
# Assigning to .index relabels in place; the values keep their positions
a.index = ['e', 'b', 'c', 'd', 'a']
a
b = np.array(a)
b
c = pd.Series(b)
c
data = {'yuwen': 80, 'yingyu': 90, 'shuxue': 80}
data
type(data)
data_ = pd.Series(data,index = ['yingyu','yuwen','shuxue'])
data_1 = pd.Series(data)
data_
data_1
data_.index
data_.name = 'Score'
data_.index.name = 'Course'
data_
data_.index
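The three ways of changing Series labels used above behave differently. The following is a minimal sketch with illustrative names (`s`, `reordered`, `renamed`) that are not from the original notebook.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# reindex: select/reorder by label; an unknown label ('z') yields NaN
reordered = s.reindex(["c", "a", "z"])

# rename: same values, label 'a' becomes 'x'
renamed = s.rename(index={"a": "x"})

# Neither call modified the original Series
print(list(s.index))   # ['a', 'b', 'c']

# Assigning to .index relabels in place; the new list must have the same length
s.index = ["x", "y", "z"]
print(s)
```

Roughly speaking, `reindex` moves values to follow the labels, while `rename` and assignment to `.index` change the labels and leave the values where they are.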
2. DataFrame: index, columns, and values
data = np.array([[95, 96, 97],
                 [80, 85, 86],
                 [56, 65, 70]])
data
data1 = np.array(1)
data1
frame = pd.DataFrame(data)
frame
frame = pd.DataFrame(data, index=['xiaoming', 'xiaohong', 'xiaohei'],
                     columns=['yuwen', 'yingyu', 'shuxue'])
frame
# Passing an existing DataFrame with index/columns selects and reorders by label
frame_ = pd.DataFrame(frame, index=['xiaohong', 'xiaoming', 'xiaohei'],
                      columns=['yingyu', 'yuwen', 'shuxue'])
frame_
# Labels not present in `frame` ('xiaobai', 'tiyu') are filled with NaN
frame__ = pd.DataFrame(frame, index=['xiaohong', 'xiaoming', 'xiaohei', 'xiaobai'],
                       columns=['yingyu', 'yuwen', 'shuxue', 'tiyu'])
frame__
# reindex and rename return new objects; `frame_` is unchanged after each call
frame_.reindex(index=['xiaohong', 'xiaoming', 'xiaohei', 'xiaobai'],
               columns=['yingyu', 'yuwen', 'shuxue', 'tiyu'])
frame_
frame_.rename(index={"xiaohong":"damao","xiaoming":"ermao","xiaohei":"Nicolas Cage"},
              columns={"yingyu":"English", "yuwen":"Literature", "shuxue":"Maths"})
frame_
frame_.index = ['damao','ermao','Nicolas Cage']
frame_.columns = ['English', 'Literature', 'Maths']
frame_
data = {"English":[80,70,60],
        "Literature":[70,70,85],
        "Maths":[80,90,50],
        "Music":["A","B","C"]}
type(data)
df = pd.DataFrame(data)
df
df = pd.DataFrame(data, index = ["alpha", "beta","theta"])
df
df.index
df.columns
# Unlike Series, DataFrame has no built-in `name`; this only sets a plain Python attribute
df.name = 'Score'
df.index.name = 'Person'
df.columns.name = 'Course'
df
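Recap:
1. Structure of Series and DataFrame
2. Ways to set or modify the index
index, columns: set the labels at construction; if the data already has labels, they are selected and reordered to match
reindex: build a new index or reorder the existing one with the reindex method
rename: change index labels
Series.index = []
DataFrame.columns = []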
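The same label operations on a DataFrame can be sketched as follows. The frame contents and the names `expanded` and `relabeled` are illustrative assumptions; `fill_value` is an optional `reindex` parameter that the notebook itself does not use.

```python
import pandas as pd

frame = pd.DataFrame({"yuwen": [95, 80], "yingyu": [96, 85]},
                     index=["xiaoming", "xiaohong"])

# reindex rows and columns at once; the new row 'xiaobai' is filled with 0
expanded = frame.reindex(index=["xiaohong", "xiaoming", "xiaobai"],
                         columns=["yingyu", "yuwen"], fill_value=0)

# rename returns a relabeled copy; `frame` keeps its original column names
relabeled = frame.rename(columns={"yuwen": "Literature", "yingyu": "English"})

print(expanded)
print(relabeled)
```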
df.info()
str(df)
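Notes:
tuple: a fixed-length, immutable sequence of Python objects
list: a variable-length sequence whose contents can be modified
ndarray: an efficient multidimensional container for homogeneous data, with convenient arithmetic operations and broadcasting
DataFrame: a heterogeneous table in which each column can hold a different value type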
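The four containers in the notes above can be compared side by side. This is only a sketch; all variable names and values are illustrative.

```python
import numpy as np
import pandas as pd

point = (1, 2)                 # tuple: fixed length, immutable
try:
    point[0] = 10
except TypeError as err:
    print("tuple:", err)

items = [1, 2]                 # list: can grow and be modified
items.append(3)
items[0] = 10

arr = np.array([1, 2, 3])      # ndarray: one dtype, vectorized arithmetic
print(arr * 2 + 1)             # the scalars are broadcast over every element

table = pd.DataFrame({"n": [1, 2], "label": ["a", "b"]})
print(table.dtypes)            # one dtype per column
```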
III. Series and DataFrame operations
1. Basic arithmetic
s1 = pd.Series([1, 2, 3],
               index = ['a','b','c'])
s1
s1 - 1
s2 = pd.Series([4, 5, 6],
               index = ['b','c','e'])
s2
s1 + s2
data = {"English":[80,70,60],
        "Literature":[70,70,85],
        "Maths":[80,90,50],
        "Music":["A","B","C"]}
df = pd.DataFrame(data,index = ["alpha", "beta","theta"])
df
df * 2
data1 = {"English":[80,70,60],
         "Literature":[70,70,85],
         "Maths":[80,90,50],}
df1 = pd.DataFrame(data1,index = ["alpha", "beta","theta"])
df1
df + df1
add_ = {'Maths':10,'English':10,'Literature':20,'Gym':"A"}
add_ = pd.Series(add_)
add_
df1
add_
df1 + add_
add1_ = {'alpha':10,'beta':10,'theta':20,}
add1_ = pd.Series(add1_)
add1_
df1+add1_
df1
df1.add(add1_,axis='index')
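Arithmetic in this section matches values by label rather than by position. The sketch below uses small illustrative data; `fill_value` and the names `plain`, `filled`, `bonus`, and `by_row` are additions for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([4, 5, 6], index=["b", "c", "e"])

plain = s1 + s2                     # labels present on one side only give NaN
filled = s1.add(s2, fill_value=0)   # treat the missing side as 0 instead

df1 = pd.DataFrame({"English": [80, 70], "Maths": [80, 90]},
                   index=["alpha", "beta"])
bonus = pd.Series({"alpha": 10, "beta": 20})

# By default a Series is matched against the column labels; axis="index"
# matches it against the row labels instead.
by_row = df1.add(bonus, axis="index")

print(plain)
print(filled)
print(by_row)
```

With plain `df1 + bonus` the Series labels would be compared with the column names, none would match, and every cell would come out as NaN.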
2. Matrix operations and universal functions
df
df.T
df
# Expected to fail: the 'Music' column holds strings, which cannot be squared
np.square(df)
np.square(df1)
3. Basic statistical methods
df.max(axis=0)
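# numeric_only=True skips the string column 'Music'; recent pandas versions raise a TypeError without it
df.mean(axis=1, numeric_only=True)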
df.describe()
df.info()
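The `axis` argument decides which direction a statistic is computed along. The sketch below rebuilds the score table so that it runs on its own; the names `col_max` and `row_mean` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"English": [80, 70, 60],
                   "Literature": [70, 70, 85],
                   "Maths": [80, 90, 50],
                   "Music": ["A", "B", "C"]},
                  index=["alpha", "beta", "theta"])

# axis=0 works down the rows and gives one result per column
col_max = df.max(axis=0, numeric_only=True)

# axis=1 works across the columns and gives one result per row
row_mean = df.mean(axis=1, numeric_only=True)

print(col_max)
print(row_mean)
```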
IV. Indexing and slicing Series and DataFrame
1. Series indexing
add_ = {'Maths':10,'English':10,'Literature':20,'Gym':"A"}
add_ = pd.Series(add_)
add_
add_['Maths']
add_['Maths':'Literature']
add_.iloc[2]  # positional access; add_[2] on a label-indexed Series is deprecated in recent pandas
add_[:2]
add_[[True, False, True, False]]
add_.English
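Label slices and positional slices treat the end point differently, which is easy to miss in the cells above. The sketch below uses explicit `.loc` and `.iloc`; the names `by_label`, `by_position`, `mask`, and `picked` are illustrative.

```python
import pandas as pd

add_ = pd.Series({"Maths": 10, "English": 10, "Literature": 20, "Gym": "A"})

by_label = add_.loc["Maths":"Literature"]   # label slice: both ends included
by_position = add_.iloc[:2]                 # positional slice: stop excluded
third = add_.iloc[2]                        # single element by position

# Boolean mask built from the labels
mask = add_.index.isin(["Maths", "Gym"])
picked = add_[mask]

print(len(by_label), len(by_position), third)
print(picked)
```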
2. DataFrame indexing
2.1 Indexing by label
df.dtypes
df
df['Maths']
df[['Maths','English']]
df.Maths
# KeyError: df[...] looks up column labels, and 'alpha' is a row label
df['alpha']
df.loc['alpha']
df.loc['alpha':'theta']
# AttributeError: attribute access only works for column labels
df.alpha
2.2 Indexing by position
df.iloc[2]
df.iloc[:,2]
df.iloc[:2,:2]
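df[:2]     # a plain slice selects rows by position
df[2]      # KeyError: a single integer is treated as a column label
df[:2,:2]  # raises an error; use df.iloc[:2,:2] for two-dimensional positional selection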
2.3 Boolean indexing
df1 >70
df1[df1>70]
df1[df1['Maths']>70]
df1['Maths']>70
# Sets every column to 70 in the rows where Maths > 70
df1[df1['Maths']>70] = 70
df1
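The DataFrame access patterns from this section are gathered in one sketch below. The last step is an illustrative variation: it caps only the `Maths` column, whereas the row assignment in the notebook overwrites whole rows. The names `cell`, `block`, `high`, and `capped` are assumptions for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"English": [80, 70, 60],
                    "Literature": [70, 70, 85],
                    "Maths": [80, 90, 50]},
                   index=["alpha", "beta", "theta"])

col = df1["Maths"]               # column by label
row = df1.loc["alpha"]           # row by label
cell = df1.loc["beta", "Maths"]  # single value by row and column label
block = df1.iloc[:2, :2]         # first two rows and columns by position

high = df1[df1["Maths"] > 70]    # keep rows where the condition holds

# Cap one column only: select rows with the mask and the column by label
capped = df1.copy()
capped.loc[capped["Maths"] > 70, "Maths"] = 70

print(high)
print(capped)
```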
V. Deleting from Series and DataFrame
1. Deleting from a Series
add_ = {'Maths':10,'English':10,'Literature':20,'Gym':"A"}
add_ = pd.Series(add_)
add_
add_.pop('Maths')
add_
add_ = {'Maths':10,'English':10,'Literature':20,'Gym':"A"}
add_ = pd.Series(add_)
add_
add_.drop('Maths')                 # returns a new Series; add_ itself is unchanged
add_
add_.drop('Maths',inplace=True)    # removes the entry from add_ itself
add_
del add_['English']
add_
2. Deleting from a DataFrame
data = {"English":[80,70,60],
        "Literature":[70,70,85],
        "Maths":[80,90,50],
        "Music":["A","B","C"]}
df = pd.DataFrame(data,index = ["alpha", "beta","theta"])
df
df.pop("Music")
df
# drop returns a new DataFrame each time; df itself is unchanged
df.drop('alpha')
df.drop('Maths',axis=1)
df
del df['Maths']
df
# Fails: the .loc indexer does not support del; use df.drop('alpha') to remove a row
del df.loc['alpha']
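Which deletion calls modify the object and which return a copy can be checked with a short sketch. The names `without_alpha`, `without_maths`, and `music` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"English": [80, 70, 60],
                   "Literature": [70, 70, 85],
                   "Maths": [80, 90, 50],
                   "Music": ["A", "B", "C"]},
                  index=["alpha", "beta", "theta"])

without_alpha = df.drop("alpha")          # new frame without the row
without_maths = df.drop(columns="Maths")  # new frame without the column
print(df.shape)                           # df still has 3 rows and 4 columns

music = df.pop("Music")   # removes the column in place and returns it
del df["Maths"]           # removes the column in place, returns nothing

# There is no del for rows; drop the row and rebind the name instead
df = df.drop(index="alpha")
print(df)
```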
VI. Combining Series and DataFrame
1. Combining Series
s1 = pd.Series([1, 2, 3],
               index = ['a','b','c'])
s1
s2 = pd.Series([4, 5, 6],
               index = ['b','c','e'])
s2
pd.concat((s1,s2))
pd.concat((s1,s2),axis =1)
s1.combine_first(s2)
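The three calls above produce quite different shapes. The sketch below spells out what each one returns; the `keys` argument and the names `stacked`, `side`, and `patched` are additions for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([4, 5, 6], index=["b", "c", "e"])

# axis=0 (default): stack end to end; labels 'b' and 'c' now appear twice
stacked = pd.concat([s1, s2])

# axis=1: place side by side as columns, aligned on the union of labels
side = pd.concat([s1, s2], axis=1, keys=["s1", "s2"])

# combine_first: keep s1 where it has a value, fill the gaps from s2
patched = s1.combine_first(s2)

print(stacked)
print(side)
print(patched)
```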
2. Combining DataFrames
data = {"English":[80,70,60],
        "Literature":[70,70,85],
        "Maths":[80,90,50],
        "Music":["A","B","C"]}
df = pd.DataFrame(data,index = ["alpha", "beta","theta"])
df
data1 = {"English":[80,70,60],
         "Maths":[80,90,50],
         "Literature":[70,70,85],}
df1 = pd.DataFrame(data1,index = ["beta","alpha","theta"])
df1
pd.concat((df,df1))
pd.concat((df,df1),axis=1)
df1.combine_first(df)
data = {"English":[80,70,60],
        "Literature":[70,70,85],
        "Maths":[80,90,50],
        "Music":["A","B","C"],
        "ID":[1001,1002,1003]}
df = pd.DataFrame(data)
df
data1 = {"English":[80,70,60],
         "Literature":[70,70,85],
         "Maths":[80,90,50],
         "ID":[1004,1002,1003]}
df1 = pd.DataFrame(data1)
df1
pd.concat((df,df1),axis=1)
pd.merge(df,df1,on='ID')
pd.merge(df,df1,on='ID',how="outer")
df.set_index('ID', inplace=True)
df
df1.set_index('ID', inplace=True)
df1
df.join(df1, how='outer', lsuffix='df', rsuffix='df1')
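The effect of `how` in a key-based merge is easier to see on two narrow tables. This is a sketch under assumptions: the frames `left` and `right` are simplified stand-ins for the ones above, and `indicator=True` is an optional `pd.merge` parameter that the notebook does not use.

```python
import pandas as pd

left = pd.DataFrame({"ID": [1001, 1002, 1003], "Music": ["A", "B", "C"]})
right = pd.DataFrame({"ID": [1004, 1002, 1003], "Maths": [80, 90, 50]})

# Inner merge (the default): only IDs present in both tables
inner = pd.merge(left, right, on="ID")

# Outer merge: every ID from either table; the _merge column added by
# indicator=True records which side each row came from
outer = pd.merge(left, right, on="ID", how="outer", indicator=True)

# join does the same kind of matching, but on the index
joined = left.set_index("ID").join(right.set_index("ID"), how="outer")

print(inner)
print(outer)
print(joined)
```

When both tables share non-key column names, as in the notebook, `merge` adds default suffixes to tell them apart, while `join` needs `lsuffix` and `rsuffix` to be given.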
VII. Other commonly used Pandas functions and methods
df3 = pd.concat((df,df1),axis=0)
df3
df3.head()
df3.info()
df3.describe()
df3.sort_index(axis=0)
df3.sort_values(by=['Maths'])
df3.index.is_unique
df3['English'].is_unique
df3.index.value_counts()
df3.Music.value_counts()
df3['Maths'].rank()
df3['Maths'].rank(method = 'first')
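The two `rank` calls above differ only in how ties are handled. The sketch below uses a small illustrative Series `maths` with one tie; the names `avg`, `first`, and `counts` are also illustrative.

```python
import pandas as pd

maths = pd.Series([80, 90, 50, 80], index=["w", "x", "y", "z"])

# Default method="average": the two 80s occupy ranks 2 and 3, so both get 2.5
avg = maths.rank()

# method="first": ties are broken by order of appearance (w before z)
first = maths.rank(method="first")

# value_counts shows how often each value occurs; is_unique reports duplicates
counts = maths.value_counts()

print(avg)
print(first)
print(counts)
print(maths.is_unique)
```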
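Summary
I. The Pandas library
II. Pandas data structures: Series and DataFrame
1. Series: index and values
2. DataFrame: index, columns, and values
Ways to set or modify the index
At construction: index and columns set the labels; if the data already has labels, they are selected and reordered to match
After construction:
reindex: build a new index or reorder by the given labels
rename: change index labels
Series.index = []
DataFrame.columns = []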
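III. Series and DataFrame operations
1. Basic arithmetic
Values are matched by index label before the operation is applied
When a DataFrame and a Series are added, the Series labels are matched against the DataFrame's columns by default
2. Matrix operations and universal functions
3. Basic statistical methods: axis selects the axis to operate along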
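IV. Indexing and slicing Series and DataFrame
1. Series indexing and slicing: by label, by position, by Boolean mask
2. DataFrame indexing and slicing
By label: columns with df['Maths'], rows with df.loc['alpha']
By position: df.iloc[]; as a special case, rows can be sliced directly by position, e.g. df[:2]
By Boolean mask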
V. Deleting from Series and DataFrame
1. Deleting from a Series: pop/drop/del
2. Deleting from a DataFrame: pop/drop/del
VI. Combining Series and DataFrame
1. Combining Series
pd.concat() combine_first()
2. Combining DataFrames
pd.concat() combine_first()
pd.merge() join()
VII. Other commonly used Pandas functions and methods
head() info() describe()
sort_index() sort_values()
is_unique value_counts()
rank()