DataFrame is another fundamental data structure in Pandas.
1. DataFrame as a generalized NumPy array
>> import numpy as np
>> import pandas as pd
>> area_dict = {
"California":423967,
"Texas":695662,
"New York":141297,
"Florida":170312,
"Illinois":149995
}
>> area = pd.Series(area_dict)
California 423967
Florida 170312
Illinois 149995
New York 141297
Texas 695662
dtype: int64
>> population_dict = {
"California":38332521,
"Texas":26448193,
"New York":19651127,
"Florida":19552860,
"Illinois":12882135
}
>> population = pd.Series(population_dict)
California 38332521
Florida 19552860
Illinois 12882135
New York 19651127
Texas 26448193
dtype: int64
1. Combine the two Series objects above, area and population, into a single two-dimensional object by passing them in a dictionary:
>> states = pd.DataFrame({'population':population,'area':area})
              area  population
California  423967    38332521
Florida     170312    19552860
Illinois    149995    12882135
New York    141297    19651127
Texas       695662    26448193
2. Inspect the row index
>> states.index
Index(['California', 'Florida', 'Illinois', 'New York', 'Texas'], dtype='object')
3. Inspect the column names
>> states.columns
Index(['area', 'population'], dtype='object')
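A DataFrame can therefore be viewed as a generalized two-dimensional NumPy array in which both the rows and the columns carry an index that can be used to access the data.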
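The array view described above can be sketched as follows. This is a minimal illustration, assuming a pandas version that provides `DataFrame.to_numpy()`; the variable names `values` and `texas_area` are introduced here only for the example.

```python
import numpy as np
import pandas as pd

# Same data as in the text above.
area = pd.Series({"California": 423967, "Texas": 695662, "New York": 141297,
                  "Florida": 170312, "Illinois": 149995})
population = pd.Series({"California": 38332521, "Texas": 26448193,
                        "New York": 19651127, "Florida": 19552860,
                        "Illinois": 12882135})
states = pd.DataFrame({"population": population, "area": area})

# The data itself can be pulled out as a plain two-dimensional NumPy array.
values = states.to_numpy()
print(type(values), values.shape)

# The row and column labels live in two separate Index objects.
print(states.index.tolist())
print(states.columns.tolist())

# Label-based access to one cell: row label first, then column label.
texas_area = states.loc["Texas", "area"]
print(texas_area)
```

Unlike a bare NumPy array, the labels travel with the data, so a cell can be addressed by name rather than by integer position.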
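2. DataFrame as a specialized dictionary
A DataFrame maps each column name to a Series of column data. Asking for the 'area' column returns the Series object that holds the area data: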
>> states['area']
California 423967
Florida 170312
Illinois 149995
New York 141297
Texas 695662
Name: area, dtype: int64
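The dictionary analogy can be probed a little further. The sketch below is only an illustration (the names `has_area`, `has_texas`, `column_names`, and `area_column` are made up for this example); it shows that the "keys" of a DataFrame are its column names, not its row labels.

```python
import pandas as pd

# Same data as in the text above.
area = pd.Series({"California": 423967, "Texas": 695662, "New York": 141297,
                  "Florida": 170312, "Illinois": 149995})
population = pd.Series({"California": 38332521, "Texas": 26448193,
                        "New York": 19651127, "Florida": 19552860,
                        "Illinois": 12882135})
states = pd.DataFrame({"population": population, "area": area})

# Membership tests look at the column names, as with dictionary keys.
has_area = "area" in states
has_texas = "Texas" in states   # row labels are not keys

# keys() returns the column names.
column_names = list(states.keys())

# Each "value" is a Series that shares the DataFrame's row index.
area_column = states["area"]
print(has_area, has_texas, column_names)
print(type(area_column))
```

This is where the array and dictionary views differ: indexing a two-dimensional NumPy array with a single index selects a row, while indexing a DataFrame with a single key selects a column.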
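3. Creating DataFrame objects
A Pandas DataFrame can be created in many ways.
1. From a single Series object. A DataFrame is a collection of Series objects,
so a single Series can be used to build a single-column DataFrame: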
>> population_dict = {
"California":38332521,
"Texas":26448193,
"New York":19651127,
"Florida":19552860,
"Illinois":12882135
}
>> population = pd.Series(population_dict)
California 38332521
Florida 19552860
Illinois 12882135
New York 19651127
Texas 26448193
dtype: int64
>> pd.DataFrame(population,columns=['population'])
            population
California    38332521
Florida       19552860
Illinois      12882135
New York      19651127
Texas         26448193
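As an alternative to the constructor call above, a Series can also be converted with `Series.to_frame`. The following is a small sketch of that route, not the approach the text itself uses; the variable name `frame` is chosen here for illustration.

```python
import pandas as pd

population = pd.Series({"California": 38332521, "Texas": 26448193,
                        "New York": 19651127, "Florida": 19552860,
                        "Illinois": 12882135})

# to_frame() wraps the Series in a one-column DataFrame;
# the name argument becomes the column label.
frame = population.to_frame(name="population")
print(frame.columns.tolist())
print(frame.shape)
```

Either way, the row index of the Series is carried over unchanged to the resulting DataFrame.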
2. From a list of dictionaries
>> data = [{'a':i,'b':2*i} for i in range(3)]
[{'a': 0, 'b': 0}, {'a': 1, 'b': 2}, {'a': 2, 'b': 4}]
>> pd.DataFrame(data)
   a  b
0  0  0
1  1  2
2  2  4
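The dictionaries in the list do not all need the same keys. The sketch below, using made-up records, shows how Pandas fills the gaps with NaN when a key is missing from some of them.

```python
import pandas as pd

# Two illustrative records with only the key 'b' in common.
records = [{"a": 1, "b": 2}, {"b": 3, "c": 4}]
frame = pd.DataFrame(records)

# Every key seen in any record becomes a column;
# missing entries are filled with NaN.
print(frame)
print(frame.isna().sum())
```

Because NaN is a floating-point value, the columns 'a' and 'c' end up with a float dtype even though the inputs were integers.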
3. From a dictionary of Series objects, as shown earlier:
>> data = pd.DataFrame({'population':population,'area':area})
              area  population
California  423967    38332521
Florida     170312    19552860
Illinois    149995    12882135
New York    141297    19651127
Texas       695662    26448193
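When a DataFrame is built from a dictionary of Series, the Series are aligned on their index labels rather than on position. The sketch below illustrates this with deliberately mismatched subsets of the data above; the subsets are chosen here only for the example.

```python
import pandas as pd

# Two Series that share only the label 'Texas'.
area = pd.Series({"California": 423967, "Texas": 695662})
population = pd.Series({"Texas": 26448193, "New York": 19651127})

frame = pd.DataFrame({"population": population, "area": area})

# The row index is the union of both indexes;
# labels missing from one Series get NaN in that column.
print(frame)
```

Only the 'Texas' row has both values filled in, which shows that the values were matched by label.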
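4. From a two-dimensional NumPy array.
Given a two-dimensional array, you can create a DataFrame and specify its row and column labels.
If the labels are omitted, both the rows and the columns default to an integer index.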
>> pd.DataFrame(np.random.rand(3,2),
                columns=['foo','bar'],
                index=['a','b','c'])
        foo       bar
a  0.882203  0.474690
b  0.969104  0.842780
c  0.637580  0.755599
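The default labels can be seen by building the same array with and without explicit labels. This is a small sketch that uses a fixed array instead of random numbers so that the values are reproducible; the names `default_frame` and `labeled_frame` are illustrative.

```python
import numpy as np
import pandas as pd

# A fixed 3x2 array: [[0, 1], [2, 3], [4, 5]].
values = np.arange(6).reshape(3, 2)

# No labels given: rows and columns are both numbered from 0.
default_frame = pd.DataFrame(values)
print(default_frame.index.tolist(), default_frame.columns.tolist())

# Explicit labels for rows and columns.
labeled_frame = pd.DataFrame(values,
                             columns=["foo", "bar"],
                             index=["a", "b", "c"])
print(labeled_frame.loc["b", "bar"])
```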
5. From a NumPy structured array
>> A = np.zeros(3,dtype=[('A','i8'),('B','f8')])
array([(0, 0.), (0, 0.), (0, 0.)], dtype=[('A', '<i8'), ('B', '<f8')])
>> pd.DataFrame(A)
   A    B
0  0  0.0
1  0  0.0
2  0  0.0
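A structured array carries a name and a dtype for each field, and both survive the conversion. The sketch below fills the same kind of array with a few made-up values to make that visible; the name `records` is introduced only for this example.

```python
import numpy as np
import pandas as pd

# Structured array with an 8-byte integer field and an 8-byte float field.
records = np.zeros(3, dtype=[("A", "i8"), ("B", "f8")])
records["A"] = [1, 2, 3]          # illustrative values
records["B"] = [0.5, 1.5, 2.5]

frame = pd.DataFrame(records)

# Field names become column names; each column keeps its field's dtype.
print(frame.columns.tolist())
print(frame.dtypes)
```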