Author: Tyan
Blog: noahsnail.com | CSDN | Jianshu (简书)
This post walks through some basic pandas usage, mainly building a DataFrame and selecting data from it.
#!/usr/bin/env python
# _*_ coding: utf-8 _*_
import pandas as pd
import numpy as np
# Test 1
# Define the data: six consecutive dates as the index, values 0-23 in a 6x4 grid
dates = pd.date_range('20170101', periods=6)
print(dates)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
print(df)
# Test 1 result
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06'],
              dtype='datetime64[ns]', freq='D')
             A   B   C   D
2017-01-01   0   1   2   3
2017-01-02   4   5   6   7
2017-01-03   8   9  10  11
2017-01-04  12  13  14  15
2017-01-05  16  17  18  19
2017-01-06  20  21  22  23
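The layout of the Test 1 frame follows from how `reshape` fills the array row by row: the cell at row `i`, column `j` holds `4 * i + j`. The sketch below rebuilds the same frame and checks that relationship; the name `expected` is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from Test 1.
dates = pd.date_range('20170101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

# Six dates as rows, four named columns.
print(df.shape)

# Row 2, column 1 (2017-01-03, column B) holds 4 * 2 + 1.
print(df.iloc[2, 1])

# Hypothetical helper array: every cell should equal 4 * row + column.
expected = np.array([[4 * i + j for j in range(4)] for i in range(6)])
print((df.values == expected).all())
```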
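# Test 2
# Select column A (the first column); both forms return the same Series
print(df['A'])
print(df.A)
# Select the first three rows, by position and by date label
print(df[0:3])
print(df['20170101':'20170103'])
# Select a row by label
print(df.loc['20170101'])
# Select all rows, specific columns
print(df.loc[:, ['A', 'B']])
# Select a specific row, specific columns
print(df.loc['20170102', ['A', 'B']])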
# Test 2 result
2017-01-01     0
2017-01-02     4
2017-01-03     8
2017-01-04    12
2017-01-05    16
2017-01-06    20
Freq: D, Name: A, dtype: int64
2017-01-01     0
2017-01-02     4
2017-01-03     8
2017-01-04    12
2017-01-05    16
2017-01-06    20
Freq: D, Name: A, dtype: int64
            A  B   C   D
2017-01-01  0  1   2   3
2017-01-02  4  5   6   7
2017-01-03  8  9  10  11
            A  B   C   D
2017-01-01  0  1   2   3
2017-01-02  4  5   6   7
2017-01-03  8  9  10  11
A    0
B    1
C    2
D    3
Name: 2017-01-01 00:00:00, dtype: int64
             A   B
2017-01-01   0   1
2017-01-02   4   5
2017-01-03   8   9
2017-01-04  12  13
2017-01-05  16  17
2017-01-06  20  21
A    4
B    5
Name: 2017-01-02 00:00:00, dtype: int64
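In Test 2 the positional slice `0:3` and the label slice `'20170101':'20170103'` return the same three rows, even though one stop value looks smaller than the other. That is because positional slices exclude the stop position while label slices include the stop label. A minimal sketch of the difference, assuming the same `df` as above (the names `by_position` and `by_label` are illustrative):

```python
import numpy as np
import pandas as pd

# Rebuild the frame from Test 1.
dates = pd.date_range('20170101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

# Positional slice: stop position 3 is excluded, so rows 0, 1 and 2 come back.
by_position = df.iloc[0:3]

# Label slice: the stop label 2017-01-03 is included.
by_label = df.loc['20170101':'20170103']
print(by_position.equals(by_label))

# Same start and stop expressed both ways give different row counts:
# positions 0..2 (stop excluded) versus labels dates[0]..dates[2] (stop included).
print(len(df.iloc[0:2]))
print(len(df.loc[dates[0]:dates[2]]))
```

Using `.iloc` and `.loc` explicitly, rather than plain `df[...]`, makes it clear which of the two rules applies.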
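# Test 3
# Select by integer position: rows 3-4, columns 1-2
print(df.iloc[3:5, 1:3])
# Select non-contiguous rows by position
print(df.iloc[[1, 3, 5], 2:4])
# Mixed selection: rows by position, columns by label
# (.ix was deprecated in pandas 0.20 and removed in 1.0, so convert the positions to labels and use .loc)
print(df.loc[df.index[[1, 3, 5]], ['A', 'B']])
# Select rows with a boolean comparison
print(df[df.A > 4])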
# Test 3 result
             B   C
2017-01-04  13  14
2017-01-05  17  18
             C   D
2017-01-02   6   7
2017-01-04  14  15
2017-01-06  22  23
             A   B
2017-01-02   4   5
2017-01-04  12  13
2017-01-06  20  21
             A   B   C   D
2017-01-03   8   9  10  11
2017-01-04  12  13  14  15
2017-01-05  16  17  18  19
2017-01-06  20  21  22  23
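Older pandas code often used `.ix` for the mixed selection in Test 3 (rows by position, columns by label). `.ix` was deprecated in pandas 0.20 and removed in 1.0, so on current versions the same result has to come from `.loc` or `.iloc`. The sketch below shows two possible ways, assuming the same `df` as above; `via_loc` and `via_iloc` are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from Test 1.
dates = pd.date_range('20170101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

# Option 1: turn row positions into labels, then select everything by label.
via_loc = df.loc[df.index[[1, 3, 5]], ['A', 'B']]

# Option 2: turn column labels into positions, then select everything by position.
via_iloc = df.iloc[[1, 3, 5], df.columns.get_indexer(['A', 'B'])]

print(via_loc)
print(via_loc.equals(via_iloc))
```

Both options return rows 2017-01-02, 2017-01-04 and 2017-01-06 with columns A and B; which one reads better depends on whether the rest of the code works mostly with labels or with positions.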