Data source: link: https://pan.baidu.com/s/1EFqJFXf70t2Rubkh6D19aw  Extraction code: syqg
Data source sample:
Exploring US crime data from 1960 to 2014
Step 1 Import the necessary libraries
import pandas as pd
import numpy as np
Step 2 Import the dataset from the following path
path1 = 'pandas_exercise/exercise_data/US_Crime_Rates_1960_2014.csv'  # forward slashes avoid the invalid "\e" escape sequence in the original path
Step 3 Name the DataFrame crime
crime=pd.read_csv(path1)
print(crime.head())
Step 4 What is the data type of each column? Use info()
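print(crime.info())  # info() prints the summary itself and returns None, hence the "None" at the end of the Step 4 output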
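To look only at the column types without the full info() report, the dtypes attribute returns them as a Series that can be used programmatically. A minimal sketch on a small frame built from the first two rows of the Step 3 output (only two of the twelve columns are included):

```python
import pandas as pd

# Two rows and two columns copied from the Step 3 output
df = pd.DataFrame({"Year": [1960, 1961], "Population": [179323175, 182992000]})

types = df.dtypes            # Series mapping column name -> dtype
print(types)
print(types.value_counts())  # number of columns sharing each dtype
```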
Step 5 Convert the Year column to datetime64 with pd.to_datetime
crime['Year']=pd.to_datetime(crime.Year,format='%Y')
print(crime.head())
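With format='%Y', pd.to_datetime reads each value as a bare year and fills in January 1, which is why the Step 5 output shows 1960-01-01, 1961-01-01 and so on. A minimal sketch on a few example years; converting the integers to strings first is a defensive choice (an assumption, not something the original step requires) so the '%Y' format is always applied to text:

```python
import pandas as pd

years = pd.Series([1960, 1961, 1962], name="Year")

# Parse each year as a string with the '%Y' format; month and day default to 1
dates = pd.to_datetime(years.astype(str), format="%Y")
print(dates)
```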
Step 6 Set the Year column as the DataFrame index with set_index
crime=crime.set_index('Year',drop=True)
print(crime.head())
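The drop parameter of set_index controls whether the column is removed once it becomes the index; drop=True, used above, is also the default. A minimal sketch on a two-row frame using values from the Step 5 output:

```python
import pandas as pd

df = pd.DataFrame({
    "Year": pd.to_datetime(["1960", "1961"], format="%Y"),
    "Population": [179323175, 182992000],
})

# drop=True: Year moves into the index and disappears from the columns
indexed = df.set_index("Year", drop=True)
print(indexed)

# drop=False: Year becomes the index but is also kept as a column
kept = df.set_index("Year", drop=False)
print(kept.columns.tolist())  # ['Year', 'Population']
```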
Step 7 Delete the column named Total with del
del crime['Total']
print(crime.head())
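del modifies the DataFrame in place, while DataFrame.drop(columns=...) returns a new frame and leaves the original unchanged. A minimal sketch comparing the two on one row taken from the Step 3 and Step 7 outputs:

```python
import pandas as pd

df = pd.DataFrame({"Population": [179323175], "Total": [3384200], "Violent": [288460]})

# del removes the column from the object itself (here, a copy)
trimmed = df.copy()
del trimmed["Total"]

# drop returns a new DataFrame; df still has its Total column
dropped = df.drop(columns="Total")

print(trimmed.columns.tolist())  # ['Population', 'Violent']
print(dropped.columns.tolist())  # ['Population', 'Violent']
print(df.columns.tolist())       # ['Population', 'Total', 'Violent']
```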
Step 8 Group the DataFrame by Year into ten-year periods and sum each period, using time-series resampling (resample)
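crimes = crime.resample('10YS').sum()  # sum every column over ten-year bins ('10YS' is the current alias for the older '10AS')
crimes['Population'] = crime['Population'].resample('10YS').max()  # summing population is meaningless, so replace it with each decade's maximum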
print(crimes)
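The two resample calls above (sum everything, then overwrite Population with its maximum) can also be written as a single pass with agg() and a per-column dictionary. A minimal sketch on made-up yearly data for 1960 to 1979 with only two columns (the values are hypothetical, not from the dataset):

```python
import pandas as pd

# Hypothetical yearly data, 1960-1979
idx = pd.to_datetime([str(y) for y in range(1960, 1980)], format="%Y")
df = pd.DataFrame(
    {"Population": range(100, 120), "Violent": [10] * 20},
    index=idx,
)
df.index.name = "Year"

# '10YS': ten-year bins labelled by their start date (1960-01-01, 1970-01-01)
decades = df.resample("10YS").agg({"Population": "max", "Violent": "sum"})
print(decades)
```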
Step 9 When was the most dangerous time to live in US history?
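print(crime.idxmax(axis=0))  # for each column, the index label (year) of its maximum value
idxmax() searches along the index axis (axis=0), so each column is paired with the year in which it peaked. In the output, most crime categories peak between 1991 and 1993 (Burglary peaks in 1980), although these are raw counts rather than rates adjusted for population.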
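A minimal sketch of how idxmax(axis=0) behaves on a small frame with a yearly DatetimeIndex (the counts are hypothetical):

```python
import pandas as pd

# Hypothetical counts indexed by year
idx = pd.to_datetime(["1990", "1991", "1992"], format="%Y")
df = pd.DataFrame({"Murder": [5, 9, 7], "Robbery": [3, 4, 8]}, index=idx)

# For each column, the year (index label) where that column reaches its maximum
peaks = df.idxmax(axis=0)
print(peaks)
```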
Example output:
# Step 3
Year Population Total ... Burglary Larceny_Theft Vehicle_Theft
0 1960 179323175 3384200 ... 912100 1855400 328200
1 1961 182992000 3488000 ... 949600 1913000 336000
2 1962 185771000 3752200 ... 994300 2089600 366800
3 1963 188483000 4109500 ... 1086400 2297800 408300
4 1964 191141000 4564600 ... 1213200 2514400 472800
[5 rows x 12 columns]
# Step 4
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Year 55 non-null int64
1 Population 55 non-null int64
2 Total 55 non-null int64
3 Violent 55 non-null int64
4 Property 55 non-null int64
5 Murder 55 non-null int64
6 Forcible_Rape 55 non-null int64
7 Robbery 55 non-null int64
8 Aggravated_assault 55 non-null int64
9 Burglary 55 non-null int64
10 Larceny_Theft 55 non-null int64
11 Vehicle_Theft 55 non-null int64
dtypes: int64(12)
memory usage: 5.3 KB
None
# Step 5
Year Population Total ... Burglary Larceny_Theft Vehicle_Theft
0 1960-01-01 179323175 3384200 ... 912100 1855400 328200
1 1961-01-01 182992000 3488000 ... 949600 1913000 336000
2 1962-01-01 185771000 3752200 ... 994300 2089600 366800
3 1963-01-01 188483000 4109500 ... 1086400 2297800 408300
4 1964-01-01 191141000 4564600 ... 1213200 2514400 472800
[5 rows x 12 columns]
# Step 6
Population Total ... Larceny_Theft Vehicle_Theft
Year ...
1960-01-01 179323175 3384200 ... 1855400 328200
1961-01-01 182992000 3488000 ... 1913000 336000
1962-01-01 185771000 3752200 ... 2089600 366800
1963-01-01 188483000 4109500 ... 2297800 408300
1964-01-01 191141000 4564600 ... 2514400 472800
[5 rows x 11 columns]
# Step 7
Population Violent ... Larceny_Theft Vehicle_Theft
Year ...
1960-01-01 179323175 288460 ... 1855400 328200
1961-01-01 182992000 289390 ... 1913000 336000
1962-01-01 185771000 301510 ... 2089600 366800
1963-01-01 188483000 316970 ... 2297800 408300
1964-01-01 191141000 364220 ... 2514400 472800
[5 rows x 10 columns]
# Step 8
Population Violent ... Larceny_Theft Vehicle_Theft
Year ...
1960-01-01 201385000 4134930 ... 26547700 5292100
1970-01-01 220099000 9607930 ... 53157800 9739900
1980-01-01 248239000 14074328 ... 72040253 11935411
1990-01-01 272690813 17527048 ... 77679366 14624418
2000-01-01 307006550 13968056 ... 67970291 11412834
2010-01-01 318857056 6072017 ... 30401698 3569080
[6 rows x 10 columns]
# Step 9
Population 2014-01-01
Violent 1992-01-01
Property 1991-01-01
Murder 1991-01-01
Forcible_Rape 1992-01-01
Robbery 1991-01-01
Aggravated_assault 1993-01-01
Burglary 1980-01-01
Larceny_Theft 1991-01-01
Vehicle_Theft 1991-01-01
dtype: datetime64[ns]