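Information about the users who requested links on a certain website/server. The dataset includes fields such as the client's computer type and browser (user agent). Here Python is used for some simple processing and plotting.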
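Each line of the data file is a standalone JSON object, and the script below only uses two of its fields: 'a' (the user-agent string) and 'tz' (the time zone). As a minimal sketch, assuming three made-up records (hypothetical, not taken from the real file), this shows how the lines are parsed, how the browser token and the Windows / Not Windows label are derived, and how the per-time-zone counts are built:

```python
import json

import numpy as np
import pandas as pd

# Hypothetical sample lines in the same JSON-lines shape as the real file
# (only the 'a' and 'tz' fields are kept; the third record has no user agent)
sample_lines = [
    '{"a": "Mozilla/5.0 (Windows NT 6.1; WOW64)", "tz": "America/New_York"}',
    '{"a": "GoogleMaps/RochesterNY", "tz": "America/Denver"}',
    '{"tz": "Asia/Tokyo"}',
]

# One json.loads call per line, as the script does with the real file
records = [json.loads(line) for line in sample_lines]
frame = pd.DataFrame(records)  # the missing 'a' becomes NaN

# First whitespace-separated token of the user agent (roughly the browser/client)
browsers = pd.Series([x.split()[0] for x in frame['a'].dropna()])

# Keep records that have a user agent and label them Windows / Not Windows
cframe = frame[frame['a'].notnull()]
os_label = np.where(cframe['a'].str.contains('Windows'), 'Windows', 'Not Windows')

# Count records per (time zone, OS) pair, one column per OS label
agg_counts = cframe.groupby(['tz', os_label]).size().unstack().fillna(0)

print(browsers.tolist())  # ['Mozilla/5.0', 'GoogleMaps/RochesterNY']
print(os_label.tolist())  # ['Windows', 'Not Windows']
print(agg_counts)
```

Records without a user agent (the Asia/Tokyo line here) are dropped before grouping, so they never appear in the time-zone counts.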
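# INPUT (Python 3.6)
import json
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

path = 'usagov_bitly_data2012-03-16-1331923249.txt'
# Each line of the file is one JSON record
with open(path) as f:
    records = [json.loads(line) for line in f]
frame = pd.DataFrame(records)

# First token of the user-agent string (field 'a'), roughly the browser
results = pd.Series([x.split()[0] for x in frame.a.dropna()])
# print(results[:5])

# Keep only records that have a user agent
cframe = frame[frame.a.notnull()]
# Label each record Windows / Not Windows based on its user agent
operating_systems = np.where(cframe['a'].str.contains('Windows'),
                             'Windows', 'Not Windows')
# Count records per (time zone, OS) pair; one column per OS label
by_tz_os = cframe.groupby(['tz', operating_systems])
agg_counts = by_tz_os.size().unstack().fillna(0)
# Row positions sorted by total count (ascending); keep the 10 busiest time zones
indexer = agg_counts.sum(axis=1).argsort()
count_subset = agg_counts.take(indexer).iloc[-10:]
print(count_subset)
# Divide each row by its total so the Windows / Not Windows shares sum to 1
normed_subset = count_subset.div(count_subset.sum(axis=1), axis=0)
normed_subset.plot(kind='barh', stacked=True)
plt.show()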
# OUT
                     Not Windows  Windows
tz
America/Sao_Paulo           13.0     20.0
Europe/Madrid               16.0     19.0
Pacific/Honolulu             0.0     36.0
Asia/Tokyo                   2.0     35.0
Europe/London               43.0     31.0
America/Denver             132.0     59.0
America/Los_Angeles        130.0    252.0
America/Chicago            115.0    285.0
                           245.0    276.0
America/New_York           339.0    912.0
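Two steps of the script are easy to misread. argsort returns the row positions that would sort the row totals in ascending order, so take(indexer) followed by the last 10 rows keeps the busiest time zones with the largest at the bottom, which is why the printed table runs from smallest to largest. div(..., axis=0) then divides every row by its own total, so each stacked bar in the plot spans 0 to 1. (The unlabeled row in the output holds records whose tz field is an empty string.) A minimal sketch of both steps, using three rows copied from the printed counts and keeping only the top 2 instead of the top 10:

```python
import pandas as pd

# Three rows of the printed count table, deliberately out of order
agg_counts = pd.DataFrame(
    {'Not Windows': [339.0, 13.0, 130.0], 'Windows': [912.0, 20.0, 252.0]},
    index=pd.Index(['America/New_York', 'America/Sao_Paulo', 'America/Los_Angeles'],
                   name='tz'),
)

# Same steps as the main script: positions sorted by row total, keep the largest rows
indexer = agg_counts.sum(axis=1).argsort()
count_subset = agg_counts.take(indexer).iloc[-2:]

# Divide each row by its own total so the two shares add up to 1
normed_subset = count_subset.div(count_subset.sum(axis=1), axis=0)

# America/New_York: Windows share 912 / 1251, about 0.729
print(normed_subset.round(3))
```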
2018.7.16
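Study notes on "Python for Data Analysis" (用python进行数据分析). Not original work; archived for study purposes only. It sat in my drafts folder so long that I had partly forgotten it.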