Background: this project comes from DataCamp: https://www.datacamp.com/projects/10
1. Introduction
Everyone loves Lego (unless you ever stepped on one). Did you know by the way that "Lego" was derived from the Danish phrase leg godt, which means "play well"? Unless you speak Danish, probably not.
In this project, we will analyze a fascinating dataset on every single Lego block that has ever been built!
2. Dataset description
The project uses only two tables: colors and sets.
The head of the colors table:
3. Questions
3.1 How many distinct colors are available?
num_colors=colors.shape[0]
num_colors
135
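Note that shape[0] counts rows, so this answer assumes every row is a different color. Is that a safe assumption?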
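One way to test that assumption is to compare the row count with the number of unique values in an identifying column. The sketch below uses a small made-up table. The `id` and `name` columns are assumptions about what the colors table contains (only `is_trans` is mentioned in the project text), and the repeated "Blue" row is there purely for illustration.

```python
import pandas as pd

# Toy stand-in for the colors table; the values are made up for illustration.
colors_demo = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "name": ["Black", "Blue", "Trans-Red", "Blue"],  # "Blue" repeated on purpose
    "is_trans": ["f", "f", "t", "f"],
})

# shape[0] counts rows, duplicates included.
n_rows = colors_demo.shape[0]

# nunique() counts distinct values only.
n_unique_names = colors_demo["name"].nunique()

# duplicated() flags every repeat of an earlier value.
has_duplicates = colors_demo["name"].duplicated().any()
```

If the two counts agree on the real table, `colors.shape[0]` is a valid answer. If they differ, `nunique()` on the identifying column is the safer choice.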
3.2 Transparent Colors in Lego Sets
The colors data has a column named is_trans that indicates whether a color is transparent or not. It would be interesting to explore the distribution of transparent vs. non-transparent colors.
# colors_summary: Distribution of colors based on transparency
# -- YOUR CODE FOR TASK 4 --
colors_summary = colors.groupby('is_trans').count()
colors_summary
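To see what `groupby('is_trans').count()` returns, here is a minimal sketch on a made-up table. The `id` and `name` columns and the "f"/"t" values are assumptions for illustration. `count()` reports the number of non-null entries in every remaining column, while `size()` gives one number per group.

```python
import pandas as pd

# Toy stand-in for the colors table; the values are made up for illustration.
colors_demo = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "name": ["Black", "Blue", "Trans-Red", "White", "Trans-Clear"],
    "is_trans": ["f", "f", "t", "f", "t"],
})

# count(): one row per group, one column per remaining column (non-null counts).
summary_count = colors_demo.groupby("is_trans").count()

# size(): a single count per group, returned as a Series.
summary_size = colors_demo.groupby("is_trans").size()
```

When no values are missing, every column of `summary_count` shows the same number, so `size()` is a more compact way to get the distribution.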
3.3 Explore Lego Sets
Another interesting dataset available in this database is the sets data. It contains a comprehensive list of sets over the years and the number of parts that each of these sets contained.
First, let's look at the head of the sets table.
How has the average number of parts in Lego sets varied over the years?
parts_by_year = sets.groupby('year')['num_parts'].mean()
parts_by_year
parts_by_year.plot()
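The grouping step can be checked by hand on a tiny made-up table. This is only a sketch: the numbers below are invented, and the column names follow the code above.

```python
import pandas as pd

# Toy stand-in for the sets table; the values are made up for illustration.
sets_demo = pd.DataFrame({
    "year": [1950, 1950, 1951, 1951, 1951],
    "theme_id": [1, 1, 2, 3, 3],
    "num_parts": [10, 20, 5, 10, 15],
})

# Split rows by year, then average num_parts within each year.
# 1950: (10 + 20) / 2 = 15.0, 1951: (5 + 10 + 15) / 3 = 10.0
parts_by_year_demo = sets_demo.groupby("year")["num_parts"].mean()
```

The result is a Series indexed by year, which is why calling `.plot()` on it draws the year on the x-axis (plotting requires matplotlib to be installed).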
3.4 Lego Themes Over Years
Lego blocks ship under multiple themes. Let us try to get a sense of how the number of themes shipped has varied over the years.
Grouping by year, how many distinct themes are there in each year?
themes_by_year = sets[['year','theme_id']].groupby('year', as_index = False)
for key, item in themes_by_year:
    print(item, "\n\n")
This only prints the rows for each year; theme_id has not been aggregated yet.
After consulting the hint, the final answer is:
themes_by_year = sets[['year','theme_id']].groupby('year', as_index = False).agg({'theme_id' : pd.Series.nunique})
themes_by_year
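This matches the unaggregated output above. The hint uses the function pd.Series.nunique, which counts the distinct values in a Series. I did not know about it beforehand.
The data and code are on GitHub: https://github.com/a750208b/datacamp-project/blob/master/notebook.ipynb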
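A small sketch of what `pd.Series.nunique` does inside `agg`, using made-up data. The values are invented for illustration, and the shorter `groupby(...)["theme_id"].nunique()` form is an alternative spelling, not the one from the hint.

```python
import pandas as pd

# Toy stand-in for the sets table; the values are made up for illustration.
sets_demo = pd.DataFrame({
    "year": [1950, 1950, 1950, 1951, 1951],
    "theme_id": [1, 1, 2, 3, 3],
})

# Same pattern as the hint: for each year, count the distinct theme_id values.
# 1950 has themes {1, 2}, 1951 has theme {3}.
themes_demo = (
    sets_demo[["year", "theme_id"]]
    .groupby("year", as_index=False)
    .agg({"theme_id": pd.Series.nunique})
)

# Alternative spelling that returns a Series indexed by year.
themes_short = sets_demo.groupby("year")["theme_id"].nunique()
```

With `as_index=False` the year stays an ordinary column, so `themes_demo` is a two-column DataFrame. The shorter form keeps the year as the index instead.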