I recently stumbled on a very interesting article that uses R to visually explore NBA data from the 2014-15 season. The data set comes from Kaggle and is called NBA shot-log. The author, a basketball fan for more than 10 years, poses the following question: "In the NBA, a top player makes around a thousand shots during the entire regular season. A question worth asking: what information can we get by looking at these shots. I am particularly interested in discovering facts that can not be directly seen on live TV." The analysis focuses on four stars, Curry, Harden, LBJ (LeBron James), and Westbrook, "who are ranked 1-4 in the MVP ballot in 2014-to-2015 season and undoubtedly superstars in the league." The original article is Visualizing the Game Style and Shooting Performance among Superstars via NBA Shot-log | NYC Data Science Academy Blog. What follows is a brief record of my own attempt to reproduce it.
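The first figure is a density plot of the shot distributions of the four stars. (The original data covers every player in the league and contains missing values, so it has to be cleaned first. The cleaning code is very long; it is excellent learning material and worth reproducing when time allows. I am attaching a copy of the data that I cleaned by following the original code, cleaning.RData (password: t7pg), which can be loaded directly with load().) The visualization needs the ggplot2 and ggthemes packages, both of which can be installed with install.packages().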
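As a rough idea of what such a density plot looks like in code, here is a minimal sketch. It is not the original article's code. Because the object names inside cleaning.RData are not shown here, it builds a small synthetic data frame instead; the column names `player_name` and `SHOT_DIST`, the helper `make_shots()`, and the players "Player A" and "Player B" are all assumptions made for illustration.

```r
library(ggplot2)
library(ggthemes)

set.seed(1)

# Hypothetical helper: simulate shot distances (in feet) as a mixture of
# three zones: rim (~5 ft), mid-range (~17 ft), three-point (~25 ft).
# `w` holds the mixing weights for the three zones.
make_shots <- function(name, n, w) {
  zone <- sample(1:3, n, replace = TRUE, prob = w)
  dist <- rnorm(n, mean = c(5, 17, 25)[zone], sd = c(2.5, 2.5, 1.5)[zone])
  data.frame(player_name = name, SHOT_DIST = pmax(dist, 0))
}

# Synthetic stand-in for the cleaned shot log (NOT real NBA data).
# With the real data, replace `shots` by the object loaded via load().
shots <- rbind(
  make_shots("Player A", 400, c(0.45, 0.05, 0.50)),  # two-zone shooter
  make_shots("Player B", 400, c(0.45, 0.30, 0.25))   # three-zone shooter
)

# One density curve per player, one panel per player.
p <- ggplot(shots, aes(x = SHOT_DIST, fill = player_name)) +
  geom_density(alpha = 0.5) +
  facet_wrap(~ player_name, ncol = 2) +
  labs(x = "Shot distance (ft)", y = "Density") +
  theme_few() +
  theme(legend.position = "none")

print(p)
```

With the real cleaned data, the same `ggplot()` call should work once the data frame is filtered down to the four players and the column names are adjusted to match.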
Resulting plot:
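The plot above gives a very direct view of each star's shot distribution. Curry, Harden, and LBJ all show a clearly bimodal shot-distance density: their offense is concentrated at the rim and behind the three-point line. Curry has the largest share of three-pointers, Harden is roughly balanced between threes and shots at the rim, and LBJ attacks the rim slightly more often than he shoots threes. Westbrook's density is trimodal (call him "three-peak Westbrook"), which shows that the mid-range shot is also one of his main weapons: he takes slightly more mid-range shots than threes, and shots at the rim make up the largest share.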
In the original author's words: "The graph above demonstrates the distribution of the shot attempts by each player versus shot distance. All four players have a local maximum centered at around 5 feet and 25 feet, corresponding to lay-up region and three-point region. Curry has the shot density leaning towards three-point zone while James shot more shots at the paint zone, indicating different play style between two players. It can also be seen that Westbrook uses two-point jumper frequently, as suggested by the peak at around 17 feet."
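The "local maximum" reading of the plot can also be checked numerically rather than by eye. The sketch below is my own addition, not part of the original article: it estimates a density with base R's `density()` and reports where the curve peaks. The data is synthetic (two simulated shot zones near 5 and 25 feet), and the 10% height cutoff for ignoring tiny bumps is an arbitrary assumption.

```r
set.seed(42)

# Synthetic shot distances in feet (NOT real NBA data):
# one cluster near the rim, one near the three-point line.
dist <- c(rnorm(500, mean = 5, sd = 2), rnorm(500, mean = 25, sd = 1.5))

# Kernel density estimate on a fixed grid from 0 to 35 feet.
d <- density(dist, from = 0, to = 35, n = 512)

# A grid point is a local maximum when the slope changes from + to -.
idx <- which(diff(sign(diff(d$y))) == -2) + 1

# Drop negligible bumps: keep peaks at least 10% as tall as the highest one
# (the 10% threshold is an arbitrary choice for this sketch).
idx <- idx[d$y[idx] >= 0.1 * max(d$y)]

peaks <- d$x[idx]
print(round(peaks, 1))
```

Applied to one player's shot distances from the cleaned data, two peaks would correspond to the bimodal pattern described above and three peaks to the trimodal one. The exact peak locations depend on the bandwidth that `density()` picks.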
PS: The original article contains many more interesting results; I will work through them as time allows.