<<Data Visualization and Exploration with R>> by Eric Pimpler
定义
Exploratory Data Analysis (EDA)
目的:理解数据
步骤:
- 产生问题
- 对数据进行可视化,并寻找答案
- 改进或提出新的问题
数据类型
- categorical 离散变量,有限的小数据
- continuous 连续变量, 无限地,有序的
可视化的用途
发现变量的变化或多个变量之间的共变(variation or covariation)
方法
- 条形图 : Measuring categorical variation with a bar chart.
- 直方图 : Measuring continuous variation with a histogram,分布
- 箱体图: Measuring covariation with boxplots
- 符号大小:Measuring covariation with symbol size
- 散点图:correlation 相关性
Covariation is the tendency for the values of two or more variables to vary together in a related way. The best way to spot covariation is to visualise the relationship between two or more variables
条形图示例 bar plot
X轴 = 离线变量-color,Y轴 = 每种颜色的钻石数量
直方图示例
X轴 = 连续变量-price, Y轴=每种价格的频率/数量
箱体图示例
X轴 = 离线变量-cut品质,Y轴 = 价格,可以比较每种品质钻石的价格分布,比如中位值,离群值等