Plotting for multivariate analysis
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)
np.random.seed(sum(map(ord, "regression")))
tips = sns.load_dataset("tips")
tips.head()
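Both regplot() and lmplot() can plot a regression relationship; regplot() is the recommended starting point.
regplot() accepts x and y in more forms (arrays, pandas Series, or column names in data), while lmplot() requires a DataFrame with column names but adds faceting through hue, col, and row.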
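A minimal sketch of the difference between the two functions, using synthetic arrays instead of the tips dataset (the names rng, bill, tip_values, and df are made up for this illustration). It assumes seaborn 0.9 or later. regplot() draws on a matplotlib Axes and accepts plain arrays, while lmplot() takes a DataFrame and returns a FacetGrid.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(0)
bill = rng.uniform(5, 50, size=100)                    # hypothetical bill amounts
tip_values = 0.15 * bill + rng.normal(0, 1, size=100)  # hypothetical tips

# Axes-level: plain arrays are fine, and the return value is a matplotlib Axes.
ax = sns.regplot(x=bill, y=tip_values)

# Figure-level: needs a DataFrame and column names, and returns a FacetGrid.
df = pd.DataFrame({"total_bill": bill, "tip": tip_values})
g = sns.lmplot(x="total_bill", y="tip", data=df)

print(type(ax).__name__, type(g).__name__)
```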
sns.regplot(x="total_bill", y="tip", data=tips)  # pass the dataset and the column names
This shows the relationship between the tip and the total bill.
sns.lmplot(x="total_bill", y="tip", data=tips);
sns.regplot(data=tips,x="size",y="tip")
sns.regplot(x="size", y="tip", data=tips, x_jitter=.05)  # x_jitter adds small random noise to the x values
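As a rough sketch of what x_jitter does (an illustration with NumPy, not seaborn's own code): uniform noise in [-0.05, 0.05] is added to a copy of the x values used for the scatter points, so discrete values such as party size no longer sit on top of each other. The regression itself is still fit to the original values. The array size_values is a made-up example.

```python
import numpy as np

rng = np.random.default_rng(0)
size_values = np.array([2, 2, 2, 3, 3, 4], dtype=float)  # hypothetical party sizes
x_jitter = 0.05

# Uniform noise in [-x_jitter, x_jitter], applied to a copy used only for display.
noise = rng.uniform(-x_jitter, x_jitter, size=size_values.shape)
jittered = size_values + noise

print(jittered)
```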
anscombe = sns.load_dataset("anscombe")
sns.regplot(x="x", y="y", data=anscombe.query("dataset == 'I'"),
ci=None, scatter_kws={"s": 100})
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'II'"),
ci=None, scatter_kws={"s": 80})
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'II'"),
order=2, ci=None, scatter_kws={"s": 80});
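Dataset II follows a curve, so a straight line fits it poorly; order=2 asks for a second-degree polynomial instead. The sketch below checks this numerically with np.polyfit, which is a stand-in chosen for illustration. The x and y values are typed in from Anscombe's published quartet rather than loaded with sns.load_dataset.

```python
import numpy as np

# Anscombe's quartet, dataset II.
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

coeffs_by_order = {}
max_resid = {}
for order in (1, 2):
    coeffs = np.polyfit(x, y, deg=order)      # least-squares polynomial fit
    residuals = y - np.polyval(coeffs, x)
    coeffs_by_order[order] = coeffs
    max_resid[order] = np.abs(residuals).max()
    print(order, np.round(coeffs, 3), "max |residual| =", round(max_resid[order], 3))
```

The largest residual of the quadratic fit is far smaller than that of the straight line, which is why the order=2 curve passes through the points.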
sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips);
sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips,
markers=["o", "x"], palette="Set1");
sns.lmplot(x="total_bill", y="tip", hue="smoker", col="time", data=tips);
sns.lmplot(x="total_bill", y="tip", hue="smoker",
col="time", row="sex", data=tips);
f, ax = plt.subplots(figsize=(5, 5))
sns.regplot(x="total_bill", y="tip", data=tips, ax=ax);
col_wrap: “Wrap” the column variable at this width, so that the column facets span multiple rows
height: Height (in inches) of each facet (this parameter was named size before seaborn 0.9)
sns.lmplot(x="total_bill", y="tip", col="day", data=tips,
col_wrap=2, height=4);
sns.lmplot(x="total_bill", y="tip", col="day", data=tips,
aspect=.8);
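A small sketch of how these layout parameters combine, assuming seaborn 0.9 or later and a synthetic DataFrame (df and its values are made up for the illustration). Each facet is height inches tall and height × aspect inches wide, so four facets wrapped at two columns give a 2 × 2 grid.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
n = 80
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, size=n),
    "day": np.tile(["Thur", "Fri", "Sat", "Sun"], n // 4),
})
df["tip"] = 0.15 * df["total_bill"] + rng.normal(0, 1, size=n)

# Four day levels wrapped at two columns -> two rows of two facets.
g = sns.lmplot(x="total_bill", y="tip", col="day", data=df,
               col_wrap=2, height=4, aspect=1)

fig = g.axes.flat[0].figure
print(len(g.axes.flat), fig.get_size_inches())
```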
Analysis-----------------------------------------------------------------------
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid", color_codes=True)
np.random.seed(sum(map(ord, "categorical")))
titanic = sns.load_dataset("titanic")
tips = sns.load_dataset("tips")
iris = sns.load_dataset("iris")
sns.stripplot(x="day", y="total_bill", data=tips);
Overlapping points are very common, but they make it hard to see how many observations there are.
sns.stripplot(x="day", y="total_bill", data=tips, jitter=True)
sns.swarmplot(x="day", y="total_bill", data=tips)
sns.swarmplot(x="day", y="total_bill", hue="sex",data=tips)
sns.swarmplot(x="total_bill", y="day", hue="time", data=tips);
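Box plot
- IQR is the interquartile range: the distance between the first quartile (Q1) and the third quartile (Q3).
- Let N = 1.5 × IQR. A value greater than Q3 + N or less than Q1 - N is treated as an outlier.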
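The rule above can be sketched directly with NumPy. The sample values are made up, and np.percentile uses linear interpolation by default, so the quartiles may differ slightly from other conventions.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50], dtype=float)  # made-up sample

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
n = 1.5 * iqr

# Anything outside [Q1 - N, Q3 + N] is flagged as an outlier.
lower, upper = q1 - n, q3 + n
outliers = values[(values < lower) | (values > upper)]

print(q1, q3, iqr, outliers)
```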
sns.boxplot(x="day", y="total_bill", hue="time", data=tips);
sns.violinplot(x="total_bill", y="day", hue="time", data=tips);
# Violin plot
sns.violinplot(x="day", y="total_bill", hue="sex", data=tips, split=True);  # split=True draws the two hue levels as the two halves of one violin
sns.violinplot(x="day", y="total_bill", data=tips, inner=None)
sns.swarmplot(x="day", y="total_bill", data=tips, color="w", alpha=.5)
A bar plot can be used to show the central tendency of the values.
sns.barplot(x="sex", y="survived", hue="class", data=titanic);
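By default barplot() uses the mean as its estimator, so each bar height is the mean of y within one x/hue group. Because survived is coded 0/1, that mean is the survival rate. A minimal pandas sketch of the same aggregation, on a made-up frame shaped like the titanic columns used here:

```python
import pandas as pd

# Made-up rows; only the column names follow the titanic dataset.
df = pd.DataFrame({
    "sex": ["male", "male", "male", "female", "female", "female"],
    "class": ["First", "Third", "Third", "First", "First", "Third"],
    "survived": [1, 0, 0, 1, 1, 0],
})

# One mean per (sex, class) group: the quantity each bar would show.
bar_heights = df.groupby(["sex", "class"])["survived"].mean()
print(bar_heights)
```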
A point plot shows more clearly how the values change between categories.
sns.pointplot(x="sex", y="survived", hue="class", data=titanic);
sns.pointplot(x="class", y="survived", hue="sex", data=titanic,
palette={"male": "g", "female": "m"},
markers=["^", "o"], linestyles=["-", "--"]);  # set the markers (triangle and circle) and the line styles
Wide-form data
sns.boxplot(data=iris, orient="h");  # orient="h" draws the boxes horizontally
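With wide-form input, each numeric column is treated as one category. A sketch of the equivalent long-form layout using pandas melt, on a made-up frame (wide, measurement, and value are illustrative names):

```python
import pandas as pd

# Made-up wide-form frame: one numeric column per measurement.
wide = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "sepal_width": [3.5, 3.0, 3.3],
    "petal_length": [1.4, 1.4, 6.0],
})

# Long-form: one row per observation, with the old column name as a category.
long = wide.melt(var_name="measurement", value_name="value")
print(long.head())

# Long-form counterpart of sns.boxplot(data=wide, orient="h"):
# sns.boxplot(x="value", y="measurement", data=long)
```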
Multi-panel categorical plots
# catplot() is the seaborn 0.9+ name for factorplot(); kind="point" keeps factorplot()'s default
sns.catplot(x="day", y="total_bill", hue="smoker", data=tips, kind="point")
sns.catplot(x="day", y="total_bill", hue="smoker", data=tips, kind="bar")
sns.catplot(x="day", y="total_bill", hue="smoker",
col="time", data=tips, kind="swarm")
sns.catplot(x="time", y="total_bill", hue="smoker",
col="day", data=tips, kind="box", height=4, aspect=.5)
seaborn.factorplot(x=None, y=None, hue=None, data=None, row=None, col=None, col_wrap=None, estimator=<function mean>, ci=95, n_boot=1000, units=None, order=None, hue_order=None, row_order=None, col_order=None, kind='point', size=4, aspect=1, orient=None, color=None, palette=None, legend=True, legend_out=True, sharex=True, sharey=True, margin_titles=False, facet_kws=None, **kwargs)
Parameters:
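- x, y, hue: variables in the dataset (column names)
- data: the dataset (DataFrame)
- row, col: additional categorical variables used to facet the plot into a grid (column names)
- col_wrap: maximum number of facets per row (integer)
- estimator: function that maps the vector of values in each category to a scalar (callable)
- ci: size of the confidence interval (float or None)
- n_boot: number of bootstrap iterations used to compute the confidence interval (integer)
- units: identifier of the sampling units, used for multilevel bootstrapping and repeated-measures designs (column name or vector data)
- order, hue_order: order in which to plot the categorical levels (lists of strings)
- row_order, col_order: order of the facet rows and columns (lists of strings)
- kind: one of point (default), bar, count (frequency counts), box, violin, strip (scatter), swarm (non-overlapping scatter)
- size: height of each facet in inches (scalar); renamed height in seaborn 0.9
- aspect: aspect ratio of each facet, so that width = aspect × size (scalar)
- orient: orientation of the plot ("v" or "h")
- color: color for all elements (matplotlib color)
- palette: colors for the hue levels (seaborn palette or dict)
- legend: whether to draw a legend for hue (True/False)
- legend_out: whether to extend the figure and draw the legend outside the plot, at the center right (True/False)
- share{x,y}: whether the facets share their axes (True/False)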
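To make ci and n_boot concrete, here is a minimal percentile-bootstrap sketch in NumPy. It illustrates the general technique under simple assumptions (a made-up sample and the mean as estimator) and is not seaborn's internal code.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=20, scale=5, size=50)  # made-up observations for one category

n_boot = 1000  # number of bootstrap resamples
ci = 95        # width of the confidence interval, in percent

# Resample with replacement and apply the estimator (the mean) each time.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(n_boot)
])

# The interval spans the central ci percent of the bootstrap distribution.
low, high = np.percentile(boot_means, [(100 - ci) / 2, 100 - (100 - ci) / 2])
print(sample.mean(), (low, high))
```

A larger n_boot gives a more stable interval at the cost of more computation.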