1. Variable names
2. Variable type (numerical/categorical)
3. Variable segment
4. Expectation (expected influence on the label)
5. Correlation matrix of the variables
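The type and correlation items in the checklist can be filled in mechanically; segment and expectation are judgment calls and stay manual. The sketch below assumes a pandas DataFrame `train` whose label column is `SalePrice`. The helper name `variable_overview` and the tiny table are made up for illustration. Numeric dtypes are treated as numerical and everything else as categorical, which is only a first approximation.

```python
import numpy as np
import pandas as pd


def variable_overview(df: pd.DataFrame, label: str) -> pd.DataFrame:
    """Hypothetical helper: list each variable, its type, and its correlation with the label.

    Numeric dtypes count as numerical, everything else as categorical.
    Integer-coded columns (e.g. quality ratings) may really be categorical.
    """
    is_numeric = df.dtypes.map(pd.api.types.is_numeric_dtype)
    # Pearson correlation only makes sense for numeric columns.
    corr_with_label = df.select_dtypes(include="number").corr()[label]
    overview = pd.DataFrame({
        "variable": df.columns,
        "type": np.where(is_numeric, "numerical", "categorical"),
    })
    # Categorical variables get NaN because they are not in the correlation index.
    overview["corr_with_label"] = overview["variable"].map(corr_with_label)
    return overview


# Made-up rows; the column names only mimic the ones used in these notes.
train = pd.DataFrame({
    "GrLivArea": [1000, 1500, 2000, 2500],
    "Neighborhood": ["A", "B", "A", "C"],
    "SalePrice": [100000, 150000, 210000, 240000],
})

corrmat = train.select_dtypes(include="number").corr()  # full correlation matrix
print(corrmat)
print(variable_overview(train, "SalePrice"))
```

Selecting numeric columns first matters because newer pandas versions no longer silently drop non-numeric columns in `corr()`. The matrix can then be drawn with `sns.heatmap(corrmat)`.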
- Summary statistics of the label: label.describe()
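histogram with a fitted normal curve: sns.distplot(label, fit=norm)
(requires import seaborn as sns and from scipy.stats import norm; distplot is deprecated since seaborn 0.11)
normal probability plot (Q-Q plot against the normal distribution):
import matplotlib.pyplot as plt
from scipy import stats
fig = plt.figure()
res = stats.probplot(train['SalePrice'], plot=plt)  # points close to the fitted line suggest normality
plt.show()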
(Check skewness (asymmetry) and kurtosis (tail heaviness) to see how far the label deviates from the normal distribution.)
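Both numbers can be read directly off a pandas Series. This is a minimal sketch; the short made-up series stands in for `train['SalePrice']` and has one deliberately large value to make it right-skewed.

```python
import pandas as pd

# Made-up, right-skewed prices standing in for train['SalePrice'].
label = pd.Series([120000, 130000, 135000, 140000, 150000, 160000, 400000])

# pandas returns sample skewness and *excess* kurtosis (0 for a normal distribution).
skewness = label.skew()
kurtosis = label.kurt()
print(f"Skewness: {skewness:.3f}")
print(f"Kurtosis: {kurtosis:.3f}")
```

A positive skewness means a long right tail; a positive excess kurtosis means heavier tails than the normal distribution.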
- Relationship between the label and other variables
scatter plots # visualize numerical variables against the label
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(x=train['GrLivArea'], y=train['SalePrice'])
ax.set_ylabel('SalePrice', fontsize=13)
ax.set_xlabel('GrLivArea', fontsize=13)
plt.show()
box plots # visualize categorical variables against the label
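A box plot shows the label's distribution for each category of one variable. The sketch below uses plain matplotlib, in the same style as the scatter plot above. `boxplot_by_category` is a hypothetical helper, the rows are made up, and `OverallQual` is only an example grouping column.

```python
import matplotlib.pyplot as plt
import pandas as pd


def boxplot_by_category(df: pd.DataFrame, cat_col: str, label: str = "SalePrice"):
    """Hypothetical helper: draw one box of `label` values per category of `cat_col`."""
    categories = sorted(df[cat_col].dropna().unique())
    groups = [df.loc[df[cat_col] == c, label].to_numpy() for c in categories]

    fig, ax = plt.subplots()
    ax.boxplot(groups)  # one box per category, placed at x = 1..n
    ax.set_xticks(list(range(1, len(categories) + 1)))
    ax.set_xticklabels([str(c) for c in categories])
    ax.set_xlabel(cat_col, fontsize=13)
    ax.set_ylabel(label, fontsize=13)
    return fig, ax


# Made-up rows; 'OverallQual' serves only as an example grouping column.
train = pd.DataFrame({
    "OverallQual": [5, 5, 6, 6, 7, 7, 7],
    "SalePrice": [120000, 130000, 150000, 160000, 200000, 210000, 230000],
})
fig, ax = boxplot_by_category(train, "OverallQual")
```

Call `plt.show()` afterwards to display the figure. With seaborn, the same plot is roughly `sns.boxplot(x='OverallQual', y='SalePrice', data=train)`.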