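CART (Classification and Regression Trees) builds a binary tree, used for either classification or regression
Node splitting in a classification tree
Based on the Gini index
Dataset: predict marital status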
ID | Occupation | Marital Status |
---|---|---|
1 | Student | S |
2 | Student | S |
3 | Teacher | M |
4 | Officer | M |
5 | Officer | M |
6 | Teacher | S |
7 | Student | M |
Demonstration:
Choose the split with the smallest Gini index
The final choice is the partition {Officer} vs. {Student, Teacher}
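The choice above can be checked with a minimal sketch. It assumes the usual CART convention of scoring a split by the size-weighted Gini impurity of its two children, where the Gini impurity of a node is 1 minus the sum of squared class proportions. The function names (`gini`, `weighted_gini`, `best_one_vs_rest_split`) are hypothetical, not taken from any library. With only three occupations, the one-value-versus-rest partitions are all the possible binary partitions.

```python
from collections import Counter

# Rows from the table above: (occupation, marital status).
ROWS = [
    ("Student", "S"), ("Student", "S"), ("Teacher", "M"), ("Officer", "M"),
    ("Officer", "M"), ("Teacher", "S"), ("Student", "M"),
]


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(rows, left_values):
    """Size-weighted Gini of the two children when `left_values` go left."""
    left = [y for x, y in rows if x in left_values]
    right = [y for x, y in rows if x not in left_values]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_one_vs_rest_split(rows):
    """Score every one-value-vs-rest partition and return the best one."""
    values = sorted({x for x, _ in rows})
    scores = {v: weighted_gini(rows, {v}) for v in values}
    best = min(scores, key=scores.get)
    return best, scores


best, scores = best_one_vs_rest_split(ROWS)
```

Under these assumptions, the weighted Gini is about 0.343 for {Officer}, 0.405 for {Student}, and 0.486 for {Teacher}, so `best` is `"Officer"`.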
Node splitting in a regression tree
Based on variance
Dataset: predict age
ID | Occupation | Age |
---|---|---|
1 | Student | 12 |
2 | Student | 18 |
3 | Teacher | 26 |
4 | Officer | 47 |
5 | Officer | 36 |
6 | Teacher | 29 |
7 | Student | 21 |
Demonstration:
Choose the split with the smallest variance
The final choice is the partition {Officer} vs. {Student, Teacher}
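The regression case can be sketched the same way. The notes only say "variance", so this sketch assumes the score of a split is the size-weighted population variance of the two children (equivalent to the total squared error divided by the number of rows). The function names are hypothetical.

```python
# Rows from the table above: (occupation, age).
ROWS = [
    ("Student", 12), ("Student", 18), ("Teacher", 26), ("Officer", 47),
    ("Officer", 36), ("Teacher", 29), ("Student", 21),
]


def variance(values):
    """Population variance (divides by n); 0.0 for an empty list."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def weighted_variance(rows, left_values):
    """Size-weighted variance of the two children when `left_values` go left."""
    left = [y for x, y in rows if x in left_values]
    right = [y for x, y in rows if x not in left_values]
    n = len(rows)
    return len(left) / n * variance(left) + len(right) / n * variance(right)


def best_variance_split(rows):
    """Score every one-value-vs-rest partition and return the best one."""
    values = sorted({x for x, _ in rows})
    scores = {v: weighted_variance(rows, {v}) for v in values}
    best = min(scores, key=scores.get)
    return best, scores


best, scores = best_variance_split(ROWS)
```

Under this assumption, the scores are roughly 34.19 for {Officer}, 43.29 for {Student}, and 118.19 for {Teacher}, so `best` is `"Officer"`. Summing the two child variances without weighting gives the same winner on this dataset.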
Splitting on a continuous variable works much as in C4.5
Dataset: predict occupation
ID | Age | Occupation |
---|---|---|
1 | 12 | Student |
2 | 18 | Student |
7 | 21 | Student |
3 | 26 | Teacher |
6 | 29 | Teacher |
5 | 36 | Officer |
4 | 47 | Officer |
Demonstration:
Choose the split with the smallest Gini index
The final choice is the partition {<26, >=26}
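A minimal sketch of the threshold search follows. It assumes the candidate thresholds are the observed ages themselves (every distinct value except the smallest), with rows split into `age < t` and `age >= t` to match the {<26, >=26} notation above. Other implementations may use midpoints between adjacent values instead. The function names are hypothetical.

```python
from collections import Counter

# Rows from the table above: (age, occupation).
ROWS = [
    (12, "Student"), (18, "Student"), (21, "Student"), (26, "Teacher"),
    (29, "Teacher"), (36, "Officer"), (47, "Officer"),
]


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(rows):
    """Try each candidate threshold t and score the split x < t vs. x >= t."""
    n = len(rows)
    # Skip the smallest value: it would leave the left child empty.
    thresholds = sorted({x for x, _ in rows})[1:]
    scores = {}
    for t in thresholds:
        left = [y for x, y in rows if x < t]
        right = [y for x, y in rows if x >= t]
        scores[t] = len(left) / n * gini(left) + len(right) / n * gini(right)
    best = min(scores, key=scores.get)
    return best, scores


best, scores = best_threshold(ROWS)
```

Under these assumptions, the threshold 26 scores about 0.286, the lowest of the six candidates (the runner-up, 36, scores about 0.343), so `best` is `26`.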