1.0 Theory
Entropy
Conditional entropy
Information gain
Information gain ratio
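The four concepts above build on each other. Here is a minimal sketch of each one as a small Python function, assuming the usual textbook (ID3/C4.5-style) definitions with log base 2. The function names are my own and do not come from any library. The toy data shows why the gain ratio exists: a many-valued, ID-like feature gets the same information gain as a genuinely useful one, but a lower gain ratio.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """H(D) = -sum_k p_k * log2(p_k), with p_k the share of class k in labels."""
    n = len(labels)
    # p * log2(1/p) equals -p * log2(p); this form avoids returning -0.0
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def conditional_entropy(feature, labels):
    """H(D|A) = sum_i |D_i|/|D| * H(D_i), where D_i holds the rows with A == a_i."""
    n = len(labels)
    groups = {}
    for a, y in zip(feature, labels):
        groups.setdefault(a, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(feature, labels):
    """g(D, A) = H(D) - H(D|A): how much knowing A reduces uncertainty about the class."""
    return entropy(labels) - conditional_entropy(feature, labels)


def information_gain_ratio(feature, labels):
    """g_R(D, A) = g(D, A) / H_A(D), where H_A(D) is the entropy of A's own values."""
    split_info = entropy(feature)
    # A single-valued feature carries no split information; report 0 instead of dividing by 0
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


# Toy data: a 2-valued feature vs. a 4-valued ID-like one, both separating the classes perfectly
labels = [0, 0, 1, 1]
print(information_gain(["a", "a", "b", "b"], labels), information_gain_ratio(["a", "a", "b", "b"], labels))  # -> 1.0 1.0
print(information_gain(["w", "x", "y", "z"], labels), information_gain_ratio(["w", "x", "y", "z"], labels))  # -> 1.0 0.5
```

Both features reach the maximum gain of 1 bit. The 4-valued one is penalized by its larger split information H_A(D) = 2, which is the bias that the gain ratio is meant to correct.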
2.0 sklearn.tree
First off, the starter code given on http://scikit-learn.org doesn't actually work as-is...
from sklearn.datasets import load_iris
from sklearn import tree
from sklearn.externals.six import StringIO
import pydot
dot_data = StringIO()
iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)
tree.export_graphviz(clf, out_file=dot_data)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
Pasting it in as-is, the first error it threw was:
AttributeError: 'list' object has no attribute 'write_pdf'
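I couldn't help but sink into deep thought...
Then Stack Overflow told me that pydot had been upgraded (newer versions of pydot's graph_from_dot_data return a list of graphs, and a list has no write_pdf) and that I should use the "plus" version, pydotplus, instead...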
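If you would rather stay on pydot, the other common fix is to take the first graph out of the returned list. Below is a minimal sketch of a helper that handles both cases. The name single_graph is hypothetical, and the version cutoff in the docstring (pydot 1.2 and later returning a list) is from memory, so treat it as an assumption.

```python
def single_graph(parsed):
    """Return one graph from a graph_from_dot_data() result.

    Assumption: pydot 1.2+ returns a list of graphs (hence
    "'list' object has no attribute 'write_pdf'"), while older pydot
    and pydotplus return a single graph object.
    """
    if isinstance(parsed, list):
        if not parsed:
            raise ValueError("no graph found in the DOT source")
        return parsed[0]
    return parsed


# Usage with pydot (needs pydot and Graphviz installed):
#   import pydot
#   graph = single_graph(pydot.graph_from_dot_data(dot_data.getvalue()))
#   graph.write_pdf("iris.pdf")
```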
So I quickly switched over to pydotplus!
Sure enough, the error changed! (I knew it wouldn't go that smoothly...)
InvocationException: GraphViz's executables not found
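Back to Google in a hurry, and this time Stack Overflow told me: "Kid! You either haven't installed GraphViz or haven't set up your environment variables, right?"
Aha! Time to install GraphViz~
Found a one-stop GraphViz install guide~ installed it and restarted the IDE.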
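The "executables not found" error means Python cannot locate Graphviz's `dot` program, which normally has to be on PATH. Here is a minimal sketch, using only the standard library, for checking that and, if needed, adding the Graphviz bin folder to PATH for the current Python process only. The install directory below is a hypothetical Windows default, so change it to wherever Graphviz actually landed. As far as I understand, pydotplus searches PATH for the executables, but I have not checked every other place it may look.

```python
import os
import shutil

# Hypothetical install location; replace with your actual Graphviz "bin" folder
DEFAULT_GRAPHVIZ_BIN = r"C:\Program Files\Graphviz\bin"


def find_dot():
    """Return the full path to Graphviz's `dot` executable, or None if it is not on PATH."""
    return shutil.which("dot")


def ensure_graphviz_on_path(bin_dir=DEFAULT_GRAPHVIZ_BIN):
    """Append bin_dir to this process's PATH if `dot` isn't found yet and the folder exists."""
    if find_dot() is None and os.path.isdir(bin_dir):
        os.environ["PATH"] = os.environ.get("PATH", "") + os.pathsep + bin_dir
    return find_dot()


if __name__ == "__main__":
    print(ensure_graphviz_on_path() or "dot not found: install Graphviz or fix PATH")
```

This only changes PATH for the running process. Setting the system environment variable and restarting the IDE, as above, is the permanent fix.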
from io import StringIO  # sklearn.externals.six was removed in scikit-learn 0.23; io.StringIO does the same job here
from sklearn.datasets import load_iris
from sklearn import tree
import pydotplus
# Load the iris dataset and fit a decision tree classifier on it
iris = load_iris()
test = tree.DecisionTreeClassifier()
test = test.fit(iris.data, iris.target)
# Export the fitted tree as DOT source into an in-memory text buffer
dot_data = StringIO()
tree.export_graphviz(test, out_file=dot_data)
# Parse the DOT source; pydotplus returns a single graph object, not a list
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Perfect~ it runs without errors.
Next up: figuring out how to actually draw the tree....
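For drawing the tree, here is a minimal sketch along the lines of the older scikit-learn documentation example. Passing out_file=None makes export_graphviz return the DOT text directly, so the StringIO buffer isn't needed. feature_names, class_names and filled only make the picture more readable. The function names iris_dot_source and render_pdf are my own, and random_state=0 is just there to make the fitted tree reproducible. render_pdf needs pydotplus plus Graphviz's executables on PATH.

```python
from sklearn import tree
from sklearn.datasets import load_iris


def iris_dot_source():
    """Fit a decision tree on iris and return its DOT source as a string."""
    iris = load_iris()
    clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
    # With out_file=None, export_graphviz returns the DOT text instead of writing it to a file
    return tree.export_graphviz(
        clf,
        out_file=None,
        feature_names=iris.feature_names,
        class_names=iris.target_names,
        filled=True,
    )


def render_pdf(dot_source, path="iris.pdf"):
    """Write the tree to a PDF via pydotplus (requires Graphviz's `dot` on PATH)."""
    import pydotplus  # imported here so iris_dot_source() works without pydotplus installed

    graph = pydotplus.graph_from_dot_data(dot_source)
    graph.write_pdf(path)
    return path


# Usage:
#   render_pdf(iris_dot_source())
# In Jupyter, the tree can be shown inline instead:
#   from IPython.display import Image
#   import pydotplus
#   Image(pydotplus.graph_from_dot_data(iris_dot_source()).create_png())
```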