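Python Machine Learning: Building a Decision Tree
Decision Tree (ID3 Algorithm)
The development environment is Spyder in Anaconda, where all the required libraries are installed by default; in any other environment the external libraries must be installed first. The main code is as follows: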
# -*- coding: utf-8 -*-
"""
Created on Tue Aug 1 16:09:50 2017
@author: Administrator
"""
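# Converts lists of feature dicts into numeric arrays
from sklearn.feature_extraction import DictVectorizer
# Reads the CSV file
import csv
from sklearn import preprocessing
# Decision tree algorithm
from sklearn import tree
# Note: the unused import of StringIO from sklearn.externals.six was dropped;
# that module no longer exists in recent scikit-learn releases.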
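# The with-block closes the file once all rows have been read
with open(r'D:/data_com.csv', newline='') as allElectronicsData:
    read = csv.reader(allElectronicsData)
    headers = next(read)
    featureList = []
    labelList = []
    for row in read:
        # The last column is the class label
        labelList.append(row[-1])
        rowDict = {}
        # Columns from the second through the second-to-last are the features
        for i in range(1, len(row) - 1):
            rowDict[headers[i]] = row[i]
        featureList.append(rowDict)
# print(featureList)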
# Build dummyX, a one-hot encoding of the features, e.g. 001: youth, 010: senior, 100: middle_aged, and so on
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
# print(dummyX)
# print(vec)
# print(dummyY)
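clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(dummyX, dummyY)
with open('D:/allElectronicInformationGainOri.dot', 'w') as f:
    # scikit-learn >= 1.0 provides get_feature_names_out(); the older
    # get_feature_names() was removed in 1.2, so use that on old versions only.
    tree.export_graphviz(clf, feature_names=list(vec.get_feature_names_out()), out_file=f)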
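The `criterion = 'entropy'` setting makes the classifier pick splits by information gain, the measure that ID3 is built on (scikit-learn itself documents its trees as an optimized version of CART, so this is the ID3 criterion rather than a literal ID3 implementation). To make the measure concrete, here is a minimal pure-Python sketch. The helper names `entropy` and `information_gain` and the four-row toy dataset are made up for illustration; they are not part of scikit-learn and not the contents of data_com.csv.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on one feature."""
    # Group the class labels by the value each row has for the feature.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the groups after the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Made-up toy data, for illustration only.
rows = [
    {"age": "youth", "student": "no"},
    {"age": "youth", "student": "yes"},
    {"age": "senior", "student": "no"},
    {"age": "senior", "student": "yes"},
]
labels = ["no", "yes", "no", "yes"]

for feature in ("age", "student"):
    print(feature, information_gain(rows, labels, feature))
```

In this toy data `student` separates the classes perfectly while `age` tells nothing about them, so an information-gain based tree would split on `student` first.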
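The output .dot file can be converted to a PDF with Graphviz. First add the bin directory under the Graphviz installation directory to the Path environment variable,
then run the following command: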
dot -Tpdf xx.dot -o xx.pdf
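If it is more convenient to stay inside Python, the same command can be launched with the standard `subprocess` module. This is only a sketch under assumptions: the function names `dot_to_pdf_command` and `render_pdf` are hypothetical, and the call only works when the Graphviz `dot` executable is reachable through the Path as described above.

```python
import shutil
import subprocess


def dot_to_pdf_command(dot_path, pdf_path):
    """Build the argument list for: dot -Tpdf <dot_path> -o <pdf_path>."""
    return ["dot", "-Tpdf", dot_path, "-o", pdf_path]


def render_pdf(dot_path, pdf_path):
    """Run Graphviz if `dot` is on the Path; return True when it was run."""
    if shutil.which("dot") is None:
        # Graphviz is not installed, or its bin directory is not on the Path.
        return False
    # check=True raises CalledProcessError if dot exits with an error.
    subprocess.run(dot_to_pdf_command(dot_path, pdf_path), check=True)
    return True
```

Calling `render_pdf('xx.dot', 'xx.pdf')` would then be the equivalent of typing the command by hand.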
After running, the result is as follows: