Decision Tree Learning

Here, we will discuss two decision tree algorithms: ID3 and C4.5. First, let's define decision tree learning.

At the beginning

Decision Tree Learning is the construction of a decision tree from class-labeled training tuples.
A decision tree is a flowchart-like structure, where each internal node denotes a test on an attribute, each branch represents the outcome of a test, and each leaf node holds a class label. The topmost node in a tree is the root node.
--- From Wikipedia

ID3 algorithm

Now we can take a look at the ID3 algorithm. ID3 (Iterative Dichotomiser 3), which is based on Ockham's Razor, is an algorithm invented by Ross Quinlan.

Ockham's Razor is a problem-solving principle attributed to William of Ockham, and it was later applied in many areas of scientific research. In short, simpler theories are preferable to more complex ones because they are more testable. Here, testable refers to how well a theory performs on new, unseen datasets.

The intuition behind the algorithm is exactly Ockham's Razor: we always prefer smaller decision trees over larger ones. In information-theoretic terms, the lower the expected information (entropy) that remains in the dataset after a split, the larger the Information Gain of that split.

The core idea of the ID3 algorithm is to use Information Gain to decide which attribute to split on: at each node, it chooses the attribute that maximizes the Information Gain after the split.
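
For reference, the standard definitions are as follows, where S is the current set of examples, p_i is the proportion of examples belonging to class i, A is a candidate attribute, and S_v is the subset of S on which A takes the value v:

Entropy(S) = -\sum_i p_i \log_2 p_i

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} (|S_v| / |S|) \cdot Entropy(S_v)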

Now, let's take a look at the example that follows.

(Figure: weather_example.png — the weather dataset)

We can see that there are 14 examples in total: 9 positive and 5 negative. From this, we can calculate the entropy of the current dataset.
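
Plugging the class counts into the entropy formula above:

Entropy(S) = -(9/14) \log_2 (9/14) - (5/14) \log_2 (5/14) \approx 0.940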

Suppose we split on the attribute Outlook to classify the output.

Then we can see that the dataset is divided into three parts, and each branch's Information Entropy is
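
(assuming the figure shows the classic weather split: Sunny with 2 positive and 3 negative examples, Overcast with 4 positive, and Rain with 3 positive and 2 negative)

Entropy(S_{Sunny}) = -(2/5) \log_2 (2/5) - (3/5) \log_2 (3/5) \approx 0.971

Entropy(S_{Overcast}) = 0

Entropy(S_{Rain}) = -(3/5) \log_2 (3/5) - (2/5) \log_2 (2/5) \approx 0.971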

And the Information Entropy after splitting can be calculated as
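
Entropy(S \mid Outlook) = (5/14)(0.971) + (4/14)(0) + (5/14)(0.971) \approx 0.694

(This is the weighted average of the branch entropies, using the same assumed branch sizes as above.)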

Then the final Information Gain after splitting on the attribute Outlook is
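
Gain(S, Outlook) = Entropy(S) - Entropy(S \mid Outlook) \approx 0.940 - 0.694 = 0.246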

Before splitting at each non-leaf node of the decision tree, we first calculate the Information Gain each attribute would bring, and then choose the attribute that maximizes the Information Gain: the larger the Information Gain, the better that attribute separates the data samples. This top-down greedy criterion is the core idea of ID3.
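
As a minimal sketch, the gain computation above can be reproduced in a few lines of Python. The class counts below are an assumption taken from the classic weather dataset rather than from the text itself:

    from collections import Counter
    from math import log2

    def entropy(labels):
        """Shannon entropy (in bits) of a list of class labels."""
        total = len(labels)
        return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

    def information_gain(labels, groups):
        """Entropy of the whole set minus the weighted entropy of its subsets."""
        total = len(labels)
        remainder = sum(len(g) / total * entropy(g) for g in groups)
        return entropy(labels) - remainder

    # Assumed class labels of the 14 examples (9 "yes", 5 "no"),
    # grouped by the value of the Outlook attribute.
    sunny = ["yes"] * 2 + ["no"] * 3
    overcast = ["yes"] * 4
    rain = ["yes"] * 3 + ["no"] * 2
    all_days = sunny + overcast + rain

    print(entropy(all_days))                                    # ≈ 0.940
    print(information_gain(all_days, [sunny, overcast, rain]))  # ≈ 0.247 (0.246 with the rounded values above)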

Summary of ID3

  1. Calculate the entropy of every attribute using the data set S
  2. Split the set S into subsets using the attribute for which the resulting entropy is minimum (or, information gain is maximum)
  3. Make a decision tree node containing that attribute
  4. Recurse on the subsets using the remaining attributes (see the sketch after this list).
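
A minimal sketch of these four steps in Python, assuming nominal attributes and stopping when a subset is pure or no attributes remain (an illustration under those assumptions, not Quinlan's original implementation):

    from collections import Counter
    from math import log2

    def entropy(rows, target):
        """Entropy of the class label `target` over a list of example dicts."""
        counts = Counter(r[target] for r in rows)
        total = len(rows)
        return -sum((n / total) * log2(n / total) for n in counts.values())

    def id3(rows, attributes, target):
        """Build a decision tree as nested dicts; leaves are class labels."""
        labels = [r[target] for r in rows]
        # Stop if the subset is pure or there is nothing left to split on.
        if len(set(labels)) == 1 or not attributes:
            return Counter(labels).most_common(1)[0][0]

        # Steps 1-2: pick the attribute whose split leaves the least entropy
        # (equivalently, the one with the maximum Information Gain).
        def remainder(attr):
            rest_entropy = 0.0
            for value in set(r[attr] for r in rows):
                subset = [r for r in rows if r[attr] == value]
                rest_entropy += len(subset) / len(rows) * entropy(subset, target)
            return rest_entropy

        best = min(attributes, key=remainder)

        # Steps 3-4: make a node for that attribute and recurse on each subset
        # with the remaining attributes.
        remaining = [a for a in attributes if a != best]
        return {
            best: {
                value: id3([r for r in rows if r[best] == value], remaining, target)
                for value in set(r[best] for r in rows)
            }
        }

    # Hypothetical toy data in the spirit of the weather example.
    data = [
        {"outlook": "sunny", "windy": "false", "play": "no"},
        {"outlook": "sunny", "windy": "true", "play": "no"},
        {"outlook": "overcast", "windy": "false", "play": "yes"},
        {"outlook": "rain", "windy": "false", "play": "yes"},
        {"outlook": "rain", "windy": "true", "play": "no"},
    ]
    print(id3(data, ["outlook", "windy"], "play"))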