Decision Trees: Basic Concepts, Algorithms, Strengths and Weaknesses

1. What is a decision tree

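A classification decision tree model is a tree structure that describes how instances are classified. A decision tree consists of nodes and directed edges. There are two kinds of node: internal nodes and leaf nodes. An internal node represents a feature (attribute); a leaf node represents a class.
A decision tree, also called a judgment tree, is a predictive model expressed as a tree structure (binary or multiway).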

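  • An instance is classified by sorting it down the tree from the root to some leaf node
  • The leaf node gives the class of the instance
  • Each node tests one attribute of the instance, and each branch descending from that node corresponds to one possible value of the attribute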
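
The classification procedure in these bullets can be sketched in a few lines of Python. As an assumption for illustration, an internal node is stored as a nested dict {attribute: {value: subtree}} and a leaf is just a class label; the function name classify and the example tree are hypothetical.

```python
def classify(tree, instance):
    """Sort `instance` (a dict attribute -> value) down `tree` until a leaf is reached.

    Assumed layout: internal node = {attribute: {value: subtree, ...}},
    leaf = any non-dict value, read as the class label.
    """
    node = tree
    while isinstance(node, dict):
        # Each internal node tests exactly one attribute.
        attribute, branches = next(iter(node.items()))
        # Follow the branch matching the instance's value for that attribute.
        node = branches[instance[attribute]]
    return node


# A hypothetical two-level tree.
tree = {"outlook": {"sunny": {"humidity": {"high": "no", "normal": "yes"}},
                    "overcast": "yes",
                    "rain": {"wind": {"strong": "no", "weak": "yes"}}}}

print(classify(tree, {"outlook": "sunny", "humidity": "normal", "wind": "weak"}))  # yes
```

An attribute value that never appeared during training raises a KeyError in this sketch; a real implementation would need a fallback such as the majority class of the node.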

2. Decision tree structure

[Figure: decision tree structure]

3. Types of decision trees

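Classification tree: a decision tree for a discrete (categorical) target variable

Regression tree: a decision tree for a continuous target variable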
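
In code, the two kinds of tree differ mainly in what a leaf predicts and in how a split is scored. A minimal sketch, assuming a single numeric feature and using hypothetical function names: a classification leaf predicts the majority class, a regression leaf predicts the mean target, and the regression split is chosen to minimise the total squared error of the two children (the least-squares criterion CART uses for regression trees).

```python
from collections import Counter
from statistics import mean


def classification_leaf(labels):
    """A classification-tree leaf predicts the most common class."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(targets):
    """A regression-tree leaf predicts the mean target value."""
    return mean(targets)


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_regression_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimising left SSE + right SSE.

    Candidate thresholds are midpoints between consecutive distinct x values.
    Returns None if every x is identical (no split possible).
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, regression_leaf(left), regression_leaf(right))
    return None if best is None else best[1:]


print(classification_leaf(["a", "b", "a"]))                        # a
print(best_regression_split([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5]))  # (6.5, 1, 5)
```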

4. Decision tree algorithms (greedy algorithms)

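  • Supervised learning

  • Non-parametric learning algorithm

  • Builds the tree top-down, recursively

  • At each step, makes the choice that is best (optimal) in the current state

    Decision tree learning is usually a recursive process: select the best feature, partition the training data by that feature so that each resulting subset is classified as well as possible, and repeat on each subset.
    Among decision tree algorithms, ID3 uses information gain as its attribute selection measure, C4.5 uses the information gain ratio, and CART uses the Gini index.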
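
The three measures score a candidate split differently. The following minimal sketch computes all three from class counts for one hypothetical split. The function names are illustrative. The C4.5 gain ratio is taken as information gain divided by the split information (the entropy of the branch sizes). For CART, the sketch reports the decrease in weighted Gini index, which ranks splits the same way as minimising the weighted Gini index of the children.

```python
from math import log2


def entropy(counts):
    """Entropy (in bits) of a class distribution given as counts."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)


def gini(counts):
    """Gini index 1 - sum(p_k^2) of a class distribution given as counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def split_scores(parent, children):
    """Score one candidate split.

    parent:   class counts before the split, e.g. [5, 5]
    children: class counts in each branch, e.g. [[4, 1], [1, 4]]
    """
    n = sum(parent)
    weights = [sum(child) / n for child in children]
    # ID3: information gain
    info_gain = entropy(parent) - sum(w * entropy(c) for w, c in zip(weights, children))
    # C4.5: gain ratio = information gain / split information
    split_info = entropy([sum(child) for child in children])
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    # CART: decrease in weighted Gini index
    gini_decrease = gini(parent) - sum(w * gini(c) for w, c in zip(weights, children))
    return info_gain, gain_ratio, gini_decrease


ig, gr, gd = split_scores([5, 5], [[4, 1], [1, 4]])
print(f"ID3 info gain {ig:.3f}, C4.5 gain ratio {gr:.3f}, CART Gini decrease {gd:.3f}")
# ID3 info gain 0.278, C4.5 gain ratio 0.278, CART Gini decrease 0.180
```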

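5. The decision tree learning process

  • Feature selection
  • Tree generation: a recursive procedure, corresponding to local optimization of the model
  • Tree pruning: shrinks the tree to mitigate overfitting, corresponding to global selection of the model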
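
The pruning step can be made concrete with a regularised loss that trades training fit against tree size. This sketch assumes one common choice, which the text itself does not specify: C_α(T) = Σ_t N_t H_t(T) + α|T|, where t ranges over the leaves of T, N_t is the number of training samples in leaf t, H_t(T) is the entropy of its class distribution and |T| is the number of leaves. A group of sibling leaves is collapsed into its parent when that does not increase the loss. The function names are hypothetical.

```python
from math import log2


def entropy(counts):
    """Entropy (in bits) of a class distribution given as counts."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)


def tree_loss(leaves, alpha):
    """Assumed loss C_alpha(T) = sum over leaves of N_t * H_t + alpha * (number of leaves).

    `leaves` is a list of per-leaf class-count lists.
    """
    return sum(sum(c) * entropy(c) for c in leaves) + alpha * len(leaves)


def should_prune(leaves, alpha):
    """Collapse sibling leaves into their parent if the loss does not increase."""
    merged = [sum(col) for col in zip(*leaves)]  # class counts of the parent node
    return tree_loss([merged], alpha) <= tree_loss(leaves, alpha)


leaves = [[3, 0], [0, 2]]              # two pure leaves under one parent
print(should_prune(leaves, alpha=1))   # False: the split is worth keeping
print(should_prune(leaves, alpha=10))  # True: a large alpha favours the smaller tree
```

Larger values of α penalise tree size more heavily, so sweeping α from 0 upward yields a sequence of progressively smaller trees.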

6. Strengths and weaknesses of decision trees

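Strengths:
(1) Fast: the amount of computation is relatively small, and the tree is easily converted into classification rules. Following the tree from the root down to a leaf, the split conditions along the path uniquely determine one classification predicate.
(2) Accurate and interpretable: the extracted classification rules tend to be accurate and are easy to understand, and a decision tree clearly shows which fields matter most, i.e. it produces understandable rules.
(3) Can handle both continuous and categorical fields
(4) Requires no domain knowledge and no parametric assumptions
(5) Suitable for high-dimensional data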
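
Point (1), that every root-to-leaf path yields one classification predicate, can be sketched as follows. The sketch assumes a tree stored as nested dicts ({attribute: {value: subtree}}, leaf = class label); extract_rules and the example tree are illustrative.

```python
def extract_rules(tree, conditions=()):
    """Turn a nested-dict tree into a list of (conditions, label) rules.

    Assumed layout: internal node {attribute: {value: subtree}}, leaf = class label.
    Each root-to-leaf path becomes one rule: the conjunction of its tests.
    """
    if not isinstance(tree, dict):
        return [(list(conditions), tree)]
    attribute, branches = next(iter(tree.items()))
    rules = []
    for value, subtree in branches.items():
        rules.extend(extract_rules(subtree, conditions + ((attribute, value),)))
    return rules


tree = {"no surfacing": {0: "no", 1: {"flippers": {0: "no", 1: "yes"}}}}
for conds, label in extract_rules(tree):
    test = " AND ".join(f"{a} = {v}" for a, v in conds)
    print(f"IF {test} THEN class = {label}")
# IF no surfacing = 0 THEN class = no
# IF no surfacing = 1 AND flippers = 0 THEN class = no
# IF no surfacing = 1 AND flippers = 1 THEN class = yes
```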
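Weaknesses:
(1) When the classes have unequal numbers of samples, information gain is biased toward features that take more distinct values
(2) Prone to overfitting
(3) Ignores correlations between attributes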

5.2 Mathematical background for decision trees

1. Information theory:

If an event has k possible outcomes with probabilities p_i, the amount of information I obtained once the event has occurred is:
I = -(p_1\log_2 p_1 + p_2\log_2 p_2 + \cdots + p_k\log_2 p_k) = -\sum_{i=1}^{k} p_i\log_2 p_i
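
The formula translates directly into code. This is a minimal sketch, where information is a hypothetical name and the probabilities are given as a list. It skips outcomes with p_i = 0, following the convention that p log p tends to 0.

```python
from math import log2


def information(probs):
    """I = -sum(p_i * log2(p_i)) over the k outcome probabilities of an event.

    Outcomes with p_i = 0 contribute nothing (the limit p*log p -> 0).
    """
    return -sum(p * log2(p) for p in probs if p > 0)


print(information([0.5, 0.5]))         # 1.0 bit: a fair coin
print(information([0.25] * 4))         # 2.0 bits: four equally likely outcomes
print(information([0.5, 0.25, 0.25]))  # 1.5 bits
```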

2. Entropy:

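Given a collection S containing positive and negative examples of some target concept, the entropy of S relative to this boolean classification is:
Entropy(S) \equiv -p_{\oplus}\log_2 p_{\oplus} - p_{\ominus}\log_2 p_{\ominus}
where p_{\oplus} is the proportion of positive examples in S and p_{\ominus} is the proportion of negative examples in S.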
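
As a worked example (the counts are illustrative), let S contain 14 examples, 9 positive and 5 negative, so p_{\oplus} = 9/14 and p_{\ominus} = 5/14:

```latex
Entropy(S) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.4098 + 0.5305 \approx 0.940
```

The entropy is 0 when all members of S belong to the same class and reaches its maximum of 1 when S contains equal numbers of positive and negative examples.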

3. Conditional entropy:

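Let (X, Y) be a pair of random variables with joint distribution P(X=x_i, Y=y_j) = p_{ij}, i = 1, 2, ..., n; j = 1, 2, ..., m.
The conditional entropy H(Y|X) measures the remaining uncertainty of Y once X is known. It is defined as the expectation, over X, of the entropy of the conditional distribution of Y given X:
H(Y|X) = \sum_{i=1}^{n} p_i H(Y|X=x_i), \quad p_i = P(X=x_i)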
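
To estimate H(Y|X) from data, group the samples by their X value and take the weighted average of the entropies of Y within each group. A minimal sketch, with hypothetical function names and toy pairs:

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Entropy (in bits) of the empirical distribution of `labels`."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(pairs):
    """H(Y|X) = sum_i P(X = x_i) * H(Y | X = x_i), estimated from (x, y) samples."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    n = len(pairs)
    return sum(len(ys) / n * entropy(ys) for ys in groups.values())


pairs = [("a", "yes"), ("a", "yes"), ("b", "yes"), ("b", "no")]
print(conditional_entropy(pairs))  # 0.5: Y is certain when X = a, a coin flip when X = b
```

Here H(Y) is about 0.811 bits (3 "yes", 1 "no"), so knowing X reduces the uncertainty about Y. ID3's information gain is exactly this reduction.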

5.3 Hunt's algorithm

Hunt's algorithm builds a decision tree recursively.

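  1. If all records in data set D belong to the same class, mark the node as a leaf node labelled with that class.
  2. If D contains training records that belong to more than one class, select an attribute test condition to partition the records into smaller subsets. Create a child node for each outcome of the test condition and distribute the records in D to the children according to the outcomes. Then apply steps 1 and 2 to each child node, and recursively to their children, until the algorithm stops.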
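
The two steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions. Records are (features, label) pairs. The rule for choosing the split attribute is passed in as a function, since Hunt's algorithm leaves it open. A majority-class leaf is returned when no attributes remain, a stopping rule the description does not spell out. The names hunt and first_attribute are hypothetical.

```python
from collections import Counter


def hunt(records, attributes, choose_attribute):
    """Grow a tree recursively in the style of Hunt's algorithm.

    records:          list of (features_dict, label) pairs
    attributes:       attribute names still available for testing
    choose_attribute: callable(records, attributes) -> attribute name
    Returns a leaf label or a nested dict {attribute: {value: subtree}}.
    """
    labels = [label for _, label in records]
    # Step 1: all records belong to one class -> leaf labelled with that class.
    if len(set(labels)) == 1:
        return labels[0]
    # Extra stopping rule (assumption): nothing left to test -> majority-class leaf.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 2: split on the chosen attribute, one child per observed outcome,
    # then recurse on each child.
    attribute = choose_attribute(records, attributes)
    remaining = [a for a in attributes if a != attribute]
    children = {}
    for value in {features[attribute] for features, _ in records}:
        subset = [(f, l) for f, l in records if f[attribute] == value]
        children[value] = hunt(subset, remaining, choose_attribute)
    return {attribute: children}


def first_attribute(records, attributes):
    # Placeholder split rule for the demo only: take the first untested attribute.
    return attributes[0]


data = [({"a": 0, "b": 0}, "no"), ({"a": 0, "b": 1}, "no"),
        ({"a": 1, "b": 0}, "no"), ({"a": 1, "b": 1}, "yes")]
print(hunt(data, ["a", "b"], first_attribute))
```

ID3, C4.5 and CART can be viewed as instances of this scheme that plug in information gain, gain ratio and the Gini index, respectively, as the choice rule.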

5.4 The ID3 algorithm

1. Entropy of the classification system

H(C) = -\sum_{i=1}^{n} P(C_i)\log_2 P(C_i)

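2. Conditional entropy

H(C|A) = \sum_{j=1}^{v} P(A=a_j) H(C|A=a_j), where a_1, ..., a_v are the values of attribute A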

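3. Definition of information gain Gain(S, A)

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)
where Values(A) is the set of possible values of attribute A and S_v is the subset of S for which A has value v. In the notation above, this is H(C) - H(C|A).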

4. Attribute selection measure

Use information gain: the attribute with the highest information gain is selected as the test attribute for the current node.
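
A minimal sketch of this selection step, computing Gain(S, A) for each attribute and keeping the largest. The 14 rows are toy data (an assumption for illustration) with two attributes, Outlook and Wind, and a yes/no label in the last column. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy (in bits) of the empirical distribution of `labels`."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr_index):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v); label is the last column."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in set(row[attr_index] for row in rows):
        subset = [row[-1] for row in rows if row[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


# Toy data (assumed): Outlook, Wind, PlayTennis for 14 days.
rows = [
    ("Sunny", "Weak", "No"), ("Sunny", "Strong", "No"), ("Overcast", "Weak", "Yes"),
    ("Rain", "Weak", "Yes"), ("Rain", "Weak", "Yes"), ("Rain", "Strong", "No"),
    ("Overcast", "Strong", "Yes"), ("Sunny", "Weak", "No"), ("Sunny", "Weak", "Yes"),
    ("Rain", "Weak", "Yes"), ("Sunny", "Strong", "Yes"), ("Overcast", "Strong", "Yes"),
    ("Overcast", "Weak", "Yes"), ("Rain", "Strong", "No"),
]
names = ["Outlook", "Wind"]
gains = {name: information_gain(rows, i) for i, name in enumerate(names)}
print({name: round(g, 3) for name, g in gains.items()})  # {'Outlook': 0.247, 'Wind': 0.048}
print("test attribute:", max(gains, key=gains.get))       # test attribute: Outlook
```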

5. Limitations

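  • When ID3 builds a tree and the number of distinct values varies widely across attributes, classification accuracy can suffer considerably, because information gain favours attributes with many values
  • ID3 itself provides no method for handling continuous attributes
  • ID3 cannot handle data sets with missing values, so missing values must be preprocessed before the algorithm is run
  • ID3 only grows the tree and does no pruning, so the trees it produces are prone to overfitting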
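
The first limitation can be shown with a small experiment. An attribute that is unique to every sample, such as a row ID, splits the data into pure single-sample branches. It therefore gets the maximum possible information gain, even though it is useless for prediction. Dividing by the split information, as C4.5's gain ratio does, penalises such an attribute. The data and names below are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def gain_and_ratio(column, labels):
    """Return (information gain, gain ratio) of splitting `labels` by `column`."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    gain = entropy(labels) - remainder
    split_info = entropy(column)  # entropy of the partition itself
    return gain, (gain / split_info if split_info > 0 else 0.0)


labels = ["y", "y", "y", "n", "n", "n"]
row_id = [0, 1, 2, 3, 4, 5]               # unique per sample: useless for prediction
feature = ["a", "a", "a", "a", "b", "b"]  # a genuinely informative attribute

for name, column in [("row_id", row_id), ("feature", feature)]:
    gain, ratio = gain_and_ratio(column, labels)
    print(f"{name:8s} gain={gain:.3f} gain_ratio={ratio:.3f}")
# row_id   gain=1.000 gain_ratio=0.387
# feature  gain=0.459 gain_ratio=0.500
```

Information gain prefers row_id; the gain ratio prefers the informative attribute.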

6. Algorithm outline

ID3(Examples,Target_attribute,Attributes)

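Examples is the set of training examples. Target_attribute is the attribute whose value the tree is to predict. Attributes is the list of other attributes that the learned decision tree may test. The procedure returns a decision tree that correctly classifies the given Examples.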

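  • Create a Root node for the tree
  • If all Examples are positive, return the single-node tree Root with label = +
  • If all Examples are negative, return the single-node tree Root with label = -
  • If Attributes is empty, return the single-node tree Root, with label = the most common value of Target_attribute in Examples
  • Otherwise
    • A ← the attribute in Attributes that best* classifies Examples
    • The decision attribute for Root ← A
    • For each possible value v_i of A:
      • Add a new tree branch below Root, corresponding to the test A = v_i
      • Let Examples_vi be the subset of Examples that have value v_i for A
      • If Examples_vi is empty, add below this branch a leaf node with label = the most common value of Target_attribute in Examples
      • Otherwise, add below this branch the subtree ID3(Examples_vi, Target_attribute, Attributes - {A})
  • Return Root

  * The best attribute is the one with the highest information gain.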
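
A compact Python sketch of this procedure follows. It assumes each example is a list whose last element is the value of Target_attribute, and that Attributes maps attribute names to column indices. The names id3 and info_gain and the toy data are illustrative, not taken from a library. Branches are created only for values that actually occur in the examples, so the empty-subset case never arises in this version.

```python
from collections import Counter
from math import log2


def entropy(rows):
    """Entropy of the class labels, stored in the last column of each row."""
    n = len(rows)
    counts = Counter(row[-1] for row in rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def info_gain(rows, i):
    """Information gain of splitting `rows` on column i."""
    remainder = 0.0
    for value in set(row[i] for row in rows):
        subset = [row for row in rows if row[i] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder


def id3(rows, attributes):
    """ID3 returning a nested dict {attribute: {value: subtree}}; leaves are labels.

    rows:       examples; the target attribute is the last column
    attributes: {attribute name: column index} still available for testing
    """
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:          # all examples share one label
        return labels[0]
    if not attributes:                 # no attributes left: most common label
        return Counter(labels).most_common(1)[0][0]
    # A <- the attribute with the highest information gain
    best = max(attributes, key=lambda name: info_gain(rows, attributes[name]))
    col = attributes[best]
    rest = {name: i for name, i in attributes.items() if name != best}
    # One branch per value of A that occurs in rows, each grown recursively
    return {best: {value: id3([r for r in rows if r[col] == value], rest)
                   for value in sorted(set(row[col] for row in rows))}}


# Toy data (assumed): two binary features and a yes/no label
data = [[1, 1, "yes"], [1, 1, "yes"], [1, 0, "no"], [0, 1, "no"], [0, 1, "no"]]
print(id3(data, {"no surfacing": 0, "flippers": 1}))
# {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
```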

7. Python implementation

  1. Computing entropy in Python
from math import log

def calcShannonEnt(dataSet):
    """Shannon entropy of the class labels; the label is the last element of each feature vector."""
    numEntries = len(dataSet)
    labelCounts = {}
    for featVec in dataSet:
        currentLabel = featVec[-1]
        labelCounts[currentLabel] = labelCounts.get(currentLabel, 0) + 1
    shannonEnt = 0.0
    for count in labelCounts.values():
        prob = count / numEntries
        shannonEnt -= prob * log(prob, 2)
    return shannonEnt

# example: each row holds only a label; the labels are 1, 2, 3, 3
dataset = [[1], [2], [3], [3]]
sne = calcShannonEnt(dataset)

print(sne)  # 1.5
  2. sklearn.tree: parameter reference and usage tips

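class sklearn.tree.DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, class_weight=None, presort=False)

This signature comes from an older scikit-learn release: min_impurity_split was removed in version 1.0 and presort in 0.24, while later releases added parameters such as ccp_alpha for minimal cost-complexity pruning.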

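# Examples
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0)

# Accuracy of 10-fold cross-validation on the full iris data
print(cross_val_score(clf, iris.data, iris.target, cv=10))

# Hold out 30% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3)

clf.fit(X_train, y_train)
pre = clf.predict(X_test)        # predicted class of each test sample
sco = clf.score(X_test, y_test)  # mean accuracy on the test set

print(y_test)
print(pre)
print(sco)

# Index of the leaf that each sample lands in
print(clf.apply(X_train))
print(clf.apply(X_test))

# Node indicator matrix (SciPy sparse, shape n_samples x n_nodes):
# entry (i, j) is nonzero if sample i passes through node j
path = clf.decision_path(X_train)
print(type(path), path.shape)

print(X_train.shape)

# Normalized impurity-based importance of each feature
print(clf.feature_importances_)

# Without a fixed random_state, ties between equally good splits may be broken
# differently, so feature_importances_ can change from run to run
clf2 = DecisionTreeClassifier()
clf2.fit(X_train, y_train)
print(clf2.feature_importances_)

# Estimator parameters as a dict
print(clf2.get_params())

# Class probabilities: the fraction of training samples of each class in the leaf
# (the log of a zero probability is -inf)
print(clf2.predict_log_proba(X_test))
print(clf2.predict_proba(X_test))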

DecisionTreeClassifier examples

A DecisionTreeClassifier with the tree depth limited to 4

import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

# Use the iris data: sepal length (column 0) and petal length (column 2) only
iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
y = iris.target

# Train the model, limiting the maximum depth of the tree to 4
clf = DecisionTreeClassifier(max_depth=4)
clf.fit(X,y)

# Plot
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, .1),
                    np.arange(y_min, y_max, .1))

Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=.4)
plt.scatter(X[:, 0], X[:, 1], c=y, alpha=.8)
plt.show()
[Figure: decision regions of the depth-4 tree on two iris features]

Plot the decision surfaces of ensembles of trees on the iris dataset

This plot compares the decision surfaces learned by a decision tree classifier (first column), by a random forest classifier (second column), by an extra-trees classifier (third column) and by an AdaBoost classifier (fourth column).

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

from sklearn import clone
from sklearn.datasets import load_iris
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                             AdaBoostClassifier)
from sklearn.tree import DecisionTreeClassifier

# Parameters
n_classes = 3
n_estimators = 30
cmap = plt.cm.RdYlBu
plot_step = 0.02
plot_step_coarser = 0.5
RANDOM_SEED = 13


# Load data
iris = load_iris()

plot_idx = 1

models = [DecisionTreeClassifier(max_depth=None),
         RandomForestClassifier(n_estimators=n_estimators),
         ExtraTreesClassifier(n_estimators=n_estimators),
         AdaBoostClassifier(DecisionTreeClassifier(max_depth=3), n_estimators=n_estimators)]

for pair in ([0,1], [0,2], [2,3]):
    for model in models:
        # print(pair, model)
        # only take the two corresponding features
        X = iris.data[:, pair]
        y = iris.target
        
        # Shuffle
        idx = np.arange(X.shape[0])
        np.random.seed(RANDOM_SEED)
        np.random.shuffle(idx)
        X = X[idx]
        y = y[idx]
        
        # Standardize
        mean = X.mean(axis=0)
        std = X.std(axis=0)
        X = (X - mean) / std
        
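        # Train: fit `model` in place; `clf` refers to the same fitted estimator,
        # which the plotting code below accesses through `model`
        clf = model.fit(X, y)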
        
        scores = clf.score(X, y)
        # Create a title for each column and the console by using str() and
        # slicing away useless parts of the string
        model_title = str(type(model)).split(".")[-1][:-2][:-len('Classifier')]
        model_details = model_title
        if hasattr(model, "estimators_"):
            model_details +=  " with {} estimators".format(len(model.estimators_))
        print(model_details + " with features", pair,
             "has a score of", scores)
        
        plt.subplot(3, 4, plot_idx)
        if plot_idx <= len(models):
            # Add a title at the top of each column
            plt.title(model_title)
            
        # Now plot the decision boundary using a fine mesh as input to a filled contour plot
        x_min, x_max = X[:,0].min() - 1, X[:,0].max() + 1
        y_min, y_max = X[:,1].min() - 1, X[:,1].max() + 1
        xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                            np.arange(y_min, y_max, plot_step))
        
        # Plot either a single DecisionTreeClassifier or alpha blend the
        # decision surfaces of the ensemble of classifiers
        if isinstance(model, DecisionTreeClassifier):
            Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
            Z = Z.reshape(xx.shape)
            cs = plt.contourf(xx, yy, Z, cmap=cmap)
        else:
            # Choose alpha blend level with respect to the number of estimators
            # that are in use (noting that AdaBoost can use fewer estimators
            # than its maximum if it achieves a good enough fit early on)
            estimator_alpha = 1.0 / len(model.estimators_)
            print(len(model.estimators_))
            for tree in model.estimators_:
                Z = tree.predict(np.c_[xx.ravel(), yy.ravel()])
                Z = Z.reshape(xx.shape)
                cs = plt.contourf(xx, yy, Z, alpha=estimator_alpha, cmap=cmap)
                
        # Build a coarser grid to plot a set of ensemble classifications
        # to show how these are different to what we see in the decision
        # surfaces. These points are regularly spaced and do not have a
        # black outline
        xx_coarser, yy_coarser = np.meshgrid(
            np.arange(x_min, x_max, plot_step_coarser),
            np.arange(y_min, y_max, plot_step_coarser))
        Z_points_coarser = model.predict(np.c_[xx_coarser.ravel(),
                                              yy_coarser.ravel()]
                                        ).reshape(xx_coarser.shape)
        cs_points = plt.scatter(xx_coarser, yy_coarser, s=15,
                               c=Z_points_coarser, cmap=cmap,
                               edgecolors="none")
        plt.scatter(X[:, 0], X[:, 1], c=y,
                cmap=ListedColormap(['r', 'y', 'b']),
                edgecolor='k', s=20)

        plot_idx += 1

plt.tight_layout()
plt.show()

Output:

Automatically created module for IPython interactive environment
DecisionTree with features [0, 1] has a score of 0.9266666666666666
RandomForest with 30 estimators with features [0, 1] has a score of 0.9266666666666666
30
ExtraTrees with 30 estimators with features [0, 1] has a score of 0.9266666666666666
30
AdaBoost with 30 estimators with features [0, 1] has a score of 0.84
30
DecisionTree with features [0, 2] has a score of 0.9933333333333333
RandomForest with 30 estimators with features [0, 2] has a score of 0.9933333333333333
30
ExtraTrees with 30 estimators with features [0, 2] has a score of 0.9933333333333333
30
AdaBoost with 30 estimators with features [0, 2] has a score of 0.9933333333333333
30
DecisionTree with features [2, 3] has a score of 0.9933333333333333
RandomForest with 30 estimators with features [2, 3] has a score of 0.9933333333333333
30
ExtraTrees with 30 estimators with features [2, 3] has a score of 0.9933333333333333
30
AdaBoost with 30 estimators with features [2, 3] has a score of 0.9933333333333333
30
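[Figure: decision surfaces of the decision tree, random forest, extra-trees and AdaBoost classifiers for three pairs of iris features]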

Classifier comparison

A comparison of several classifiers in scikit-learn on synthetic datasets.
The point of this example is to illustrate the nature of decision boundaries of different classifiers.

Particularly in high-dimensional spaces, data can more easily be separated linearly and the simplicity of classifiers such as naive Bayes and linear SVMs might lead to better generalization than is achieved by other classifiers.

print(__doc__)

# Code source: Gaël Varoquaux
#              Andreas Müller
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_moons, make_circles, make_classification
# classifier
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

h = .02 # step size in the mesh

names = ['Nearest Neighbors', 'Linear SVM', 'RBF SVM', 'Gaussian Process', 'Decision Tree', 'Random Forest', 'Neural Net', 'AdaBoost', 'Naive Bayes', 'QDA']

classifiers = [
    KNeighborsClassifier(3),
    SVC(kernel="linear", C=0.025), # C: regularization (penalty) parameter
    SVC(gamma=2, C=1), # kernel: rbf(default), gamma: Kernel coefficient
    GaussianProcessClassifier(1.0 * RBF(1.0)),
    DecisionTreeClassifier(max_depth=5),
    RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
    MLPClassifier(alpha=1), # MultiLayer Perceptron
    AdaBoostClassifier(),
    GaussianNB(), # Gaussian Naive Bayes
    QuadraticDiscriminantAnalysis()    
]

X, y = make_classification(n_features=2, n_redundant=0, n_informative=2, 
                          random_state=1, n_clusters_per_class=1)
rng = np.random.RandomState(2)
X += 2 * rng.uniform(size=X.shape)
linearly_separable = (X, y)

datasets = [make_moons(noise=0.3, random_state=0),
           make_circles(noise=0.2, factor=0.5, random_state=1),
            linearly_separable
           ]

figure = plt.figure(figsize=(27,9))

i = 1
# iterate over datasets
for ds_cnt, ds in enumerate(datasets):
    # preprocess dataset, split into training and test part
    X, y = ds
    X = StandardScaler().fit_transform(X)
    X_train, X_test, y_train, y_test = \
        train_test_split(X,y,test_size=.4,random_state=42)
    
    x_min, x_max = X[:,0].min() - .5, X[:,0].max() + .5
    y_min, y_max = X[:,1].min() - .5, X[:,1].max() + .5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                        np.arange(y_min, y_max, h))
    
    # just plot the dataset first
    cm = plt.cm.RdBu
    cm_bright = ListedColormap(['#FF0000', '#0000FF'])
    ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
    if ds_cnt == 0:
        ax.set_title("Input data")
    # Plot the training points
    ax.scatter(X_train[:,0], X_train[:,1], c=y_train, cmap=cm_bright,
              edgecolors='k')
    # and testing points
    ax.scatter(X_test[:,0], X_test[:,1], c=y_test, cmap=cm_bright, alpha=0.6,
              edgecolors='k')
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    ax.set_xticks(())
    ax.set_yticks(())
    i += 1
    
    # iterate over classifiers
    for name, clf in zip(names, classifiers):
        ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
        clf.fit(X_train, y_train)
        score = clf.score(X_test, y_test)
        
        # plot the decision boundary. For that, we will assign a color to each
        # point in the mesh [x_min, x_max]*[y_min, y_max].
        if hasattr(clf, 'decision_function'):
            Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
        else:
            Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
        
        # Put the result into a color plot
        Z = Z.reshape(xx.shape)
        ax.contourf(xx, yy, Z, cmap=cm, alpha=.8)
        
        # plot also the training points
        ax.scatter(X_train[:,0], X_train[:,1], c=y_train, cmap=cm_bright,
                  edgecolors='k')
        # and testing points
        ax.scatter(X_test[:,0], X_test[:,1], c=y_test, cmap=cm_bright,
                  edgecolors='k', alpha=.6)
        
        ax.set_xlim(xx.min(), xx.max())
        ax.set_ylim(yy.min(), yy.max())
        ax.set_xticks(())
        ax.set_yticks(())
        if ds_cnt ==  0:
            ax.set_title(name)
        ax.text(xx.max() - .3, yy.min() + .3, ('%.2f' % score).lstrip('0'),
               size=15, horizontalalignment='right')
        i += 1
        
plt.tight_layout()
plt.show()
[Figure: Classifier comparison]

Two-class AdaBoost

This example fits an AdaBoosted decision stump on a non-linearly separable classification dataset composed of two "Gaussian quantiles" clusters and plots the decision boundary and decision scores.

print(__doc__)

# Author: Noel Dawe <noel.dawe@gmail.com>
#
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_gaussian_quantiles


# Construct dataset
X1, y1 = make_gaussian_quantiles(cov=2.,
                                n_samples=200, n_features=2,
                                n_classes=2, random_state=1)
X2, y2 = make_gaussian_quantiles(mean=(3,3), cov=1.5,
                                n_samples=300, n_features=2,
                                n_classes=2, random_state=1)
X = np.concatenate((X1, X2))
y = np.concatenate((y1, -y2 + 1))


# Create and fit an AdaBoosted decision tree
bdt = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                        algorithm='SAMME',
                        n_estimators=200)
bdt.fit(X, y)


plot_colors = 'br'
plot_step = .02
class_names = 'AB'

plt.figure(figsize=(10,5))

# plot the decision boundaries
plt.subplot(121)
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                    np.arange(y_min, y_max, plot_step))

Z = bdt.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
cs = plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
plt.axis("tight")

# Plot the training points
for i, n, c in zip(range(2), class_names, plot_colors):
    idx = np.where(y == i)
    plt.scatter(X[idx, 0], X[idx, 1],
               c=c, cmap=plt.cm.Paired, 
               s=20, edgecolor='k',
               label=("Class %s" % n))
    
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.legend(loc='upper right')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Decision Boundary')

# Plot the two-class decision scores
twoclass_output = bdt.decision_function(X)
plot_range = (twoclass_output.min(), twoclass_output.max())
plt.subplot(122)
for i, n, c in zip(range(2), class_names, plot_colors):
    plt.hist(twoclass_output[y == i],
            bins=10,
            range=plot_range,
            facecolor=c,
            label=('Class %s' % n),
            alpha=.5,
            edgecolor='k')
x1, x2, y1, y2 = plt.axis()
plt.axis((x1, x2, y1, y2 * 1.2))
plt.legend(loc='upper right')
plt.ylabel('Samples')
plt.xlabel('Score')
plt.title('Decision Scores')

plt.tight_layout()
plt.subplots_adjust(wspace=0.35)
plt.show()

Output:

Automatically created module for IPython interactive environment
[Figure: Two-class AdaBoost]