Perceptron: Theory and Algorithm Implementation

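Neural networks have been around for a long time. One of the earliest neural network models, the Perceptron, was introduced in 1957 by Frank Rosenblatt in "The Perceptron: A Perceiving and Recognizing Automaton". Since then, a wide range of theories and tools for neural networks have been studied and developed. We refer to all of these different implementations collectively as neural networks, and today neural networks are one of the main methods of deep learning.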

Perceptron

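The Perceptron is the simplest neural network algorithm, and it solves binary classification problems. It can be regarded as the original prototype of neural networks and as a foundation for both neural networks and support vector machines. If we treat the samples of a classification problem as points in space, there may exist a hyperplane that separates the two classes. The Perceptron aims to find this hyperplane: it introduces a loss function based on misclassified points and minimizes that loss function with gradient descent.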

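[Figure: structure of a Perceptron, with input signals x_i, weights w_i, the summation Σ, the function f, and the output y]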

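x_i is an input signal, w_i is the weight of each input signal, y is the output signal, and f is a nonlinear function. Σ denotes summation, that is, it computes the weighted sum of the inputs. Note that x_i is assumed to be a feature that has already gone through any nonlinear transformation during feature engineering.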

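The Perceptron can be expressed as follows:

$$ y(\mathbf{x}) = f(\mathbf{w}^{T}\mathbf{x}) = f\Big(\sum_i w_i x_i\Big), \qquad f(a) = \begin{cases} +1, & a \ge 0 \\ -1, & a < 0 \end{cases} $$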

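f(·) is the step function. Each component of the feature vector is first multiplied by its weight and the products are summed; the step function is then applied to that sum, and its output is the Perceptron's prediction. During training, we compare this prediction with the correct value and feed the error back.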
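As a quick illustration of the forward computation described above, here is a minimal sketch. The class name `ForwardPassSketch` and the example weights are assumptions made for this illustration only; they are not part of the implementation later in this post.

```java
// Hypothetical illustration class; names and numbers are assumptions.
class ForwardPassSketch {

    // Step activation: +1 when the weighted sum is non-negative, otherwise -1.
    static int step(double a) {
        return a >= 0 ? 1 : -1;
    }

    // Computes f(w^T x): multiply each feature by its weight, sum the products,
    // then apply the step function to the sum.
    static int output(double[] w, double[] x) {
        double sum = 0.0;
        for (int i = 0; i < w.length; i++) {
            sum += w[i] * x[i];
        }
        return step(sum);
    }

    public static void main(String[] args) {
        double[] w = {-1.0, 1.0};           // assumed example weights
        double[] x1 = {-2.0, 2.0};          // weighted sum = 4.0
        double[] x2 = {2.0, -2.0};          // weighted sum = -4.0
        System.out.println(output(w, x1));  // prints 1
        System.out.println(output(w, x2));  // prints -1
    }
}
```

The two sample points sit on opposite sides of the line defined by `w`, so they receive opposite predictions.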

Let t denote the label data. It can be written as:

$$ t \in \{1, -1\} $$

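If a sample belongs to class 1, denoted C_1, then t = 1; if it belongs to class 2, denoted C_2, then t = -1. If the input data are classified correctly, we get

$$ \mathbf{w}^{T}\mathbf{x}_n > 0 \ \text{ for } \mathbf{x}_n \in C_1, \qquad \mathbf{w}^{T}\mathbf{x}_n < 0 \ \text{ for } \mathbf{x}_n \in C_2 $$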

These two conditions can be combined into a single expression:

$$ \mathbf{w}^{T}\mathbf{x}_n t_n > 0 $$
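The combined condition, w^T x · t > 0, is easy to check in code. The sketch below uses a hypothetical helper class `MarginCheckSketch` and assumed example weights; it only illustrates the condition.

```java
// Hypothetical helper class for illustration; not part of the original code.
class MarginCheckSketch {

    // Returns true when w^T x * t > 0, i.e. the sample lies on the side
    // of the hyperplane that its label t (+1 or -1) expects.
    static boolean isCorrectlyClassified(double[] w, double[] x, int t) {
        double margin = 0.0;
        for (int i = 0; i < w.length; i++) {
            margin += w[i] * x[i] * t;
        }
        return margin > 0;
    }

    public static void main(String[] args) {
        double[] w = {-1.0, 1.0};  // assumed example weights
        double[] x = {-2.0, 2.0};  // w^T x = 4.0
        System.out.println(isCorrectlyClassified(w, x, 1));   // prints true
        System.out.println(isCorrectlyClassified(w, x, -1));  // prints false
    }
}
```

Multiplying by the label flips the sign for class 2, which is why one inequality covers both classes.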

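Therefore, we can improve the Perceptron's predictive ability by minimizing the following function:

$$ E(\mathbf{w}) = -\sum_{n \in M} \mathbf{w}^{T}\mathbf{x}_n t_n $$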
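To make the error function concrete, the following sketch sums -w^T x · t over the samples that are currently misclassified (those with w^T x · t <= 0). The class `ErrorFunctionSketch` and the three sample points are assumptions chosen so the arithmetic is easy to follow.

```java
// Hypothetical illustration of the misclassification-based error function.
class ErrorFunctionSketch {

    // E(w) = - sum over misclassified samples of (w^T x_n * t_n).
    // A sample counts as misclassified when w^T x_n * t_n <= 0.
    static double error(double[] w, double[][] xs, int[] ts) {
        double e = 0.0;
        for (int n = 0; n < xs.length; n++) {
            double margin = 0.0;
            for (int i = 0; i < w.length; i++) {
                margin += w[i] * xs[n][i] * ts[n];
            }
            if (margin <= 0) {
                e -= margin;  // each misclassified sample adds a non-negative amount
            }
        }
        return e;
    }

    public static void main(String[] args) {
        double[] w = {1.0, 0.0};                                 // assumed weights
        double[][] xs = {{2.0, 1.0}, {1.0, 1.0}, {-3.0, 0.0}};   // assumed samples
        int[] ts = {1, -1, 1};                                   // assumed labels
        // Margins are 2.0 (correct), -1.0 and -3.0 (both misclassified).
        System.out.println(error(w, xs, ts));  // prints 4.0
    }
}
```

The error is zero exactly when every sample satisfies the classification condition, so driving it down pushes the hyperplane toward a separating position.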

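E is the error function and M is the set of misclassified samples. We can minimize the error function with gradient descent, also called steepest descent. For a misclassified sample x_n, the update rule is:

$$ \mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} - \eta \nabla E(\mathbf{w}) = \mathbf{w}^{(k)} + \eta\, \mathbf{x}_n t_n $$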

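η is the learning rate, a common parameter of optimization algorithms that controls how fast learning proceeds, and k is the step index of the algorithm. In general, the smaller the learning rate, the more likely the algorithm is to settle into a local minimum, because each update cannot move the parameters far from their previous values. If the value is too large, the parameters may fail to converge because they fluctuate too widely. In practice, the learning rate is usually set to a relatively large value at the start and then reduced as the iterations proceed. For the Perceptron, however, it has been proved that when the data set is linearly separable the algorithm converges regardless of the learning rate, so the value is usually set to 1.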
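A single update step can be worked through by hand. The sketch below applies w ← w + η · x · t to one misclassified sample; the class `UpdateStepSketch`, the starting weights, and the sample are assumptions for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration of one gradient-descent update for a single sample.
class UpdateStepSketch {

    // If the sample is already classified correctly (w^T x * t > 0), leave w unchanged
    // and return true. Otherwise apply w <- w + eta * x * t in place and return false.
    static boolean trainStep(double[] w, double[] x, int t, double eta) {
        double margin = 0.0;
        for (int i = 0; i < w.length; i++) {
            margin += w[i] * x[i] * t;
        }
        if (margin > 0) {
            return true;
        }
        for (int i = 0; i < w.length; i++) {
            w[i] += eta * x[i] * t;
        }
        return false;
    }

    public static void main(String[] args) {
        double[] w = {0.0, 0.0};   // weights start at zero
        double[] x = {-2.0, 2.0};  // assumed sample from class 1
        int t = 1;
        double eta = 1.0;          // learning rate

        System.out.println(trainStep(w, x, t, eta));  // prints false (margin was 0)
        System.out.println(Arrays.toString(w));       // prints [-2.0, 2.0]
        System.out.println(trainStep(w, x, t, eta));  // prints true (margin is now 8.0)
    }
}
```

With zero initial weights the margin is 0, so the first call updates the weights; after that one step the same sample is classified correctly.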

The Java code below implements the Perceptron in its simplest form.

Activation Function


/**
 * Created by Mark.wei on 2017/6/20.
 */
public final class ActivationFunction {

    public static int step(double x) {
        if (x >= 0) {
            return 1;
        } else {
            return -1;
        }
    }

}

Gaussian Distribution: generates Gaussian-distributed data

import java.util.Random;

/**
 * Created by Mark.wei on 2017/6/20.
 */
public final class GaussianDistribution {

    private final double mean;
    private final double var;

    private final Random random;

    public GaussianDistribution(double mean, double var, Random random) {
        this.mean = mean;
        this.var = var;

        if(random == null){
            this.random = new Random();
        }else{
            this.random = random;
        }
    }


    public double generateVal(){
        double r = 0.0;

        while (r == 0.0){
            r = this.random.nextDouble();
        }

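        double c = Math.sqrt(-2.0 * Math.log(r));

        // Box-Muller transform: c * sin(...) or c * cos(...) is a standard normal sample.
        // Scale by the standard deviation (square root of the variance), then shift by the mean.
        double std = Math.sqrt(var);

        if(this.random.nextDouble() < 0.5){
            return c * Math.sin(2.0 *  Math.PI * this.random.nextDouble() ) * std + mean;
        }else{
            return c * Math.cos(2.0 *  Math.PI * this.random.nextDouble() ) * std + mean;
        }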

    }
}
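As a sanity check on this kind of generator, one can draw many samples and compare the sample mean with the requested mean. The sketch below is a standalone Box-Muller variant written for this check, not the class above: `BoxMullerSketch`, its method names, and the choice to scale by the square root of the variance are assumptions.

```java
import java.util.Random;

// Hypothetical standalone Box-Muller sampler used only to sanity-check the idea.
class BoxMullerSketch {

    // Draws one sample from N(mean, variance) using the Box-Muller transform.
    static double sample(Random rng, double mean, double variance) {
        double u1 = 1.0 - rng.nextDouble();  // in (0, 1], so log(u1) is finite
        double u2 = rng.nextDouble();
        double z = Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math.PI * u2);
        return z * Math.sqrt(variance) + mean;  // scale by the standard deviation
    }

    // Averages n samples drawn with a fixed seed.
    static double sampleMean(long seed, int n, double mean, double variance) {
        Random rng = new Random(seed);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += sample(rng, mean, variance);
        }
        return sum / n;
    }

    public static void main(String[] args) {
        // The printed value should be close to -2.0; the exact digits depend on the seed.
        System.out.println(sampleMean(1234L, 100000, -2.0, 1.0));
    }
}
```

With 100,000 samples and unit variance, the sample mean should land within a few hundredths of the requested mean.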

Perceptrons: the main body of the algorithm implementation


import java.util.Random;

/**
 * Created by Mark.wei on 2017/6/20.
 */
public class Perceptrons {

    public int dim ;
    public double[] w;

    public Perceptrons(int dim) {
        this.dim = dim;
        this.w = new double[dim];
    }
    public int train(double[] x, int t, double learningRate) {

        int classified = 0;
        double c = 0.;

        // check if the data is classified correctly
        for (int i = 0; i < dim; i++) {
            c += w[i] * x[i] * t;
        }

        // apply steepest descent method if the data is wrongly classified
        if (c > 0) {
            classified = 1;
        } else {
            for (int i = 0; i < dim; i++) {
                w[i] += learningRate * x[i] * t;
            }
        }

        return classified;
    }

    public int predict (double[] x) {

        double preActivation = 0.;

        for (int i = 0; i < dim; i++) {
            preActivation += w[i] * x[i];
        }

        return ActivationFunction.step(preActivation);
    }


    public static void main(String[] args) {

        //
        // Declare (Prepare) variables and constants for perceptrons
        //

        final int train_N = 1000;  // number of training data
        final int test_N = 200;   // number of test data
        final int dim = 2;        // number of input dimensions

        double[][] train_X = new double[train_N][dim];  // input data for training
        int[] train_T = new int[train_N];               // label data for training

        double[][] test_X = new double[test_N][dim];  // input data for test
        int[] test_T = new int[test_N];               // label data for test
        int[] predicted_T = new int[test_N];          // output data predicted by the model

        final int epochs = 2000;   // maximum number of training epochs
        final double learningRate = 1.;  // the learning rate is set to 1


        //
        // Create training data and test data for demo.
        //
        // Let training data set for each class follow Normal (Gaussian) distribution here:
        //   class 1 : x1 ~ N( -2.0, 1.0 ), y1 ~ N( +2.0, 1.0 )
        //   class 2 : x2 ~ N( +2.0, 1.0 ), y2 ~ N( -2.0, 1.0 )
        //

        final Random rng = new Random(1234);  // seed random
        GaussianDistribution g1 = new GaussianDistribution(-2.0, 1.0, rng);
        GaussianDistribution g2 = new GaussianDistribution(2.0, 1.0, rng);

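        // data set in class 1
        // (the upper bound is N/2, so the last sample of the first half is also
        //  filled in; with N/2 - 1 it would keep x = (0, 0) and label 0, which can
        //  never be classified correctly and would prevent training from stopping early)
        for (int i = 0; i < train_N/2; i++) {
            train_X[i][0] = g1.generateVal();
            train_X[i][1] = g2.generateVal();
            train_T[i] = 1;
        }
        for (int i = 0; i < test_N/2; i++) {
            test_X[i][0] = g1.generateVal();
            test_X[i][1] = g2.generateVal();
            test_T[i] = 1;
        }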

        // data set in class 2
        for (int i = train_N/2; i < train_N; i++) {
            train_X[i][0] = g2.generateVal();
            train_X[i][1] = g1.generateVal();
            train_T[i] = -1;
        }
        for (int i = test_N/2; i < test_N; i++) {
            test_X[i][0] = g2.generateVal();
            test_X[i][1] = g1.generateVal();
            test_T[i] = -1;
        }


        //
        // Build SingleLayerNeuralNetworks model
        //

        int epoch = 0;  // training epochs

        // construct perceptrons
        Perceptrons classifier = new Perceptrons(dim);

        // train models
        while (true) {
            int classified_ = 0;

            for (int i=0; i < train_N; i++) {
                classified_ += classifier.train(train_X[i], train_T[i], learningRate);
            }

            if (classified_ == train_N) break;  // when all data classified correctly

            epoch++;
            if (epoch > epochs) break;
        }


        // test
        for (int i = 0; i < test_N; i++) {
            predicted_T[i] = classifier.predict(test_X[i]);
        }


        //
        // Evaluate the model
        //

        int[][] confusionMatrix = new int[2][2];
        double accuracy = 0.;
        double precision = 0.;
        double recall = 0.;

        for (int i = 0; i < test_N; i++) {

            if (predicted_T[i] > 0) {
                if (test_T[i] > 0) {
                    accuracy += 1;
                    precision += 1;
                    recall += 1;
                    confusionMatrix[0][0] += 1;
                } else {
                    confusionMatrix[1][0] += 1;
                }
            } else {
                if (test_T[i] > 0) {
                    confusionMatrix[0][1] += 1;
                } else {
                    accuracy += 1;
                    confusionMatrix[1][1] += 1;
                }
            }

        }

        accuracy /= test_N;
        precision /= confusionMatrix[0][0] + confusionMatrix[1][0];
        recall /= confusionMatrix[0][0] + confusionMatrix[0][1];

        System.out.println("----------------------------");
        System.out.println("Perceptrons model evaluation");
        System.out.println("----------------------------");
        System.out.printf("Accuracy:  %.1f %%\n", accuracy * 100);
        System.out.printf("Precision: %.1f %%\n", precision * 100);
        System.out.printf("Recall:    %.1f %%\n", recall * 100);

    }
}

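Three metrics are commonly used to evaluate the performance of a machine learning model: accuracy, precision, and recall. A confusion matrix lays out the counts behind these metrics, as shown in the figure: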

[Figure: confusion matrix]
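The three metrics follow directly from the four cells of the confusion matrix. The sketch below uses the same layout as the evaluation code above (true positives in [0][0], false positives in [1][0], false negatives in [0][1], true negatives in [1][1]); the class `ConfusionMatrixMetricsSketch` and the counts are made up for illustration and are not results of the program above.

```java
// Hypothetical illustration of accuracy, precision, and recall.
class ConfusionMatrixMetricsSketch {

    // Fraction of all samples that were predicted correctly.
    static double accuracy(int tp, int fp, int fn, int tn) {
        return (double) (tp + tn) / (tp + fp + fn + tn);
    }

    // Fraction of predicted positives that are actually positive.
    static double precision(int tp, int fp) {
        return (double) tp / (tp + fp);
    }

    // Fraction of actual positives that were predicted positive.
    static double recall(int tp, int fn) {
        return (double) tp / (tp + fn);
    }

    public static void main(String[] args) {
        // Made-up counts for 200 test samples.
        int tp = 95, fp = 3, fn = 5, tn = 97;
        System.out.printf("Accuracy:  %.1f %%%n", accuracy(tp, fp, fn, tn) * 100);  // 96.0 %
        System.out.printf("Precision: %.1f %%%n", precision(tp, fp) * 100);         // 96.9 %
        System.out.printf("Recall:    %.1f %%%n", recall(tp, fn) * 100);            // 95.0 %
    }
}
```

Precision and recall are both measured with respect to the positive class (label 1 here), which is why the true negatives appear only in the accuracy.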

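This is only the simplest example of data classification. It does not include cross-validation, and the data set is synthetic, drawn by us from normal (Gaussian) distributions, so no noisy data affects the computation. Real problems are usually much harder: underfitting or overfitting, for example, can make the results incorrect and distort the final conclusions.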
