Source: Coursera, Stanford University, Andrew Ng, Machine Learning
Classification
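To attempt classification, one method is to use linear regression and map all predictions greater than 0.5 to 1 and all predictions less than 0.5 to 0. However, this method doesn't work well because the relationship between the features and a discrete label is generally not linear: the fitted line can output values well below 0 or above 1, and a single extreme example can shift the point where it crosses 0.5.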
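A minimal sketch of the thresholding approach described above, using a one-feature least-squares fit in plain Python. The data values, the helper names `fit_line` and `classify`, and the choice to assign a prediction of exactly 0.5 to class 1 are assumptions made for illustration; they are not from the course material.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def classify(x, intercept, slope, threshold=0.5):
    """Map the linear prediction to 0 or 1 (a tie at the threshold goes to 1, by assumption)."""
    return 1 if intercept + slope * x >= threshold else 0


# Illustrative data: small x values are negatives, larger ones are positives.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]

b0, b1 = fit_line(xs, ys)
clean_predictions = [classify(x, b0, b1) for x in xs]

# Add one extreme but correctly labelled positive example and refit.
xs_outlier = xs + [30]
ys_outlier = ys + [1]
c0, c1 = fit_line(xs_outlier, ys_outlier)
outlier_predictions = [classify(x, c0, c1) for x in xs_outlier]

# Raw linear output at the extreme point; it is not confined to [0, 1].
raw_at_30 = c0 + c1 * 30

print(clean_predictions)
print(outlier_predictions)
print(raw_at_30)
```

On the six clean points the thresholded line separates the classes. After adding the extreme example, the fitted line flattens, the point at x = 4 falls below the 0.5 threshold, and the raw prediction at x = 30 exceeds 1, which illustrates why this approach is fragile.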
The classification problem is just like the regression problem, except that the values we now want to predict take on only a small number of discrete values. For now, we will focus on the binary classification problem in which y can take on only two values, 0 and 1. (Most of what we say here will also generalize to the multiple-class case.)
For instance, if we are trying to build a spam classifier for email, then x^{(i)} may be some features of a piece of email, and y may be 1 if it is a piece of spam mail, and 0 otherwise. Hence, y ∈ {0, 1}. 0 is also called the negative class, and 1 the positive class, and they are sometimes also denoted by the symbols “-” and “+”. Given x^{(i)}, the corresponding y^{(i)} is also called the label for the training example.
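To make the notation concrete, here is a small sketch of how such a training set might be represented. The three features (count of the word "free", number of links, whether the sender is known) and all values are hypothetical, chosen only to illustrate the pairing of x^{(i)} with its label y^{(i)}.

```python
# Each training example is a pair (x_i, y_i):
#   x_i = hypothetical features [count of "free", number of links, sender known (0/1)]
#   y_i = label, 1 for spam (positive class), 0 for not spam (negative class)
training_set = [
    ([3, 5, 0], 1),
    ([0, 1, 1], 0),
    ([1, 7, 0], 1),
    ([0, 0, 1], 0),
]


def class_symbol(y):
    """Return "+" for the positive class and "-" for the negative class."""
    if y not in (0, 1):
        raise ValueError("binary labels must be 0 or 1")
    return "+" if y == 1 else "-"


labels = [y for _, y in training_set]
symbols = [class_symbol(y) for y in labels]

for (x_i, y_i), symbol in zip(training_set, symbols):
    print(x_i, y_i, symbol)
```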