Digit Recognizer
Data Introduction
The data files train.csv and test.csv contain gray-scale images of hand-drawn digits, from zero through nine.
Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive.
The training data set (train.csv) has 785 columns. The first column, called "label", is the digit that was drawn by the user. The remaining 784 columns contain the pixel-values of the associated image.
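As a rough illustration of this column layout, the sketch below reads a train.csv-style stream with Python's standard csv module and separates the label from the 784 pixel-values. The function name read_labeled_rows and the tiny synthetic two-row sample are assumptions made for the example; they are not part of the competition data.

```python
import csv
import io


def read_labeled_rows(handle):
    """Read a train.csv-style stream into (labels, images).

    Hypothetical helper: expects a header of "label" followed by 784 pixel columns.
    """
    reader = csv.reader(handle)
    header = next(reader)
    if header[0] != "label" or len(header) != 785:
        raise ValueError("expected 'label' followed by 784 pixel columns")
    labels, images = [], []
    for row in reader:
        labels.append(int(row[0]))                         # first column: the digit
        images.append([int(value) for value in row[1:]])   # remaining 784 pixel-values
    return labels, images


# Synthetic stand-in for train.csv (assumption: same header as the real file).
header = ["label"] + [f"pixel{x}" for x in range(784)]
sample_rows = [[3] + [0] * 784, [7] + [255] * 784]
sample_csv = "\n".join(",".join(str(v) for v in row) for row in [header] + sample_rows)

labels, images = read_labeled_rows(io.StringIO(sample_csv))
print(labels, len(images[0]))
```

For the real file, the same function could be called on open("train.csv", newline="") instead of the in-memory sample.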
Each pixel column in the training set has a name like pixelx, where x is an integer between 0 and 783, inclusive. To locate this pixel on the image, decompose x as x = i * 28 + j, where i and j are integers between 0 and 27, inclusive. Then pixelx is located on row i and column j of a 28 x 28 matrix (indexing from zero).
For example, pixel31 indicates the pixel that is in the fourth column from the left and the second row from the top (31 = 1 * 28 + 3, so i = 1 and j = 3), as in the ASCII diagram below.
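The decomposition x = i * 28 + j can be sketched with Python's built-in divmod, which returns the quotient and the remainder in one call. The helper names pixel_position and pixel_index are made up for this illustration.

```python
IMAGE_WIDTH = 28  # each image is 28 x 28 pixels


def pixel_position(x):
    """Return the zero-based (row, column) of pixelx."""
    if not 0 <= x <= 783:
        raise ValueError("x must be between 0 and 783, inclusive")
    return divmod(x, IMAGE_WIDTH)  # (i, j) such that x == i * 28 + j


def pixel_index(i, j):
    """Inverse mapping: the x in pixelx for zero-based row i and column j."""
    return i * IMAGE_WIDTH + j


row, column = pixel_position(31)
print(row, column)  # 1 3: second row from the top, fourth column from the left
```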
Visually, if we omit the "pixel" prefix, the pixels make up the image like this:
000 001 002 003 ... 026 027
028 029 030 031 ... 054 055
056 057 058 059 ... 082 083
...
728 729 730 731 ... 754 755
756 757 758 759 ... 782 783
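The same layout can be reproduced in code. The following sketch, using a hypothetical helper to_grid, cuts a flat list of 784 values into 28 rows of 28; feeding it the indices 0 through 783 gives back the corners of the diagram above.

```python
def to_grid(flat_pixels, width=28):
    """Cut a flat list of width * width values into rows of length width."""
    if len(flat_pixels) != width * width:
        raise ValueError("expected exactly width * width values")
    return [flat_pixels[start:start + width]
            for start in range(0, len(flat_pixels), width)]


# Use the pixel indices themselves as values so the grid mirrors the diagram.
indices = list(range(784))
grid = to_grid(indices)

print(grid[0][:4])    # [0, 1, 2, 3]
print(grid[1][:4])    # [28, 29, 30, 31]
print(grid[27][-2:])  # [782, 783]
```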
The test data set (test.csv) has the same format as the training set, except that it does not contain the "label" column.
Your submission file should be in the following format: for each of the 28000 images in the test set, output a single line containing the ImageId and the digit you predict. For example, if you predict that the first image is of a 3, the second image is of a 7, and the third image is of an 8, then your submission file would look like:
ImageId,Label
1,3
2,7
3,8
(27997 more lines)
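A minimal sketch of producing a file in this format with Python's csv module is shown here. It assumes that ImageId is the 1-based position of the image in test.csv, as the example above suggests; write_submission is a hypothetical helper name.

```python
import csv
import io


def write_submission(predictions, handle):
    """Write the ImageId,Label header and one line per predicted digit."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["ImageId", "Label"])
    # Assumption: ImageId counts the test images from 1, in file order.
    for image_id, digit in enumerate(predictions, start=1):
        writer.writerow([image_id, digit])


# Demonstrate with the three predictions from the example above.
buffer = io.StringIO()
write_submission([3, 7, 8], buffer)
print(buffer.getvalue())
```

To write to disk, the same function could be given a file opened with open("submission.csv", "w", newline="") in place of the in-memory buffer.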
The evaluation metric for this contest is the categorization accuracy, that is, the proportion of test images that are correctly classified. For example, a categorization accuracy of 0.97 indicates that you have correctly classified all but 3% of the images.
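The metric itself is simple to compute locally, for instance on a held-out part of the training data. The sketch below is only an illustration of the definition; the function name categorization_accuracy and the four sample predictions are invented for the example.

```python
def categorization_accuracy(predicted, actual):
    """Proportion of positions where the predicted digit equals the true digit."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one image is required")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Made-up example: three of four images classified correctly.
predicted = [3, 7, 8, 1]
actual = [3, 7, 8, 0]
print(categorization_accuracy(predicted, actual))  # 0.75
```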