kNN homework thoughts

enumerate

Usually, when you need both the index and the element while iterating over a list or array, you would write something like:

for my_list: avoid naming a variable `list`, which shadows the builtin
for i in range(len(my_list)):
    print(i, my_list[i])

But this is a bit clumsy. The built-in enumerate function gives a more direct and elegant way to do the same thing:

for index, text in enumerate(my_list):
    print(index, text)
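A quick runnable illustration of the pattern (the list contents here are mine, purely for demonstration):

```python
colors = ['red', 'green', 'blue']

# enumerate pairs each element with its index.
pairs = list(enumerate(colors))
print(pairs)  # [(0, 'red'), (1, 'green'), (2, 'blue')]

# An optional start= argument shifts the index, handy for 1-based output.
for line_no, color in enumerate(colors, start=1):
    print(line_no, color)
```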

magic word %store

You can exchange variables between different Jupyter notebooks. Store a variable in one notebook:

%store y_test

Then restore it in another Jupyter notebook:

%store -r y_test

Immutability and mutability

I have not found a better way to make a mutable variable immutable, so I convert the list into a tuple, the immutable counterpart of a list.
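A minimal sketch of the difference (the variable names are mine, not from the assignment):

```python
nums = [1, 2, 3]
nums[0] = 99            # lists are mutable: in-place assignment works

frozen = tuple(nums)    # same elements, but the container rejects mutation
try:
    frozen[0] = 0
except TypeError:
    print('tuple is immutable')

# Caveat: a tuple of ndarrays only freezes the container; the arrays
# inside it can still be modified element-wise.
```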

num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]

X_train_folds = []
y_train_folds = []
################################################################################
# TODO:                                                                        #
# Split up the training data into folds. After splitting, X_train_folds and    #
# y_train_folds should each be lists of length num_folds, where                #
# y_train_folds[i] is the label vector for the points in X_train_folds[i].     #
# Hint: Look up the numpy array_split function.                                #
################################################################################
# X_train_folds is a list of num_folds ndarrays; in each cross-validation
# round, all folds but one must be concatenated into the training set.
# The folds themselves must not be mutated, because X_train_folds is
# reused on every iteration of the loop below, so I freeze it as a tuple.
# Whenever a mutable copy is needed, its value is assigned to X_realtrain,
# which is converted back to a list.
X_train_folds = np.array_split(X_train, num_folds)
X_train_folds = tuple(X_train_folds)
y_train_folds = np.array_split(y_train, num_folds)
y_train_folds = tuple(y_train_folds)
################################################################################
#                                 END OF YOUR CODE                             #
################################################################################
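To see what np.array_split produces, here is a small sketch with toy shapes (not the actual CIFAR data):

```python
import numpy as np

X = np.arange(20).reshape(10, 2)   # 10 toy samples, 2 features each
folds = np.array_split(X, 5)       # list of 5 sub-arrays along axis 0
print(len(folds), folds[0].shape)  # 5 (2, 2)

# Unlike np.split, array_split also accepts sizes that do not divide
# evenly; the leading folds are one sample larger.
uneven = np.array_split(np.arange(10), 3)
print([f.shape[0] for f in uneven])  # [4, 3, 3]
```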

# A dictionary holding the accuracies for different values of k that we find
# when running cross-validation. After running cross-validation,
# k_to_accuracies[k] should be a list of length num_folds giving the different
# accuracy values that we found when using that value of k.
k_to_accuracies = {}


################################################################################
# TODO:                                                                        #
# Perform k-fold cross validation to find the best value of k. For each        #
# possible value of k, run the k-nearest-neighbor algorithm num_folds times,   #
# where in each case you use all but one of the folds as training data and the #
# last fold as a validation set. Store the accuracies for all fold and all     #
# values of k in the k_to_accuracies dictionary.                               #
################################################################################
for i in range(len(k_choices)):
    accuracy = [0.0] * num_folds
    for j in range(num_folds):
        # Rebuild mutable copies from the frozen tuples each round.
        X_realtrain = list(X_train_folds)
        y_realtrain = list(y_train_folds)
        X_val = X_realtrain.pop(j)
        y_val = y_realtrain.pop(j)
        # np.concatenate accepts a whole sequence of ndarrays, so the
        # remaining num_folds - 1 folds can be joined in one call.
        X_realtrain = np.concatenate(X_realtrain)
        y_realtrain = np.concatenate(y_realtrain)
        # Re-train on the remaining folds before predicting on the
        # held-out fold.
        classifier.train(X_realtrain, y_realtrain)
        distance = classifier.compute_distances_no_loops(X_val)
        y_val_pred = classifier.predict_labels(distance, k=k_choices[i])
        # Accuracy is measured on the validation fold, so divide by the
        # fold size, not by num_test.
        accuracy[j] = float(np.sum(y_val_pred == y_val)) / y_val.shape[0]
    k_to_accuracies[k_choices[i]] = accuracy
################################################################################
#                                 END OF YOUR CODE                             #
################################################################################

# Print out the computed accuracies
for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (k, accuracy)) 
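A natural follow-up (my own addition, not part of the assignment's printed code) is to average the per-fold accuracies and pick the best k; the dictionary below is a toy stand-in for the real k_to_accuracies:

```python
# Toy stand-in for the k_to_accuracies dictionary built above.
k_to_accuracies = {1: [0.20, 0.24], 5: [0.27, 0.29], 10: [0.26, 0.25]}

# Mean accuracy across folds for each k, then argmax over k.
mean_acc = {k: sum(v) / len(v) for k, v in k_to_accuracies.items()}
best_k = max(mean_acc, key=mean_acc.get)
print('best k = %d (mean accuracy %.4f)' % (best_k, mean_acc[best_k]))
```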

L2 distance, three ways

two loop version

for i in range(num_test):
    for j in range(num_train):
        #####################################################################
        # TODO:                                                             #
        # Compute the l2 distance between the ith test point and the jth    #
        # training point, and store the result in dists[i, j]. You should   #
        # not use a loop over dimension.                                    #
        #####################################################################
        dists[i, j] = np.sqrt(np.sum((self.X_train[j] - X[i]) ** 2))
        #####################################################################
        #                       END OF YOUR CODE                            #
        #####################################################################
    return dists

one loop version

for i in range(num_test):
      #######################################################################
      # TODO:                                                               #
      # Compute the l2 distance between the ith test point and all training #
      # points, and store the result in dists[i, :].                        #
      #######################################################################
      dists[i, :] = np.sqrt(np.sum((self.X_train - X[i]) ** 2, axis=1))
      #######################################################################
      #                         END OF YOUR CODE                            #
      #######################################################################
    return dists

no loop version

# From the expansion (x - y)**2 = x**2 + y**2 - 2*x*y, the whole distance
# matrix can be built without loops: -2 * X.dot(X_train.T) has shape
# (num_test, num_train), e.g. 500 x 5000; the train-norm row vector and
# the (num_test, 1) test-norm column vector are added by broadcasting.
dists = np.sqrt(-2 * np.dot(X, self.X_train.T)
                + np.sum(self.X_train ** 2, axis=1)
                + np.sum(X ** 2, axis=1)[:, np.newaxis])
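The looped and vectorized versions should agree up to floating-point error. A standalone sketch of that check on random data (outside the classifier class, so self.X_train becomes a plain variable; shapes are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.standard_normal((50, 8))   # 50 toy training points
X_test = rng.standard_normal((20, 8))    # 20 toy test points

# Two-loop reference implementation.
ref = np.zeros((20, 50))
for i in range(20):
    for j in range(50):
        ref[i, j] = np.sqrt(np.sum((X_train[j] - X_test[i]) ** 2))

# Fully vectorized version via (x - y)**2 = x**2 + y**2 - 2*x*y.
sq = (-2 * X_test @ X_train.T
      + np.sum(X_train ** 2, axis=1)
      + np.sum(X_test ** 2, axis=1)[:, np.newaxis])
vec = np.sqrt(np.maximum(sq, 0))   # clip tiny negatives from rounding

print(np.allclose(ref, vec))       # True
```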