NumPy Essentials: Day 9

41 vstack and hstack

Stack two arrays vertically and horizontally.
For example:
Input

a = np.arange(10)
b = np.arange(10)

Output
Vertical stack

array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

Horizontal stack

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Solution:

import numpy as np
a = np.arange(10)
b = np.arange(10)
new_ab = np.vstack((a, b))
new_ab

Output:

array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

new_ab2 = np.hstack((a,b))
new_ab2

Output:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
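The following is a minimal sketch of what the two calls do to 1-D inputs, written for illustration only. The variable names (stacked_v, same_v, and so on) are hypothetical, and the np.concatenate lines are just one equivalent spelling, not the method the exercise requires.

```python
import numpy as np

a = np.arange(10)
b = np.arange(10)

# For 1-D inputs, vstack treats each array as a row of shape (1, N)
# and stacks the rows along axis 0.
stacked_v = np.vstack((a, b))
# For 1-D inputs, hstack joins the arrays end to end along their only axis.
stacked_h = np.hstack((a, b))

print(stacked_v.shape)  # (2, 10)
print(stacked_h.shape)  # (20,)

# Equivalent results spelled with np.concatenate.
same_v = np.concatenate((a[np.newaxis, :], b[np.newaxis, :]), axis=0)
same_h = np.concatenate((a, b))
```

Checking the shape after stacking is a quick way to confirm which axis was used.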

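42 Probabilistic sampling from an array

For example:
Sample elements from the data array so that they are drawn with probabilities 0.5, 0.25, and 0.25, respectively.
The data array is: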

data  = np.array(['apple', 'banana', 'peach'])

Solution:

data  = np.array(['apple', 'banana', 'peach'])
species_out = np.random.choice(data, 150, p=[0.5, 0.25, 0.25])
len(species_out[species_out=='apple']), len(species_out[species_out=='banana']), len(species_out[species_out=='peach'])

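Output:

(79, 37, 34)

The counts of the three elements roughly match the probabilities [0.5, 0.25, 0.25]. The exact counts change from run to run because no random seed is set.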
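If a repeatable sample is wanted, one option is NumPy's Generator interface with a fixed seed. This is a sketch under assumptions: the seed value 0 and the names rng, sample, and freqs are arbitrary choices for illustration and do not come from the exercise.

```python
import numpy as np

data = np.array(['apple', 'banana', 'peach'])

# A seeded generator makes the draw repeatable; the seed 0 is arbitrary.
rng = np.random.default_rng(0)
sample = rng.choice(data, size=150, p=[0.5, 0.25, 0.25])

# Count each label in one pass instead of filtering three times.
values, counts = np.unique(sample, return_counts=True)
freqs = dict(zip(values.tolist(), (counts / sample.size).tolist()))
print(freqs)  # empirical frequencies, expected to be near 0.5, 0.25, 0.25
```

Using np.unique with return_counts=True replaces the three separate boolean filters with a single call.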

43 Sort the elements of an array and output the second-largest element.

For example, build array a as follows:

a=np.arange(100)

Solution:

np.sort(a)[-2]

np.sort returns a sorted copy of the array, in ascending order by default, so index -2 is the second-largest element.
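When the array contains duplicates, "second largest" can mean two different things. The sketch below uses a small made-up array (not the one from the exercise) to contrast the positional answer with the second-largest distinct value; np.partition is shown as one possible alternative to a full sort.

```python
import numpy as np

# Toy array with a repeated maximum, chosen only for illustration.
a = np.array([3, 9, 9, 1, 7])

# Second-to-last position after sorting: the duplicate of the maximum.
second_by_position = np.sort(a)[-2]         # 9

# np.unique returns sorted distinct values, so this skips the duplicate.
second_distinct = np.unique(a)[-2]          # 7

# np.partition only guarantees that index -2 holds the value it would
# hold in sorted order, without sorting the whole array.
second_partition = np.partition(a, -2)[-2]  # 9

print(second_by_position, second_distinct, second_partition)
```

For np.arange(100) all values are distinct, so the three approaches give the same answer.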

44 Sort a 2D array by a specified column.

For example:
Build the 2D data

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')

Sort the iris array by its first column.
Solution:

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')

print(iris[iris[:,0].argsort()][:20])

Output

[[b'4.3' b'3.0' b'1.1' b'0.1' b'Iris-setosa']
 [b'4.4' b'3.2' b'1.3' b'0.2' b'Iris-setosa']
 [b'4.4' b'3.0' b'1.3' b'0.2' b'Iris-setosa']
 [b'4.4' b'2.9' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.5' b'2.3' b'1.3' b'0.3' b'Iris-setosa']
 [b'4.6' b'3.6' b'1.0' b'0.2' b'Iris-setosa']
 [b'4.6' b'3.1' b'1.5' b'0.2' b'Iris-setosa']
 [b'4.6' b'3.4' b'1.4' b'0.3' b'Iris-setosa']
 [b'4.6' b'3.2' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.7' b'3.2' b'1.3' b'0.2' b'Iris-setosa']
 [b'4.7' b'3.2' b'1.6' b'0.2' b'Iris-setosa']
 [b'4.8' b'3.0' b'1.4' b'0.1' b'Iris-setosa']
 [b'4.8' b'3.0' b'1.4' b'0.3' b'Iris-setosa']
 [b'4.8' b'3.4' b'1.9' b'0.2' b'Iris-setosa']
 [b'4.8' b'3.4' b'1.6' b'0.2' b'Iris-setosa']
 [b'4.8' b'3.1' b'1.6' b'0.2' b'Iris-setosa']
 [b'4.9' b'2.4' b'3.3' b'1.0' b'Iris-versicolor']
 [b'4.9' b'2.5' b'4.5' b'1.7' b'Iris-virginica']
 [b'4.9' b'3.1' b'1.5' b'0.1' b'Iris-setosa']
 [b'4.9' b'3.1' b'1.5' b'0.1' b'Iris-setosa']]

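Only the first 20 rows are printed. Because dtype='object' keeps the values as byte strings, the sort compares them as text; that agrees with numeric order here because every value in the first column has the same one-digit, one-decimal format.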
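Sorting byte strings compares them character by character, which can differ from numeric order once values have different widths. The sketch below uses a small hypothetical array (rows), not the iris data, to show the difference and one way to sort numerically by converting the key column first.

```python
import numpy as np

# Hypothetical rows in the same style as the iris array (byte strings, dtype=object).
rows = np.array([
    [b'10.5', b'setosa'],
    [b'9.8', b'virginica'],
    [b'4.3', b'versicolor'],
], dtype=object)

# Text comparison: b'10.5' sorts before b'4.3' because b'1' < b'4'.
lexical = rows[rows[:, 0].argsort()]

# Convert the key column to floats, then reorder the rows by those keys.
keys = np.array([float(x.decode()) for x in rows[:, 0]])
numeric = rows[keys.argsort(kind='stable')]

print(lexical[:, 0])  # text order: 10.5, 4.3, 9.8
print(numeric[:, 0])  # numeric order: 4.3, 9.8, 10.5
```

Passing kind='stable' keeps rows with equal keys in their original relative order, which the default sort does not guarantee.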

45 Find the most frequent element in an array

For example:
Build the data:

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')

Find the most frequent value in the third column.

Solution:

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')

vals, counts = np.unique(iris[:, 2], return_counts=True)
print(vals[np.argmax(counts)])

Output

b'1.5'

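This means the most frequent value in the third column of iris is b'1.5'.

Key points: np.unique(iris[:, 2], return_counts=True) finds the unique values in the third column of iris and returns vals and counts. vals holds the distinct values (duplicates removed, in sorted order), and counts holds the number of times each value occurs. For this data they are: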

(array([b'1.0', b'1.1', b'1.2', b'1.3', b'1.4', b'1.5', b'1.6', b'1.7',
        b'1.9', b'3.0', b'3.3', b'3.5', b'3.6', b'3.7', b'3.8', b'3.9',
        b'4.0', b'4.1', b'4.2', b'4.3', b'4.4', b'4.5', b'4.6', b'4.7',
        b'4.8', b'4.9', b'5.0', b'5.1', b'5.2', b'5.3', b'5.4', b'5.5',
        b'5.6', b'5.7', b'5.8', b'5.9', b'6.0', b'6.1', b'6.3', b'6.4',
        b'6.6', b'6.7', b'6.9'], dtype=object),
 array([ 1,  1,  2,  7, 12, 14,  7,  4,  2,  1,  2,  2,  1,  1,  1,  3,  5,
         3,  4,  2,  4,  8,  3,  5,  4,  5,  4,  8,  2,  2,  2,  3,  6,  3,
         3,  2,  2,  3,  1,  1,  1,  2,  1]))

np.argmax(counts) returns the index of the largest value in counts, which is 5 in this example. vals[5] then gives b'1.5'.
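One detail worth knowing is how ties behave: np.argmax returns the first index of the maximum, so when several values share the top count only the smallest of them (in sorted order) is reported. The sketch below uses a made-up array of labels (not the iris column) to show this and one way to list every tied value.

```python
import numpy as np

# Toy labels where 'a' and 'b' are tied for the highest count.
labels = np.array(['a', 'b', 'b', 'a', 'c'])

vals, counts = np.unique(labels, return_counts=True)
# vals is sorted: ['a', 'b', 'c'], with counts [2, 2, 1].

# argmax picks the first maximum, so the tie resolves to 'a'.
first_mode = vals[np.argmax(counts)]

# A boolean mask keeps every value that reaches the top count.
all_modes = vals[counts == counts.max()]

print(first_mode)  # a
print(all_modes)   # ['a' 'b']
```

In the iris example the top count is unique, so both approaches agree there.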

Keep practicing and you will be the next NumPy expert.
