Numpy概述
NumPy(Numerical Python的简称)是Python数值计算最重要的基础包。大多数提供科学计算的包都是用NumPy的数组作为构建基础。
Why NumPy?
- 一个强大的N维数组对象ndarray,具有矢量算术运算和复杂广播能力的快速且节省空间的多维数组
- 用于集成由C、C++、Fortran等语言类库的C语言 API
- 线性代数、随机数生成以及傅里叶变换功能。
- 用于对整组数据进行快速运算的标准数学函数(无需编写循环),支持大量的数据运算
- 是众多机器学习框架的基础库
Tips:Python的面向数组计算可以追溯到1995年,Jim Hugunin创建了Numeric库。接下来的10年,许多科学编程社区纷纷开始使用Python的数组编程,但是进入21世纪,库的生态系统变得碎片化了。2005年,Travis Oliphant从Numeric和Numarray项目整了出了NumPy项目,进而所有社区都集合到了这个框架下。
NumPy之于数值计算特别重要的原因之一,是因为它可以高效处理大数组的数据。这是因为:
- NumPy是在一个连续的内存块中存储数据,独立于其他Python内置对象。NumPy的C语言编写的算法库可以操作内存,而不必进行类型检查或其它前期工作。比起Python的内置序列,NumPy数组使用的内存更少。
- NumPy可以在整个数组上执行复杂的计算,而不需要Python的for循环。
numpy.array
基础
import numpy
numpy.__version__
'1.12.1'
import numpy as np
np.__version__
'1.12.1'
Python List的特点
L = [i for i in range(10)]
L
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
L[5]
5
L[5] = 100
L
[0, 1, 2, 3, 4, 100, 6, 7, 8, 9]
L[5] = "Machine Learning"
L
[0, 1, 2, 3, 4, 'Machine Learning', 6, 7, 8, 9]
Python的List不要求存储同样的类型,带来效率问题。
import array
arr = array.array('i', [i for i in range(10)])
arr
array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
arr[5]
5
arr[5] = 100
arr
array('i', [0, 1, 2, 3, 4, 100, 6, 7, 8, 9])
arr[5] = "Machine Learning"
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-14-e74bffddd7b6> in <module>()
----> 1 arr[5] = "Machine Learning"
TypeError: an integer is required (got type str)
arr[5] = 5.0
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-15-f30bba6fbd5a> in <module>()
----> 1 arr[5] = 5.0
TypeError: integer argument expected, got float
array
的缺点是没有将数据当做向量或者矩阵,不支持基本运算。
numpy.array
nparr = np.array([i for i in range(10)])
nparr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
nparr[5] = 100
nparr
array([ 0, 1, 2, 3, 4, 100, 6, 7, 8, 9])
nparr[5] = "Machine Learning"
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-21-df6cd071861b> in <module>()
----> 1 nparr[5] = "Machine Learning"
ValueError: invalid literal for int() with base 10: 'Machine Learning'
nparr.dtype
dtype('int64')
nparr[5] = 5.0
nparr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
nparr.dtype
dtype('int64')
nparr[5] = 3.14
nparr
array([0, 1, 2, 3, 4, 3, 6, 7, 8, 9])
nparr2 = np.array([1, 2, 3.0])
nparr2.dtype
dtype('float64')
创建 numpy.array
import numpy as np
numpy.array
nparr = np.array([i for i in range(10)])
nparr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
其他创建 numpy.array
的方法
zeros
np.zeros(10)
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
np.zeros(10, dtype=float)
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
np.zeros((3, 5))
array([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
np.zeros(shape=(3, 5), dtype=int)
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])
ones
np.ones(10)
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
np.ones((3, 5))
array([[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]])
full
np.full((3, 5), 666)
array([[666, 666, 666, 666, 666],
[666, 666, 666, 666, 666],
[666, 666, 666, 666, 666]])
np.full(fill_value=666, shape=(3, 5))
array([[666, 666, 666, 666, 666],
[666, 666, 666, 666, 666],
[666, 666, 666, 666, 666]])
arange
[i for i in range(0, 20, 2)]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
np.arange(0, 20, 2)
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
[i for i in range(0, 1, 0.2)]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-43-d0579096bf02> in <module>()
----> 1 [i for i in range(0, 1, 0.2)]
TypeError: 'float' object cannot be interpreted as an integer
np.arange(0, 1, 0.2)
array([ 0. , 0.2, 0.4, 0.6, 0.8])
[i for i in range(0, 10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
np.arange(0, 10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
[i for i in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
linspace
np.linspace(0, 20, 10)
array([ 0. , 2.22222222, 4.44444444, 6.66666667,
8.88888889, 11.11111111, 13.33333333, 15.55555556,
17.77777778, 20. ])
np.linspace(0, 20, 11)
array([ 0., 2., 4., 6., 8., 10., 12., 14., 16., 18., 20.])
np.linspace(0, 1, 5)
array([ 0. , 0.25, 0.5 , 0.75, 1. ])
random
randint
np.random.randint(0, 10) # [0, 10)之间的随机数
5
np.random.randint(0, 10, 10)
array([2, 6, 1, 8, 1, 6, 8, 0, 1, 4])
np.random.randint(0, 1, 10)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
np.random.randint(0, 10, size=10)
array([3, 4, 9, 9, 5, 2, 3, 3, 2, 1])
np.random.randint(0, 10, size=(3,5))
array([[1, 5, 3, 8, 5],
[2, 7, 9, 6, 0],
[0, 9, 9, 9, 7]])
np.random.randint(10, size=(3,5))
array([[4, 8, 3, 7, 2],
[9, 9, 2, 4, 4],
[1, 5, 1, 7, 7]])
seed
np.random.seed(666)
np.random.randint(0, 10, size=(3, 5))
array([[2, 6, 9, 4, 3],
[1, 0, 8, 7, 5],
[2, 5, 5, 4, 8]])
np.random.seed(666)
np.random.randint(0, 10, size=(3,5))
array([[2, 6, 9, 4, 3],
[1, 0, 8, 7, 5],
[2, 5, 5, 4, 8]])
random
np.random.random()
0.7315955468480113
np.random.random((3,5))
array([[ 0.8578588 , 0.76741234, 0.95323137, 0.29097383, 0.84778197],
[ 0.3497619 , 0.92389692, 0.29489453, 0.52438061, 0.94253896],
[ 0.07473949, 0.27646251, 0.4675855 , 0.31581532, 0.39016259]])
normal
np.random.normal()
0.9047266176428719
np.random.normal(10, 100)
-72.62832650185376
np.random.normal(0, 1, (3, 5))
array([[ 0.82101369, 0.36712592, 1.65399586, 0.13946473, -1.21715355],
[-0.99494737, -1.56448586, -1.62879004, 1.23174866, -0.91360034],
[-0.27084407, 1.42024914, -0.98226439, 0.80976498, 1.85205227]])
np.random.<TAB>
查看random中的更多方法
np.random?
np.random.normal?
help(np.random.normal)
Help on built-in function normal:
normal(...) method of mtrand.RandomState instance
normal(loc=0.0, scale=1.0, size=None)
Draw random samples from a normal (Gaussian) distribution.
The probability density function of the normal distribution, first
derived by De Moivre and 200 years later by both Gauss and Laplace
independently [2]_, is often called the bell curve because of
its characteristic shape (see the example below).
The normal distributions occurs often in nature. For example, it
describes the commonly occurring distribution of samples influenced
by a large number of tiny, random disturbances, each with its own
unique distribution [2]_.
Parameters
----------
loc : float or array_like of floats
Mean ("centre") of the distribution.
scale : float or array_like of floats
Standard deviation (spread or "width") of the distribution.
size : int or tuple of ints, optional
Output shape. If the given shape is, e.g., ``(m, n, k)``, then
``m * n * k`` samples are drawn. If size is ``None`` (default),
a single value is returned if ``loc`` and ``scale`` are both scalars.
Otherwise, ``np.broadcast(loc, scale).size`` samples are drawn.
Returns
-------
out : ndarray or scalar
Drawn samples from the parameterized normal distribution.
See Also
--------
scipy.stats.norm : probability density function, distribution or
cumulative density function, etc.
Notes
-----
The probability density for the Gaussian distribution is
.. math:: p(x) = \frac{1}{\sqrt{ 2 \pi \sigma^2 }}
e^{ - \frac{ (x - \mu)^2 } {2 \sigma^2} },
where :math:`\mu` is the mean and :math:`\sigma` the standard
deviation. The square of the standard deviation, :math:`\sigma^2`,
is called the variance.
The function has its peak at the mean, and its "spread" increases with
the standard deviation (the function reaches 0.607 times its maximum at
:math:`x + \sigma` and :math:`x - \sigma` [2]_). This implies that
`numpy.random.normal` is more likely to return samples lying close to
the mean, rather than those far away.
References
----------
.. [1] Wikipedia, "Normal distribution",
http://en.wikipedia.org/wiki/Normal_distribution
.. [2] P. R. Peebles Jr., "Central Limit Theorem" in "Probability,
Random Variables and Random Signal Principles", 4th ed., 2001,
pp. 51, 51, 125.
Examples
--------
Draw samples from the distribution:
>>> mu, sigma = 0, 0.1 # mean and standard deviation
>>> s = np.random.normal(mu, sigma, 1000)
Verify the mean and the variance:
>>> abs(mu - np.mean(s)) < 0.01
True
>>> abs(sigma - np.std(s, ddof=1)) < 0.01
True
Display the histogram of the samples, along with
the probability density function:
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s, 30, normed=True)
>>> plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
... np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
... linewidth=2, color='r')
>>> plt.show()
numpy.array
基本操作
import numpy as np
np.random.seed(0)
x = np.arange(10)
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
X = np.arange(15).reshape((3, 5))
X
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
numpy.array
的基本属性
x.ndim
1
X.ndim
2
x.shape
(10,)
X.shape
(3, 5)
x.size
10
X.size
15
numpy.array
的数据访问
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x[0]
0
x[-1]
9
X
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
X[0][0] # 不建议!
0
X[0, 0]
0
X[0, -1]
4
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x[0:5]
array([0, 1, 2, 3, 4])
x[:5]
array([0, 1, 2, 3, 4])
x[5:]
array([5, 6, 7, 8, 9])
x[4:7]
array([4, 5, 6])
x[::2]
array([0, 2, 4, 6, 8])
x[1::2]
array([1, 3, 5, 7, 9])
x[::-1]
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
X
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
X[:2, :3]
array([[0, 1, 2],
[5, 6, 7]])
X[:2][:3] # 结果不一样,在numpy中使用","做多维索引
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
X[:2, ::2]
array([[0, 2, 4],
[5, 7, 9]])
X[::-1, ::-1]
array([[14, 13, 12, 11, 10],
[ 9, 8, 7, 6, 5],
[ 4, 3, 2, 1, 0]])
X[0, :]
array([0, 1, 2, 3, 4])
X[:, 0]
array([ 0, 5, 10])
Subarray of numpy.array
subX = X[:2, :3]
subX
array([[0, 1, 2],
[5, 6, 7]])
subX[0, 0] = 100
subX
array([[100, 1, 2],
[ 5, 6, 7]])
X
array([[100, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[ 10, 11, 12, 13, 14]])
X[0, 0] = 0
X
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
subX
array([[0, 1, 2],
[5, 6, 7]])
subX = X[:2, :3].copy()
subX[0, 0] = 100
subX
array([[100, 1, 2],
[ 5, 6, 7]])
X
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
Reshape
x.shape
(10,)
x.ndim
1
x.reshape(2, 5)
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
A = x.reshape(2, 5)
A
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
B = x.reshape(1, 10)
B
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
B.ndim
2
B.shape
(1, 10)
x.reshape(-1, 10)
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
x.reshape(10, -1)
array([[0],
[1],
[2],
[3],
[4],
[5],
[6],
[7],
[8],
[9]])
x.reshape(2, -1)
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
x.reshape(3, -1)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-53-12a588b09f7f> in <module>()
----> 1 x.reshape(3, -1)
ValueError: cannot reshape array of size 10 into shape (3,newaxis)
numpy.array
合并和分割
import numpy as np
numpy.array
的合并
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])
array([1, 2, 3, 3, 2, 1])
z = np.array([666, 666, 666])
np.concatenate([x, y, z])
array([ 1, 2, 3, 3, 2, 1, 666, 666, 666])
A = np.array([[1, 2, 3],
[4, 5, 6]])
np.concatenate([A, A])
array([[1, 2, 3],
[4, 5, 6],
[1, 2, 3],
[4, 5, 6]])
np.concatenate([A, A], axis=1)
array([[1, 2, 3, 1, 2, 3],
[4, 5, 6, 4, 5, 6]])
np.concatenate([A, z])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-7-148a822297cf> in <module>()
----> 1 np.concatenate([A, z])
ValueError: all the input arrays must have same number of dimensions
np.concatenate([A, z.reshape(1, -1)])
array([[ 1, 2, 3],
[ 4, 5, 6],
[666, 666, 666]])
np.vstack([A, z])
array([[ 1, 2, 3],
[ 4, 5, 6],
[666, 666, 666]])
B = np.full((2,2), 100)
np.hstack([A, B])
array([[ 1, 2, 3, 100, 100],
[ 4, 5, 6, 100, 100]])
np.hstack([A, z])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-13-d5b9fc6fb0a8> in <module>()
----> 1 np.hstack([A, z])
/Users/yuanzhang/anaconda/lib/python3.6/site-packages/numpy/core/shape_base.py in hstack(tup)
286 return _nx.concatenate(arrs, 0)
287 else:
--> 288 return _nx.concatenate(arrs, 1)
289
290 def stack(arrays, axis=0):
ValueError: all the input arrays must have same number of dimensions
numpy.array
的分割
x = np.arange(10)
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x1, x2, x3 = np.split(x, [3, 7])
x1
array([0, 1, 2])
x2
array([3, 4, 5, 6])
x3
array([7, 8, 9])
x1, x2 = np.split(x, [5])
x1
array([0, 1, 2, 3, 4])
x2
array([5, 6, 7, 8, 9])
A = np.arange(16).reshape((4, 4))
A
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
A1, A2 = np.split(A, [2])
A1
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
A2
array([[ 8, 9, 10, 11],
[12, 13, 14, 15]])
A1, A2 = np.split(A, [2], axis=1)
A1
array([[ 0, 1],
[ 4, 5],
[ 8, 9],
[12, 13]])
A2
array([[ 2, 3],
[ 6, 7],
[10, 11],
[14, 15]])
upper, lower = np.vsplit(A, [2])
upper
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
lower
array([[ 8, 9, 10, 11],
[12, 13, 14, 15]])
left, right = np.hsplit(A, [2])
left
array([[ 0, 1],
[ 4, 5],
[ 8, 9],
[12, 13]])
right
array([[ 2, 3],
[ 6, 7],
[10, 11],
[14, 15]])
data = np.arange(16).reshape((4, 4))
data
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
X, y = np.hsplit(data, [-1])
X
array([[ 0, 1, 2],
[ 4, 5, 6],
[ 8, 9, 10],
[12, 13, 14]])
y
array([[ 3],
[ 7],
[11],
[15]])
y[:, 0]
array([ 3, 7, 11, 15])
numpy.array
中的运算
给定一个数组,让数组中每一个数乘以2
n = 10
L = [i for i in range(n)]
2 * L
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
A = []
for e in L:
A.append(2*e)
n = 1000000
L = [i for i in range(n)]
%%time
A = []
for e in L:
A.append(2*e)
CPU times: user 253 ms, sys: 30 ms, total: 283 ms
Wall time: 303 ms
%%time
A = [2*e for e in L]
CPU times: user 93.6 ms, sys: 25.8 ms, total: 119 ms
Wall time: 128 ms
import numpy as np
L = np.arange(n)
%%time
A = np.array(2*e for e in L)
CPU times: user 15.1 ms, sys: 8.97 ms, total: 24.1 ms
Wall time: 24.8 ms
%%time
A = 2 * L
CPU times: user 3.79 ms, sys: 4.36 ms, total: 8.14 ms
Wall time: 8.03 ms
n = 10
L = np.arange(n)
2 * L
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
NumPy’s UFuncs (Universal Functions)
X = np.arange(1, 16).reshape((3, 5))
X
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15]])
X + 1
array([[ 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16]])
X - 1
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
X * 2
array([[ 2, 4, 6, 8, 10],
[12, 14, 16, 18, 20],
[22, 24, 26, 28, 30]])
X / 2
array([[ 0.5, 1. , 1.5, 2. , 2.5],
[ 3. , 3.5, 4. , 4.5, 5. ],
[ 5.5, 6. , 6.5, 7. , 7.5]])
X // 2
array([[0, 1, 1, 2, 2],
[3, 3, 4, 4, 5],
[5, 6, 6, 7, 7]])
X ** 2
array([[ 1, 4, 9, 16, 25],
[ 36, 49, 64, 81, 100],
[121, 144, 169, 196, 225]])
X % 2
array([[1, 0, 1, 0, 1],
[0, 1, 0, 1, 0],
[1, 0, 1, 0, 1]])
1 / X
array([[ 1. , 0.5 , 0.33333333, 0.25 , 0.2 ],
[ 0.16666667, 0.14285714, 0.125 , 0.11111111, 0.1 ],
[ 0.09090909, 0.08333333, 0.07692308, 0.07142857, 0.06666667]])
np.abs(X)
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15]])
np.sin(X)
array([[ 0.84147098, 0.90929743, 0.14112001, -0.7568025 , -0.95892427],
[-0.2794155 , 0.6569866 , 0.98935825, 0.41211849, -0.54402111],
[-0.99999021, -0.53657292, 0.42016704, 0.99060736, 0.65028784]])
np.cos(X)
array([[ 0.54030231, -0.41614684, -0.9899925 , -0.65364362, 0.28366219],
[ 0.96017029, 0.75390225, -0.14550003, -0.91113026, -0.83907153],
[ 0.0044257 , 0.84385396, 0.90744678, 0.13673722, -0.75968791]])
np.tan(X)
array([[ 1.55740772e+00, -2.18503986e+00, -1.42546543e-01,
1.15782128e+00, -3.38051501e+00],
[ -2.91006191e-01, 8.71447983e-01, -6.79971146e+00,
-4.52315659e-01, 6.48360827e-01],
[ -2.25950846e+02, -6.35859929e-01, 4.63021133e-01,
7.24460662e+00, -8.55993401e-01]])
np.arctan(X)
array([[ 0.78539816, 1.10714872, 1.24904577, 1.32581766, 1.37340077],
[ 1.40564765, 1.42889927, 1.44644133, 1.46013911, 1.47112767],
[ 1.48013644, 1.48765509, 1.49402444, 1.49948886, 1.50422816]])
np.exp(X)
array([[ 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
5.45981500e+01, 1.48413159e+02],
[ 4.03428793e+02, 1.09663316e+03, 2.98095799e+03,
8.10308393e+03, 2.20264658e+04],
[ 5.98741417e+04, 1.62754791e+05, 4.42413392e+05,
1.20260428e+06, 3.26901737e+06]])
np.exp2(X)
array([[ 2.00000000e+00, 4.00000000e+00, 8.00000000e+00,
1.60000000e+01, 3.20000000e+01],
[ 6.40000000e+01, 1.28000000e+02, 2.56000000e+02,
5.12000000e+02, 1.02400000e+03],
[ 2.04800000e+03, 4.09600000e+03, 8.19200000e+03,
1.63840000e+04, 3.27680000e+04]])
np.power(3, X)
array([[ 3, 9, 27, 81, 243],
[ 729, 2187, 6561, 19683, 59049],
[ 177147, 531441, 1594323, 4782969, 14348907]])
np.log(X)
array([[ 0. , 0.69314718, 1.09861229, 1.38629436, 1.60943791],
[ 1.79175947, 1.94591015, 2.07944154, 2.19722458, 2.30258509],
[ 2.39789527, 2.48490665, 2.56494936, 2.63905733, 2.7080502 ]])
np.log2(X)
array([[ 0. , 1. , 1.5849625 , 2. , 2.32192809],
[ 2.5849625 , 2.80735492, 3. , 3.169925 , 3.32192809],
[ 3.45943162, 3.5849625 , 3.70043972, 3.80735492, 3.9068906 ]])
np.log10(X)
array([[ 0. , 0.30103 , 0.47712125, 0.60205999, 0.69897 ],
[ 0.77815125, 0.84509804, 0.90308999, 0.95424251, 1. ],
[ 1.04139269, 1.07918125, 1.11394335, 1.14612804, 1.17609126]])
矩阵运算
A = np.arange(4).reshape(2, 2)
A
array([[0, 1],
[2, 3]])
B = np.full((2, 2), 10)
B
array([[10, 10],
[10, 10]])
A + B
array([[10, 11],
[12, 13]])
A - B
array([[-10, -9],
[ -8, -7]])
A * B
array([[ 0, 10],
[20, 30]])
A.dot(B)
array([[10, 10],
[50, 50]])
A.T
array([[0, 2],
[1, 3]])
C = np.full((3, 3), 666)
A + C
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-145-cb7c4a36a7ba> in <module>()
----> 1 A + C
ValueError: operands could not be broadcast together with shapes (2,2) (3,3)
向量和矩阵的运算
加法
v = np.array([1, 2])
v + A
array([[1, 3],
[3, 5]])
v + A
是可以的,但是在这个课程中,我们不研究其中的计算法则。有兴趣的同学可以查询资料自学numpy.array
的broadcast
np.vstack([v] * A.shape[0])
array([[1, 2],
[1, 2]])
np.vstack([v] * A.shape[0]) + A
array([[1, 3],
[3, 5]])
np.tile(v, (2, 1))
array([[1, 2],
[1, 2]])
np.tile(v, (2, 1)) + A
array([[1, 3],
[3, 5]])
np.tile(v, (2, 2))
array([[1, 2, 1, 2],
[1, 2, 1, 2]])
乘法
v * A
array([[0, 2],
[2, 6]])
v.dot(A)
array([4, 7])
A.dot(v)
array([2, 8])
矩阵的逆
np.linalg.inv(A)
array([[-1.5, 0.5],
[ 1. , 0. ]])
invA = np.linalg.inv(A)
A.dot(invA)
array([[ 1., 0.],
[ 0., 1.]])
invA.dot(A)
array([[ 1., 0.],
[ 0., 1.]])
X = np.arange(16).reshape((2, 8))
invX = np.linalg.inv(X)
---------------------------------------------------------------------------
LinAlgError Traceback (most recent call last)
<ipython-input-207-60b1a25f4891> in <module>()
----> 1 invX = np.linalg.inv(X)
/Users/yuanzhang/anaconda/lib/python3.6/site-packages/numpy/linalg/linalg.py in inv(a)
515 a, wrap = _makearray(a)
516 _assertRankAtLeast2(a)
--> 517 _assertNdSquareness(a)
518 t, result_t = _commonType(a)
519
/Users/yuanzhang/anaconda/lib/python3.6/site-packages/numpy/linalg/linalg.py in _assertNdSquareness(*arrays)
210 for a in arrays:
211 if max(a.shape[-2:]) != min(a.shape[-2:]):
--> 212 raise LinAlgError('Last 2 dimensions of the array must be square')
213
214 def _assertFinite(*arrays):
LinAlgError: Last 2 dimensions of the array must be square
矩阵的伪逆
pinvX = np.linalg.pinv(X)
pinvX
array([[ -1.35416667e-01, 5.20833333e-02],
[ -1.01190476e-01, 4.16666667e-02],
[ -6.69642857e-02, 3.12500000e-02],
[ -3.27380952e-02, 2.08333333e-02],
[ 1.48809524e-03, 1.04166667e-02],
[ 3.57142857e-02, 8.67361738e-18],
[ 6.99404762e-02, -1.04166667e-02],
[ 1.04166667e-01, -2.08333333e-02]])
X.dot(pinvX)
array([[ 1.00000000e+00, -9.71445147e-17],
[ -1.33226763e-15, 1.00000000e+00]])
矩阵的伪逆又被称为“广义逆矩阵”,有兴趣的同学可以翻看线性教材课本查看更多额广义逆矩阵相关的性质。中文wiki链接: https://zh.wikipedia.org/wiki/%E5%B9%BF%E4%B9%89%E9%80%86%E9%98%B5
Numpy
中的聚合操作
sum
import numpy as np
L = np.random.random(100)
sum(L)
52.675554310672098
np.sum(L)
52.675554310672105
big_array = np.random.rand(1000000)
%timeit sum(big_array)
%timeit np.sum(big_array)
10 loops, best of 3: 173 ms per loop
1000 loops, best of 3: 1.02 ms per loop
min, max
np.min(big_array)
2.2765289564574687e-07
np.max(big_array)
0.99999686126703025
big_array.min()
2.2765289564574687e-07
big_array.max()
0.99999686126703025
big_array.sum()
500454.89231729991
多维度聚合
X = np.arange(16).reshape(4,-1)
X
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
np.sum(X)
120
np.sum(X, axis=0)
array([24, 28, 32, 36])
np.sum(X, axis=1)
array([ 6, 22, 38, 54])
注意:axis描述的是将要被压缩的维度。
其他聚合操作
np.prod(X)
0
np.prod(X + 1)
20922789888000
np.mean(X)
7.5
np.median(X)
7.5
v = np.array([1, 1, 2, 2, 10])
np.mean(v)
3.2000000000000002
np.median(v)
2.0
np.percentile(big_array, q=50)
0.50056612640031206
np.median(big_array)
0.50056612640031206
np.percentile(big_array, q=100)
0.99999686126703025
np.max(big_array)
0.99999686126703025
for percent in [0, 25, 50, 75, 100]:
print(np.percentile(big_array, q=percent))
2.27652895646e-07
0.250501365819
0.5005661264
0.750543416185
0.999996861267
np.var(big_array)
0.083379660489048227
np.std(big_array)
0.28875536443336985
x = np.random.normal(0, 1, 1000000)
np.mean(x)
-0.00044876833100538597
np.std(x)
1.0000457010611321
Numpy
中arg运算
import numpy as np
x = np.random.normal(0, 1, 1000000)
索引
np.argmin(x)
886266
x[886266]
-4.8354963762015108
np.min(x)
-4.8354963762015108
np.argmax(x)
4851
x[4851]
4.5860138951376461
np.max(x)
4.5860138951376461
排序和使用索引
x = np.arange(16)
x
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
np.random.shuffle(x)
x
array([13, 2, 6, 7, 11, 10, 3, 4, 8, 0, 5, 1, 9, 14, 12, 15])
np.sort(x)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
x
array([13, 2, 6, 7, 11, 10, 3, 4, 8, 0, 5, 1, 9, 14, 12, 15])
x.sort()
x
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
X = np.random.randint(10, size=(4,4))
X
array([[8, 8, 5, 8],
[1, 2, 2, 4],
[5, 5, 9, 9],
[3, 9, 3, 4]])
np.sort(X, axis=0)
array([[1, 2, 2, 4],
[3, 5, 3, 4],
[5, 8, 5, 8],
[8, 9, 9, 9]])
np.sort(X, axis=1)
array([[5, 8, 8, 8],
[1, 2, 2, 4],
[5, 5, 9, 9],
[3, 3, 4, 9]])
使用索引
x
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
np.random.shuffle(x)
x
array([14, 15, 8, 7, 10, 4, 9, 1, 6, 5, 3, 12, 2, 11, 0, 13])
np.argsort(x)
array([14, 7, 12, 10, 5, 9, 8, 3, 2, 6, 4, 13, 11, 15, 0, 1])
np.partition(x, 3)
array([ 1, 0, 2, 3, 4, 5, 7, 8, 6, 9, 10, 12, 11, 13, 15, 14])
np.argpartition(x, 3)
array([ 7, 14, 12, 10, 5, 9, 3, 2, 8, 6, 4, 11, 13, 15, 1, 0])
X
array([[8, 8, 5, 8],
[1, 2, 2, 4],
[5, 5, 9, 9],
[3, 9, 3, 4]])
np.argsort(X, axis=1)
array([[2, 0, 1, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 2, 3, 1]])
np.argpartition(X, 2, axis=1)
array([[2, 1, 0, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 2, 3, 1]])
Numpy
中的比较和Fancy Indexing
Fancy Indexing
import numpy as np
x = np.arange(16)
x
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
x[3]
3
x[3:9]
array([3, 4, 5, 6, 7, 8])
x[3:9:2]
array([3, 5, 7])
[x[3], x[5], x[7]]
[3, 5, 7]
ind = [3, 5, 7]
x[ind]
array([3, 5, 7])
ind = np.array([[0, 2], [1, 3]])
x[ind]
array([[0, 2],
[1, 3]])
Fancy Indexing 应用在二维数组
X = x.reshape(4, -1)
X
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
row = np.array([0, 1, 2])
col = np.array([1, 2, 3])
X[row, col]
array([ 1, 6, 11])
X[0, col]
array([1, 2, 3])
X[:2, col]
array([[1, 2, 3],
[5, 6, 7]])
col = [True, False, True, True]
X[0, col]
array([0, 2, 3])
numpy.array
的比较
x
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
x < 3
array([ True, True, True, False, False, False, False, False, False,
False, False, False, False, False, False, False], dtype=bool)
x > 3
array([False, False, False, False, True, True, True, True, True,
True, True, True, True, True, True, True], dtype=bool)
x <= 3
array([ True, True, True, True, False, False, False, False, False,
False, False, False, False, False, False, False], dtype=bool)
x >= 3
array([False, False, False, True, True, True, True, True, True,
True, True, True, True, True, True, True], dtype=bool)
x == 3
array([False, False, False, True, False, False, False, False, False,
False, False, False, False, False, False, False], dtype=bool)
x != 3
array([ True, True, True, False, True, True, True, True, True,
True, True, True, True, True, True, True], dtype=bool)
2 * x == 24 - 4 * x
array([False, False, False, False, True, False, False, False, False,
False, False, False, False, False, False, False], dtype=bool)
X < 6
array([[ True, True, True, True],
[ True, True, False, False],
[False, False, False, False],
[False, False, False, False]], dtype=bool)
使用 numpy.array
的比较结果
np.count_nonzero( x <= 3)
4
np.sum(x <= 3)
4
np.sum(X % 2 == 0, axis=0)
array([4, 0, 4, 0])
np.sum(X % 2 == 0, axis=1)
array([2, 2, 2, 2])
np.any(x == 0)
True
np.any(x < 0)
False
np.all(x > 0)
False
np.all(x >= 0)
True
np.all(X > 0, axis=1)
array([False, True, True, True], dtype=bool)
np.sum((x > 3) & (x < 10))
6
np.sum((x > 3) && (x < 10))
File "<ipython-input-45-780ca9b7c144>", line 1
np.sum((x > 3) && (x < 10))
^
SyntaxError: invalid syntax
np.sum((x % 2 == 0) | (x > 10))
11
np.sum(~(x == 0))
15
比较结果和Fancy Indexing
x < 5
array([ True, True, True, True, True, False, False, False, False,
False, False, False, False, False, False, False], dtype=bool)
x[x < 5]
array([0, 1, 2, 3, 4])
x[x % 2 == 0]
array([ 0, 2, 4, 6, 8, 10, 12, 14])
X[X[:,3] % 3 == 0, :]
array([[ 0, 1, 2, 3],
[12, 13, 14, 15]])