[Repost] Python Basics: Basic Numpy Syntax

Numpy - multidimensional data arrays

J.R. Johansson (jrjohansson at gmail.com)

The latest version of this IPython notebook lecture is available at http://github.com/jrjohansson/scientific-python-lectures.

The other notebooks in this lecture series are indexed at http://jrjohansson.github.io.

# what is this line all about?!? Answer in lecture 4
%matplotlib inline
import matplotlib.pyplot as plt

Introduction

The numpy package (module) is used in almost all numerical computation using Python. It is a package that provides high-performance vector, matrix and higher-dimensional data structures for Python. It is implemented in C and Fortran, so when calculations are vectorized (formulated with vectors and matrices), performance is very good.

To use numpy you need to import the module, using for example:

from numpy import *

In the numpy package the terminology used for vectors, matrices and higher-dimensional data sets is array.

Creating numpy arrays

There are a number of ways to initialize new numpy arrays, for example from

  • Python lists or tuples
  • using functions that are dedicated to generating numpy arrays, such as arange, linspace, etc.
  • reading data from files

From lists

For example, to create new vector and matrix arrays from Python lists we can use the numpy.array function.

# a vector: the argument to the array function is a Python list
v = array([1,2,3,4])

v
array([1, 2, 3, 4])
# a matrix: the argument to the array function is a nested Python list
M = array([[1, 2], [3, 4]])

M
array([[1, 2],
       [3, 4]])

The v and M objects are both of the type ndarray that the numpy module provides.

type(v), type(M)
(numpy.ndarray, numpy.ndarray)

The difference between the v and M arrays is only their shapes. We can get information about the shape of an array by using the ndarray.shape property.

v.shape
(4,)
M.shape
(2, 2)

The number of elements in the array is available through the ndarray.size property:

M.size
4

Equivalently, we could use the functions numpy.shape and numpy.size:

shape(M)
(2, 2)
size(M)
4

So far the numpy.ndarray looks awfully much like a Python list (or nested list). Why not simply use Python lists for computations instead of creating a new array type?

There are several reasons:

  • Python lists are very general. They can contain any kind of object. They are dynamically typed. They do not support mathematical functions such as matrix and dot multiplications, etc. Implementing such functions for Python lists would not be very efficient because of the dynamic typing.
  • Numpy arrays are statically typed and homogeneous. The type of the elements is determined when the array is created.
  • Numpy arrays are memory efficient.
  • Because of the static typing, fast implementations of mathematical functions such as multiplication and addition of numpy arrays can be written in a compiled language (C and Fortran are used).
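
The points above can be illustrated with a minimal sketch. It assumes numpy is imported as np (instead of the star import used in this lecture), and the variable names are illustrative only:

```python
import numpy as np

# A Python list can mix types, and * repeats the list instead of scaling it.
py_list = [1, 2.5, "three"]
repeated = [1, 2, 3] * 2          # [1, 2, 3, 1, 2, 3]

# A numpy array has one element type, and * acts on every element.
arr = np.array([1, 2, 3])
scaled = arr * 2                  # array([2, 4, 6])

# Mixed numeric input is upcast to a single common type.
mixed = np.array([1, 2.5])        # both elements are stored as float64

print(repeated, scaled, mixed.dtype)
```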

Using the dtype (data type) property of an ndarray, we can see what type the data of an array has:

M.dtype
dtype('int64')

We get an error if we try to assign a value of the wrong type to an element in a numpy array:

M[0,0] = "hello"
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-12-a09d72434238> in <module>()
----> 1 M[0,0] = "hello"


ValueError: invalid literal for long() with base 10: 'hello'
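
The exact error message depends on the Python and numpy versions. Note also that not every mismatched value raises an error: as a sketch (with numpy imported as np, names illustrative), a float assigned into an integer array is accepted and truncated:

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])    # integer dtype inferred from the data

# A string that cannot be parsed as an integer is rejected.
try:
    M[0, 0] = "hello"
    raised = False
except ValueError:
    raised = True

# A float is accepted but truncated to fit the integer dtype.
M[0, 1] = 7.9

print(raised, M[0, 1])
```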

If we want, we can explicitly define the type of the array data when we create it, using the dtype keyword argument:

M = array([[1, 2], [3, 4]], dtype=complex)

M
array([[ 1.+0.j,  2.+0.j],
       [ 3.+0.j,  4.+0.j]])

Common data types that can be used with dtype are: int, float, complex, bool, object, etc.

We can also explicitly define the bit size of the data types, for example: int64, int16, float128, complex128.
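
A short sketch of how the bit size shows up in the storage used per element, assuming numpy is imported as np. Only types that are available on all common platforms are used here; float128 in particular is not available everywhere.

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int16)
single = np.array([1, 2, 3], dtype=np.float32)
cplx = np.array([1, 2, 3], dtype=np.complex128)

# itemsize is the number of bytes used per element
print(small.itemsize, single.itemsize, cplx.itemsize)   # 2 4 16
```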

Using array-generating functions

For larger arrays it is impractical to initialize the data manually, using explicit Python lists. Instead we can use one of the many functions in numpy that generate arrays of different forms. Some of the more common are:

arange

# create a range

x = arange(0, 10, 1) # arguments: start, stop, step

x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x = arange(-1, 1, 0.1)

x
array([ -1.00000000e+00,  -9.00000000e-01,  -8.00000000e-01,
        -7.00000000e-01,  -6.00000000e-01,  -5.00000000e-01,
        -4.00000000e-01,  -3.00000000e-01,  -2.00000000e-01,
        -1.00000000e-01,  -2.22044605e-16,   1.00000000e-01,
         2.00000000e-01,   3.00000000e-01,   4.00000000e-01,
         5.00000000e-01,   6.00000000e-01,   7.00000000e-01,
         8.00000000e-01,   9.00000000e-01])
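
The -2.22044605e-16 entry above, where an exact 0 would be expected, is an effect of floating-point rounding with a non-integer step. One common alternative, sketched here with numpy imported as np, is to ask linspace for an explicit number of points:

```python
import numpy as np

# 20 points from -1 up to, but not including, 1:
# the same grid as arange(-1, 1, 0.1), up to rounding
x = np.linspace(-1, 1, 20, endpoint=False)

print(len(x), x[0], x[10])
```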

linspace and logspace

# using linspace, both end points ARE included
linspace(0, 10, 25)
array([  0.        ,   0.41666667,   0.83333333,   1.25      ,
         1.66666667,   2.08333333,   2.5       ,   2.91666667,
         3.33333333,   3.75      ,   4.16666667,   4.58333333,
         5.        ,   5.41666667,   5.83333333,   6.25      ,
         6.66666667,   7.08333333,   7.5       ,   7.91666667,
         8.33333333,   8.75      ,   9.16666667,   9.58333333,  10.        ])
logspace(0, 10, 10, base=e)
array([  1.00000000e+00,   3.03773178e+00,   9.22781435e+00,
         2.80316249e+01,   8.51525577e+01,   2.58670631e+02,
         7.85771994e+02,   2.38696456e+03,   7.25095809e+03,
         2.20264658e+04])
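
To make the meaning of the logspace arguments concrete, here is a small check (numpy imported as np, names illustrative) that the call above is the same as exponentiating a linspace:

```python
import numpy as np

# logspace(start, stop, num, base=b) returns b ** linspace(start, stop, num)
a = np.logspace(0, 10, 10, base=np.e)
b = np.exp(np.linspace(0, 10, 10))

print(np.allclose(a, b))
```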

mgrid

x, y = mgrid[0:5, 0:5] # similar to meshgrid in MATLAB
x
array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4]])
y
array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])
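
The coordinate arrays are typically used to evaluate a function of two variables on the whole grid at once. A minimal sketch on a smaller grid, assuming numpy is imported as np:

```python
import numpy as np

x, y = np.mgrid[0:3, 0:4]         # x varies along rows, y along columns

# Evaluate a function of two variables on the whole grid at once.
z = 10 * x + y

# meshgrid with indexing="ij" produces the same coordinate arrays.
xx, yy = np.meshgrid(np.arange(3), np.arange(4), indexing="ij")

print(z)
```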

random data

from numpy import random
# uniform random numbers in [0,1]
random.rand(5,5)
array([[ 0.92932506,  0.19684255,  0.736434  ,  0.18125714,  0.70905038],
       [ 0.18803573,  0.9312815 ,  0.1284532 ,  0.38138008,  0.36646481],
       [ 0.53700462,  0.02361381,  0.97760688,  0.73296701,  0.23042324],
       [ 0.9024635 ,  0.20860922,  0.67729644,  0.68386687,  0.49385729],
       [ 0.95876515,  0.29341553,  0.37520629,  0.29194432,  0.64102804]])
# standard normal distributed random numbers
random.randn(5,5)
array([[ 0.117907  , -1.57016164,  0.78256246,  1.45386709,  0.54744436],
       [ 2.30356897, -0.28352021, -0.9087325 ,  1.2285279 , -1.00760167],
       [ 0.72216801,  0.77507299, -0.37793178, -0.31852241,  0.84493629],
       [-0.10682252,  1.15930142, -0.47291444, -0.69496967, -0.58912034],
       [ 0.34513487, -0.92389516, -0.216978  ,  0.42153272,  0.86650101]])
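
The numbers above differ on every run. When reproducible results are needed, the generator can be seeded first. A sketch with numpy imported as np; the seed value 42 is arbitrary:

```python
import numpy as np

np.random.seed(42)                # fix the state of the global generator
first = np.random.rand(2, 2)

np.random.seed(42)                # same seed, same sequence
second = np.random.rand(2, 2)

print(np.array_equal(first, second))
```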

diag

# a diagonal matrix
diag([1,2,3])
array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])
# diagonal with offset from the main diagonal
diag([1,2,3], k=1) 
array([[0, 1, 0, 0],
       [0, 0, 2, 0],
       [0, 0, 0, 3],
       [0, 0, 0, 0]])

zeros and ones

zeros((3,3))
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
ones((3,3))
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

File I/O

Comma-separated values (CSV)

A very common file format for data files is comma-separated values (CSV), or related formats such as TSV (tab-separated values). To read data from such files into Numpy arrays we can use the numpy.genfromtxt function. For example,

!head stockholm_td_adj.dat
1800  1  1    -6.1    -6.1    -6.1 1
1800  1  2   -15.4   -15.4   -15.4 1
1800  1  3   -15.0   -15.0   -15.0 1
1800  1  4   -19.3   -19.3   -19.3 1
1800  1  5   -16.8   -16.8   -16.8 1
1800  1  6   -11.4   -11.4   -11.4 1
1800  1  7    -7.6    -7.6    -7.6 1
1800  1  8    -7.1    -7.1    -7.1 1
1800  1  9   -10.1   -10.1   -10.1 1
1800  1 10    -9.5    -9.5    -9.5 1
data = genfromtxt('stockholm_td_adj.dat')
data.shape
(77431, 7)
fig, ax = plt.subplots(figsize=(14,4))
ax.plot(data[:,0]+data[:,1]/12.0+data[:,2]/365, data[:,5])
ax.axis('tight')
ax.set_title('temperatures in Stockholm')
ax.set_xlabel('year')
ax.set_ylabel('temperature (C)');

Using numpy.savetxt we can store a Numpy array to a file in CSV format:

M = random.rand(3,3)

M
array([[ 0.77872576,  0.40043577,  0.66254019],
       [ 0.60410063,  0.4791374 ,  0.8237106 ],
       [ 0.96856318,  0.15459644,  0.96082399]])
savetxt("random-matrix.csv", M)
!cat random-matrix.csv
7.787257639287014088e-01 4.004357670697732408e-01 6.625401863466899854e-01
6.041006328761111543e-01 4.791373994963619154e-01 8.237105968088237473e-01
9.685631757740569281e-01 1.545964379103705877e-01 9.608239852111523094e-01
savetxt("random-matrix.csv", M, fmt='%.5f') # fmt specifies the format

!cat random-matrix.csv
0.77873 0.40044 0.66254
0.60410 0.47914 0.82371
0.96856 0.15460 0.96082
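
By default savetxt separates the values with spaces, as seen above. To get actual commas, the delimiter argument can be passed. A sketch assuming numpy imported as np; the file name and the temporary directory are hypothetical:

```python
import os
import tempfile

import numpy as np

M = np.array([[0.5, 1.25], [2.0, 3.75]])

# Write into a fresh temporary directory (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "matrix.csv")
np.savetxt(path, M, fmt="%.5f", delimiter=",")

with open(path) as f:
    first_line = f.readline().strip()

# The same delimiter must be given when reading the file back.
M_back = np.genfromtxt(path, delimiter=",")

print(first_line)
```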

Numpy's native file format

Useful when storing and reading back numpy array data. Use the functions numpy.save and numpy.load:

save("random-matrix.npy", M)

!file random-matrix.npy
random-matrix.npy: data
load("random-matrix.npy")
array([[ 0.77872576,  0.40043577,  0.66254019],
       [ 0.60410063,  0.4791374 ,  0.8237106 ],
       [ 0.96856318,  0.15459644,  0.96082399]])
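
Unlike a text file, the native format also keeps the data type and shape. A small round-trip sketch (numpy imported as np, hypothetical file name in a temporary directory):

```python
import os
import tempfile

import numpy as np

M = np.arange(6, dtype=np.int16).reshape(2, 3)

path = os.path.join(tempfile.mkdtemp(), "matrix.npy")
np.save(path, M)
M_back = np.load(path)

print(M_back.dtype, M_back.shape)
```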

More properties of the numpy arrays

M.itemsize # bytes per element
8
M.nbytes # number of bytes
72
M.ndim # number of dimensions
2

Manipulating arrays

Indexing

We can index elements in an array using square brackets and indices:

# v is a vector, and has only one dimension, taking one index
v[0]
1
# M is a matrix, or a 2 dimensional array, taking two indices 
M[1,1]
0.47913739949636192

If we omit an index of a multidimensional array it returns the whole row (or, in general, an (N-1)-dimensional array):

M
array([[ 0.77872576,  0.40043577,  0.66254019],
       [ 0.60410063,  0.4791374 ,  0.8237106 ],
       [ 0.96856318,  0.15459644,  0.96082399]])
M[1]
array([ 0.60410063,  0.4791374 ,  0.8237106 ])

The same thing can be achieved by using : instead of an index:

M[1,:] # row 1
array([ 0.60410063,  0.4791374 ,  0.8237106 ])
M[:,1] # column 1
array([ 0.40043577,  0.4791374 ,  0.15459644])

We can assign new values to elements in an array using indexing:

M[0,0] = 1
M
array([[ 1.        ,  0.40043577,  0.66254019],
       [ 0.60410063,  0.4791374 ,  0.8237106 ],
       [ 0.96856318,  0.15459644,  0.96082399]])
# also works for rows and columns
M[1,:] = 0
M[:,2] = -1
M
array([[ 1.        ,  0.40043577, -1.        ],
       [ 0.        ,  0.        , -1.        ],
       [ 0.96856318,  0.15459644, -1.        ]])

Index slicing

Index slicing is the technical name for the syntax M[lower:upper:step] to extract part of an array:

A = array([1,2,3,4,5])
A
array([1, 2, 3, 4, 5])
A[1:3]
array([2, 3])

Array slices are mutable: if they are assigned a new value the original array from which the slice was extracted is modified:

A[1:3] = [-2,-3]

A
array([ 1, -2, -3,  4,  5])
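
The same holds when the slice is first stored in a variable: the slice is a view of the original data, not a copy. A sketch with numpy imported as np; the names view and independent are illustrative:

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5])

view = A[1:3]                 # a view: shares memory with A
view[0] = -2                  # also changes A[1]

independent = A[1:3].copy()   # a copy: has its own memory
independent[1] = 99           # A is left untouched

print(A)
```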

We can omit any of the three parameters in M[lower:upper:step]:

A[::] # lower, upper, step all take the default values
array([ 1, -2, -3,  4,  5])
A[::2] # step is 2, lower and upper default to the beginning and end of the array
array([ 1, -3,  5])
A[:3] # first three elements
array([ 1, -2, -3])
A[3:] # elements from index 3
array([4, 5])

Negative indices count from the end of the array (positive indices from the beginning):

A = array([1,2,3,4,5])
A[-1] # the last element in the array
5
A[-3:] # the last three elements
array([3, 4, 5])

Index slicing works exactly the same way for multidimensional arrays:

A = array([[n+m*10 for n in range(5)] for m in range(5)])

A
array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])
# a block from the original array
A[1:4, 1:4]
array([[11, 12, 13],
       [21, 22, 23],
       [31, 32, 33]])
# strides
A[::2, ::2]
array([[ 0,  2,  4],
       [20, 22, 24],
       [40, 42, 44]])

Fancy indexing

Fancy indexing is the name for when an array or list is used in place of an index:

row_indices = [1, 2, 3]
A[row_indices]
array([[10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34]])
col_indices = [1, 2, -1] # remember, index -1 means the last element
A[row_indices, col_indices]
array([11, 22, 34])
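
Note that two index lists are paired element by element, which is why only three values come back. If the full block of rows and columns is wanted instead, one option is the ix_ helper. A sketch with numpy imported as np:

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])
rows = [1, 2, 3]
cols = [1, 2, -1]

paired = A[rows, cols]            # elements (1,1), (2,2), (3,-1)
block = A[np.ix_(rows, cols)]     # every combination of rows and cols

print(paired)
print(block)
```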

We can also use index masks: If the index mask is a Numpy array of data type bool, then an element is selected (True) or not (False) depending on the value of the index mask at the position of each element:

B = array([n for n in range(5)])
B
array([0, 1, 2, 3, 4])
row_mask = array([True, False, True, False, False])
B[row_mask]
array([0, 2])
# same thing
row_mask = array([1,0,1,0,0], dtype=bool)
B[row_mask]
array([0, 2])

This feature is very useful to conditionally select elements from an array, using for example comparison operators:

x = arange(0, 10, 0.5)
x
array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5,  5. ,
        5.5,  6. ,  6.5,  7. ,  7.5,  8. ,  8.5,  9. ,  9.5])
mask = (5 < x) * (x < 7.5)

mask
array([False, False, False, False, False, False, False, False, False,
       False, False,  True,  True,  True,  True, False, False, False,
       False, False], dtype=bool)
x[mask]
array([ 5.5,  6. ,  6.5,  7. ])
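
Multiplying two boolean arrays acts as a logical "and". The same mask can be written with the element-wise operators & (and) and | (or), which some readers find clearer. A sketch with numpy imported as np; the parentheses around each comparison are required:

```python
import numpy as np

x = np.arange(0, 10, 0.5)

mask_mul = (5 < x) * (x < 7.5)    # the form used in the text
mask_and = (5 < x) & (x < 7.5)    # equivalent, with an explicit "and"
mask_or = (x < 1) | (x > 9)       # element-wise "or"

print(x[mask_and], x[mask_or])
```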

Functions for extracting data from arrays and creating arrays

where

The index mask can be converted to position index using the where function

indices = where(mask)

indices
(array([11, 12, 13, 14]),)
x[indices] # this indexing is equivalent to the fancy indexing x[mask]
array([ 5.5,  6. ,  6.5,  7. ])
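
The where function also has a three-argument form that picks values from one of two sources depending on a condition. A small sketch with numpy imported as np and made-up data:

```python
import numpy as np

x = np.array([-2, -1, 0, 1, 2])

positions = np.where(x > 0)[0]    # indices where the condition holds
clipped = np.where(x > 0, x, 0)   # keep x where positive, else use 0

print(positions, clipped)
```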

diag

With the diag function we can also extract the diagonal and subdiagonals of an array:

diag(A)
array([ 0, 11, 22, 33, 44])
diag(A, -1)
array([10, 21, 32, 43])

take

The take function is similar to fancy indexing described above:

v2 = arange(-3,3)
v2
array([-3, -2, -1,  0,  1,  2])
row_indices = [1, 3, 5]
v2[row_indices] # fancy indexing
array([-2,  0,  2])
v2.take(row_indices)
array([-2,  0,  2])

But take also works on lists and other objects:

take([-3, -2, -1,  0,  1,  2], row_indices)
array([-2,  0,  2])

choose

Constructs an array by picking elements from several arrays:

which = [1, 0, 1, 0]
choices = [[-2,-2,-2,-2], [5,5,5,5]]

choose(which, choices)
array([ 5, -2,  5, -2])
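
In other words, position i of the result is taken from the array numbered which[i]. A sketch that spells this out with a plain Python loop for comparison (numpy imported as np; the name manual is illustrative):

```python
import numpy as np

which = [1, 0, 1, 0]
choices = [[-2, -2, -2, -2], [5, 5, 5, 5]]

result = np.choose(which, choices)

# Equivalent explicit rule: result[i] = choices[which[i]][i]
manual = [choices[w][i] for i, w in enumerate(which)]

print(result, manual)
```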

Linear algebra

Vectorizing code is the key to writing efficient numerical calculations with Python/Numpy. That means that as much as possible of a program should be formulated in terms of matrix and vector operations, like matrix-matrix multiplication.

Scalar-array operations

We can use the usual arithmetic operators to multiply, add, subtract, and divide arrays with scalar numbers.

v1 = arange(0, 5)
v1 * 2
array([0, 2, 4, 6, 8])
v1 + 2
array([2, 3, 4, 5, 6])
A * 2, A + 2
(array([[ 0,  2,  4,  6,  8],
        [20, 22, 24, 26, 28],
        [40, 42, 44, 46, 48],
        [60, 62, 64, 66, 68],
        [80, 82, 84, 86, 88]]), array([[ 2,  3,  4,  5,  6],
        [12, 13, 14, 15, 16],
        [22, 23, 24, 25, 26],
        [32, 33, 34, 35, 36],
        [42, 43, 44, 45, 46]]))

Element-wise array-array operations

When we add, subtract, multiply and divide arrays with each other, the default behaviour is element-wise operations:

A * A # element-wise multiplication
array([[   0,    1,    4,    9,   16],
       [ 100,  121,  144,  169,  196],
       [ 400,  441,  484,  529,  576],
       [ 900,  961, 1024, 1089, 1156],
       [1600, 1681, 1764, 1849, 1936]])
v1 * v1
array([ 0,  1,  4,  9, 16])

If we multiply arrays with compatible shapes, we get an element-wise multiplication of each row:

A.shape, v1.shape
((5, 5), (5,))
A * v1
array([[  0,   1,   4,   9,  16],
       [  0,  11,  24,  39,  56],
       [  0,  21,  44,  69,  96],
       [  0,  31,  64,  99, 136],
       [  0,  41,  84, 129, 176]])
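
Here the vector is matched against the last axis, so column j is multiplied by v1[j]. To scale row i by v1[i] instead, the vector can be turned into a column first. A sketch with numpy imported as np:

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])
v1 = np.arange(0, 5)

by_columns = A * v1                  # column j multiplied by v1[j]
by_rows = A * v1[:, np.newaxis]      # row i multiplied by v1[i]

print(by_columns[1], by_rows[2])
```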

Matrix algebra

What about matrix multiplication? There are two ways. We can either use the dot function, which applies a matrix-matrix, matrix-vector, or inner vector multiplication to its two arguments:

dot(A, A)
array([[ 300,  310,  320,  330,  340],
       [1300, 1360, 1420, 1480, 1540],
       [2300, 2410, 2520, 2630, 2740],
       [3300, 3460, 3620, 3780, 3940],
       [4300, 4510, 4720, 4930, 5140]])
dot(A, v1)
array([ 30, 130, 230, 330, 430])
dot(v1, v1)
30

Alternatively, we can cast the array objects to the type matrix. This changes the behavior of the standard arithmetic operators +, -, * to use matrix algebra.

M = matrix(A)
v = matrix(v1).T # make it a column vector
v
matrix([[0],
        [1],
        [2],
        [3],
        [4]])
M * M
matrix([[ 300,  310,  320,  330,  340],
        [1300, 1360, 1420, 1480, 1540],
        [2300, 2410, 2520, 2630, 2740],
        [3300, 3460, 3620, 3780, 3940],
        [4300, 4510, 4720, 4930, 5140]])
M * v
matrix([[ 30],
        [130],
        [230],
        [330],
        [430]])
# inner product
v.T * v
matrix([[30]])
# with matrix objects, standard matrix algebra applies
v + M*v
matrix([[ 30],
        [131],
        [232],
        [333],
        [434]])
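
On Python 3.5 and later, the same matrix algebra can be written with ordinary arrays and the @ operator, without casting to matrix; the numpy documentation currently recommends regular arrays over the matrix class. A sketch with numpy imported as np:

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])
v1 = np.arange(0, 5)

Av = A @ v1             # matrix-vector product, same as dot(A, v1)
inner = v1 @ v1         # inner product, same as dot(v1, v1)
combined = v1 + A @ v1  # corresponds to v + M*v in the matrix example

print(Av, inner, combined)
```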

If we try to add, subtract or multiply objects with incompatible shapes we get an error:

v = matrix([1,2,3,4,5,6]).T
shape(M), shape(v)
((5, 5), (6, 1))
M * v
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-100-995fb48ad0cc> in <module>()
----> 1 M * v


/Users/rob/miniconda/envs/py27-spl/lib/python2.7/site-packages/numpy/matrixlib/defmatrix.pyc in __mul__(self, other)
    339         if isinstance(other, (N.ndarray, list, tuple)) :
    340             # This promotes 1-D vectors to row vectors
--> 341             return N.dot(self, asmatrix(other))
    342         if isscalar(other) or not hasattr(other, '__rmul__') :
    343             return N.dot(self, other)


ValueError: shapes (5,5) and (6,1) not aligned: 5 (dim 1) != 6 (dim 0)

See also the related functions: inner, outer, cross, kron, tensordot. Try for example help(kron).

Array/Matrix transformations

Above we have used the .T to transpose the matrix object v. We could also have used the transpose function to accomplish the same thing.

Other mathematical functions that transform matrix objects are:

C = matrix([[1j, 2j], [3j, 4j]])
C
matrix([[ 0.+1.j,  0.+2.j],
        [ 0.+3.j,  0.+4.j]])
conjugate(C)
matrix([[ 0.-1.j,  0.-2.j],
        [ 0.-3.j,  0.-4.j]])

Hermitian conjugate: transpose + conjugate

C.H
matrix([[ 0.-1.j,  0.-3.j],
        [ 0.-2.j,  0.-4.j]])

We can extract the real and imaginary parts of complex-valued arrays using real and imag:

real(C) # same as: C.real
matrix([[ 0.,  0.],
        [ 0.,  0.]])
imag(C) # same as: C.imag
matrix([[ 1.,  2.],
        [ 3.,  4.]])

Or the complex argument and absolute value

angle(C+1) # heads up MATLAB Users, angle is used instead of arg
array([[ 0.78539816,  1.10714872],
       [ 1.24904577,  1.32581766]])
abs(C)
matrix([[ 1.,  2.],
        [ 3.,  4.]])

Matrix computations

Inverse

linalg.inv(C) # equivalent to C.I 
matrix([[ 0.+2.j ,  0.-1.j ],
        [ 0.-1.5j,  0.+0.5j]])
C.I * C
matrix([[  1.00000000e+00+0.j,   4.44089210e-16+0.j],
        [  0.00000000e+00+0.j,   1.00000000e+00+0.j]])

Determinant

linalg.det(C)
(2.0000000000000004+0j)
linalg.det(C.I)
(0.50000000000000011+0j)
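
Because of floating-point rounding, results such as C.I * C are only approximately the identity, so comparisons are best done with a tolerance. A sketch using plain arrays (numpy imported as np) and allclose:

```python
import numpy as np

C = np.array([[1j, 2j], [3j, 4j]])

C_inv = np.linalg.inv(C)
product = C_inv @ C               # identity matrix, up to rounding
det_C = np.linalg.det(C)
det_inv = np.linalg.det(C_inv)    # should be close to 1 / det_C

print(np.allclose(product, np.eye(2)), det_C, det_inv)
```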

Data processing

Often it is useful to store datasets in Numpy arrays. Numpy provides a number of functions to calculate statistics of datasets in arrays.

For example, let's calculate some properties from the Stockholm temperature dataset used above.

# reminder, the temperature dataset is stored in the data variable:
shape(data)
(77431, 7)

mean

# the temperature data is in column 3
mean(data[:,3])
6.1971096847515854

The daily mean temperature in Stockholm over the last 200 years has been about 6.2 C.

standard deviations and variance

std(data[:,3]), var(data[:,3])
(8.2822716213405734, 68.596023209663414)
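
The variance is the square of the standard deviation, and both functions divide by N by default. A sketch on a small made-up sample (not the Stockholm data), with numpy imported as np; the ddof argument switches to the N-1 sample estimate:

```python
import numpy as np

x = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)   # made-up sample

population_var = np.var(x)        # divides by N
population_std = np.std(x)        # square root of the above
sample_var = np.var(x, ddof=1)    # divides by N - 1

print(population_var, population_std, sample_var)
```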

min and max

# lowest daily average temperature
data[:,3].min()
-25.800000000000001
# highest daily average temperature
data[:,3].max()
28.300000000000001

sum, prod, and trace

d = arange(0, 10)
d
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# sum up all elements
sum(d)
45
# product of all elements
prod(d+1)
3628800
# cumulative sum
cumsum(d)
array([ 0,  1,  3,  6, 10, 15, 21, 28, 36, 45])
# cumulative product
cumprod(d+1)
array([      1,       2,       6,      24,     120,     720,    5040,
         40320,  362880, 3628800])
# same as: diag(A).sum()
trace(A)
110

Computations on subsets of arrays

We can compute with subsets of the data in an array using indexing, fancy indexing, and the other methods of extracting data from an array (described above).

For example, let's go back to the temperature dataset:

!head -n 3 stockholm_td_adj.dat
1800  1  1    -6.1    -6.1    -6.1 1
1800  1  2   -15.4   -15.4   -15.4 1
1800  1  3   -15.0   -15.0   -15.0 1

The data format is: year, month, day, daily average temperature, low, high, location.

If we are interested in the average temperature only in a particular month, say February, then we can create an index mask and use it to select only the data for that month using:

unique(data[:,1]) # the month column takes values from 1 to 12
array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
        12.])
mask_feb = data[:,1] == 2
# the temperature data is in column 3
mean(data[mask_feb,3])
-3.2121095707365961

With these tools we have very powerful data processing capabilities at our disposal. For example, extracting the average temperature for each month of the year takes only a few lines of code:

months = arange(1,13)
monthly_mean = [mean(data[data[:,1] == month, 3]) for month in months]

fig, ax = plt.subplots()
ax.bar(months, monthly_mean)
ax.set_xlabel("Month")
ax.set_ylabel("Monthly avg. temp.");
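
The same mask-and-average pattern can be tried without the data file. The sketch below uses a tiny made-up table with the columns year, month, day, temperature; the values are invented for illustration and numpy is imported as np:

```python
import numpy as np

# Made-up rows: year, month, day, temperature
data = np.array([
    [1800, 1, 1, -6.0],
    [1800, 1, 2, -4.0],
    [1800, 2, 1, -2.0],
    [1800, 2, 2,  0.0],
    [1801, 1, 1, -8.0],
])

months = np.array([1, 2])
# For each month: build a boolean mask on column 1, average column 3
monthly_mean = [data[data[:, 1] == month, 3].mean() for month in months]

print(monthly_mean)
```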

Calculations with higher-dimensional data

When functions such as min, max, etc. are applied to multidimensional arrays, it is sometimes useful to apply the calculation to the entire array, and sometimes only on a row or column basis. Using the axis argument we can specify how these functions should behave:

m = random.rand(3,3)
m
array([[ 0.2850926 ,  0.17302017,  0.17748378],
       [ 0.80070487,  0.45527067,  0.61277451],
       [ 0.11372793,  0.43608703,  0.87010206]])
# global max
m.max()
0.87010206156754955
# max in each column
m.max(axis=0)
array([ 0.80070487,  0.45527067,  0.87010206])
# max in each row
m.max(axis=1)
array([ 0.2850926 ,  0.80070487,  0.87010206])

Many other functions and methods in the array and matrix classes accept the same (optional) axis keyword argument.
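
For example, sum and mean behave the same way. A sketch on a small array with easy-to-check values (numpy imported as np); axis=0 collapses the rows, giving one result per column, and axis=1 collapses the columns:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

total = m.sum()                  # over all elements
column_sums = m.sum(axis=0)      # one value per column
row_sums = m.sum(axis=1)         # one value per row
column_means = m.mean(axis=0)

print(total, column_sums, row_sums, column_means)
```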

Reshaping, resizing and stacking arrays

The shape of a Numpy array can be modified without copying the underlying data, which makes it a fast operation even for large arrays.

A
array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])
n, m = A.shape
B = A.reshape((1,n*m))
B
array([[ 0,  1,  2,  3,  4, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31,
        32, 33, 34, 40, 41, 42, 43, 44]])
B[0,0:5] = 5 # modify the array

B
array([[ 5,  5,  5,  5,  5, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31,
        32, 33, 34, 40, 41, 42, 43, 44]])
A # and the original variable is also changed. B is only a different view of the same data
array([[ 5,  5,  5,  5,  5],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])

We can also use the function flatten to make a higher-dimensional array into a vector. But this function creates a copy of the data.

B = A.flatten()

B
array([ 5,  5,  5,  5,  5, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31,
       32, 33, 34, 40, 41, 42, 43, 44])
B[0:5] = 10

B
array([10, 10, 10, 10, 10, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31,
       32, 33, 34, 40, 41, 42, 43, 44])
A # now A has not changed, because B's data is a copy of A's, not referring to the same data
array([[ 5,  5,  5,  5,  5],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])
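
Whether two arrays refer to the same data can also be checked directly, without modifying them. A sketch using shares_memory, with numpy imported as np and illustrative names:

```python
import numpy as np

A = np.arange(6).reshape(2, 3)

reshaped = A.reshape(3, 2)    # a view of the same data
flat_copy = A.flatten()       # always a copy
flat_view = A.ravel()         # a view when the memory layout allows it

print(np.shares_memory(A, reshaped),
      np.shares_memory(A, flat_copy),
      np.shares_memory(A, flat_view))
```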

Adding a new dimension: newaxis

With newaxis, we can insert new dimensions in an array, for example converting a vector to a column or row matrix:

v = array([1,2,3])
shape(v)
(3,)
# make a column matrix of the vector v
v[:, newaxis]
array([[1],
       [2],
       [3]])
# column matrix
v[:,newaxis].shape
(3, 1)
# row matrix
v[newaxis,:].shape
(1, 3)
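
A typical use is to combine a column and a row so that broadcasting produces a full table. The sketch below builds a multiplication table this way and compares it with the outer function (numpy imported as np):

```python
import numpy as np

v = np.array([1, 2, 3])

# (3, 1) times (1, 3) broadcasts to a (3, 3) table of all products
table = v[:, np.newaxis] * v[np.newaxis, :]

print(table)
print(np.array_equal(table, np.outer(v, v)))
```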

Stacking and repeating arrays

Using the functions repeat, tile, vstack, hstack, and concatenate we can create larger vectors and matrices from smaller ones:

tile and repeat

a = array([[1, 2], [3, 4]])
# repeat each element 3 times
repeat(a, 3)
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])
# tile the matrix 3 times 
tile(a, 3)
array([[1, 2, 1, 2, 1, 2],
       [3, 4, 3, 4, 3, 4]])
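
Without an axis argument, repeat flattens its input, as seen above. A sketch of how the axis argument of repeat and a tuple argument to tile keep the two-dimensional structure (numpy imported as np):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

rows_repeated = np.repeat(a, 2, axis=0)   # each row twice
cols_repeated = np.repeat(a, 2, axis=1)   # each column twice
tiled_down = np.tile(a, (2, 1))           # whole matrix stacked twice vertically

print(rows_repeated)
print(cols_repeated)
print(tiled_down)
```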

concatenate

b = array([[5, 6]])
concatenate((a, b), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])
concatenate((a, b.T), axis=1)
array([[1, 2, 5],
       [3, 4, 6]])

hstack and vstack

vstack((a,b))
array([[1, 2],
       [3, 4],
       [5, 6]])
hstack((a,b.T))
array([[1, 2, 5],
       [3, 4, 6]])

Copy and "deep copy"

To achieve high performance, assignments in Python usually do not copy the underlying objects. This is important, for example, when objects are passed between functions, to avoid an excessive amount of memory copying when it is not necessary (technical term: pass by reference).

A = array([[1, 2], [3, 4]])

A
array([[1, 2],
       [3, 4]])
# now B is referring to the same array data as A 
B = A 
# changing B affects A
B[0,0] = 10

B
array([[10,  2],
       [ 3,  4]])
A
array([[10,  2],
       [ 3,  4]])

If we want to avoid this behavior and obtain a new, completely independent object B copied from A, we need to make a so-called "deep copy" using the function copy:

B = copy(A)
# now, if we modify B, A is not affected
B[0,0] = -5

B
array([[-5,  2],
       [ 3,  4]])
A
array([[10,  2],
       [ 3,  4]])
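
The same applies when an array is passed to a function: the function receives the original array, not a copy. A sketch with two hypothetical helper functions (their names are made up for this example), with numpy imported as np:

```python
import numpy as np

def scale_in_place(arr, factor):
    """Hypothetical helper: modifies the caller's array."""
    arr *= factor

def scaled_copy(arr, factor):
    """Hypothetical helper: returns a new array, input untouched."""
    return arr * factor

A = np.array([1, 2, 3])

B = scaled_copy(A, 10)     # A is still [1, 2, 3]
before = A.copy()

scale_in_place(A, 10)      # A itself is now [10, 20, 30]

print(before, B, A)
```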

Iterating over array elements

Generally, we want to avoid iterating over the elements of arrays whenever we can (at all costs). The reason is that in an interpreted language like Python (or MATLAB), iterations are really slow compared to vectorized operations.

However, sometimes iterations are unavoidable. For such cases, the Python for loop is the most convenient way to iterate over an array:

v = array([1,2,3,4])

for element in v:
    print(element)
1
2
3
4
M = array([[1,2], [3,4]])

for row in M:
    print("row", row)
    
    for element in row:
        print(element)
('row', array([1, 2]))
1
2
('row', array([3, 4]))
3
4

When we need to iterate over each element of an array and modify its elements, it is convenient to use the enumerate function to obtain both the element and its index in the for loop:

for row_idx, row in enumerate(M):
    print("row_idx", row_idx, "row", row)
    
    for col_idx, element in enumerate(row):
        print("col_idx", col_idx, "element", element)
       
        # update the matrix M: square each element
        M[row_idx, col_idx] = element ** 2
('row_idx', 0, 'row', array([1, 2]))
('col_idx', 0, 'element', 1)
('col_idx', 1, 'element', 2)
('row_idx', 1, 'row', array([3, 4]))
('col_idx', 0, 'element', 3)
('col_idx', 1, 'element', 4)
# each element in M is now squared
M
array([[ 1,  4],
       [ 9, 16]])
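
For this particular update a loop is not actually needed. As a sketch (numpy imported as np), the nested enumerate loops can be shortened with ndenumerate, and the whole operation reduces to a single vectorized expression:

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

# Loop version: ndenumerate yields ((row, col), value) pairs
looped = M.copy()
for (i, j), value in np.ndenumerate(M):
    looped[i, j] = value ** 2

# Vectorized version: no explicit loop
vectorized = M ** 2

print(looped)
print(vectorized)
```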

Vectorizing functions

As mentioned several times by now, to get good performance we should try to avoid looping over elements in our vectors and matrices, and instead use vectorized algorithms. The first step in converting a scalar algorithm to a vectorized algorithm is to make sure that the functions we write work with vector inputs.

def Theta(x):
    """
    Scalar implementation of the Heaviside step function.
    """
    if x >= 0:
        return 1
    else:
        return 0
Theta(array([-3,-2,-1,0,1,2,3]))
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-165-6658efdd2f22> in <module>()
----> 1 Theta(array([-3,-2,-1,0,1,2,3]))


<ipython-input-164-9a0cb13d93d4> in Theta(x)
      3     Scalar implementation of the Heaviside step function.
      4     """
----> 5     if x >= 0:
      6         return 1
      7     else:


ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

OK, that didn't work because we didn't write the Theta function so that it can handle a vector input...

To get a vectorized version of Theta we can use the Numpy function vectorize. In many cases it can automatically vectorize a function:

Theta_vec = vectorize(Theta)
Theta_vec(array([-3,-2,-1,0,1,2,3]))
array([0, 0, 0, 1, 1, 1, 1])

We can also implement the function to accept a vector input from the beginning (requires more effort but might give better performance):

def Theta(x):
    """
    Vector-aware implementation of the Heaviside step function.
    """
    return 1 * (x >= 0)
Theta(array([-3,-2,-1,0,1,2,3]))
array([0, 0, 0, 1, 1, 1, 1])
# still works for scalars as well
Theta(-1.2), Theta(2.6)
(0, 1)
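
Another way to write a vector-aware step function is with the three-argument form of where. This is only a sketch of an alternative, and the function name theta_where is made up for this example (numpy imported as np):

```python
import numpy as np

def theta_where(x):
    """Hypothetical variant of Theta built on where."""
    # asarray lets the function accept scalars and lists as well as arrays
    return np.where(np.asarray(x) >= 0, 1, 0)

values = theta_where(np.array([-3, -2, -1, 0, 1, 2, 3]))
scalar_result = int(theta_where(2.6))   # a scalar input gives a 0-d array

print(values, scalar_result)
```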

Using arrays in conditions

When using arrays in conditions, for example in if statements and other boolean expressions, one needs to use any or all, which require that any or all elements in the array evaluate to True:

M
array([[ 1,  4],
       [ 9, 16]])
if (M > 5).any():
    print("at least one element in M is larger than 5")
else:
    print("no element in M is larger than 5")
at least one element in M is larger than 5
if (M > 5).all():
    print("all elements in M are larger than 5")
else:
    print("all elements in M are not larger than 5")
all elements in M are not larger than 5

Type casting

Since Numpy arrays are statically typed, the type of an array does not change once created. But we can explicitly cast an array of some type to another using the astype function (see also the similar asarray function). This always creates a new array of the new type:

M.dtype
dtype('int64')
M2 = M.astype(float)

M2
array([[  1.,   4.],
       [  9.,  16.]])
M2.dtype
dtype('float64')
M3 = M.astype(bool)

M3
array([[ True,  True],
       [ True,  True]], dtype=bool)
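
Casting can lose information. As a sketch with made-up values (numpy imported as np): converting floats to int drops the fractional part rather than rounding, and converting to bool maps zero to False and everything else to True:

```python
import numpy as np

x = np.array([1.7, -1.7, 2.5])

truncated = x.astype(int)              # fractional part is dropped
rounded = np.rint(x).astype(int)       # round first (halves go to the even integer)
flags = np.array([0, 1, 2]).astype(bool)

print(truncated, rounded, flags)
```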

Further reading

Versions

%reload_ext version_information

%version_information numpy