Official help documentation
```
>>> help(numpy.random.choice)
Help on built-in function choice:

choice(...) method of mtrand.RandomState instance
    choice(a, size=None, replace=True, p=None)

    Generates a random sample from a given 1-D array

    .. versionadded:: 1.7.0

    Parameters
    -----------
    a : 1-D array-like or int
        If an ndarray, a random sample is generated from its elements.
        If an int, the random sample is generated as if a were np.arange(a)
    size : int or tuple of ints, optional
        Output shape. If the given shape is, e.g., ``(m, n, k)``, then
        ``m * n * k`` samples are drawn. Default is None, in which case a
        single value is returned.
    replace : boolean, optional
        Whether the sample is with or without replacement
    p : 1-D array-like, optional
        The probabilities associated with each entry in a.
        If not given the sample assumes a uniform distribution over all
        entries in a.

    Returns
    --------
    samples : single item or ndarray
        The generated random samples

    Raises
    -------
    ValueError
        If a is an int and less than zero, if a or p are not 1-dimensional,
        if a is an array-like of size 0, if p is not a vector of
        probabilities, if a and p have different lengths, or if
        replace=False and the sample size is greater than the population
        size

    See Also
    ---------
    randint, shuffle, permutation

    Examples
    ---------
    Generate a uniform random sample from np.arange(5) of size 3:

    >>> np.random.choice(5, 3)
    array([0, 3, 4])
    >>> #This is equivalent to np.random.randint(0,5,3)

    Generate a non-uniform random sample from np.arange(5) of size 3:

    >>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
    array([3, 3, 0])

    Generate a uniform random sample from np.arange(5) of size 3 without
    replacement:

    >>> np.random.choice(5, 3, replace=False)
    array([3,1,0])
    >>> #This is equivalent to np.random.permutation(np.arange(5))[:3]

    Generate a non-uniform random sample from np.arange(5) of size
    3 without replacement:

    >>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
    array([2, 3, 0])

    Any of the above can be repeated with an arbitrary array-like
    instead of just integers. For instance:

    >>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
    >>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
    array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'],
          dtype='|S11')
```
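The last docstring example was evidently recorded under Python 2: `dtype='|S11'` denotes a byte-string array. As a small check, assuming Python 3 and a reasonably recent NumPy, the same call returns a Unicode string array instead, sized for the longest name ('Christopher', 11 characters). The sampled names are random, so only the dtype and membership are worth checking.

```python
import numpy as np

aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
result = np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])

# Under Python 3 the strings are stored as Unicode (dtype kind 'U', not 'S').
# Each Unicode character takes 4 bytes, so itemsize // 4 is the max length.
print(result.dtype.kind, result.dtype.itemsize // 4)  # U 11

# Every sampled value is one of the original names
print(all(name in aa_milne_arr for name in result))  # True
```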
My own understanding
Function signature
choice(a, size=None, replace=True, p=None)
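Parameter a: the population to sample from. It accepts either a one-dimensional array-like value (a 1-D list, tuple, or NumPy ndarray) or an integer. If a is a 1-D array, samples are drawn at random from its elements; if a is an integer n, samples are drawn from numpy.arange(n), i.e. array([0, 1, ..., n-1]).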
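A quick way to check the "as if a were np.arange(a)" behaviour described above: with the legacy `np.random.seed` / `np.random.choice` calls used throughout this post, resetting the seed and passing either `5` or `np.arange(5)` should give identical draws, because both have the same population size and go through the same sampling path. This is a sketch relying on the legacy RandomState implementation; the list of names is just illustrative data.

```python
import numpy as np

# Integer a: samples are drawn from np.arange(5), i.e. 0..4
np.random.seed(42)
draw_int = np.random.choice(5, 3)

# Array a: reset the seed and pass np.arange(5) explicitly
np.random.seed(42)
draw_arange = np.random.choice(np.arange(5), 3)

# Same seed and same population size, so the draws match
print(np.array_equal(draw_int, draw_arange))  # True

# A 1-D list is sampled element by element
names = ["pooh", "rabbit", "piglet"]
draw_names = np.random.choice(names, 4)
print(all(name in names for name in draw_names))  # True
```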
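Parameter size: the number of samples to draw. It can be an integer, a tuple of integers, or left unset. If size is an integer n, n samples are drawn from a; if size is a tuple (m, n, k), m*n*k samples are drawn and returned as an ndarray of shape (m, n, k); if size is not given, a single sample is drawn and returned as a single value rather than an array.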
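```python
import numpy as np

# Outputs shown are from one run; your random values will differ.
# Without size, only a single sample is drawn:
np.random.choice(8)
# 0
# With a tuple as size:
np.random.choice(8, (2, 2))
# array([[6, 1],
#        [2, 4]])
# With an integer as size:
np.random.choice(8, 4)
# array([7, 1, 1, 7])
```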
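The rule "a tuple (m, n, k) draws m*n*k samples in that shape" can be checked directly by looking at `.shape` and `.size` instead of eyeballing the printed arrays. A minimal sketch; the seed and the shape (2, 3, 4) are arbitrary choices.

```python
import numpy as np

np.random.seed(1)

# size=None (default): a single value, not an array
single = np.random.choice(8)
print(np.ndim(single))  # 0

# size=4: a 1-D array holding 4 samples
flat = np.random.choice(8, 4)
print(flat.shape)  # (4,)

# size=(2, 3, 4): 2 * 3 * 4 = 24 samples arranged in that shape
block = np.random.choice(8, (2, 3, 4))
print(block.shape, block.size)  # (2, 3, 4) 24
```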
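Parameter p: the sampling probabilities. The default is None, meaning every element of a is equally likely to be drawn (uniform sampling). Alternatively, pass a one-dimensional array-like of the same length as a, whose entries give the probability of drawing each element; the entries of p must sum to 1.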
```python
import numpy as np

# Without p, sampling is uniform:
np.random.choice(8, 5)
# array([5, 6, 1, 1, 0])
# With p, samples follow the given probabilities. The fifth entry (value 4)
# has the largest probability, so it is drawn most often:
np.random.choice(8, 4, p=[0, 0.1, 0.2, 0.3, 0.4, 0, 0, 0])
# array([4, 4, 4, 4], dtype=int64)
# If the entries of p do not sum to 1, a ValueError is raised:
np.random.choice(8, 4, p=[0, 0.1, 0.2, 0.3, 0.3, 0, 0, 0])
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "mtrand.pyx", line 1148, in mtrand.RandomState.choice
# ValueError: probabilities do not sum to 1
```
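Four samples are too few to see the probabilities clearly. A minimal sketch: draw many samples with the same p and compare the observed frequencies (counted with `np.bincount`) against p. The seed and the sample count of 100,000 are arbitrary choices; the exact frequencies will shift slightly with them, but values whose probability is 0 never appear.

```python
import numpy as np

np.random.seed(0)
p = [0, 0.1, 0.2, 0.3, 0.4, 0, 0, 0]

# Draw many samples so the observed frequencies settle near p
samples = np.random.choice(8, 100_000, p=p)

# np.bincount counts how often each value 0..7 occurs
freq = np.bincount(samples, minlength=8) / samples.size
print(np.round(freq, 3))  # close to p, roughly 0.4 at index 4

# Values with probability 0 are never drawn
print(freq[[0, 5, 6, 7]].sum())  # 0.0
```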
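Parameter replace: whether sampling is done with replacement. The default, True, allows the same element to be drawn more than once (like putting each ball back into the box after drawing it); False draws each element at most once (like keeping each ball out of the box once it has been drawn). With replace=False, the number of samples cannot exceed the length of a, i.e. size must not be larger than the length of a. If p is also given, the number of non-zero entries in p must be at least size, because an element with probability 0 is never drawn.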
```python
import numpy as np

# With replace=False, no value is drawn twice:
np.random.choice(8, 5, replace=False)
# array([7, 3, 2, 6, 5])
# With replace=True, or when replace is not set (True is the default), values may repeat:
np.random.choice(8, 5)
# array([0, 6, 7, 3, 6])
# With replace=False, size must be less than or equal to the size of a, otherwise an error is raised:
np.random.choice(8, 9, replace=False)
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "mtrand.pyx", line 1168, in mtrand.RandomState.choice
# ValueError: Cannot take a larger sample than population when 'replace=False'
# With the default replace=True, size can be as large as you like:
np.random.choice(8, 10)
# array([5, 1, 1, 3, 4, 0, 1, 3, 4, 2])
# Likewise, with replace=False the number of non-zero entries in p must be at
# least size; here p has only 2 non-zero entries but size is 3, so it fails:
np.random.choice(4, 3, replace=False, p=[0, 0.8, 0.2, 0])
# ValueError: Fewer non-zero entries in p than size
```
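The box-of-balls picture above can be sketched with two small helpers. Both `draw_without_replacement` and `draw_with_replacement` are hypothetical names written for illustration, not NumPy functions: "without replacement" shuffles all n balls and takes the first k (mirroring the docstring's note that this is equivalent to `np.random.permutation(np.arange(5))[:3]`), while "with replacement" makes k independent draws. Only `RandomState.permutation` and `RandomState.randint` are real NumPy calls here.

```python
import numpy as np

def draw_without_replacement(n, k, rng):
    """Hypothetical helper: shuffle all n balls (0..n-1) and take the
    first k, so no ball can come out twice."""
    if k > n:
        # Same constraint np.random.choice enforces for replace=False
        raise ValueError("cannot draw more balls than the box holds")
    return rng.permutation(n)[:k]

def draw_with_replacement(n, k, rng):
    """Hypothetical helper: each draw is independent, as if every ball
    is put back into the box before the next draw."""
    return rng.randint(0, n, size=k)

rng = np.random.RandomState(0)
no_repeat = draw_without_replacement(8, 5, rng)
with_repeat = draw_with_replacement(8, 10, rng)

print(len(set(no_repeat.tolist())))  # 5: all values distinct
# 10 draws from only 8 values must contain at least one repeat
print(len(set(with_repeat.tolist())) < 10)  # True

try:
    draw_without_replacement(8, 9, rng)
except ValueError as err:
    print("ValueError:", err)
```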
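Summary
Try the parameters out yourself; hands-on experimentation is the quickest way to master them and understand them thoroughly.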