3.3 Probability Distributions
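3.3.1 Discrete Random Variables
If a random variable takes a finite or countably infinite number of values, it is called a discrete random variable, or simply a discrete variable.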
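For example, if you toss a coin twice, there are only 4 possible outcomes:
HH, HT, TH and TT (H: heads; T: tails)
If the random variable X denotes the number of heads in this experiment, X can take only the values 0, 1 and 2, so X is a discrete random variable. Specifically: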
P(X=0)=0.25
P(X=1)=0.5
P(X=2)=0.25
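These probabilities can be recovered by brute force: enumerate all equally likely outcomes and count the heads in each. A minimal sketch in Python; `heads_distribution` is an illustrative helper name, and exact fractions are used so the results come out as 1/4, 1/2 and 1/4 with no rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_tosses: int) -> dict[int, Fraction]:
    """Distribution law of X = number of heads in n_tosses fair coin tosses."""
    # Enumerate all 2**n_tosses equally likely outcomes, e.g. ('H', 'T').
    outcomes = list(product("HT", repeat=n_tosses))
    # Count how many outcomes give each number of heads.
    counts = Counter(outcome.count("H") for outcome in outcomes)
    total = len(outcomes)
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}


dist = heads_distribution(2)
print({k: float(v) for k, v in dist.items()})  # {0: 0.25, 1: 0.5, 2: 0.25}
```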
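P(X), the probability distribution of X (its distribution law; for a discrete variable also called the probability mass function), satisfies the following properties:
(1) $P(X=x_i) \ge 0$ for every possible value $x_i$
(2) $\sum_i P(X=x_i) = 1$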
3.3.2 Continuous Random Variables
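For a random variable X, if there exists a non-negative real function f(x) such that the probability that X falls in any region D is
$P(X \in D) = \int_D f(x)\,dx$
then X is called a continuous random variable, or simply a continuous variable, and f(x) is called the probability density function of X, or simply its density.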
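By definition, the density function has the following properties:
(1) $f(x) \ge 0$
(2) $\int_{-\infty}^{+\infty} f(x)\,dx = 1$
(3) $P(a < X \le b) = \int_a^b f(x)\,dx$ for any $a < b$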
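As a numerical sanity check of the normalization $\int_{-\infty}^{+\infty} f(x)\,dx = 1$ and of $P(a < X \le b) = \int_a^b f(x)\,dx$, the sketch below integrates an example density with the composite trapezoidal rule. The exponential density $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ (with λ = 1), the truncation point 50 and the step count are assumptions chosen only for illustration; any non-negative density would do.

```python
import math


def density(x: float, lam: float = 1.0) -> float:
    """Example density (assumed): exponential, lam * exp(-lam * x) for x >= 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def integrate(f, a: float, b: float, n: int = 100_000) -> float:
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + interior)


total = integrate(density, 0.0, 50.0)  # f is 0 for x < 0; the tail beyond 50 is negligible
p_1_2 = integrate(density, 1.0, 2.0)   # P(1 < X <= 2)
print(total, p_1_2, math.exp(-1) - math.exp(-2))
```

For this density the exact value of $P(1 < X \le 2)$ is $e^{-1} - e^{-2}$, printed last for comparison.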
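Summary of discrete and continuous variables: a discrete variable is described by its distribution law $P(X=x_i)$, and probabilities are obtained by summing over values; a continuous variable is described by its density $f(x)$, and probabilities are obtained by integrating over regions.
Mean and variance for a discrete variable with a given probability distribution:
$E(X) = \sum_i x_i P(X=x_i)$
$Var(X) = \sum_i (x_i - E(X))^2 P(X=x_i) = E(X^2) - (E(X))^2$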
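For a discrete variable the mean and variance are $E(X)=\sum_i x_i P(X=x_i)$ and $Var(X)=E(X^2)-(E(X))^2$. The sketch below (the function name is illustrative) applies these sums to a distribution law stored as a Python dict, using the two-toss coin example and a 0-1(p) variable as inputs.

```python
import math
from fractions import Fraction


def mean_and_variance(pmf: dict) -> tuple:
    """Mean and variance of a discrete variable from its distribution law,
    given as {value: probability}."""
    if not math.isclose(sum(pmf.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in pmf.items())              # E(X)
    second_moment = sum(x**2 * p for x, p in pmf.items())  # E(X^2)
    return mean, second_moment - mean**2                   # Var(X) = E(X^2) - (E(X))^2


# Number of heads in two fair coin tosses.
coin = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print(mean_and_variance(coin))  # (Fraction(1, 1), Fraction(1, 2))

# A 0-1(p) variable with p = 3/10: mean p, variance p(1-p).
p = Fraction(3, 10)
print(mean_and_variance({1: p, 0: 1 - p}))  # (Fraction(3, 10), Fraction(21, 100))
```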
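3.3.3 0-1(p) Distribution
X takes the value 1 (success) with probability p and the value 0 (failure) with probability 1-p, i.e., P(X=1)=p and P(X=0)=1-p.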
E(X)=1×p+0×(1-p)=p
Var(X)=E(X^2)-(E(X))^2=(1^2×p+0^2×(1-p))-p^2=p-p^2=p(1-p)
3.3.4 Bernoulli (Binomial) Distribution
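Definition: in n independent repeated trials, each trial has only two outcomes, A and its complement Ā, and the probability that A occurs is the same in every trial, P(A)=p, 0<p<1. Such a sequence of trials is called n-fold Bernoulli trials.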
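In n-fold Bernoulli trials with P(A)=p, 0<p<1, let X be the number of times A occurs in the n trials. Then X follows the binomial distribution B(n, p):
$P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n$
Writing X = X_1 + X_2 + ... + X_n, where X_i is the 0-1(p) variable indicating whether A occurs in the i-th trial, gives
E(X)=E(X_1+X_2+...+X_n)=E(X_1)+E(X_2)+...+E(X_n)=p+p+...+p=np
and, because the trials are independent,
Var(X)=Var(X_1+X_2+...+X_n)=Var(X_1)+Var(X_2)+...+Var(X_n)=p(1-p)+p(1-p)+...+p(1-p)=np(1-p)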
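The closed forms np and np(1-p) can be checked against a direct summation over the binomial probabilities $P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$. In this sketch n = 20 and p = 0.3 are arbitrary example values and the function names are illustrative; `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_mean_var(n: int, p: float) -> tuple[float, float]:
    """Mean and variance summed directly over the pmf (no closed form used)."""
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, var


n, p = 20, 0.3
mean, var = binom_mean_var(n, p)
print(mean, n * p)           # agree up to floating-point rounding (6)
print(var, n * p * (1 - p))  # agree up to floating-point rounding (4.2)
```

The same helper answers the fair-coin question: `binom_pmf(5, 5, 0.5)` returns 0.03125 = 1/32, the probability of 5 heads in 5 flips.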
Example of a Binomial distribution
When a fair coin is flipped, heads and tails are equally likely, i.e., p = 0.5.
If we flip the coin 5 times, what is the probability of getting 5 heads?
$P(X=5) = \binom{5}{5} 0.5^5 (1-0.5)^0 = 1/32 = 0.03125$
Another example of a binomial distribution
After a genome-wide ChIP-seq experiment, a transcription factor (TF) was found to bind to the promoter regions of 100 genes (out of 26,000). If we now do another experiment with a second TF and also identify 100 genes, what is the probability that at least 5 of them also carry a binding site of the first TF?
Suppose the first TF binds to genes without any preference; then the probability that a gene randomly selected from the genome is bound by the first TF is 100/26000 ≈ 0.0039.
For a given gene, it is either bound by the first TF ('success') or not ('failure'), i.e., a Bernoulli trial.
If the second TF is independent of the first TF, then the number of genes bound by the second TF that are also bound by the first TF follows a binomial distribution.
Binomial distribution: n = 100, p = 0.0039
P(k=0)=0.6765408
P(k=1)=0.2648840
P(k=2)=0.05133606
P(k=3)=0.006565821
P(k=4)=0.0006233937
P(k>=5)=1-P(k=0)-P(k=1)-P(k=2)-P(k=3)-P(k=4)=4.992756e-05
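These numbers can be reproduced in a few lines. The sketch below, assuming p = 0.0039 as in the table, computes P(k=0) through P(k=4) and then P(k≥5) two ways: as the complement 1 - P(k≤4), as in the text, and by summing P(k=5) through P(k=100) directly. The two agree to within floating-point rounding; the helper name is illustrative.

```python
from math import comb

n, p = 100, 0.0039  # 100 genes from the second TF; p = 100/26000, rounded as in the text


def binom_pmf(k: int) -> float:
    """P(k of the n genes are also bound by the first TF)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


head = [binom_pmf(k) for k in range(5)]                   # P(k=0) ... P(k=4)
tail_by_complement = 1 - sum(head)                        # 1 - P(k<=4), as in the text
tail_direct = sum(binom_pmf(k) for k in range(5, n + 1))  # P(k>=5) summed directly

for k, q in enumerate(head):
    print(f"P(k={k}) = {q:.7g}")
print(f"P(k>=5) = {tail_by_complement:.7g} (complement), {tail_direct:.7g} (direct)")
```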
3.3.5 Negative Binomial Distribution
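Definition: an experiment consists of a sequence of independent trials, each with two possible outcomes, success or failure, and a constant probability of success p; the experiment continues until r successes have been observed, where r is a positive integer. The number of trials X needed to obtain the r-th success then follows the negative binomial distribution:
$P(X=k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}, \quad k = r, r+1, \dots$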
Mean and variance of the negative binomial distribution:
$E(X) = \frac{r}{p}, \quad Var(X) = \frac{r(1-p)}{p^2}$
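A minimal sketch, assuming the "number of trials" convention $P(X=k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}$ for $k = r, r+1, \dots$ (the one that gives $E(X)=r/p$ and $Var(X)=r(1-p)/p^2$), checks those closed forms by summing the pmf numerically. The infinite sums are truncated at a hypothetical cutoff `k_max`, and r = 3, p = 0.4 are arbitrary illustrative values.

```python
from math import comb


def nbinom_pmf(k: int, r: int, p: float) -> float:
    """P(X = k): the r-th success happens on trial k (k = r, r+1, ...).

    The first k-1 trials hold exactly r-1 successes and trial k is a success.
    """
    if k < r:
        return 0.0
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)


def nbinom_moments(r: int, p: float, k_max: int = 5000) -> tuple[float, float, float]:
    """Total probability, mean and variance, truncating the infinite sums at k_max."""
    ks = range(r, k_max + 1)
    total = sum(nbinom_pmf(k, r, p) for k in ks)
    mean = sum(k * nbinom_pmf(k, r, p) for k in ks)
    second = sum(k * k * nbinom_pmf(k, r, p) for k in ks)
    return total, mean, second - mean**2


r, p = 3, 0.4
total, mean, var = nbinom_moments(r, p)
print(total)                    # ≈ 1
print(mean, r / p)              # ≈ 7.5
print(var, r * (1 - p) / p**2)  # ≈ 11.25
```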
Alternative formulation of Negative Binomial distribution
Example of negative binomial distribution
If a predator must capture 10 prey before it can grow large enough to reproduce, what would the mean age of onset of reproduction be if the probability of capturing a prey on any given day is 0.1?
The expected time is E(X) = r/p = 10/0.1 = 100 days. However, the variance is quite high, Var(X) = r(1-p)/p^2 = 10×0.9/0.1^2 = 900 (a standard deviation of 30 days), and the distribution is quite skewed: some predators will reach reproductive age much sooner, and some much later, than the average.
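The predator figures follow from the closed forms $E(X)=r/p$ and $Var(X)=r(1-p)/p^2$ with r = 10 prey and p = 0.1 per day. The sketch below also computes the skewness $(2-p)/\sqrt{r(1-p)}$ (a standard negative binomial result, not stated in the text) and sums the pmf to get the probability of being ready to reproduce by day 100; the helper name `nbinom_pmf` is illustrative.

```python
import math

r, p = 10, 0.1  # prey needed before reproduction; daily capture probability

mean = r / p                                 # E(X) = r/p
var = r * (1 - p) / p**2                     # Var(X) = r(1-p)/p^2
sd = math.sqrt(var)
skewness = (2 - p) / math.sqrt(r * (1 - p))  # positive: long right tail


def nbinom_pmf(k: int) -> float:
    """P(the r-th prey is captured on day k), k = r, r+1, ..."""
    if k < r:
        return 0.0
    return math.comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)


day = round(mean)
p_by_mean_day = sum(nbinom_pmf(k) for k in range(r, day + 1))  # P(X <= 100)

print(f"mean = {mean:.0f} days, variance = {var:.0f}, sd = {sd:.0f} days")
print(f"skewness = {skewness:.2f}, P(ready by day {day}) = {p_by_mean_day:.3f}")
```

With these parameters the skewness is about 0.63, and slightly more than half of the predators are ready by day 100: the long right tail pulls the mean above the median.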
3.3.6 Geometric Distribution
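Definition: in a sequence of Bernoulli trials with success probability p, the geometric distribution gives the probability that the first success occurs on the k-th trial, i.e., that the first k-1 trials all fail and the k-th trial succeeds:
$P(X=k) = (1-p)^{k-1} p, \quad k = 1, 2, \dots$
Its mean and variance are E(X) = 1/p and Var(X) = (1-p)/p^2; it is the negative binomial distribution with r = 1.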
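The definition can be illustrated by simulation: repeat Bernoulli(p) trials until the first success and record how many trials that took. In this sketch p = 0.25, the seed and the sample size are arbitrary choices and the function names are illustrative; the simulated mean, variance and frequency of X = 3 should land close to $1/p$, $(1-p)/p^2$ and $(1-p)^2 p$.

```python
import random


def geom_pmf(k: int, p: float) -> float:
    """P(X = k) = (1-p)^(k-1) * p: k-1 failures, then the first success."""
    return (1 - p) ** (k - 1) * p


def trials_until_first_success(p: float, rng: random.Random) -> int:
    """Simulate Bernoulli(p) trials and return the index of the first success."""
    k = 1
    while rng.random() >= p:  # this trial failed (probability 1 - p)
        k += 1
    return k


p = 0.25
rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [trials_until_first_success(p, rng) for _ in range(100_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
print(sample_mean, 1 / p)                           # simulated vs exact mean (4)
print(sample_var, (1 - p) / p**2)                   # simulated vs exact variance (12)
print(draws.count(3) / len(draws), geom_pmf(3, p))  # frequency vs P(X=3) = 0.140625
```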
Example of geometric distribution
If the probability of extinction of an endangered population is estimated to be 0.1 per year, what is the expected time until extinction?
The expected time is E(X) = 1/p = 1/0.1 = 10 years. However, because the variance is large, Var(X) = (1-p)/p^2 = 0.9/0.01 = 90 (a standard deviation of about 9.5 years), it is difficult to predict accurately the year in which the population will go extinct.
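To see why the spread matters, the sketch below evaluates the geometric closed forms for p = 0.1 together with the cumulative probability $P(X \le n) = 1 - (1-p)^n$, the chance that extinction happens within the first n years. The helper name `prob_extinct_by` and the horizons of 5, 10 and 20 years are illustrative choices.

```python
import math

p = 0.1                   # estimated annual extinction probability
mean = 1 / p              # E(X) = 1/p
var = (1 - p) / p**2      # Var(X) = (1-p)/p^2
sd = math.sqrt(var)


def prob_extinct_by(year: int) -> float:
    """P(X <= year) = 1 - (1-p)^year: extinct within the first `year` years."""
    return 1 - (1 - p) ** year


print(f"E = {mean:.0f} years, Var = {var:.0f}, SD = {sd:.1f} years")
for year in (5, 10, 20):
    print(year, round(prob_extinct_by(year), 3))
```

With these numbers about 41% of such populations would be extinct within 5 years, while about 12% ($0.9^{20} \approx 0.122$) would still persist after 20 years, which is why a single predicted year is unreliable.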