Maximum Likelihood Estimation
1. The Poisson distribution is useful for modeling the number of events occurring within a unit time, such as the number of packets arrived at some server per minute. The probability mass function of a Poisson distribution is as follows:
where is the parameter of the distribution and is the discrete random variable modeling the number of events encountered per unit time.
1.1. Let be an i.i.d. sample drawn from a Poisson distribution with parameter . Derive the MLE estimate of based on this sample.
根据样本推测泊松分布参数λ的最大似然估计
1.2. Let K be a random variable following a Poisson distribution with parameter . Derive its mean E[K] and variance Var[K]. Since depends on the sample used for estimation, it is also a random variable. Derive the mean and the variance of , and compare them with E[K] and Var[K]. What do you find?
证明泊松分布的均值方差,并且证明该估计是无偏估计
2. One measure of the difference between two distributions in the same space is the
Kullback-Leibler divergence of Kullback-Leibler "distance":
(This "distance" does not obey the requisite symmetry and triangle inequalities for a metric.)
Suppose we seek to approximate an arbitrary distribution by a 1-dimensional normal distribution . Show that the values that lead to the smallest Kullback-Leibler divergence are the obvious ones:
where the expectation taken is over the density .
证明KL散度取最小。
Bayes Estimation
**1. The purpose of this problem is to derive the Bayesian classifier for the -dimensional multivariate Bernoulli case. Work with each class separately, interpreting to mean . Let the conditional probability for a given category be given by
**
and let be a set of samples independently drawn according to this probability density.
-
1.1 Let be a statistical model where the paramter space is either finite- or infinite- dimensional. We say that is identifiable if the mapping is one-to-one:
证明该问题是一个可以识别的分类 1.2 Show that the maximum likelihood estimate for is
-
该概率密度函数的最大似然估计为多少
1.3 If is the sum of the samples, show that
-
-
1.4 Assuming a uniform a priori distribution for and using the identity
show that
1.5 Integrate the product over to obtain the desired conditional probability
-
programing
-
Try window-function, Estimate p(x) with different window width a. Please draw under different n and a and to show the estimation effect.
使用方窗核并使用不同的a跟n,画出图形如下图所示。
-
Derive how to compute numerically.
将积分转化为求和,取步长为0.001,即在[-5, 5]区间之间取10000个点,,可的下表所示数据。
a = 0.5 | a = 1 | a = 2 | a = 4 | |
---|---|---|---|---|
N = 5 | 0.2183 | 0.0403 | 0.0146 | 0.0201 |
N = 10 | 0.0935 | 0.0201 | 0.0102 | 0.0196 |
N = 100 | 0.0069 | 0.0028 | 0.0041 | 0.0224 |
N = 500 | 0.0031 | 0.0022 | 0.0034 | 0.0214 |
N = 1000 | 0.0017 | 0.0010 | 0.0022 | 0.0195 |
N = 2000 | 0.0007 | 0.0003 | 0.0020 | 0.0202 |
-
Demonstrate the expectation and variance of w.r.t different and .
-
With n given, how to choose optimal from above the empirical experiences?
当n 给定时,可以通过观察不同a下的值来进行优化,因为实际上可以作为评估曲线与理论曲线相近程度的度量,即 越小,评估曲线越接近理论值。
-
Substitute in (a) with Gaussian window. Repeat (a)-(e).
使用高斯核函数,估计的概率密度函数如下所示。
不同窗宽与样本个数的如下表所示
a = 0.5 | a = 1 | a = 2 | a = 4 | |
---|---|---|---|---|
N = 5 | 0.0597 | 0.0383 | 0.0483 | 0.0852 |
N = 10 | 0.0780 | 0.0188 | 0.0070 | 0.0587 |
N = 100 | 0.0181 | 0.0062 | 0.0033 | 0.0295 |
N = 500 | 0.0031 | 0.0013 | 0.0028 | 0.0168 |
N = 1000 | 0.0033 | 0.0017 | 0.0024 | 0.0119 |
N = 2000 | 0.0016 | 0.0013 | 0.0021 | 0.0085 |
-
Try different window functions and parameters as many as you can. Which window function/parameter is the best one? Demonstrate it numerically.
分别使用gauss_kernel, window_kernel,quartic_kernel,epanechnikov_kernel,cosine_kernel,分别取不同的窗口大小h与不同的样本个数n,得到的可视化图形如下所示。
他们的方差与均值如下所示
E σ gauss_kernel 0.02326597 0.00074471 window_kernel 0.04124473 0.00423351 quartic_kernel 0.03156898 0.00188394 epanechnikov_kernel 0.03205083 0.00142668 cosine_kernel 0.03167592 0.00147622 可知方窗的效果不太好,均值与方差都比较大,由于是高斯混合分布的样本,所以使用高斯核函数达到的效果最好,其他的三个核函数效果差不多。
代码
import numpy as np
import matplotlib.pyplot as plt
from numba import jit
def normFun(x, mu, sigma):
pdf = np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
return pdf
def groundTruth():
x = np.random.random(10000)
x = (-1 + 2 * x) * 5
x = np.sort(x)
y = 0.2 * normFun(x, -1, 1) + 0.8 * normFun(x, 1, 1)
return [x, y]
def getSample(n):
x = np.concatenate((np.random.normal(-1, 1, int(0.2 * n)), np.random.normal(1, 1, int(0.8 * n))))[:, np.newaxis]
x.sort()
return x
def windowKernel(x, xi, h, n):
result = 0
add_one = 1 / h
for i in range(n):
if abs(x - xi[i]) <= h / 2:
result += add_one
return result / n
def gaussKernel(x, xi, h, n):
result = 0
hn = h / n ** (1 / 5)
for i in range(n):
result += np.exp(-0.5 * (((x - xi[i]) / hn) ** 2))
return (result / (n * (np.sqrt(2 * np.pi) * hn)))[0]
def parzenWindow(x, h, n):
result = []
for i in range(10000):
xi = i / 1000 - 5
temp = windowKernel(xi, x, h, n)
result.append([xi, temp])
return result
def parzenGauss(x, h, n):
result = []
for i in range(10000):
xi = i / 1000 - 5
temp = gaussKernel(xi, x, h, n)
result.append([xi, temp])
return result
# gkde = stats.gaussian_kde(x, bw_method=h)
# ind = np.linspace(-5, 5, 10000)
# return [ind, gkde.evaluate(ind)]
def EpanechnikovKernel(x, xi, h, n):
result = 0
for i in range(n):
u = abs(x - xi[i]) / h
if u <= 1:
result += 3 / 4 * (1 - u ** 2)
return float(result) / n / h
def parzenEpanechnikov(x, h, n):
result = []
for i in range(10000):
xi = i / 1000 - 5
temp = EpanechnikovKernel(xi, x, h, n)
result.append([xi, temp])
return result
def QuarticKernel(x, xi, h, n):
result = 0
for i in range(n):
u = abs(x - xi[i]) / h
if u <= 1:
result += 15 / 16 * (1 - u ** 2) ** 2
return float(result) / n / h
def parzenQuartic(x, h, n):
result = []
for i in range(10000):
xi = i / 1000 - 5
temp = QuarticKernel(xi, x, h, n)
result.append([xi, temp])
return result
def cosineKernel(x, xi, h, n):
result = 0
for i in range(n):
u = abs(x - xi[i]) / h
if u <= 1:
result += np.pi / 4 * np.cos(np.pi / 2 * u)
return float(result) / n / h
def parzenCosine(x, h, n):
result = []
for i in range(10000):
xi = i / 1000 - 5
temp = cosineKernel(xi, x, h, n)
result.append([xi, temp])
return result
if __name__ == '__main__':
sample_nums = [5, 10, 100, 500, 1000]
window_widths = [0.5, 1, 2, 4]
plt.figure(figsize=(30, 25))
infinity = groundTruth()
sum_x = [[] for _ in range(5)]
for row in range(len(sample_nums)):
sample_num = sample_nums[row]
samples = getSample(sample_num)
for col in range(len(window_widths)):
window_width = window_widths[col]
print("n = ", sample_num, " window_width = ", window_width)
seq_gauss = parzenGauss(samples, window_width, sample_num)
seq_win = parzenWindow(samples, window_width, sample_num)
seq_epan = parzenEpanechnikov(samples, window_width, sample_num)
seq_quartic = parzenQuartic(samples, window_width, sample_num)
seq_cosine = parzenCosine(samples, window_width, sample_num)
temp = [infinity[1][i] - seq_gauss[i][1] for i in range(len(seq_gauss))]
temp = np.dot(temp, np.transpose(temp)) * 0.001
sum_x[0].append(temp)
temp = [infinity[1][i] - seq_win[i][1] for i in range(len(seq_win))]
temp = np.dot(temp, np.transpose(temp)) * 0.001
sum_x[1].append(temp)
temp = [infinity[1][i] - seq_quartic[i][1] for i in range(len(seq_quartic))]
temp = np.dot(temp, np.transpose(temp)) * 0.001
sum_x[2].append(temp)
temp = [infinity[1][i] - seq_epan[i][1] for i in range(len(seq_epan))]
temp = np.dot(temp, np.transpose(temp)) * 0.001
sum_x[3].append(temp)
temp = [infinity[1][i] - seq_cosine[i][1] for i in range(len(seq_cosine))]
temp = np.dot(temp, np.transpose(temp)) * 0.001
sum_x[4].append(temp)
plt.subplot(len(sample_nums), len(window_widths), row * len(window_widths) + col + 1)
plt.plot(infinity[0], infinity[1], c='r', label="ground_truth")
plt.plot([seq_win[i][0] for i in range(len(seq_win))], [seq_win[i][1] for i in range(len(seq_win))], c='b',
label="window_kernel")
plt.plot([seq_gauss[i][0] for i in range(len(seq_gauss))], [seq_gauss[i][1] for i in range(len(seq_gauss))],
c='g', label="gauss_kernel")
plt.plot([seq_epan[i][0] for i in range(len(seq_epan))], [seq_epan[i][1] for i in range(len(seq_epan))],
c='m', label="epan_kernel")
plt.plot([seq_cosine[i][0] for i in range(len(seq_cosine))],
[seq_cosine[i][1] for i in range(len(seq_cosine))],
c='y', label="cosine_kernel")
plt.plot([seq_quartic[i][0] for i in range(len(seq_quartic))],
[seq_quartic[i][1] for i in range(len(seq_quartic))],
c='k', label="quartic_kernel")
plt.legend()
plt.title("n = " + str(sample_num) + " window_witth = " + str(window_width))
sum_x = np.array(sum_x)
e = np.sum(sum_x, axis=1) / len(sum_x[0])
print(e)
e = np.reshape(e, (5, 1))
d = np.sum((sum_x - e) ** 2, axis=1) / len((sum_x[0]))
print(d)
plt.savefig("all.pdf")
plt.savefig("all.svg")
plt.savefig("all.png")
plt.show()