lec3 似然估计与parzen窗

Maximum Likelihood Estimation

1. The Poisson distribution is useful for modeling the number of events occurring within a unit time, such as the number of packets arrived at some server per minute. The probability mass function of a Poisson distribution is as follows:
$P(k|\lambda):= \frac{\lambda^k e^{-\lambda}}{k!},$
where $\lambda > 0$ is the parameter of the distribution and $k\in\{0,1,2,...\}$ is the discrete random variable modeling the number of events encountered per unit time.

1.1. Let $\{k_1,k_2,...,k_n\}$ be an i.i.d. sample drawn from a Poisson distribution with parameter $\lambda$ . Derive the MLE estimate $\hat{\lambda}_{mle}$ of $\lambda$ based on this sample.

根据样本推测泊松分布参数λ的最大似然估计

微信图片_202003111239451

1.2. Let K be a random variable following a Poisson distribution with parameter $\lambda$ . Derive its mean E[K] and variance Var[K]. Since $\hat{\lambda}_{mle}$ depends on the sample used for estimation, it is also a random variable. Derive the mean and the variance of $\hat{\lambda}_{mle}$ , and compare them with E[K] and Var[K]. What do you find?

证明泊松分布的均值方差，并且证明该估计是无偏估计

微信图片_20200311123945

微信图片_20200311124908

2. One measure of the difference between two distributions in the same space is the
Kullback-Leibler divergence of Kullback-Leibler "distance":
$D_{KL}(p_1(x),p_2(x)) = \int p_1(x)ln\frac{p_1(x)}{p_2(x)}dx$

(This "distance" does not obey the requisite symmetry and triangle inequalities for a metric.)

Suppose we seek to approximate an arbitrary distribution $p_1(x)$ by a 1-dimensional normal distribution $p_2(x)\sim \mathcal{N} (\mu,\sigma^2)$ . Show that the values that lead to the smallest Kullback-Leibler divergence are the obvious ones:
$\mu = E_1[x]$
$\sigma^2 = E_1[(x-\mu)^2]$

where the expectation $E_1$ taken is over the density $p_1(x)$ .

证明KL散度取最小。

微信图片_20200311123551

Bayes Estimation

**1. The purpose of this problem is to derive the Bayesian classifier for the $d$ -dimensional multivariate Bernoulli case. Work with each class separately, interpreting $P(\mathbf{x} | \mathcal{D})$ to mean $P(\mathbf{x} | \mathcal{D}_i, \omega_i)$ . Let the conditional probability for a given category be given by
**
$P(\mathbf{x} | {\theta}) = \prod_{i=1}^{d} \theta_i^{x_i} ( 1 - \theta_i )^{1 - x_i},$
and let $\mathcal{D} = \{\mathbf{x}_1, ..., \mathbf{x}_n\}$ be a set of $n$ samples independently drawn according to this probability density.

1.1 Let $\mathcal{P} = \{ P_\theta: \theta \in \Theta \}$ be a statistical model where the paramter space $\Theta$ is either finite- or infinite- dimensional. We say that $\mathcal{P}$ is identifiable if the mapping $\theta \mapsto P_\theta$ is one-to-one:
$P_{\theta_1} = P_{\theta_2} \quad \Rightarrow \quad \theta_1 = \theta_2 \quad$
证明该问题是一个可以识别的分类

微信图片_202003111227281
1.2 Show that the maximum likelihood estimate for ${\theta}$ is
$\hat{{\theta}} = \frac{1}{n} \sum_{k=1}^n \mathbf{x}_k.$

该概率密度函数的最大似然估计为多少

微信图片_20200311122728
1.3 If $\mathbf{s} = (s_1, ..., s_d)$ is the sum of the $n$ samples, show that
$P(\mathcal{D} | {\theta}) = \prod_{i=1}^{d} \theta_i^{s_i} (1 - \theta_i)^{n-s_i}.$

微信图片_202003111302091
1.4 Assuming a uniform a priori distribution for ${\theta}$ and using the identity
$int_0^1 \theta^m (1 - \theta)^n d\theta = \frac{m!n!}{(m + n + 1)!}.$
show that
$p({\theta} | \mathcal{D}) = \prod_{i=1}^d \frac{(n + 1)!}{s_i!(n-s_i)!} \theta_i^{s_i} ( 1 - \theta_i )^{n - s_i}.$

微信图片_20200311130209
1.5 Integrate the product $P(\mathbf{x} | {\theta}) p({\theta} | \mathcal{D})$ over ${\theta}$ to obtain the desired conditional probability
$P(\mathbf{x} | \mathcal{D}) = \prod_{i=1}^{d} \left(\frac{s_i+1}{n+2} \right)^{x_i} \left(1 - \frac{s_i+1}{n+2} \right)^{1 - x_i}.$

微信图片_20200311130341

programing

Try window-function $P(x) = c(u)=\begin{cases} \frac{1}{a}，abs(x) <= 0.5* a\\ 0， others\end{cases}$ ， Estimate p(x) with different window width a. Please draw $pn(x)$ under different n and a and $p(x)$ to show the estimation effect.

使用方窗核并使用不同的a跟n，画出图形如下图所示。

window_kernel.png

Derive how to compute $\epsilon(p_n)=\int[p_n(x)-p(x)]^2dx$ numerically.

将积分转化为求和，取步长为0.001，即在[-5, 5]区间之间取10000个点， $\sum(fn(x) - f(x)) * 0.001$ ,可的下表所示数据。
$\varepsilon = \sum_{n=1}^{100}{(fn(x) - f(x))^2 * 0.001}$

	a = 0.5	a = 1	a = 2	a = 4
N = 5	0.2183	0.0403	0.0146	0.0201
N = 10	0.0935	0.0201	0.0102	0.0196
N = 100	0.0069	0.0028	0.0041	0.0224
N = 500	0.0031	0.0022	0.0034	0.0214
N = 1000	0.0017	0.0010	0.0022	0.0195
N = 2000	0.0007	0.0003	0.0020	0.0202

Demonstrate the expectation and variance of $\epsilon(p_n)$ w.r.t different $n$ and $a$ .
$E(\varepsilon(pn)) = \frac{1}{N}\sum_{i = 1}^{N}{p(\varepsilon)} = 0.02294$

$Var(\varepsilon(pn)) = \frac{1}{N}\sum_{i = 1}^{N}{(\varepsilon(pn) - E(\varepsilon(pn)))^2} = 0.00449$

With n given, how to choose optimal $a$ from above the empirical experiences?

当n 给定时，可以通过观察不同a下的 $\varepsilon(pn)$ 值来进行优化，因为 $\varepsilon(pn)$ 实际上可以作为评估曲线与理论曲线相近程度的度量，即 $\varepsilon(pn)$ 越小，评估曲线越接近理论值。
Substitute $h(x)$ in (a) with Gaussian window. Repeat (a)-(e).

使用高斯核函数，估计的概率密度函数如下所示。

gauss_kernel.png

不同窗宽与样本个数的 $\varepsilon(pn)$ 如下表所示

	a = 0.5	a = 1	a = 2	a = 4
N = 5	0.0597	0.0383	0.0483	0.0852
N = 10	0.0780	0.0188	0.0070	0.0587
N = 100	0.0181	0.0062	0.0033	0.0295
N = 500	0.0031	0.0013	0.0028	0.0168
N = 1000	0.0033	0.0017	0.0024	0.0119
N = 2000	0.0016	0.0013	0.0021	0.0085

$E(\varepsilon(pn)) = \frac{1}{N}\sum_{i = 1}^{N}{p(\varepsilon)} = 0.021167$

$Var(\varepsilon(pn)) = \frac{1}{N}\sum_{i = 1}^{N}{(\varepsilon(pn) - E(\varepsilon(pn)))^2} = 0.0006498$

Try different window functions and parameters as many as you can. Which window function/parameter is the best one? Demonstrate it numerically.

分别使用gauss_kernel， window_kernel，quartic_kernel，epanechnikov_kernel，cosine_kernel，分别取不同的窗口大小h与不同的样本个数n，得到的可视化图形如下所示。

all.png

他们的方差与均值如下所示

	E	σ
gauss_kernel	0.02326597	0.00074471
window_kernel	0.04124473	0.00423351
quartic_kernel	0.03156898	0.00188394
epanechnikov_kernel	0.03205083	0.00142668
cosine_kernel	0.03167592	0.00147622

可知方窗的效果不太好，均值与方差都比较大，由于是高斯混合分布的样本，所以使用高斯核函数达到的效果最好，其他的三个核函数效果差不多。

代码

import numpy as np
import matplotlib.pyplot as plt
from numba import jit


def normFun(x, mu, sigma):
    pdf = np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    return pdf


def groundTruth():
    x = np.random.random(10000)
    x = (-1 + 2 * x) * 5
    x = np.sort(x)
    y = 0.2 * normFun(x, -1, 1) + 0.8 * normFun(x, 1, 1)
    return [x, y]


def getSample(n):
    x = np.concatenate((np.random.normal(-1, 1, int(0.2 * n)), np.random.normal(1, 1, int(0.8 * n))))[:, np.newaxis]
    x.sort()
    return x


def windowKernel(x, xi, h, n):
    result = 0
    add_one = 1 / h
    for i in range(n):
        if abs(x - xi[i]) <= h / 2:
            result += add_one
    return result / n


def gaussKernel(x, xi, h, n):
    result = 0
    hn = h / n ** (1 / 5)
    for i in range(n):
        result += np.exp(-0.5 * (((x - xi[i]) / hn) ** 2))
    return (result / (n * (np.sqrt(2 * np.pi) * hn)))[0]


def parzenWindow(x, h, n):
    result = []
    for i in range(10000):
        xi = i / 1000 - 5
        temp = windowKernel(xi, x, h, n)
        result.append([xi, temp])
    return result


def parzenGauss(x, h, n):
    result = []
    for i in range(10000):
        xi = i / 1000 - 5
        temp = gaussKernel(xi, x, h, n)
        result.append([xi, temp])
    return result

    # gkde = stats.gaussian_kde(x, bw_method=h)
    # ind = np.linspace(-5, 5, 10000)
    # return [ind, gkde.evaluate(ind)]


def EpanechnikovKernel(x, xi, h, n):
    result = 0
    for i in range(n):
        u = abs(x - xi[i]) / h
        if u <= 1:
            result += 3 / 4 * (1 - u ** 2)
    return float(result) / n / h


def parzenEpanechnikov(x, h, n):
    result = []
    for i in range(10000):
        xi = i / 1000 - 5
        temp = EpanechnikovKernel(xi, x, h, n)
        result.append([xi, temp])
    return result


def QuarticKernel(x, xi, h, n):
    result = 0
    for i in range(n):
        u = abs(x - xi[i]) / h
        if u <= 1:
            result += 15 / 16 * (1 - u ** 2) ** 2
    return float(result) / n / h


def parzenQuartic(x, h, n):
    result = []
    for i in range(10000):
        xi = i / 1000 - 5
        temp = QuarticKernel(xi, x, h, n)
        result.append([xi, temp])
    return result


def cosineKernel(x, xi, h, n):
    result = 0
    for i in range(n):
        u = abs(x - xi[i]) / h
        if u <= 1:
            result += np.pi / 4 * np.cos(np.pi / 2 * u)
    return float(result) / n / h


def parzenCosine(x, h, n):
    result = []
    for i in range(10000):
        xi = i / 1000 - 5
        temp = cosineKernel(xi, x, h, n)
        result.append([xi, temp])
    return result


if __name__ == '__main__':
    sample_nums = [5, 10, 100, 500, 1000]
    window_widths = [0.5, 1, 2, 4]
    plt.figure(figsize=(30, 25))
    infinity = groundTruth()
    sum_x = [[] for _ in range(5)]
    for row in range(len(sample_nums)):
        sample_num = sample_nums[row]
        samples = getSample(sample_num)
        for col in range(len(window_widths)):
            window_width = window_widths[col]
            print("n = ", sample_num, " window_width = ", window_width)
            seq_gauss = parzenGauss(samples, window_width, sample_num)
            seq_win = parzenWindow(samples, window_width, sample_num)
            seq_epan = parzenEpanechnikov(samples, window_width, sample_num)
            seq_quartic = parzenQuartic(samples, window_width, sample_num)
            seq_cosine = parzenCosine(samples, window_width, sample_num)

            temp = [infinity[1][i] - seq_gauss[i][1] for i in range(len(seq_gauss))]
            temp = np.dot(temp, np.transpose(temp)) * 0.001
            sum_x[0].append(temp)
            temp = [infinity[1][i] - seq_win[i][1] for i in range(len(seq_win))]
            temp = np.dot(temp, np.transpose(temp)) * 0.001
            sum_x[1].append(temp)
            temp = [infinity[1][i] - seq_quartic[i][1] for i in range(len(seq_quartic))]
            temp = np.dot(temp, np.transpose(temp)) * 0.001
            sum_x[2].append(temp)
            temp = [infinity[1][i] - seq_epan[i][1] for i in range(len(seq_epan))]
            temp = np.dot(temp, np.transpose(temp)) * 0.001
            sum_x[3].append(temp)
            temp = [infinity[1][i] - seq_cosine[i][1] for i in range(len(seq_cosine))]
            temp = np.dot(temp, np.transpose(temp)) * 0.001
            sum_x[4].append(temp)
            plt.subplot(len(sample_nums), len(window_widths), row * len(window_widths) + col + 1)
            plt.plot(infinity[0], infinity[1], c='r', label="ground_truth")

            plt.plot([seq_win[i][0] for i in range(len(seq_win))], [seq_win[i][1] for i in range(len(seq_win))], c='b',
                     label="window_kernel")
            plt.plot([seq_gauss[i][0] for i in range(len(seq_gauss))], [seq_gauss[i][1] for i in range(len(seq_gauss))],
                     c='g', label="gauss_kernel")
            plt.plot([seq_epan[i][0] for i in range(len(seq_epan))], [seq_epan[i][1] for i in range(len(seq_epan))],
                     c='m', label="epan_kernel")
            plt.plot([seq_cosine[i][0] for i in range(len(seq_cosine))],
                     [seq_cosine[i][1] for i in range(len(seq_cosine))],
                     c='y', label="cosine_kernel")
            plt.plot([seq_quartic[i][0] for i in range(len(seq_quartic))],
                     [seq_quartic[i][1] for i in range(len(seq_quartic))],
                     c='k', label="quartic_kernel")
            plt.legend()
            plt.title("n = " + str(sample_num) + " window_witth = " + str(window_width))
    sum_x = np.array(sum_x)
    e = np.sum(sum_x, axis=1) / len(sum_x[0])
    print(e)
    e = np.reshape(e, (5, 1))
    d = np.sum((sum_x - e) ** 2, axis=1) / len((sum_x[0]))
    print(d)

    plt.savefig("all.pdf")
    plt.savefig("all.svg")
    plt.savefig("all.png")

    plt.show()

最后编辑于：2020.03.11 23:19:53

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 216,001评论 6赞 498
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 92,210评论 3赞 392
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 161,874评论 0赞 351
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 58,001评论 1赞 291
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 67,022评论 6赞 388
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 51,005评论 1赞 295
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 39,929评论 3赞 416
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 38,742评论 0赞 271
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 45,193评论 1赞 309
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 37,427评论 2赞 331
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 39,583评论 1赞 346
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 35,305评论 5赞 342
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 40,911评论 3赞 325
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 31,564评论 0赞 21
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 32,731评论 1赞 268
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 47,581评论 2赞 368
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 44,478评论 2赞 352

lec3 似然估计与parzen窗

Maximum Likelihood Estimation

Bayes Estimation

programing

代码

推荐阅读更多精彩内容