Lec 3: Likelihood Estimation and Parzen Windows

Maximum Likelihood Estimation

1. The Poisson distribution is useful for modeling the number of events occurring within a unit of time, such as the number of packets arriving at a server per minute. The probability mass function of a Poisson distribution is as follows:
P(k|\lambda):= \frac{\lambda^k e^{-\lambda}}{k!},
where \lambda > 0 is the parameter of the distribution and k\in\{0,1,2,...\} is the discrete random variable modeling the number of events encountered per unit time.

1.1. Let \{k_1,k_2,...,k_n\} be an i.i.d. sample drawn from a Poisson distribution with parameter \lambda. Derive the MLE estimate \hat{\lambda}_{mle} of \lambda based on this sample.

Derive the maximum likelihood estimate of the Poisson parameter λ from the sample.

[solution image: 微信图片_202003111239451]
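A sketch of the standard derivation, in case the image above does not render: the log-likelihood of the sample is

\ln L(\lambda) = \sum_{i=1}^{n} \ln P(k_i|\lambda) = \ln\lambda \sum_{i=1}^{n} k_i - n\lambda - \sum_{i=1}^{n} \ln(k_i!),

and setting \frac{d\ln L}{d\lambda} = \frac{1}{\lambda}\sum_{i=1}^{n} k_i - n = 0 gives

\hat{\lambda}_{mle} = \frac{1}{n}\sum_{i=1}^{n} k_i.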

1.2. Let K be a random variable following a Poisson distribution with parameter \lambda. Derive its mean E[K] and variance Var[K]. Since \hat{\lambda}_{mle} depends on the sample used for estimation, it is also a random variable. Derive the mean and the variance of \hat{\lambda}_{mle}, and compare them with E[K] and Var[K]. What do you find?

Derive the mean and variance of the Poisson distribution, and show that the MLE is an unbiased estimator.

[solution images: 微信图片_20200311123945, 微信图片_20200311124908]
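A sketch of the standard computation, in case the images do not render: using the exponential series,

E[K] = \sum_{k=1}^{\infty} k\frac{\lambda^k e^{-\lambda}}{k!} = \lambda e^{-\lambda}\sum_{k=1}^{\infty}\frac{\lambda^{k-1}}{(k-1)!} = \lambda,

and E[K(K-1)] = \lambda^2, so Var[K] = E[K^2] - E[K]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda. Since the k_i are i.i.d.,

E[\hat{\lambda}_{mle}] = \frac{1}{n}\sum_{i=1}^{n} E[k_i] = \lambda = E[K], \qquad Var[\hat{\lambda}_{mle}] = \frac{1}{n^2}\sum_{i=1}^{n} Var[k_i] = \frac{\lambda}{n} = \frac{Var[K]}{n},

so the estimator is unbiased and its variance is 1/n of the population variance.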

2. One measure of the difference between two distributions in the same space is the
Kullback-Leibler divergence, or Kullback-Leibler "distance":

D_{KL}(p_1(x),p_2(x)) = \int p_1(x)\ln\frac{p_1(x)}{p_2(x)}dx

(This "distance" does not obey the requisite symmetry and triangle inequalities for a metric.)

Suppose we seek to approximate an arbitrary distribution p_1(x) by a 1-dimensional normal distribution p_2(x) \sim \mathcal{N}(\mu,\sigma^2). Show that the values that lead to the smallest Kullback-Leibler divergence are the obvious ones:
\mu = E_1[x]
\sigma^2 = E_1[(x-\mu)^2]

where the expectation E_1 is taken over the density p_1(x).

Show that the KL divergence is minimized by these choices.

[solution image: 微信图片_20200311123551]
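A sketch of the derivation, in case the image does not render: write D_{KL} = \int p_1(x)\ln p_1(x)dx - \int p_1(x)\ln p_2(x)dx; only the second term depends on (\mu, \sigma^2). With p_2 = \mathcal{N}(\mu, \sigma^2),

-\int p_1(x)\ln p_2(x)dx = \frac{1}{2}\ln(2\pi\sigma^2) + \frac{E_1[(x-\mu)^2]}{2\sigma^2}.

Setting the \mu-derivative -\frac{E_1[x]-\mu}{\sigma^2} to zero gives \mu = E_1[x], and setting the \sigma^2-derivative \frac{1}{2\sigma^2} - \frac{E_1[(x-\mu)^2]}{2\sigma^4} to zero gives \sigma^2 = E_1[(x-\mu)^2].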

Bayesian Estimation

1. The purpose of this problem is to derive the Bayesian classifier for the d-dimensional multivariate Bernoulli case. Work with each class separately, interpreting P(\mathbf{x} | \mathcal{D}) to mean P(\mathbf{x} | \mathcal{D}_i, \omega_i). Let the conditional probability for a given category be given by
P(\mathbf{x} | {\theta}) = \prod_{i=1}^{d} \theta_i^{x_i} ( 1 - \theta_i )^{1 - x_i},
and let \mathcal{D} = \{\mathbf{x}_1, ..., \mathbf{x}_n\} be a set of n samples independently drawn according to this probability density.

  • 1.1 Let \mathcal{P} = \{ P_\theta: \theta \in \Theta \} be a statistical model where the parameter space \Theta is either finite- or infinite-dimensional. We say that \mathcal{P} is identifiable if the mapping \theta \mapsto P_\theta is one-to-one:
    P_{\theta_1} = P_{\theta_2} \quad \Rightarrow \quad \theta_1 = \theta_2.
    Show that this model is identifiable.

    [solution image: 微信图片_202003111227281]
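    A sketch of the argument, in case the image does not render: summing P(\mathbf{x}|\theta) over all coordinates except the i-th gives the marginal P_\theta(x_i = 1) = \theta_i. Hence P_{\theta_1} = P_{\theta_2} forces \theta_{1,i} = \theta_{2,i} for every i, so \theta_1 = \theta_2 and the map \theta \mapsto P_\theta is one-to-one.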
  • 1.2 Show that the maximum likelihood estimate for {\theta} is
    \hat{{\theta}} = \frac{1}{n} \sum_{k=1}^n \mathbf{x}_k.

    Derive the maximum likelihood estimate of θ for this probability density.

    [solution image: 微信图片_20200311122728]
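    A sketch, in case the image does not render: \ln P(\mathcal{D}|\theta) = \sum_{k=1}^{n}\sum_{i=1}^{d}\left[x_{ki}\ln\theta_i + (1-x_{ki})\ln(1-\theta_i)\right]; setting \frac{\partial}{\partial\theta_i} = \sum_{k}\left[\frac{x_{ki}}{\theta_i} - \frac{1-x_{ki}}{1-\theta_i}\right] = 0 gives \hat{\theta}_i = \frac{1}{n}\sum_{k=1}^{n} x_{ki}, i.e. \hat{\theta} = \frac{1}{n}\sum_{k}\mathbf{x}_k.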
  • 1.3 If \mathbf{s} = (s_1, ..., s_d) is the sum of the n samples, show that
    P(\mathcal{D} | {\theta}) = \prod_{i=1}^{d} \theta_i^{s_i} (1 - \theta_i)^{n-s_i}.

    [solution image: 微信图片_202003111302091]
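    A sketch, in case the image does not render: by independence, P(\mathcal{D}|\theta) = \prod_{k=1}^{n}\prod_{i=1}^{d}\theta_i^{x_{ki}}(1-\theta_i)^{1-x_{ki}} = \prod_{i=1}^{d}\theta_i^{\sum_k x_{ki}}(1-\theta_i)^{\sum_k (1-x_{ki})} = \prod_{i=1}^{d}\theta_i^{s_i}(1-\theta_i)^{n-s_i}.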
  • 1.4 Assuming a uniform a priori distribution for {\theta} and using the identity
    \int_0^1 \theta^m (1 - \theta)^n d\theta = \frac{m!\,n!}{(m + n + 1)!},
    show that
    p({\theta} | \mathcal{D}) = \prod_{i=1}^d \frac{(n + 1)!}{s_i!(n-s_i)!} \theta_i^{s_i} ( 1 - \theta_i )^{n - s_i}.

    [solution image: 微信图片_20200311130209]
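    A sketch, in case the image does not render: by Bayes' rule with the uniform prior, p(\theta|\mathcal{D}) \propto P(\mathcal{D}|\theta), and the normalizing constant factors over coordinates. Applying the identity with m = s_i and n replaced by n - s_i gives \int_0^1 \theta_i^{s_i}(1-\theta_i)^{n-s_i}d\theta_i = \frac{s_i!(n-s_i)!}{(n+1)!}, and dividing by this product of constants yields the stated posterior.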

  • 1.5 Integrate the product P(\mathbf{x} | {\theta}) p({\theta} | \mathcal{D}) over {\theta} to obtain the desired conditional probability
    P(\mathbf{x} | \mathcal{D}) = \prod_{i=1}^{d} \left(\frac{s_i+1}{n+2} \right)^{x_i} \left(1 - \frac{s_i+1}{n+2} \right)^{1 - x_i}.

    [solution image: 微信图片_20200311130341]
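    A sketch, in case the image does not render: the integral factors per coordinate. For x_i = 1 the i-th factor is \frac{(n+1)!}{s_i!(n-s_i)!}\int_0^1\theta_i^{s_i+1}(1-\theta_i)^{n-s_i}d\theta_i = \frac{(n+1)!}{s_i!(n-s_i)!}\cdot\frac{(s_i+1)!(n-s_i)!}{(n+2)!} = \frac{s_i+1}{n+2}; for x_i = 0 it is \frac{n-s_i+1}{n+2} = 1 - \frac{s_i+1}{n+2}. Together these give the stated product.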

Programming

  • Try the window function \varphi(x) = \begin{cases} \frac{1}{a}, & |x| \le \frac{a}{2} \\ 0, & \text{otherwise} \end{cases}. Estimate p(x) with different window widths a. Plot p_n(x) for different n and a, together with p(x), to show the estimation quality.

    Using the rectangular (box) window with different values of a and n, the resulting plots are shown below.

[figure: window_kernel.png]
  • Derive how to compute \epsilon(p_n)=\int[p_n(x)-p(x)]^2dx numerically.

    Convert the integral into a sum: with step size 0.001, take 10000 grid points on [-5, 5], which yields the values in the table below (a code sketch follows the table).
    \varepsilon(p_n) = \sum_{i=1}^{10000}\left(p_n(x_i) - p(x_i)\right)^2 \times 0.001

|          | a = 0.5 | a = 1  | a = 2  | a = 4  |
|----------|---------|--------|--------|--------|
| N = 5    | 0.2183  | 0.0403 | 0.0146 | 0.0201 |
| N = 10   | 0.0935  | 0.0201 | 0.0102 | 0.0196 |
| N = 100  | 0.0069  | 0.0028 | 0.0041 | 0.0224 |
| N = 500  | 0.0031  | 0.0022 | 0.0034 | 0.0214 |
| N = 1000 | 0.0017  | 0.0010 | 0.0022 | 0.0195 |
| N = 2000 | 0.0007  | 0.0003 | 0.0020 | 0.0202 |
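A minimal, self-contained sketch of this Riemann-sum computation (the names epsilon, p_est, p_true are illustrative, not from the original script):

import numpy as np

def epsilon(p_est, p_true, dx=0.001):
    # Riemann-sum approximation of the integral of (p_n(x) - p(x))^2
    # over [-5, 5], sampled on a uniform grid with step dx
    return np.sum((p_est - p_true) ** 2) * dx

# The grid used throughout: x_i = i/1000 - 5, i = 0, ..., 9999
xs = np.arange(10000) / 1000 - 5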
  • Demonstrate the expectation and variance of \epsilon(p_n) w.r.t. different n and a.
    Averaging over the M tested (n, a) combinations:
    E[\varepsilon(p_n)] = \frac{1}{M}\sum_{j=1}^{M}\varepsilon_j = 0.02294

    Var[\varepsilon(p_n)] = \frac{1}{M}\sum_{j=1}^{M}\left(\varepsilon_j - E[\varepsilon(p_n)]\right)^2 = 0.00449

  • With n given, how should one choose the optimal a based on the empirical results above?

    With n fixed, one can choose a by comparing \varepsilon(p_n) across candidate values of a: since \varepsilon(p_n) measures how closely the estimated curve matches the true density, the a with the smallest \varepsilon(p_n) gives the best estimate. A minimal sketch of this search follows.
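A minimal sketch of that search, assuming a Gaussian Parzen estimate and the known mixture density (the names gauss_kde, epsilon, best_a are illustrative):

import numpy as np

def gauss_kde(samples, a, xs):
    # Parzen estimate with a Gaussian kernel of width a on the grid xs
    u = (xs[None, :] - samples[:, None]) / a
    return np.exp(-0.5 * u ** 2).mean(axis=0) / (a * np.sqrt(2 * np.pi))

def epsilon(p_est, p_true, dx):
    # Riemann-sum integrated squared error
    return np.sum((p_est - p_true) ** 2) * dx

xs = np.arange(10000) / 1000 - 5
p_true = (0.2 * np.exp(-0.5 * (xs + 1) ** 2) +
          0.8 * np.exp(-0.5 * (xs - 1) ** 2)) / np.sqrt(2 * np.pi)
samples = np.concatenate([np.random.normal(-1, 1, 20),
                          np.random.normal(1, 1, 80)])  # n = 100

# Choose the window width with the smallest integrated squared error
best_a = min([0.25, 0.5, 1, 2, 4],
             key=lambda a: epsilon(gauss_kde(samples, a, xs), p_true, 0.001))
print(best_a)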

  • Substitute the window function in (a) with a Gaussian window. Repeat (a)-(e).

    Using the Gaussian kernel, the estimated probability density functions are shown below.

[figure: gauss_kernel.png]

The values of \varepsilon(p_n) for different window widths and sample sizes are listed below.

|          | a = 0.5 | a = 1  | a = 2  | a = 4  |
|----------|---------|--------|--------|--------|
| N = 5    | 0.0597  | 0.0383 | 0.0483 | 0.0852 |
| N = 10   | 0.0780  | 0.0188 | 0.0070 | 0.0587 |
| N = 100  | 0.0181  | 0.0062 | 0.0033 | 0.0295 |
| N = 500  | 0.0031  | 0.0013 | 0.0028 | 0.0168 |
| N = 1000 | 0.0033  | 0.0017 | 0.0024 | 0.0119 |
| N = 2000 | 0.0016  | 0.0013 | 0.0021 | 0.0085 |

E[\varepsilon(p_n)] = \frac{1}{M}\sum_{j=1}^{M}\varepsilon_j = 0.021167

Var[\varepsilon(p_n)] = \frac{1}{M}\sum_{j=1}^{M}\left(\varepsilon_j - E[\varepsilon(p_n)]\right)^2 = 0.0006498

  • Try as many different window functions and parameters as you can. Which window function/parameter is the best one? Demonstrate it numerically.

    Using gauss_kernel, window_kernel, quartic_kernel, epanechnikov_kernel, and cosine_kernel with various window widths h and sample sizes n, the resulting plots are shown below.

    [figure: all.png]

    Their means and variances of \varepsilon(p_n) are as follows:

    | kernel              | E[\varepsilon] | Var[\varepsilon] |
    |---------------------|----------------|------------------|
    | gauss_kernel        | 0.02326597     | 0.00074471       |
    | window_kernel       | 0.04124473     | 0.00423351       |
    | quartic_kernel      | 0.03156898     | 0.00188394       |
    | epanechnikov_kernel | 0.03205083     | 0.00142668       |
    | cosine_kernel       | 0.03167592     | 0.00147622       |

    As the table shows, the rectangular window performs worst: both the mean and the variance of \varepsilon(p_n) are comparatively large. Since the samples come from a Gaussian mixture, the Gaussian kernel achieves the best fit, while the remaining three kernels perform similarly.

Code

import numpy as np
import matplotlib.pyplot as plt


def normFun(x, mu, sigma):
    # Gaussian density N(mu, sigma^2) evaluated at x
    pdf = np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    return pdf


def groundTruth():
    # True density 0.2*N(-1,1) + 0.8*N(1,1), evaluated on the same
    # uniform grid (x_i = i/1000 - 5) used by the Parzen estimators,
    # so the error sums below compare densities at identical points
    x = np.arange(10000) / 1000 - 5
    y = 0.2 * normFun(x, -1, 1) + 0.8 * normFun(x, 1, 1)
    return [x, y]


def getSample(n):
    # Draw n i.i.d. samples from the mixture 0.2*N(-1,1) + 0.8*N(1,1)
    x = np.concatenate((np.random.normal(-1, 1, int(0.2 * n)), np.random.normal(1, 1, int(0.8 * n))))[:, np.newaxis]
    x.sort()
    return x


def windowKernel(x, xi, h, n):
    # Box kernel: every sample within h/2 of x contributes 1/h
    result = 0
    add_one = 1 / h
    for i in range(n):
        if abs(x - xi[i]) <= h / 2:
            result += add_one
    return result / n


def gaussKernel(x, xi, h, n):
    result = 0
    # Shrinking bandwidth h_n = h / n^(1/5), the usual Parzen rule
    hn = h / n ** (1 / 5)
    for i in range(n):
        result += np.exp(-0.5 * (((x - xi[i]) / hn) ** 2))
    return (result / (n * (np.sqrt(2 * np.pi) * hn)))[0]


def parzenWindow(x, h, n):
    # Evaluate the box-kernel estimate on the grid x_i = i/1000 - 5
    # (the parzen* functions below follow the same pattern)
    result = []
    for i in range(10000):
        xi = i / 1000 - 5
        temp = windowKernel(xi, x, h, n)
        result.append([xi, temp])
    return result


def parzenGauss(x, h, n):
    result = []
    for i in range(10000):
        xi = i / 1000 - 5
        temp = gaussKernel(xi, x, h, n)
        result.append([xi, temp])
    return result



def EpanechnikovKernel(x, xi, h, n):
    result = 0
    for i in range(n):
        u = abs(x - xi[i]) / h
        if u <= 1:
            result += 3 / 4 * (1 - u ** 2)
    return float(result) / n / h


def parzenEpanechnikov(x, h, n):
    result = []
    for i in range(10000):
        xi = i / 1000 - 5
        temp = EpanechnikovKernel(xi, x, h, n)
        result.append([xi, temp])
    return result


def QuarticKernel(x, xi, h, n):
    result = 0
    for i in range(n):
        u = abs(x - xi[i]) / h
        if u <= 1:
            result += 15 / 16 * (1 - u ** 2) ** 2
    return float(result) / n / h


def parzenQuartic(x, h, n):
    result = []
    for i in range(10000):
        xi = i / 1000 - 5
        temp = QuarticKernel(xi, x, h, n)
        result.append([xi, temp])
    return result


def cosineKernel(x, xi, h, n):
    result = 0
    for i in range(n):
        u = abs(x - xi[i]) / h
        if u <= 1:
            result += np.pi / 4 * np.cos(np.pi / 2 * u)
    return float(result) / n / h


def parzenCosine(x, h, n):
    result = []
    for i in range(10000):
        xi = i / 1000 - 5
        temp = cosineKernel(xi, x, h, n)
        result.append([xi, temp])
    return result


if __name__ == '__main__':
    sample_nums = [5, 10, 100, 500, 1000, 2000]  # sample sizes, matching the tables above
    window_widths = [0.5, 1, 2, 4]
    plt.figure(figsize=(30, 25))
    infinity = groundTruth()
    sum_x = [[] for _ in range(5)]
    for row in range(len(sample_nums)):
        sample_num = sample_nums[row]
        samples = getSample(sample_num)
        for col in range(len(window_widths)):
            window_width = window_widths[col]
            print("n = ", sample_num, " window_width = ", window_width)
            seq_gauss = parzenGauss(samples, window_width, sample_num)
            seq_win = parzenWindow(samples, window_width, sample_num)
            seq_epan = parzenEpanechnikov(samples, window_width, sample_num)
            seq_quartic = parzenQuartic(samples, window_width, sample_num)
            seq_cosine = parzenCosine(samples, window_width, sample_num)

            # Riemann-sum integrated squared error against the ground truth
            temp = [infinity[1][i] - seq_gauss[i][1] for i in range(len(seq_gauss))]
            temp = np.dot(temp, np.transpose(temp)) * 0.001
            sum_x[0].append(temp)
            temp = [infinity[1][i] - seq_win[i][1] for i in range(len(seq_win))]
            temp = np.dot(temp, np.transpose(temp)) * 0.001
            sum_x[1].append(temp)
            temp = [infinity[1][i] - seq_quartic[i][1] for i in range(len(seq_quartic))]
            temp = np.dot(temp, np.transpose(temp)) * 0.001
            sum_x[2].append(temp)
            temp = [infinity[1][i] - seq_epan[i][1] for i in range(len(seq_epan))]
            temp = np.dot(temp, np.transpose(temp)) * 0.001
            sum_x[3].append(temp)
            temp = [infinity[1][i] - seq_cosine[i][1] for i in range(len(seq_cosine))]
            temp = np.dot(temp, np.transpose(temp)) * 0.001
            sum_x[4].append(temp)
            plt.subplot(len(sample_nums), len(window_widths), row * len(window_widths) + col + 1)
            plt.plot(infinity[0], infinity[1], c='r', label="ground_truth")

            plt.plot([seq_win[i][0] for i in range(len(seq_win))], [seq_win[i][1] for i in range(len(seq_win))], c='b',
                     label="window_kernel")
            plt.plot([seq_gauss[i][0] for i in range(len(seq_gauss))], [seq_gauss[i][1] for i in range(len(seq_gauss))],
                     c='g', label="gauss_kernel")
            plt.plot([seq_epan[i][0] for i in range(len(seq_epan))], [seq_epan[i][1] for i in range(len(seq_epan))],
                     c='m', label="epan_kernel")
            plt.plot([seq_cosine[i][0] for i in range(len(seq_cosine))],
                     [seq_cosine[i][1] for i in range(len(seq_cosine))],
                     c='y', label="cosine_kernel")
            plt.plot([seq_quartic[i][0] for i in range(len(seq_quartic))],
                     [seq_quartic[i][1] for i in range(len(seq_quartic))],
                     c='k', label="quartic_kernel")
            plt.legend()
            plt.title("n = " + str(sample_num) + " window_width = " + str(window_width))
    sum_x = np.array(sum_x)
    # Mean of epsilon over all (n, a) combinations, per kernel
    e = np.sum(sum_x, axis=1) / len(sum_x[0])
    print(e)
    e = np.reshape(e, (5, 1))
    # Variance of epsilon over all (n, a) combinations, per kernel
    d = np.sum((sum_x - e) ** 2, axis=1) / len(sum_x[0])
    print(d)

    plt.savefig("all.pdf")
    plt.savefig("all.svg")
    plt.savefig("all.png")

    plt.show()
