9. 二叉搜索树

Tree insert: searchs for that element A[i]
Tree walk:

Theorem:

E[height of a randomly built binary search tree] = O(lg n)

Proof outline

  1. Prove Jensen's inequality:
    f(E[X]) \leq E[f(X)] for convex function f
  2. Instead of analyzing X_n = the random variable of the height of a randomly constructed BST on n nodes, we analize Y_n = 2^{X_n}
  3. Prove that E[Y_n] = O(n^3)
  4. Conclude that
    2^{E[X_n]} \leq E[2^{X_n}] = E[Y_n] = O(n^3)
    ⇒ E[X_n] \leq lg(O(n^3)) = 3lg(n) + O(1)

E[X_n] \approx 2.9882lg\,n (by Luke Devroy, 1986)

Proof 1.Jensen's inequality

f: \mathbb{R} \rightarrow \mathbb{R} is convex if for all x,y \in \mathbb{R} and all \alpha, \beta \geq 0 with \alpha + \beta = 1, f(\alpha x + \beta y) \leq \alpha f(x) + \beta f(y)

vector and convex function

Lemma

if \mathbb{R} \rightarrow \mathbb{R} is convex & x_1, x_2, ... , x_n \in \mathbb{R} & \alpha_1, \alpha_2, ... , \alpha_n \geq 0 & \sum^n_{k=1}{x_k} = 1,
then f \left( \sum_{k=1}^n{\alpha_kx_k} \right) \leq \sum_{k=1}^n{\alpha_kf(x_k)}

灰色区域某一点意味着不等号右边

Induction
Base Case: When n = 1,
\alpha_1 = 1
, 不等式
f \left( \sum_{k=1}^n{\alpha_kx_k} \right) \leq \sum_{k=1}^n{\alpha_kf(x_k)}
成立。
Induction Step:
f \left( \sum_{k=1}^n{\alpha_kx_k} \right)

= f \left( \alpha_nx_n +(1 - \alpha_n) \sum^{n-1}_{k=1}{\frac{\alpha_k}{1-\alpha_n}x_k} \right)

\leq \alpha_nf(x_n) + (1 - \alpha_n)f\left( \sum^{n-1}_{k=1}{\frac{\alpha_k}{1-\alpha_n}x_k} \right)

= \alpha_nf(x_n) + f \left( \sum^{n-1}_{k=1}{\alpha_kx_k} \right) = \sum_{k=1}^n{\alpha_kf(x_k)}

Jensen's inequality

f(E[X]) \leq E[f(X)] (f is a convex and X is a integer random variable)
proof:
f(E[X]) = f \left(\sum^{∞}_{-∞}{x * Pr\{X = x\}} \right)
\leq \sum^{∞}_{-∞}{Pro\{X = x\} * f(x)} = E[f(X)]

Proof 2. Analysis Yn

Xn = the random variable of the height of a randomly constructed BST on n nodes. Yn = 2 Xn

If root r has rank k, then Xn = 1 + max{Xk-1, Xn-k}, Yn = 2 * max{Yk-1, Yn-k}

define indicator r.v.'s
Z_{nk}=\begin{cases} 1, rank(root) = k\\ 0, otherwise \end{cases}
Pr{Znk = 1} = E[Znk] = 1/n
Y_n = \sum^n_{k = 1}{Z_{nk}(2max\{Y_{k-1}, Y_{n-k}\})}

Proof 3. Analysis E[Yn]

E[Y_n] = E \left[ \sum^n_{k = 1}{Z_{nk}(2max\{Y_{k-1}, Y_{n-k}\})} \right]
= \sum^n_{k = 1}{E[Z_{nk}(2max\{Y_{k-1}, Y_{n-k}\})]}
= 2\sum^n_{k=1}{E[Z_{nk}]E[max\{Y_{k-1}, Y_{n-k}\}]}
= \frac{2}{n} \sum^n_{k=1}{E[max\{Y_{k-1}, Y_{n-k}\}]}
\leq \frac{2}{n} \sum^n_{k=1}{E[Y_{k-1} + Y_{n-k}]}
= \frac{4}{n} \sum^{n-1}_{k=0}{E[Y_k]}
Then, we are going to use substitution.
Claim: E[X_n] \leq cn^3
Proof:
The base case must be true when c is sufficiently large.
Induction Step: E[Y_n] \leq \frac{4}{n} \sum^{n-1}_{k=0}{E[Y_k]} \leq \frac{4}{n} \sum^{n-1}_{k=0}{ck^3} \leq \frac{4c}{n} \int_0^n {x^3}\,{\rm d}x = cn^3

Proof 4.

2^{E[X_n]} \leq E[2^{X_n}] = E[Y_n] = O(n^3) (简单地使用了Jensen's inequality)
⇒ E[X_n] \leq lg(O(n^3)) = 3lg(n) + O(1)( lg(O(n3)) = lg(cn3) = 3lg(n) + lg(c) )

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
【社区内容提示】社区部分内容疑似由AI辅助生成,浏览时请结合常识与多方信息审慎甄别。
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

相关阅读更多精彩内容

  • 有人征询有没有好的谈涨薪的方法, 我随口答道: “老板, 我老婆最近总跟我吵架, 说我工资太低, 比同行的同学低不...
    桑迪大王阅读 2,695评论 0 0
  • 儿行千里母担忧,还好有手机通讯,更有着老师们的悉心照顾,离开父母的日子依旧那么美好!
    杨梅飘香阅读 1,040评论 0 0
  • 我发现我还是那个淡定拖沓的人,几天假期什么都没干,由于今天下午的火车,我才花了一上午的时间把这本书看完了,...
    笨小孩奋斗1992阅读 1,585评论 0 1
  • 近来身边的亲人都遇到身体上的不适,到医院看望后,深深感觉健康真的是最可贵的,不是花钱就可以拥有的,如果没有好好照顾...
    自由的花园阅读 1,594评论 7 1

友情链接更多精彩内容