[Notes]Lecture14 Stochastic Multi-armed Bandit by Haipeng Luo

Article link: http://www-bcf.usc.edu/~haipengl/courses/CSCI699/lecture14.pdf

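These lecture notes introduce the basic concepts of stochastic MAB. They contain many formulas and proofs, so they are a useful reference for working through the details from a mathematical point of view.
Part 1 covers the basic concepts of stochastic MAB and explains pseudo-regret.
Part 2 (Section 2, First Attempt: Explore-then-exploit) presents a basic idea and derives a bound for it.
Part 3 (Section 3, The UCB Algorithm): what the notes actually discuss is a lower bound, with an idea symmetric to UCB.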

The content overlaps heavily with several pieces I have already read, so only partial excerpts follow:

Stochastic Multi-armed Bandit

Pseudo-regret

Pseudo-regret is the expected regret against the fixed action $a^*$ (instead of the empirically best action), where the expectation is over the randomness of both the environment and the algorithm.

Pseudo-regret can be simplified as:

[Image: simplified pseudo-regret]
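
For reference, here is a sketch of the standard form of this simplification, written with the symbols listed under Symbols. The names $\bar{R}_T$ (pseudo-regret after $T$ rounds), $a_t$ (the action chosen in round $t$) and $n_T(a)$ (the number of pulls of action $a$ up to round $T$) are assumptions introduced here and may differ from the lecture's notation.

$$
\begin{aligned}
\bar{R}_T &= \mathbb{E}\Big[\sum_{t=1}^{T} l_t(a_t)\Big] - T\,\mu(a^*) \\
          &= \mathbb{E}\Big[\sum_{t=1}^{T} \big(\mu(a_t) - \mu(a^*)\big)\Big] \\
          &= \sum_{a} \Delta_a \, \mathbb{E}\big[n_T(a)\big]
\end{aligned}
$$

The second line holds because $a_t$ is chosen before $l_t$ is drawn, so $\mathbb{E}[l_t(a_t)] = \mathbb{E}[\mu(a_t)]$. The third line groups the rounds by which action was pulled.
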
The pseudo-regret of UCB is bounded as:

[Image: pseudo-regret bound]

Symbols

$a$ : an action
$D_a$ : distribution of the loss of action $a$
$l_1(a),\dots,l_T(a)$ : independent samples of $D_a$
$a^* = \arg\min_a \mu(a)$ : optimal action in terms of the expected loss
$\arg\min_a \hat{\mu}(a)$ : empirically best action
$\Delta_a = \mu(a)-\mu(a^*)$ : the suboptimality gap of action $a$
The number of times action $a$ has been pulled up to round $t$
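
To make these symbols concrete, the sketch below simulates a loss-based UCB variant on Bernoulli arms. It then evaluates $\sum_a \Delta_a \cdot n_T(a)$, whose expectation is the pseudo-regret in the standard decomposition. This is a common textbook variant, not necessarily the exact algorithm or constants from the lecture. The confidence width $\sqrt{2\ln t / n}$, the Bernoulli losses, and the names `run_ucb`, `pseudo_regret_from_counts` and `n_T(a)` are all assumptions made for illustration.

```python
import math
import random


def run_ucb(means, horizon, seed=0):
    """Run a loss-based UCB variant on Bernoulli arms; return pull counts.

    means[a] plays the role of mu(a), the expected loss of action a.
    The confidence width sqrt(2 ln t / n) is an assumed textbook choice.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k        # number of pulls of each action so far
    loss_sums = [0.0] * k   # cumulative observed loss of each action

    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1       # pull every action once to initialise
        else:
            # Optimism for losses: empirical mean minus a confidence width,
            # then pick the action with the smallest index.
            a = min(
                range(k),
                key=lambda i: loss_sums[i] / counts[i]
                - math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        loss = 1.0 if rng.random() < means[a] else 0.0  # sample l_t(a) ~ D_a
        counts[a] += 1
        loss_sums[a] += loss
    return counts


def pseudo_regret_from_counts(means, counts):
    """Evaluate sum_a Delta_a * n_T(a) for one run."""
    best = min(means)       # mu(a*)
    return sum((m - best) * n for m, n in zip(means, counts))


if __name__ == "__main__":
    means = [0.3, 0.5, 0.7]  # hypothetical expected losses
    horizon = 5000
    runs = 20
    avg = sum(
        pseudo_regret_from_counts(means, run_ucb(means, horizon, seed=s))
        for s in range(runs)
    ) / runs
    print(f"average of sum_a Delta_a * n_T(a) over {runs} runs: {avg:.1f}")
```

Averaging over several seeds approximates the expectation over the randomness of the environment. The algorithm itself is deterministic once the losses are fixed.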
