bandit-notes of combinatorial bandits 2011

I want to write my thoughts of the paper [Combinatorial Bandits] by Nicolo Cesa-Bianchi and Gabor Lugosi in 2011. The first author is a great professor in this area. His paper of [Finite-time analysis of the multi-armed bandit problem] in 2002 as second author has been cited over 1,500 times. The survey which I am reading about bandit problem, [Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems], is written by him as second author.

Before I read this paper, I thought linear bandits is just a small variant of normal bandit problem and combinatorial bandit problem is a higher-level generalization of bandit problem. But combinatorial bandit problem is indeed a special case of linear bandits. To see this, represent one super-arm by a 0-1 sequence of length K, using 1 to denote base arm belong to super-arm and 0 otherwise.

In linear bandit problem setting, the set of bandits should be a compact set of euclidean space R^d. At each time t, the adversarial gives a loss vector l_t from R^d and the loss revealed to the forecaster is the inner product of played arm, which is a vector of R^d, and loss vector l_t. For combinatorial bandit problem, super arm is a combination of some base arms and the loss of one super arm is the sum of all losses of base arms belonging to it. So if we use 0-1 sequence to denote one super arm and represent loss vector by all losses of base arms, then the loss of super arm, the sum of base arms belonging to it, is the inner product of 0-1 sequence and loss vectors. That's why combinatorial is one special case of linear bandits.

In this paper, it introduces an algorithm of ComBand and provides a lot of good applications using it.

In the step 4, it use probability p_t to construct a matrix P where probability is a combination of q, depending on the loss, and a random distribution \mu on S, set of all super arms. Probability q is used for exploitation and \mu is for exploration. So probability is responsible for trade-off between exploration and exploitation.

Then the author use it to solve multitask bandit problem, hypercube where S ={0,1}^d is all possible combinations of base arms, graph problems with "symmetric" S which includes perfect matchings/spanning trees/cut sets/hamiltonian cycles/stars, m-sized subsets where S is all combinations of m base arms and path planning where S denotes all possible paths.

I think this paper is great not only because it provides a great applications which has wide influence in reality, especially those in graph problems. He has a solid background in computer science which reflects in translating those problems in terms of combinatorial bandit language and also the detailed solving methods. 

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容

  • 时间8:30-9:10 书名《心理学史》 书页203-222 感想: 美国行为主义的产生和发展依附于美国资本...
    不要多愁善感阅读 238评论 0 0
  • 从小到大我一直是个固守成规的人,说白了,老实人一个,老师让干嘛干嘛,说往东绝不敢往西的那种人。 小学的时候,压根...
    曲奇_52阅读 265评论 0 0
  • 最近一段日子属于人生的低谷,而我的基本判断是:这样的低谷估计还会继续持续一段时间,在这段岁月里自己的心智的确是被狠...
    雕兄_KYP阅读 471评论 2 0
  • 作者:雨落安稳 一生一代一双人,争教两处销魂。相思相望不相亲,天为谁春?浆向蓝桥易乞,药成碧海难奔。若...
    雨落安稳阅读 450评论 0 1
  • 阵阵惊雷迎骤雨,东风吹散南风。依栏独坐小楼中。海棠谢了,零落影无踪。 雀儿年年栖老树,年年复看颜红。怎知花与去时同...
    七夜迷离阅读 233评论 0 0