Paper Summary4: Projection-free Online Learning

0 Paper

Hazan, Elad, and Satyen Kale. "Projection-free online learning." arXiv preprint arXiv:1206.4657 (2012).

1 Key contribution

Providing a projection free algorithm for online learning that has the following advantages:
(1) computationally efficient
(2) better regret bounds for cases such as stochastic online smooth convex optimization
(3) parameter-free in the stochastic case and produce sparse decisions

2 Motivation

Assuming it is possible to do
linear optimization over the convex domain efficiently, an algorithm with sublinear for online convex optimization can be achieved via utilizing an online version of the classic Frank-Wolfe algorithm

Frank-Wolfe algorithm from wiki

3 The proposed method

(1) Preliminaries
Diameter of set $\kappa$ is $D$ and $f$ is $L$ -Lipchitz

Beta-smooth and Sigma strongly convex

Smoothed function

$\kappa -Sparsity$

kappa sparsity

(2) main claim

main claim

(3) Algorithm

OFW

4 Some thoughts

When we propose an algorithm, we may consider the following aspects:
(1) Effective
(2) Parameter-less or parameter-free
(3) Computationally feasible
(4) Sample efficient

First of all, an algorithm should be effective in solving the target problem. Effectiveness is the fundamental requirement. Otherwise, the proposed algorithm is nothing but useless. Secondly, it is advantageous if the algorithm is parameter-less or parameter-free. An algorithm with many parameters may be good in terms of being flexible with solving different questions. However, tuning the parameter usually requires expert knowledge, which is not user-friendly. A less experienced practitioner may end up choosing the wrong parameter, which will significantly affect the performance of the algorithm. Therefore, an algorithm with parameters should be equipped with the guidance of parameter selection. Thirdly, an algorithm should be computationally feasible. Otherwise, one cannot even use such an algorithm. However, feasible computation is just the basic requirement. A faster algorithm even with just a slightly worse performance is favorable. This paper mainly focuses on the computation aspect and the proposed algorithm not only faster but also has a good performance in solving the problem. Finally, it is favorable if the algorithm is sample efficient.