Robust risk-averse stochastic multi-armed bandits

OA Maillard - … Learning Theory: 24th International Conference, ALT …, 2013 - Springer
Algorithmic Learning Theory: 24th International Conference, ALT 2013 …, 2013Springer
We study a variant of the standard stochastic multi-armed bandit problem when one is not
interested in the arm with the best mean, but instead in the arm maximizing some coherent
risk measure criterion. Further, we are studying the deviations of the regret instead of the
less informative expected regret. We provide an algorithm, called RA-UCB to solve this
problem, together with a high probability bound on its regret.
Abstract
We study a variant of the standard stochastic multi-armed bandit problem when one is not interested in the arm with the best mean, but instead in the arm maximizing some coherent risk measure criterion. Further, we are studying the deviations of the regret instead of the less informative expected regret. We provide an algorithm, called RA-UCB to solve this problem, together with a high probability bound on its regret.
Springer