# UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem

Periodica Mathematica Hungarica
Authors: Peter Auer and Ronald Ortner

## Abstract

In the stochastic multi-armed bandit problem we consider a modification of the UCB algorithm of Auer et al. For this modified algorithm we give an improved bound on the regret with respect to the optimal reward. While for the original UCB algorithm the regret in K-armed bandits after T trials is bounded by const · $\frac{K\log(T)}{\Delta}$, where Δ measures the distance between a suboptimal arm and the optimal arm, for the modified UCB algorithm we show an upper bound on the regret of const · $\frac{K\log(T\Delta^2)}{\Delta}$.
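
To get a feel for how the two bounds in the abstract differ, the following minimal sketch evaluates both expressions numerically. It is an illustration only, under several assumptions not stated in the abstract: the unspecified constant factors are dropped (and may differ between the two bounds), every suboptimal arm is taken to have the same gap Δ, and the function names `ucb_bound_shape` and `improved_bound_shape` are hypothetical.

```python
import math


def ucb_bound_shape(K, T, delta):
    """Shape of the original bound, K * log(T) / delta (constant omitted)."""
    return K * math.log(T) / delta


def improved_bound_shape(K, T, delta):
    """Shape of the improved bound, K * log(T * delta**2) / delta (constant omitted)."""
    # Assumption for this sketch: only evaluate where the log term is positive.
    if T * delta ** 2 <= 1:
        raise ValueError("T * delta**2 must exceed 1 for the log term to be positive")
    return K * math.log(T * delta ** 2) / delta


if __name__ == "__main__":
    K, T = 10, 100_000  # illustrative values, not taken from the paper
    for delta in (0.5, 0.1, 0.02):
        original = ucb_bound_shape(K, T, delta)
        improved = improved_bound_shape(K, T, delta)
        print(
            f"delta={delta}: original={original:.1f}, "
            f"improved={improved:.1f}, ratio={improved / original:.2f}"
        )
```

Because log(TΔ²) = log(T) + 2 log(Δ), the improved expression is smaller whenever Δ < 1, and the difference is most pronounced when Δ is small relative to 1/√T. The printed ratio is only indicative, since the hidden constants are ignored.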