Search Results

You are looking at 1 - 1 of 1 items for

  • Author or Editor: Ronald Ortner x
  • Refine by Access: All Content x
Clear All Modify Search

Abstract  

In the stochastic multi-armed bandit problem we consider a modification of the UCB algorithm of Auer et al. [4]. For this modified algorithm we give an improved bound on the regret with respect to the optimal reward. While for the original UCB algorithm the regret in K-armed bandits after T trials is bounded by const ·
\documentclass{aastex} \usepackage{amsbsy} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{bm} \usepackage{mathrsfs} \usepackage{pifont} \usepackage{stmaryrd} \usepackage{textcomp} \usepackage{upgreek} \usepackage{portland,xspace} \usepackage{amsmath,amsxtra} \pagestyle{empty} \DeclareMathSizes{10}{9}{7}{6} \begin{document} $$\frac{{K\log (T)}} {\Delta }$$ \end{document}
, where Δ measures the distance between a suboptimal arm and the optimal arm, for the modified UCB algorithm we show an upper bound on the regret of const ·
\documentclass{aastex} \usepackage{amsbsy} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{bm} \usepackage{mathrsfs} \usepackage{pifont} \usepackage{stmaryrd} \usepackage{textcomp} \usepackage{upgreek} \usepackage{portland,xspace} \usepackage{amsmath,amsxtra} \pagestyle{empty} \DeclareMathSizes{10}{9}{7}{6} \begin{document} $$\frac{{K\log (T\Delta ^2 )}} {\Delta }$$ \end{document}
.
Restricted access