Periodica Mathematica Hungarica
Authors: Peter Auer and Ronald Ortner
Abstract
In the stochastic multi-armed bandit problem, we consider a modification of the UCB algorithm of Auer et al. [4]. For this modified algorithm we give an improved bound on the regret with respect to the optimal reward. For the original UCB algorithm, the regret in $K$-armed bandits after $T$ trials is bounded by $\mathrm{const}\cdot\frac{K\log(T)}{\Delta}$, where $\Delta$ is the gap between the expected reward of a suboptimal arm and that of the optimal arm. For the modified UCB algorithm, we show an upper bound on the regret of $\mathrm{const}\cdot\frac{K\log(T\Delta^2)}{\Delta}$.
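For orientation, here is a minimal Python sketch of the standard UCB1 index rule, together with the two constant-free bound expressions quoted in the abstract. We assume that UCB1 is the "original UCB algorithm" of [4]. The Bernoulli reward model, the function names `ucb1` and `bound_shapes`, and the test values are illustrative assumptions. The sketch does not implement the modified algorithm, because the abstract does not specify it.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Simulate the UCB1 index policy on Bernoulli arms (an assumption of this sketch).

    means   -- expected reward of each arm (hypothetical test values)
    horizon -- number of trials T
    Returns (pull count per arm, pseudo-regret after T trials).
    """
    k = len(means)
    if horizon < k:
        raise ValueError("horizon must allow every arm to be played once")
    rng = random.Random(seed)
    counts = [0] * k
    sums = [0.0] * k

    def pull(i):
        # Bernoulli reward: 1 with probability means[i], else 0.
        sums[i] += 1.0 if rng.random() < means[i] else 0.0
        counts[i] += 1

    # Initialization: play every arm once so each empirical mean is defined.
    for i in range(k):
        pull(i)

    for n in range(k, horizon):
        # n = number of plays so far; index = empirical mean + sqrt(2 ln n / n_i).
        index = [sums[i] / counts[i] + math.sqrt(2.0 * math.log(n) / counts[i])
                 for i in range(k)]
        pull(max(range(k), key=index.__getitem__))

    # Pseudo-regret: expected reward lost versus always playing the best arm.
    best = max(means)
    regret = sum(c * (best - m) for c, m in zip(counts, means))
    return counts, regret


def bound_shapes(k, horizon, gap):
    """Constant-free regret bound shapes from the abstract (gap stands in for Delta).

    Returns (K log(T) / gap, K log(T gap^2) / gap); the second is only
    positive when T * gap**2 > 1.
    """
    return (k * math.log(horizon) / gap,
            k * math.log(horizon * gap ** 2) / gap)


counts, regret = ucb1([0.9, 0.5], horizon=2000)
print(counts, round(regret, 1))
print(bound_shapes(k=2, horizon=10000, gap=0.1))
```

With $K = 2$, $T = 10000$ and $\Delta = 0.1$, `bound_shapes` gives about 184.2 for the original expression and 92.1 for the modified one. Because $\log(T\Delta^2) = \log T - 2\log(1/\Delta)$, the relative gain grows as $\Delta$ shrinks, as long as $T\Delta^2 > 1$.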