Active learning in multi-armed bandits

Antos, András and Grover, Varun and Szepesvári, Csaba (2008) Active learning in multi-armed bandits. In: ALT 2008. 19th International Conference on Algorithmic Learning Theory. Budapest, 2008. (Lecture Notes in Artificial Intelligence 5254).

In this paper we consider the problem of actively learning the mean values of distributions associated with a finite number of options (arms). The algorithm can choose which option generates the next sample, with the goal of producing estimates that are equally precise for all the distributions. When an algorithm uses sample means to estimate the unknown values, the optimal solution, assuming full knowledge of the distributions, is to sample each option in proportion to its variance. We propose an incremental algorithm that asymptotically achieves the same loss as an optimal rule. We prove that the excess loss suffered by this algorithm, apart from logarithmic factors, scales as $n^{-3/2}$, which we conjecture to be the optimal rate. The performance of the algorithm is illustrated on a simple problem.
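
The allocation idea in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the algorithm analysed in the paper: the function names `optimal_allocation` and `greedy_allocation`, the two initial pulls per arm, and the `sqrt(t) + 1` forced-sampling threshold are all assumptions made here for the example. The first function computes the variance-proportional budget split that the abstract describes for known variances; the second is one plausible incremental rule that replaces the unknown variances with running estimates.

```python
import math
import random


def optimal_allocation(variances, n):
    """Split a budget of n samples in proportion to known variances."""
    total = sum(variances)
    return [n * v / total for v in variances]


def greedy_allocation(samplers, n, seed=0):
    """Incrementally allocate n samples using estimated variances.

    `samplers` is a list of callables taking a random.Random and
    returning one sample. Hypothetical sketch: requires n >= 2 * len(samplers).
    """
    rng = random.Random(seed)
    k = len(samplers)
    counts = [0] * k
    sums = [0.0] * k
    sq_sums = [0.0] * k

    def pull(i):
        x = samplers[i](rng)
        counts[i] += 1
        sums[i] += x
        sq_sums[i] += x * x

    def var_hat(i):
        # Plug-in variance estimate from running sums, clipped at zero.
        mean = sums[i] / counts[i]
        return max(sq_sums[i] / counts[i] - mean * mean, 0.0)

    # Assumed initialisation: two pulls per arm so a variance estimate exists.
    for i in range(k):
        pull(i)
        pull(i)

    for t in range(2 * k, n):
        # Assumed safeguard: never let an arm fall far behind sqrt(t) pulls,
        # so a poor early variance estimate cannot starve it.
        starved = [i for i in range(k) if counts[i] < math.sqrt(t) + 1]
        if starved:
            i = min(starved, key=lambda j: counts[j])
        else:
            # Pull the arm whose sample mean currently has the largest
            # estimated squared error, var_hat / count.
            i = max(range(k), key=lambda j: var_hat(j) / counts[j])
        pull(i)

    means = [sums[i] / counts[i] for i in range(k)]
    return counts, means
```

For example, `optimal_allocation([1.0, 4.0], 100)` assigns 20 samples to the first arm and 80 to the second. Calling `greedy_allocation` with two Gaussian samplers of standard deviation 1 and 2 should, for a large enough budget, send most pulls to the noisier arm, though the exact counts depend on the seed.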

Item Type: Conference or Workshop Item (Paper)
Subjects: Q Science > QA Mathematics and Computer Science > QA75 Electronic computers. Computer science
Depositing User: Eszter Nagy
Date Deposited: 11 Dec 2012 15:29
Last Modified: 11 Dec 2012 15:29
