Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path

Antos, András and Szepesvári, Csaba and Munos, Rémi (2006) Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Lecture Notes in Computer Science, 4005. pp. 574-588.

Abstract

Appeared in: Gábor Lugosi, Hans-Ulrich Simon (Eds.): COLT 2006, 19th Annual Conference on Learning Theory, Pittsburgh, 2006. Berlin: Springer, 2006.

We consider batch reinforcement learning problems in continuous-space, expected total discounted-reward Markovian Decision Problems. As opposed to previous theoretical work, we consider the case when the training data consists of a single sample path (trajectory) of some behaviour policy. In particular, we do not assume access to a generative model of the environment. The algorithm studied is fitted Q-iteration, where in successive iterations the $Q$-functions of the intermediate policies are obtained by minimizing a novel Bellman-residual-type error. PAC-style polynomial bounds are derived on the number of samples needed to guarantee near-optimal performance; the bound depends on the mixing rate of the trajectory, the smoothness properties of the underlying Markovian Decision Problem, and the approximation power and capacity of the function set used.
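The paper analyses a modified Bellman-residual minimization criterion whose details are beyond this record; as a rough illustration of the fitted Q-iteration loop the abstract refers to, here is a minimal sketch of the standard regression-based variant with a linear function class over a finite action set, trained on transitions from a single trajectory. The function name, the feature map phi, and the ridge term are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def fitted_q_iteration(trajectory, n_actions, phi, gamma=0.99, n_iters=50):
        """Sketch of fitted Q-iteration from a single sample path.

        trajectory: list of (state, action, reward, next_state) transitions
                    collected by some behaviour policy (assumed format).
        phi:        state feature map, state -> np.ndarray of length d.
        Returns a weight matrix W of shape (n_actions, d) so that
        Q(s, a) is approximated by W[a] @ phi(s).
        """
        states, actions, rewards, next_states = zip(*trajectory)
        X = np.array([phi(s) for s in states])         # (n, d) state features
        Xn = np.array([phi(s) for s in next_states])   # (n, d) next-state features
        a = np.array(actions)
        r = np.array(rewards)
        d = X.shape[1]
        W = np.zeros((n_actions, d))                   # start from Q_0 = 0

        for _ in range(n_iters):
            # Regression targets: r + gamma * max_a' Q_k(s', a')
            y = r + gamma * (Xn @ W.T).max(axis=1)
            # Fit one linear model per action on the transitions that used it
            W_new = np.zeros_like(W)
            for b in range(n_actions):
                mask = (a == b)
                if mask.any():
                    # Ridge-regularised least squares for numerical stability
                    A = X[mask].T @ X[mask] + 1e-3 * np.eye(d)
                    W_new[b] = np.linalg.solve(A, X[mask].T @ y[mask])
            W = W_new
        return W

After training, a greedy policy can be read off as int(np.argmax(W @ phi(s))). Note that this plain regression update is only a stand-in: the paper's contribution is precisely that it replaces this step with a Bellman-residual-type minimization and proves PAC-style bounds under mixing assumptions on the trajectory.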

Item Type: ISI Article
Subjects: Q Science > QA Mathematics and Computer Science > QA75 Electronic computers. Computer science
Divisions: Informatics Laboratory
Depositing User: Eszter Nagy
Date Deposited: 11 Dec 2012 15:27
Last Modified: 11 Dec 2012 15:27
URI: https://eprints.sztaki.hu/id/eprint/4672