Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path

Antos, András and Szepesvári, Csaba and Munos, Rémi (2008) Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71 (1). pp. 89-129.

Abstract

We consider the problem of finding a near-optimal policy in continuous space, discounted Markovian Decision Problems given the trajectory of some behaviour policy. We study the policy iteration algorithm where in successive iterations the action-value functions of the intermediate policies are obtained by picking a function from some fixed function set (chosen by the user) that minimizes an unbiased finite-sample approximation to a novel loss function that upper-bounds the unmodified Bellman-residual criterion. The main result is a finite-sample, high-probability bound on the performance of the resulting policy that depends on the mixing rate of the trajectory, the capacity of the function set as measured by a novel capacity concept that we call the VC-crossing dimension, the approximation power of the function set and the discounted-average concentrability of the future-state distribution. To the best of our knowledge this is the first theoretical reinforcement learning result for off-policy control learning over continuous state-spaces using a single trajectory.
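
For orientation, the "unmodified Bellman-residual criterion" named in the abstract is commonly written as below. This is a sketch in standard notation rather than text taken from this record; all symbols are assumptions introduced here: \gamma is the discount factor, r the expected immediate reward, P the transition kernel, \nu a weighting distribution over state-action pairs, and \mathcal{F} the user-chosen function set.

```latex
% Bellman operator of a fixed policy \pi acting on an action-value function Q
(T^{\pi} Q)(x,a) = r(x,a) + \gamma \int Q\bigl(y, \pi(y)\bigr)\, P(dy \mid x, a)

% Unmodified Bellman-residual criterion, to be minimized over Q \in \mathcal{F}
L(Q;\pi) = \bigl\| Q - T^{\pi} Q \bigr\|_{\nu}^{2}
         = \int \bigl( Q(x,a) - (T^{\pi} Q)(x,a) \bigr)^{2} \, \nu(dx, da)
```

With stochastic transitions and only one observed next state per visited state-action pair, the naive empirical version of this criterion (the mean of squared one-step temporal-difference errors) is in general a biased estimate of L(Q;\pi). This is consistent with the abstract's emphasis on a loss function that admits an unbiased finite-sample approximation.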

Item Type: ISI Article
Uncontrolled Keywords: Reinforcement learning, policy iteration, Bellman-residual minimization, off-policy learning, nonparametric regression, least-squares regression, finite-sample bounds
Subjects: Q Science > QA Mathematics and Computer Science > QA75 Electronic computers. Computer science
Depositing User: Eszter Nagy
Date Deposited: 11 Dec 2012 15:26
Last Modified: 11 Dec 2012 15:26
URI: https://eprints.sztaki.hu/id/eprint/4403
