Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
Antos, András and Szepesvári, Csaba and Munos, Rémi (2008) Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71 (1). pp. 89-129.
Abstract
We consider the problem of finding a near-optimal policy in continuous space, discounted Markovian Decision Problems given the trajectory of some behaviour policy. We study the policy iteration algorithm where in successive iterations the action-value functions of the intermediate policies are obtained by picking a function from some fixed function set (chosen by the user) that minimizes an unbiased finite-sample approximation to a novel loss function that upper-bounds the unmodified Bellman-residual criterion. The main result is a finite-sample, high-probability bound on the performance of the resulting policy that depends on the mixing rate of the trajectory, the capacity of the function set as measured by a novel capacity concept that we call the VC-crossing dimension, the approximation power of the function set and the discounted-average concentrability of the future-state distribution. To the best of our knowledge this is the first theoretical reinforcement learning result for off-policy control learning over continuous state-spaces using a single trajectory.
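As background for the abstract's mention of the "unmodified Bellman-residual criterion", the quantity can be written in standard notation as sketched below. The symbols here ($T^\pi$, $\nu$, $\gamma$, $r$, $P$) are assumed conventions and do not come from this record; the exact form of the modified loss is given in the paper itself.

```latex
% Assumed notation (not from this record):
%   r(x,a)        expected immediate reward
%   P(. | x,a)    transition kernel
%   \gamma        discount factor, 0 <= \gamma < 1
%   \nu           distribution over state-action pairs used to weight the norm
%   (X_t, A_t, R_t, X_{t+1})  one transition along the observed trajectory

% Bellman operator of a policy \pi applied to an action-value function Q
(T^\pi Q)(x,a) = r(x,a) + \gamma \int Q\bigl(y,\pi(y)\bigr)\, P(dy \mid x,a)

% Unmodified Bellman-residual criterion
L_{\mathrm{BR}}(Q;\pi) = \bigl\| Q - T^\pi Q \bigr\|_{\nu}^{2}

% Conditional expectation of the naive one-step squared residual:
% squared Bellman residual plus a variance term that depends on Q
\mathbb{E}\Bigl[\bigl(Q(X_t,A_t) - R_t - \gamma\, Q(X_{t+1},\pi(X_{t+1}))\bigr)^{2} \Bigm| X_t, A_t\Bigr]
  = \bigl(Q(X_t,A_t) - (T^\pi Q)(X_t,A_t)\bigr)^{2}
  + \operatorname{Var}\bigl[R_t + \gamma\, Q(X_{t+1},\pi(X_{t+1})) \bigm| X_t, A_t\bigr]
```

Because the variance term depends on $Q$, minimizing the naive one-step squared residual along a trajectory is in general not equivalent to minimizing $L_{\mathrm{BR}}$. This is the usual motivation for working with a modified loss whose finite-sample approximation is unbiased, as the abstract describes.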
Item Type: | ISI Article |
---|---|
Uncontrolled Keywords: | Reinforcement learning, policy iteration, Bellman-residual minimization, off-policy learning, nonparametric regression, least-squares regression, finite-sample bounds |
Subjects: | Q Science > QA Mathematics and Computer Science > QA75 Electronic computers. Computer science |
Depositing User: | Eszter Nagy |
Date Deposited: | 11 Dec 2012 15:26 |
Last Modified: | 11 Dec 2012 15:26 |
URI: | https://eprints.sztaki.hu/id/eprint/4403 |