Fitted Q-iteration in continuous action-space MDPs
Antos, András and Munos, Rémi and Szepesvári, Csaba (2007) Fitted Q-iteration in continuous action-space MDPs. In: NIPS 2007. Proceedings of the 21st Annual Conference on Neural Information Processing Systems, Vancouver, 2007.
Full text not available from this repository.

Abstract
We consider continuous state, continuous action batch reinforcement learning, where the goal is to learn a good policy from a sufficiently rich trajectory generated by another policy. We study a variant of fitted Q-iteration, where the greedy action selection is replaced by searching for a policy in a restricted set of candidate policies by maximizing the average action values. We provide a rigorous theoretical analysis of this algorithm, proving what we believe are the first finite-time bounds for value-function-based algorithms for continuous state- and action-space problems.
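The variant described in the abstract replaces the usual greedy maximization over actions with a search over a restricted set of candidate policies, picking the one with the highest average action value on the batch. Below is a minimal sketch of that idea, not the paper's own implementation: it assumes a batch of transitions from some behaviour policy, an ensemble-of-trees regressor for Q (the paper does not prescribe one), and a small finite list of candidate policies standing in for the restricted policy class; all names (`fitted_q_iteration`, `candidate_policies`) are illustrative.

```python
# Sketch of fitted Q-iteration with policy-search action selection.
# Assumptions (not from the paper): sklearn ExtraTreesRegressor for the
# regression step, and a finite candidate policy set given as callables.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(S, A, R, S_next, candidate_policies,
                       gamma=0.95, n_iterations=30):
    """Batch fitted Q-iteration over transitions (S, A, R, S_next).

    S, A, R, S_next     : arrays of shape (N, ds), (N, da), (N,), (N, ds)
    candidate_policies  : callables mapping states (N, ds) -> actions (N, da)
    """
    q = None
    policy = candidate_policies[0]
    for _ in range(n_iterations):
        if q is None:
            targets = R  # first iteration: target is the immediate reward
        else:
            # Policy search step: pick the candidate policy that maximizes
            # the average action value over the next states in the batch.
            avg_values = [
                q.predict(np.hstack([S_next, pi(S_next)])).mean()
                for pi in candidate_policies
            ]
            policy = candidate_policies[int(np.argmax(avg_values))]
            targets = R + gamma * q.predict(np.hstack([S_next, policy(S_next)]))
        # Regression step: fit the next action-value function on (state, action) pairs.
        q = ExtraTreesRegressor(n_estimators=50).fit(np.hstack([S, A]), targets)
    return q, policy
```

The finite candidate list here is only a stand-in; the paper's analysis concerns a restricted (e.g., parameterized) policy class over which the average action value is maximized at each iteration.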
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Subjects: | Q Science > QA Mathematics and Computer Science > QA75 Electronic computers. Computer science |
| Depositing User: | Eszter Nagy |
| Date Deposited: | 11 Dec 2012 15:26 |
| Last Modified: | 11 Dec 2012 15:26 |
| URI: | https://eprints.sztaki.hu/id/eprint/4357 |