Fitted Q-iteration in continuous action-space MDPs

Antos, András and Munos, Rémi and Szepesvári, Csaba (2007) Fitted Q-iteration in continuous action-space MDPs. In: NIPS 2007. Proceedings of the 21st annual conference on neural information processing systems. Vancouver, 2007.

Full text not available from this repository.

Official URL: http://www.szit.bme.hu/~antos/ps/anmusz_sapi.ps.gz

Abstract

We consider continuous-state, continuous-action batch reinforcement learning, where the goal is to learn a good policy from a sufficiently rich trajectory generated by another policy. We study a variant of fitted Q-iteration in which the greedy action selection is replaced by a search for a policy in a restricted set of candidate policies, chosen by maximizing the average action values. We provide a rigorous theoretical analysis of this algorithm, proving what we believe are the first finite-time bounds for value-function-based algorithms for continuous state- and action-space problems.
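
As a rough illustration of the variant described in the abstract, the following Python sketch alternates between (1) picking, from a small candidate set, the policy with the largest average action value on the sampled states and (2) refitting Q by regression on the resulting one-step targets. It is not the authors' implementation: the toy dynamics and reward, the quadratic feature map, the least-squares regression step, the clipped linear candidate policies, and every function name here are assumptions made for this example.

```python
import numpy as np

def features(x, a):
    """Hypothetical quadratic features of (state, action); an assumption."""
    x, a = np.asarray(x, float), np.asarray(a, float)
    return np.stack([np.ones_like(x), x, a, x * a, x**2, a**2], axis=-1)

def q_values(w, x, a):
    """Linear-in-features action-value estimate Q(x, a) = features(x, a) @ w."""
    return features(x, a) @ w

def make_policies(gains):
    """Restricted candidate set (assumed): a = clip(gain * x, -1, 1)."""
    return [lambda x, k=k: np.clip(k * x, -1.0, 1.0) for k in gains]

def select_policy(w, policies, states):
    """Index of the candidate with the largest average action value."""
    scores = [q_values(w, states, pi(states)).mean() for pi in policies]
    return int(np.argmax(scores))

def fitted_q_iteration(x, a, r, x_next, policies, gamma=0.9, n_iter=30):
    """Fitted Q-iteration with policy search in place of the greedy step."""
    phi = features(x, a)
    w = np.zeros(phi.shape[-1])
    for _ in range(n_iter):
        # Policy search over the restricted set replaces max over actions.
        pi = policies[select_policy(w, policies, x_next)]
        # One-step regression targets under the selected policy.
        targets = r + gamma * q_values(w, x_next, pi(x_next))
        # Least-squares fit of the next Q estimate.
        w, *_ = np.linalg.lstsq(phi, targets, rcond=None)
    return w, select_policy(w, policies, x)

def collect_trajectory(n, seed=0):
    """Single trajectory from a uniformly random behavior policy (toy MDP)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n + 1)
    x[0] = rng.uniform(-1.0, 1.0)
    a = rng.uniform(-1.0, 1.0, size=n)
    for t in range(n):
        x[t + 1] = 0.5 * x[t] + 0.5 * a[t]  # assumed toy dynamics
    r = -(x[:-1] ** 2) - 0.1 * a**2         # assumed toy reward
    return x[:-1], a, r, x[1:]
```

Calling `collect_trajectory(500)` and passing the result, together with `make_policies([-1.0, -0.5, 0.0, 0.5, 1.0])`, to `fitted_q_iteration` returns the fitted weight vector and the index of the selected candidate. In this toy problem the reward penalizes distance from the origin, so the selected gain should be negative, that is, a policy that pushes the state back toward zero.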

Item Type:	Conference or Workshop Item (Paper)
Subjects:	Q Science > QA Mathematics and Computer Science > QA75 Electronic computers. Computer science / computing, computer science
Depositing User:	Eszter Nagy
Date Deposited:	11 Dec 2012 15:26
Last Modified:	11 Dec 2012 15:26
URI:	https://eprints.sztaki.hu/id/eprint/4357
