The online loop-free stochastic shortest-path problem

Neu, Gergely and György, András and Szepesvári, Csaba (2010) The online loop-free stochastic shortest-path problem. In: COLT 2010. 23rd Annual Conference on Learning Theory. Haifa, 2010.

We consider a stochastic extension of the loop-free shortest path problem with adversarial rewards. In this episodic Markov decision problem an agent traverses an acyclic graph with random transitions: at each step of an episode the agent chooses an action, receives some reward, and arrives at a random next state, where the reward and the distribution of the next state depend on the current state and the chosen action. We consider the bandit setting, in which only the reward of the just-visited state-action pair is revealed to the agent. For this problem we develop algorithms that perform asymptotically as well as the best stationary policy in hindsight. Assuming that all states are reachable with probability a > 0 under all policies, we give an algorithm and prove that its regret is O(L^2 sqrt(T|A|)/a), where T is the number of episodes, A denotes the (finite) set of actions, and L is the length of the longest path in the graph. Variants of the algorithm are given that improve the dependence on the transition probabilities under specific conditions. The results are also extended to variations of the problem, including the case when the agent competes with time-varying policies.
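
The interaction protocol described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it only simulates one episode of a toy loop-free problem under bandit feedback and evaluates the scale of the stated bound L^2 sqrt(T|A|)/a with the hidden constant ignored. The names `run_episode`, `regret_bound_scale`, the state labels and all numbers are hypothetical, chosen purely for illustration.

```python
import math
import random


def run_episode(transitions, rewards, policy, start, terminal, rng):
    """Simulate one episode and return the (state, action, reward) triples seen.

    transitions[(s, a)] maps each possible next state to its probability.
    rewards[(s, a)] is this episode's reward table (assumed fixed in advance
    by an adversary); the agent only ever sees the entries it visits.
    """
    state = start
    observed = []
    while state != terminal:
        action = policy[state]
        # Bandit feedback: only the reward of the visited pair is revealed.
        observed.append((state, action, rewards[(state, action)]))
        next_states, probs = zip(*transitions[(state, action)].items())
        state = rng.choices(next_states, weights=probs, k=1)[0]
    return observed


def regret_bound_scale(L, T, num_actions, a):
    """Evaluate L^2 * sqrt(T * |A|) / a, ignoring the constant hidden by O(.)."""
    return L ** 2 * math.sqrt(T * num_actions) / a


# Hypothetical toy problem: start state, one middle layer, terminal state.
transitions = {
    ("s0", 0): {"u": 0.8, "v": 0.2},
    ("s0", 1): {"u": 0.3, "v": 0.7},
    ("u", 0): {"end": 1.0},
    ("u", 1): {"end": 1.0},
    ("v", 0): {"end": 1.0},
    ("v", 1): {"end": 1.0},
}
# Illustrative reward table for a single episode.
rewards = {
    ("s0", 0): 0.1, ("s0", 1): 0.4,
    ("u", 0): 0.9, ("u", 1): 0.2,
    ("v", 0): 0.5, ("v", 1): 0.6,
}
policy = {"s0": 0, "u": 0, "v": 1}  # a stationary deterministic policy

episode = run_episode(transitions, rewards, policy, "s0", "end", random.Random(0))
episode_reward = sum(r for _, _, r in episode)
bound_scale = regret_bound_scale(L=2, T=100, num_actions=4, a=0.5)
```

Because the graph is acyclic, every episode ends after at most L steps (here L = 2), which is why the loop needs no explicit step limit. The sketch fixes one stationary policy; a learning algorithm would instead update its policy between episodes using only the triples returned by `run_episode`.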

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: online learning, stochastic shortest path problem, bandit feedback, episodic Markov decision process
Subjects: Q Science > QA Mathematics and Computer Science > QA75 Electronic computers. Computer science
Depositing User: Eszter Nagy
Date Deposited: 12 Dec 2012 08:38
Last Modified: 12 Dec 2012 08:38