SPI represents the policy as a decision tree, with state variables labeling the interior nodes and a concrete action at each leaf. The policy backup operates directly on this graphical form: in each backup, for each leaf (policy action) a in the policy tree, its Q-function Q^a is computed and attached to that leaf.
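As an illustration (a minimal sketch, not the paper's code), such a policy tree might look as follows in Python; Leaf, Node, and policy_action are hypothetical names, and Boolean state variables are assumed:

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Leaf:
    action: str                        # concrete policy action at this leaf
    q_function: Optional[dict] = None  # Q^a attached during the policy backup

@dataclass
class Node:
    variable: str                      # state variable tested at this node
    true_child: "Union[Node, Leaf]"
    false_child: "Union[Node, Leaf]"

def policy_action(tree, state):
    """Follow the variable tests down to a leaf and return its action."""
    while isinstance(tree, Node):
        tree = tree.true_child if state[tree.variable] else tree.false_child
    return tree.action

tree = Node("raining", Leaf("take_umbrella"), Leaf("walk"))
assert policy_action(tree, {"raining": True}) == "take_umbrella"
```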
The core idea is a symbolic procedure that applies policy constraints only when they reduce the space and time complexity of the update, and otherwise performs a standard, unconstrained Bellman backup.
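To make the idea concrete, here is a minimal, self-contained sketch (all names are illustrative): it represents value functions as plain dicts and uses the number of distinct values as a crude stand-in for symbolic size, whereas the actual algorithm makes this choice lazily inside the symbolic operations.

```python
def q(s, a, value, mdp):
    """One-step lookahead value of action a in state s."""
    return mdp["reward"](s, a) + mdp["gamma"] * sum(
        p * value[s2] for s2, p in mdp["transition"](s, a).items())

def bellman_backup(value, mdp):
    """Full VI backup: maximize over all actions."""
    return {s: max(q(s, a, value, mdp) for a in mdp["actions"]) for s in value}

def constrained_backup(value, policy, mdp):
    """Policy-constrained backup: only the policy's action at each state."""
    return {s: q(s, policy[s], value, mdp) for s in value}

def size(value):
    """Crude proxy for symbolic size: fewer distinct values usually
    means a smaller decision-diagram representation."""
    return len(set(value.values()))

def opportunistic_backup(value, policy, mdp):
    """Apply the policy constraint only when the constrained result is
    no larger than the full backup; otherwise keep the full VI backup."""
    constrained = constrained_backup(value, policy, mdp)
    full = bellman_backup(value, mdp)
    return constrained if size(constrained) <= size(full) else full
```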
This paper addresses the scalability of symbolic planning under uncertainty with factored states and actions. Offline planning based on symbolic operators exploits the factored structure of MDPs, but is memory intensive. Our first contribution is a symbolic implementation of Modified Policy Iteration (MPI) for factored actions that views policy evaluation as policy-constrained value iteration (VI). Our second contribution is Opportunistic Policy Iteration (OPI), a novel convergent algorithm lying between VI and MPI.
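Under this view, policy evaluation is just VI with the backup restricted to the policy's actions. Below is a rough sketch of that MPI loop, reusing the hypothetical helpers (q, bellman_backup, constrained_backup) from the snippet above; m = 0 degenerates to pure VI, while letting m grow approaches full policy iteration.

```python
def greedy_policy(value, mdp):
    """Extract the greedy policy from the current value function."""
    return {s: max(mdp["actions"], key=lambda a: q(s, a, value, mdp))
            for s in value}

def modified_policy_iteration(value, mdp, m, tol=1e-6):
    while True:
        policy = greedy_policy(value, mdp)       # improvement step
        new_value = bellman_backup(value, mdp)   # one full greedy backup
        for _ in range(m):                       # m policy-constrained backups
            new_value = constrained_backup(new_value, policy, mdp)
        if max(abs(new_value[s] - value[s]) for s in value) < tol:
            return policy, new_value
        value = new_value
```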
Tadepalli "Symbolic Opportunistic Policy Iteration for Factored-Action MDPs" Proceedings of the International Conference on Neural Information Processing ...
A related representation is the list policy, where the optimal action to take in state x is the action a_j corresponding to the first event t_j in the list with which x is consistent.
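A minimal sketch of that lookup, treating each event t_j as a partial assignment over the state variables (all names are illustrative):

```python
def decision_list_action(decision_list, state):
    """Return the action a_j of the first entry (t_j, a_j) whose
    partial assignment t_j the state is consistent with."""
    for t_j, a_j in decision_list:
        if all(state.get(var) == val for var, val in t_j.items()):
            return a_j
    raise ValueError("state is consistent with no entry in the list")

# An empty event {} is consistent with every state, so it acts as a default.
dlist = [({"holding_block": True}, "put_down"), ({}, "pick_up")]
assert decision_list_action(dlist, {"holding_block": False}) == "pick_up"
```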