Approximability and efficient algorithms for constrained fixed-horizon POMDPs with durative actions

    Research output: Contribution to journal › Article › peer-review

    Abstract

    The Partially Observable Markov Decision Process (POMDP) is a fundamental model for probabilistic planning in stochastic domains. More recently, constrained POMDPs and chance-constrained POMDPs have extended the model by allowing constraints to be specified on some aspects of the policy in addition to the objective function. Despite their expressive power, these models assume that all actions take a fixed duration, which limits their ability to model real-world planning problems. In this work, we propose a unified model for durative POMDPs and their constrained extensions. First, we convert these extensions into an Integer Linear Programming (ILP) formulation, which can be solved using existing solvers from the ILP literature. Second, we provide a heuristic search approach that efficiently prunes the search space, guided by solving successive partial ILP programs. Third, we give a theoretical analysis of the problem: unlike short-horizon POMDPs with constant-depth policies, which can be solved in polynomial time, the constrained extensions are NP-hard even with a planning horizon of two and non-negative rewards. To alleviate this, we propose a Fully Polynomial Time Approximation Scheme (FPTAS) that computes (near-)optimal deterministic policies in polynomial time. The FPTAS is among the best achievable in theory in terms of approximation ratio. Finally, evaluation results show that our approach is empirically superior to the state-of-the-art fixed-horizon chance-constrained POMDP solver.
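    To make the constrained fixed-horizon setting concrete, the sketch below evaluates every deterministic horizon-2 policy of a toy POMDP by brute force and keeps the best one whose expected cost stays within a budget. This is only an illustration of the problem the abstract defines, not the paper's ILP formulation, heuristic search, or FPTAS; the 2-state model and all numbers are made up for the example.

    ```python
    import itertools

    # Toy 2-state, 2-action, 2-observation POMDP (all numbers illustrative).
    S, A, NOBS = 2, 2, 2
    b0 = [0.6, 0.4]                      # initial belief over states
    T = [[[0.9, 0.1], [0.2, 0.8]],       # T[a][s][s']: transition probabilities
         [[0.5, 0.5], [0.4, 0.6]]]
    Z = [[[0.8, 0.2], [0.3, 0.7]],       # Z[a][s'][o]: observation probabilities
         [[0.6, 0.4], [0.1, 0.9]]]
    R = [[1.0, 0.0], [0.0, 2.0]]         # R[a][s]: non-negative rewards
    C = [[0.0, 1.0], [1.0, 0.0]]         # C[a][s]: per-step costs
    BUDGET = 0.8                          # bound on expected total cost

    def evaluate(policy):
        """Expected (reward, cost) of a horizon-2 deterministic policy.

        policy = (a1, branch) where a1 is the first action and
        branch[o] is the second action taken after observing o.
        """
        a1, branch = policy
        exp_r = exp_c = 0.0
        for s in range(S):
            p_s = b0[s]
            exp_r += p_s * R[a1][s]
            exp_c += p_s * C[a1][s]
            for s2 in range(S):
                for o in range(NOBS):
                    p = p_s * T[a1][s][s2] * Z[a1][s2][o]
                    a2 = branch[o]
                    exp_r += p * R[a2][s2]
                    exp_c += p * C[a2][s2]
        return exp_r, exp_c

    # Enumerate all deterministic horizon-2 policies and keep the
    # highest-reward one that satisfies the expected-cost constraint.
    best = None
    for a1 in range(A):
        for branch in itertools.product(range(A), repeat=NOBS):
            r, c = evaluate((a1, branch))
            if c <= BUDGET and (best is None or r > best[1]):
                best = ((a1, branch), r, c)

    print(best)
    ```

    Even this tiny instance already hints at the hardness result quoted above: the number of deterministic policies grows exponentially with the number of observation histories, which is why the paper turns to ILP, pruning, and an approximation scheme instead of enumeration.
    
    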

    Original language: British English
    Article number: 103968
    Journal: Artificial Intelligence
    Volume: 323
    DOIs
    State: Published - Oct 2023

    Keywords

    • Constrained partially observable Markov decision process
    • Durative actions
    • Heuristic search
    • Risk-bounded planning
