On the Complexity of Discounted Markov Decision Process
Posted: June 12, 2017
Posted by: Xiaoni Tan
Speaker: Mengdi Wang, Princeton University
Time: June 12, 2017, 14:00 to 15:00
Venue: Quan Zhai, Room 29, Beijing International Center for Mathematical Research
We provide the first sublinear running time upper bound and a nearly matching lower bound for the discounted Markov decision problem.
Upper bound: We propose a randomized linear programming algorithm for approximating the optimal policy of the discounted Markov decision problem. By leveraging the value-policy duality, the algorithm adaptively samples state transitions and makes exponentiated primal-dual updates. We show that it finds an ε-optimal policy using nearly-linear running time in the worst case. For Markov decision processes that are ergodic under every stationary policy, we show that the algorithm finds an ε-optimal policy using running time linear in the total number of state-action pairs, which is sublinear in the input size. These results provide new complexity benchmarks for solving stochastic dynamic programs.
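As a rough illustration of the kind of method described above, the sketch below shows what an LP saddle-point scheme for a discounted MDP might look like when state transitions are sampled adaptively from the dual (state-action) distribution and the dual variables receive exponentiated updates. The function name randomized_primal_dual_mdp, the step sizes alpha and beta, the projection box on the values, and the plain averaging of the dual iterate are assumptions made for illustration only; they are not the constants or the exact scheme from the talk.

import numpy as np

def randomized_primal_dual_mdp(P, r, q, gamma, iters=100000, seed=0):
    """Illustrative randomized primal-dual sketch for a discounted MDP.

    P: transition tensor, shape (S, A, S); r: rewards in [0, 1], shape (S, A);
    q: initial-state distribution, shape (S,); gamma: discount factor.
    Step sizes, the projection box, and the averaging are placeholders.
    """
    rng = np.random.default_rng(seed)
    S, A = r.shape
    v = np.zeros(S)                      # primal value estimates
    log_mu = np.zeros((S, A))            # log-weights of dual (state-action) variables
    mu_avg = np.zeros((S, A))            # running average of the dual iterate
    alpha, beta = 0.05, 0.05             # primal / dual step sizes (assumed)

    for t in range(iters):
        # Current dual distribution over state-action pairs.
        mu = np.exp(log_mu - log_mu.max())
        mu /= mu.sum()

        # Adaptively sample a state-action pair, then one transition from it.
        idx = rng.choice(S * A, p=mu.ravel())
        s, a = divmod(idx, A)
        s_next = rng.choice(S, p=P[s, a])

        # Stochastic Bellman residual r(s,a) + gamma*v(s') - v(s).
        delta = r[s, a] + gamma * v[s_next] - v[s]

        # Exponentiated (multiplicative-weights) update on the sampled dual coordinate.
        log_mu[s, a] += beta * delta

        # Stochastic gradient step on the primal values (minimization side).
        v -= alpha * (1.0 - gamma) * q
        v[s] += alpha
        v[s_next] -= alpha * gamma
        np.clip(v, 0.0, 1.0 / (1.0 - gamma), out=v)   # keep values in a bounded box

        mu_avg += (mu - mu_avg) / (t + 1)

    # Read off a deterministic policy from the averaged dual variables.
    policy = mu_avg.argmax(axis=1)
    return policy, v

On a small instance, calling randomized_primal_dual_mdp(P, r, q, gamma=0.9) returns a greedy policy extracted from the averaged dual iterate together with the final value estimates; the ε-optimality and running-time guarantees stated above depend on a more careful choice of step sizes and averaging than this sketch uses.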
Lower bound: We also establish a computational lower bound on the running time of any algorithm for solving the discounted MDP, which nearly matches the upper bound above.