The Sample Complexity in Data (Sample)-Driven Optimization/Learning
Published: 2019-05-31
Posted by: Xiaoni Tan
Speaker: Prof. Yinyu Ye, Stanford University
Time: 2019-06-05, 15:30 to 16:30
Venue: Room 77201, Jingchunyuan 78, BICMR
We consider data-driven optimization problems where the objective/reward functions can only be estimated via sample data, for example: 1) the discounted Markov Decision Process (DMDP), or Reinforcement Learning (RL), where the value functions can only be accessed through a generative sampling model; and 2) stochastic programming (SP) via the sample average approximation (SAA) scheme. In such settings, a fundamental question is how many samples are sufficient and necessary to learn/compute an $\epsilon$-optimal policy/solution with high probability. In this talk, we present two sets of results along this research direction.

The first result concerns the infinite-horizon DMDP: we provide an algorithm that computes an $\epsilon$-optimal policy with high probability, where both the run time and the number of samples taken are near optimal, matching the sample complexity lower bound up to logarithmic factors. We also extend our method to the two-person stochastic game with a generative model and provide a nearly matching sample complexity lower bound.

The second result concerns general stochastic programming (SP), where the classical SAA theory dictates that the number of samples should be polynomial in the number of problem dimensions in order to ensure proper optimization accuracy. We study a modification of the SAA in the scenario where the global minimizer is either sparse or can be approximated by a sparse solution. By making use of a regularization penalty referred to as the (folded) concave penalty (FCP), we show that, if an FCP-regularized SAA formulation is solved to meet a weaker second-order necessary condition, then the number of samples required to approximate the global solution of a convex SP can be significantly reduced: the sample size only needs to be poly-logarithmic in the number of dimensions. The efficacy of the FCP regularization for nonconvex and nonsmooth SPs is also discussed, and the findings allow us to further understand two important paradigms: high-dimensional statistical learning (HDSL) and (deep) neural networks (NN).
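To make the classical SAA scheme referenced above concrete, here is a minimal Python sketch. The quadratic per-scenario loss, the Gaussian scenario distribution, and the dimension and sample-size constants are illustrative assumptions, not the problems or rates analyzed in the talk; the sketch only shows how the expectation in the stochastic program is replaced by an empirical average over i.i.d. samples.

```python
# Minimal sketch of the sample average approximation (SAA) scheme for a
# stochastic program  min_x E_xi[ F(x, xi) ].  All problem data below are
# illustrative assumptions, not the setting studied in the talk.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
d, n_samples = 10, 200  # illustrative problem dimension and sample size

def per_scenario_loss(x, xi):
    # Convex per-scenario loss F(x, xi); a simple least-squares term here.
    return 0.5 * np.sum((x - xi) ** 2)

# Draw i.i.d. scenarios standing in for the (in practice unknown) distribution of xi.
scenarios = rng.normal(loc=1.0, scale=2.0, size=(n_samples, d))

def saa_objective(x):
    # SAA replaces E[F(x, xi)] by the empirical average over the drawn samples.
    return np.mean([per_scenario_loss(x, xi) for xi in scenarios])

result = minimize(saa_objective, x0=np.zeros(d), method="L-BFGS-B")
print("SAA minimizer (close to the empirical scenario mean):", np.round(result.x, 2))
```

Under the classical theory recalled in the abstract, the number of scenarios needed for a given accuracy grows polynomially with the dimension; the FCP-regularized variant discussed in the talk aims to reduce this dependence to poly-logarithmic when the global minimizer is sparse or approximately sparse.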