Policy Learning in Adaptive Experiments
Posted: 2023-03-06
Posted by: Xiaoni Tan
Speaker: Ruohan Zhan (The Hong Kong University of Science and Technology)
Time: 2023-04-10, 10:00 to 11:00
Venue: Room 29, Quan Zhai, BICMR
Abstract: Learning optimal policies from historical data enables personalization in a variety of domains, including healthcare, digital recommendations, and online education. Recently, there has been increasing attention on adaptive experiments (for example, contextual bandits), which progressively update data-collection rules in order to identify good treatment assignment policies. However, most existing contextual bandit algorithms are geared towards maximizing operational performance during the experiment, while the optimality of the learned policy is not guaranteed, especially when outcome models are misspecified. Conversely, non-adaptive experiments, known as randomized controlled trials (RCTs), are guaranteed to identify the best policy in large samples but can be prohibitively costly or even unethical in some cases. We propose to address this policy learning problem from two perspectives:
- Offline policy learning using adaptively collected data. We seek to make the fullest use of such data, which is increasingly prevalent given the popularity of adaptive designs, so as to learn a policy (without running new experiments) that yields the best outcome for each individual. We show that our algorithm is robust to model misspecification and achieves minimax optimality, even when the original experiment has diminishing exploration. (A minimal illustrative sketch of offline policy evaluation from logged bandit data follows this list.)
- Online contextual bandit algorithm tailored to policy learning. We seek to design a practical contextual bandit algorithm that collects "relevant" data for policy learning, so that it is guaranteed to learn the optimal policy at a faster rate than an RCT in many instances. We also show that our algorithm can be flexibly adapted to optimize performance during the experiment (i.e., cumulative regret minimization) with minimax optimality guarantees.
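For readers less familiar with offline policy evaluation and learning from logged bandit data, the following is a minimal, hypothetical Python sketch of a doubly robust (AIPW) policy-value estimator, a standard building block in this literature. The function names (aipw_scores, greedy_policy_value), the uniform logging propensities, and the zero outcome model are illustrative assumptions only; in particular, this generic sketch does not include the adaptive reweighting that the talk's method uses to handle adaptively collected data with diminishing exploration.

```python
import numpy as np

def aipw_scores(A, Y, propensities, mu_hat):
    """Augmented inverse-propensity-weighted (AIPW) scores for each arm.

    A: (n,) logged arm indices; Y: (n,) observed outcomes;
    propensities: (n,) probability the logging policy assigned A[i] at step i;
    mu_hat: (n, K) outcome-model predictions for every arm (any regression model).
    """
    n, _ = mu_hat.shape
    scores = mu_hat.copy()                      # model-based part (direct method)
    idx = np.arange(n)
    # Inverse-propensity correction only for the arm that was actually played.
    scores[idx, A] += (Y - mu_hat[idx, A]) / propensities
    return scores                               # (n, K) doubly robust scores

def greedy_policy_value(scores, policy_arms):
    """Estimate the value of a candidate policy (one arm choice per context)."""
    n = scores.shape[0]
    return scores[np.arange(n), policy_arms].mean()

# Example: score two hypothetical candidate policies and keep the better one.
rng = np.random.default_rng(0)
n, K = 1000, 3
X = rng.normal(size=(n, 2))                     # contexts
A = rng.integers(0, K, size=n)                  # logged arms
propensities = np.full(n, 1.0 / K)              # e.g., a uniform logging policy
Y = rng.normal(size=n)                          # observed outcomes
mu_hat = np.zeros((n, K))                       # stand-in outcome model
scores = aipw_scores(A, Y, propensities, mu_hat)
pi_a = np.zeros(n, dtype=int)                   # "always play arm 0"
pi_b = (X[:, 0] > 0).astype(int)                # threshold policy on first feature
best = max([pi_a, pi_b], key=lambda pi: greedy_policy_value(scores, pi))
```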
The talk is based on joint work with Susan Athey, Emma Brunskill, Sanath Krishnamurthy, Zhimei Ren, and Zhengyuan Zhou.
Bio: Ruohan Zhan is an assistant professor of Industrial Engineering and Decision Analytics at the Hong Kong University of Science and Technology. Her research develops methods to advance data-driven decision making using tools from causal inference, statistics, and machine learning, with particular interest in problems from platform operations and economics. She received her BS in mathematics from Peking University (2017), and her MS in statistics and PhD in computational and applied mathematics from Stanford University (2021), where her doctoral research was advised by Susan Athey. She was a postdoctoral fellow at Stanford Graduate School of Business (2022). She also spent the summers of 2019 and 2020 at Google Research and worked full-time at Kuaishou Technology from 2021 to 2022.