Scientific Computing meets Machine Learning
Posted: 2020-12-10
Published By: Kangkang Deng
Speaker(s): Qi Sun
Time: 14:00-15:00 December 18, 2020
Venue: Room 29, Quan Zhai, BICMR
Abstract:
(Session #1: Machine Learning for Scientific Computing)
Subtitle: Robust-pathwise Control of Systems Governed by Elliptic Partial Differential Equations with High-dimensional Random Inputs
Robust deterministic control, which can be implemented prior to observing the input uncertainty of the underlying system, has been extensively studied and exploited in optimization problems governed by partial differential equations with random inputs. Robust stochastic control, by contrast, can actively adjust the system response according to the input data encountered, but is rarely practical due to the so-called curse of dimensionality. In this work, a learning approach is proposed for constructing data-driven numerical solvers for optimization problems under uncertainty. Specifically, we consider an optimal control problem of tracking type, in which the governing system is given by second-order elliptic equations with log-normal coefficients and stochastic control functions. To handle the high-dimensional input data that originates from the numerical treatment of the uncertain coefficients, the Monte Carlo finite element method is adopted for discretization, which enables us to construct a surrogate model, also called the robust-pathwise control, of the numerical optimal control solutions using convolutional encoder-decoder networks. Our methodology delivers both high accuracy and high efficiency for the tracking task, requiring only a single forward pass of the trained network. Numerical experiments with hundreds or even thousands of stochastic input dimensions are carried out to demonstrate its effectiveness and efficiency in comparison with other benchmark approaches.
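For readers unfamiliar with the setup, a standard tracking-type formulation consistent with the description above reads as follows (the notation here is illustrative; the exact cost functional and boundary conditions used in the talk may differ):

```latex
\min_{u}\;\; \mathbb{E}\!\left[\, \frac{1}{2}\int_{D} \bigl(y(x,\omega)-y_d(x)\bigr)^{2}\,dx
  \;+\; \frac{\alpha}{2}\int_{D} u(x,\omega)^{2}\,dx \right]
\quad\text{subject to}\quad
-\nabla\!\cdot\!\bigl(a(x,\omega)\,\nabla y(x,\omega)\bigr) = u(x,\omega) \ \text{in } D,
\qquad y(\cdot,\omega) = 0 \ \text{on } \partial D,
```

where $y_d$ is the tracking target, $\alpha>0$ is a regularization weight, and the log-normal coefficient takes the form $a(x,\omega)=\exp\bigl(g(x,\omega)\bigr)$ with $g$ a Gaussian random field. In this reading, the convolutional encoder-decoder surrogate maps a discretized sample of the coefficient $a(\cdot,\omega)$ to the corresponding optimal control $u(\cdot,\omega)$ in a single forward pass.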
(Session #2: Scientific Computing for Machine Learning)
Subtitle: Penalty and Augmented Lagrangian Methods for Layer-parallel Training of Residual Networks
Algorithms for training residual networks typically require a forward pass of the data, followed by backpropagation of the loss gradient to perform parameter updates, which can take many hours or even days for networks with hundreds of layers. Inspired by penalty and augmented Lagrangian methods, a layer-parallel training algorithm is proposed in this work to overcome the scalability barrier caused by the serial nature of forward-backward propagation in deep residual learning. Moreover, by viewing the supervised classification task as a numerical discretization of a terminal control problem, we bridge the concept of synthetic gradients for decoupling backpropagation with the parareal method for solving differential equations, which not only offers a novel perspective on the design of synthetic loss functions but also performs parameter updates with reduced storage overhead. Experiments on a preliminary example demonstrate that the proposed algorithm achieves testing accuracy comparable to, or even better than, that of the fully serial backpropagation approach, while the enabled layer-parallelism provides speedup over traditional layer-serial training methods.
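In the dynamical-systems view of residual learning referenced above, the layer-serial constraint and its penalty relaxation can be sketched as follows (the notation is illustrative and need not match the speaker's exact formulation). A residual network with blocks $f(\cdot,\theta_k)$ and terminal loss $\Phi$ corresponds to the constrained problem

```latex
\min_{\{\theta_k\}} \; \Phi(x_K)
\quad\text{subject to}\quad
x_{k+1} = x_k + f(x_k,\theta_k), \qquad k = 0,\dots,K-1,
```

and relaxing the layer-coupling constraints with a quadratic penalty of weight $\beta>0$ gives

```latex
\min_{\{\theta_k\},\,\{x_k\}} \; \Phi(x_K)
\;+\; \frac{\beta}{2}\sum_{k=0}^{K-1}
  \bigl\| x_{k+1} - x_k - f(x_k,\theta_k) \bigr\|^{2},
```

in which the auxiliary states $\{x_k\}$ become optimization variables, so each block's parameters $\theta_k$ can be updated in parallel given the current states. An augmented Lagrangian variant would additionally carry multiplier terms $\lambda_k$ on the same constraints, improving the conditioning of the subproblems.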