Selected Research TopicsWe study optimization theory and algorithms for
Sparse and lowrank matrix reconstructionCompressive sensing tries to recover the sparsest solution from relatively few incomplete measurements by solving a problem involving the l1norm regularization. The nonsmoothness of the functions and the requirements handling data sets at extremely large scales are the primary challenges. We proposed in [40, 41] a framework which separates the task of identifying the sparsity and the task of recovering the magnitude of the solution by using different optimization tools. Specifically, inexpensive firstorder type approaches are used to yield a sparsity pattern, then secondorder type methods are used to explore a smaller but smooth subspace problem based on this estimated support. Hence, the computational cost at each stage is always affordable. The corresponding code FPC_AS has been downloaded for more than 2000 times. Extensive numerical comparisons in [2, 21, 13] show that our method is competitive on quite a few cases. We proposed nonconvex factorization methods for lowrank matrix optimization problems including matrix completion [42] and matrix separation (robust PCA) [28]. Note that solving convex models of these problems typically requires costly iterations such as computing singular value decomposition. The nonconvex factorization models, as well as the alternating direction method, improve our capacity on handling largescale problems. In particular, the linear least squares subproblems are solved in an equivalent yet cheaper way. The approach has been applied in seismic data reconstruction [47]. It performs well in the comparisons with several other algorithms [32, 22] and etc. Semidefinite programming (SDP)Semidefinite programming is concerned with minimizing a linear function of a symmetric positive semidefinite matrix X subject to linear constraints. In [35], we present a rowbyrow (RBR) method. By fixing any (n−1)dimensional principal submatrix of X and using its Schur complement, the positive semidefinite constraint is reduced to a mere secondorder cone constraint, and then a sequence of secondorder cone programming problems constructed from the primal augmented Lagrangian function are minimized. After some suitable modifications, the method can be used for sparse principal component analysis [51] and phase retrieval [33]. We present and analyze an alternating direction method of multipliers (ADMM) for solving SDP problems in [36]. At each iteration, the algorithm minimizes the dual augmented Lagrangian function on each block of dual variables separately while other blocks are fixed and then updates the primal variables. This approach has been applied to exact and stable recovery of rotations for robust synchronization [34]. The evaluations in [29, 26, 25, 24, 46, 23] demonstrated that our method can be efficient in many cases. Recently, we developed a secondorder type method [45, 14] based on the equivalence between ADMM and the DouglasRachford splitting (DRS). They actually reformulate SDP as a system of nonlinear equations. Although these nonlinear equations may be nondifferentiable, they are often semismooth, and their generalized Jacobian matrix is positive semidefinite due to monotonicity. We propose an adaptive semismooth Newton method and establish its convergence to global optimality. Our secondorder type algorithms are able to achieve superlinear or quadratic convergence under certain assumptions. Numerical results show that our algorithm can be faster than one of the best available codes SDPNAL+ [46] for achieving comparable accuracy. Optimization with orthogonality constraintsMinimization under orthogonality constraints and/or spherical constraints [39] arises in many problems in science and engineering, such as polynomial optimization, combinatorial optimization, linear and nonlinear eigenvalue problems, sparse PCA, density functional theory, BoseEinstein condensates, phase retrieval, CryoEM, matrix rank minimization, etc. These problems are difficult because the constraints are not only nonconvex but numerically expensive to preserve during iterations. To deal with these difficulties, we apply the Cayley transform — a CrankNicolsonlike update scheme — to maintain the constraints and based on it, develop curvilinear search algorithms with lower flops compared to those based on projections and geodesics. The globally optimal solutions can even be identified under certain conditions, and they are useful in synchronization and community detection [1]. The extensive numerical comparisons in [8, 10] show that our method is indeed efficient. The algorithms has been demonstrated on a variety of problems mentioned above [39, 6, 31, 11, 50, 44]. In particular, they can provide multifold accelerations over SDP algorithms on several problems arising from polynomial optimization, combinatorial optimization, and the correlation matrix estimation. It has also been applied to robust orthogonal complement principal component analysis in [27] and Bayesian group factor analysis with structured sparsity in [52]. Recently, we proposed an adaptive regularized Newton method [37, 44, 7] which approximates the original objective function by the secondorder Taylor expansion in Euclidean space but keeps the orthogonality constraints. The regularization term in the objective function of the subproblem enables us to establish a Cauchypoint like condition as the standard trustregion method for proving global convergence. The subproblem can be solved inexactly either by firstorder methods or a truncated Riemannian Newton method. In the latter case, it can further take advantage of negative curvature directions. This Newtontype method can be more efficient than the gradienttype method [39] in density functional theory [37] and in BoseEinstein condensates [44]. Linear eigenvalue optimizationLinear eigenvalue problem is a typical optimization problem with orthogonality constraints. Most iterative algorithms for eigenpair computation consist of two main steps: a subspace update (SU) step that generates bases for approximate eigenspaces, followed by a RayleighRitz (RR) projection step that extracts approximate eigenpairs. So far the predominant methodology for the SU step is based on Krylov subspaces that build orthonormal bases piece by piece in a sequential manner. In [43, 18, 38, 17], we investigate block methods in the SU step that allow a higher level of concurrency than what is reachable by Krylov subspace methods. A limited memory block Krylov subspace optimization approach is studied in [17]. The subspace optimization technique can significantly accelerate the classic simultaneous iteration method. In [18], we further derive and study a GaussNewton method for computing a symmetric lowrank product that is the closest to a given symmetric matrix in Frobenius norm. Our GaussNewton method, which has a particularly simple form, shares the same order of iterationcomplexity as a gradient method but can be significantly faster on a broad range of problems. Global convergence and a Qlinear convergence rate are established. To achieve a competitive speed to the stateoftheart eigensolvers, we propose in [43] an augmented RayleighRitz (ARR) procedure and analyze its rate of convergence under realistic conditions. Combining this ARR procedure with a set of polynomial accelerators, as well as utilizing a few other techniques such as continuation and deflation, we construct a block algorithm designed to reduce the number of RR steps and elevate concurrency in the SU steps. Extensive computational experiments are conducted in the CLanguage on a representative set of test problems to evaluate the performance of our algorithm in comparison to wellestablished, highquality eigensolvers ARPACK, FEAST, and SLEPc. Numerical results, obtained on a manycore computer without explicit code parallelization, show that when computing a relatively large number of eigenpairs, the performance of our algorithms is competitive with, and frequently superior to, that of the three stateoftheart eigensolvers. Moreover, the algorithm possesses a higher degree of concurrency than Krylov subspace methods, thus offering better scalability on modern multi/manycore computers. Our code LMSVD has been used in [20] for computing symmetric tensor network state, in [12] for estimating the singular values and vectors in tenor train format, and in [3] for ultrasonic beamforming clutter reduction. Our GaussNewton method has been extended in [30] for LowRank Matrix Optimization. Nonlinear eigenvalue optimizationThe density functional theory (DFT) in electronic structure calculation is an important source of optimization problems with orthogonality constraints. The KohnSham (KS) equations try to identify orthogonal eigenvectors to satisfy the nonlinear eigenvalue problems. On the other hand, the KS minimization problem minimizes the KS total energy functionals under the orthogonality constraints. These two problems are connected by the optimality conditions. In [16, 15], we study a few theoretical issues in the discretized KS density functional theory. The equivalence between either a local or global minimizer of the KS total energy minimization problem and the solution to the KS equation is established under certain assumptions. The nonzero charge densities of a strong local minimizer are shown to be bounded below by a positive constant uniformly. We analyze the selfconsistent field (SCF) iteration by formulating the KS equation as a fixed point map respect to potential. The Jacobian of these fixed point maps is derived explicitly. Both global and local convergence of the simple mixing scheme can be established if the gap between the occupied states and unoccupied states is sufficiently large. This assumption can be relaxed if the charge density is computed using the FermiDirac distribution and it is not required if the exchange correlation functional is zero. Although our assumption on the gap is very stringent, our analysis is still valuable for a better understanding of the KS minimization problem, the KS equation, and the SCF iteration. The SCF method with heuristics such as charge mixing often works remarkably well on many problems, but it is well known that its convergence can be unpredictable. Hence, we have developed a few efficient algorithms from the point view of solving the KS minimization problem. In [31, 50], the gradient type methods are competitive to the SCF iteration. We further regularize the SCF iteration and establish rigorous global convergence to the firstorder optimality conditions in [37]. The Hessian of the total energy functional is exploited. By adding the part of the Hessian which is not considered in SCF, our methods can always achieve a highly accurate solution to problems for which SCF fails and exhibit a better convergence rate than SCF in the KSSOLV toolbox under the Matlab environment. It has been mentioned in [5, 53] that our analysis and approaches for DFT are particularly useful. Pursuing global solutions of orthogonality constrained optimizationWe have constructed an efficient algorithm for finding approximate global minimizers for problems with orthogonality constraints [49]. The main concept is based on noisy gradient flow constructed from stochastic differential equations (SDE) on the Stiefel manifold, the differential geometric characterization of orthogonality constraints. We derive the explicit representation of SDE on the Stiefel manifold endowed with a canonical metric and propose a numerically efficient scheme to simulate this SDE based on Cayley transformation with theoretical convergence guarantee. The convergence to global optimizers is proved if stochastic diffusion is sufficiently enough. The effectiveness and efficiency of the proposed algorithms are demonstrated on a variety of problems including homogeneous polynomial optimization, computation of stability number, iterative closest point and 3D structure determination from Common Lines in CryoEM. In [19], we consider a nonlinear least squares model and propose a modified LevenbergMarquardt (LM) method for phase retrieval. Under similar assumptions as the wellknown Wirtinger flow (WF) algorithm [4], global linear convergence and local quadratic convergence to the global solution set are established by estimating the smallest nonzero eigenvalues of the GaussNewton matrix, establishing local error bound properties and constructing a modified regularization condition. The computational cost becomes tractable when a preconditioned conjugate gradient (PCG) method is applied to solve the LM equation inexactly. Specifically, the preconditioner is constructed from the expectation of the LM coefficient matrix by assuming the independence between the measurements and iteration points. Our numerical experiments show that our algorithm is robust and it is often faster than the WF method on both random examples and natural image recovery. Problems with combinatorial structuresWe have studied firstorder type methods for optimization with combinatorial structures. In [9], we relax the permutation matrices to be the more tractable doubly stochastic matrices and add an Lpnorm (0<p<1) regularization term to the objective function. The optimal solutions of the Lpregularized problem are the same as the original problem if the regularization parameter is sufficiently large. A lower bound estimation of the nonzero entries of the stationary points and some connections between the local minimizers and the permutation matrices are further established. Then we propose an regularization algorithm with local refinements. The algorithm approximately solves a sequence of regularization subproblems by the projected gradient method using a nonmonotone line search method with the BarzilaiBorwein step sizes. Its performance can be further improved if it is combined with certain local search methods, the cutting plane techniques as well as a new negative proximal point scheme. Extensive numerical results on QAPLIB and the bandwidth minimization problem show that our proposed algorithms can often find reasonably highquality solutions within a competitive amount of time. In [48], we study the treatment plan optimization in volumetric modulated arc therapy (VMAT) for cancer treatment. It is a complicated integer programming problem due to the physical constraints. We perform minimization on the shapes of the aperture and the beam intensities alternatively. The proposed algorithm is further improved by an incremental random importance sampling of the voxels to reduce the computational cost of the energy functional. Numerical simulations on two clinical cancer date sets demonstrate that our method is highly competitive to the stateoftheart algorithms regarding both computational time and quality of treatment planning. References:
