Doctoral research

A major challenge facing machine learning is the rapidly growing computational demand of big data applications. These applications rely heavily on increases in computing power to manage growing data sets and improve performance. This progress, however, is expected to rapidly become economically and environmentally unsustainable as computational requirements become a severe constraint. The design of efficient machine learning algorithms that take advantage of emerging hardware, such as field-programmable gate array (FPGA) architectures, has accordingly been identified as crucial to meeting big data applications’ computational demands. Unfortunately, many traditional machine learning algorithms were not designed to handle big data sets efficiently on such hardware.

My research aims to develop innovative optimization methods that yield efficient machine learning algorithms tailored to big data applications and to the emerging hardware described above. I also study how partial differential equations can provide new insights and novel methods for high-dimensional Bayesian estimation in imaging science and for neural networks and regularization methods in deep learning.

Accelerated nonlinear primal-dual algorithms with applications to large-scale machine learning

My work on optimization concentrates on the analysis, design, and efficient implementation of accelerated nonlinear primal-dual hybrid gradient (PDHG) methods for solving large-scale machine learning problems. The classic PDHG method is a first-order method that splits optimization problems with saddle-point structure into smaller subproblems. Unlike those obtained in most splitting methods, these subproblems can generally be solved efficiently. To converge quickly, however, the classic PDHG method requires precise stepsize parameters that are often prohibitively expensive to compute for large-scale optimization problems, such as those in machine learning. Roughly speaking, if a data set consists of m samples and n features, it takes O(mn^2 + m^2n) operations to estimate the parameters required for obtaining an optimal convergence rate. This problem arises in most other first-order optimization methods as well.
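To make the role of the stepsize parameters concrete, here is a minimal sketch of the classic PDHG iteration applied to one illustrative problem, the lasso, min_x lam*||x||_1 + 0.5*||Ax - b||^2. The choice of problem, the function names `pdhg_lasso` and `soft_threshold`, the 0.95 safety factor, and the fixed iteration count are assumptions made for illustration; this is not the accelerated nonlinear method described on this page. The point to notice is that the stepsizes depend on the largest singular value of A, which is the costly quantity for large data matrices.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal map of t * ||.||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def pdhg_lasso(A, b, lam, num_iters=2000):
    """Classic PDHG sketch for min_x lam*||x||_1 + 0.5*||A x - b||^2.

    Saddle-point form: min_x max_y <A x, y> + lam*||x||_1 - f*(y),
    with f*(y) = 0.5*||y||^2 + <b, y>.
    """
    m, n = A.shape
    # Expensive step: the stepsizes need the spectral norm of A,
    # which NumPy computes here through a singular value decomposition.
    L = np.linalg.norm(A, 2)
    tau = sigma = 0.95 / L  # ensures tau * sigma * L**2 < 1

    x = np.zeros(n)
    x_bar = x.copy()
    y = np.zeros(m)
    for _ in range(num_iters):
        # Dual step: closed-form proximal map of sigma * f*.
        y = (y + sigma * (A @ x_bar - b)) / (1.0 + sigma)
        # Primal step: proximal map of tau * lam * ||.||_1.
        x_new = soft_threshold(x - tau * (A.T @ y), tau * lam)
        # Extrapolation with parameter theta = 1.
        x_bar = 2.0 * x_new - x
        x = x_new
    return x
```

Both subproblems have closed-form solutions, which reflects the remark above that the PDHG subproblems can generally be solved efficiently; only the stepsize computation involves more than matrix-vector products.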

In my research, I introduced accelerated nonlinear variants of the PDHG method that achieve an optimal convergence rate with stepsize parameters that can be computed in Θ(mn) operations. I showed that these methods exhibit scalable parallelism, which is required for leveraging emerging computing hardware, and that they efficiently solve many problems arising in machine learning applications. You can find a preprint of my work on these methods here. I also have a preprint on accelerated nonlinear PDHG methods for regularized logistic regression problems, which you can find here.
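The difference between the two costs can be illustrated with matrix norms. The sketch below contrasts the spectral norm, which requires a singular value computation, with two induced norms that need only a single pass over the m*n entries of the data matrix. Which quantity the preprints actually use is not stated on this page, so the norms chosen here, and the function names, are assumptions for illustration only.

```python
import numpy as np


def spectral_norm(A):
    """Largest singular value of A (computed via an SVD, more than one pass)."""
    return np.linalg.norm(A, 2)


def max_column_norm(A):
    """Largest Euclidean column norm, i.e. the induced l1 -> l2 norm.

    One pass over the entries: Theta(mn) operations.
    """
    return np.sqrt((A * A).sum(axis=0).max())


def max_abs_entry(A):
    """Largest absolute entry, i.e. the induced l1 -> l-infinity norm.

    One pass over the entries: Theta(mn) operations.
    """
    return np.abs(A).max()
```

Both single-pass quantities are reductions over independent columns or entries, so they parallelize naturally. They are also bounded above by the spectral norm, since each column of A is the image of a unit vector.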

Bayesian methods in imaging science, neural network architectures and regularization methods in deep learning, and Hamilton–Jacobi equations

My work on partial differential equations uses Hamilton–Jacobi equations to describe theoretical properties of Bayesian methods in imaging science and the local entropy loss function in deep learning. In imaging science, variational and Bayesian methods correspond to using maximum a posteriori (MAP) estimates and posterior mean (PM) estimates, respectively, for reconstructing images. Variational methods are generally well understood; it is known, for instance, that a broad class of MAP estimates corresponds to solutions of first-order Hamilton–Jacobi equations. The image denoising properties of these MAP estimates follow from the properties of the solutions of these equations. Bayesian methods, in contrast, are less well understood.
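As a concrete instance of the MAP correspondence, consider the standard special case of a quadratic data fidelity term and a convex prior J. The notation below (x for the observed image, t > 0 for the weight parameter, S_0 for the minimal value) is introduced here for illustration and may differ from the notation in the papers.

```latex
\begin{align*}
S_0(x,t) &= \min_{u \in \mathbb{R}^n} \left\{ \frac{1}{2t}\|x-u\|_2^2 + J(u) \right\},
&
u_{\mathrm{MAP}}(x,t) &= \operatorname*{arg\,min}_{u \in \mathbb{R}^n} \left\{ \frac{1}{2t}\|x-u\|_2^2 + J(u) \right\}, \\[4pt]
\frac{\partial S_0}{\partial t}(x,t) + \frac{1}{2}\left\|\nabla_x S_0(x,t)\right\|_2^2 &= 0,
&
S_0(x,0) &= J(x), \\[4pt]
u_{\mathrm{MAP}}(x,t) &= x - t\,\nabla_x S_0(x,t).
\end{align*}
```

Here S_0 is the Moreau envelope of J, and the last identity expresses the MAP estimate through the spatial gradient of the solution of the first-order Hamilton–Jacobi equation.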

To fill this gap, I established novel connections between a broad class of posterior mean estimates, those with a quadratic data fidelity term and a log-concave prior, and solutions to certain viscous Hamilton–Jacobi equations. I used these connections to establish various properties of these PM estimates, including explicit representation formulas in terms of proximal mappings. My results suggest a novel computational method for computing these PM estimates without any sampling strategy. In addition, my work provides new insights into the local entropy loss function used to train neural networks in deep learning. I published my results in the Journal of Mathematical Imaging and Vision (here) and as part of a book chapter (here). Finally, I co-authored an article in Research in the Mathematical Sciences on the representation of certain neural network architectures as solutions to Hamilton–Jacobi equations. These results yield efficient neural network-based approaches for evaluating solutions of these equations in high dimensions without numerical approximations. You can find the paper here.
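For the posterior mean estimates discussed above, the connection to viscous Hamilton–Jacobi equations can be sketched as follows in the case of a quadratic data fidelity term and a prior proportional to exp(-J/ε) with J convex. The notation (x, t, ε, J, S_ε) is introduced here for illustration and may differ from the notation in the papers; the identities follow from the fact that exp(-S_ε/ε) solves a heat equation.

```latex
\begin{align*}
S_\varepsilon(x,t) &= -\varepsilon \log\!\left( \frac{1}{(2\pi t \varepsilon)^{n/2}}
  \int_{\mathbb{R}^n} \exp\!\left( -\frac{1}{\varepsilon}\left[ \frac{1}{2t}\|x-u\|_2^2 + J(u) \right] \right) du \right), \\[4pt]
\frac{\partial S_\varepsilon}{\partial t}(x,t) + \frac{1}{2}\left\|\nabla_x S_\varepsilon(x,t)\right\|_2^2
  &= \frac{\varepsilon}{2}\,\Delta_x S_\varepsilon(x,t),
  \qquad S_\varepsilon(x,0) = J(x), \\[4pt]
u_{\mathrm{PM}}(x,t,\varepsilon) &=
  \frac{\displaystyle\int_{\mathbb{R}^n} u \,\exp\!\left( -\frac{1}{\varepsilon}\left[ \frac{1}{2t}\|x-u\|_2^2 + J(u) \right] \right) du}
       {\displaystyle\int_{\mathbb{R}^n} \exp\!\left( -\frac{1}{\varepsilon}\left[ \frac{1}{2t}\|x-u\|_2^2 + J(u) \right] \right) du}
  = x - t\,\nabla_x S_\varepsilon(x,t).
\end{align*}
```

The last identity has the same form as the classical gradient formula for MAP estimates, with the first-order equation replaced by its viscous counterpart.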