Better Search Methods for Derivative-free Optimization

Non-convex global optimization problems are well known to be NP-hard, and the practical challenge lies in distinguishing the global optimum from exponentially many potential local optima. Existing approaches to non-convex optimization can be broadly categorized into sampling-based methods and tree-search methods. Sampling-based approaches explore the solution space through random sampling […]
Xiang Cheng, TILOS Postdoctoral Scholar at MIT

Abstract: Motivated by the in-context learning phenomenon, we investigate how the Transformer neural network can implement learning algorithms in its forward pass. We show that a linear Transformer naturally learns to implement gradient descent, which enables it to learn linear functions in-context. More generally, we show that a […]
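The core identity behind this line of work can be checked in a few lines: with keys and values built from the context pairs (x_i, y_i), an un-normalized linear attention readout at a query point coincides with the prediction after one gradient-descent step on in-context least squares started from zero weights. The sketch below is illustrative (the data, dimensions, and function names are ours, not the talk's).

```python
# Sketch: a single linear self-attention head reproduces one step of
# gradient descent on in-context least squares. Toy data, illustrative only.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gd_step_prediction(xs, ys, x_query, lr):
    """One gradient step on 0.5*sum_i (y_i - w.x_i)^2 from w = 0,
    then predict on the query point."""
    d = len(xs[0])
    w = [0.0] * d
    grad = [sum((ys[i] - dot(w, xs[i])) * xs[i][j] for i in range(len(xs)))
            for j in range(d)]
    w = [w[j] + lr * grad[j] for j in range(d)]
    return dot(w, x_query)

def linear_attention_prediction(xs, ys, x_query, lr):
    """Un-normalized linear attention: keys = x_i, values = y_i,
    query = x_query; the readout scale plays the role of the step size."""
    return lr * sum(ys[i] * dot(xs[i], x_query) for i in range(len(xs)))

xs = [[1.0, 0.0], [0.5, 1.0], [-1.0, 2.0]]
ys = [1.0, 2.0, 0.5]
xq = [0.3, -0.7]
a = gd_step_prediction(xs, ys, xq, lr=0.1)
b = linear_attention_prediction(xs, ys, xq, lr=0.1)
print(abs(a - b) < 1e-12)  # True: the two predictions coincide
```

Starting from nonzero weights only adds a linear term that attention over the prepended query token can carry, which is why deeper linear Transformers can stack further gradient steps.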
Aadirupa Saha, Research Scientist at Apple

Abstract: Statistics collected in several real-world systems show that users often prefer expressing their liking for a given pair of items, say (A, B), through relative queries like: “Do you prefer Item A over B?”, rather than their absolute counterparts: “How much do you score items […]
Mert Pilanci, Stanford University

Abstract: Since deep neural network training problems are inherently non-convex, their recent dramatic success largely relies on non-convex optimization heuristics and experimental findings. Despite significant advancements, the non-convex nature of neural network training poses two central challenges: first, understanding the underlying mechanisms that contribute to model performance, and second, achieving efficient […]
Ramya Korlakai Vinayak, Assistant Professor, University of Wisconsin–Madison

Abstract: Machine learning (ML) algorithms are becoming ubiquitous in application domains such as public health, genomics, psychology, and the social sciences. In these domains, data is often obtained from diverse populations, e.g., with varying demographics, phenotypes, and preferences. Many ML algorithms focus on learning model parameters […]
Abstract: Sparsity has given us MP3, JPEG, MPEG, faster MRI, and many fun mathematical problems. Deep generative models such as GANs, VAEs, invertible flows, and score-based models are modern, data-driven generalizations of sparse structure. We will start by presenting the CSGM framework of Bora et al. for solving inverse problems such as denoising, filling in missing data, and recovery from linear projections, using an unsupervised method that relies on a pre-trained generator. We generalize compressed sensing theory beyond sparsity, extending restricted isometries to sets created by deep generative models. Our recent results include theoretical guarantees for Langevin sampling from full-dimensional generative models, generative models for MRI reconstruction, and fairness guarantees for inverse problems.
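The CSGM recipe (search the generator's latent space for a code whose image matches the measurements) can be sketched end to end with a toy linear "generator" standing in for a pretrained deep network; in practice the generator is nonlinear and the problem nonconvex, but the latent-space gradient descent is the same. All matrices and values below are made up for illustration.

```python
# Minimal CSGM-style sketch: recover a signal from compressed linear
# measurements y = A x* by gradient descent over the latent code z of a
# generator G. A fixed linear G stands in for a pretrained deep generator.
import numpy as np

# toy "generator": maps a 2-dim latent code to a 6-dim signal
G = np.array([[1., 0.], [0., 1.], [1., 1.], [1., -1.], [2., 0.], [0., 2.]])
# measurement operator: we only observe the first 4 coordinates
A = np.hstack([np.eye(4), np.zeros((4, 2))])

z_true = np.array([1.5, -0.5])
y = A @ G @ z_true                    # observed measurements

# minimize f(z) = ||A G(z) - y||^2 over the latent code z
z = np.zeros(2)
for _ in range(60):
    grad = 2 * G.T @ A.T @ (A @ G @ z - y)
    z -= 0.1 * grad

x_hat = G @ z                         # reconstructed full signal
print(np.allclose(x_hat, G @ z_true))  # True: unobserved coordinates recovered
```

The point of the generative prior is visible here: the last two coordinates of the signal were never measured, yet they are recovered because the generator's range couples them to the observed ones.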
Siddhartha Banerjee, Cornell University

Abstract: I will present a class of finite-horizon control problems in which we see a random stream of arrivals, must select an action at each step, and where the final objective depends only on the aggregate type-action counts; this class captures many widely studied control problems, including online resource allocation, dynamic pricing, generalized assignment, online […]
Abstract: In this talk, we will focus on the emerging field of (adversarially) robust machine learning. The talk will be self-contained and no particular background on robust learning will be needed. Recent progress in this field has been accelerated by the observation that despite unprecedented performance on clean data, modern learning models remain fragile to seemingly innocuous changes such as small, norm-bounded additive perturbations. Moreover, recent work in this field has looked beyond norm-bounded perturbations and has revealed that various other types of distributional shifts in the data can significantly degrade performance. However, in general our understanding of such shifts is in its infancy and several key questions remain unaddressed.
The goal of this talk is to explain why robust learning paradigms have to be designed — and sometimes rethought — based on the geometry of the input perturbations. We will cover a wide range of perturbation geometries from simple norm-bounded perturbations, to sparse, natural, and more general distribution shifts. As we will show, the geometry of the perturbations necessitates fundamental modifications to the learning procedure as well as the architecture in order to ensure robustness. In the first part of the talk, we will discuss our recent theoretical results on robust learning with respect to various geometries, along with fundamental tradeoffs between robustness and accuracy, phase transitions, etc. The remaining portion of the talk will be about developing practical robust training algorithms and evaluating the resulting (robust) deep networks against state-of-the-art methods on naturally-varying, real-world datasets.
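As a concrete instance of how perturbation geometry enters, consider a linear score w·x: the worst-case norm-bounded perturbation has a closed form determined by the dual norm, and for an l-infinity budget it lowers the classification margin by exactly epsilon times the l1 norm of w. The sketch below is a toy illustration with made-up numbers, not the talk's models.

```python
# Geometry of norm-bounded attacks on a linear score y * (w . x):
# the loss-maximizing l-infinity perturbation is delta = -eps * y * sign(w)
# (the FGSM closed form), and it reduces the margin by eps * ||w||_1.

def margin(w, x, y):
    return y * sum(wi * xi for wi, xi in zip(w, x))

def worst_linf_perturbation(w, y, eps):
    # push every coordinate eps in the direction that hurts the label most
    return [-eps * y * (1 if wi > 0 else -1 if wi < 0 else 0) for wi in w]

w = [0.5, -1.0, 2.0]        # toy classifier weights
x = [1.0, 0.2, 0.4]         # clean input
y = +1                      # its label
eps = 0.1                   # l-infinity budget

delta = worst_linf_perturbation(w, y, eps)
x_adv = [xi + di for xi, di in zip(x, delta)]

l1 = sum(abs(wi) for wi in w)            # dual norm of the l-infinity ball
drop = margin(w, x, y) - margin(w, x_adv, y)
print(abs(drop - eps * l1) < 1e-12)      # True: margin drops by eps * ||w||_1
```

Swapping the budget to an l2 or sparse (l0) ball changes the dual norm and hence the worst-case direction entirely, which is one reason robust training procedures must be matched to the perturbation geometry.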
Abstract: Policy Optimization methods enjoy wide practical use in reinforcement learning (RL) for applications ranging from robotic manipulation to game-playing, partly because they are easy to implement and allow for richly parameterized policies. Yet their theoretical properties, from optimality to statistical complexity, are still not fully understood. To help develop a theoretical basis for these methods, and to bridge the gap between RL and control theoretic approaches, recent work has studied whether gradient-based policy optimization can succeed in designing feedback control policies.
In this talk, we start by showing the convergence and optimality of these methods for linear dynamical systems with quadratic costs, where despite nonconvexity, convergence to the optimal policy occurs under mild assumptions. Next, we make a connection between convex parameterizations in control theory, on the one hand, and the Polyak-Lojasiewicz property of the nonconvex cost function, on the other. This connection between the nonconvex and convex landscapes provides a unified view for extending the results to more complex control problems.
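A minimal scalar instance of this phenomenon: gradient descent on the gain of a state-feedback policy for a one-dimensional linear-quadratic problem converges to the Riccati-optimal gain, even though the cost is nonconvex in the gain. All parameter values below are illustrative, not from the talk.

```python
# Policy gradient for scalar LQR: system x_{t+1} = a x_t + b u_t with
# policy u_t = -k x_t and cost sum_t (q x_t^2 + r u_t^2). Despite the
# nonconvex landscape, gradient descent on k finds the optimal gain.

a, b, q, r = 1.2, 1.0, 1.0, 1.0     # open-loop unstable (|a| > 1)

def cost(k):
    """Infinite-horizon cost from x0 = 1 for a stabilizing gain k."""
    rho = a - b * k                  # closed-loop dynamics coefficient
    assert abs(rho) < 1, "k must stabilize the system"
    return (q + r * k * k) / (1 - rho * rho)

def grad(k):
    """Analytic derivative of cost(k) via the quotient rule."""
    rho = a - b * k
    D = 1 - rho * rho
    N = q + r * k * k
    return (2 * r * k * D - N * 2 * b * rho) / (D * D)

# policy gradient descent from a stabilizing but suboptimal gain
k = 1.0
for _ in range(500):
    k -= 0.05 * grad(k)

# ground truth from the discrete-time Riccati recursion
p = q
for _ in range(200):
    p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
k_star = a * b * p / (r + b * b * p)

print(abs(k - k_star) < 1e-6)  # True: gradient descent found the optimal gain
```

The gradient-dominance (Polyak-Lojasiewicz) property of the LQR cost over the stabilizing set is what rules out spurious stationary points here, mirroring the convex-parameterization view from control theory.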
Abstract: Recent studies have applied machine learning techniques to the control of unknown dynamical systems, achieving impressive empirical results. However, the convergence behavior, statistical properties, and robustness of these approaches are often poorly understood due to the non-convex nature of the underlying control problems. In this talk, we revisit Linear Quadratic Gaussian (LQG) control and present recent progress on its landscape analysis from a non-convex optimization perspective. We view the LQG cost as a function of the controller parameters and study its analytical and geometrical properties. Due to the inherent symmetry induced by similarity transformations, the LQG landscape is rich yet complicated. We show that 1) the set of stabilizing controllers has at most two path-connected components, and 2) despite the nonconvexity, all minimal stationary points (controllable and observable controllers) are globally optimal. Based on this special non-convex landscape, we further introduce a novel perturbed policy gradient (PGD) method that escapes a large class of suboptimal stationary points (including high-order saddles). These results shed light on the performance of direct policy gradient methods for solving the LQG problem. The talk is based on our recent papers: https://arxiv.org/abs/2102.04393 and https://arxiv.org/abs/2204.00912.
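The similarity-transformation symmetry can be seen directly: rescaling a dynamic controller's internal coordinates (for a scalar controller state, B_K -> t*B_K and C_K -> C_K/t) leaves every closed-loop trajectory, and hence the accumulated quadratic cost, unchanged. The toy scalar simulation below uses our own plant, controller, and noise values purely for illustration.

```python
# Symmetry of the LQG landscape: a change of coordinates in the controller's
# internal state produces a different parameter vector with identical
# closed-loop behavior, so the cost function has continuous level sets.

def closed_loop_cost(A_K, B_K, C_K, noise):
    a, b = 0.9, 1.0                   # scalar plant: x+ = a x + b u + w
    x, xi, cost = 0.0, 0.0, 0.0       # plant state, controller state
    for w in noise:
        y = x                         # measurement (noiseless, for simplicity)
        u = C_K * xi                  # controller output
        cost += x * x + u * u         # quadratic running cost
        x = a * x + b * u + w         # plant update with process noise
        xi = A_K * xi + B_K * y       # controller internal-state update
    return cost

noise = [0.5, -1.0, 0.3, 0.8, -0.2, 0.1, 0.9, -0.7]
t = 3.0                               # any nonzero coordinate rescaling
c1 = closed_loop_cost(0.5, 1.0, -0.4, noise)
c2 = closed_loop_cost(0.5, t * 1.0, -0.4 / t, noise)
print(abs(c1 - c2) < 1e-9)            # True: the cost is invariant under t
```

This invariance is exactly why the LQG landscape analysis works modulo similarity transformations: distinct-looking controller parameters can be the same controller in different coordinates.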