Recorded Talks: TILOS Seminar Series
Neuromorphic LLMs
Jason Eshraghian, UC Santa Cruz
This talk will show you what neuromorphic computing can do when an academic lab accidentally pulls $2 million worth of GPU-hours. We will showcase a series of frontier reasoning LLMs developed out of an academic lab, from data curation and pre-training to post-training and alignment. Despite being 5x smaller, these models surpass leading LLMs from Meta, Google, and other heavily resourced labs in the ~10-billion-parameter regime.
We have deployed several models on neuromorphic hardware at just 2 watts, bringing state-of-the-art reasoning from the datacenter to the edge. Along the way, we dispel a series of widely-held assumptions about large-scale neuromorphic computation, revealing how it fundamentally differs from conventional deep learning, and why that difference matters.
Jason Eshraghian is an Assistant Professor and Fulbright Scholar in the Department of Electrical and Computer Engineering at the University of California, Santa Cruz. He is the developer of snnTorch, a Python library with over 500,000 downloads for training spiking neural networks. He is a dual-appointed IEEE CAS and EMBS Distinguished Lecturer, an Associate Editor of APL Machine Learning, Chair of the IEEE Neural Systems and Applications Technical Committee, and a Scientific Advisory Board Member of BrainChip; he has received seven IEEE Best Paper Awards and leads the Neuromorphic Agents Team at Conscium.
AI-Driven Design Automation for Multi-Chip Integration in AI Chips
Sung-Kyu Lim, University of Southern California
Multi-chip integration has become a standard approach in AI training and is rapidly gaining traction in edge learning applications. Leveraging 2.5D and 3D IC architectures enables substantial improvements in energy efficiency and latency by optimizing inter-chip data transfer. At the core of this transformation lies the automation of design and simulation for heterogeneous AI chips, shifting from manual engineering to algorithm-driven methodologies. This evolution is being accelerated by advanced electronic design automation (EDA) tools powered by AI. My research group develops novel AI-driven algorithms that enhance or replace traditional design automation techniques, with a focus on enabling next-generation heterogeneous AI systems. In this talk, I will present our recent innovations and explore the critical challenges that lie ahead in applying AI algorithms to EDA for high-performance AI chip design.
Dr. Sung Kyu Lim is Dean’s Professor of Electrical and Computer Engineering at the University of Southern California, joining in Fall 2025 after over two decades at Georgia Tech. He received his B.S., M.S., and Ph.D. in Computer Science from UCLA. His research focuses on the architecture, design, and electronic design automation (EDA) of 2.5D and 3D integrated circuits, with over 450 publications. Dr. Lim is an IEEE Fellow and recipient of major awards including multiple Best Paper Awards (DAC 2023, TCAD 2022), and several Georgia Tech teaching honors. From 2022 to 2024, he served as a Program Manager at DARPA’s Microsystems Technology Office.
Kinetic Theory Perspective of Foundation Models for Physics
Maarten de Hoop, Rice University
We present a kinetic theory perspective on foundation models for physics. We begin by providing a mathematical framework for analyzing transformers. To uniformly address their expressivity, we consider the case in which the mappings are conditioned on a context represented by a probability distribution of tokens; that is, transformers become mappings between probability measures. The relevant notion of smoothness then corresponds to continuity in terms of the Wasserstein distance between such contexts. We demonstrate that deep transformers are universal and can approximate continuous in-context mappings to arbitrary precision, uniformly over compact token domains. We then characterize the conditions under which mappings between measures can be represented as in-context mappings, that is, as transformers. The solution map of the Vlasov equation, which is of nonlocal transport type, for interacting particle systems in the mean-field regime for the Cauchy problem satisfies these conditions; conversely, we prove that measure-theoretic self-attention has the properties that ensure that the infinite-depth, mean-field transformer can be identified with a Vlasov flow. Extending this framework from interactions to collisions leads to a further development of structured architectures inspired by Lattice Boltzmann Models, while the flow perspective motivates a design based on self-warping.
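A concrete way to see tokens-as-a-measure is that self-attention is permutation-equivariant, so it is well defined on the empirical distribution of tokens rather than on any particular ordering. The sketch below checks this property numerically; all weight matrices, dimensions, and the inverse-temperature parameter are arbitrary illustrative choices, not constructions from the talk.

```python
import numpy as np

def self_attention_measure(X, W_q, W_k, W_v, beta=1.0):
    """One self-attention step acting on a token set X of shape (n, d).
    The update is permutation-equivariant, so it depends only on the
    empirical distribution of tokens, not on their order."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = beta * Q @ K.T
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)  # softmax over the token "measure"
    return A @ V

rng = np.random.default_rng(0)
d, n = 4, 6
W = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3)]
X = rng.standard_normal((n, d))

Y = self_attention_measure(X, *W)
perm = rng.permutation(n)
Y_perm = self_attention_measure(X[perm], *W)
# Permuting the tokens permutes the outputs identically: the layer
# factors through the empirical measure of the tokens.
assert np.allclose(Y[perm], Y_perm)
```

Because the layer factors through the empirical measure, it makes sense to ask how it behaves as a map between probability measures, which is where the Wasserstein-continuity analysis starts.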
Professor Maarten V. de Hoop, Simons Chair in Computational and Applied Mathematics and Earth Science at Rice University, is internationally recognized for his contributions to the mathematical foundations of seismology, wave propagation, and inverse problems. His research bridges microlocal and harmonic analysis, scattering theory, and structured numerical methods with applications to seismic imaging, geophysical inversion, and large-scale computational modeling of acoustic, elastic, and electromagnetic phenomena. De Hoop has been a pioneer in developing techniques to extract subtle information from massive, complex seismic datasets, advancing our ability to probe the Earth’s interior with unprecedented resolution, and more recently has integrated deep learning and data-driven discovery with rigorous mathematical frameworks to open new frontiers in the analysis of multiscale wave phenomena and inverse spectral problems. He is the recipient of the J. Clarence Karcher Award from the Society of Exploration Geophysicists and the Young Scientists Award from the International Society for Analysis, its Applications and Computation, has been elected a Fellow of the Institute of Physics and an External Member of the Finnish Academy of Science and Letters, and has served as associate editor for Inverse Problems, Inverse Problems and Imaging, and the International Journal on Geomathematics.
Incentivizing Emergent Behaviors for LLMs via Reinforcement Learning
Yi Wu, Tsinghua University
Reinforcement Learning (RL) has become a powerful post-training method for eliciting advanced behaviors in large language models (LLMs). This talk presents recent results showing how RL can incentivize the emergence of LLM capabilities across three domains: (1) the multi-player deduction game Werewolf, where RL-trained LLM agents develop strategic behaviors and outperform strong human players; (2) agentic search, where large-scale RL enables a 32B model to run multi-step search to answer non-trivial questions beyond commercial baselines; and (3) efficient reasoning, where RL mitigates over-thinking and improves both reliability and compute efficiency.
The papers can be found at
- Werewolf: https://arxiv.org/abs/2310.18940 (ICML 2024), https://arxiv.org/abs/2502.04686 (ICML 2025)
- ASearcher: https://arxiv.org/abs/2508.07976
- Thinking Efficiency: https://www.arxiv.org/abs/2506.07104 (NeurIPS 2025)
All the projects are trained using our large-scale agentic RL system, AReaL, which is open-source at https://github.com/
Yi Wu is an assistant professor at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University. He obtained his Ph.D. from UC Berkeley and was a researcher at OpenAI from 2019 to 2020. His research focuses on reinforcement learning, multi-agent learning, and LLM agents. His representative works include the value iteration network, the MADDPG/MAPPO algorithms, OpenAI’s hide-and-seek project, and the AReaL project. He received the Best Paper Award at NIPS 2016, was a finalist for the Best Demo Award at ICRA 2024, and received the MIT TR35 Asia Pacific 2025 award.
95 Percent: Bridging the Gap Between Prototype and Product
Jeremy Schwartz, Zoox
When transitioning from the academic world to the professional world of engineering, one of the most common pitfalls is failing to understand the difference between a compelling prototype and a successful product. This talk will focus on that distinction. We will discuss the differences between them, and the work required to evolve a good prototype into a real product. We will also discuss some common pitfalls encountered in product development, and some of the practical software design considerations to keep in mind for development of robust, mature code. The talk will include examples from my background developing robotic systems for air, space, and ground.
Jeremy Schwartz is a robotics engineer at Zoox with expertise in a wide variety of areas of mechanical and electrical engineering and computer science. His primary professional expertise is in autonomy and behavioral algorithms, and he has worked in the aerospace industry as well as ground robotics, specializing in autonomous systems of all kinds.
Certifiably Correct Machine Perception
David Rosen, Northeastern University
Many fundamental machine perception and state estimation tasks require the solution of a high-dimensional nonconvex estimation problem; this class includes (for example) the fundamental problems of simultaneous localization and mapping (in robotics), 3D reconstruction (in computer vision), and sensor network localization (in distributed sensing). Such problems are known to be computationally hard in general, with many local minima that can entrap the smooth local optimization methods commonly applied to solve them. The result is that standard machine perception algorithms (based upon local optimization) can be surprisingly brittle, often returning egregiously wrong answers even when the problem to which they are applied is well-posed.
In this talk, we present a novel class of certifiably correct estimation algorithms that are capable of efficiently recovering provably good (often globally optimal) solutions of generally-intractable machine perception problems in many practical settings. Our approach directly tackles the problem of nonconvexity by employing convex relaxations whose minimizers provide provably good approximate solutions to the original estimation problem under moderate measurement noise. We illustrate the design of this class of methods using the fundamental problem of pose-graph optimization (a mathematical abstraction of robotic mapping) as a running example. We conclude with a brief discussion of open questions and future research directions.
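To illustrate the relax-then-solve idea behind certifiable methods (this is a toy spectral relaxation for planar heading synchronization, not the talk's pose-graph algorithm), consider estimating absolute 2D headings from noisy relative rotations: the nonconvex problem over unit-complex variables relaxes to a leading-eigenvector computation whose solution is near-optimal under moderate noise. All problem sizes and noise levels are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
theta = rng.uniform(0, 2 * np.pi, n)            # ground-truth headings

# Noisy relative-rotation measurements theta_i - theta_j, as unit complex numbers.
H = np.zeros((n, n), dtype=complex)
for i in range(n):
    for j in range(i + 1, n):
        rel = theta[i] - theta[j] + 0.05 * rng.standard_normal()
        H[i, j] = np.exp(1j * rel)
        H[j, i] = np.conj(H[i, j])

# Relaxation: drop the unit-modulus constraints and take the leading
# eigenvector of the Hermitian measurement matrix (eigh sorts ascending).
_, V = np.linalg.eigh(H)
est = np.angle(V[:, -1])

# Solutions are only defined up to a global rotation; align on node 0.
offset = est[0] - theta[0]
err = np.angle(np.exp(1j * (est - theta - offset)))
assert np.max(np.abs(err)) < 0.2
```

The certifiable-correctness methods in the talk go further: they use semidefinite relaxations of problems such as pose-graph optimization, together with checkable conditions certifying that the relaxed solution is globally optimal for the original nonconvex problem.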
David M. Rosen is an Assistant Professor in the Departments of Electrical & Computer Engineering and Mathematics and the Khoury College of Computer Sciences (by courtesy) at Northeastern University, where he leads the Robust Autonomy Laboratory (NEURAL). Prior to joining Northeastern, he was a Research Scientist at Oculus Research (now Meta Reality Labs) from 2016 to 2018, and a Postdoctoral Associate at MIT’s Laboratory for Information and Decision Systems (LIDS) from 2018 to 2021. He holds the degrees of B.S. in Mathematics from the California Institute of Technology (2008), M.A. in Mathematics from the University of Texas at Austin (2010), and ScD in Computer Science from the Massachusetts Institute of Technology (2016).
He is broadly interested in the mathematical and algorithmic foundations of trustworthy machine perception, learning, and control. His work has been recognized with the IEEE Transactions on Robotics Best Paper Award (2024), an Honorable Mention for the IEEE Transactions on Robotics Best Paper Award (2021), a Best Student Paper Award at Robotics: Science and Systems (2020), a Best Paper Award at the International Workshop on the Algorithmic Foundations of Robotics (2016), and selection as an RSS Pioneer (2019).
AI Safety Theory: The Missing Middle Ground
Adam Oberman, McGill University
Over the past few years, the capabilities of generative artificial intelligence (AI) systems have advanced rapidly. Along with the benefits of AI, there is also a risk of harm. In order to benefit from AI while mitigating the risks, we need a grounded theoretical framework.
The current AI safety theory, which predates generative AI, is insufficient. Most theoretical AI safety results tend to reason absolutely: a system is “aligned” or “misaligned”, “honest” or “dishonest”. But in practice safety is probabilistic, not absolute. The missing middle ground is a quantitative or relative theory of safety — a way to reason formally about degrees of safety. Such a theory is required for defining safety and harms, and is essential for technical solutions as well as for making good policy decisions.
In this talk I will:
- Review current AI risks (from misuse, from lack of reliability, and systemic risks to the economy) as well as important future risks (lack of control).
- Review theoretical predictions of bad AI behavior and discuss experiments which demonstrate that they can occur in current LLMs.
- Explain why technical and theoretical safety solutions are valuable, even when they come from contributors outside the major labs.
- Discuss some gaps in the theory and present some open problems which could address the gaps.
Adam Oberman is a Full Professor of Mathematics and Statistics at McGill University, a Canada CIFAR AI Chair, and an Associate Member of Mila. He is a research collaborator at LawZero, Yoshua Bengio’s AI Safety Institute. He has been researching AI safety since 2024. His research spans generative models, reinforcement learning, optimization, calibration, and robustness. Earlier in his career, he made significant contributions to optimal transport and nonlinear partial differential equations. He earned degrees from the University of Toronto and the University of Chicago, and previously held faculty and postdoctoral positions at Simon Fraser University and the University of Texas at Austin.
High-dimensional Optimization with Applications to Compute-Optimal Neural Scaling Laws
Courtney Paquette (McGill University)
Given the massive scale of modern ML models, we now only get a single shot to train them effectively. This restricts our ability to test multiple architectures and hyper-parameter configurations. Instead, we need to understand how these models scale, allowing us to experiment with smaller problems and then apply those insights to larger-scale models. In this talk, I will present a framework for analyzing scaling laws in stochastic learning algorithms using a power-law random features model (PLRF), leveraging high-dimensional probability and random matrix theory. I will then use this scaling law to address the compute-optimal question: How should we choose model size and hyper-parameters to achieve the best possible performance in the most compute-efficient manner? Then using this PLRF model, I will devise a new momentum-based algorithm that (provably) improves the scaling law exponent. Finally, I will present some numerical experiments on LSTMs that show how this new stochastic algorithm can be applied to real data to improve the compute-optimal exponent.
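A minimal version of the power-law random features setup can be written in a few lines; the spectral decay exponents, step-size schedule, and dimensions below are arbitrary illustrative choices, not the talk's parameters. The point is only to show the object being analyzed: SGD on least squares whose feature covariance and target coefficients both decay as power laws, with the resulting loss curve being what the scaling-law analysis characterizes.

```python
import numpy as np

rng = np.random.default_rng(7)
d, a = 200, 0.6
lam = np.arange(1, d + 1, dtype=float) ** (-2 * a)   # power-law feature spectrum
w_star = np.arange(1, d + 1, dtype=float) ** (-1.0)  # power-law target decay

def loss(w):
    # Population least-squares loss under the diagonal covariance diag(lam).
    return 0.5 * np.sum(lam * (w - w_star) ** 2)

w, lr = np.zeros(d), 0.1
losses = [loss(w)]
for t in range(2000):
    x = rng.standard_normal(d) * np.sqrt(lam)        # one random feature draw
    g = (x @ (w - w_star)) * x                       # stochastic gradient
    w -= lr / (1 + 0.01 * t) * g                     # decaying step size
    losses.append(loss(w))

assert losses[-1] < 0.1 * losses[0]                  # loss has dropped sharply
```

In the talk's framework, the exponents governing how `losses` decays with compute are derived analytically via random matrix theory, which is what makes compute-optimal trade-offs between model size and training steps tractable.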
Courtney Paquette is an assistant professor at McGill University in the Mathematics and Statistics department, a CIFAR AI Chair (MILA), and an active member of the Montreal Machine Learning Optimization Group (MTL MLOpt) at MILA. Her research broadly focuses on designing and analyzing algorithms for large-scale optimization problems, motivated by applications in data science, and using techniques that draw from a variety of fields, including probability, complexity theory, and convex and nonsmooth analysis. Dr. Paquette is a lead organizer of the OPT-ML Workshop at NeurIPS since 2020, and a lead organizer (and original creator) of the High-dimensional Learning Dynamics (HiLD) Workshop at ICML.
A New Paradigm for Learning with Distribution Shift
Adam Klivans, UT Austin
We revisit the fundamental problem of learning with distribution shift, where a learner is given labeled samples from training distribution D, unlabeled samples from test distribution D′ and is asked to output a classifier with low test error. The standard approach in this setting is to prove a generalization bound in terms of some notion of distance between D and D′. These distances, however, are difficult to compute, and this has been the main stumbling block for efficient algorithm design over the last two decades.
We sidestep this issue and define a new model called TDS learning, where a learner runs a test on the training set and is allowed to reject if this test detects distribution shift relative to a fixed output classifier. This approach leads to the first set of efficient algorithms for learning with distribution shift that make no assumptions about the test distribution. Finally, we discuss how our techniques have recently been used to solve longstanding problems in supervised learning with contamination.
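The accept/reject interface can be sketched with a crude moment-comparison test; the actual TDS algorithms are considerably more subtle, so the function below (its name, tolerance, and choice of moments are all illustrative assumptions) only shows the shape of the model: run a cheap test, then either output a classifier or reject.

```python
import numpy as np

def tds_style_test(X_train, X_test, tol=0.15):
    """Toy distribution-shift check (illustrative only, not the talk's
    algorithm): compare low-degree moments of the training and test
    marginals and reject if they differ noticeably."""
    m1 = np.abs(X_train.mean(0) - X_test.mean(0)).max()
    c1 = np.abs(np.cov(X_train.T) - np.cov(X_test.T)).max()
    return bool(max(m1, c1) < tol)  # True = accept, False = reject

rng = np.random.default_rng(2)
D = rng.standard_normal((5000, 3))               # training distribution
D_same = rng.standard_normal((5000, 3))          # test ~ same distribution
D_shift = rng.standard_normal((5000, 3)) + 1.0   # mean-shifted test

assert tds_style_test(D, D_same)        # accept: no detectable shift
assert not tds_style_test(D, D_shift)   # reject: shift detected
```

Crucially, the guarantee in TDS learning is one-sided: whenever the test accepts, the output classifier must have low test error, while rejection is allowed only when genuine shift is present.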
Adam Klivans is a Professor of Computer Science at the University of Texas at Austin and Director of the NSF AI Institute for Foundations of Machine Learning (IFML). His research interests lie in machine learning and theoretical computer science, in particular, Learning Theory, Computational Complexity, Pseudorandomness, Limit Theorems, and Gaussian Space. Dr. Klivans is a recipient of the NSF CAREER Award and serves on the editorial boards of Theory of Computing and the Machine Learning journal.
Amplifying human performance in combinatorial competitive programming
Petar Veličković, Google DeepMind
Recent years have seen a significant surge in complex AI systems for competitive programming, capable of performing at admirable levels against human competitors. While steady progress has been made, the highest percentiles still remain out of reach for these methods on standard competition platforms such as Codeforces. In this talk, I will describe and dive into our recent work, where we focused on combinatorial competitive programming. In combinatorial challenges, the target is to find as-good-as-possible solutions to otherwise computationally intractable problems, over specific given inputs. We hypothesize that this scenario offers a unique testbed for human-AI synergy, as human programmers can write the backbone of a heuristic solution, after which AI can be used to optimize the scoring function used by the heuristic. We deploy our approach on previous iterations of Hash Code, a global team programming competition inspired by NP-hard software engineering problems at Google, and we leverage FunSearch to evolve our scoring functions. Our evolved solutions significantly improve the attained scores from their baseline, successfully breaking into the top percentile on all previous Hash Code online qualification rounds, and outperforming the top human teams on several. To the best of our knowledge, this is the first known AI-assisted top-tier result in competitive programming.
Foundational Methods for Foundation Models for Scientific Machine Learning
Michael W. Mahoney, LBNL and UC Berkeley
The remarkable successes of ChatGPT in natural language processing (NLP) and related developments in computer vision (CV) motivate the question of what foundation models would look like and what new advances they would enable, when built on the rich, diverse, multimodal data that are available from large-scale experimental and simulational data in scientific computing (SC), broadly defined. Such models could provide a robust and principled foundation for scientific machine learning (SciML), going well beyond simply using ML tools developed for internet and social media applications to help solve future scientific problems. I will describe recent work demonstrating the potential of the “pre-train and fine-tune” paradigm, widely-used in CV and NLP, for SciML problems, demonstrating a clear path towards building SciML foundation models; as well as recent work highlighting multiple “failure modes” that arise when trying to interface data-driven ML methodologies with domain-driven SC methodologies, demonstrating clear obstacles to traversing that path successfully. I will also describe initial work on developing novel methods to address several of these challenges, as well as their implementations at scale, a general solution to which will be needed to build robust and reliable SciML models consisting of millions or billions or trillions of parameters.
Michael W. Mahoney is at the University of California at Berkeley in the Department of Statistics and at the International Computer Science Institute (ICSI). He is also an Amazon Scholar as well as head of the Machine Learning and Analytics Group at the Lawrence Berkeley National Laboratory. He works on algorithmic and statistical aspects of modern large-scale data analysis. Much of his recent research has focused on large-scale machine learning, including randomized matrix algorithms and randomized numerical linear algebra, scientific machine learning, scalable stochastic optimization, geometric network analysis tools for structure extraction in large informatics graphs, scalable implicit regularization methods, computational methods for neural network analysis, physics informed machine learning, and applications in genetics, astronomy, medical imaging, social network analysis, and internet data analysis. He received his PhD from Yale University with a dissertation in computational statistical mechanics, and he has worked and taught at Yale University in the mathematics department, at Yahoo Research, and at Stanford University in the mathematics department. Among other things, he was on the national advisory committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI), he was on the National Research Council’s Committee on the Analysis of Massive Data, he co-organized the Simons Institute’s fall 2013 and 2018 programs on the foundations of data science, he ran the Park City Mathematics Institute’s 2016 PCMI Summer Session on The Mathematics of Data, he ran the biennial MMDS Workshops on Algorithms for Modern Massive Data Sets, and he was the Director of the NSF/TRIPODS-funded FODA (Foundations of Data Analysis) Institute at UC Berkeley. More information is available at https://www.stat.berkeley.edu/~mmahoney/.
Single location regression and attention-based models
Claire Boyer, Université Paris-Saclay
Attention-based models, such as Transformers, excel across various tasks but lack a comprehensive theoretical understanding, especially regarding token-wise sparsity and internal linear representations. To address this gap, we introduce the single-location regression task, where only one token in a sequence determines the output, and its position is a latent random variable, retrievable via a linear projection of the input. To solve this task, we propose a dedicated predictor, which turns out to be a simplified version of a non-linear self-attention layer. We study its theoretical properties by showing its asymptotic Bayes optimality and analyzing its training dynamics. In particular, despite the non-convex nature of the problem, the predictor effectively learns the underlying structure. This work highlights the capacity of attention mechanisms to handle sparse token information and internal linear structures.
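A toy instance of the task makes the setup concrete. The generative conventions below (planting the relevant token along a "key" direction orthogonal to the output direction, the softmax read-out, all sizes) are illustrative assumptions, not the paper's exact model; they show why a softmax over inner products with a key vector suffices to locate and read the single informative token.

```python
import numpy as np

rng = np.random.default_rng(3)
L, d, n = 10, 8, 2000

w = rng.standard_normal(d); w /= np.linalg.norm(w)   # output direction
k = rng.standard_normal(d)
k -= (k @ w) * w; k /= np.linalg.norm(k)             # "key" direction, orthogonal to w

X = rng.standard_normal((n, L, d))
pos = rng.integers(0, L, size=n)                     # latent relevant position
labels = np.sign(rng.standard_normal(n))
# Plant the signal: the relevant token aligns with k and encodes the label on w.
X[np.arange(n), pos] = 4.0 * k + labels[:, None] * w

# A simplified non-linear attention predictor: softmax over <x_t, k> picks
# out the relevant token; projecting the attended value onto w reads the label.
scores = X @ k
att = np.exp(scores - scores.max(axis=1, keepdims=True))
att /= att.sum(axis=1, keepdims=True)
pred = np.sign((att[:, :, None] * X).sum(axis=1) @ w)
assert (pred == labels).mean() > 0.9
```

The interesting question, addressed in the talk, is not whether such a predictor exists but whether gradient-based training recovers the latent directions despite the non-convexity.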
Synthetic Tasks as Testbeds for Attributing Model Behavior
Surbhi Goel, University of Pennsylvania
Understanding how different components of the machine learning pipeline—spanning data composition, architectural choices, and optimization dynamics—shape model behavior remains a fundamental challenge. In this talk, I will argue that synthetic tasks, which enable precise control over data distribution and task complexity, serve as powerful testbeds for analyzing and attributing behaviors in deep learning. Focusing on the sparse parity learning problem, a canonical task in learning theory, I will present insights into: (1) the phenomenon of “hidden progress” in gradient-based optimization, where models exhibit consistent advancement despite stagnating loss curves; (2) nuanced trade-offs between data, compute, model width, and initialization that govern learning success; and (3) the role of progressive distillation in implicitly structuring curricula to accelerate feature learning. These findings highlight the utility of synthetic tasks in uncovering empirical insights into the mechanisms driving deep learning, without the cost of training expensive models.
This talk is based on joint work with a lot of amazing collaborators: Boaz Barak, Ben Edelman, Sham Kakade, Bingbin Liu, Eran Malach, Sadhika Malladi, Abhishek Panigrahi, Andrej Risteski, and Cyril Zhang.
Surbhi Goel is the Magerman Term Assistant Professor of Computer and Information Science at the University of Pennsylvania. She is associated with the theory group, the ASSET Center on safe, explainable, and trustworthy AI systems, and the Warren Center for Network and Data Sciences. Surbhi’s research focuses on theoretical foundations of modern machine learning paradigms, particularly deep learning, and is supported by Microsoft Research and OpenAI. Previously, she was a postdoctoral researcher at Microsoft Research NYC and completed her Ph.D. at the University of Texas at Austin under Adam Klivans, receiving the UTCS Bert Kay Dissertation Award. She has also been a visiting researcher at IAS, Princeton, and the Simons Institute at UC Berkeley. Surbhi co-founded the Learning Theory Alliance (LeT‐All) and holds several leadership roles, including Office Hours co-chair for ICLR 2024 and co-treasurer for the Association for Computational Learning Theory.
Challenging Estimation Problems in Vehicle Autonomy
Rajesh Rajamani, University of Minnesota
This talk presents some interesting problems in estimation related to vehicle autonomy. First, a teleoperation application in which a remote operator can intervene to control an autonomous vehicle is considered. Fundamental challenges here include the need to design an effective teleoperation station, bandwidth and time-criticality constraints in wireless communication, and the need for a control system that can handle delays. A predictive display system that uses generative AI to estimate the current video display for the teleoperator from fusion of delayed camera and Lidar images is developed. By estimating trajectories of the ego vehicle and of other nearby vehicles on the road, realistic intermediate updates of the remote vehicle environment are used to compensate for delayed camera data. A different estimation application involving the driving of a vehicle with automated steering control on snow-covered and rural roads is considered next. Since camera-based feedback of lane markers cannot be used, sensor fusion algorithms and RTK-corrected GPS are utilized for lateral position estimation. Finally, the modification of target vehicle tracking methods utilized on autonomous vehicles for use on other low-cost platforms is considered. Applications involving protection of vulnerable road users such as e-scooter riders, bicyclists, and construction zone workers are demonstrated. The fundamental theme underlying the different estimation problems in this seminar is the effective use of nonlinear vehicle dynamic models and novel nonlinear observer design algorithms.
Rajesh Rajamani obtained his M.S. and Ph.D. degrees from the University of California at Berkeley and his B.Tech degree from the Indian Institute of Technology at Madras. He joined the faculty in Mechanical Engineering at the University of Minnesota in 1998 where he is currently the Benjamin Y.H. Liu-TSI Endowed Chair Professor and Associate Director (Research) of the Minnesota Robotics Institute. His active research interests include estimation, sensing and control for smart and autonomous systems.
Dr. Rajamani has co-authored over 190 journal papers and is a co-inventor on 20+ patents/patent applications. He is a Fellow of IEEE and ASME and has been a recipient of the CAREER award from the National Science Foundation, the O. Hugo Schuck Award from the American Automatic Control Council, the Ralph Teetor Award from SAE, the Charles Stark Draper award from ASME, and a number of best paper awards from journals and conferences.
Several inventions from his laboratory have been commercialized through start-up ventures co-founded by industry executives. One of these companies, Innotronics, was recently recognized among the 35 Best University Start-Ups of 2016 by the US National Council of Entrepreneurial Tech Transfer.
Unlearnable Facts Cause Hallucinations in Pretrained Language Models
Adam Tauman Kalai, OpenAI
Pretrained language models (LMs) tend to preserve many qualities present in their training data, such as grammaticality, formatting, and politeness. However, for specific types of factuality, even LMs pretrained on factually correct statements tend to produce falsehoods at high rates. We explain these “hallucinations” by drawing a connection to binary classification, enabling us to leverage insights from supervised learning. We prove that pretrained LMs (which are “calibrated”) fail to mimic criteria that cannot be learned. Our analysis explains why pretrained LMs hallucinate on facts such as people’s birthdays but not on systematic facts such as even vs. odd numbers.
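The statistical intuition can be sketched numerically (this is an illustration of the idea, not the paper's proof): arbitrary facts such as birthdays carry no learnable pattern, so a model only "knows" the ones it saw, and the fraction of training facts that appear exactly once (a Good-Turing-style quantity) estimates how much mass lies on facts for which the model has essentially no signal. The population sizes and Zipf-like mention distribution below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n_people, n_train = 10_000, 20_000

# Each person's birthday is an arbitrary, independent value: unlearnable.
birthdays = rng.integers(0, 365, size=n_people)

# The training corpus mentions people with a skewed (Zipf-like) frequency.
p = 1.0 / np.arange(1, n_people + 1)
p /= p.sum()
mentions = rng.choice(n_people, size=n_train, p=p)

counts = np.bincount(mentions, minlength=n_people)
monofact_rate = (counts == 1).sum() / n_train
# A calibrated LM cannot beat guessing on singleton facts, so its error
# rate on such queries is lower-bounded, roughly, by this rate.
assert 0.0 < monofact_rate < 1.0
```

Systematic facts (such as whether a number is even) behave differently: they are governed by a learnable rule, so no comparable lower bound applies.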
Of course, LM pretraining is only one stage in the development of a chatbot, and thus hallucinations are *not* inevitable in chatbots.
This is joint work with Santosh Vempala.
Adam Tauman Kalai is a Research Scientist at OpenAI working on AI Safety and Ethics. He has worked in Algorithms, Fairness, Machine Learning Theory, Game Theory, and Crowdsourcing. He received his PhD from Carnegie Mellon University. He has served as an Assistant Professor at Georgia Tech and TTIC, and is on the science team of the whale-translation Project CETI. He has co-chaired AI and crowdsourcing conferences and has numerous honors, most notably the Majulook prize.
How Transformers Learn Causal Structure with Gradient Descent
Jason Lee, Princeton University
The incredible success of transformers on sequence modeling tasks can be largely attributed to the self-attention mechanism, which allows information to be transferred between different parts of a sequence. Self-attention allows transformers to encode causal structure which makes them particularly suitable for sequence modeling. However, the process by which transformers learn such causal structure via gradient-based training algorithms remains poorly understood. To better understand this process, we introduce an in-context learning task that requires learning latent causal structure. We prove that gradient descent on a simplified two-layer transformer learns to solve this task by encoding the latent causal graph in the first attention layer. The key insight of our proof is that the gradient of the attention matrix encodes the mutual information between tokens. As a consequence of the data processing inequality, the largest entries of this gradient correspond to edges in the latent causal graph. As a special case, when the sequences are generated from in-context Markov chains, we prove that transformers learn an induction head (Olsson et al., 2022). We confirm our theoretical findings by showing that transformers trained on our in-context learning task are able to recover a wide variety of causal structures.
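The mutual-information insight can be checked empirically on the Markov-chain special case. In the sketch below (a toy illustration; the chain, its "sticky" transition matrix, and all sizes are assumptions of this example, not the paper's construction), tokens follow a Markov chain, so each position's causal parent is its predecessor, and by the data processing inequality the empirical mutual information with a fixed position peaks at its direct parent.

```python
import numpy as np

rng = np.random.default_rng(6)
V, T, N = 4, 6, 20000

P = np.full((V, V), 0.1)
np.fill_diagonal(P, 0.7)            # "sticky" Markov chain: stay w.p. 0.7
seqs = np.zeros((N, T), dtype=int)
seqs[:, 0] = rng.integers(0, V, N)
for t in range(1, T):
    u = rng.random(N)
    cdf = P[seqs[:, t - 1]].cumsum(axis=1)
    seqs[:, t] = np.minimum((u[:, None] > cdf).sum(axis=1), V - 1)

def mutual_info(a, b):
    # Plug-in estimate of mutual information (in nats) between two symbol arrays.
    joint = np.zeros((V, V))
    np.add.at(joint, (a, b), 1.0)
    joint /= joint.sum()
    pa, pb = joint.sum(1), joint.sum(0)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / np.outer(pa, pb)[nz])).sum())

# MI between position 3 and every earlier position: the direct parent
# (position 2) carries the most information, mirroring how the largest
# attention-gradient entries single out edges of the latent causal graph.
mis = [mutual_info(seqs[:, j], seqs[:, 3]) for j in range(3)]
assert int(np.argmax(mis)) == 2
```

In the talk's analysis, this is precisely the mechanism by which gradient descent wires the first attention layer to the causal graph, recovering an induction head in the Markov-chain case.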
Jason Lee is an associate professor in Electrical Engineering and Computer Science (secondary) at Princeton University. Prior to that, he was in the Data Science and Operations department at the University of Southern California and a postdoctoral researcher at UC Berkeley working with Michael I. Jordan. Jason received his PhD at Stanford University advised by Trevor Hastie and Jonathan Taylor. His research interests are in the theory of machine learning, optimization, and statistics. Lately, he has worked on the foundations of deep learning, representation learning, and reinforcement learning. He has received the Samsung AI Researcher of the Year Award, NSF Career Award, ONR Young Investigator Award in Mathematical Data Science, Sloan Research Fellowship, NeurIPS Best Student Paper Award and Finalist for the Best Paper Prize for Young Researchers in Continuous Optimization, and Princeton Commendation for Outstanding Teaching.