Synthetic Tasks as Testbeds for Attributing Model Behavior
Surbhi Goel, University of Pennsylvania
Understanding how different components of the machine learning pipeline—spanning data composition, architectural choices, and optimization dynamics—shape model behavior remains a fundamental challenge. In this talk, I will argue that synthetic tasks, which enable precise control over data distribution and task complexity, serve as powerful testbeds for analyzing and attributing behaviors in deep learning. Focusing on the sparse parity learning problem, a canonical task in learning theory, I will present insights into: (1) the phenomenon of “hidden progress” in gradient-based optimization, where models exhibit consistent advancement despite stagnating loss curves; (2) nuanced trade-offs between data, compute, model width, and initialization that govern learning success; and (3) the role of progressive distillation in implicitly structuring curricula to accelerate feature learning. These findings highlight the utility of synthetic tasks in uncovering empirical insights into the mechanisms driving deep learning, without the cost of training expensive models.
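To make the central testbed concrete, here is a minimal sketch of the (n, k)-sparse parity task mentioned in the abstract: the label is the XOR of k hidden coordinates of an n-bit input. The particular model, width, and training schedule below are illustrative assumptions, not the experimental setup from the talk; tracking held-out loss and accuracy over epochs is one simple way to look for the long plateau followed by a sharp jump.

```python
# Minimal sketch of the (n, k)-sparse parity task: the label is the XOR of k
# hidden coordinates of an n-bit input. Model and hyperparameters are
# illustrative assumptions, not the setup used in the talk.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
n, k, n_train, n_test = 30, 3, 20000, 2000
support = rng.choice(n, size=k, replace=False)   # hidden relevant coordinates

def sample(m):
    X = rng.integers(0, 2, size=(m, n))
    y = X[:, support].sum(axis=1) % 2            # parity of the k hidden bits
    return X.astype(np.float64), y

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)

model = MLPClassifier(hidden_layer_sizes=(512,), learning_rate_init=1e-2,
                      batch_size=256, random_state=0)

# Track test loss/accuracy across epochs; on sparse parity, accuracy often sits
# near chance for a long stretch before improving sharply.
for epoch in range(50):
    model.partial_fit(X_tr, y_tr, classes=[0, 1])
    p = model.predict_proba(X_te)
    print(epoch, round(log_loss(y_te, p), 4), round(model.score(X_te, y_te), 3))
```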
This talk is based on joint work with a lot of amazing collaborators: Boaz Barak, Ben Edelman, Sham Kakade, Bingbin Liu, Eran Malach, Sadhika Malladi, Abhishek Panigrahi, Andrej Risteski, and Cyril Zhang.
Surbhi Goel is the Magerman Term Assistant Professor of Computer and Information Science at the University of Pennsylvania. She is associated with the theory group, the ASSET Center on safe, explainable, and trustworthy AI systems, and the Warren Center for Network and Data Sciences. Surbhi’s research focuses on theoretical foundations of modern machine learning paradigms, particularly deep learning, and is supported by Microsoft Research and OpenAI. Previously, she was a postdoctoral researcher at Microsoft Research NYC and completed her Ph.D. at the University of Texas at Austin under Adam Klivans, receiving the UTCS Bert Kay Dissertation Award. She has also been a visiting researcher at IAS, Princeton, and the Simons Institute at UC Berkeley. Surbhi co-founded the Learning Theory Alliance (LeT‐All) and holds several leadership roles, including Office Hours co-chair for ICLR 2024 and co-treasurer for the Association for Computational Learning Theory.
Tutorial on AI Alignment (part 2 of 2): Methodologies for AI Alignment
Ahmad Beirami, Google DeepMind
Hamed Hassani, University of Pennsylvania
The second part of the tutorial focuses on AI alignment techniques and is structured as three segments. In the first segment, we examine black-box techniques aimed at aligning models towards various goals (e.g., safety), such as controlled decoding and the best-of-N algorithm. In the second segment, we turn to efficiency, examining information-theoretic techniques designed to improve inference latency, such as model compression and speculative decoding. If time permits, in the final segment we discuss inference-aware alignment, a framework for aligning models so that they work better with inference-time compute algorithms.
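As a concrete illustration of the best-of-N idea named above, here is a minimal sketch: sample N candidate responses from a base generator and return the one ranked highest by a reward model. The `generate` and `reward` functions are hypothetical placeholders, not an API from the tutorial.

```python
# Minimal sketch of best-of-N sampling: draw N candidates from a base policy and
# keep the one a reward model scores highest. `generate` and `reward` are
# hypothetical stand-ins for a real language model and a real reward model.
import random
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              reward: Callable[[str, str], float],
              n: int = 16) -> str:
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: reward(prompt, c))

# Toy stand-ins so the sketch runs end to end.
def toy_generate(prompt: str) -> str:
    return prompt + " -> " + "".join(random.choice("ab") for _ in range(5))

def toy_reward(prompt: str, completion: str) -> float:
    return completion.count("a")     # e.g., a safety or helpfulness score

print(best_of_n("hello", toy_generate, toy_reward, n=8))
```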
Tutorial on AI Alignment (part 1 of 2): Safety Vulnerabilities of Current Frontier Models
Ahmad Beirami, Google DeepMind
Hamed Hassani, University of Pennsylvania
In recent years, large language models have been used to solve a multitude of natural language tasks. In the first part of the tutorial, we start by giving a brief overview of the history of language modeling and the fundamental techniques that led to the development of the modern language models behind Claude, Gemini, GPT, and Llama. We then dive into the safety failure modes of the current frontier models. Specifically, we will explain that, despite efforts to align large language models (LLMs) with human intentions, popular LLMs are susceptible to jailbreaking attacks, wherein an adversary fools a targeted LLM into generating objectionable content. We review the current state of the jailbreaking literature, including new questions about robust generalization, discussions of open-box and black-box attacks on LLMs, defenses against jailbreaking attacks, and a new leaderboard to evaluate the robust generalization of production LLMs.
The focus of the first session will be mostly on safety vulnerabilities of the frontier LLMs. In the second session, we will focus on the current methodologies that aim to mitigate these vulnerabilities and more generally align language models with human standards.
Challenging Estimation Problems in Vehicle Autonomy
Rajesh Rajamani, University of Minnesota
This talk presents some interesting problems in estimation related to vehicle autonomy. First, a teleoperation application in which a remote operator can intervene to control an autonomous vehicle is considered. Fundamental challenges here include the need to design an effective teleoperation station, bandwidth and time-criticality constraints in wireless communication, and the need for a control system that can handle delays. A predictive display system is developed that uses generative AI to estimate the current video display for the teleoperator from the fusion of delayed camera and Lidar images. By estimating trajectories of the ego vehicle and of other nearby vehicles on the road, realistic intermediate updates of the remote vehicle environment are used to compensate for delayed camera data. A different estimation application, involving the driving of a vehicle with automated steering control on snow-covered and rural roads, is considered next. Since camera-based feedback of lane markers cannot be used, sensor fusion algorithms and RTK-corrected GPS are utilized for lateral position estimation. Finally, the modification of target vehicle tracking methods utilized on autonomous vehicles for use on other low-cost platforms is considered. Applications involving the protection of vulnerable road users such as e-scooter riders, bicyclists, and construction zone workers are demonstrated. The fundamental theme underlying the different estimation problems in this seminar is the effective use of nonlinear vehicle dynamic models and novel nonlinear observer design algorithms.
Rajesh Rajamani obtained his M.S. and Ph.D. degrees from the University of California at Berkeley and his B.Tech degree from the Indian Institute of Technology at Madras. He joined the faculty in Mechanical Engineering at the University of Minnesota in 1998 where he is currently the Benjamin Y.H. Liu-TSI Endowed Chair Professor and Associate Director (Research) of the Minnesota Robotics Institute. His active research interests include estimation, sensing and control for smart and autonomous systems.
Dr. Rajamani has co-authored over 190 journal papers and is a co-inventor on 20+ patents/patent applications. He is a Fellow of IEEE and ASME and has been a recipient of the CAREER award from the National Science Foundation, the O. Hugo Schuck Award from the American Automatic Control Council, the Ralph Teetor Award from SAE, the Charles Stark Draper award from ASME, and a number of best paper awards from journals and conferences.
Several inventions from his laboratory have been commercialized through start-up ventures co-founded by industry executives. One of these companies, Innotronics, was recently recognized among the 35 Best University Start-Ups of 2016 by the US National Council of Entrepreneurial Tech Transfer.
Unlearnable Facts Cause Hallucinations in Pretrained Language Models
Adam Tauman Kalai, OpenAI
Pretrained language models (LMs) tend to preserve many qualities present in their training data, such as grammaticality, formatting, and politeness. However, for specific types of factuality, even LMs pretrained on factually correct statements tend to produce falsehoods at high rates. We explain these “hallucinations” by drawing a connection to binary classification, enabling us to leverage insights from supervised learning. We prove that pretrained LMs (which are “calibrated”) fail to mimic criteria that cannot be learned. Our analysis explains why pretrained LMs hallucinate on facts such as people’s birthdays but not on systematic facts such as even vs. odd numbers.
Of course, LM pretraining is only one stage in the development of a chatbot, and thus hallucinations are *not* inevitable in chatbots.
This is joint work with Santosh Vempala.
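To illustrate the learnable/unlearnable distinction drawn above, here is a small synthetic sketch (not the paper's construction): a simple classifier generalizes a systematic criterion (even vs. odd) from examples, but cannot generalize independently random "birthday-like" facts to unseen individuals. The features, labels, and model are arbitrary stand-ins chosen only to make the contrast visible.

```python
# Small illustration of the distinction in the abstract: a systematic fact
# (is this number even?) generalizes from examples, while an arbitrary fact
# (this person's birth month) does not. Data and model are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Systematic facts: parity of an integer, encoded by its binary digits.
nums = rng.integers(0, 10**6, size=4000)
X_sys = np.array([[(x >> b) & 1 for b in range(20)] for x in nums])
y_sys = nums % 2
clf = LogisticRegression(max_iter=1000).fit(X_sys[:2000], y_sys[:2000])
print("even/odd, held-out accuracy:", clf.score(X_sys[2000:], y_sys[2000:]))

# Arbitrary facts: each "person" gets an independently random label, so the
# features carry no information about unseen people's birthdays.
people = rng.integers(0, 2, size=(4000, 20))         # feature vectors of "names"
born_in_jan = rng.integers(0, 2, size=4000)          # independent of the features
clf = LogisticRegression(max_iter=1000).fit(people[:2000], born_in_jan[:2000])
print("birthday, held-out accuracy:", clf.score(people[2000:], born_in_jan[2000:]))
```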
Adam Tauman Kalai is a Research Scientist at OpenAI working on AI Safety and Ethics. He has worked in Algorithms, Fairness, Machine Learning Theory, Game Theory, and Crowdsourcing. He received his PhD from Carnegie Mellon University. He has served as an Assistant Professor at Georgia Tech and TTIC, and is on the science team of the whale-translation Project CETI. He has co-chaired AI and crowdsourcing conferences and has numerous honors, most notably the Majulook prize.
How Transformers Learn Causal Structure with Gradient Descent
Jason Lee, Princeton University
The incredible success of transformers on sequence modeling tasks can be largely attributed to the self-attention mechanism, which allows information to be transferred between different parts of a sequence. Self-attention allows transformers to encode causal structure which makes them particularly suitable for sequence modeling. However, the process by which transformers learn such causal structure via gradient-based training algorithms remains poorly understood. To better understand this process, we introduce an in-context learning task that requires learning latent causal structure. We prove that gradient descent on a simplified two-layer transformer learns to solve this task by encoding the latent causal graph in the first attention layer. The key insight of our proof is that the gradient of the attention matrix encodes the mutual information between tokens. As a consequence of the data processing inequality, the largest entries of this gradient correspond to edges in the latent causal graph. As a special case, when the sequences are generated from in-context Markov chains, we prove that transformers learn an induction head (Olsson et al., 2022). We confirm our theoretical findings by showing that transformers trained on our in-context learning task are able to recover a wide variety of causal structures.
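To give a flavor of the special case mentioned at the end, here is a minimal sketch of data generated from in-context Markov chains, the setting in which an induction head suffices: each sequence is generated by its own freshly drawn transition matrix, so the statistics must be inferred from the context itself. The vocabulary size, sequence length, and Dirichlet prior are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch of in-context Markov chain data: every sequence has its own
# random transition matrix, so transition statistics must be estimated from the
# context. Vocabulary size, length, and the Dirichlet prior are illustrative.
import numpy as np

rng = np.random.default_rng(0)
vocab, seq_len = 5, 64

def sample_sequence():
    P = rng.dirichlet(np.ones(vocab), size=vocab)   # per-sequence transition matrix
    x = [rng.integers(vocab)]
    for _ in range(seq_len - 1):
        x.append(rng.choice(vocab, p=P[x[-1]]))
    return np.array(x), P

# An induction-head-style predictor: predict the next token by looking at what
# followed earlier occurrences of the current token within the same context.
def induction_predict(x):
    cur = x[-1]
    followers = [x[i + 1] for i in range(len(x) - 1) if x[i] == cur]
    return np.bincount(followers, minlength=vocab).argmax() if followers else cur

x, P = sample_sequence()
print("context tail:", x[-10:], "prediction:", induction_predict(x))
```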
Jason Lee is an associate professor in Electrical and Computer Engineering, with a secondary appointment in Computer Science, at Princeton University. Prior to that, he was in the Data Science and Operations department at the University of Southern California and a postdoctoral researcher at UC Berkeley working with Michael I. Jordan. Jason received his PhD at Stanford University advised by Trevor Hastie and Jonathan Taylor. His research interests are in the theory of machine learning, optimization, and statistics. Lately, he has worked on the foundations of deep learning, representation learning, and reinforcement learning. He has received the Samsung AI Researcher of the Year Award, an NSF CAREER Award, an ONR Young Investigator Award in Mathematical Data Science, a Sloan Research Fellowship, a NeurIPS Best Student Paper Award, a finalist spot for the Best Paper Prize for Young Researchers in Continuous Optimization, and a Princeton Commendation for Outstanding Teaching.
Off-the-shelf Algorithmic Stability
Rebecca Willett, University of Chicago
Algorithmic stability holds when our conclusions, estimates, fitted models, predictions, or decisions are insensitive to small changes to the training data. Stability has emerged as a core principle for reliable data science, providing insights into generalization, cross-validation, uncertainty quantification, and more. Whereas prior literature has developed mathematical tools for analyzing the stability of specific machine learning (ML) algorithms, we study methods that can be applied to arbitrary learning algorithms to satisfy a desired level of stability. First, I will discuss how bagging is guaranteed to stabilize any prediction model, regardless of the input data. Thus, if we remove or replace a small fraction of the training data at random, the resulting prediction will typically change very little. Our analysis provides insight into how the size of the bags (bootstrap datasets) influences stability, giving practitioners a new tool for guaranteeing a desired level of stability. Second, I will describe how to extend these stability guarantees beyond prediction modeling to more general statistical estimation problems where bagging is not as well known but equally useful for stability. Specifically, I will describe a new framework for stable classification and model selection by combining bagging on class or model weights with a stable, “soft” version of the argmax operator.
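The two ingredients described above can be sketched in a few lines: bag an arbitrary base learner over bootstrap resamples, then aggregate the per-class votes with a softmax ("soft" argmax) rather than a hard argmax. The base learner, number of bags, bag size, and temperature below are illustrative choices, not the recommended settings from the talk.

```python
# Minimal sketch of off-the-shelf stabilization: bag an arbitrary learner and
# aggregate class votes with a "soft" argmax (softmax) instead of a hard one.
# Base learner, number of bags, bag size, and temperature are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def bagged_soft_classifier(X, y, X_query, n_classes, n_bags=100, bag_frac=0.5, temp=0.1):
    votes = np.zeros((len(X_query), n_classes))
    m = int(bag_frac * len(X))
    for _ in range(n_bags):
        idx = rng.choice(len(X), size=m, replace=True)     # bootstrap bag
        clf = DecisionTreeClassifier().fit(X[idx], y[idx])
        preds = clf.predict(X_query)
        votes[np.arange(len(X_query)), preds] += 1.0 / n_bags
    # Soft argmax: a small change to the training data moves these weights only
    # slightly, whereas a hard argmax could flip the predicted class outright.
    w = np.exp(votes / temp)
    return w / w.sum(axis=1, keepdims=True)

X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=500) > 0).astype(int)
print(bagged_soft_classifier(X, y, X[:3], n_classes=2).round(3))
```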
Rebecca Willett is a Professor of Statistics and Computer Science and the Director of AI in the Data Science Institute at the University of Chicago, and she holds a courtesy appointment at the Toyota Technological Institute at Chicago. Her research is focused on machine learning foundations, scientific machine learning, and signal processing. Willett received the inaugural Data Science Career Prize from the Society of Industrial and Applied Mathematics in 2024, was named a Fellow of the Society of Industrial and Applied Mathematics in 2021, and was named a Fellow of the IEEE in 2022. She is the Deputy Director for Research at the NSF-Simons Foundation National Institute for Theory and Mathematics in Biology, Deputy Director for Research at the NSF-Simons Institute for AI in the Sky (SkAI), and a member of the NSF Institute for the Foundations of Data Science Executive Committee. She is the Faculty Director of the Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship. She helps direct the Air Force Research Lab University Center of Excellence on Machine Learning. She received the National Science Foundation CAREER Award in 2007, was a DARPA Computer Science Study Group member, and received an Air Force Office of Scientific Research Young Investigator Program award in 2010. She completed her PhD in Electrical and Computer Engineering at Rice University in 2005. She was an Assistant and then tenured Associate Professor of Electrical and Computer Engineering at Duke University from 2005 to 2013. She was an Associate Professor of Electrical and Computer Engineering, Harvey D. Spangler Faculty Scholar, and Fellow of the Wisconsin Institutes for Discovery at the University of Wisconsin-Madison from 2013 to 2018.
What Kinds of Functions do Neural Networks Learn? Theory and Practical Applications
Robert Nowak, University of Wisconsin
This talk presents a theory characterizing the types of functions neural networks learn from data. Specifically, the function space generated by deep ReLU networks consists of compositions of functions from the Banach space of second-order bounded variation in the Radon transform domain. This Banach space includes functions with smooth projections in most directions. A representer theorem associated with this space demonstrates that finite-width neural networks suffice for fitting finite datasets. The theory has several practical applications. First, it provides a simple and theoretically grounded method for network compression. Second, it shows that multi-task training can yield significantly different solutions compared to single-task training, and that multi-task solutions can be related to kernel ridge regressions. Third, the theory has implications for improving implicit neural representations, where multi-layer neural networks are used to represent continuous signals, images, or 3D scenes. This exploration bridges theoretical insights with practical advancements, offering a new perspective on neural network capabilities and future research directions.
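For readers who want the flavor of the representer theorem mentioned above, the shallow (single-hidden-layer) case can be stated roughly as follows. This is a paraphrase of the Radon-domain bounded-variation line of work; the exact norm, regularity conditions, and width bound should be checked against the papers.

```latex
% Rough sketch of the shallow case: fitting N data points with second-order
% Radon-domain bounded-variation (RBV^2) regularization admits a finite-width
% ReLU network as a solution. Exact norms and conditions are in the papers.
\[
\min_{f \in \mathrm{RBV}^2(\mathbb{R}^d)} \;
\sum_{i=1}^{N} \ell\bigl(y_i, f(x_i)\bigr) \;+\; \lambda \, \|f\|_{\mathrm{RBV}^2}
\]
% admits a minimizer of the form
\[
f(x) \;=\; \sum_{k=1}^{K} v_k \,\bigl(\langle w_k, x \rangle - b_k\bigr)_{+}
\;+\; \langle c, x \rangle + c_0, \qquad K \le N,
\]
% i.e., a single-hidden-layer ReLU network plus an affine term; the deep case
% corresponds to compositions of such functions, as described in the abstract.
```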
Robert Nowak is the Grace Wahba Professor of Data Science and Keith and Jane Nosbusch Professor in Electrical and Computer Engineering at the University of Wisconsin-Madison. His research focuses on machine learning, optimization, and signal processing. He serves on the editorial boards of the SIAM Journal on the Mathematics of Data Science and the IEEE Journal on Selected Areas in Information Theory.
Large Datasets and Models for Robots in the Real World
Nicklas Hansen, UC San Diego
Recent progress in AI can be attributed to the emergence of large models trained on large datasets. However, teaching AI agents to reliably interact with our physical world has proven challenging, in part due to a lack of large and sufficiently diverse robot datasets. In this talk, I will cover ongoing efforts of the Open X-Embodiment project, a collaboration among 279 researchers across 20+ institutions, to build a large, open dataset for real-world robotics, and discuss how this new paradigm is rapidly changing the field. Concretely, I will discuss why we need large datasets in robotics, what such datasets may look like, and how large models can be trained and evaluated effectively in a cross-embodiment, cross-environment setting. Finally, I will conclude the talk by sharing my perspective on the limitations of current embodied AI agents, as well as how to move forward as a community.
Nicklas Hansen is a Ph.D. student at the University of California San Diego, advised by Prof. Xiaolong Wang and Prof. Hao Su. His research focuses on developing generalist AI agents that learn from interaction with the physical and digital world. He has spent time at Meta AI (FAIR) and UC Berkeley (BAIR), and received his B.S. and M.S. degrees from the Technical University of Denmark. He is a recipient of the 2024 NVIDIA Graduate Fellowship, and his work has been featured at top venues in machine learning and robotics.
Transformers Learn In-context by (Functional) Gradient Descent
Xiang Cheng, TILOS Postdoctoral Scholar at MIT
Motivated by the in-context learning phenomenon, we investigate how the Transformer neural network can implement learning algorithms in its forward pass. We show that a linear Transformer naturally learns to implement gradient descent, which enables it to learn linear functions in-context. More generally, we show that a non-linear Transformer can implement functional gradient descent with respect to some RKHS metric, which allows it to learn a broad class of functions in-context. Additionally, we show that the RKHS metric is determined by the choice of attention activation, and that the optimal choice of attention activation depends in a natural way on the class of functions that need to be learned. I will end by discussing some implications of our results for the choice and design of Transformer architectures.
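The linear-Transformer case has a particularly clean form: with a suitable choice of parameters, one layer of linear attention over the context pairs (x_i, y_i) produces the same prediction as one gradient-descent step on the in-context least-squares loss starting from zero. The sketch below checks this equivalence numerically; the specific construction is an illustration in the spirit of the result, not code from the talk.

```python
# Numerical check of the idea that one linear-attention layer can implement one
# gradient-descent step on the in-context least-squares objective
#   L(w) = (1/2) * sum_i (w^T x_i - y_i)^2,   starting from w = 0.
# The construction below is illustrative, not the paper's exact parameterization.
import numpy as np

rng = np.random.default_rng(0)
d, n_ctx, eta = 4, 16, 0.05

w_star = rng.normal(size=d)
X = rng.normal(size=(n_ctx, d))       # context inputs
y = X @ w_star                        # context labels
x_q = rng.normal(size=d)              # query input

# One GD step from w = 0 gives w_1 = eta * sum_i y_i x_i; predict w_1^T x_q.
w_1 = eta * X.T @ y
pred_gd = w_1 @ x_q

# "Linear attention" view: the query attends to context tokens with unnormalized
# scores <x_q, x_i> and aggregates their labels.
scores = X @ x_q
pred_attn = eta * scores @ y

print(pred_gd, pred_attn)             # identical up to floating point
```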
How Large Models of Language and Vision Help Agents to Learn to Behave
Roy Fox, Assistant Professor and Director of the Intelligent Dynamics Lab, UC Irvine
If learning from data is valuable, can learning from big data be very valuable? So far, it has been so in vision and language, for which foundation models can be trained on web-scale data to support a plethora of downstream tasks; not so much in control, for which scalable learning remains elusive. Can information encoded in vision and language models guide reinforcement learning of control policies? In this talk, I will discuss several ways for foundation models to help agents to learn to behave. Language models can provide better context for decision-making: we will see how they can succinctly describe the world state to focus the agent on relevant features; and how they can form generalizable skills that identify key subgoals. Vision and vision–language models can help the agent to model the world: we will see how they can block visual distractions to keep state representations task-relevant; and how they can hypothesize about abstract world models that guide exploration and planning.
Roy Fox is an Assistant Professor of Computer Science at the University of California, Irvine. His research interests include theory and applications of control learning: reinforcement learning (RL), control theory, information theory, and robotics. His current research focuses on structured and model-based RL, language for RL and RL for language, and optimization in deep control learning of virtual and physical agents.
The Synergy between Machine Learning and the Natural Sciences
Max Welling, Research Chair in Machine Learning, University of Amsterdam
Traditionally, machine learning has been heavily influenced by neuroscience (hence the name artificial neural networks) and physics (e.g., MCMC, belief propagation, and diffusion-based generative AI). We have recently witnessed the flow of information reverse, with new tools developed in the ML community impacting physics, chemistry, and biology. Examples include faster DFT, force-field-accelerated MD simulations, neural PDE surrogate models, the generation of drug-like molecules, and many more. In this talk I will review the exciting opportunities for further cross-fertilization between these fields, ranging from faster (classical) DFT calculations and enhanced transition path sampling to traveling waves in artificial neural networks.
Prof. Max Welling is a research chair in Machine Learning at the University of Amsterdam and a Distinguished Scientist at MSR. He is a fellow at the Canadian Institute for Advanced Research (CIFAR) and the European Lab for Learning and Intelligent Systems (ELLIS), where he also serves on the founding board. His previous appointments include VP at Qualcomm Technologies, professor at UC Irvine, postdoc at the University of Toronto and UCL under the supervision of Prof. Geoffrey Hinton, and postdoc at Caltech under the supervision of Prof. Pietro Perona. He finished his PhD in theoretical high-energy physics under the supervision of Nobel laureate Prof. Gerard ’t Hooft.
TILOS Webinar: AI Ethics in Research
Dr. Nisheeth Vishnoi (Yale) and Dr. David Danks (UC San Diego) discuss their research in the ethics of AI. Professor Danks develops practical frameworks and methods to incorporate ethical and policy considerations throughout the AI lifecycle, including different ways to include them in optimization steps. Bias and fairness have been a particular focus given the multiple ways in which they can be measured, represented, and used. Professor Vishnoi uses optimization as a lens to study how subjective human and societal biases emerge in the objective world of artificial algorithms, as well as how to design strategies to mitigate these biases.
Nisheeth Vishnoi is the A. Bartlett Giamatti Professor of Computer Science and a co-founder of the Computation and Society Initiative at Yale University. He studies the foundations of computation, and his research spans several areas of theoretical computer science, optimization, and machine learning. He is also interested in understanding nature and society from a computational viewpoint. Here, his current focus includes understanding the emergence of intelligence and developing methods to address ethical issues at the interface of artificial intelligence and humanity.
David Danks is Professor of Data Science and Philosophy and affiliate faculty in Computer Science and Engineering at University of California, San Diego. His research interests range widely across philosophy, cognitive science, and machine learning, including their intersection. Danks has examined the ethical, psychological, and policy issues around AI and robotics across multiple sectors, including transportation, healthcare, privacy, and security. He has also done significant research in computational cognitive science and developed multiple novel causal discovery algorithms for complex types of observational and experimental data. Danks is the recipient of a James S. McDonnell Foundation Scholar Award, as well as an Andrew Carnegie Fellowship. He currently serves on multiple advisory boards, including the National AI Advisory Committee.
TILOS Seminar: The Dissimilarity Dimension: Sharper Bounds for Optimistic Algorithms
Aldo Pacchiano, Assistant Professor, Boston University Center for Computing and Data Sciences
The principle of Optimism in the Face of Uncertainty (OFU) is one of the foundational algorithmic design choices in Reinforcement Learning and Bandits. Optimistic algorithms balance exploration and exploitation by deploying data collection strategies that maximize expected rewards in plausible models; this is the basis of celebrated algorithms like the Upper Confidence Bound (UCB) algorithm for multi-armed bandits. For nearly a decade, the analysis of optimistic algorithms with rich reward function classes, including Optimistic Least Squares, has relied on the concept of the eluder dimension, introduced by Russo and Van Roy in 2013. In this talk we shed light on the limitations of the eluder dimension in capturing the true behavior of optimistic strategies in the realm of function approximation, and we address them by introducing a novel statistical measure, the “dissimilarity dimension”. We show it can be used to provide a sharper regret analysis of algorithms like Optimistic Least Squares by establishing a link between regret and the dissimilarity dimension. To illustrate this, we will show that some function classes have arbitrarily large eluder dimension but constant dissimilarity dimension. Our regret analysis draws inspiration from graph theory and may be of interest to the mathematically minded beyond the field of statistical learning theory. This talk sheds new light on the fundamental principle of optimism and its algorithms in the function approximation regime, advancing our understanding of these concepts.
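Since the abstract names UCB as the canonical optimistic algorithm, here is a minimal sketch of UCB1 for stochastic multi-armed bandits: play the arm with the highest optimistic estimate (empirical mean plus a confidence bonus). The Bernoulli environment and the exploration constant are illustrative.

```python
# Minimal sketch of UCB1: play the arm with the highest optimistic estimate
# (empirical mean plus a confidence bonus). Environment and constants are
# illustrative.
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.2, 0.5, 0.7])     # true Bernoulli arm means (unknown to the learner)
K, T = len(means), 5000

counts = np.zeros(K)
sums = np.zeros(K)
for t in range(1, T + 1):
    if t <= K:
        arm = t - 1                   # play each arm once to initialize
    else:
        bonus = np.sqrt(2.0 * np.log(t) / counts)
        arm = int(np.argmax(sums / counts + bonus))   # optimism in the face of uncertainty
    reward = float(rng.random() < means[arm])
    counts[arm] += 1
    sums[arm] += reward

pseudo_regret = T * means.max() - (counts * means).sum()
print("pulls per arm:", counts.astype(int), "cumulative pseudo-regret:", round(pseudo_regret, 1))
```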
TILOS Seminar: Building Personalized Decision Models with Federated Human Preferences
Aadirupa Saha, Research Scientist at Apple
Customer statistics collected in several real-world systems reflect that users often prefer to express their liking for a given pair of items, say (A, B), through relative queries like “Do you prefer item A over B?” rather than their absolute counterparts, “How much do you score items A and B on a scale of [0-10]?”. Drawing inspiration from this, the search for a more effective feedback collection mechanism led to the well-known formulation of Dueling Bandits (DB), a widely studied online learning framework for efficient information aggregation from relative/comparative feedback. However, despite this novel objective, most existing DB techniques are unfortunately limited to the simpler settings of finite decision spaces and stochastic environments, which are unrealistic in practice.
In this talk, we will start with the basic problem formulations for DB and familiarize ourselves with some of the breakthrough results. Following this, we will dive deeper into the more practical framework of contextual dueling bandits (C-DB), where the goal of the learner is to make personalized predictions based on user contexts. We will see a new algorithmic approach that efficiently achieves the optimal O(√T) regret for this problem, resolving an open problem from Dudík et al. [COLT, 2015]. In the last part of the talk, we will extend these models to a federated framework, which entails developing preference-driven prediction models in distributed environments to build large-scale personalized systems, including recommender systems and chatbot interactions. Beyond exploiting the limited preference feedback model, the challenge lies in ensuring user privacy and reducing communication complexity in the federated setting. We will conclude the talk with some interesting open problems.
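To make the relative-feedback model concrete, here is a minimal sketch of the dueling-bandit interaction loop: the learner queries a pair of items and observes only which one the user preferred. The simple explore-then-commit strategy and the Bradley-Terry-style preference model below are only illustrations of the feedback model, not the algorithm or setting from the talk.

```python
# Minimal sketch of the dueling-bandit feedback model: the learner queries a pair
# (i, j) and only observes which of the two was preferred. The explore-then-commit
# strategy below illustrates the feedback model, not the algorithm from the talk.
import numpy as np

rng = np.random.default_rng(0)
utility = np.array([0.1, 0.4, 0.8, 0.3])          # hidden item qualities
K = len(utility)

def duel(i, j):
    # Probability that i beats j under a Bradley-Terry-style preference model.
    p = 1.0 / (1.0 + np.exp(-(utility[i] - utility[j])))
    return 1 if rng.random() < p else 0

wins = np.zeros((K, K))
plays = np.zeros((K, K))
for _ in range(2000):                              # exploration phase
    i, j = rng.choice(K, size=2, replace=False)
    out = duel(i, j)
    wins[i, j] += out
    wins[j, i] += 1 - out
    plays[i, j] += 1
    plays[j, i] += 1

borda = (wins / np.maximum(plays, 1)).mean(axis=1) # average win rate per item
print("estimated Borda scores:", borda.round(2), "-> commit to item", int(borda.argmax()))
```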
Towards Foundation Models for Graph Reasoning and AI 4 Science
Michael Galkin, Research Scientist at Intel AI Lab
Foundation models in graph learning are hard to design due to the lack of common invariances that transfer across different structures and domains. In this talk, I will give an overview of the two main tracks of my research at Intel AI: creating foundation models for knowledge graph reasoning that can run zero-shot inference on any multi-relational graph, and foundation models for materials discovery in the AI4Science domain that capture physical properties of crystal structures and transfer to a variety of predictive and generative tasks. We will also talk about theoretical and practical challenges like scaling behavior, data scarcity, and diverse evaluation of foundation graph models.
Michael Galkin is a Research Scientist at Intel AI Lab in San Diego working on Graph Machine Learning and Geometric Deep Learning. Previously, he was a postdoc at Mila–Quebec AI Institute with Will Hamilton, Reihaneh Rabbany, and Jian Tang, focusing on many graph representation learning problems. Sometimes, Mike writes long blog posts on Medium about graph learning.
TILOS Fireside Chat: Theory in the Age of Modern AI
A conversation about theory in the age of modern artificial intelligence (AI) with TILOS member panelists Nisheeth Vishnoi, Tara Javidi, Misha Belkin, and Arya Mazumdar (moderator).
TILOS AI Ethics Panel
Panelists Dr. Nisheeth Vishnoi (Yale), Dr. David Danks (UC San Diego), and Dr. Hoda Heidari (Carnegie Mellon University) discuss a variety of aspects of the ethics of AI with our moderators Dr. Stefanie Jegelka (MIT) and Dr. Jodi Reeves (National University).
Nisheeth Vishnoi is the A. Bartlett Giamatti Professor of Computer Science and a co-founder of the Computation and Society Initiative at Yale University. He studies the foundations of computation, and his research spans several areas of theoretical computer science, optimization, and machine learning. He is also interested in understanding nature and society from a computational viewpoint. Here, his current focus includes understanding the emergence of intelligence and developing methods to address ethical issues at the interface of artificial intelligence and humanity.
David Danks is Professor of Data Science & Philosophy and affiliate faculty in Computer Science & Engineering at University of California, San Diego. His research interests range widely across philosophy, cognitive science, and machine learning, including their intersection. Danks has examined the ethical, psychological, and policy issues around AI and robotics across multiple sectors, including transportation, healthcare, privacy, and security. He has also done significant research in computational cognitive science and developed multiple novel causal discovery algorithms for complex types of observational and experimental data. Danks is the recipient of a James S. McDonnell Foundation Scholar Award, as well as an Andrew Carnegie Fellowship. He currently serves on multiple advisory boards, including the National AI Advisory Committee.
Hoda Heidari is an Assistant Professor in Machine Learning and Societal Computing at the School of Computer Science, Carnegie Mellon University. Her research is broadly concerned with the social, ethical, and economic implications of Artificial Intelligence. In particular, her research addresses issues of unfairness and accountability through Machine Learning. Her work in this area has won a best-paper award at the ACM Conference on Fairness, Accountability, and Transparency (FAccT) and an exemplary track award at the ACM Conference on Economics and Computation (EC). She has organized several scholarly events on topics related to Responsible and Trustworthy AI, including a tutorial at the Web Conference (WWW) and several workshops at the Neural Information Processing Systems (NeurIPS) conference. Dr. Heidari completed her doctoral studies in Computer and Information Science at the University of Pennsylvania. She holds an M.Sc. degree in Statistics from the Wharton School of Business. Before joining Carnegie Mellon as a faculty member, she was a postdoctoral scholar at the Machine Learning Institute of ETH Zurich, followed by a year at the Artificial Intelligence, Policy, and Practice (AIPP) initiative at Cornell University.
TILOS Seminar: Learning from Diverse and Small Data
Ramya Korlakai Vinayak, Assistant Professor, University of Wisconsin at Madison
How can we learn reliably about a diverse population when only a small amount of data is available per individual? In this talk, we will address this question in the following settings:
(i) In many applications, we observe count data that can be modeled as Binomial (e.g., polling, surveys, epidemiology) or Poisson (e.g., single-cell RNA data). As a single parameter or finitely many parameters do not capture the diversity of the population in such datasets, these data are often modeled as nonparametric mixtures. In this setting, we will address the question, “How well can we learn the distribution of parameters over the population without learning the individual parameters?”, and show that nonparametric maximum likelihood estimators are in fact minimax optimal (see the sketch after this list).
(ii) Learning preferences from human judgements using comparison queries plays a crucial role in cognitive and behavioral psychology, crowdsourcing democracy, surveys in social science applications, and recommendation systems. Models in the literature often focus on learning average preference over the population due to the limitations on the amount of data available per individual. We will discuss some recent results on how we can reliably capture diversity in preferences while pooling together data from individuals.
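For setting (i), a common way to approximate the nonparametric MLE is to fix a fine grid of parameter values and fit the mixture weights by EM; the sketch below does this for Binomial counts. The grid size, EM schedule, and data-generating model are illustrative, and this is only an approximation to the estimator analyzed in the talk.

```python
# Approximate NPMLE for a nonparametric mixture of Binomials: fix a grid of
# success probabilities and fit mixture weights over the grid by EM. Grid, data
# model, and EM settings are illustrative; this approximates the estimator in (i).
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
n_trials, n_people = 10, 2000

# True population: each individual's success probability comes from a 2-spike mixture.
true_p = rng.choice([0.2, 0.7], size=n_people, p=[0.6, 0.4])
counts = rng.binomial(n_trials, true_p)

grid = np.linspace(0.01, 0.99, 50)                        # candidate parameter values
L = binom.pmf(counts[:, None], n_trials, grid[None, :])   # likelihood of each count at each grid point
w = np.full(len(grid), 1.0 / len(grid))                   # mixture weights over the grid

for _ in range(500):                                      # EM updates on the weights only
    post = L * w
    post /= post.sum(axis=1, keepdims=True)
    w = post.mean(axis=0)

top = np.argsort(w)[-4:]
print("grid points with most mass:", grid[top].round(2), "weights:", w[top].round(2))
```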
Ramya Korlakai Vinayak is an assistant professor in the Dept. of ECE and affiliated faculty in the Dept. of Computer Science and the Dept. of Statistics at UW-Madison. Her research interests span the areas of machine learning, statistical inference, and crowdsourcing. Her work focuses on addressing theoretical and practical challenges that arise when learning from societal data. Prior to joining UW-Madison, Ramya was a postdoctoral researcher in the Paul G. Allen School of Computer Science and Engineering at the University of Washington. She received her Ph.D. in Electrical Engineering from Caltech, her master's degree from Caltech, and her bachelor's degree from IIT Madras. She is a recipient of the Schlumberger Foundation Faculty of the Future fellowship (2013-15), was an invited participant at the Rising Stars in EECS workshop in 2019, and received an NSF CAREER Award (2023-2028).
TILOS Seminar: Machine Learning Training Strategies Inspired by Humans' Learning Skills
Pengtao Xie, Assistant Professor, UC San Diego
Humans, as the most powerful learners on the planet, have accumulated many learning skills, such as learning through tests, interleaving learning, self-explanation, and active recall, to name a few. These learning skills and methodologies enable humans to learn new topics more effectively and efficiently. We are interested in investigating whether humans' learning skills can be borrowed to help machines learn better. Specifically, we aim to formalize these skills and leverage them to train better machine learning (ML) models. To achieve this goal, we develop a general framework, Skillearn, which provides a principled way to represent humans' learning skills mathematically and to use the formally represented skills to improve the training of ML models. In two case studies, we apply Skillearn to formalize two human learning skills, learning by passing tests and interleaving learning, and use the formalized skills to improve neural architecture search.
Pengtao Xie is an assistant professor at UC San Diego. He received his PhD from the Machine Learning Department at Carnegie Mellon University in 2018. His research interests lie in machine learning inspired by human learning and its applications in healthcare. His research outcomes have been adopted by medical device companies, medical imaging centers, and hospitals, and have been published at top-tier artificial intelligence conferences and journals, including ICML, NeurIPS, ACL, ICCV, and TACL. He is the recipient of the Tencent AI-Lab Faculty Award, the Tencent WeChat Faculty Award, the Innovator Award presented by the Pittsburgh Business Times, the Siebel Scholars award, and the Goldman Sachs Global Leader Scholarship.