Unlearnable Facts Cause Hallucinations in Pretrained Language Models
Adam Tauman Kalai, OpenAI
Pretrained language models (LMs) tend to preserve many qualities present in their training data, such as grammaticality, formatting, and politeness. However, for specific types of factuality, even LMs pretrained on factually correct statements tend to produce falsehoods at high rates. We explain these “hallucinations” by drawing a connection to binary classification, enabling us to leverage insights from supervised learning. We prove that pretrained LMs (which are “calibrated”) fail to mimic criteria that cannot be learned. Our analysis explains why pretrained LMs hallucinate on facts such as people’s birthdays but not on systematic facts such as even vs. odd numbers.
Of course, LM pretraining is only one stage in the development of a chatbot, and thus hallucinations are *not* inevitable in chatbots.
This is joint work with Santosh Vempala.
Adam Tauman Kalai is a Research Scientist at OpenAI working on AI Safety and Ethics. He has worked in Algorithms, Fairness, Machine Learning Theory, Game Theory, and Crowdsourcing. He received his PhD from Carnegie Mellon University. He has served as an Assistant Professor at Georgia Tech and TTIC, and is on the science team of the whale-translation Project CETI. He has co-chaired AI and crowdsourcing conferences and has numerous honors, most notably the Majulook prize.
How Transformers Learn Causal Structure with Gradient Descent
Jason Lee, Princeton University
The incredible success of transformers on sequence modeling tasks can be largely attributed to the self-attention mechanism, which allows information to be transferred between different parts of a sequence. Self-attention allows transformers to encode causal structure which makes them particularly suitable for sequence modeling. However, the process by which transformers learn such causal structure via gradient-based training algorithms remains poorly understood. To better understand this process, we introduce an in-context learning task that requires learning latent causal structure. We prove that gradient descent on a simplified two-layer transformer learns to solve this task by encoding the latent causal graph in the first attention layer. The key insight of our proof is that the gradient of the attention matrix encodes the mutual information between tokens. As a consequence of the data processing inequality, the largest entries of this gradient correspond to edges in the latent causal graph. As a special case, when the sequences are generated from in-context Markov chains, we prove that transformers learn an induction head (Olsson et al., 2022). We confirm our theoretical findings by showing that transformers trained on our in-context learning task are able to recover a wide variety of causal structures.
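The key mechanism here, that pairwise dependence singles out causal edges, can be checked outside of any transformer. The sketch below is my own illustration, not the paper's construction: it samples sequences from a Markov chain and verifies that, for every position, the earlier position with the highest empirical mutual information is its parent, exactly as the data processing inequality predicts.

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two paired symbol lists."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())

random.seed(0)
k, T, N = 3, 6, 8000
# transition matrix with a strong diagonal, so adjacent tokens are strongly dependent
P = [[0.8 if i == j else 0.1 for j in range(k)] for i in range(k)]
seqs = []
for _ in range(N):
    s = [random.randrange(k)]
    for _ in range(T - 1):
        s.append(random.choices(range(k), weights=P[s[-1]])[0])
    seqs.append(s)

# for every position t > 0, the most informative earlier position is its parent t-1
for t in range(1, T):
    mi = [mutual_information([s[j] for s in seqs], [s[t] for s in seqs]) for j in range(t)]
    assert max(range(t), key=lambda j: mi[j]) == t - 1
print("pairwise mutual information recovers the chain's causal edges")
```

Because the chain mixes, the mutual information between positions decays with their distance, so the largest entries point exactly to the edges of the latent causal graph.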
Jason Lee is an associate professor in Electrical Engineering and Computer Science (secondary) at Princeton University. Prior to that, he was in the Data Science and Operations department at the University of Southern California and a postdoctoral researcher at UC Berkeley working with Michael I. Jordan. Jason received his PhD at Stanford University advised by Trevor Hastie and Jonathan Taylor. His research interests are in the theory of machine learning, optimization, and statistics. Lately, he has worked on the foundations of deep learning, representation learning, and reinforcement learning. He has received the Samsung AI Researcher of the Year Award, NSF Career Award, ONR Young Investigator Award in Mathematical Data Science, Sloan Research Fellowship, NeurIPS Best Student Paper Award and Finalist for the Best Paper Prize for Young Researchers in Continuous Optimization, and Princeton Commendation for Outstanding Teaching.
Off-the-shelf Algorithmic Stability
Rebecca Willett, University of Chicago
Algorithmic stability holds when our conclusions, estimates, fitted models, predictions, or decisions are insensitive to small changes to the training data. Stability has emerged as a core principle for reliable data science, providing insights into generalization, cross-validation, uncertainty quantification, and more. Whereas prior literature has developed mathematical tools for analyzing the stability of specific machine learning (ML) algorithms, we study methods that can be applied to arbitrary learning algorithms to satisfy a desired level of stability. First, I will discuss how bagging is guaranteed to stabilize any prediction model, regardless of the input data. Thus, if we remove or replace a small fraction of the training data at random, the resulting prediction will typically change very little. Our analysis provides insight into how the size of the bags (bootstrap datasets) influences stability, giving practitioners a new tool for guaranteeing a desired level of stability. Second, I will describe how to extend these stability guarantees beyond prediction modeling to more general statistical estimation problems where bagging is not as well known but equally useful for stability. Specifically, I will describe a new framework for stable classification and model selection by combining bagging on class or model weights with a stable, “soft” version of the argmax operator.
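As an illustration of the first result, here is a minimal sketch (my own, not from the talk; all parameters are illustrative) showing how averaging an unstable base learner, here 1-nearest-neighbor regression, over many subsampled bags damps the effect of replacing a single training point. The `soft_argmax` at the end is the smooth alternative to a hard argmax mentioned above.

```python
import math
import random

def one_nn_predict(train, x):
    """Unstable base learner: 1-nearest-neighbor regression in one dimension."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def bagged_predict(train, x, n_bags=200, bag_frac=0.2, seed=0):
    """Average the base learner over many subsampled bags."""
    rng = random.Random(seed)
    m = max(1, int(bag_frac * len(train)))
    preds = [one_nn_predict(rng.sample(train, m), x) for _ in range(n_bags)]
    return sum(preds) / len(preds)

def soft_argmax(scores, beta=10.0):
    """A smooth stand-in for argmax: softmax weights move little when scores do."""
    w = [math.exp(beta * s) for s in scores]
    return [v / sum(w) for v in w]

random.seed(1)
train = [(random.uniform(0, 1), random.uniform(0, 1)) for _ in range(100)]
x = 0.5
# replace the training point nearest the query with one carrying a very different label
i = min(range(len(train)), key=lambda j: abs(train[j][0] - x))
perturbed = list(train)
perturbed[i] = (train[i][0], train[i][1] + 10.0)

base_change = abs(one_nn_predict(perturbed, x) - one_nn_predict(train, x))
bag_change = abs(bagged_predict(perturbed, x) - bagged_predict(train, x))
assert bag_change < base_change  # bagging damps the single-point perturbation
print(f"1-NN prediction moved by {base_change:.2f}; bagged prediction by {bag_change:.2f}")
```

Shrinking `bag_frac` makes each bag less likely to contain the replaced point, which is exactly the bag-size/stability trade-off the talk quantifies.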
Rebecca Willett is a Professor of Statistics and Computer Science and the Director of AI in the Data Science Institute at the University of Chicago, and she holds a courtesy appointment at the Toyota Technological Institute at Chicago. Her research is focused on machine learning foundations, scientific machine learning, and signal processing. Willett received the inaugural Data Science Career Prize from the Society of Industrial and Applied Mathematics in 2024, was named a Fellow of the Society of Industrial and Applied Mathematics in 2021, and was named a Fellow of the IEEE in 2022. She is the Deputy Director for Research at the NSF-Simons Foundation National Institute for Theory and Mathematics in Biology, Deputy Director for Research at the NSF-Simons Institute for AI in the Sky (SkAI), and a member of the NSF Institute for the Foundations of Data Science Executive Committee. She is the Faculty Director of the Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship. She helps direct the Air Force Research Lab University Center of Excellence on Machine Learning. She received the National Science Foundation CAREER Award in 2007, was a DARPA Computer Science Study Group member, and received an Air Force Office of Scientific Research Young Investigator Program award in 2010. She completed her PhD in Electrical and Computer Engineering at Rice University in 2005. She was an Assistant and then tenured Associate Professor of Electrical and Computer Engineering at Duke University from 2005 to 2013. She was an Associate Professor of Electrical and Computer Engineering, Harvey D. Spangler Faculty Scholar, and Fellow of the Wisconsin Institutes for Discovery at the University of Wisconsin-Madison from 2013 to 2018.
What Kinds of Functions do Neural Networks Learn? Theory and Practical Applications
Robert Nowak, University of Wisconsin
This talk presents a theory characterizing the types of functions neural networks learn from data. Specifically, the function space generated by deep ReLU networks consists of compositions of functions from the Banach space of second-order bounded variation in the Radon transform domain. This Banach space includes functions with smooth projections in most directions. A representer theorem associated with this space demonstrates that finite-width neural networks suffice for fitting finite datasets. The theory has several practical applications. First, it provides a simple and theoretically grounded method for network compression. Second, it shows that multi-task training can yield significantly different solutions compared to single-task training, and that multi-task solutions can be related to kernel ridge regressions. Third, the theory has implications for improving implicit neural representations, where multi-layer neural networks are used to represent continuous signals, images, or 3D scenes. This exploration bridges theoretical insights with practical advancements, offering a new perspective on neural network capabilities and future research directions.
Robert Nowak is the Grace Wahba Professor of Data Science and Keith and Jane Nosbusch Professor in Electrical and Computer Engineering at the University of Wisconsin-Madison. His research focuses on machine learning, optimization, and signal processing. He serves on the editorial boards of the SIAM Journal on the Mathematics of Data Science and the IEEE Journal on Selected Areas in Information Theory.
Large Datasets and Models for Robots in the Real World
Nicklas Hansen, UC San Diego
Recent progress in AI can be attributed to the emergence of large models trained on large datasets. However, teaching AI agents to reliably interact with our physical world has proven challenging, in part due to a lack of large and sufficiently diverse robot datasets. In this talk, I will cover ongoing efforts of the Open X-Embodiment project (a collaboration among 279 researchers across 20+ institutions) to build a large, open dataset for real-world robotics, and discuss how this new paradigm is rapidly changing the field. Concretely, I will discuss why we need large datasets in robotics, what such datasets may look like, and how large models can be trained and evaluated effectively in a cross-embodiment, cross-environment setting. Finally, I will conclude the talk by sharing my perspective on the limitations of current embodied AI agents, as well as how to move forward as a community.

Nicklas Hansen is a Ph.D. student at University of California San Diego advised by Prof. Xiaolong Wang and Prof. Hao Su. His research focuses on developing generalist AI agents that learn from interaction with the physical and digital world. He has spent time at Meta AI (FAIR) and University of California Berkeley (BAIR), and received his B.S. and M.S. degrees from Technical University of Denmark. He is a recipient of the 2024 NVIDIA Graduate Fellowship, and his work has been featured at top venues in machine learning and robotics.
Transformers Learn In-context by (Functional) Gradient Descent
Xiang Cheng, TILOS Postdoctoral Scholar at MIT
Motivated by the in-context learning phenomenon, we investigate how the Transformer neural network can implement learning algorithms in its forward pass. We show that a linear Transformer naturally learns to implement gradient descent, which enables it to learn linear functions in-context. More generally, we show that a non-linear Transformer can implement functional gradient descent with respect to some RKHS metric, which allows it to learn a broad class of functions in-context. Additionally, we show that the RKHS metric is determined by the choice of attention activation, and that the optimal choice of attention activation depends in a natural way on the class of functions that need to be learned. I will end by discussing some implications of our results for the choice and design of Transformer architectures.
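The linear case can be verified numerically. The following sketch is an illustrative construction in the spirit of the talk, with all weights chosen by hand rather than learned: a single softmax-free attention head, whose keys are the context inputs, whose values are scaled labels, and whose query is the test input, reproduces exactly one gradient-descent step on the in-context least-squares loss.

```python
import random

def gd_step_prediction(xs, ys, xq, eta):
    """One gradient-descent step on the in-context least-squares loss, from w = 0."""
    d = len(xq)
    # w1 = w0 - eta * grad = eta * sum_i y_i x_i   (squared loss, w0 = 0)
    w1 = [eta * sum(y * x[j] for x, y in zip(xs, ys)) for j in range(d)]
    return sum(w1[j] * xq[j] for j in range(d))

def linear_attention_prediction(xs, ys, xq, eta):
    """A single linear (softmax-free) attention head computing the same quantity:
    keys = context inputs, values = scaled labels, query = test input."""
    return sum(eta * y * sum(x[j] * xq[j] for j in range(len(xq)))
               for x, y in zip(xs, ys))

random.seed(0)
d, n, eta = 4, 8, 0.1
w_true = [random.gauss(0, 1) for _ in range(d)]
xs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
ys = [sum(w * x for w, x in zip(w_true, xi)) for xi in xs]
xq = [random.gauss(0, 1) for _ in range(d)]

p_gd = gd_step_prediction(xs, ys, xq, eta)
p_attn = linear_attention_prediction(xs, ys, xq, eta)
assert abs(p_gd - p_attn) < 1e-9
print(f"GD step and linear attention agree: {p_gd:.4f}")
```

Swapping the inner product for a kernel evaluation turns the same head into a functional-gradient step in the corresponding RKHS, which is the generalization the talk describes.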
How Large Models of Language and Vision Help Agents to Learn to Behave
Roy Fox, Assistant Professor and Director of the Intelligent Dynamics Lab, UC Irvine
If learning from data is valuable, can learning from big data be very valuable? So far, it has been so in vision and language, for which foundation models can be trained on web-scale data to support a plethora of downstream tasks; not so much in control, for which scalable learning remains elusive. Can information encoded in vision and language models guide reinforcement learning of control policies? In this talk, I will discuss several ways for foundation models to help agents to learn to behave. Language models can provide better context for decision-making: we will see how they can succinctly describe the world state to focus the agent on relevant features; and how they can form generalizable skills that identify key subgoals. Vision and vision–language models can help the agent to model the world: we will see how they can block visual distractions to keep state representations task-relevant; and how they can hypothesize about abstract world models that guide exploration and planning.
Roy Fox is an Assistant Professor of Computer Science at the University of California, Irvine. His research interests include theory and applications of control learning: reinforcement learning (RL), control theory, information theory, and robotics. His current research focuses on structured and model-based RL, language for RL and RL for language, and optimization in deep control learning of virtual and physical agents.
The Synergy between Machine Learning and the Natural Sciences
Max Welling, Research Chair in Machine Learning, University of Amsterdam
Traditionally, machine learning has been heavily influenced by neuroscience (hence the name "artificial neural networks") and physics (e.g., MCMC, belief propagation, and diffusion-based generative AI). We have recently witnessed the flow of information reverse as well, with new tools developed in the ML community impacting physics, chemistry, and biology. Examples include faster DFT, force-field-accelerated MD simulations, neural PDE surrogate models, generating drug-like molecules, and many more. In this talk I will review the exciting opportunities for further cross-fertilization between these fields, ranging from faster (classical) DFT calculations and enhanced transition path sampling to traveling waves in artificial neural networks.
Prof. Max Welling is a research chair in Machine Learning at the University of Amsterdam and a Distinguished Scientist at MSR. He is a fellow at the Canadian Institute for Advanced Research (CIFAR) and the European Lab for Learning and Intelligent Systems (ELLIS), where he also serves on the founding board. His previous appointments include VP at Qualcomm Technologies, professor at UC Irvine, postdoc at U. Toronto and UCL under the supervision of Prof. Geoffrey Hinton, and postdoc at Caltech under the supervision of Prof. Pietro Perona. He finished his PhD in theoretical high-energy physics under the supervision of Nobel laureate Prof. Gerard ‘t Hooft.
TILOS Webinar: AI Ethics in Research
Dr. Nisheeth Vishnoi (Yale) and Dr. David Danks (UC San Diego) discuss their research in the ethics of AI. Professor Danks develops practical frameworks and methods to incorporate ethical and policy considerations throughout the AI lifecycle, including different ways to include them in optimization steps. Bias and fairness have been a particular focus given the multiple ways in which they can be measured, represented, and used. Professor Vishnoi uses optimization as a lens to study how subjective human and societal biases emerge in the objective world of artificial algorithms, as well as how to design strategies to mitigate these biases.
Nisheeth Vishnoi is the A. Bartlett Giamatti Professor of Computer Science and a co-founder of the Computation and Society Initiative at Yale University. He studies the foundations of computation, and his research spans several areas of theoretical computer science, optimization, and machine learning. He is also interested in understanding nature and society from a computational viewpoint. Here, his current focus includes understanding the emergence of intelligence and developing methods to address ethical issues at the interface of artificial intelligence and humanity.
David Danks is Professor of Data Science and Philosophy and affiliate faculty in Computer Science and Engineering at University of California, San Diego. His research interests range widely across philosophy, cognitive science, and machine learning, including their intersection. Danks has examined the ethical, psychological, and policy issues around AI and robotics across multiple sectors, including transportation, healthcare, privacy, and security. He has also done significant research in computational cognitive science and developed multiple novel causal discovery algorithms for complex types of observational and experimental data. Danks is the recipient of a James S. McDonnell Foundation Scholar Award, as well as an Andrew Carnegie Fellowship. He currently serves on multiple advisory boards, including the National AI Advisory Committee.
TILOS Seminar: The Dissimilarity Dimension: Sharper Bounds for Optimistic Algorithms
Aldo Pacchiano, Assistant Professor, Boston University Center for Computing and Data Sciences
The principle of Optimism in the Face of Uncertainty (OFU) is one of the foundational algorithmic design choices in Reinforcement Learning and Bandits. Optimistic algorithms balance exploration and exploitation by deploying data collection strategies that maximize expected rewards in plausible models. This is the basis of celebrated algorithms like the Upper Confidence Bound (UCB) algorithm for multi-armed bandits. For nearly a decade, the analysis of optimistic algorithms, including Optimistic Least Squares, in the context of rich reward function classes has relied on the concept of eluder dimension, introduced by Russo and Van Roy in 2013. In this talk we shed light on the limitations of the eluder dimension in capturing the true behavior of optimistic strategies in the realm of function approximation. We remedy these limitations by introducing a novel statistical measure, the “dissimilarity dimension”. We show it can be used to provide a sharper sample analysis of algorithms like Optimistic Least Squares by establishing a link between regret and the dissimilarity dimension. To illustrate this, we will show that some function classes have arbitrarily large eluder dimension but constant dissimilarity dimension. Our regret analysis draws inspiration from graph theory and may be of interest to the mathematically minded beyond the field of statistical learning theory. This talk sheds new light on the fundamental principle of optimism and its algorithms in the function approximation regime, advancing our understanding of these concepts.
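For readers unfamiliar with OFU, here is a minimal sketch of the UCB algorithm mentioned above, for Bernoulli multi-armed bandits; the arm means, horizon, and exploration constant are illustrative choices, not from the talk.

```python
import math
import random

def ucb(means, horizon, seed=0):
    """Upper Confidence Bound: always play the arm whose optimistic
    (plausible-best) mean estimate is highest."""
    rng = random.Random(seed)
    k = len(means)
    counts, sums = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # play each arm once to initialize its estimate
        else:
            # empirical mean + confidence bonus that shrinks with more pulls
            a = max(range(k), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        sums[a] += 1.0 if rng.random() < means[a] else 0.0
        counts[a] += 1
    return counts

counts = ucb([0.2, 0.5, 0.8], horizon=5000)
assert max(range(3), key=lambda i: counts[i]) == 2  # the optimal arm dominates
print("pulls per arm:", counts)
```

The confidence bonus is exactly the "plausible model" of the OFU principle: an arm is pulled either because its mean looks high or because it is still too uncertain to rule out.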
TILOS Seminar: Building Personalized Decision Models with Federated Human Preferences
Aadirupa Saha, Research Scientist at Apple
Customer statistics collected in several real-world systems show that users often prefer expressing their liking for a given pair of items, say (A, B), through relative queries like “Do you prefer item A over item B?” rather than their absolute counterparts: “How would you score items A and B on a scale of 0-10?”. This insight, arising from the search for a more effective feedback-collection mechanism, led to the well-known formulation of Dueling Bandits (DB), a widely studied online learning framework for efficient information aggregation from relative/comparative feedback. Despite this appealing objective, however, most existing DB techniques are limited to the simpler settings of finite decision spaces and stochastic environments, which are unrealistic in practice.
In this talk, we will start with the basic problem formulations for DB and familiarize ourselves with some of the breakthrough results. Following this, we will dive deeper into a more practical framework of contextual dueling bandits (C-DB), where the goal of the learner is to make personalized predictions based on user contexts. We will see a new algorithmic approach that can efficiently achieve the optimal O(√T) regret for this problem, resolving an open problem from Dudík et al. [COLT, 2015]. In the last part of the talk, we will extend the aforementioned models to a federated framework, which entails developing preference-driven prediction models in distributed environments to create large-scale personalized systems, including recommender systems and chatbot interactions. Beyond exploiting the limited preference-feedback model, the challenge lies in ensuring user privacy and reducing communication complexity in the federated setting. We will conclude the talk with some interesting open problems.
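To make the relative-feedback model concrete, here is a toy dueling-bandit simulation (my own illustration, not one of the talk's algorithms): preferences follow an assumed Bradley-Terry model, and a simple UCB rule on estimated Borda scores (win rate against uniformly random opponents) learns to duel the strongest item most often.

```python
import math
import random

def duel(i, j, strengths, rng):
    """Relative feedback: True iff item i is preferred to item j,
    under an assumed Bradley-Terry preference model."""
    return rng.random() < strengths[i] / (strengths[i] + strengths[j])

def borda_ucb(strengths, horizon, seed=0):
    """UCB on each item's estimated Borda score, i.e. its win rate
    against uniformly random opponents."""
    rng = random.Random(seed)
    k = len(strengths)
    wins, plays = [0] * k, [0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # duel each item once to initialize
        else:
            a = max(range(k), key=lambda i: wins[i] / plays[i]
                    + math.sqrt(2 * math.log(t) / plays[i]))
        b = rng.randrange(k)  # a uniformly random opponent
        if duel(a, b, strengths, rng):
            wins[a] += 1
        plays[a] += 1
    return plays

plays = borda_ucb([1.0, 3.0, 9.0], horizon=10000)
assert max(range(3), key=lambda i: plays[i]) == 2  # the strongest item is dueled most
print("duels per item:", plays)
```

Note that the learner never observes an absolute score for any item, only the outcomes of pairwise duels, which is precisely the feedback constraint that makes the DB setting distinctive.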
Towards Foundation Models for Graph Reasoning and AI 4 Science
Michael Galkin, Research Scientist at Intel AI Lab
Foundation models in graph learning are hard to design due to the lack of common invariances that transfer across different structures and domains. In this talk, I will give an overview of the two main tracks of my research at Intel AI: creating foundation models for knowledge graph reasoning that can run zero-shot inference on any multi-relational graphs, and foundation models for materials discovery in the AI4Science domain that capture physical properties of crystal structures and transfer to a variety of predictive and generative tasks. We will also talk about theoretical and practical challenges like scaling behavior, data scarcity, and diverse evaluation of foundation graph models.
Michael Galkin is a Research Scientist at Intel AI Lab in San Diego working on Graph Machine Learning and Geometric Deep Learning. Previously, he was a postdoc at Mila–Quebec AI Institute with Will Hamilton, Reihaneh Rabbany, and Jian Tang, focusing on many graph representation learning problems. Sometimes, Mike writes long blog posts on Medium about graph learning.
TILOS Fireside Chat: Theory in the Age of Modern AI
A conversation about theory in the age of modern artificial intelligence (AI) with TILOS member panelists Nisheeth Vishnoi, Tara Javidi, Misha Belkin, and Arya Mazumdar (moderator).
TILOS AI Ethics Panel
Panelists Dr. Nisheeth Vishnoi (Yale), Dr. David Danks (UC San Diego), and Dr. Hoda Heidari (Carnegie Mellon University) discuss a variety of aspects of the ethics of AI with our moderators Dr. Stefanie Jegelka (MIT) and Dr. Jodi Reeves (National University).
Nisheeth Vishnoi is the A. Bartlett Giamatti Professor of Computer Science and a co-founder of the Computation and Society Initiative at Yale University. He studies the foundations of computation, and his research spans several areas of theoretical computer science, optimization, and machine learning. He is also interested in understanding nature and society from a computational viewpoint. Here, his current focus includes understanding the emergence of intelligence and developing methods to address ethical issues at the interface of artificial intelligence and humanity.
David Danks is Professor of Data Science & Philosophy and affiliate faculty in Computer Science & Engineering at University of California, San Diego. His research interests range widely across philosophy, cognitive science, and machine learning, including their intersection. Danks has examined the ethical, psychological, and policy issues around AI and robotics across multiple sectors, including transportation, healthcare, privacy, and security. He has also done significant research in computational cognitive science and developed multiple novel causal discovery algorithms for complex types of observational and experimental data. Danks is the recipient of a James S. McDonnell Foundation Scholar Award, as well as an Andrew Carnegie Fellowship. He currently serves on multiple advisory boards, including the National AI Advisory Committee.
Hoda Heidari is an Assistant Professor in Machine Learning and Societal Computing at the School of Computer Science, Carnegie Mellon University. Her research is broadly concerned with the social, ethical, and economic implications of Artificial Intelligence. In particular, her research addresses issues of unfairness and accountability through Machine Learning. Her work in this area has won a best-paper award at the ACM Conference on Fairness, Accountability, and Transparency (FAccT) and an exemplary track award at the ACM Conference on Economics and Computation (EC). She has organized several scholarly events on topics related to Responsible and Trustworthy AI, including a tutorial at the Web Conference (WWW) and several workshops at the Neural Information Processing Systems (NeurIPS) conference. Dr. Heidari completed her doctoral studies in Computer and Information Science at the University of Pennsylvania. She holds an M.Sc. degree in Statistics from the Wharton School of Business. Before joining Carnegie Mellon as a faculty member, she was a postdoctoral scholar at the Machine Learning Institute of ETH Zurich, followed by a year at the Artificial Intelligence, Policy, and Practice (AIPP) initiative at Cornell University.
TILOS Seminar: Learning from Diverse and Small Data
Ramya Korlakai Vinayak, Assistant Professor, University of Wisconsin at Madison
In this talk, we will address the question of how to learn reliably from diverse and small data in the following settings:
(i) In many applications, we observe count data that can be modeled as Binomial (e.g., polling, surveys, epidemiology) or Poisson (e.g., single-cell RNA data). Since a single parameter, or finitely many parameters, cannot capture the diversity of the population in such datasets, the data are often modeled as nonparametric mixtures. In this setting, we will address the question “how well can we learn the distribution of parameters over the population without learning the individual parameters?” and show that nonparametric maximum likelihood estimators are in fact minimax optimal.
(ii) Learning preferences from human judgements using comparison queries plays a crucial role in cognitive and behavioral psychology, crowdsourcing, democracy, surveys in social science applications, and recommendation systems. Models in the literature often focus on learning the average preference over the population, due to the limited amount of data available per individual. We will discuss some recent results on how we can reliably capture the diversity in preferences while pooling together data from individuals.
Ramya Korlakai Vinayak is an assistant professor in the Dept. of ECE and affiliated faculty in the Dept. of Computer Science and the Dept. of Statistics at UW-Madison. Her research interests span machine learning, statistical inference, and crowdsourcing. Her work focuses on addressing theoretical and practical challenges that arise when learning from societal data. Prior to joining UW-Madison, Ramya was a postdoctoral researcher in the Paul G. Allen School of Computer Science and Engineering at the University of Washington. She received her Ph.D. in Electrical Engineering from Caltech, her Master's degree from Caltech, and her Bachelor's degree from IIT Madras. She is a recipient of the Schlumberger Foundation Faculty for the Future fellowship (2013-15), was an invited participant at the Rising Stars in EECS workshop in 2019, and received an NSF CAREER Award (2023-2028).
TILOS Seminar: Machine Learning Training Strategies Inspired by Humans' Learning Skills
Pengtao Xie, Assistant Professor, UC San Diego
Humans, as the most powerful learners on the planet, have accumulated many learning skills, such as learning through tests, interleaving learning, self-explanation, and active recall, to name a few. These learning skills and methodologies enable humans to learn new topics more effectively and efficiently. We are interested in investigating whether humans' learning skills can be borrowed to help machines learn better. Specifically, we aim to formalize these skills and leverage them to train better machine learning (ML) models. To achieve this goal, we develop a general framework, Skillearn, which provides a principled way to represent humans' learning skills mathematically and to use the formally represented skills to improve the training of ML models. In two case studies, we apply Skillearn to formalize two human learning skills, learning by passing tests and interleaving learning, and use the formalized skills to improve neural architecture search.
Pengtao Xie is an assistant professor at UC San Diego. He received his PhD from the Machine Learning Department at Carnegie Mellon University in 2018. His research interests lie in machine learning inspired by human learning and its applications in healthcare. His research outcomes have been adopted by medical device companies, medical imaging centers, hospitals, etc. and have been published at top-tier artificial intelligence conferences and journals including ICML, NeurIPS, ACL, ICCV, TACL, etc. He is the recipient of the Tencent AI-Lab Faculty Award, Tencent WeChat Faculty Award, the Innovator Award presented by the Pittsburgh Business Times, the Siebel Scholars award, and the Goldman Sachs Global Leader Scholarship.
TILOS Early Career Development in Industry Panel
Panelists:
- Ismail Bustany, Fellow at AMD
- Anna Goldie, Member of Technical Staff at Anthropic
- Liangzhen Lai, Research Scientist Manager at Meta
- Vahab Mirrokni, Google Fellow and VP of Google Research
- Ruchir Puri, Chief Scientist at IBM Research; IBM Fellow; Vice President of IBM Corporate Technology
TILOS Seminar: Engineering the Future of Software with AI
Dr. Ruchir Puri, Chief Scientist, IBM Research, IBM Fellow, Vice-President IBM Corporate Technology
Software has become woven into every aspect of our society, and it is fair to say that “software has eaten the world”. More recently, advances in AI have begun to transform every aspect of our society as well. These two tectonic forces of transformation, “Software” and “AI”, are colliding, resulting in a seismic shift: a future where software itself will be built, maintained, and operated by AI, pushing us toward a future where “computers can program themselves!” In this talk, we will discuss these forces of “AI for Code” and how the future of software engineering is being redefined by AI.
Dr. Ruchir Puri is the Chief Scientist of IBM Research, an IBM Fellow, and Vice-President of IBM Corporate Technology. He led IBM Watson as its CTO and Chief Architect from 2016 to 2019 and has held various technical, research, and engineering leadership roles across IBM’s AI and Research businesses. Dr. Puri is a Fellow of the IEEE, has been an ACM Distinguished Speaker and an IEEE Distinguished Lecturer, and was named 2014 Asian American Engineer of the Year. Ruchir has been an adjunct professor at Columbia University, NY, and a visiting scientist at Stanford University, CA. He was honored with the John von Neumann Chair at the Institute of Discrete Mathematics at Bonn University, Germany. Dr. Puri is an inventor of over 70 United States patents and has authored over 100 scientific publications on software-hardware automation methods, microprocessor design, and optimization and AI algorithms. He chaired the AAAI-IAAI conference, which focuses on industrial applications of AI. Ruchir is the recipient of the prestigious Distinguished Alumnus Award from the Indian Institute of Technology (IIT), Kanpur, in 2022.
TILOS Seminar: Causal Discovery for Root Cause Analysis
Professor Murat Kocaoglu, Assistant Professor, Purdue University
Cause-effect relations are crucial for several fields, from medicine to policy design as they inform us of the outcomes of our actions a priori. However, causal knowledge is hard to curate for complex systems that might be changing frequently. Causal discovery algorithms allow us to extract causal knowledge from the available data. In this talk, first, we provide a short introduction to algorithmic causal discovery. Next, we propose a novel causal discovery algorithm from a collection of observational and interventional datasets in the presence of unobserved confounders, with unknown intervention targets. Finally, we demonstrate the effectiveness of our algorithm for root-cause analysis in microservice architectures.
Dr. Kocaoglu received his B.S. degree in Electrical-Electronics Engineering with a minor in Physics from the Middle East Technical University in 2010, his M.S. degree from Koc University, Turkey, in 2012, and his Ph.D. degree from The University of Texas at Austin in 2018 under the supervision of Prof. Alex Dimakis and Prof. Sriram Vishwanath. He was a Research Staff Member at the MIT-IBM Watson AI Lab in IBM Research, Cambridge, Massachusetts from 2018 to 2020. Since 2021, he has been an assistant professor in the School of ECE at Purdue University. His current research interests include causal inference and discovery, causal machine learning, deep generative models, and information theory.
IEEE Seasonal School: Manufacturability, Testing, Reliability, and Security
00:00:00 - Introduction
00:01:51 - "Machine Learning for DFM", Bei Yu, Associate Professor, Chinese University of Hong Kong
00:59:54 - "ML for Testing and Yield", Li-C. Wang, Professor, UC Santa Barbara
01:59:00 - "ML for Cross-Layer Reliability and Security", Muhammad Shafique, Professor of Computer Engineering, NYU Abu Dhabi