AI Safety Theory: The Missing Middle Ground
Adam Oberman, McGill University
Over the past few years, the capabilities of generative artificial intelligence (AI) systems have advanced rapidly. Along with its benefits, AI also carries risks of harm. To benefit from AI while mitigating those risks, we need a grounded theoretical framework.
Current AI safety theory, which predates generative AI, is insufficient. Most theoretical AI safety results reason in absolutes: a system is "aligned" or "misaligned", "honest" or "dishonest". But in practice safety is probabilistic, not absolute. The missing middle ground is a quantitative, relative theory of safety — a way to reason formally about degrees of safety. Such a theory is required for defining safety and harms, and it is essential both for technical solutions and for making good policy decisions.
In this talk I will:
- Review current AI risks (from misuse, from lack of reliability, and from systemic risks to the economy) as well as important future risks (loss of control).
- Review theoretical predictions of bad AI behavior and discuss experiments which demonstrate that they can occur in current LLMs.
- Explain why technical and theoretical safety solutions are valuable, even when contributed by researchers outside the major labs.
- Discuss some gaps in the theory and present some open problems which could address the gaps.
Adam Oberman is a Full Professor of Mathematics and Statistics at McGill University, a Canada CIFAR AI Chair, and an Associate Member of Mila. He is a research collaborator at LawZero, Yoshua Bengio’s AI Safety Institute. He has been researching AI safety since 2024. His research spans generative models, reinforcement learning, optimization, calibration, and robustness. Earlier in his career, he made significant contributions to optimal transport and nonlinear partial differential equations. He earned degrees from the University of Toronto and the University of Chicago, and previously held faculty and postdoctoral positions at Simon Fraser University and the University of Texas at Austin.
TILOS Webinar: AI Ethics in Research
Dr. Nisheeth Vishnoi (Yale) and Dr. David Danks (UC San Diego) discuss their research in the ethics of AI. Professor Danks develops practical frameworks and methods to incorporate ethical and policy considerations throughout the AI lifecycle, including different ways to include them in optimization steps. Bias and fairness have been a particular focus given the multiple ways in which they can be measured, represented, and used. Professor Vishnoi uses optimization as a lens to study how subjective human and societal biases emerge in the objective world of artificial algorithms, as well as how to design strategies to mitigate these biases.
Nisheeth Vishnoi is the A. Bartlett Giamatti Professor of Computer Science and a co-founder of the Computation and Society Initiative at Yale University. He studies the foundations of computation, and his research spans several areas of theoretical computer science, optimization, and machine learning. He is also interested in understanding nature and society from a computational viewpoint. Here, his current focus includes understanding the emergence of intelligence and developing methods to address ethical issues at the interface of artificial intelligence and humanity.
David Danks is Professor of Data Science and Philosophy and affiliate faculty in Computer Science and Engineering at University of California, San Diego. His research interests range widely across philosophy, cognitive science, and machine learning, including their intersection. Danks has examined the ethical, psychological, and policy issues around AI and robotics across multiple sectors, including transportation, healthcare, privacy, and security. He has also done significant research in computational cognitive science and developed multiple novel causal discovery algorithms for complex types of observational and experimental data. Danks is the recipient of a James S. McDonnell Foundation Scholar Award, as well as an Andrew Carnegie Fellowship. He currently serves on multiple advisory boards, including the National AI Advisory Committee.
TILOS AI Ethics Panel
Panelists Dr. Nisheeth Vishnoi (Yale), Dr. David Danks (UC San Diego), and Dr. Hoda Heidari (Carnegie Mellon University) discuss a variety of aspects of the ethics of AI with our moderators Dr. Stefanie Jegelka (MIT) and Dr. Jodi Reeves (National University).
Hoda Heidari is an Assistant Professor in Machine Learning and Societal Computing at the School of Computer Science, Carnegie Mellon University. Her research is broadly concerned with the social, ethical, and economic implications of Artificial Intelligence; in particular, it addresses issues of unfairness and accountability through Machine Learning. Her work in this area has won a best-paper award at the ACM Conference on Fairness, Accountability, and Transparency (FAccT) and an exemplary track award at the ACM Conference on Economics and Computation (EC). She has organized several scholarly events on topics related to Responsible and Trustworthy AI, including a tutorial at the Web Conference (WWW) and several workshops at the Neural Information Processing Systems (NeurIPS) conference. Dr. Heidari completed her doctoral studies in Computer and Information Science at the University of Pennsylvania. She holds an M.Sc. in Statistics from the Wharton School. Before joining Carnegie Mellon as a faculty member, she was a postdoctoral scholar at the Machine Learning Institute of ETH Zurich, followed by a year at the Artificial Intelligence, Policy, and Practice (AIPP) initiative at Cornell University.