Off-the-shelf Algorithmic Stability
Rebecca Willett, University of Chicago
Algorithmic stability holds when our conclusions, estimates, fitted models, predictions, or decisions are insensitive to small changes to the training data. Stability has emerged as a core principle for reliable data science, providing insights into generalization, cross-validation, uncertainty quantification, and more. Whereas prior literature has developed mathematical tools for analyzing the stability of specific machine learning (ML) algorithms, we study methods that can be applied to arbitrary learning algorithms to satisfy a desired level of stability. First, I will discuss how bagging is guaranteed to stabilize any prediction model, regardless of the input data. Thus, if we remove or replace a small fraction of the training data at random, the resulting prediction will typically change very little. Our analysis provides insight into how the size of the bags (bootstrap datasets) influences stability, giving practitioners a new tool for guaranteeing a desired level of stability. Second, I will describe how to extend these stability guarantees beyond prediction modeling to more general statistical estimation problems where bagging is not as well known but equally useful for stability. Specifically, I will describe a new framework for stable classification and model selection by combining bagging on class or model weights with a stable, “soft” version of the argmax operator.
Rebecca Willett is a Professor of Statistics and Computer Science and the Director of AI in the Data Science Institute at the University of Chicago, and she holds a courtesy appointment at the Toyota Technological Institute at Chicago. Her research is focused on machine learning foundations, scientific machine learning, and signal processing. Willett received the inaugural Data Science Career Prize from the Society of Industrial and Applied Mathematics in 2024, was named a Fellow of the Society of Industrial and Applied Mathematics in 2021, and was named a Fellow of the IEEE in 2022. She is the Deputy Director for Research at the NSF-Simons Foundation National Institute for Theory and Mathematics in Biology, Deputy Director for Research at the NSF-Simons Institute for AI in the Sky (SkAI), and a member of the NSF Institute for the Foundations of Data Science Executive Committee. She is the Faculty Director of the Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship. She helps direct the Air Force Research Lab University Center of Excellence on Machine Learning. She received the National Science Foundation CAREER Award in 2007, was a DARPA Computer Science Study Group member, and received an Air Force Office of Scientific Research Young Investigator Program award in 2010. She completed her PhD in Electrical and Computer Engineering at Rice University in 2005. She was an Assistant and then tenured Associate Professor of Electrical and Computer Engineering at Duke University from 2005 to 2013. She was an Associate Professor of Electrical and Computer Engineering, Harvey D. Spangler Faculty Scholar, and Fellow of the Wisconsin Institutes for Discovery at the University of Wisconsin-Madison from 2013 to 2018.