The Dissimilarity Dimension: Sharper Bounds for Optimistic Algorithms
Aldo Pacchiano, Assistant Professor, Boston University Center for Computing and Data Sciences
Abstract: The principle of Optimism in the Face of Uncertainty (OFU) is one of the foundational algorithmic design choices in Reinforcement Learning and Bandits. Optimistic algorithms balance exploration and exploitation by deploying data collection strategies that maximize expected rewards in plausible models. This is the basis of celebrated algorithms like the Upper Confidence Bound (UCB) for multi-armed bandits. For nearly a decade, the analysis of optimistic algorithms, including Optimistic Least Squares, in the context of rich reward function classes has relied on the concept of eluder dimension, introduced by Russo and Van Roy in 2013. In this talk we shed light on the limitations of the eluder dimension in capturing the true behavior of optimistic strategies in the realm of function approximation. We remediate these by introducing a novel statistical measure, the “dissimilarity dimension”. We show it can be used to provide sharper sample analysis of algorithms like Optimistic Least Squares by establishing a link between regret and the dissimilarity dimension. To illustrate this, we will show that some function classes have arbitrarily large eluder dimension but constant dissimilarity. Our regret analysis draws inspiration from graph theory and may be of interest to the mathematically minded beyond the field of statistical learning theory. This talk sheds new light on the fundamental principle of optimism and its algorithms in the function approximation regime, advancing our understanding of these concepts.
Aldo Pacchiano is an Assistant Professor at the Boston University Center for Computing and Data Sciences and a Fellow at the Eric and Wendy Schmidt Center of the Broad Institute of MIT and Harvard. He obtained his PhD under the supervision of Profs. Michael Jordan and Peter Bartlett at UC Berkeley and was a Postdoctoral Researcher at Microsoft Research, NYC. His research lies in the areas of Reinforcement Learning, Online Learning, Bandits and Algorithmic Fairness. He is particularly interested in furthering our statistical understanding of learning phenomena in adaptive environments and use these theoretical insights and techniques to design efficient and safe algorithms for scientific, engineering, and large-scale societal applications.