Recorded Talks: TILOS Seminar Series
A Modular AgenticAI Architecture for Commercially Scalable and Compliant Robotics
Sahil Rajesh Dhayalkar, Brain Corporation
Autonomous navigation in dynamic environments faces immense challenges. Traditional rigid, rules-based systems often fail due to a lack of semantic understanding needed to adapt to continuous environmental shifts. Conversely, emerging end-to-end Vision-Language-Action (VLA) models introduce a critical "black box" dilemma; they inherently lack the explicit application context, deterministic guardrails, and data efficiency required for rigorous enterprise safety and compliance (e.g., SOC2). To address this, Brain Corp, in collaboration with UCSD, proposes a robust hybrid architecture underpinning the BrainOS platform. In this framework, visual inputs (via VLMs) and task commands (via LLMs) feed directly into a distinct Perception block anchored by a Contextual Grounding Layer with Semantic Mapping. This rich, grounded perception then informs a hybrid Action block, where the reasoning capabilities of VLA models operate safely alongside proven deterministic controls such as deep learning, reinforcement learning, and model predictive control. Crucially, an underlying Directed Safety Layer and strict Enterprise Infrastructure wrap this entire process. By isolating adaptable AI reasoning from hard-coded physical controls, this architecture provides a framework designed to securely manage the unpredictable realities of varied environments. Ultimately, this approach addresses the compliance bottleneck, laying the foundation to scale safely across diverse commercial applications and power the continuous, real-world data engine necessary to fuel next-generation physical AI.
Sahil Rajesh Dhayalkar is a Staff Autonomy Engineer and Perception Team Lead at Brain Corporation. He specializes in architecting real-time perception pipelines across LiDAR, RGB, and depth sensors, with his work currently deployed on production robots in dynamic commercial environments. During his tenure, he has pioneered the real-time computer vision pipeline for on-robot object detection at the edge, spearheaded "Localize From Anywhere," a global localization system utilizing Vision-Language Models and RGB images, and auto-calibration, a targetless calibration of ranging sensors on robots. He holds a Master's degree in Computer Science from Arizona State University. His research interests include robotic perception, large language models, deep learning, neuro-symbolic reasoning, and optimizations.
Machine learning for discrete optimization: Theoretical foundations
Ellen Vitercik, Stanford University
Many of the most important optimization problems in practice are massive in scale, mathematically complex, and involve numerous unknown parameters. Machine learning offers a powerful way to address these challenges by uncovering hidden structure across problem instances, but integrating predictions into algorithms raises fundamental questions: which architectures align with combinatorial structure, and how can we ensure robustness to error? This talk presents two case studies. First, we show how graph neural networks can approximate the optimal dynamic program for online matching, yielding algorithms that generalize across graph sizes and achieve strong empirical performance. Second, we investigate calibration as a principled interface between machine learning and decision-making, demonstrating through rent-or-buy and job scheduling problems that calibrated predictions yield both theoretical guarantees and practical improvements. This is joint work with Alexandre Hayderi, Amin Saberi, Anders Wikum, and Judy Hanwen Shen.
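As a point of reference for the rent-or-buy problem mentioned in the abstract, the sketch below shows the classical prediction-free break-even rule. It is not the calibrated-prediction algorithm from the talk. The cost model (renting costs 1 per day, buying costs `buy_cost`) and all function names are illustrative assumptions.

```python
def optimal_cost(days: int, buy_cost: int) -> int:
    """Offline optimum: with the horizon known, either rent throughout or buy on day one."""
    return min(days, buy_cost)


def break_even_cost(days: int, buy_cost: int) -> int:
    """Online break-even rule: rent for buy_cost - 1 days, then buy on day buy_cost."""
    if days < buy_cost:
        return days  # the horizon ended before the purchase day
    return (buy_cost - 1) + buy_cost  # rent paid so far plus the purchase


def worst_ratio(buy_cost: int, max_days: int) -> float:
    """Largest online/offline cost ratio over horizons 1..max_days."""
    return max(
        break_even_cost(d, buy_cost) / optimal_cost(d, buy_cost)
        for d in range(1, max_days + 1)
    )
```

Under this cost model the ratio never reaches 2, whatever the horizon. Learning-augmented variants of the problem use a prediction of the horizon to try to improve on that worst-case rule when the prediction is informative.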
Ellen Vitercik is an Assistant Professor at Stanford University with a joint appointment between the Management Science & Engineering department and the Computer Science department. Her research interests include machine learning, algorithm design, discrete and combinatorial optimization, and the interface between economics and computation. Before joining Stanford, she spent a year as a Miller Postdoctoral Fellow at UC Berkeley and received a PhD in Computer Science from Carnegie Mellon University. Her research has been recognized with a Schmidt Sciences AI2050 Early Career Fellowship, an NSF CAREER award, the SIGecom Doctoral Dissertation Award, and the CMU School of Computer Science Distinguished Dissertation Award, among other honors.
ComPO: Preference Alignment via Comparison Oracles
Tianyi Lin, Columbia University
Direct alignment methods are increasingly used for aligning large language models (LLMs) with human preferences. However, these methods suffer from likelihood displacement, which can be driven by noisy preference pairs that induce similar likelihood for preferred and dis-preferred responses. To address this issue, we consider derivative-free optimization based on comparison oracles. First, we propose a new preference alignment method via comparison oracles and provide convergence guarantees for its basic mechanism. Second, we improve our method using some heuristics and conduct experiments to demonstrate the flexibility and compatibility of practical mechanisms in improving the performance of LLMs using noisy preference pairs. Evaluations are conducted across multiple base and instruction-tuned models with different benchmarks. Experimental results show the effectiveness of our method as an alternative for addressing the limitations of existing methods. A highlight of our work is that we demonstrate the importance of designing specialized methods for preference pairs with distinct likelihood margins.
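To make the notion of a comparison oracle concrete, here is a minimal generic sketch of derivative-free optimization in which the optimizer never sees objective values or gradients, only the answer to "is the candidate better than the current point?". This is not the ComPO method itself. The toy objective, the random-perturbation proposal, and all names are assumptions made for illustration.

```python
import random


def make_oracle(objective):
    """Return an oracle that only reports whether the candidate beats the current point."""
    def oracle(current, candidate):
        return objective(candidate) < objective(current)
    return oracle


def comparison_search(oracle, start, steps=500, radius=0.5, seed=0):
    """Propose a random perturbation and keep it only if the oracle prefers it."""
    rng = random.Random(seed)
    point = list(start)
    for _ in range(steps):
        candidate = [x + rng.uniform(-radius, radius) for x in point]
        if oracle(point, candidate):
            point = candidate
    return point


def toy_loss(p):
    """Hypothetical stand-in objective with its minimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


result = comparison_search(make_oracle(toy_loss), start=[5.0, 5.0])
```

Because a move is accepted only when the oracle prefers it, the hidden objective never increases along the search path, even though the search itself only ever receives yes/no answers.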
Tianyi Lin is an assistant professor in the Department of Industrial Engineering and Operations Research (IEOR) at Columbia University. His research interests lie in generative artificial intelligence, optimization for machine learning, game theory, social and economic networks, and optimal transport. He obtained his Ph.D. in Electrical Engineering and Computer Science at UC Berkeley, where he was advised by Professor Michael Jordan and was associated with the Berkeley Artificial Intelligence Research (BAIR) group. From 2023 to 2024, he was a postdoctoral researcher at the Laboratory for Information & Decision Systems (LIDS) at the Massachusetts Institute of Technology, working with Professor Asuman Ozdaglar. Prior to that, he received a B.S. in Mathematics from Nanjing University, an M.S. in Pure Mathematics and Statistics from the University of Cambridge, and an M.S. in Operations Research from UC Berkeley.
Inference-Time Algorithms: A Theoretical Lens on Tractability and Error Propagation
Andrej Risteski, Carnegie Mellon
Modern AI systems are increasingly built by placing trained models inside larger computational loops. Inference-time algorithms are a basic instance of this idea: they use one or more trained models at test time to incorporate new information, exploit pretrained models as priors, and trade computational effort for accuracy, sample quality, or control. Examples include generator-verifier search for reasoning, diffusion models for solving inverse problems, and reward-guided generation. Theoretically, this revisits a classical question from optimization and theoretical computer science: what can be done with access to an oracle? Here, however, the oracles are new and non-standard: they model the capabilities of large pretrained models, making them powerful, but also imperfect because they are learned. This combination leads to new questions about algorithm design and error propagation.
This talk studies two central aspects of this paradigm: computational efficiency and error propagation. The first vignette considers generator-verifier systems, and shows how stochastic backtracking can trade additional computation for accuracy, giving a principled version of test-time scaling even with imperfect learned oracles. The second vignette studies diffusion steering: when can we efficiently bias a pretrained diffusion model toward higher-reward samples while staying close to the original model? We show that tractability depends strongly on both the reward structure and the alignment objective, and that simple primitives—such as sampling from linear tilts—can be surprisingly useful for handling richer reward classes.
Andrej Risteski is an Associate Professor in the Machine Learning Department at Carnegie Mellon University. Prior to that, he was a Norbert Wiener Research Fellow jointly in the Applied Math department and IDSS at MIT. Dr. Risteski received his PhD from the Computer Science Department at Princeton University, where he was advised by Sanjeev Arora.
Dr. Risteski’s research interests lie in the intersection of machine learning, statistics, and theoretical computer science, spanning topics like (probabilistic) generative models, algorithmic tools for learning and inference, representation and self-supervised learning, out-of-distribution generalization and applications of neural approaches to natural language processing and scientific domains. The broad goal of his research is principled and mathematical understanding of statistical and algorithmic problems arising in modern machine learning paradigms.
Engineering Interpretable and Faithful AI Systems
René Vidal, University of Pennsylvania
Large Language Models (LLMs) and Vision Language Models (VLMs) have achieved remarkable performance across a wide range of tasks. However, their growing deployment has exposed fundamental limitations in faithfulness, safety, and transparency. In this talk, I will present a unified perspective on addressing these challenges through principled model interventions and interpretable decision-making frameworks. I first introduce Information Pursuit (IP), an interpretable-by-design prediction framework that replaces opaque reasoning with a sequence of informative, user-interpretable queries, yielding concise explanations alongside accurate predictions. I then present Parsimonious Concept Engineering (PaCE), an approach that improves faithfulness and alignment by selectively removing undesirable internal activations, mitigating hallucinations and biased language while preserving linguistic competence. Results across text, vision, and medical tasks illustrate how these ideas advance transparency without sacrificing performance. Together, these contributions point toward a broader direction for building AI systems that are powerful, faithful, and aligned with human values.
René Vidal is the Penn Integrates Knowledge and Rachleff University Professor of Electrical and Systems Engineering and Radiology at the University of Pennsylvania, where he directs the Center for Innovation in Data Engineering and Science (IDEAS) and serves as Co-Chair of Penn AI. He is also an Amazon Scholar, Affiliated Chief Scientist at NORCE, and former Associate Editor-in-Chief of IEEE Transactions on Pattern Analysis and Machine Intelligence. Professor Vidal’s research advances the mathematical foundations of deep learning and trustworthy AI, with broad impact across computer vision and biomedical data science. His contributions have been recognized with major honors, including the IEEE Edward J. McCluskey Technical Achievement Award, the D’Alembert Faculty Award, the J.K. Aggarwal Prize, the ONR Young Investigator Award, the NSF CAREER Award, and best paper awards in machine learning, computer vision, signal processing, control, and medical robotics. He is a Fellow of ACM, AIMBE, IEEE, and IAPR, and a Sloan Fellow.
Autopilots Need Parachutes: Reliability Lessons from LLM-Automated Embedded AI Systems
Roberto Morabito, EURECOM
Embedded AI systems are becoming increasingly complex to develop and maintain, requiring specialized workflows that span data processing, model conversion, optimization, and deployment across heterogeneous hardware platforms. Recently, large language models have emerged as a promising tool to automate parts of this lifecycle. In this talk, I present recent work investigating the use of generative AI models as orchestration agents for embedded machine learning pipelines. Using an automated system that leverages LLMs to generate and iteratively refine software artifacts for embedded platforms, we evaluate the feasibility of automating key stages of the AI lifecycle. Our empirical results reveal both the promise and the limitations of this approach. Generative models can significantly accelerate development workflows. However, they also introduce instability, iterative failure modes, and unpredictable operational costs. I will discuss the main failure patterns observed in practice and outline research directions aimed at improving reliability through hybrid reasoning frameworks and system-level feedback mechanisms.
Roberto Morabito is an Assistant Professor in the Networked Systems group of the Communication Systems Department at EURECOM, France, and a Docent at the University of Helsinki. Before joining EURECOM, he was a Senior Researcher in the Department of Computer Science at the University of Helsinki. Earlier in his career, he spent eight years at Ericsson Research Finland, where he worked on cloud platforms, IoT systems, and cyber-physical systems. He received his PhD in Networking Technology from Aalto University in 2019 and was a postdoctoral researcher at the EDGE Lab, School of Electrical and Computer Engineering, Princeton University. His research lies at the intersection of networked systems, edge computing, and distributed AI, focusing on the design and lifecycle management of AI systems operating under computing and networking resource constraints.
Neuromorphic LLMs
Jason Eshraghian, UC Santa Cruz
This talk will show you what neuromorphic computing can do when an academic lab accidentally pulls $2-million of GPU-hours. We will showcase a series of frontier reasoning LLMs developed out of an academic lab, from data curation and pre-training to post-training and alignment. These models surpass leading LLMs from Meta, Google, and other heavily-resourced labs in the ~10-billion parameter regime, despite being 5x smaller.
We have deployed several models on neuromorphic hardware at just 2 watts, bringing state-of-the-art reasoning from the datacenter to the edge. Along the way, we dispel a series of widely-held assumptions about large-scale neuromorphic computation, revealing how it fundamentally differs from conventional deep learning, and why that difference matters.
Jason Eshraghian is an Assistant Professor and Fulbright Scholar in the Department of Electrical and Computer Engineering at the University of California, Santa Cruz. He is the developer of snnTorch, a Python library with over 500,000 downloads for training spiking neural networks. He is a dual-appointed IEEE CAS and EMBS Distinguished Lecturer, an Associate Editor of APL Machine Learning, the Chair of the IEEE Neural Systems and Applications Technical Committee, and a Scientific Advisory Board Member of BrainChip. He has received seven IEEE Best Paper Awards and leads the Neuromorphic Agents Team at Conscium.
AI-Driven Design Automation for Multi-Chip Integration in AI Chips
Sung-Kyu Lim, University of Southern California
Multi-chip integration has become a standard approach in AI training and is rapidly gaining traction in edge learning applications. Leveraging 2.5D and 3D IC architectures enables substantial improvements in energy efficiency and latency by optimizing inter-chip data transfer. At the core of this transformation lies the automation of design and simulation for heterogeneous AI chips, shifting from manual engineering to algorithm-driven methodologies. This evolution is being accelerated by advanced electronic design automation (EDA) tools powered by AI. My research group develops novel AI-driven algorithms that enhance or replace traditional design automation techniques, with a focus on enabling next-generation heterogeneous AI systems. In this talk, I will present our recent innovations and explore the critical challenges that lie ahead in applying AI algorithms to EDA for high-performance AI chip design.
Dr. Sung Kyu Lim is Dean’s Professor of Electrical and Computer Engineering at the University of Southern California, joining in Fall 2025 after over two decades at Georgia Tech. He received his B.S., M.S., and Ph.D. in Computer Science from UCLA. His research focuses on the architecture, design, and electronic design automation (EDA) of 2.5D and 3D integrated circuits, with over 450 publications. Dr. Lim is an IEEE Fellow and recipient of major awards including multiple Best Paper Awards (DAC 2023, TCAD 2022), and several Georgia Tech teaching honors. From 2022 to 2024, he served as a Program Manager at DARPA’s Microsystems Technology Office.
Kinetic Theory Perspective of Foundation Models for Physics
Maarten de Hoop, Rice University
We present a kinetic theory perspective of foundation models for physics. We begin with providing a mathematical framework for analyzing transformers. To uniformly address their expressivity, we consider the case that the mappings are conditioned on a context represented by a probability distribution of tokens. That is, transformers become mappings between probability measures. The relevant notion of smoothness then corresponds to continuity in terms of the Wasserstein distance between such contexts. We demonstrate that deep transformers are universal and can approximate continuous in-context mappings to arbitrary precision, uniformly over compact token domains. We then characterize the conditions on mappings between measures that enable these to be represented in terms of in-context mappings as transformers. The solution map of the Vlasov equation, which is of nonlocal transport type, for interacting particle systems in the mean-field regime for the Cauchy problem satisfies the conditions; conversely, we prove that the measure-theoretic self-attention has the properties that ensure that the infinite depth, mean-field transformer can be identified with a Vlasov flow. Extending this framework from interactions to collisions leads to a further development of structured architectures inspired by Lattice Boltzmann Models, while flow motivates a design based on self-warping.
Professor Maarten V. de Hoop, Simons Chair in Computational and Applied Mathematics and Earth Science at Rice University, is internationally recognized for his contributions to the mathematical foundations of seismology, wave propagation, and inverse problems. His research bridges microlocal and harmonic analysis, scattering theory, and structured numerical methods with applications to seismic imaging, geophysical inversion, and large-scale computational modeling of acoustic, elastic, and electromagnetic phenomena. De Hoop has been a pioneer in developing techniques to extract subtle information from massive, complex seismic datasets, advancing our ability to probe the Earth’s interior with unprecedented resolution, and more recently has integrated deep learning and data-driven discovery with rigorous mathematical frameworks to open new frontiers in the analysis of multiscale wave phenomena and inverse spectral problems. He is the recipient of the J. Clarence Karcher Award from the Society of Exploration Geophysicists and the Young Scientists Award from the International Society for Analysis, its Applications and Computation, has been elected a Fellow of the Institute of Physics and an External Member of the Finnish Academy of Science and Letters, and has served as associate editor for Inverse Problems, Inverse Problems and Imaging, and the International Journal on Geomathematics.
Incentivizing Emergent Behaviors for LLMs via Reinforcement Learning
Yi Wu, Tsinghua University
Reinforcement Learning (RL) has become a powerful post-training method for eliciting advanced behaviors in large language models (LLMs). This talk presents recent results showing how RL can incentivize the emergence of LLM capabilities across three domains: (1) the multi-player deduction game Werewolf, where RL-trained LLM agents develop strategic behaviors and outperform strong human players; (2) agentic search, where large-scale RL enables a 32B model to run multi-step search to answer non-trivial questions beyond commercial baselines; and (3) efficient reasoning, where RL mitigates over-thinking and improves both reliability and compute efficiency.
The papers can be found at:
- Werewolf: https://arxiv.org/abs/2310.18940 (ICML24), https://arxiv.org/abs/2502.04686 (ICML25)
- ASearcher: https://arxiv.org/abs/2508.07976
- Thinking Efficiency: https://www.arxiv.org/abs/2506.07104 (NeurIPS25)
All the projects are trained using our large-scale agentic RL system, AReaL, which is open-source at https://github.com/
Yi Wu is an assistant professor at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University. He obtained his Ph.D. from UC Berkeley and was a researcher at OpenAI from 2019 to 2020. His research focuses on reinforcement learning, multi-agent learning, and LLM agents. His representative works include the value iteration network, the MADDPG/MAPPO algorithms, OpenAI’s hide-and-seek project, and the AReaL project. He received the best paper award at NIPS 2016, was a best demo award finalist at ICRA 2024, and received the MIT TR35 Asia Pacific 2025 award.
95 Percent: Bridging the Gap Between Prototype and Product
Jeremy Schwartz, Zoox
When transitioning from the academic world to the professional world of engineering, one of the most common pitfalls is failing to understand the difference between a compelling prototype and a successful product. This talk will focus on that distinction. We will discuss the differences between them, and the work required to evolve a good prototype into a real product. We will also discuss some common pitfalls encountered in product development, and some of the practical software design considerations to keep in mind for development of robust, mature code. The talk will include examples from my background developing robotic systems for air, space, and ground.
Jeremy Schwartz is a robotics engineer at Zoox with expertise in a wide variety of areas of mechanical and electrical engineering and computer science. His primary professional expertise is in autonomy and behavioral algorithms, and he has worked in the aerospace industry as well as ground robotics, specializing in autonomous systems of all kinds.
Certifiably Correct Machine Perception
David Rosen, Northeastern University
Many fundamental machine perception and state estimation tasks require the solution of a high-dimensional nonconvex estimation problem; this class includes (for example) the fundamental problems of simultaneous localization and mapping (in robotics), 3D reconstruction (in computer vision), and sensor network localization (in distributed sensing). Such problems are known to be computationally hard in general, with many local minima that can entrap the smooth local optimization methods commonly applied to solve them. The result is that standard machine perception algorithms (based upon local optimization) can be surprisingly brittle, often returning egregiously wrong answers even when the problem to which they are applied is well-posed.
In this talk, we present a novel class of certifiably correct estimation algorithms that are capable of efficiently recovering provably good (often globally optimal) solutions of generally-intractable machine perception problems in many practical settings. Our approach directly tackles the problem of nonconvexity by employing convex relaxations whose minimizers provide provably good approximate solutions to the original estimation problem under moderate measurement noise. We illustrate the design of this class of methods using the fundamental problem of pose-graph optimization (a mathematical abstraction of robotic mapping) as a running example. We conclude with a brief discussion of open questions and future research directions.
David M. Rosen is an Assistant Professor in the Departments of Electrical & Computer Engineering and Mathematics and the Khoury College of Computer Sciences (by courtesy) at Northeastern University, where he leads the Robust Autonomy Laboratory (NEURAL). Prior to joining Northeastern, he was a Research Scientist at Oculus Research (now Meta Reality Labs) from 2016 to 2018, and a Postdoctoral Associate at MIT’s Laboratory for Information and Decision Systems (LIDS) from 2018 to 2021. He holds the degrees of B.S. in Mathematics from the California Institute of Technology (2008), M.A. in Mathematics from the University of Texas at Austin (2010), and ScD in Computer Science from the Massachusetts Institute of Technology (2016).
He is broadly interested in the mathematical and algorithmic foundations of trustworthy machine perception, learning, and control. His work has been recognized with the IEEE Transactions on Robotics Best Paper Award (2024), an Honorable Mention for the IEEE Transactions on Robotics Best Paper Award (2021), a Best Student Paper Award at Robotics: Science and Systems (2020), a Best Paper Award at the International Workshop on the Algorithmic Foundations of Robotics (2016), and selection as an RSS Pioneer (2019).
AI Safety Theory: The Missing Middle Ground
Adam Oberman, McGill University
Over the past few years, the capabilities of generative artificial intelligence (AI) systems have advanced rapidly. Along with the benefits of AI, there is also a risk of harm. In order to benefit from AI while mitigating the risks, we need a grounded theoretical framework.
Current AI safety theory, which predates generative AI, is insufficient. Most theoretical AI safety results tend to reason in absolutes: a system is “aligned” or “mis-aligned”, “honest” or “dishonest”. But in practice safety is probabilistic, not absolute. The missing middle ground is a quantitative or relative theory of safety: a way to reason formally about degrees of safety. Such a theory is required for defining safety and harms, and is essential for technical solutions as well as for making good policy decisions.
In this talk I will:
- Review current AI risks (from misuse, from lack of reliability, and systemic risks to the economy) as well as important future risks (lack of control).
- Review theoretical predictions of bad AI behavior and discuss experiments which demonstrate that they can occur in current LLMs.
- Explain why technical and theoretical safety solutions are valuable, even by contributors outside of the major labs.
- Discuss some gaps in the theory and present some open problems which could address the gaps.
Adam Oberman is a Full Professor of Mathematics and Statistics at McGill University, a Canada CIFAR AI Chair, and an Associate Member of Mila. He is a research collaborator at LawZero, Yoshua Bengio’s AI Safety Institute. He has been researching AI safety since 2024. His research spans generative models, reinforcement learning, optimization, calibration, and robustness. Earlier in his career, he made significant contributions to optimal transport and nonlinear partial differential equations. He earned degrees from the University of Toronto and the University of Chicago, and previously held faculty and postdoctoral positions at Simon Fraser University and the University of Texas at Austin.
High-dimensional Optimization with Applications to Compute-Optimal Neural Scaling Laws
Courtney Paquette, McGill University
Given the massive scale of modern ML models, we now only get a single shot to train them effectively. This restricts our ability to test multiple architectures and hyper-parameter configurations. Instead, we need to understand how these models scale, allowing us to experiment with smaller problems and then apply those insights to larger-scale models. In this talk, I will present a framework for analyzing scaling laws in stochastic learning algorithms using a power-law random features model (PLRF), leveraging high-dimensional probability and random matrix theory. I will then use this scaling law to address the compute-optimal question: How should we choose model size and hyper-parameters to achieve the best possible performance in the most compute-efficient manner? Then using this PLRF model, I will devise a new momentum-based algorithm that (provably) improves the scaling law exponent. Finally, I will present some numerical experiments on LSTMs that show how this new stochastic algorithm can be applied to real data to improve the compute-optimal exponent.
Courtney Paquette is an assistant professor at McGill University in the Mathematics and Statistics department, a CIFAR AI Chair (MILA), and an active member of the Montreal Machine Learning Optimization Group (MTL MLOpt) at MILA. Her research broadly focuses on designing and analyzing algorithms for large-scale optimization problems, motivated by applications in data science, and using techniques that draw from a variety of fields, including probability, complexity theory, and convex and nonsmooth analysis. Dr. Paquette has been a lead organizer of the OPT-ML Workshop at NeurIPS since 2020, and is a lead organizer (and original creator) of the High-dimensional Learning Dynamics (HiLD) Workshop at ICML.
A New Paradigm for Learning with Distribution Shift
Adam Klivans, UT Austin
We revisit the fundamental problem of learning with distribution shift, where a learner is given labeled samples from training distribution D, unlabeled samples from test distribution D′ and is asked to output a classifier with low test error. The standard approach in this setting is to prove a generalization bound in terms of some notion of distance between D and D′. These distances, however, are difficult to compute, and this has been the main stumbling block for efficient algorithm design over the last two decades.
We sidestep this issue and define a new model called TDS learning, where a learner runs a test on the training set and is allowed to reject if this test detects distribution shift relative to a fixed output classifier. This approach leads to the first set of efficient algorithms for learning with distribution shift that do not make any assumptions on the test distribution. Finally, we discuss how our techniques have recently been used to solve longstanding problems in supervised learning with contamination.
Adam Klivans is a Professor of Computer Science at the University of Texas at Austin and Director of the NSF AI Institute for Foundations of Machine Learning (IFML). His research interests lie in machine learning and theoretical computer science, in particular, Learning Theory, Computational Complexity, Pseudorandomness, Limit Theorems, and Gaussian Space. Dr. Klivans is a recipient of the NSF CAREER Award and serves on the editorial board for the Theory of Computing and Machine Learning Journal.
Amplifying human performance in combinatorial competitive programming
Petar Veličković, Google DeepMind
Recent years have seen a significant surge in complex AI systems for competitive programming, capable of performing at admirable levels against human competitors. While steady progress has been made, the highest percentiles still remain out of reach for these methods on standard competition platforms such as Codeforces. In this talk, I will describe and dive into our recent work, where we focussed on combinatorial competitive programming. In combinatorial challenges, the target is to find as-good-as-possible solutions to otherwise computationally intractable problems, over specific given inputs. We hypothesise that this scenario offers a unique testbed for human-AI synergy, as human programmers can write a backbone of a heuristic solution, after which AI can be used to optimise the scoring function used by the heuristic. We deploy our approach on previous iterations of Hash Code, a global team programming competition inspired by NP-hard software engineering problems at Google, and we leverage FunSearch to evolve our scoring functions. Our evolved solutions significantly improve the attained scores from their baseline, successfully breaking into the top percentile on all previous Hash Code online qualification rounds, and outperforming the top human teams on several. To the best of our knowledge, this is the first known AI-assisted top-tier result in competitive programming.
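The division of labour described in the abstract, a human-written heuristic backbone with a swappable scoring function, can be illustrated with a minimal sketch. The toy packing instance, the greedy backbone, and the two hand-written scoring functions below are all hypothetical. In the talk's setting the scoring function is the part evolved by FunSearch, and the problems come from Hash Code rather than this toy.

```python
from typing import Callable, List, Tuple

Item = Tuple[str, int, int]  # (name, value, weight): a hypothetical problem instance


def greedy_select(items: List[Item], capacity: int,
                  score: Callable[[Item], float]) -> List[str]:
    """Fixed backbone: take items in descending score order while they still fit."""
    chosen, remaining = [], capacity
    for item in sorted(items, key=score, reverse=True):
        name, _, weight = item
        if weight <= remaining:
            chosen.append(name)
            remaining -= weight
    return chosen


def total_value(items: List[Item], chosen: List[str]) -> int:
    """Sum the values of the chosen items."""
    values = {name: value for name, value, _ in items}
    return sum(values[name] for name in chosen)


def score_by_value(item: Item) -> float:
    """Baseline scoring function: prefer the most valuable item."""
    return item[1]


def score_by_density(item: Item) -> float:
    """Alternative scoring function: prefer value per unit of weight."""
    return item[1] / item[2]


ITEMS = [("a", 10, 10), ("b", 6, 4), ("c", 6, 4), ("d", 1, 2)]
```

With a capacity of 10, swapping `score_by_value` for `score_by_density` changes the result of the same backbone from a total value of 10 to 13. That gap is the kind of improvement a search over scoring functions is meant to find.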
















