Constant Regret in Online Decision-Making

Siddhartha Banerjee, Cornell University

Abstract: I will present a class of finite-horizon control problems in which we see a random stream of arrivals, need to select actions at each step, and where the final objective depends only on the aggregate type-action counts; this class includes many widely studied problems such as online resource allocation, dynamic pricing, generalized assignment, online bin packing, and bandits with knapsacks. For such settings, I will introduce a unified algorithmic paradigm and provide a simple yet general condition under which these algorithms achieve constant regret, i.e., an additive loss compared to the hindsight-optimal solution that is independent of the horizon and state space. These results stem from an elementary coupling argument, which may prove useful for many other questions in online decision-making. Time permitting, I will illustrate this by showing how the technique can be used to incorporate side information and historical data in these settings, achieving constant regret with as little as a single data trace.
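To make the setting concrete, below is a minimal sketch (not taken from the talk) of one problem in this class: single-resource online allocation, where a simple re-solving heuristic that repeatedly solves the expected (fluid) LP is simulated against the hindsight-optimal benchmark used to define regret. The type rewards, arrival probabilities, and the "accept if the LP serves at least half of that type's expected remaining mass" rounding rule are illustrative assumptions, not the speaker's algorithm.

```python
# Illustrative sketch: online resource allocation with a re-solving heuristic,
# measured against the hindsight optimum. All parameters below are assumptions
# chosen for the example, not taken from the talk or the underlying papers.
import random


def hindsight_optimal(arrivals, rewards, budget):
    """Offline benchmark: knowing the full arrival sequence, accept the
    `budget` highest-reward arrivals."""
    vals = sorted((rewards[j] for j in arrivals), reverse=True)
    return sum(vals[:budget])


def resolve_and_act(arrivals, rewards, probs, budget):
    """Re-solving heuristic: before each decision, solve the fluid (expected)
    LP with the remaining budget and horizon, and accept the current arrival
    iff the LP serves at least half of that type's expected remaining mass."""
    T = len(arrivals)
    b, total = budget, 0.0
    order = sorted(range(len(rewards)), key=lambda j: -rewards[j])
    for t, j in enumerate(arrivals):
        n = T - t  # remaining horizon, including the current step
        # Greedy solution of: max sum_k r_k x_k  s.t.  sum_k x_k <= b,
        #                     0 <= x_k <= p_k * n  (highest reward first).
        x, cap = {}, b
        for k in order:
            x[k] = min(cap, probs[k] * n)
            cap -= x[k]
        if b >= 1 and x[j] >= 0.5 * probs[j] * n:  # rounded LP decision
            total += rewards[j]
            b -= 1
    return total


if __name__ == "__main__":
    random.seed(0)
    rewards, probs = [1.0, 2.0], [0.6, 0.4]  # two arrival types (assumed)
    for T in (100, 1000, 10000):
        budget = int(0.3 * T)
        gaps = []
        for _ in range(200):
            arrivals = random.choices(range(len(rewards)), weights=probs, k=T)
            gaps.append(hindsight_optimal(arrivals, rewards, budget)
                        - resolve_and_act(arrivals, rewards, probs, budget))
        print(f"T={T:6d}  mean regret vs hindsight OPT: {sum(gaps)/len(gaps):.2f}")
```

The printed gap is exactly the regret notion in the abstract: the additive loss of the online policy relative to the hindsight-optimal solution, which the talk's results bound by a constant independent of the horizon for this kind of problem (under the stated conditions).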


Local Time

  • Timezone: America/New_York
  • Date: 11 May 2022
  • Time: 17:30 - 18:30

Location

Virtual

Organizer

TILOS & OPTML++

Speaker