Empirical Network Optimization via Distributed Zeroth-order Optimization

Distributed zeroth-order optimization forms the backbone of empirical network optimization and has wide applications in federated learning, where multiple agents use their local data and computation to collectively train a model. Practical network optimization tasks pose two main challenges. The first is that gradients of the objective functions are unavailable: the model may be a black box, or the relationship between the cost function and the model parameters may be so complicated that computing derivatives directly is expensive or infeasible. The second is that practical distributed systems exhibit non-negligible communication delay. We propose an approximate gradient-descent method that copes with both challenges.

Architecture design is a fundamental component of TILOS research. The central goal of this TILOS project is to apply AI-enabled optimization methods for black-box function optimization to the design and fine-tuning of large-scale networks. We have sought to automate the architecture design cycle in a data-driven manner by modeling the network performance metric as an unknown function of the architecture hyperparameters. We recently arrived at a new framework that combines adaptive sampling and Gaussian process bandit optimization, inherently accounting for reward shaping by focusing on instance-dependent performance metrics and adaptivity to hyperparameters. We are now extending these adaptive sampling approaches to account for model misspecification, yielding an optimization framework that simultaneously adapts the sampling strategy and the underlying models based on the data.
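As a hedged illustration of the Gaussian process bandit component, the following minimal sketch runs a basic GP-UCB loop over a discrete set of candidate hyperparameter configurations. The RBF kernel, its lengthscale, the noise level, the exploration parameter beta, and the evaluate callback (a stand-in for measuring the network performance metric) are illustrative assumptions, not the models used in this project.

import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    # Squared-exponential kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_ucb(candidates, evaluate, n_rounds=30, beta=2.0, noise=1e-3, seed=0):
    # GP-UCB: repeatedly evaluate the candidate with the largest upper
    # confidence bound under a Gaussian-process posterior on the metric.
    rng = np.random.default_rng(seed)
    candidates = np.asarray(candidates, dtype=float)
    X, y = [], []  # hyperparameter settings tried so far and their observed rewards
    for t in range(n_rounds):
        if not X:
            idx = int(rng.integers(len(candidates)))
        else:
            Xo = np.array(X)
            K = rbf_kernel(Xo, Xo) + noise * np.eye(len(Xo))
            Ks = rbf_kernel(candidates, Xo)
            mu = Ks @ np.linalg.solve(K, np.array(y))
            var = 1.0 - np.einsum('ij,ij->i', Ks, np.linalg.solve(K, Ks.T).T)
            idx = int(np.argmax(mu + beta * np.sqrt(np.maximum(var, 0.0))))
        X.append(candidates[idx])
        y.append(evaluate(candidates[idx]))  # noisy black-box performance measurement
    best = int(np.argmax(y))
    return X[best], y[best]

In this sketch, adaptivity comes only from the acquisition rule; the model-misspecification extension described above would additionally adapt the kernel (or an ensemble of surrogate models) as data accumulate.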

Consider the problem of n agents working jointly to minimize a decomposable objective function f(x) = Σi=1..n fi(x), where fi is the local cost function accessible to agent i, and only zeroth-order information (i.e., function values) is available. The goal is to design an algorithm in which agents use zeroth-order information to estimate gradients and communicate gradient information over a network to collaboratively minimize f(x).
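To make the zeroth-order setting concrete, here is a minimal sketch of a standard two-point gradient estimator that an agent could apply to its local cost fi using only function evaluations; the smoothing radius delta and the choice of a uniformly random unit direction are illustrative assumptions.

import numpy as np

def two_point_gradient(f, x, delta=1e-3, rng=np.random.default_rng()):
    # Two-point zeroth-order gradient estimate of f at x: probe a random
    # unit direction u and take the symmetric finite difference along it.
    # In expectation this recovers the gradient of a smoothed version of f.
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    fd = (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta)
    return d * fd * u  # dimension factor makes the estimate unbiased for the smoothed objective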

We proposed an algorithm based on a two-point gradient estimator and showed that, when the communication delay is bounded by a constant, a suitable step-size sequence yields convergence to the optimal point (or to a stationary point for nonconvex functions) at the optimal rate, matching that of the traditional stochastic gradient descent method.
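The simulation sketch below illustrates how such delayed zeroth-order updates might be combined, reusing two_point_gradient from above: each round's aggregated gradient estimate becomes available only after a fixed delay, and the iterate is updated with a diminishing step size. The specific delay model, the 1/sqrt(t) step-size schedule, and the quadratic test costs are illustrative assumptions, not the exact algorithm or analysis from the paper.

from collections import deque
import numpy as np

def delayed_zo_descent(local_fs, x0, n_iters=2000, delay=3, delta=1e-3,
                       step=lambda t: 0.1 / np.sqrt(t + 1),
                       rng=np.random.default_rng(0)):
    # Minimize f(x) = sum_i fi(x) using two-point zeroth-order estimates of the
    # local gradients, applied only after a bounded communication delay.
    x = np.array(x0, dtype=float)
    pending = deque()  # (arrival time, aggregated gradient estimate)
    for t in range(n_iters):
        g = sum(two_point_gradient(fi, x, delta, rng) for fi in local_fs)
        pending.append((t + delay, g))      # estimate becomes usable after the delay
        while pending and pending[0][0] <= t:
            _, g_stale = pending.popleft()
            x -= step(t) * g_stale          # diminishing step size absorbs the staleness
    return x

# Example: two agents with quadratic local costs;
# f(x) = |x|^2 + |x - 1|^2 is minimized at x = [0.5, 0.5].
local_fs = [lambda x: float(np.sum(x ** 2)), lambda x: float(np.sum((x - 1.0) ** 2))]
print(delayed_zo_descent(local_fs, x0=np.zeros(2)))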

Team Members

Tara Javidi¹

Collaborators

Behrouz Touri¹

1. UC San Diego

Publications

IEEE Control Systems Letters 2023