Personalized Federated Learning via Data-centric Regularization
Federated learning is a large-scale machine learning training paradigm in which data is distributed across clients and can be highly heterogeneous from one client to another. To personalize client models while ensuring that the local models retain enough commonality (i.e., to prevent “client-drift”), it has recently been proposed to cast federated learning as a consensus optimization problem, in which local models are trained on local data but are forced to remain similar via a regularization term.
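Concretely, a common generic form of this consensus objective (our notation here; specific works vary the regularizer) is

$$\min_{w_1,\dots,w_N,\,\bar{w}} \;\sum_{i=1}^{N}\Big(f_i(w_i) + \frac{\lambda}{2}\,\lVert w_i - \bar{w}\rVert^2\Big),$$

where $f_i$ is client $i$'s local empirical loss, $w_i$ its personalized model, $\bar{w}$ a shared consensus model, and $\lambda \ge 0$ trades personalization (small $\lambda$, nearly independent local training) against commonality (large $\lambda$, approaching a single global model).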
We propose an improved federated learning algorithm that enforces consensus optimization only on the representation part of each local model, rather than on the whole model. The algorithm naturally exploits the fact that today's deep networks are typically partitioned into a feature-extraction part (the representation) and a prediction part. Compared to previous works that share the representation exactly, our algorithm provides greater flexibility in highly heterogeneous settings, since the representation itself can differ substantially across data distributions. We validate its performance experimentally on standard datasets.
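A minimal PyTorch sketch of the representation-only penalty (the names `feature_extractor`, `head`, and `lam`, and the network shapes, are ours for illustration; the actual algorithm may differ in details such as how the consensus representation is updated):

```python
import torch
import torch.nn as nn

class LocalModel(nn.Module):
    """Local client model split into a representation part and a prediction head."""
    def __init__(self, in_dim, rep_dim, num_classes):
        super().__init__()
        self.feature_extractor = nn.Sequential(
            nn.Linear(in_dim, rep_dim), nn.ReLU()
        )
        self.head = nn.Linear(rep_dim, num_classes)

    def forward(self, x):
        return self.head(self.feature_extractor(x))

def local_loss(model, consensus_rep_params, x, y, lam):
    """Local objective: task loss plus a proximity penalty that ties ONLY the
    representation parameters to the consensus representation. The head is
    left unregularized, so the prediction part is fully personalized."""
    task_loss = nn.functional.cross_entropy(model(x), y)
    prox = sum(
        ((p - q.detach()) ** 2).sum()
        for p, q in zip(model.feature_extractor.parameters(), consensus_rep_params)
    )
    return task_loss + 0.5 * lam * prox
```

Because the penalty is soft rather than an exact tie, each client's representation can still drift toward its own data distribution, which is the flexibility claimed over exact-shared-representation methods.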
Client Selection in Federated Learning
In practice, however, the convergence of federated learning is challenged by data and latency heterogeneity, arising from diverse local data distributions, computational capabilities, and network conditions. While existing research has introduced various client selection methods that tackle both heterogeneities simultaneously, these approaches are either heuristic, with no theoretical guarantees, or infeasible to implement in practice.
We propose two novel client selection schemes that are theoretically optimal, computationally efficient, and able to handle both heterogeneities. Our schemes select clients that balance data and latency heterogeneity by minimizing the theoretical runtime to convergence. Empirical evaluations on nine datasets with non-iid data distributions and realistic delay distributions demonstrate that our algorithms are better than, or competitive with, the best available methods for this task.
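The abstract does not spell out the schemes, so the following is a purely illustrative sketch of the kind of trade-off involved, not the proposed algorithm: round wall-clock time is set by the slowest selected client, while statistical progress grows with the selected clients' utility (e.g., local loss or gradient norm as a heterogeneity proxy). The scoring rule and all names below are our own toy stand-ins.

```python
import numpy as np

def select_clients(latencies, utilities, k):
    """Toy runtime-vs-progress trade-off: among latency-sorted prefixes of
    size >= k, pick the prefix minimizing (max latency) / (total utility),
    a crude stand-in for a runtime-to-convergence objective, then keep the
    k most useful clients in that prefix."""
    order = np.argsort(latencies)              # consider fast clients first
    best, best_cost = None, np.inf
    for m in range(k, len(order) + 1):
        prefix = order[:m]
        # straggler delay per round divided by statistical progress per round
        cost = latencies[prefix].max() / (utilities[prefix].sum() + 1e-12)
        if cost < best_cost:
            best_cost, best = cost, prefix
    return best[np.argsort(-utilities[best])[:k]]
```

Even this toy version shows why purely latency-greedy or purely utility-greedy selection is suboptimal: admitting one slow but informative client can either pay for itself in fewer rounds or stall every round, and the objective must weigh both effects jointly.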