Federated Learning
A distributed learning paradigm in which many clients collaboratively train a shared model under privacy and communication constraints, exchanging only model updates rather than raw data.
Federated learning (FL) trains a shared model across many clients (phones, hospitals, banks, edge sensors) without centralising their raw data. Each client performs local optimisation on its own dataset and uploads only model updates (gradients or weights) to a coordinator, which aggregates the updates into a new global model and ships it back. The canonical algorithm, FedAvg, runs several SGD steps per client between rounds and averages client weights at the server, typically weighting each client by the size of its local dataset. Around that simple loop the field studies four interacting axes: communication cost (uploading and downloading model parameters dominates round time on real networks), statistical heterogeneity (clients’ local distributions are non-IID, so naïve averaging degrades convergence), privacy and trust (clients may be untrusted or adversarial; updates can leak training data), and governance (regulators may force a client’s data to be forgotten after training). Methodological work in FL is best understood as a set of responses to these four constraints, and most architectural proposals relax one or two of them at the cost of another.
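The FedAvg loop described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the toy task (a 1-D least-squares fit), the helper names `local_sgd` and `fedavg_round`, the learning rate, and the step count are all assumptions chosen for readability.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one client


def local_sgd(weights: List[float], data: Dataset,
              lr: float = 0.1, steps: int = 5) -> List[float]:
    """Run a few full-batch gradient steps on y ~ w*x + b (toy local objective)."""
    w, b = weights
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def fedavg_round(global_weights: List[float], client_datasets: List[Dataset],
                 lr: float = 0.1, steps: int = 5) -> List[float]:
    """One round: every client trains locally, the server averages the results.

    Each client's weights are weighted by its share of the total data.
    Only weights leave the client; the raw (x, y) pairs never do.
    """
    total = sum(len(d) for d in client_datasets)
    new_weights = [0.0] * len(global_weights)
    for data in client_datasets:
        local = local_sgd(list(global_weights), data, lr, steps)
        share = len(data) / total
        for i, value in enumerate(local):
            new_weights[i] += share * value
    return new_weights


if __name__ == "__main__":
    # Two clients see different x-ranges of the same line y = 2x + 1.
    client_a = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]
    client_b = [(1.0, 3.0), (1.5, 4.0), (2.0, 5.0)]
    weights = [0.0, 0.0]
    for _ in range(200):
        weights = fedavg_round(weights, [client_a, client_b])
    print(weights)
```

In this toy setting both clients share the same optimum, so averaging converges cleanly; the heterogeneity problems discussed later arise when client optima disagree.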
Reducing communication cost
Model updates are large, and the methods below cut traffic by changing what is transmitted. Xiong et al. (2023) propose FedDM, which replaces model averaging with iterative distribution matching: each client synthesises a small set of pseudo-samples whose embeddings match the local data distribution, and the server trains the global model on the union of pseudo-samples rather than averaging client weights. Communication scales with the synthetic dataset rather than with the model, so the savings grow as models get bigger. A related strand avoids transmitting model parameters at all. Guo et al. (2023) introduce PromptFL for the foundation-model era: when every client already has access to a frozen pretrained backbone, federated training only needs to share prompts, which are orders of magnitude smaller than the underlying model. PromptFL recasts the federated problem as federated prompt tuning and demonstrates that this both cuts uplink/downlink bandwidth and enables non-IID clients to converge on a useful set of cooperatively learned prompts. Song et al. (2023) push the idea further into resource-constrained edge settings via decentralised dataset distillation: each client distils its private dataset into a compact synthetic surrogate that captures the same training signal, and the surrogates, not the model weights, circulate between peers. The three approaches map a clean design space: send synthetic samples matched to the local distribution (FedDM), send only prompts for a frozen backbone (PromptFL), or circulate distilled datasets between peers instead of the model (Song et al.).
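A back-of-envelope calculation makes the scaling argument concrete. Every size below is a hypothetical placeholder chosen for illustration (none is taken from the cited papers), and `payload_bytes` is an illustrative helper that assumes 32-bit floats.

```python
def payload_bytes(num_values: int, bytes_per_value: int = 4) -> int:
    """Size of one upload, assuming dense 32-bit floats (an assumption)."""
    return num_values * bytes_per_value


# Hypothetical sizes, for illustration only.
MODEL_PARAMS = 100_000_000                  # full model weights
PROMPT_TOKENS, EMBED_DIM = 16, 512          # learnable prompt for a frozen backbone
SYNTH_SAMPLES, SAMPLE_DIM = 100, 3 * 32 * 32  # small synthetic image set

full = payload_bytes(MODEL_PARAMS)
prompt = payload_bytes(PROMPT_TOKENS * EMBED_DIM)
synth = payload_bytes(SYNTH_SAMPLES * SAMPLE_DIM)

if __name__ == "__main__":
    for name, size in [("full weights", full),
                       ("prompt only", prompt),
                       ("synthetic set", synth)]:
        print(f"{name:>14}: {size:>12,d} bytes per upload")
```

The prompt and synthetic-set payloads do not depend on `MODEL_PARAMS`, which is why the relative savings grow with model size under these assumptions.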
Aggregation rules under heterogeneity
When clients hold non-IID data, server-side averaging degrades silently: it can underfit the head of the data distribution while forgetting the tail. Luo et al. (2023) attack this with GradMA, a gradient-memory accelerator: the server maintains a running memory of past client gradients and uses it both to correct the current round’s aggregate and to alleviate the catastrophic forgetting that arises when clients participate only intermittently. The memory acts as a regulariser that biases the global model toward directions consistent with the full client population rather than the small subset participating in any given round. Chen et al. (2023) take a complementary route with Elastic Aggregation, an alternative to FedAvg in which each parameter’s aggregation weight is scaled by an elasticity coefficient that captures how sensitive the global loss is to perturbations in that parameter; insensitive parameters absorb large client-specific updates without disturbing the global solution, while sensitive parameters are aggregated conservatively. Both papers fit a broader pattern in heterogeneous FL: keep an additional statistic (past gradients, parameter sensitivities, client similarity) that naïve averaging throws away, and use it to steer aggregation.
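The idea of sensitivity-scaled aggregation can be illustrated with a small sketch. This is not Chen et al.'s exact rule: the normalisation by the largest sensitivity, the `max_damping` parameter, and the function name `elastic_style_aggregate` are assumptions, and the per-parameter sensitivities are simply passed in rather than estimated.

```python
from typing import List


def elastic_style_aggregate(global_w: List[float],
                            client_ws: List[List[float]],
                            sensitivity: List[float],
                            max_damping: float = 0.9) -> List[float]:
    """Move each parameter toward the client mean, damped by its sensitivity.

    A parameter with zero sensitivity takes the full averaged update;
    the most sensitive parameter takes only (1 - max_damping) of it.
    """
    n = len(client_ws)
    top = max(sensitivity) or 1.0  # avoid dividing by zero if all are 0
    new_w = []
    for i, g in enumerate(global_w):
        client_mean = sum(cw[i] for cw in client_ws) / n
        step = 1.0 - max_damping * (sensitivity[i] / top)
        new_w.append(g + step * (client_mean - g))
    return new_w


if __name__ == "__main__":
    global_w = [0.0, 0.0]
    clients = [[1.0, 1.0], [3.0, 3.0]]
    # Hypothetical sensitivities: parameter 0 is insensitive, parameter 1 is not.
    print(elastic_style_aggregate(global_w, clients, sensitivity=[0.0, 2.0]))
```

With all sensitivities equal to zero the function reduces to unweighted averaging of client weights, which makes the relationship to the FedAvg baseline easy to check.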
Unlearning and the right to erasure
Regulations such as the GDPR’s right to erasure require that a model trained on a user’s data behave, after a deletion request, as if that user had never contributed. In centralised learning this is already hard; in FL it is harder, because client contributions are interleaved over many rounds and the server never observes the data directly. Su et al. (2023) define asynchronous federated unlearning and propose a procedure that removes the influence of a withdrawing client without retraining the global model from scratch. The method tracks per-client update fingerprints during training and, on a deletion request, performs a calibrated rollback that approximately reverses the relevant contributions while letting the rest of the federation continue training in parallel. The framing matters: deletion is no longer a one-shot artefact of post-training data scrubbing but a first-class operation in the FL protocol, with its own latency, cost, and approximation guarantees.
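To make the bookkeeping idea concrete, here is a deliberately naive rollback sketch. It is not Su et al.'s procedure: the class name `RollbackServer`, the per-client contribution log, and the plain subtraction on deletion are all assumptions for illustration. The subtraction is only approximate in general, because updates sent by other clients in later rounds were computed on a model that already contained the withdrawn client's influence.

```python
from typing import Dict, List


class RollbackServer:
    """Toy server that logs each client's accumulated share of the global model."""

    def __init__(self, init_weights: List[float]) -> None:
        self.weights = list(init_weights)
        self.contrib: Dict[str, List[float]] = {}  # client id -> summed shares

    def apply_round(self, deltas: Dict[str, List[float]]) -> None:
        """Apply the mean of this round's client deltas and log each share."""
        n = len(deltas)
        for cid, delta in deltas.items():
            log = self.contrib.setdefault(cid, [0.0] * len(self.weights))
            for i, d in enumerate(delta):
                share = d / n
                self.weights[i] += share
                log[i] += share

    def forget(self, cid: str) -> None:
        """Subtract everything the client directly added (approximate unlearning)."""
        log = self.contrib.pop(cid, None)
        if log is None:
            return  # unknown client: nothing to remove
        for i, c in enumerate(log):
            self.weights[i] -= c


if __name__ == "__main__":
    server = RollbackServer([0.0, 0.0])
    server.apply_round({"a": [1.0, 0.0], "b": [0.0, 2.0]})
    server.apply_round({"a": [1.0, 1.0], "b": [1.0, 1.0]})
    server.forget("a")
    print(server.weights)
```

The log costs one model-sized vector per client, which hints at why storage, latency, and approximation quality all become protocol-level concerns.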
Adversarial robustness and poisoning
Because clients directly upload model updates, FL exposes a wider attack surface than centralised training. Yuan et al. (2023) study federated recommender systems, where the global model recommends items to users while their interaction logs stay local. They show that an attacker controlling a small number of clients can poison the system with synthetic user trajectories and bias recommendations toward arbitrary items, and they propose countermeasures based on update-clustering and anomaly detection at the server. The result is a clean case study of the general FL poisoning problem: the server must distinguish honest non-IID updates from adversarially crafted ones using only the updates themselves, since it has no access to the underlying data.
Open methodological questions extend across the four axes above: Can communication-efficient protocols and unlearning compose without exploding cost? Are aggregation rules robust under both heterogeneity and a Byzantine fraction of clients? And how do federated foundation models, in which only adapters or prompts circulate, change the geometry of all four constraints at once?
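Server-side anomaly detection on updates, as mentioned in the poisoning discussion above, can be sketched with a simple distance filter. This is a generic illustration and not the countermeasure of Yuan et al.: the coordinate-wise median centre, the `tolerance` multiplier, and the function name `filter_and_average` are assumptions. As the text notes, honest non-IID clients can also look like outliers, so a filter this crude trades robustness against excluding legitimate updates.

```python
import math
from statistics import median
from typing import List, Tuple


def filter_and_average(updates: List[List[float]],
                       tolerance: float = 3.0) -> Tuple[List[float], List[List[float]]]:
    """Drop updates far from the coordinate-wise median, then average the rest.

    An update is kept if its Euclidean distance to the median centre is at
    most `tolerance` times the median of all such distances.
    """
    dim = len(updates[0])
    centre = [median(u[i] for u in updates) for i in range(dim)]
    dists = [math.dist(u, centre) for u in updates]
    cutoff = tolerance * median(dists)
    # At least half of the updates lie within the median distance,
    # so `kept` is never empty.
    kept = [u for u, d in zip(updates, dists) if d <= cutoff]
    mean = [sum(u[i] for u in kept) / len(kept) for i in range(dim)]
    return mean, kept


if __name__ == "__main__":
    honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2]]
    poisoned = [[50.0, -50.0]]  # hypothetical adversarial update
    mean, kept = filter_and_average(honest + poisoned)
    print(len(kept), mean)
```

The filter sees only the update vectors, mirroring the constraint that the server has no access to client data.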
Sources
- paper · primary · 2023guo-tao-2023
- paper · primary · 2023xiong-2023
- paper · supporting · 2023song-rui-2023
- paper · primary · 2023luo-kangyang-2023
- paper · primary · 2023yuan-wei-2023