Federated Learning: Privacy-Preserving AI at Scale
Training models across decentralized data without central collection.
Vijayakumar S
Oct 1, 2025 · 14 min read
Learning Without Sharing
Federated learning trains a model directly on user devices and sends only model updates, not raw data, to a central server. The data never leaves the device, which limits privacy exposure while still letting the model learn from large-scale data.
How Federated Learning Works
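- The server sends the current global model to a selected subset of clients
- Each client trains on its local data for one or more epochs
- Clients send their model updates (gradients or weights) back to the server
- The server aggregates the updates (FedAvg: an average weighted by each client's number of training examples)
- The process repeats for the next round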
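```python
import flwr as fl

# Assumes a compiled Keras `model` (with an accuracy metric) and the arrays
# X_train, y_train, X_test, y_test are already defined on each client.

# Define client training
class FlowerClient(fl.client.NumPyClient):
    def get_parameters(self, config):
        # Return the current local weights as a list of NumPy arrays
        return model.get_weights()

    def fit(self, parameters, config):
        # Load the global weights, train locally, return the updated weights
        model.set_weights(parameters)
        model.fit(X_train, y_train, epochs=1, batch_size=32)
        return model.get_weights(), len(X_train), {}

    def evaluate(self, parameters, config):
        # Evaluate the global weights on this client's held-out data
        model.set_weights(parameters)
        loss, accuracy = model.evaluate(X_test, y_test)
        return loss, len(X_test), {"accuracy": accuracy}

# Start server (runs in its own process; clients connect to this address)
fl.server.start_server(
    server_address="localhost:8080",
    config=fl.server.ServerConfig(num_rounds=10),
)
```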
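To make the aggregation step concrete, here is a minimal sketch of FedAvg-style weighted averaging in plain NumPy. It is an illustration, not Flower's internal implementation: the function name `fedavg`, the toy clients, and the example counts are all assumptions made for this example.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Average per-layer weights, weighting each client by its example count.

    client_weights: one list of NumPy arrays (one array per layer) per client.
    client_sizes:   number of local training examples for each client.
    """
    total = float(sum(client_sizes))
    num_layers = len(client_weights[0])
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(num_layers)
    ]

# Two hypothetical clients, each holding a tiny two-layer model.
client_a = [np.array([1.0, 1.0]), np.array([0.0])]
client_b = [np.array([3.0, 5.0]), np.array([4.0])]

# Client B has three times as much data, so it contributes 75% of the average.
global_weights = fedavg([client_a, client_b], client_sizes=[100, 300])
print(global_weights)
```

With these numbers the first layer averages to [2.5, 4.0] and the second to [3.0], so the client with more data pulls the global model toward its own weights.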
Major Advances in 2025
Secure Aggregation
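Cryptographic techniques ensure the server never sees any individual client's update, only the aggregate:
- Secret sharing splits each update into shares held by multiple servers, so no single server can reconstruct it
- Threshold schemes require a minimum number of participants before the aggregate can be recovered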
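The secret-sharing idea can be sketched with additive shares over fixed-point integers. This is a toy illustration under stated assumptions, not a production protocol: `MODULUS`, `SCALE`, and all function names are hypothetical, the servers are assumed not to collude, and NumPy's random generator is not cryptographically secure.

```python
import numpy as np

MODULUS = 2**32   # all share arithmetic is done modulo this value (assumption)
SCALE = 10**4     # fixed-point scale: keeps four decimal places (assumption)

def encode(update):
    """Map a float vector to fixed-point integers modulo MODULUS."""
    return np.round(np.asarray(update) * SCALE).astype(np.int64) % MODULUS

def decode(total):
    """Map an aggregated integer vector back to floats, restoring the sign."""
    total = np.asarray(total, dtype=np.int64) % MODULUS
    signed = np.where(total >= MODULUS // 2, total - MODULUS, total)
    return signed / SCALE

def make_shares(update, num_servers, rng):
    """Split one update into random shares that sum to it modulo MODULUS."""
    encoded = encode(update)
    shares = [rng.integers(0, MODULUS, size=encoded.shape, dtype=np.int64)
              for _ in range(num_servers - 1)]
    last = (encoded - sum(shares)) % MODULUS
    return shares + [last]

def aggregate(all_shares):
    """all_shares[c][s] is the share client c sent to server s."""
    num_servers = len(all_shares[0])
    # Each server only ever adds up the shares it received.
    server_sums = [sum(client[s] for client in all_shares) % MODULUS
                   for s in range(num_servers)]
    # Combining the per-server sums reveals only the total of all updates.
    return decode(sum(server_sums) % MODULUS)

rng = np.random.default_rng(0)
updates = [[0.5, -1.0], [1.5, 2.0], [-0.25, 0.25]]
all_shares = [make_shares(u, num_servers=3, rng=rng) for u in updates]
summed = aggregate(all_shares)
print(summed)
```

Each individual share looks like random noise, yet the combined result equals the plain sum of the three updates. Real deployments add dropout handling and a secure source of randomness on top of this idea.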
Differential Privacy in FL
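Add calibrated noise to each update to limit what inference attacks can learn about any single client:
```python
import numpy as np

# Add noise before sending. `sigma` is the Laplace scale; a formal privacy
# guarantee also requires bounding (clipping) each update's norm first.
noise = np.random.laplace(0, scale=sigma, size=gradient.shape)
private_gradient = gradient + noise
```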
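Noise alone gives no formal guarantee unless each client's contribution is bounded first. The sketch below shows one common pattern, clipping the update to a fixed L2 norm and then adding noise scaled to that bound. Note that it swaps in Gaussian noise rather than the Laplace noise used above, since Gaussian noise is the usual pairing with L2 clipping. The names `privatize_update`, `clip_norm`, and `noise_multiplier` are assumptions for illustration, and the resulting privacy budget depends on factors not modeled here, such as the number of rounds and client sampling.

```python
import numpy as np

def privatize_update(update, clip_norm, noise_multiplier, rng):
    """Clip an update to L2 norm <= clip_norm, then add Gaussian noise."""
    update = np.asarray(update, dtype=np.float64)
    norm = np.linalg.norm(update)
    # Scale down only when the update exceeds the clipping bound.
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    # Noise standard deviation is proportional to the clipping bound.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

rng = np.random.default_rng(0)
raw = np.array([3.0, 4.0])  # L2 norm is 5.0

# With no noise, only the clipping is visible: the norm shrinks from 5 to 1.
clipped_only = privatize_update(raw, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
noisy = privatize_update(raw, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
print(clipped_only, noisy)
```

A larger `noise_multiplier` means stronger privacy and a noisier aggregate, which is the privacy-utility tradeoff in its most direct form.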
Heterogeneous FL
Handle devices with different data distributions, compute, and communication:
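- FedProx: Adds a proximal term to each client's local objective, keeping local models close to the global model for stability
- FedNova: Normalizes each client's update by its number of local steps, so clients that do more local work do not dominate the aggregate
- Personalized FL: Each client gets a model adapted to its own data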
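As a rough sketch of the FedProx idea, the local objective becomes the usual loss plus (mu / 2) * ||w - w_global||^2, so every gradient step gains an extra pull back toward the global model. The example below uses a linear regression loss purely for illustration. The function name, the `mu`, `lr`, and `steps` values, and the synthetic data are all assumptions.

```python
import numpy as np

def fedprox_local_update(w_global, X, y, mu=0.1, lr=0.05, steps=50):
    """Run local gradient descent on squared error plus a proximal term."""
    w = w_global.copy()
    for _ in range(steps):
        grad_loss = X.T @ (X @ w - y) / len(y)  # gradient of 0.5 * mean squared error
        grad_prox = mu * (w - w_global)         # gradient of (mu / 2) * ||w - w_global||^2
        w -= lr * (grad_loss + grad_prox)
    return w

# Synthetic local data for one hypothetical client.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([2.0, -1.0, 0.5])
w_global = np.zeros(3)

w_plain = fedprox_local_update(w_global, X, y, mu=0.0)  # no proximal term
w_prox = fedprox_local_update(w_global, X, y, mu=1.0)   # strong proximal term

print(np.linalg.norm(w_plain - w_global), np.linalg.norm(w_prox - w_global))
```

The client trained with the larger `mu` ends up closer to the global weights, which is what limits drift when local data distributions differ across clients.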
Real-World Deployments 2025
- Google Keyboard (Gboard): Next-word prediction trained on-device, without logging keystrokes
- Apple Health: Activity tracking models across iPhone users
- Healthcare: Hospital collaboration without sharing patient records
- Finance: Fraud detection across banks
Frameworks
- Flower: Most flexible, framework-agnostic
- TensorFlow Federated: Deep integration with TF
- PySyft: Privacy-preserving deep learning
- OpenFL: Intel's federated learning framework
Challenges Remaining
- Communication efficiency (training can be up to 1000x slower than centralized training, since updates travel over slow, unreliable networks)
- Statistical heterogeneity (different data across clients)
- System heterogeneity (different compute/connectivity)
- Privacy-utility tradeoff (stronger privacy guarantees usually cost model accuracy)
Vijayakumar S
AI Engineer · ML Enthusiast
Passionate about building intelligent systems, speech synthesis, and LLM applications. Writing about the tools and ideas shaping the next decade of software.