Federated Learning: Privacy-Preserving AI at Scale

Training models across decentralized data without central collection.

Vijayakumar S
Oct 1, 2025 · 14 min read
Federated Learning Architecture

Learning Without Sharing

Federated learning trains a model directly on user devices and sends only model updates, not raw data, to a central server. Raw data never leaves the device, yet the shared model still benefits from data at scale.

How Federated Learning Works

  1. Server sends current model to selected clients
  2. Clients train on local data for several epochs
  3. Clients send model updates (gradients/weights) back
  4. Server aggregates updates (FedAvg: weighted average)
  5. Repeat
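
import flwr as fl

# Assumes `model` is a compiled Keras model (with metrics=["accuracy"]) and that
# X_train, y_train, X_test, y_test hold this client's local data.

# Define client training
class FlowerClient(fl.client.NumPyClient):
    def get_parameters(self, config):
        # Return the current local weights as a list of NumPy arrays
        return model.get_weights()

    def fit(self, parameters, config):
        # Load the global weights, train locally, return the updated weights
        model.set_weights(parameters)
        model.fit(X_train, y_train, epochs=1, batch_size=32)
        return model.get_weights(), len(X_train), {}

    def evaluate(self, parameters, config):
        # Score the global weights on this client's held-out data
        model.set_weights(parameters)
        loss, accuracy = model.evaluate(X_test, y_test)
        return loss, len(X_test), {"accuracy": accuracy}

# Start server (runs in a separate process from the clients)
fl.server.start_server(
    server_address="localhost:8080",
    config=fl.server.ServerConfig(num_rounds=10),
)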
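
The aggregation in step 4 can be sketched in plain NumPy. This is a minimal illustration of a FedAvg-style weighted average, not Flower's internal implementation. The function name `fedavg` and the two toy clients are hypothetical, chosen only for this example.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of per-client weight lists.

    client_weights: one entry per client, each a list of np.ndarray layers.
    client_sizes:   number of local training examples per client.
    """
    total = float(sum(client_sizes))
    num_layers = len(client_weights[0])
    averaged = []
    for layer in range(num_layers):
        # Each client's layer is scaled by its share of the total examples
        layer_sum = sum(
            w[layer] * (n / total) for w, n in zip(client_weights, client_sizes)
        )
        averaged.append(layer_sum)
    return averaged

# Two hypothetical clients, each holding a single-layer "model"
client_a = [np.array([1.0, 1.0])]
client_b = [np.array([4.0, 4.0])]
global_weights = fedavg([client_a, client_b], client_sizes=[100, 200])
```

Because client B holds twice as many examples, its weights count twice as much in the average.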

Major Advances in 2025

Secure Aggregation

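Cryptographic techniques ensure the server never sees individual updates:

  • Secret sharing splits each update into shares held by multiple servers, so no single server can reconstruct it
  • Threshold schemes require a minimum number of participants before the aggregate can be recovered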
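
The secret-sharing idea can be illustrated with additive shares. The sketch below is a toy: it uses floating-point noise for readability, whereas real protocols work over finite fields and also handle dropped clients, so it should not be treated as secure. The helper `split_into_shares` and the three-server setup are assumptions made for the example.

```python
import numpy as np

def split_into_shares(update, num_servers, rng):
    """Split `update` into `num_servers` random shares that sum to `update`."""
    shares = [rng.normal(size=update.shape) for _ in range(num_servers - 1)]
    # The last share is chosen so that all shares add up to the original
    shares.append(update - sum(shares))
    return shares

rng = np.random.default_rng(0)
updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
num_servers = 3

# Each client sends share k to server k; no server holds a full update
per_client_shares = [split_into_shares(u, num_servers, rng) for u in updates]

# Each server sums the shares it received across all clients
server_totals = [
    sum(client_shares[k] for client_shares in per_client_shares)
    for k in range(num_servers)
]

# Combining the server totals reveals only the aggregate update
aggregate = sum(server_totals)
```

Any single server sees only random-looking shares. The sum of client updates appears only once all server totals are combined.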

Differential Privacy in FL

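Add calibrated noise to updates to limit what inference attacks can learn about any single client:

import numpy as np

# Add noise before sending. For a formal guarantee, `gradient` must first be
# clipped to a bounded norm and `sigma` calibrated to that bound.
noise = np.random.laplace(0, scale=sigma, size=gradient.shape)
private_gradient = gradient + noise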
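
A slightly fuller sketch adds the clipping step that makes the noise scale meaningful. Note that it swaps in Gaussian noise for the Laplace noise in the snippet above, a common pairing with L2 clipping. The names `privatize_update`, `clip_norm`, and `noise_multiplier` are assumptions for illustration, and the sketch does no privacy accounting.

```python
import numpy as np

def privatize_update(update, clip_norm, noise_multiplier, rng):
    """Clip `update` to L2 norm `clip_norm`, then add Gaussian noise."""
    norm = np.linalg.norm(update)
    # Scale down only when the update exceeds the clipping bound
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    # Noise standard deviation is proportional to the clipping bound
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

rng = np.random.default_rng(0)
update = np.array([3.0, 4.0])  # L2 norm is 5.0, so it will be clipped
private_update = privatize_update(
    update, clip_norm=1.0, noise_multiplier=0.5, rng=rng
)
```

Clipping bounds how much any one client can move the aggregate, which is what lets the noise level be tied to a privacy guarantee.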

Heterogeneous FL

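Handle devices that differ in data distribution, compute, and connectivity:

  • FedProx: Adds a proximal term to the local objective that keeps client weights close to the global model, for stability
  • FedNova: Normalizes each update by the client's number of local steps, so clients with more data or longer training do not dominate
  • Personalized FL: Each client gets a model adapted to its own data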
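
The proximal term in FedProx can be shown on a toy least-squares problem. This is a sketch under stated assumptions, not a reference implementation: the function `fedprox_local_step`, the synthetic data, and the values of `lr` and `mu` are all made up for the example.

```python
import numpy as np

def fedprox_local_step(w, w_global, X, y, lr=0.1, mu=0.1):
    """One gradient step on least-squares loss plus the FedProx proximal term."""
    # Gradient of the local loss 0.5 * mean((X @ w - y) ** 2)
    grad_loss = X.T @ (X @ w - y) / len(y)
    # Gradient of the proximal term (mu / 2) * ||w - w_global||^2
    grad_prox = mu * (w - w_global)
    return w - lr * (grad_loss + grad_prox)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))         # synthetic local features
y = X @ np.array([2.0, -1.0])        # synthetic local targets
w_global = np.zeros(2)               # weights received from the server

w = w_global.copy()
for _ in range(50):
    w = fedprox_local_step(w, w_global, X, y, mu=1.0)
```

With `mu=0.0` this reduces to ordinary local gradient descent. A larger `mu` pulls the local solution back toward `w_global`, which limits client drift on heterogeneous data.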

Real-World Deployments 2025

  • Google Keyboard: Next-word prediction without logging keystrokes
  • Apple Health: Activity tracking models across iPhone users
  • Healthcare: Hospital collaboration without sharing patient records
  • Finance: Fraud detection across banks

Frameworks

  • Flower: Most flexible, framework-agnostic
  • TensorFlow Federated: Deep integration with TF
  • PySyft: Privacy-preserving deep learning
  • OpenFL: Intel's federated learning framework

Challenges Remaining

  • Communication efficiency (rounds over device networks can be up to 1000x slower than centralized training)
  • Statistical heterogeneity (data distributions differ across clients)
  • System heterogeneity (clients differ in compute and connectivity)
  • Privacy-utility tradeoff (stronger privacy guarantees usually reduce model accuracy)

Vijayakumar S
AI Engineer · ML Enthusiast

Passionate about building intelligent systems, speech synthesis, and LLM applications. Writing about the tools and ideas shaping the next decade of software.