Weight transmission protocol in Federated Machine Learning - machine-learning

I am wondering: in federated machine learning, when we train our local models and intend to update the cloud model, what protocol do we use to transmit those weights? Also, when we use TensorFlow Federated, how are the weights transmitted (using which library and protocol)?
Kind regards,

Most authors of federated computations using TensorFlow Federated write them in the "TFF language". The specific protocol used during communication is determined by the platform running the computation and by the instructions given in the algorithm.
For computation authors, TFF supports a few different instructions for the platform, which may result in different protocols. For example, looking at operations that sum CLIENTS values into a SERVER value:
tff.federated_sum does not indicate any particular protocol.
tff.federated_secure_sum, tff.federated_secure_sum_bitwidth, and tff.federated_secure_modular_sum all use a secure protocol such that the server cannot learn the value of any individual summand, only the aggregate sum (https://research.google/pubs/pub47246/ provides more details).
All of these can be composed with transport-layer security schemes to prevent third parties on the network from learning the transmitted values, depending on the execution platform's implementation. For example, TFF's own runtime uses gRPC, which supports a few different schemes: https://grpc.io/docs/guides/auth/.
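As a minimal sketch contrasting the two kinds of summation instructions (the use of int32 client values and the bitwidth=8 bound are arbitrary choices for this example, not requirements):

import tensorflow as tf
import tensorflow_federated as tff

@tff.federated_computation(tff.FederatedType(tf.int32, tff.CLIENTS))
def plain_sum(client_values):
  # No particular transport protocol is implied; the platform decides how the
  # individual client values reach the server.
  return tff.federated_sum(client_values)

@tff.federated_computation(tff.FederatedType(tf.int32, tff.CLIENTS))
def secure_sum(client_values):
  # Instructs the platform to use a secure aggregation protocol so the server
  # only learns the aggregate, never an individual summand.
  return tff.federated_secure_sum_bitwidth(client_values, bitwidth=8)

# In a local simulation, plain_sum([1, 2, 3]) can be invoked like a Python
# function and returns 6.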

Related

In FL, can clients train different model architectures?

I am practicing with this tutorial. I would like each client to train a different architecture and a different model. Is this possible?
TFF does support different clients having different model architectures.
However, the Federated Learning for Image Classification tutorial uses tff.learning.build_federated_averaging_process, which implements the Federated Averaging algorithm (McMahan et al., 2017), defined as each client receiving the same architecture. This is accomplished in TFF by "mapping" (in the functional programming sense) the model to each client dataset to produce a new model, and then aggregating the results.
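For reference, a minimal usage sketch of that process as it appears in the tutorial (model_fn and federated_train_data are placeholders, not defined here):

iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,  # every client receives the model built by this same function
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02))
state = iterative_process.initialize()
# Each call to next() broadcasts the current weights, maps local training over
# every client dataset, and aggregates the updates back on the server.
state, metrics = iterative_process.next(state, federated_train_data)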
To achieve different clients having different architectures, a different federated learning algorithm would need to be implemented. There are a couple of (non-exhaustive) ways this could be expressed:
Implement an alternative to ClientFedAvg. This method applies a fixed model to the client's dataset; an alternate implementation could potentially create a different architecture per client.
Create a replacement for tff.learning.build_federated_averaging_process that uses a different function signature, splitting out groups of clients that would receive different architectures. For example, currently FedAvg looks like:
(<state@SERVER, data@CLIENTS> → <state@SERVER, metrics@SERVER>)
This could be replaced with a method with the signature:
(<state@SERVER, data1@CLIENTS, data2@CLIENTS, ...> → <state@SERVER, metrics@SERVER>)
This would allow the function to internally tff.federated_map() different model architectures to different client datasets. This would likely only be useful in FL simulations or experimentation and research.
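A rough sketch of what such a split-signature round could look like. The scalar "model", the two toy per-architecture training functions, and the two separate CLIENTS-placed dataset arguments are all stand-ins just to make the shape of the signature concrete; this is not an existing TFF API:

import tensorflow as tf
import tensorflow_federated as tff

MODEL_TYPE = tf.float32                    # stand-in for real model weights
DATA_TYPE = tff.SequenceType(tf.float32)   # stand-in for a client dataset

@tff.tf_computation(MODEL_TYPE, DATA_TYPE)
def train_arch_a(model, data):
  # Placeholder "training" for architecture A.
  return model + data.reduce(0.0, lambda x, y: x + y)

@tff.tf_computation(MODEL_TYPE, DATA_TYPE)
def train_arch_b(model, data):
  # Placeholder "training" for architecture B.
  return 0.5 * model + data.reduce(0.0, lambda x, y: x + y)

@tff.federated_computation(
    tff.FederatedType(MODEL_TYPE, tff.SERVER),
    tff.FederatedType(DATA_TYPE, tff.CLIENTS),   # datasets for group 1
    tff.FederatedType(DATA_TYPE, tff.CLIENTS))   # datasets for group 2
def run_one_round(server_model, data1, data2):
  model_at_clients = tff.federated_broadcast(server_model)
  trained_a = tff.federated_map(
      train_arch_a, tff.federated_zip((model_at_clients, data1)))
  trained_b = tff.federated_map(
      train_arch_b, tff.federated_zip((model_at_clients, data2)))
  # Merging two different architectures into one global model is the open
  # question discussed below; here we simply return both server-side averages.
  return tff.federated_mean(trained_a), tff.federated_mean(trained_b)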
However, in federated learning there will be difficult questions around how to aggregate the models back on the server into a single global model. This probably needs to be worked out first.

TFF: Does TFF support any other models except neural networks?

I'm trying to make a comparison between different federated learning frameworks.
When looking at the TFF site, I could not find any information about which models are supported.
Looking at the 'model' API, they only talk about weights, etc.
Am I missing something, or can TFF not be used for models other than neural networks?
You can also use Keras models, which are not limited to neural networks.
A Keras model can be converted to the tff.learning.Model format using tff.learning.from_keras_model, and this can be used together with the higher level computations like tff.learning.build_federated_averaging_process. For an example of logistic regression in TFF, see for instance https://github.com/google-research/federated/tree/master/optimization/stackoverflow_lr
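As an illustrative sketch, a single sigmoid Dense layer in Keras is effectively a binary logistic regression model; the feature dimension (10) and the input_spec below are assumptions for the example:

import tensorflow as tf
import tensorflow_federated as tff

def model_fn():
  # One Dense layer with a sigmoid activation == binary logistic regression.
  keras_model = tf.keras.Sequential(
      [tf.keras.layers.Dense(1, activation='sigmoid', input_shape=(10,))])
  return tff.learning.from_keras_model(
      keras_model,
      input_spec=(tf.TensorSpec(shape=[None, 10], dtype=tf.float32),
                  tf.TensorSpec(shape=[None, 1], dtype=tf.float32)),
      loss=tf.keras.losses.BinaryCrossentropy(),
      metrics=[tf.keras.metrics.BinaryAccuracy()])

iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.1))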
I also second the other answer: you can write essentially anything if needed.
TFF has conceptually two levels of API:
The low-level Federated Core API of TFF supports arbitrary computation on scalars, vectors, matrices, etc., doing anything TensorFlow can do. The notion of a model is not inherent at this level and there is greater freedom (a minimal sketch follows after this list). The Custom Federated Algorithms, Part 1: Introduction to the Federated Core tutorial is a good introduction.
The higher-level Federated Learning API is built on top of the Federated Core API and starts to add assumptions/constraints. For example, the provided FedAvg algorithm implementation mostly expects backprop-style training on a model's forward pass. Other federated algorithms are definitely interesting, but may need to be built on the Federated Core API.
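A minimal Federated Core sketch with no model involved, in the spirit of that tutorial (the sensor-reading framing is just an illustration):

import tensorflow as tf
import tensorflow_federated as tff

@tff.federated_computation(tff.FederatedType(tf.float32, tff.CLIENTS))
def get_average_reading(client_readings):
  # A federated computation over raw scalars; no tff.learning.Model in sight.
  return tff.federated_mean(client_readings)

# In a local simulation this can be invoked like a Python function, e.g.:
# get_average_reading([68.5, 70.3, 69.8])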

Does TFF serialize functions from another library?

I'm planning a TFF scheme in which the clients send data to the server besides the weights, such as their hardware information (e.g., CPU frequency). To achieve that, I need to call functions from third-party Python libraries, like psutil. Is it possible to serialize (using tff.tf_computation) such functions?
If not, what could be a solution to achieve this objective in a scenario where I'm using a remote executor setting through gRPC?
Unfortunately no, this does not work without modification. TFF uses TensorFlow graphs to serialize the computation logic to run on remote machines. TFF does not interpret Python code on the remote machines.
There may be a solution in creating a TensorFlow custom op. This would mean writing C++ code to retrieve the CPU frequency, and then a Python API to add the operation to the TensorFlow graph during computation construction. TensorFlow's "Create an op" guide provides detailed instructions.
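A hedged sketch of the Python side only, assuming the C++ op has already been written and compiled to a shared library following that guide; the file name and the op name below are hypothetical:

import tensorflow as tf
import tensorflow_federated as tff

# Hypothetical: 'cpu_freq_op.so' is a custom op compiled from C++ that reads
# the host machine's CPU frequency.
cpu_freq_module = tf.load_op_library('./cpu_freq_op.so')

@tff.tf_computation
def read_cpu_frequency():
  # Because this is a graph op (not Python), it executes on whichever remote
  # machine the serialized computation is dispatched to.
  return cpu_freq_module.cpu_frequency()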

Other compression methods for Federated Learning

I noticed that the Gradient Quantization compression method is already implemented in the TFF framework. How about non-traditional compression methods, where we select a sub-model by dropping some parts of the global model? I came across the "Federated Dropout" compression method in the paper "Expanding the Reach of Federated Learning by Reducing Client Resource Requirements" (https://arxiv.org/abs/1812.07210). Any idea if the Federated Dropout method is already supported in TensorFlow Federated? If not, any insights on how to implement it? (The main idea of the method is dropping a fixed percentage of the activations and filters in the global model to exchange and train a smaller sub-model.)
Currently, there is no implementation of this idea available in the TFF code base.
But here is an outline of how you could do it; I recommend starting from examples/simple_fedavg:
1. Modify the top-level build_federated_averaging_process to accept two model_fns -- one server_model_fn for the global model, one client_model_fn for the smaller sub-model structure actually trained on clients.
2. Modify build_server_broadcast_message to extract only the relevant sub-model from the server_state.model_weights. This would be the mapping from the server model to the client model.
3. The client_update may actually not need to be changed (I am not 100% sure), as long as only the client_model_fn is provided from client_update_fn.
4. Modify server_update -- the weights_delta will be the update to the client sub-model, so you will need to map it back to the larger global model.
In general, steps 2 and 4 are tricky, as they depend not only on what layers are in a model, but also on how they are connected. So it will be hard to create an easy-to-use general solution, but it should be fine to write these for a specific model structure you know in advance.
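As a very rough sketch of steps 2 and 4 for a single dense layer (which units to keep, and how neighbouring layers are affected, depends on the specific model; the keep_fraction and random selection are assumptions):

import numpy as np

def extract_dense_submodel(kernel, bias, keep_fraction=0.75):
  # Step 2: keep a fixed fraction of the output units, producing the narrower
  # sub-model weights that the client will actually train.
  num_units = kernel.shape[1]
  kept = np.sort(np.random.choice(
      num_units, int(num_units * keep_fraction), replace=False))
  return kernel[:, kept], bias[kept], kept

def merge_dense_update(kernel, bias, sub_kernel, sub_bias, kept):
  # Step 4: scatter the trained sub-model weights back into the full-size
  # global weights; units that were dropped keep their previous values.
  new_kernel, new_bias = kernel.copy(), bias.copy()
  new_kernel[:, kept] = sub_kernel
  new_bias[kept] = sub_bias
  return new_kernel, new_bias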
We have several compression schemes implemented in our simulator:
"FL_PyTorch: Optimization Research Simulator for Federated Learning."
https://burlachenkok.github.io/FL_PyTorch-Available-As-Open-Source/
https://github.com/burlachenkok/flpytorch
FL_PyTorch is a suite of open-source software written in Python that builds on top of PyTorch, one of the most popular research deep learning (DL) frameworks. We built FL_PyTorch as a research simulator for FL to enable fast development, prototyping, and experimentation with new and existing FL optimization algorithms. Our system supports abstractions that provide researchers with sufficient flexibility to experiment with existing and novel approaches to advance the state of the art. The work appears in the proceedings of the 2nd International Workshop on Distributed Machine Learning (DistributedML 2021). The paper, presentation, and appendix are available in the DistributedML'21 proceedings (https://dl.acm.org/doi/abs/10.1145/3488659.3493775).

What tools do you know for storage, version control, and deployment of ML models as an API service?

I found https://dataversioncontrol.com and https://hydrosphere.io/ml-lambda/. What else is there?
Convert your ML pipeline to a standardized, text-based representation and use regular version control tools (such as Git). For example, the PMML standard can represent the most popular R, Scikit-Learn, and Apache Spark ML transformation and model types. Better yet, after conversion to the standardized representation, all these models become directly comparable with one another (e.g., comparing the "complexity" of random forest model objects across different ML frameworks).
You can build whatever APIs you like on top of this versioned base layer.
To get started with the PMML standard, please check out the Java PMML API backend project, and its Openscoring REST API frontend project.
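For example, a scikit-learn pipeline can be exported to a PMML text file (which can then be committed to Git) using the sklearn2pmml package from the same JPMML ecosystem; the toy dataset and the output file name are assumptions, and a Java runtime is required for the conversion:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

X, y = load_iris(return_X_y=True)

# Wrap the estimator in a PMMLPipeline so it can be converted.
pipeline = PMMLPipeline([("classifier", LogisticRegression(max_iter=1000))])
pipeline.fit(X, y)

# Export to the standardized text representation; 'model.pmml' is a plain XML
# file that can be versioned with Git like any other text artifact.
sklearn2pmml(pipeline, "model.pmml")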
