In FL, can clients train different model architectures? - tensorflow-federated

I am working through this tutorial, and I would like each client to train a different architecture and a different model. Is this possible?

TFF does support different clients having different model architectures.
However, the Federated Learning for Image Classification tutorial uses tff.learning.build_federated_averaging_process, which implements the Federated Averaging algorithm (McMahan et al., 2017), defined such that every client receives the same architecture. This is accomplished in TFF by "mapping" (in the functional programming sense) the model over each client's dataset to produce a new model, and then aggregating the results.
To achieve different clients having different architectures, a different federated learning algorithm would need to be implemented. There are a couple of (non-exhaustive) ways this could be expressed:
Implement an alternative to ClientFedAvg. This method applies a fixed model to the client's dataset; an alternative implementation could potentially create a different architecture per client.
Create a replacement for tff.learning.build_federated_averaging_process that uses a different function signature, splitting out groups of clients that would receive different architectures. For example, FedAvg currently looks like:
(<state@SERVER, data@CLIENTS> → <state@SERVER, metrics@SERVER>)
This could be replaced with a method with the signature:
(<state@SERVER, data1@CLIENTS, data2@CLIENTS, ...> → <state@SERVER, metrics@SERVER>)
This would allow the function to internally tff.federated_map() different model architectures onto different client datasets (a rough sketch of this idea follows below). This would likely only be useful in FL simulation, experimentation, and research.
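As a minimal sketch of that second option, assuming an older TFF release where tff.tf_computation, tff.federated_computation, and tff.federated_map are top-level symbols, a round that maps a different client computation onto each client group could be structured like this (the per-group "architectures" here are stand-in reductions, not real training loops):

```python
import tensorflow as tf
import tensorflow_federated as tff

# Stand-ins for per-group local training; a real implementation would build a
# different model architecture inside each of these and run local optimization.
@tff.tf_computation(tff.SequenceType(tf.float32))
def client_update_arch_a(dataset):
  # Placeholder "training": reduce the client's data to a single float.
  return dataset.reduce(0.0, lambda acc, x: acc + x)

@tff.tf_computation(tff.SequenceType(tf.float32))
def client_update_arch_b(dataset):
  # A different placeholder computation standing in for a second architecture.
  return dataset.reduce(0.0, lambda acc, x: acc + 2.0 * x)

CLIENT_DATA_TYPE = tff.FederatedType(tff.SequenceType(tf.float32), tff.CLIENTS)

@tff.federated_computation(CLIENT_DATA_TYPE, CLIENT_DATA_TYPE)
def run_one_round(data_group_a, data_group_b):
  # Map a different computation ("architecture") onto each client group,
  # then aggregate each group separately on the server.
  updates_a = tff.federated_map(client_update_arch_a, data_group_a)
  updates_b = tff.federated_map(client_update_arch_b, data_group_b)
  return tff.federated_mean(updates_a), tff.federated_mean(updates_b)
```

In simulation, run_one_round would be invoked with two Python lists of tf.data.Dataset objects, one list per architecture group, and would return one aggregate per group.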
However, in federated learning there will be difficult questions around how to aggregate the models back on the server into a single global model. This probably needs to be thought through and designed first.

Related

Do unsupervised machine learning model features need to be independent?

I'm training an unsupervised machine learning model and want to make sure my features are as useful as possible!
Do unsupervised machine learning model features need to be independent? For example, I have a feature (subscriptionId) that is the subscription ID of different cloud accounts within a tenant. I also have a feature that is the resourceId of a resource within the subscription.
However, this resourceId contains the subscriptionId. Is it best practice to combine these features or remove one feature (e.g. subscriptionId) to avoid dependence and duplication among dataset features?
For unsupervised learning, commonly used for clustering, association, or dimensionality reduction, features don't need to be fully independent. However, if a feature has many unique values, your models are likely to learn to differentiate on these high-entropy values instead of learning the interesting or significant structure you might hope for.
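As a small illustration (using hypothetical column names and toy data), one way to spot and drop ID-like, high-cardinality columns such as subscriptionId and resourceId before clustering:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical records with ID-like, high-cardinality columns.
df = pd.DataFrame({
    "subscriptionId": ["sub-001", "sub-002", "sub-001", "sub-003"],
    "resourceId": ["sub-001/vm-1", "sub-002/db-1", "sub-001/vm-2", "sub-003/vm-9"],
    "cpu_hours": [10.5, 3.2, 7.8, 1.1],
    "storage_gb": [200.0, 50.0, 120.0, 10.0],
})

# Inspect cardinality: columns that are (nearly) unique per row are
# identifiers, not signal the model should cluster on.
print(df.nunique() / len(df))

# Drop the ID columns (resourceId already embeds subscriptionId anyway)
# and cluster on the remaining numeric features.
features = df.drop(columns=["subscriptionId", "resourceId"])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)
```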
If you're working on generative unsupervised models trained on customer data, I cannot overstate how much risk this may create for security and secret disclosure for Oracle Cloud Infrastructure (OCI) customers. Generative models are premised on regurgitating their inputs, and thousands of papers have been written on extracting private information from trained models.
It's not clear what problem you're working on, and the question seems early in its formulation.
I recommend you spend time delving into the limits of statistics and data science, which are the foundation of modern popular machine learning methods.
Once you have an idea of what questions can be answered well by ML, and what can't, then you might consider something like fastAI's course.
https://towardsdatascience.com/the-actual-difference-between-statistics-and-machine-learning-64b49f07ea3
https://www.nature.com/articles/nmeth.4642
Again, depending on how the outputs will be used or who can view or (even indirectly) query the model, it seems unwise to train on private values, especially if you want to generate outputs. ML methods are only useful if you have access to a lot of data, and if you have access to the data of many users, you need to be a good steward of Oracle Cloud customer data.

Limited number of clients used in federated learning

I just started studying federated learning and want to apply it to a certain dataset, and there are some questions that have risen up.
My data contains records of 3 categories, each of which has 3 departments. I am planning to build 3 different federated learning models, one per category, and treat the three departments of that category as the distributed clients.
Is this possible, or does building federated learning models require thousands of clients?
Thanks
It's difficult to say from what you have provided in your question. Usually, when building a federated learning system, you are extending your centralized approach to one with data split/partitioned between segregated clients. Depending on the type of data you have, the type of task you are trying to solve, and the amount of data required to solve the task in a centralized approach, these factors (along with others) will determine how many clients you can use and how much data is required at each client. Additionally, the aggregation method you wish to use to combine the parameters from different clients will affect this. I suggest experimenting with different client numbers and partitioning methods and seeing what suits your needs.
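A handful of clients is perfectly workable, at least in simulation. As a rough sketch with made-up feature, label, and department arrays (and assuming a model_fn built as in the image classification tutorial), partitioning one category's records by department into three client datasets could look like this:

```python
import numpy as np
import tensorflow as tf

# Hypothetical records for one category: features, labels, and a department id.
features = np.random.rand(300, 10).astype(np.float32)
labels = np.random.randint(0, 2, size=300).astype(np.int32)
departments = np.random.randint(0, 3, size=300)  # 3 departments -> 3 clients

def make_client_dataset(dept_id):
  mask = departments == dept_id
  return (tf.data.Dataset.from_tensor_slices((features[mask], labels[mask]))
          .shuffle(100)
          .batch(20))

# One tf.data.Dataset per client, in the list form expected by the
# iterative process from the tutorial:
federated_train_data = [make_client_dataset(d) for d in range(3)]

# With a model_fn defined as in the tutorial, a round would then be:
# iterative_process = tff.learning.build_federated_averaging_process(model_fn, ...)
# state = iterative_process.initialize()
# state, metrics = iterative_process.next(state, federated_train_data)
```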

TFF: Does TFF support any models other than neural networks?

I'm trying to make a comparison between different federated learning frameworks.
When looking at the TFF site, I could not find any information about which models are supported.
Looking at the 'model' API, they only talk about weights, ...
Am I missing something or can TFF not be used for other models except neural networks?
You can also use Keras models, which are not limited to neural networks.
A Keras model can be converted to the tff.learning.Model format using tff.learning.from_keras_model, and this can be used together with the higher level computations like tff.learning.build_federated_averaging_process. For an example of logistic regression in TFF, see for instance https://github.com/google-research/federated/tree/master/optimization/stackoverflow_lr
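As a minimal sketch of that logistic-regression case (with a hypothetical input spec rather than a real dataset), a non-deep Keras model can be wrapped for TFF roughly like this:

```python
import collections
import tensorflow as tf
import tensorflow_federated as tff

# Logistic regression is just a single dense layer with a sigmoid activation.
def create_keras_model():
  return tf.keras.Sequential([
      tf.keras.layers.Dense(1, activation="sigmoid", input_shape=(10,)),
  ])

# Hypothetical element spec describing one batch of client data.
input_spec = collections.OrderedDict(
    x=tf.TensorSpec(shape=(None, 10), dtype=tf.float32),
    y=tf.TensorSpec(shape=(None, 1), dtype=tf.int32),
)

def model_fn():
  # The Keras model must be constructed (uncompiled) inside model_fn.
  return tff.learning.from_keras_model(
      create_keras_model(),
      loss=tf.keras.losses.BinaryCrossentropy(),
      input_spec=input_spec,
      metrics=[tf.keras.metrics.BinaryAccuracy()],
  )

iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.1),
)
```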
I also second the other answer: you can write essentially anything if needed.
TFF has conceptually two levels of API:
The low-level Federated Core API of TFF supports arbitrary computations on scalars, vectors, matrices, etc., doing anything TensorFlow can do. The notion of a model is not inherent at this level, and there is greater freedom. The Custom Federated Algorithms, Part 1: Introduction to the Federated Core tutorial is a good introduction.
The higher-level Federated Learning API is built on top of the Federated Core API and starts to add assumptions/constraints. For example, the provided FedAvg implementation mostly expects backprop-style training on a model's forward pass. Other federated algorithms are definitely interesting, but may need to be built on the Federated Core API.
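For a flavor of the Federated Core level, here is a toy computation in the spirit of that tutorial, with no notion of a model at all: each client contributes a single float, and the server receives the mean (a minimal sketch, assuming a TFF release where these symbols are exposed at the top level):

```python
import tensorflow as tf
import tensorflow_federated as tff

# A federated computation: each client holds one float, the server gets the mean.
@tff.federated_computation(tff.FederatedType(tf.float32, tff.CLIENTS))
def get_average(client_values):
  return tff.federated_mean(client_values)

# In simulation, the client-placed values are supplied as a plain Python list.
print(get_average([68.5, 70.3, 69.8]))  # -> approximately 69.53
```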

Other compression methods for Federated Learning

I noticed that the Gradient Quantization compression method is already implemented in the TFF framework. How about non-traditional compression methods where we select a sub-model by dropping some parts of the global model? I came across the "Federated Dropout" compression method in the paper "Expanding the Reach of Federated Learning by Reducing Client Resource Requirements" (https://arxiv.org/abs/1812.07210). Any idea if the Federated Dropout method is already supported in TensorFlow Federated? If not, any insights on how to implement it? (The main idea of the method is dropping a fixed percentage of the activations and filters in the global model in order to exchange and train a smaller sub-model.)
Currently, there is no implementation of this idea available in the TFF code base.
But here is an outline of how you could do it; I recommend starting from examples/simple_fedavg:
1. Modify the top-level build_federated_averaging_process to accept two model_fns -- one server_model_fn for the global model, one client_model_fn for the smaller sub-model structure actually trained on clients.
2. Modify build_server_broadcast_message to extract only the relevant sub-model from the server_state.model_weights. This would be the mapping from server model to client model.
3. The client_update may actually not need to be changed (I am not 100% sure), as long as only the client_model_fn is provided from client_update_fn.
4. Modify server_update - the weights_delta will be the update to the client sub-model, so you will need to map it back to the larger global model.
In general, steps 2 and 4 are tricky, as they depend not only on what layers are in a model, but also on how they are connected. So it will be hard to create an easy-to-use general solution, but it should be fine to write these for a specific model structure you know in advance; a rough sketch of steps 2 and 4 for a simple dense network follows below.
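As a sketch only (hypothetical helper names, and assuming a fixed two-dense-layer model whose weights are ordered [W1, b1, W2, b2]), the sub-model extraction (step 2) and the mapping of the client delta back to the global shape (step 4) could look like this in plain numpy:

```python
import numpy as np

def extract_sub_model(global_weights, keep_fraction=0.75, seed=0):
  """Step 2 sketch: slice a smaller dense sub-model out of the global weights.

  global_weights: [W1 (in, h), b1 (h,), W2 (h, out), b2 (out,)].
  Returns the sub-model weights plus the kept hidden-unit indices, which are
  needed later to map the client update back onto the global model.
  """
  w1, b1, w2, b2 = global_weights
  hidden = w1.shape[1]
  rng = np.random.default_rng(seed)
  kept = np.sort(rng.choice(hidden, size=int(keep_fraction * hidden), replace=False))
  sub_weights = [w1[:, kept], b1[kept], w2[kept, :], b2]
  return sub_weights, kept

def map_delta_back(global_weights, sub_delta, kept):
  """Step 4 sketch: scatter the sub-model update into a full-size delta."""
  full_delta = [np.zeros_like(w) for w in global_weights]
  full_delta[0][:, kept] = sub_delta[0]
  full_delta[1][kept] = sub_delta[1]
  full_delta[2][kept, :] = sub_delta[2]
  full_delta[3][:] = sub_delta[3]
  return full_delta
```

Which hidden units to drop per client and per round (fixed vs. resampled) is a design choice discussed in the Federated Dropout paper; the sketch simply samples once with a fixed seed.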
We have several compression schemes implemented in our simulator:
"FL_PyTorch: Optimization Research Simulator for Federated Learning."
https://burlachenkok.github.io/FL_PyTorch-Available-As-Open-Source/
https://github.com/burlachenkok/flpytorch
FL_PyTorch is a suite of open-source software written in Python that builds on top of PyTorch, one of the most popular research deep learning (DL) frameworks. We built FL_PyTorch as a research simulator for FL to enable fast development, prototyping, and experimenting with new and existing FL optimization algorithms. Our system supports abstractions that provide researchers with sufficient flexibility to experiment with existing and novel approaches to advance the state of the art. The work is in the proceedings of the 2nd International Workshop on Distributed Machine Learning (DistributedML 2021). The paper, presentation, and appendix are available in the DistributedML'21 Proceedings (https://dl.acm.org/doi/abs/10.1145/3488659.3493775).

How would one implement class weighting for individual federated learning clients?

I am attempting to utilise TensorFlow Federated for an image classification task with 7 classes and 3-5 clients. Each client has a different class distribution of labels. I have successfully implemented this tutorial for my use-case and am now looking for improvements. I have a few questions:
Can individual clients have different class weights in their loss function based on the class distribution that is unique to that client?
If so, how would one implement this?
If not, is it because the federated averaging process requires that the clients and the global model share the same loss function?
If I understand your question, then yes, individual clients can have different class weights; in this case we are talking about non-IID data. Suppose we have 7 labels and each client has data from only 1 or 2 of them.
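One way to approach this (a sketch, not something built into tff.learning) is to compute class weights from each client's own label distribution and attach them as per-example weights during preprocessing, assuming each client dataset yields (features, integer_label) batches. A custom client update (for example, one based on examples/simple_fedavg) would then need to consume the third element as a sample weight in its loss:

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 7

def compute_class_weights(client_dataset):
  """Inverse-frequency class weights computed from one client's own labels."""
  counts = np.zeros(NUM_CLASSES, dtype=np.float64)
  for _, y in client_dataset.as_numpy_iterator():
    counts += np.bincount(np.ravel(y), minlength=NUM_CLASSES)
  counts = np.maximum(counts, 1.0)  # guard against classes absent on this client
  weights = counts.sum() / (NUM_CLASSES * counts)
  return tf.constant(weights, dtype=tf.float32)

def add_per_example_weights(client_dataset):
  """Attach (x, y, weight) so a custom client update can apply a weighted loss."""
  class_weights = compute_class_weights(client_dataset)
  return client_dataset.map(
      lambda x, y: (x, y, tf.gather(class_weights, tf.cast(y, tf.int32))))
```

Note that FedAvg still aggregates a single shared model; only the weighting of each client's local loss differs.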
