I'm trying to make a comparison between different federated learning frameworks.
When looking at the TFF site, I could not find any information about which models are supported.
Looking at the 'model' API, the documentation only talks about weights, ...
Am I missing something, or can TFF not be used for models other than neural networks?
You can also use Keras models, which are not limited to neural networks.
A Keras model can be converted to the tff.learning.Model format using tff.learning.from_keras_model, and this can be used together with the higher level computations like tff.learning.build_federated_averaging_process. For an example of logistic regression in TFF, see for instance https://github.com/google-research/federated/tree/master/optimization/stackoverflow_lr
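For concreteness, here is a minimal sketch of that path for a logistic-regression-style model. It assumes a TFF release in which tff.learning.from_keras_model and tff.learning.build_federated_averaging_process are available (newer releases move FedAvg under tff.learning.algorithms), and the feature dimension of 10 is made up purely for illustration:

```python
import tensorflow as tf
import tensorflow_federated as tff

# Logistic regression expressed as a Keras model: one dense unit with a sigmoid.
def create_keras_model():
    return tf.keras.Sequential(
        [tf.keras.layers.Dense(1, activation="sigmoid", input_shape=(10,))]
    )

def model_fn():
    # Wrap the Keras model in the tff.learning.Model format.
    return tff.learning.from_keras_model(
        create_keras_model(),
        input_spec=(
            tf.TensorSpec(shape=[None, 10], dtype=tf.float32),  # features
            tf.TensorSpec(shape=[None, 1], dtype=tf.float32),   # labels
        ),
        loss=tf.keras.losses.BinaryCrossentropy(),
        metrics=[tf.keras.metrics.BinaryAccuracy()],
    )

# Plug the wrapped model into the higher-level FedAvg computation.
iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.1),
)
state = iterative_process.initialize()
```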
I also second the other answer: you can write essentially anything if needed.
TFF has conceptually two levels of API:
The low-level Federated Core API of TFF supports arbitrary computation on scalars, vectors, matrices, etc., doing anything TensorFlow can do. The notion of a model is not inherent at this level, and there is greater freedom. The Custom Federated Algorithms, Part 1: Introduction to the Federated Core tutorial is a good introduction.
The higher-level Federated Learning API is built on top of the Federated Core API and starts to add assumptions/constraints. For example, the provided FedAvg algorithm implementation mostly expects backprop-style training on a model's forward pass. Other federated algorithms are definitely interesting, but may need to be built on the Federated Core API.
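To give a flavor of the Federated Core level, here is a minimal sketch in the spirit of that tutorial: a federated mean of client-held scalars, with no model involved at all (the API names used, e.g. tff.type_at_clients, assume a reasonably recent TFF release):

```python
import tensorflow as tf
import tensorflow_federated as tff

# A federated computation over float32 values placed at the clients;
# there is no model here, just a placement-aware aggregation.
@tff.federated_computation(tff.type_at_clients(tf.float32))
def get_average_reading(client_readings):
    return tff.federated_mean(client_readings)

# In a local simulation, client values can be supplied as a plain Python list.
print(get_average_reading([68.5, 70.3, 69.8]))
```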
Related
I noticed that the Gradient Quantization compression method is already implemented in the TFF framework. How about non-traditional compression methods where we select a sub-model by dropping some parts of the global model? I came across the "Federated Dropout" compression method in the paper "Expanding the Reach of Federated Learning by Reducing Client Resource Requirements" (https://arxiv.org/abs/1812.07210). Any idea whether the Federated Dropout method is already supported in TensorFlow Federated? If not, any insights on how to implement it? (The main idea of the method is dropping a fixed percentage of the activations and filters in the global model to exchange and train a smaller sub-model.)
Currently, there is no implementation of this idea available in the TFF code base.
But here is an outline of how you could do it; I recommend starting from examples/simple_fedavg:
1. Modify the top-level build_federated_averaging_process to accept two model_fns -- one server_model_fn for the global model, one client_model_fn for the smaller sub-model structure actually trained on clients.
2. Modify build_server_broadcast_message to extract only the relevant sub-model from the server_state.model_weights. This would be the mapping from server model to client model.
3. The client_update may actually not need to be changed (I am not 100% sure), as long as only the client_model_fn is provided from client_update_fn.
4. Modify server_update - the weights_delta will be the update to the client sub-model, so you will need to map it back to the larger global model.
In general, steps 2 and 4 are tricky, as they depend not only on what layers are in a model, but also on how they are connected. So it will be hard to create an easy-to-use general solution, but it should be fine to write these for a specific model structure you know in advance; a sketch of steps 2 and 4 for a single dense layer follows below.
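Purely as an illustrative sketch of what steps 2 and 4 could look like for a single dense layer (nothing here is TFF API; the helper names and the fixed set of kept units are made up for the example):

```python
import numpy as np

def extract_sub_weights(server_kernel, server_bias, kept_units):
    # Step 2 (sketch): server -> client mapping, keeping only the selected units.
    return server_kernel[:, kept_units], server_bias[kept_units]

def apply_sub_delta(server_kernel, server_bias, kernel_delta, bias_delta, kept_units):
    # Step 4 (sketch): map the client's weights_delta back into the global model.
    new_kernel, new_bias = server_kernel.copy(), server_bias.copy()
    new_kernel[:, kept_units] += kernel_delta
    new_bias[kept_units] += bias_delta
    return new_kernel, new_bias

# Example: drop 20% of the units of a 10-unit dense layer.
rng = np.random.default_rng(0)
kernel, bias = rng.normal(size=(4, 10)), np.zeros(10)
kept = np.sort(rng.choice(10, size=8, replace=False))
sub_kernel, sub_bias = extract_sub_weights(kernel, bias, kept)
```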
We have several compression schemes implemented in our simulator:
"FL_PyTorch: Optimization Research Simulator for Federated Learning."
https://burlachenkok.github.io/FL_PyTorch-Available-As-Open-Source/
https://github.com/burlachenkok/flpytorch
FL_PyTorch is a suite of open-source software written in Python that builds on top of one of the most popular research Deep Learning (DL) frameworks, PyTorch. We built FL_PyTorch as a research simulator for FL to enable fast development, prototyping, and experimenting with new and existing FL optimization algorithms. Our system supports abstractions that provide researchers with sufficient flexibility to experiment with existing and novel approaches to advance the state of the art. The work is in the proceedings of the 2nd International Workshop on Distributed Machine Learning, DistributedML 2021. The paper, presentation, and appendix are available in the DistributedML'21 Proceedings (https://dl.acm.org/doi/abs/10.1145/3488659.3493775).
I'm preparing for the Azure Machine Learning exam, and here is a question that confuses me.
You are designing an Azure Machine Learning workflow. You have a dataset that contains two million large digital photographs. You plan to detect the presence of trees in the photographs. You need to ensure that your model supports the following:

Solution: You create a Machine Learning experiment that implements the Multiclass Decision Jungle module. Does this meet the goal?

Solution: You create a Machine Learning experiment that implements the Multiclass Neural Network module. Does this meet the goal?
The answer to the first question is No while the answer to the second is Yes, but I cannot understand why the Multiclass Decision Jungle doesn't meet the goal, since it is a classifier. Can someone explain the reason to me?
I suppose that this is part of a series of questions that present the same scenario, and there should definitely be some constraints in the scenario.
Moreover, if you have a look at the Azure documentation:
However, recent research has shown that deep neural networks (DNN) with many layers can be very effective in complex tasks such as image or speech recognition. The successive layers are used to model increasing levels of semantic depth.
Thus, Azure recommends using Neural Networks for image classification. Remember that the goal of the exam is to test your capacity to design data science solutions using Azure, so it is better to use their official documentation as a reference.
And comparing to the other solutions:

You create an Azure notebook that supports the Microsoft Cognitive Toolkit.
You create a Machine Learning experiment that implements the Multiclass Decision Jungle module.
You create an endpoint to the Computer Vision API.
You create a Machine Learning experiment that implements the Multiclass Neural Network module.
There are only 2 Azure ML Studio modules, and as the question is about constructing a workflow, I guess we can only choose between them. (CNTK is actually the best solution, as it allows constructing a deep neural network with ReLU whereas AML Studio doesn't, and an API call is not about data science at all.)
Finally, I do agree with the other contributors that the question is absurd. Hope this helps.
This question is indeed part of a series of questions that present the same scenario with multiple options. Both of the solutions approach the problem as a multi-class classification problem, which is correct. However, the key element here is dimensionality.
Your inputs (images) are highly dimensional, which requires a deep learning approach in order to be effective. A decision jungle won't be able to learn effectively in such a high-dimensional feature space, whereas a NN has a much better chance of doing so.
I hope it helps.
The statement of my exercise says: the distribution of feature_3 is a hint of how the data is generated. I am trying to understand what I should infer from that for the rest of my ETL or ML model.
I have plotted the Q-Q plot of this feature. The distribution seems fairly normal. What can I infer from this information for the rest of my ETL or ML model?
Most machine learning models assume an underlying data distribution in order to function well.
So, coming back to your question, there are some ML techniques that assume that the data fed into them is normally (or Gaussian) distributed: Gaussian Naive Bayes, least-squares-based (regression) models, LDA, and QDA. So the statement you are referring to implies that your data was generated by a process that produces normally distributed values, which such techniques can exploit. See here for a brief visual explanation of this, and here for an explanation of the importance of the normal distribution in machine learning.
In addition, please note that there are other algorithms (e.g. SVMs, Random Forests for regression/classification, Decision Trees, Gradient Boosted Trees, etc.) that do not assume any particular underlying data distribution.
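If you want to complement the Q-Q plot you already made with a formal check, here is a minimal SciPy sketch (the synthetic array below is only a stand-in for your actual feature_3 column):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical stand-in for feature_3; replace with your real column.
feature_3 = np.random.default_rng(42).normal(loc=0.0, scale=1.0, size=1000)

# Q-Q plot against a theoretical normal distribution.
stats.probplot(feature_3, dist="norm", plot=plt)
plt.title("Q-Q plot of feature_3 vs. normal")
plt.show()

# D'Agostino-Pearson normality test as a numeric complement to the plot.
stat, p_value = stats.normaltest(feature_3)
print(f"normaltest statistic={stat:.3f}, p-value={p_value:.3f}")
```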
Machine Learning - what a hoot!
I have a little project with which I would like to identify anomalies in unlabeled data. Thus, unsupervised clustering.
However, the sequence of the data is also important, as a single record may not be of interest, but the sequence of records that precede it may make it anomalous.
So I am thinking of building a Recurrent SOM to add the temporal context.
I have trained a few simple Machine Learning Models using Python Graphlab Create, Azure Machine Learning and Encog ML Framework, but Azure does not seem to provide unsupervised clustering and I am leaning towards using Encog.
I have looked at Recurrent Neural Networks in Encog, as well as SOMs, but I have no idea how to combine the two. Most of the articles online regarding Feedback/Recurrent SOM machine learning are academic.
Are there any good references for doing this with Encog?
A Google search turned up only one good reference for RSOM in Encog: https://github.com/leadtune/encog-java/blob/master/encog-core/src/org/encog/neural/pattern/RSOMPattern.java
I'm playing around with writing a web crawler that scans for a specific set of keywords and then assigns a global score to each domain it encounters based on a cumulative score I assigned to each keyword (programming=1, clojure=2, javascript=-1, etc.).
I have set up my keyword scoring on a sliding scale of -10 to 10 and I have based my initial values on my own assumptions about what is and is not relevant.
I feel that my scoring model may be flawed, and I would prefer to feed a list of domains that match the criteria I'm trying to capture into an analysis tool and optimize my keyword weights based on some kind of statistical analysis.
What would be an appropriate analysis technique to generate an optimal scoring model for a list of "known good domains"? Is this problem suited to Bayesian learning, Monte Carlo simulation, or some other technique?
So, given a training set of relevant and irrelevant domains, you'd like to build a model which classifies new domains into one of these categories. I assume the features you will be using are the terms appearing in the domains, i.e. this can be framed as a document classification problem.
Generally, you are correct in assuming that letting statistics-based machine learning algorithms do the "scoring" for you works better than assigning manual scores to keywords.
A simple way to approach the problem would be to use Bayesian learning; specifically, Naive Bayes might be a good fit.
After generating a dataset from the domains you've manually tagged (e.g. collecting several pages from each domain and treating each as a document), you can experiment with various algorithms using one of the machine learning frameworks, e.g. WEKA.
A primer on how to handle and load text documents to WEKA can be found here. After the data is loaded, you can use the framework to experiment with various classification algorithms, e.g. Naive Bayes, SVM, etc. Once you've found the method best fitting your needs, you can export the resulting model and use it via WEKA's Java API.
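WEKA is what the links above cover, but if you want a quick feel for the approach without leaving Python, here is a minimal Naive Bayes sketch using scikit-learn instead (the toy documents and labels are invented; in practice each "document" would be the text collected from a domain's pages):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training set: one text blob per manually tagged domain.
docs = [
    "clojure programming functional lisp jvm",
    "javascript frontend css frameworks",
    "haskell programming type systems",
    "celebrity gossip fashion news",
]
labels = ["relevant", "irrelevant", "relevant", "irrelevant"]

# Bag-of-words features plus a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(docs, labels)

# Classify a new domain by the text scraped from it.
print(model.predict(["programming language design clojure macros"]))
```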