Does TFF support deployment across different devices and clouds? - tensorflow-federated

I would like to deploy TFF in a way where I have one central (aggregation) server on a VM in the cloud and two different VMs with nodes that train the model. Is this possible with TFF? Does it have the protocols necessary to communicate over the internet, etc., or is it more of a TensorFlow with FL algorithms that can be used with different frameworks that provide the architecture?
Thank you

You can run TFF simulations on GCP: https://www.tensorflow.org/federated/gcp_setup

Related

Weight transmission protocol in Federated Machine Learning

I am wondering: in federated machine learning, when we train our local models and intend to update the cloud model, what protocol do we use to transmit those weights? Also, when we use TensorFlow Federated, how are the weights transmitted (using which library and protocol)?
Kind regards,
Most authors of federated computations using TensorFlow Federated write them in the "TFF language". The specific protocol used during communication is determined by the platform running the computation and the instructions given in the algorithm.
For computation authors, TFF supports a few different instructions to the platform, which may result in different protocols. For example, looking at summation operations of CLIENTS values to a SERVER value:
tff.federated_sum does not indicate any particular protocol.
tff.federated_secure_sum, tff.federated_secure_sum_bitwidth, and tff.federated_secure_modular_sum all use a secure protocol such that the server cannot learn the value of an individual summand, only the aggregate summation value (https://research.google/pubs/pub47246/ provides more details).
All of these can be composed with transport-layer security schemes to prevent third parties on the network from learning the transmitted values, and the details depend on the execution platform's implementation. For example, TFF's own runtime uses gRPC, which supports a few different schemes: https://grpc.io/docs/guides/auth/.
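To make the secure-summation idea concrete, here is a toy, plain-Python sketch of the pairwise-masking trick that underlies protocols like the one in the paper linked above. This is not TFF's implementation (the real protocol derives masks via key agreement and tolerates client dropout); it only illustrates why the server can recover the sum without seeing any individual value.

```python
# Toy sketch of pairwise-masking secure aggregation. NOT TFF's actual
# implementation -- just the core cancellation idea.
import random

MODULUS = 1 << 16  # summands are treated as integers mod 2**16

def mask_values(values):
    """Every client pair (i, j) shares a random mask; i adds it and j
    subtracts it, so all masks cancel in the modular sum."""
    masked = list(values)
    n = len(masked)
    for i in range(n):
        for j in range(i + 1, n):
            m = random.randrange(MODULUS)
            masked[i] = (masked[i] + m) % MODULUS
            masked[j] = (masked[j] - m) % MODULUS
    return masked

def server_aggregate(masked):
    """The server only ever sees masked values; their sum is still exact."""
    return sum(masked) % MODULUS

print(server_aggregate(mask_values([3, 7, 11])))  # 21
```

Each masked value on its own is uniformly random, but the pairwise masks cancel, so the aggregate is exact as long as the true sum stays below the modulus.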

Using scikit-learn on Databricks

Scikit-learn algorithms are single-node implementations. Does this mean that they are not an appropriate choice for building machine learning models on a Databricks cluster, since they cannot take advantage of the cluster's computing resources?
They are not appropriate, in the sense that, as you say, they cannot take advantage of the cluster computing resources, which Databricks is arguably all about. The raison d'ĂȘtre of Databricks is Apache Spark, and specifically for ML tasks, its ML library Spark MLlib.
This does not mean that you cannot use scikit-learn in Databricks (you'll find that a Databricks cluster comes with scikit-learn installed by default), only that it is usable for problems that do not actually require a cluster. If you want to exploit the cluster's resources for ML, you need to turn to Spark MLlib.
I think desertnaut hit the nail on the head here. I believe Scikit Learn algos are designed only for non-parallel processing jobs, and all the MLlib stuff is designed to leverage cluster compute resources and parallel processing resources. Take a look at the link below for sample code for standard regression and classification tasks.
https://spark.apache.org/docs/latest/ml-classification-regression.html
In addition, here are some code samples for different clustering tasks.
https://spark.apache.org/docs/latest/ml-clustering.html
That should probably cover most of the things you will be doing.
I believe that it depends on the task at hand. I see two general scenarios:
Your data is big and does not fit into memory. Go with the Spark MLlib and their distributed algos.
Your data is not that big and you want to utilize sheer computing power. The typical use case is hyperparameter search.
Databricks allows distributing such workloads from the driver node to the executors with hyperopt and its SparkTrials (random + Bayesian search).
Some docs here:
http://hyperopt.github.io/hyperopt/scaleout/spark/
However, there are many more attempts to make sklearn work on Spark. You can supposedly distribute the workloads through UDFs, using joblib, or otherwise. I am investigating the issue myself and will update the answer later.
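As a concrete reference point for the hyperparameter-search scenario, here is a minimal single-node sketch using scikit-learn's GridSearchCV. Its `n_jobs` fan-out goes through joblib and stays on one machine, which is exactly the limitation the SparkTrials approach above works around; for a cluster-wide search you would hand the same objective to hyperopt with SparkTrials instead.

```python
# Single-node grid search: scikit-learn parallelizes only across local
# cores via joblib (n_jobs) -- one machine, not a Spark cluster.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# n_jobs=-1 fans the candidate fits out over all local cores.
search = GridSearchCV(LogisticRegression(max_iter=500),
                      param_grid, cv=3, n_jobs=-1)
search.fit(X, y)
print(search.best_params_)
```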

how to choose parallel computing framework for machine learning?

How to choose a parallel computing framework for machine learning? I am a beginner. I saw there are Spark, Hadoop, OpenMP... What should I consider besides the language?
Look up Horovod from Uber. It's specifically designed for machine learning and available for several frameworks such as TensorFlow and PyTorch. It's also available as a Docker image on AWS.
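Horovod's core primitive is ring all-reduce: after the collective, every worker holds the element-wise sum of all workers' gradient vectors, with each worker only exchanging one chunk per step with its ring neighbor. A toy, single-process simulation of that pattern (plain Python, not Horovod's API, which does this over MPI/NCCL):

```python
# Toy single-process simulation of ring all-reduce, the collective at the
# heart of Horovod. Each "worker" holds a gradient vector; after the two
# phases every worker holds the element-wise sum of all vectors.
def ring_allreduce(workers):
    n = len(workers)
    length = len(workers[0])
    assert length % n == 0, "toy version: vector length must divide by n"
    chunk = length // n
    data = [list(w) for w in workers]  # copy: real workers own their memory

    def seg(i):  # index range of chunk i
        return range(i * chunk, (i + 1) * chunk)

    # Phase 1: reduce-scatter. After n-1 steps, worker r holds the fully
    # summed chunk (r + 1) % n.
    for step in range(n - 1):
        for r in range(n):
            c = (r - step) % n            # chunk worker r sends this step
            for k in seg(c):
                data[(r + 1) % n][k] += data[r][k]

    # Phase 2: all-gather. Completed chunks are forwarded around the ring
    # until every worker has every chunk.
    for step in range(n - 1):
        for r in range(n):
            c = (r + 1 - step) % n        # completed chunk worker r forwards
            for k in seg(c):
                data[(r + 1) % n][k] = data[r][k]
    return data

grads = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # 3 workers, 3 parameters each
print(ring_allreduce(grads))  # every worker ends with [12, 15, 18]
```

The appeal of the ring topology is bandwidth efficiency: each worker sends and receives only one chunk per step, so total traffic per worker is independent of the number of workers.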

How to get a specific machine type for ML Engine online prediction?

Is there an option to request a faster node for online prediction in ML Engine?
For example, when training I can configure any of these machines for my job:
standard,
large_model,
complex_model_s,
complex_model_m,
complex_model_l,
standard_gpu,
complex_model_m_gpu,
complex_model_l_gpu,
standard_p100,
complex_model_m_p100
See the description of available clusters and machines for training here and here.
I am struggling to find if it is possible to control what kind of machine runs my online prediction.
We are currently adding that capability and will let you know when it's publicly available.
ML Engine offers a 4-core instance type in addition to the default serving instance type for online prediction. However, the feature is still at the alpha stage, and it is only available to a select list of accounts that opted in as "Trusted Testers". Please contact cloudml-feedback@google.com if you need help setting up a prediction service with a faster node.
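For contrast with the training-time machine list in the question: training jobs select machines through the job's `trainingInput` configuration passed to `gcloud ml-engine jobs submit training --config config.yaml`. An illustrative fragment, assuming the legacy ML Engine field names (which may since have changed):

```
# config.yaml -- machine selection for a *training* job (illustrative)
trainingInput:
  scaleTier: CUSTOM
  masterType: complex_model_m_gpu
  workerType: standard_gpu
  workerCount: 2
  parameterServerType: large_model
  parameterServerCount: 1
```

No equivalent knob existed for online prediction at the time of the answers above.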

Platform for benchmarking of classifiers

I need a platform (Java) for testing different text classifiers against a single set of training/benchmarking data. Of course, different classifiers may come from different vendors and have different APIs, so obviously I will have to write adapters. The purpose of the platform is to manage the training data and the invocation of training/classification/benchmarking. Are you familiar with such an open-source project?
