I am very interested in federated systems, and I was trying one of the pre-trained multilingual models, such as the Multi_Lingual_Training_and_models notebook.
I was looking for tutorials using the TFF or Flower frameworks that handle CSV datasets.
Could you suggest any tutorials or GitHub repositories that would help me do that with TFF or Flower?
I have built an XGBoost classifier and a RandomForest classifier for an audio classification project. I want to deploy these models, which are saved in pickle (.pkl) format, on AWS SageMaker. From what I have observed, there aren't many resources available online. Can anyone guide me through the steps and, if possible, also provide the code? I already have the models built and am just left with deploying them on SageMaker.
By saying that you want to deploy to SageMaker, I assume you mean a SageMaker endpoint.
The answer is the SageMaker inference toolkit. It is essentially about teaching SageMaker how to load your model and run inference with it. More details here: https://github.com/aws/sagemaker-inference-toolkit and here is an example implementation: https://github.com/aws/amazon-sagemaker-examples/tree/master/advanced_functionality/multi_model_bring_your_own
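For a pickled classifier, the core of the work is a handler script. Here is a minimal, hedged sketch following the standard model_fn / input_fn / predict_fn / output_fn contract that SageMaker framework containers and the inference toolkit look for; the file name inference.py, the artifact name model.pkl, and the JSON request format are assumptions for illustration, not taken from the linked example.

```python
# inference.py -- minimal handler sketch for serving a pickled classifier on SageMaker.
# Names like "model.pkl" and the JSON request format are illustrative assumptions.
import json
import os
import pickle


def model_fn(model_dir):
    # Load the pickled classifier that was packaged into model.tar.gz.
    with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
        return pickle.load(f)


def input_fn(request_body, content_type):
    # Deserialize the request; here we assume a JSON list of feature rows.
    if content_type == "application/json":
        return json.loads(request_body)
    raise ValueError("Unsupported content type: {}".format(content_type))


def predict_fn(input_data, model):
    # Run the loaded XGBoost / RandomForest classifier on the parsed rows.
    return model.predict(input_data).tolist()


def output_fn(prediction, accept):
    # Serialize predictions for the response body.
    return json.dumps(prediction)
```

The tar.gz you upload to S3 would then contain model.pkl plus this script, and the endpoint's container calls the four functions in order for each request.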
I would like to set up a prediction task, but the data preprocessing step requires using tools outside of Python's data science ecosystem, though Python has APIs to work with those tools (e.g. a compiled Java NLP toolset). I first thought about creating a Docker container to provide an environment with those tools available, but a commenter has said that this is not currently supported. Is there perhaps some other way to make such tools available to the Python prediction class needed for AI Platform? I don't really have a clear sense of what's happening on the backend with AI Platform, or of how much ability a user has to modify or set it up.
That is not possible today. Is there a specific use case you are targeting that isn't satisfied by the current offering?
Cloud AI Platform offers multiple prediction frameworks (TensorFlow, scikit-learn, XGBoost, PyTorch, custom prediction routines) in multiple versions.
After looking into your requirements, you can use the new AI Platform feature, custom prediction routines: https://cloud.google.com/ml-engine/docs/tensorflow/custom-prediction-routine-keras
To deploy a custom prediction routine to serve predictions from your trained model, do the following:
Create a custom predictor to handle requests (a minimal sketch follows after these steps)
Package your predictor and your preprocessing module. Here you can install your custom libraries.
Upload your model artifacts and your custom code to Cloud Storage
Deploy your custom prediction routine to AI Platform
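As a hedged illustration of the first step, here is a minimal sketch of the Predictor interface that custom prediction routines expect (a class exposing from_path and predict). The artifact names model.pkl and preprocessor.pkl are assumptions; the preprocessor stands in for whatever wrapped non-Python tooling you package alongside the model.

```python
# predictor.py -- minimal sketch of an AI Platform custom prediction routine.
# The pickled artifact names are illustrative assumptions.
import os
import pickle


class MyPredictor(object):
    def __init__(self, model, preprocessor):
        self._model = model
        self._preprocessor = preprocessor

    def predict(self, instances, **kwargs):
        # Preprocess the raw instances (this is where a custom library,
        # wrapped behind a Python API, would be invoked), then run the model.
        inputs = self._preprocessor.transform(instances)
        return self._model.predict(inputs).tolist()

    @classmethod
    def from_path(cls, model_dir):
        # AI Platform calls this once at start-up with the copy of your
        # model directory downloaded from Cloud Storage.
        with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
            model = pickle.load(f)
        with open(os.path.join(model_dir, "preprocessor.pkl"), "rb") as f:
            preprocessor = pickle.load(f)
        return cls(model, preprocessor)
```

The custom libraries themselves go into the source distribution you build in the packaging step, so they get installed into the serving environment together with this class.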
I'm preparing for the Azure Machine Learning exam, and here is a question that confuses me.
You are designing an Azure Machine Learning workflow. You have a dataset that contains two million large digital photographs. You plan to detect the presence of trees in the photographs. You need to ensure that your model supports the following:
Solution: You create a Machine Learning experiment that implements the Multiclass Decision Jungle module. Does this meet the goal?
Solution: You create a Machine Learning experiment that implements the Multiclass Neural Network module. Does this meet the goal?
The answer to the first question is No while the answer to the second is Yes, but I cannot understand why the Multiclass Decision Jungle doesn't meet the goal, since it is a classifier. Can someone explain the reason to me?
I suppose that this is part of a series of questions that present the same scenario, and there are most likely some constraints in the scenario that are missing here.
Moreover, if you have a look at the Azure documentation:
However, recent research has shown that deep neural networks (DNN) with many layers can be very effective in complex tasks such as image or speech recognition. The successive layers are used to model increasing levels of semantic depth.
Thus, Azure recommends using neural networks for image classification. Remember that the goal of the exam is to test your ability to design data science solutions using Azure, so it is better to use their official documentation as a reference.
And compared to the other solutions:
You create an Azure notebook that supports the Microsoft Cognitive Toolkit.
You create a Machine Learning experiment that implements the Multiclass Decision Jungle module.
You create an endpoint to the Computer Vision API.
You create a Machine Learning experiment that implements the Multiclass Neural Network module.
There are only two Azure ML Studio modules among those options, and as the question is about constructing a workflow, I guess we can only choose between them. (CNTK is actually the best solution, as it allows constructing a deep neural network with ReLU whereas AML Studio doesn't, and an API call is not about data science at all.)
Finally, I do agree with the other contributors that the question is absurd. Hope this helps.
This question is indeed part of a series of questions that present the same scenario with multiple options. Both of the solutions approach the problem as a multi-class classification problem, which is correct. However, the key element here is dimensionality.
Your inputs (images) are highly dimensional, which requires a deep learning approach in order to be effective. A decision jungle won't be able to learn effectively in such a high-dimensional feature space, whereas a neural network has a better chance of doing so.
I hope it helps.
I want to know if there is any basic difference between how the spacy_sklearn and tensorflow_embedding pipelines operate under the hood. I mean, tensorflow_embedding must also be using the same concepts of word embeddings, reducing the dimensionality of the data using PCA, etc. Is the only difference then that spacy_sklearn has some pre-trained data to draw upon, in the form of pre-trained vectors, and the tensorflow pipeline does not? Is my understanding correct? Also, how is the tensorflow_embedding pipeline related to the TensorFlow framework offered by Google?
I tried looking up the TensorFlow framework on Google but could not find a specific answer. I also searched for it on the Rasa community page, but again found no help.
The spacy_sklearn pipeline uses pre-trained word vectors. This is useful if we don't have very much training data.
The tensorflow_embedding pipeline doesn't use any pre-trained word vectors; it fits word vectors specifically to our dataset. The advantage of the tensorflow_embedding pipeline is that the word vectors will be customised for our domain.
For more information, please refer to the link below:
https://rasa.com/docs/nlu/choosing_pipeline/
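As a hedged illustration of how you would compare the two, here is a minimal sketch using the pre-1.0 rasa_nlu Python API that the linked docs describe; the training data and config file names are assumptions.

```python
# A minimal sketch of training with either pipeline template (pre-1.0 rasa_nlu API).
# File names are illustrative assumptions.
#
# config_spacy.yml -> language: "en" / pipeline: "spacy_sklearn"
#                     (uses pre-trained spaCy word vectors; good with little data)
# config_tf.yml    -> language: "en" / pipeline: "tensorflow_embedding"
#                     (trains embeddings from scratch on your own training data)
from rasa_nlu import config
from rasa_nlu.model import Trainer
from rasa_nlu.training_data import load_data

training_data = load_data("nlu_data.md")           # your labelled NLU examples
trainer = Trainer(config.load("config_tf.yml"))    # swap in config_spacy.yml to compare
interpreter = trainer.train(training_data)

print(interpreter.parse("book a table for two"))   # inspect intent/entity output
```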
I'm working on a project where I train a text classifier, and I need to create a web app that lets the user enter text for classification. Currently all the code is written in Python and I'm using the scikit-learn library. I've encountered a problem installing scikit-learn on Heroku so that my Python code can run on the server. I don't mind changing everything (the Python language, the Flask web framework, the scikit-learn library, Heroku hosting), I just need to get this thing to work :)
Does anyone in the CV community have experience making a web app that uses a learning library online? The web-app hosting should be free, though, as this project is not commercial, and it would also be very nice to have Python behind the scenes.
N.B. The classifiers that should be supported by the library are multiclass SVM and naive Bayes.
How about trying Google App Engine? It has Python (2.5 and 2.7) and can be free.
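Whichever host you pick, the serving side can stay very small. Here is a minimal, hedged sketch of a Flask endpoint serving a pickled scikit-learn text classifier; the file name model.pkl and the /classify route are assumptions for illustration.

```python
# app.py -- minimal sketch of a Flask endpoint for a pickled scikit-learn classifier.
# "model.pkl" and the /classify route are illustrative assumptions.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load a pipeline (e.g. TfidfVectorizer + LinearSVC or MultinomialNB)
# that was trained offline and pickled.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)


@app.route("/classify", methods=["POST"])
def classify():
    # Expect a JSON body like {"text": "some user input"}.
    text = request.get_json(force=True)["text"]
    label = model.predict([text])[0]
    return jsonify({"label": str(label)})


if __name__ == "__main__":
    app.run()
```

The main caveat on any free tier is whether it lets you install scikit-learn (a compiled C extension), so check that before committing to a host.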