I'm testing a couple of IBM Watson APIs, like the following:
Does Watson get smarter and learn more about my data the more I use it?
I read that Watson gets smarter the more data it learns from and processes. I'm not sure whether this only happens behind the scenes on the IBM Watson team's side, or whether these APIs also allow an instance of Watson to get smarter about the specific application I'm developing.
If you mean that Watson is using the data you input into your instances, then no. Watson is IBM's, but your data is always yours.
By default, instances are isolated.
By smarter, they mean that each customer has their very own instances of the APIs, which they train themselves. IBM also improves the algorithms behind the scenes.
It depends on your definition of learning. Is it offline learning or online learning? Do you mean Watson learning from your corpus and then using that across the entire domain, or learning just from your own data?
It also depends on which services you use; check out Retrieve and Rank or the Natural Language Classifier for examples of services that learn from your data.
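For illustration, here is a minimal sketch of querying a Natural Language Classifier instance trained only on your own text/label pairs; it assumes the legacy watson_developer_cloud Python SDK, and the credentials and classifier ID are placeholders (newer SDK versions authenticate with an API key and wrap results in a response object):

```python
# A minimal sketch, assuming the legacy watson_developer_cloud Python SDK and a
# placeholder classifier ID; newer SDK versions authenticate with an API key
# and return a DetailedResponse (call .get_result() in that case).
from watson_developer_cloud import NaturalLanguageClassifierV1

nlc = NaturalLanguageClassifierV1(
    username='YOUR_USERNAME',   # placeholder credentials
    password='YOUR_PASSWORD')

# The classifier only knows the text/label pairs *you* trained it on;
# nothing is shared with other Watson instances.
result = nlc.classify('YOUR_CLASSIFIER_ID', 'Please reset my password')
print(result)
```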
I currently have a simple Machine Learning infrastructure running locally and I want to migrate this all onto Google Cloud. I simply fetch the data I need from a database, build my model and then test the model on test data. This is all done in PyCharm locally.
I want to migrate this so that it can all run on Google Cloud, while keeping the flexibility to make local changes that also apply when running in the cloud. There are many Google Cloud resources relating to this, so I am looking for the best practices people follow when setting up such a workflow.
Thanks and please let me know if there are any clarifications needed.
I highly suggest you take a look at this machine learning workflow in the cloud, which consists of:
Data Ingestion and Collection
Storing the data
Processing data
ML training
ML deployment
Data Ingestion and Collection
There are multiple resources you can use if you would like to ingest data into Google Cloud Platform. The simplest solutions I can recommend are Google Compute Engine or an App Engine app (for example, a form where a user fills in some data).
Nonetheless, if you would like to ingest data in real time, you can also use Cloud Pub/Sub.
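If you go that route, a minimal sketch (assuming the google-cloud-pubsub Python client and a hypothetical project and topic) of publishing records for downstream processing could look like this:

```python
# A minimal sketch, assuming the google-cloud-pubsub Python client and a
# hypothetical project and topic; a producer streams records into the topic.
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path('my-project', 'ml-ingest')  # placeholders

# Messages must be bytes; downstream consumers (e.g. Dataflow) pick them up.
future = publisher.publish(topic_path, b'{"user_id": 42, "event": "purchase"}')
print(future.result())  # message ID once the publish has succeeded
```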
Storing the data
As you mentioned, you are retrieving all the information from a database. If you are used to working with SQL or NoSQL, I highly suggest you go with Cloud SQL. Not only does it provide a good interface when building your instance, it also lets you access it securely and very quickly.
If that is not the case, you can also use Google Cloud Storage or BigQuery; of those two, I would pick BigQuery, since it can also work with streaming data.
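As a rough illustration, loading CSV files that already sit in Cloud Storage into BigQuery with the google-cloud-bigquery client could look like this (the bucket, dataset and table names are placeholders):

```python
# A minimal sketch, assuming the google-cloud-bigquery client and placeholder
# bucket, dataset and table names; loads CSV files from Cloud Storage.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,  # let BigQuery infer the schema
)

load_job = client.load_table_from_uri(
    'gs://my-bucket/raw/*.csv',          # placeholder source files
    'my-project.analytics.raw_events',   # placeholder destination table
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish
```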
Processing data
For processing data before feeding it to the model you can use either:
Cloud Dataflow: Cloud Dataflow is a fully-managed service for transforming and enriching data in stream (real time) and batch (historical) modes with equal reliability and expressiveness -- no more complex workarounds or compromises needed.
Cloud Dataproc: Dataproc is a fast, easy-to-use, fully managed cloud service for running Apache Spark and Apache Hadoop clusters in a simpler, more cost-efficient way.
Cloud Dataprep: Cloud Dataprep by Trifacta is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis, reporting, and machine learning.
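To make the Dataflow option a bit more concrete, here is a small sketch using the Apache Beam Python SDK (the SDK behind Cloud Dataflow); the GCS paths and the cleaning logic are placeholders:

```python
# A minimal sketch using the Apache Beam Python SDK (the SDK behind Cloud
# Dataflow); the GCS paths and the cleaning logic are placeholders.
import apache_beam as beam

def parse_and_clean(line):
    # Placeholder cleaning: split CSV fields and drop empty values.
    return [field.strip() for field in line.split(',') if field.strip()]

# Runs locally by default; pass DataflowRunner pipeline options to run on GCP.
with beam.Pipeline() as pipeline:
    (pipeline
     | 'Read raw rows' >> beam.io.ReadFromText('gs://my-bucket/raw/*.csv')
     | 'Clean rows' >> beam.Map(parse_and_clean)
     | 'Write prepared rows' >> beam.io.WriteToText('gs://my-bucket/prepared/part'))
```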
ML training & ML deployment
For training and deploying your ML model, I would suggest using AI Platform.
AI Platform makes it easy for machine learning developers, data scientists, and data engineers to take their ML projects from ideation to production and deployment, quickly and cost-effectively.
If you have to work with huge datasets, the best practice is to run the model as a TensorFlow job on AI Platform, so that you can use a training cluster.
Finally, for deploying your models with AI Platform, you can take a look here.
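To make the training option more concrete, here is a minimal sketch of a trainer script that could be packaged and submitted as an AI Platform training job; the toy model, the random data and the GCS path are placeholders, not a recommendation for your actual model:

```python
# A minimal sketch of a trainer script that could be packaged and submitted as
# an AI Platform training job; the toy model, data and GCS path are placeholders.
import numpy as np
import tensorflow as tf

def build_and_train():
    # Toy model standing in for your real one.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dense(1)
    ])
    model.compile(optimizer='adam', loss='mse')

    # In a real job you would read training data from GCS or BigQuery here.
    x, y = np.random.rand(256, 10), np.random.rand(256, 1)
    model.fit(x, y, epochs=3)

    # Export a SavedModel; AI Platform serving can deploy this directory.
    model.save('gs://my-bucket/models/demo/1')  # placeholder GCS path

if __name__ == '__main__':
    build_and_train()
```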
I'm preparing for the Azure Machine Learning exam, and here is a question that confuses me.
You are designing an Azure Machine Learning workflow. You have a dataset that contains two million large digital photographs. You plan to detect the presence of trees in the photographs. You need to ensure that your model supports the following:

Solution: You create a Machine Learning experiment that implements the Multiclass Decision Jungle module. Does this meet the goal?

Solution: You create a Machine Learning experiment that implements the Multiclass Neural Network module. Does this meet the goal?
The answer for the first question is No, while for the second it is Yes, but I cannot understand why the Multiclass Decision Jungle doesn't meet the goal, since it is a classifier. Can someone explain the reason to me?
I suppose that this is part of a series of questions that present the same scenario, and there are most likely some additional constraints in the scenario.
Moreover, if you have a look at the Azure documentation:
However, recent research has shown that deep neural networks (DNN) with many layers can be very effective in complex tasks such as image or speech recognition. The successive layers are used to model increasing levels of semantic depth.
Thus, Azure recommends using neural networks for image classification. Remember that the goal of the exam is to test your ability to design data science solutions using Azure, so it is better to use the official documentation as a reference.
And compared with the other solutions in the series:
You create an Azure notebook that supports the Microsoft Cognitive Toolkit.
You create a Machine Learning experiment that implements the Multiclass Decision Jungle module.
You create an endpoint to the Computer Vision API.
You create a Machine Learning experiment that implements the Multiclass Neural Network module.
There are only two Azure ML Studio modules among those options, and since the question is about constructing a workflow, I guess we can only choose between them. (CNTK is actually the best solution, as it allows constructing a deep neural network with ReLU, whereas AML Studio doesn't, and the API call is not about data science at all.)
Finally, I do agree with the other contributors that the question is absurd. Hope this helps.
This question is indeed part of a series of questions that present the same scenario with multiple options. Both of the solutions approach the problem as a multi-class classification problem, which is correct. However, the key element here is dimensionality.
Your inputs (images) are high-dimensional, which requires a deep learning approach to be effective. A decision jungle won't be able to learn effectively in such a high-dimensional feature space, whereas a neural network has a better chance of doing so.
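To make the dimensionality argument concrete, here is a small Keras sketch (not the Azure ML Studio module itself, just an illustration) of a binary tree/no-tree network; the convolutional layers learn compact features from raw pixels instead of treating every pixel as an independent column, which is roughly what a tree-based model like a decision jungle would have to do:

```python
# Illustration only -- a tiny Keras CNN for "tree present / not present",
# showing how convolutional layers reduce high-dimensional pixel input to a
# compact feature representation before the final classification layer.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation='relu', input_shape=(128, 128, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation='sigmoid')  # tree present / not present
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
```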
I hope it helps.
I am looking for solutions where I can automatically approve or disapprove different supplier invoices based on historical data.
Let's say, I got an invoice from an HP laptop supplier and based on the previous data, I have to approve or reject that invoice.
Basically, I want to make a decision or prediction based on the historical data that is already available, using artificial intelligence, machine learning, or any other cloud service.
This isn't a direct question, but you can start by looking into various classification methods. There is a huge amount of material available online. Try reading about k-Nearest Neighbors, Naive Bayes, k-means, etc. to get an idea of how algorithms in the machine learning domain work. Once you start understanding what is written in the documentation, start implementing them. You will run into a lot of problems, which you can search for online, and I'm sure you will find most of them answered on this site.
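As a starting point, here is a minimal scikit-learn sketch using k-Nearest Neighbors; the invoice features and labels below are purely hypothetical stand-ins for your historical data:

```python
# A minimal sketch, assuming your historical invoices are already encoded as
# numeric features with an approve/reject label; all values are hypothetical.
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical features: [amount, supplier_id, days_to_due, past_rejections]
X = [[1200, 7, 30, 0], [80, 3, 10, 2], [950, 7, 45, 0],
     [60, 3, 5, 3], [1500, 7, 60, 1], [40, 3, 7, 4]]
y = [1, 0, 1, 0, 1, 0]  # 1 = approve, 0 = reject

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

new_invoice = [[1100, 7, 28, 0]]   # an incoming invoice to score
print(clf.predict(new_invoice))    # predicted approve/reject label
print(clf.score(X_test, y_test))   # accuracy on the held-out invoices
```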
I'm currently designing a data warehouse in BigQuery. I'm planning to store user data like past purchases or abandoned carts.
This seems perfect for manually analyzing trends and getting insights. But what if I want to leverage machine learning, e.g. to suggest products to a group of users?
I have looked into Google ML Engine and TensorFlow, and it seems like the TensorFlow model would need to query BigQuery first. In some scenarios, this could mean that TensorFlow would need to query all or most of the data that is stored in BigQuery.
This feels a bit off, so I'm wondering if this is really how things are supposed to happen. Otherwise, I assume that my ML model would have to work with stale data?
So I would agree with you: using BigQuery as a data warehouse for your ML is expensive. It would be cheaper and much more efficient to use Google Cloud Storage to store all the data you wish to process. Once everything is processed and generated, you may then wish to push that data to BigQuery, or to another destination like Spanner or even Cloud Storage.
That being said, Google has now created a beta product, BigQuery ML. It allows users to create and execute machine learning models in BigQuery through SQL queries. I believe it uses Python and TensorFlow under the hood, and it would likely be the best solution if you have a lightweight ML workload.
Since it is still in beta as of now, I don't know how well its performance compares to Google ML Engine and TensorFlow.
Depending on what kind of model you want to train and how you want to serve the model, you can take one of the following options:
You can export your data to Google Cloud Storage as CSV and then read the files in Cloud ML Engine. This lets you use the power of TensorFlow, and you can then use Cloud ML Engine's serving system to send traffic to your model.
On the downside, this means that you have to export all of your BigQuery data to GCS, and every time you decide to make any change to the data you need to go back to BigQuery and export again. Also, if the data you want to run predictions on lives in BigQuery, you have to export that as well and send it to Cloud ML Engine through a separate system.
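For reference, that export step could be scripted with the google-cloud-bigquery client roughly like this (the table and bucket names are placeholders; the default export format is CSV):

```python
# A rough sketch of the BigQuery-to-GCS export step, assuming the
# google-cloud-bigquery client; table and bucket names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
extract_job = client.extract_table(
    'my-project.analytics.training_data',       # placeholder source table
    'gs://my-bucket/exports/training-*.csv',    # placeholder destination files
)
extract_job.result()  # wait for the export to complete
```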
If you want to explore and interactively train logistic or linear regression models on your data, you can use BigQuery ML. This allows you to slice and dice your data in BigQuery and experiment with different parts of your data and various preprocessing options, and you can use all the power of SQL. BigQuery ML also lets you use the model after training within BigQuery (you can use SQL to feed data into the model).
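A minimal sketch of that BigQuery ML option, using the google-cloud-bigquery Python client with hypothetical dataset, table and column names, might look like this:

```python
# A minimal sketch of the BigQuery ML option, using the google-cloud-bigquery
# Python client; the dataset, tables and columns are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

create_model_sql = """
CREATE OR REPLACE MODEL `my-project.shop.purchase_model`
OPTIONS(model_type='logistic_reg') AS
SELECT
  user_age,
  cart_value,
  purchased AS label
FROM `my-project.shop.user_events`
"""
client.query(create_model_sql).result()  # trains the model inside BigQuery

predict_sql = """
SELECT *
FROM ML.PREDICT(MODEL `my-project.shop.purchase_model`,
                (SELECT user_age, cart_value
                 FROM `my-project.shop.new_sessions`))
"""
for row in client.query(predict_sql).result():
    print(dict(row))
```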
In many cases, using the full power of TensorFlow (i.e. DNNs) is not necessary. This is especially true for structured data. On the other hand, most of your time will be spent on preprocessing and cleaning the data, which is much easier to do in SQL in BigQuery.
So you have two options here. Choose based on your needs.
P.S.: You can also try using the BigQuery reader in TensorFlow. I don't recommend it, as it is very slow, but if your data is not huge it may work for you.
IBM Watson has a capability where you can train classifiers on Watson using your own images, but I am unable to find a similar capability in the Google Cloud Vision API. What I want is to upload 10-15 classes of images and, on the basis of the uploaded images, classify any images loaded after that. IBM Bluemix (Watson) has this capability, but their pricing is significantly higher than Google's. I am open to other services as well, if their prices are below Google's.
As far as I know, the Google Cloud Vision API cannot be trained with your custom data. However, there is a service called vize.ai, where you can define your custom classes and upload the images; the training is free, and the prices for API usage are below Google's and IBM's.
Disclaimer: I'm a vize.it co-founder.
Edit: Link changed
You can train your own models using Cloud AutoML Vision. There are 2 different ways to do this:
Cloud-hosted models.
Edge exportable models.
With some work you can train a model for free using TensorFlow - see the model training section.
However, they have released an already trained model, so if you're lucky and what you want to classify already overlaps with their model, then no training is needed.
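As an illustration of the "already trained model" route, here is a minimal sketch using the ImageNet-trained MobileNetV2 weights bundled with TensorFlow/Keras (the image file name is a placeholder); this only helps if your classes overlap with the 1,000 ImageNet categories:

```python
# A minimal sketch of using a pre-trained model, assuming TensorFlow/Keras with
# its bundled ImageNet-trained MobileNetV2 weights; 'laptop.jpg' is a placeholder.
import numpy as np
from tensorflow.keras.applications.mobilenet_v2 import (
    MobileNetV2, preprocess_input, decode_predictions)
from tensorflow.keras.preprocessing import image

model = MobileNetV2(weights='imagenet')

img = image.load_img('laptop.jpg', target_size=(224, 224))  # placeholder image
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])  # top-3 predicted ImageNet labels
```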
Azure has started offering this now; search for "Azure Custom Vision". It is still a preview service, but with good accuracy, at least for our workload, which is images of preschool children.