How to get the percentage of the predicted labels in Azure Custom Vision? - microsoft-custom-vision

I'm using Azure Custom Vision to build an app that will tell me whether an object is defective or not. For that I've trained a model on the Custom Vision website. The training is done and the model is ready. When I upload a picture of the object, I get predicted labels with a percentage for each component in the picture.
My question is now: how can I get this percentage data so I can work with it? I need it in Visual Studio so I can write my C# code and let the AI decide whether the object is defective or not.
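The published model can be queried through the Custom Vision prediction REST endpoint, and the per-tag probabilities come back in the JSON response, so you can apply your own threshold in code. Below is a minimal sketch in Python showing the shape of the request and response (the same call can be made from C# with HttpClient); the endpoint, project ID, iteration name, prediction key and the "defect" tag name are placeholders you would replace with your own values from the portal's Prediction URL dialog.

    # Minimal sketch: call the Custom Vision prediction REST endpoint and read
    # the probability of each predicted tag. All identifiers below are
    # placeholders -- copy the real values from the "Prediction URL" dialog in
    # the Custom Vision portal.
    import requests

    ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
    PROJECT_ID = "<project-id>"                                       # placeholder
    ITERATION = "<published-iteration-name>"                          # placeholder
    PREDICTION_KEY = "<prediction-key>"                               # placeholder

    url = (f"{ENDPOINT}/customvision/v3.0/Prediction/{PROJECT_ID}"
           f"/classify/iterations/{ITERATION}/image")

    with open("part.jpg", "rb") as image_file:
        response = requests.post(
            url,
            headers={
                "Prediction-Key": PREDICTION_KEY,
                "Content-Type": "application/octet-stream",
            },
            data=image_file.read(),
        )
    response.raise_for_status()
    predictions = response.json()["predictions"]

    # Each entry has a tagName and a probability between 0 and 1 -- the same
    # percentage shown on the Custom Vision website.
    for prediction in predictions:
        print(f"{prediction['tagName']}: {prediction['probability']:.2%}")

    # Example decision logic, assuming a tag literally named "defect":
    best = max(predictions, key=lambda p: p["probability"])
    is_defective = best["tagName"] == "defect" and best["probability"] > 0.8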

Related

Creating a dataset for a machine learning project

I am working to create a video from a transcript, where the idea is to choose a series of images based on the meaning of the text. I need to create a model that can pick out images based on the text, but I am struggling with how to choose images in a meaningful way and how to format the dataset of images and text so that it can be used to train a model. Has anyone done anything similar to this?

How can objects in an image be labeled?

I have searched a lot to find out how we can test whether an object exists in an image. I am looking for the name of the scientific technique/technology that provides this. As an example, I can mention Instagram, where you upload an image and Instagram writes: this image may contain sea, people, car. Is this content-based image retrieval? Do I need local feature extraction for it? Are these systems based on deep learning, or do they work with something like SIFT?
Everything I studied could only take a query image and search a database to say which images are "similar" to it, not which images contain a given object.
Yes, this uses deep learning: a model is trained to recognize a number of objects in an image using either a bounding-box approach or multi-label classification. When a new image is passed to the model, it predicts a label for every object present in that image.
This is known as object detection.
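To make the multi-label variant concrete, here is a minimal, hypothetical Keras sketch (the tag names and data are made up): each output unit is an independent sigmoid, so a single image can be tagged "sea", "people" and "car" at the same time, unlike a softmax classifier that picks exactly one label.

    # Minimal multi-label tagging sketch with made-up tags and random data.
    import numpy as np
    import tensorflow as tf

    TAGS = ["sea", "people", "car"]  # illustrative tag set

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(128, 128, 3)),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        # One independent score per tag (sigmoid), not a softmax over tags.
        tf.keras.layers.Dense(len(TAGS), activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

    # Dummy batch: 8 random images, each with an independent 0/1 label per tag.
    images = np.random.rand(8, 128, 128, 3).astype("float32")
    labels = np.random.randint(0, 2, size=(8, len(TAGS))).astype("float32")
    model.fit(images, labels, epochs=1, verbose=0)

    # At prediction time, report every tag whose score passes a threshold.
    scores = model.predict(images[:1], verbose=0)[0]
    print([tag for tag, score in zip(TAGS, scores) if score > 0.5])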

How to improve VNDetectRectanglesRequest to VNDetectCarRequest?

I use VNImageRequestHandler and VNDetectRectanglesRequest to handle a request that finds rectangles in an image. But Vision in iOS 11 only provides barcode, rectangle, and face detection, and I want to find cars in an image. How should I change the code to find a specific kind of object in an image?
If you’re looking for Apple to create an API named VNDetectCarRequest you should probably file a feature request. (And if it happens, I’m sure the “Apple is making a car!” rumor mill will start up again...)
For general-purpose image recognition, the path to take with Vision is to use VNCoreMLRequest and supply a machine learning model trained for the image recognition task you have in mind.
On the native programming side, all image recognition/classification tasks are the same — you can start by reusing Apple’s Classifying Images with Vision and Core ML sample code, which sets up VNCoreMLRequest and handles the VNClassificationObservation results it produces. The special sauce that changes a general “what is this” classifier into a “hotdog or not a hotdog” classifier or a “what kind of vehicle is this (if it’s one at all)” classifier is all in the model.
There might be a machine learning model that already does the task you’re looking for out there — if you find one, you can wrap it in a Core ML Model file using the scripts Apple provides.
Otherwise, you’ll need to look at one of the general purpose image classifier models out there (again, there are several already conveniently gathered on developer.apple.com) and work on specializing / retraining it to your more specific task. That part of your work is outside Apple’s API ecosystem, and there are many possible options. Web searches for “train caffe image model” or “train keras image model” or similar should be helpful there.
Once you’ve trained your model, use the Core ML tools to get it into Core ML to use with Vision.
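As a rough illustration of that last step, here is a sketch using coremltools, assuming a trained Keras image classifier (a randomly initialized MobileNetV2 stands in for it here); the exact converter API depends on your coremltools version, so check its documentation.

    # Sketch: convert a Keras classifier to a .mlmodel that VNCoreMLRequest
    # can load. Assumes coremltools 4+ (the unified ct.convert API); older
    # releases used per-framework converters instead.
    import coremltools as ct
    import tensorflow as tf

    # Stand-in for your own trained classifier (weights=None keeps it offline).
    keras_model = tf.keras.applications.MobileNetV2(weights=None)

    mlmodel = ct.convert(
        keras_model,
        inputs=[ct.ImageType(shape=(1, 224, 224, 3))],  # let Vision feed images directly
        convert_to="neuralnetwork",  # produces a classic .mlmodel file
    )
    mlmodel.save("MyClassifier.mlmodel")  # drag this into your Xcode project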

Specialization of an image classifying model with user image tagging

I have a conceptual question regarding a software process/architecture setup for machine learning. I have a web app, and I am trying to incorporate some machine learning algorithms that work like Facebook's face recognition (except with objects in general), so that the model gets better at classifying the specific images uploaded to my service (similar to how Facebook can classify specific persons, etc.).
The rough outline is:
event: User uploads image; image attempts to be classified
if failure: draw a bounding box on object in image; return image
interaction: user tags object in box; send image back to server with tag
????: somehow this new image/label pair will fine-tune the image classifier
I need help with the last step. Typically, in transfer learning or training in general, a programmer has a large database full of images. In my case, I have a pretrained model (Google's Inception-v3), but my fine-tuning database is non-existent until users start uploading content.
So how could I use that tagging method to build a specialized database? I'm sure Facebook ran into this problem and solved it, but I can't find their solution. After some thought (and inconclusive research), the only strategies I can think of are to either:
A) stockpile tagged images and do a big batch train
B) somehow incrementally input a few tagged images as they get uploaded, and slowly over days/weeks, specialize the image classifier.
Ideally, I would like to avoid option A, but I'm not sure how realistic B is, nor whether there are other ways to accomplish this task. Thanks!
Yes, this sounds like a classic example of online learning.
For deep conv nets in particular, given some new data, one can just run a few iterations of stochastic gradient descent on it, for example. It is probably a good idea to adjust the learning rate if needed as well (so that one can adjust the importance of a given sample, depending on, say, one's confidence in it).
You could also, as you mentioned, save up "mini-batches" with which to do this (depends on your setup).
Also, if you want to allow a little more specialization with your learner (e.g. between users), look up domain adaptation.
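As a rough sketch of option B under those suggestions, and assuming the Inception-v3 base mentioned in the question: freeze the pretrained base, keep a small trainable head, and run a few low-learning-rate SGD steps whenever a handful of user-tagged images arrives. Everything here (class count, shapes, the confidence weighting) is illustrative.

    # Incremental fine-tuning (online learning) sketch; names are illustrative.
    import numpy as np
    import tensorflow as tf

    NUM_CLASSES = 10  # assumed size of the tag vocabulary

    # weights=None keeps the sketch runnable offline; use weights="imagenet"
    # (or your own pretrained weights) in practice.
    base = tf.keras.applications.InceptionV3(
        include_top=False, weights=None, pooling="avg", input_shape=(299, 299, 3))
    base.trainable = False  # keep pretrained features fixed; only the head adapts

    head = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")
    model = tf.keras.Sequential([base, head])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3),
                  loss="sparse_categorical_crossentropy")

    def on_user_tagged(image_batch, label_batch, confidence=1.0):
        """Run a few SGD steps on a freshly tagged mini-batch.

        `confidence` scales each sample's weight, mirroring the idea of
        down-weighting tags you trust less.
        """
        weights = np.full(len(label_batch), confidence, dtype="float32")
        for _ in range(3):  # a few iterations, as the answer suggests
            model.train_on_batch(image_batch, label_batch, sample_weight=weights)

    # Dummy usage: two user-tagged images arrive with class labels 3 and 7.
    images = np.random.rand(2, 299, 299, 3).astype("float32")
    labels = np.array([3, 7])
    on_user_tagged(images, labels, confidence=0.8)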

Is it possible to do object detection (one-class) in images by retraining the Inception model?

Is there a way to do object detection by retraining the Inception model provided by Google in TensorFlow? The goal is to predict whether an image contains a defined category of objects (e.g. balls) or not. I can think of it as one-class classification, or as multi-class classification with only two categories (ball and not-ball images). However, in the latter case I think it's very difficult to create a good training set (how many not-ball images do I need, and of which kind?).
Yes, there is a way to tell if something is a ball. However, it is better to use Google's TensorFlow Object Detection API. Instead of saying "ball/no ball," it will tell you it thinks something is a ball with XX% confidence.
To answer your other questions: with object detection, you don't need non-ball images for training. You should gather about 400-500 ball images (more is almost always better), split them into a training and an eval group, and label them with this. Then you should convert your labels and images into a .record file according to this. After that, you should set up TensorFlow and train.
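For a concrete picture of the .record step, here is a minimal sketch that writes one tf.train.Example per image using the feature keys described in the Object Detection API's dataset documentation; the file name, image bytes and box coordinates below are made up, and in practice you would read them from your labeling tool's output.

    # Sketch: build one tf.train.Example per labeled image and write a .record
    # file. Values are made up; real ones come from your labeling tool.
    import tensorflow as tf

    def bytes_feature(value):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

    def int64_feature(value):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

    def int64_list(values):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

    def float_list(values):
        return tf.train.Feature(float_list=tf.train.FloatList(value=values))

    def make_example(jpeg_bytes, width, height, boxes, class_names, class_ids):
        # boxes are (xmin, ymin, xmax, ymax) in pixels; the API expects them
        # normalized to [0, 1].
        xmins = [b[0] / width for b in boxes]
        ymins = [b[1] / height for b in boxes]
        xmaxs = [b[2] / width for b in boxes]
        ymaxs = [b[3] / height for b in boxes]
        return tf.train.Example(features=tf.train.Features(feature={
            "image/height": int64_feature(height),
            "image/width": int64_feature(width),
            "image/filename": bytes_feature(b"ball_001.jpg"),
            "image/source_id": bytes_feature(b"ball_001.jpg"),
            "image/encoded": bytes_feature(jpeg_bytes),
            "image/format": bytes_feature(b"jpeg"),
            "image/object/bbox/xmin": float_list(xmins),
            "image/object/bbox/ymin": float_list(ymins),
            "image/object/bbox/xmax": float_list(xmaxs),
            "image/object/bbox/ymax": float_list(ymaxs),
            "image/object/class/text": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=class_names)),
            "image/object/class/label": int64_list(class_ids),
        }))

    # Dummy usage: one image containing a single "ball" with made-up geometry.
    example = make_example(jpeg_bytes=b"<jpeg bytes here>", width=640, height=480,
                           boxes=[(100, 120, 220, 240)],
                           class_names=[b"ball"], class_ids=[1])
    with tf.io.TFRecordWriter("train.record") as writer:
        writer.write(example.SerializeToString())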
This entire process is not easy. It took me a good couple of weeks with an iOS background to successfully train a single object detector. But it is worth it in the end, because now I can rapidly switch out images to train a different object detector whenever an app needs it.
Bonus: use this to convert your new TF model into a .mlmodel usable by iOS/Android.
