How can objects in an image be labeled? - image-processing

I have searched a lot to find out how we can test whether an object exists in an image. I am looking for the name of the scientific technique/technology that provides this. As an example I can mention Instagram, where you upload an image and Instagram writes: this image may contain sea, people, car. Is this content-based image retrieval? Do I need local feature extraction for it? Are these systems based on deep learning, or do they work with something like SIFT?
Whatever I studied was only able to receive a query image and search a database to say which images are "similar" to it, not what objects an image contains.

Yes, this uses deep learning: a model is trained to recognize a number of objects in an image, using either a bounding-box approach or multi-label classification. When a new image is passed to the model, it predicts the labels of all the objects present in that image.

This is known as object detection.
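To make that concrete, here is a minimal sketch of the idea using a detector pretrained on COCO from torchvision. This is not the system Instagram actually uses; the file name, the confidence threshold, and the abbreviated label map are only illustrative.

    import torch
    import torchvision
    from torchvision.transforms.functional import to_tensor
    from PIL import Image

    # Load a detector pretrained on COCO (80 everyday object classes).
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    model.eval()

    # Abbreviated COCO label map for illustration only.
    COCO_LABELS = {1: "person", 3: "car", 8: "truck", 17: "cat", 18: "dog"}

    image = Image.open("photo.jpg").convert("RGB")
    with torch.no_grad():
        predictions = model([to_tensor(image)])[0]  # dict with boxes, labels, scores

    # Keep confident detections and report the distinct object names,
    # similar to the "this image may contain ..." caption in the question.
    found = {
        COCO_LABELS.get(int(label), "class_%d" % int(label))
        for label, score in zip(predictions["labels"], predictions["scores"])
        if score > 0.7
    }
    print("This image may contain:", ", ".join(sorted(found)))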

Related

Creating a dataset for a machine learning project

I am working on creating a video from a transcript, where the idea is to choose a series of images based on the meaning of the text. I need to create a model that can pick out images based on the text, but I am struggling with how to choose images in a meaningful way and how to format a dataset of images and text so that it can be used to train a model. Has anyone done anything similar to this?

How to recognize or match two images?

I have one image stored in my bundle or in the application.
Now I want to scan images with the camera and compare them with my locally stored image. When an image is matched I want to play a video, and if the user moves the camera away from that particular image I want to stop the video.
For that I have tried the Wikitude SDK for iOS, but it is not working properly; it keeps crashing because of memory issues or other reasons.
Other things that came to mind are Core ML and ARKit, but Core ML detects an image's properties like name, type, colors, etc., whereas I want to match the image itself. ARKit does not support all devices and iOS versions, and I don't know whether image matching as per my requirement is even possible with it.
If anybody has an idea of how to achieve this requirement, please share. Every bit of help will be appreciated. Thanks :)
The easiest way is ARKit's image detection. You know the limitations of the devices it supports, but the results it gives are broad and it is really easy to implement. Here is an example.
Next is Core ML, which is the hardest way. You need to understand machine learning, at least briefly. Then comes the tough part: training with your dataset. The biggest drawback is that you only have a single image, so I would discard this method.
Finally, a middle-ground solution is to use OpenCV. It might be hard, but it suits your need. You can use different feature-matching methods to find your image in the camera feed; example here. You can use Objective-C++ to write the C++ code for iOS.
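Roughly, the OpenCV route looks like the sketch below, shown in Python for brevity (on iOS the same cv:: calls would live in Objective-C++). The file names, the distance cutoff, and the minimum match count are placeholder values you would tune.

    import cv2

    # Reference image bundled with the app, and one frame grabbed from the camera.
    reference = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)
    frame = cv2.imread("camera_frame.png", cv2.IMREAD_GRAYSCALE)

    # ORB is a free alternative to SIFT/SURF for keypoint detection and description.
    orb = cv2.ORB_create(nfeatures=1000)
    kp_ref, des_ref = orb.detectAndCompute(reference, None)
    kp_frame, des_frame = orb.detectAndCompute(frame, None)

    # Brute-force matcher with Hamming distance (appropriate for ORB's binary descriptors).
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_ref, des_frame)

    # Keep only reasonably close matches; both thresholds below are tuning knobs.
    good = [m for m in matches if m.distance < 40]
    if len(good) > 25:
        print("Reference image found in frame -> start the video")
    else:
        print("Reference image not visible -> stop the video")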
Your task is image similarity, and you can do it simply and with more reliable results using machine learning. Since your task involves camera scanning, the better option is Core ML. You can refer to this link by Apple for image similarity. You can improve your results by training with your own datasets. If any more clarification is needed, comment.
Another approach is to use a so-called "siamese network", which really means that you run a model such as Inception-v3 or MobileNet on both images and compare their outputs.
However, these models usually give a classification output, i.e. "this is a cat". But if you remove that classification layer from the model, it gives an output that is just a bunch of numbers that describe what sort of things are in the image but in a very abstract sense.
If these numbers for two images are very similar -- if the "distance" between them is very small -- then the two images are very similar too.
So you can take an existing Core ML model, remove the classification layer, run it twice (once on each image), which gives you two sets of numbers, and then compute the distance between these numbers. If this distance is lower than some kind of threshold, then the images are similar enough.
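The same idea sketched in Python, with a Keras feature extractor standing in for the headless Core ML model; the file names and the 0.15 threshold are arbitrary placeholders.

    import numpy as np
    from tensorflow.keras.applications import MobileNetV2
    from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
    from tensorflow.keras.preprocessing import image

    # MobileNetV2 without its classification layer; global average pooling
    # turns the last feature map into a single 1280-dimensional vector.
    extractor = MobileNetV2(weights="imagenet", include_top=False, pooling="avg")

    def embed(path):
        img = image.load_img(path, target_size=(224, 224))
        x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
        return extractor.predict(x)[0]

    a = embed("stored_image.jpg")
    b = embed("camera_frame.jpg")

    # Cosine distance between the two embeddings: a small distance means similar images.
    cosine_distance = 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    print("distance:", cosine_distance, "match" if cosine_distance < 0.15 else "no match")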

Recognize specific images, not the objects in the images

I need to recognize specific images using the iPhone camera. My goal is to have a set of 20 images such that, when a print or other display of one of them is present in front of the camera, the app recognizes that image.
I thought about using classifiers (CoreML), but I don't think that would give the intended result. For example, if I had a model that recognizes fruits and then showed it two different pictures of a banana, it would recognize them both as bananas, which is not what I want. I want my app to recognize specific images, regardless of their content.
The behavior I want is exactly what ARToolKit does (https://www.artoolkit.org/documentation/doku.php?id=3_Marker_Training:marker_nft_training), but I do not wish to use this library.
So my question is: are there any other libraries, or other ways, for me to recognize specific images from the camera on iOS (preferably in Swift)?
Since you are using images specific to your use case there isn't going to be an existing model that you can use. You'd have to create a model, train it, and then import it into CoreML. It's hard to provide specific advice since I know nothing about your images.
As far as libraries are concerned, check out this list and Swift-AI.
Swift-AI has a neural network that you might be able to train if you had enough images.
Most likely you will have to create the model in another language, such as Python, and then import it into your Xcode project.
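For that last step, Apple's coremltools is the usual bridge. A rough sketch, assuming a Keras model saved as model.h5 and a 224x224 RGB input (both are placeholders):

    import coremltools as ct
    import tensorflow as tf

    # Load the Keras model trained elsewhere (the path is a placeholder).
    keras_model = tf.keras.models.load_model("model.h5")

    # Convert to Core ML; declaring the input as an image lets Vision/Core ML
    # feed camera frames in directly on the iOS side.
    mlmodel = ct.convert(
        keras_model,
        inputs=[ct.ImageType(shape=(1, 224, 224, 3))],
    )
    mlmodel.save("ImageClassifier.mlmodel")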
Take a look at this question.
This blog post goes into some detail about how to train your own model for CoreML.
Keras is probably your best bet to build your model. Take a look at this tutorial.
There are other problems too, though, such as the fact that you only have 20 images. This is certainly not enough to train an accurate model. Also, the user can present modified versions of these images. You'd have to generate realistic samples of each possible image and then use that entire set to train the model. I'd say you need a minimum of 20 samples of each image (400 total).
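One common way to stretch 20 reference images into a few hundred training samples is synthetic augmentation. A sketch with Keras's ImageDataGenerator, where the directory names and augmentation ranges are only illustrative:

    import os
    import numpy as np
    from tensorflow.keras.preprocessing.image import (
        ImageDataGenerator, load_img, img_to_array, array_to_img)

    # Random rotations, shifts, zooms and brightness changes approximate the
    # "modified versions" a user might hold up to the camera.
    augmenter = ImageDataGenerator(
        rotation_range=25,
        width_shift_range=0.1,
        height_shift_range=0.1,
        zoom_range=0.2,
        brightness_range=(0.6, 1.4),
        fill_mode="nearest",
    )

    os.makedirs("augmented", exist_ok=True)
    for name in os.listdir("originals"):          # the 20 reference images
        x = img_to_array(load_img(os.path.join("originals", name)))
        x = np.expand_dims(x, axis=0)
        # Generate roughly 20 perturbed copies of each original.
        for i, batch in enumerate(augmenter.flow(x, batch_size=1)):
            array_to_img(batch[0]).save("augmented/%d_%s" % (i, name))
            if i >= 19:
                break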
You'll want to pre-process the image and extract features that you can compare to the known features of your images. This is how facial recognition works. Here is a guide for facial recognition that might be able to help you with feature extraction.
Simply put, without a model that is based on your images, you can't do much.
Answering my own question.
I ended up following this awesome tutorial that uses OpenCV to recognize specific images, and teaches how to make a wrapper so this code can be accessed by Swift.

Representing the image data for recognition

So I am working on a project for school, and what we are trying to do is teach a neural network to distinguish buildings from non-buildings. The problem I am having right now is representing the data in a form that would be "readable" by the classifier function.
The training data is a bunch of pictures plus a .wkt file with the coordinates of the buildings in each picture. So far we have been able to rescale the polygons, but we got stuck there.
Can you give any hints or ideas of how to bring this all to an appropriate form?
Edit: I do not need the code written for me; a link to an article on a similar subject, or a book, is more the kind of thing I am looking for.
You did not mention what framework you are using, but I will give an answer for caffe.
Your problem is very close to detecting objects within an image. You have full images with object (building in your case) bounding boxes.
The easiest way of doing this is through a Python data layer, which reads an image and a file with the stored coordinates for that image and feeds them into your network. A tutorial on how to use it can be found here: https://github.com/NVIDIA/DIGITS/tree/master/examples/python-layer
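A skeleton of such a Python data layer is sketched below. The setup/reshape/forward/backward interface is what Caffe expects from a Python layer, while the list-file format, the blob shapes, and the rasterize_wkt helper are assumptions for illustration:

    import caffe
    import cv2
    import numpy as np

    class BuildingDataLayer(caffe.Layer):
        """Feeds (image, building-label) pairs into the network."""

        def setup(self, bottom, top):
            # param_str is set in the prototxt, e.g. the path to a text file
            # listing "image.jpg coords.wkt" pairs (this format is an assumption).
            self.samples = [line.split() for line in open(self.param_str)]
            self.idx = 0

        def reshape(self, bottom, top):
            img_path, wkt_path = self.samples[self.idx]
            img = cv2.imread(img_path).astype(np.float32)
            self.data = img.transpose(2, 0, 1)                    # HWC -> CHW for Caffe
            self.label = rasterize_wkt(wkt_path, img.shape[:2])   # hypothetical helper
            top[0].reshape(1, *self.data.shape)
            top[1].reshape(1, 1, *self.label.shape)

        def forward(self, bottom, top):
            top[0].data[0, ...] = self.data
            top[1].data[0, 0, ...] = self.label
            self.idx = (self.idx + 1) % len(self.samples)

        def backward(self, top, propagate_down, bottom):
            pass  # a data layer has nothing to back-propagate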
To accelerate the process, you may want to store the image/coordinate pairs in a custom LMDB database.
Finally a good working example with complete caffe implementation can be found within Faster-RCNN library here: https://github.com/rbgirshick/caffe-fast-rcnn/
You should check roi_pooling_layer.cpp in their custom Caffe branch, and roi_data_layer, to see how the data is fed into the network.

Specialization of an image-classifying model with user image tagging

I have a conceptual question regarding a software process/architecture setup for machine learning. I have a web app and I am trying to incorporate some machine learning algorithms that work like Facebook's face recognition (except with objects in general), so that the model gets better at classifying the specific images uploaded to my service (like how FB can classify specific persons, etc.).
The rough outline is:
event: User uploads image; image attempts to be classified
if failure: draw a bounding box on object in image; return image
interaction: user tags object in box; send image back to server with tag
????: somehow this new image/label pair will fine tune the image classifier
I need help with the last step. Typically in transfer learning or training in general, a programmer has a large database full of images. In my case, I have a pretrained model (google's inception-v3) but my fine-tuning database is non-existent until a user starts uploading content.
So how could I use that tagging method to build a specialized database? I'm sure FB ran into this problem and solved it, but I can't find their solution. After some thought (and inconclusive research), the only strategies I can think of are to either:
A) stockpile tagged images and do a big batch train
B) somehow incrementally input a few tagged images as they get uploaded, and slowly, over days/weeks, specialize the image classifier.
Ideally, I would like to avoid option A, but I'm not sure how realistic B is, nor whether there are other ways to accomplish this task. Thanks!
Yes, this sounds like a classic example of online learning.
For deep conv nets in particular, given some new data, one can just run a few iterations of stochastic gradient descent on it, for example. It is probably a good idea to adjust the learning rate if needed as well (so that one can adjust the importance of a given sample, depending on, say, one's confidence in it).
You could also, as you mentioned, save up "mini-batches" with which to do this (depends on your setup).
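As a sketch of what option B could look like with a Keras Inception-v3, each newly tagged image triggers a few low-learning-rate SGD steps on just that sample. The class count, the learning rate, and the number of steps are assumptions, not a tested recipe:

    import numpy as np
    from tensorflow.keras import layers, models, optimizers
    from tensorflow.keras.applications import InceptionV3
    from tensorflow.keras.applications.inception_v3 import preprocess_input
    from tensorflow.keras.preprocessing import image

    NUM_CLASSES = 50  # number of tags your service knows about (placeholder)

    # Pretrained Inception-v3 backbone with a fresh classification head.
    base = InceptionV3(weights="imagenet", include_top=False, pooling="avg")
    base.trainable = False                      # only the new head adapts online
    head = layers.Dense(NUM_CLASSES, activation="softmax")(base.output)
    model = models.Model(base.input, head)

    # A small learning rate keeps a single user-tagged sample from dominating.
    model.compile(optimizer=optimizers.SGD(learning_rate=1e-4),
                  loss="categorical_crossentropy")

    def learn_from_tag(img_path, tag_index, steps=3):
        """Run a few SGD iterations on one newly tagged image."""
        img = image.load_img(img_path, target_size=(299, 299))
        x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
        y = np.zeros((1, NUM_CLASSES))
        y[0, tag_index] = 1.0
        for _ in range(steps):
            model.train_on_batch(x, y)

    # Called whenever a user tags the object in the returned bounding box.
    learn_from_tag("user_upload_123.jpg", tag_index=7)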
Also, if you want to allow a little more specialization with your learner (e.g. between users), look up domain adaptation.