Recognize specific images, not the objects in the images - iOS

I need to recognize specific images using the iPhone camera. My goal is to have a set of 20 images such that, when a print or other display of one of them is in front of the camera, the app recognizes that image.
I thought about using classifiers (CoreML), but I don't think that would give the intended result. For example, if I had a model that recognizes fruits and I showed it two different pictures of a banana, it would recognize them both as bananas, which is not what I want. I want my app to recognize specific images, regardless of their content.
The behavior I want is exactly what ARToolKit does (https://www.artoolkit.org/documentation/doku.php?id=3_Marker_Training:marker_nft_training), but I do not wish to use this library.
So my question is: are there any other libraries, or other ways, for me to recognize specific images from the camera on iOS (preferably in Swift)?

Since you are using images specific to your use case, there isn't going to be an existing model that you can use. You'd have to create a model, train it, and then import it into CoreML. It's hard to provide specific advice since I know nothing about your images.
As far as libraries are concerned, check out this list and Swift-AI.
Swift-AI has a neural network that you might be able to train if you had enough images.
Most likely you will have to create the model in another language, such as Python, and then import it into your Xcode project.
Take a look at this question.
This blog post goes into some detail about how to train your own model for CoreML.
Keras is probably your best bet to build your model. Take a look at this tutorial.
There are other problems too, though. You only have 20 images, which is certainly not enough to train an accurate model. The user can also present modified versions of these images, so you'd have to generate realistic samples of each possible image and then use that entire set to train the model. I'd say you need a minimum of 20 samples of each image (400 total).
You'll want to pre-process the image and extract features that you can compare to the known features of your images. This is how facial recognition works. Here is a guide for facial recognition that might be able to help you with feature extraction.
Simply put, without a model that is based on your images, you can't do much.
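As a rough illustration of the training-plus-augmentation step described above, here is a minimal Keras sketch. Everything concrete in it is an assumption for the example: the data/<image_id>/ folder layout (one folder per reference image), the class count, and the hyperparameters. The idea is the one from the answer: augment each reference image into many realistic variants, reuse a pretrained backbone, and train only a small classification head that you could later convert with coremltools.

```python
# Sketch: one class per reference image, heavy augmentation to simulate
# prints seen under different angles and lighting. All paths are placeholders.
from keras.applications.mobilenet import MobileNet, preprocess_input
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.preprocessing.image import ImageDataGenerator

NUM_CLASSES = 20  # one class per reference image

datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    rotation_range=25,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.3,
    brightness_range=(0.5, 1.5),
    validation_split=0.2,
)
train = datagen.flow_from_directory("data", target_size=(224, 224),
                                    subset="training")
val = datagen.flow_from_directory("data", target_size=(224, 224),
                                  subset="validation")

# Freeze the pretrained backbone; only the new head is trained.
base = MobileNet(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False
x = GlobalAveragePooling2D()(base.output)
out = Dense(NUM_CLASSES, activation="softmax")(x)

model = Model(base.input, out)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train, validation_data=val, epochs=10)
model.save("image_matcher.h5")  # convert to .mlmodel with coremltools
```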

Answering my own question.
I ended up following this awesome tutorial that uses OpenCV to recognize specific images, and teaches how to make a wrapper so this code can be accessed by Swift.

Related

How to recognize or match two images?

I have one image stored in my bundle or in the application.
Now I want to scan images with the camera and compare them with my locally stored image. When the image is matched I want to play a video, and if the user moves the camera away from that particular image I want to stop that video.
For that I have tried the Wikitude SDK for iOS, but it is not working properly; it keeps crashing because of memory issues or some other reason.
Core ML and ARKit also came to mind, but Core ML detects an image's properties (name, type, colors, etc.) while I want to match the image itself. ARKit does not support all devices and iOS versions, and I have no idea whether the image matching I need is even possible with it.
If anybody has an idea of how to achieve this, please share. Every bit of help will be appreciated. Thanks :)
The easiest way is ARKit's image detection. You know the limitations of the devices that support it, but the results it gives are solid and it is really easy to implement. Here is an example.
Next is CoreML, which is the hardest way. You need to understand machine learning, at least briefly. Then comes the tough part: training with your dataset. The biggest drawback is that you have a single image, so I would discard this method.
Finally, a middle-ground solution is OpenCV. It might be hard, but it suits your need. You can find different methods of feature matching to find your image in the camera feed (example here). You can use Objective-C++ to write the C++ code for iOS.
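To make the OpenCV route concrete, here is a minimal feature-matching sketch. It is shown in Python for readability; the same calls exist in the C++ API you would wrap with Objective-C++ on iOS. ORB is used as a patent-free alternative to SURF, and the file names and the match threshold are placeholders, not tuned values.

```python
import cv2

# Placeholders: the stored reference image and one camera frame.
reference = cv2.imread("reference.jpg", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("camera_frame.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(reference, None)
kp2, des2 = orb.detectAndCompute(frame, None)

# Hamming distance suits ORB's binary descriptors; Lowe's ratio test
# throws away ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# The threshold below is empirical; tune it against your own images.
if len(good) > 25:
    print("reference image found in the frame")
```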
Your task is image similarity, and you can do it simply and with more reliable results using machine learning. Since your task involves camera scanning, the better option is CoreML. You can refer to this link by Apple for Image Similarity. You can optimize your results by training with your own datasets. If you need any more clarification, comment.
Another approach is to use a so-called "siamese network", which really means that you take a model such as Inception-v3 or MobileNet, run it on both images, and compare their outputs.
However, these models usually give a classification output, i.e. "this is a cat". But if you remove that classification layer from the model, it gives an output that is just a bunch of numbers that describe what sort of things are in the image, but in a very abstract sense.
If these numbers for two images are very similar -- if the "distance" between them is very small -- then the two images are very similar too.
So you can take an existing Core ML model, remove the classification layer, run it twice (once on each image), which gives you two sets of numbers, and then compute the distance between these numbers. If this distance is lower than some kind of threshold, then the images are similar enough.
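A minimal sketch of this idea, using an off-the-shelf Keras MobileNet: pooling="avg" effectively removes the classification layer and yields a plain feature vector per image. The file names and the 0.3 threshold are invented for the example; you would calibrate the threshold on your own image pairs.

```python
import numpy as np
from keras.applications.mobilenet import MobileNet, preprocess_input
from keras.preprocessing import image

# include_top=False drops the classifier; pooling="avg" gives one vector.
model = MobileNet(weights="imagenet", include_top=False, pooling="avg")

def embed(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]

a = embed("image_a.jpg")
b = embed("image_b.jpg")

# Cosine distance between the two embeddings: small means similar images.
distance = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print("similar" if distance < 0.3 else "different")  # threshold is made up
```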

How to do segmentation based on some filters (e.g. traffic signals) from live streaming data

I am supposed to do traffic symbol recognition on live streaming data. Please tell me how to automate the segmentation step. I am able to recognize the symbols from segmented data using neural networks, but I am stuck on the segmentation part.
I have tried YOLO, but I think I am missing something.
I have also tried OpenCV.
Please help.
[Images omitted: an input frame from the live stream and the expected output.]
I would suggest you follow this link:
https://github.com/AlexeyAB/darknet/tree/47c7af1cea5bbdedf1184963355e6418cb8b1b4f#how-to-train-pascal-voc-data
It's very simple to follow. You basically need to do two steps: install darknet, and create the data you want to use (road signs in your case).
So follow the installation guide, then find an existing dataset of road signs or create your own. You will need the annotation files as well (you can easily generate them yourself if you use your own dataset; this is explained in the link too). You don't need a huge number of pictures, because darknet augments the images automatically (just resizing, though). If you start from a pretrained version you should get "ok" results pretty fast, after roughly 500 iterations.
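For reference, darknet expects one .txt annotation file per image, where each line holds a class id and a bounding box normalized to the image size. Here is a tiny hypothetical helper for generating those lines from pixel coordinates (the class id and the box below are invented for the example):

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel-space (x_min, y_min, x_max, y_max) box into the
    normalized 'class x_center y_center width height' line darknet expects."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# e.g. a road sign at pixels (120, 40)-(220, 140) in a 640x480 frame:
print(to_yolo_line(0, (120, 40, 220, 140), 640, 480))
# -> "0 0.265625 0.187500 0.156250 0.208333"
```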

Recognize "generic" objects

I'm working on a project for visually impaired people that converts the visual world to audio.
We prefer to create a prototype that doesn't need an internet connection. So we chose to work with OpenCV. After reading (a lot of) tutorials and documentation we were able to train OpenCV in recognizing specific objects.
For example: we trained OpenCV to recognize a certain chair and a door. That works fine.
But, we also tried to train OpenCV on a "generic" level. It should be possible to recognize (almost) all chairs. We did that by training OpenCV with a lot of positive and negative images as explained here: http://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
The actual result wasn't what we expected: the classifier could not recognize any chair. I know there are a lot of different parameters to take into account (maybe we did something wrong with those) and we experimented a lot, but our time (and unfortunately our knowledge of OpenCV) is limited.
We are looking for some advice on how to train OpenCV to recognize generic objects.
Where do we start?
Is OpenCV even suited to do that?
Thank you for your time!
OpenCV is the library to use, but object recognition is tricky. Often when people say they are doing "object recognition" they are not; they are processing one image, or at best a series of related images, to separate it into object and background.
To recognise a "chair" (everything from an armchair to a dining chair to a throne) would be almost impossible. I'd want at least stereo images to have a chance of detecting flat surfaces. I don't doubt that with a lot of work you can get quite a good result, maybe just recognising dining-style chairs, but it's skilled work; it's not just a case of feeding a few parameters to a hierarchical classifier.
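For completeness, once you do have a trained cascade XML from the tutorial linked in the question, running it is the easy part; most of the skilled work goes into the training data and into tuning scaleFactor and minNeighbors. A minimal sketch (the cascade file and image name are placeholders):

```python
import cv2

# Placeholders: a cascade trained with opencv_traincascade and a test photo.
cascade = cv2.CascadeClassifier("chair_cascade.xml")
gray = cv2.cvtColor(cv2.imread("room.jpg"), cv2.COLOR_BGR2GRAY)

# Lower scaleFactor scans more scales (slower, more hits); higher
# minNeighbors suppresses false positives but can drop real detections.
hits = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                minSize=(60, 60))
for (x, y, w, h) in hits:
    print(f"candidate chair at ({x}, {y}), size {w}x{h}")
```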

Comparing images using OpenCV or something more useful

I need to compare two images in a project.
The images would be two fruits of the same kind (let's say two different images of two different apples).
To be more clear, the database will have images of the stages an apple goes through from the day it was picked from a tree until it gets rotten.
The user would upload an image of the apple they have, and the software should compare it to all the images in the database, retrieve the data of the matching image, and tell the user which stage it is at.
I have compared images before using OpenCV (Emgu CV), but I really don't know whether that's the best way.
I need expert advice: is what I described even possible, or will all of the database images match the user's image?
And is this "image processing" or something else?
And are there any suggested tutorials for learning how to do this?
I know it doesn't seem totally clear yet, but it's just a crazy idea that I would love to find a way to bring to life!
N.B. the project will be an Android application.
This is an example of a supervised image classification problem, which is a pretty broad field. You can read up on image classification here.
The way that you would approach this problem would be to define a few stages of decay (fresh, starting to rot, half rotten, completely rotten), put together a dataset of many images of the fruit in each stage, and train an image classifier on each stage. The sample dataset should contain images of many different pieces of fruit in many different settings. If you want to support different types of fruit, you would need to train a different classifier for each fruit.
There are many image classification tools out there. To name a few:
OpenCV's Haar classifier
dlib's HOG classifier
Matlab's Computer Vision System Toolbox
VLFeat
It would be up to you to look into which approach would work best for your situation.
Given that this is a fairly broad problem, I wouldn't expect to come up with a solid solution quickly unless you've had experience with image classification. If you are trying to develop a product, I would recommend getting in touch with a computer vision expert that you could contract to solve it.
If you are just looking to learn more about image classification, however, this could be a fun way to play around with different tools and get a feel for what's out there. You may want to start by learning about Machine Learning in general. Caltech offers a free online course that gives a pretty good intro to the subject.
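If you want to experiment before bringing in an expert, note that rot mostly shows up as a color change, so a low-tech baseline is a color histogram fed to an SVM. Here is a minimal sketch, assuming a made-up dataset/<stage>/*.jpg layout where each stage folder (e.g. fresh, rotten) holds sample photos; the layout and labels are placeholders, not a recommendation:

```python
import glob
import os

import cv2
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def histogram(path):
    # HSV hue/saturation histogram: cheap, and rot is largely a color shift.
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([img], [0, 1], None, [30, 32], [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()

X, y = [], []
for stage_dir in glob.glob("dataset/*"):
    for path in glob.glob(os.path.join(stage_dir, "*.jpg")):
        X.append(histogram(path))
        y.append(os.path.basename(stage_dir))  # folder name = stage label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```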

Sketch-based Image Retrieval with OpenCV or LIRe

I'm currently reading for BSc Creative Computing with the University of London and I'm in my last year of my studies. The only remaining module I have left in order to complete the degree is the Project.
I'm very interested in the area of content-based image retrieval and my project idea is based on that concept. In a nutshell, my idea is to help novice artists in drawing sketches in perspective with the use of 3D models as references. I intend to achieve this by rendering the side/top/front views of each 3D model in a collection, pre-process these images and index them. While drawing, the user gets a series of models (that have been pre-processed) that best match his/her sketch, which can be used as guidelines to further enhance the sketch. Since this approach relies on 3D models, it is also possible for the user to rotate the sketch in 3D space and continue drawing based on that perspective. Such approach could help comic artists or concept designers in quickly sketching their ideas.
While carrying out my research I came across LIRe and I must say I was really impressed. I downloaded the LIRe demo v0.9 and played around with the included sample. I also developed a small application which automatically downloads, indexes and searches for similar images, in order to better understand the inner workings of the engine. Both approaches returned very good results, even with a limited set of images (~300).
The next experiment was to test the output when a sketch, rather than an actual image, is provided as input. As mentioned earlier, the system should be able to provide a set of matching models based on the user's sketch. This can be achieved by matching the sketch with the rendered images (which are of course linked to the 3D models). I tried this approach by comparing several sketches to a small set of images and the results were quite good (see http://claytoncurmi.net/wordpress/?p=17). However, when I tried with a different set of images, the results weren't as good as in the previous scenario. I used the Bag of Visual Words (using SURF) technique provided by LIRe to create and search through the index.
I'm also trying out some sample code that comes with OpenCV (I've never used this library and I'm still finding my way).
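For reference, this is the kind of minimal OpenCV experiment I have in mind: edge-extract a rendered view so it looks more like a line drawing, then compare ORB descriptors against the sketch. The file names and the crude score are just placeholders while I explore:

```python
import cv2

# Placeholders: a user sketch and one pre-rendered view of a 3D model.
sketch = cv2.imread("user_sketch.png", cv2.IMREAD_GRAYSCALE)
render = cv2.imread("model_side_view.png", cv2.IMREAD_GRAYSCALE)
render_edges = cv2.Canny(render, 100, 200)  # make the render sketch-like

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(sketch, None)
kp2, des2 = orb.detectAndCompute(render_edges, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# Crude similarity: mean distance of the 30 best matches (lower is better).
best = matches[:30]
score = sum(m.distance for m in best) / max(len(best), 1)
print("similarity score:", score)
```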
So, my questions are:
1. Has anyone tried implementing a sketch-based image retrieval system? If so, how did you go about it?
2. Can LIRe/OpenCV be used for sketch-based image retrieval? If so, how can this be done?
PS. I've read several papers on this subject; however, I didn't find any documentation about the actual implementation of such a system.
Any help and/or feedback is greatly appreciated.
Regards,
Clayton
