I'm currently reading for a BSc in Creative Computing with the University of London and I'm in the last year of my studies. The only module I have left in order to complete the degree is the Project.
I'm very interested in the area of content-based image retrieval and my project idea is based on that concept. In a nutshell, my idea is to help novice artists draw sketches in perspective with the use of 3D models as references. I intend to achieve this by rendering the side/top/front views of each 3D model in a collection, pre-processing these images and indexing them. While drawing, the user gets a series of models (that have been pre-processed) that best match his/her sketch, which can be used as guidelines to further enhance the sketch. Since this approach relies on 3D models, it is also possible for the user to rotate the sketch in 3D space and continue drawing from that perspective. Such an approach could help comic artists or concept designers quickly sketch their ideas.
While carrying out my research I came across LIRe and I must say I was really impressed. I've downloaded the LIRe demo v0.9 and played around with the included sample. I've also developed a small application which automatically downloads, indexes and searches for similar images in order to better understand the inner workings of the engine. Both approaches returned very good results, even with a limited set of images (~300).
The next experiment was to test the output response when a sketch rather than an actual image is provided as input. As mentioned earlier, the system should be able to provide a set of matching models based on the user's sketch. This can be achieved by matching the sketch with the rendered images (which are of course then linked to the 3D model). I've tried this approach by comparing several sketches to a small set of images and the results were quite good - see http://claytoncurmi.net/wordpress/?p=17. However, when I tried with a different set of images, the results weren't as good as in the previous scenario. I used the Bag of Visual Words (using SURF) technique provided by LIRe to create and search through the index.
I'm also trying out some sample code that comes with OpenCV (I've never used this library and I'm still finding my way).
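For reference, this is the kind of minimal experiment I have in mind with OpenCV's Python bindings (the file names, thresholds and the choice of ORB are just placeholders; I haven't settled on a descriptor yet): reduce both the user's sketch and a rendered view to edge maps, extract local descriptors, and rank the renders by the number of good matches.

```python
import cv2

def edge_map(path):
    """Load an image in grayscale and reduce it to a Canny edge map,
    so a pencil sketch and a rendered 3D view look more alike."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.GaussianBlur(img, (5, 5), 0)
    return cv2.Canny(img, 50, 150)

def match_score(sketch_path, render_path):
    """Count 'good' ORB matches between the edge maps of a sketch
    and a rendered model view (higher = more similar)."""
    orb = cv2.ORB_create(nfeatures=500)
    _, des1 = orb.detectAndCompute(edge_map(sketch_path), None)
    _, des2 = orb.detectAndCompute(edge_map(render_path), None)
    if des1 is None or des2 is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(des1, des2, k=2)
    # Lowe's ratio test to keep only distinctive matches
    good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    return len(good)

# Rank a small collection of rendered views against one sketch
renders = ["render_front.png", "render_side.png", "render_top.png"]
ranked = sorted(renders, key=lambda r: match_score("sketch.png", r), reverse=True)
print(ranked)
```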
So, my questions are:
1. Has anyone tried implementing a sketch-based image retrieval system? If so, how did you go about it?
2. Can LIRe/OpenCV be used for sketch-based image retrieval? If so, how can this be done?
PS. I've read several papers on this subject; however, I didn't find any documentation about the actual implementation of such a system.
Any help and/or feedback is greatly appreciated.
Regards,
Clayton
Related
I am using bounding box marking tools like BBox and YOLO marker for object localisation. I wanted to know whether there are any equivalent marking tools available for image segmentation tasks. How are people in academia and research preparing data sets for these image segmentation tasks? The recent Kaggle competition severstal-steel-defect-detection has pixel-level segmentation information. Which tool did they use to prepare this data?
Generally speaking, it is a pretty complex but common task, so you'll likely be able to find several tools. Supervise.ly is a good example. Look through the demo to understand the actual complexity.
Another way is to use OpenCV to get some specific results. We did that, but the results were pretty rough. Another problem is performance, partly because we work with 4K video.
Long story short, we decided to implement a custom tool to get the required results (and to do it fast enough).
Just to summarize, if you want to build a training set for segmentation you have the following options:
Use available services (pretty much all of them will require additional manual work)
Use OpenCV to deal with a specially prepared input
Develop a custom solution to deal with a properly prepared input, providing full control and accurate results
The third option seems to be the most flexible solution. Here are some examples: these are custom multi-color segmentation results. You might get the impression that a custom implementation is way more complex, but as it turned out, if you properly implement a straightforward algorithm you might be surprised by the result. We were interested in accurate, pixel-perfect results.
I have created a simple script to generate colored masks from annotations, to be used for semantic segmentation.
You will need to use the VIA (VGG Image Annotator) tool, which lets you mark a region as a polygon. Once a polygon is created for a class/attribute you can give it an attribute name and save the annotation as a CSV file. The x,y coordinates of the polygon basically get saved in the CSV file.
The program and steps to use are present at: https://github.com/pateldigant/SemanticAnnotator
If you have any doubts regarding the use of this script you can comment below.
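If you just want to see the underlying idea without the linked repository, a rough sketch of turning polygon annotations into a colored mask could look like the following (the CSV column names and the class colors are made up for illustration; VIA's actual export format may differ slightly):

```python
import csv
import json
import cv2
import numpy as np

# Hypothetical mapping from class name to mask color (BGR)
CLASS_COLORS = {"road": (0, 0, 255), "car": (0, 255, 0)}

def csv_to_mask(csv_path, height, width):
    """Fill each annotated polygon with its class color to build a segmentation mask."""
    mask = np.zeros((height, width, 3), dtype=np.uint8)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # Assumed columns: 'class', plus JSON lists of the polygon's x and y vertices
            xs = json.loads(row["all_points_x"])
            ys = json.loads(row["all_points_y"])
            pts = np.array(list(zip(xs, ys)), dtype=np.int32)
            color = CLASS_COLORS.get(row["class"], (255, 255, 255))
            cv2.fillPoly(mask, [pts], color)
    return mask

mask = csv_to_mask("annotations.csv", 720, 1280)
cv2.imwrite("mask.png", mask)
```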
I have one image stored in my bundle or in the application.
Now I want to scan images with the camera and compare them with my locally stored image. When the image is matched I want to play a video, and if the user moves the camera away from that particular image I want to stop the video.
For that I have tried the Wikitude SDK for iOS, but it is not working properly as it keeps crashing because of memory issues or other reasons.
Other things that came to mind are Core ML and ARKit, but Core ML detects an image's properties like name, type, colors etc., whereas I want to match the image itself. ARKit does not support all devices and iOS versions, and I have no idea whether image matching as per my requirement is even possible with it.
If anybody has any idea how to achieve this requirement, please share. Every bit of help will be appreciated. Thanks :)
The easiest way is ARKit's image detection. You know the device limitations it has, but the results it gives are good and it is really easy to implement. Here is an example.
Next is Core ML, which is the hardest way. You need to understand machine learning, at least briefly. Then comes the tough part: training with your dataset. The biggest drawback is that you have a single image, so I would discard this method.
Finally, the mid-way solution is to use OpenCV. It might be hard, but it suits your need. You can find different feature matching methods to find your image in the camera feed (example here). You can use Objective-C++ to write the C++ code for iOS.
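To give an idea of what that feature matching looks like (plain Python/OpenCV rather than iOS code; the file names and thresholds are placeholders), something along these lines decides whether the stored image appears in a given frame:

```python
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=1000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)

def reference_visible(reference_gray, frame_gray, min_inliers=15):
    """Rough check: does the stored reference image appear in this camera frame?
    Uses ORB matches filtered by a ratio test, then a RANSAC homography so that
    only geometrically consistent matches count."""
    kp1, des1 = orb.detectAndCompute(reference_gray, None)
    kp2, des2 = orb.detectAndCompute(frame_gray, None)
    if des1 is None or des2 is None:
        return False
    pairs = matcher.knnMatch(des1, des2, k=2)
    good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    if len(good) < 4:
        return False
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H is not None and inlier_mask is not None and int(inlier_mask.sum()) >= min_inliers

# Example: compare the bundled image against one captured frame
ref = cv2.imread("reference.jpg", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("camera_frame.jpg", cv2.IMREAD_GRAYSCALE)
print(reference_visible(ref, frame))
```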
Your task is image similarity; you can do it simply and with more reliable results using machine learning. Since your task involves camera scanning, the better option is Core ML. You can refer to this link by Apple for Image Similarity. You can improve your results by training with your own datasets. If any more clarification is needed, comment.
Another approach is to use a so-called "siamese network", which really means that you run a model such as Inception-v3 or MobileNet on both images and compare their outputs.
However, these models usually give a classification output, i.e. "this is a cat". But if you remove that classification layer from the model, it gives an output that is just a bunch of numbers that describe what sort of things are in the image but in a very abstract sense.
If these numbers for two images are very similar -- if the "distance" between them is very small -- then the two images are very similar too.
So you can take an existing Core ML model, remove the classification layer, run it twice (once on each image), which gives you two sets of numbers, and then compute the distance between these numbers. If this distance is lower than some kind of threshold, then the images are similar enough.
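As a rough illustration of that last step (the feature vectors here are just random placeholders standing in for the truncated model's outputs), the distance check can be as simple as:

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two feature vectors (smaller = more similar)."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# features_a / features_b would be the outputs of the model with its
# classification layer removed, one per image
features_a = np.random.rand(1024)
features_b = np.random.rand(1024)

THRESHOLD = 0.15  # tune on examples of "same" vs "different" image pairs
print("similar" if cosine_distance(features_a, features_b) < THRESHOLD else "different")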
Here's some research I have done so far:
- I have used Google Vision API to detect various face landmarks.
Here's the reference: https://developers.google.com/vision/introduction
Here's sample code to get the facial landmarks, which uses the same Google Vision API: https://github.com/googlesamples/ios-vision
I have gone through various blogs on the internet which say MSQRD is based on Google's Cloud Vision. Here's a link to one: https://medium.com/@AlexioCassani/how-to-create-a-msqrd-like-app-with-google-cloud-vision-802b578b30a0
For Android here's the reference:
https://www.raywenderlich.com/158580/augmented-reality-android-googles-face-api
There are multiple paid SDKs which fulfill the purpose, but they are highly priced, so I can't afford them.
For instance:
1) https://deepar.ai/contact/
2) https://www.luxand.com/
There is a possibility that some might see this question as a duplicate of this one:
Face filter implementation like MSQRD/SnapChat
But that thread is almost 1.6 years old and has no proper answers.
I have gone through this article:
https://dzone.com/articles/mimic-snapchat-filters-programmatically-1
It describes all the essential steps to achieve the desired results, but they advise using their own SDK.
As per my research, there is no good enough material around that helps achieve results like MSQRD's face filters.
There is one more GitHub repository which has a similar implementation, but it doesn't give much information about it:
https://github.com/rootkit/LiveFaceMask
Now my question is:
If we have the facial landmarks from the Google Vision API (or even from dlib), how can I add 2D or 3D models over them? In what format does this need to be done? Does it require some X,Y coordinates with vertex calculations?
NOTE: I have gone through Google's "GooglyEyesDemo", which adds a preview layer over the eyes. It basically adds a view over the face, and I don't want to add one-dimensional UIView preview layers over it. Image attached for reference: https://developers.google.com/vision/ios/face-tracker-tutorial
Creating Models: I also want to know how to create models for live filters like MSQRD. I welcome any software or format recommendations.
I hope the research I have done will help others, and that someone else's experience helps me achieve the desired results. Let me know if any more details are required.
Image attached for more reference:
Thanks
Harry
The Canvas class can be used on Android for drawing such 3D/2D models, or Core Graphics can be used on iOS.
What you can do is detect the face components, take their location points and draw images on top of them. Consider going through this
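As a very rough sketch of that idea (in Python/OpenCV rather than Android/iOS; the landmark coordinates and file names here are made up and would really come from the face detector), overlaying a 2D image on the landmarks boils down to an alpha blend at the right position and scale:

```python
import cv2
import numpy as np

def overlay_sticker(frame, sticker_rgba, center_xy, width_px):
    """Alpha-blend an RGBA sticker onto the frame, centered on a landmark point.
    (No bounds checking: assumes the resized sticker fits inside the frame.)"""
    scale = width_px / sticker_rgba.shape[1]
    sticker = cv2.resize(sticker_rgba, None, fx=scale, fy=scale)
    h, w = sticker.shape[:2]
    x = int(center_xy[0] - w / 2)
    y = int(center_xy[1] - h / 2)
    roi = frame[y:y + h, x:x + w]
    alpha = sticker[:, :, 3:4].astype(np.float32) / 255.0
    roi[:] = (alpha * sticker[:, :, :3] + (1 - alpha) * roi).astype(np.uint8)

frame = cv2.imread("face.jpg")
glasses = cv2.imread("glasses.png", cv2.IMREAD_UNCHANGED)  # PNG with alpha channel
# Hypothetical landmark positions (would come from Google Vision / dlib)
left_eye, right_eye = (220, 310), (340, 305)
mid_eyes = ((left_eye[0] + right_eye[0]) // 2, (left_eye[1] + right_eye[1]) // 2)
eye_dist = right_eye[0] - left_eye[0]
overlay_sticker(frame, glasses, mid_eyes, int(eye_dist * 2.0))
cv2.imwrite("face_with_glasses.png", frame)
```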
You need to either predict x,y,z coordinates (check out this demo), or use x,y predictions and then find the parameters of a universal 3D model & camera that will give the closest projection of the current x,y.
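To illustrate the second option in rough terms (Python/OpenCV here; the 2D landmark pixels and the generic 3D model points below are approximate, made-up values just to show the mechanics): with a few landmarks and a generic 3D face model you can recover the head pose with solvePnP and then project any 3D prop into the image.

```python
import cv2
import numpy as np

# Approximate 3D positions (arbitrary face-sized units) of a generic face model
model_points = np.array([
    (0.0, 0.0, 0.0),          # nose tip
    (0.0, -330.0, -65.0),     # chin
    (-225.0, 170.0, -135.0),  # left eye outer corner
    (225.0, 170.0, -135.0),   # right eye outer corner
    (-150.0, -150.0, -125.0), # left mouth corner
    (150.0, -150.0, -125.0),  # right mouth corner
], dtype=np.float64)

# Hypothetical 2D landmarks from the face detector (pixel coordinates)
image_points = np.array([
    (320, 240), (325, 360), (250, 190), (390, 185), (270, 300), (380, 295)
], dtype=np.float64)

# Simple pinhole camera guess: focal length ~ image width, principal point at center
w, h = 640, 480
camera_matrix = np.array([[w, 0, w / 2], [0, w, h / 2], [0, 0, 1]], dtype=np.float64)
dist_coeffs = np.zeros((4, 1))

ok, rvec, tvec = cv2.solvePnP(model_points, image_points, camera_matrix, dist_coeffs)

# Project a 3D point of a prop (e.g. something above the nose) into the image
prop_3d = np.array([(0.0, 400.0, -100.0)], dtype=np.float64)
prop_2d, _ = cv2.projectPoints(prop_3d, rvec, tvec, camera_matrix, dist_coeffs)
print(prop_2d.ravel())  # pixel position where the prop should be drawn
```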
I want to recognize nutrient information from package labels (sample nutrient label). This is one package image; different brands may style/lay out their labels differently. But I know some things for sure: the layout will be somewhat tabular with certain keywords in the heading, like 'Nutrient', and the content of the table will have certain common words, like Energy/Fat etc. I want to extract these values in text form and save them into my DB.
The sample image is part of a bigger problem: finding the contour/box that might contain this 'Nutrient Label' section.
As I understand it, there are 3 broad steps (a rough sketch of them follows the list):
1. Scan the input image (product front/back/side image) to look for the best contour that could be my target contour containing this nutrient information.
2. Go to this contour and perform OCR (possibly retaining the layout information rather than outputting everything on one line).
3. Scan the text and look for the needed info.
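Not a full solution, but a rough sketch of those three steps in Python with OpenCV and pytesseract might look like the following (the preprocessing parameters and keyword list are guesses you would have to tune, and it assumes the Tesseract binary is installed):

```python
import re
import cv2
import pytesseract

KEYWORDS = ["nutrient", "energy", "fat", "protein", "carbohydrate"]

# Step 1: find the largest contour, assumed here to be the label box
img = cv2.imread("package.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 31, 10)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
label = gray[y:y + h, x:x + w]

# Step 2: OCR the cropped region, keeping one line of text per table row
text = pytesseract.image_to_string(label)

# Step 3: scan the text for lines containing an expected keyword and a number
for line in text.splitlines():
    if any(k in line.lower() for k in KEYWORDS):
        value = re.search(r"\d+(\.\d+)?", line)
        if value:
            print(line.strip(), "->", value.group())
```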
I am a beginner in image recognition. It would be a great help if I could get feedback on my approach. For instance, should I look for text in the image, or should I gather similar images, train a model and then do classification, similar to performing face recognition?
If someone has already solved this problem, it would be great to get some pointers (there is no fun in reinventing the wheel).
If it's a research problem, then relevant code/libraries/pointers/similar SO questions that I could refer to would be welcome.
It would be highly appreciated if the answers are not too general (like "perform feature extraction"; I would have no clue what feature extraction is). Instead, a sample code pointer would be awesome.
I thank you for your time and help.
thanks
Chahat
1. It would be required to collect at least 200-300 images for sufficient training.
2/3. I did solve the problem, but it was done using a non-free solution, so I'm not supposed to give directions here.
I had an idea for which I need to be able to recognize certain objects or models in a rendered three-dimensional digital movie.
After limited research, I know now that what I need is called feature detection in the field of Computer Vision.
So, what I want to do is:
create a few screenshots of a certain character in the movie (eg. front/back/leftSide/rightSide)
play the movie
while playing the movie, continuously create new screenshots of the movie
for each screenshot, perform feature detection (SIFT? with OpenCV?) to see if any of our character appearances are there (they must still be recognized if the character is further away and thus appears smaller, or if the character is e.g. lying down).
give a notice whenever the character is found
This would be possible with OpenCV, right?
The "issue" is that I would have to learn c++ or python to develop this application. This is not a problem if my movie and screenshots are applicable for what I want to do.
So, I would like to first test my screenshots of the movie. Is there a GUI version of OpenCV that I can input my test data and then execute it's feature detection algorithms manually as a means of prototyping?
Any feedback is appreciated. Thanks.
There is no GUI of OpenCV able to do what you want. You will be able to use OpenCV for some aspects of your problem, but there is no ready-made solution waiting there for you.
While it's definitely possible to solve your problem, the learning curve for this problem is quite long. If you're a professional, then an alternative to learning about it yourself would be to hire an expert to do it for you. It would cost money, but save you time.
EDIT
As far as template matching goes, you wouldn't normally use it to solve such a problem because the thing you're looking for is changing appearance and shape. There aren't really any "dynamic parameters to set". The closest thing you could try is have a massive template collection that would try to cover the expected forms that your target may take. But it would hardly be an elegant solution. Plus it wouldn't scale.
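For completeness, a single template match in OpenCV is only a few lines (Python here; the threshold and file names are placeholders), which also shows why you would need one template per pose and scale:

```python
import cv2

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("character_front.png", cv2.IMREAD_GRAYSCALE)

# Slide the template over the frame and record the best correlation score
result = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

# A high score at some location means "this exact appearance was found there";
# a rotated, scaled or deformed character will simply score low.
if max_val > 0.8:
    print("template found at", max_loc, "score", max_val)
```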
Next, to your point about face recognition. This is kind of related, but most facial recognition applications deal with a controlled environment: lighting, distance, pose, angle, etc. Outside of that controlled environment face detection effectiveness drops significantly. If you're detecting objects in a movie, then your environment isn't really controlled.
You may want to first try a simpler problem of accurately detecting where the characters are, without determining who they are (video surveillance, essentially). While it may sound simple, you'll find that it's actually non-trivial for arbitrary scenes. The result of solving that problem may be useful in identifying the characters.
There is Find-Object by Mathieu Labbé. It was very helpful for me to start getting an understanding of the descriptors since you can change them while your video is running to see what happens.
This is probably too late, but might help someone else looking for a solution.
Well, using OpenCV you would take a frame of a video file and do any computations on it.
You can use several different methods to detect a character in that image, but it's not so easy to make it flexible enough to still find that person if they are, for example, lying on the floor when you only provided reference images of that character standing.
Basically, you could try extracting all the important features from your set of reference pictures and have a (in your case supervised) learning algorithm that learns a good feature vector of that character for classification.
You then need to write code that plays the video, takes a video frame let's say every 500 ms (or whatever interval you desire), gets a segmentation of the object you think could be that character and compares it with the reference values you get from your learning algorithm. If there's a match, your code can yell "Yehaaawww!" or do other things...
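A rough skeleton of that loop might look like this (Python/OpenCV; the matching itself is left as a placeholder function, since it depends on whatever features/classifier you trained):

```python
import cv2

def looks_like_character(frame):
    """Placeholder for the actual check: compare the frame (or a segmented
    region of it) against the reference features from the learning step."""
    return False  # replace with real matching / classification

cap = cv2.VideoCapture("movie.mp4")
fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
step = int(fps * 0.5)  # sample one frame every ~500 ms

frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % step == 0 and looks_like_character(frame):
        t = frame_idx / fps
        print(f"Character spotted at {t:.1f}s -- Yehaaawww!")
    frame_idx += 1
cap.release()
```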
But all this depends on how flexible you want this to be. You could also try template matching or cross-correlation, which basically shifts the reference image(s) over the frame and checks how similar both parts are. But this unfortunately is very sensitive to rotation, deformation or other noise... so you wouldn't find that person if they are, e.g., lying down. And I doubt you can get all those calculations done in real time...
Basically: yes, OpenCV is good to use for your image processing/computer vision tasks. But it offers a lot of methods and approaches, and you'd need to find one that works for your images... it's not a trivial task though...
Hope that helps...
Have you tried looking at some of the work of the Oxford Visual Geometry Group?
Their Video Google system describes to a large extent what you want: instance detection.
Their work into Naming People in TV shows is also pretty relevant. A face detection and facial feature pipeline is included that can be run from Matlab. Are you familiar with Matlab?
Have you tried computer vision frameworks like Cassandra? There you can do exactly that with just a few mouse clicks.