Emgu CV Surf picture detection against known database? - opencv

I'm trying to compare an image against a known set of images and find the closest match(es) using Emgu CV and SURF. I've found a lot of people trying to do the same thing, but no complete solution that uses the GPU for speed.
The closest I've gotten is the tutorial here:
http://romovs.github.io/blog/2013/07/05/matching-image-to-a-set-of-images-with-emgu-cv/
However, that doesn't take advantage of the GPU, and it's really slow for my application. I need something fast like the SurfFeature sample.
So I tried to refactor that tutorial code to match the SurfFeature logic that uses the GPU. Everything was going well, with GpuMat replacing Matrix here and there. But I ran into a major problem when I got to the core of the tutorial above, that is to say, the logic that concatenates all of the descriptors into one large matrix. I couldn't find a way to append GpuMat objects to each other - and even if I could do that, there's no guarantee that the FlannIndex search routine would work with the GPU-based code.
So now I'm stuck on something I thought would be relatively straightforward. A number of people have certainly tried to do this over the years, so I'm really surprised that there isn't a published solution.
If you could help me, I'd be most appreciative. To summarize, I need to do the following:
Build a large in-memory (on the GPU) list of descriptors and keypoints for a known set of images using SURF (as per the SurfFeature sample).
Given an unknown image, search against the in-memory list to find the closest match (if any).
Thanks in advance if you can help!
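For readers following along, here is a minimal CPU-only sketch of the "concatenate all descriptors, then search" idea described in the question. It is plain Python rather than Emgu CV, it does not address the GPU part at all, and a brute-force nearest-neighbour search stands in for FlannIndex. The function names, the tuple-based descriptors, and the 0.7 ratio threshold are all assumptions made for illustration.

```python
import math
from collections import Counter

def build_database(per_image_descriptors):
    """Concatenate descriptors from all known images into one list.

    owners[i] records which image row i came from, so a hit on a row
    can be mapped back to its source image.
    """
    rows, owners = [], []
    for image_id, descriptors in enumerate(per_image_descriptors):
        for descriptor in descriptors:
            rows.append(descriptor)
            owners.append(image_id)
    return rows, owners

def two_nearest(query, rows):
    """Brute-force search: the two closest rows as (distance, row_index)."""
    scored = sorted((math.dist(query, row), i) for i, row in enumerate(rows))
    return scored[0], scored[1]

def best_match(query_descriptors, rows, owners, ratio=0.7):
    """Vote for the image that owns each confidently matched descriptor."""
    votes = Counter()
    for query in query_descriptors:
        (d1, i1), (d2, _) = two_nearest(query, rows)
        # Ratio test (assumed threshold): keep a match only if the best
        # neighbour is clearly closer than the second best.
        if d1 < ratio * d2:
            votes[owners[i1]] += 1
    return votes.most_common(1)[0][0] if votes else None

# Two "images" with two toy 2-D descriptors each (real SURF descriptors
# have 64 or 128 dimensions).
rows, owners = build_database([[(0, 0), (0, 1)], [(10, 10), (10, 11)]])
matched = best_match([(10.1, 10.0), (9.9, 11.0)], rows, owners)
unmatched = best_match([(5, 5.5)], rows, owners)
```

The owners list is the piece that makes the single large matrix usable: without it, a nearest-neighbour hit cannot be traced back to an image.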

Related

Recognize "generic" objects

I'm working on a project for visually impaired people that converts the visual world to audio.
We prefer to create a prototype that doesn't need an internet connection. So we chose to work with OpenCV. After reading (a lot of) tutorials and documentation we were able to train OpenCV in recognizing specific objects.
For example: we trained OpenCV to recognize a certain chair and a door. That works fine.
But, we also tried to train OpenCV on a "generic" level. It should be possible to recognize (almost) all chairs. We did that by training OpenCV with a lot of positive and negative images as explained here: http://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
The actual result wasn't what we expected: it could not recognize any chair. I know there are a lot of different parameters to take into account (maybe we did something wrong there), and we experimented a lot. But our time (and unfortunately our knowledge of OpenCV) is limited.
We are looking for some advice on how to train OpenCV to recognize generic objects.
Where do we start?
Is OpenCV even suited to do that?
Thank you for your time!
OpenCV is the library to use. But object recognition is tricky. Often when people say they are doing "object recognition" they are not: they are processing one image, or at best a series of related images, to separate it into object and background.
To recognise a "chair" - everything from an armchair to a dining chair to a throne - would be almost impossible. I'd want at least stereo images to give a chance of detecting flat surfaces. I don't doubt that with a lot of work you can get quite a good result, maybe just recognising dining-style chairs, but it's skilled work. It's not just a case of feeding a few parameters to a hierarchical classifier.

How to detect architecture and sculpture in opencv?

Can someone tell me how I can detect pictures of architecture or sculpture?
I think the Hough transform is a good approach. But I'm new to CV, and maybe there are better methods to detect patterns. I heard about Haar cascades. Can I use them for architecture, too?
For example i want to detect those kind of pictures:
http://img842.imageshack.us/img842/4748/resizeimg0931.jpg
If you want an algorithm to detect them, then detecting an object in an image needs a description of that object that a machine can understand. For sculpture or architecture, how can you have such a uniform definition when they vary so much in every sense? For example, both of your input images vary a lot. How can we differentiate between a house and a work of architecture? Your question raises a lot of problems. Even with the Hough transform, how are you supposed to differentiate between a big house and a big work of architecture?
Check out this Stack Overflow question: Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition
He wants to detect Coca-Cola cans, and not Coca-Cola bottles. But if you look into it closely, you will see that cans and bottles are almost alike and it is difficult to differentiate between them. You can find a lot of the difficulties in the subsequent answers. A major problem is that, in some cases, it is difficult for humans to differentiate them as well.
For your second image, even if you train some cascades, there is a chance they will detect live lions if they are present in your image, since a sculpted lion and a real lion look almost the same to a machine.
Haar cascades may not be very effective, since you have to train on a lot of these kinds of images.
If you have some sample images and want to check whether those things are in your image, maybe you can use SURF features or similar. But you need some sample images to compare against first. For a demo of SURF, check out this Stack Overflow question: OpenCV 2.4.1 - computing SURF descriptors in Python
Another option is template matching. But it is slow, and it is not scale or orientation invariant. And you need some template images for this.
I think I have seen some papers on this topic (but I don't remember them now). Maybe googling will find them. I will update the answer if I find them.

Is it possible to see the current iteration number in OpenCV's cvKmeans2?

I'm trying to cluster a really large dataset - 3030764x162 into 4000 clusters using the cvKmeans2 function in OpenCV 2.1.
I would like to see which iteration the K-means algorithm is currently in (similar to what is displayed in Matlab), but I don't see any documentation that points to how I can do this.
It's kind of frustrating seeing a blank screen and not knowing when the code is going to terminate!
Thank you.
Unfortunate as it seems, the answer is No, you cannot. There are no debugging/informative statements anywhere in the kmeans function as provided by OpenCV. However, you may edit and add statements to the method as you deem appropriate.
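To illustrate what "adding statements to the method" could look like, here is a minimal sketch of a k-means loop that reports its iteration number through a callback. This is plain Python written for illustration, not OpenCV's cvKMeans2 source, and the `on_iteration` hook, the function name, and the parameters are all hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, on_iteration=None, seed=0):
    """Plain Lloyd-style k-means with a per-iteration progress hook."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers picked from the data
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if on_iteration is not None:
            on_iteration(iteration)  # the progress report goes here
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seen = []
labels, centers = kmeans(points, 2, on_iteration=seen.append)
```

Passing `print` instead of `seen.append` would write the iteration number to the console on every pass, which is the kind of feedback the question asks for.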
@Sau,
Maybe you need some other way of doing it, though my answer is not specific to OpenCV.
I have not tried this in OpenCV, but I once did k-means clustering on an extremely large data set with Mahout, and it was a better option than OpenCV because it ran in a distributed mode. It is a lengthy process, but you might still be interested in k-means clustering using Mahout.
Check it out.

image segmentation techniques

I am working on a computer vision application and I am stuck at a conceptual roadblock. I need to recognize a set of logos in a video, and so far I have been using feature matching methods like SIFT (and ASIFT by Yu and Morel), SURF, FERNS -- basically everything in the "Common Interfaces of Generic Descriptor Matchers" section of the OpenCV documentation. But recently I have been researching methods used in OCR/Random Trees classifiers (I was playing with this dataset: http://archive.ics.uci.edu/ml/datasets/Letter+Recognition) and thinking that this might be a better way to go about finding the logos. The problem is that I can't find a reliable way to automatically segment an arbitrary image.
My questions:
Should I bother looking into methods other than descriptor/keypoint, or is this the best way to recognize a typical logo (stylized, few colors, sharp edges)?
How can I segment an arbitrary image (or a video frame, in my case) so that I can properly match against a sample database?
It would seem that Haar cascades work in a similar way (databases of samples), but I can't figure out how the processes are related. Is there segmentation going on there?
Sorry if these questions are too broad. I'm trying to wrap my head around this stuff with little help. Thanks!
It seems like segmentation is not what you want. I think it has to do more with object detection and recognition. You want to detect the presence of a certain set of logos, in a certain set of images. This doesn't seem related to segmentation which is about labeling surfaces or areas of a common color, texture, shape, etc., although examining segmentation based methods may be useful.
I would definitely encourage you to look at the problem and examine all possible methods that can be applied, not only the fashionable ones (such as SIFT, GLOH, SURF, etc). I would recommend you look at older, simpler methods like simple template matching, chamfering, etc.
Haar cascades became popular after a 2001 paper by Viola and Jones that used them for face detection (similar to what you see in modern point-and-shoot cameras). That does sound a bit similar to the problem you are interested in. You should perhaps also examine this part of the problem, but try not to focus too much on the learning part.

GUI version of OpenCV for feature-detection (SIFT etc.) prototyping before actual project development?

I had an idea for which I need to be able to recognize certain objects or models from a rendered three dimensional digital movie.
After limited research, I know now that what I need is called feature detection in the field of Computer Vision.
So, what I want to do is:
create a few screenshots of a certain character in the movie (eg. front/back/leftSide/rightSide)
play the movie
while playing the movie, continuously create new screenshots of the movie
for each screenshot, perform feature detection (SIFT?, with openCV?) to see if any of our character appearances are there (they must still be recognized if the character is further away and thus appears smaller, or if the character is eg. lying down).
give a notice whenever the character is found
This would be possible with OpenCV, right?
The "issue" is that I would have to learn C++ or Python to develop this application. This is not a problem if my movie and screenshots are suitable for what I want to do.
So, I would like to first test my screenshots of the movie. Is there a GUI version of OpenCV into which I can load my test data and then run its feature detection algorithms manually as a means of prototyping?
Any feedback is appreciated. Thanks.
There is no GUI of OpenCV able to do what you want. You will be able to use OpenCV for some aspects of your problem, but there is no ready-made solution waiting there for you.
While it's definitely possible to solve your problem, the learning curve for this problem is quite long. If you're a professional, then an alternative to learning about it yourself would be to hire an expert to do it for you. It would cost money, but save you time.
EDIT
As far as template matching goes, you wouldn't normally use it to solve such a problem because the thing you're looking for is changing appearance and shape. There aren't really any "dynamic parameters to set". The closest thing you could try is have a massive template collection that would try to cover the expected forms that your target may take. But it would hardly be an elegant solution. Plus it wouldn't scale.
Next, to your point about face recognition. This is kind of related, but most facial recognition applications deal with a controlled environment: lighting, distance, pose, angle, etc. Outside of that controlled environment face detection effectiveness drops significantly. If you're detecting objects in a movie, then your environment isn't really controlled.
You may want to first try a simpler problem of accurately detecting where the characters are, without determining who they are (video surveillance, essentially). While it may sound simple, you'll find that it's actually non-trivial for arbitrary scenes. The result of solving that problem may be useful in identifying the characters.
There is Find-Object by Mathieu Labbé. It was very helpful for me to start getting an understanding of the descriptors since you can change them while your video is running to see what happens.
This is probably too late, but might help someone else looking for a solution.
Well, using OpenCV you would take a frame from a video file and run any computations on it.
There are several different methods for detecting a character in that image, but it's not easy to make the detection flexible enough to find the person when, for example, they are lying on the floor and you only supplied reference images of the character standing.
Basically you could try extracting all important features from your set of reference pictures and have a (in your case supervised) learning algorithm that gets a good feature-vector of that character for classification.
You then need to write code that plays the video and takes a frame, say, every 500 ms (or whatever interval you prefer), segments the object you think might be that character, and compares it with the reference values from your learning algorithm. If there's a match, your code can yell "Yehaaawww!" or do other things...
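The sampling loop could be sketched roughly as follows. This is plain Python with the video stood in by a list of frames; `is_match` is a hypothetical placeholder for whatever detector or classifier you end up with, and the function names are made up for illustration.

```python
def sample_frame_indices(frame_count, fps, interval_ms=500):
    """Indices of the frames to inspect, one per interval_ms of video."""
    # Number of frames to skip between samples, never less than one.
    step = max(1, round(fps * interval_ms / 1000))
    return list(range(0, frame_count, step))

def find_matches(frames, fps, is_match, interval_ms=500):
    """Timestamps (in seconds) of sampled frames where is_match is true."""
    hits = []
    for index in sample_frame_indices(len(frames), fps, interval_ms):
        if is_match(frames[index]):
            hits.append(index / fps)
    return hits

# Toy stand-in: 100 "frames" at 30 fps, where the frame value is its index
# and the character "appears" from frame 60 onwards.
frames = list(range(100))
indices = sample_frame_indices(len(frames), 30)
hits = find_matches(frames, 30, lambda frame: frame >= 60)
```

Sampling by frame index rather than by wall-clock time keeps the result independent of how fast the processing itself runs.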
But all this depends on how flexible you want this to be. You could also try template matching or cross-correlation, which basically shifts the reference image(s) over the frame and checks how similar the two are at each position. Unfortunately, this is very sensitive to rotation, deformation, and other noise, so you wouldn't find that person if they are, e.g., lying down. And I doubt you can get all those calculations done in real time...
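As a rough illustration of the "shift the reference over the frame" idea, here is a minimal pure-Python sketch that scores each position by the sum of squared differences (one of several possible scores; OpenCV's matchTemplate offers this and others). Images are assumed to be lists of rows of grayscale values, and the function name is hypothetical.

```python
def match_template(image, template):
    """Slide template over image; return ((x, y), score) of the best fit.

    The score is the sum of squared differences, so 0 means a perfect match.
    """
    image_h, image_w = len(image), len(image[0])
    templ_h, templ_w = len(template), len(template[0])
    best_score, best_pos = None, None
    for y in range(image_h - templ_h + 1):
        for x in range(image_w - templ_w + 1):
            score = sum(
                (image[y + dy][x + dx] - template[dy][dx]) ** 2
                for dy in range(templ_h)
                for dx in range(templ_w)
            )
            if best_score is None or score < best_score:
                best_score, best_pos = score, (x, y)
    return best_pos, best_score

image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 6, 0],
    [0, 0, 0, 0, 0],
]
upright = [[9, 8], [7, 6]]
rotated = [[7, 9], [6, 8]]  # the same patch turned by 90 degrees

pos, score = match_template(image, upright)
_, rotated_score = match_template(image, rotated)
```

The upright patch is found exactly, while the rotated copy no longer matches anywhere, which is the sensitivity to rotation mentioned above. The nested loops also show why this gets slow on full-size frames.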
Basically: Yes OpenCV is good to use for your image processing/computer vision tasks. But it offers a lot of methods and ways and you'd need to find a way that works for your images... it's not a trivial task though...
Hope that helps...
Have you tried looking at some of the work of the Oxford visual geometry group?
Their Video Google system describes to a large extent what you want, instance detection.
Their work into Naming People in TV shows is also pretty relevant. A face detection and facial feature pipeline is included that can be run from Matlab. Are you familiar with Matlab?
Have you tried computer vision frameworks like Cassandra? There you can do exactly that with just a few mouse clicks.
