I have been given a task to create an application that takes an image and detects which object (out of a finite list of objects) is present in it.
Each image contains either exactly one object or no object at all.
The application should be able to identify the object if any of the listed objects is present.
It would also suffice if the application could calculate the probability that a particular object from the list is present in the image.
Can anyone suggest how to approach this problem? Would OpenCV work?
Specifically, the task is to identify a company logo (Coke, Pepsi, Dell, etc.) in the image, if any logo from a finite list (say 100) is present.
How can I do this project? Please help!
There are many ways of doing that but the one I like the most is building a feature set for each object and then match it in the image.
You can use SIFT to build the keypoint vector for each object. By applying SIFT to each picture you will get a set of descriptors for each picture (say picture, object,...).
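When you get the image you want to process, detect keypoints in it (FAST is a quick detector, but it does not compute descriptors, so you still need SIFT or similar to describe the points) and match the resulting descriptors against each object's descriptor set. Note that cvMatchTemplate() compares raw pixel patches rather than descriptors; for descriptors, use a descriptor matcher such as OpenCV's BFMatcher or the FLANN-based matcher. The object with the highest match score tells you which object you detected. If all scores are too low, then you probably don't have any object in the image.
This is just one approach I like, but it is fairly state-of-the-art, precise, and fast.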
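The final decision step described above (take the best-scoring object, or report nothing if every score is too low) can be sketched in a few lines. This is only an illustration: the function name `pick_logo`, the score range [0, 1], and the cut-off `min_score=0.5` are assumptions, and how each score is computed from the descriptor matches is left open.

```python
from typing import Dict, Optional, Tuple


def pick_logo(scores: Dict[str, float],
              min_score: float = 0.5) -> Tuple[Optional[str], float]:
    """Return (best_logo, score), or (None, score) if nothing is convincing.

    `scores` maps each logo name to a match score in [0, 1].
    `min_score` is an assumed cut-off, not a recommended value.
    """
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    if scores[best] < min_score:
        # Every logo scored too low: treat the image as containing no logo.
        return None, scores[best]
    return best, scores[best]


# Hypothetical scores for two input images.
print(pick_logo({"coke": 0.82, "pepsi": 0.31, "dell": 0.05}))
print(pick_logo({"coke": 0.12, "pepsi": 0.09, "dell": 0.05}))
```

In practice the cut-off would have to be tuned on images that are known to contain no logo at all.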
I recommend googling and reading up on the subject before trying things out.
You want to perform object recognition, or logo recognition. There are already SO questions about this.
Here is a starting point for OpenCV.
The whole process took me half a minute to search for. Perhaps this is what you should start searching for.
I want to detect or recognize a specific object in an image. First, here is what I have done. I tried to detect a logo, e.g. the Google logo. I have the original image of the logo, but the images I am going to process are taken with different cameras, from different angles and distances, and from different screens (including wide screens like in a cinema).
I am using OpenCV 3 to check whether this logo is in these images. I have tried OpenCV's SURF, SIFT, etc., a direct comparison of two images using the NORM_L2 norm, template matching, and an SVM (which was slow and detected incorrectly), as well as some other OpenCV functions, but none of them worked well enough. I then wrote my own algorithm, which works better than the above functions but still does not satisfy the requirements.
Now my question is: is there a better way to detect a specific object in an image? For example, what should I do as the first step, the second step, and so on?
I'm trying to implement a real-time object classification program using SVM classification and BoW clustering. My question is: what are good practices for selecting positive and negative training images?
Positive image sets
Should the background be empty? Meaning, should the image only contain the object of interest? When implementing this algorithm in real time, the test image will not contain only the object of interest, it will definitely have some information from the background as well. So instead of using isolated image collection, should I choose images which look more similar to the test images?
Negative image sets
Can these be any image set without the object of interest? Or should they be from the environment where this algorithm is going to be tested, without the object of interest? For example, if I'm going to classify phones in my living room, should the negatives be background images of my living room without the phone in the foreground? Or can it be any image set (kitchen, living room, bedroom, or outdoor images)? I'm asking because I don't want the system to be environment-specific. It must be robust in any environment (indoors and outdoors).
Thank you. Any help or advice is much appreciated.
Positive image sets
Yes, you should definitely choose images that look similar to the test images.
Negative image sets
It can be any image set; however, it is better to include images from the environment where the algorithm is going to be tested, without the object of interest.
Generally
Please read my answer to some other SO question, it would be useful. Discussion continued in comments, so that might be useful as well.
I want to design an algorithm that would find matches in images of the same apartment, when put up by different real estate agents.
The photos are taken at roughly the same time, so the interior of the rooms should not change much, but of course everyone takes different pictures from different angles, etc.
(TL;DR: an apartment goes up for sale, different real estate agents come in and take their own pictures, and I want to know whether the pictures from the various agents show the same place.)
I know that the choice of image processing and recognition algorithms depends heavily on the use case, so could you point me in the right direction given mine?
http://reality.bazos.sk/inzerat/56232813/Prenajom-1-izb-bytu-v-sirsom-centre.php
http://reality.bazos.sk/inzerat/56371292/-PRENAJOM-krasny-1i-byt-rekonstr-Kupeckeho-Ruzinov-BA-II.php
You can actually use Clarifai's Custom Training API endpoint, fairly simple and straightforward. All you would have to do is train the initial image and then compare the second to it. If the probability is high, it is likely the same apartment. For example:
In JavaScript, declaring a positive looks like this:
clarifai.positive('http://example.com/apartment1.jpg', 'firstapartment', callback);
And a negative is:
clarifai.negative('http://example.com/notapartment1.jpg', 'firstapartment', callback);
You don't necessarily have to provide a negative, but it can only help. Then, when you are comparing images to the first apartment, you do:
clarifai.predict('http://example.com/someotherapartment.jpg', 'firstapartment', callback);
This will give you a probability regarding the likeness of the photo to what you've trained ('firstapartment'). This API is basically doing machine learning without the hassle of the actual machine. Clarifai's API also has a tagging input that is extremely accurate with some basic tags. The API is free for a certain number of calls/month. Definitely worth it to check out for this case.
As user Shaked mentioned in a comment, this is a difficult problem. Even if you knew the position and orientation of each camera in space, and also the characteristics of each camera, it wouldn't be a trivial problem to match the images.
A "bag of words" (BoW) approach may be of use here. Rather than try to identify specific objects and/or deduce the original 3D scene, you determine what "feature descriptors" can distinguish objects from one another in your image sets.
https://en.wikipedia.org/wiki/Bag-of-words_model_in_computer_vision
Imagine you could describe the two images by the relative locations of textures and colors:
horizontal-ish line segments at far left
red blob near center left
green clumpy thing at bottom left
bright round object near top left
...
then for a reasonably constrained set of images (e.g. photos just within a certain zip code), you may be able to yield a good match between the two images above.
The Wikipedia article on BoW may look a bit daunting, but I think if you hunt around you'll find an article that describes "bag of words" for image processing clearly. I've seen a very good demo of a BoW approach used to identify objects such as boats and delivery vans in arbitrary video streams, and it worked impressively well. I wish I had a copy of the presentation to pass along.
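To make the idea a little more concrete, here is a minimal sketch of the histogram part of a bag-of-words comparison. It assumes the "visual words" (the `codebook`) already exist; they are usually obtained by clustering descriptors from many training images, which is not shown. The 2-D descriptors and all function names are made up for illustration; real descriptors such as SIFT have 128 dimensions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, codebook: List[Vector]) -> int:
    """Index of the codebook entry closest to `descriptor` (Euclidean)."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(descriptor, codebook[i]))


def bow_histogram(descriptors: List[Vector],
                  codebook: List[Vector]) -> List[float]:
    """Normalised count of how often each visual word occurs in an image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def histogram_intersection(h1: List[float], h2: List[float]) -> float:
    """Similarity in [0, 1]; 1.0 means identical word distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Toy example: two visual words, two images with three descriptors each.
codebook = [(0.0, 0.0), (10.0, 10.0)]
image_a = bow_histogram([(0, 1), (1, 0), (9, 10)], codebook)
image_b = bow_histogram([(1, 1), (10, 9), (11, 10)], codebook)
print(image_a, image_b, histogram_intersection(image_a, image_b))
```

Note that the histogram discards where in the image each word occurred, which is what makes the approach tolerant of different viewpoints and also what limits it.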
If you don't expect the scene to change much, you could try the standard first step of any structure-from-motion algorithm to establish a notion of similarity between a pair of images. A pair of images is considered similar if it contains a number of matching image features, larger than some threshold, that also satisfy the geometric constraint of the scene. For a general scene, that geometric constraint is given by a fundamental matrix F computed using a subset of the matching features.
Here are the steps. I have inserted the OpenCV method for each step, but you could write your own methods too:
Read the pair of images. Use img = cv2.imread(filename).
Use SIFT/SURF to detect image features/descriptors in both images (in OpenCV 4.4 and later, SIFT is also available directly as cv2.SIFT_create()).
sift = cv2.xfeatures2d.SIFT_create()
kp, des = sift.detectAndCompute(img,None)
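Match features using the descriptors. SIFT/SURF descriptors are floating-point vectors, so use the L2 norm (NORM_HAMMING is meant for binary descriptors such as ORB).
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = bf.match(des1,des2)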
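Use RANSAC to compute the fundamental matrix, where pts1 and pts2 are the (x, y) coordinates of the matched keypoints in each image.
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3, 0.99)
mask marks the inliers (1 for an inlier, 0 for an outlier). Simply count them to determine if the number of matches satisfying the geometric constraint is large enough.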
CAUTION: In case of a planar scene, we use homography instead of a fundamental matrix and the steps described above work out pretty nicely because homography takes a point to a corresponding point in the other image. However, Fundamental matrix takes a point to the corresponding epipolar line in the other image, which makes the entire process a bit less stable. So I would recommend trying these steps a few more times with a little bit of jitter to the feature locations and collating the evidence over more than one trial to make the decision. You can also use more advanced steps to introduce robustness to this process but only if the steps described above don't yield the results you need.
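For readers wondering what the crossCheck=True option in the matching step actually does, here is a small pure-Python sketch of mutual nearest-neighbour matching, plus the final inlier count. It is an illustration rather than OpenCV's implementation: the function names, the tiny 2-D descriptors, and the `min_inliers=30` threshold are assumptions. Euclidean distance is used, which suits floating-point descriptors such as SIFT.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def nearest(query: Descriptor, candidates: List[Descriptor]) -> int:
    """Index of the candidate closest to `query` (Euclidean distance)."""
    return min(range(len(candidates)),
               key=lambda j: math.dist(query, candidates[j]))


def cross_check_matches(des1: List[Descriptor],
                        des2: List[Descriptor]) -> List[Tuple[int, int]]:
    """Keep (i, j) only if i and j are each other's nearest neighbour."""
    if not des1 or not des2:
        return []
    matches = []
    for i, d in enumerate(des1):
        j = nearest(d, des2)
        if nearest(des2[j], des1) == i:  # the match must hold both ways
            matches.append((i, j))
    return matches


def looks_similar(inlier_mask: Sequence[int], min_inliers: int = 30) -> bool:
    """Decide similarity from a flat 0/1 inlier mask (threshold is assumed)."""
    return sum(1 for m in inlier_mask if m) >= min_inliers


des_a = [(0, 0), (5, 5), (9, 9)]
des_b = [(5.1, 5), (0, 0.2), (100, 100)]
print(cross_check_matches(des_a, des_b))
```

With the OpenCV result, the mask would first need flattening, e.g. `looks_similar(mask.ravel().tolist())`. The brute-force search here is quadratic in the number of descriptors, so it is only meant to show the logic.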
Hi, I am currently using OpenCV's implementations of HOG and Haar cascades to perform pedestrian detection and draw bounding boxes on a video feed.
However, I want to assign a unique ID (number) to every pedestrian entering the video feed, with the ID remaining the same until the pedestrian leaves the feed. Since frames are processed one after another without regard to the previous frame, I wasn't sure how to implement this in the simplest but still effective way.
Do I really need to use a tracking algorithm like CamShift or a Kalman filter? I have no knowledge of those and could really use some help. Or is there a simpler way to achieve what I want?
P/S: This video is what I wanted to achieve. In fact I posted a similar question here before but that was more towards the detection techniques and this is towards the next step of assigning the unique identifier.
A simple solution:
Keep track of your objects in a vector.
When you process a new frame, for every detected object, search for the nearest object stored in your vector. If the distance between the stored object and the current object is below a certain threshold, it is the same object.
If no match is found, the object is new. At the end, delete all objects in your vector that were not associated with an object in the current frame.
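A minimal sketch of the scheme above, assuming each detection is reduced to the centre point of its bounding box. The class name, the `max_distance` threshold, and the greedy one-by-one matching are choices made for this illustration only.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


class NearestCentroidTracker:
    """Assigns stable IDs by matching each detection to the nearest old one."""

    def __init__(self, max_distance: float = 50.0) -> None:
        self.max_distance = max_distance   # assumed threshold, in pixels
        self.tracks: Dict[int, Point] = {}  # id -> last known centre
        self._next_id = 0

    def update(self, detections: List[Point]) -> List[int]:
        """Return one ID per detection, in the same order as `detections`."""
        unmatched = dict(self.tracks)  # stored objects not yet claimed
        current: Dict[int, Point] = {}
        ids: List[int] = []
        for det in detections:
            best_id = min(unmatched,
                          key=lambda t: math.dist(det, unmatched[t]),
                          default=None)
            if (best_id is not None
                    and math.dist(det, unmatched[best_id]) <= self.max_distance):
                del unmatched[best_id]      # same object as before
            else:
                best_id = self._next_id     # no close match: new object
                self._next_id += 1
            current[best_id] = det
            ids.append(best_id)
        # Stored objects that were not matched in this frame are dropped.
        self.tracks = current
        return ids


tracker = NearestCentroidTracker(max_distance=20)
print(tracker.update([(10, 10), (100, 100)]))
print(tracker.update([(104, 98), (12, 11)]))
```

Because objects are dropped as soon as they go unmatched for one frame, a single missed detection gives the pedestrian a new ID. Keeping unmatched objects alive for a few frames is a common refinement, and that is roughly where a proper tracker such as a Kalman filter starts to pay off.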
When you use detectMultiScale to get the detections, you will have a std::vector<cv::Rect> containing all the detected pedestrians. While iterating through them for drawing, you can assign a number to each unique cv::Rect being detected (you may need to write a slightly deeper test for this, to check for overlapping rectangles), which you can then draw (let's say on top) of the corresponding rectangle.
HTH
I have a simple photograph that may or may not include a logo image. I'm trying to identify whether a picture includes the logo shape or not. The logo (rectangular shape with a few extra features) could be of various sizes and could have multiple occurrences. I'd like to use Computer Vision techniques to identify the location of these logo occurrences. Can someone point me in the right direction (algorithm, technique?) that can be used to achieve this goal?
I'm quite a novice in Computer Vision, so any direction would be much appreciated.
Thanks!
Practical issues
Since you need a scale-invariant method (that's the proper jargon for "could be of various sizes") SIFT (as mentioned in Logo recognition in images, thanks overrider!) is a good first choice, it's very popular these days and is worth a try. You can find here some code to download. If you cannot use Matlab, you should probably go with OpenCV. Even if you end up discarding SIFT for some reason, trying to make it work will teach you a few important things about object recognition.
General description and lingo
This section is mostly here to introduce you to a few important buzzwords, by describing a broad class of object detection methods, so that you can go and look these things up. Important: there are many other methods that do not fall in this class. We'll call this class "feature-based detection".
So first you go and find features in your image. These are characteristic points of the image (corners and line crossings are good examples) that have a lot of invariances: whatever reasonable processing you do to your image (scaling, rotation, brightness change, adding a bit of noise, etc.), it will not change the fact that there is a corner at a certain point. "Pixel value" or "vertical lines" are bad features. Sometimes a feature will include some numbers (e.g. the prominence of a corner) in addition to a position.
Then you do some clean-up, like remove features that are not strong enough.
Then you go to your database. That's something you've built in advance, usually by taking several nice and clean images of whatever you are trying to find, running your feature detection on them, cleaning things up, and arranging them in some data structure for your next stage:
Look-up. You have to take a bunch of features from your image and try to match them against your database: do they correspond to an object you are looking for? This is pretty non-trivial, since on the face of it you have to consider all subsets of the bunch of features you've found, which is exponential. So there are all kinds of smart voting and hashing techniques to do it, like the Hough transform and geometric hashing.
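To give a flavour of how Hough-style voting avoids the exponential search, here is a deliberately simplified sketch: every tentative (model feature, image feature) pairing votes for where the object's origin would be, and a location that collects many votes is a candidate. It handles translation only (scale and rotation are ignored), and the function name and `bin_size` are assumptions for this example.

```python
from collections import Counter
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def vote_for_location(pairs: List[Tuple[Point, Point]],
                      bin_size: float = 10.0
                      ) -> Tuple[Optional[Point], int]:
    """Each (model_point, image_point) pair votes for the object's origin.

    Returns (corner of the winning bin, number of votes in it).
    """
    votes: Counter = Counter()
    for (mx, my), (ix, iy) in pairs:
        # Where the model's (0, 0) would land if this pairing were correct.
        ox, oy = ix - mx, iy - my
        votes[(int(ox // bin_size), int(oy // bin_size))] += 1
    if not votes:
        return None, 0
    (bx, by), count = votes.most_common(1)[0]
    return (bx * bin_size, by * bin_size), count


# Three pairings agree on an origin near (50, 30); one is a false match.
pairs = [((0, 0), (52, 31)), ((10, 5), (63, 37)),
         ((20, 0), (71, 33)), ((5, 5), (200, 10))]
print(vote_for_location(pairs))
```

Each pairing is processed once, so the cost is linear in the number of pairings. One known weakness of fixed bins is that consistent votes near a bin boundary can be split between neighbouring bins.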
Now you should do some verification. You have found some places in the image which are suspect: it's probable that they contain your object. Usually, you know what is the presumed size, orientation, and position of your object, and you can use something simple (like a convolution) to check if it's really there.
You end up with a bunch of probabilities, basically: for a few locations, how probable it is that your object is there. Here you do some outlier detection. If you expect only 1-2 occurrences of your object, you'll look for the largest probabilities that stand out, and take only these points. If you expect many occurrences (like face detection on a photo of a bunch of people), you'll look for very low probabilities and discard them.
That's it, you are done!
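The two selection strategies from the last step could look roughly like this. Both functions, the "mean plus k standard deviations" rule, and the default values of `k` and `floor` are assumptions chosen for illustration, not a standard recipe.

```python
import statistics
from typing import Dict, List, Tuple

Location = Tuple[int, int]


def standout_peaks(probs: Dict[Location, float],
                   k: float = 2.0) -> List[Location]:
    """Few occurrences expected: keep locations far above the typical score."""
    if len(probs) < 2:
        return list(probs)
    mean = statistics.mean(probs.values())
    spread = statistics.pstdev(probs.values())
    return [loc for loc, p in probs.items() if p > mean + k * spread]


def drop_weak(probs: Dict[Location, float],
              floor: float = 0.2) -> List[Location]:
    """Many occurrences expected: discard only the clearly unlikely ones."""
    return [loc for loc, p in probs.items() if p >= floor]


# Hypothetical verification scores at six candidate locations.
probs = {(0, 0): 0.10, (1, 0): 0.12, (2, 0): 0.09,
         (3, 0): 0.11, (4, 0): 0.10, (5, 0): 0.95}
print(standout_peaks(probs))
print(drop_weak(probs, floor=0.1))
```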