What image recognition technology is good at identifying a low-resolution object?
Specifically I want to match a low resolution object to a particular item in my database.
Specific Example:
Given a picture with bottles of wine, I want to identify which wines are in it. Here's an example picture:
I already have a database of high resolution labels to match to.
Given a high-res picture of an individual bottle of wine, it was very easy to match it to its label using Vuforia (a service for image recognition). However, the service doesn't work well for lower-resolution matching, like the bottles in the example image.
Research:
I'm new to this area of programming, so apologies for any ambiguities or obvious answers to this question. I've been researching, but there's a huge breadth of technologies out there for image recognition. Evaluating each one takes a significant amount of time, so I'll try to keep this question updated as I research them.
OpenCV: seems to be the most popular open source computer vision library. Many modules, not sure which are applicable yet.
Haar-cascade feature detection: helps with pre-processing an image by orienting a component correctly (e.g. making a wine label vertical)
OCR: good for reading text at decent resolutions - not good for low-resolution labels where a lot of text is not visible
Vuforia: a hosted service that does some types of image recognition, mostly meant for augmented-reality developers. It doesn't offer control over the algorithm and doesn't work at this kind of resolution.
Related
This might be a very broad question, so I'm sorry in advance. I'd also like to point out that I'm new to the CV field, so my insight into it is minimal.
I am trying to find correspondences between points from a FLIR image and a VIS image. I'm currently building 40x40-pixel regions around keypoints and applying the Laplacian of Gaussian (LoG) over them, then comparing the regions to find the most similar ones.
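To make that concrete, here is a minimal sketch of what I mean (assuming OpenCV and NumPy; the function and parameter names are mine, and boundary handling is omitted):

    import cv2
    import numpy as np

    # Cut a 40x40 patch around a keypoint and apply a Laplacian of Gaussian
    # (a Gaussian blur followed by a Laplacian).
    def log_patch(img, kp_x, kp_y, size=40, sigma=2.0):
        half = size // 2
        patch = img[kp_y - half:kp_y + half, kp_x - half:kp_x + half]
        blurred = cv2.GaussianBlur(patch.astype(np.float32), (0, 0), sigma)
        return cv2.Laplacian(blurred, cv2.CV_32F)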
For example, I have these data sets:
Where the columns represent, in this order:
the image for which I'm trying to find a correspondent
the candidate images
the LoG of the first column
the LoG of the second column
It is very clear to the human eye that the third image is the best match for the first set, while the first image is the best match for the second set.
I have tried various ways of expressing a similarity/dissimilarity between these images, such as SSD, cross-correlation, and mutual information, but they all fail to be consistent (each works only in some cases).
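For reference, here are minimal NumPy sketches of the three measures I tried (the implementations are mine, just to pin down the definitions):

    import numpy as np

    def ssd(a, b):
        # Sum of squared differences: lower means more similar.
        return np.sum((a.astype(float) - b.astype(float)) ** 2)

    def ncc(a, b):
        # Normalized cross-correlation: higher means more similar.
        a = (a - a.mean()) / (a.std() + 1e-9)
        b = (b - b.mean()) / (b.std() + 1e-9)
        return np.mean(a * b)

    def mutual_information(a, b, bins=32):
        # Mutual information from a joint histogram: higher means more similar.
        hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
        p = hist / hist.sum()
        px, py = p.sum(axis=1), p.sum(axis=0)
        nz = p > 0
        return np.sum(p[nz] * np.log(p[nz] / (px[:, None] * py[None, :])[nz]))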
Now, my actual question is:
What should I use to express the similarity between images in a more semantic way, such that shapes would be more important in deciding the best match, rather than actual intensities of the pixels? Do you know of any technique that would aid me in my quest of finding these matches?
Thank you!
Note: I'm using OpenCV with Python right now, but the programming language and library is not important.
I'd like to find images, from a database I have, using a webcam.
Specifically, I'd like to set up a "price kiosk" where people can walk up with an item, put it in front of the camera, and have it search the database for the price. For several reasons (ease of use being the most important) I don't want to use the barcodes on the products.
The items are relatively easy to scan (they are, for practical purposes, 2D: they are comic books). I have all the covers already scanned. So what I'd like is some way to take the image from the webcam and use it as a source for the search. Of course the image will be distorted (angle, focus, resolution, lighting, rotation, etc). This isn't a problem for Google Goggles (Google Images really), as I've scanned comic book covers in a number of conditions and it's able to find them.
Now, I've been doing some research. I've seen pretty awesome things done with OpenCV, which makes me think this shouldn't be too difficult to implement, especially considering my dataset (about 2000 different products) is much smaller than Google's.
What am I looking for, specifically? Object detection, recognition, features...? I'm confused and I don't even know where to start.
Read up on SIFT (Scale-Invariant Feature Transform).
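A minimal sketch with OpenCV's Python bindings (the file names are placeholders; in practice you would match the webcam frame against every cover, or index the descriptors with something like FLANN):

    import cv2

    cover = cv2.imread("cover_scan.png", cv2.IMREAD_GRAYSCALE)
    frame = cv2.imread("webcam_frame.png", cv2.IMREAD_GRAYSCALE)

    # Detect keypoints and compute SIFT descriptors for both images.
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(cover, None)
    kp2, des2 = sift.detectAndCompute(frame, None)

    # Match descriptors and keep only those that pass Lowe's ratio test.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    # Rank each database cover by its number of good matches.
    print(len(good), "good matches for this cover")

SIFT features are invariant to scale and rotation and fairly robust to the lighting and angle distortions you list, which is why it fits this use case.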
What is a good feature extraction algorithm for images consisting largely of text (possibly rotated and scaled)?
An example use-case would be that I scan a document, extract features from it, and then match the features to those of frames from a video of a desk to find the time when the document was sitting on the desk.
To be more precise, I am aware that numerous feature extraction algorithms exist, but I'm wondering if there are any that can take advantage of the prevalence of text in an image (high contrast, many corners, etc.) and then find an occurrence of that image (possibly affine-transformed in some way) in a larger, non-text-only image.
You should definitely look at the Locally Likely Arrangement Hashing (LLAH) method (a.k.a. geometrical features), which is used precisely for camera-based document image retrieval.
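Not a full LLAH implementation, but a sketch of the affine invariant at its core (the point coordinates below are made up): ratios of triangle areas survive any affine transform, because an affine map scales all areas by the same factor.

    import numpy as np

    def tri_area(p, q, r):
        # Absolute area of the triangle pqr.
        return 0.5 * abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))

    def affine_invariant(a, b, c, d):
        # Ratio of two triangle areas built from the same point quadruple;
        # the common scaling factor of an affine map cancels out.
        return tri_area(a, b, c) / tri_area(a, b, d)

    # E.g. word centroids extracted from a binarized document image.
    pts = np.array([[10, 12], [40, 15], [22, 48], [55, 40]], dtype=float)
    print(affine_invariant(*pts))

LLAH quantizes many such invariants computed from each keypoint's nearest neighbours and uses them as hash keys for indexing and retrieval.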
I am building an iOS app that, as a key feature, incorporates image matching. The problem is that the images I need to recognize are small 10x10 orienteering plaques with simple, large text on them. They can be quite reflective and will be outside (so the light conditions will be variable). Sample image
There will be up to 15 of these images in the pool, and really all I need to detect is the text, in order to log where the user has been.
The problem I am facing is that the image-matching software I have tried (Aurasma and, slightly more successfully, ARLabs) can't distinguish between them, as it is primarily built to work with detailed images.
I need to accurately detect which plaque is being scanned, and I have considered using GPS to refine the selection, but the only reliable way I have found is to get the user to enter the text manually. One of the key attractions we have based the product around is being able to detect these plaques that are already in place, without having to set up any additional material.
Can anyone suggest a piece of software that would work (and is iOS-friendly), or a method of detection that would be effective and interactive/pleasing for the user?
Sample environment:
http://www.orienteeringcoach.com/wp-content/uploads/2012/08/startfinishscp.jpeg
The environment can change substantially; basically, the plaques are anywhere one could be positioned: fences, walls, and posts, in either wooded or open areas, but overwhelmingly outdoors.
I'm not an iOS programmer, but I will try to answer from an algorithmic point of view. Essentially, you have a detection problem ("Where is the plaque?") and a classification problem ("Which one is it?"). Asking the user to keep the plaque in a pre-defined region is certainly a good idea. This solves the detection problem, which is often harder to solve with limited resources than the classification problem.
For classification, I see two alternatives:
The classic "Computer Vision" route would be feature extraction and classification. Local Binary Patterns and HOG are feature extractors known to be fast enough for mobile (the former more than the latter), and they are not too complicated to implement. Classifiers, however, are non-trivial, and you would probably have to search for an appropriate iOS library.
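A rough sketch of that route, assuming OpenCV (the HOG parameters and the nearest-neighbour "classifier" are placeholder choices, not a recommendation):

    import cv2
    import numpy as np

    # winSize, blockSize, blockStride, cellSize, nbins.
    hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)

    def hog_vector(img):
        # Resize every plaque image to a fixed size so HOG vectors are comparable.
        return hog.compute(cv2.resize(img, (64, 64))).ravel()

    def classify(query, references):
        # references: {plaque label -> precomputed HOG vector}
        q = hog_vector(query)
        return min(references, key=lambda k: np.linalg.norm(references[k] - q))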
Alternatively, you could try to binarize the image, i.e. classify pixels as "plaque"/white or "text"/black. Then you can use an error-tolerant similarity measure for comparing your binarized image with a binarized reference image of the plaque. The chamfer distance is a good candidate: it essentially boils down to comparing the distance transforms of your two binarized images, and it is more tolerant to misalignment than comparing the binary images directly. The distance transforms of the reference images can be pre-computed and stored on the device.
Personally, I would try the second approach. A (non-mobile) prototype of it is relatively easy to code and evaluate with a good image processing library (OpenCV, MATLAB + Image Processing Toolbox, Python, etc.).
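A minimal sketch of the second approach, assuming OpenCV (how you binarize is up to you; here text pixels are 0, plaque pixels 255, and both images are the same size):

    import cv2

    def chamfer_score(binary_query, binary_reference):
        # Distance transform of the reference: each pixel holds the distance
        # to the nearest text (zero) pixel. Precompute this once per plaque.
        dist = cv2.distanceTransform(binary_reference, cv2.DIST_L2, 3)
        # Mean distance from each text pixel of the query to the nearest
        # text pixel of the reference: lower means more similar.
        return dist[binary_query == 0].mean()

You would compute chamfer_score against every stored reference and pick the smallest.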
I managed to find a solution that is working quite well. It's not fully optimized yet, but I think it's just a matter of tweaking filters, as I'll explain below.
Initially I tried to set up OpenCV, but it was very time-consuming with a steep learning curve; it did, however, give me an idea. The key to my problem is really detecting the characters within the image and ignoring the background, which is basically just noise. OCR was designed exactly for this purpose.
I found the free library Tesseract (https://github.com/ldiqual/tesseract-ios-lib) easy to use and plenty customizable. At first the results were very random, but applying a sharpening filter, a monochromatic filter, and a color invert worked well to clean up the text. Next I marked out a target area on the UI and used it to cut out the rectangle of the image to process. Processing large images is slow, and this cropping cut the time dramatically. The OCR filter allowed me to restrict the allowable characters, and as the plaques follow a standard configuration this improved the accuracy.
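For anyone wanting to prototype the same pipeline on the desktop first, here is roughly what it looks like with OpenCV and the pytesseract wrapper (the crop coordinates and character whitelist below are placeholders):

    import cv2
    import pytesseract  # desktop wrapper around the same Tesseract engine

    frame = cv2.imread("plaque.jpg")
    roi = frame[100:300, 150:450]  # the marked-out target area

    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    # Sharpen (unsharp mask), then threshold to an inverted monochrome image,
    # mirroring the sharpen / monochrome / invert filters described above.
    sharp = cv2.addWeighted(gray, 1.5, cv2.GaussianBlur(gray, (0, 0), 3), -0.5, 0)
    _, mono = cv2.threshold(sharp, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Restrict the character set, since the plaques follow a standard format.
    text = pytesseract.image_to_string(
        mono, config="--psm 7 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    print(text.strip())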
So far it's been successful with the grey-background plaques, but I haven't found the correct filter for the red and white editions. My goal is to add color detection and remove the need to feed in the plaque type.
I am working on a project, part of which is to recognize objects recorded by a camera. To be more specific:
I am using OpenCV
I have correctly set up the camera and am able to retrieve pictures from it
I have compiled and experimented with a number of demos from OpenCV
I need a scale- AND rotation- invariant algorithm for detection
Pictures of original objects are ONLY available as edge-images
All the feature detection/extraction/matching algorithms I have seen so far work reasonably well with grayscale images (like photos). However, due to my project specs I need to work with edge images (like the output of a Canny edge detector), which are typically black-and-white and contain only the edges found in the image. In this case the performance of the algorithms I tried (SURF, SIFT, MSER, etc.) decreases dramatically.
So the actual question is: has anyone come across an algorithm specifically for matching edge images, or is there a setup that can improve the performance of SIFT/SURF etc. on this kind of input?
I would appreciate any advice or links to any relevant resources.
PS: this is my first question on Stack Overflow.
Edge images have a problem: The information they contain about the objects of interest is very, very scarce.
So, a general algorithm to classify edge images is probably not to be found. However, if your images are simple, clear, and specific, you can employ a number of techniques to classify them, among them: finding contours and selecting by shape, area, position, or tracking.
A good list of shape properties (from the MATLAB help site) includes:
'Area', 'BoundingBox', 'Centroid', 'ConvexArea', 'ConvexHull', 'ConvexImage', 'Eccentricity', 'EquivDiameter', 'EulerNumber', 'Extent', 'Extrema', 'FilledArea', 'FilledImage', 'Image', 'MajorAxisLength', 'MinorAxisLength', 'Orientation', 'Perimeter', 'PixelIdxList', 'PixelList', 'Solidity', 'SubarrayIdx'
An important precondition for using shapes in your algorithm is being able to select them individually: shape analysis is very sensitive to noise, overlap, etc.
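Since you are using OpenCV, a minimal sketch of this kind of contour-and-shape filtering could look like this (the file name and the thresholds are placeholder examples):

    import cv2

    edges = cv2.imread("edges.png", cv2.IMREAD_GRAYSCALE)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    for c in contours:
        area = cv2.contourArea(c)
        perimeter = cv2.arcLength(c, True)
        x, y, w, h = cv2.boundingRect(c)
        extent = area / (w * h) if w * h else 0.0
        hull_area = cv2.contourArea(cv2.convexHull(c))
        solidity = area / hull_area if hull_area else 0.0
        # Select/classify contours by these descriptors, e.g. keep solid blobs.
        if area > 100 and solidity > 0.9:
            print(area, perimeter, extent, solidity)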
Update
I found a paper that may be interesting in this context: an object classifier that uses only shape information, which can be applied to Canny edge images. It sounds like it could be your solution:
http://www.vision.ee.ethz.ch/publications/papers/articles/eth_biwi_00664.pdf