I am building an iOS app that, as a key feature, incorporates image matching. The problem is the images I need to recognize are small orienteering 10x10 plaques with simple large text on them. They can be quite reflective and will be outside(so the light conditions will be variable). Sample image
There will be up to 15 of these types of image in the pool and really all I need to detect is the text, in order to log where the user has been.
The problem I am facing is that with the image matching software I have tried, aurasma and slightly more successfully arlabs, they can't distinguish between them as they are primarily built to work with detailed images.
I need to accurately detect which plaque is being scanned and have considered using gps to refine the selection but the only reliable way I have found is to get the user to manually enter the text. One of the key attractions we have based the product around is being able to detect these images that are already in place and not have to set up any additional material.
Can anyone suggest a piece of software that would work(as is iOS friendly) or a method of detection that would be effective and interactive/pleasing for the user.
Sample environment:
http://www.orienteeringcoach.com/wp-content/uploads/2012/08/startfinishscp.jpeg
The environment can change substantially, basically anywhere a plaque could be positioned they are; fences, walls, and posts in either wooded or open areas, but overwhelmingly outdoors.
I'm not an iOs programmer, but I will try to answer from an algorithmic point of view. Essentially, you have a detection problem ("Where is the plaque?") and a classification problem ("Which one is it?"). Asking the user to keep the plaque in a pre-defined region is certainly a good idea. This solves the detection problem, which is often harder to solve with limited resources than the classification problem.
For classification, I see two alternatives:
The classic "Computer Vision" route would be feature extraction and classification. Local Binary Patterns and HOG are feature extractors known to be fast enough for mobile (the former more than the latter), and they are not too complicated to implement. Classifiers, however, are non-trivial, and you would probably have to search for an appropriate iOs library.
Alternatively, you could try to binarize the image, i.e. classify pixels as "plate" / white or "text" / black. Then you can use an error-tolerant similarity measure for comparing your binarized image with a binarized reference image of the plaque. The chamfer distance measure is a good candidate. It essentially boils down to comparing the distance transforms of your two binarized images. This is more tolerant to misalignment than comparing binary images directly. The distance transforms of the reference images can be pre-computed and stored on the device.
Personally, I would try the second approach. A (non-mobile) prototype of the second approach is relatively easy to code and evaluate with a good image processing library (OpenCV, Matlab + Image Processing Toolbox, Python, etc).
I managed to find a solution that is working quite well. Im not fully optimized yet but I think its just tweaking filters, as ill explain later on.
Initially I tried to set up opencv but it was very time consuming and a steep learning curve but it did give me an idea. The key to my problem is really detecting the characters within the image and ignoring the background, which was basically just noise. OCR was designed exactly for this purpose.
I found the free library tesseract (https://github.com/ldiqual/tesseract-ios-lib) easy to use and with plenty of customizability. At first the results were very random but applying sharpening and monochromatic filter and a color invert worked well to clean up the text. Next a marked out a target area on the ui and used that to cut out the rectangle of image to process. The speed of processing is slow on large images and this cut it dramatically. The OCR filter allowed me to restrict allowable characters and as the plaques follow a standard configuration this narrowed down the accuracy.
So far its been successful with the grey background plaques but I havent found the correct filter for the red and white editions. My goal will be to add color detection and remove the need to feed in the data type.
Related
Most game anti-cheat use heuristic approach such as detecting known binaries signature or preventing third party library injection. But, Valve software use deep learning to combat cheat. Valve feed its AI with view angle, fire rate, etc. And its quite working good.
My question is, how do i make such thing but with images instead of data?
Consider this example
Not - Cheat :
Cheating :
Is it possible to make a model like that?
Well images are also just data.
You can seperate each image and its pixels into f.e. its raw numbers for like rgb.
This way you could model a network based on converted inputs of pixels from your image.
In this example, the pattern would probably just recognize huge spikes of those vibrant colors since those value will differ alot from the usual environment.
If the question aims to archive some kind of "visual cheat detection" and is not all about deep-learning, you could simply check the images pixels manually, if you know the color of your "cheat"-overlay or simply detect differences, and flag them this way.
I'm thinking of starting a project for school where I'll use genetic algorithms to optimize digital sharpening of images. I've been playing around with unsharp masking (USM) techniques in Photoshop. Basically, I want to create a software that optimizes the parameters (i.e. blur radius, types of blur, blending the image) to create the "best-fit" set of filters.
I'm sort of quickly planning this project before starting it, and I can't think of a good fitness function for the 'selection' part. How would I determine the 'quality' of the filter sets, or measure how sharp the image is?
Also, I will be programming using python (with the Python Imaging Library) since it's the only language I'm proficient with. Should I learn a low-level language instead?
Any advice/tips on anything is greatly appreciated. Thanks in advance!
tl;dr How do I measure how 'sharp' an image is?
if its for tuning parameters you could take a known image and apply a known blurring/low pass filter. Then sharpen this with your GA+USM algorithm. Calculate your fitness function making use of the original image, e.g maybe something as simple as the mean absolute error. May need to create different datasets, e.g. landscape images (mostly sharp, in focus with large depth of field), portrait images (could be large areas deliberately out of focus and "soft"), along with low noise and noisy images. Sharpening noisy images is actually quite a challenge.
It would definitely be worth taking a look at Bruce Frasier' work on sharpening techniques for Photoshop etc.
Also it might worth checking out Imatest (www.imatest.com) to see if there is anything regarding sharpness/resolution. And finally you might also consider resolution charts.
And finally I seroiusly doubt one set of ideal parameters exists for USM, the optimum parameters will be image dependant and indeed be a personal perference (thatwhy I suggest starting for a known sharp image and blurring it). Understanding the type of image is probably as important and in itself and very interesting and challenging problem. Although perhaps basic hueristics like image varinance and edge histogram would reveal suitable clues.
Anyway just a thought, hopefully some of the above is useful
Suppose I have an image file/URL, and I want my software to search it within a set of up to 100 images (or at least in that order of magnitude). The target image that the software should find should be the "same" image as the given image, but it should still be able to "forgive" slight processing on either of them (the two images may have been cropped differently, or they were compressed differently).
The question is - is this feasible a task, given that I won't have any of the images before the search is taking place (i.e., there won't be any indexing prior to the search.) Is it likely to work in subsecond time (remember that the compare set is quite small). And if feasible, which tools can I use for this task? This could be software components or even an online service (I can live with that for a proof of concept). Can OpenSURF help me here?
To focus my question further - I'm not asking which algorithms to use, at this point I would rather use an existing tool/API/service.
The target image that the software should find should be the "same" image as the given image, but it should still be able to "forgive" slight processing on either of them.
If "slight processing" doesn't involve rotation, but only "cropping", then simple cross-correlation should work, if there could be perspective correction, rotation, lens distortion correction, then things are more complicated.
I think this method is quite forgiving to slight color corrections. Anyway, you can always convert both images to grayscale and compare grayscale versions if you want.
To focus my question further - I'm not asking which algorithms to use, at this point I would rather use an existing tool/API/service.
You can start from cvMatchTemplate from OpenCV library (the link points to the C version of the API, but it's available also for C++ and Python). Use the cropped image as a template, and look for it in all your images.
If the images you compare have dark features on light backgrounds, you may benefit from using CV_TM_CCOEFF or CV_TM_CCOEFF_NORMED methods. They both subtract the average over the template area from both images. Normalized methods (CV_TM_*_NORMED) generally work better but are slower than their non-normalized counterparts.
You may consider to do some preprocessing with the images before the cross-correlation. If you normalize them first, the cross-correlation will be less sensitive to slight brightness/contrast modification. If you detect edges first, as suggested by #misha, you'll lose color/lightness information, but the results for contour overlapping will be much better.
jetxee set you off on the right track. However, if you simply use template matching, you can run into problems where the background interferes with your template matching result. For example, if your template is a building and your background is primarily light (e.g. desert sand), then the template matching will fail because the lighter background will always return a higher cross-correlation than the darker template. Here is an example of this problem.
The way you solve it is the same as what is in the link:
Perform edge-detection on both your template and the target image.
Throw original template and image away
Perform template detection using the edge-detected template and edge-detected target image
As far as forgiving slight processing, the edge detection step will take care of that. As long as the edges in the two images are not modified significantly (blurred, optically distorted), the approach will work.
I know you are not looking specifically for algorithms, but nonetheless, let me suggest the following which can accomplish exactly what you are trying to do, very efficiently...
For cropped versions of the same image, including rotation, the Fourier-Mellin transform or a log-polar transform (watch out for the artsy semi-nude drawing - good source however) will give you the translation, rotation and scale coefficients between the two images, allowing to to determine what operations were needed to go from one to the other.
I have a simple photograph that may or may not include a logo image. I'm trying to identify whether a picture includes the logo shape or not. The logo (rectangular shape with a few extra features) could be of various sizes and could have multiple occurrences. I'd like to use Computer Vision techniques to identify the location of these logo occurrences. Can someone point me in the right direction (algorithm, technique?) that can be used to achieve this goal?
I'm quite a novice to Computer Vision so any direction would be very appreciative.
Thanks!
Practical issues
Since you need a scale-invariant method (that's the proper jargon for "could be of various sizes") SIFT (as mentioned in Logo recognition in images, thanks overrider!) is a good first choice, it's very popular these days and is worth a try. You can find here some code to download. If you cannot use Matlab, you should probably go with OpenCV. Even if you end up discarding SIFT for some reason, trying to make it work will teach you a few important things about object recognition.
General description and lingo
This section is mostly here to introduce you to a few important buzzwords, by describing a broad class of object detection methods, so that you can go and look these things up. Important: there are many other methods that do not fall in this class. We'll call this class "feature-based detection".
So first you go and find features in your image. These are characteristic points of the image (corners and line crossings are good examples) that have a lot of invariances: whatever reasonable processing you do to to your image (scaling, rotation, brightness change, adding a bit of noise, etc) it will not change the fact that there is a corner in a certain point. "Pixel value" or "vertical lines" are bad features. Sometimes a feature will include some numbers (e.g. the prominence of a corner) in addition to a position.
Then you do some clean-up, like remove features that are not strong enough.
Then you go to your database. That's something you've built in advance, usually by taking several nice and clean images of whatever you are trying to find, running you feature detection on them, cleaning things up, and arrange them in some data structure for your next stage —
Look-up. You have to take a bunch of features form your image and try to match them against your database: do they correspond to an object you are looking for? This is pretty non-trivial, since on the face of it you have to consider all subsets of the bunch of features you've found, which is exponential. So there are all kinds of smart hashing techniques to do it, like Hough transform and Geometric hashing.
Now you should do some verification. You have found some places in the image which are suspect: it's probable that they contain your object. Usually, you know what is the presumed size, orientation, and position of your object, and you can use something simple (like a convolution) to check if it's really there.
You end up with a bunch of probabilities, basically: for a few locations, how probable it is that your object is there. Here you do some outlier detection. If you expect only 1-2 occurrences of your object, you'll look for the largest probabilities that stand out, and take only these points. If you expect many occurrences (like face detection on a photo of a bunch of people), you'll look for very low probabilities and discard them.
That's it, you are done!
Algorithm for a drawing and painting robot -
Hello
I want to write a piece of software which analyses an image, and then produces an image which captures what a human eye perceives in the original image, using a minimum of bezier path objects of varying of colour and opacity.
Unlike the recent twitter super compression contest (see: stackoverflow.com/questions/891643/twitter-image-encoding-challenge), my goal is not to create a replica which is faithful to the image, but instead to replicate the human experience of looking at the image.
As an example, if the original image shows a red balloon in the top left corner, and the reproduction has something that looks like a red balloon in the top left corner then I will have achieved my goal, even if the balloon in the reproduction is not quite in the same position and not quite the same size or colour.
When I say "as perceived by a human", I mean this in a very limited sense. i am not attempting to analyse the meaning of an image, I don't need to know what an image is of, i am only interested in the key visual features a human eye would notice, to the extent that this can be automated by an algorithm which has no capacity to conceptualise what it is actually observing.
Why this unusual criteria of human perception over photographic accuracy?
This software would be used to drive a drawing and painting robot, which will be collaborating with a human artist (see: video.google.com/videosearch?q=mr%20squiggle).
Rather than treating marks made by the human which are not photographically perfect as necessarily being mistakes, The algorithm should seek to incorporate what is already on the canvas into the final image.
So relative brightness, hue, saturation, size and position are much more important than being photographically identical to the original. The maintaining the topology of the features, block of colour, gradients, convex and concave curve will be more important the exact size shape and colour of those features
Still with me?
My problem is that I suffering a little from the "when you have a hammer everything looks like a nail" syndrome. To me it seems the way to do this is using a genetic algorithm with something like the comparison of wavelet transforms (see: grail.cs.washington.edu/projects/query/) used by retrievr (see: labs.systemone.at/retrievr/) to select fit solutions.
But the main reason I see this as the answer, is that these are these are the techniques I know, there are probably much more elegant solutions using techniques I don't now anything about.
It would be especially interesting to take into account the ways the human vision system analyses an image, so perhaps special attention needs to be paid to straight lines, and angles, high contrast borders and large blocks of similar colours.
Do you have any suggestions for things I should read on vision, image algorithms, genetic algorithms or similar projects?
Thank you
Mat
PS. Some of the spelling above may appear wrong to you and your spellcheck. It's just international spelling variations which may differ from the standard in your country: e.g. Australian standard: colour vs American standard: color
There is an model that can implemented as an algorithm to calculate a saliency map for an image, determining which parts of the image would get the most attention from a human.
The model is called itti koch model
You can find a startin paper here
And more resources and c++ sourcecode here
I cannot answer your question directly, but you should really take a look at artist/programmer (Lisp) Harold Cohen's painting machine Aaron.
That's quite a big task. You might be interested in image vectorizing (don't know what it's called officially), which is used to take in rasterized images (such as pictures you take with a camera) and outputs a set of bezier lines (i think) that approximate the image you put in. Since good algorithms often output very high quality (read: complex) line sets you'd also be interested in simplification algorithms which can help enormously.
Unfortunately I am not next to my library, or I could reccomend a number of books on perceptual psychology.
The first thing you must consider is the physiology of the human eye is such that when we examine an image or scene, we are only capturing very small bits at a time, as our eyes dart around rapidly. Our mind peices the different parts together to try and form a whole.
You might start by finding an algorithm for the path of an eyeball as it darts around. Perhaps it is attracted to contrast?
Next is that our eyes adjust the "exposure" depending on the context. It's like those high dynamic range images, if they were peiced together not by multiple exposures of a whole scene, but by many small images, each balanced on its own, but blended into its surroundings to form a high dynamic range.
Now there was a finding in a monkey brain that there is a single neuron that lights up if there's a diagonal line in the upper left of its field of vision. Similar neurons can be found for vertical lines, and horizontal lines in various areas of that monkey's field of vision. The "diagonalness" determines the frequency with which that neuron fires.
one might speculated that other neurons might be found and mapped to other qualities such as redness, or texturedness, and other things.
There's something humans can do that I've not seen a computer program ever able to do. it's something called "closure", where a human is able to fill in information about something that they are seeing, that doesn't actually exist in the image. an example:
*
* *
is that a triangle? If you knew that it was in advance, then you could probably make a program to connect the dots. But what if it's just dots? How can you know? I wouldn't attempt this one unless I had some really clever way of dealing with that one.
There are many other facts about human perception you might be able to use. Good luck, you've not picked a straightforward task.
i think a thing that could help you in this enormous task is human involvement. i mean data. like you could have many people sitting staring at random dots (like from the previous post) and connect them as they see right. you could harness that data.