I'm new to the field of computer vision and I want to solve the following task (preferably with OpenCV and C#, but other solutions, e.g. with Scilab, are also gratefully welcome):
There is some reference object like a hand (more or less static) in the scene - the camera is looking down on the object. Now I want to recognize whether there is something on my hand (whether it changes the overall shape of my hand or is just something small sitting in my palm).
This task is for demonstration purposes only, hence I want to spend as little effort as possible. I'd like to train it with a static picture and use it in a real environment.
Any help, hint or steps how to tackle this problem are deeply appreciated.
Thank you in advance!
If it's largely static, then I'd recommend background subtraction. It will be very robust and blazingly fast.
You can run a Gaussian filter + thresholding (a fixed threshold, an Otsu-style global threshold, or an adaptive local threshold) to grab blobs in the difference image. The blobs will denote change and likely something new.
You can then intersect those blobs with the originally detected palm region to figure out whether anything new overlaps the hand.
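A minimal sketch of that pipeline with OpenCV's Python bindings (the question mentions C#, but OpenCvSharp exposes the same calls); file names and the area threshold are placeholders to adapt:

```python
import cv2

# Assumed inputs: a reference frame of the empty hand/background,
# the current frame, and a binary mask of the detected palm region.
reference = cv2.imread("empty_hand.png", cv2.IMREAD_GRAYSCALE)
current = cv2.imread("current_frame.png", cv2.IMREAD_GRAYSCALE)
palm_mask = cv2.imread("palm_mask.png", cv2.IMREAD_GRAYSCALE)

# Background subtraction: absolute difference between current and reference.
diff = cv2.absdiff(current, reference)

# Gaussian filter to suppress sensor noise, then Otsu thresholding
# to turn the difference image into blobs of "change".
blurred = cv2.GaussianBlur(diff, (5, 5), 0)
_, blobs = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Intersect the change blobs with the palm region: if enough changed pixels
# overlap the palm, something is probably sitting on the hand.
overlap = cv2.bitwise_and(blobs, palm_mask)
something_on_hand = cv2.countNonZero(overlap) > 200  # tune this area threshold

print("Object on hand:", something_on_hand)
```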
I have a problem statement to recognize 10 classes of different variations (variations in color and size) of the same object (a bottle cap) while it is falling, taking into account that the camera sees different viewpoints of the object. I have split this into sub-tasks:
1) Trained a deep learning model to classify only the flat surface of the object - and was successful in this attempt.
[Image: flat faces of 2 sample classes]
2) Instead of taking the fall into account, trained a model for possible perspective changes - not successful.
[Image: perspective changes of 2 sample classes]
What are the approaches to recognize the object even under perspective changes? I am not constrained to a single-camera solution, and I'm open to ideas for approaching this problem of variable perspectives.
Any help would be really appreciated. Thanks in advance!
The answer I want to give you is: CapsNets
You should definitely check out the paper, where you will be introduced to some shortcomings of CNNs and how they tried to fix them.
That said, I find it hard to believe that your architecture cannot solve the problem successfully when the perspective changes. Is your dataset extremely small? I'd expect the neural network to learn filters for the ridged edges, which can be seen from all perspectives.
If you're not limited to one camera you could try to train a "normal" classifier, to which you feed multiple images in production and average the predictions. Or you could build an architecture that takes in multiple perspectives at once. You have to try for yourself what works best.
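A tiny sketch of the multi-view averaging idea, assuming a hypothetical model object with a predict() method that returns class probabilities (e.g. a Keras-style classifier; the names are placeholders):

```python
import numpy as np

def classify_cap(model, views):
    """views: list of preprocessed images of the same cap from different cameras."""
    batch = np.stack(views)         # shape: (n_views, H, W, C)
    probs = model.predict(batch)    # shape: (n_views, n_classes)
    avg = probs.mean(axis=0)        # average the predictions over the viewpoints
    return int(np.argmax(avg))      # final class decision
```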
Also, never underestimate the power of old school image preprocessing. If you have 3 different perspectives, you could take the one that comes closest to the "flat" perspective. This is probably as easy as using the image with the largest colored area, where img.sum() is the highest.
Another idea is to figure out the color through explicit programming, which should be fairly easy and then feed the network a grayscale image. Maybe your network is confused by the strong correlation of the color and ignores the shape altogether.
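A rough sketch of those two preprocessing ideas (picking the most "flat-looking" view by colored area, and feeding the network grayscale so it cannot lean on color); the background threshold and input size are guesses to tune:

```python
import cv2
import numpy as np

def flattest_view(views):
    """Pick the view with the largest non-background area as a proxy
    for the most frontal ('flat') perspective."""
    def colored_area(img):
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # Assumes a bright background; the threshold value is a guess.
        _, mask = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY_INV)
        return cv2.countNonZero(mask)
    return max(views, key=colored_area)

def to_network_input(img, size=(128, 128)):
    """Strip color so the network has to rely on shape, not the cap's color."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return cv2.resize(gray, size)[..., np.newaxis] / 255.0
```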
Is it possible to compare two intensity histograms (derived from gray-scale images) and obtain a likeness factor? In other words, I'm trying to detect the presence or absence of a soccer ball in an image. I've tried feature detection algorithms (such as SIFT/SURF) but they are not reliable enough for my application. I need something very reliable and robust.
Many thanks for your thoughts everyone.
This answer (Comparing two histograms) might help you. Generally, intensity comparisons are quite sensitive, as e.g. white during daytime is different from white during nighttime.
I think you should be able to derive something from compareHist() in OpenCV (http://docs.opencv.org/doc/tutorials/imgproc/histograms/histogram_comparison/histogram_comparison.html) to suit your needs, if histogram comparison does fit your purpose.
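For example, something along these lines (constant names vary a bit between OpenCV versions; file names are placeholders):

```python
import cv2

img1 = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)

# 256-bin intensity histograms, normalized so image size doesn't matter.
h1 = cv2.calcHist([img1], [0], None, [256], [0, 256])
h2 = cv2.calcHist([img2], [0], None, [256], [0, 256])
cv2.normalize(h1, h1)
cv2.normalize(h2, h2)

# Correlation: 1.0 means identical histograms, lower means less alike.
likeness = cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL)
print("Likeness factor:", likeness)
```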
If not, this paper http://www.researchgate.net/publication/222417618_Tracking_the_soccer_ball_using_multiple_fixed_cameras/file/32bfe512f7e5c13133.pdf
tracks the ball from multiple cameras and you might get some more ideas from that even though you might not be using multiple cameras.
As kkuilla has mentioned, there are methods available to compare histograms, such as compareHist() in OpenCV.
But I am not certain it's really applicable for your program. I think you might want to use the Hough transform to detect circles instead.
More details can be seen in this paper:
https://files.nyu.edu/jb4457/public/files/research/bristol/hough-report.pdf
Look for the part about coins for the circle detection in the paper. I recall reading somewhere about doing ball detection with the Hough transform too, but I can't find it now. It should be similar for your soccer ball.
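A rough OpenCV sketch of circle detection with HoughCircles; all the parameter values are placeholders that depend heavily on how large the ball appears in your images:

```python
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
img = cv2.medianBlur(img, 5)   # smooth to reduce false circle detections

# Parameter values here need tuning for your ball size and image resolution.
circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, dp=1, minDist=50,
                           param1=100, param2=30, minRadius=10, maxRadius=60)

if circles is not None:
    for x, y, r in circles[0]:
        print("Possible ball at ({:.0f}, {:.0f}) with radius {:.0f}".format(x, y, r))
else:
    print("No ball-like circles found")
```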
This method should work. Hope this helps. Good luck(:
I'd like to create a system for use in a factory to measure the size of the objects coming off the assembly line. The objects are slabs of stone, approximately rectangular, and I'd like the width and height. Each stone is photographed in the same position with a flash, so the conditions are pretty controlled. The tricky part is the stones sometimes have patterns on their surface (often marble with ripples and streaks) and they are sometimes almost black, blending in with the shadows.
I tried simply subtracting each image from a reference image of the background, but there are enough small changes in the lighting and the positions of rollers and small bits of machinery that the output is really noisy.
The approach I plan to try next is to use Canny's edge detection algorithm and then use some kind of numerical optimization (Nelder-Mead maybe) to match a 4-sided polygon to the edges. Before I home-brew something, though, is there an existing approach that works well in this kind of situation?
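For reference, the kind of home-brew I have in mind would look roughly like this (untested sketch; cv2.minAreaRect stands in for the numerical polygon fit, and the Canny thresholds are guesses):

```python
import cv2
import numpy as np

img = cv2.imread("slab.png", cv2.IMREAD_GRAYSCALE)

# Edge detection; the two thresholds would need tuning for the dark slabs
# and the patterned marble surfaces.
edges = cv2.Canny(img, 50, 150)
edges = cv2.dilate(edges, np.ones((5, 5), np.uint8))  # close small gaps

# Take the largest contour and fit a rotated rectangle to it,
# standing in for the 4-sided polygon fit (OpenCV 4 return signature).
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    slab = max(contours, key=cv2.contourArea)
    (cx, cy), (w, h), angle = cv2.minAreaRect(slab)
    print("Width x height (pixels): {:.1f} x {:.1f}".format(w, h))
```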
If it helps, it would be possible to 'seed' the algorithm with a patch of the image known to be within the slab (they're always lined up in the corner) to help identify its surface pattern and colors. I could also produce a training set of annotated images if necessary.
Some sample images of the background and some stone slabs:
Have you tried an existing image segmentation algorithm?
I would start with the maxflow algorithm for image segmentation by Vladimir Kolmogorov here: http://pub.ist.ac.at/~vnk/software.html
In the papers they fix areas of an image to belong to a particular segment, which would help for your problem, but it may not be obvious how to do this in the software.
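If that turns out to be too fiddly, OpenCV's grabCut() is a more packaged graph-cut-based segmenter that accepts exactly this kind of seeding; a rough sketch (the seed rectangle coordinates are placeholders, e.g. the corner region the slabs are lined up in):

```python
import cv2
import numpy as np

img = cv2.imread("slab_photo.png")
mask = np.zeros(img.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)

# Seed with a rectangle known to contain the slab.
seed_rect = (50, 50, 800, 600)  # x, y, width, height - placeholder values
cv2.grabCut(img, mask, seed_rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Pixels marked as definite or probable foreground form the slab segment.
slab_mask = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0).astype(np.uint8)
```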
Deep learning algorithms for parsing scenes by Richard Socher might also help: http://www.socher.org/
And Eric Sudderth has at least one interesting method for visual scene understanding here: http://www.cs.brown.edu/~sudderth/software.html
I also haven't actually used any of this software, which is mostly, if not all, research code and not particularly user-friendly.
I have a simple photograph that may or may not include a logo image. I'm trying to identify whether a picture includes the logo shape or not. The logo (rectangular shape with a few extra features) could be of various sizes and could have multiple occurrences. I'd like to use Computer Vision techniques to identify the location of these logo occurrences. Can someone point me in the right direction (algorithm, technique?) that can be used to achieve this goal?
I'm quite a novice to Computer Vision so any direction would be much appreciated.
Thanks!
Practical issues
Since you need a scale-invariant method (that's the proper jargon for "could be of various sizes") SIFT (as mentioned in Logo recognition in images, thanks overrider!) is a good first choice, it's very popular these days and is worth a try. You can find here some code to download. If you cannot use Matlab, you should probably go with OpenCV. Even if you end up discarding SIFT for some reason, trying to make it work will teach you a few important things about object recognition.
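To give a feel for it, a minimal SIFT matching sketch with OpenCV's Python bindings (in some OpenCV builds SIFT lives in the xfeatures2d contrib module instead); file names and the ratio-test threshold are illustrative:

```python
import cv2

logo = cv2.imread("logo.png", cv2.IMREAD_GRAYSCALE)     # clean logo template
scene = cv2.imread("photo.png", cv2.IMREAD_GRAYSCALE)   # photo to search

# SIFT keypoints + descriptors for both images.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(logo, None)
kp2, des2 = sift.detectAndCompute(scene, None)

# Match descriptors and keep only matches that pass Lowe's ratio test.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print("Good matches:", len(good))
```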
General description and lingo
This section is mostly here to introduce you to a few important buzzwords, by describing a broad class of object detection methods, so that you can go and look these things up. Important: there are many other methods that do not fall in this class. We'll call this class "feature-based detection".
So first you go and find features in your image. These are characteristic points of the image (corners and line crossings are good examples) that have a lot of invariances: whatever reasonable processing you do to your image (scaling, rotation, brightness change, adding a bit of noise, etc.) will not change the fact that there is a corner at a certain point. "Pixel value" or "vertical lines" are bad features. Sometimes a feature will include some numbers (e.g. the prominence of a corner) in addition to a position.
Then you do some clean-up, like remove features that are not strong enough.
Then you go to your database. That's something you've built in advance, usually by taking several nice and clean images of whatever you are trying to find, running your feature detection on them, cleaning things up, and arranging them in some data structure for your next stage —
Look-up. You have to take a bunch of features from your image and try to match them against your database: do they correspond to an object you are looking for? This is pretty non-trivial, since on the face of it you have to consider all subsets of the bunch of features you've found, which is exponential. So there are all kinds of smart hashing techniques to do it, like the Hough transform and geometric hashing.
Now you should do some verification. You have found some places in the image which are suspect: it's probable that they contain your object. Usually, you know the presumed size, orientation, and position of your object, and you can use something simple (like a convolution) to check if it's really there.
You end up with a bunch of probabilities, basically: for a few locations, how probable it is that your object is there. Here you do some outlier detection. If you expect only 1-2 occurrences of your object, you'll look for the largest probabilities that stand out, and take only these points. If you expect many occurrences (like face detection on a photo of a bunch of people), you'll look for very low probabilities and discard them.
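To make the look-up and verification steps concrete, here is how the matches from the SIFT sketch above could be checked with a RANSAC homography; the inlier threshold is an arbitrary illustration:

```python
import cv2
import numpy as np

# Continuing the earlier sketch: estimate where the logo sits in the scene
# and let RANSAC throw out the outlier matches.
if len(good) >= 4:  # a homography needs at least 4 correspondences
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is not None and inliers.sum() > 10:   # crude verification threshold
        h, w = logo.shape
        corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
        print("Logo found at:", cv2.perspectiveTransform(corners, H).reshape(-1, 2))
```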
That's it, you are done!
Algorithm for a drawing and painting robot
Hello
I want to write a piece of software which analyses an image, and then produces an image which captures what a human eye perceives in the original image, using a minimum of bezier path objects of varying colour and opacity.
Unlike the recent twitter super compression contest (see: stackoverflow.com/questions/891643/twitter-image-encoding-challenge), my goal is not to create a replica which is faithful to the image, but instead to replicate the human experience of looking at the image.
As an example, if the original image shows a red balloon in the top left corner, and the reproduction has something that looks like a red balloon in the top left corner then I will have achieved my goal, even if the balloon in the reproduction is not quite in the same position and not quite the same size or colour.
When I say "as perceived by a human", I mean this in a very limited sense. i am not attempting to analyse the meaning of an image, I don't need to know what an image is of, i am only interested in the key visual features a human eye would notice, to the extent that this can be automated by an algorithm which has no capacity to conceptualise what it is actually observing.
Why this unusual criteria of human perception over photographic accuracy?
This software would be used to drive a drawing and painting robot, which will be collaborating with a human artist (see: video.google.com/videosearch?q=mr%20squiggle).
Rather than treating marks made by the human which are not photographically perfect as necessarily being mistakes, the algorithm should seek to incorporate what is already on the canvas into the final image.
So relative brightness, hue, saturation, size and position are much more important than being photographically identical to the original. Maintaining the topology of the features - blocks of colour, gradients, convex and concave curves - will be more important than the exact size, shape and colour of those features.
Still with me?
My problem is that I'm suffering a little from the "when you have a hammer everything looks like a nail" syndrome. To me it seems the way to do this is using a genetic algorithm with something like the comparison of wavelet transforms (see: grail.cs.washington.edu/projects/query/) used by retrievr (see: labs.systemone.at/retrievr/) to select fit solutions.
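Something like this toy fitness function is what I'm picturing (using PyWavelets; rendering of the bezier candidates is omitted, and both images are assumed to be same-size grayscale arrays in [0, 1]):

```python
import numpy as np
import pywt

def wavelet_signature(img, levels=3):
    """Coarse Haar wavelet coefficients of a grayscale image (2D array in [0, 1])."""
    coeffs = pywt.wavedec2(img, "haar", level=levels)
    # Keep only the approximation and the coarsest detail level as the signature.
    approx, coarse_details = coeffs[0], coeffs[1]
    return np.concatenate([approx.ravel()] + [d.ravel() for d in coarse_details])

def fitness(candidate_render, target):
    """Higher is better: negative distance between wavelet signatures."""
    return -np.linalg.norm(wavelet_signature(candidate_render) - wavelet_signature(target))
```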
But the main reason I see this as the answer is that these are the techniques I know; there are probably much more elegant solutions using techniques I don't know anything about.
It would be especially interesting to take into account the ways the human vision system analyses an image, so perhaps special attention needs to be paid to straight lines and angles, high-contrast borders and large blocks of similar colours.
Do you have any suggestions for things I should read on vision, image algorithms, genetic algorithms or similar projects?
Thank you
Mat
PS. Some of the spelling above may appear wrong to you and your spellcheck. It's just international spelling variations which may differ from the standard in your country: e.g. Australian standard: colour vs American standard: color
There is a model that can be implemented as an algorithm to calculate a saliency map for an image, determining which parts of the image would get the most attention from a human.
The model is called the Itti-Koch model.
You can find a starting paper here.
And more resources and C++ source code here.
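Not the Itti-Koch implementation itself, but if you just want a saliency map to experiment with, opencv-contrib ships a simple spectral-residual detector that is easy to call; a minimal sketch:

```python
import cv2

img = cv2.imread("scene.png")
detector = cv2.saliency.StaticSaliencySpectralResidual_create()
ok, saliency_map = detector.computeSaliency(img)
if ok:
    # Bright regions are the parts of the image predicted to draw attention first.
    cv2.imwrite("saliency.png", (saliency_map * 255).astype("uint8"))
```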
I cannot answer your question directly, but you should really take a look at artist/programmer (Lisp) Harold Cohen's painting machine Aaron.
That's quite a big task. You might be interested in image vectorization (I don't know what it's called officially), which takes in rasterized images (such as pictures you take with a camera) and outputs a set of bezier lines (I think) that approximate the image you put in. Since good algorithms often output very high quality (read: complex) line sets, you'd also be interested in simplification algorithms, which can help enormously.
Unfortunately I am not next to my library, or I could recommend a number of books on perceptual psychology.
The first thing you must consider is that the physiology of the human eye is such that when we examine an image or scene, we are only capturing very small bits at a time, as our eyes dart around rapidly. Our mind pieces the different parts together to try and form a whole.
You might start by finding an algorithm for the path of an eyeball as it darts around. Perhaps it is attracted to contrast?
Next is that our eyes adjust the "exposure" depending on the context. It's like those high dynamic range images, if they were pieced together not by multiple exposures of a whole scene, but by many small images, each balanced on its own, but blended into its surroundings to form a high dynamic range.
Now there was a finding in a monkey brain that there is a single neuron that lights up if there's a diagonal line in the upper left of its field of vision. Similar neurons can be found for vertical lines, and horizontal lines in various areas of that monkey's field of vision. The "diagonalness" determines the frequency with which that neuron fires.
One might speculate that other neurons could be found and mapped to other qualities such as redness, or texturedness, and other things.
There's something humans can do that I've not seen a computer program ever able to do. It's something called "closure", where a human is able to fill in information about something that they are seeing that doesn't actually exist in the image. An example:
*
* *
Is that a triangle? If you knew in advance that it was, then you could probably make a program to connect the dots. But what if it's just dots? How can you know? I wouldn't attempt this one unless I had some really clever way of dealing with it.
There are many other facts about human perception you might be able to use. Good luck, you've not picked a straightforward task.
I think a thing that could help you in this enormous task is human involvement - I mean data. For example, you could have many people sit staring at random dots (like those from the previous post) and connect them as they see fit, and you could harness that data.