Does anyone know of a good C++ or Python package that gets an RGB image an input and can detect the position of a human's hand palm in it? The performance would be based on how cluttered the scene might be or speed as well as robustness to lighting besides ease of use. Even better, but not necessarily, being able to classify hand gesture as well.
Related
I am trying to build a solution where I could differentiate between a 3D textured surface with the height of around 200 micron and a regular text print.
The following image is a textured surface. The black color here is the base surface.
Regular text print will be the 2D print of the same 3D textured surface.
[EDIT]
Initial thought about solving this problem, could look like this:
General idea here would be, images shot at different angles of a 3D object would be less related to each other than the images shot for a 2D object in the similar condition.
One of the possible way to verify could be: 1. Take 2 images, with enough light around (flash of the camera). These images should be shot at as far angle from the object plane as possible. Say, one taken at camera making 45 degree at left side and other with the same angle on the right side.
Extract the ROI, perspective correct them.
Find GLCM of the composite of these 2 images. If the contrast of the GLCM is low, then it would be a 3D image, else a 2D.
Please pardon the language, open for edit suggestion.
General idea here would be, images shot at different angles of a 3D object would be less related to each other than the images shot for a 2D object in the similar condition.
One of the possible way to verify could be:
1. Take 2 images, with enough light around (flash of the camera). These images should be shot at as far angle from the object plane as possible. Say, one taken at camera making 45 degree at left side and other with the same angle on the right side.
Extract the ROI, perspective correct them.
Find GLCM of composite of these 2 images. If contrast of the GLCM is low, then it would be a 3D image, else a 2D.
Please pardon the language, open for edit suggestion.
If you can get another image which
different angle or
sharper angle or
different lighting condition
you may get result. However, using two image with different angle with calibrate camera can get stereo vision image which solve your problem easily.
This is a pretty complex problem and there is no plug-in-and-go solution for this. Using light (structured or laser) or shadow to detect a height of 0.2 mm will almost surely not work with an acceptable degree of confidence, no matter of how much "photos" you take. (This is just my personal intuition, in computer vision we verify if something works by actually testing).
GLCM is a nice feature to describe texture, but it is, as far as I know, used to verify if there is a pattern in the texture, so, I believe it would output a positive value for 2D print text if there is some kind of repeating pattern.
I would let the computer learn what is text, what is texture. Just extract a large amount of 3D and 2D data, and use a machine learning engine to learn which is what. If the feature space is rich enough, it may be able to find a way to differentiate one from another, in a way our human mind wouldn't be able to. The feature space should consist of edge and colour features.
If the system environment is stable and controlled, this approach will work specially well, since the training data will be so similar to the testing data.
For this problem, I'd start by computing colour and edge features (local image pixel sums over different edge and colour channels) and try a boosted classifier. Boosted classifiers aren't the state of the art when it comes to machine learning, but they are good at not overfitting (meaning you can just insert as much data as you want), and will most likely work in a stable environment.
Hope this helps,
Good luck.
I'd like to create a system for use in a factory to measure the size of the objects coming off the assembly line. The objects are slabs of stone, approximately rectangular, and I'd like the width and height. Each stone is photographed in the same position with a flash, so the conditions are pretty controlled. The tricky part is the stones sometimes have patterns on their surface (often marble with ripples and streaks) and they are sometimes almost black, blending in with the shadows.
I tried simply subtracting each image from a reference image of the background, but there are enough small changes in the lighting and the positions of rollers and small bits of machinery that the output is really noisy.
The approach I plan to try next is to use Canny's edge detection algorithm and then use some kind of numerical optimization (Nelder-Mead maybe) to match a 4-sided polygon to the edges. Before I home-brew something, though, is there an existing approach that works well in this kind of situation?
If it helps, it would be possible to 'seed' the algorithm with a patch of the image known to be within the slab (they're always lined up in the corner) to help identify its surface pattern and colors. I could also produce a training set of annotated images if necessary.
Some sample images of the background and some stone slabs:
Have you tried an existing image segmentation algorithm?
I would start with the maxflow algorithm for image segmentation by Vladimir Kolmogorov here: http://pub.ist.ac.at/~vnk/software.html
In the papers they fix areas of an image to belong to a particular segment, which would help for you problem, but it may not be obvious how to do this in the software.
Deep learning algorithms for parsing scenes by Richard Socher might also help: http://www.socher.org/
And Eric Sudderth has at least one interesting method for visual scene understanding here: http://www.cs.brown.edu/~sudderth/software.html
I also haven't actually used any of this software, which is mostly, if not all, for research and not particularly user friendly.
I've built an imaging system with a webcam and feature matching such that as I move the camera around; I can track the camera's motion. I am doing something similar to here, except with the webcam frames as the input.
It works really well for "good" images, but when taking images in really low light lots of noise appears (camera high gain), and that messes with the feature detection and matching. Basically, it doesn't detect any good features, and when it does, it cannot match them correctly between frames.
Does anyone know a good solution for this? What other methods are used for finding and matching features?
Here are two example images with very low features:
I think phase correlation is going to be your best bet here. It is designed to tell you the phase shift (i.e., translation) between two images. It is much more resilient (but not immune) to noise than feature detection because it operates in frequency space; whereas, feature detectors operate spatially. Another benefit is, it is very fast when compared with feature detection methods. I have an implementation available in the OpenCV trunk that is sub-pixel accurate located here.
However, your images are pretty much "featureless" with the exception of the crease in the middle, so even phase correlation may have some trouble with it. Think of it like trying to detect translation in a snow storm. If all you can see is white, you can't tell that you have translated at all, thus the term whiteout. In your case, the algorithm might suffer from "greenout" :)
Can you adjust the camera settings to work better in low-light conditions. Have you fully opened the iris? Can you live with lower framerates? Setting a longer exposure time will allow the camera to gather more light, thus giving you more features at the cost of adding motion blur. Or, if low-light is your default environment you probably want something designed for this like an IR camera, but those can be expensive. Other than that, a big lens and long exposures are your friend :)
Histogram equalization may be of interest in improving the image contrast. But, sometimes it can just enhance the noise. OpenCV has a global histogram equalization function called equalizeHist. For a more localized implementation, you'll want to look at Contrast Limited Adaptive Histogram Equalization or CLAHE for short. Here is a good article on it. This page has some nice examples, and some code.
what approach would you recommend for finding obstacles in a 2D image?
Here are some key points I came up with till now:
I doubt I can use object recognition based on "database of obstacles" search, since I don't know what might the obstruction look like.
I assume color recognition might be problematic if the path does not differ a lot from the object itself.
Possibly, adding one more camera and computing a 3D image (like a Kinect does) would work, but that would not run as smooth as I require.
To illustrate the problem; robot can ride either left or right side of the pavement. In the following picture, left side is the correct choice:
If you know what the path looks like, this is largely a classification problem. Acquire a bunch of images of path at different distances, illumination, etc. and manually label the ground in each image. Use this labeled data to train a classifier that classifies each pixel as either "road" or "not road." Depending upon the texture of the road, this could be as simple as classifying each pixels' RGB (or HSV) values or using OpenCv's built-in histogram back-projection (i.e. cv::CalcBackProjectPatch()).
I suggest beginning with manual thresholds, moving to histogram-based matching, and only using a full-fledged machine learning classifier (such as a Naive Bayes Classifier or a SVM) if the simpler techniques fail. Once the entire image is classified, all pixels that are identified as "not road" are obstacles. By classifying the road instead of the obstacles, we completely avoided building a "database of objects".
Somewhat out of the scope of the question, the easiest solution is to add additional sensors ("throw more hardware at the problem!") and directly measure the three-dimensional position of obstacles. In order of preference:
Microsoft Kinect: Cheap, easy, and effective. Due to ambient IR light, it only works indoors.
Scanning Laser Rangefinder: Extremely accurate, easy to setup, and works outside. Also very expensive (~$1200-10,000 depending upon maximum range and sample rate).
Stereo Camera: Not as good as a Kinect, but it works outside. If you cannot afford a pre-made stereo camera (~$1800), you can make a decent custom stereo camera using USB webcams.
Note that professional stereo vision cameras can be very fast by using custom hardware (Stereo On-Chip, STOC). Software-based stereo is also reasonably fast (10-20 Hz) on a modern computer.
I am currently helping a friend working on a geo-physical project, I'm not by any means a image processing pro, but its fun to play
around with these kinds of problems. =)
The aim is to estimate the height of small rocks sticking out of water, from surface to top.
The experimental equipment will be a ~10MP camera mounted on a distance meter with a built in laser pointer.
The "operator" will point this at a rock, press a trigger which will register a distance along of a photo of the rock, which
will be in the center of the image.
The eqipment can be assumed to always be held at a fixed distance above the water.
As I see it there are a number of problems to overcome:
Lighting conditions
Depending on the time of day etc., the rock might be brighter then the water or opposite.
Sometimes the rock will have a color very close to the water.
The position of the shade will move throughout the day.
Depending on how rough the water is, there might sometimes be a reflection of the rock in the water.
Diversity
The rock is not evenly shaped.
Depending on the rock type, growth of lichen etc., changes the look of the rock.
Fortunateness, there is no shortage of test data. Pictures of rocks in water is easy to come by. Here are some sample images:
I've run a edge detector on the images, and esp. in the fourth picture the poor contrast makes it hard to see the edges:
Any ideas would be greatly appreciated!
I don't think that edge detection is best approach to detect the rocks. Other objects, like the mountains or even the reflections in the water will result in edges.
I suggest that you try a pixel classification approach to segment the rocks from the background of the image:
For each pixel in the image, extract a set of image descriptors from a NxN neighborhood centered at that pixel.
Select a set of images and manually label the pixels as rock or background.
Use the labeled pixels and the respective image descriptors to train a classifier (eg. a Naive Bayes classifier)
Since the rocks tends to have similar texture, I would use texture image descriptors to train the classifier. You could try, for example, to extract a few statistical measures from each color chanel (R,G,B) like the mean and standard deviation of the intensity values.
Pixel classification might work here, but will never yield a 100% accuracy. The variance in the data is really big, rocks have different colours (which are also "corrupted" with lighting) and different texture. So, one must account for global information as well.
The problem you deal with is foreground extraction. There are two approaches I am aware of.
Energy minimization via graph cuts, see e.g. http://en.wikipedia.org/wiki/GrabCut (there are links to the paper and OpenCV implementation). Some initialization ("seeds") should be done (either by a user or by some prior knowledge like the rock is in the center while water is on the periphery). Another variant of input is an approximate bounding rectangle. It is implemented in MS Office 2010 foreground extraction tool.
The energy function of possible foreground/background labellings enforces foreground to be similar to the foreground seeds, and a smooth boundary. So, the minimum of the energy corresponds to the good foreground mask. Note that with pixel classification approach one should pre-label a lot of images to learn from, then segmentation is done automatically, while with this approach one should select seeds on each query image (or they are chosen implicitly).
Active contours a.k.a. snakes also requre some user interaction. They are more like Photoshop Magic Wand tool. They also try to find a smooth boundary, but do not consider the inner area.
Both methods might have problems with the reflections (pixel classification will definitely have). If it is the case, you may try to find an approximate vertical symmetry, and delete the lower part, if any. You can also ask a user to mark the reflaction as a background while collecting stats for graph cuts.
Color segmentation to find the rock, together with edge detection to find the top.
To find the water level I would try and find all the water-rock boundaries, and the horizon (if possible) then fit a plane to the surface of the water.
That way you don't need to worry about reflections of the rock.
Easier if you know the pitch angle between the camera and the water and if the camera is is leveled horizontally (roll).
ps. This is a lot harder than I thought - you don't know the distance to all the rocks so fitting a plane is difficult.
It occurs that the reflection is actually the ideal way of finding the level, look for symetric path edges in the rock edge detection and pick the vertex?