Detect and determine the width of a human hand in OpenCV

I want to detect a human hand and determine its width. Is there a way to do it in OpenCV, or any technique to do that?
I've tried searching Google but couldn't find any solution.
My segmentation result:

As your question is too broad to be answered without writing a 10-page essay, I'll at least give you a few ideas.
Option 1:
Detect the fingertips and fit a hand model (there should be plenty of papers, open-source code and other resources available online that do hand and gesture detection). You can then position your measurement line using that hand model.
Option 2:
The dimension you are looking for should be the shortest cross-section of any hand. Place a scan line over your hand, rotate it around its centroid and measure the distance between the hand-background transitions at both ends, then pick the minimum. Make sure you have the right rotation center. You could cut the fingers off using morphological operations to move the centroid a bit further down so you don't get a wrong measurement (a rough sketch of this idea follows below).
Option 3: Estimate the width of the hand from its total size. Human proportions usually follow some rules; see if you can find a correlation between that measure and other hand features. If you don't need very exact measurements (your image resolution suggests this), this would be the quickest and easiest solution.
There are many other options. Take a ruler and your hand and start thinking, or do more research on gesture recognition in general. I'm sure you can apply many of the things used there to get your width.
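To make Option 2 a bit more concrete, here is a rough Python/OpenCV sketch of the rotating scan-line idea. It assumes you already have a binary segmentation mask like the one posted above; the filename, kernel size and angle step are placeholders to tune.

```python
import cv2
import numpy as np

# Binary mask: 255 = hand, 0 = background (e.g. your segmentation result).
mask = cv2.imread("hand_mask.png", cv2.IMREAD_GRAYSCALE)  # filename is an assumption
_, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)

# "Cut the fingers off" with a morphological opening so the centroid sits over the palm.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (25, 25))  # size depends on resolution
palm = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

m = cv2.moments(palm, binaryImage=True)
cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]

h, w = mask.shape
best = None
for angle in range(0, 180, 2):                      # rotate the scan line in 2-degree steps
    rot = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
    rotated = cv2.warpAffine(mask, rot, (w, h), flags=cv2.INTER_NEAREST)
    row = rotated[int(round(cy))] > 0               # scan line through the centroid
    col = int(round(cx))
    if not row[col]:
        continue                                    # scan line missed the hand at this angle
    left, right = col, col
    while left > 0 and row[left - 1]:               # walk to the hand-background transitions
        left -= 1
    while right < w - 1 and row[right + 1]:
        right += 1
    width = right - left + 1
    if best is None or width < best[0]:
        best = (width, angle)

if best:
    print("narrowest cross-section: %d px at %d degrees" % best)
```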

Related

Setting up a scene for image capturing and image processing

I'm building a box where the user will put their foot and then have measurements of their feet taken.
My first-tier goal is to take basic measurements and my reach goal is to build a 3D model of the person's foot.
Here are some images from my first attempts and prototyping:
back of the foot | inside of the foot | outside of the foot | top of the foot
So, my big advantage is that I have a lot of control over the scene.
I want to use this fact to set things up so I can get reliable measurements using pictures.
So my questions are as follows:
1) What is the best way to set the scene up? Right now I'm going to have a blue background, lights, and a contrasting sock to create a consistent internal image. Is there a more 'optimal' contrast to use? As you can see below, it's working decently.
2) What's an easy way for me to get reliable pixel to mm measurements? I can use a patterned sock (to increase feature density) and then two cameras from each viewpoint, but it would be great to minimize the number of cameras I need.
I'm going to leave the questions there so as not to overload this post, but if people have any other tips they would be very helpful. Thank you!
My approach to 1) would essentially be a "green screen" or "blue screen".
The idea is to carefully illuminate the background so that there are no shadows. Then you can apply a color threshold, and everything that is not that specific color is the foreground. So far there is quite a bit of shadow in your images, which you may be able to eliminate with careful lighting. You'll have to experiment with how much of an issue that is for you.
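A minimal sketch of such a color threshold in OpenCV, assuming a blue backdrop; the HSV bounds and filename are placeholders you would tune for your actual lighting:

```python
import cv2
import numpy as np

img = cv2.imread("foot.jpg")                        # filename is an assumption
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Hue/saturation/value bounds for a blue backdrop; tune for your setup.
background = cv2.inRange(hsv, np.array([100, 80, 80]), np.array([130, 255, 255]))

# Everything that is NOT the background color is treated as foreground (the foot).
foreground = cv2.bitwise_not(background)

# A small morphological opening/closing cleans up shadow speckle at the edges.
kernel = np.ones((5, 5), np.uint8)
foreground = cv2.morphologyEx(foreground, cv2.MORPH_OPEN, kernel)
foreground = cv2.morphologyEx(foreground, cv2.MORPH_CLOSE, kernel)
```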
2) This is a little tougher, but possible. You will need to know the position and direction of your cameras, the lens parameters (such as the F-number), and the sensor parameters (pixel pitch/spacing). With this information you should be able to locate the extrema of the foot and get some measurements. Here's a general diagram of how this might work: you could use the top view to locate the mid-line of the foot so you know how far it is from the side cameras. Then you have all the information you need to solve for pixel-to-real-space measurements. The top camera is easy; since everything is in a plane (assuming the camera is properly aligned and rectified), all you have to do is put a ruler on the floor and take some pictures of it. Then you can measure the pixel-to-real-space conversion directly from the image.
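For the top camera, that ruler calibration boils down to a single scale factor; all the numbers below are placeholders just to show the arithmetic:

```python
# One-off calibration: a ruler of known length lying on the floor plane.
ruler_length_mm = 300.0     # known physical length (placeholder)
ruler_length_px = 1520.0    # measured in the calibration image (placeholder)
mm_per_px = ruler_length_mm / ruler_length_px

# Afterwards, any in-plane distance from the top view converts directly.
foot_length_px = 980.0      # e.g. pixel distance between heel and toe extrema
print("foot length: %.1f mm" % (foot_length_px * mm_per_px))
```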
For your 3-d modeling issue, I'd like to point out that you don't actually have to get a full point cloud. You could just get a model of a foot and scale it for display based on the measurements you make. In any case, good luck on your project!

Arrow recognition in video

I would like to create a program that can identify arrows in a video feed and determine the direction they are pointing (left or right). My aim is to use this program with an Arduino robot in order to determine the direction in which the bot should move.
My problem is deciding which method to use. I've narrowed my options down to template matching or SURF. Template matching is good because it is not rotation invariant, so it can distinguish between left and right arrows. However, since the bot will be moving, the size of the arrow in the video feed might not match that of the template, resulting in no matches.
SURF solves this problem, but it is rotation invariant, which means that left arrows and right arrows will be considered the same thing.
Can anyone please suggest an approach I can use for this program?
Thanks in advance for any help.
P.S. I will be using OpenCV for the implementation.
I managed to solve the problem by using Canny edge detection and HoughLinesP. The system works pretty well but has a limited rotation range within which it will detect the direction correctly (approx. 15 degrees).
Basically, I first performed colour detection to detect the arrow, then used HoughLinesP to find its outline. Out of these lines I eliminated all those which are horizontal or vertical, leaving just the ones at the tip, as shown in red. I then used the end points of each line to determine the direction.
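A rough reconstruction of that pipeline in Python/OpenCV; the colour bounds, Hough parameters and angle tolerance are assumptions to adjust for the actual arrows:

```python
import cv2
import numpy as np

frame = cv2.imread("frame.png")                     # one video frame; filename is an assumption

# 1. Colour detection to isolate the arrow (bounds assume a red arrow; tune them).
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, np.array([0, 120, 80]), np.array([10, 255, 255]))

# Centroid of the arrow blob, used as the reference point for "left" vs "right".
m = cv2.moments(mask, binaryImage=True)
cx = m["m10"] / m["m00"]

# 2. Edges plus the probabilistic Hough transform to get line segments.
edges = cv2.Canny(mask, 50, 150)
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=30,
                        minLineLength=20, maxLineGap=5)

# 3. Discard near-horizontal and near-vertical segments; what remains should be
#    the two slanted lines at the arrow tip. (Assumes the arrow was found.)
tip_x = []
for x1, y1, x2, y2 in lines[:, 0]:
    angle = abs(np.degrees(np.arctan2(y2 - y1, x2 - x1))) % 180
    if min(angle, abs(angle - 90), abs(angle - 180)) < 15:
        continue                                    # roughly horizontal or vertical
    tip_x.extend([x1, x2])

# 4. The tip lines cluster on the side the arrow points towards.
print("right" if np.mean(tip_x) > cx else "left")
```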

What's a simple and efficient method for extracting line segments from a simple 2D image?

Specifically, I'm trying to extract all of the relevant line segments from screenshots of the game 'asteroids'. I've looked through the various methods for edge detection, but none seem to fit my problem for two reasons:
They detect smooth contours, whereas I just need the detection of straight line segments, and only those within a certain range of length. Now, these constraints should make my task considerably easier than the general case, but I don't want to just use a full-blown edge detector and then clear the result of curved lines, as that would be prohibitively costly. Speed is of the utmost importance for my purposes.
They output a modified image where the edges are highlighted, whereas I want a set of pixel coordinates giving the endpoints of the detected line segments. Alternatively, a list of all of the pixels included in each segment would work as well.
I have an inkling that one possible solution would involve a Hough transform, but I don't know how to use it to get the actual locations of the line segments (i.e. endpoints in pixel space). And even if I did, I have no idea whether that would be the simplest or most efficient way of doing things, hence the general wording of the question title.
Lastly, here's a sample image:
Notice that all of the major lines are similar in length and density, and that the overall image contrast is very high. I'm hoping the solution to my problem will exploit these features, because again, efficiency is paramount.
One caveat: while most of the line segments in this context are part of a polygon, I don't want a solution that relies on this fact.
Have a look at the Line Segment Detector algorithm.
The project page describes how the method works, and you can find an impressive video at the bottom of it.
There's a C implementation (which compiles with C++ compilers) that works out of the box; there are just one or two files and no additional dependencies.
But be warned: the algorithm is under the GNU Affero GPL license.
Also check out EDLines: http://ceng.anadolu.edu.tr/cv/EDLines/
It is very fast and provides very useful output.
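Regarding the Hough transform idea in the question: OpenCV's probabilistic Hough transform already returns endpoints in pixel space and lets you constrain the segment length. A minimal sketch, with the filename and all thresholds being guesses to tune:

```python
import cv2
import numpy as np

img = cv2.imread("asteroids.png", cv2.IMREAD_GRAYSCALE)  # filename is an assumption

# The game frame is nearly binary already, so a cheap threshold stands in for an edge detector.
_, edges = cv2.threshold(img, 64, 255, cv2.THRESH_BINARY)

# HoughLinesP returns (x1, y1, x2, y2) endpoints directly; minLineLength and
# maxLineGap enforce the lower end of the "certain range of length" constraint.
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=20,
                        minLineLength=15, maxLineGap=3)

if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        if np.hypot(x2 - x1, y2 - y1) <= 60:        # upper length bound, also a guess
            print((x1, y1), (x2, y2))
```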

How to align two different pictures in such a way that they match as closely as possible?

I need to automatically align an image B on top of another image A in such a way that the contents of the images match as well as possible.
The images can be shifted in x/y directions and rotated up to 5 degrees on z, but they won't be distorted (i.e. scaled or keystoned).
Maybe someone can recommend some good links or books on this topic, or share some thoughts on how such an alignment of images could be done.
If it weren't for the rotation problem, then I could simply compare rows of pixels with a brute-force method until I find a match; then I'd know the offset and could align the image.
Do I need AI for this?
I'm having a hard time finding resources on image processing which go into detail how these alignment-algorithms work.
What people often do in this case is first find points in the images that match, then compute the best transformation matrix with least squares. The point matching is not particularly simple, and oftentimes you just use human input for this task; you have to do it all the time when calibrating cameras. Anyway, if you want to fully automate this process, you can use feature extraction techniques to find matching points; there are volumes of research papers written on this topic, and any standard computer vision text will have a chapter on it. Once you have N matching points, solving for the least-squares transformation matrix is pretty straightforward and, again, can be found in any computer vision text, so I'll assume you have that covered.
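As a sketch of what that pipeline can look like in OpenCV (ORB features plus a RANSAC-robustified least-squares fit; filenames and parameter values are assumptions):

```python
import cv2
import numpy as np

img_a = cv2.imread("a.png", cv2.IMREAD_GRAYSCALE)   # reference image (filename assumed)
img_b = cv2.imread("b.png", cv2.IMREAD_GRAYSCALE)   # image to align (filename assumed)

# 1. Find matching points automatically with a feature detector/descriptor.
orb = cv2.ORB_create(2000)
kp_a, des_a = orb.detectAndCompute(img_a, None)
kp_b, des_b = orb.detectAndCompute(img_b, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des_b, des_a)
matches = sorted(matches, key=lambda m: m.distance)[:200]

pts_b = np.float32([kp_b[m.queryIdx].pt for m in matches])
pts_a = np.float32([kp_a[m.trainIdx].pt for m in matches])

# 2. Least-squares (RANSAC) estimate of the B -> A transform. estimateAffinePartial2D
#    fits rotation + translation (plus uniform scale, which is harmless here).
M, inliers = cv2.estimateAffinePartial2D(pts_b, pts_a, method=cv2.RANSAC)

# 3. Warp B onto A.
aligned_b = cv2.warpAffine(img_b, M, (img_a.shape[1], img_a.shape[0]))
```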
If you don't want to find point correspondences, you could directly optimize the rotation and translation using steepest descent. The trouble is that this objective is non-convex, so there are no guarantees you will find the correct transformation; you could do random restarts, simulated annealing or any other global optimization trick on top of this, and that would most likely work. I can't find any references to this problem, but it's basically a digital image stabilization algorithm. I had to implement it when I took computer vision, but that was many years ago; here are the relevant slides though, look at "stabilization revisited". Yes, I know those slides are terrible, I didn't make them :) However, the method for determining the gradient is quite an elegant one, since finite differences are clearly intractable.
Edit: I finally found the paper that goes over how to do this here; it's a really great paper and it explains the Lucas-Kanade algorithm very nicely. Also, this site has a whole lot of material and source code on image alignment that will probably be useful.
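If you would rather not implement the direct intensity-based optimization yourself, OpenCV's ECC alignment (findTransformECC) optimizes a Euclidean warp over pixel intensities, much in the spirit of the Lucas-Kanade approach mentioned above. A minimal sketch, with filenames as assumptions:

```python
import cv2
import numpy as np

img_a = cv2.imread("a.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
img_b = cv2.imread("b.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

warp = np.eye(2, 3, dtype=np.float32)               # start from "no motion"
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)

# MOTION_EUCLIDEAN restricts the warp to rotation + translation, matching the problem.
# Note: some OpenCV builds also require explicit inputMask/gaussFiltSize arguments.
cc, warp = cv2.findTransformECC(img_a, img_b, warp, cv2.MOTION_EUCLIDEAN, criteria)

aligned_b = cv2.warpAffine(img_b, warp, (img_a.shape[1], img_a.shape[0]),
                           flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)
```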
To align the two images you have to carry out an image registration technique.
In MATLAB, write functions for image registration and select your desired reference features ("control points") using the Control Point Selection Tool to register the images.
Read more about image registration in the MATLAB help to understand it properly.

An algorithm for a drawing and painting robot - any tips?

Hello
I want to write a piece of software which analyses an image and then produces an image which captures what a human eye perceives in the original, using a minimum of Bezier path objects of varying colour and opacity.
Unlike the recent twitter super compression contest (see: stackoverflow.com/questions/891643/twitter-image-encoding-challenge), my goal is not to create a replica which is faithful to the image, but instead to replicate the human experience of looking at the image.
As an example, if the original image shows a red balloon in the top left corner, and the reproduction has something that looks like a red balloon in the top left corner then I will have achieved my goal, even if the balloon in the reproduction is not quite in the same position and not quite the same size or colour.
When I say "as perceived by a human", I mean this in a very limited sense. i am not attempting to analyse the meaning of an image, I don't need to know what an image is of, i am only interested in the key visual features a human eye would notice, to the extent that this can be automated by an algorithm which has no capacity to conceptualise what it is actually observing.
Why this unusual criteria of human perception over photographic accuracy?
This software would be used to drive a drawing and painting robot, which will be collaborating with a human artist (see: video.google.com/videosearch?q=mr%20squiggle).
Rather than treating marks made by the human which are not photographically perfect as necessarily being mistakes, the algorithm should seek to incorporate what is already on the canvas into the final image.
So relative brightness, hue, saturation, size and position are much more important than being photographically identical to the original. Maintaining the topology of the features, blocks of colour, gradients, and convex and concave curves will be more important than the exact size, shape and colour of those features.
Still with me?
My problem is that I am suffering a little from "when you have a hammer everything looks like a nail" syndrome. To me it seems the way to do this is to use a genetic algorithm with something like the comparison of wavelet transforms (see: grail.cs.washington.edu/projects/query/) used by retrievr (see: labs.systemone.at/retrievr/) to select fit solutions.
But the main reason I see this as the answer is that these are the techniques I know; there are probably much more elegant solutions using techniques I don't know anything about.
It would be especially interesting to take into account the ways the human vision system analyses an image, so perhaps special attention needs to be paid to straight lines, and angles, high contrast borders and large blocks of similar colours.
Do you have any suggestions for things I should read on vision, image algorithms, genetic algorithms or similar projects?
Thank you
Mat
PS. Some of the spelling above may appear wrong to you and your spellcheck. It's just international spelling variations which may differ from the standard in your country: e.g. Australian standard: colour vs American standard: color
There is a model that can be implemented as an algorithm to calculate a saliency map for an image, determining which parts of the image would get the most attention from a human.
The model is called the Itti-Koch model.
You can find a starting paper here.
And more resources and C++ source code here.
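OpenCV's contrib saliency module does not implement the Itti-Koch model itself, but its spectral-residual detector produces a comparable attention map and is trivial to try. A minimal sketch, assuming opencv-contrib-python is installed and with the filename as a placeholder:

```python
import cv2

img = cv2.imread("input.jpg")                       # filename is an assumption

# Spectral-residual static saliency (requires the opencv-contrib build).
saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
ok, saliency_map = saliency.computeSaliency(img)

# saliency_map is a float image in [0, 1]; thresholding it picks out the most
# "attention-grabbing" regions, which the robot could attend to first.
binary = (saliency_map * 255).astype("uint8")
_, attended = cv2.threshold(binary, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
```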
I cannot answer your question directly, but you should really take a look at artist/programmer (Lisp) Harold Cohen's painting machine Aaron.
That's quite a big task. You might be interested in image vectorizing (I don't know what it's called officially), which takes in rasterized images (such as pictures you take with a camera) and outputs a set of Bezier curves (I think) that approximate the image you put in. Since good algorithms often output very high-quality (read: complex) line sets, you'd also be interested in simplification algorithms, which can help enormously.
Unfortunately I am not next to my library, or I could recommend a number of books on perceptual psychology.
The first thing you must consider is that the physiology of the human eye is such that when we examine an image or scene, we are only capturing very small bits at a time, as our eyes dart around rapidly. Our mind pieces the different parts together to try and form a whole.
You might start by finding an algorithm for the path of an eyeball as it darts around. Perhaps it is attracted to contrast?
Next, our eyes adjust their "exposure" depending on the context. It's like those high-dynamic-range images, if they were pieced together not from multiple exposures of a whole scene, but from many small images, each balanced on its own yet blended into its surroundings to form a high dynamic range.
There was a finding in a monkey brain that a single neuron lights up if there's a diagonal line in the upper left of its field of vision. Similar neurons can be found for vertical lines and horizontal lines in various areas of that monkey's field of vision. The "diagonalness" determines the frequency with which that neuron fires.
One might speculate that other neurons might be found and mapped to other qualities such as redness, texturedness, and other things.
There's something humans can do that I've not seen a computer program ever able to do. It's something called "closure", where a human is able to fill in information about something they are seeing that doesn't actually exist in the image. An example:
*
* *
Is that a triangle? If you knew in advance that it was, then you could probably make a program to connect the dots. But what if it's just dots? How can you know? I wouldn't attempt this one unless I had some really clever way of dealing with it.
There are many other facts about human perception you might be able to use. Good luck, you've not picked a straightforward task.
I think a thing that could help you in this enormous task is human involvement. I mean data: you could have many people sit and stare at random dots (like in the previous post) and connect them as they see right, and then you could harness that data.
