I am looking for a method to determine the distance between the eyes and the top of the shoulder of a person in an image. At first I tried using a Haar cascade in OpenCV to detect the eye positions, which worked fine. However, I couldn't find any way to detect the shoulders. I think it would be a lot of work to build my own training set for a shoulder-detection model, so I am wondering if there is an easier way to do this.
If you can afford enough computing power, OpenPose is a good solution. A simpler approach is described here; I have tried it and it works OK.
You can get a rough estimate of the shoulder width given the eye location using the following rules (used by artists):
Width of the head is twice the distance between the eyes.
Eyes are located halfway between the top of the head and the tip of the chin.
This diagram can help you estimate the distance between the chin tip and the shoulders.
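Combining these rules, a rough eyes-to-shoulder estimate from just the two eye positions might look like the sketch below. The head aspect ratio (1.5) and the chin-to-shoulder factor (0.6) are assumptions read off typical figure-drawing proportion guides, not measured constants:

```python
import numpy as np

def estimate_eye_to_shoulder(eye_left, eye_right):
    """Rough vertical distance (in pixels) from the eye line down to
    the shoulder line, derived from artist proportion rules."""
    eye_left = np.asarray(eye_left, float)
    eye_right = np.asarray(eye_right, float)
    eye_dist = np.linalg.norm(eye_right - eye_left)

    head_width = 2.0 * eye_dist           # rule: head is two eye-distances wide
    head_height = 1.5 * head_width        # assumed head aspect ratio
    chin_below_eyes = 0.5 * head_height   # rule: eyes halfway between crown and chin
    chin_to_shoulder = 0.6 * head_height  # assumed, read off a proportion diagram
    return chin_below_eyes + chin_to_shoulder
```

Calibrating the two assumed factors against a handful of hand-measured photos would tighten the estimate considerably.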
The problem
I don't have access to the camera that took the pictures below. You can find the source video here: https://youtu.be/C7hS3enWh94?t=343
I would like to perform coarse camera calibration using only the information available in the video frames: a road line that is supposed to be straight but looks curved in the images, and which luckily covers most of the sensor area over time.
What I need
I'm looking for a quick-and-dirty way to find coarse camera distortion parameters, because I think there is no way to accurately estimate full calibration parameters from the available information alone.
I'm out of ideas on how to make progress with this problem. My own idea is too complicated and would take a lot of effort to implement, with little guarantee that it would actually work. So this question is really more of a brainstorm on hypothetical approaches to the problem.
P.S.
I thought it should be possible to use a Hough transform to look for the curve's curvature (the circle radius that would accommodate the most pixels), but the curvature we're looking at would result in a very large radius. Moreover, the curve might not form a perfect circle but rather an ellipse, because the camera does not look down at the road at a perfect 90-degree angle. This complicates a Hough transform implementation significantly.
Another idea was to use a random-search algorithm, such as a genetic algorithm, to tune distortion + scale + rotation + translation parameters so that one image fits perfectly on top of another. But again, anything like this would take me a lot of time to complete.
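Short of a full genetic algorithm, the distortion part alone can be sketched as a one-dimensional search. Assume a single-coefficient division model for radial distortion (an assumption, not something the video tells us), undistort the sampled road-line pixels for each candidate coefficient, and keep the coefficient that makes the line straightest (smallest line-fit residual):

```python
import numpy as np

def straightness(pts, k1, center):
    """Line-fit residual of pts after undistorting with a single
    radial coefficient k1 (one-parameter division model, assumed)."""
    d = pts - center
    r2 = (d ** 2).sum(axis=1, keepdims=True)
    und = center + d / (1.0 + k1 * r2)        # division-model undistortion
    x, y = und[:, 0], und[:, 1]
    A = np.stack([x, np.ones_like(x)], axis=1)
    _, res, *_ = np.linalg.lstsq(A, y, rcond=None)
    return res[0] if res.size else 0.0

def fit_k1(pts, center, ks=np.linspace(-1e-6, 1e-6, 201)):
    """Grid-search the coefficient that makes the sampled line straightest."""
    return min(ks, key=lambda k: straightness(pts, k, center))
```

The distortion center is another unknown; the image center is the usual first guess. Scale, rotation, and translation conveniently drop out of this criterion, because collinearity is invariant to them.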
Any better ideas from OpenCV gurus out there?
We have this camera array arranged in an arc around a person (red dot). Think The Matrix - each camera fires at the same time and then we create an animated gif from the output. The problem is that it is near impossible to align the cameras exactly and so I am looking for a way in OpenCV to align the images better and make it smoother.
Looking for general steps. I'm unsure what order to do this in. If I start with image 1 and match image 2 to it, then image 2 ends up further from image 3 than it was at the start, so matching 3 to 2 requires an even bigger change... and the error propagates. I have seen similar alignments done, though. Any help much appreciated.
Here's a thought. How about performing a quick and very simple "calibration" of the imaging system by using a single reference point?
The best thing about this is you can try it out pretty quickly and even if results are too bad for you, they can give you some more insight into the problem. But the bad thing is it may just not be good enough because it's hard to think of anything "less advanced" than this. Here's the description:
Remove the object from the scene
Place a small object (let's call it a "dot") at a position that roughly corresponds to the center of mass of the object you are about to record (the center of the area denoted by the red circle).
Record a single image with each camera
Use some simple algorithm to find the position of the dot on every image
Compute distances from dot positions to image centers on every image
Shift images by (-x, -y), where (x, y) is the above mentioned distance; after that, the dot should be located in the center of every image.
When recording an actual object, use these precomputed distances to shift all images. After you translate the images, they will be roughly aligned. But since you are shooting an object that is three-dimensional and has considerable size, I am not sure whether the alignment will be very convincing ... I wonder what results you'd get, actually.
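The dot-based procedure is simple enough to sketch end-to-end (function names are mine; `apply_shift` uses `np.roll`, which wraps at the borders, so a real pipeline would use a proper translation such as `cv2.warpAffine`):

```python
import numpy as np

def alignment_shifts(dot_positions, image_shape):
    """Per-camera (dx, dy) shifts that move the calibration dot to the
    image center; the same shifts are reused for every later capture."""
    h, w = image_shape[:2]
    center = np.array([w / 2.0, h / 2.0])
    return [tuple(center - np.asarray(p, float)) for p in dot_positions]

def apply_shift(img, shift):
    """Integer-pixel translation (wrap-around at the borders; a sketch)."""
    dx, dy = int(round(shift[0])), int(round(shift[1]))
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)
```

Finding the dot itself can be as crude as thresholding a brightly colored marker and taking the blob centroid.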
If I understand the application correctly, you should be able to obtain the relative pose of each camera in your array using homographies:
https://docs.opencv.org/3.4.0/d9/dab/tutorial_homography.html
From here, the next step would be to correct for alignment issues by estimating the transform between each camera's actual position and their 'ideal' position in the array. These ideal positions could be computed relative to a single camera, or relative to the focus point of the array (which may help simplify calculation). For each image, applying this corrective transform will result in an image that 'looks like' it was taken from the 'ideal' position.
Note that you may need to estimate relative camera pose in 3-4 array 'sections', as it looks like you have a full 180deg array (e.g. estimate homographies for 4-5 cameras at a time). As long as you have some overlap between sections it should work out.
Most of my experience with this sort of thing comes from using MATLAB's stereo camera calibrator app and related functions. Their help page gives a good overview of how to get started estimating camera pose. OpenCV has similar functionality.
https://www.mathworks.com/help/vision/ug/stereo-camera-calibrator-app.html
The cited paper by Zhang gives a great description of the mathematics of pose estimation from correspondence, if you're interested.
I want to detect a human hand and determine its width. Is there a way to do this in OpenCV, or any technique for doing it?
I've tried searching Google but couldn't find any solution.
My segmentation result:
As your question is too broad to answer without writing a 10-page essay, I'll at least give you a few ideas.
Option 1:
Detect the fingertips and fit a hand model (there should be plenty of papers, open-source code, and other resources available online for hand and gesture detection). You can then position your measurement line using that hand model.
Option 2:
The dimension you are looking for should be the shortest cross-section of the hand. Place a scan line over your hand, rotate it around its centroid, and measure the distance between the hand-background transitions at both ends; then pick the minimum. Make sure you have the right rotation center. You could cut the fingers off using morphological operations to move the centroid a bit further down, so you don't get a wrong measurement.
Option 3: Estimate the width of the hand from its total size. Human proportions usually follow some rules; see if you can find a correlation between that measure and other hand features. If you don't need very exact measurements (your image resolution suggests this), this should be the quickest and easiest solution.
There are many other options. Take a ruler and your hand and start thinking. Or do more research on gesture recognition in general. I'm sure you can apply many of the techniques they use to get your width.
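Option 2 is simple enough to sketch directly on a binary segmentation mask like yours (pure NumPy; walks outward along a rotated scan line through the centroid, and assumes the centroid falls inside the hand):

```python
import numpy as np

def min_cross_section(mask, angle_step=5):
    """Rough minimum cross-section (in pixels) of a binary mask
    (hand = True), measured along scan lines rotated about the centroid."""
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    h, w = mask.shape
    best = np.inf
    for deg in range(0, 180, angle_step):
        t = np.deg2rad(deg)
        dx, dy = np.cos(t), np.sin(t)
        width = 0
        for sign in (1, -1):          # walk outward in both directions
            step = 0
            while True:
                x = int(round(cx + sign * step * dx))
                y = int(round(cy + sign * step * dy))
                if not (0 <= x < w and 0 <= y < h) or not mask[y, x]:
                    break
                step += 1
            width += step
        width -= 1                    # the centroid pixel was counted twice
        if width > 0:
            best = min(best, width)
    return best
```

With the fingers removed by an opening, the minimum over angles should land on the wrist or palm cross-section.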
Disclaimer: I am not looking for a robust method that works under all conditions and/or requiring complex analysis of the image (e.g., http://vis-www.cs.umass.edu/lfw/part_labels/). Therefore please do not link me to one of the many papers on hair segmentation that exist. I am looking for something fast and simple (not perfectly robust).
That being said, my goal is to extract the area containing the hair in an image of a human face. If I am able to get at least some tiny set of pixels that I am sure are hair pixels, then I can use one of several algorithms to find the rest (e.g., a "photoshop magic wand" type algorithm).
An example (left side is the original face, right side is the gradient magnitude):
http://imgur.com/YX85MKB
Here's the information I have access to regarding any image of a human face: all locations of important features (e.g., nose, eyes, mouth and chin). One dumb/simple way of finding hair pixels could be to perform edge detection, and work up from the nose until I find two "horizontal edges" which we assume are the lower and upper boundaries of the hair, then take a sample from in-between.
Any ideas on other simple methods to try?
Instead of using image processing techniques (edge detection), you could use simple maths. You say you know where the nose, eyes, mouth and chin are. From the distances between these features you can certainly determine how far to look up from the eyes to find hair. I'm not sure which distance ratio to use, but the hair is certainly not 10x farther away than the distance between the eyes and the chin.
Obviously this technique is not bald-proof.
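A sketch of the distance-ratio idea. The look-up factor `k = 0.6` is a guess based on the "eyes are halfway up the head" artist rule, not a measured ratio, and would need tuning on real faces:

```python
import numpy as np

def hair_sample_box(eye_left, eye_right, chin, k=0.6, patch=10):
    """Guess a small square patch of probable hair pixels from three
    facial landmarks; returns (x0, y0, x1, y1)."""
    eyes = (np.asarray(eye_left, float) + np.asarray(eye_right, float)) / 2
    chin = np.asarray(chin, float)

    up = eyes - chin                     # direction from chin through the eyes
    face_len = np.linalg.norm(up)
    up /= face_len

    center = eyes + k * face_len * up    # hopefully inside the hair region
    x, y = center
    return (int(x - patch), int(y - patch), int(x + patch), int(y + patch))
```

Pixels sampled from this box would then seed a magic-wand-style region grow, exactly as described above.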
How can I implement the A* algorithm on a gridless 2D plane with no nodes or cells? I need the object to maneuver around a relatively high number of static and moving obstacles in the way of the goal.
My current implementation is to create eight points around the object and treat them as the centers of imaginary adjacent squares that might be potential positions for the object. Then I calculate the heuristic function for each and select the best. I calculate the distances between the starting point and the movement point, and between the movement point and the goal, in the usual way with the Pythagorean theorem. The problem is that this way the object often ignores obstacles, and even more often gets stuck moving back and forth between two positions.
I realize how silly my question might seem, but any help is appreciated.
Create an imaginary grid at whatever resolution is suitable for your problem: As coarse grained as possible for good performance but fine-grained enough to find (desirable) gaps between obstacles. Your grid might relate to a quadtree with your obstacle objects as well.
Execute A* over the grid. The grid may even be pre-populated with useful information like proximity to static obstacles. Once you have a path along the grid squares, post-process that path into a sequence of waypoints wherever there's an inflection in the path. Then travel along the lines between the waypoints.
By the way, you do not need the actual distance (cf. your mention of the Pythagorean theorem): A* works fine with an estimate of the distance. Manhattan distance is a popular choice: |dx| + |dy|. If your grid allows diagonal movement (or the grid is "fake"), simply max(|dx|, |dy|) is probably sufficient.
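The grid-plus-waypoints recipe above, as a minimal sketch (4-connected grid, Manhattan heuristic; function and variable names are mine):

```python
import heapq
from itertools import count

def astar(grid, start, goal):
    """Minimal A* on a boolean grid (True = blocked), 4-connected."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    tie = count()  # unique tie-breaker so the heap never compares nodes
    open_heap = [(h(start), next(tie), 0, start, None)]
    came_from, g_best = {}, {start: 0}
    while open_heap:
        _, _, g, cur, parent = heapq.heappop(open_heap)
        if cur in came_from:
            continue                      # already expanded via a cheaper route
        came_from[cur] = parent
        if cur == goal:                   # walk parents back to the start
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        x, y = cur
        for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = n
            if 0 <= nx < len(grid) and 0 <= ny < len(grid[0]) and not grid[nx][ny]:
                ng = g + 1
                if ng < g_best.get(n, float("inf")):
                    g_best[n] = ng
                    heapq.heappush(open_heap, (ng + h(n), next(tie), ng, n, cur))
    return None                           # no path exists

def waypoints(path):
    """Post-process: keep only the points where the path changes direction."""
    wp = [path[0]]
    for a, b, c in zip(path, path[1:], path[2:]):
        if (b[0] - a[0], b[1] - a[1]) != (c[0] - b[0], c[1] - b[1]):
            wp.append(b)
    wp.append(path[-1])
    return wp
```

The object then travels in straight lines between the returned waypoints, which hides the underlying grid resolution from the motion.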
The first thing that comes to mind is that at each point you need to calculate the gradient, or a direction vector, to find out which way to go in the next step. Then you move by a small epsilon and repeat.
This effectively creates a grid for you, and you can vary the cell size by choosing a different epsilon. By doing this instead of using a fixed grid, you can move at finer angles in each step -- smaller than the 45° of your 8-point example.
Theoretically you might be able to solve the formulas symbolically (epsilon tending to 0), which could lead to an optimal solution... just a thought.
How are the obstacles represented? Are they polygons? You could then use the polygon vertices as nodes. If the obstacles are not represented as polygons, you could generate some sort of convex hull around them and use its vertices for navigation. EDIT: I just realized you mentioned that you have to navigate around a relatively high number of obstacles. Using the obstacle vertices might be infeasible with too many obstacles.
I don't know about moving obstacles; I believe A* is not guaranteed to find an optimal path when obstacles move.
You mention that your object moves back and forth -- A* should not do this. A* visits each movement point only once. This could be an artifact of generating movement points on the fly, or of the moving obstacles.
I remember encountering this problem in college, but we didn't use an A* search. I can't remember the exact details of the math but I can give you the basic idea. Maybe someone else can be more detailed.
We're going to create a potential field out of your playing area that an object can follow.
Take your playing field and tilt or warp it so that the start point is at the highest point, and the goal is at the lowest point.
Poke a potential well down into the goal, to reinforce that it's a destination.
For every obstacle, create a potential hill. For non-point obstacles, which yours are, the potential field can increase asymptotically at the edges of the obstacle.
Now imagine your object as a marble. If you placed it at the starting point, it should roll down the playing field, around obstacles, and fall into the goal.
The hard part, the math I don't remember, is the equations that represent each of these bumps and wells. If you figure that out, add them together to get your final field, then do some vector calculus to find the gradient (just like towi said) and that's the direction you want to go at any step. Hopefully this method is fast enough that you can recalculate it at every step, since your obstacles move.
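A minimal version of the attractive/repulsive field idea. The particular potentials and constants here are illustrative assumptions (quadratic attraction to the goal, inverse-square repulsion from point obstacles), not the exact equations from any course:

```python
import numpy as np

def potential_step(pos, goal, obstacles, k_att=1.0, k_rep=100.0, eps=0.5):
    """One gradient-descent step on a simple potential field:
    a well at the goal plus a hill at each (point) obstacle."""
    pos = np.asarray(pos, float)
    goal = np.asarray(goal, float)

    grad = k_att * (pos - goal)              # attractive well at the goal
    for obs in obstacles:
        d = pos - np.asarray(obs, float)
        r = np.linalg.norm(d)
        if r > 1e-9:
            grad -= k_rep * d / r ** 4       # repulsive hill (gradient of ~1/r^2)

    # Move a fixed small distance downhill; iterate until near the goal.
    return pos - eps * grad / (np.linalg.norm(grad) + 1e-9)
```

The known caveat of this method is local minima: the marble can get trapped in a dip that isn't the goal, which is why it's usually combined with a global planner like the A* approaches above. Since each step is cheap, recomputing every frame handles the moving obstacles naturally.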
Sounds like you're implementing the Wumpus game based on Russell and Norvig's discussion of A* in Artificial Intelligence: A Modern Approach, or something very similar.
If so, you'll probably need to incorporate obstacle detection as part of your heuristic function (hence you'll need to have sensors that alert your agent to the signs of obstacles, as seen here).
To solve the back-and-forth issue, you may need to store the traveled path so you can tell whether you've already been at a location, and have the heuristic function examine the past N moves (say 4) and use that as a tie-breaker (i.e. if I can go north or east from here, and my last 4 moves have been east, west, east, west, go north this time).
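The last-N-moves tie-breaker could look like this small sketch (class and method names are mine):

```python
from collections import deque

class MoveMemory:
    """Tie-breaker that avoids directions used in the last N moves,
    to damp back-and-forth oscillation (N=4 as in the example above)."""

    def __init__(self, n=4):
        self.recent = deque(maxlen=n)

    def pick(self, candidates):
        """candidates: list of (heuristic_cost, direction) pairs."""
        best = min(cost for cost, _ in candidates)
        ties = [d for cost, d in candidates if cost == best]
        # Prefer a direction we haven't just been bouncing along.
        fresh = [d for d in ties if d not in self.recent]
        move = (fresh or ties)[0]
        self.recent.append(move)
        return move
```

This only breaks ties; the heuristic still drives the overall direction, so admissibility is unaffected.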