How does meanshift tracking work? (using histograms) - opencv

I know that the mean-shift algorithm computes the mean of the points inside the current window (a point density) and checks whether the center of the ROI coincides with that mean. If not, it moves the ROI center to the mean and checks again, like in this picture:
For a point density it is clear how to find the mean. But the algorithm can't simply take the mean of a histogram and use that as the new position. How can this algorithm work using a color histogram?

The feature space in your image is 2D.
Say you have an intensity image, so the feature space is 1D: you would just have a line (e.g. from 0 to 255) on which the points are located. The circles shown above would just be line segments on that [0, 255] line. Depending on their means, these line segments would then shift, just like the circles do in 2D.
You talked about color histograms, so I assume you are talking about RGB.
In that case your feature space is 3D, so you have a sphere instead of a line segment or circle. Your axes are R, G and B, and the pixels of your image are points in that 3D feature space. You then look for the mean of the points inside the sphere and shift the sphere's center towards that mean.
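To make that concrete, here is a minimal Python sketch of how this is usually done with OpenCV (the video filename, the initial ROI and the HSV thresholds are placeholder values): the color histogram of the initial ROI is turned into a back-projection image, and that back-projection plays the role of the point density that the window shifts over.

import cv2
import numpy as np

cap = cv2.VideoCapture("video.avi")          # hypothetical input video
ok, frame = cap.read()

# Initial ROI (x, y, w, h) -- assumed known, e.g. from a detector or manual selection
x, y, w, h = 300, 200, 100, 50
track_window = (x, y, w, h)

# Build a hue histogram of the ROI (HSV is more robust to lighting than RGB)
roi = frame[y:y+h, x:x+w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv_roi, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# Stop after 10 iterations or when the window moves by less than 1 pixel
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Back-projection: each pixel gets the histogram value of its hue,
    # i.e. how likely this pixel is to belong to the tracked object.
    back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    # Mean shift now moves the window towards the mean of this density.
    ret, track_window = cv2.meanShift(back_proj, track_window, term_crit)
    x, y, w, h = track_window
    cv2.rectangle(frame, (x, y), (x + w, y + h), 255, 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(30) == 27:                # Esc to quit
        break

In other words, the histogram is not averaged directly; it is used to assign every pixel a likelihood, and mean shift runs on that likelihood image.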

Related

find rectangle coordinates in a given image

I'm trying to blindly detect signals in a spectrum.
One way that came to my mind is to detect rectangles in the waterfall (a 2D matrix that can be interpreted as an image).
Is there any fast way (on the order of 0.1 seconds) to find the center and width of all of the horizontal rectangles in an image? (The heights of the rectangles do not matter to me.)
An example image is attached. (Note: I know that all rectangles are horizontal.)
I would also appreciate any other suggestion for this purpose.
E.g. I want the algorithm to give me 9 centers and 9 coordinates for the above image.
Since the rectangles are axis-aligned, you can do that quite easily and efficiently (this is not the case with unaligned rectangles, since they are not cleanly separated). The idea is to first compute the average color of each row and of each column. You should get something like this:
Then you can subtract the background color (blue), compute the luminance and apply a threshold. You can remove some artifacts with a median/blur filter beforehand.
Then you can just scan the resulting 1D arrays of binary values to locate where each rectangle starts/stops. The center of each rectangle is ((x_start+x_end)/2, (y_start+y_end)/2).
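A rough numpy sketch of this approach (assuming a BGR image read with OpenCV, a roughly uniform blue background, and a hand-picked threshold that would need tuning for real data):

import numpy as np
import cv2

img = cv2.imread("waterfall.png").astype(np.float32)   # hypothetical input image

# Average color of each column and of each row
col_mean = img.mean(axis=0)          # shape (width, 3)
row_mean = img.mean(axis=1)          # shape (height, 3)

# Estimate of the (blue) background color
background = np.median(img.reshape(-1, 3), axis=0)

def runs(profile, background, thresh=10.0):
    # Distance of the averaged color to the background, smoothed a little
    diff = np.abs(profile - background).mean(axis=1)
    diff = np.convolve(diff, np.ones(5) / 5, mode="same")
    binary = diff > thresh                       # threshold value needs tuning
    # Locate where each run of "non-background" starts and stops
    padded = np.concatenate(([False], binary, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(np.int8)))
    return list(zip(edges[0::2], edges[1::2]))   # (start, stop) index pairs

x_runs = runs(col_mean, background)              # horizontal extent of the rectangles
y_runs = runs(row_mean, background)              # vertical extent of the bands

for x0, x1 in x_runs:
    print("center x:", (x0 + x1) / 2, "width:", x1 - x0)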

Problem in understanding how colors are applied to each pixel

I'm new to action recognition and anything related to image processing. I'm studying a paper about action recognition based on human pose estimation. Here is a summary of how it works:
We first run a state-of-the-art human pose estimator [4] in every
frame and obtain heatmaps for every human joint. These heatmaps encode
the probabilities of each pixel to contain a particular joint. We
colorize these heatmaps using a color that depends on the relative
time of the frame in the video clip. For each joint, we sum the
colorized heatmaps over all frames to obtain the PoTion representation
for the entire video clip.
So for each joint j in frame t, it extracts a heatmap H^t_j[x, y] that is the likelihood of pixel (x, y) containing joint j at frame t. The resolution of this heatmap is denoted by W*H.
My first question: what is a heatmap exactly? I want to be sure whether a heatmap is a probability matrix in which, for example, the element at (1,1) is the probability that pixel (1,1) contains the joint.
In the next step this heatmap is colorized with C channels, where C is the number of colors used to visualize each pixel. The idea is to use the same color for all joint heatmaps of a frame.
We start by presenting the proposed colorization scheme for 2 channels
(C = 2). For visualization we can for example use red and green colors
for channel 1 and 2. The main idea is to colorize the first frame in
red, the last one in green, and the middle one with equal proportion
(50%) of green and red. The exact proportion of red and green is a
linear function of the relative time t, i.e., (t−1)/(T−1), see Figure 2
(left). For C = 2, we have o(t) = ((t−1)/(T−1), 1 − (t−1)/(T−1)). The
colorized heatmap of joint j for a pixel (x, y) and a channel c at
time t is given by:
And here is figure 2 which is mentioned in the context:
My problem is that I cannot figure out whether this equation, o(t) = ((t−1)/(T−1), 1 − (t−1)/(T−1)), represents the degree of one color (i.e. red) in a frame or the proportion of both of these colors. If it is used for each color channel separately, what does o_red(t) = (1/6, 5/6) mean when the number of frames (T) is equal to 7?
Or, if it is used for both channels: since the article says that the first frame is colored red and the last frame is colored green, how can we interpret o(1) = (0, 1) if the first element indicates the proportion of red and the second one the proportion of green? As far as I can see, that would mean the first frame is colored green, not red!
In this concept there is a subtle relationship between time and pixel positions.
As far as I know, this kind of heatmap is for encoding time in your image. The purpose is to show the movement of a moving object, captured in a video, within a single image. Every pixel of the image that belongs to the fixed (unmoving) parts of the scene (like background pixels) is zero (black). In contrast, if the moving object passes through a pixel position somewhere in the video, the corresponding pixel in the image is colored, and its color depends on the index (time) of the frame in which the moving object was seen at that pixel.
For example, suppose we have a completely black curtain in front of the camera and we are filming. We get a 1-second video made of 10 frames. At the first moment (frame 1) a very tiny white ball enters the scene and is captured at pixel (1,1). At frame 2 the ball is captured at pixel (1,2), and so on. When we stop filming at frame 10, the ball is seen at pixel (1,10). We now have 10 frames, each with a white pixel at a different position, and we want to show the whole process in a single image, so 10 pixels of that image will be colored (pixels (1,1), (1,2), (1,3), ..., (1,10)) and all other pixels are black.
With the formula you mentioned, the color of each pixel is computed from the number of the frame in which the ball was captured at that pixel:
T=10 # 10 frames
Pixel (1,1) saw the white ball at frame 1, so its color would be (0/9, 1 − 0/9) = (0, 1): the green channel is 0 and the red channel is 1, so this pixel looks completely red.
Pixel (1,2) saw the white ball at frame 2, so its color would be (1/9, 8/9), and this pixel is more red than green.
... # and so on for the other 7 pixels
Pixel (1,10) saw the white ball at frame 10, so its color would be (1, 0), and this pixel is completely green.
Now, if you look at the image, you see a colored line that is 10 pixels long; it is red at the beginning and its color gradually changes to green towards the end (the 10th pixel), which means the ball moved from pixel 1 to pixel 10 during that 1-second video.
(If I were unclear at any point of the explanation, please comment and I will elaborate)
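To make the weighting concrete, here is a small numpy sketch of the C = 2 colorization, assuming the per-frame heatmaps of one joint are already available as an array of shape (T, H, W); the channel ordering follows the (green, red) convention of the worked example above, and the sum over frames is the aggregation described in the quoted paper:

import numpy as np

T, H, W = 10, 64, 64
heatmaps = np.random.rand(T, H, W)   # stand-in for the per-frame joint heatmaps H^t_j[x, y]

def o(t, T):
    # Channel proportions for frame t (1-based), C = 2,
    # using the same (green, red) ordering as the worked example above
    a = (t - 1) / (T - 1)
    return np.array([a, 1.0 - a])

# Colorized heatmaps summed over all frames -> one 2-channel image for this joint
potion = np.zeros((H, W, 2))
for t in range(1, T + 1):
    potion += heatmaps[t - 1][..., None] * o(t, T)

print(o(1, T))    # [0. 1.]  -> first frame: all weight on the second (red) channel
print(o(T, T))    # [1. 0.]  -> last frame: all weight on the first (green) channel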

Display 3D matrix with its thickness information

I have a problem plotting a 3D matrix. Assume I have one image of size 384x384. In a loop I create about 10 images of the same size, store them in a 3D matrix and plot the 3D matrix inside the loop. The slice thickness is 0.69 (the distance between two slices), so I want the z coordinate to reflect this thickness. But it does not work well: the slice-distance visualization is not correct, and everything appears blue. I want to fix the visualization and remove the blue color. Could you help me fix it in MATLAB code? Thank you so much.
for slice = 1 : 10
    Img = getImage(); % get one 2D image (384x384)
    if slice == 1
        image3D = Img;
    else
        image3D = cat(3, image3D, Img); % stack the new slice along the 3rd dimension
    end
    % Plot the stack so far: one point per voxel, colored by its intensity,
    % with the z coordinate scaled by the 0.69 slice spacing
    figure(1)
    [x,y,z] = meshgrid(1:384, 1:384, 1:slice);
    scatter3(x(:), y(:), z(:).*0.69, 90, image3D(:), 'filled')
end
The blue color can be fixed by changing the colormap. Right now you are setting the color of each plotted point to the value in image3D with the default colormap of jet, which shows lower values as blue. Try adding colormap gray; after you plot, or whichever colormap you prefer.
I'm not sure what you mean by "the slice distance visualization is not correct". If each slice has a thickness of 0.69, then the image values are an integral of all the values within each voxel of thickness 0.69. So what you are displaying is a point at the centroid of each voxel, representing the integral of the values within that voxel. Your z scale seems correct, as the voxel centroids will be spaced 0.69 apart, although it won't start at zero.
I think a more accurate z-scale would be to use ((0:slice-1)+0.5)*0.69 as your z vector. This would put the edge of the lowest slice at zero and center each point directly on the centroid of its voxel.
I still don't think this will give you the visualization you are looking for. 3D data is most easily viewed by looking at slices of it. You can check out MATLAB's slice function, which lets you make nice displays like this one:
slice view http://people.rit.edu/pnveme/pigf/ThreeDGraphics/thrd_threev_slice_1.gif

Pixel-Millimeter Proportion

I have a digital image and I want to make some calculations based on distances on it, so I need the millimeter/pixel ratio. What I'm doing right now is to mark two points whose real-world distance I know, calculate the Euclidean distance between them in pixels, and then obtain the ratio.
The question is: can I get the correct millimeter/pixel ratio with only two points, or do I need to use 4 points, 2 for the x-axis and 2 for the y-axis?
If your image is of a flat surface and the camera direction is perpendicular to that surface, then your scale factor should be the same in both directions.
If your image is of a flat surface, but it is tilted relative to the camera, then marking out a rectangle of known proportions on that surface would allow you to compute a perspective transform. (See for example this question)
If your image is of a 3D scene, then of course there is no way in general to convert pixels to distances.
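For the second case (a flat but tilted surface), a hedged OpenCV sketch: if you can click the four corners of a rectangle of known real size (assumed here, for illustration, to be 100 mm x 50 mm), cv2.getPerspectiveTransform gives a mapping from pixel coordinates to millimeters on that surface:

import cv2
import numpy as np

# Four corners of the known rectangle, as clicked in the image (pixels) -- example values
src = np.float32([[120, 80], [430, 95], [425, 300], [115, 290]])

# The same corners in surface coordinates (millimeters), assuming a 100 mm x 50 mm rectangle
dst = np.float32([[0, 0], [100, 0], [100, 50], [0, 50]])

M = cv2.getPerspectiveTransform(src, dst)

# Map any two image points onto the surface and measure their distance in mm
pts = np.float32([[[200, 150]], [[350, 220]]])      # shape (N, 1, 2), as OpenCV expects
mm = cv2.perspectiveTransform(pts, M).reshape(-1, 2)
print(np.linalg.norm(mm[0] - mm[1]), "mm")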
If you know the distance between points A and B as measured on the picture (say in inches) and you also know the number of pixels between the points, you can easily calculate the pixels/inch ratio by dividing <pixels> / <inches>.
I suggest taking the points on the picture such that the line through them is either horizontal or vertical, so that the calculation is not thrown off by the pixels having a rectangular (non-square) shape.
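For the simple case of a flat surface viewed head-on, the computation is just this (the point coordinates and the known distance are example values):

import math

# Two marked points (pixels) and the known real-world distance between them -- example values
x1, y1 = 100, 240
x2, y2 = 460, 310
known_mm = 150.0

pixel_dist = math.hypot(x2 - x1, y2 - y1)
mm_per_pixel = known_mm / pixel_dist

# Any other pixel distance d on the same plane converts as d * mm_per_pixel
print(mm_per_pixel, "mm per pixel")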

what is the relationship between image edges and gradient?

Can anybody help me interpret the following?
"Edge points may be located by the maxima of the
module of the gradient, and the direction of edge contour is orthogonal to the direction of the gradient."
Paul R has given you an answer, so I'll just add some images to help make the point.
In image processing, when we refer to a "gradient" we usually mean the change in brightness over a series of pixels. You can create gradient images using software such as GIMP or Photoshop.
Here's an example of a linear gradient from black (left) to white (right):
The gradient is "linear" meaning that the change in intensity is directly proportional to the distance between pixels. This particular gradient is smooth, and we wouldn't say there is an "edge" in this image.
If we plot the brightness of the gradient vs. X-position (left to right), we get a plot that looks like this:
Here's an example of an object on a background. The edges are a bit fuzzy, but this is common in images of real objects. The pixel brightness does not change from black to white from one pixel to the next: there is a gradient that includes shades of gray. This is not obvious since you typically have to zoom into a photo to see the fuzzy edge.
In image processing we can find those edges by looking at sharp transitions (sharp gradients) from one brightness to another. If we zoom into the upper left corner of that box, we can see that there is a transition from white to black over just a few pixels. This transition is a gradient, too. The difference is that the gradient is located between two regions of constant color: white on the left, black on the right.
The red arrow shows the direction of the gradient from background to foreground: pixels are light on the left, and as we move in the +x direction the pixels become darker. If we plot the brightness sampled along the arrow, we'll get something like the following plot, with red squares representing the brightness for a specific pixel. The change isn't linear, but instead will look like one side of a bell curve:
The blue line segment is a rough approximation of the slope of the curve at its steepest. The "true" edge point is the point at which slope is steepest along the gradient corresponding to the edge of an object.
Gradient magnitude and direction can be calculated using horizontal and vertical Sobel filters. You can then calculate the direction of the gradient as:
gradientAngle = arctan(gradientY / gradientX)
The gradient will be steepest when it is perpendicular to the edge of the object.
If you look at some black and white images of real scenes, you can zoom in, look at individual pixel values, and develop a good sense of how these principles apply.
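For reference, a short OpenCV/numpy sketch of the Sobel-based computation described above (np.arctan2 is used rather than a plain arctan so that the gx = 0 case and the full direction range are handled; the filename is a placeholder):

import cv2
import numpy as np

gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)   # hypothetical image

# Horizontal and vertical Sobel derivatives
gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)

magnitude = np.sqrt(gx**2 + gy**2)    # large at edges, small in flat regions
angle = np.arctan2(gy, gx)            # gradient direction; the edge contour runs orthogonal to it

# A crude edge map: keep only the strongest gradient magnitudes
edges = magnitude > 0.5 * magnitude.max()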
Object edges typically result in a step change in intensity. So if you take the derivative of intensity it will have a large (positive or negative) value at edges and a smaller value elsewhere. If you can identify the direction of steepest gradient then this will be at right angles to (orthogonal to) the object edge.

Resources