I have a problem plotting a 3D matrix. Assume I have one image of size 384x384. In a loop, I create about 10 images of the same size, stack them into a 3D matrix, and plot that matrix on each iteration. The slice thickness is 0.69 (the distance between two consecutive slices), and I want the z coordinate to reflect that thickness. But it does not work well: the slice distance is not visualized correctly, and everything appears blue. I want to fix the spacing and remove the blue color. Could you help me fix it with MATLAB code? Thank you so much.
for slice = 1 : 10
    Img = getImage(); % get one 2D image
    if slice == 1
        image3D = Img;
    else
        image3D = cat(3, image3D, Img);
    end
    % Plot image
    figure(1)
    [x,y,z] = meshgrid(1:384,1:384,1:slice);
    scatter3(x(:),y(:),z(:).*0.69,90,image3D(:),'filled')
end
The blue color can be fixed by changing the colormap. Right now you are setting the color of each plotted point to the corresponding value in image3D, with the default colormap of jet, which shows lower values as blue. Try adding colormap gray; after you plot, or whichever colormap you prefer.
I'm not sure what you mean by "the slice distance visualization is not correct". If each slice has a thickness of 0.69, then the image values are an integral of all the values within each voxel of thickness 0.69. So what you are displaying is a point at the centroid of each voxel that represents the integral of the values within that voxel. Your z scale seems correct, as the voxel centroids will be spaced 0.69 apart, although the scale won't start at zero.
I think a more accurate z scale would be to use ((0:slice-1) + 0.5) * 0.69 as your z vector. This puts the bottom edge of the lowest slice at zero and centers each point directly on the centroid of its voxel.
I still don't think this will give you the visualization you are looking for. 3D data is most easily viewed by looking at slices through it. You can check out MATLAB's slice function, which lets you make nice displays like this one:
slice view http://people.rit.edu/pnveme/pigf/ThreeDGraphics/thrd_threev_slice_1.gif
I'm new to action recognition and anything related to image processing. I'm studying a paper about action recognition based on human pose estimation. Here is a summary of how it works:
We first run a state-of-the-art human pose estimator [4] in every
frame and obtain heatmaps for every human joint. These heatmaps encode
the probabilities of each pixel to contain a particular joint. We
colorize these heatmaps using a color that depends on the relative
time of the frame in the video clip. For each joint, we sum the
colorized heatmaps over all frames to obtain the PoTion representation
for the entire video clip.
So for each joint j in frame t, it extracts a heatmap H^t_j[x, y] that is the likelihood of pixel (x, y) containing joint j at frame t. The resolution of this heatmap is denoted by W*H.
My first question: what exactly is a heatmap? I want to confirm that a heatmap is a probability matrix in which, for example, the element at (1,1) holds the probability that pixel (1,1) contains the joint.
In the next step this heatmap is colorized with C channels, where C is the number of colors used to visualize each pixel. The idea here is to use the same color for all the joint heatmaps of a given frame.
We start by presenting the proposed colorization scheme for 2 channels
(C = 2). For visualization we can for example use red and green colors
for channel 1 and 2. The main idea is to colorize the first frame in
red, the last one in green, and the middle one with equal proportion
(50%) of green and red. The exact proportion of red and green is a
linear function of the relative time t, i.e., (t-1)/(T-1), see Figure 2
(left). For C = 2, we have o(t) = ((t-1)/(T-1), 1-(t-1)/(T-1)). The
colorized heatmap of joint j for a pixel (x, y) and a channel c at
time t is given by:
And here is Figure 2, which is mentioned in the quote:
My problem is that I cannot figure out whether this equation, o(t) = ((t-1)/(T-1), 1-(t-1)/(T-1)), represents the degree of one color (e.g. red) in a frame, or the proportion of both colors. If it is used for each color channel separately, what does o_red(t) = (1/6, 5/6) mean when the number of frames (T) is equal to 7?
Or, if it is used for both channels: since the article says the first frame is colored red and the last frame green, how can we interpret o(1) = (0, 1) if the first element indicates the proportion of red and the second one the proportion of green? As far as I can tell, that would mean the first frame is colored green, not red!
In this concept there is a subtle relationship between time and pixel positions.
As far as I know, this kind of heatmap is for encoding time in your image. The purpose is to show the movement of a moving object, captured in a video, in a single image. Every pixel related to the fixed (unmoving) objects of the scene (like background pixels) is zero (black). In contrast, if the moving object passes through a pixel position in the video, the corresponding pixel in the image is colored, and its color depends on the number (time) of the frame in which the moving object was seen at that pixel.
For example, suppose we have a completely black curtain in front of the camera and we are filming. We get a 1-second video made of 10 frames. At the first moment (frame 1), a very tiny white ball enters the scene and is captured at pixel (1,1). In frame 2, the ball is captured at pixel (1,2), and so on. When we stop filming at frame 10, the ball is seen at pixel (1,10). Now we have 10 frames, each with a white pixel at a different position, and we want to show the whole process in a single image: 10 pixels of that image will be colored (pixels (1,1), (1,2), ..., (1,10)) and all the other pixels are black.
With the formula you mentioned, the color of each pixel is computed from the number (time) of the frame in which the ball was captured at that pixel:
T=10 # 10 frames
pixel (1,1) got the white ball at frame 1, so its color would be (0/9, 1-(0/9)) = (0, 1): the green channel is zero and the red channel is 1, so this pixel looks completely red.
pixel (1,2) got the white ball at frame 2, so its color would be (1/9, 8/9): this pixel is more red than green.
... # and so on for the other 7 pixels
pixel (1,10) got the white ball at frame 10, so its color would be (1, 0): this pixel is completely green.
Now, if you look at the resulting image, you see a colored line 10 pixels long that is red at the beginning and gradually changes to green towards the end (the 10th pixel). This means the ball moved from pixel 1 to pixel 10 during that 1-second video.
(If I were unclear at any point of the explanation, please comment and I will elaborate)
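To make the scheme concrete, here is a minimal NumPy sketch of the two-channel colorization described above (my own illustration, not code from the paper); it assumes the per-frame heatmaps of one joint are stacked in an array heatmaps of shape (T, H, W):

import numpy as np

def colorize(heatmaps):
    # heatmaps: (T, H, W) array; heatmaps[t-1] is the heatmap at frame t
    T = heatmaps.shape[0]
    out = np.zeros(heatmaps.shape[1:] + (2,))  # (H, W, 2) colorized sum
    for t in range(1, T + 1):
        w = (t - 1) / (T - 1)  # relative time: 0 at frame 1, 1 at frame T (assumes T >= 2)
        # o(t) = (w, 1 - w); read as (green, red) per the interpretation
        # above, so frame 1 gets (0, 1), i.e. pure red
        out[..., 0] += w * heatmaps[t - 1]        # "green" channel
        out[..., 1] += (1 - w) * heatmaps[t - 1]  # "red" channel
    return out  # per the paper, the colorized heatmaps summed over all frames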
As a result of the Faster R-CNN method of object detection, I have obtained a set of boxes of intensity values corresponding to the regions containing objects (each bounding box can be thought of as a 3D matrix with a depth of 3 for the RGB intensities, plus a width and a height, and can be converted to a 2D matrix by taking the grayscale). What I want to do is obtain the corresponding coordinates in the original image for each intensity cell inside a bounding box. Any ideas how to do so?
From what I understand, you have an R-CNN model that outputs cropped pieces of the input image, and you now want to trace those crops back to their coordinates in the original image.
What you can do is simply use a patch-similarity-measure to find the original position.
Since the output crop appears pixel-for-pixel somewhere in the original image, just use a pixel-based distance:
Find the place in the image with the smallest distance (should be zero) and from that you can find your desired coordinates.
In Python:

import numpy as np

d_min = np.inf  # smallest distance found so far
crop_h, crop_w = crop.shape[:2]
coord = None
# Slide the crop over every possible top-left position in the original image
for x in range(org_image.shape[0] - crop_h + 1):
    for y in range(org_image.shape[1] - crop_w + 1):
        # Sum of absolute per-pixel differences (zero for an exact match);
        # cast to int so uint8 subtraction does not wrap around
        patch = org_image[x:x+crop_h, y:y+crop_w].astype(int)
        d = np.sum(np.abs(patch - crop.astype(int)))
        if d < d_min:
            d_min = d
            coord = (x, y)  # top-left corner of the best match
However, your model should already have that information available (after all, it produces each crop from some coordinates). Maybe add some info on your implementation.
I am trying to compare images based on their Euclidean Distance. I have come across this pseudo code:
sqrt((r1-r2)^2 + (g1-g2)^2 + (b1-b2)^2)
What I am trying to figure out is: in the pseudocode above, does (r1 - r2) mean subtracting the red values of image 1 from the red values of image 2?
Yeah, this is the most basic form of Euclidean color distance. You compare one pixel's color with another's by measuring the distance between the corresponding components of the two pixels.
Pixels are (usually) made of 3 color components, RGB, and you compare the pixels component by component. So for #FFAA00 and #F8A010, you have 0xFF for r1 and 0xF8 for r2.
There are a bunch of other distance measures, such as CIEDE2000 in the CIELAB color space, but that's the core idea behind color distance.
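For illustration, here is a short Python sketch of exactly that computation (my own example, using the two colors mentioned above):

import numpy as np

def color_distance(p1, p2):
    # Euclidean color distance: sqrt((r1-r2)^2 + (g1-g2)^2 + (b1-b2)^2)
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    return np.sqrt(np.sum((p1 - p2) ** 2))

# The two colors from above: #FFAA00 vs #F8A010
print(color_distance([0xFF, 0xAA, 0x00], [0xF8, 0xA0, 0x10]))  # ~20.12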
I know that the mean-shift algorithm computes the mean of a pixel density and checks whether the center of the ROI coincides with this point. If not, it moves the ROI center to the mean and checks again, and so on, like in this picture:
For a density, it is clear how to find the mean point. But the algorithm can't simply compute the mean of a histogram and get the new position from that. How does this algorithm work when using a color histogram?
The feature space in your image is 2D.
Say you have an intensity image (so it's 1D) then you would just have a line (e.g. from 0 to 255) on which the points are located. The circles shown above would just be line segments on that [0,255] line. Depending on their means, these line segments would then shift, just like the circles do in 2D.
You talked about color histograms, so I assume you are talking about RGB.
In that case your feature space is 3D, so you have a sphere instead of a line segment or a circle. Your axes are R, G, and B, and the pixels of your image are points in that 3D feature space. You then still look for the mean of the points inside the sphere and shift the center towards that mean.
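Here is a rough NumPy sketch of one such mean-shift search in the 3D RGB feature space (my own illustration of the idea above; pixels, center, and radius are assumed inputs):

import numpy as np

def mean_shift_rgb(pixels, center, radius, max_iter=100, tol=1e-3):
    # pixels: (N, 3) array of RGB values; center: (3,) starting point
    pixels = np.asarray(pixels, dtype=float)
    center = np.asarray(center, dtype=float)
    for _ in range(max_iter):
        dist = np.linalg.norm(pixels - center, axis=1)
        inside = pixels[dist <= radius]  # points inside the sphere
        if len(inside) == 0:
            break
        mean = inside.mean(axis=0)       # mean of the points in the sphere
        if np.linalg.norm(mean - center) < tol:
            break                        # center coincides with the mean
        center = mean                    # shift the sphere to the mean
    return center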
I am not able to understand the formula. What do W (the window) and the intensity in the formula mean? I found this formula in the OpenCV docs:
http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_features_harris/py_features_harris.html
For a grayscale image, the intensity level (0-255) tells you how bright a pixel is; I hope you already know that.
So, here is the explanation of your formula:
Aim: We want to find the points with the maximum variation in intensity level in all directions, i.e. the points that are most distinctive in the given image.
I(x,y): This is the intensity value of the current pixel which you are processing at the moment.
I(x+u,y+v): This is the intensity of another pixel which lies at a distance of (u,v) from the current pixel (mentioned above) which is located at (x,y) with intensity I(x,y).
I(x+u,y+v) - I(x,y): This equation gives you the difference between the intensity levels of two pixels.
W(u,v): You don't compare the current pixel with a pixel at some random position. You prefer to compare the current pixel with its neighbors, so you choose values for u and v, as you would when applying a Gaussian mask or a mean filter. So, basically, w(u,v) represents the window within which you compare the intensity of the current pixel with that of its neighbors.
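Putting those pieces together, the formula from the linked OpenCV tutorial is the windowed sum of squared intensity differences for a shift (u,v):

E(u,v) = \sum_{x,y} w(x,y) \, [ I(x+u, y+v) - I(x,y) ]^2

Corners are the points where E(u,v) is large for every direction of the shift (u,v).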
This link explains all your doubts.
For visualizing the algorithm, consider the window function as a BoxFilter, Ix as the Sobel derivative along the x-axis, and Iy as the Sobel derivative along the y-axis (see the sketch below).
http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/sobel_derivatives/sobel_derivatives.html will be useful to understand the final equations in the above pdf.
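To make that concrete, here is a small OpenCV/Python sketch of that setup (my own illustration; the input filename, window size, and k are made-up values): the products of the Sobel derivatives are summed by a box filter acting as the window function, and the usual Harris response is computed from them.

import cv2
import numpy as np

# Hypothetical input file; any grayscale image works
img = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE).astype(np.float32)
Ix = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)  # derivative along x
Iy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)  # derivative along y
# Box filter as the window function, summing the derivative products
Ixx = cv2.boxFilter(Ix * Ix, -1, (5, 5))
Iyy = cv2.boxFilter(Iy * Iy, -1, (5, 5))
Ixy = cv2.boxFilter(Ix * Iy, -1, (5, 5))
# Harris corner response R = det(M) - k * trace(M)^2
k = 0.04
R = (Ixx * Iyy - Ixy * Ixy) - k * (Ixx + Iyy) ** 2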