3D bounding box length, width and height (KITTI, Python)

In the KITTI dataset the label file has 6 parameters for the 3D bounding box, given in meters in the object (camera) coordinate system. For example:
[Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 1.89 0.48 1.20 1.84 1.47 8.41 0.01]
How can I get the height, width and length of the 3D bounding box projected into the image (in pixels) in Python,
and also the coordinates of all 8 corner points of the 3D boxes?
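No answer is quoted here, but a rough sketch of the usual recipe in Python is: build the 8 corners from the box dimensions (h, w, l) and the yaw rotation_y around the location (x, y, z), then project them with the camera matrix P2 from the matching KITTI calib file. The field order and P2 handling below follow the standard KITTI devkit conventions; loading P2 is left as a placeholder.

import numpy as np

def kitti_box_corners(h, w, l, x, y, z, ry):
    # 8 corners in the object frame; KITTI's (x, y, z) is the centre of the
    # bottom face of the box, in camera coordinates (y pointing down).
    x_c = [ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2]
    y_c = [   0,    0,    0,    0,   -h,   -h,   -h,   -h]
    z_c = [ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2]
    corners = np.vstack([x_c, y_c, z_c])                   # 3 x 8
    # Rotation around the camera's vertical (y) axis by rotation_y
    R = np.array([[ np.cos(ry), 0, np.sin(ry)],
                  [          0, 1,          0],
                  [-np.sin(ry), 0, np.cos(ry)]])
    return R @ corners + np.array([[x], [y], [z]])         # 3 x 8, camera coords

def project_to_image(pts_3d, P2):
    # P2 is the 3 x 4 projection matrix of the left color camera (from the calib file)
    pts = np.vstack([pts_3d, np.ones((1, pts_3d.shape[1]))])
    uv = P2 @ pts
    return uv[:2] / uv[2]                                   # 2 x 8 pixel coordinates

# Values taken from the example label: h, w, l, x, y, z, rotation_y
corners_cam = kitti_box_corners(1.89, 0.48, 1.20, 1.84, 1.47, 8.41, 0.01)
# P2 = ...  # load from the matching calib/xxxxxx.txt, then:
# corners_px = project_to_image(corners_cam, P2)
# width_px  = corners_px[0].max() - corners_px[0].min()    # box extent in pixels
# height_px = corners_px[1].max() - corners_px[1].min()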

Related

How to find AAL atlas regions of MRI data with desired volume sizes?

I have an MRI data array with shape 121 × 145 × 121 and voxel size 1.5 mm × 1.5 mm × 1.5 mm. I want to find the regions of the AAL atlas in my data. How can I do that in Python?
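No answer is recorded for this question. One common route is nilearn's bundled AAL atlas resampled onto the data grid; a minimal sketch, assuming the volume is stored as a NIfTI file already registered to MNI space (the file name is a placeholder):

import nibabel as nib
from nilearn import datasets, image

# Fetch the AAL atlas (paths to the label volume plus region names / indices)
aal = datasets.fetch_atlas_aal()
atlas_img = nib.load(aal.maps)

# Load the subject volume (placeholder file name)
data_img = nib.load("my_mri.nii.gz")

# Resample the atlas onto the data grid; nearest-neighbour keeps labels intact.
# This only makes sense if both volumes are registered to the same (MNI) space.
atlas_on_data = image.resample_to_img(atlas_img, data_img, interpolation="nearest")
labels = atlas_on_data.get_fdata().astype(int)

# Example: voxels of the first atlas region in the subject's grid
region_code = int(aal.indices[0])
mask = labels == region_code
print(aal.labels[0], "covers", int(mask.sum()), "voxels")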

How to calculate the distance between object and camera, knowing the pixels occupied by the object in an image

By using segmentation I am able to find the number of pixels occupied by an object in an image. Now I need to find the distance using the pixels occupied.
Object real dimensions (H x W) = 11 x 5.5 cm.
At a distance of 50 cm the object occupies 42894 pixels.
At a distance of 60 cm the object occupies 31269 pixels.
The total pixels in an image = 480 x 640 = 307200.
What is the distance if the object occupies 22323 pixels?
The distance to the object is 67.7 cm.
Please read https://en.wikipedia.org/wiki/Pinhole_camera_model
The object's linear size in the image is inversely proportional to the distance, so the pixel area it covers is inversely proportional to the distance squared. Repeat your experiment for a few distances and plot size vs. distance to see for yourself.
Of course this is a simplified model that only works for a fixed focal length.
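A minimal sketch of that model in Python: with a fixed focal length the pixel area scales as 1/distance², and either measurement from the question can serve as the calibration point.

import math

# Calibration pairs from the question: (distance in cm, pixel area of the object)
calibration = [(50.0, 42894), (60.0, 31269)]

def distance_from_area(pixel_area, ref_distance, ref_area):
    # Pinhole model: pixel_area * distance^2 is (approximately) constant
    return ref_distance * math.sqrt(ref_area / pixel_area)

query_area = 22323
for d_ref, a_ref in calibration:
    print(f"from the {d_ref:.0f} cm reference: "
          f"{distance_from_area(query_area, d_ref, a_ref):.1f} cm")
# Gives roughly 69-71 cm; the two references disagree a little, which shows
# how approximate the simplified fixed-focal-length model is.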

Final vectors in Histogram of Oriented Gradients

The dimensions of the image are 64 x 128, i.e. 8192 pixels, each with a gradient magnitude and direction. After the binning stage we are left with 1152 values, since each cell of 64 pixels is converted into 9 bins based on orientation. Can you please explain how, after L2 normalization, we end up with a 3780-dimensional vector?
Assumption: You have the gradients of the 64 x 128 patch.
Calculate Histogram of Gradients in 8x8 cells
This is where it starts to get interesting. The image is divided into 8x8 cells and a HOG is calculated for each 8x8 cells. One of the reasons why we use 8x8 cells is that it provides a compact representation. An 8x8 image patch contains 8x8x3 = 192 pixel values (color image). The gradient of this patch contains 2 values (magnitude and direction) per pixel which adds up to 8x8x2 = 128 values. These 128 numbers are represented using a 9-bin histogram which can be stored as an array of 9 numbers. This makes it more compact and calculating histograms over a patch makes this representation more robust to noise.
The histogram is essentially a vector of 9 bins corresponding to angles 0, 20, 40, 60 ... 180 corresponding to unsigned gradients.
16 x 16 Block Normalization
After creating the histograms based on the gradient of the image, we want our descriptor to be independent of lighting variations, hence we normalize them. The L2 norm of an RGB color [128, 64, 32] is sqrt(128*128 + 64*64 + 32*32) = 146.64. Dividing each element of this vector by 146.64 gives the normalized vector [0.87, 0.44, 0.22]. If we were to multiply each element of the original vector by 2, the normalized vector would remain the same.
Although simply normalizing each 9 x 1 histogram would work, normalizing a bigger 16 x 16 block is better. A 16 x 16 block contains 4 histograms, which are concatenated to form a 36 x 1 vector and normalized the same way as the 3 x 1 vector in the example. The window is then moved by 8 pixels, a normalized 36 x 1 vector is calculated over the new window, and the process is repeated (see the animation in the source tutorial).
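A quick numerical check of that normalization, as a plain NumPy snippet (the rounded values match the example above):

import numpy as np

v = np.array([128.0, 64.0, 32.0])
norm = np.linalg.norm(v)                 # sqrt(128^2 + 64^2 + 32^2) ≈ 146.64
print(v / norm)                          # ≈ [0.87, 0.44, 0.22]
print((2 * v) / np.linalg.norm(2 * v))   # identical: the scale factor cancels out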
Calculate the HOG feature vector
This is where your question comes in.
To calculate the final feature vector for the entire image patch, the 36 x 1 vectors are concatenated into one giant vector. Let us calculate its size:
How many positions of the 16 x 16 block are there? There are 7 horizontal and 15 vertical positions, which gives 7 x 15 = 105 positions.
Each 16 x 16 block is represented by a 36 x 1 vector, so concatenating them all into one giant vector gives a 36 x 105 = 3780-dimensional vector.
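The same count can be reproduced in a few lines of Python; the scikit-image cross-check at the end is optional and assumes skimage is available.

import numpy as np

img_w, img_h = 64, 128         # detection window (width x height)
cell, block, bins = 8, 2, 9    # 8x8 cells, 2x2 cells per block, 9 orientation bins

cells_x, cells_y = img_w // cell, img_h // cell                  # 8 x 16 cells
blocks_x, blocks_y = cells_x - block + 1, cells_y - block + 1    # 7 x 15 block positions
print(blocks_x * blocks_y * block * block * bins)                # 105 * 36 = 3780

# Optional cross-check with scikit-image
from skimage.feature import hog
features = hog(np.random.rand(img_h, img_w), orientations=bins,
               pixels_per_cell=(cell, cell), cells_per_block=(block, block),
               block_norm="L2")
print(features.shape)                                            # (3780,)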
For more details, look at the tutorial where I learned this.
Hope it helps!

2592x1944 YUV422 one frame image size calculation issue

I'd like to calculate the size of one 2592x1944 YUV422 frame from a camera.
I've already seen https://en.wikipedia.org/wiki/YUV
But I'm not sure whether the calculation below is right, nor what the hsync length means. The Wikipedia article lists:
YUV444: 3 bytes per pixel (12 bytes per 4 pixels)
YUV422: 4 bytes per 2 pixels (8 bytes per 4 pixels)
YUV411: 6 bytes per 4 pixels
YUV420p: 6 bytes per 4 pixels, reordered
As far as I know, the size of one 2592x1944 YUV422 frame can be calculated as:
Total number of pixels in a frame = 2592 * 1944 = 5038848 pixels
Total number of bytes in a frame = 5038848 * 2 = 10077696 bytes
Does it mean that the real length of one hsync (one line) in YUV422 is 2592 * 2 bytes, and that it changes between YUV444, YUV422, YUV411 and YUV420?
Your calculation is right if we assume 8 bits of color depth.
Y needs 2592 x 1944 samples x 8 bits = 5038848 bytes
U needs 2592 x 1944 / 2 samples x 8 bits = 2519424 bytes
V needs 2592 x 1944 / 2 samples x 8 bits = 2519424 bytes
TOTAL = 10077696 bytes for YUV422, 8-bit color
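For reference, a small Python sketch of the same arithmetic, extended to the other subsampling schemes quoted in the question (8 bits per sample assumed):

# Bytes per frame for common YUV chroma subsamplings, 8 bits per sample
def frame_bytes(width, height, fmt):
    luma = width * height                    # one Y sample per pixel
    chroma_factor = {"YUV444": 2.0,          # U and V at full resolution
                     "YUV422": 1.0,          # half horizontal resolution
                     "YUV420": 0.5,          # half horizontal and vertical
                     "YUV411": 0.5}[fmt]     # quarter horizontal resolution
    return int(luma * (1 + chroma_factor))

for fmt in ("YUV444", "YUV422", "YUV420", "YUV411"):
    print(fmt, frame_bytes(2592, 1944, fmt))
# YUV422 -> 10077696 bytes, matching the calculation above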
I don't get the question regarding hsync.
A nice explanation of YUV422: https://www.youtube.com/watch?v=7JYZDnenaGc,
and of color depth: https://www.youtube.com/watch?v=bKjSox0uwnk.

How to mask green pixels?

I need to mask the green pixels in an image.
I have an example of masking red pixels.
Here is the example:
Image<Hsv, Byte> hsv = image.Convert<Hsv, Byte>();
Image<Gray, Byte>[] channels = hsv.Split();
//channels[0] is the mask for hue less than 20 or larger than 160
CvInvoke.cvInRangeS(channels[0], new MCvScalar(20), new MCvScalar(160), channels[0]);
channels[0]._Not();
but I can't understand where those parameters were taken from:
new MCvScalar(20), new MCvScalar(160)
Any idea which parameters I have to take to mask the green pixels?
Thank you in advance.
The code masks pixels with Hue outside the range 20 - 160 (or rather, it masks pixels inside the range and then inverts the mask).
First, understand HSV (Hue, Saturation, Value): http://en.wikipedia.org/wiki/HSL_and_HSV
The actual Hue is measured in degrees and goes from 0 to 360 around the color wheel.
Then see OpenCV documentation on 8-bit HSV format:
Hue is first calculated in the 0 - 360 range, then divided by 2 to fit into an 8-bit integer.
This means that in the original example the masked pixels have actual Hue under 40 or above 320 degrees. Apparently that's 0 degrees plus / minus 40.
For a similar range of greens you'd want 120 +/- 40, i.e. from 80 to 160. Finally converting that to 8-bit representation - from 40 to 80.
The actual code will differ from your sample though: for red they had to mask 20 - 160 and then invert the mask. For green, masking from 40 to 80 is enough (i.e. you should omit the channels[0]._Not(); part).
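For comparison, a rough OpenCV-Python version of the same idea for green (the original question uses EmguCV/C#, so this is only an illustration; the input file name is a placeholder):

import cv2

bgr = cv2.imread("input.png")                        # placeholder file name
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)

# Green: actual hue 80-160 degrees, i.e. 40-80 in OpenCV's 8-bit hue channel.
# Only the hue channel is constrained; S and V are left at their full range.
mask = cv2.inRange(hsv, (40, 0, 0), (80, 255, 255))

# Unlike the red example, no inversion is needed because the green range
# does not wrap around 0 degrees.
green_only = cv2.bitwise_and(bgr, bgr, mask=mask)
cv2.imwrite("green_mask.png", mask)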
