Masking in DCT Compression - image-processing

I am trying to do image compression using the DCT (Discrete Cosine Transform). Can someone please help me understand how masking affects bits per pixel in DCT? How is the bit allocation done with the masking?
PS: By masking, I mean multiplying the DCT coefficients with a matrix like the one below (element-wise multiplication, not matrix multiplication).
mask = [1 1 1 1 0 0 0 0
1 1 1 0 0 0 0 0
1 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0]
Background on "Masking"
Compression using the DCT calculates the DCT of blocks of an image, in this case 8x8 pixels. High-frequency components of an image are less important for human perception and thus can be discarded to save space.
The mask matrix selects which DCT coefficients are kept and which are discarded in order to save space. Coefficients towards the top left corner represent low frequencies.
For more information visit Discrete Cosine Transform.

This looks like a variation of a quantization matrix.
Low frequencies are in the top left, high frequencies are in the bottom right. The eye is more sensitive to low frequencies, so removing the high-frequency coefficients discards the less important details of the image.
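To make this concrete, here is a minimal Python sketch (my own illustration, not from the thread; it uses scipy's dctn/idctn and a random block as a stand-in for image data) of applying the triangular mask above to one 8x8 block. Only 10 of the 64 coefficients survive the mask, so the encoder only needs to allocate bits to those 10 values per block, which is how the mask drives the bits-per-pixel budget.

import numpy as np
from scipy.fft import dctn, idctn

# triangular mask from the question: keep the 10 lowest-frequency coefficients
mask = np.zeros((8, 8))
for i in range(4):
    mask[i, :4 - i] = 1

def compress_block(block, mask):
    # 2-D type-II DCT, zero out the masked coefficients, transform back
    coeffs = dctn(block, norm='ortho')
    return idctn(coeffs * mask, norm='ortho')

block = np.random.rand(8, 8)            # stand-in for one 8x8 image block
reconstructed = compress_block(block, mask)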

Related

Diagonal Direction Edge Detection

Let's say that a given image has an edge at some angle (not negligible), so we use a rotated edge detection mask, for example the Sobel operator rotated by 45 degrees:
0 1 2
-1 0 1
-2 -1 0
or
-2 -1 0
-1 0 1
0 1 2
In this case, how will we find the magnitude of the edge?
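The thread gives no answer, but for illustration, one common approach (my sketch, not from the original post) is to treat the two rotated kernels as an orthogonal pair, exactly like the horizontal/vertical Sobel pair, and combine their responses as sqrt(g1^2 + g2^2):

import numpy as np
from scipy.ndimage import convolve

# the two 45-degree Sobel kernels from the question
k1 = np.array([[ 0,  1, 2],
               [-1,  0, 1],
               [-2, -1, 0]])
k2 = np.array([[-2, -1, 0],
               [-1,  0, 1],
               [ 0,  1, 2]])

def diagonal_edge_magnitude(image):
    # responses along the two diagonal directions
    g1 = convolve(image.astype(float), k1)
    g2 = convolve(image.astype(float), k2)
    # combine as with the standard Sobel pair
    return np.hypot(g1, g2)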

Possibility of only dealing with specific region of binary image

Recently I have been studying image processing.
While going through the problem of hole filling, one step confused me (I assume that anyone able to answer this question is familiar with the procedure, so I will skip straight to the problem):
Let's say if I have a binary image like this:
0 0 0 0 0 0 0
0 0 1 1 0 0 0
0 1 0 0 1 0 0
0 1 0 0 1 0 0
0 0 1 0 1 0 0
0 0 1 0 1 0 0
0 1 0 0 0 1 0
0 1 0 0 0 1 0
0 1 1 1 1 0 0
0 0 0 0 0 0 0
And the book says to start from a point inside the hole and perform the dilation operation, with a bound so that it does not fill the whole image.
I have no problem understanding the whole process, but if I try to code it, how can I deal with only a specific region (inside the hole, in this case)? Or would an actual implementation use a different method?
If you can assume that the object with holes does not touch the border of the image, you can create an intermediate image in which you call flood fill (with a value of e.g. 2) on the top left pixel. Any remaining '0' pixels then have to be inside the contour. Take the position of the first such remaining '0' pixel and flood fill it in the original image.
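A minimal Python/OpenCV sketch of this approach (the function name and the uint8 0/1 image convention are my assumptions, not from the original answer). It marks the exterior with flood fill and then sets every remaining '0' pixel, which is equivalent to flood filling each hole:

import numpy as np
import cv2

def fill_holes(binary):
    # binary: uint8 image, 0 = background, 1 = foreground,
    # object assumed not to touch the image border
    h, w = binary.shape
    outside = binary.copy()
    ff_mask = np.zeros((h + 2, w + 2), np.uint8)  # scratch mask required by cv2.floodFill
    cv2.floodFill(outside, ff_mask, (0, 0), 2)    # mark the exterior with value 2
    filled = binary.copy()
    filled[outside == 0] = 1                      # remaining 0s are hole pixels
    return filled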

Cannot understand Keras ConvLSTM2D

I'm looking at the example: https://github.com/fchollet/keras/blob/master/examples/conv_lstm.py
This RNN actually predicts the next frame of the movie, so the output should be a movie too (according to the test data fed in). I wonder whether information is lost due to the conv layers with padding.
For example, the underlying TensorFlow pads at the bottom right. If there is a large padding (n stands for actual numbers):
n n n n 0 0 0
n n n n 0 0 0
n n n n 0 0 0
n n n n 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
When we do the second conv, the bottom right corner will always be 0, which means back propagation will never be able to capture anything there. Since in this case the movie is a square moving across the whole screen, will information be lost when the validation label is in the bottom right corner?
The answer is yes, according to a Ph.D. AI researcher I asked.
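A minimal numpy sketch of the effect the questioner describes (the dimensions mirror the diagram above and are my choice, not taken from the Keras example): any output pixel whose receptive field lies entirely in the zero region stays zero after a convolution.

import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
frame = np.zeros((7, 7))
frame[:4, :4] = rng.random((4, 4))   # content only in the top-left block

kernel = rng.random((3, 3))
out = convolve2d(frame, kernel, mode='same')
print(out[5:, 5:])                    # bottom-right corner: all zeros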

Why is my convolution result shifted when using FFT

I'm implementing convolutions using the Radix-2 Cooley-Tukey FFT and its inverse, and my output is correct but shifted.
My approach is to zero-pad both the input and the kernel to 2^m for the smallest possible m, transform both using the FFT, multiply the two element-wise, and transform the result back using the inverse FFT.
As an example of the resulting problem:
0 1 2 3 0 0 0 0
4 5 6 7 0 0 0 0
8 9 10 11 0 0 0 0
12 13 14 15 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
with the identity kernel (its single 1 at row 1, column 1)
0 0 0 0
0 1 0 0
0 0 0 0
0 0 0 0
becomes
0 0 0 0 0 0 0 0
0 0 1 2 3 0 0 0
0 4 5 6 7 0 0 0
0 8 9 10 11 0 0 0
0 12 13 14 15 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
It seems that all sizes of inputs and kernels produce the same shift (1 row and 1 column), but I could be wrong. I've performed the same computations using the online calculator at this link and got the same results, so it's probably me missing some fundamental knowledge. My available literature has not helped. So my question: why does this happen?
So I ended up finding the answer to why this happens myself. The answer follows from the definition of the convolution and the indexing it uses. By definition, the convolution of s and k is given by

(s * k)(x) = sum_{n = -inf}^{inf} s(n) k(x - n)

The center of the kernel is not "known" by this formula; it is an abstraction we make. Define c as the index of the kernel's center. When x - n = c in the sum, s(n) is s(x - c), so the sum containing the interesting product s(x - c) k(c) ends up at index x. In other words, the output is shifted to the right by c. In the example above, the kernel's 1 sits at index (1, 1), hence the shift by one row and one column.
FFT fast convolution does a circular convolution. If you zero pad so that both the data and kernel are circularly centered around (0,0) in the same size NxN arrays, the result will also stay centered. Otherwise any offsets will add.
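A minimal numpy sketch of both the shift and the fix described above (numpy's FFT stands in for the custom Radix-2 implementation): with the identity kernel's 1 at index (1, 1), the output shifts by one row and one column; rolling the kernel so its center sits at (0, 0) removes the shift.

import numpy as np

x = np.zeros((8, 8))
x[:4, :4] = np.arange(16).reshape(4, 4)   # the input from the example

k = np.zeros((8, 8))
k[1, 1] = 1                                # identity kernel, center at (1, 1)

shifted = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k)))
# 'shifted' equals x moved down and right by one pixel, as in the example

k0 = np.roll(k, (-1, -1), axis=(0, 1))     # move the kernel center to (0, 0)
fixed = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k0)))
# 'fixed' now equals x: no shift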

OpenCV: Difference between a matrix with 1 column of 8UC3 type and 3 columns of 8UC1

Let's say I create a matrix M1 of 5 rows and 1 column of type 8UC3 to store the RGB components of an image. Then I create another matrix M2 of 5 rows and 3 columns of type 8UC1 to again store the RGB components of the image.
Is there a difference in the way these two matrices are stored in and accessed from memory? From what I understand from http://www.cs.iit.edu/~agam/cs512/lect-notes/opencv-intro/opencv-intro.html#SECTION00053000000000000000 (a commonly recommended OpenCV tutorial on Stack Overflow), the data pointer of the matrix points to the first index of the data array (the matrix is internally stored as an array), and in the 8UC3 case the RGB components are stored in an interleaved fashion.
My logic says that they should be the same: in the case of the 1-column 8UC3 matrix (M1), each element stores all three RGB components, and in the case of the 3-column 8UC1 matrix (M2), each column stores one RGB component.
I hope I have been able to formulate my question well.
Thanks in advance!
Your understanding is correct. The memory layout will be exactly the same, so you can cheaply convert between the two representations via the reshape method.
The thing that would be different is how OpenCV algorithms will handle those matrices.
Let's say the memory footprint is as follows:
255 0 0
255 0 0
255 0 0
255 0 0
255 0 0
Now suppose you call the resize function to double the width (e.g. with nearest-neighbor interpolation). In the case of the 5x1 Mat of CV_8UC3, the result will be
255 0 0 255 0 0
255 0 0 255 0 0
255 0 0 255 0 0
255 0 0 255 0 0
255 0 0 255 0 0
And in the case of the 5x3 Mat of CV_8UC1, the result will be
255 255 0 0 0 0
255 255 0 0 0 0
255 255 0 0 0 0
255 255 0 0 0 0
255 255 0 0 0 0
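A small Python/NumPy sketch of the same point (numpy arrays stand in for cv::Mat here; in C++ the cheap conversion is Mat::reshape): the two matrices share one memory layout, but cv2.resize treats them differently because of the channel interpretation.

import numpy as np
import cv2

m1 = np.full((5, 1, 3), (255, 0, 0), np.uint8)  # 5x1, 3 channels (CV_8UC3)
m2 = m1.reshape(5, 3)                           # 5x3, 1 channel (CV_8UC1)
assert m1.tobytes() == m2.tobytes()             # identical memory layout

# doubling the width duplicates pixels, not raw bytes:
r1 = cv2.resize(m1, (2, 5), interpolation=cv2.INTER_NEAREST)  # rows: 255 0 0 255 0 0
r2 = cv2.resize(m2, (6, 5), interpolation=cv2.INTER_NEAREST)  # rows: 255 255 0 0 0 0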
