Say I have two document vectors, X1 and X2. Now I padded these with zero vectors to meet the maximum document length constraint. Will it impact the similarity between the two vectors? Or, in general, how is it helping us?
I can answer the first part. It will not affect the similarity between the vectors. Usually, in document handling, one uses the cosine distance between vectors. Adding zeros does not change the cosine distance; you are only increasing the dimensionality. For example, in a two-dimensional space, [1, 2] and [3, 4] are two points. In three dimensions, the same points are represented as [1, 2, 0] and [3, 4, 0]. Even though the dimension has increased, the points remain the same.
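A quick numerical check in numpy (a minimal sketch; the vectors and padding length are just for illustration):

import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, 4.0])

# Pad both vectors with zeros up to some maximum length
x1_padded = np.pad(x1, (0, 3))   # [1, 2, 0, 0, 0]
x2_padded = np.pad(x2, (0, 3))   # [3, 4, 0, 0, 0]

print(cosine_similarity(x1, x2))                # ~0.9839
print(cosine_similarity(x1_padded, x2_padded))  # identical value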
I don't understand the purpose of the preprocessing step in HOG (histogram of oriented gradients). In step 1 we normalize the image using a square-root normalization. What is the advantage of this step? We also have a block normalization. Are these two steps the same?
This is easy. Normalization is used to remove local lighting differences. The pattern in a dark scene can be the same as in a light scene, but the values are different. [2 2 3 3] is one edge going from 2 to 3; [4 4 6 6] is basically the same edge, with each value of the first vector doubled. These two vectors are linearly dependent. Normalization is a way to match vectors that describe the same thing under different conditions. The l2 norm of the first vector is sqrt(2^2 + 2^2 + 3^2 + 3^2) ≈ 5.10, and that of the second is sqrt(4^2 + 4^2 + 6^2 + 6^2) ≈ 10.20. If you divide each element of the first vector by 5.10 and each element of the second by 10.20, the result in both cases is approximately [0.4 0.4 0.6 0.6]. They describe the same edge under different lighting conditions. This is basic linear algebra.
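The same computation in numpy (a small sketch of the worked example above):

import numpy as np

dark  = np.array([2.0, 2.0, 3.0, 3.0])   # edge seen in a dark scene
light = np.array([4.0, 4.0, 6.0, 6.0])   # same edge, brighter scene

# L2-normalize each vector: divide by the square root of the sum of squared values
dark_n  = dark  / np.linalg.norm(dark)    # norm ~ 5.10
light_n = light / np.linalg.norm(light)   # norm ~ 10.20

print(dark_n)   # [0.39 0.39 0.59 0.59], i.e. roughly [0.4 0.4 0.6 0.6]
print(light_n)  # identical to dark_n after normalization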
I am new to deep learning and attempting to understand how a CNN performs image classification.
I have gone through multiple YouTube videos, blogs, and papers, and they all mention roughly the same thing:
add filters to get feature maps
perform pooling
remove linearity using ReLU
send to a fully connected network.
While this is all fine and dandy, I don't really understand how convolution works in essence. Take edge detection, for example.
For instance, [[-1, 1], [-1, 1]] detects a vertical edge.
How? Why? How do we know for sure that this will detect a vertical edge?
Similarly, for the matrices used for blurring/sharpening, how do we actually know that they will do what they are designed for?
Do I simply take people's word for it?
Please help. I feel helpless since I am not able to understand convolution and how these matrices detect edges or shapes.
Filters detect spatial patterns such as edges in an image by detecting the changes in intensity values of the image.
A quick recap: in terms of an image, a high-frequency image is one where the intensity of the pixels changes by a large amount, while a low-frequency image is one where the intensity is almost uniform. An image has both high- and low-frequency components. The high-frequency components correspond to the edges of an object, because at the edges the rate of change of pixel intensity is high.
High pass filters are used to enhance the high-frequency parts of an image.
Let's take an example in which a part of your image has the pixel values [[10, 10, 0], [10, 10, 0], [10, 10, 0]], indicating that the pixel values decrease toward the right, i.e. the image changes from light on the left to dark on the right. The filter used here is [[1, 0, -1], [1, 0, -1], [1, 0, -1]].
Now, we take the convolution of these two matrices: multiplying them element-wise gives [[10, 0, 0], [10, 0, 0], [10, 0, 0]]. Finally, these values are summed up to give an output pixel value of 30, which captures the variation in pixel values as we move from left to right. Similarly, we find the subsequent output pixel values.
Here, you will notice that the rate of change of pixel values from left to right is large, so a vertical edge has been detected. Had you used the filter [[1, 1, 1], [0, 0, 0], [-1, -1, -1]], the convolution output would consist of 0s only, i.e. no horizontal edge is present. In the same way, [[-1, 1], [-1, 1]] detects a vertical edge.
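Here is a small numpy sketch of that multiply-and-sum step, using the values from this answer:

import numpy as np

# 3x3 patch: bright (10) on the left, dark (0) on the right -> a vertical edge
patch = np.array([[10, 10, 0],
                  [10, 10, 0],
                  [10, 10, 0]])

vertical_filter = np.array([[1, 0, -1],
                            [1, 0, -1],
                            [1, 0, -1]])

horizontal_filter = np.array([[ 1,  1,  1],
                              [ 0,  0,  0],
                              [-1, -1, -1]])

# One convolution step = element-wise product followed by a sum
print(np.sum(patch * vertical_filter))    # 30 -> strong vertical edge response
print(np.sum(patch * horizontal_filter))  # 0  -> no horizontal edge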
You can check more here in a lecture by Andrew Ng.
Edit: Usually, a vertical edge detection filter has bright pixels on the left and dark pixels on the right (or vice versa). The sum of the filter values should be 0, otherwise the resulting image will become brighter or darker. Also, in convolutional neural networks, the filters are not hand-designed; they are learned, like the other weights, through backpropagation during the training process.
I am building a project that is a basic neural network that takes in a 2x2 image, with the goal of classifying the image as either a forward slash (1-class) or a back slash (0-class). The input data is a flat numpy array: 1 represents a black pixel, 0 represents a white pixel.
0-class: [1, 0, 0, 1]
1-class: [0, 1, 1, 0]
If I start my filter as a random 4x1 matrix, how can I use gradient descent to arrive at either of the perfect matrices [1, -1, -1, 1] or [-1, 1, 1, -1] to classify the data points?
Side note: even when the input is multiplied with the "perfect" answer matrix and then summed, the output would be -2 and 2. Would my data labels need to be -2 and 2? What if I want my classes labeled as 0 and 1?
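A minimal sketch of one way to reconcile the -2 / 2 outputs with 0 / 1 labels (this is my own suggestion, not part of the original setup): pass the weighted sum through a sigmoid so the output is squashed into (0, 1), and learn the 4x1 filter with gradient descent on a cross-entropy loss.

import numpy as np

# Training data: flattened 2x2 images
X = np.array([[1, 0, 0, 1],    # back slash    -> class 0
              [0, 1, 1, 0]])   # forward slash -> class 1
y = np.array([0.0, 1.0])

rng = np.random.default_rng(0)
w = rng.normal(size=4)          # random 4x1 "filter" to start from
b = 0.0
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(1000):
    z = X @ w + b               # weighted sums (would be 2 and -2 for the "perfect" filter)
    p = sigmoid(z)              # squashed into (0, 1)
    # Gradient of the binary cross-entropy loss w.r.t. w and b
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

print(np.round(w, 2))           # ends up roughly along a scaled [-1, 1, 1, -1]
print(sigmoid(X @ w + b))       # close to [0, 1]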
What is the correct meaning of normalization in image processing? I googled it but found different definitions. I'll try to explain each definition in detail.
Normalization of a kernel matrix
If normalization refers to a matrix (such as a kernel matrix for a convolution filter), usually each value of the matrix is divided by the sum of the values of the matrix so that the values sum to one (if all values are greater than zero). This is useful because a convolution between an image matrix and our kernel matrix then gives an output image with values between 0 and the max value of the original image. But if we use a Sobel matrix (which has some negative values), this is not true anymore and we have to stretch the output image so that all values lie between 0 and the max value.
Normalization of an image
I basically found two definitions of normalization. The first one is to "cut" values that are too high or too low, i.e. if the image matrix has negative values, set them to zero, and if the image matrix has values higher than the max value, set them to the max value. The second one is to linearly stretch all the values so that they fit into the interval [0, max value].
I will extend the answer from #metsburg a bit. There are several ways of normalizing an image (or, in general, a data vector), which are used as convenient in different cases:
Data normalization or data (re-)scaling: the data is projected into a predefined range (usually [0, 1] or [-1, 1]). This is useful when you have data from different formats (or datasets) and you want to normalize all of them so that you can apply the same algorithms over them. It is usually performed as follows:
Inew = (I - I.min) * (newmax - newmin)/(I.max - I.min) + newmin
Data standardization is another way of normalizing the data (used a lot in machine learning), where the mean is subtracted from the image and the result is divided by its standard deviation. It is especially useful if you are going to use the image as an input for some machine learning algorithm, as many of them perform better when the features have a Gaussian form with mean = 0 and std = 1. It can be performed easily as:
Inew = (I - I.mean) / I.std
Data stretching (or histogram stretching when you work with images) is what you refer to as your option 2. Usually the image is clamped to minimum and maximum values, setting:
Inew = I.copy()
Inew[Inew < a] = a
Inew[Inew > b] = b
Here, image values lower than a are set to a, and the same happens in reverse with b. Usually, the values of a and b are calculated as percentage thresholds: a = the threshold that separates the bottom 1% of the data, and b = the threshold that separates the top 1% of the data. By doing this, you are removing outliers (noise) from the image.
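A small numpy sketch of this clamping (the helper name and the random test image are mine, just for illustration), using np.percentile to compute a and b at the bottom/top 1%:

import numpy as np

def stretch_clamp(I, low_pct=1, high_pct=99):
    # a and b are the thresholds that cut off the bottom / top 1% of the data
    a = np.percentile(I, low_pct)
    b = np.percentile(I, high_pct)
    # Values below a become a, values above b become b
    return np.clip(I, a, b)

img = np.random.default_rng(0).normal(loc=128, scale=40, size=(64, 64))
print(stretch_clamp(img).min(), stretch_clamp(img).max())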
This is similar to (but simpler than) histogram equalization, which is another commonly used preprocessing step.
Data normalization can also refer to normalizing a vector with respect to a norm (the l1 norm or the l2/Euclidean norm). In practice, this translates to:
Inew = I / ||I||
where ||I|| refers to a norm of I.
If the norm is chosen to be the l1 norm, the image is divided by the sum of its absolute values, making the sum of the whole image equal to 1. If the norm is chosen to be the l2 (Euclidean) norm, the image is divided by the square root of the sum of its squared values, making the sum of the squared values of I equal to 1.
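In numpy, the two cases look like this (a quick sketch):

import numpy as np

I = np.array([[1.0, 2.0], [3.0, 4.0]])

# l1 normalization: divide by the sum of absolute values -> values sum to 1
I_l1 = I / np.sum(np.abs(I))
print(I_l1.sum())                 # 1.0

# l2 normalization: divide by the sqrt of the sum of squares -> squared values sum to 1
I_l2 = I / np.sqrt(np.sum(I ** 2))
print(np.sum(I_l2 ** 2))          # 1.0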
The first 3 are widely used with images (not all 3 together, as scaling and standardization are incompatible, but one of them, or scaling + stretching, or standardization + stretching); the last one is not that useful. It is usually applied as a preprocessing step for some statistical tools, but not if you plan to work with a single image.
The answer by #Imanol is great, I just want to add some examples:
Normalize the input either pixel-wise or dataset-wise. Three normalization schemes are often seen:
Normalizing the pixel values between 0 and 1:
img /= 255.0
Normalizing the pixel values between -1 and 1 (as TensorFlow does):
img /= 127.5
img -= 1.0
Normalizing according to the dataset mean & standard deviation (as Torch does):
img /= 255.0
mean = [0.485, 0.456, 0.406] # Here it's ImageNet statistics
std = [0.229, 0.224, 0.225]
for i in range(3): # Considering a CHW ordering (channel, height, width) for a single image
    img[i, :, :] -= mean[i]
    img[i, :, :] /= std[i]
In data science, there are two broadly used normalization types:
1) Where we scale the data so that its sum is a particular value, usually 1 (https://stats.stackexchange.com/questions/62353/what-does-it-mean-to-use-a-normalizing-factor-to-sum-to-unity)
2) Normalize data to fit it within a certain range (usually, 0 to 1): https://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range
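Both types in numpy, as a quick sketch:

import numpy as np

x = np.array([2.0, 3.0, 5.0])

# 1) Normalize so the values sum to 1 (a normalizing factor to sum to unity)
x_sum1 = x / x.sum()                          # [0.2, 0.3, 0.5]

# 2) Normalize to the [0, 1] range (min-max scaling)
x_01 = (x - x.min()) / (x.max() - x.min())    # [0.0, 0.333..., 1.0]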
In "Adaptive document image binarization" paper (link: http://www.mediateam.oulu.fi/publications/pdf/24.p) I found SDM, TBM algorithm for Text/Image Segmentation,
But I can't understand what "same quarter" is in the followed this paragraph.
If the average is high and a global histogram peak is in the same quarter of the histogram and transient difference is transient, then use SDM.
If the average is medium and a global histogram peak is not in the same quarter of the histogram and transient difference is uniform, then use TBM.
I know that a quarter means 1/4, but I think "quarter" has a different meaning here... right?
After skimming the paper very quickly, I found two possible ways to interpret this.
From the current bin, choose a quarter of the histogram by looking 1/8th to the left and 1/8th to the right. i.e. if your histogram has 256 bins, and you are at bin 50, the quarter you are looking for is [18, 81]. So if the average is high and the peak lies in [18,81], use SDM.
Divide the entire histogram into quarters, and check which quarter your current bin lies in. i.e. if your histogram has 256 bins, divide it into [0, 63], [64, 127], [128, 191], [192, 255]. If your current bin is 50, you are in quarter 1, and so if the average is medium and the peak lies anywhere outside quarter 1, use TBM.
Based on intuition and mathematical sense, option 1 is more likely. But I would try both and see which implementation gives better results.
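A small Python sketch of the two interpretations (the helper names and the example bin numbers are mine, not from the paper):

import numpy as np

def peak_in_sliding_quarter(hist, current_bin, peak_bin):
    # Interpretation 1: a quarter-wide window centred on the current bin
    # (1/8 of the histogram to the left, 1/8 to the right)
    eighth = len(hist) // 8
    return (current_bin - eighth) <= peak_bin <= (current_bin + eighth)

def peak_in_same_fixed_quarter(hist, current_bin, peak_bin):
    # Interpretation 2: split the histogram into four fixed quarters
    # and check whether both bins fall in the same one
    quarter = len(hist) // 4
    return (current_bin // quarter) == (peak_bin // quarter)

hist = np.zeros(256)
hist[70] = 1.0                        # put the global peak at bin 70
peak = int(np.argmax(hist))
print(peak_in_sliding_quarter(hist, 50, peak))     # True  (70 lies in [18, 82])
print(peak_in_same_fixed_quarter(hist, 50, peak))  # False (bin 50 is in [0, 63], bin 70 in [64, 127])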