For MSE the best constant is the mean, for MAE the median. How can I calculate the best constant for MAPE? For example, what is the best constant in the array [12, 31, 51, 76]?
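One way to approach this: minimizing the sum of |y_i - c| / |y_i| over c is a weighted MAE problem with weights 1 / |y_i|, so the minimizer is a weighted median of the targets. The sketch below assumes all targets are strictly positive; the function names `best_constant_mape` and `mape` are made up for illustration.

```python
def best_constant_mape(values):
    """Weighted median of the targets with weights 1 / |y| (assumes no zeros)."""
    pairs = sorted((v, 1.0 / abs(v)) for v in values)
    half_weight = sum(w for _, w in pairs) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        # The first value whose cumulative weight reaches half the total
        # weight is a weighted median.
        if cumulative >= half_weight:
            return value


def mape(values, constant):
    """Mean absolute percentage error of predicting `constant` everywhere."""
    return sum(abs(v - constant) / abs(v) for v in values) / len(values)


targets = [12, 31, 51, 76]
best = best_constant_mape(targets)
print(best, mape(targets, best))
```

For [12, 31, 51, 76] the weight of 12 alone (1/12) already exceeds half of the total weight, so the weighted median is 12: small targets dominate because errors are divided by the target.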
I am new to deep learning and am trying to understand how a CNN performs image classification.
I have gone through multiple YouTube videos, blogs, and papers, and they all mention roughly the same steps:
add filters to get feature maps
perform pooling
introduce non-linearity using ReLU
send to a fully connected network.
While this is all fine and dandy, I don't really understand how convolution works in essence. Take edge detection, for example.
For instance, [[-1, 1], [-1, 1]] detects a vertical edge.
How? Why? How do we know for sure that this will detect a vertical edge?
Similarly, for the matrices used for blurring or sharpening, how do we actually know that they will do what they are meant to do?
Do I simply take people's word for it?
Please help. I feel helpless since I am not able to understand convolution and how the matrices detect edges or shapes.
Filters detect spatial patterns such as edges in an image by detecting the changes in intensity values of the image.
A quick recap: a high-frequency image is one where the intensity of the pixels changes by a large amount over a short distance, while a low-frequency image is one where the intensity is almost uniform. An image has both high- and low-frequency components. The high-frequency components correspond to the edges of an object, because at the edges the rate of change of the pixel intensity is high.
High pass filters are used to enhance the high-frequency parts of an image.
As an example, suppose a part of your image has pixel values [[10, 10, 0], [10, 10, 0], [10, 10, 0]], meaning the pixel values decrease toward the right, i.e. the image changes from light on the left to dark on the right. The filter used here is [[1, 0, -1], [1, 0, -1], [1, 0, -1]].
Now we multiply these two matrices element by element, which gives [[10, 0, 0], [10, 0, 0], [10, 0, 0]]. These values are then summed to give a single output pixel value of 30, which measures the variation in pixel values as we move from left to right. The filter is then slid across the image and the subsequent output pixels are computed in the same way.
The large output shows that the pixel values change a lot from left to right, so a vertical edge has been detected. Had you used the filter [[1, 1, 1], [0, 0, 0], [-1, -1, -1]], the products would sum to 0, because the top and bottom rows cancel, i.e. no horizontal edge is present. In the same way, [[-1, 1], [-1, 1]] detects a vertical edge.
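The arithmetic in this example can be checked with a few lines of plain Python. This is only a sketch of a single filter position (no sliding, no padding); `filter_response` and `flat_patch` are names introduced here for illustration.

```python
def filter_response(patch, kernel):
    """Multiply a patch and a kernel element by element and sum the products."""
    return sum(
        p * k
        for patch_row, kernel_row in zip(patch, kernel)
        for p, k in zip(patch_row, kernel_row)
    )


# Light on the left, dark on the right: a vertical edge.
patch = [[10, 10, 0], [10, 10, 0], [10, 10, 0]]
# Uniform region with no edge at all (added for comparison).
flat_patch = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]

vertical_kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
horizontal_kernel = [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]

vertical_response = filter_response(patch, vertical_kernel)      # 30
horizontal_response = filter_response(patch, horizontal_kernel)  # 0
flat_response = filter_response(flat_patch, vertical_kernel)     # 0

print(vertical_response, horizontal_response, flat_response)
```

The vertical filter subtracts the right column from the left column, so it responds only when the two sides differ. On a uniform patch the positive and negative weights cancel, which is why the response is 0 there.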
You can find more on this in a lecture by Andrew Ng.
Edit: Usually, a vertical edge detection filter has positive (bright) values on the left and negative (dark) values on the right, or vice versa. The values of the filter should sum to 0; otherwise the resulting image becomes brighter or darker overall. Also, in convolutional neural networks the filter values are not hand-picked: they are weights (parameters, not hyperparameters) learned through backpropagation during training.
How does categorical accuracy work? By definition:
categorical_accuracy checks to see if the index of the maximal true
value is equal to the index of the maximal predicted value.
and
Calculates the mean accuracy rate across all predictions for
multiclass classification problems
What does it mean in practice? Let's say I am predicting the bounding box of an object.
It has (xmin, ymin, xmax, ymax). Does it check whether the predicted xmin is equal to the real xmin? So if xmin and xmax were the same in the predicted and real values, and ymin and ymax were different, would I get 50%?
Please help me understand this concept.
Traditionally, in multiclass classification each label is an integer (or, equivalently, a categorical value); for example:
labels = [0, 1, 2]
The output of a multiclass classification prediction will typically be a probability distribution of confidences; for example:
preds = [0.25, 0.5, 0.25]
Normally, the index of the most likely class is taken as the predicted label. In this case, argmax(preds) is 1, which maps to label 1.
You can see the total accuracy of your predictions via a confusion matrix, where one axis is the "true" value and the other axis is the "predicted" value. Each cell CM[y_true][y_pred] holds the number of instances with that true label and that predicted label. The accuracy is the sum of the main diagonal of the matrix (y_true = y_pred) divided by the total number of instances evaluated.
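A minimal sketch of the argmax comparison and the confusion-matrix view in plain Python. It is not the Keras implementation; the data and the helper names (`argmax`, `categorical_accuracy`, `confusion_matrix`) are made up for illustration, and the true labels are assumed to be one-hot vectors.

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])


def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted argmax matches the true argmax."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[true_class][predicted_class] = number of samples in that cell."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[argmax(t)][argmax(p)] += 1
    return cm


# Four samples, three classes; true labels are one-hot.
y_true = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
y_pred = [[0.25, 0.5, 0.25], [0.6, 0.3, 0.1], [0.5, 0.2, 0.3], [0.1, 0.7, 0.2]]

accuracy = categorical_accuracy(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred, n_classes=3)
diagonal_accuracy = sum(cm[i][i] for i in range(3)) / len(y_true)
print(accuracy, cm, diagonal_accuracy)
```

Each sample counts as either a hit or a miss, so there is no partial credit for individual entries of the output vector. Three of the four samples match here, and the diagonal of the confusion matrix gives the same 0.75.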
In the RNN world, does it matter which end of an input sequence is padded so that all sequences have the same length?
Example
pad_left = [0, 0, 0, 0, 5, 4, 3, 2]
pad_right = [5, 4, 3, 2, 0, 0, 0, 0]
The authors of the paper Effects of Padding on LSTMs and CNNs ran experiments on pre-padding and post-padding. Here is their conclusion.
For LSTMs, the accuracy with post-padding (50.117%) is far lower than with pre-padding (80.321%).
Pre-padding versus post-padding doesn't matter much to CNNs because, unlike LSTMs, CNNs don't try to remember information from previous outputs, but instead try to find patterns in the given data.
I did not expect the padding position to have such a big effect, so I suggest you verify it yourself.
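If you want to run that comparison yourself, the two padding schemes can be produced with a small helper. This is a plain-Python sketch; `pad_sequence` and its parameters are hypothetical names, not a library API.

```python
def pad_sequence(tokens, length, side="pre", pad_value=0):
    """Pad a list of token ids to `length` on the chosen side."""
    if len(tokens) > length:
        raise ValueError("sequence is longer than the target length")
    padding = [pad_value] * (length - len(tokens))
    # "pre" puts the padding before the tokens, "post" puts it after them.
    return padding + list(tokens) if side == "pre" else list(tokens) + padding


tokens = [5, 4, 3, 2]
pad_left = pad_sequence(tokens, 8, side="pre")    # [0, 0, 0, 0, 5, 4, 3, 2]
pad_right = pad_sequence(tokens, 8, side="post")  # [5, 4, 3, 2, 0, 0, 0, 0]
print(pad_left, pad_right)
```

With pre-padding the real tokens are the last thing a recurrent network sees before producing its final state, whereas with post-padding the network has to carry that information through a run of padding steps.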
Say I have two document vectors, X1 and X2. I have padded them with zeros to meet the maximum document length constraint. Will this affect the similarity between the two vectors? Or, in general, how does it help us?
I can answer the first part. It will not affect the similarity between the vectors. Usually, in document handling, one uses the cosine distance between vectors. Adding zeros does not change the cosine distance, because zeros contribute nothing to the dot product or to the vector norms. You are only increasing the dimensionality. For example, in a two-dimensional space [1, 2] and [3, 4] are two points. In three dimensions, the same points are represented as [1, 2, 0] and [3, 4, 0]. Even though the dimension has increased, the points remain the same.
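A quick check of this claim in plain Python, using the two points from the example; `cosine_similarity` is a helper written here for illustration.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


original = cosine_similarity([1, 2], [3, 4])
padded = cosine_similarity([1, 2, 0], [3, 4, 0])
print(original, padded)
```

Both calls return the same value (about 0.984), since the padded coordinate adds 0 to the dot product and to each squared norm.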
In the paper "Adaptive document image binarization" (link: http://www.mediateam.oulu.fi/publications/pdf/24.p) I found the SDM and TBM algorithms for text/image segmentation,
but I can't understand what "same quarter" means in the following paragraph.
If the average is high and a global histogram peak is in the same quarter of the histogram and transient difference is transient, then use SDM.
If the average is medium and a global histogram peak is not in the same quarter of the histogram and transient difference is uniform, then use TBM.
I know that a quarter means 1/4, but I think "quarter" has a different meaning here. Is that right?
After skimming the paper very quickly, I found two possible ways to interpret this.
1. From the current bin, choose a quarter of the histogram by looking 1/8th to the left and 1/8th to the right. For example, if your histogram has 256 bins and you are at bin 50, the quarter you are looking for is [18, 81]. So if the average is high and the peak lies in [18, 81], use SDM.
2. Divide the entire histogram into quarters, and check which quarter your current bin lies in. For example, if your histogram has 256 bins, divide it into [0, 63], [64, 127], [128, 191], [192, 255]. If your current bin is 50, you are in quarter 1, so if the average is medium and the peak lies anywhere outside quarter 1, use TBM.
Based on intuition and mathematical sense, option 1 is more likely. But I would try both and see which implementation gives better results.
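To try both, the two readings of "same quarter" can be written as small predicates. This is a sketch of the two interpretations above, not the paper's definition; the function names and the default of 256 bins are assumptions.

```python
def same_quarter_window(current_bin, peak_bin, n_bins=256):
    """Reading 1: a quarter-wide window centered on the current bin."""
    eighth = n_bins // 8
    # For 256 bins and current_bin = 50 this is the window [18, 81].
    return current_bin - eighth <= peak_bin <= current_bin + eighth - 1


def same_quarter_fixed(current_bin, peak_bin, n_bins=256):
    """Reading 2: the histogram is split into four fixed quarters."""
    quarter = n_bins // 4
    return current_bin // quarter == peak_bin // quarter


# A peak at bin 70 is "in the same quarter" as bin 50 under reading 1 only.
print(same_quarter_window(50, 70), same_quarter_fixed(50, 70))
```

The two readings disagree near the fixed quarter boundaries (here bin 70 falls in [64, 127] but still inside the window around 50), so those are the cases worth comparing on real images.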