I am building a convolution autoencoder that uses MSE as its error function. How is MSE defined for images? If the image is presented in simple matrix form, is MSE simply the square of the difference of individual determinants? Or is it the square of the determinant of the difference of the matrices?
There is no determinant involved when calculating MSE. MSE stands for Mean Squared Error, and it is simply a sum over squares of the differences per each single pixel in your matrix. In other words - cost is model agnostic, MSE is defined in exactly the same way whether you use conv-autoencoder, simple autoencoder or simple MLP.
Related
Iam a little bit confused about how to normalize/standarize image pixel values before training a convolutional autoencoder. The goal is to use the autoencoder for denoising, meaning that my traning images consists of noisy images and the original non-noisy images used as ground truth.
To my knowledge there are to options to pre-process the images:
- normalization
- standarization (z-score)
When normalizing using the MinMax approach (scaling between 0-1) the network works fine, but my question here is:
- When using the min max values of the training set for scaling, should I use the min/max values of the noisy images or of the ground truth images?
The second thing I observed when training my autoencoder:
- Using z-score standarization, the loss decreases for the two first epochs, after that it stops at about 0.030 and stays there (it gets stuck). Why is that? With normalization the loss decreases much more.
Thanks in advance,
cheers,
Mike
[Note: This answer is a compilation of the comments above, for the record]
MinMax is really sensitive to outliers and to some types of noise, so it shouldn't be used it in a denoising application. You can use quantiles 5% and 95% instead, or use z-score (for which ready-made implementations are more common).
For more realistic training, normalization should be performed on the noisy images.
Because the last layer uses sigmoid activation (info from your comments), the network's outputs will be forced between 0 and 1. Hence it is not suited for an autoencoder on z-score-transformed images (because target intensities can take arbitrary positive or negative values). The identity activation (called linear in Keras) is the right choice in this case.
Note however that this remark on activation only concerns the output layer, any activation function can be used in the hidden layers. Rationale: negative values in the output can be obtained through negative weights multiplying the ReLU output of hidden layers.
I have built a FCN for image segmentation. The object to be segmented is only very few pixels relatively to the image size (1024x1024). This results in that the accuracy is very high, even if I only train with 10 images instead of 18000 (my full training set).
My approach to solve this is to use some kind of weighted accuracy, so that the accuracy actually say something about the performance of identifying the small object (now it gets high accuracy since so many pixels are not the object and by not classifying anything the accuracy still gets high).
How do I decide the weight, anybody with some experience?
As you wrote, use a custom weight function which penalizes misclassification of underrepresented pixels more. You can get the weight by calculating the quotient between the number of object pixels versus all of the pixels in the image, or you can try it by hand - just make sure you follow the metrics which tell you the accuracy of object pixels. Hope it helps.
You can use infogain loss layer for a "weighted" loss.
The infogain loss is a generalization of the cross entropy loss commonly used. It is defined using a weight matrix H (of size L-by-L, where L is the number of classes):
L(p) = -H log(p)
Where p is a vector of class probabilities.
You can find more details on this loss here.
I am trying to implement the stochastic diagonal Levenberg-Marquardt method for Convolutional Neural Network in order to back propagate for learning weights.
i am new in it, and quite confused in it, so I have few questions, i hope you may help me.
1) How can i calculate the second order derivative at output layer from the two outputs.
As i in first order derivative i have to subtract output from desired output and multiply it with derivative of the output.
But in second derivative how can i do that?
2) In MaxPooling layer of convolutional Neural Network, I select max value in 2x2 window, and multiply it with the weight, now Does i have to pass it through activation function or not?
Can some one give me explanation how to do it in opencv, or how with mathematical explanation or any reference which show the mathematics.
thanks in advance.
If you have calculated Jacobian matrix already (the matrix of partial first order derivatives) then you can obtain an approximation of the Hessian (the matrix of partial second order derivatives) by multiplying J^T*J (if residuals are small).
You can calculate second derivative from two outputs: y and f(X) and Jacobian this way:
In other words Hessian approximation B is chosen to satisfy:
In this paper you can find more about it.
Ananth Ranganathan: The Levenberg-Marquardt Algorithm
Can both Naive Bayes and Logistic regression classify both of these dataset perfectly ? My understanding is that Naive Bayes can , and Logistic regression with complex terms can classify these datasets. Please help if I am wrong.
Image of datasets is here:
Lets run both algorithms on two similar datasets to the ones you posted and see what happens...
EDIT The previous answer I posted was incorrect. I forgot to account for the variance in Gaussian Naive Bayes. (The previous solution was for naive bayes using Gaussians with fixed, identity covariance, which gives a linear decision boundary).
It turns out that LR fails at the circular dataset while NB could succeed.
Both methods succeed at the rectangular dataset.
The LR decision boundary is linear while the NB boundary is quadratic (the boundary between two axis-aligned Gaussians with different covariances).
Applying NB the circular dataset gives two means in roughly the same position, but with different variances, leading to a roughly circular decision boundary - as the radius increases, the probability of the higher variance Gaussian increases compared to that of the lower variance Gaussian. In this case, many of the inner points on the inner circle are incorrectly classified.
The two plots below show a gaussian NB solution with fixed variance.
In the plots below, the contours represent probability contours of the NB solution.
This gaussian NB solution also learns the variances of individual parameters, leading to an axis-aligned covariance in the solution.
Naive Bayes/Logistic Regression can get the second (right) of these two pictures, in principle, because there's a linear decision boundary that perfectly separates.
If you used a continuous version of Naive Bayes with class-conditional Normal distributions on the features, you could separate because the variance of the red class is greater than that of the blue, so your decision boundary would be circular. You'd end up with distributions for the two classes which had the same mean (the centre point of the two rings) but where the variance of the features conditioned on the red class would be greater than that of the features conditioned on the blue class, leading to a circular decision boundary somewhere in the margin. This is a non-linear classifier, though.
You could get the same effect with histogram binning of the feature spaces, so long as the histograms' widths were narrow enough. In this case both logistic regression and Naive Bayes will work, based on histogram-like feature vectors.
How would you use Naive Bayes on these data sets?
In the usual form, Naive Bayes needs binary / categorial data.
In the original paper of HOG (Histogram of Oriented Gradients) http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf there are some images, which shows the hog representation of an image (Figure 6).In this figure the f, g part says "HOG descriptor weighted by respectively the positive and the negative SVM weights".
I don't understand what does this mean. I understand that when I train a SVM, I get a Weigth vector, and to classify, I have to use the features (HOG descriptors) as the input of the function. So what do they mean by positive and negative weigths? And how would I plot them like the paper?
Thanks in Advance.
The weights tell you how significant a specific element of the feature vector is for a given class. That means that if you see a high value in your feature vector you can lookup the corresponding weight
If the weight is a high positiv number it's more likely that your object is of the class
If your weight is a high negative number it's more likely that your object is NOT of the class
If your weight is close to zero this position is mostly irrelavant for the classification
Now your using those weights to scale the feature vector you have where the length of the gradients are mapped to the color-intensity. Because you can't display negative color intensities they decided to split the positive and negative visualization. In the visualizations you can now see which parts of the input-image contributes to the class (positiv) and which don't (negative).