I would like to know whether the LRN layer implementation in TensorFlow differs from the Caffe implementation.
From http://caffe.berkeleyvision.org/tutorial/layers/lrn.html I see that the formula is (1 + (alpha/n) * sum_of_square) ** beta. But in the TensorFlow documentation here
https://www.tensorflow.org/api_docs/python/tf/nn/local_response_normalization
the division by n is missing, i.e. the sum_of_square value is not divided by n.
Is the documentation just missing this part, or has the implementation itself been changed in TensorFlow?
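To make the difference concrete, here is a toy check of the two denominators in NumPy (variable names are mine; in TF terms, n corresponds to depth_radius * 2 + 1):

import numpy as np

# Toy comparison of the two LRN denominators on a single window.
n = 5                                   # size of the normalization window
alpha, beta, bias = 1e-4, 0.75, 1.0
x = np.random.rand(n).astype(np.float32)
sum_of_square = np.sum(x ** 2)

caffe_denom = (bias + (alpha / n) * sum_of_square) ** beta  # Caffe formula
tf_denom = (bias + alpha * sum_of_square) ** beta           # TF-docs formula

# The two coincide only if TF is given alpha/n in place of alpha:
tf_denom_rescaled = (bias + (alpha / n) * sum_of_square) ** beta
print(np.isclose(caffe_denom, tf_denom_rescaled))           # True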
Thanks in advance!
I'm trying to implement DeepMind's DNC (the Nature paper) with PyTorch 0.4.0.
When implementing the LSTM variant they use, I ran into trouble with the dimensions.
To simplify, suppose BATCH = 1.
The equations they list in the paper compute the gates and the state from [x; h], where [x; h] means the concatenation of x and h into one single vector, and i, f and o are column vectors.
My question is about how the state s_t is computed.
The second addend is obtained by multiplying i with a column vector, so the result is either a scalar (if i is transposed first and a dot product taken) or ill-defined (two column vectors cannot be matrix-multiplied).
So the state would end up a single scalar...
By the same reasoning the hidden state h_t would be a scalar too, but it has to be a column vector.
Obviously I'm wrong somewhere, but I can't figure out where.
By looking at the Wikipedia LSTM article I think I figured it out.
This is the formal definition of the standard LSTM found in the article:

f_t = σ(W_f x_t + U_f h_{t-1} + b_f)
i_t = σ(W_i x_t + U_i h_{t-1} + b_i)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)
c_t = f_t ∘ c_{t-1} + i_t ∘ tanh(W_c x_t + U_c h_{t-1} + b_c)
h_t = o_t ∘ tanh(c_t)

The circle (∘) represents the element-by-element product.
By using this product in the corresponding parts of the DNC equations (s_t and o_t), the dimensions work out.
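A minimal PyTorch sketch of one step (the sizes are made up by me; the structure follows the equations above) showing that element-wise products keep s_t and h_t as vectors:

import torch

# One LSTM step with BATCH = 1; hypothetical sizes.
x_size, h_size = 4, 3
x = torch.randn(x_size)       # input x_t
h = torch.randn(h_size)       # previous hidden state h_{t-1}
s = torch.randn(h_size)       # previous cell state s_{t-1}

xh = torch.cat([x, h])        # [x; h]: concatenation into one vector

# One weight matrix per gate, each mapping [x; h] to a vector of size h_size.
W_i, W_f, W_o, W_s = (torch.randn(h_size, x_size + h_size) for _ in range(4))

i = torch.sigmoid(W_i @ xh)   # input gate, shape (3,)
f = torch.sigmoid(W_f @ xh)   # forget gate, shape (3,)
o = torch.sigmoid(W_o @ xh)   # output gate, shape (3,)

# Element-wise products (*) keep everything a vector of size h_size;
# a dot product here would collapse the state to a scalar.
s_t = f * s + i * torch.tanh(W_s @ xh)
h_t = o * torch.tanh(s_t)
print(s_t.shape, h_t.shape)   # torch.Size([3]) torch.Size([3])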
I'm trying to define a custom loss function for Caffe using a Python layer, but I can't work out what the required output is.
Suppose the loss for the layer is defined as L = sum(F(xi, yi)) / batch_size, where L is the loss function to be minimized (i.e. top[0]), x is the network output (bottom[0]), y is the ground-truth label (i.e. bottom[1]), and xi, yi are the i-th samples in a batch.
The widely known example with EuclideanLossLayer (https://github.com/BVLC/caffe/blob/master/examples/pycaffe/layers/pyloss.py) shows that backward in this case must return bottom[0].diff[i] = dL(x,y)/dxi. Another reference I've found shows the same: Implement Bhattacharyya loss function using python layer Caffe
But in other examples I have seen that it should be multiplied by top[0].diff.
What is correct: bottom[0].diff[i] = dL/dxi, or bottom[0].diff[i] = dL/dxi * top[0].diff[i]?
Each loss layer may have a loss_weight, indicating the "importance" of that specific loss (in case there are several loss layers in the net). Caffe implements this weight as top[0].diff, which is multiplied into the gradients.
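For example, here is how the pyloss.py example linked in the question could factor in top[0].diff (a sketch; everything except the top[0].diff[0] factor is taken from that example):

import caffe
import numpy as np

class WeightedEuclideanLossLayer(caffe.Layer):
    """pyloss.py's EuclideanLossLayer, with top[0].diff folded into backward."""

    def setup(self, bottom, top):
        if len(bottom) != 2:
            raise Exception("Need two inputs to compute distance.")

    def reshape(self, bottom, top):
        if bottom[0].count != bottom[1].count:
            raise Exception("Inputs must have the same dimension.")
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        top[0].reshape(1)                     # the loss is a scalar

    def forward(self, bottom, top):
        self.diff[...] = bottom[0].data - bottom[1].data
        top[0].data[...] = np.sum(self.diff ** 2) / bottom[0].num / 2.

    def backward(self, top, propagate_down, bottom):
        for i in range(2):
            if not propagate_down[i]:
                continue
            sign = 1 if i == 0 else -1
            # top[0].diff holds the loss_weight; scale the local gradient by it.
            bottom[i].diff[...] = sign * self.diff * top[0].diff[0] / bottom[i].num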
Let's back off to basic principles: the purpose of back-propagation is to adjust the layer weights according to the ground-truth feedback. The most basic parts of this include "how far off is my current guess" and "how hard should I yank the change lever?" These are formalized as top.diff and learning_rate, respectively.
At a micro level, the ground truth for each layer is that top feedback, so top.diff is the local avatar of "how far off ...". Thus at some point, you need to include top[0].diff as a primary factor in your adjustment computation.
I know this isn't a complete, direct answer -- but I hope it continues to help even after you solve the immediate problem.
I am using the pylearn2 library to design a CNN. I want to use Leaky ReLUs as the activation function in one layer. Is there any way to do this using pylearn2? Do I have to write a custom function for it, or does pylearn2 have built-in functions for that? If so, how do I write the custom code? Can anyone please help me out here?
The ConvElemwise super-class is a generic convolutional elementwise layer. Among its subclasses, ConvRectifiedLinear is a convolutional rectified linear layer that uses the RectifierConvNonlinearity class.
In the apply() method:
p = linear_response * (linear_response > 0.) + \
    self.left_slope * linear_response * (linear_response < 0.)
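Since that nonlinearity already takes a left_slope, a leaky ReLU layer should just be a ConvRectifiedLinear with a nonzero left_slope. A usage sketch (the other argument values are placeholders of mine, and names may differ between pylearn2 versions):

from pylearn2.models.mlp import ConvRectifiedLinear

# A convolutional layer whose rectifier has a nonzero slope for negative
# inputs, i.e. a leaky ReLU.
layer = ConvRectifiedLinear(
    layer_name='conv1',
    output_channels=64,
    irange=0.05,
    kernel_shape=[5, 5],
    pool_shape=[2, 2],
    pool_stride=[2, 2],
    left_slope=0.01,    # the leak: slope used where linear_response < 0
)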
As this gentle review points out:
... Maxout neuron (introduced recently by Goodfellow et al.) that generalizes the ReLU and its leaky version.
Examples are MaxoutLocalC01B or MaxoutConvC01B.
The reason for the lack of an answer in pylearn2-user may be that pylearn2 is mostly written by researchers at the LISA lab, and thus the threshold for point 13 in the FAQ may be high.
I'm trying to use the PCA class in OpenCV to perform principal component analysis in my C++ application. I'm new to OpenCV and I'm having a problem, so I would appreciate some help.
I'm running a demo example both in Matlab and with the PCA class to check the answers.
When I use a 2*10 data array with the parameter CV_PCA_DATA_AS_COL, I have two dimensions, so I expect 2 eigenvectors with 2 elements each; this worked fine as expected, with the same results as Matlab.
But when using a 10*2 data array (generally, when the number of samples is less than the number of dimensions), I get a 2*10 array of eigenvectors, i.e. 2 eigenvectors with 10 elements each. This is not what I expected, and it is not the result given by Matlab (Matlab gives a 10*10 matrix of eigenvectors).
I don't know why I'm getting these results, and because of this I can't project the data onto the principal components in my application. Any help?
P.S.: The code I used:
Mat Mean;
Mat H(10, 2, CV_32F);                            // the matrix is then filled with data
PCA pca(H, Mean, CV_PCA_DATA_AS_COL, 0);
pca.operator()(H, Mean, CV_PCA_DATA_AS_COL, 0);  // re-runs the same analysis
cout << pca.eigenvectors.rows << endl;           // gives 2 instead of 10
cout << pca.eigenvectors.cols << endl;           // gives 10
I'd state it as follows:
If the number of samples is less than the data dimension then the number of retained components will be clamped at the number of samples.
We did 3x3 PCA for a mechanics subject at uni, and some non-linear control algorithms used similar approaches. My memory is foggy, but it may have something to do with assumptions regarding pseudo-inverses and non-square matrices...
Once you delve into the theory (web-search 'pca with less samples than dimensions'), it gets messy fast!
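The clamping is easy to reproduce from Python, assuming the cv2 bindings behave like the C++ class (PCACompute treats rows as samples):

import numpy as np
import cv2

# 2 samples in 10 dimensions: fewer samples than dimensions.
data = np.random.rand(2, 10).astype(np.float32)
mean, eigenvectors = cv2.PCACompute(data, mean=None)
print(eigenvectors.shape)   # (2, 10): 2 components, clamped at the sample count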
I just want to clarify something about PCA in OpenCV. Suppose I have two rows of data (A, B).
A 3 8 7
B 2 4 5
If I wanted to create a PCA model in OpenCV, what must I do to the data? Do I have to subtract the means (e.g. subtract the mean of A from its data points) or does the PCA function do this?
Someone said that OpenCV PCA expects the data to be normalised (between 0 and 1). If so, how do I normalise?
Hope someone can clarify this for me as PCA in OpenCV is very badly documented on the Net.
Cheers...
The data for PCA in OpenCV does not need to be normalized. But if you already have the mean (from some previous calculations), you can pass it to the PCACompute() function to speed it up.
From the OpenCV refman:

PCACompute(data[, mean[, eigenvectors[, maxComponents]]]) → mean, eigenvectors

Parameters:
data – Input samples stored as the matrix rows or as the matrix columns.
mean – Optional mean value. If the matrix is empty (noArray()), the mean is computed from the data.
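For example, from Python (using the A/B data from the question; cv2 binding names assumed):

import numpy as np
import cv2

data = np.array([[3, 8, 7],      # row A
                 [2, 4, 5]],     # row B
                dtype=np.float32)

# Empty mean: PCACompute subtracts the mean itself, no manual centering needed.
mean, eigenvectors = cv2.PCACompute(data, mean=None)

# If the mean is already known from previous calculations, pass it in
# so it is not recomputed.
known_mean = np.mean(data, axis=0).reshape(1, -1)
mean2, eigenvectors2 = cv2.PCACompute(data, mean=known_mean)
print(mean, eigenvectors.shape)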
There is a good article on data normalization on Wikipedia.
For complete documentation, check out the opencv.pdf file that should be in the doc/ folder of your installation. In some versions it is named opencv2refman.pdf.
And also try to find the book "Learning OpenCV" by Gary Bradski; everything is explained very well there.