Caffe only trains for one label

This is really weird. I'm implementing this model:
Except that I read data from a text file using an ImageData layer with batch_size: 1. There are only two labels, and the text file is organized as usual:
/home/.../pathToFile 0
...
/home/.../pathToFile 1
Still, Caffe only trains and tests label 0!
I run caffe using the regular tool.
./build/tools/caffe train --solver=solver.prototxt
When I open the net in pycaffe I get this message for the first time ever:
WARNING: Logging before InitGoogleLogging() is written to STDERR
and the size of the
net.blobs['label'].data
is now 1, when it should be 2!
Not only that, but the label seems to be a float rather than an integer.
In: net.blobs['label'].data
Out: array([ 0.], dtype=float32)
I know that this has worked before; I just can't get my head around what I'm doing wrong or where to begin troubleshooting.

The output shape of your network depends on the input batch_size: if you define batch_size: 1, then your net processes a single example at a time and therefore reads only a single label. If you change batch_size to 2, Caffe will read two samples and consequently the shape of label will become 2.
One exception to this "shape rule" is the loss output: the loss defines a scalar function with respect to which gradients are computed. Thus, the loss output will always be a scalar regardless of the input shape.
Regarding the data type of label: Caffe stores all variables in "Blobs" of type float32.
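As a quick sanity check, here is a minimal pycaffe sketch (the prototxt file name is a placeholder for your own train/val definition) showing that the label blob simply follows batch_size:
import caffe

# Placeholder path: substitute the prototxt that contains your ImageData layer.
net = caffe.Net('train_val.prototxt', caffe.TRAIN)
net.forward()  # reads one batch through the ImageData layer

# With batch_size: 1 this prints (1,); with batch_size: 2 it prints (2,).
print(net.blobs['label'].data.shape)
print(net.blobs['label'].data)  # labels are stored as float32, e.g. array([0.], dtype=float32)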

Related

How to interpret patterns occurring in a convolutional layer after training?

I am pretty sure I understood the principle of CNNs and why they are preferred over plain fully connected neural networks. What I am trying to comprehend is how to interpret the patterns that occur after training the model.
So let's assume I want to recognize the number "1" written on a 256x256 image plane (a 1-bit image, black/white) that is then forwarded to the output, which either says "is a one" or "is not a one".
If the model is untrained and the first handwritten "1" is forwarded, the result could be [0.28, 0.72], which is obviously wrong. I then calculate the error between [0.28, 0.72] and [1, 0] (for example based on the mean squared error), differentiate it, and try to find a local minimum using the derivative (backpropagation). Then I calculate the delta values for each weight (using the chain rule and partial derivatives) until I finally reach the convolutional layer, for which delta values are also calculated.
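To make the error step concrete, here is a tiny numpy sketch (the numbers are taken from the example above) of the mean squared error between the prediction [0.28, 0.72] and the target [1, 0], together with its gradient with respect to the prediction, which is where backpropagation starts:
import numpy as np

prediction = np.array([0.28, 0.72])   # untrained network output for a "1"
target = np.array([1.0, 0.0])         # desired output: "is a one"

mse = np.mean((prediction - target) ** 2)            # 0.5184
grad = 2 * (prediction - target) / prediction.size   # [-0.72,  0.72]

print(mse, grad)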
But my question now is: what exactly do the patterns that emerge from adding a bunch of delta values to the convolutional layer "weights" mean? Why do they pick out certain features characteristic of the number "1"? Or is it more that the network does not find any specific features per se, but rather "encodes" the relationship between handwritten "1"s and the desired output [1, 0] into the convolutional layers?

RNN Encoder Decoder using keras

I am trying to build an architecture which will be used for machine translation (from English to French):
from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

model = Sequential()
model.add(LSTM(256, input_shape=(15, 1)))         # encoder
model.add(RepeatVector(output_sequence_length))   # repeat the encoding for each output timestep
model.add(LSTM(21, return_sequences=True))        # decoder
model.add(TimeDistributed(Dense(french_vocab_size, activation='sigmoid')))
The maximum length of an English sentence is 15 and of a French sentence is 21. The English vocabulary has 199 words and the French has 399. output_sequence_length is 21.
This model throws an error:
Error when checking input: expected lstm_40_input to have shape (None, 15, 1) but got array with shape (137861, 21, 1)
I am stuck on understanding the LSTM in Keras.
1. According to the documentation, the first argument must be the 'dimensionality of the output space'. I did not understand what that means.
2. What exactly happens when return_sequences is set to True?
Please let me know.
What kind of data are you trying to feed your network? It seems to me that you didn't convert your words to vectors (binary vectors or encoded vectors).
In any case, an LSTM network needs a 3-dimensional input, with dimensions corresponding to (samples, timesteps, features).
In your case, samples corresponds to the number of your sentences, I guess 137861. Timesteps corresponds to the length of each sequence, which in your case is 15, and features is the size of each encoded word (depending on which type of encoding you choose; if you choose one-hot encoding, it will be 199).
The error that you got shows that you fed your network sequences with 21 timesteps instead of 15.
For your second question: when return_sequences is set to False, the LSTM layer returns only one output per sample, which in your case will have shape (256,) for your first LSTM layer. When it is set to True, it returns one output per timestep, giving an overall output of shape (15, 256). When you want to stack two or more LSTM layers, you always have to set the earlier layers to return_sequences=True.
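To make the shape difference concrete, here is a minimal Keras sketch (assuming one-hot encoded English words of size 199, as suggested above; the variable names are illustrative, not the asker's exact model):
from keras.models import Sequential
from keras.layers import LSTM

# One output vector per sample: shape (None, 256)
last_only = Sequential([LSTM(256, input_shape=(15, 199), return_sequences=False)])
print(last_only.output_shape)   # (None, 256)

# One output vector per timestep: shape (None, 15, 256)
per_step = Sequential([LSTM(256, input_shape=(15, 199), return_sequences=True)])
print(per_step.output_shape)    # (None, 15, 256)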
Also, what you are building is called a many-to-many architecture, with different timestep lengths for the input and the output (15 vs 21). As far as I know, that is not straightforward to implement in Keras.

Sampled softmax loss over variable sequence batches?

Background info: I'm working on sequence-to-sequence models, and right now my model accepts variable-length input tensors (not lists) with input shapes corresponding to [batch size, sequence length]. However, in my implementation, the sequence length is unspecified (set to None) to allow for variable-length inputs. Specifically, input sequence batches are padded only to the length of the longest sequence in that batch. This has sped up my training time considerably, so I'd prefer to keep it this way, as opposed to going back to bucketed models and/or padding all sequences in the training data to the same length. I'm using TensorFlow 1.0.0.
Problem: I'm currently using the following to compute the loss (which runs just fine).
loss = tf.losses.sparse_softmax_cross_entropy(
    labels=target_labels,             # shape: [batch size, None]
    logits=outputs[:, :-1, :],        # shape: [batch size, None, vocab size]
    weights=target_weights[:, :-1])   # shape: [batch size, None]
where the vocab size is typically about 40,000. I'd like to use a sampled softmax, but I've run into an issue due to the unspecified nature of the input shape. According to the documentation for tf.nn.sampled_softmax_loss, it requires the inputs to be fed separately for each timestep. However, I can't call, for example,
tf.unstack(target_labels, axis=1)
since the axis is unknown beforehand. Does anyone know how I might go about implementing this? One would assume that since both dynamic_rnn and tf.losses.sparse_softmax_cross_entropy seem to have no issue doing this, a workaround could be implemented with the sampled softmax loss somehow. After digging around in the source code and even the models repository, I've come up empty-handed. Any help/suggestions would be greatly appreciated.
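For what it's worth, one possible direction (a sketch only, with hypothetical names such as decoder_outputs, proj_w and proj_b standing in for the pre-projection decoder states and the output projection; not a tested solution for the setup above) is to flatten the dynamic batch and time dimensions with tf.reshape, which does not need the sequence length to be known, and feed the flattened tensors to tf.nn.sampled_softmax_loss:
import tensorflow as tf

hidden_size, vocab_size, num_sampled = 128, 40000, 512  # illustrative sizes

# Hypothetical tensors: pre-projection decoder states, sparse targets, output projection.
decoder_outputs = tf.placeholder(tf.float32, [None, None, hidden_size])  # [batch, time, hidden]
target_labels = tf.placeholder(tf.int64, [None, None])                   # [batch, time]
proj_w = tf.get_variable('proj_w', [vocab_size, hidden_size])
proj_b = tf.get_variable('proj_b', [vocab_size])

# tf.reshape with -1 works even though the time dimension is unknown.
hidden = tf.reshape(decoder_outputs, [-1, hidden_size])   # [batch*time, hidden]
labels = tf.reshape(target_labels, [-1, 1])               # [batch*time, 1]

sampled_loss = tf.nn.sampled_softmax_loss(
    weights=proj_w, biases=proj_b,
    labels=labels, inputs=hidden,
    num_sampled=num_sampled, num_classes=vocab_size)      # one loss value per position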

When I run my own images on Caffe, it stops at Iteration 0, Testing net (#0)

I ran caffe and got this output:
Who can tell me what the problem is? I would really appreciate it!
It seems like one (or more) of your label values is invalid; see this PR for information:
If you have an invalid ground truth label, "SoftmaxWithLoss" will silently access invalid memory [...] The old check only worked in DEBUG mode and also only worked for CPU.
Make sure your prediction vector length matches the number of labels you try to predict.
From your comments, it seems like you have labels in the range 0..10575, but on the other hand your classification layer, "fc7", only predicts probabilities for 1000 classes. Thus, the "SoftmaxWithLoss" layer tries to compute the loss for a label l ≥ 1000, accesses memory outside the probability array, and crashes with a segmentation fault.
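A quick way to catch this before training (a minimal sketch; the list-file name and class count are placeholders for your own setup) is to scan the labels in your ImageData list file and compare them against num_output of the classification layer:
num_output = 1000  # must match num_output of the last inner-product layer ("fc7")

labels = []
with open('train.txt') as f:               # lines of the form "<path> <label>"
    for line in f:
        labels.append(int(line.rsplit(' ', 1)[1]))

print('label range:', min(labels), '..', max(labels))
assert max(labels) < num_output, 'some labels exceed the number of predicted classes'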

How to generate the predicted label in caffe from the output of the last layer?

I have trained LeNet on my own dataset of images (11x27 traffic light images), using Caffe and the DIGITS interface. I get 99% accuracy, and when I give it new images via DIGITS, it predicts the correct label, so the network seems to work very well.
However, I struggle to predict the labels through the Python/Matlab API for Caffe. The output of the last layer (ip2) is a vector with 2 elements (I have 2 classes), which looks like [4.8060, -5.2608] for example (the first component is always positive, the second always negative, and the absolute values range from 4 to 20). I know this from many tests in Python, Matlab and DIGITS.
My problem is:
Argmax can't work directly on this layer (it always gives 0).
If I use a softmax function, it always gives me [1, 0] (and that's actually the value of net.blobs['prob'] or out['prob'] in the Python interface, no matter the class of my image).
So, how can I get the correct predicted label?
Thanks!
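For reference, a small numpy sketch reproducing the observation above: for an output such as [4.8060, -5.2608], argmax is 0 and the softmax is numerically almost exactly [1, 0]:
import numpy as np

ip2 = np.array([4.8060, -5.2608])         # example last-layer output
prob = np.exp(ip2) / np.exp(ip2).sum()    # softmax

print(np.argmax(ip2))   # 0
print(prob)             # roughly [9.9996e-01, 4.2e-05]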
