What is a "label" in Caffe? - machine-learning

In Caffe, when you define the inputs for the network in the protobuf file, you can input "data" and "label". I'm guessing label contains the expected output for the training data (what is normally called the y values in the machine-learning literature).
My problem is that in the caffe.proto file, label is defined as a scalar (an int or long). With data, at least, I can set it to a numpy array, because it takes string values. If I'm training for more than one prediction output, how can I pass it as an array?
Or am I mistaken? What is label? What is it for? And how can I pass the y values to caffe?

The basic use case of Caffe used to be image classification: assigning a single integer label per input image. Thus the "datum" data structure reserves space for a 4D float array (batches of 3-channel images) and a single integer "label" per image in the batch.
This restriction can easily be overcome by using an HDF5 input data layer.
See e.g., this answer.
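For illustration, here is a minimal sketch (assuming h5py is available; the file names, array sizes, and 4-dimensional targets are hypothetical) of how a multi-dimensional "label" can be written to HDF5 for Caffe's HDF5Data layer:

import h5py
import numpy as np

# Hypothetical training arrays: 100 images of shape 3x32x32 and a
# 4-dimensional target vector (the "y" values) per image.
data = np.random.rand(100, 3, 32, 32).astype(np.float32)
label = np.random.rand(100, 4).astype(np.float32)

with h5py.File("train.h5", "w") as f:
    f.create_dataset("data", data=data)    # dataset names must match the layer's "top" blobs
    f.create_dataset("label", data=label)

# The HDF5Data layer's "source" parameter points to a text file listing the .h5 files.
with open("train_h5_list.txt", "w") as f:
    f.write("train.h5\n")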

Related

What is Sequence length in LSTM?

The dimensions of the input data for an LSTM are [Batch Size, Sequence Length, Input Dimension] in TensorFlow.
What is the meaning of Sequence Length and Input Dimension?
How do we assign values to them if my input data is of the form:
[[[1.23] [2.24] [5.68] [9.54] [6.90] [7.74] [3.26]]] ?
LSTMs are a subclass of recurrent neural networks. Recurrent neural nets are, by definition, applied to sequential data, which without loss of generality means data samples that change over a time axis. A full history of a data sample is then described by the sample values over a finite time window, i.e. if your data live in an N-dimensional space and evolve over t time steps, your input representation must be of shape (num_samples, t, N).
Your data does not fit the above description. I assume, however, that this representation means you have a scalar value x which evolves over 7 time instances, such that x[0] = 1.23, x[1] = 2.24, etc.
If that is the case, you need to reshape your input such that instead of a list of 7 elements, you have an array of shape (7, 1). Then your full data can be described by a 3rd-order tensor of shape (num_samples, 7, 1), which can be accepted by an LSTM.
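A minimal sketch, assuming the seven values above are one sample of a 1-dimensional series:

import numpy as np

x = np.array([1.23, 2.24, 5.68, 9.54, 6.90, 7.74, 3.26])  # 7 scalar time steps
x = x.reshape(1, 7, 1)  # (num_samples, t, N): 1 sample, 7 time steps, 1 feature per step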
Simply put, seq_len is the number of time steps that will be fed into the LSTM network. Let's understand this with an example.
Suppose you are doing sentiment classification using an LSTM.
Your input sentence to the network is ["I hate to eat apples"]. Every token is fed as input at one time step, so the seq_len here is the total number of tokens in the sentence, which is 5.
Coming to input_dim: as you might know, we can't feed words directly to the network; you need to encode those words as numbers. In PyTorch/TensorFlow this is done with embedding layers, where we have to specify the embedding dimension.
Suppose your embedding dimension is 50. That means the embedding layer takes the index of each token and converts it into a vector representation of size 50, so the input_dim of the LSTM network becomes 50.
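A minimal PyTorch sketch of this setup (the vocabulary size, hidden size, and token ids are hypothetical):

import torch
import torch.nn as nn

vocab_size, embedding_dim, hidden_size = 1000, 50, 64       # illustrative sizes
embed = nn.Embedding(vocab_size, embedding_dim)
lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_size, batch_first=True)

tokens = torch.tensor([[11, 52, 7, 301, 9]])   # "I hate to eat apples" as 5 hypothetical token ids
vectors = embed(tokens)                        # shape (1, 5, 50): batch, seq_len, input_dim
output, (h, c) = lstm(vectors)                 # output shape (1, 5, 64)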

cntk.input_variable() v.s. cntk.sequence.input_variable()

What is the difference between cntk.input_variable() and cntk.sequence.input_variable()? I want to create some placeholders for the inputs and outputs, but I am not sure which one I should use. P.S. My CNTK version is 2.4.
cntk.input_variable creates an input variable that only has a single dynamic axis (the batch axis). Values passed to this type of input variable should be samples (or minibatches of samples), not sequences. cntk.sequence.input_variable creates an input variable with two dynamic axes, a sequence axis and a batch axis. The expected input value for this type of input is an entire sequence (of an arbitrary length).
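Roughly, for CNTK 2.x (a sketch; the 3-dimensional static shape is only illustrative):

import cntk as C

# One dynamic axis (batch): each bound value is a single 3-dim sample, or a minibatch of them.
x = C.input_variable(3)

# Two dynamic axes (batch and sequence): each bound value is a whole
# variable-length sequence of 3-dim samples.
s = C.sequence.input_variable(3)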

Input of LSTM seq2seq network - Tensorflow

Using the Tensorflow seq2seq tutorial code I am creating a character-based chatbot. I don't use word embeddings. I have an array of characters (the alphabet and some punctuation marks) and special symbols like the GO, EOS and UNK symbol.
Because I'm not using word embeddings, I use the standard tf.nn.seq2seq.basic_rnn_seq2seq() seq2seq model. However, I am confused about what shape encoder_inputs and decoder_inputs should have. Should they be an array of integers, corresponding to the index of the characters in the alphabet-array, or should I turn those integers into one-hot vectors first?
How many input nodes does one LSTM cell have? Can you specify that? Because I guess in my case an LSTM cell should have an input neuron for each letter in the alphabet (therefore the one-hot vectors?).
Also, what is the LSTM "size" you have to pass in the constructor tf.nn.rnn_cell.BasicLSTMCell(size)?
Thank you.
Appendix: these are the bugs I am trying to fix.
When I use the following code, according to the tutorial:
for i in xrange(buckets[-1][0]):  # Last bucket is the biggest one.
    self.encoder_inputs.append(tf.placeholder(tf.int32, shape=[None], name="encoder{0}".format(i)))
for i in xrange(buckets[-1][1] + 1):
    self.decoder_inputs.append(tf.placeholder(tf.int32, shape=[None], name="decoder{0}".format(i)))
    self.target_weights.append(tf.placeholder(dtype, shape=[None], name="weight{0}".format(i)))
And run the self_test() function, I get the error:
ValueError: Linear is expecting 2D arguments: [[None], [None, 32]]
Then, when I change the shapes in the above code to shape=[None, 32] I get this error:
TypeError: Expected int32, got -0.21650635094610965 of type 'float' instead.
The number of inputs of an LSTM cell is the dimension of whatever tensor you pass as inputs to the tf.rnn function when instantiating things.
The size argument is the number of hidden units in your LSTM (so a bigger number is slower but can lead to more accurate models).
I'd need a bigger stack trace to understand these errors.
It turns out the size argument passed to BasicLSTMCell represents both the size of the hidden state of the LSTM and the size of the input layer. So if you want a different hidden size than input size, you can first propagate your inputs through an additional projection layer or use the built-in seq2seq word embeddings function.
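As a rough illustration of that last point, using the old TF 1.x API the question is based on (the vocabulary and hidden sizes are hypothetical), character ids can be projected to the cell's input size with an embedding lookup before being fed to the LSTM:

import tensorflow as tf

vocab_size = 40     # alphabet + punctuation + GO/EOS/UNK symbols (hypothetical count)
hidden_size = 128   # the "size" passed to BasicLSTMCell

# Integer character ids for one time step, shape [batch_size].
char_ids = tf.placeholder(tf.int32, shape=[None], name="char_ids")

# Learned projection from vocab_size to hidden_size; equivalent to a one-hot
# encoding followed by a linear layer.
embedding = tf.get_variable("embedding", [vocab_size, hidden_size])
inputs = tf.nn.embedding_lookup(embedding, char_ids)   # shape [batch_size, hidden_size]

cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_size)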

Image classification with Sift features and Knn?

Can you help me with image classification using SIFT features?
I want to classify images based on SIFT features:
Given a training set of images, extract SIFT descriptors from them.
Compute k-means over the entire set of SIFT descriptors extracted from the training set. The "K" parameter (the number of clusters) depends on the number of SIFT descriptors you have for training, but is usually around 500-8000 (the higher, the better).
Now you have obtained K cluster centers.
You can compute the descriptor of an image by assigning each SIFT descriptor of the image to one of the K clusters. In this way you obtain a histogram of length K.
I have 130 images in the training set, so my training set is 130*K dimensional.
I want to classify my test images. I have 1 image, so my sample is 1*K dimensional. I wrote this code: knnclassify(sample, trainingset, group).
I want to classify into 7 groups. So: knnclassify(sample (1*10), trainingset (130*10), group (7*1)).
The error is: The length of GROUP must equal the number of rows in TRAINING. What can I do?
Straight from the docs:
CLASS = knnclassify(SAMPLE, TRAINING, GROUP) classifies each row of the data in SAMPLE into one of the groups in TRAINING using the nearest-neighbor method. SAMPLE and TRAINING must be matrices with the same number of columns. GROUP is a grouping variable for TRAINING. Its unique values define groups, and each element defines the group to which the corresponding row of TRAINING belongs. GROUP can be a numeric vector, a string array, or a cell array of strings. TRAINING and GROUP must have the same number of rows.
What this means is that group should be 130x1 and should indicate which group each of the training samples belongs to. unique(group) should return 7 values in your case: the seven categories represented in your training set.
If you don't already have a group vector that specifies which category each image falls into, you could use kmeans to split your training set into 7 groups:
group = kmeans(trainingset,7);
knnclassify(sample, trainingset, group);
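For completeness, here is a rough Python sketch of the bag-of-visual-words pipeline described in the question (using OpenCV and scikit-learn rather than MATLAB; the image path lists and the 130-element label vector are assumed to be given):

import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def sift_descriptors(image_paths):
    # Extract SIFT descriptors per image (requires an OpenCV build that includes SIFT).
    sift = cv2.SIFT_create()
    per_image = []
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(gray, None)
        per_image.append(desc if desc is not None else np.empty((0, 128), np.float32))
    return per_image

def bow_histograms(per_image_desc, kmeans):
    # Assign each descriptor to its nearest cluster and build a length-K histogram per image.
    k = kmeans.n_clusters
    hists = np.zeros((len(per_image_desc), k))
    for i, desc in enumerate(per_image_desc):
        if len(desc):
            hists[i] = np.bincount(kmeans.predict(desc), minlength=k)
    return hists

# train_paths, test_paths and train_labels (130 class ids in 0..6) are assumed to be given.
train_desc = sift_descriptors(train_paths)
kmeans = KMeans(n_clusters=500).fit(np.vstack(train_desc))      # visual vocabulary
training = bow_histograms(train_desc, kmeans)                   # 130 x K matrix
sample = bow_histograms(sift_descriptors(test_paths), kmeans)   # 1 x K matrix
pred = KNeighborsClassifier(n_neighbors=1).fit(training, train_labels).predict(sample)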

how to predict using scikit?

I have trained an estimator, called clf, using the fit method and saved the model to disk. The next time the program runs, it will load clf from disk.
My problem is:
How do I predict a sample that is saved on disk? I mean, how do I load it and predict?
How do I get the sample's label name instead of the label integer after predict?
How do I predict a sample that is saved on disk? I mean, how do I load it and predict?
You have to use the same array representation for the new samples as the one used for the samples passed to the fit method. If you want to predict a single sample, the input must be a 2D numpy array with shape (1, n_features).
The way to read your original file on the HDD and convert it to a numpy array representation suitable for the classifier is a domain-specific issue: it depends on whether you are trying to classify text files, jpeg files, frames in a video file, rows in a database, log lines for syslog-monitored services...
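For example, a minimal sketch assuming the estimator was persisted with joblib and the sample sits in a CSV file (both file names are hypothetical):

import numpy as np
from joblib import load

clf = load("clf.joblib")                       # the estimator previously saved with joblib.dump
x = np.loadtxt("sample.csv", delimiter=",")    # one row of n_features values
prediction = clf.predict(x.reshape(1, -1))     # predict expects a 2D array of shape (1, n_features)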
How do I get the sample's label name instead of the label integer after predict?
Just keep a list of label names and ensure that the integers used as target values when fitting are in the range [0, n_classes). For instance, with label_names = ['ham', 'spam'] and predictions in {0, 1}, you can do:
new_samples = ...  # 2D array with shape (n_samples, n_features)
label_names = ['ham', 'spam']
predictions = [label_names[pred] for pred in clf.predict(new_samples)]
