How to configure a convolution network that maps sentences to a label? - deeplearning4j

I'd like to train a model which maps a sentence to a label (e.g. "Canon EOS 77D DSLR Camera" maps to a label "Digital Camera").
I understand that strings need to be converted to a vector first. I found an example of word2vec which does this.
I then found a separate example on how to build a convolution network.
That said, I don't understand how to put it all together. Given:
A text file containing: sentence,label
A word2vec trained against all sentences, labels
How do I parse the text file into vectors (taken from word2vec) and pass it into a convolution network for training?

Answering my own question:
Convert sentences, labels to vectors using Word2Vec: [example]
Use CnnSentenceDataSetIterator to feed training/test data into a convolution network using the aforementioned Word2Vec: [example]
There is also an example using ParagraphVectorsClassifier that does this without a convolution network.
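
As a rough illustration of the data flow only (this is Python with a toy lookup table, not the DL4J API), each line of the file can be split into a sentence and a label, and each sentence turned into a fixed-size matrix of word vectors that a convolution layer can consume. The vectors, `MAX_LEN`, and the helper names below are invented for the example; a real pipeline would take the vectors from the trained Word2Vec model.

```python
import numpy as np

# Toy stand-in for a trained word2vec model: token -> 4-dimensional vector.
# These numbers are invented for illustration.
WORD_VECTORS = {
    "canon": np.array([0.1, 0.3, 0.0, 0.5]),
    "eos": np.array([0.2, 0.1, 0.4, 0.0]),
    "camera": np.array([0.9, 0.0, 0.1, 0.2]),
}
VECTOR_SIZE = 4
MAX_LEN = 6  # sentences are truncated or zero-padded to this many tokens


def parse_line(line):
    """Split 'sentence,label' on the last comma, so commas inside the sentence survive."""
    sentence, label = line.rsplit(",", 1)
    return sentence.strip(), label.strip()


def sentence_to_matrix(sentence):
    """Return a [MAX_LEN, VECTOR_SIZE] matrix; tokens without a vector are skipped."""
    matrix = np.zeros((MAX_LEN, VECTOR_SIZE))
    known = [t for t in sentence.lower().split() if t in WORD_VECTORS]
    for row, token in enumerate(known[:MAX_LEN]):
        matrix[row] = WORD_VECTORS[token]
    return matrix


sentence, label = parse_line("Canon EOS 77D DSLR Camera,Digital Camera")
features = sentence_to_matrix(sentence)
print(label, features.shape)
```

Splitting on the last comma is a design choice that assumes labels never contain commas; a proper CSV reader would be safer for quoted fields.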


Time series Autoencoder with Metadata

At the moment I'm trying to build an Autoencoder for detecting anomalies in time series data.
My approach is based on this tutorial:
But, as is often the case, my data is more complex than in this simple tutorial.
I have two different time series from two sensors, plus some metadata, such as which machine each time series was recorded on.
With a normal MLP network you could have one network for the time series and one for the metadata and merge them in higher layers. But how can you use this data as input to an Autoencoder?
Do you have any ideas, or links to tutorials or papers that I haven't found?
In this tutorial you can see an LSTM-VAE where the input time series is somehow concatenated with categorical data:
There is an article explaining the code (but not in detail). There you can find the following explanation of the model:
"The encoder consists of an LSTM cell. It receives as input 3D sequences resulting from the concatenation of the raw traffic data and the embeddings of categorical features. As in every encoder in a VAE architecture, it produces a 2D output that is used to approximate the mean and the variance of the latent distribution. The decoder samples from the 2D latent distribution upsampling to form 3D sequences. The generated sequences are then concatenated back with the original categorical embeddings which are passed through an LSTM cell to reconstruct the original traffic sequences."
But sadly I don't understand exactly how they concatenate the input data. If you understand it, it would be nice if you could explain it =)
I think I understood it. You have to take a look at the input of the .fit() function. It is not one array; there are separate arrays for the separate categorical features. Additionally, there is the original input (in this case a time series). Because there are so many arrays in the input, the author needs a corresponding number of input layers. So there is one Input layer for the time series, another for the same time series (it's an autoencoder, so x_train works like y_train), and a list of Input layers, directly stacked with the embedding layers, for the categorical data. Once all the data is in the corresponding Input layers, they can be concatenated as you said.
By the way, the author uses the same list for the decoder to give it additional information. I tried it out, and it turned out to be helpful to add a dropout layer (high dropout, e.g. 0.6) between the additional inputs and the decoder. If you do so, the decoder has to learn from the latent z and not only from the additional data!
hope I could help you =)
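
To make the concatenation step concrete, here is a minimal NumPy sketch of the shapes involved. It assumes the metadata (a machine ID) is constant over each sequence and is therefore repeated along the time axis; the sizes, the random embedding table, and all variable names are made up for illustration, and in a real model the embedding table would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)

batch, timesteps = 8, 30
n_sensors = 2        # two sensor channels per time step
n_machines = 5       # hypothetical number of distinct machines
embedding_dim = 3    # hypothetical embedding size for the machine ID

# Raw inputs: one 3D array for the sensors, one integer array for the metadata.
sensors = rng.normal(size=(batch, timesteps, n_sensors))
machine_id = rng.integers(0, n_machines, size=batch)

# Embedding table (learned in a real model, random here).
machine_table = rng.normal(size=(n_machines, embedding_dim))

# Look up one embedding per sample, then repeat it along the time axis
# so that it lines up with the sensor sequence.
machine_emb = machine_table[machine_id]                              # (batch, embedding_dim)
machine_seq = np.repeat(machine_emb[:, None, :], timesteps, axis=1)  # (batch, timesteps, embedding_dim)

# Concatenate on the feature axis: this is the 3D sequence an encoder LSTM would receive.
encoder_input = np.concatenate([sensors, machine_seq], axis=-1)
print(encoder_input.shape)  # (8, 30, 5)
```

In a Keras model the same idea is usually expressed with one input per array, an embedding layer per categorical input, and a concatenation layer on the last axis.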

What is the difference between feature embeddings and image features?

I have to calculate the similarity between 2 images, and I was advised to use feature embeddings of the images extracted by auto-encoders rather than features extracted by a CNN.
What exactly is the difference? Why can feature embeddings be used to calculate similarity, but not image features extracted by a CNN?
I have a high-level idea of image features: they are the data generated by running a single forward pass through a pre-trained network and taking the output of the (N-1)th layer, not the prediction layer (softmax or sigmoid).
And I know that a word embedding projects a given word into a more convenient feature space.
But what is the intuition behind embeddings for images?
When should one be used over the other?
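
For reference, whichever network produces the vectors, the similarity itself is typically computed on two fixed-length vectors. The sketch below uses cosine similarity on made-up 4-dimensional vectors; the vectors and the function name are assumptions for illustration, not the output of any particular auto-encoder or CNN.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 means same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical embeddings of two images.
image_a = [0.2, 0.8, 0.1, 0.4]
image_b = [0.1, 0.9, 0.0, 0.5]
print(cosine_similarity(image_a, image_b))
```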

Character-Word Embeddings from lm_1b in Keras

I would like to use some pre-trained word embeddings in a Keras NN model, which have been published by Google in a very well known article. They have provided the code to train a new model, as well as the embeddings here.
However, it is not clear from the documentation how to retrieve an embedding vector for a given string of characters (word) with a simple Python function call. Much of the documentation seems to center on dumping vectors to a file for an entire sentence, presumably for sentiment analysis.
So far, I have seen that you can feed in pretrained embeddings with the following syntax:
embedding_layer = Embedding(number_of_words??,
However, converting the different files and their structures to pre_trained_matrix_here is not quite clear to me.
They have several softmax outputs, so I am uncertain which one would be the right one, and furthermore how to align the words in my input with the dictionary of words that they provide.
Is there a simple manner to use these word/char embeddings in keras and/or to construct the character/word embedding portion of the model in keras such that further layers may be added for other NLP tasks?
The Embedding layer only looks up embeddings (rows of the weight matrix) by the integer indices of the input words; it does not know anything about the strings. This means you first need to convert your input sequence of words to a sequence of indices using the same vocabulary as was used in the model you take the embeddings from.
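
A minimal sketch of that word-to-index step, using NumPy only. The vocabulary, the `<unk>` convention, and the matrix values are hypothetical; in practice both the vocabulary and the matrix come from the pretrained model's files.

```python
import numpy as np

# Hypothetical vocabulary; row i of the matrix is the vector for vocab[i].
vocab = ["<pad>", "<unk>", "the", "cat", "sat"]
word_to_index = {word: i for i, word in enumerate(vocab)}

embedding_dim = 3
# Placeholder values standing in for the pretrained vectors.
pretrained_matrix = np.arange(len(vocab) * embedding_dim, dtype=float).reshape(
    len(vocab), embedding_dim
)


def encode(words):
    """Map words to integer indices, sending out-of-vocabulary words to <unk>."""
    unk = word_to_index["<unk>"]
    return [word_to_index.get(word, unk) for word in words]


indices = encode(["the", "cat", "flew"])
vectors = pretrained_matrix[indices]  # row lookup, which is what an embedding layer does
print(indices, vectors.shape)
```

A matrix of this shape, `(vocabulary size, embedding dimension)`, is what would be supplied as the initial weights of a Keras Embedding layer.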
For NLP applications related to word or text encoding I would use CountVectorizer or TfidfVectorizer. Both are briefly described for Python in the following reference:
CountVectorizer can be used for simple applications such as a spam/ham detector, while TfidfVectorizer gives deeper insight into how relevant each term (word) is, based on its frequency in the document and the number of documents in which it appears. The result is a useful measure of how discriminative each term is. These text feature extractors can also be combined with stop-word removal and lemmatization to improve the feature representation.
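
To show what the two weightings measure, here is a hand-rolled sketch in plain Python. It uses the textbook formula tf × log(N / df) on three invented documents; scikit-learn's TfidfVectorizer uses a smoothed variant and normalizes each row by default, so its numbers will differ.

```python
import math
from collections import Counter

# Invented toy corpus.
docs = [
    "win cash now",
    "meeting at noon",
    "win a free meeting",
]
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents each term appears.
doc_freq = Counter(term for doc in tokenized for term in set(doc))


def term_counts(doc):
    """Raw counts per term, the quantity a count-based vectorizer records."""
    return Counter(doc)


def tfidf(doc):
    """Textbook tf-idf: relative term frequency times log inverse document frequency."""
    counts = term_counts(doc)
    return {
        term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
        for term, count in counts.items()
    }


weights = tfidf(tokenized[0])
print(weights)
```

Here "cash" gets a higher weight than "win" in the first document because "win" also appears in another document and is therefore less discriminative.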

Using Bag of words/features and neural network

I'm trying to implement an object detection module which contains the following steps:
1) extract image descriptors with SURF, creating a matrix of size [x, 64], where x depends on the number of keypoints found in the image;
2) fix the descriptor size to a [k,64] format using the bag of features/words approach, where k is the number of clusters created using k-means.
3) feed a neural network using the resulting bag of words matrix as trainingSamples.
So far I've implemented steps 1 and 2, but I'm not quite sure how to format the output vector of the NN. With OpenCV's CvANN_MLP, the output matrix must have the same number of rows as the input matrix (otherwise it throws an exception), but the number of input rows is the number of clusters k from step 2, so I don't understand how to write the output matrix based on that.
I know the output matrix should have n columns corresponding to the number of classes that I want (e.g. 3 classes: cat, dog and bird will result in a matrix with 3 columns), but how do I organize the rows of this matrix based on the input rows? I read this related post, which uses MATLAB and says that each feature should be a row, but I'm not sure how to do this in OpenCV C++.
If anyone has any ideas/tips on how to proceed, it would be much appreciated.
Have you done this:
However, before you train your neural network, as you suspected, you
must represent every image you wish to train with this feature vector.
Before feeding your neural network? I lack experience with neural networks; however, after reading this and your question, it seems that you are trying to feed the bag-of-words clusters themselves to your neural network, which is incorrect.
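
A sketch of the usual arrangement, written in Python/NumPy rather than OpenCV C++ to keep it short: each image is reduced to one histogram over the k visual words, so the training input has one row per image and the output has one matching row per image. The random arrays stand in for SURF descriptors and k-means centroids, and all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 10          # number of k-means clusters (visual words)
n_classes = 3   # cat, dog, bird

# Stand-in for the k-means centroids from step 2: shape [k, 64].
centroids = rng.normal(size=(k, 64))


def bow_histogram(descriptors):
    """Map an [x, 64] descriptor matrix to one normalized histogram of length k."""
    # Squared distance from every descriptor to every centroid: shape [x, k].
    dist2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = dist2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=k).astype(float)
    return hist / hist.sum()


# Three images with different numbers of keypoints (random stand-ins for SURF output).
images = [rng.normal(size=(x, 64)) for x in (120, 45, 300)]
labels = [0, 2, 1]  # class index of each image

train_inputs = np.vstack([bow_histogram(d) for d in images])  # [n_images, k]
train_outputs = np.eye(n_classes)[labels]                     # [n_images, n_classes]
print(train_inputs.shape, train_outputs.shape)
```

With this layout the row counts of the input and output matrices agree by construction, which is the constraint described in the question.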

Embeddings with recurrent neural networks

I am working on a research project on text data (it's about supervised classification of search engine queries). I have already implemented different methods and I have also used different models for the text (such as binary vectors of the dimension of my vocabulary - 1 if the i-th word appears in the text, 0 otherwise - or word embeddings from a word2vec model).
My advisor told me that maybe we could find another representation of the queries using a Recurrent Neural Network. This representation should take into account the order of the words in the text thanks to the recurrence relation. I have read some documentation about RNNs but I haven't found anything useful for this goal. I have read a lot about language modelling (which predicts the probabilities of words), but I don't understand how I could adapt this model in order to obtain something like an embedded vector.
Thank you very much!
Usually, if one wants to obtain an embedding of a query or a sentence using an RNN, the logits are used. The logits are simply the output values of the network after the forward pass over the full sentence/query.
The logit values form a vector that has the dimension of the output layer (i.e. the number of target classes): usually this is the vocabulary size, since they are extracted from a language model.
For hints have a look at these:
How does word2vec give one hot word vector from the embedding vector?
Note that in principle one could also use bidirectional networks or networks trained on other tasks, obtaining smaller embeddings, although this last option is somewhat unusual and, to my knowledge, has not been explored.
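
The forward pass described above can be sketched with a plain Elman-style RNN in NumPy. The weights are random placeholders (a trained language model would supply them), and the sizes and names are assumptions for illustration. The function returns the logits after the last token, together with the final hidden state for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, hidden_dim = 20, 8, 16

# Randomly initialized parameters; a trained language model would provide these.
E = rng.normal(scale=0.1, size=(vocab_size, embed_dim))       # input word embeddings
W_xh = rng.normal(scale=0.1, size=(embed_dim, hidden_dim))    # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))   # hidden-to-hidden weights
W_hy = rng.normal(scale=0.1, size=(hidden_dim, vocab_size))   # projection to logits


def encode_query(token_ids):
    """Run the RNN over the whole query; return (final hidden state, logits)."""
    h = np.zeros(hidden_dim)
    for token in token_ids:
        # The recurrence makes the result depend on word order.
        h = np.tanh(E[token] @ W_xh + h @ W_hh)
    logits = h @ W_hy
    return h, logits


hidden, logits = encode_query([3, 7, 1])
hidden_reversed, _ = encode_query([1, 7, 3])
print(logits.shape, hidden.shape)
```

Feeding the same tokens in a different order gives a different vector, unlike the binary bag-of-words representation mentioned in the question.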
