convolutional neural networks for sentiment analysis - machine-learning

I was trying to modify Yoon Kim's code for sentiment analysis using CNNs. He applies three filters of heights = [3, 4, 5] and width = 300 on
input=(batch_size, 1, len(sentence_vector), len(wordVector))
I'm stuck after the first conv + pool computation. Consider
input=(batch_size, 1, 64, 300)
64 is the length of every sentence vector and 300 is the word embedding size.
map=(20, 1, 3, 300)
In his implementation, he first applies a kernel of height=3 and width=300. Hence the output would be
convolution_output=(batch_size, 20, 62, 1)
After which he downsamples using poolsize=(62, 1). The output after MaxPooling becomes
maxpool_output=(batch_size, 20, 1, 1)
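To make those shapes concrete, here is a quick sanity check (a sketch in tf.keras with a channels_last layout rather than the Theano NCHW layout above, so the axes come out in a different order, but the numbers are the same):

import tensorflow as tf

batch_size, sent_len, emb_dim = 32, 64, 300
x = tf.random.normal((batch_size, sent_len, emb_dim, 1))             # channels_last layout

conv3 = tf.keras.layers.Conv2D(20, kernel_size=(3, emb_dim))          # height=3, width=300
pool3 = tf.keras.layers.MaxPool2D(pool_size=(sent_len - 3 + 1, 1))    # poolsize=(62, 1)

h = conv3(x)   # (32, 62, 1, 20), the (batch_size, 20, 62, 1) above with axes reordered
p = pool3(h)   # (32, 1, 1, 20),  the (batch_size, 20, 1, 1) above with axes reordered
print(h.shape, p.shape)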
This is where I'm stuck.
In the paper he applies 3 filters of heights [3, 4, 5] and width = 300. But after applying the first filter, there is no input left for convolution. How (and on what) do I apply the second kernel?
Any help or suggestions would be great. The git page contains a link to the paper.

Related

From an input vector of parabolic shape values, predict a scalar value using machine learning

I was wondering if you could train a neural network model that predicts a scalar value from a vector of parabolic-shaped values.
For example:
let's say the input vector is [5, 10, 15, 20, 22, 25, 22, 15, 10, 5]; then the output should be 23.
And to train it I would just give the model lots of input vectors like the one in the example, together with the value that should be returned for each of them.
I looked it up on the internet but didn't find anything that matched my case, though I'm a newbie at this, so maybe I just don't understand certain algorithms.
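For what it's worth, a minimal regression setup along those lines might look like this (a sketch using tf.keras and a mean-squared-error loss; the library and layer sizes are assumptions, not something from the question):

import numpy as np
import tensorflow as tf

# One training pair taken from the example above: a parabolic-shaped vector and its scalar target.
X = np.array([[5, 10, 15, 20, 22, 25, 22, 15, 10, 5]], dtype="float32")
y = np.array([23.0], dtype="float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),                # single scalar output, no activation (regression)
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=100, verbose=0)       # in practice, fit on many such (vector, value) pairs
print(model.predict(X))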

TensorFlow Concat

I am trying to rebuild the 3D U-Net in this paper:
https://arxiv.org/pdf/1606.06650.pdf
And unfortunately, when I get to the first merge, I get the following error from Keras:
ValueError: "concat" mode can only merge layers with matching output shapes except for the concat axis. Layer shapes: [(None, 512, 14, 8, 10), (None, 256, 15, 8, 10)]
I understand that based on this thread:
https://github.com/fchollet/keras/issues/633
The following is true:
the concat axis is the axis along which to concatenate the two tensors.
Let's say you have two three-dimensional tensors of shape (2,3,5) and (2,3,7). Then you can only concatenate them along the third (zero-based index: 2) axis, because then – figuratively – the two "faces" of the "cuboid" that you "glue together" are each 2-by-3 and only those fit. So you need to set concat_axis = 2 (or -1, since it is the last one), resulting in a new tensor (2,3,12).
Typically in a NN you would merge along the axis of the features, which depends on the type of layers you use and the implementation in Keras. If you are not sure you can try out a few; most likely only one will work, for the reason given above. If the figurative "faces don't fit" you will get an error message like the one in my opening post.
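To make sure I follow the quoted example, I reproduced it with plain tf.concat (not the Keras merge layer from my model):

import tensorflow as tf

a = tf.zeros((2, 3, 5))
b = tf.zeros((2, 3, 7))
# The other axes match (2 and 3), so concatenation is only possible along the last axis.
c = tf.concat([a, b], axis=-1)   # axis=2 works as well
print(c.shape)                   # (2, 3, 12)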
Which means I should be merging on the 14 and 15, which are axis=0, correct?
Can someone help explain what I am missing in this setup?
Thanks!

How to perform max pooling on a 1-dimensional ConvNet (conv1d) in TensorFlow?

I'm training a convolutional neural network on text (on the character level) and I want to do max-pooling. tf.nn.max_pool expects a rank 4 Tensor, but 1-d convnets are rank 3 in tensorflow ([batch, width, depth]), so when I pass the output of conv1d to the max pool function, this is the error:
ValueError: Shape (1, 144, 512) must have rank 4
I'm new to tensorflow and deep learning frameworks in general and would like advice on the best practice here, because I can imagine there are multiple workarounds. How can I perform max-pooling in the 1-d case?
Thanks.
A quick way would be to add an extra singleton dimension, i.e. make the shape (1, 1, 144, 512); afterwards you can reduce it back with tf.squeeze.
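Something like this (a rough sketch, assuming a recent TensorFlow; the pooling window of 2 along the width is just for illustration):

import tensorflow as tf

conv_out = tf.random.normal((1, 144, 512))   # [batch, width, depth] output of conv1d
x = tf.expand_dims(conv_out, axis=1)         # -> (1, 1, 144, 512), now rank 4
pooled = tf.nn.max_pool(x, ksize=[1, 1, 2, 1], strides=[1, 1, 2, 1], padding="VALID")
pooled = tf.squeeze(pooled, axis=1)          # -> (1, 72, 512), back to rank 3
print(pooled.shape)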
I'm curious about other approaches though.

Artificial Neural Network Training and Testing

I'm trying to create an ANN that will solve a simple classification problem. The example I am using is degree classification, so the input will be a percentage between 0 and 100 and the output will be one of five classes (1st, 2:1, 2:2, ...).
Currently I have set up a neural network with three layers: 1 input neuron, 3 hidden neurons and 5 output neurons. I have managed to train the network using one input, e.g. 60, and the output (1, 0, 0, 0, 0). I am unsure, though, how I would go about properly training the network for every input/output combination, so that after training I could input a percentage and the correct output neuron would be the one closest to 1.
The network uses standard feed forward and back propagation algorithms, random weights and the Sigmoid function.
I have a file which I was thinking would work, with the inputs 0-100 and the corresponding outputs in between:
0
1, 0, 0, 0, 0
1
1, 0, 0, 0, 0
.....
40
0, 1, 0, 0, 0
....
100
0, 0, 0, 0, 1
Thanks
I don't quite understand the function you are trying to learn, but it doesn't matter. The usual way to train an ANN is to use SGD (stochastic gradient descent), where backpropagation is used to compute the gradient for one example at a time. You just keep looping over all input examples until the network has learned them.
One thing you didn't mention is that you need a loss function. In your case, a simple mean squared error might be appropriate.
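As a concrete illustration, a minimal sketch of that setup (using tf.keras rather than your hand-rolled network, with the layer sizes you described and a few rows in the format of your file):

import numpy as np
import tensorflow as tf

# Percentages (scaled to [0, 1]) and their one-hot degree classes, mirroring the file above.
X = np.array([[0], [1], [40], [100]], dtype="float32") / 100.0
y = np.array([[1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0],
              [0, 0, 0, 0, 1]], dtype="float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation="sigmoid", input_shape=(1,)),   # 3 hidden neurons
    tf.keras.layers.Dense(5, activation="sigmoid"),                     # 5 output neurons
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.5), loss="mse")
model.fit(X, y, epochs=500, verbose=0)   # SGD + backprop, looping over all examples repeatedly

# The predicted class is the output neuron closest to 1.
print(model.predict(np.array([[0.6]], dtype="float32")))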
I suggest that you take a look at the classifer.py Python script used for classification in this tutorial: http://www.marekrei.com/blog/theano-tutorial/
The complete code for the above tutorial is available at this link -
https://github.com/marekrei/theano-tutorial
The classifier script at the above link is meant for predicting whether the GDP per capita of a country is more than the average GDP. I, however, used the script for a different kind of dataset.
I was able to successfully train neural networks in Theano using the above classifier script for classifying a speech sound as the letter "A" or "E".

How to evaluate predictions from incomplete data, where not all data is incomplete

I am using Non-negative Matrix Factorization and Non-negative Least Squares for predictions, and I want to evaluate how good the predictions are depending on the amount of data given. For example the original Data was
original = [1, 1, 0, 1, 1, 0]
And now I want to see how good I can reconstruct the original data when the given data is incomplete:
incomplete1 = [1, 1, 0, 1, 0, 0],
incomplete2 = [1, 1, 0, 0, 0, 0],
incomplete3 = [1, 0, 0, 0, 0, 0]
And I want to do this for every example in a big dataset. Now the problem is that the original data varies in the amount of positive data: in the original above there are 4 positives, but for other examples in the dataset it could be more or less. Let's say I make an evaluation round with 4 positives given, but half of my dataset only has 4 positives while the other half has 5, 6 or 7. Should I exclude the half with only 4 positives, because they have no data missing, which makes the "prediction" much better? On the other hand, I would change the training set if I excluded data. What can I do? Or shouldn't I evaluate with 4 at all in this case?
EDIT:
Basically I want to see how well I can reconstruct the input matrix. For simplicity, say the "original" stands for a user who watched 4 movies. Then I want to know how well I can predict each user, based on just 1 movie that the user actually watched. I get a prediction for lots of movies. Then I plot a ROC and a precision-recall curve (using the top-k of the prediction). And I will repeat all of this with n movies that the users actually watched, so I get a ROC curve in my plot for every n. When I come to the point where I use e.g. 4 movies that a user actually watched to predict all the movies he watched, but he only watched those 4, the results get too good.
The reason I am doing this is to see how many "watched movies" my system needs to make reasonable predictions. If it only returned good results when 3 movies have already been watched, it would not be very useful in my application.
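Roughly, the evaluation loop looks like this (a simplified sketch; predict_scores stands in for my NMF/NNLS prediction step and is not real code):

import numpy as np
from sklearn.metrics import roc_auc_score

original = np.array([1, 1, 0, 1, 1, 0])   # everything the user actually watched
given    = np.array([1, 1, 0, 0, 0, 0])   # only n=2 of those revealed to the model

def predict_scores(observed):
    # placeholder for the NMF / NNLS prediction step
    return observed + 0.1 * np.random.rand(len(observed))

scores = predict_scores(given)
held_out   = (original == 1) & (given == 0)   # watched movies that were hidden
candidates = (given == 0)                     # only score items the model had to predict
print(roc_auc_score(held_out[candidates], scores[candidates]))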
I think it's first important to be clear what you are trying to measure, and what your input is.
Are you really measuring ability to reconstruct the input matrix? In collaborative filtering, the input matrix itself is, by nature, very incomplete. The whole job of the recommender is to fill in some blanks. If it perfectly reconstructed the input, it would give no answers. Usually, your evaluation metric is something quite different from this when using NNMF for collaborative filtering.
FWIW I am commercializing exactly this -- CF based on matrix factorization -- as Myrrix. It is based on my work in Mahout. You can read the docs about some rudimentary support for tests like Area under curve (AUC) in the product already.
Is "original" here an example of one row, perhaps for one user, in your input matrix? When you talk about half, and excluding, what training/test split are you referring to? splitting each user, or taking a subset across users? Because you seem to be talking about measuring reconstruction error, but that doesn't require excluding anything. You just multiply your matrix factors back together and see how close they are to the input. "Close" means low L2 / Frobenius norm.
But for conventional recommender tests (like AUC or precision-recall), which are something else entirely, you would either split your data into test/training by time (recent data is the test data) or by value (most-preferred or associated items are the test data). If I understand the 0s to be missing elements of the input matrix, then they are not really "data". You wouldn't ever have a situation where the test data were all 0s, because they're not input to begin with. The question is which 1s are for training and which 1s are for testing.
