Doc2Vec with Keras

Following the Mikolov paper (Le & Mikolov, "Distributed Representations of Sentences and Documents"), I want to compute Doc2Vec using Keras. I'm new to Keras, so I need your help.
I have a corpus of documents, each with an ID, and I want to obtain two embedding matrices: one for words and one for paragraphs. Is that right?
Is it possible to adapt my Word2Vec code to get these embeddings?
This is an extract of my W2V code:
from keras.models import Sequential
from keras.layers import Embedding, Lambda, Dense
from keras import backend as K

cbow = Sequential()
cbow.add(Embedding(input_dim=V, output_dim=dim, input_length=window_size*2))
cbow.add(Lambda(lambda x: K.mean(x, axis=1), output_shape=(dim,)))  # average the context word vectors
cbow.add(Dense(V, activation='softmax'))  # predict the center word
Should I add another embedding layer to take the paragraph ID into account?
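For reference, here is a minimal PV-DM sketch using the functional API, assuming V, dim, and window_size are defined as above and N is the number of paragraphs; this is an illustration of the idea, not the paper's exact setup:

from keras.models import Model
from keras.layers import Input, Embedding, Lambda, Dense, Concatenate
from keras import backend as K

word_input = Input(shape=(window_size * 2,))   # context word ids
doc_input = Input(shape=(1,))                  # paragraph id

word_emb = Embedding(input_dim=V, output_dim=dim)(word_input)  # word matrix
doc_emb = Embedding(input_dim=N, output_dim=dim)(doc_input)    # paragraph matrix

word_mean = Lambda(lambda x: K.mean(x, axis=1), output_shape=(dim,))(word_emb)
doc_vec = Lambda(lambda x: K.squeeze(x, axis=1), output_shape=(dim,))(doc_emb)

merged = Concatenate()([word_mean, doc_vec])   # combine context and paragraph vectors
output = Dense(V, activation='softmax')(merged)

pvdm = Model(inputs=[word_input, doc_input], outputs=output)
pvdm.compile(optimizer='adam', loss='categorical_crossentropy')

After training, calling get_weights() on each of the two Embedding layers gives you exactly the two matrices you described: one for words, one for paragraphs.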

Related

Combination of classification and regression

The dataset I am working with contains the readings of an 8-sensor gas-sensor array. The response of a sensor depends on the gas stimulus (methane, ethylene, etc.) and the concentration of the gas (20 ppm, 50 ppm, etc.). The dataset consists of 640 examples, each of shape (6000, 8) since there are 8 sensors on the array.
(Figure: sensor-array response to 100 ppm of methane)
My task is to build a model that predicts the class of a sensor-array reading (i.e., which gas it comes from) and, after that, to predict the concentration of that gas.
So far I have built a classification model based on 1D convolutional layers which successfully classifies examples into four categories (gases) with 98% accuracy.
How could I predict the concentration value of the gas? Is it possible to perform a regression analysis on the classified examples or should I look for a whole different approach?
For this task, I would just make a multi-output neural network like this:
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inp = Input(shape=(n_features,))
hidden1 = Dense(20, activation='relu', kernel_initializer='he_normal')(inp)
hidden2 = Dense(10, activation='relu', kernel_initializer='he_normal')(hidden1)
out_reg = Dense(1, activation='linear')(hidden2)            # regression head (concentration)
out_class = Dense(n_class, activation='softmax')(hidden2)   # classification head (gas)
model = Model(inputs=inp, outputs=[out_reg, out_class])
model.compile(loss=['mse', 'sparse_categorical_crossentropy'], optimizer='adam')
model.fit(X_train, [y_train_reg, y_train_class], epochs=150, batch_size=32, verbose=2)
One output is for regression and the other for classification. (Figure: diagram of the two-output network architecture.)
If you don't know how to create such networks, please read the Keras functional API documentation.
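Since you already have a working 1D-convolutional classifier, the same two-head idea can sit on top of your conv trunk. A hedged sketch (the layer sizes here are illustrative, not tuned):

from tensorflow.keras.layers import Input, Conv1D, GlobalAveragePooling1D, Dense
from tensorflow.keras.models import Model

inp = Input(shape=(6000, 8))                       # one sensor-array reading
x = Conv1D(32, kernel_size=7, activation='relu')(inp)
x = GlobalAveragePooling1D()(x)
out_reg = Dense(1, name='concentration')(x)                 # ppm regression head
out_class = Dense(4, activation='softmax', name='gas')(x)   # four-gas classification head
model = Model(inp, [out_reg, out_class])
model.compile(loss=['mse', 'sparse_categorical_crossentropy'],
              loss_weights=[1.0, 1.0],   # tune if one loss dominates the other
              optimizer='adam')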

How to load unlabelled data for sentiment classification after training SVM model?

I am trying to do sentiment classification using sklearn's SVM model. I used the labeled data to train the model and got 89% accuracy. Now I want to use the model to predict the sentiment of unlabeled data. How can I do that? And after classifying the unlabeled data, how can I see whether each example is positive or negative?
I use Python 3.7. Below is the code.
import random
import pandas as pd
data = pd.read_csv("label data for testing .csv", header=0)
sentiment_data = list(zip(data['Articles'], data['Sentiment']))
random.shuffle(sentiment_data)
train_x, train_y = zip(*sentiment_data[:350])
test_x, test_y = zip(*sentiment_data[350:])
from nltk import word_tokenize
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn import metrics
clf = Pipeline([
    ('vectorizer', CountVectorizer(analyzer="word",
                                   tokenizer=word_tokenize,
                                   preprocessor=lambda text: text.replace("<br />", " "),
                                   max_features=None)),
    ('classifier', LinearSVC())
])
clf.fit(train_x, train_y)
pred_y = clf.predict(test_x)
print("Accuracy : ", metrics.accuracy_score(test_y, pred_y))
print("Precision : ", metrics.precision_score(test_y, pred_y))
print("Recall : ", metrics.recall_score(test_y, pred_y))
When I run this code, I get the output:
ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
Accuracy : 0.8977272727272727
Precision : 0.8604651162790697
Recall : 0.925
What is the meaning of ConvergenceWarning?
Thanks in Advance!
What is the meaning of ConvergenceWarning?
As Pavel already mentioned, ConvergenceWarning means that max_iter was reached before the optimizer converged. You can suppress the warning as described here: How to disable ConvergenceWarning using sklearn?
Now I want to use the model to predict the sentiment of unlabeled data. How can I do that?
You do it with the command pred_y = clf.predict(test_x). The only things to adjust are pred_y (the name is your free choice) and test_x, which should be your new, unseen data; it has to have the same form as your existing test_x and train_x.
In your case, as you are doing:
sentiment_data = list(zip(data['Articles'], data['Sentiment']))
you are forming a list of (article, sentiment) tuples. Then you shuffle it and unzip the first 350 rows:
train_x, train_y = zip(*sentiment_data[:350])
Here train_x is the column data['Articles'], so all you have to do if you have new data is:
new_data = pd.read_csv("new_data.csv", header=0)
new_y = clf.predict(new_data['Articles'])
how to see whether it is classified as positive or negative?
You can then inspect new_y: each entry will be either 1 or 0. Normally 0 is negative, but it depends on how your dataset is set up.
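If you want readable labels instead of 0/1, a small sketch (assuming the 0 = negative, 1 = positive encoding above):

import numpy as np

labels = np.where(new_y == 1, "positive", "negative")  # map class ids to strings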
Check out this site about model persistence. Then you just load the model and call its predict method; it will return the predicted labels. If you used any encoder (LabelEncoder, OneHotEncoder), you need to dump and load it separately.
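A minimal persistence sketch with joblib (the file name is illustrative):

import joblib

joblib.dump(clf, "sentiment_clf.joblib")    # save the fitted pipeline
clf = joblib.load("sentiment_clf.joblib")   # later: reload it
new_y = clf.predict(new_data['Articles'])   # and predict on new articles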
If I were you, I'd rather take a fully data-driven approach and use a pretrained embedder. It will also work for dozens of languages out of the box, which is quite neat.
There's LASER from Facebook. There's also a PyPI package, though an unofficial one. It works just fine.
Nowadays there are a lot of pretrained models, so it shouldn't be that hard to reach near state-of-the-art scores.
Now I want to use the model to predict the sentiment of unlabeled data. How can I do that? and after classification of unlabeled data, how to see whether it is classified as positive or negative?
Basically, you prepare the unlabeled data the same way train_x or test_x was generated. Probably it's a 2D matrix of shape n_samples x 1, which you then pass to clf.predict to obtain predictions. clf.predict outputs the most probable class for each sample. In your case 0 is negative and 1 is positive, but it's hard to tell without seeing the dataset.
What is the meaning of ConvergenceWarning?
The LinearSVC model is optimized using an iterative algorithm. There is an argument max_iter (1000 by default) that controls the maximum number of iterations. If the stopping criterion isn't met within that budget, you get a ConvergenceWarning. It shouldn't bother you much, as long as you have acceptable performance in terms of accuracy or other metrics.
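If you want the solver to actually converge, you can raise that limit on the pipeline from the question; a small sketch (10000 is an arbitrary example value):

# 'classifier' is the step name used in the Pipeline above
clf.set_params(classifier__max_iter=10000)
clf.fit(train_x, train_y)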

How to build a binary classifier/predictor for 1-d vector data in Python

[Disclaimer] This is my first excursion into machine learning.
I have a list of 1-d numpy real vectors that represent experimental conditions known to be associated with two mutually exclusive classes. Each vector can be assigned a 1 or 0 as its class label.
What is the best way to construct a classifier/predictor using these classes in Python such that the differences between the two classes are maximized?
Let's say you have 1000 vectors with 10 values each. Your x data has shape (1000, 10) and your y data shape (1000, 1), where each entry is 0 or 1 according to class. You want to predict y from x.
The simplest model could look like (using Keras):
from keras.models import Sequential
from keras.layers import Dense

mdl = Sequential()  # create the model
mdl.add(Dense(8, input_shape=(10,), activation='sigmoid'))
mdl.add(Dense(1, activation='sigmoid'))  # single sigmoid unit for binary output
mdl.compile(optimizer='adam', loss='binary_crossentropy')
mdl.fit(x, y, epochs=30)
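To turn the sigmoid output into hard 0/1 labels, a usage sketch (0.5 is the usual threshold):

preds = (mdl.predict(x) > 0.5).astype(int)   # probabilities -> class labels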
Note that you can use sigmoid in the last layer of a classification problem only if there are 2 classes. With more classes you should use softmax.
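For more than two classes, a hedged sketch of the required changes (three classes is an arbitrary example; assumes y holds integer class ids):

from keras.models import Sequential
from keras.layers import Dense

k = 3  # example class count
mdl = Sequential()
mdl.add(Dense(8, input_shape=(10,), activation='sigmoid'))
mdl.add(Dense(k, activation='softmax'))   # one output unit per class
mdl.compile(optimizer='adam', loss='sparse_categorical_crossentropy')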
I recommend you check this page: https://keras.io/
Also, I think Keras is easier to begin with than raw TensorFlow.

Implementing Luong and Manning's hybrid model

(Figure: hybrid word-character model)
As shown in the image above, I need to create a hybrid encoder-decoder (seq2seq) network which takes both word and character embeddings as input.
As shown in image consider the sentence:
A cute cat
Hypothetically the words in vocabulary are:
a , cat
and Out of vocabulary words are:
cute
we feed the words a and cat as their respective embeddings,
but since cute is out of vocabulary, we would normally feed it the embedding of a universal unknown token.
Instead, in this case I need to pass that unique word (cute, which is out of vocabulary) through another seq2seq layer character by character to generate its embedding on the fly.
Both seq2seq layers must be trained jointly, end to end.
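As a hedged illustration of the character-level branch only (the sizes and names here are my assumptions, not the paper's): a character LSTM whose final state serves as the on-the-fly embedding for an OOV word.

from keras.models import Model
from keras.layers import Input, Embedding, LSTM

char_vocab = 100     # assumed character vocabulary size
max_word_len = 20    # assumed maximum characters per word
dim = 300            # must match the word-embedding size used below

char_input = Input(shape=(max_word_len,))
char_emb = Embedding(char_vocab, 50, mask_zero=True)(char_input)
word_vec = LSTM(dim)(char_emb)   # final state acts as the OOV word's embedding

char_encoder = Model(char_input, word_vec)

Joint end-to-end training means this sub-model's output is substituted into the OOV slots of the main encoder's input so gradients flow into both branches; this is much easier with the functional API than with a Sequential model.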
The following is a snippet of my code, where I tried the main encoder-decoder network that takes word-based inputs, in Keras:
from keras.models import Sequential
from keras.layers import Embedding, LSTM, RepeatVector, TimeDistributed, Dense, Activation

model = Sequential()
model.add(Embedding(X_vocab_len + y_vocab_len, 300, weights=[embedding_matrix],
                    input_length=X_max_len, mask_zero=True))
# Creating encoder network
for i in range(num_layers):
    return_sequences = i != num_layers - 1
    model.add(LSTM(hidden_size, return_sequences=return_sequences))
model.add(RepeatVector(y_max_len))
# Creating decoder network
for _ in range(num_layers):
    model.add(LSTM(hidden_size, return_sequences=True))
model.add(TimeDistributed(Dense(y_vocab_len)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
Here X is my input sentence and y is the sentence to be generated. The vocabulary size is fixed and consists of the frequent words; rare words are treated as out of vocabulary based on that vocabulary size.
Here I created a sequential model in Keras where I added embeddings from pre-trained vectors generated by GloVe (embedding_matrix).
How do I model the input to achieve such a scenario?
The reference paper is :
http://aclweb.org/anthology/P/P16/P16-1100.pdf

Keras: Optimizing Tweet-Specific Pre-Trained Word Embeddings Layer

I'm working on a classification task where I would like to classify tweets into 5 different classes. I'm following the Keras GitHub IMDB classification examples for building models, but would like to modify the Embedding layer in this model. Instead of passing weights for initialization to the Embedding layer, I have word2vec weights that I would like to look up for each tweet in my dataset, so I can construct a matrix of (tweet_words x vector_dimension) for each tweet.
For example, the tweet "I'm so tired of hearing about this election #tuningout" would be represented as a matrix like:
             vector_dim1  vector_dim2  vector_dim3  ...  vector_dimN
I'm          value1       value2       value3            valueN
so           value1       value2       value3            valueN
tired        (... and so on for each remaining token:
of            hearing, about, this, election, #tuningout ...)
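A hedged sketch of that per-tweet lookup (assuming a gensim-style keyed-vectors object wv; the names and sizes are illustrative):

import numpy as np

def tweet_matrix(tweet, wv, max_len=30, dim=100):
    # Stack one word2vec vector per token, zero-padding to max_len rows.
    mat = np.zeros((max_len, dim))
    for i, tok in enumerate(tweet.split()[:max_len]):
        if tok in wv:
            mat[i] = wv[tok]
    return mat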
I'm doing this lookup because I have embeddings that are learned separately for different countries, and I would like to look up the specific embedding based on location of the tweet, instead of passing weights from a joint embedding to the Embedding layer for initialization. I can pass such a matrix directly to a really simple LSTM with the following Keras architecture:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

model = Sequential()
# layer here would normally be:
# model.add(Embedding())
model.add(LSTM(width, input_shape=(max_len, 100), dropout_W=0.2, dropout_U=0.2))  # Keras 1 args; 'dropout'/'recurrent_dropout' in Keras 2
model.add(Dense(class_size))
model.add(Activation(activation))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
but, the disadvantage of this compared to the example in the link is that this architecture cannot further optimize an Embedding layer. Is there a way to pass these matrices for each tweet to an Embedding layer for further optimization as in the example? Thanks for reading.
