Keras: Optimizing Tweet-Specific Pre-Trained Word Embeddings Layer

I'm working on a classification task in which I would like to classify tweets into 5 different classes. I'm following the Keras GitHub IMDB classification examples for building models, but would like to modify the Embedding layer in this model. Instead of passing weights for initialization to the Embedding layer, I have word2vec weights that I would like to look up for each tweet in my dataset, so I can construct a matrix of (tweet_words x vector_dimension) for each tweet.
For example, the tweet "I'm so tired of hearing about this election #tuningout" would be represented as a matrix like:
word          vector_dim1  vector_dim2  vector_dim3  ...  vector_dimN
I'm           value1       value2       value3       ...  valueN
so            value1       value2       value3       ...  valueN
tired         (... and so on ...)
of
hearing
about
this
election
#tuningout
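A minimal sketch of the lookup described above, assuming each country's word2vec vectors are available as a plain Python dict mapping token to vector. The dict, the toy 4-dimensional vectors, the whitespace tokenizer and the helper name tweet_to_matrix are all hypothetical; tokens without a vector and padding positions are left as zero rows, which is only one possible choice.

```python
import numpy as np

def tweet_to_matrix(tokens, embeddings, max_len, dim):
    """Stack one tweet's word vectors into a (max_len, dim) matrix.

    Hypothetical helper: tokens without a vector, and positions past the end
    of the tweet (padding), are left as zero rows.
    """
    matrix = np.zeros((max_len, dim), dtype=np.float32)
    for i, tok in enumerate(tokens[:max_len]):  # truncate long tweets
        vec = embeddings.get(tok)
        if vec is not None:
            matrix[i] = vec
    return matrix

# Toy per-country tables with random 4-d vectors (real ones would be word2vec, e.g. 100-d).
dim = 4
rng = np.random.default_rng(0)
tokens = "I'm so tired of hearing about this election #tuningout".split()  # naive tokenizer
country_embeddings = {
    "US": {tok: rng.normal(size=dim) for tok in tokens[:-1]},  # '#tuningout' has no vector
}

# Pick the table by the tweet's location, then build the matrix.
m = tweet_to_matrix(tokens, country_embeddings["US"], max_len=12, dim=dim)
print(m.shape)  # (12, 4)
```

Stacking such per-tweet matrices with np.stack yields an array of shape (n_tweets, max_len, dim), which is what an LSTM with input_shape=(max_len, 100) expects when dim is 100.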
I'm doing this lookup because I have embeddings that are learned separately for different countries, and I would like to look up the specific embedding based on the location of the tweet, instead of passing weights from a joint embedding to the Embedding layer for initialization. I can pass such a matrix directly to a very simple LSTM with the following Keras architecture:
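from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

model = Sequential()
# layer here would normally be:
# model.add(Embedding())
# Instead, each sample is already a (max_len, 100) matrix of word2vec vectors.
# dropout / recurrent_dropout are the Keras 2 names for dropout_W / dropout_U.
model.add(LSTM(width, input_shape=(max_len, 100), dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(class_size))
model.add(Activation(activation))  # e.g. 'softmax' for the 5 classes
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])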
But the disadvantage of this, compared to the example in the link, is that this architecture cannot further optimize the embeddings the way an Embedding layer can. Is there a way to pass these per-tweet matrices to an Embedding layer for further optimization, as in the example? Thanks for reading.
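One possible approach (not from the original thread, and only a sketch): instead of feeding pre-built matrices, merge the per-country tables into a single vocabulary in which each key is prefixed with its country, so the same word gets a separate row per country. Tweets then become integer index sequences, which is what a Keras Embedding layer consumes. The helper names, the "COUNTRY::word" key format, reserving id 0 for padding, and silently dropping unknown tokens are all assumptions.

```python
import numpy as np

def build_joint_vocab(country_embeddings, dim):
    """Merge per-country tables into one vocabulary and one weight matrix.

    Keys look like 'COUNTRY::word' so each country keeps its own row for a word.
    Row 0 is reserved for padding (hypothetical convention).
    """
    vocab = {"<pad>": 0}
    rows = [np.zeros(dim)]
    for country, table in country_embeddings.items():
        for word, vec in table.items():
            vocab[f"{country}::{word}"] = len(rows)
            rows.append(np.asarray(vec, dtype=float))
    return vocab, np.vstack(rows)

def encode_tweet(tokens, country, vocab, max_len):
    """Map a tweet to country-specific ids, right-padded with 0."""
    ids = []
    for tok in tokens:
        key = f"{country}::{tok}"
        if key in vocab:  # unknown tokens are simply dropped in this sketch
            ids.append(vocab[key])
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))

# Toy 2-d tables for two countries.
country_embeddings = {
    "US": {"election": [0.1, 0.2], "tired": [0.3, 0.4]},
    "UK": {"election": [0.5, 0.6]},
}
vocab, weights = build_joint_vocab(country_embeddings, dim=2)
print(weights.shape)                                           # (4, 2)
print(encode_tweet(["so", "tired", "election"], "US", vocab, 4))  # [2, 1, 0, 0]
print(encode_tweet(["so", "tired", "election"], "UK", vocab, 4))  # [3, 0, 0, 0]
```

The merged weights could then initialize an embedding layer placed before the LSTM, for example Embedding(input_dim=weights.shape[0], output_dim=weights.shape[1], embeddings_initializer=keras.initializers.Constant(weights), mask_zero=True, trainable=True), so the country-specific vectors are updated during training. Older Keras 2 code often passes weights=[weights] instead of the initializer.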

Related

Converting Neural Network output to classes

I am working on a document classification problem from Kaggle.
It has 5 classes: 'business', 'tech', 'politics', 'sport', 'entertainment'.
I have trained my deep learning model and obtained results for the test set as well. But the output I get is a list of probabilities for the different classes.
(Image: output for one row)
How can I get the actual classes (labels) from this output?
My neural network architecture looks like this:
(Image: network architecture)
You should choose the entry with the highest value as the predicted class. For example, in your provided example, [0.045, 0.030, 0.015, 0.889, 0.019], the predicted class is the fourth class (i.e., idx=3), which has the highest probability value.
The argmax function of NumPy is probably what you should be using. Assuming pred holds the output probabilities from your network, with shape (batch_size, num_labels), np.argmax(pred, axis=1) will give you the indices (i.e., labels) of the predicted classes.
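A small NumPy sketch of the argmax approach. The first row is the example from the question; the second row is made up for illustration. The class_names order is an assumption: it must match however the labels were encoded during training (for example, the column order of the one-hot targets), which may differ from the order listed in the question.

```python
import numpy as np

# Assumed label order; must match the encoding used for the training targets.
class_names = ['business', 'tech', 'politics', 'sport', 'entertainment']

# pred has shape (batch_size, num_labels): one row of class probabilities per document.
pred = np.array([
    [0.045, 0.030, 0.015, 0.889, 0.019],  # example row from the question
    [0.700, 0.100, 0.050, 0.100, 0.050],  # hypothetical second row
])

pred_idx = np.argmax(pred, axis=1)               # index of the most probable class per row
pred_labels = [class_names[i] for i in pred_idx]
print(pred_idx)     # [3 0]
print(pred_labels)  # ['sport', 'business']
```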

Explanation Needed for Autokeras's AutoModel and GraphAutoModel

I understand what the AutoKeras ImageClassifier does (https://autokeras.com/image_classifier/):
clf = ImageClassifier(verbose=True, augment=False)
clf.fit(x_train, y_train, time_limit=12 * 60 * 60)
clf.final_fit(x_train, y_train, x_test, y_test, retrain=True)
y = clf.evaluate(x_test, y_test)
But I am unable to understand what the AutoModel class (https://autokeras.com/auto_model/) does, or how it differs from ImageClassifier:
autokeras.auto_model.AutoModel(
inputs,
outputs,
name="auto_model",
max_trials=100,
directory=None,
objective="val_loss",
tuner="greedy",
seed=None)
The documentation for the inputs and outputs arguments says:
inputs: A list of or a HyperNode instance. The input node(s) of the AutoModel.
outputs: A list of or a HyperHead instance. The output head(s) of the AutoModel.
What is a HyperNode instance?
Similarly, what is the GraphAutoModel class? (https://autokeras.com/graph_auto_model/)
autokeras.auto_model.GraphAutoModel(
inputs,
outputs,
name="graph_auto_model",
max_trials=100,
directory=None,
objective="val_loss",
tuner="greedy",
seed=None)
The documentation reads:
A HyperModel defined by a graph of HyperBlocks. GraphAutoModel is a subclass of HyperModel. Besides the HyperModel properties, it also has a tuner to tune the HyperModel. The user can use it in a similar way to a Keras model since it also has fit() and predict() methods.
What are HyperBlocks?
If ImageClassifier automatically does hyperparameter tuning, what is the use of GraphAutoModel?
Links to any documents or resources for a better understanding of AutoModel and GraphAutoModel would be appreciated.
Having worked with AutoKeras recently, I can share what little I know.
Task API
When doing a classical task such as image classification/regression, text classification/regression, ..., you can use the simplest APIs provided by AutoKeras, called the Task API: ImageClassifier, ImageRegressor, TextClassifier, TextRegressor, ... In this case you have one input (image, text, tabular data, ...) and one output (classification or regression).
AutoModel
However, when you have a task that requires a multi-input/multi-output architecture, you cannot use the Task API directly, and this is where AutoModel comes into play with the I/O API. You can check the example provided in the documentation, which has two inputs (image and structured data) and two outputs (classification and regression).
GraphAutoModel
GraphAutoModel works like the Keras functional API. It assembles different blocks (convolutions, LSTM, GRU, ...) into a model, then looks for the best hyperparameters for the architecture you provided. Suppose, for instance, that I want to do a binary classification task using time series as input data.
First, let's generate a toy dataset:
import numpy as np
import autokeras as ak
x = np.random.randn(100, 7, 3)
y = np.random.choice([0, 1], size=100, p=[0.5, 0.5])
Here x contains 100 time-series samples; each sample is a sequence of length 7 with a feature dimension of 3. The corresponding target variable y is binary (0 or 1).
Using GraphAutoModel, I can specify the architecture I want with so-called HyperBlocks. There are many blocks (Conv, RNN, Dense, ...); check the full list here.
In my case I want to use RNN blocks to create the model because I have time series data:
input_layer = ak.Input()
rnn_layer = ak.RNNBlock(layer_type="lstm")(input_layer)
dense_layer = ak.DenseBlock()(rnn_layer)
output_layer = ak.ClassificationHead(num_classes=2)(dense_layer)
automodel = ak.GraphAutoModel(input_layer, output_layer, max_trials=2, seed=123)
automodel.fit(x, y, validation_split=0.2, epochs=2, batch_size=32)
(If you are not familiar with the above style of defining model, then you should check the keras functional API documentation).
So in this example I have more flexibility in creating the skeleton of the architecture I would like to use: an LSTM block, followed by a Dense block, followed by a classification head. However, I didn't specify any hyperparameters (number of LSTM layers, number of dense layers, size of the LSTM layers, size of the dense layers, activation functions, dropout, batchnorm, ...); AutoKeras tunes the hyperparameters automatically based on the architecture (skeleton) I provided.

Can the number of units in NN input layer be different than the number of features in the data?

Based on the TensorFlow Keras API tutorial:
from tensorflow import keras
model = keras.Sequential([
    keras.layers.Dense(10, activation='softmax', input_shape=(32,)),
    keras.layers.Dense(10, activation='softmax')
])
I don't understand why the number of units in the input layer is 10 while the input shape is 32. There are also many examples like this in the TensorFlow tutorials.
This is a rather common confusion among new practitioners, and not without reason: the answer, as has already been hinted at in the comments, is that in the Keras Sequential API there is an implicit input layer, determined by the input_shape argument of the first explicit layer.
This is directly visible in the Keras Functional API (check the example in the docs), where Input is an explicit layer itself, and in which your model would be written as:
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
inputs = Input(shape=(32,))  # input layer
x = Dense(10, activation='softmax')(inputs)  # hidden layer
outputs = Dense(10, activation='softmax')(x)  # output layer
model = Model(inputs, outputs)
i.e. your model is actually an example of a "good old" neural net with three layers (input, hidden, and output), even though it looks like a two-layer net in the Keras Sequential API.
(BTW, and irrelevant to the question, it does not make much sense to have softmax as activation for your hidden layer.)
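To make the shapes concrete, here is a NumPy sketch of the same forward pass. The random weights are placeholders, and the hidden activation is swapped to ReLU per the remark above. The first Dense(10) layer's kernel has shape (32, 10): the 32 comes from the input, the 10 is the layer's own unit count, so the two numbers are independent.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row max for numerical stability.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 32))    # batch of 4 samples with 32 features: the implicit input layer

W1 = rng.normal(size=(32, 10))  # hidden Dense(10): maps 32 inputs to 10 units
b1 = np.zeros(10)
W2 = rng.normal(size=(10, 10))  # output Dense(10): maps 10 units to 10 classes
b2 = np.zeros(10)

h = relu(x @ W1 + b1)           # shape (4, 10)
out = softmax(h @ W2 + b2)      # shape (4, 10), each row sums to 1

print(h.shape, out.shape)       # (4, 10) (4, 10)
print(W1.size + b1.size)        # 330, the parameter count model.summary() shows for the first layer
```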

Implementing Luong and Manning's hybrid model

(Image: hybrid word-character model)
As shown in the image above, I need to create a hybrid encoder-decoder network (seq2seq) that takes both word and character embeddings as input.
As shown in the image, consider the sentence:
A cute cat
Hypothetically, the words in the vocabulary are:
a, cat
and the out-of-vocabulary words are:
cute
We feed the words a and cat as their respective embeddings,
but since cute is out of vocabulary, we would normally feed it the embedding of a universal token.
Instead, in this case I need to pass that unique word (cute, which is out of vocabulary) through another seq2seq layer, character by character, to generate its embedding on the fly.
Both seq2seq layers must be trained jointly, end to end.
The following is a snippet of my Keras code for the main encoder-decoder network, which takes word-based inputs:
from keras.models import Sequential
from keras.layers import Embedding, LSTM, RepeatVector, TimeDistributed, Dense, Activation

model = Sequential()
model.add(Embedding(X_vocab_len + y_vocab_len, 300, weights=[embedding_matrix], input_length=X_max_len, mask_zero=True))
# Creating encoder network: only the last encoder LSTM collapses the sequence to one vector
for i in range(num_layers):
    return_sequences = i != num_layers - 1
    model.add(LSTM(hidden_size, return_sequences=return_sequences))
model.add(RepeatVector(y_max_len))
# Creating decoder network
for _ in range(num_layers):
    model.add(LSTM(hidden_size, return_sequences=True))
model.add(TimeDistributed(Dense(y_vocab_len)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
Here X is my input sentence and y is the sentence to be generated. The vocabulary has a fixed size and consists of frequent words; rare words that fall outside it are treated as out of vocabulary.
I created a Sequential model in Keras whose Embedding layer is initialized with pre-trained GloVe vectors (embedding_matrix).
How should I model the input to achieve this scenario?
The reference paper is :
http://aclweb.org/anthology/P/P16/P16-1100.pdf
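A minimal sketch of one way to prepare the inputs for such a hybrid model, assuming a word vocabulary with an <unk> id and a separate character vocabulary (all dictionaries, ids and the helper name prepare_inputs are hypothetical). Each in-vocabulary word keeps its word id; each rare word is mapped to <unk> at the word level, and its character ids and position are recorded separately so that a character-level encoder can produce a representation to use in place of the <unk> embedding at that position, as described in the question.

```python
import string

def prepare_inputs(sentence, word_vocab, char_vocab):
    """Split a sentence into word-level ids and character-level sequences for OOV words."""
    word_ids, oov_positions, oov_char_ids = [], [], []
    for pos, word in enumerate(sentence.lower().split()):
        if word in word_vocab:
            word_ids.append(word_vocab[word])
        else:
            word_ids.append(word_vocab['<unk>'])  # placeholder at the word level
            oov_positions.append(pos)             # where the char-based embedding would go
            oov_char_ids.append([char_vocab.get(c, char_vocab['<unk_char>']) for c in word])
    return word_ids, oov_positions, oov_char_ids

# Hypothetical vocabularies: 0 is padding, 1 is unknown.
word_vocab = {'<pad>': 0, '<unk>': 1, 'a': 2, 'cat': 3}
char_vocab = {'<pad>': 0, '<unk_char>': 1}
char_vocab.update({c: i + 2 for i, c in enumerate(string.ascii_lowercase)})

word_ids, oov_positions, oov_char_ids = prepare_inputs("A cute cat", word_vocab, char_vocab)
print(word_ids)       # [2, 1, 3]
print(oov_positions)  # [1]
print(oov_char_ids)   # [[4, 22, 21, 6]]
```

Because the word ids and the character sequences are separate inputs, the model would need the Keras functional API rather than Sequential (which supports a single input); with both encoders in one graph, they can be trained jointly end to end. The character sequences would also need padding to a fixed length before batching.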

Train High Definition images with Tensorflow and inception V3 pre trained model

I'm looking to do some image classification on PDF documents that I convert to images. I'm using the TensorFlow Inception v3 pre-trained model and trying to retrain the last layer with my own categories, following the TensorFlow tutorial. I have ~1000 training images per category and only 4 categories. With 200k iterations I can reach up to 90% successful classifications, which is not bad but still needs some work.
The issue is that this pre-trained model only takes 299x299 (roughly 300x300) images as input. Downscaling the documents to this size badly degrades the characters that make up the features I am trying to recognize.
Would it be possible to alter the model's input layer so I can give it higher-resolution images?
Would I get better results with a home-made and much simpler model?
If so, where should I start to build a model for this kind of image classification?
If you want to use a different image resolution than the one the pre-trained model uses, you should keep only the convolutional blocks and add a new set of fully connected layers on top for the new input size. Using a higher-level library such as Keras makes this much easier. Below is an example of how to do that in Keras.
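from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.applications.inception_v3 import InceptionV3

# Convolutional base only (no ImageNet classifier head), at a higher input resolution
base_model = InceptionV3(include_top=False, input_shape=(600, 600, 3), weights='imagenet')
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
# Add as many dense / fully connected layers as required
pred = Dense(10, activation='softmax')(x)  # 10 = number of classes
model = Model(base_model.input, pred)
# Freeze the pre-trained layers; only the new head (last 3 layers) is trained
for l in model.layers[:-3]:
    l.trainable = False
# Compile after changing trainable so the freezing takes effect
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])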
Setting include_top=False gives you only the convolutional blocks. You can use input_shape=(600,600,3) to set the input shape you need. You can then add a couple of dense (fully connected) layers to the model. The last layer should have one unit per category; here 10 represents the number of classes. With this approach you reuse all the weights of the pre-trained model's convolutional layers and train only the new dense layers.
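Why the new head works at 600x600: GlobalAveragePooling2D averages each channel over the spatial dimensions, so its output length equals the number of channels regardless of the feature map's height and width, and the Dense(1024) kernel keeps the same shape. A NumPy sketch; the 2048 channels match InceptionV3's final convolutional output, while the larger spatial size below is illustrative only.

```python
import numpy as np

def global_average_pool(feature_map):
    # (height, width, channels) -> (channels,): mean over the two spatial axes
    return feature_map.mean(axis=(0, 1))

channels = 2048  # channels in InceptionV3's last convolutional block
rng = np.random.default_rng(0)
small = rng.normal(size=(8, 8, channels))    # e.g. from a 299x299 input
large = rng.normal(size=(17, 17, channels))  # illustrative size for a larger input

print(global_average_pool(small).shape)  # (2048,)
print(global_average_pool(large).shape)  # (2048,)

# A Flatten layer, by contrast, would give a length that depends on the input size:
print(small.reshape(-1).shape, large.reshape(-1).shape)  # (131072,) (591872,)
```

This is why pooling, rather than flattening, lets the same head be reused when the input resolution changes.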

Resources