Which embedding layer does skip-gram use?

In skip-gram with negative sampling, the model takes pairs of a focused word and a context word, and it outputs whether each pair is a correct (observed) pair or a wrong (negatively sampled) one.
In contrast to CBOW, skip-gram uses two embedding layers. After training, which embedding layer should be used?
In the Keras example below, it looks like the first embedding layer (the focused-word embedding, not the context one) is the one being used.
https://github.com/nzw0301/keras-examples/blob/master/Skip-gram-with-NS.ipynb
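For reference, here is a minimal sketch (not the linked notebook's exact code; the layer names and sizes are my own) of how skip-gram with negative sampling is typically wired up in Keras with two separate embedding layers, and how the word vectors can be read out of the target-word embedding afterwards:
import numpy as np
from keras.layers import Input, Embedding, Dot, Flatten, Activation
from keras.models import Model

vocab_size, dim = 10000, 128  # hypothetical sizes

target_in = Input(shape=(1,))
context_in = Input(shape=(1,))

# one embedding for the focused (target) word ...
target_emb = Embedding(vocab_size, dim, name='target_embedding')(target_in)
# ... and a separate one for the context word
context_emb = Embedding(vocab_size, dim, name='context_embedding')(context_in)

# the dot product scores whether the (target, context) pair is a true pair
score = Flatten()(Dot(axes=-1)([target_emb, context_emb]))
output = Activation('sigmoid')(score)

model = Model([target_in, context_in], output)
model.compile(optimizer='adam', loss='binary_crossentropy')

# after training, the word vectors are usually read from the target embedding,
# which matches what the linked notebook appears to do
word_vectors = model.get_layer('target_embedding').get_weights()[0]
Using the target-word embedding is the common convention for word2vec-style models, although averaging the two matrices is sometimes done as well.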

How to add new vector to Keras embedding matrix

Background
I am using an embedding layer for a categorical data column in Keras.
My understanding is that an embedding layer is simply a matrix consisting of trainable vectors, each mapped to an index.
My problem
After training has finished, I want to add a new index-vector pair to the embedding matrix.
(The vector is generated by me, no training involved at this stage.)
How do I do this?
I wish to use the newly added embedding in predictions as well.
Code
keras.layers.Embedding(number_of_categories, embedding_size, input_length=1)
I am especially stuck because the number of categories is hard-coded into the model architecture. Is there any way around this?
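One possible workaround is a hedged sketch rather than a canonical recipe: the sizes, the surrounding layers, and the hand-made vector below are all hypothetical stand-ins. The idea is to rebuild the model with an embedding that has one extra row, copy the trained weights over, and append the new vector:
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

number_of_categories, embedding_size = 10, 4  # hypothetical sizes

# stand-in for the already trained model
old_model = Sequential([
    Embedding(number_of_categories, embedding_size, input_length=1),
    Flatten(),
    Dense(1),
])
_ = old_model.predict(np.array([[0]]))  # make sure the weights are built
old_embeddings = old_model.layers[0].get_weights()[0]  # shape (10, 4)

# the vector I generated myself for the new category
new_vector = np.random.normal(size=(1, embedding_size))

# same architecture, but the embedding now has one extra row
new_model = Sequential([
    Embedding(number_of_categories + 1, embedding_size, input_length=1),
    Flatten(),
    Dense(1),
])
new_model.layers[0].set_weights([np.vstack([old_embeddings, new_vector])])
# copy the remaining trained layers as well
for old_layer, new_layer in zip(old_model.layers[1:], new_model.layers[1:]):
    new_layer.set_weights(old_layer.get_weights())

# index number_of_categories (the new row) can now be used in predictions
print(new_model.predict(np.array([[number_of_categories]])))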

Fine-tune a model with larger input size

I was wondering, does it make sense to fine-tune a model with a larger input size? Ideally, these are the properties I would like to have:
Fine-tune: meaning reuse the weights obtained by pre-training
Larger input size: no down-sampling before feeding images into the model. Maybe use a larger stride size?
Specifically, I'm trying to fine-tune InceptionV3 in Keras with my specific label set. I want a larger input size since I hope the model can implicitly learn some important characters. With InceptionV3's default size (299x299) this doesn't sound possible to me.
But that sounds like I would have to change the specific model that I'm re-using (say, by modifying specific layers in the model architecture), and then re-using the pre-trained weights doesn't make sense?
If you want to fine-tune a classification model, usually you would remove a few of the top layers, which act as the classifier, and add your own layers. It is the same with fine-tuning the InceptionV3 model: you can remove the top layers and add your own classifier with the desired number of units (i.e. the number of classes in your dataset). For example:
from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense
from keras.models import Model

num_classes = 10  # the number of classes in your dataset
# let's say our images are of size (1000, 1000, 3)
inc_v3 = InceptionV3(include_top=False, input_shape=(1000, 1000, 3), pooling='avg')
# add your desired layers to the top
# we only add one layer just for illustration
# but you can add as many layers as you want
out = Dense(num_classes, activation='softmax')(inc_v3.output)
# construct the new model
model = Model(inc_v3.input, out)
However, note that for fine-tuning you need to first freeze all the base layers (i.e. the layers of the InceptionV3 model). Further, instead of adding a pooling layer at the top (i.e. pooling='avg'), you can use other alternatives such as a Flatten layer.
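To make the freezing step concrete, here is a minimal sketch building on the snippet above (the optimizer and loss are just illustrative choices):
# freeze all InceptionV3 base layers so only the new top layer(s) get trained
for layer in inc_v3.layers:
    layer.trainable = False

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(...) now updates only the Dense classifier we added; later you can
# unfreeze some of the top Inception blocks and recompile with a low learning
# rate for a second fine-tuning stage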
Further, I recommend reading the relevant official Keras tutorial: Building powerful image classification models using very little data (the second and third sections are most relevant to this).

TensorFlow Combining Dense Layer with LSTM Cell

How do I combine a TensorFlow Dense layer with an LSTM that follows it?
Given a sequence of variable length, I want to backpropagate through both layers, since I will be using this for RL.
How do I format my input sequence / define my layers so that they are consistent with the size requirements?
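One common arrangement (a hedged Keras sketch with hypothetical sizes, rather than raw TensorFlow ops) is to apply the Dense layer to every timestep via TimeDistributed, mask the padded positions of variable-length sequences, and feed the result into an LSTM; backpropagation then flows through both layers:
import numpy as np
from keras.models import Sequential
from keras.layers import Masking, TimeDistributed, Dense, LSTM

feature_dim, hidden_dim, lstm_units = 8, 16, 32  # hypothetical sizes

model = Sequential([
    # None in the time dimension allows sequences of any (padded) length
    Masking(mask_value=0.0, input_shape=(None, feature_dim)),
    TimeDistributed(Dense(hidden_dim, activation='relu')),
    LSTM(lstm_units),          # returns the final hidden state
    Dense(1),                  # e.g. a value estimate for the RL setting
])
model.compile(optimizer='adam', loss='mse')

# variable-length sequences are zero-padded to a common length per batch
batch = np.zeros((2, 5, feature_dim))
batch[0, :5] = np.random.rand(5, feature_dim)   # length-5 sequence
batch[1, :3] = np.random.rand(3, feature_dim)   # length-3 sequence, padded
print(model.predict(batch).shape)               # (2, 1)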

Can't understand how filters in a Conv net are calculated

I've been studying machine learning for 4 months, and I understand the concepts behind the MLP. The problem came when I started reading about Convolutional Neural Networks. Let me tell you what I know and then ask what I'm having trouble with.
The core parts of a CNN are:
Convolutional Layer: you have "n" number of filters that you use to generate "n" feature maps.
RELU Layer: you use it for normalizing the output of the convolutional layer.
Sub-sampling Layer: used for "generating" a new feature map that represents more abstract concepts.
Repeat the first 3 layers a few times, and the last part is a common classifier, such as an MLP.
My doubts are the following:
How do I create the filters used in the Convolutional Layer? Do I have to create a filter, train it, and then put it in the Conv Layer, or do I train it with the backpropagation algorithm?
Imagine I have a conv layer with 3 filters, then it will output 3 feature maps. After applying the RELU and Sub-sampling layer, I will still have 3 feature maps (smaller ones). When passing again through the Conv Layer, how do I calculate the output? Do I have to apply the filter in each feature map separately, or do some kind of operation over the 3 feature maps and then make the sum? I don't have any idea of how to calculate the output of this second Conv Layer, and how many feature maps it will output.
How do I pass the data from the Conv layers to the MLP (for classification in the last part of the NN)?
If someone knows of a simple implementation of a CNN without using a framework, I would appreciate it. I think the best way of learning how stuff works is by doing it yourself. Later on, when you already know how stuff works, you can use frameworks, because they save you a lot of time.
You train the filters with the backpropagation algorithm, the same way you train an MLP.
You apply each filter separately. For example, if you have 10 feature maps in the first layer and the filter producing one of the feature maps in the second layer is 3*3, then you apply a 3*3 filter to each of the ten feature maps in the first layer; the weights for each feature map are different, so in this case one filter has 3*3*10 weights.
To make this easier to understand, keep in mind that a pixel of a non-grayscale image has three values (red, green and blue), so if you're passing images to a convolutional neural network, then in the input layer you already have 3 feature maps (for RGB), and one value in the next layer will be connected to all 3 feature maps in the first layer.
You should flatten the convolutional feature maps. For example, if you have 10 feature maps of size 5*5, then you will have a layer with 250 values, and from there it is no different from an MLP: you connect all of these artificial neurons to all of the artificial neurons in the next layer by weights.
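Here is a toy numpy illustration of the two points above (the sizes are just examples): a single filter in the second conv layer carries 3*3 weights for every one of the 10 input feature maps, its output sums over all of them, and flattening turns a stack of maps into one vector for the MLP part:
import numpy as np

in_maps = np.random.rand(10, 5, 5)   # 10 feature maps of size 5*5
filt = np.random.rand(10, 3, 3)      # ONE filter of the next layer: 3*3*10 weights

out_size = 5 - 3 + 1                 # "valid" convolution -> 3*3 output map
out_map = np.zeros((out_size, out_size))
for i in range(out_size):
    for j in range(out_size):
        # multiply the 10*3*3 patch by the 10*3*3 filter and sum everything
        out_map[i, j] = np.sum(in_maps[:, i:i+3, j:j+3] * filt)

# with n such filters the layer outputs n new feature maps

# flattening for the MLP part: 10 maps of 5*5 -> a vector of 250 values
flat = in_maps.reshape(-1)
print(out_map.shape, flat.shape)     # (3, 3) (250,)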
Here someone has implemented a convolutional neural network without frameworks.
I would also recommend those lectures.

Interpreting a Self Organizing Map

I have been reading about Self-Organizing Maps, and I understand the algorithm (I think); however, something still eludes me.
How do you interpret the trained network?
How would you then actually use it for, say, a classification task (once you have done the clustering with your training data)?
All of the material I seem to find (printed and digital) focuses on training the algorithm. I believe I may be missing something crucial.
Regards
SOMs are mainly a dimensionality-reduction algorithm, not a classification tool. They are used for dimensionality reduction just like PCA and similar methods (once trained, you can check which neuron is activated by your input and use this neuron's position as the reduced value); the main difference is their ability to preserve a given topology in the output representation.
So what a SOM actually produces is a mapping from your input space X to a reduced space Y (most commonly a 2D lattice, making Y a 2-dimensional space). To perform actual classification you should transform your data through this mapping and run some other classification model (SVM, neural network, decision tree, etc.).
In other words, SOMs are used to find another representation of the data: a representation that is easy for humans to analyze further (as it is mostly 2-dimensional and can be plotted) and easy for any further classification model. This is a great method for visualizing high-dimensional data, analyzing "what is going on", how classes are grouped geometrically, etc. But they should not be confused with other neural models like artificial neural networks or even growing neural gas (which is a very similar concept, yet gives a direct data clustering), as they serve a different purpose.
Of course one can use SOMs directly for classification, but this is a modification of the original idea, requires a different data representation, and in general does not work as well as using some other classifier on top of them.
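As a hedged sketch of the "classifier on top of a SOM" idea (assuming the third-party minisom package and scikit-learn; the grid size and iteration count are arbitrary):
import numpy as np
from minisom import MiniSom
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 1) dimensionality reduction: train a 10x10 SOM on the 4-dimensional data
som = MiniSom(10, 10, X.shape[1], sigma=1.0, learning_rate=0.5)
som.train_random(X, 1000)

# 2) map every sample to the 2-d position of its winning neuron
X_reduced = np.array([som.winner(x) for x in X])

# 3) run an ordinary classifier on the reduced representation
clf = SVC().fit(X_reduced, y)
print(clf.score(X_reduced, y))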
EDIT
There are at least a few ways of visualizing the trained SOM:
one can render the SOM's neurons as points in the input space, with edges connecting the topologically close ones (this is possible only if the input space has a small number of dimensions, like 2-3)
display data classes on the SOM's topology - if your data is labeled with some numbers {1,...,k}, we can bind k colors to them; for the binary case let us consider blue and red. Next, for each data point we calculate its corresponding neuron in the SOM and add this label's color to the neuron. Once all the data have been processed, we plot the SOM's neurons, each at its original position in the topology, with the color being some aggregate (e.g. mean) of the colors assigned to it. This approach, if we use some simple topology like a 2D grid, gives us a nice low-dimensional representation of the data. In the following image, the subimages from the third one to the end are the results of such visualization, where red means label 1 (the "yes" answer) and blue means label 2 (the "no" answer)
one can also visualize the inter-neuron distances by calculating how far apart connected neurons are and plotting this on the SOM's map (second subimage in the above visualization)
one can cluster the neurons' positions with some clustering algorithm (like k-means) and visualize the cluster ids as colors (first subimage)
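Building on the minisom sketch above (and assuming matplotlib; minisom's distance_map gives the inter-neuron distances), a rough take on two of these visualizations:
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 4))

# inter-neuron distances (U-matrix style view)
plt.subplot(1, 2, 1)
plt.pcolor(som.distance_map().T, cmap='bone_r')
plt.colorbar()

# data points plotted at their winning neuron, colored by class label
plt.subplot(1, 2, 2)
colors = ['red', 'green', 'blue']
for x, label in zip(X, y):
    i, j = som.winner(x)
    plt.plot(i + 0.5, j + 0.5, 'o', color=colors[label])

plt.show()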

Resources