cnn max pooling - non consecutive sliding window (skip gram like)? - machine-learning

When using keras to build a simple cnn like the code below and when it is used on text-based problems such as document classification, I understand that this is as if we are extracting 4-grams from the text (kernel_size of 4) and use them as features.
model = Sequential()
model.add(embedding_layer)
model.add(Conv1D(filters=100, kernel_size=4, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=4))
model.add(Dense(4, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
and in this case, the kernel size in the conv1D layer is like a sliding window of size 4 that walks over sequences of tokens in the text to emit 4-grams.
I wonder if here is a way such that we can create 'non-consecutive sliding window in the convolution, i.e., that would generate 'skip-gram' equivalent. So for example, given the following 1d vector:
[a, b, c, d, e, f]
a conv1d with a kernel_size=3 skip=1 will scan the following sequences:
[(a,c,d),(b,d,e),(c,e,f),(d,f,padding),(e,padding,padding)] union [(a,b,d),(b,c,e),(c,d,f),(d,e,padding),(e,f,padding),(f,padding,padding)]
The reason I say 'union' is simply because I suppose from the implementation point of view, it may be easier to generate either part 1 or part 2, giving another parameter for the revised conv1d layer. and if thhat's the case and doable, I can work around this by concatenating multiple layers. But the minimum is really to have an extended conv1d layer that would take additional parameters such that it does either the first or the second part of scanning.
The idea is not new as this paper already experimented it: http://www.aclweb.org/anthology/D/D16/D16-1085.pdf
But excuse my lack of in-depth knowledge of keras I do not know how to implement it. Any suggestions please,
Many thanks in advance

You can do this creating a custom convolutional layer where certain elements in the weight matrix are zero.
You can take the regular Conv1D layer as the base class.
But before doing this, notice that you can create a "dilated" convolution by passing the dilation_rate=n parameter when creating a regular convolutional layer. This will skip n-1 grams between each taken gram in the window. Your window will have fixed regular spaces.
Creating a custom layer for that:
import keras.backend as K
#a 1D convolution that skips some entries
class SkipConv1D(Conv1D):
#in the init, let's just add a parameter to tell which grams to skip
def __init__(self, validGrams, **kwargs):
#for this example, I'm assuming validGrams is a list
#it should contain zeros and ones, where 0's go on the skip positions
#example: [1,1,0,1] will skip the third gram in the window of 4 grams
assert len(validGrams) == kwargs.get('kernel_size')
self.validGrams = K.reshape(K.constant(validGrams),(len(validGrams),1,1))
#the chosen shape matches the dimensions of the kernel
#the first dimension is the kernel size, the others are input and ouptut channels
#initialize the regular conv layer:
super(SkipConv1D,self).__init__(**kwargs)
#here, the filters, size, etc, go inside kwargs, so you should use them named
#but you may make them explicit in this __init__ definition
#if you think it's more comfortable to use it like this
#in the build method, let's replace the original kernel:
def build(self, input_shape):
#build as the original layer:
super(SkipConv1D,self).build(input_shape)
#replace the kernel
self.originalKernel = self.kernel
self.kernel = self.validGrams * self.originalKernel
Be aware of some things that weren't taken care of in this answer:
The method get_weights() will still return the original kernel, not the kernel with the skipped mask. (It's possible to fix this, but there will be an extra work, if necessary, please tell me)
There are unused weights in this layer. This is a simple implementation. The focus here was to keep it the most similar possible to an existing Conv layer, with all its features. It's also possible to use only strictly necessary weights, but this will increase the complexity a lot, and require lots of rewriting of the keras original code for recreating all the original possibilities.
If your kernel_size is too long, it will be very boring to define the validGrams var. You may want to create a version that takes some skipped indices and then converts it in the type of list used above.
Different channels skipping different grams:
It's possible to do this inside a layer as well, if instead of using a validGrams with shape (length,), you use one with shape (length,outputFilters).
In this case, at the point where we create the validGrams matrix, we should reshape it like:
validGrams = np.asarray(validGrams)
shp = (validGrams.shape[0],1,validGrams.shape[1])
validGrams = validGrams.reshape(shp)
self.validGrams = K.constant(validGrams)
You can also simply use many parallel SkipConv1D with different parameters and then concatenate their results.
inputs = Input(yourInputShape)
out = embedding_layer(inputs)
out1 = SkipConv1D(filters=50,kernel_size=4,validGrams=[1,0,1,1])(out)
out2 = SkipConv1D(filters=50,kernel_size=4,validGrams=[1,1,0,1])(out)
out = Concatenate()([out1,out2]) #if using 'channels_first' use Concatenate(axis=1)
out = MaxPooling1D(pool_size=4)(out)
out = Dense(4, activation='softmax')(out)
model = Model(inputs,out)

Related

What is the purpose of having the same input and output in PyTorch nn.Linear function?

I think this is a comprehension issue, but I would appreciate any help.
I'm trying to learn how to use PyTorch for autoencoding. In the nn.Linear function, there are two specified parameters,
nn.Linear(input_size, hidden_size)
When reshaping a tensor to its minimum meaningful representation, as one would in autoencoding, it makes sense that the hidden_size would be smaller. However, in the PyTorch tutorial there is a line specifying identical input_size and hidden_size:
class NeuralNetwork(nn.Module):
def __init__(self):
super(NeuralNetwork, self).__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28*28, 512),
nn.ReLU(),
nn.Linear(512, 512),
nn.ReLU(),
nn.Linear(512, 10),
)
I guess my question is, what is the purpose of having the same input and hidden size? Wouldn't this just return an identical tensor?
I suspect that this just a requirement after calling the nn.ReLU() activation function.
As well stated by wikipedia:
An autoencoder is a type of artificial neural network used to learn
efficient codings of unlabeled data. The
encoding is validated and refined by attempting to regenerate the
input from the encoding.
In other words, the idea of the autoencoder is to learn an identity. This identity-function will be learned only for particular inputs (i.e. without anomalies). From this, the following points derive:
Input will have same dimensions as output
Autoencoders are (generally) built to learn the essential features of the input
Because of point (1), you have that autoencoder will have a series of layers (e.g. a series of nn.Linear() or nn.Conv()).
Because of point (2), you generally have an Encoder which compresses the information (as your code-snippet, you start from 28x28 to the ending 10) and a Decoder that decompress the information (10 -> 28x28). Generally the latent space dimensionality (10) is much smaller than the input (28x28) across several implementation of this theoretical architecture. Now that the end-goal of the Encoder part is clear, you may appreciate that the compression may produce additional data during the compression itself (nn.Linear(28*28, 512)), which will disappear when the series of layers will give the final output (10).
Note that because the model in your question includes a nonlinearity after the linear layer, the model will not learn an identity transform between the input and output. In the specific case of the relu nonlinearity, the model could learn an identity transform if all of the input values were positive, but in general this won't be the case.
I find it a little easier to imagine the issue if we had an even smaller model consisting of Linear --> Sigmoid --> Linear. In such a case, the input will be mapped through the first matrix transform and then "squashed" into the space [0, 1] as the "hidden" layer representation. The next ("output") layer would need to take this squashed view of the input and come up with some way of "unsquashing" it back into the original. But with an affine output layer, it's not possible to do this, so the model will have to learn some other, non-identity, transforms for the two matrices.
There are some neat visualizations of this concept on Chris Olah's blog that are well worth a look.

Autoencoders: Find the important neurons

I have implemented Autoencoder using Keras that takes 112*112*3 neurons as input and 100 neurons as the compressed/encoded state. I want to find the neurons out of these 100 that learns the important features. So far i have calculated eigen values(e) and eigen vectors(v) using the following steps. And i found out that around first 30 values of (e) is greater than 0. Does that mean the first 30 modes are the important ones? Is there any other method that could find the important neurons?
Thanks in Advance
x_enc = enc_model.predict(x_train, batch_size=BATCH_SIZE) # shape (3156,100)
x_mean = np.mean(x_enc, axis=0) # shape (100,)
x_stds = np.std(x_enc, axis=0) # shape (100,)
x_cov = np.cov((x_enc - x_mean).T) # shape (100,100)
e, v = np.linalg.eig(x_cov) # shape (100,) and (100,100) respectively
I don't know if the approach you are using will actually give you any useful results since the way the network learns and what it exactly learns aren't known, I suggest you use a different kind of autoencoder, that automatically learns disentangled representations of the data in a latent space, this way you can be sure that all the parameters you find are actually contributing to the representation of your data. check this article

One-hot-encoded labels___multi-hot-encoded output_Keras

I have a 1D-image with 1x2048 pixels as input and 32 classes for which I have defined a layer of 32 filters with the same size of the image(1x2048) which are L1-regularized.
My image examples are one-hot encodded. However, my goal is to get a multi-hot encoded output when I sum some of these images together and feed it to the trained model.
The training goes well and it can classify each class seperately, but if I sum two image and feed it to the model it only outputs a one-hot encoded vector( although I expect a two-hot encoded vector). If I look at the kernels after training, they make sense as most of the weights are zero except the ones which define my class.
I don't understand why I get a one-hot vector output rather than multi-hot vector.
The reason I don't already sum the images and use them for training the model is that the possible making the possible combination of the images exceed my memory power.
An image of the network I have in mind
input_shape=(1,2048,1)
model = Sequential()
model.add(Conv2D(32, kernel_size=(1, 2048), strides=(1, 1),
activation='sigmoid',
input_shape=input_shape,
kernel_regularizer=keras.regularizers.l1(0.01),
kernel_constraint=keras.constraints.non_neg() ))
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=optimizer,metrics=['accuracy'])
You are using the wrong loss function
categorical_crossentropy will always return you exactly one 1-value in your vector, no matter the input. It tries to classify every instance into one (and only one) available class.
What you desire, though, is (potentially) mutliple ones in your output. Therefore, you should use binary_crossentropy instead. Also see this post.
On a side note, I would heavily advice you to really consider this twice, since - if you don't really have the case with multiple classes that often, it will maybe result in a lot of false positives. I.e., cases where you get more than one class predicted.
On another note, you might want to consider using Conv1D since your signal is 1-dimensional only.
#Azerila
The thing you are looking for is Mixup augmentation. It is implemented as follows:
def mixup(entry1,entry2):
image1,label1 = entry1
image2,label2 = entry2
alpha = [0.2]
dist = tfd.Beta(alpha, alpha)
l = dist.sample(1)[0][0]
img = l*image1+(1-l)*image2
lab = l*label1+(1-l)*label2
return img, lab

How to tie word embedding and softmax weights in keras?

Its commonplace for various neural network architectures in NLP and vision-language problems to tie the weights of an initial word embedding layer to that of an output softmax. Usually this produces a boost to sentence generation quality. (see example here)
In Keras its typical to embed word embedding layers using the Embedding class, however there seems to be no easy way to tie the weights of this layer to the output softmax. Would anyone happen to know how this could be implemented ?
Be aware that Press and Wolf dont't propose to freeze the weights to some pretrained ones, but tie them. That means, to ensure that input and output weights are always the same during training (in the sense of synchronized).
In a typical NLP model (e.g. language modelling/translation), you have an input dimension (vocabulary) of size V and a hidden representation size H. Then, you start with an Embedding layer, which is a matrix VxH. And the output layer is (probably) something like Dense(V, activation='softmax'), which is a matrix H2xV. When tying the weights, we want that those matrices are the same (therefore, H==H2).
For doing this in Keras, I think the way to go is via shared layers:
In your model, you need to instantiate a shared embedding layer (of dimension VxH), and apply it to either your input and output. But you need to transpose it, to have the desired output dimensions (HxV). So, we declare a TiedEmbeddingsTransposed layer, which transposes the embedding matrix from a given layer (and applies an activation function):
class TiedEmbeddingsTransposed(Layer):
"""Layer for tying embeddings in an output layer.
A regular embedding layer has the shape: V x H (V: size of the vocabulary. H: size of the projected space).
In this layer, we'll go: H x V.
With the same weights than the regular embedding.
In addition, it may have an activation.
# References
- [ Using the Output Embedding to Improve Language Models](https://arxiv.org/abs/1608.05859)
"""
def __init__(self, tied_to=None,
activation=None,
**kwargs):
super(TiedEmbeddingsTransposed, self).__init__(**kwargs)
self.tied_to = tied_to
self.activation = activations.get(activation)
def build(self, input_shape):
self.transposed_weights = K.transpose(self.tied_to.weights[0])
self.built = True
def compute_mask(self, inputs, mask=None):
return mask
def compute_output_shape(self, input_shape):
return input_shape[0], K.int_shape(self.tied_to.weights[0])[0]
def call(self, inputs, mask=None):
output = K.dot(inputs, self.transposed_weights)
if self.activation is not None:
output = self.activation(output)
return output
def get_config(self):
config = {'activation': activations.serialize(self.activation)
}
base_config = super(TiedEmbeddingsTransposed, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
The usage of this layer is:
# Declare the shared embedding layer
shared_embedding_layer = Embedding(V, H)
# Obtain word embeddings
word_embedding = shared_embedding_layer(input)
# Do stuff with your model
# Compute output (e.g. a vocabulary-size probability vector) with the shared layer:
output = TimeDistributed(TiedEmbeddingsTransposed(tied_to=shared_embedding_layer, activation='softmax')(intermediate_rep)
I have tested this in NMT-Keras and it trains properly. But, as I try to load a trained model, it gets an error, related to the way Keras loads the models: it doesn't load the weights from the tied_to. I've found several questions regarding this (1, 2, 3), but I haven't managed to solve this issue. If someone have any ideas on the next steps to take, I'd be very glad to hear them :)
As you may read here you should simply set trainable flag to False. E.g.
aux_output = Embedding(..., trainable=False)(input)
....
output = Dense(nb_of_classes, .. ,activation='softmax', trainable=False)

How do I obtain the layer names for use in the iOS sample app? (Tensorflow)

I'm very new to Tensorflow, and I'm trying to train something using the inception v3 network for use in an iPhone app. I managed to export my graph as a protocolbuffer file, manually remove the dropout nodes (correctly, I hope), and have placed that .pb file in my iOS project, but now I am receiving the following error:
Running model failed:Not found: FeedInputs: unable to find feed output input
which seems to indicate that my input_layer_name and output_layer_name variables in the iOS app are misconfigured.
I see in various places that it should be Mul and softmax respectively, for inception v3, but these values don't work for me.
My question is: what is a layer (with regards to this context), and how do I find out what mine are?
This is the exact definition of the model that I trained, but I don't see "Mul" or "softmax" present.
This is what I've been able to learn about layers, but it seems to be a different concept, since "Mul" isn't present in that list.
I'm worried that this might be a duplicate of this question but "layers" aren't explained (are they tensors?) and graph.get_operations() seems to be deprecated, or maybe I'm using it wrong.
As MohamedEzz wrote there are no layers in Tensorflow graphs. There are only operations that can be placed under the same name scope.
Usually operations of a single layer placed under the same scope and applications that aware of name scope concept can display them grouped.
One of such applications is Tensorboard. I believe that using Tensorboard is the easiest way to find node names.
Consider the following example:
import tensorflow as tf
import tensorflow.contrib.slim.nets as nets
input_placeholder = tf.placeholder(tf.float32, shape=(None, 224, 224, 3))
network = nets.inception.inception_v3(input_placeholder)
writer = tf.summary.FileWriter('.', tf.get_default_graph())
writer.close()
It creates placeholder for input data then creates Inception v3 network and saves event data (with graph) in current directory.
Launching Tensorflow in the same directory makes it possible to view graph structure.
tensorboard --logdir .
Tensorboard prints UI url to the console
Starting TensorBoard 41 on port 6006
(You can navigate to http://192.168.128.73:6006)
Below is an image of this graph.
Locate node you are interested in and select it to find its name (in the upper left information pane).
Input:
Output:
Please note that usually you need not node names but tensor names. In most cases it is enough to add :0 to node name to get tensor name.
For example to run Inception v3 network created above using names from the graph use the following code (continuation of the above code):
import numpy as np
data = np.random.randn(1, 224, 224, 3) # just random data
session = tf.InteractiveSession()
session.run(tf.global_variables_initializer())
result = session.run('InceptionV3/Predictions/Softmax:0', feed_dict={'Placeholder:0': data})
# result.shape = (1, 1000)
In the core of tensorflow, there are ops (operations) and tensors (n-dimensional arrays). Each op takes tensors and gives back tensors. Layers are just convenience wrappers around a number of ops that represent a neural network layer.
For example a convolution layer is composed of mainly 3 ops :
conv2d op : this is what slides a kernel over the input tensor and does element-wise multiplication between the kernel and the underlying input window.
bias_add op : adds the biases to the tensor coming out of the conv2d op
activation op : applies an activation function element-wise to the output tensor of the bias_add op
To run a tensorflow model, you provide feeds (inputs) and fetches (desired outputs). These are tensors, or tensor names.
From this line of code Inception_model, it seems that what you need is a tensor named 'predictions' which has the n_class output probabilities.
What you observed (softmax) is the type of the op that produced the predictions tensor
As for the input tensor name, the inception_model.py code does not show the input tensor name, since it's an argument to the function. So it depends on what name you have given to that input tensor.
When you create your layers or variable add the parameter called name
with tf.name_scope("output"):
W2 = tf.Variable(tf.truncated_normal([num_filters, num_classes], stddev=0.1), name="W2")
b2 = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b2")
scores = tf.nn.xw_plus_b(h_pool_flat, W2, b2, name="scores")
pred_y = tf.nn.softmax(scores,name="pred_y")
In this case I can access final predicted values by using "output/pred_y". If you dont have name_scope, you can just use "pred_y" to get to the values
conv = tf.nn.conv1d(word_embeddedings,
W1,
stride=stride_size,
padding="VALID",
name="conv") #will have dimensions [batch_size,out_width,num_filters] out_width is a function of max_words,filter_size and stride_size
# Apply nonlinearity
h = tf.nn.relu(tf.nn.bias_add(conv, b1), name="relu")
I called the layer "conv" and used it in the next layer. Paste your snippet like I have done here

Resources