Using keras.layers.Concatenate correctly [duplicate]

I am trying to merge the outputs of two models and feed them as input to a third model using the Keras Sequential API.
from keras.models import Sequential
from keras.layers import Input, Dense, Concatenate
Model1:
inputs1 = Input(shape=(750,))
x = Dense(500, activation='relu')(inputs1)
x = Dense(100, activation='relu')(x)
Model2:
inputs2 = Input(shape=(750,))
y = Dense(500, activation='relu')(inputs2)
y = Dense(100, activation='relu')(y)
Model3:
merged = Concatenate([x, y])
final_model = Sequential()
final_model.add(merged)
final_model.add(Dense(100, activation='relu'))
final_model.add(Dense(3, activation='softmax'))
Up to this point, my understanding is that the outputs of the two models, x and y, are merged and given as input to the third model. But when I fit all this like,
final_model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
final_model.fit([in1, in2], np_res_array)
in1 and in2 are two numpy ndarrays of dimension 10000×750 that contain my training data, and np_res_array is the corresponding target. This gives me the error 'list' object has no attribute 'shape'. As far as I know, this is how we give multiple inputs to a model, but what is this error? How do I resolve it?

You can't do this using the Sequential API, for two reasons:
Sequential models, as their name suggests, are a sequence of layers where each layer is connected directly to its previous layer and therefore they cannot have branches (e.g. merge layers, multiple input/output layers, skip connections, etc.).
The add() method of the Sequential API accepts a Layer instance as its argument, not a Tensor instance. In your example, merged is a Tensor (i.e. the concatenation layer's output).
Further, the correct way of using the Concatenate layer is like this:
merged = Concatenate()([x, y])
However, you can also use concatenate (note the lowercase "c"), its equivalent functional interface, like this:
merged = concatenate([x, y])
Finally, to be able to construct that third model you also need to use the functional API.
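For example, here is a minimal sketch of the complete model built with the functional API, reusing the x and y branches defined above:
from keras.models import Model
merged = Concatenate()([x, y])
z = Dense(100, activation='relu')(merged)
outputs = Dense(3, activation='softmax')(z)
model = Model(inputs=[inputs1, inputs2], outputs=outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit([in1, in2], np_res_array)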

Related

Multiple instance model in Keras

I have a multiple instance dataset for which I want to predict the instance category as well as a (derived) bag label using Keras' Functional API. Simple instance prediction works, and getting a bag label from that also works. But since the bag label is computed outside of the model, the results seem to be suboptimal.
My thinking is as follows:
For each instance in a bag, start up a separate branch of the model.
After running each instance through its branch, concatenate the results.
After concatenation, predict the bag label based on the instance probabilities.
What I have written so far is below; here, n_instances is the number of instances per bag, n_feat the number of features per instance, and n_classes the number of possible categories an instance/bag can belong to:
import tensorflow as tf
from keras.layers import *
inputs = []
instance_layer = [None] * n_instances
for i in range(n_instances):
    inp = Input(shape=(n_feat,))
    inputs.append(inp)
    instance_layer[i] = Dense(units=256, activation='relu')(inp)
    instance_layer[i] = Dense(units=128, activation='relu')(instance_layer[i])
    instance_layer[i] = Dense(units=64, activation='relu')(instance_layer[i])
    instance_layer[i] = Dense(units=n_classes + 1, activation='sigmoid')(instance_layer[i])  # output to be converted to one-hot vector
output_tensor = Concatenate()(instance_layer)
"""
Code to go from concatenated tensor to a single bag prediction
"""
model = tf.keras.models.Model(inputs, output_tensor)
Issues:
It seems to me like each instance gets its own separate branch with its own weights, while I want all instances to share one identical model.
Concatenate() produces a tensor of length n_instances * (n_classes + 1), whereas I'm interested in a tensor of shape (n_instances, n_classes). I would prefer to use CategoricalCrossentropy as a loss function.
Any pointers on how to go from this tensor of instance predictions to a bag prediction?
For posterity:
instance_model = tf.keras.models.Sequential([
    Dense(units=256, name='fc_256', activation='relu', input_dim=n_feat),
    Dense(units=128, name='fc_128', activation='relu'),
    Dense(units=64, name='fc_64', activation='relu'),
    Dense(units=n_classes + 1, name='label_predictions', activation='sigmoid')
])
This is then wrapped in a TimeDistributed layer which returns a tensor with n_instances rows and n_classes+1 columns, for an input tensor of n_instances rows and n_feat columns. n_instances is variable here, hence the None in the input shape:
inputs = Input(shape=(None, n_feat), name="input")
instance_output = TimeDistributed(instance_model)(inputs)
# Condense into bag prediction
bag_output = GlobalAveragePooling1D(name="pooling")(instance_output)
model = tf.keras.models.Model(inputs, bag_output)
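A hypothetical usage sketch (assuming one-hot bag labels and NumPy inputs; with variable-length bags, train with batch_size=1 or pad bags to a common length):
import numpy as np
model.compile(optimizer='adam', loss='categorical_crossentropy')
bag = np.random.random_sample((1, 7, n_feat))  # one bag of 7 instances
pred = model.predict(bag)  # shape (1, n_classes + 1): instance predictions averaged per bag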

usage of GRU in non sequential context

I was implementing a GRU in Keras and, while still a bit confused about some things, I got to this model:
import tensorflow as tf
from tensorflow.keras import layers

modelGRU = tf.keras.models.Sequential()
modelGRU.add(layers.Bidirectional(tf.keras.layers.GRU(50, activation='tanh', input_shape=(1, 4))))
modelGRU.add(layers.Dense(99))
Then I found out that my model does not make any sense, since I put the model parameters (4 parameters such as depth, angle, ..., which are the same at all times) into a single GRU. This gives me an output of dimension 100 (50*2), and then a dense layer is used to generate the 99 outputs. These 99 outputs are a time series, which is why I initially thought of a GRU, but of course the implementation above is not right, since my model parameters carry no time or sequential information. However, this model seems to work better than the model I implemented once I understood everything better:
from tensorflow import keras

params_input = keras.Input(shape=(99, 4))
aantal_units = 5
naRNN = tf.keras.layers.GRU(aantal_units, return_sequences=True)(params_input)
ylist = tf.unstack(naRNN, num=99, axis=1)
ylistdense = []
for ii in range(0, 99):
    yy = tf.keras.layers.Dense(1, activation='linear')(ylist[ii])
    ylistdense.append(yy)
conc = tf.keras.layers.concatenate(ylistdense)
model = keras.Model(inputs=params_input, outputs=conc)
Here, for my input, I copied the model parameters 99 times in order to get an input of shape (99, 4), put these into a GRU layer, and then for every timestep individually I apply a dense layer to predict the outcome.
Here the architecture of my second implementation is visualized.
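As a side note, the "copy the parameters 99 times" step can also be expressed inside the model itself with Keras's RepeatVector layer (a minimal sketch, not what I used):
inputs_static = keras.Input(shape=(4,))  # the 4 static model parameters
repeated = keras.layers.RepeatVector(99)(inputs_static)  # output shape (None, 99, 4)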
So my question is: can a GRU be used for non-sequential input? Or is there something wrong with my second implementation?

Transformation mapping between input and output files using Machine Learning

I have two files with me, an input file and an output file. The input file goes through a transformation logic and produces the output file. The issue here is that, I am not aware of the transformation logic between the input and output file.
The input file contains 10 fields and output file contains 7 fields. These 10 fields are transformed into 7 fields using a transformation logic.
Is there a way, using some machine learning algorithm, to build a model that automatically deduces the relationship between input and output and is able to predict the output based on the data in the input file?
I think I have something that might help you solve your issue:
You have various inputs with different datatypes, and likewise different outputs with different datatypes. Let's use this dataset as an example when working with TensorFlow and Keras:
import numpy as np
x_categorical = [1, 2, 3, 4, 5]
x_categorical_2 = np.random.choice(x_categorical, len(x_categorical))
x_continuus = np.random.random_sample(len(x_categorical))
y_categorical = [0, 1, 2, 3, 4]  # labels must lie in [0, 5) for the 5-way softmax below
y_continuus = np.random.random_sample(len(x_categorical))
Create the tf.data.Datasets and zip the x and y values together, so it fits the model input:
ds_x = tf.data.Dataset.from_tensor_slices(x_categorical)
ds_x1 = tf.data.Dataset.from_tensor_slices(x_categorical_2)
ds_x2 = tf.data.Dataset.from_tensor_slices(x_continuus)
dataset_x = tf.data.Dataset.zip((ds_x,ds_x1,ds_x2))
ds_y = tf.data.Dataset.from_tensor_slices(y_categorical)
ds_y1 = tf.data.Dataset.from_tensor_slices(y_continuus)
dataset_y = tf.data.Dataset.zip((ds_y,ds_y1))
dataset_train = tf.data.Dataset.zip((dataset_x, dataset_y))
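To sanity-check the pipeline (a quick sketch), you can inspect a single element; each one is a ((x, x1, x2), (y, y1)) tuple:
for features, targets in dataset_train.take(1):
    print(features, targets)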
Build an example model, which concatenates the inputs, has one layer which contains the "logic" for the data combination and two more layers, that have a logic for each output:
import tensorflow as tf
from tensorflow.keras import layers as layer

layer_input_categorical = layer.Input(shape=(1,), name="x_categorical", dtype=tf.float32)
layer_input_categorical_2 = layer.Input(shape=(1,), name="x_categorical_2", dtype=tf.float32)
layer_input_continuus = layer.Input(shape=(1,), name="x_continuus", dtype=tf.float32)
concat_layer = layer.Concatenate()([layer_input_categorical,layer_input_categorical_2, layer_input_continuus])
dense_layer = layer.Dense(100)(concat_layer)
dense_layer_out_cat = layer.Dense(50)(dense_layer)
dense_layer_out_con = layer.Dense(50)(dense_layer)
output_categorical = layer.Dense(5, activation="softmax")(dense_layer_out_cat)
output_continuus = layer.Dense(1, activation="sigmoid")(dense_layer_out_con)
model = tf.keras.Model(inputs=[layer_input_categorical, layer_input_categorical_2, layer_input_continuus],
                       outputs=[output_categorical, output_continuus])
# losses are matched to outputs by position: classification head first, then regression head
model.compile(optimizer="Nadam", loss=["sparse_categorical_crossentropy", "mse"])
Please note the usage of two loss functions (sparse_categorical_crossentropy for the classification output and mse for the regression output), matched to the outputs by position.
Also note that the two output layers have different activation functions: softmax for classification (for each class, you get a probability) and sigmoid for regression.
At the end, just start the training with:
model.fit(dataset_train.batch(1), epochs=20)
For sure this is not the best way of doing it, but it proves that it is possible.

Use only certain layers of pretrained torchvision network

I'm trying to use only certain layers in a pretrained torchvision Faster-RCNN network initialized by:
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()
This works. However, passing model.modules() or model.children() into an nn.Sequential yields an error. Even passing the whole model leads to errors, e.g.
model = torch.nn.Sequential(*model.modules())
model.eval()
# x is a [C, H, W] image
y = model(x)
leads to
AttributeError: 'dict' object has no attribute 'dim'
and
model = torch.nn.Sequential(*model.children())
model.eval()
# x is a [C, H, W] image
y = model(x)
leads to
TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not tuple
This confuses me because I have modified other PyTorch pretrained models like that in the past. How can I use the FasterRCNN pretrained model to create a new (pretrained) model that uses only certain layers, e.g. all layers but the last one?
Unlike other simple CNN models, it is not trivial to convert an R-CNN based detector to a simple nn.Sequential model. If you look at the functionality of R-CNN ('generalized_rcnn.py'), you'll see that the output features (computed by the FPN backbone) are not just passed to the RPN component, but rather combined with the input image and even with the targets (during training).
Therefore, I suppose if you want to change the way faster R-CNN behaves, you'll have to use the base class torchvision.models.detection.FasterRCNN() and provide it with different roi pooling parameters.
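For example, if what you actually need is a pretrained feature extractor rather than a truncated detector, the detector exposes its backbone as a submodule that can be used on its own (a minimal sketch):
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()
backbone = model.backbone  # ResNet-50 + FPN
x = torch.rand(1, 3, 224, 224)  # dummy [N, C, H, W] batch
with torch.no_grad():
    features = backbone(x)  # OrderedDict of multi-scale FPN feature maps
for name, feature in features.items():
    print(name, feature.shape)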

cnn max pooling - non consecutive sliding window (skip gram like)?

When using keras to build a simple cnn like the code below, and when it is used on text-based problems such as document classification, I understand that this is as if we are extracting 4-grams from the text (kernel_size of 4) and using them as features.
model = Sequential()
model.add(embedding_layer)
model.add(Conv1D(filters=100, kernel_size=4, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=4))
model.add(Dense(4, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
and in this case, the kernel size in the conv1D layer is like a sliding window of size 4 that walks over sequences of tokens in the text to emit 4-grams.
I wonder if there is a way such that we can create a 'non-consecutive' sliding window in the convolution, i.e., one that would generate the 'skip-gram' equivalent. So for example, given the following 1d vector:
[a, b, c, d, e, f]
a conv1d with a kernel_size=3 skip=1 will scan the following sequences:
[(a,c,d),(b,d,e),(c,e,f),(d,f,padding),(e,padding,padding)] union [(a,b,d),(b,c,e),(c,d,f),(d,e,padding),(e,f,padding),(f,padding,padding)]
The reason I say 'union' is simply because I suppose that, from the implementation point of view, it may be easier to generate either part 1 or part 2, given another parameter for the revised conv1d layer. If that's the case and doable, I can work around this by concatenating multiple layers. But the minimum is really to have an extended conv1d layer that would take additional parameters such that it does either the first or the second part of the scanning.
The idea is not new, as this paper has already experimented with it: http://www.aclweb.org/anthology/D/D16/D16-1085.pdf
But excuse my lack of in-depth knowledge of keras; I do not know how to implement it. Any suggestions please?
Many thanks in advance
You can do this by creating a custom convolutional layer where certain elements in the weight matrix are zero.
You can take the regular Conv1D layer as the base class.
But before doing this, notice that you can create a "dilated" convolution by passing the dilation_rate=n parameter when creating a regular convolutional layer. This will skip n-1 grams between each taken gram in the window, so your window will sample with fixed, regular gaps.
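For instance (a quick sketch; the parameter values are arbitrary):
from keras.layers import Conv1D
# kernel_size=3 with dilation_rate=2 reads positions (t, t+2, t+4),
# i.e. it skips one gram between each gram it takes
dilated = Conv1D(filters=100, kernel_size=3, dilation_rate=2, padding='same', activation='relu')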
Creating a custom layer for that:
import keras.backend as K
from keras.layers import Conv1D

# a 1D convolution that skips some entries
class SkipConv1D(Conv1D):

    # in the init, let's just add a parameter to tell which grams to skip
    def __init__(self, validGrams, **kwargs):

        # for this example, I'm assuming validGrams is a list
        # it should contain zeros and ones, where 0's go on the skip positions
        # example: [1,1,0,1] will skip the third gram in the window of 4 grams
        assert len(validGrams) == kwargs.get('kernel_size')
        self.validGrams = K.reshape(K.constant(validGrams), (len(validGrams), 1, 1))
        # the chosen shape matches the dimensions of the kernel
        # the first dimension is the kernel size, the others are input and output channels

        # initialize the regular conv layer:
        super(SkipConv1D, self).__init__(**kwargs)
        # here, the filters, size, etc. go inside kwargs, so you should pass them by name
        # but you may make them explicit in this __init__ definition
        # if you think it's more comfortable to use it like this

    # in the build method, let's replace the original kernel:
    def build(self, input_shape):
        # build as the original layer:
        super(SkipConv1D, self).build(input_shape)

        # replace the kernel
        self.originalKernel = self.kernel
        self.kernel = self.validGrams * self.originalKernel
Be aware of some things that weren't taken care of in this answer:
The method get_weights() will still return the original kernel, not the kernel with the skipped mask. (It's possible to fix this, but it would be extra work; if necessary, please tell me.)
There are unused weights in this layer. This is a simple implementation; the focus here was to keep it as similar as possible to an existing Conv layer, with all its features. It's also possible to use only the strictly necessary weights, but that would increase the complexity a lot and require rewriting lots of the original keras code to recreate all the original possibilities.
If your kernel_size is long, it will be very tedious to define the validGrams var. You may want to create a version that takes some skipped indices and then converts them into the type of list used above.
Different channels skipping different grams:
It's possible to do this inside a layer as well, if instead of using a validGrams with shape (length,), you use one with shape (length,outputFilters).
In this case, at the point where we create the validGrams matrix, we should reshape it like:
validGrams = np.asarray(validGrams)
shp = (validGrams.shape[0],1,validGrams.shape[1])
validGrams = validGrams.reshape(shp)
self.validGrams = K.constant(validGrams)
You can also simply use many parallel SkipConv1D with different parameters and then concatenate their results.
inputs = Input(yourInputShape)
out = embedding_layer(inputs)
out1 = SkipConv1D(filters=50,kernel_size=4,validGrams=[1,0,1,1])(out)
out2 = SkipConv1D(filters=50,kernel_size=4,validGrams=[1,1,0,1])(out)
out = Concatenate()([out1,out2]) #if using 'channels_first' use Concatenate(axis=1)
out = MaxPooling1D(pool_size=4)(out)
out = Dense(4, activation='softmax')(out)
model = Model(inputs,out)
