I'm trying to use only certain layers in a pretrained torchvision Faster-RCNN network initialized by:
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()
This works. However, passing model.modules() or model.children() into an nn.Sequential yields an error. Even passing the whole model leads to errors, e.g.
model = torch.nn.Sequential(*model.modules())
model.eval()
# x is a [C, H, W] image
y = model(x)
leads to
AttributeError: 'dict' object has no attribute 'dim'
and
model = torch.nn.Sequential(*model.children())
model.eval()
# x is a [C, H, W] image
y = model(x)
leads to
TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not tuple
This confuses me because I have modified other PyTorch pretrained models like that in the past. How can I use the FasterRCNN pretrained model to create a new (pretrained) model that uses only certain layers, e.g. all layers but the last one?
Unlike other simple CNN models, it is not trivial to convert an R-CNN based detector into a simple nn.Sequential model. If you look at the functionality of R-CNN ('generalized_rcnn.py'), you'll see that the output features (computed by the FPN backbone) are not just passed on to the RPN component, but are combined with the (transformed) input images and, during training, even with the targets.
Therefore, I suppose that if you want to change the way Faster R-CNN behaves, you'll have to use the base class torchvision.models.detection.FasterRCNN() and provide it with different RoI pooling parameters.
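For illustration, a minimal sketch (under the assumption that what you actually want are the pretrained convolutional features): instead of wrapping the detector in an nn.Sequential, you can reuse its backbone directly, which is a plain module mapping a batched image tensor to an OrderedDict of FPN feature maps.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

backbone = model.backbone  # pretrained ResNet-50 + FPN

x = torch.rand(1, 3, 224, 224)  # note: batched [N, C, H, W] input, not [C, H, W]
with torch.no_grad():
    features = backbone(x)      # OrderedDict of multi-scale feature maps
for name, feat in features.items():
    print(name, feat.shape)
If you instead want to keep the detection pipeline but swap components, the torchvision.models.detection.FasterRCNN constructor also accepts a custom backbone, anchor generator and RoI heads.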
I am trying to train an image classifier on an unbalanced training set. In order to cope with the class imbalance, I want to either weight the classes or the individual samples. Weighting the classes does not seem to work, and somehow I was not able to find a way to specify the sample weights for my setup. Below you can read how I load and encode the training data, and the two approaches that I tried.
Training data loading and encoding
My training data is stored in a directory structure where each image is placed in the subfolder corresponding to its class (I have 32 classes in total). Since the training data is too big to load into memory all at once, I make use of image_dataset_from_directory and thereby describe the data as a TF Dataset:
train_ds = keras.preprocessing.image_dataset_from_directory(
    training_data_dir,
    batch_size=batch_size,
    image_size=img_size,
    label_mode='categorical')
I use label_mode 'categorical', so that the labels are described as a one-hot encoded vector.
I then prefetch the data:
train_ds = train_ds.prefetch(buffer_size=buffer_size)
Approach 1: specifying class weights
In this approach I try to specify the class weights of the classes via the class_weight argument of fit:
model.fit(
train_ds, epochs=epochs, callbacks=callbacks, validation_data=val_ds,
class_weight=class_weights
)
For each class we compute a weight that is inversely proportional to the number of training samples for that class. This is done as follows (before the train_ds.prefetch() call described above):
class_num_training_samples = {}
for f in train_ds.file_paths:
    class_name = f.split('/')[-2]
    if class_name in class_num_training_samples:
        class_num_training_samples[class_name] += 1
    else:
        class_num_training_samples[class_name] = 1
max_class_samples = max(class_num_training_samples.values())
class_weights = {}
for i in range(len(train_ds.class_names)):
    class_weights[i] = max_class_samples / class_num_training_samples[train_ds.class_names[i]]
What I am not sure about is whether this solution works, because the keras documentation does not specify the keys for the class_weights dictionary in case the labels are one-hot encoded.
I tried training the network this way but found that the weights did not have any real influence on the resulting network: when I looked at the distribution of predicted classes for each individual class, I could recognize the distribution of the overall training set, i.e. for every class the dominant classes were predicted most often.
Running the same training without any class weights specified led to similar results.
So I suspect that the weights have no influence in my case.
Is this because specifying class weights does not work for one-hot encoded labels, or is this because I am probably doing something else wrong (in the code I did not show here)?
Approach 2: specifying sample weights
As an attempt to come up with a different (in my opinion less elegant) solution I wanted to specify the individual sample weights via the sample_weight argument of the fit method. However from the documentation I find:
[...] This argument is not supported when x is a dataset, generator, or keras.utils.Sequence instance, instead provide the sample_weights as the third element of x.
This is indeed the case in my setup, where train_ds is a dataset. Now I am really having trouble finding documentation from which I can derive how to modify train_ds so that it has a third element containing the sample weight. I thought the map method of a dataset could be useful, but the solution I came up with is apparently not valid:
train_ds = train_ds.map(lambda img, label: (img, label, class_weights[np.argmax(label)]))
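(For illustration only, a sketch of the same idea expressed with TensorFlow ops instead of NumPy; the tf.gather-based lookup is an assumption and untested against image_dataset_from_directory:)
import tensorflow as tf

# Assumption: class_weights maps class index -> weight, as computed earlier.
weights_tensor = tf.constant([class_weights[i] for i in range(len(class_weights))],
                             dtype=tf.float32)

def add_sample_weight(img, label):
    class_idx = tf.argmax(label, axis=-1)              # one-hot labels -> class indices
    return img, label, tf.gather(weights_tensor, class_idx)

train_ds = train_ds.map(add_sample_weight)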
Does anyone have a solution that may work in combination with a dataset loaded by image_dataset_from_directory?
I am trying to merge the outputs of two models and give them as input to a third model, using the Keras Sequential API.
Model 1:
inputs1 = Input(shape=(750,))
x = Dense(500, activation='relu')(inputs1)
x = Dense(100, activation='relu')(x)
Model 2:
inputs2 = Input(shape=(750,))
y = Dense(500, activation='relu')(inputs2)
y = Dense(100, activation='relu')(y)
Model 3:
merged = Concatenate([x, y])
final_model = Sequential()
final_model.add(merged)
final_model.add(Dense(100, activation='relu'))
final_model.add(Dense(3, activation='softmax'))
Up to this point, my understanding is that the outputs of the two models, x and y, are merged and given as input to the third model. But when I compile and fit it all, like
module3.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
module3.fit([in1, in2], np_res_array)
in1 and in2 are two NumPy ndarrays of dimension 10000x750 that contain my training data, and np_res_array is the corresponding target. This gives me the error 'list' object has no attribute 'shape'. As far as I know, this is how we give multiple inputs to a model, but what is this error? How do I resolve it?
You can't do this using the Sequential API, for two reasons:
Sequential models, as their name suggests, are a sequence of layers where each layer is connected directly to its previous layer and therefore they cannot have branches (e.g. merge layers, multiple input/output layers, skip connections, etc.).
The add() method of Sequential API accepts a Layer instance as its argument and not a Tensor instance. In your example merged is a Tensor (i.e. concatenation layer's output).
Further, the correct way of using Concatenate layer is like this:
merged = Concatenate()([x, y])
However, you can also use concatenate (note the lowercase "c"), its equivalent functional interface, like this:
merged = concatenate([x, y])
Finally, to be able to construct that third model you also need to use the functional API.
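For reference, a minimal sketch of what that could look like end to end (layer sizes taken from the question; tf.keras imports assumed):
from tensorflow.keras.layers import Input, Dense, concatenate
from tensorflow.keras.models import Model

# Two branches processing the two 750-dimensional inputs
inputs1 = Input(shape=(750,))
x = Dense(500, activation='relu')(inputs1)
x = Dense(100, activation='relu')(x)

inputs2 = Input(shape=(750,))
y = Dense(500, activation='relu')(inputs2)
y = Dense(100, activation='relu')(y)

# Merge the branches and add the final classification head
merged = concatenate([x, y])
z = Dense(100, activation='relu')(merged)
outputs = Dense(3, activation='softmax')(z)

model = Model(inputs=[inputs1, inputs2], outputs=outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit([in1, in2], np_res_array)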
I have a 1D image with 1x2048 pixels as input and 32 classes, for which I have defined a layer of 32 filters with the same size as the image (1x2048), which are L1-regularized.
My image examples are one-hot encoded. However, my goal is to get a multi-hot encoded output when I sum some of these images together and feed the result to the trained model.
The training goes well and the model can classify each class separately, but if I sum two images and feed them to the model it only outputs a one-hot encoded vector (although I expect a two-hot encoded vector). If I look at the kernels after training, they make sense, as most of the weights are zero except the ones that define the class.
I don't understand why I get a one-hot vector output rather than a multi-hot vector.
The reason I don't simply sum the images beforehand and use the sums for training is that the number of possible combinations of images would exceed my memory capacity.
An image of the network I have in mind
input_shape = (1, 2048, 1)
model = Sequential()
model.add(Conv2D(32, kernel_size=(1, 2048), strides=(1, 1),
                 activation='sigmoid',
                 input_shape=input_shape,
                 kernel_regularizer=keras.regularizers.l1(0.01),
                 kernel_constraint=keras.constraints.non_neg()))
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=optimizer, metrics=['accuracy'])
You are using the wrong loss function
A model trained with categorical_crossentropy will always push towards exactly one dominant value in the output vector, no matter the input: it tries to classify every instance into one (and only one) of the available classes.
What you desire, though, is (potentially) multiple ones in your output. Therefore, you should use binary_crossentropy instead. Also see this post.
On a side note, I would heavily advise you to really think about this twice: if the case with multiple classes does not actually occur that often, it may result in a lot of false positives, i.e. cases where more than one class is predicted.
On another note, you might want to consider using Conv1D since your signal is 1-dimensional only.
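Putting both suggestions together, a rough sketch might look like this (the optimizer and the rest of your setup are assumptions, not taken from your code):
import keras
from keras.models import Sequential
from keras.layers import Conv1D

# 1D convolution over the 2048-sample signal, sigmoid outputs, and
# binary_crossentropy so that several classes can be active at once.
model = Sequential()
model.add(Conv1D(32, kernel_size=2048, strides=1,
                 activation='sigmoid',
                 input_shape=(2048, 1),
                 kernel_regularizer=keras.regularizers.l1(0.01),
                 kernel_constraint=keras.constraints.non_neg()))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
At prediction time you would then threshold the sigmoid outputs (e.g. at 0.5) to obtain the multi-hot vector.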
@Azerila
The thing you are looking for is Mixup augmentation. It is implemented as follows:
import tensorflow_probability as tfp

tfd = tfp.distributions

def mixup(entry1, entry2):
    image1, label1 = entry1
    image2, label2 = entry2
    alpha = [0.2]
    dist = tfd.Beta(alpha, alpha)
    l = dist.sample(1)[0][0]            # mixing coefficient drawn from Beta(0.2, 0.2)
    img = l * image1 + (1 - l) * image2
    lab = l * label1 + (1 - l) * label2
    return img, lab
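A possible way to apply it, assuming the training data is available as a tf.data.Dataset named train_ds that yields (image, label) pairs with float labels (a sketch only):
import tensorflow as tf

# Pair two shuffled passes over the dataset and mix each pair of entries.
ds_a = train_ds.shuffle(1024)
ds_b = train_ds.shuffle(1024)
mixed_ds = tf.data.Dataset.zip((ds_a, ds_b)).map(mixup)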
I'm trying to make a network that outputs a depth map, and semantic segmentation data separately.
In order to train the network, I'd like to use categorical cross entropy for the segmentation branch, and mean squared error for the branch that outputs the depth map.
I couldn't find any info on implementing the two loss functions for each branch in the Keras documentation for the Functional API.
Is it possible for me to use these loss functions simultaneously during training, or would it be better for me to train the different branches separately?
From the documentation of Model.compile:
loss: String (name of objective function) or objective function. See
losses. If the model has multiple outputs, you can use a different
loss on each output by passing a dictionary or a list of losses. The
loss value that will be minimized by the model will then be the sum of
all individual losses.
If your output is named, you can use a dictionary mapping the names to the corresponding losses:
x = Input((10,))
out1 = Dense(10, activation='softmax', name='segmentation')(x)
out2 = Dense(10, name='depth')(x)
model = Model(x, [out1, out2])
model.compile(loss={'segmentation': 'categorical_crossentropy', 'depth': 'mse'},
optimizer='adam')
Otherwise, use a list of losses (in the same order as the corresponding model outputs).
x = Input((10,))
out1 = Dense(10, activation='softmax')(x)
out2 = Dense(10)(x)
model = Model(x, [out1, out2])
model.compile(loss=['categorical_crossentropy', 'mse'], optimizer='adam')
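For completeness, a hedged sketch of how targets could then be passed to fit (dummy arrays sized for the toy model above; with named outputs you could pass a dict instead):
import numpy as np

X = np.random.rand(32, 10)
y_seg = np.random.rand(32, 10)    # placeholder segmentation targets
y_depth = np.random.rand(32, 10)  # placeholder depth targets

# List form matches the order of the model outputs; the named model
# would also accept {'segmentation': y_seg, 'depth': y_depth}.
model.fit(X, [y_seg, y_depth], epochs=1)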
It's commonplace for various neural network architectures in NLP and vision-language problems to tie the weights of an initial word embedding layer to those of the output softmax layer. Usually this produces a boost to sentence generation quality. (See an example here.)
In Keras it's typical to implement word embedding layers using the Embedding class; however, there seems to be no easy way to tie the weights of this layer to the output softmax. Would anyone happen to know how this could be implemented?
Be aware that Press and Wolf don't propose to freeze the weights to some pretrained ones, but to tie them. That means ensuring that the input and output weights are always the same during training (i.e. synchronized).
In a typical NLP model (e.g. language modelling/translation), you have an input dimension (vocabulary) of size V and a hidden representation size H. Then, you start with an Embedding layer, which is a V x H matrix, and the output layer is (probably) something like Dense(V, activation='softmax'), which is an H2 x V matrix. When tying the weights, we want those matrices to be the same (therefore, H == H2).
For doing this in Keras, I think the way to go is via shared layers:
In your model, you need to instantiate a shared embedding layer (of dimension V x H) and apply it to both your input and output. But you need to transpose it to get the desired output dimensions (H x V). So, we declare a TiedEmbeddingsTransposed layer, which transposes the embedding matrix of a given layer (and optionally applies an activation function):
from keras import activations
from keras import backend as K
from keras.layers import Layer

class TiedEmbeddingsTransposed(Layer):
    """Layer for tying embeddings in an output layer.

    A regular embedding layer has the shape V x H (V: size of the vocabulary,
    H: size of the projected space). In this layer, we go H x V, with the same
    weights as the regular embedding. In addition, it may have an activation.

    # References
        - [Using the Output Embedding to Improve Language Models](https://arxiv.org/abs/1608.05859)
    """

    def __init__(self, tied_to=None, activation=None, **kwargs):
        super(TiedEmbeddingsTransposed, self).__init__(**kwargs)
        self.tied_to = tied_to
        self.activation = activations.get(activation)

    def build(self, input_shape):
        # Reuse a transposed view of the tied layer's weight matrix.
        self.transposed_weights = K.transpose(self.tied_to.weights[0])
        self.built = True

    def compute_mask(self, inputs, mask=None):
        return mask

    def compute_output_shape(self, input_shape):
        return input_shape[0], K.int_shape(self.tied_to.weights[0])[0]

    def call(self, inputs, mask=None):
        output = K.dot(inputs, self.transposed_weights)
        if self.activation is not None:
            output = self.activation(output)
        return output

    def get_config(self):
        config = {'activation': activations.serialize(self.activation)}
        base_config = super(TiedEmbeddingsTransposed, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
The usage of this layer is:
# Declare the shared embedding layer
shared_embedding_layer = Embedding(V, H)
# Obtain word embeddings
word_embedding = shared_embedding_layer(input)
# Do stuff with your model
# Compute output (e.g. a vocabulary-size probability vector) with the shared layer:
output = TimeDistributed(TiedEmbeddingsTransposed(tied_to=shared_embedding_layer, activation='softmax'))(intermediate_rep)
I have tested this in NMT-Keras and it trains properly. But when I try to load a trained model, I get an error related to the way Keras loads models: it doesn't load the weights from the tied_to layer. I've found several questions regarding this (1, 2, 3), but I haven't managed to solve the issue. If someone has any ideas on the next steps to take, I'd be very glad to hear them :)
As you may read here, you should simply set the trainable flag to False. E.g.
aux_output = Embedding(..., trainable=False)(input)
....
output = Dense(nb_of_classes, ..., activation='softmax', trainable=False)