load_weights failing: the order of weight values changed in Keras - machine-learning

This is my network. I loaded pretrained weights and then fine-tuned the network (block5 and the fc layers trainable); the architecture remained the same throughout. But when I try to load the weights saved after fine-tuning, the order of the weight values in the file has changed, so load_weights fails.
from keras.layers import Input, Flatten, Dense, Dropout
from keras.models import Model
from keras.applications.vgg16 import VGG16

# img_width, img_height and nb_classes are defined elsewhere in the script
# VGG16 convolutional base followed by a small classifier head
input_layer = Input(shape=(img_width, img_height, 3), name='image_input')
model_vgg16_conv = VGG16(weights='imagenet',
                         include_top=False, input_shape=(200, 200, 3))
output_vgg16_conv = model_vgg16_conv(input_layer)
model_vgg16_conv.summary()

fl = Flatten(name='flatten')(output_vgg16_conv)
dense = Dense(512, activation='relu', name='fc1')(fl)
drop = Dropout(0.5, name='drop')(dense)
pred = Dense(nb_classes, activation='softmax', name='predictions')(drop)
fine_model = Model(inputs=input_layer, outputs=pred)
The saved weight file before fine-tuning:
<HDF5 group "/image_input" (0 members)> []
<HDF5 group "/vgg16" (13 members)> [<HDF5 dataset "kernel:0": shape (3, 3, 3, 64), type "<f4">, <HDF5 dataset "bias:0": shape (64,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 64, 64), type "<f4">, <HDF5 dataset "bias:0": shape (64,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 64, 128), type "<f4">, <HDF5 dataset "bias:0": shape (128,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 128, 128), type "<f4">, <HDF5 dataset "bias:0": shape (128,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 128, 256), type "<f4">, <HDF5 dataset "bias:0": shape (256,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 256, 256), type "<f4">, <HDF5 dataset "bias:0": shape (256,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 256, 256), type "<f4">, <HDF5 dataset "bias:0": shape (256,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 256, 512), type "<f4">, <HDF5 dataset "bias:0": shape (512,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 512, 512), type "<f4">, <HDF5 dataset "bias:0": shape (512,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 512, 512), type "<f4">, <HDF5 dataset "bias:0": shape (512,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 512, 512), type "<f4">, <HDF5 dataset "bias:0": shape (512,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 512, 512), type "<f4">, <HDF5 dataset "bias:0": shape (512,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 512, 512), type "<f4">, <HDF5 dataset "bias:0": shape (512,), type "<f4">]
<HDF5 group "/flatten" (0 members)> []
<HDF5 group "/fc1" (1 members)> [<HDF5 dataset "kernel:0": shape (18432, 512), type "<f4">, <HDF5 dataset "bias:0": shape (512,), type "<f4">]
<HDF5 group "/drop" (0 members)> []
<HDF5 group "/predictions" (1 members)> [<HDF5 dataset "kernel:0": shape (512, 40), type "<f4">, <HDF5 dataset "bias:0": shape (40,), type "<f4">]
After fine-tuning, the weights won't load, hence the error:
<HDF5 group "/image_input" (0 members)> []
<HDF5 group "/vgg16" (13 members)> [<HDF5 dataset "kernel:0": shape (3, 3, 512, 512), type "<f4">, <HDF5 dataset "bias:0": shape (512,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 512, 512), type "<f4">, <HDF5 dataset "bias:0": shape (512,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 512, 512), type "<f4">, <HDF5 dataset "bias:0": shape (512,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 3, 64), type "<f4">, <HDF5 dataset "bias:0": shape (64,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 64, 64), type "<f4">, <HDF5 dataset "bias:0": shape (64,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 64, 128), type "<f4">, <HDF5 dataset "bias:0": shape (128,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 128, 128), type "<f4">, <HDF5 dataset "bias:0": shape (128,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 128, 256), type "<f4">, <HDF5 dataset "bias:0": shape (256,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 256, 256), type "<f4">, <HDF5 dataset "bias:0": shape (256,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 256, 256), type "<f4">, <HDF5 dataset "bias:0": shape (256,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 256, 512), type "<f4">, <HDF5 dataset "bias:0": shape (512,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 512, 512), type "<f4">, <HDF5 dataset "bias:0": shape (512,), type "<f4">, <HDF5 dataset "kernel:0": shape (3, 3, 512, 512), type "<f4">, <HDF5 dataset "bias:0": shape (512,), type "<f4">]
<HDF5 group "/flatten" (0 members)> []
<HDF5 group "/fc1" (1 members)> [<HDF5 dataset "kernel:0": shape (18432, 512), type "<f4">, <HDF5 dataset "bias:0": shape (512,), type "<f4">]
<HDF5 group "/drop" (0 members)> []
<HDF5 group "/predictions" (1 members)> [<HDF5 dataset "kernel:0": shape (512, 40), type "<f4">, <HDF5 dataset "bias:0": shape (40,), type "<f4">]
Traceback (most recent call last):
  File "construct_index.py", line 87, in <module>
    fine_model.load_weights(filepath)
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 2538, in load_weights
    load_weights_from_hdf5_group(f, self.layers)
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 2970, in load_weights_from_hdf5_group
    K.batch_set_value(weight_value_tuples)
  File "/usr/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2153, in batch_set_value
    get_session().run(assign_ops, feed_dict=feed_dict)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 961, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (3, 3, 512, 512) for Tensor u'Placeholder:0', which has shape '(3, 3, 3, 64)'
For some reason the order has changed!
Please help; it took many days to train this network and I can't afford to lose these weights. Thanks.
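The dump above shows the block5 kernels (the 3 x 3 x 512 x 512 ones) listed first after fine-tuning, which is consistent with Keras ordering trainable weights before non-trainable ones inside the nested vgg16 layer. A minimal workaround sketch under that assumption (not a verified fix for this exact setup):
# Hedged workaround sketch: reproduce the trainable/frozen configuration used
# at save time (block5 and the fc layers trainable) before calling load_weights,
# so the in-memory weight order matches the order stored in the HDF5 file.
for layer in model_vgg16_conv.layers:
    layer.trainable = layer.name.startswith('block5')
fine_model.load_weights(filepath)

# Alternatively, loading by layer name sidesteps top-level ordering differences,
# assuming layer names are identical between the file and the rebuilt model:
# fine_model.load_weights(filepath, by_name=True)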

Related

Image segmentation and area measurement

I have done image segmentation using PyTorch. I am trying to get the pixel count of the Boat class to measure its area. As an example, in the image I want to get the pixel count of the boat. How do I do that? From the pixel count, is it possible to measure the area of the boat?
I am confused and trying to find a way. I would appreciate it if anybody could guide me.
The code is below:
from torchvision import models
from PIL import Image
import matplotlib.pyplot as plt
import torch
import torchvision.transforms as T
import numpy as np

fcn = models.segmentation.fcn_resnet101(pretrained=True).eval()

img = Image.open('boat.jpg')
plt.imshow(img)
plt.show()

# Apply the transformations needed:
# resize the image to 256 x 256, center-crop it to 224 x 224, then normalize
trf = T.Compose([T.Resize(256),
                 T.CenterCrop(224),
                 T.ToTensor(),
                 T.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])])
inp = trf(img).unsqueeze(0)
out = fcn(inp)['out']
print(out.shape)

# Collapse the 21-channel output into a single-channel 2D image, where each
# pixel holds the index of the predicted class
om = torch.argmax(out.squeeze(), dim=0).detach().cpu().numpy()
print(om.shape)
print(np.unique(om))

# Define the helper function
def decode_segmap(image, nc=21):
    label_colors = np.array([(0, 0, 0),  # 0=background
                             # 1=aeroplane, 2=bicycle, 3=bird, 4=boat, 5=bottle
                             (128, 0, 0), (0, 128, 0), (128, 128, 0), (0, 0, 128), (128, 0, 128),
                             # 6=bus, 7=car, 8=cat, 9=chair, 10=cow
                             (0, 128, 128), (128, 128, 128), (64, 0, 0), (192, 0, 0), (64, 128, 0),
                             # 11=dining table, 12=dog, 13=horse, 14=motorbike, 15=person
                             (192, 128, 0), (64, 0, 128), (192, 0, 128), (64, 128, 128), (192, 128, 128),
                             # 16=potted plant, 17=sheep, 18=sofa, 19=train, 20=tv/monitor
                             (0, 64, 0), (128, 64, 0), (0, 192, 0), (128, 192, 0), (0, 64, 128)])
    r = np.zeros_like(image).astype(np.uint8)
    g = np.zeros_like(image).astype(np.uint8)
    b = np.zeros_like(image).astype(np.uint8)
    for l in range(0, nc):
        idx = image == l
        r[idx] = label_colors[l, 0]
        g[idx] = label_colors[l, 1]
        b[idx] = label_colors[l, 2]
    rgb = np.stack([r, g, b], axis=2)
    return rgb

rgb = decode_segmap(om)
plt.imshow(rgb); plt.show()
I am looking for some guidance.
You are looking for skimage.measure.regionprops. Once you have the predicted label map (om in your code) you can apply regionprops to it and get the area of each region.
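A minimal sketch of that regionprops idea, assuming om is the predicted label map from the question and 4 is the PASCAL VOC index for the boat class:
# Label the connected components of the boat mask, then read the area (in
# pixels) of each region from regionprops.
from skimage import measure

boat_mask = (om == 4)                          # binary mask of boat pixels
labeled = measure.label(boat_mask)             # connected components
regions = measure.regionprops(labeled)
print([region.area for region in regions])     # pixel area of each boat blob
print(sum(region.area for region in regions))  # total boat area in pixels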
According to your code snippet, the output om is a map of category indices (0 = background, 1 = aeroplane, 2 = bicycle, ...).
To get the area of a specific category, you just need to compare the output map with the corresponding index, then sum up the result.
For example, for the boat category with index 4:
BOAT_INDEX = 4
area = torch.sum(om == BOAT_INDEX).item()
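Note that this count is in pixels of the 224 x 224 network input; turning it into a physical area needs a known scale, which is not given in the question. Purely for illustration, with a hypothetical metres-per-pixel value:
# 'meters_per_pixel' is a hypothetical value you would have to know for your
# image (and it changes after the Resize/CenterCrop transforms above).
meters_per_pixel = 0.05
area_m2 = area * meters_per_pixel ** 2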

What should be the input shape for 2D-CNN in this scenario?

The shape of my dataset is:
shape of original dataset: (343889, 80)
shape of training dataset: (257916, 79)
shape of training labels: (257916,)
shape of testing dataset: (85973, 79)
shape of testing labels: (85973,)
I was using (79, 1) as the input shape for the 1D-CNN, but it isn't working with a 2D-CNN.
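There is no answer in this thread; purely as an illustration (the 8 x 10 grid and the zero padding are my own assumptions, since 79 tabular features have no natural 2D layout), one way to feed such data to a 2D-CNN is to pad and reshape each sample into an image-like tensor:
# Illustrative sketch only: pad each 79-feature row with one zero so the 80
# values fit an arbitrary 8 x 10 grid, then add a channel dimension.
import numpy as np

X_train = np.zeros((257916, 79), dtype='float32')             # stand-in for the real training data
X_train = np.pad(X_train, ((0, 0), (0, 1)), mode='constant')  # (257916, 80)
X_train = X_train.reshape(-1, 8, 10, 1)                       # (257916, 8, 10, 1)
# A Conv2D layer would then use input_shape=(8, 10, 1)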

How to fix a noisy jagged Segmentation using U-NET Architecture

I'm trying to segment satellite images to identify built and non-built areas using the U-Net architecture. The model architecture is as follows:
def get_unet(input_img, n_filters=16, dropout=0.1, batchnorm=True):
    """Function to define the U-Net model."""
    # conv2d_block is a helper defined elsewhere in the script

    # Contracting path
    c1 = conv2d_block(input_img, n_filters * 1, kernel_size=3, batchnorm=batchnorm)
    p1 = MaxPooling2D((2, 2))(c1)
    p1 = Dropout(dropout)(p1)

    c2 = conv2d_block(p1, n_filters * 2, kernel_size=3, batchnorm=batchnorm)
    p2 = MaxPooling2D((2, 2))(c2)
    p2 = Dropout(dropout)(p2)

    c3 = conv2d_block(p2, n_filters * 4, kernel_size=3, batchnorm=batchnorm)
    p3 = MaxPooling2D((2, 2))(c3)
    p3 = Dropout(dropout)(p3)

    c4 = conv2d_block(p3, n_filters * 8, kernel_size=3, batchnorm=batchnorm)
    p4 = MaxPooling2D((2, 2))(c4)
    p4 = Dropout(dropout)(p4)

    c5 = conv2d_block(p4, n_filters * 16, kernel_size=3, batchnorm=batchnorm)

    # Expansive path
    u6 = Conv2DTranspose(n_filters * 8, (3, 3), strides=(2, 2), padding='same')(c5)
    u6 = concatenate([u6, c4])
    u6 = Dropout(dropout)(u6)
    c6 = conv2d_block(u6, n_filters * 8, kernel_size=3, batchnorm=batchnorm)

    u7 = Conv2DTranspose(n_filters * 4, (3, 3), strides=(2, 2), padding='same')(c6)
    u7 = concatenate([u7, c3])
    u7 = Dropout(dropout)(u7)
    c7 = conv2d_block(u7, n_filters * 4, kernel_size=3, batchnorm=batchnorm)

    u8 = Conv2DTranspose(n_filters * 2, (3, 3), strides=(2, 2), padding='same')(c7)
    u8 = concatenate([u8, c2])
    u8 = Dropout(dropout)(u8)
    c8 = conv2d_block(u8, n_filters * 2, kernel_size=3, batchnorm=batchnorm)

    u9 = Conv2DTranspose(n_filters * 1, (3, 3), strides=(2, 2), padding='same')(c8)
    u9 = concatenate([u9, c1])
    u9 = Dropout(dropout)(u9)
    c9 = conv2d_block(u9, n_filters * 1, kernel_size=3, batchnorm=batchnorm)

    outputs = Conv2D(1, (1, 1), activation='sigmoid')(c9)
    model = Model(inputs=[input_img], outputs=[outputs])
    return model
However, my segmentation masks are noisy and pixelated, like this:
U-Net mask for an airport
Also, when I tested my model on a sample image with solid colours (made in Photoshop), it returned results like this:
Tested with a solid colour using Photoshop
The U-Net masks are jagged rather than smooth. Is there any fix for this?
Thanks in advance.
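There is no accepted fix in this thread; one generic post-processing idea (my own assumption, not from the original post) is to clean up the thresholded mask with morphological opening and closing, which removes small speckles and fills small holes:
# Hypothetical post-processing sketch: 'pred' is assumed to be the sigmoid
# output of model.predict for one image; the 5 x 5 kernel is an arbitrary choice.
import cv2
import numpy as np

mask = (pred[0, :, :, 0] > 0.5).astype(np.uint8)
kernel = np.ones((5, 5), np.uint8)
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # remove small speckles
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)   # fill small holes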

How labelling works in image segmentation [SegNet]

I am trying to understand image segmentation using a SegNet implementation in Keras. I have read the original paper using the conv and deconv architecture, and also the version using dilated conv layers. However, I have trouble understanding how the labelling of the pixels works.
I am considering the following implementation:
https://github.com/nicolov/segmentation_keras
Here the pascal dataset attributes are used:
21 Classes:
# 0=background
# 1=aeroplane, 2=bicycle, 3=bird, 4=boat, 5=bottle
# 6=bus, 7=car, 8=cat, 9=chair, 10=cow
# 11=diningtable, 12=dog, 13=horse, 14=motorbike, 15=person
# 16=potted plant, 17=sheep, 18=sofa, 19=train, 20=tv/monitor
The classes are represented by:
pascal_nclasses = 21
pascal_palette = np.array([(0, 0, 0)
, (128, 0, 0), (0, 128, 0), (128, 128, 0), (0, 0, 128), (128, 0, 128)
, (0, 128, 128), (128, 128, 128), (64, 0, 0), (192, 0, 0), (64, 128, 0)
, (192, 128, 0), (64, 0, 128), (192, 0, 128), (64, 128, 128), (192, 128, 128)
, (0, 64, 0), (128, 64, 0), (0, 192, 0), (128, 192, 0), (0, 64, 128)], dtype=np.uint8)
I was trying to open the labelled images for cat and boat, since cat is only in the R space and boat only in the blue. I used the following to show the labelled images:
For boat:
label = cv2.imread("2008_000120.png")
label = np.multiply(label, 100)
cv2.imshow("kk", label[:,:,2])
cv2.waitKey(0)
For cat:
label = cv2.imread("2008_000056.png")
label = np.multiply(label, 100)
cv2.imshow("kk", label[:,:,0])
cv2.waitKey(0)
However, it doesn't matter which space I choose; both images always give the same result, i.e. the following code also gives the same result:
For boat:
label = cv2.imread("2008_000120.png")
label = np.multiply(label, 100)
cv2.imshow("kk", label[:,:,1]) # changed to Green space
cv2.waitKey(0)
For cat:
label = cv2.imread("2008_000056.png")
label = np.multiply(label, 100)
cv2.imshow("kk", label[:,:,1]) # changed to Green space
cv2.waitKey(0)
My assumption was that I would see the cat only in the red colour space and the boat only in the blue. However, the output is the same in all cases.
I am now confused about how these pixels are labelled, and how they are read and uniquely paired with categories when creating the logits.
It would be great if someone could explain this or point to some relevant links. I tried to search, but most tutorials only discuss the CNN architecture, not the labelling process or how the labels are used within the CNN.
I have attached the labelled images of cat and boat for reference.
The labels are just single-channel image masks. The pixel value at each location of the label image depends on the class present at that pixel: it is 0 where there is no object and a value from 1 to 20 for the corresponding class otherwise.
Semantic segmentation is a classification task, so you are trying to classify each pixel with a class (in this case, class labels 0-20).
Your model produces an output image, and you perform softmax cross-entropy between each output-image pixel and each label-image pixel.
In the multiclass case with K classes (here K = 21), each output pixel has K channels and you perform softmax cross-entropy across the channels at each pixel. Why a channel for each class? Think of classification: we produce a vector of length K for K classes and compare it to a one-hot vector of length K.
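If the label files are the standard PASCAL VOC palette PNGs (an assumption on my part), this also explains the observation in the question: cv2.imread applies the colour palette and returns a 3-channel BGR image, while reading the file with PIL keeps it as a single channel of class indices, matching the description above. A small sketch:
# Read a VOC-style label as palette indices rather than colours; each pixel is
# then the class id directly (0=background, 4=boat, 8=cat, 255=void border).
from PIL import Image
import numpy as np

label = np.array(Image.open("2008_000120.png"))   # 2D array of class indices
print(label.shape, np.unique(label))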

Can you only have stride size 1 with resize convolutions?

I read this article about using "resize convolutions" rather than the "deconvolution" (i.e. transposed convolution) method for generating images with neural networks. It's clear how this works with a stride size of 1, but how would you implement it for a stride size >1?
Here is how I've implemented this in TensorFlow. Note: This is the second "deconvolution" layer in the decoder part of an autoencoder network.
h_d_upsample2 = tf.image.resize_images(images=h_d_conv3,
                                        size=(int(self.c2_size), int(self.c2_size)),
                                        method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
h_d_conv2 = tf.layers.conv2d(inputs=h_d_upsample2,
                             filters=FLAGS.C2,
                             kernel_size=(FLAGS.c2_kernel, FLAGS.c2_kernel),
                             padding='same',
                             activation=tf.nn.relu)
Resizing images is really not a viable option for intermediate layers of the network. You may try conv2d_transpose.
how would you implement it for a stride size >1?
# Best practice is to use the conv2d_transpose function; it works with stride > 1.
# output_shape_width_height = stride * input_shape_width_height
# e.g. input_shape = [32, 32, 48], output_shape = [64, 64, 128]
# (input, input_shape, output_shape and trainable are assumed to be defined by the caller)
stride = 2
filter_size_w = filter_size_h = 2
shape = [filter_size_w, filter_size_h, output_shape[-1], input_shape[-1]]
w = tf.get_variable(
    name='W',
    shape=shape,
    initializer=tf.contrib.layers.variance_scaling_initializer(),
    trainable=trainable)
output = tf.nn.conv2d_transpose(
    input, w, output_shape=output_shape, strides=[1, stride, stride, 1])
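To stay with resize convolutions as in the original question, one common reading (offered as an assumption, not a definitive answer) is that a transposed convolution with stride s is replaced by resizing the feature map by a factor of s and then applying a stride-1 convolution, so the "stride" shows up as the resize factor:
# Sketch of a resize convolution playing the role of a stride-2 transposed
# convolution: upsample spatially by the desired factor, then convolve with
# stride 1. Assumes static spatial dimensions on the input tensor.
import tensorflow as tf

def resize_conv(x, filters, kernel_size, factor=2):
    height, width = x.get_shape().as_list()[1:3]
    x = tf.image.resize_images(x, size=(height * factor, width * factor),
                               method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    return tf.layers.conv2d(x, filters=filters, kernel_size=kernel_size,
                            strides=1, padding='same', activation=tf.nn.relu)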
