How to use masking layer to mask input/output in LSTM autoencoders? - machine-learning

I am trying to use LSTM autoencoder to do sequence-to-sequence learning with variable lengths of sequences as inputs, using following code:
inputs = Input(shape=(None, input_dim))
masked_input = Masking(mask_value=0.0, input_shape=(None,input_dim))(inputs)
encoded = LSTM(latent_dim)(masked_input)
decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)
sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)
where inputs are raw sequence data padded with 0s to the same length (timesteps). Using the code above, the output is also of length timesteps, but when we calculate loss function we only want first Ni elements of the output (where Ni is length of input sequence i, which may be different for different sequences). Does anyone know if there is some good way to do that?
Thanks!

Option 1: you can always train without padding if you accept to train separate batches.
See this answer to a simple way of separating batches of equal length: Keras misinterprets training data shape
In this case, all you have to do is to perform the "repeat" operation in another manner, since you don't have the exact length at training time.
So, instead of RepeatVector, you can use this:
import keras.backend as K
def repeatFunction(x):
#x[0] is (batch,latent_dim)
#x[1] is inputs: (batch,length,features)
latent = K.expand_dims(x[0],axis=1) #shape(batch,1,latent_dim)
inpShapeMaker = K.ones_like(x[1][:,:,:1]) #shape (batch,length,1)
return latent * inpShapeMaker
#instead of RepeatVector:
Lambda(repeatFunction,output_shape=(None,latent_dim))([encoded,inputs])
Option2 (doesn't smell good): use another masking after RepeatVector.
I tried this, and it works, but we don't get 0's at the end, we get the last value repeated until the end. So, you will have to make a weird padding in your target data, repeating the last step until the end.
Example: target [[[1,2],[5,7]]] will have to be [[[1,2],[5,7],[5,7],[5,7]...]]
This may unbalance your data a lot, I think....
def makePadding(x):
#x[0] is encoded already repeated
#x[1] is inputs
#padding = 1 for actual data in inputs, 0 for 0
padding = K.cast( K.not_equal(x[1][:,:,:1],0), dtype=K.floatx())
#assuming you don't have 0 for non-padded data
#padding repeated for latent_dim
padding = K.repeat_elements(padding,rep=latent_dim,axis=-1)
return x[0]*padding
inputs = Input(shape=(timesteps, input_dim))
masked_input = Masking(mask_value=0.0)(inputs)
encoded = LSTM(latent_dim)(masked_input)
decoded = RepeatVector(timesteps)(encoded)
decoded = Lambda(makePadding,output_shape=(timesteps,latent_dim))([decoded,inputs])
decoded = Masking(mask_value=0.0)(decoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)
sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)
Option 3 (best): crop the outputs directly from the inputs, this also eliminates the gradients
def cropOutputs(x):
#x[0] is decoded at the end
#x[1] is inputs
#both have the same shape
#padding = 1 for actual data in inputs, 0 for 0
padding = K.cast( K.not_equal(x[1],0), dtype=K.floatx())
#if you have zeros for non-padded data, they will lose their backpropagation
return x[0]*padding
....
....
decoded = LSTM(input_dim, return_sequences=True)(decoded)
decoded = Lambda(cropOutputs,output_shape=(timesteps,input_dim))([decoded,inputs])

For this LSTM Autoencoder architecture, which I assume you understand, the Mask is lost at the RepeatVector due to the LSTM encoder layer having return_sequences=False.
So another option, instead of cropping like above, could also be to create custom bottleneck layer that propagates the mask.

Related

Remove digit from MNIST, PyTorch

I'm experimenting with rotating the MNIST digits. Because a 9 is more or less a rotated 6, I'd like to remove all occurrences from the dataset.
As per this answer, I tried
dataset = datasets.MNIST(root='./data')
idx = dataset.train_labels!=9
dataset.train_labels = dataset.train_labels[idx]
dataset.train_data = dataset.train_data[idx]
which fails because the properties of the MNIST class are only readable.
I'd really like to not have to manually iterate through the entire dataset, and create a list of tuples that I feed to my dataloaders. Am I out of luck?
You might proceed as follows, namely by replacing train_labels with targets and train_data with data:
from torchvision import datasets
dataset = datasets.MNIST(root='data')
idx = dataset.targets!=9
dataset.targets = dataset.targets[idx]
dataset.data = dataset.data[idx]
Indeed, as you can see at https://pytorch.org/vision/stable/_modules/torchvision/datasets/mnist.html#MNIST, train_labels and train_data have eventually been marked as properties and as such they can't be set to some values, while targets and data have been probably added as public attributes in the meanwhile.
#property
def train_labels(self):
warnings.warn("train_labels has been renamed targets")
return self.targets
#property
def train_data(self):
warnings.warn("train_data has been renamed data")
return self.data

How to use tf.image.resize_with_pad but pad with ones instead of zeros?

According to Tensorflow documentation, the padding is always with zeros instead of ones.
Is there there a way to change the padding to ones?
If not, what is the best alternative for a tensorflow dataset?
Here is my code example:
def resize_with_pad(image, label):
image = tf.image.resize_with_pad(image=image,
target_height=resized_wh,
target_width=resized_wh,
method=ResizeMethod.BILINEAR,
antialias=False)
return image, label
def create_tf_dataset_pipeline(tf_dataset):
tf_dataset = tf_dataset.map(load_image, num_parallel_calls=AUTOTUNE)
tf_dataset = tf_dataset.map(normalize, num_parallel_calls=AUTOTUNE)
tf_dataset = tf_dataset.map(resize_with_pad, num_parallel_calls=AUTOTUNE)
tf_dataset = tf_dataset.batch(batch_size)
tf_dataset = tf_dataset.prefetch(AUTOTUNE)
return tf_dataset
train_data = tf.data.Dataset.from_tensor_slices((x_train_filepaths, y_train_class))
train_data = create_tf_dataset_pipeline(train_data)
I tried resizing and padding the images and saving it in a directory (i.e. frontloading the processing), but that is very inflexible as I need to create a new dataset every time I want to train a model on a different size. It would be much better if I could do it dynamically with tensor flow.

How to save UNET predicted mask as an image to the disk in Pytorch?

AS a part of my Master's thesis, I have trained a UNET using Pytorch for detecting some objects in X-ray images. And to generate the predications, I have implemented the following function:
def make_predictions(model, imagePath):
# set model to evaluation mode
model.eval()
# turn off gradient tracking
with torch.no_grad():
# load the image from disk, expand its dimensions, cast it
# to float data type, and scale its pixel values
image = cv2.imread(imagePath, 0)
image = np.expand_dims(image, 0)
image = np.expand_dims(image, 0)
image = image.astype("float32") / 255.0
# find the filename and generate the path to ground truth mask
filename = imagePath.split(os.path.sep)[-1]
groundTruthPath = os.path.join(Config.Mask_dataset_dir, filename)
# load the ground-truth segmentation mask in grayscale mode and resize it
gtMask = cv2.imread(groundTruthPath, 0)
gtMask = cv2.resize(gtMask, (Config.Input_Height, Config.Input_Height))
# create a PyTorch tensor, and flash it to the current device
image = torch.from_numpy(image).to(Config.DEVICE)
# make the prediction, pass the results through the sigmoid
# function, and convert the result to a NumPy array
predMask = model(image)
predMask = torch.sigmoid(predMask)
predMask = predMask.cpu().numpy()
# filter out the weak predictions and convert them to integers
predMask = (predMask > Config.Thresh) * 255
predMask = predMask.astype(np.uint8)
filename = imagePath.split(os.path.sep)[-1]
cv2.imwrite(Config.Base_Out+'\\'+filename, predMask)
return(gtMask, predMask)
This function runs well for making the predictions and even plotting them. but the function cv2.imwrite() doesn't save the predictions as images in the passed directory, noting that filename already has the .PNG extension at the end. What could be the problem here?

Denoising Autoencoder for Images with large shape

I want to create a denoising autoencoder for images of any shape. Most of the solutions out there have image shape not greater than (500,500) while the images I have are document scans of shape (3000,2000). I tried to reshape the images and build the model, but the predictions are incorrect. Could someone help me?
I have tried to build model with the code here https://github.com/mrdragonbear/Autoencoders/blob/master/Autoencoder-Tutorial.ipynb, playing around the image shape but the predictions fails.
I have a document denoiser already.
There is no need to have a model for a large shape, you can simply split them, feed them to the model, and then merge the predicted chunks together again.
My model accepts images of shape 512x512, so I have to split the images by 512x512 chunks.
The images must be larger than or equal to 512x512.
If the image is smaller then all you need is to resize it or fit it in a 512x512 shape.
def split_page(page):
chunk_size = (512, 512)
main_size = page.shape[:2]
chunks=[]
chunk_grid = tuple(np.array(main_size)//np.array(chunk_size))
extra_chunk = tuple(np.array(main_size)%np.array(chunk_size))
for yc in range(chunk_grid[0]):
row = []
for xc in range(chunk_grid[1]):
chunk = page[yc*chunk_size[0]:yc*chunk_size[0]+chunk_size[0], xc*chunk_size[1]: xc*chunk_size[1]+chunk_size[1]]
row.append(chunk)
if extra_chunk[1]:
chunk = page[yc*chunk_size[0]:yc*chunk_size[0]+chunk_size[0], page.shape[1]-chunk_size[1]:page.shape[1]]
row.append(chunk)
chunks.append(row)
if extra_chunk[0]:
row = []
for xc in range(chunk_grid[1]):
chunk = page[page.shape[0]-chunk_size[0]:page.shape[0], xc*chunk_size[1]: xc*chunk_size[1]+chunk_size[1]]
row.append(chunk)
if extra_chunk[1]:
chunk = page[page.shape[0]-chunk_size[0]:page.shape[0], page.shape[1]-chunk_size[1]:page.shape[1]]
row.append(chunk)
chunks.append(row)
return chunks, page.shape[:2]
def merge_chunks(chunks, osize):
extra = np.array(osize)%512
page = np.ones(osize)
for i, row in enumerate(chunks[:-1]):
for j, chunk in enumerate(row[:-1]):
page[i*512:i*512+512,j*512:j*512+512]=chunk
page[i*512:i*512+512,osize[1]-512:osize[1]]=chunks[i,-1]
if extra[0]:
for j, chunk in enumerate(chunks[-1][:-1]):
page[osize[0]-512:osize[0],j*512:j*512+512]=chunk
page[osize[0]-512:osize[0],osize[1]-512:osize[1]]=chunks[-1,-1]
else:
for j, chunk in enumerate(chunks[-1][:-1]):
page[osize[0]-512:osize[0],j*512:j*512+512]=chunk
page[osize[0]-512:osize[0],osize[1]-512:osize[1]]=chunks[-1,-1]
return page
def denoise(chunk):
chunk = chunk.reshape(1,512,512,1)/255.
denoised = model.predict(chunk).reshape(512,512)*255.
return denoised
def denoise_page(page):
chunks, osize= split_page(page)
chunks = np.array(chunks)
denoised_chunks = np.ones(chunks.shape)
for i, row in enumerate(chunks):
for j, chunk in enumerate(row):
denoised = denoise(chunk)
denoised_chunks[i][j]=denoised
denoised_page = merge_chunks(denoised_chunks, osize)
return denoised_page

How to split a model trained in keras?

I trained a model with 4 hidden layers and 2 dense layers and I have saved that model.
Now I want to load that model and want to split into two models, one with one hidden layers and another one with only dense layers.
I have splitted the model with hidden layer in the following way
model = load_model ("model.hdf5")
HL_model = Model(inputs=model.input, outputs=model.layers[7].output)
Here the model is loaded model, in the that 7th layer is my last hidden layer. I tried to split the dense in the like
DL_model = Model(inputs=model.layers[8].input, outputs=model.layers[-1].output)
and I am getting error
TypeError: Input layers to a `Model` must be `InputLayer` objects.
After splitting, the output of the HL_model will the input for the DL_model.
Can anyone help me to create a model with dense layer?
PS :
I have tried below code too
from keras.layers import Input
inputs = Input(shape=(9, 9, 32), tensor=model_1.layers[8].input)
model_3 = Model(inputs=inputs, outputs=model_1.layers[-1].output)
And getting error as
RuntimeError: Graph disconnected: cannot obtain value for tensor Tensor("conv2d_1_input:0", shape=(?, 144, 144, 3), dtype=float32) at layer "conv2d_1_input". The following previous layers were accessed without issue: []
here (144, 144, 3) in the input image size of the model.
You need to specify a new Input layer first, then stack the remaining layers over it:
DL_input = Input(model.layers[8].input_shape[1:])
DL_model = DL_input
for layer in model.layers[8:]:
DL_model = layer(DL_model)
DL_model = Model(inputs=DL_input, outputs=DL_model)
A little more generic. You can use the following function to split a model
from keras.layers import Input
from keras.models import Model
def get_bottom_top_model(model, layer_name):
layer = model.get_layer(layer_name)
bottom_input = Input(model.input_shape[1:])
bottom_output = bottom_input
top_input = Input(layer.output_shape[1:])
top_output = top_input
bottom = True
for layer in model.layers:
if bottom:
bottom_output = layer(bottom_output)
else:
top_output = layer(top_output)
if layer.name == layer_name:
bottom = False
bottom_model = Model(bottom_input, bottom_output)
top_model = Model(top_input, top_output)
return bottom_model, top_model
bottom_model, top_model = get_bottom_top_model(model, "dense_1")
Layer_name is just the name of the layer that you want to split at.

Resources