How to use tf.image.resize_with_pad but pad with ones instead of zeros? - image-processing

According to the TensorFlow documentation, tf.image.resize_with_pad always pads with zeros.
Is there a way to make it pad with ones instead?
If not, what is the best alternative for a tensorflow dataset?
Here is my code example:
def resize_with_pad(image, label):
    image = tf.image.resize_with_pad(image=image,
                                     target_height=resized_wh,
                                     target_width=resized_wh,
                                     method=tf.image.ResizeMethod.BILINEAR,
                                     antialias=False)
    return image, label

def create_tf_dataset_pipeline(tf_dataset):
    tf_dataset = tf_dataset.map(load_image, num_parallel_calls=AUTOTUNE)
    tf_dataset = tf_dataset.map(normalize, num_parallel_calls=AUTOTUNE)
    tf_dataset = tf_dataset.map(resize_with_pad, num_parallel_calls=AUTOTUNE)
    tf_dataset = tf_dataset.batch(batch_size)
    tf_dataset = tf_dataset.prefetch(AUTOTUNE)
    return tf_dataset

train_data = tf.data.Dataset.from_tensor_slices((x_train_filepaths, y_train_class))
train_data = create_tf_dataset_pipeline(train_data)
I tried resizing and padding the images ahead of time and saving them to a directory (i.e. frontloading the processing), but that is very inflexible: I would need to create a new dataset every time I want to train a model at a different size. It would be much better to do it dynamically in TensorFlow.
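tf.image.resize_with_pad has no parameter for the fill value. One workaround is to shift the image so that the desired fill value becomes 0, let resize_with_pad add its zero padding, and then shift back. This is a minimal sketch, assuming TensorFlow 2 and a floating-point image. resize_with_pad_value is a hypothetical helper name, not a TensorFlow API. The trick relies on interpolation being a weighted average whose weights sum to 1, which holds for bilinear and nearest-neighbour resizing. Under that condition, adding a constant commutes with the resize up to floating-point rounding.

```python
import tensorflow as tf

def resize_with_pad_value(image, target_height, target_width, pad_value=1.0,
                          method=tf.image.ResizeMethod.BILINEAR, antialias=False):
    """Hypothetical helper: like tf.image.resize_with_pad, but the border is pad_value.

    Assumes the resize is a weighted average with weights summing to 1
    (bilinear / nearest), so (resize(x - v) + v) == resize(x) inside the image,
    while the zero padding added by resize_with_pad becomes v after the shift back.
    """
    image = tf.cast(image, tf.float32)
    # shift so that pad_value maps to 0, the value resize_with_pad pads with
    shifted = tf.image.resize_with_pad(image - pad_value, target_height, target_width,
                                       method=method, antialias=antialias)
    # shift back: padded pixels become pad_value, image pixels are restored
    return shifted + pad_value
```

In the pipeline above, the body of resize_with_pad would then become image = resize_with_pad_value(image, resized_wh, resized_wh). A more explicit alternative is tf.image.resize(..., preserve_aspect_ratio=True) followed by tf.pad(..., constant_values=1.0). With that approach, you compute the padding offsets yourself.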

Related

Remove digit from MNIST, PyTorch

I'm experimenting with rotating the MNIST digits. Because a 9 is more or less a rotated 6, I'd like to remove all 9s from the dataset.
As per this answer, I tried
dataset = datasets.MNIST(root='./data')
idx = dataset.train_labels!=9
dataset.train_labels = dataset.train_labels[idx]
dataset.train_data = dataset.train_data[idx]
which fails because these properties of the MNIST class are read-only.
I'd really like to not have to manually iterate through the entire dataset, and create a list of tuples that I feed to my dataloaders. Am I out of luck?
You might proceed as follows, namely by replacing train_labels with targets and train_data with data:
from torchvision import datasets
dataset = datasets.MNIST(root='data')
idx = dataset.targets!=9
dataset.targets = dataset.targets[idx]
dataset.data = dataset.data[idx]
Indeed, as you can see at https://pytorch.org/vision/stable/_modules/torchvision/datasets/mnist.html#MNIST, train_labels and train_data are now defined as properties with only a getter (no setter), so they cannot be assigned to. The renamed targets and data are ordinary public attributes and can be reassigned:
@property
def train_labels(self):
    warnings.warn("train_labels has been renamed targets")
    return self.targets
@property
def train_data(self):
    warnings.warn("train_data has been renamed data")
    return self.data
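The difference between a getter-only property and a plain attribute can be reproduced without torchvision. MNISTLike below is a hypothetical stand-in, not torchvision's class. It only copies the pattern shown above. Reassigning targets and data works, while assigning to train_labels raises AttributeError because the property defines no setter.

```python
import warnings

class MNISTLike:
    """Hypothetical stand-in that mimics how torchvision's MNIST exposes its labels."""

    def __init__(self, data, targets):
        self.data = data        # plain attribute: can be reassigned
        self.targets = targets  # plain attribute: can be reassigned

    @property
    def train_labels(self):     # getter only; no setter is defined
        warnings.warn("train_labels has been renamed targets")
        return self.targets

ds = MNISTLike(data=["img0", "img1", "img2"], targets=[6, 9, 3])
keep = [i for i, t in enumerate(ds.targets) if t != 9]
ds.targets = [ds.targets[i] for i in keep]  # allowed: ordinary attribute
ds.data = [ds.data[i] for i in keep]        # allowed: ordinary attribute

try:
    ds.train_labels = ds.targets            # rejected: read-only property
    setter_failed = False
except AttributeError:
    setter_failed = True
```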

Skimage's cut_normalized returns a single label

I'm trying to learn how to segment an image using Normalized Cut. I want to compute superpixels and then apply NCut, but cut_normalized gives me a single label, so when I plot the result, the whole image is a single color.
This is my code:
def normcut_segmentations(img):
    labels, superpixels = get_super_pixels(img)
    g = graph.rag_mean_color(img, labels, mode='similarity')
    ncuts_labels = graph.cut_normalized(labels, g)
    print("Segmentation label: ", np.unique(labels))
    print("NCUTs Label:", np.unique(ncuts_labels))
    ncuts_result = color.label2rgb(ncuts_labels, img, kind='avg')
    return ncuts_labels, ncuts_result
To read the image (that is a bitmap), I use skimage.io.imread(img_filename).
What could be the problem?
Thanks!

How to detect contiguous images

I am trying to detect when two images are adjacent pieces of the same picture, with no overlap between them.
That is, suppose we have the Lenna image, and someone unknown to me has split it vertically into two pieces, A (top) and B (bottom). I must determine whether the two pieces are connected, i.e. whether B continues A or they are independent images.
[Images not preserved: the original Lenna image, piece A, piece B]
The good news is that I know the order of the pieces. The bad news is that there may be other images, and I must determine which of them fit together before joining them.
My first idea was to check whether the MAE (mean absolute error) between the last row of A and the first row of B is low.
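import numpy as np

def mae(a, b):
    # cast to float first: subtracting uint8 arrays would wrap around
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    min_mae = np.inf
    # tolerate a small horizontal misalignment, -5..+5 columns
    for i in range(-5, 6):
        a_s = np.roll(a, i, axis=1)
        value_mae = np.mean(np.abs(a_s - b))
        min_mae = min(min_mae, value_mae)
    return min_mae

if mae(im_a[-1:, ...], im_b[:1, ...]) < threshold:
    pass  # join images a and b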
The problem is that this metric is not very robust.
I have done the same using the horizontal derivative, as well as applying various smoothing filters, but I find myself in the same situation.
Is there a way to solve this problem?
Your method seems like a decent one. Even on visual inspection it looks reasonable:
[Images not preserved: the top piece's bottom row expanded; the bottom piece's top row expanded; the difference of the two]
It might be even clearer if you also check neighboring columns, but this already suggests that the images are similar enough.
Code
import cv2
import numpy as np
# load images
top = cv2.imread("top.png")
bottom = cv2.imread("bottom.png")
# convert to grayscale
tgray = cv2.cvtColor(top, cv2.COLOR_BGR2GRAY)
bgray = cv2.cvtColor(bottom, cv2.COLOR_BGR2GRAY)
# expand rows: repeat the top's last row and the bottom's first row 100 times
trow = np.repeat(tgray[-1:, :], 100, axis=0)
brow = np.repeat(bgray[:1, :], 100, axis=0)
# absolute difference; cv2.absdiff avoids the wrap-around of plain uint8 subtraction
diff = cv2.absdiff(trow, brow)
# show
cv2.imshow("top", trow)
cv2.imshow("bottom", brow)
cv2.imshow("diff", diff)
cv2.waitKey(0)
# save
cv2.imwrite("top_out.png", trow)
cv2.imwrite("bottom_out.png", brow)
cv2.imwrite("diff_out.png", diff)
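The answer's suggestion to also look at neighboring columns can be folded into the score itself. Below is a sketch of a hypothetical helper, seam_score, which is not from the answer. It compares the last row of the top piece with the first row of the bottom piece. Each pixel may match the closest of the bottom pixels within k columns. Values are cast to float so that uint8 differences do not wrap around.

```python
import numpy as np

def seam_score(top, bottom, k=1):
    """Hypothetical helper: mean absolute difference across the seam,
    tolerating a horizontal misalignment of up to k columns."""
    a = top[-1].astype(np.float64)     # last row of the top piece
    b = bottom[0].astype(np.float64)   # first row of the bottom piece
    width = a.shape[0]
    best = np.full(a.shape, np.inf)
    for shift in range(-k, k + 1):
        # compare a[x] with b[x + shift] only where x + shift stays inside the row
        lo, hi = max(0, -shift), min(width, width - shift)
        d = np.abs(a[lo:hi] - b[lo + shift:hi + shift])
        best[lo:hi] = np.minimum(best[lo:hi], d)
    # shift == 0 covers every column, so every entry of best is finite
    return float(best.mean())
```

A lower score means a smoother seam. Ranking all candidate pairs by score and taking the best match may be easier to tune than a single absolute threshold.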

Denoising Autoencoder for Images with large shape

I want to create a denoising autoencoder for images of any shape. Most of the solutions out there use images no larger than (500, 500), while my images are document scans of shape (3000, 2000). I tried resizing the images and building the model, but the predictions are incorrect. Could someone help me?
I tried to build a model with the code at https://github.com/mrdragonbear/Autoencoders/blob/master/Autoencoder-Tutorial.ipynb, adjusting the image shape, but the predictions fail.
I have already built a document denoiser.
There is no need for a model that accepts large inputs: you can split the images into chunks, feed the chunks to the model, and then merge the predicted chunks back together.
My model accepts images of shape 512x512, so I split the images into 512x512 chunks.
The images must be at least 512x512.
If an image is smaller, resize it or pad it to fit a 512x512 shape.
import numpy as np

def split_page(page):
    chunk_size = (512, 512)
    main_size = page.shape[:2]
    chunks = []
    # number of full chunks along each axis, and the leftover pixels
    chunk_grid = tuple(np.array(main_size) // np.array(chunk_size))
    extra_chunk = tuple(np.array(main_size) % np.array(chunk_size))
    for yc in range(chunk_grid[0]):
        row = []
        for xc in range(chunk_grid[1]):
            chunk = page[yc*chunk_size[0]:yc*chunk_size[0]+chunk_size[0], xc*chunk_size[1]:xc*chunk_size[1]+chunk_size[1]]
            row.append(chunk)
        if extra_chunk[1]:
            # leftover columns: one more chunk aligned to the right edge (overlaps its neighbour)
            chunk = page[yc*chunk_size[0]:yc*chunk_size[0]+chunk_size[0], page.shape[1]-chunk_size[1]:page.shape[1]]
            row.append(chunk)
        chunks.append(row)
    if extra_chunk[0]:
        # leftover rows: one more row of chunks aligned to the bottom edge
        row = []
        for xc in range(chunk_grid[1]):
            chunk = page[page.shape[0]-chunk_size[0]:page.shape[0], xc*chunk_size[1]:xc*chunk_size[1]+chunk_size[1]]
            row.append(chunk)
        if extra_chunk[1]:
            chunk = page[page.shape[0]-chunk_size[0]:page.shape[0], page.shape[1]-chunk_size[1]:page.shape[1]]
            row.append(chunk)
        chunks.append(row)
    return chunks, page.shape[:2]

def merge_chunks(chunks, osize):
    # chunks is a NumPy array of shape (rows, cols, 512, 512)
    page = np.ones(osize)
    for i, row in enumerate(chunks[:-1]):
        for j, chunk in enumerate(row[:-1]):
            page[i*512:i*512+512, j*512:j*512+512] = chunk
        # the last chunk of each row is aligned to the right edge
        page[i*512:i*512+512, osize[1]-512:osize[1]] = chunks[i, -1]
    # the last row of chunks is aligned to the bottom edge; this also covers
    # heights that are an exact multiple of 512, so no special case is needed
    for j, chunk in enumerate(chunks[-1][:-1]):
        page[osize[0]-512:osize[0], j*512:j*512+512] = chunk
    page[osize[0]-512:osize[0], osize[1]-512:osize[1]] = chunks[-1, -1]
    return page

def denoise(chunk):
    # model is the trained 512x512 denoising autoencoder (defined elsewhere)
    chunk = chunk.reshape(1, 512, 512, 1) / 255.
    denoised = model.predict(chunk).reshape(512, 512) * 255.
    return denoised

def denoise_page(page):
    chunks, osize = split_page(page)
    chunks = np.array(chunks)
    denoised_chunks = np.ones(chunks.shape)
    for i, row in enumerate(chunks):
        for j, chunk in enumerate(row):
            denoised_chunks[i][j] = denoise(chunk)
    denoised_page = merge_chunks(denoised_chunks, osize)
    return denoised_page
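The same split/merge scheme can be written more compactly. The tile start offsets are computed once per axis and reused for both splitting and merging. This is a sketch for 2-D (grayscale) pages, as in the answer. The tile size is a parameter, so the round trip can be checked on small arrays. tile_starts, split_tiles and merge_tiles are hypothetical names.

```python
import numpy as np

def tile_starts(length, tile):
    # regular starts 0, tile, 2*tile, ...; if length is not a multiple of tile,
    # add one extra start aligned to the far edge (that tile overlaps its neighbour).
    # Assumes length >= tile, as the answer requires.
    starts = list(range(0, length - tile + 1, tile))
    if length % tile:
        starts.append(length - tile)
    return starts

def split_tiles(page, tile):
    ys, xs = tile_starts(page.shape[0], tile), tile_starts(page.shape[1], tile)
    tiles = [[page[y:y + tile, x:x + tile] for x in xs] for y in ys]
    return tiles, page.shape[:2]

def merge_tiles(tiles, size, tile):
    ys, xs = tile_starts(size[0], tile), tile_starts(size[1], tile)
    page = np.ones(size, dtype=tiles[0][0].dtype)
    for row, y in zip(tiles, ys):
        for t, x in zip(row, xs):
            page[y:y + tile, x:x + tile] = t  # edge tiles overwrite the overlap
    return page
```

To denoise, pass each tile through the model (as denoise does above) before calling merge_tiles. With an identity "model", the merge reproduces the input exactly.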

How to use masking layer to mask input/output in LSTM autoencoders?

I am trying to use an LSTM autoencoder to do sequence-to-sequence learning with variable-length sequences as inputs, using the following code:
from keras.layers import Input, Masking, LSTM, RepeatVector
from keras.models import Model
inputs = Input(shape=(None, input_dim))
masked_input = Masking(mask_value=0.0, input_shape=(None,input_dim))(inputs)
encoded = LSTM(latent_dim)(masked_input)
decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)
sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)
where inputs are raw sequence data padded with 0s to the same length (timesteps). With the code above, the output also has length timesteps, but when calculating the loss we only want the first Ni elements of the output, where Ni is the length of input sequence i (which may differ between sequences). Does anyone know a good way to do that?
Thanks!
Option 1: you can always train without padding if you are willing to train on separate batches, each containing sequences of a single length.
See this answer for a simple way to separate batches of equal length: Keras misinterprets training data shape
In this case, all you have to do is perform the "repeat" operation in another way, since the sequence length is no longer fixed in advance.
So, instead of RepeatVector, you can use this:
import keras.backend as K
from keras.layers import Lambda
def repeatFunction(x):
    # x[0] is encoded: (batch, latent_dim)
    # x[1] is inputs: (batch, length, features)
    latent = K.expand_dims(x[0], axis=1)         # shape (batch, 1, latent_dim)
    inpShapeMaker = K.ones_like(x[1][:, :, :1])  # shape (batch, length, 1)
    return latent * inpShapeMaker                # broadcasts to (batch, length, latent_dim)
# instead of RepeatVector:
decoded = Lambda(repeatFunction, output_shape=(None, latent_dim))([encoded, inputs])
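The broadcasting trick in repeatFunction can be checked outside Keras with NumPy. Here, expand_dims, ones_like and broadcasting mirror the backend calls used above. The sizes and random arrays are arbitrary placeholders.

```python
import numpy as np

batch, length, features, latent_dim = 2, 5, 3, 4  # arbitrary example sizes
rng = np.random.default_rng(0)
encoded = rng.random((batch, latent_dim))         # stands in for the LSTM encoder output
inputs = rng.random((batch, length, features))    # stands in for a batch of equal-length sequences

latent = np.expand_dims(encoded, axis=1)          # (batch, 1, latent_dim)
shape_maker = np.ones_like(inputs[:, :, :1])      # (batch, length, 1)
repeated = latent * shape_maker                   # broadcast to (batch, length, latent_dim)

print(repeated.shape)
```

The output length follows the input length automatically, which is what removes the need for a fixed timesteps value in RepeatVector.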
Option 2 (doesn't smell good): use another masking layer after RepeatVector.
I tried this, and it works, but instead of 0s at the end we get the last value repeated until the end. So you will have to pad your target data in an unusual way, repeating the last step until the end.
Example: target [[[1,2],[5,7]]] will have to be [[[1,2],[5,7],[5,7],[5,7]...]]
I suspect this may unbalance your data considerably.
def makePadding(x):
    # x[0] is encoded already repeated
    # x[1] is inputs
    # padding = 1 for actual data in inputs, 0 for 0
    padding = K.cast(K.not_equal(x[1][:, :, :1], 0), dtype=K.floatx())
    # assuming you don't have 0 for non-padded data
    # padding repeated for latent_dim
    padding = K.repeat_elements(padding, rep=latent_dim, axis=-1)
    return x[0] * padding
inputs = Input(shape=(timesteps, input_dim))
masked_input = Masking(mask_value=0.0)(inputs)
encoded = LSTM(latent_dim)(masked_input)
decoded = RepeatVector(timesteps)(encoded)
decoded = Lambda(makePadding,output_shape=(timesteps,latent_dim))([decoded,inputs])
decoded = Masking(mask_value=0.0)(decoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)
sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)
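Option 3 (best): crop the outputs using the inputs, i.e. multiply the decoder output by a 0/1 mask derived from the zero-padded inputs. This also eliminates the gradients from the padded positions.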
def cropOutputs(x):
    # x[0] is decoded at the end
    # x[1] is inputs
    # both have the same shape
    # padding = 1 for actual data in inputs, 0 for 0
    padding = K.cast(K.not_equal(x[1], 0), dtype=K.floatx())
    # if you have zeros for non-padded data, they will lose their backpropagation
    return x[0] * padding
....
....
decoded = LSTM(input_dim, return_sequences=True)(decoded)
decoded = Lambda(cropOutputs,output_shape=(timesteps,input_dim))([decoded,inputs])
For this LSTM Autoencoder architecture, which I assume you understand, the Mask is lost at the RepeatVector due to the LSTM encoder layer having return_sequences=False.
So another option, instead of cropping as above, could be to create a custom bottleneck layer that propagates the mask.
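A related alternative to Option 3 is to apply the mask inside the loss instead of to the outputs. This is a swapped-in technique, not part of the answer. The sketch assumes TensorFlow 2 / tf.keras, that padded time steps are all-zero vectors, and that the target equals the input, as in an autoencoder. masked_mse is a hypothetical name. Unlike cropOutputs, it masks whole time steps, so a real step that happens to contain some zero features keeps its gradient. It also averages only over real steps.

```python
import tensorflow as tf

def masked_mse(y_true, y_pred):
    # 1.0 where the time step has any non-zero feature, 0.0 for zero padding: (batch, time)
    mask = tf.cast(tf.reduce_any(tf.not_equal(y_true, 0.0), axis=-1), y_pred.dtype)
    # per-step mean squared error over features: (batch, time)
    sq_err = tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)
    # average only over real (unpadded) steps; guard against an all-padding batch
    return tf.reduce_sum(sq_err * mask) / tf.maximum(tf.reduce_sum(mask), 1.0)
```

It would be used as sequence_autoencoder.compile(optimizer="adam", loss=masked_mse), with the padded inputs passed as both x and y.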
