How to apply a mask to an input image in Keras - image-processing

From the Keras documentation, this is one example of using ImageDataGenerator to read images from disk and feed them to the fit method:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')
For each input image I want to apply a mask to that image before it is used by the fit method. The masks do not exist on disk; they must be computed per image. For example, I binarize the input image to select certain regions and build a mask from them, then apply the mask so that only that region of the image is used as input. How can I apply such masks to images as they are read from disk by ImageDataGenerator?
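One option (a minimal sketch, not from the original post) is ImageDataGenerator's preprocessing_function argument, which is called on each image as a 3D NumPy array and must return an array of the same shape. The masking logic below is only an assumed placeholder (a fixed intensity threshold on raw 0-255 pixel values; the exact range the function sees can depend on your Keras version), so swap in whatever binarization you actually need:

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def mask_image(img):
    # img is a 3D NumPy array (height, width, channels)
    gray = img.mean(axis=-1, keepdims=True)      # crude grayscale
    mask = (gray > 127).astype(img.dtype)        # assumed threshold; adjust as needed
    return img * mask                            # keep only the selected region

train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    preprocessing_function=mask_image)

train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')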

Related

PyTorch is tiling images when loaded with Dataloader

I am trying to load an image dataset with the PyTorch DataLoader, but the resulting images are tiled instead of being center-cropped as I expect.
import torch
from torchvision import datasets, transforms
import matplotlib.pyplot as plt

transform = transforms.Compose([transforms.Resize(224),
                                transforms.CenterCrop(224),
                                transforms.ToTensor()])
dataset = datasets.ImageFolder('ml-models/downloads/', transform=transform)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
images, labels = next(iter(dataloader))

plt.imshow(images[6].reshape(224, 224, 3))
The resulting image is tiled, not center-cropped (screenshot: https://i.stack.imgur.com/HtrIa.png). Is there something wrong with the transformation I provided?
PyTorch stores tensors in channel-first format, so a 3-channel image is a tensor of shape (3, H, W). Matplotlib expects data in channel-last format, i.e. (H, W, 3). Reshaping does not rearrange the dimensions; for that you need Tensor.permute:
plt.imshow(images[6].permute(1, 2, 0))
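To see the difference, compare the two operations on a dummy tensor (a quick illustrative snippet, not from the original answer):

import torch

x = torch.randn(3, 224, 224)          # channel-first image tensor
print(x.permute(1, 2, 0).shape)       # torch.Size([224, 224, 3]); dimensions are reordered
print(x.reshape(224, 224, 3).shape)   # same shape, but the pixel values end up scrambled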

How to fit lines to edges and find the center point (OpenCV)

I have an image to which I apply a bilateral filter, followed by adaptive thresholding to get the image below.
original image (this is a screenshot of the depth image of the object)
thresholded image
I would like to fit lines to the vertical parts/lines and find the center point, with output like the image below:
I can't seem to understand the output of cv2.adaptiveThreshold(). How are the purple pixels (i.e. my edges) represented, and how can a line be fitted? MWE:
import cv2
import matplotlib.pyplot as plt

image = cv2.imread("depth_frame0009.jpg")
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
bilateral_filter = cv2.bilateralFilter(gray_image, 15, 50, 50)

plt.figure()
plt.imshow(bilateral_filter)
plt.title("bilateral filter")
#plt.imsave("2dimage_gaussianFilter.png", blurred)
plt.imsave("depthmap_image_bilateralFilter.png", bilateral_filter)

th3 = cv2.adaptiveThreshold(bilateral_filter, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                            cv2.THRESH_BINARY, 11, 2)
plt.figure()
plt.imshow(th3)
========
edit: added images of the Canny edges and the detected contours.
They are represented as an image, i.e. a matrix of uint8 values.
It looks purple and yellow because matplotlib applies a default colormap. I generally prefer to pass specific parameters when plotting image-processing output, e.g.
plt.imshow(th3, cmap='gray', interpolation='nearest')
If you are specifically interested in finding and fitting lines, you may want to use a different representation, such as Hough lines. Once you have the lines in the image you can take the best-fit lines and find your center point between them.
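A rough sketch of that approach, reusing th3 from the MWE above (the Hough parameters and the "average of segment midpoints" center estimate are only illustrative assumptions):

import cv2
import numpy as np

edges = cv2.Canny(th3, 50, 150)                     # edge map from the thresholded image
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=50,
                        minLineLength=50, maxLineGap=10)

if lines is not None:
    # midpoint of each detected segment, then their mean as a crude center point
    midpoints = [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in lines[:, 0]]
    center = np.mean(midpoints, axis=0)
    print("estimated center:", center)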

What happens if I set the input size to 32x32 for MNIST?

I want to train VGG16 on MNIST.
MNIST images are 28x28 and I set the input size to 32x32 in the Keras VGG16. When I train I get good metrics, but I'm not sure what really happens. Is Keras padding with empty space, or is the image being expanded linearly, as in a zoom? Does anyone understand how I can get a test accuracy above 95% after 60 epochs?
Here I define target size:
target_size = (32, 32)
This is where I define my flow_from_dataframe generator:
import pandas as pd

train_df = pd.read_csv("cv1_train.csv", quoting=3)
train_df_generator = train_image_datagen.flow_from_dataframe(
    dataframe=train_df,
    directory="../../../MNIST",
    target_size=target_size,
    class_mode='categorical',
    batch_size=batch_size,
    shuffle=False,
    color_mode="rgb",
    classes=["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
)
Here I define my input size:
model_base = VGG16(weights=None, include_top=False,
                   input_shape=(32, 32, 3), classes=10)
The images are simply resized to the specified target_size. This is stated in the documentation:
target_size: tuple of integers (height, width), default: (256, 256). The dimensions to which all images found will be resized.
You can also inspect the source code and find the relevant part in the load_img function. The default interpolation method used for resizing is nearest. You can find more information about the various interpolation methods in the MATLAB or PIL documentation.
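For illustration, this is roughly what happens to a 28x28 digit under the hood (a sketch using PIL directly, since load_img delegates the resize to PIL; the nearest filter repeats existing pixels rather than padding with empty space or interpolating linearly):

import numpy as np
from PIL import Image

digit = np.random.randint(0, 256, (28, 28), dtype=np.uint8)   # stand-in for an MNIST digit
img = Image.fromarray(digit)

resized = img.resize((32, 32), resample=Image.NEAREST)        # what target_size=(32, 32) triggers
print(np.array(resized).shape)                                # (32, 32)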

How does PyTorch handle labels when loading image/mask files for image segmentation?

I am starting an image segmentation project using PyTorch. I have a reduced dataset in a folder with 2 subfolders: "image" to store the images and "mask" for the masked images. Images and masks are .png files with 3 channels and 256x256 pixels. Because it is image segmentation, the labelling has to be done pixel by pixel. I am working with only 2 classes at the moment for simplicity. So far, I have achieved the following:
I was able to load my files into the classes "images" or "masks" by
import torchvision

root_dir = "./images_masks"
train_ds_untransf = torchvision.datasets.ImageFolder(root=root_dir)
train_ds_untransf.classes
Out[621]:
['images', 'masks']
and transform the data into tensors
from torchvision import transforms
train_trans = transforms.Compose([transforms.ToTensor()])
train_dataset = torchvision.datasets.ImageFolder(root=root_dir,transform=train_trans)
Each tensor in this "train_dataset" has the following shape:
train_dataset[1][0].shape
torch.Size([3, 256, 256])
Now I need to feed the loaded data into the CNN model, and I have explored the PyTorch DataLoader for this
from torch.utils.data import DataLoader

train_dl = DataLoader(train_dataset, batch_size=2, shuffle=False, num_workers=4)
I use the following code to check the resulting tensor's shape
for x, y in train_dl:
    print(x.shape)
    print(y.shape)
    print(y)
and get
torch.Size([2, 3, 256, 256])
torch.Size([2])
tensor([0, 0])
torch.Size([2, 3, 256, 256])
torch.Size([2])
tensor([0, 1])
.
.
.
Shapes seem correct. However, the first problem is that I get tensors from the same folder, as shown by some "y" tensors with the same value [0, 0]. I would expect them all to be [1, 0]: 1 representing an image, 0 representing a mask.
The second problem is that, although the documentation is clear when labels are whole images, it is not clear how to apply it for labeling at the pixel level, and I am certain the labels are not correct.
What would be an alternative way to correctly label this dataset?
Thank you.
The class torchvision.datasets.ImageFolder is designed for image classification problems, not for segmentation; it therefore expects a single integer label per image, and the label is determined by the subfolder in which the image is stored. So, as far as your dataloader is concerned, you have two classes of images, "images" and "masks", and your net tries to distinguish between them.
What you actually need is a different Dataset implementation whose __getitem__ returns an image and its corresponding mask; see the sketch below.
Additionally, it is a bit odd that your binary pixel-wise labels are stored as a 3-channel image. Segmentation masks are usually stored as a single-channel image.
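A minimal sketch of such a dataset, assuming the subfolder names "images" and "masks" from your ImageFolder output, matching file names in both folders, and that the masks should be reduced to a single channel:

import os
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class SegmentationDataset(Dataset):
    def __init__(self, root_dir):
        self.image_dir = os.path.join(root_dir, "images")
        self.mask_dir = os.path.join(root_dir, "masks")
        self.names = sorted(os.listdir(self.image_dir))
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        image = Image.open(os.path.join(self.image_dir, name)).convert("RGB")
        mask = Image.open(os.path.join(self.mask_dir, name)).convert("L")  # single channel
        return self.to_tensor(image), self.to_tensor(mask)

Wrapped in a DataLoader with batch_size=2, this would yield image batches of shape (2, 3, 256, 256) and mask batches of shape (2, 1, 256, 256).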

Use SMOTE to oversample image data

I'm doing binary classification with CNNs and the data is imbalanced: the ratio of positive to negative medical images is 0.4 : 0.6. So I want to use SMOTE to oversample the positive medical images before training.
However, the data is 4-dimensional, (761, 64, 64, 3), which causes the error
Found array with dim 4. Estimator expected <= 2
So, I reshape my train_data:
X_res, y_res = smote.fit_sample(X_train.reshape(X_train.shape[0], -1), y_train.ravel())
And it works fine. Before feeding it to the CNN, I reshape it back:
X_res = X_res.reshape(X_res.shape[0], 64, 64, 3)
Now, I'm not sure whether this is a correct way to oversample, and will the reshape operation change the images' structure? A quick check of the reshape round trip is shown below.
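For the reshape question specifically: flattening and then restoring the original shape is lossless as long as the same dimensions and ordering are used, which a quick check confirms (illustrative snippet, not from the original post; whether SMOTE's synthetic samples are meaningful for images is a separate question addressed in the answers below):

import numpy as np

X = np.random.randint(0, 256, (761, 64, 64, 3), dtype=np.uint8)   # stand-in for the image array
round_trip = X.reshape(X.shape[0], -1).reshape(X.shape[0], 64, 64, 3)
print(np.array_equal(X, round_trip))   # True: flatten + restore does not alter the pixels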
I had a similar issue. I used the reshape function to flatten the images:
X_train.shape
(8000, 250, 250, 3)
ReX_train = X_train.reshape(8000, 250 * 250 * 3)
ReX_train.shape
(8000, 187500)
smt = SMOTE()
Xs_train, ys_train = smt.fit_sample(ReX_train, y_train)
This approach is painfully slow, but it helped improve the performance.
As soon as you flatten an image you are losing localized information; this is one of the reasons convolutions are used in image-based machine learning.
8000x250x250x3 has an inherent meaning: 8000 samples of images, each of width 250 and height 250, all with 3 channels. When you reshape to 8000x(250*250*3) it is just a bunch of numbers, unless you use some kind of sequence network, and training on that is bad.
Oversampling is bad for image data. You can instead do image augmentation (20-crop, introducing noise such as Gaussian blur, rotations, translations, etc.), for example as sketched below.
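For example, a Keras-style augmentation setup that could stand in for oversampling (a sketch; the parameter values and the minority-class variable names are assumptions to adapt to your data):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,        # small random rotations
    width_shift_range=0.1,    # random translations
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True)

# e.g. flow the minority-class images with more augmentation passes than the majority class:
# minority_gen = augmenter.flow(X_minority, y_minority, batch_size=32)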
First, flatten the images.
Apply SMOTE to this flattened image data and its labels.
Reshape the flattened images back to RGB images.
from imblearn.over_sampling import SMOTE

sm = SMOTE(random_state=42)
train_rows = len(X_train)
X_train = X_train.reshape(train_rows, -1)    # e.g. (80, 30000)
X_train, y_train = sm.fit_resample(X_train, y_train)
X_train = X_train.reshape(-1, 100, 100, 3)   # e.g. (>80, 100, 100, 3)
