How to create a class dataset to link labels and images for image classification in pytorch? - image-processing

I have images and labels both in tensor format, where the shape of the images is torch.Size([6656, 300, 300, 3]) and the labels with shape of torch.Size([6656]), and I wish to create a class dataset in order to link both images and labels together for image classification in pytorch, is that possible?
I wish to create a class dataset in order to link both images and labels together for image classification in pytorch

You are looking for torch.utils.data.TensorDataset:
my_data = TensorDataset(images, labels)
img, label = my_data[6] # get the 7th elements

Related

Applying binary mask for Unet to image labelled in Yolo or COCO format

I have mammography images labeled in yolo(txt) and coco(xml) formats. I want to apply a binary mask for Unet to parts of these images that are labeled with bounding boxes. Is there a method for this?
Yolo/Coco format like this:
I want to convert above image to this:

How does PyTorch handle labels when loading image/mask files for image segmentation?

I am starting an image segmentation project using PyTorch. I have a reduced dataset in a folder and 2 subfolders - "image" to store the images and "mask" for the masked images. Images and masks are .png files with 3 channels and 256x256 pixels. Because it is image segmentation, the labelling has to be performed a pixel by pixel. I am working only with 2 classes at the moment for simplicity. So far, I achieved the following:
I was able to load my files into classes "images" or "masks" by
root_dir="./images_masks"
train_ds_untransf = torchvision.datasets.ImageFolder(root=root_dir)
train_ds_untransf.classes
Out[621]:
['images', 'masks']
and transform the data into tensors
from torchvision import transforms
train_trans = transforms.Compose([transforms.ToTensor()])
train_dataset = torchvision.datasets.ImageFolder(root=root_dir,transform=train_trans)
Each tensor in this "train_dataset" has the following shape:
train_dataset[1][0].shape
torch.Size([3, 256, 256])
Now I need to feed the loaded data into the CNN model, and have explored the PyTorch DataLoader for this
train_dataloaded = DataLoader(train_dataset, batch_size=2, shuffle=False, num_workers=4)
I use the following code to check the resulting tensor's shape
for x, y in train_dl:
print (x.shape)
print (y.shape)
print(y)
and get
torch.Size([2, 3, 256, 256])
torch.Size([2])
tensor([0, 0])
torch.Size([2, 3, 256, 256])
torch.Size([2])
tensor([0, 1])
.
.
.
Shapes seem correct. However, the first problem is that I got tensors of the same folder, indicated by some "y" tensors with the same value [0, 0]. I would expect that they all are [1, 0]: 1 representing image, 0 representing masks.
The second problem is that, although the documentation is clear when labels are entire images, it is not clear as to how to apply it for labeling at the pixel level, and I am certain the labels are not correct.
What would be an alternative to correctly label this dataset?
thank you
The class torchvision.datasets.ImageFolder is designed for image classification problems, and not for segmentation; therefore, it expects a single integer label per image and the label is determined by the subfolder in which the images are stored. So, as far as your dataloader concern you have two classes of images "images" and "masks" and your net tries to distinguish between them.
What you actually need is a different implementation of dataset that for each __getitem__ return an image and the corresponding mask. You can see examples of such classes here.
Additionally, it is a bit weird that your binary pixel-wise labels are stored as 3 channel image. Segmentation masks are usually stored as a single channel image.

Use SMOTE to oversample image data

I'm doing a binary classification with CNNs and the data is imbalanced where the positive medical image : negative medical image = 0.4 : 0.6. So I want to use SMOTE to oversample the positive medical image data before training.
However, the dimension of the data is 4D (761,64,64,3) which cause the error
Found array with dim 4. Estimator expected <= 2
So, I reshape my train_data:
X_res, y_res = smote.fit_sample(X_train.reshape(X_train.shape[0], -1), y_train.ravel())
And it works fine. Before feed it to CNNs, I reshape it back by:
X_res = X_res.reshape(X_res.shape[0], 64, 64, 3)
Now, I'm not sure is it a correct way to oversample and will the reshape operator change the images' structer?
I had a similar issue. I had used the reshape function to reshape the image (basically flattened the image)
X_train.shape
(8000, 250, 250, 3)
ReX_train = X_train.reshape(8000, 250 * 250 * 3)
ReX_train.shape
(8000, 187500)
smt = SMOTE()
Xs_train, ys_train = smt.fit_sample(ReX_train, y_train)
Although, this approach is pathetically slow, but helped to improve the performance.
As soon as you flatten an image you are loosing localized information, this is one of the reasons why convolutions are used in image-based machine learning.
8000x250x250x3 has an inherent meaning - 8000 samples of images, each image of width 250, height 250 and all of them have 3 channels when you do 8000x250*250*3 reshape is just a bunch of numbers unless you use some kind of sequence network to teach its bad.
oversampling is bad for image data, you can do image augmentations (20crop, introducing noise like a gaussian blur, rotations, translations, etc..)
First Flatten the image
Apply SMOTE on this flattened image data and its labels
Reshape the flattened image to RGB image
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state=42)
train_rows=len(X_train)
X_train = X_train.reshape(train_rows,-1)
(80,30000)
X_train, y_train = sm.fit_resample(X_train, y_train)
X_train = X_train.reshape(-1,100,100,3)
(>80,100,100,3)

Use pretrained model with different input shape and class model

I am working on a classification problem using CNN where my input image size is 64X64 and I want to use pretrained model such as VGG16,COCO or any other. But the problem is input image size of pretrained model is 224X224. How do I sort this issue. Is there any data augmentation way for input image size.
If I resize my input image to 224X224 then there is very high chance of image will get blurred and that may impact the training. Please correct me if I am wrong.
Another question is related to pretrained model. If I am using transfer learning then generally how layers I have to freeze from pretrained model. Considering my classification is very different from pretrained model classes. But I guess first few layers we can freeze it to get the edges, curve etc.. of the images which is very common in all the images.
But the problem is input image size of pretrained model is 224X224.
I assume you work with Keras/Tensorflow (It's the same for other DL frameworks). According to the docs in the Keras Application:
input_shape: optional shape tuple, only to be specified if include_top
is False (otherwise the input shape has to be (224, 224, 3) (with
'channels_last' data format) or (3, 224, 224) (with 'channels_first'
data format). It should have exactly 3 inputs channels, and width and
height should be no smaller than 48. E.g. (200, 200, 3) would be one
So there are two options to solve your issue:
Resize your input image to 244*244 by existing library and use VGG classifier [include_top=True].
Train your own classifier on top of the VGG models. As mentioned in the above documentation in Keras if your image is different than 244*244, you should train your own classifier [include_top=False]. You can do such things easily with:
inp = keras.layers.Input(shape=(64, 64, 3), name='image_input')
vgg_model = VGG19(weights='imagenet', include_top=False)
vgg_model.trainable = False
x = keras.layers.Flatten(name='flatten')(vgg_model)
x = keras.layers.Dense(512, activation='relu', name='fc1')(x)
x = keras.layers.Dense(512, activation='relu', name='fc2')(x)
x = keras.layers.Dense(10, activation='softmax', name='predictions')(x)
new_model = keras.models.Model(inputs=inp, outputs=x)
new_model.compile(optimizer='adam', loss='categorical_crossentropy',
metrics=['accuracy'])
If I am using transfer learning then generally how layers I have to
freeze from pretrained model
It is really depend on what your new task, how many training example you have, whats your pretrained model, and lots of other things. If I were you, I first throw away the pretrained model classifier. Then, If not worked, remove some other Convolution layer and do it step by step until I get good performance.
The following code works for me for image size 128*128*3:
vgg_model = VGG16(include_top=False, weights='imagenet')
print(vgg_model.summary())
#Get the dictionary of config for vgg16
vgg_config = vgg_model.get_config()
vgg_config["layers"][0]["config"]["batch_input_shape"] = (None, 128, 128, 3)
vgg_updated = Model.from_config(vgg_config)
vgg_updated.trainable = False
model = Sequential()
# Add the vgg convolutional base model
model.add(vgg_updated)
# Flattedn Layer must be added
model.add(Flatten())
vgg_updated.summary()
model.summary()

Calculate histogram of gradients (HOG) of a fixed length regardless of image size

I'm training a HOG + SVM model, and my training data comes in various sizes and aspect ratios. The SVM model can't be trained on variable sized lists, so I'm looking to calculate a histogram of gradients that is the same length regardless of image size.
Is there a clever way to do that? Or is it better to resize the images or pad them?
What people usually do in such case is one of the follow two things:
Resize all images (or image patches) to a fixed size and extract the HOG features from those.
Use the "Bag of Words/Features" method and don't resize the images.
The first method 1. is quite simple but it has some problems which method 2. tries to solve.
First, think of what a hog descriptor does. It divides an image into cells of a fixed length, calculates the gradients cell-wise to generate cell-wise histograms(based on voting). At the end, you'll have a concatenated histogram of all the cells and that's your descriptor.
So there is a problem with it, because the object (that you want to detect) has to cover the images in similar manner. Otherwise your descriptor would look different depending on the location of the object inside the image.
Method 2. works as follows:
Extract the HOG features from both positive and negative images in your training set.
Use an clustering algorithm like k-means to define a fixed amount of k centroids.
For each image in your dataset, extract the HOG features and compare them element-wise to the centroids to create a frequency histogram.
Use the frequency histograms for the training of your SVM and use it for the classification phase. This way, the location doesn't matter and you'll always have a fixed sized of inputs. You'll also benefit from the reduction of dimensions.
You can normalize the images to a given target shape using cv2.resize(), divide image into number of blocks you want and calculate the histogram of orientations along with the magnitudes. Below is a simple implementation of the same.
img = cv2.imread(filename,0)
img = cv2.resize(img,(16,16)) #resize the image
gx = cv2.Sobel(img, cv2.CV_32F, 1, 0) #horizontal gradinets
gy = cv2.Sobel(img, cv2.CV_32F, 0, 1) # vertical gradients
mag, ang = cv2.cartToPolar(gx, gy)
bin_n = 16 # Number of bins
# quantizing binvalues in (0-16)
bins = np.int32(bin_n*ang/(2*np.pi))
# divide to 4 sub-squares
s = 8 #block size
bin_cells = bins[:s,:s],bins[s:,:s],bins[:s,s:],bins[s:,s:]
mag_cells = mag[:s,:s], mag[s:,:s], mag[:s,s:], mag[s:,s:]
hists = [np.bincount(b.ravel(), m.ravel(), bin_n) for b, m in zip(bin_cells,mag_cells)]
hist = np.hstack(hists) #histogram feature data to be fed to SVM model
Hope that helps!

Resources