Applying a binary mask for U-Net to images labelled in YOLO or COCO format - image-processing

I have mammography images labelled in YOLO (txt) and COCO (xml) formats. I want to generate binary masks for U-Net covering the parts of these images that are labelled with bounding boxes. Is there a method for this?
YOLO/COCO format like this:
I want to convert the above image to this:
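For the YOLO case, one straightforward approach is to read each normalized bounding box from the label .txt file and draw a filled rectangle onto a zero mask of the same size as the image. A minimal sketch, assuming labels are lines of the form "class x_center y_center width height" normalized to [0, 1] (the file names and the yolo_to_mask helper are illustrative, not from the question):

import cv2
import numpy as np

def yolo_to_mask(label_path, image_shape):
    h, w = image_shape[:2]
    mask = np.zeros((h, w), dtype=np.uint8)
    with open(label_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 5:
                continue
            _, xc, yc, bw, bh = map(float, parts)
            # convert normalized centre/size to pixel corner coordinates
            x1, y1 = int((xc - bw / 2) * w), int((yc - bh / 2) * h)
            x2, y2 = int((xc + bw / 2) * w), int((yc + bh / 2) * h)
            cv2.rectangle(mask, (x1, y1), (x2, y2), 255, thickness=-1)  # filled box
    return mask

image = cv2.imread("mammogram.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name
mask = yolo_to_mask("mammogram.txt", image.shape)
cv2.imwrite("mammogram_mask.png", mask)

For the xml labels the idea is the same: parse the box corner coordinates (e.g. with xml.etree.ElementTree) and fill the rectangles in exactly the same way.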

Related

How to segment a binary image to get bounding boxes

my input image:
I want to get a bounding box for each object:
The input image is a binary image segmented with a CNN (salient object detection), but the black areas are stuck to each other; I want to separate them to get a bounding box for each object.
The original image:
With salient object detection and post-processing we get:
But I want this:
PS: I don't want approaches like YOLO/SSD that need to be trained on a custom dataset.
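One training-free approach is classical connected-component analysis with OpenCV, optionally preceded by a small erosion so the touching blobs split apart before labelling. A rough sketch, assuming the segmentation output is a single-channel image with black objects on a white background as described above (the file name, kernel size and area threshold are placeholders):

import cv2
import numpy as np

binary = cv2.imread("segmented.png", cv2.IMREAD_GRAYSCALE)          # placeholder file name
_, thresh = cv2.threshold(binary, 127, 255, cv2.THRESH_BINARY_INV)  # make the objects white

# optional: erode a little so touching blobs separate before labelling
kernel = np.ones((5, 5), np.uint8)
separated = cv2.erode(thresh, kernel, iterations=2)

num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(separated)
output = cv2.cvtColor(binary, cv2.COLOR_GRAY2BGR)
for i in range(1, num_labels):                                      # label 0 is the background
    x, y, w, h, area = stats[i]
    if area > 50:                                                   # ignore tiny specks
        cv2.rectangle(output, (x, y), (x + w, y + h), (0, 0, 255), 2)
cv2.imwrite("boxes.png", output)

If plain erosion is not enough to separate the blobs, a distance transform followed by watershed is the usual next step.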

What is the output of the convolutional layer if the filter is the same image itself with which the convolution is being carried out?

Let's say image A has to be convolved with a filter, and the filter is image A itself with the exact same dimensions. What should be the output of this convolution? Will it be the image itself? Blurred or intensified?
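One way to see what actually comes out is to run the convolution directly and inspect the result; a minimal sketch with SciPy, assuming a grayscale image (the file name is a placeholder):

import numpy as np
from PIL import Image
from scipy.signal import convolve2d

A = np.array(Image.open("image_a.png").convert("L"), dtype=np.float64)
A /= A.max()  # normalize so the output values stay manageable

# mode="same" keeps the output at the original size; "full" would give (2H-1, 2W-1)
out = convolve2d(A, A, mode="same", boundary="fill")
print(out.shape, out.min(), out.max())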

How does PyTorch handle labels when loading image/mask files for image segmentation?

I am starting an image segmentation project using PyTorch. I have a reduced dataset in a folder with 2 subfolders - "images" to store the images and "masks" for the masked images. Images and masks are .png files with 3 channels and 256x256 pixels. Because it is image segmentation, the labelling has to be performed pixel by pixel. I am working with only 2 classes at the moment for simplicity. So far, I have achieved the following:
I was able to load my files into classes "images" or "masks" by
root_dir="./images_masks"
train_ds_untransf = torchvision.datasets.ImageFolder(root=root_dir)
train_ds_untransf.classes
Out[621]:
['images', 'masks']
and transform the data into tensors
from torchvision import transforms
train_trans = transforms.Compose([transforms.ToTensor()])
train_dataset = torchvision.datasets.ImageFolder(root=root_dir,transform=train_trans)
Each tensor in this "train_dataset" has the following shape:
train_dataset[1][0].shape
torch.Size([3, 256, 256])
Now I need to feed the loaded data into the CNN model, and have explored the PyTorch DataLoader for this:
from torch.utils.data import DataLoader
train_dl = DataLoader(train_dataset, batch_size=2, shuffle=False, num_workers=4)
I use the following code to check the resulting tensor's shape
for x, y in train_dl:
    print(x.shape)
    print(y.shape)
    print(y)
and get
torch.Size([2, 3, 256, 256])
torch.Size([2])
tensor([0, 0])
torch.Size([2, 3, 256, 256])
torch.Size([2])
tensor([0, 1])
...
Shapes seem correct. However, the first problem is that I get tensors from the same folder, indicated by some "y" tensors having the same value [0, 0]. I would expect them all to be [1, 0]: 1 representing an image, 0 representing a mask.
The second problem is that, although the documentation is clear when labels apply to entire images, it is not clear how to apply it for labelling at the pixel level, and I am certain my labels are not correct.
What would be an alternative to correctly label this dataset?
thank you
The class torchvision.datasets.ImageFolder is designed for image classification problems, not for segmentation; therefore, it expects a single integer label per image, and the label is determined by the subfolder in which the images are stored. So, as far as your dataloader is concerned, you have two classes of images, "images" and "masks", and your net would try to distinguish between them.
What you actually need is a different Dataset implementation whose __getitem__ returns an image and its corresponding mask. You can see examples of such classes here.
Additionally, it is a bit odd that your binary pixel-wise labels are stored as a 3-channel image. Segmentation masks are usually stored as a single-channel image.
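A minimal sketch of such a Dataset, assuming images and masks share file names under ./images_masks/images and ./images_masks/masks, and that the masks can be binarized with a simple threshold (the class name and threshold are assumptions):

import os
from PIL import Image
import torch
from torch.utils.data import Dataset
from torchvision import transforms

class SegmentationDataset(Dataset):
    def __init__(self, root_dir="./images_masks"):
        self.image_dir = os.path.join(root_dir, "images")
        self.mask_dir = os.path.join(root_dir, "masks")
        self.files = sorted(os.listdir(self.image_dir))
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        name = self.files[idx]
        image = Image.open(os.path.join(self.image_dir, name)).convert("RGB")
        mask = Image.open(os.path.join(self.mask_dir, name)).convert("L")   # single channel
        image = self.to_tensor(image)                           # [3, 256, 256] floats in [0, 1]
        mask = (self.to_tensor(mask) > 0.5).long().squeeze(0)   # [256, 256] per-pixel class 0/1
        return image, mask

With this, DataLoader(SegmentationDataset(), batch_size=2) yields image batches of shape [2, 3, 256, 256] and mask batches of shape [2, 256, 256] containing per-pixel class indices, which is what a loss such as nn.CrossEntropyLoss expects for 2-class segmentation.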

Should I grayscale the image?

I'm categorizing 30 types of clothes from images using the R-CNN Object Detection library from TensorFlow: https://github.com/tensorflow/models/tree/master/research/object_detection
Does color matter when we collect images for training and testing?
If I put only purple and blue shirts, I guess it won't recognize red shirts?
Should I grayscale all images to detect the types of clothes? :)
Yes, colour does matter. The underlying visual feature extraction is based on a convolutional neural network, pre-trained to perform image recognition on colour images in the ImageNet dataset.
The R-CNN repository's instructions on bringing in your own dataset ask for RGB images.
Dataset Requirements
For every example in your dataset, you should have the following information:
An RGB image for the dataset encoded as jpeg or png.
A list of bounding boxes for the image. Each bounding box should contain:
Bounding box coordinates (with origin in the top-left corner) defined by 4 floating-point numbers [ymin, xmin, ymax, xmax]. Note that the normalized coordinates (x / width, y / height) are stored in the TFRecord dataset; a small example follows below.
The class of the object in the bounding box.
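For illustration, the normalization the requirement above refers to is just dividing each pixel coordinate by the image height or width before writing it to the TFRecord; a small hypothetical example (the helper name is not from the repository):

def normalize_box(ymin, xmin, ymax, xmax, height, width):
    # pixel-space corners -> normalized [ymin, xmin, ymax, xmax]
    return [ymin / height, xmin / width, ymax / height, xmax / width]

# e.g. a 640x480 image with a box from (x=100, y=50) to (x=300, y=200)
print(normalize_box(ymin=50, xmin=100, ymax=200, xmax=300, height=480, width=640))
# [0.104, 0.156, 0.417, 0.469] (rounded)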

OpenCV - Dynamically find HSV ranges for color

When given an image such as this:
And not knowing the color of the object in the image, I would like to be able to automatically find the best H, S and V ranges to threshold the object itself, in order to get a result such as this:
In this example, I manually found the values and thresholded the image using cv::inRange. The output I'm looking for is the best H, S and V ranges (a min and max value each, six integer values in total) to threshold the given object in the image, without knowing in advance what color the object is. I need to use these values later on in my code.
Key points to remember:
- All given images will be of the same size.
- All given images will have the same dark background.
- All the objects I'll put in the images will be of full color.
I could brute-force over all possible combinations of the 6 HSV range values, threshold the image with each one, and find a clever way to figure out when the best blob was found (blob size, maybe?). That seems like a very cumbersome, slow and highly ineffective solution though.
What would be a good way to approach this? I did some research and found that OpenCV has some machine learning capabilities, but I need the actual 6 values at the end of the process, not just a thresholded image.
You could create a small 2-layer neural network for the task of dynamic HSV masking.
Steps:
Create/generate ground-truth annotations for each image and its HSV range for the required object.
Design a small neural network with at least 1 convolutional layer and 1 fully connected layer.
Input: the m×n mask of the image after applying the HSV range from the ground truth.
Output: an m×n binary mask of the image.
Post-processing: multiply the mask with the original image to get the required object highlighted (see the sketch below).
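For that post-processing step, a minimal sketch with OpenCV showing how an HSV range turns into a mask and how that mask cuts the object out of the image (the HSV bounds and file name are placeholders, not learned values):

import cv2
import numpy as np

bgr = cv2.imread("object.png")                      # placeholder file name
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)

lower = np.array([35, 60, 60])                      # example H, S, V minimums
upper = np.array([85, 255, 255])                    # example H, S, V maximums
mask = cv2.inRange(hsv, lower, upper)               # binary mask, 0 or 255

highlighted = cv2.bitwise_and(bgr, bgr, mask=mask)  # object kept, background zeroed
cv2.imwrite("highlighted.png", highlighted)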
