How do I resample SimpleITK images to the same direction (i.e. view)?

I would like to know how I can resample 3D MRI brain scans, for example, from a coronal to an axial image view without changing the target image's dimensions.
Image 1 information:
View: Coronal
Dimension: (256, 256, 128)
Image 2 information:
View: Axial
Dimension: (512, 512, 200)
I would like to resample Image 1 to the axial view while keeping its size (i.e. 256x256x128). In other words, I do not want to use something like SimpleITK's Resample method to resample the target onto a reference image, because I have a large dataset for which I want to change the image view ONLY.
Thanks!
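For what it's worth, one way to change only the orientation without interpolating onto a reference grid is a pure axis permutation/flip. Below is a minimal sketch, assuming SimpleITK >= 2.0, where sitk.DICOMOrient is available; the file paths and the target orientation code "LPS" are only examples.
import SimpleITK as sitk

# Load the coronal volume (path is illustrative).
img = sitk.ReadImage("coronal_scan.nii.gz")
print(img.GetSize(), img.GetDirection())

# DICOMOrient only permutes/flips the axes to match the requested
# orientation code; it does not interpolate, so every voxel is kept.
# Note that the size components may be reordered,
# e.g. (256, 256, 128) -> (256, 128, 256).
reoriented = sitk.DICOMOrient(img, "LPS")
print(reoriented.GetSize(), reoriented.GetDirection())

sitk.WriteImage(reoriented, "axial_scan.nii.gz")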

Related

Why did I get 2 different feature maps with the same kernel on the same image?

I took a kernel at random and applied it to an image with padding='valid' and stride=(1,1); the kernel was of size (3, 3).
This was the normal image:
This is the image after applying the filter:
Then I rotated the original image by 90 degrees, and it changed to this:
Then I applied the same filter on this rotated image, and got this as the output:
Then I rotated this output image back, so that its orientation matched the previous feature map.
After rotating this feature map back and comparing it to the previous feature map, I noticed they were not the same.
Here is the image of both the feature maps:
As you can see, they both are clearly not the same.
This means that the feature map changes as the location of features in the input image changes.
But why does this happen?
E.g.: let's say we have a kernel that detects an 'eye'. After training the kernel on images where the 'eye' is at the center, we give it an image where the 'eye' is at the top-left. It would still do a good job, because it would search for the eye in every (5, 5) patch of the image (assuming the kernel is of size (5, 5)), so it shouldn't matter where the feature is in the image.
So why did the feature map change when we changed the location of the features in the input image, as in the example of the '5' above?
The filter is not invariant to rotation: for a 3×3 window of pixels, you have 9 weights that determine how much each pixel in the window contributes to the output, and those weights are not invariant to rotation. If you also rotated the kernel weights, the output would be the same.
Let's say you have a kernel with these weights:
010
000
000
The result is that all pixels move one position down. If you rotate the image, apply the filter, and rotate back, all pixels are shifted to the left instead.
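As a quick sanity check of this, here is a minimal sketch using NumPy/SciPy cross-correlation (which is what most deep learning frameworks call "convolution"):
import numpy as np
from scipy.ndimage import correlate

img = np.random.rand(16, 16)
kernel = np.array([[0., 1., 0.],
                   [0., 0., 0.],
                   [0., 0., 0.]])

out = correlate(img, kernel, mode='constant')

# Rotate the image, filter it, rotate the result back: NOT the same output.
out_rot = np.rot90(correlate(np.rot90(img), kernel, mode='constant'), k=-1)
print(np.allclose(out, out_rot))  # False

# Rotating the kernel together with the image restores equality.
out_rot_k = np.rot90(correlate(np.rot90(img), np.rot90(kernel), mode='constant'), k=-1)
print(np.allclose(out, out_rot_k))  # True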

How does PyTorch handle labels when loading image/mask files for image segmentation?

I am starting an image segmentation project using PyTorch. I have a reduced dataset in a folder with 2 subfolders: "images" to store the images and "masks" for the mask images. Images and masks are .png files with 3 channels and 256x256 pixels. Because it is image segmentation, the labelling has to be performed pixel by pixel. I am working with only 2 classes at the moment for simplicity. So far, I have achieved the following:
I was able to load my files into classes "images" or "masks" by
import torchvision

root_dir = "./images_masks"
train_ds_untransf = torchvision.datasets.ImageFolder(root=root_dir)
train_ds_untransf.classes
Out[621]:
['images', 'masks']
and transform the data into tensors
from torchvision import transforms
train_trans = transforms.Compose([transforms.ToTensor()])
train_dataset = torchvision.datasets.ImageFolder(root=root_dir,transform=train_trans)
Each tensor in this "train_dataset" has the following shape:
train_dataset[1][0].shape
torch.Size([3, 256, 256])
Now I need to feed the loaded data into the CNN model, and I have explored the PyTorch DataLoader for this:
from torch.utils.data import DataLoader

train_dataloaded = DataLoader(train_dataset, batch_size=2, shuffle=False, num_workers=4)
I use the following code to check the resulting tensor's shape
for x, y in train_dataloaded:
    print(x.shape)
    print(y.shape)
    print(y)
and get
torch.Size([2, 3, 256, 256])
torch.Size([2])
tensor([0, 0])
torch.Size([2, 3, 256, 256])
torch.Size([2])
tensor([0, 1])
.
.
.
Shapes seem correct. However, the first problem is that I get tensors from the same folder, indicated by some "y" tensors with the same value [0, 0]. I would expect them all to be [1, 0]: 1 representing an image, 0 representing a mask.
The second problem is that, although the documentation is clear when labels are entire images, it is not clear as to how to apply it for labeling at the pixel level, and I am certain the labels are not correct.
What would be an alternative to correctly label this dataset?
thank you
The class torchvision.datasets.ImageFolder is designed for image classification problems, not for segmentation; it therefore expects a single integer label per image, and that label is determined by the subfolder in which the image is stored. So, as far as your dataloader is concerned, you have two classes of images, "images" and "masks", and your net would try to distinguish between them.
What you actually need is a different Dataset implementation whose __getitem__ returns an image together with its corresponding mask. You can see examples of such classes here.
Additionally, it is a bit odd that your binary pixel-wise labels are stored as a 3-channel image; segmentation masks are usually stored as a single-channel image.
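A minimal sketch of such a paired dataset follows; it assumes that each image and its mask share the same filename in the "images" and "masks" subfolders (the class name and folder layout are illustrative, not from the original answer):
import os
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class SegmentationDataset(Dataset):
    """Returns (image, mask) pairs instead of (image, folder_label)."""
    def __init__(self, root_dir):
        self.image_dir = os.path.join(root_dir, "images")
        self.mask_dir = os.path.join(root_dir, "masks")
        self.names = sorted(os.listdir(self.image_dir))
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        image = Image.open(os.path.join(self.image_dir, name)).convert("RGB")
        mask = Image.open(os.path.join(self.mask_dir, name)).convert("L")  # single channel
        image = self.to_tensor(image)            # float tensor, shape [3, 256, 256]
        mask = torch.from_numpy(np.array(mask))  # uint8 tensor, shape [256, 256]
        mask = (mask > 0).long()                 # per-pixel class index in {0, 1}
        return image, mask
A DataLoader built on this dataset then yields an image batch of shape [B, 3, 256, 256] and a label batch of shape [B, 256, 256], which is the layout pixel-wise losses such as nn.CrossEntropyLoss expect.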

Capsule Networks - Facial Expression Recognition

I want to experiment Capsule Networks on FER. For now I am using fer2013 Kaggle dataset.
One thing I didn't understand in CapsNet: in the first conv layer the size is reduced to 20x20, given a 28x28 input image and 9x9 filters with a stride of 1. But in the capsule layer, the size reduces to 6x6. How did this happen? With an input size of 20x20, 9x9 filters, and a stride of 2, I couldn't get 6x6. Maybe I missed something.
For my experiment, the input image size is 48x48. Should I use the same hyperparameters to start with, or are there suggested hyperparameters I could use?
At the beginning, the picture is 28*28 and you apply a kernel of size 9, so you lose (9-1)=8 pixels (4 on each side). At the end of the first convolutional layer you therefore have (28-8)*(28-8) = 20*20 pixels. You then apply the same kernel again, which leaves (20-8)*(20-8) = 12*12 valid positions, but this second layer has a stride of 2, so only 12/2 = 6 positions per dimension remain. Equivalently: floor((20-9)/2) + 1 = 6.
With a 48*48 input, if you apply the same convolutional layers, you will end up with a 16*16 map ((48-8-8)/2 = 16).
The standard CapsNet has two convolutional layers: the first has a stride of 1 and the second a stride of 2.
If you want 6*6 capsules, your filter size should be 19*19, because:
48 - (19-1) = 30
30 - (19-1) = 12
12 / 2 = 6 (because the stride is 2)
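The same arithmetic can be checked with a small helper for the output size of a padding-free ("valid") convolution; this is a sketch added for verification, not part of the original answers:
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# MNIST CapsNet: 28 -> 20 (9x9, stride 1) -> 6 (9x9, stride 2)
assert conv_out(conv_out(28, 9, 1), 9, 2) == 6

# 48x48 input with the same 9x9 kernels: 48 -> 40 -> 16
assert conv_out(conv_out(48, 9, 1), 9, 2) == 16

# 48x48 input with 19x19 kernels: 48 -> 30 -> 6
assert conv_out(conv_out(48, 19, 1), 19, 2) == 6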

How to ensure Caffe segmentation network output size is the same as input?

I'm training a segmentation model with the U-net architecture. The input image size is 250x250.
Currently, I've manually tweaked the paddings of some of the convolutional layers to ensure that the model output is of the same size, i.e. 250x250.
But when I input a differently sized image, for example a 500x500 one, the output size is 506x506.
How do I make sure the output size remains the same as input for all sizes?
You can use a "Crop" layer to force the output shape to be identical to the input's.
With a U-net I suggest using a crop layer after every upsampling and not only at the end, to avoid accumulating padding errors.
Regarding "padding errors":
Suppose you have an input of shape 100x100 and you downsample it 3 times by a factor of 2; you'll end up with a 13x13 feature map (100 -> 50 -> 25 -> 13, rounding up).
Now, if you upsample three times by x2 each time
13x13 --> 26x26 --> 52x52 --> 104x104
You have "additional" 4 pixels that were added due to padding/rounding (in your question you have 6).
However, if you "Crop" after each upsample
13x13 --> 26x26 --(crop)--> 25x25 --> 50x50 --> 100x100
You can see that only the first upsample needs a non-trivial crop, and the error at that level is only 1 pixel, not 4.
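The size bookkeeping above is easy to reproduce in a few lines (a quick sketch of the arithmetic only, not of the Caffe layers):
import math

# Encoder sizes for a 100x100 input downsampled 3 times by a factor of 2 (rounding up).
encoder = [100]
for _ in range(3):
    encoder.append(math.ceil(encoder[-1] / 2))
print(encoder)         # [100, 50, 25, 13]

# Decoder without cropping: each x2 upsample just doubles the size.
print(13 * 2 * 2 * 2)  # 104 -> 4 extra pixels accumulate

# Decoder with a crop to the matching encoder size after every upsample.
size = encoder[-1]
for target in reversed(encoder[:-1]):
    size = min(size * 2, target)   # 26 -> 25 (crop), then 50, then 100
print(size)            # 100 -> matches the input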

Structure of OpenCV's HOG output

I'm extracting the HOG features of a grayscale image using OpenCV's HOG implementation. Assuming that my image matches the default window size, i.e. 128x64, I'm struggling to understand correctly how that feature vector is organised. This is what I know:
Every cell outputs a 9-element histogram quantifying the orientations of the edges lying within that cell (8x8 pixels by default).
Each block contains 2x2 cells.
By default, an 8x8 block stride is used.
This results in a 7*15*9*4 = 3780-element feature vector, where 7 and 15 are the numbers of blocks that fit horizontally and vertically when a 50% block overlap is used. All good up to here.
If we examine the features of the first block, i.e. the first 9*4 elements, how are they arranged? Do the first 9 bins correspond to the top-left cell in the block? What about the next 9? And the next?
And which orientation angle does each of the 9 bins represent? Is bins[0] = 0, bins[1] = 20, bins[2] = 40, ..., bins[8] = 160? Or is the order different, for instance going from -pi/2 to +pi/2?
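For reference, the 3780 count itself is easy to verify with OpenCV's default descriptor parameters; a quick sketch, assuming opencv-python is installed (the dummy image is illustrative):
import cv2
import numpy as np

# Defaults: 64x128 window, 16x16 blocks, 8x8 block stride, 8x8 cells, 9 bins.
hog = cv2.HOGDescriptor()

# A dummy grayscale image matching the default window size (height 128, width 64).
img = np.random.randint(0, 256, (128, 64), dtype=np.uint8)

features = hog.compute(img)
print(features.size)  # 3780 = 7 * 15 blocks * 4 cells * 9 bins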
