Data annotation before or after applying augmentation on a dataset?

I have a custom dataset. I am annotating the dataset to determine the bounding box of each Region of Interest (ROI).
My questions are:
1- Should I annotate the data before applying augmentation, or vice versa?
2- If I apply augmentation to an already-annotated image dataset, will the annotation information (bounding box coordinates) be lost, since the augmentation rotates and flips the images?
3- How can I keep the ROI annotations and bounding box coordinates correct even after applying augmentation?
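A minimal sketch of the idea behind questions 2 and 3: whenever a geometric transform is applied to the image, the same transform has to be applied to the box coordinates (shown here for a horizontal flip, with hypothetical shapes). Augmentation libraries such as albumentations or imgaug do this automatically when you pass the boxes alongside the image.

```python
import numpy as np

def hflip_with_boxes(image, boxes):
    """Horizontally flip an image and update its [x_min, y_min, x_max, y_max] boxes."""
    h, w = image.shape[:2]
    flipped = image[:, ::-1]                 # mirror the columns
    boxes = np.asarray(boxes, dtype=float)
    new_x_min = w - boxes[:, 2]              # old x_max becomes the new x_min
    new_x_max = w - boxes[:, 0]              # old x_min becomes the new x_max
    new_boxes = np.stack([new_x_min, boxes[:, 1], new_x_max, boxes[:, 3]], axis=1)
    return flipped, new_boxes

# Toy usage: one box on a 100x200 image.
img = np.zeros((100, 200, 3), dtype=np.uint8)
print(hflip_with_boxes(img, [[10, 20, 60, 80]])[1])  # -> [[140.  20. 190.  80.]]
```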

Related

Bounding Box Regression

I have approximately 100K X-ray images of dimension 1024x1024. Only ~970 of them have pre-existing bounding box coordinates. I am training the model with a 70:30 training/testing split. My question is: how do I train the model if the rest of the images do not have bounding boxes? Since I'm no medical expert, I can't draw the bounding boxes manually, and with 14 classes it gets really difficult to annotate by hand.
If you have some knowledge about the remaining unlabelled images, for example which classes are present in each image, you can use weakly supervised learning to train detection on all of them.
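A minimal sketch of the first stage such a weakly supervised setup might use, assuming image-level labels for the 14 classes are available; the architecture and array names below are illustrative only, and rough localizations could later be derived from class activation maps of the last conv layer.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 14  # one sigmoid output per pathology class (multi-label)

# Small illustrative CNN trained only on image-level annotations; class
# activation maps from the last conv layer can later give rough localizations.
model = models.Sequential([
    layers.Input(shape=(1024, 1024, 1)),
    layers.Conv2D(32, 3, strides=2, activation="relu"),
    layers.Conv2D(64, 3, strides=2, activation="relu"),
    layers.Conv2D(128, 3, strides=2, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(NUM_CLASSES, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
# model.fit(images, image_level_labels, validation_split=0.3, epochs=10)  # hypothetical arrays
```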

How to add an extra parameter to a CNN while training

So, I have to train a network where I have an image, a ground truth, and an extra parameter related to the image (its current zoom state).
There's a camera which captures images at different zoom levels. For a particular scene, I have four images at different zoom levels (0, 25, 50, 75). I need to train the network such that, given a test image, I can decide whether to zoom in or zoom out.
So the dataset I have consists of the image, the ground truth (zoom in, zoom out, or no zoom), and the current zoom level.
How can I add this current zoom level to my network so that the network trains properly?
I'm planning to use VGG or AlexNet for now and then move to Inception or ResNet in the future.
What you could do is create a model that processes the image with a CNN and then combines the other input with it. The model would take two inputs, the image and the current zoom level, and predict the target (zoom in, zoom out, or no zoom). You pass the image through the CNN (or a few conv layers), flatten the feature map, append the extra input value, and then continue through some dense layers. Alternatively, you could apply the zoom to the image at the start (if you have to zoom out, zoom out, ...) and then pass the adjusted image to the CNN. I don't know which framework you are using, but I would prototype it in Keras with the functional API, as in the sketch below.
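A minimal sketch of that two-input idea in the Keras functional API (layer sizes and names are assumptions, not a tested architecture):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Hypothetical two-input model: CNN features from the image are concatenated
# with the current zoom level before the classification head.
image_input = layers.Input(shape=(224, 224, 3), name="image")
zoom_input = layers.Input(shape=(1,), name="current_zoom_level")  # e.g. 0, 25, 50, 75 (rescaled)

x = layers.Conv2D(32, 3, activation="relu")(image_input)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)

x = layers.Concatenate()([x, zoom_input])          # append the extra parameter
x = layers.Dense(128, activation="relu")(x)
output = layers.Dense(3, activation="softmax", name="zoom_decision")(x)  # in / out / no zoom

model = Model(inputs=[image_input, zoom_input], outputs=output)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit({"image": images, "current_zoom_level": zoom_levels}, labels, ...)  # hypothetical arrays
```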

Key Point classification in an image

I am trying to compare two images of drawings using corner features in the images. Here is a sample query image: [image not shown]
I used the SIFT algorithm to compare the images, but it did not work: SIFT extracts features from a 16x16 pixel window around each point of interest, and in this case (drawings of objects) the only feature points we get are corner points. The SIFT descriptor therefore gives very similar features for all corners, and the feature-matching step rejects them because of their close similarity scores.
So I am using the approach below to compare the images.
I am using the Shi-Tomasi-based function in OpenCV, i.e. cv2.goodFeaturesToTrack(), to find the corners (feature points) in an image. After finding the corners, I want to classify them into 4 categories and compare them across the two images. Below are the corner categories defined so far, which may vary because of the huge variation in corner types (angle, number of lines crossing at the corner, irregular pixel variation at the corner point):
Corner categories:
Type-1: L-shaped
Type-2: Line intersection
Type-3: Line-curve intersection
Type-4: Curve-curve intersection
I am trying to solve this using the approach below:
=> Take a patch of fixed window size around the corner pixel, say a 32x32 window.
=> Find the gradient information, i.e. gradient magnitude and direction, in this window and use it to classify the corner into the above 4 classes. After reading about image classification, I learned that the HOG algorithm can convert image gradient information into feature vectors.
=> The HOG feature vector computed in the step above can be used to train an SVM to get a model.
=> This model can then be used to classify new feature points.
After implementing the above algorithm I am getting poor accuracy.
If there is any other way to classify the corners, please suggest it.
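For reference, a minimal sketch of the pipeline described above (Shi-Tomasi corners, a 32x32 patch, HOG descriptors, an SVM), with assumed parameter values and hypothetical training arrays:

```python
import cv2
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

PATCH = 32  # window size around each corner, as in the description above

def corner_patches(gray):
    """Detect Shi-Tomasi corners in a grayscale image and cut a PATCH x PATCH window around each."""
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=200, qualityLevel=0.01, minDistance=10)
    if corners is None:
        return []
    patches, half = [], PATCH // 2
    for x, y in corners.reshape(-1, 2).astype(int):
        patch = gray[y - half:y + half, x - half:x + half]
        if patch.shape == (PATCH, PATCH):        # skip corners too close to the image border
            patches.append(patch)
    return patches

def hog_features(patches):
    """One gradient-orientation descriptor per patch (HOG settings are assumptions)."""
    return np.array([hog(p, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
                     for p in patches])

# Training with labels 0..3 = L-shaped, line intersection, line-curve, curve-curve,
# assuming `train_patches`/`train_labels` were labelled by hand on a few images:
# clf = SVC(kernel="rbf", C=10, gamma="scale").fit(hog_features(train_patches), train_labels)
# predicted_types = clf.predict(hog_features(corner_patches(test_gray_image)))
```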

How do I perform data augmentation in object localization?

Performing data augmentation for a classification task is easy, as most transforms do not change the ground-truth label of the image.
However, in the case of object localization:
The position of the bounding box is relative to the crop that has been taken.
There can be cases where a bounding box is only partially inside the crop window; do we perform some sort of clipping in this case?
There can also be cases where an object's bounding box is not included in the crop at all; do we discard these examples during training?
I am unable to understand how such cases are handled in object localization. Most papers suggest the use of multi-scale training but don't address these issues.
The augmentation methods may have to alter the bounding box as well. With color augmentations, the pixel distribution changes but the coordinates of the bounding box do not. With geometric augmentations such as cropping or scaling, however, not only the pixel distribution but also the coordinates of the bounding box are affected. Those changes have to be written back to the annotation files so the algorithm can read them.
Custom scripts are commonly used to solve this problem. However, in my repository I have a library that may help you: https://github.com/lozuwa/impy . With this library you can perform the operations described above.
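A minimal sketch of the clip-or-discard handling asked about above (the box format and the visibility threshold are assumptions):

```python
import numpy as np

def crop_with_boxes(image, boxes, crop, min_visible=0.25):
    """Crop an image and clip/translate [x1, y1, x2, y2] boxes into the crop window.

    Boxes whose visible area drops below `min_visible` of their original area
    are discarded (the partially- or fully-outside cases from the question).
    """
    cx1, cy1, cx2, cy2 = crop
    cropped = image[cy1:cy2, cx1:cx2]
    kept = []
    for x1, y1, x2, y2 in boxes:
        nx1, ny1 = max(x1, cx1), max(y1, cy1)      # clip to the crop window
        nx2, ny2 = min(x2, cx2), min(y2, cy2)
        visible = max(0, nx2 - nx1) * max(0, ny2 - ny1)
        area = (x2 - x1) * (y2 - y1)
        if area > 0 and visible / area >= min_visible:
            kept.append([nx1 - cx1, ny1 - cy1, nx2 - cx1, ny2 - cy1])  # shift into crop coordinates
    return cropped, np.array(kept)

# Toy usage: the first box is clipped, the second is dropped entirely.
img = np.zeros((200, 200, 3), dtype=np.uint8)
print(crop_with_boxes(img, [[50, 50, 150, 150], [0, 0, 20, 20]], crop=(100, 100, 200, 200))[1])
```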

Faster RCNN: how to translate coordinates

I'm trying to understand and use the Faster R-CNN algorithm on my own data.
My question is about ROI coordinates: what we have as labels, and what we want in the end, are ROI coordinates in the input image. However, if I understand it correctly, the anchor boxes are defined on the convolutional feature map, the ROI regression gives ROI coordinates relative to an anchor box (so easily translatable to coordinates on the conv feature map), and then the Fast R-CNN part does the ROI pooling using the coordinates on the convolutional feature map and itself (classifies and) regresses the bounding box coordinates.
Considering that between the raw image and the convolutional features some convolutions and poolings occurred, possibly with strides > 1 (subsampling), how do we translate coordinates in the raw image to coordinates in feature space (in both directions)?
Are we supposed to give anchor box sizes relative to the input image size, or to the convolutional feature map?
How is the bounding box regressed by Fast R-CNN expressed? (I would guess relative to the ROI proposal, similarly to how the proposal is encoded relative to the anchor box, but I'm not sure.)
It looks like this is actually an implementation question; the method itself does not dictate it.
A good way to do it, though, which is used by the Tensorflow Object Detection API, is to always give coordinates and ROI sizes relative to the layer's input size. That is, all coordinates and sizes are real numbers between 0 and 1, and likewise for the anchor boxes.
This handles the downsampling problem nicely and makes it easy to convert ROI coordinates, as in the sketch below.
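A minimal sketch of these coordinate conversions: pixel boxes to normalized [0, 1] coordinates, pixel boxes to feature-map cells given the total stride, and the standard Faster R-CNN encoding of a box relative to an anchor (the stride value and box format here are assumptions):

```python
import numpy as np

def _xywh(box):
    """[x1, y1, x2, y2] -> center x, center y, width, height."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1

def to_normalized(boxes_px, image_hw):
    """Pixel [x1, y1, x2, y2] boxes -> coordinates in [0, 1] of the image size."""
    h, w = image_hw
    return np.asarray(boxes_px, dtype=float) / [w, h, w, h]

def to_feature_map(boxes_px, total_stride):
    """Pixel boxes -> feature-map cells, given the product of all strides (e.g. 16 for VGG16 conv5_3)."""
    return np.asarray(boxes_px, dtype=float) / total_stride

def encode_wrt_anchor(box, anchor):
    """Faster R-CNN parameterization (t_x, t_y, t_w, t_h) of a box relative to an anchor."""
    bx, by, bw, bh = _xywh(box)
    ax, ay, aw, ah = _xywh(anchor)
    return np.array([(bx - ax) / aw, (by - ay) / ah, np.log(bw / aw), np.log(bh / ah)])

# Toy usage with made-up boxes on an 800x600 image and a stride-16 backbone.
print(to_normalized([[100, 50, 300, 250]], image_hw=(600, 800)))  # ~[[0.125 0.083 0.375 0.417]]
print(to_feature_map([[100, 50, 300, 250]], total_stride=16))     # [[ 6.25   3.125 18.75  15.625]]
print(encode_wrt_anchor([100, 50, 300, 250], [90, 40, 290, 240])) # [0.05 0.05 0.   0.  ]
```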
When you don't use an activation function on the regression layer, its outputs are raw numbers, and these raw numbers are associated directly with the coordinate targets (the labels).
An activation such as softmax instead produces probability values, which leads to a classification output rather than a regression one.
