I have approximately 100K X-ray images of dimension 1024x1024. Only ~970 of them have pre-existing bounding box coordinates. I am training the model with a 70:30 train/test split. My question is: how do I train the model if the rest of the images do not have bounding boxes? Since I'm no medical expert, I can't manually draw bounding boxes on the images. There are 14 classes, which makes drawing bounding boxes manually really difficult.
If you know something about the remaining unlabelled images, for example whether an image contains a particular class, you can use weakly supervised learning to train object detection on all of them.
Related
I have image patches from DDSM breast mammography that are 150x150 in size. I would like to augment my dataset by taking two random 120x120 crops from each image. So if my dataset contains 6500 images, augmenting it with random crops should get me to 13000 images. The thing is, I do NOT want to lose potentially important information in the image and thereby change the ground truth label.
What would be the best way to do this? Should I crop them randomly from 150x150 to 120x120 and hope for the best, or pad them first and then perform the cropping? What is the standard way to approach this problem?
If your ground truth contains the exact location of what you are trying to classify, use the ground truth to crop your images in an informed way. That is, adjust the ground truth label if a crop removes what you are trying to classify.
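A minimal sketch of such an informed crop, assuming the ground truth is an axis-aligned box `(x_min, y_min, x_max, y_max)` in pixel coordinates that lies inside the image. The function name and the box format are illustrative assumptions, not something the dataset prescribes:

```python
import random

def random_crop_containing(box, image_size=150, crop_size=120, rng=random):
    """Pick a random crop origin such that the crop fully contains `box`.

    Returns (crop_x, crop_y, shifted_box), or None if the box is larger
    than the crop and therefore cannot be kept whole.
    """
    x_min, y_min, x_max, y_max = box
    if x_max - x_min > crop_size or y_max - y_min > crop_size:
        return None
    # Valid origins keep the crop inside the image and the box inside the crop.
    lo_x, hi_x = max(0, x_max - crop_size), min(x_min, image_size - crop_size)
    lo_y, hi_y = max(0, y_max - crop_size), min(y_min, image_size - crop_size)
    crop_x = rng.randint(lo_x, hi_x)  # randint is inclusive on both ends
    crop_y = rng.randint(lo_y, hi_y)
    # Express the box in the coordinate system of the cropped image.
    shifted = (x_min - crop_x, y_min - crop_y, x_max - crop_x, y_max - crop_y)
    return crop_x, crop_y, shifted

# Usage: crop the image as image[crop_y:crop_y + 120, crop_x:crop_x + 120].
result = random_crop_containing((100, 20, 140, 60), rng=random.Random(0))
```

If the function returns None, the label would have to be adjusted (or the crop skipped) instead.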
If you don't know the location of what you are classifying, you could:
attempt to train a classifier on your un-augmented dataset,
find out which regions of the images your classifier reacts to,
make a note of these locations,
crop your images in an informed way,
train a new classifier.
But how do you "find out which regions your classifier reacts to"?
Multiple ways are described in Visualizing and Understanding Convolutional Networks by Zeiler and Fergus:
Imagine your classifier predicts breast cancer or no breast cancer. Take an image that is positive for breast cancer, occlude part of it with a blank color (see the gray square in the image above, image by Zeiler et al.), and predict cancer or not. Now move the occluding square around. In the end you get rough prediction scores for all parts of your original image (see (d) in the image above), because when you cover up the part that is responsible for a positive prediction, you (should) get a negative cancer prediction.
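The occlusion procedure can be sketched roughly like this. `predict` is a hypothetical stand-in for your trained classifier (any callable that maps a 2-D image to a positive-class score), and the patch size, stride and fill value are arbitrary assumptions:

```python
import numpy as np

def occlusion_map(image, predict, patch=30, stride=15, fill=0.5):
    """Slide a gray square over `image` and record the score at each position.

    Low values in the returned map mark regions whose occlusion removes
    the evidence for the positive class.
    """
    h, w = image.shape
    ys = range(0, h - patch + 1, stride)
    xs = range(0, w - patch + 1, stride)
    heat = np.zeros((len(ys), len(xs)))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill  # the gray square
            heat[i, j] = predict(occluded)
    return heat

# Toy example: a fake "classifier" that only looks at one bright blob.
toy_image = np.zeros((150, 150))
toy_image[60:90, 60:90] = 1.0

def toy_predict(img):
    return float(img[60:90, 60:90].mean())

heat = occlusion_map(toy_image, toy_predict)
row, col = np.unravel_index(heat.argmin(), heat.shape)
```

The position of the minimum, multiplied by the stride, gives the top-left corner of the most influential region.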
If you have someone who can actually recognize cancer in an image, this is also a good way to check for and guard against confounding factors.
BTW: You might want to crop on-the-fly and randomize how you crop even more to generate way more samples.
If the 150x150 patch is already the region of interest (ROI), you could try the following data augmentations:
use a larger patch, e.g. 170x170, that always contains your 150x150 patch
use a larger patch, e.g. 200x200, and scale it down to 150x150
add some Gaussian noise to the image
rotate the image slightly (by random amounts)
change image contrast slightly
artificially emulate whatever other (image-)effects you see in the original dataset
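A rough sketch of three of these augmentations (larger window around the ROI, Gaussian noise, slight contrast change), assuming grayscale images stored as float arrays in [0, 1] and an ROI given by its top-left corner in the full image. Function names and default values are illustrative assumptions. Rotation and rescaling are left out because they need an interpolation routine from an image library:

```python
import numpy as np

def jittered_window(image, roi_x, roi_y, roi=150, window=170, rng=None):
    """Cut a window x window patch at a random offset that still contains the ROI."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape
    margin = window - roi
    # The window may start up to `margin` pixels before the ROI, clamped to the image.
    x_lo, x_hi = max(0, roi_x - margin), min(roi_x, w - window)
    y_lo, y_hi = max(0, roi_y - margin), min(roi_y, h - window)
    x = int(rng.integers(x_lo, x_hi + 1))  # upper bound is exclusive
    y = int(rng.integers(y_lo, y_hi + 1))
    return x, y, image[y:y + window, x:x + window]

def noise_and_contrast(patch, rng=None, noise_std=0.02, max_contrast=0.1):
    """Scale contrast around the mean by up to +/-10% and add Gaussian noise."""
    if rng is None:
        rng = np.random.default_rng()
    gain = 1.0 + rng.uniform(-max_contrast, max_contrast)
    mean = patch.mean()
    out = (patch - mean) * gain + mean
    out = out + rng.normal(0.0, noise_std, size=patch.shape)
    return np.clip(out, 0.0, 1.0)
```

Calling both functions inside the data loader, rather than once up front, gives a different sample every epoch.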
Shown above is a sample image of a runway that needs to be localized (a bounding box around the runway).
I know how image classification is done in TensorFlow. My question is: how do I label this image for training?
I want the model to output 4 numbers to draw a bounding box.
In CS231n they say that we use a classifier and a localization head.
But how does my model know where the runway is in 400x400 images?
In short: how do I LABEL this image for training, so that after training my model detects and localizes runways (draws a bounding box around them) in input images?
Please feel free to give me links to lectures, videos, github tutorials from where I can learn about this.
**********Not CS231n********** I already took that lecture and couldn't understand how to solve this using their approach.
Thanks
If you want to predict bounding boxes, then the labels are also bounding boxes. This is what most object detection systems use for training. You can just have bounding box labels, or if you want to detect multiple object classes, then also class labels for each bounding box would be required.
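As a small illustration of what such a label can look like for the "4 numbers" case: one box per image, stored in pixels and scaled to [0, 1] as the regression target. The file name, the box values and the `(x_min, y_min, x_max, y_max)` convention are made up for the example, other conventions (e.g. center plus width and height) work as well:

```python
def box_to_target(box, width=400, height=400):
    """Scale a pixel box (x_min, y_min, x_max, y_max) to four numbers in [0, 1]."""
    x_min, y_min, x_max, y_max = box
    return (x_min / width, y_min / height, x_max / width, y_max / height)

def target_to_box(target, width=400, height=400):
    """Turn a (predicted) target back into pixel coordinates for drawing."""
    x_min, y_min, x_max, y_max = target
    return (round(x_min * width), round(y_min * height),
            round(x_max * width), round(y_max * height))

# Hypothetical annotation file content: image name -> runway box in pixels.
labels = {"runway_001.png": (100, 50, 300, 150)}
targets = {name: box_to_target(box) for name, box in labels.items()}
```

The network is then trained to regress `targets[name]` from the image, typically with an L2 or smooth L1 loss.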
Collect images from Google or any other source that contain only runways (from a fairly close view). I would suggest taking a pre-trained image classification network (such as VGG or AlexNet) and fine-tuning it on the downloaded runway data.
After building a good image classifier on the runway dataset, you can use any popular algorithm to generate region proposals from the image.
Now pass the region proposals to the classification network one by one and check whether it classifies each proposal as positive or negative. If a proposal is classified as positive, your object (the runway) is most probably present in that region. Otherwise it is not.
If the classifier reports the object in a lot of region proposals, you can use non-maximum suppression to reduce the number of positive proposals.
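For reference, a plain greedy non-maximum suppression can be sketched as below, assuming boxes are `(x_min, y_min, x_max, y_max)` tuples and `scores` are the classifier's positive-class scores for each proposal. The 0.5 overlap threshold is a common but arbitrary choice:

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes and drop those overlapping a kept box too much.

    Returns the indices of the kept boxes, best score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

proposals = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
kept = non_max_suppression(proposals, scores=[0.9, 0.8, 0.7])
```

Here the second proposal overlaps the first with an IoU of about 0.68, so only the first and third survive.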
I would like to detect shapes, namely circle, square, rectangle, triangle, etc., using machine learning techniques.
The specifications for shape detection are as follows:
A Convolutional Neural Network (CNN) is used.
For training, the dataset contains 1000 images in each category for 10 shapes.
For testing, the dataset contains 100 images in each category for 10 shapes.
All images are resized to 28x28 with one channel (grayscale).
All the images in the dataset are edge-detected images.
Questions
Is it possible for the machine learning algorithm to differentiate between a square and a rectangle, or between a square and a rhombus?
How can I improve the dataset for shape detection?
Thanks in Advance...!!!
Yes, and it is not a very hard task for a CNN to do.
One way to improve the dataset is to use image augmentation. I think you can do both horizontal and vertical flips, as all these figures keep their shape class under these transformations. You can think of other transformations as long as they don't change the aspect ratio, because if you scale one axis differently from the other, a square becomes a rectangle, and vice versa.
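A small sketch of the flip augmentation, assuming the 28x28 edge images are NumPy arrays. The `extent` helper is only there to check that the height and width of the drawn shape are unchanged by the flips:

```python
import numpy as np

def flip_variants(image):
    """Return the image plus its horizontal, vertical and combined flips."""
    return [image, np.fliplr(image), np.flipud(image), np.flipud(np.fliplr(image))]

def extent(image):
    """Height and width of the non-zero region, in pixels."""
    rows = np.flatnonzero(image.any(axis=1))
    cols = np.flatnonzero(image.any(axis=0))
    return int(rows[-1] - rows[0] + 1), int(cols[-1] - cols[0] + 1)

# Toy example: a filled 5x17 rectangle stays 5x17 under every flip.
shape_image = np.zeros((28, 28))
shape_image[5:10, 3:20] = 1.0
extents = [extent(v) for v in flip_variants(shape_image)]
```

Each variant keeps the original label, so this quadruples the number of training images per class.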
I'm trying to use a pretrained VGG16 as an object localizer in TensorFlow on ImageNet data. In their paper, the group mentions that they basically just strip off the softmax layer and toss on either a 4-D or a 4000-D fc layer for bounding box regression. I'm not trying to do anything fancy here (sliding windows, R-CNN), just get some mediocre results.
I'm sort of new to this and I'm just confused about the preprocessing done here for localization. In the paper, they say that they scale the image so that its shortest side is 256, then take the central 224x224 crop and train on this. I've looked all over and can't find a simple explanation of how to handle localization data.
Questions: How do people usually handle the bounding boxes here?...
Do you use something like tf.image.sample_distorted_bounding_box, and then rescale the image based on that?
Do you just rescale/crop the image itself, and then interpolate the bounding box with the transformed scales? Wouldn't this result in negative box coordinates in some cases?
How are multiple objects per image handled?
Do you just choose a single bounding box from the beginning, crop to that, then train on this crop?
Or, do you feed it the whole (centrally cropped) image, and then try to predict 1 or more boxes somehow?
Does any of this generalize to the detection or segmentation challenges (like MS COCO), or is it completely different?
Anything helps...
Thanks
Localization is usually performed by running the network over sliding windows and intersecting the windows in which it identifies the object you want.
Generalizing that to multiple objects works the same.
Segmentation is more complex. You can train your model on a pixel mask in which your object is filled in, and have it output a pixel mask of the same size.
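On the preprocessing part of the question (rescale so the shortest side is 256, then take the central 224x224 crop): one simple option is to apply the same scale and offset to the box and clip it to the crop, which also deals with the negative coordinates. This is a sketch of that arithmetic under those assumptions, not what the paper's authors necessarily did:

```python
def transform_box(box, width, height, short_side=256, crop=224):
    """Map a pixel box through 'resize shortest side, then center crop'.

    `box` is (x_min, y_min, x_max, y_max) in the original image.
    Returns the clipped box in crop coordinates, or None if the box
    falls completely outside the central crop.
    """
    scale = short_side / min(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    off_x, off_y = (new_w - crop) // 2, (new_h - crop) // 2
    x_min, y_min, x_max, y_max = box
    # Scale into the resized image, then shift into crop coordinates.
    x_min, x_max = x_min * scale - off_x, x_max * scale - off_x
    y_min, y_max = y_min * scale - off_y, y_max * scale - off_y
    # Clip to the crop; this is where negative coordinates disappear.
    x_min, y_min = max(0, x_min), max(0, y_min)
    x_max, y_max = min(crop, x_max), min(crop, y_max)
    if x_max <= x_min or y_max <= y_min:
        return None
    return (x_min, y_min, x_max, y_max)

inside = transform_box((128, 128, 384, 384), width=512, height=512)
outside = transform_box((0, 0, 20, 20), width=512, height=512)
```

Images whose box is cropped away entirely (the None case) have to be skipped or cropped differently.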
I need to prepare training data which I will then use with OpenCV's cascaded classifier. I understand that for training data I'll need to provide rectangular images as samples with aspect ratios that correspond to the -w and -h parameters in OpenCV's training commands.
I was fine with this idea, but then I saw the web-based annotation tool LabelMe.
People have labelled in LabelMe using complex polygons!
Can these polygons be somehow used in cascaded training?
Wouldn't using irregular polygons improve the classification results?
If not, then what is the use of the complex polygons that outline objects in LabelMe'd images?
Data sets annotated with LabelMe are used for many different purposes. Some of them, like image segmentation, require tight boundaries, rather than bounding boxes.
On the other hand, the cascade classifier in OpenCV is designed to classify rectangular image regions. It is then used as part of a sliding-window object detector, which also works with bounding boxes.
Whether tight boundaries help improve object detection is an interesting question. There is evidence that the background pixels caught by the bounding box actually help the classification.
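If you want to reuse polygon annotations for a detector that works on rectangles, the usual step is to reduce each polygon to its bounding rectangle. A minimal sketch, assuming the polygon is a list of `(x, y)` pixel vertices and that the target format is `(x, y, width, height)`. Parsing the LabelMe files themselves is not shown:

```python
def polygon_to_bbox(points):
    """Smallest axis-aligned rectangle (x, y, width, height) around a polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    return (x_min, y_min, max(xs) - x_min, max(ys) - y_min)

# Hypothetical polygon outlining one object.
outline = [(10, 20), (40, 25), (35, 60), (12, 50)]
bbox = polygon_to_bbox(outline)
```

The rectangle will include some background pixels around the object, which, as noted above, is not necessarily a disadvantage for classification.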