Bounding Box Prediction vs Classification - opencv

I have a question about bounding box prediction and classification. Why do they separate these two things into two parts (or stages)?
Went through my exam and this question got stuck in my head without answer. Please help!

Related

My Image Captioning Model giving me same caption on All images

I am doing a project related to the captioning of Medical Images. I am using the Code from this link.
I am using Indiana Data set of Radiographs and using Findings as Captions for training. I trained successfully and loss value is 0.75. But my final model giving me the same caption for all the images I had checked (Some people also facing the same issue. please check the Comments of this link).
Can you please suggest me any changes in any part of the code or anything else so it starts giving me proper captions for every image I will check.
Thanks in advance.
Looking at the dataset, what I can see is that most of the data is quite similar (black and white images of Chest X rays) - please correct me if I am wrong. So what seems to be happening is that the CNN is learning common features across most of the images. The network is not just deep/advanced enough to pick out the distinguishing patterns. According to the tutorial you are following, I don't think the VGG-16 or 19 network is learning the distinguishing patterns in the images.
The image captioning model will only be as good as the underlying CNN network. If you have a class label field in your data (like the indication/impression field provided here), you can actually confirm this hypothesis by training the network to predict the class of each image and if the performance is poor you can confirm this. If you have the class label, try experimenting with a bunch of CNNs and use the one which achieves the best classification accuracy as the feature extractor.
If you do not have a class label, I would suggest trying out some deeper CNN architectures like Inception or ResNet and see if the performance improves. Hope this was helpful!
Make sure you have an equal number of images in each class. If you have 1,000 images in the “pneumonia” category and only 5 in the “broken rib” category, your model will pick the label “pneumonia” almost every time.

CNN approaches to detection which are not bounding box based

Most of the approaches to detection problems are based more or less on some form of bounding box proposal, which turns to two class (positive/negative) classification. I found lots of materials on these topics.
But I was wondering, aren't there some approaches taking the whole image as an input, then sending it through several convolutional and pooling layers, output of which would be two numbers (x, y position of the object)? Of course this would mean that there's just one object in the image. So far I haven't find anything about this, should I consider it not usable?

Labeling runways for localization and detection using deep learning

Shown above is a sample image of runway that needs to be localized(a bounding box around runway)
i know how image classification is done in tensorflow, My question is how do I label this image for training?
I want model to output 4 numbers to draw bounding box.
In CS231n they say that we use a classifier and a localization head.
but how does my model knows where are the runnway in 400x400 images?
In short How do I LABEL this image for training? So that after training my model detects and localizes(draw bounding box around this runway) runways from input images.
Please feel free to give me links to lectures, videos, github tutorials from where I can learn about this.
**********Not CS231n********** I already took that lecture and couldnt understand how to solve using their approach.
Thanks
If you want to predict bounding boxes, then the labels are also bounding boxes. This is what most object detection systems use for training. You can just have bounding box labels, or if you want to detect multiple object classes, then also class labels for each bounding box would be required.
Collect data from google or any resources that contains only runway photos (From some closer view). I would suggest you to use a pre-trained image classification network (like VGG, Alexnet etc.) and fine tune this network with downloaded runway data.
After building a good image classifier on runway data set you can use any popular algorithm to generate region of proposal from the image.
Now take all regions of proposal and pass them to classification network one by one and check weather this network is classifying given region of proposal as positive or negative. If it classifying as positively then most probably your object(Runway) is present in that region. Otherwise it's not.
If there are a lot of region of proposal in which object is present according to classifier then you can use non maximal suppression algorithms to reduce number of positive proposals.

Whether Data augmentation really needed in Machine Learning [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
Improve this question
I am interested in knowing the importance of data augmentation(rotation at various angles, flipping the images) while providing a dataset to a Machine Learning problem.
Whether it is really needed? Or the CNN networks using will handle that as well no matter how different the data are transformed?
So I took a classification task with 2 classes to conclude some results
Arrow shapes
Circle shapes
The idea is to train the shapes with only one orientation(I have taken arrows pointing right) and check the model with a different orientation(I have taken arrows pointing downwards) which is not at all given during the training stage.
Some of the samples used in Training
Some of the samples used in Testing
This is the entire dataset I am using in for creating a tensorflow model.
https://bitbucket.org/akhileshmalviya/samples/src/bab50b85d826?at=master
I am wondering with the results I got,
(i) Except a few downward arrows all others are getting predicted correctly as arrow. Does it mean data augmentation is not at all needed?
(ii) Or is this the right use case I have taken to understand the importance of data augmentation?
Kindly share your thoughts, Any help could be really appreciated!
Data augmentation is a data-depended process.
In general, you need it when your training data is complex and you have a few samples.
A neural network can easily learn to extract simple patterns like arcs or straight lines and these patterns are enough to classify your data.
In your case data augmentation can barely help, the features the network will learn to extract are easy and highly different from each other.
When you, instead, have to deal with complex structures (cats, dogs, airplanes, ...) you can't rely on simple features like edges, arcs, etc..
Instead, you have to show to your network that the instances you're trying to classify got an high variance and that the features extracted can be combined in a lot of different ways for the same subject.
Think about a cat: it can be of any color, the picture can be taken in different light conditions, its whole body can be in any position, the picture could be taken with a certain orientation...
To correctly classify instances so different, the network must learn to extract robust features that could be learned only after seeing a lot of different inputs.
In your case, instead, simple features can completely discriminate your input, thus any sort of data augmentation could help by just a little bit.
The task you are solving can be easily solved without any NN and even without machine learning.
Just because the problem is so simple it does not really matter whether you do a data augmentation or not. The need for data augmentation is task specific and depends on many things:
how easy is to augment the data with preserving the ability to correctly mark the class. For image, sounds which we used to see/hear it is not a problem (we know that adding small noise to the sound does not change the meaning, rotating the lizard is still a lizard). For other things augmenting without preserving the class/value is hard (for example in Go, randomly adding a stone can change the value of the position dramatically)
does the augmented data is drawn from the same distribution you care about. Adding random stones to Go does not work, but rotating flipping the board works and preserves distribution. But for example in a racing king game (variant of chess) it will not help. You can't flip the position (left <-> right), the evaluation stays the same, but it will never happen in real game and therefore drawn from different distribution and useless
how much data do you have and how expressive is your model. The more parameters you model have, the bigger the chance of overfitting and the more is your need for data. If you train a linear regression in n dims, you will have n + 1 params. You do not really need to augment this. Also if you already have 10bln data points, the augmentation is probably will not be helpful.
how expensive the augmentation procedure. For rotating/scaling the image it is very cheap, but for other augmentation it can be computationally expensive
something else that I forgot.

Train TensorFlow to modify images

I am interested in the possibility of training a TensorFlow model to modify images, but I'm not quite sure where to get started. Almost all of the examples/tutorials dealing with images are for image classification, but I think I am looking for something a little different.
Image classification training data typically includes the images plus a corresponding set of classification labels, but I am thinking of a case of an image plus a "to-be" version of the image as the "label". Is this possible? Is it really just a classification problem in disguise?
Any help on where to get started would be appreciated. Also, the solution does not have to use TensorFlow, so any suggestions on alternate machine learning libraries would also be appreciated.
For example, lets say we want to train TensorFlow to draw circles around objects in a picture.
Example Inbound Image:
(source: pbrd.co)
Label/Expected Output:
(source: pbrd.co)
How could I accomplish that?
I can second that, its really hard to find information about Image modification with tensorflow :( But have a look here: https://affinelayer.com/pix2pix/
From my understanding, you do use a GAN, but insead of feeding the Input of the generator with random data during training, you use a sample Input.
Two popular ways (the ones that I know about) to make models generate/edit images are:
Deep Convolutional Generative Adversarial Networks
Back-Propagation through a pre-trained image classification model (in a similar manner to deep dream) but you can start from the final layer to feed back the wanted label and the gradient descent should be applied to the image only. This was explained in more details in the following course: CS231n (this lecture)
But I don't think they fit the circle around "3" example that you gave. I think object detection and instance segmentation would be more helpful. Detect the object you are looking for, extract its boundaries via segmentation and post-process it to make the circle that you wish for (or any other shape).
Reference for the images: Intro to Deep Learning for Computer Vision

Resources