My image captioning model gives the same caption for all images - machine-learning

I am doing a project on captioning medical images, using the code from this link.
I am using the Indiana dataset of radiographs, with the Findings field as the captions for training. Training completed with a loss of 0.75, but the final model gives the same caption for every image I have checked (some other people are facing the same issue; please see the comments at that link).
Can you please suggest changes to any part of the code, or anything else, so that it produces a proper caption for every image I check?
Thanks in advance.

Looking at the dataset, most of the data appears to be quite similar (black-and-white images of chest X-rays) - please correct me if I am wrong. So what seems to be happening is that the CNN is learning features common to most of the images, and the network is just not deep/advanced enough to pick out the distinguishing patterns. Given the tutorial you are following, I don't think the VGG-16 or VGG-19 network is learning the distinguishing patterns in these images.
The image captioning model will only be as good as the underlying CNN. If you have a class label field in your data (like the indication/impression field provided here), you can test this hypothesis by training the network to predict the class of each image: poor classification performance would confirm it. If you do have class labels, try experimenting with a range of CNNs and use the one that achieves the best classification accuracy as the feature extractor.
If you do not have class labels, I would suggest trying some deeper CNN architectures like Inception or ResNet and seeing whether performance improves. Hope this was helpful!
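To make the last suggestion concrete, here is a minimal sketch of swapping the tutorial's VGG-16 feature extractor for ResNet50 in Keras. The function name and image size are illustrative assumptions, not taken from the tutorial:

```python
# Sketch: ResNet50 as the caption model's feature extractor instead of VGG-16.
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# pooling='avg' yields a single 2048-d feature vector per image, playing the
# same role as the 4096-d fc2 features of VGG-16 in the tutorial.
extractor = ResNet50(weights='imagenet', include_top=False, pooling='avg')

def extract_features(image_path):
    img = img_to_array(load_img(image_path, target_size=(224, 224)))
    img = preprocess_input(img[None, ...])  # add batch dimension, normalize
    return extractor.predict(img)           # shape (1, 2048)
```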

Make sure you have an equal number of images in each class. If you have 1,000 images in the “pneumonia” category and only 5 in the “broken rib” category, your model will pick the label “pneumonia” almost every time.
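If the classes are imbalanced and you cannot collect more images, one common workaround is class weighting. A hedged sketch using scikit-learn, where the label array is a toy example mirroring the counts above:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy label distribution mirroring the example (0 = pneumonia, 1 = broken rib).
labels = np.array([0] * 1000 + [1] * 5)

classes = np.unique(labels)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=labels)
class_weight = dict(zip(classes, weights))  # {0: ~0.5, 1: ~100.5}

# The rare class then contributes proportionally more to the loss, e.g. in Keras:
# model.fit(X_train, y_train, class_weight=class_weight)
```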

Related

Neural Network for Learning Cut VS Uncut Grass

I've got a script that takes pictures like the one provided, with colored loops encircling either uncut grass, cut grass, or other background details (for the purpose of rejecting non-grass regions), and generates training data in the form of many small images taken from inside those colored loops. I'm struggling to decide which type of neural network would work best for learning from this training data and then telling me, in real time from a video feed mounted on a lawn mower, which sections of the image are uncut grass and which are cut grass as it mows through a field. Is there anyone here experienced with neural networks who can either suggest some architectures I could use, or just point me in the right direction?
Try a segmentation network; there are many types of segmentation.
Keep in mind that neural networks require training data. Your case (detecting cut vs. uncut grass) is fairly unusual, which means existing pre-trained models may not fit your purpose. If so, you'll need a dataset of images with per-pixel annotations; there are also tools for labeling segmentation images.
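As a starting point, here is a minimal sketch of a small encoder-decoder segmentation network in Keras with three output classes (uncut grass, cut grass, background). The layer sizes are illustrative, not tuned:

```python
from tensorflow.keras import layers, Model

def tiny_segnet(input_shape=(128, 128, 3), num_classes=3):
    inp = layers.Input(input_shape)
    # Encoder: downsample while extracting features.
    x = layers.Conv2D(32, 3, padding='same', activation='relu')(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
    x = layers.MaxPooling2D()(x)
    # Decoder: upsample back to the input resolution.
    x = layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu')(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)
    # Per-pixel class probabilities.
    out = layers.Conv2D(num_classes, 1, activation='softmax')(x)
    return Model(inp, out)

model = tiny_segnet()
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
# Training needs images plus per-pixel label masks (the annotations mentioned
# above), e.g. integer masks of shape (128, 128) with values 0..2.
```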
Hope it helps.

Recognize objects while falling - viewpoint variation

I have a problem statement: recognize 10 classes of variations (in color and size) of the same object (a bottle cap) while it is falling, taking into account that the camera sees different viewpoints of the object. I have split this into sub-tasks:
1) Trained a deep learning model to classify only the flat face of the object - successful in this attempt.
(Image: flat faces of two sample classes)
2) Instead of taking the fall into account, trained a model on possible perspective changes - not successful.
(Image: perspective changes of two sample classes)
What approaches could recognize the object even under perspective changes? I am not constrained to a single-camera solution, and I am open to ideas for approaching this problem of varying perspectives.
Any help would be really appreciated, thanks in advance!
The answer I want to give you is: CapsNets
You should definitely check out the paper, which introduces some shortcomings of CNNs and how the authors tried to fix them.
That said, I find it hard to believe that your architecture cannot solve the problem when the perspective changes. Is your dataset extremely small? I'd expect the neural network to learn filters for the ridged edges, which are visible from all perspectives.
If you're not limited to one camera, you could train a "normal" classifier, feed it multiple images in production, and average the predictions. Or you could build an architecture that takes in multiple perspectives at once. You will have to experiment to see what works best.
Also, never underestimate the power of old-school image preprocessing. If you have 3 different perspectives, you could take the one that comes closest to the "flat" perspective. This is probably as easy as picking the image with the largest colored area, i.e. the one where img.sum() is highest.
Another idea is to determine the color through explicit programming, which should be fairly easy, and then feed the network a grayscale image. Maybe your network is latching onto the strong correlation with color and ignoring the shape altogether.
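A hedged sketch of those last two preprocessing ideas with OpenCV/NumPy; `views` is a hypothetical list of BGR images of the same cap from different cameras:

```python
import cv2
import numpy as np

def most_frontal(views):
    # The flattest view shows the largest colored (non-background) area,
    # so pick the image whose total pixel sum is highest.
    return max(views, key=lambda img: img.sum())

def color_and_shape(img):
    # Determine the dominant color explicitly...
    mean_bgr = img.reshape(-1, 3).mean(axis=0)
    # ...then hand the network a grayscale image so it must rely on shape.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return mean_bgr, gray
```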

Train TensorFlow to modify images

I am interested in the possibility of training a TensorFlow model to modify images, but I'm not quite sure where to get started. Almost all of the examples/tutorials dealing with images are for image classification, but I think I am looking for something a little different.
Image classification training data typically includes the images plus a corresponding set of classification labels, but I am thinking of a case of an image plus a "to-be" version of the image as the "label". Is this possible? Is it really just a classification problem in disguise?
Any help on where to get started would be appreciated. Also, the solution does not have to use TensorFlow, so any suggestions on alternate machine learning libraries would also be appreciated.
For example, let's say we want to train TensorFlow to draw circles around objects in a picture.
Example input image: (image; source: pbrd.co)
Label/expected output: (image; source: pbrd.co)
How could I accomplish that?
I can second that; it's really hard to find information about image modification with TensorFlow :( But have a look here: https://affinelayer.com/pix2pix/
From my understanding, you use a GAN, but instead of feeding the generator random noise during training, you feed it the sample input image.
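A rough sketch of that idea: a pix2pix-style generator is an encoder-decoder that maps the input image (rather than a noise vector) to the modified image. This is a minimal Keras illustration with made-up layer sizes, not the actual pix2pix architecture:

```python
from tensorflow.keras import layers, Model

def generator(input_shape=(256, 256, 3)):
    inp = layers.Input(input_shape)  # the conditioning input image, not noise
    # Encoder: compress the image into a feature representation.
    x = layers.Conv2D(64, 4, strides=2, padding='same', activation='relu')(inp)
    x = layers.Conv2D(128, 4, strides=2, padding='same', activation='relu')(x)
    # Decoder: expand back into the "to-be" output image.
    x = layers.Conv2DTranspose(128, 4, strides=2, padding='same', activation='relu')(x)
    x = layers.Conv2DTranspose(64, 4, strides=2, padding='same', activation='relu')(x)
    out = layers.Conv2D(3, 1, activation='tanh')(x)  # generated image
    return Model(inp, out)

# During training a discriminator judges (input, output) pairs, and pix2pix
# also adds an L1 loss between the output and the target ("to-be") image.
```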
Two popular ways (the ones that I know about) to make models generate/edit images are:
Deep Convolutional Generative Adversarial Networks
Back-propagation through a pre-trained image classification model (in a similar manner to Deep Dream): start from the final layer, feed back the wanted label, and apply gradient descent to the image only. This is explained in more detail in the following course: CS231n (this lecture).
But I don't think either fits the circle around "3" example that you gave. I think object detection and instance segmentation would be more helpful: detect the object you are looking for, extract its boundary via segmentation, and post-process it into the circle you wish for (or any other shape).
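The post-processing step is straightforward with OpenCV. A hedged sketch, where `image` and `mask` are placeholders for the input photo and the binary object mask produced by your detector/segmenter:

```python
import cv2
import numpy as np

image = cv2.imread('photo.png')                   # hypothetical input photo
mask = np.zeros(image.shape[:2], dtype=np.uint8)  # stand-in for the real mask

# Find the object's outline in the mask and fit the smallest enclosing circle.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    largest = max(contours, key=cv2.contourArea)
    (cx, cy), radius = cv2.minEnclosingCircle(largest)
    # Draw the circle with a small margin around the object.
    cv2.circle(image, (int(cx), int(cy)), int(radius) + 10, (255, 0, 0), 3)
```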
Reference for the images: Intro to Deep Learning for Computer Vision

Designing a classifier with minimal image data

I want to train a 3-class classifier with tissue images, but only have around 50 labelled images in total. I can't take patches from the images and train on them, so I am looking for another way to deal with this problem.
Can anyone suggest an approach to this? Thank you in advance.
The question is very broad but here are some recommendations:
It could make sense to generate variations of your input images: modifying contrast, brightness, or color, rotating the image, adding noise. Which of these operations make sense, if any, really depends on the type of classification problem (see the sketch after these recommendations).
Generally, the less data you have, the fewer parameters (weights etc.) your model should have. Otherwise it will overfit, meaning that your classifier will classify the training data but nothing else.
You should check for overfitting. A simple method is to split your labelled data into a training set and a control set. Once you have found that classification is correct for the control set as well, you could do additional training that includes the control set.
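Regarding the first recommendation, here is a hedged sketch of on-the-fly augmentation using Keras preprocessing layers (available in TensorFlow 2.9 or newer); which transforms are safe depends on the tissue images, e.g. flips and small rotations are usually harmless while color shifts may not be:

```python
import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal_and_vertical'),
    tf.keras.layers.RandomRotation(0.1),    # up to +/-10% of a full turn
    tf.keras.layers.RandomContrast(0.2),
    tf.keras.layers.RandomBrightness(0.2),
    tf.keras.layers.GaussianNoise(0.01),    # active only during training
])

# Applied on the fly, every epoch sees new variations of the ~50 images:
# model.fit(train_ds.map(lambda x, y: (augment(x, training=True), y)), ...)
```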

How to select appropriate positive and negative training image sets for image classification using BOW

I'm trying to implement a real-time object classification program using SVM classification and BoW clustering algorithms. My question is: what are good practices for selecting positive and negative training images?
Positive image sets
Should the background be empty, i.e. should the image contain only the object of interest? When this algorithm runs in real time, the test image will not contain only the object of interest; it will definitely include some background information as well. So instead of using an isolated image collection, should I choose images that look more similar to the test images?
Negative image sets
Can these be any images without the object of interest, or should they come from the environment where the algorithm is going to be tested, without the object of interest? For example, if I'm going to classify phones in my living room, should the negatives be background images of my living room without a phone in the foreground, or can they be any image set (kitchen, living room, bedroom, or outdoor images)? I'm asking because I don't want the system to be environment-specific; it must be robust in any environment (indoors and outdoors).
Thank you. Any help or advice is much appreciated.
Positive image sets
Yes, you should definitely choose images that look more similar to the test images.
Negative image sets
They can be any image set; however, it is better to include images from the environment where the algorithm is going to be tested, without the object of interest.
Generally
Please read my answer to another SO question; it should be useful. The discussion continued in the comments, so those might be useful as well.
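For reference, a minimal sketch of the full BoW + SVM pipeline with OpenCV and scikit-learn; it assumes opencv-contrib-python (for SIFT), and the file lists are hypothetical:

```python
import cv2
import numpy as np
from sklearn.svm import SVC

positive_paths = ['pos/phone_001.jpg']  # hypothetical positive image files
negative_paths = ['neg/room_001.jpg']   # hypothetical negative image files

sift = cv2.SIFT_create()

def descriptors(paths):
    descs = []
    for p in paths:
        img = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
        _, d = sift.detectAndCompute(img, None)
        if d is not None:
            descs.append(np.float32(d))
    return descs

# 1) Build the visual vocabulary by k-means clustering of all descriptors.
bow_trainer = cv2.BOWKMeansTrainer(100)  # 100 visual words
for d in descriptors(positive_paths) + descriptors(negative_paths):
    bow_trainer.add(d)
vocabulary = bow_trainer.cluster()

# 2) Represent each image as a histogram over the vocabulary.
bow_extractor = cv2.BOWImgDescriptorExtractor(sift, cv2.BFMatcher(cv2.NORM_L2))
bow_extractor.setVocabulary(vocabulary)

def bow_histogram(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return bow_extractor.compute(img, sift.detect(img, None))[0]

# 3) Train the SVM on the BoW histograms (1 = object present, 0 = absent).
X = [bow_histogram(p) for p in positive_paths + negative_paths]
y = [1] * len(positive_paths) + [0] * len(negative_paths)
clf = SVC(kernel='rbf').fit(X, y)
```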
