What is the best method to evaluate the quality of a segmentation algorithm when the majority of the image contains multiple objects that all belong to the same class?
For example:
If I had an algorithm that segments books in this image of a bookcase - with a single bounding box per book.
[Image: bookcase]
I have had a look at various blog posts on segmentation evaluation and the majority seem to showcase examples of multiclass problems where it is fairly obvious if a prediction is not accurate - the bounding boxes do/do-not overlap for that class.
My first thought is that a traditional IoU or thematic accuracy would not work on this kind of problem, because an output containing a single 'book' polygon that covers the entire image (completely under-segmenting it) would still return high scores, since almost all of the image is in fact 'book', even though it segments the image very poorly.
I'm not sure if I have framed my problem well, any help would be appreciated.
I would try to tackle this problem using these solutions:
Compute the Dice/IoU coefficients of the background class
This is a simple solution for the semantic segmentation output.
If the algorithm gets good results on both the foreground and the background metrics, you can at least generally tell that it performs well.
Compute the average of the metrics over the separately segmented object instances
Your problem looks like an instance segmentation problem to me.
Calculating the metrics for each distinct object is then easy - you can compute Dice/Jaccard coefficients for every object separately and then average the results over all instances, as in this great article with more information about segmentation metrics.
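A minimal sketch of both ideas, assuming the predictions and ground truth are available as binary NumPy masks (one mask per instance for the second metric); the function names here are just illustrative:

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient between two binary masks."""
    intersection = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 2.0 * intersection / total if total > 0 else 1.0

def iou(pred, gt):
    """Intersection over Union (Jaccard) between two binary masks."""
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return intersection / union if union > 0 else 1.0

def mean_instance_dice(pred_instances, gt_instances):
    """Match each ground-truth instance with its best-overlapping prediction
    and average the per-instance Dice scores."""
    scores = []
    for gt_mask in gt_instances:
        best = max((dice(p, gt_mask) for p in pred_instances), default=0.0)
        scores.append(best)
    return float(np.mean(scores)) if scores else 0.0

# Background metric: invert the overall foreground masks and compare directly,
# e.g. background_dice = dice(~pred_foreground, ~gt_foreground)
```

A single all-covering 'book' polygon then scores poorly against every individual ground-truth book, so the averaged per-instance score exposes the under-segmentation that a plain foreground IoU would hide.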
I have many evolution curves (over time) of a system, stored as images.
These evolution curves are plotted when the system behaves in a normal way ('ok').
I want to train a model that learns the shapes of the curves (or parts of the shapes) when the system behaves normally, so that it will be able to classify new curves as normal (or abnormal).
Any ideas on which model to use, or how to proceed?
Thank you
You can perform PCA and then classify. Also look into functional data analysis.
Here is a nice getting started guide with PCA
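As a rough sketch of that idea, assuming each curve can be flattened into a fixed-length numeric array and you have 0/1 normal/abnormal labels (the file names and the number of components are illustrative placeholders):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Placeholder inputs: one row per curve, sampled at the same time points.
X = np.load("curves.npy")   # shape (n_curves, n_samples_per_curve)
y = np.load("labels.npy")   # shape (n_curves,), 0 = normal, 1 = abnormal

# Reduce each curve to a few principal components, then classify.
model = make_pipeline(PCA(n_components=10), LogisticRegression(max_iter=1000))
print(cross_val_score(model, X, y, cv=5).mean())
```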
You can start by labeling (annotating) the images. The labels can be Normal/Not Normal as 0/1, or as many classes as you want to divide the data into.
Since it's a chart, the orientation is important: a wrong orientation can destroy the meaning of the image.
So write an algorithm that always orients the chart the same way when reading it.
Once the labeling is done, you need to train on these images for correct classification.
Augment the data if needed
Find an image classification model
Use its pretrained weights
Feed your images and annotations in the desired format
Train the model
Check the output error or classification errors.
Create an evaluation matrix, such as a confusion matrix, for classification.
If the model is right and training is properly done you will get good accuracy.
Otherwise repeat the steps.
This is just an overview, but with it you can start working towards your goal.
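As a concrete starting point, here is a minimal transfer-learning sketch in PyTorch; the folder layout, the choice of ResNet-18 and the hyperparameters are illustrative assumptions, not requirements:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Expects the labeled charts as data/train/<class_name>/*.png
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("data/train", transform=transform)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

# Pretrained backbone with a new classification head for our classes.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```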
I have a problem statement: recognize 10 classes of variations (in color and size) of the same object (a bottle cap) while it is falling, taking into account that the camera sees different viewpoints of the object. I have split this into sub-tasks:
1) Trained a deep learning model to classify only the flat surface of the object - this attempt was successful.
[Image: flat faces of 2 sample classes]
2) Instead of taking the fall into account, trained a model on possible perspective changes - not successful.
[Image: perspective changes of 2 sample classes]
What are the approaches to recognizing the object even under perspective changes? I am not constrained to a single-camera solution. Open to ideas on approaching this problem of variable perspectives.
Any help would be really appreciated. Thanks in advance!
The answer I want to give you is: CapsNets
You should definitely check out the paper, where you will be introduced to some shortcomings of CNNs and how they tried to fix them.
That said, I find it hard to believe that your architecture cannot solve the problem successfully when the perspective changes. Is your dataset extremely small? I'd expect the neural network to learn filters for the riffled edges, which can be seen from all perspectives.
If you're not limited to one camera, you could try to train a "normal" classifier to which you feed multiple images in production and average the predictions. Or you could build an architecture that takes in multiple perspectives at once. You have to try for yourself what works best.
Also, never underestimate the power of old school image preprocessing. If you have 3 different perspectives, you could take the one that comes closest to the "flat" perspective. This is probably as easy as using the image with the largest colored area, where img.sum() is the highest.
Another idea is to figure out the color through explicit programming, which should be fairly easy and then feed the network a grayscale image. Maybe your network is confused by the strong correlation of the color and ignores the shape altogether.
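A rough sketch of those two preprocessing ideas, assuming each sample comes as a list of BGR images from the different cameras; the Otsu threshold and the use of mean hue as the colour estimate are illustrative assumptions:

```python
import cv2
import numpy as np

def most_frontal_view(views):
    """Pick the view in which the cap covers the largest area, i.e. the one
    closest to the 'flat' perspective."""
    def cap_area(img):
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        return int(np.count_nonzero(mask))
    return max(views, key=cap_area)

def colour_free_input(img):
    """Estimate the colour explicitly (mean hue) and return a grayscale image
    so the network has to rely on shape rather than colour."""
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    mean_hue = float(hsv[..., 0].mean())
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return gray, mean_hue
```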
I've seen quite a few CNN code examples for identifying images, but they generally relate to a 1-to-1 input-to-target relationship (like the MNIST handwritten numerals set), and most seem to use similar image dimensions (pixels) for the input image and training images.
So...what is the usual approach for identifying multiple objects in one image? (like several people, or any other relatively complex scene). I've seen it done often enough, but haven't seen design approaches mentioned. Does this require some type of preprocessing or can this be handled directly by a CNN?
I would say the best-known family of techniques to retrieve multiple objects from an image is the Detection family.
With Detection, the basic idea is to have one or more Proposal windows of different sizes and aspect ratios within an image, generated either by a dedicated proposal algorithm or by random/regular sampling.
For each Proposal window, the Classification algorithm is then executed to reveal what that specific area of the image represents.
The next step would usually be to run a Merge process to combine all neighbouring areas into one single classification output.
Note: A None class is often also used to represent an area with no specific class found.
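As a rough illustration of the proposal-plus-classification idea, here is a sliding-window sketch; classify_patch is a hypothetical stand-in for an existing classifier, and the window size, stride and score threshold are illustrative assumptions:

```python
def sliding_window_detect(image, classify_patch, window=64, stride=32, threshold=0.9):
    """Slide fixed-size windows over the image, classify each one, and keep
    the windows whose best non-background score exceeds the threshold."""
    detections = []
    h, w = image.shape[:2]
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            patch = image[y:y + window, x:x + window]
            label, score = classify_patch(patch)   # e.g. ("person", 0.97) or ("none", 0.6)
            if label != "none" and score >= threshold:
                detections.append((x, y, window, window, label, score))
    return detections

# A merge step (e.g. non-maximum suppression) would then combine overlapping
# windows of the same label into single detections.
```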
Recently I began to study computer vision. In a series of texts, I encountered "segmentation", which creates groups of pixels (super-pixels) based on pixel features.
Here is an example:
But I'm not sure how super-pixels are used in the first place. Do we use them for paint-like visualization? Or for some kind of object detection?
The segmentation you did for this image is useless, since it does not split out anything useful. But consider this segmentation, for example:
It splits the duck and the other objects from the background.
You can find some useful applications of image segmentation here: https://en.wikipedia.org/wiki/Image_segmentation#Applications
Usually super-pixels alone are not enough to perform segmentation. They can be the first step, but further processing needs to be done to complete the segmentation.
In one of the papers I have read they use seam processing to measure the energy of the edges.
There is another paper by Jitendra Malik about using super-pixels in segmentation.
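As a small example of super-pixels as a first step, here is a sketch using SLIC from scikit-image; the filename and the number of segments are illustrative assumptions:

```python
from skimage import color, io, segmentation

img = io.imread("duck.jpg")   # placeholder filename

# Group pixels into roughly 200 super-pixels based on colour and position.
segments = segmentation.slic(img, n_segments=200, compactness=10)

# Replace each super-pixel with its mean colour to visualise the grouping;
# later steps (merging, graph-based grouping, classification) would operate
# on these regions instead of on raw pixels.
mean_colour = color.label2rgb(segments, img, kind="avg")
io.imsave("superpixels.png", mean_colour.astype("uint8"))
```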
I'm working on a project to do segmentation of tissue. So far, so good. But here I want to segment the damaged tissue from the good tissue. Here is an image example. As you can see, the good tissues are smooth and the damaged ones are not. I had the idea of detecting the edges to do the segmentation, but it gives bad results.
I'm open to any suggestions.
Use a convolutional neural network, for example any of the prebuilt ones in the Caffe package. Label the different kinds of areas in as many images as you have, then use many (1000s of) small (32x32) patches from those to train the network. This will produce much better results than any kind of handcrafted algorithm.
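A rough sketch of how such training patches could be sampled, assuming the annotations are per-pixel label masks; the patch size and count are illustrative assumptions:

```python
import numpy as np

def sample_patches(image, label_mask, patch_size=32, n_patches=1000, rng=None):
    """Randomly crop small patches and label each patch with the class of its
    centre pixel, so a standard image classifier can be trained on them."""
    rng = rng or np.random.default_rng()
    h, w = label_mask.shape
    patches, labels = [], []
    for _ in range(n_patches):
        y = rng.integers(0, h - patch_size)
        x = rng.integers(0, w - patch_size)
        patches.append(image[y:y + patch_size, x:x + patch_size])
        labels.append(label_mask[y + patch_size // 2, x + patch_size // 2])
    return np.stack(patches), np.array(labels)
```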
A very simple approach which can be used as an intermediate test could be the following:
Blur the image to reduce the noise. This is an important step. OpenCV provides an inbuilt method for it.
Find contours using the OpenCV method findContours().
Then, if the perimeter of a contour is greater than a set threshold (you will have to choose a value), you can consider it to be smooth tissue; otherwise you can discard it.
This is a really simple approach and a simple program can be written for it really fast.
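A minimal sketch of those steps in OpenCV; the filename and the perimeter threshold are illustrative, and an Otsu thresholding step is added because findContours() expects a binary image:

```python
import cv2

img = cv2.imread("tissue.png", cv2.IMREAD_GRAYSCALE)   # placeholder filename

# Step 1: blur to reduce noise.
blurred = cv2.GaussianBlur(img, (5, 5), 0)

# Binarise so that contours can be extracted (assumed extra step).
_, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Step 2: find contours.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Step 3: keep only contours whose perimeter exceeds the threshold.
PERIMETER_THRESHOLD = 500.0   # illustrative value, tune for your images
smooth_regions = [c for c in contours
                  if cv2.arcLength(c, closed=True) > PERIMETER_THRESHOLD]
print(f"{len(smooth_regions)} regions above the perimeter threshold")
```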