ade20k dataset label issue - image-processing

I'm having a hard time understanding how to identify the labels in the ADE20K dataset.
I was looking at the CSV
https://github.com/CSAILVision/sceneparsing/blob/master/objectInfo150.csv
and grabbed one example: floor, index 4.
I then looked at one sample annotation, ADE_train_00000001.png, which in Photoshop looks like the following when selecting a pixel on the floor.
In that screenshot, the floor has an RGB value of 3 in all channels. Assuming the index is used as the pixel value, shouldn't it read rgb(4,4,4) in Photoshop?
I must be misunderstanding something; can you help explain?

It turns out Photoshop was giving different values compared to Digital Color Meter (the macOS app).
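For anyone checking the same thing, a quick way to verify the label indices without relying on a color picker is to read the annotation PNG directly. A minimal Python/Pillow sketch, assuming the file is the single-channel index mask from the sceneparsing benchmark (pixel value = class index, 0 = unlabeled, 1-150 = rows of objectInfo150.csv):

```python
import numpy as np
from PIL import Image

# The annotation is (assumed to be) a single-channel index PNG:
# each pixel stores the class index, 0 = unlabeled, 1-150 = objectInfo150.csv rows.
mask = np.array(Image.open("ADE_train_00000001.png"))

print(np.unique(mask))             # class indices present in this image

floor = (mask == 4)                # "floor" is index 4 in objectInfo150.csv
print(floor.sum(), "pixels labeled as floor")
```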

Impact of converting image to grayscale

I am seeing many machine learning (CNN) tutorials that convert the input image to grayscale. I want to know how the model will understand the original colors, or use color as one identification criterion, if the images are converted to grayscale throughout model creation.
With regard to colours, there are two cases in an image-processing problem:
Colours are not relevant to object identification
In this case, converting a coloured image to grayscale will not matter much, because the model will eventually learn from the geometry present in the image, and converting to grayscale (or even binarizing the image) still captures the light and dark areas that define that geometry.
Colours are relevant to object identification
As you might know, all colours can be represented as some combination of the three primary RGB channels, and each of the R, G and B values usually varies from 0 to 255 per pixel. In grayscale, however, each pixel carries a single value instead of three, again varying from 0 to 255. So, yes, there will be some information loss in terms of actual colours, but that is the tradeoff for the simpler one-channel representation.
So there can be a combined score of the R, G, B values at each pixel (probably their mean, (R+G+B)/3), which gives a number between 0 and 255 that can be used as the pixel's representative. Instead of specific colour information, the pixel then just carries intensity information.
Reference:
https://en.wikipedia.org/wiki/Grayscale
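As a concrete illustration of the conversion described above, here is a small Python/NumPy sketch of both the simple mean and the commonly used luminance-weighted grayscale conversion (the file name is just a placeholder):

```python
import numpy as np
from PIL import Image

# Load a color image as an H x W x 3 array of values in 0-255.
rgb = np.array(Image.open("example.jpg").convert("RGB"), dtype=np.float64)

# Simple mean of the three channels, as described above: (R + G + B) / 3.
gray_mean = rgb.mean(axis=2)

# The more common luminance-weighted conversion (ITU-R 601 weights, roughly
# what PIL's convert("L") and OpenCV's cvtColor use): 0.299 R + 0.587 G + 0.114 B.
gray_weighted = rgb @ np.array([0.299, 0.587, 0.114])

print(gray_mean.shape, gray_weighted.shape)   # both H x W, a single channel
```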
I would like to add to Shashank's answer.
A model, when fed an image, does not perceive it the way we do. Humans perceive images through variations in colour, the saturation of those colours, and their brightness, and from that we recognize objects and shapes.
A model, however, sees an image as a matrix filled with numbers (if it is a grayscale image). In the case of a colour image, it sees three such matrices stacked on top of one another, each filled with numbers from 0 to 255.
So how does it learn colour? Well, it doesn't. What it does learn is the variation in the numbers within this matrix (in the case of a grayscale image). These variations are crucial for detecting changes in the image. A CNN trained on them can detect structure in the image and can also be used for object detection.
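To make the "matrix of numbers" point concrete, a tiny made-up example in Python:

```python
import numpy as np

# A tiny 3x4 "grayscale image": one matrix of intensities in 0-255.
gray = np.array([[  0,  64, 128, 255],
                 [ 10,  70, 140, 250],
                 [ 20,  80, 150, 245]], dtype=np.uint8)

# The same idea for a color image: three such matrices stacked (R, G, B).
color = np.stack([gray, gray // 2, 255 - gray], axis=-1)

print(gray.shape)    # (3, 4)    -> what the model "sees" for grayscale
print(color.shape)   # (3, 4, 3) -> three stacked matrices for color
```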

Good features to compare an image with a capture of this image

I want to compare one captured frame against all the frames stored in a database. The frame is captured by a mobile phone, while the database contains the original ones. I have been searching for days to find a good method to compare them, taking into account that they do not have the same resolution, colors, luminance, etc. Does anyone have an idea?
I have already done a preprocessing step on the captured frame, using C++ and the OpenCV library, to make it as faithful as possible to the original one. But I do not know what would make a good feature for deciding whether they match or not.
Any comment will be very helpful, thank you!
EDIT: I implemented an algorithm which compares the difference between the two images resized to 160x90, converted to grayscale and quantized. The results are the following:
The mean value of the image difference is 13. However, if I use two completely different images, the mean value of the image difference is 20. So I do not know whether this measure can be improved in some manner to get a better margin for the matching.
Thanks for the help in advance.
Cut the color depth from 24 bits per pixel (or whatever it is) to 8 or 16 bits per pixel; you may be able to use a posterize function for this. Then resize both images to a small size (maybe 16x16 or 100x100, depending on your images), and then compare. This should match similar images fairly closely, but it will not take into account different rotations or locations of objects in the image.
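A rough Python/Pillow sketch of that suggestion, combined with the mean-difference measure from the edit above (the file names, posterize depth and target size are all placeholders to tune):

```python
import numpy as np
from PIL import Image, ImageOps

def similarity_score(path_a, path_b, size=(100, 100), bits=4):
    """Posterize and downscale both images, then return the mean
    absolute pixel difference (smaller means more similar)."""
    imgs = []
    for path in (path_a, path_b):
        img = Image.open(path).convert("RGB")
        img = ImageOps.posterize(img, bits)   # cut the color depth
        img = img.resize(size)                # shrink to a small, fixed size
        imgs.append(np.array(img, dtype=np.int16))
    return np.abs(imgs[0] - imgs[1]).mean()

# Example: a captured frame against one database frame (placeholder paths).
print(similarity_score("captured_frame.png", "database_frame.png"))
```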

Sketching histogram from "image table"

I've stumbled upon a seemingly easy question, yet it differs from regular histogram drawing:
"Sketch a histogram of the 4-bit image shown below:"
I know that a histogram is drawn by collecting some data and its frequencies, and then drawing taller bars where the frequency is higher.
I'm guessing this table is supposed to represent an image, and the numbers probably the intensity of some color or grey level... I don't really know how to collect the data and frequencies from it. Do I just take each distinct number and count how many times it appears?
I know the answer should be simple ^^
Thank you
I've drawn this into an Excel table; is this correct, or should it be done in a different way?
The whole "4-bit image" thing confuses me...
Thank you
In a 4-bit image, gray-scale values range from 0 to 15.
Apart from the 18th (and later) rows of your Excel sheet, all is good.
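Since the image table itself is not reproduced here, a generic Python sketch of the counting step: for a 4-bit image you count how often each of the 16 gray levels (0-15) occurs, and those counts are the heights of the histogram bars (the sample values below are made up):

```python
import numpy as np

# Made-up 4-bit "image": every value is an intensity between 0 and 15.
image = np.array([[ 0,  1,  1,  2],
                  [ 3,  3,  3,  4],
                  [ 7,  7, 15, 15],
                  [15, 15, 15,  0]])

# Histogram = how many times each of the 16 possible levels occurs.
counts = np.bincount(image.ravel(), minlength=16)

for level, count in enumerate(counts):
    print(f"level {level:2d}: {'#' * count}")   # quick text "sketch" of the bars
```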

Most prevalent color on a background by changing color space

I have a sheet of paper on another surface. I want to figure out when the paper ends and the surface begins.
I'm using this approach:
1. Convert image to pixel array
2. Pick 3 random 20x20 squares and count the frequency of each color in them
3. The color with the highest frequency is the background
However, the problem is that I get over 100 distinct colors every time I do this on an actual image taken by the camera.
I think I can fix it by reducing the image to a 16-color palette. Is there a way to do this on a UIImage or CGImage?
Thanks
Your colours are probably very close together. How about calculating the distance (the cumulative absolute difference between red, green and blue values) from each sampled colour to a reference colour - just use the first one you sample as reference. If the distance is large, you have a different colour. If the distance is small, you have the same colour with minor variations in lighting or other camera artefacts.
Basically this is applying a filter in a very simple manner. It is up to you to decide how big the difference has to be for the colours to be considered different, but you could decide that by looking at the median difference of all the colours and grouping them into over/under samples.
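A small Python sketch of that distance idea, assuming the samples are plain (R, G, B) tuples and the threshold is something you would tune as described:

```python
def color_distance(c1, c2):
    """Cumulative absolute difference between the R, G and B values."""
    return sum(abs(a - b) for a, b in zip(c1, c2))

def group_colors(samples, threshold=30):
    """Group sampled colors: anything within `threshold` of an existing
    group's reference color is treated as 'the same' color."""
    groups = []  # list of (reference_color, count)
    for color in samples:
        for i, (ref, count) in enumerate(groups):
            if color_distance(color, ref) <= threshold:
                groups[i] = (ref, count + 1)
                break
        else:
            groups.append((color, 1))
    return sorted(groups, key=lambda g: g[1], reverse=True)

# Example: slightly varying whites collapse to one group, the red stands out.
samples = [(250, 250, 248), (252, 249, 250), (247, 251, 252), (200, 30, 40)]
print(group_colors(samples))   # most frequent group first -> the background
```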
You might also get good results from applying a Core Image filter to the sample images, such as CIColorClamp (CISpotColor looks better, but it is OS X only). If you can find a suitable filter, there is a good chance it will be simpler and faster than doing it yourself.
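If you end up doing the 16-colour reduction yourself rather than through Core Image, here is the idea sketched in Python/Pillow (not the UIImage/CGImage API, just an illustration of palette quantization plus frequency counting; the path is a placeholder):

```python
from collections import Counter
from PIL import Image

# Reduce the image to a 16-color palette (median-cut quantization),
# then count how often each palette color occurs.
img = Image.open("sheet_of_paper.jpg").convert("RGB")
quantized = img.quantize(colors=16).convert("RGB")

counts = Counter(quantized.getdata())
background, frequency = counts.most_common(1)[0]
print("most frequent color:", background, "with", frequency, "pixels")
```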

Determining the number of red apples in a basket using opencv

I am trying to detect the number of red apples I have in a basket.
The samples are an empty basket, a basket with just one apple and a basket with two apples.
My approach to solving the problem is to find out when there is no apple in the basket (by using the absence of red),
and to plot a histogram that shows when there is one apple in the basket.
I have no clue how to find out how many (n) apples there are in the basket.
I don't know what the apple and basket look like in your test images. You can calculate the histogram of your samples using this OpenCV tutorial.
You must use a lookup table (LUT) if you insist on using histograms for this problem.
1. Provide sample histograms of images for every class you are going to classify into (empty basket, a basket with one apple inside, a basket with two, and so forth).
2. From these samples, create a single criterion histogram for every class (the LUT), and an error function which estimates how similar a histogram is to each of these criterion histograms (a simple error function can be the sum of the differences of the histogram values at every red intensity).
3. Then you can compute the error of an input image's histogram against each LUT histogram, and the one with the minimum error gives the class your image belongs to (a sketch of this follows below).
Hope this helps
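To make that concrete, a hedged Python/OpenCV sketch of the lookup-table idea: build a reference red-channel histogram per class from sample images, then classify a new image by the smallest summed absolute difference. The file names, bin count and the choice of the red channel are assumptions:

```python
import numpy as np
import cv2

def red_histogram(path, bins=32):
    """Normalized histogram of the red channel of an image."""
    img = cv2.imread(path)                                    # OpenCV loads BGR
    hist = cv2.calcHist([img], [2], None, [bins], [0, 256]).ravel()
    return hist / hist.sum()

# One reference ("criterion") histogram per class, averaged over sample images.
# The file names are placeholders for your own training samples.
classes = {
    "empty basket": ["empty1.jpg", "empty2.jpg"],
    "one apple":    ["one1.jpg", "one2.jpg"],
    "two apples":   ["two1.jpg", "two2.jpg"],
}
lut = {name: np.mean([red_histogram(p) for p in paths], axis=0)
       for name, paths in classes.items()}

def classify(path):
    """Pick the class whose criterion histogram has the smallest
    summed absolute difference to the input image's histogram."""
    hist = red_histogram(path)
    return min(lut, key=lambda name: np.abs(hist - lut[name]).sum())

print(classify("test_basket.jpg"))
```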
Here are some clues:
Convert your image to HSV color space instead of working in RGB.
Count the pixels you determine to be "red" row by row (that is how you build a spatial histogram), and find the location with the maximum number of red pixels (see the sketch after these clues).
If you are familiar with machine learning and computer vision, I would recommend a Haar cascade classifier (similar to the way face detection works).
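A sketch of the first two clues in Python/OpenCV: threshold "red" in HSV (red wraps around the hue axis, so two ranges are combined), then count red pixels row by row. The exact HSV ranges and the file name are assumptions to tune:

```python
import numpy as np
import cv2

img = cv2.imread("basket.jpg")                    # placeholder path
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Red wraps around hue 0 in OpenCV's 0-179 hue scale, so combine two ranges.
lower = cv2.inRange(hsv, (0,   80, 50), (10,  255, 255))
upper = cv2.inRange(hsv, (170, 80, 50), (179, 255, 255))
red_mask = cv2.bitwise_or(lower, upper)

# Row-by-row count of red pixels: a simple "spatial histogram".
red_per_row = (red_mask > 0).sum(axis=1)
print("row with the most red pixels:", red_per_row.argmax())
print("total red pixels:", red_per_row.sum())
```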
