I have an image whose RGB values for each pixel are stored in a 2-D array. Assume I want to apply a basic 3x3 averaging algorithm to smooth the image. How can I implement such an algorithm using the map-reduce paradigm?
Thanks in advance.
This took me a while to think through in the map-reduce paradigm, but anyway, here it is -
Map Task
Input - (x-coordinate, y-coordinate, RGB value)
Output - 9 tuples, one for the pixel itself and one for each of its 8 neighbors: {(x,y,RGB), (x-1,y,RGB), (x-1,y-1,RGB), (x,y-1,RGB), (x+1,y-1,RGB), (x+1,y,RGB), (x-1,y+1,RGB), (x,y+1,RGB), (x+1,y+1,RGB)}
Reduce Task
The framework will sort all these tuples on their keys (x-coordinate, y-coordinate) and group them. So now, for each pixel, you have the 9 RGB values of its neighboring pixels (including itself). We simply average them in the reduce task and output a tuple ----> (x,y,avg_RGB)
So, basically, instead of each pixel collecting the RGB values of all its neighboring pixels for itself, it broadcasts its own RGB value as a contribution to each of its neighbors.
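The map and reduce tasks above can be sketched in plain Python; this is a minimal local simulation of the shuffle/group phase, not a real Hadoop job, and the `smooth` driver function is my own illustrative name:

```python
from collections import defaultdict

def map_task(x, y, rgb):
    """Emit this pixel's RGB under its own key and under all 8 neighbor keys."""
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            yield (x + dx, y + dy), rgb

def reduce_task(key, rgb_values):
    """Average all RGB values received for one pixel coordinate.
    Border pixels naturally receive fewer than 9 contributions."""
    n = len(rgb_values)
    return key, tuple(sum(c) / n for c in zip(*rgb_values))

def smooth(pixels):
    """pixels: dict {(x, y): (r, g, b)}. Simulates the shuffle/sort locally."""
    grouped = defaultdict(list)
    for (x, y), rgb in pixels.items():
        for key, value in map_task(x, y, rgb):
            grouped[key].append(value)
    # Keep only coordinates that exist in the original image
    return {k: reduce_task(k, v)[1] for k, v in grouped.items() if k in pixels}
```

In a real framework, `map_task` output keys are what the shuffle phase sorts on, exactly as described above.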
Hope this helps :)
I know that we take a 16x16 window of "in-between" pixels around the key point. We split that window into sixteen 4x4 windows. From each 4x4 window we generate a histogram of 8 bins, each bin corresponding to 0-44 degrees, 45-89 degrees, etc. Gradient orientations from the 4x4 window are put into these bins. This is done for all 4x4 blocks. Finally, we normalize the 128 values we get.
But I don't understand where the 128 values get their values from. Do they refer to the corresponding magnitudes of the orientation values, or what? I would be grateful if anyone could describe a numerical example. Regards!
In SIFT (Scale-Invariant Feature Transform), the 128-dimensional feature vector is made up of a 4x4 grid of samples per window, with 8 orientation directions per sample -- 4x4x8 = 128.
For an illustrated guide see A Short introduction to descriptors, and in particular this image, showing 8-direction measurements (cardinal and inter-cardinal) embedded in each of the 4x4 grid squares (center image) and then a histogram of directions (right image):
From your question I believe you are also unclear on what the information inside the descriptor is -- it is called Histograms of Oriented Gradients (HOG). For further reading, Wikipedia has an overview of HOG gradient computation:
Each pixel within the cell casts a weighted vote for an orientation-based histogram channel based on the values found in the gradient computation.
Everything is built on those per-pixel "votes".
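To make those per-pixel "votes" concrete, here is a minimal numpy sketch: each pixel votes with its gradient magnitude into one of 8 orientation bins, and 16 such histograms from a 16x16 window concatenate to 128 values. The function names are mine, and this is a simplification of real SIFT (no Gaussian weighting or trilinear interpolation):

```python
import numpy as np

def orientation_histogram(patch):
    """8-bin histogram of gradient orientations for a grayscale patch;
    each pixel votes with its gradient magnitude (HOG-style)."""
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0   # orientations in [0, 360)
    bins = (angle // 45).astype(int)                  # 8 bins of 45 degrees each
    hist = np.zeros(8)
    np.add.at(hist, bins.ravel(), magnitude.ravel())  # magnitude-weighted votes
    return hist

def sift_style_descriptor(window):
    """Concatenate 16 histograms from the 4x4 sub-blocks of a 16x16 window,
    then normalize -> 128 values."""
    vec = np.concatenate([
        orientation_histogram(window[i:i + 4, j:j + 4])
        for i in range(0, 16, 4) for j in range(0, 16, 4)
    ])
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```

So the 128 numbers are accumulated, magnitude-weighted orientation counts, not raw magnitudes or raw angles.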
I have a basic question regarding pattern learning, or pattern representation. Assume I have a complex pattern of this form. Could you please point me to some research directions or concepts I can follow to learn how to represent (mathematically describe) these forms of patterns? In general the pattern does not have a closed contour, nor can it be represented with analytical objects like boxes, circles, etc.
By mathematically describe I'm assuming you mean derive from the image a vector of values that represents the content of the image. In computer vision/image processing we call this an "image descriptor".
There are several image descriptors that could be applied to pixel based data of the form you showed, which appear to be 1 value per pixel i.e. greyscale images.
One approach is to perform "spatial gridding", where you divide the image into a regular grid of constant size, e.g. a 4x4 grid. You then average the pixel values within each cell of the grid and concatenate these averages to form a 16-element vector - this coarsely describes the pixel distribution of the image.
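Spatial gridding is a few lines of numpy. A minimal sketch (the function name is mine; it assumes the image dimensions divide evenly by the grid):

```python
import numpy as np

def grid_descriptor(image, grid=(4, 4)):
    """Average pixel values in each cell of a regular grid and
    concatenate them into one flat descriptor vector."""
    h, w = image.shape
    gh, gw = grid
    # Reshape so axes 1 and 3 index the pixels inside each cell
    cells = image.reshape(gh, h // gh, gw, w // gw)
    return cells.mean(axis=(1, 3)).ravel()
```

A 4x4 grid over any image then yields the 16-element vector described above.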
Another approach would be to use "image moments", which are 2-D statistical moments. The central moment of order (i, j) is

mu_ij = sum over x=1..W and y=1..H of (x - mu_x)^i * (y - mu_y)^j * f(x,y)

where f(x,y) is the pixel value at coordinates (x,y), W and H are the image width and height, and mu_x and mu_y are the average x and y (the centroid). The values i and j select the order of moment you want to compute. Various orders of moment can be combined in different ways; for example, the 7 "Hu moments" are computed from combinations of these image moments.
The cool thing about the Hu moments is that you can scale, rotate, or flip the image and still get the same 7 values, which makes this a robust ("affine invariant") image descriptor.
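A minimal sketch of the central-moment formula in numpy (function name is mine); the test below illustrates the invariance idea in its simplest form - central moments don't change when the pattern is translated, which is the building block the Hu combinations start from:

```python
import numpy as np

def central_moment(image, i, j):
    """mu_ij = sum over pixels of (x - mu_x)^i * (y - mu_y)^j * f(x, y),
    where (mu_x, mu_y) is the intensity centroid."""
    f = image.astype(float)
    y, x = np.mgrid[0:f.shape[0], 0:f.shape[1]]
    m00 = f.sum()                      # total "mass" (zeroth-order moment)
    mu_x = (x * f).sum() / m00         # centroid x
    mu_y = (y * f).sum() / m00         # centroid y
    return ((x - mu_x) ** i * (y - mu_y) ** j * f).sum()
```

In practice you would use a library routine (e.g. OpenCV's `moments`/`HuMoments`) rather than this by hand.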
Hope this helps as a general direction to read more in.
Is there any easy way of finding the median value of an RGB image in OpenCV using C?
In MATLAB we can just extract the arrays corresponding to the three channels and compute median values for each of the arrays by median(median(array)). Finally, the median value of these three medians (for three channels) can be calculated for the final median value.
You can convert the matrix to a histogram via the calcHist function (once for each channel), then calculate the median for a given channel by using the function available here.
Note: I have not tested that linked code, but it should at least give you an idea of how to get started.
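The idea behind that linked code is simple enough to sketch directly: walk the histogram's cumulative count until it reaches half the number of pixels. A minimal sketch (the function name is mine; the document's own context is C/OpenCV, but the logic is identical there):

```python
def median_from_histogram(hist):
    """Given a 256-bin intensity histogram for one channel, return the
    intensity at which the cumulative count reaches half the pixel count."""
    total = sum(hist)
    half = total / 2.0
    running = 0
    for value, count in enumerate(hist):
        running += count
        if running >= half:
            return value
    return 255
```

Run this once per channel on the output of `calcHist`, then take the median of the three results as described in the question.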
When given an image such as this:
And not knowing the color of the object in the image, I would like to be able to automatically find the best H, S and V ranges to threshold the object itself, in order to get a result such as this:
In this example, I manually found the values and thresholded the image using cv::inRange. The output I'm looking for is the best H, S, and V ranges (min and max values each, six integer values in total) to threshold the given object in the image, without knowing in advance what color the object is. I need these values later on in my code.
Keypoints to remember:
- All given images will be of the same size.
- All given images will have the same dark background.
- All the objects I'll put in the images will be of full color.
I could brute force over all possible permutations of the 6 HSV range values, threshold each one, and find a clever way to figure out when the best blob was found (blob size, maybe?). That seems like a very cumbersome, slow, and highly ineffective solution, though.
What would be good way to approach this? I did some research, and found that OpenCV has some machine learning capabilities, but I need to have the actual 6 values at the end of the process, and not just a thresholded image.
You could create a small 2-layer neural network for the task of dynamic HSV masking.

Steps:
- Create/generate ground-truth annotations for each image and its HSV range for the required object.
- Design a small neural network with at least 1 convolutional layer and 1 fully connected layer.
- Input: mask of the image after applying the HSV range from the ground truth (m x n).
- Output: m x n binary mask of the image.
- Post-processing: multiply the mask with the original image to get the required object highlighted.
I read on Wikipedia that if we want to perform spatial filtering on an image, we need a filter, for example a 3x3 one. What I don't understand is how we choose the values for the filter. Let's say the original image is greyscale, so its intensity goes from 0 to 255 (8 bits).
Another question: if the image is 9x9, how can we apply the filter to the boundary pixels of that image? If we choose to pad the image so the filter can work with all boundary pixels, what values should the new padded pixels have?
Thank you very much
The values of the filter depend on what you want to achieve by filtering. There are many filter designs, each performing a specific task. For example, the simple filter f = [-1 1] approximates the image derivative by first-order differencing of adjacent pixels in the horizontal direction (x-derivative), while its transpose f' does the same in the vertical direction (y-derivative). The values -1 and 1 are chosen for exactly that purpose. The same goes for 3x3 filters. In general, the choice of values comes from 2-D (bi-directional) design of finite impulse response (FIR) and infinite impulse response (IIR) filters.
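A minimal sketch of that first-difference filter on one image row (illustrative values; note that convolution flips the kernel, so convolving with [1, -1] computes the forward differences):

```python
import numpy as np

row = np.array([10.0, 10.0, 40.0, 40.0])   # one row of a greyscale image
f = np.array([1.0, -1.0])                   # first-difference (derivative) kernel
dx = np.convolve(row, f, mode="valid")      # response peaks at the edge
```

The response is zero in flat regions and large exactly at the intensity step, which is why difference filters are used as edge detectors.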
You should keep in mind that the result of the filter operation at the borders is not as accurate. Filtering at border pixels is done by interpolating out-of-range pixels, a process called border interpolation. In OpenCV and similar image-processing/computer-vision libraries there are ways to do it. For example, in OpenCV:
Various border types, image boundaries are denoted with '|'
BORDER_REPLICATE: aaaaaa|abcdefgh|hhhhhhh
BORDER_REFLECT: fedcba|abcdefgh|hgfedcb
BORDER_REFLECT_101: gfedcb|abcdefgh|gfedcba
BORDER_WRAP: cdefgh|abcdefgh|abcdefg
BORDER_CONSTANT: iiiiii|abcdefgh|iiiiiii with some specified 'i'
Thus, depending on the border type you choose, the border pixels are padded accordingly.
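Both questions come together in a short numpy sketch: pad first, then apply the 3x3 averaging filter. This is illustrative rather than OpenCV itself; numpy's `np.pad` modes correspond roughly to the OpenCV border types above ('edge' ~ BORDER_REPLICATE, 'reflect' ~ BORDER_REFLECT_101, 'symmetric' ~ BORDER_REFLECT, 'wrap' ~ BORDER_WRAP):

```python
import numpy as np

def smooth_with_border(image, mode="edge"):
    """3x3 box (averaging) filter; border pixels are handled by
    padding the image by 1 pixel on each side before filtering."""
    padded = np.pad(image.astype(float), 1, mode=mode)
    h, w = image.shape
    out = np.zeros((h, w))
    # Sum the 9 shifted copies of the padded image, then divide by 9
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0
```

In OpenCV proper you would call `blur`/`filter2D` with a `borderType` argument instead of padding by hand; the effect on boundary pixels is the same.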