Resizing a target segmentation map without converting individual pixel values to floats - image-processing

I have a dataset with drone view images of size 4000x6000, grayscale. Each individual pixel value corresponds to a class (I have 20 classes in total), so a pixel value of 3 would mean "tree" for example. Using the original image, I can very easily create binary masks for all 20 of the classes by using equality operators in NumPy and I get pixel-perfect masks.
Here's an example of what one row would look like:
[[2, 2, 2, 2, ...... , 5, 5, 5]]
However, 4000x6000 is much too big for my purposes, and I want to resize these segmentation targets to something a bit more bearable, such as 400x400 or 400x600. Though I've tried a few different Python libraries, all of them convert my pixel values to different float values causing me to lose my segmentation map labels. Is there any method (not including cropping), where I can resize my segmentation target maps AND the original RGB input images without losing my labels?

When one resizes an image, one usually needs to interpolate pixel values (e.g., decide on the "intensity" at sub-pixel locations). Natural images tend to vary smoothly between pixels, which makes interpolation with large support very appealing (see detailed discussion here).
However, as you observed, interpolating between integer values of labels makes no sense at all.
Therefore, you can:
Do not interpolate - use nearest-neighbor resizing for the label map.
That is, use whatever interpolation method you like (LANCZOS, BICUBIC...) for the input image, but use NEAREST method for the label map.
Interpolate the per-label probability maps - for each 4000x6000 label map, produce 20 per-class probability maps and interpolate them to the desired size (using the same interpolation method as used for the image: LANCZOS, BICUBIC...). Now, for each resized pixel you have a 20-dim target distribution. You can train with these "soft labels", or take the argmax and train with the most dominant label per pixel.

Related

direction on image pattern description and representation

I have a basic question regarding pattern learning, or pattern representation. Assume I have a complex pattern of this form, could you please provide me with some research directions or concepts that I can follow to learn how to represent (mathematically describe) these forms of patterns? in general the pattern does not have a closed contour nor it can be represented with analytical objects like boxes, circles etc.
By mathematically describe I'm assuming you mean derive from the image a vector of values that represents the content of the image. In computer vision/image processing we call this an "image descriptor".
There are several image descriptors that could be applied to pixel based data of the form you showed, which appear to be 1 value per pixel i.e. greyscale images.
One approach is to perform "spatial gridding" where you divide the image up into a regular grid of a constant size e.g. a 4x4 grid. You then average the pixel values within each cell of the grid. Then concatenate these values to form a 16 element vector - this coarsely describes the pixel distribution of the image.
Another approach would be to use "image moments" which are 2D statistical moments. Use this equation:
where f(x,y) is they pixel value at coordinates (x,y). W and H are the image width and height. The mu_x and mu_y indicate the average x and y. The values i and j select the order of moment you want to compute. Various orders of moment can be combined in different ways for example in the "Hu moments" we can compute 7 numbers using combinations of image moments:
The cool thing about the Hu moments is you can scale, rotate, flip etc the image and you still get the same 7 values which makes this a robust ("affine invariant") image descriptor.
Hope this helps as a general direction to read more in.

Matlab Camera Calibration - Correct lens distortion

In the Computer Vision System Toolbox for Matlab there are three types of interpolation methods used for Correct lens distortion.
Interpolation method for the function to use on the input image. The interp input interpolation method can be the string, 'nearest', 'linear', or 'cubic'.
My question is: what is the difference between 'nearest', 'linear', or 'cubic' ? and which one implemented in "Zhang" and "Heikkila, J, and O. Silven" methods.
I can't access the paged at the link you wrote in your question (it asks for a username and password) and so I assume your linked page has the same contents of the page http://www.mathworks.it/it/help/vision/ref/undistortimage.html which I quote here:
J = undistortImage(I,cameraParameters,interp) removes lens distortion from the input image, I and specifies the
interpolation method for the function to use on the input image.
Input Arguments
I — Input image
cameraParameters — Object for storing camera parameters
interp — Interpolation method
'linear' (default) | 'nearest' | 'cubic'
Interpolation method for the function to use on
the input image. The interp input interpolation method can be the
string, 'nearest', 'linear', or 'cubic'.
Furthermore, I assume you are referring to these papers:
ZHANG, Zhengyou. A flexible new technique for camera calibration. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2000, 22.11: 1330-1334.
HEIKKILA, Janne; SILVEN, Olli. A four-step camera calibration procedure with implicit image correction. In: Computer Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE Computer Society Conference on. IEEE, 1997. p. 1106-1112.
I have searched for the word "interpolation" in the two pdf documents Zhang and Heikkila and Silven and I did not find any direct statement about the interpolation method they have used.
To my knowledge, in general, a camera calibration method is concerned on how to estimate the intrinsic, extrinsic and lens distortion parameters (all these parameters are inside the input argument cameraParameters of Matlab's undistortImage function); the interpolation method is part of a different problem, i.e. the problem of "Geometric Image Transformations".
I quote from the OpenCV's page Geometric Image Transformation (I have slightly modified the original omitting some details and adding some definitions, I assume you are working with grey level image):
The functions in this section perform various geometrical
transformations of 2D images. They do not change the image content but
deform the pixel grid and map this deformed grid to the destination
image. In fact, to avoid sampling artifacts, the mapping is done in
the reverse order, from destination to the source. That is, for each
pixel (x, y) of the destination image, the functions compute
coordinates of the corresponding “donor” pixel in the source image and
copy the pixel value:
dst(x,y) = src(f_x(x,y), f_y(x,y))
where
dst(x,y) is the grey value of the pixel located at row x and column y in the destination image
src(x,y) is the grey value of the pixel located at row x and column y in the source image
f_x is a function that maps the row x and the column y to a new row, it just uses coordinates and not the grey level.
f_y is a function that maps the row x and the column y to a new column, it just uses coordinates and not the grey level.
The actual implementations of the geometrical transformations, from
the most generic remap() and to the simplest and the fastest resize()
, need to solve two main problems with the above formula:
• Extrapolation of non-existing pixels. Similarly to the filtering
functions described in the previous section, for some (x,y) , either
one of f_x(x,y) , or f_y(x,y) , or both of them may fall outside of
the image. In this case, an extrapolation method needs to be used.
OpenCV provides the same selection of extrapolation methods as in the
filtering functions. In addition, it provides the method
BORDER_TRANSPARENT . This means that the corresponding pixels in the
destination image will not be modified at all.
• Interpolation of pixel
values. Usually f_x(x,y) and f_y(x,y) are floating-point numbers. This
means that <f_x, f_y> can be either an affine or
perspective transformation, or radial lens distortion correction, and
so on. So, a pixel value at fractional coordinates needs to be
retrieved. In the simplest case, the coordinates can be just rounded
to the nearest integer coordinates and the corresponding pixel can be
used. This is called a nearest-neighbor interpolation. However, a
better result can be achieved by using more sophisticated
interpolation methods, where a polynomial function is fit into some
neighborhood of the computed pixel (f_x(x,y), f_y(x,y)), and then the
value of the polynomial at (f_x(x,y), f_y(x,y)) is taken as the
interpolated pixel value. In OpenCV, you can choose between several
interpolation methods. See resize() for details.
For a "soft" introduction see also for example Cambridge in colour - DIGITAL IMAGE INTERPOLATION.
So let's say you need the grey level of pixel at x=20.2 y=14.7, since x and y are number with a fractional part different from zero you will need to "invent" (compute) the grey level in some way. In the simplest case ('nearest' interpolation) you just say that the grey level at (20.2,14.7) is the grey level you retrieve at (20,15), it is called "nearest" because 20 is the nearest integer value to 20.2 and 15 is the nearest integer value to 14.7.
In the (bi)'linear' interpolation you will compute the value at (20.2,14.7) with a combination of the grey levels of the four pixels at (20,14), (20,15), (21,14), (21,15); for the details on how to compute the combination see the Wikipedia page which has a numeric example.
The (bi)'cubic' interpolation considers the combination of sixteen pixels in order to compute the value at (20.2,14.7), see the Wikipedia page.
I suggest you to try all the three methods, with the same input image, and see the differences in the output image.
Interpolation method is actually independent of the camera calibration. Any time you apply a geometric transformation to an image, such as rotation, re-sizing, or distortion compensation, the pixels in the new image will correspond to points between the pixels of the old image. So you have to interpolate their values somehow.
'nearest' means you simply use the value of the nearest pixel.
'linear' means you use bi-linear interpolation. The new pixel's value is a weighted sum of the values of the neighboring pixels in the input image, where the weights are proportional to distances.
'cubic' means you use a bi-cubic interpolation, which is more complicated than bi-linear, but may give you a smoother image.
A good description of these interpolation methods is given in the documentation for the interp2 function.
And finally, just to clarify, the undistortImage function is in the Computer Vision System Toolbox.

OpenCV - Dynamically find HSV ranges for color

When given an image such as this:
And not knowing the color of the object in the image, I would like to be able to automatically find the best H, S and V ranges to threshold the object itself, in order to get a result such as this:
In this example, I manually found the values and thresholded the image using cv::inRange.The output I'm looking for, are the best H, S and V ranges (min and max value each, total of 6 integer values) to threshold the given object in the image, without knowing in advance what color the object is. I need to use these values later on in my code.
Keypoints to remember:
- All given images will be of the same size.
- All given images will have the same dark background.
- All the objects I'll put in the images will be of full color.
I can brute force over all possible permutations of the 6 HSV ranges values, threshold each one and find a clever way to figure out when the best blob was found (blob size maybe?). That seems like a very cumbersome, long and highly ineffective solution though.
What would be good way to approach this? I did some research, and found that OpenCV has some machine learning capabilities, but I need to have the actual 6 values at the end of the process, and not just a thresholded image.
You could create a small 2 layer neural network for the task of dynamic HSV masking.
steps:
create/generate ground truth annotations for image and its HSV range for the required object
design a small neural network with at least 1 conv layer and 1 fcn layer.
Input : Mask of the image after applying the HSV range from ground truth( mxn)
Output : mxn mask of the image in binary
post processing : multiply the mask with the original image to get the required object highligted

Threshold to amplify black lines

Given an image (Like the one given below) I need to convert it into a binary image (black and white pixels only). This sounds easy enough, and I have tried with two thresholding functions. The problem is I cant get the perfect edges using either of these functions. Any help would be greatly appreciated.
The filters I have tried are, the Euclidean distance in the RGB and HSV spaces.
Sample image:
Here it is after running an RGB threshold filter. (40% it more artefects after this)
Here it is after running an HSV threshold filter. (at 30% the paths become barely visible but clearly unusable because of the noise)
The code I am using is pretty straightforward. Change the input image to appropriate color spaces and check the Euclidean distance with the the black color.
sqrt(R*R + G*G + B*B)
since I am comparing with black (0, 0, 0)
Your problem appears to be the variation in lighting over the scanned image which suggests that a locally adaptive thresholding method would give you better results.
The Sauvola method calculates the value of a binarized pixel based on the mean and standard deviation of pixels in a window of the original image. This means that if an area of the image is generally darker (or lighter) the threshold will be adjusted for that area and (likely) give you fewer dark splotches or washed-out lines in the binarized image.
http://www.mediateam.oulu.fi/publications/pdf/24.p
I also found a method by Shafait et al. that implements the Sauvola method with greater time efficiency. The drawback is that you have to compute two integral images of the original, one at 8 bits per pixel and the other potentially at 64 bits per pixel, which might present a problem with memory constraints.
http://www.dfki.uni-kl.de/~shafait/papers/Shafait-efficient-binarization-SPIE08.pdf
I haven't tried either of these methods, but they do look promising. I found Java implementations of both with a cursory Google search.
Running an adaptive threshold over the V channel in the HSV color space should produce brilliant results. Best results would come with higher than 11x11 size window, don't forget to choose a negative value for the threshold.
Adaptive thresholding basically is:
if (Pixel value + constant > Average pixel value in the window around the pixel )
Pixel_Binary = 1;
else
Pixel_Binary = 0;
Due to the noise and the illumination variation you may need an adaptive local thresholding, thanks to Beaker for his answer too.
Therefore, I tried the following steps:
Convert it to grayscale.
Do the mean or the median local thresholding, I used 10 for the window size and 10 for the intercept constant and got this image (smaller values might also work):
Please refer to : http://homepages.inf.ed.ac.uk/rbf/HIPR2/adpthrsh.htm if you need more
information on this techniques.
To make sure the thresholding was working fine, I skeletonized it to see if there is a line break. This skeleton may be the one needed for further processing.
To get ride of the remaining noise you can just find the longest connected component in the skeletonized image.
Thank you.
You probably want to do this as a three-step operation.
use leveling, not just thresholding: Take the input and scale the intensities (gamma correct) with parameters that simply dull the mid tones, without removing the darks or the lights (your rgb threshold is too strong, for instance. you lost some of your lines).
edge-detect the resulting image using a small kernel convolution (5x5 for binary images should be more than enough). Use a simple [1 2 3 2 1 ; 2 3 4 3 2 ; 3 4 5 4 3 ; 2 3 4 3 2 ; 1 2 3 2 1] kernel (normalised)
threshold the resulting image. You should now have a much better binary image.
You could try a black top-hat transform. This involves substracting the Image from the closing of the Image. I used a structural element window size of 11 and a constant threshold of 0.1 (25.5 on for a 255 scale)
You should get something like:
Which you can then easily threshold:
Best of luck.

Photoshop's RGB levels with ImageMagick

I'm attempting to convert some effects created in Photoshop into code for use with php/imagemagick. Right now I'm specifically interested in how to recreate Photoshop's RGB levels feature. I'm not really familiar with the Photoshop interface, but this is the info that I am given:
RGB Level Adjust
Input levels: Shadow 0, Midtone 0.92, Highlight 255
Output levels: Shadow 0, Highlight 255
What exaclty are the input levels vs. the output levels? How would I translate this into ImageMagick? Below you can see what I have tried, but it does not correctly render the desired effect (converting Photoshop's 0-255 scale to 0-65535):
$im->levelImage(0, 0.92, 65535);
$im->levelImage(0, 1, 65535);
This was mostly a stab in the dark since the parameter names don't line up and for output levels the number of parameters don't even match. Basically I don't understand exactly what is going on when photoshop applies the adjustment. I think that's my biggest hurdle right now. Once I get that, I'll need to find corresponding effects in ImageMagick.
Can anyone shed some light on what's going on in Photoshop and how to replicate that with ImageMagick?
Shadows, Midtones and Highlights are colors that fall within a certain range of luminosity. For example, shadows are the lower range of the luminosity histogram, midtones are colors in the middle and highlights are the ones up high. However - you can't use a hard limit on these values, which is why you will have to use curves like these that weight the histogram (a pixel may lie in multiple ranges at the same time).
To adjust shadows, midtones and highlights separately, you will need to create a weighted sum per pixel that uses the current shadow, midtone and highlight values to create a resultant value.
I don't think you can do this directly using ImageMagick API's - perhaps you could simply write it as a filter.
Hope this helps.
So I stumbled across this website: http://www.fmwconcepts.com/imagemagick/levels/index.php
Based on the information there, I was able to come up with the following php which seems pretty effective at simulating what Photoshop does with input and output and all that.
function levels($im, $inshadow, $midtone, $inhighlight, $outshadow, $outhighlight, $channel = self::CHANNEL_ALL) {
$im->levelImage($inshadow, $midtone, $inhighlight, $channel);
$im->levelImage(-$outshadow, 1.0, 255 + (255 - $outhighlight), $channel);
}
This assumes that the parameters to levelImage for blackpoint and whitepoint are on a scale of 0-255. They might actually be 0-65535 on your system. If that's they can it's easy enough to adjust it. You can also check what value your setup uses with $im->getQuantumRange(). It will return an array with a string version and a long version of the quantum. From there it should be easy enough to normalize the values or just use the new range.
See the documentation: The first value is the black point (shadow) input value, the middle is a gamma (which I'm guessing is the same as Photoshop's midpoint), and the last is the white point (highlight) input value.
The output values are fixed at the quantum values of the image type, there's no need to specify them.

Resources