Why does a Metal constant with the h suffix produce bad byte-range results?

I have run into a very strange problem in my Metal shader involving a byte value in the range (0, 255). This byte value is represented as a ushort that is converted to half precision by code like (x / 255.0h). What is strange is that this literal constant divide seems to be optimized incorrectly when run on an A7 device (an A10 does not do this). Has anyone else run into this? Is there some way I can write Metal code that is used only on this GPU family 1 device?
I found a workaround: leave the h suffix off the inline constant:
// This method accepts 4 byte-range input values and encodes them as a half4 vector
// that works properly on A7 class hardware. The issue with A7 devices is that
// there seems to be a compiler bug or range issue with an operation like (x / 255.0h).
// What should be the same operation (x / 255.0) does not show the range problem on A7.
half4
encodeBytesAsHalf4(const ushort4 b4) {
    return half4(b4.x/255.0, b4.y/255.0, b4.z/255.0, b4.w/255.0);
}
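For what it's worth, a quick NumPy sketch (this assumes NumPy's IEEE 754 float16 behaves like Metal's half, which I can't verify on the GPU itself) shows the arithmetic round-trips every byte value, which points at a compiler issue rather than a genuine range limit of half:

import numpy as np

x = np.arange(256, dtype=np.uint16)                # every byte-range input value
h = x.astype(np.float16) / np.float16(255.0)       # the same operation as (x / 255.0h)
back = np.rint(h.astype(np.float32) * 255.0)       # decode: scale back up and round
print(np.array_equal(back, x.astype(np.float32)))  # True: no value is lost to half precision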

Related

Difference between absdiff and normal subtraction in OpenCV

I am currently planning on training a binary image classification model. The images I want to train on are the difference between two original pictures. In other words, for each data entry, I start out with 2 pictures, take their difference, and then label that difference as a 0 or 1. My question is: what is the best way to find this difference? I know about cv2.absdiff and normal subtraction of images - what is the most effective way to go about this?
About the data: the images I'm training on are screenshots that are usually the same but may have small differences. I found that normal subtraction seems to show the differences less clearly than absdiff.
This is the code I use for absdiff:
import cv2
import numpy as np

diff = cv2.absdiff(img1, img2)
mask = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
th = 1
imask = mask > th
canvas = np.zeros_like(img2, np.uint8)
canvas[imask] = img2[imask]
And then this for normal subtraction:
def extract_diff(self, imageA, imageB, image_name, path):
    subtract = imageB.astype(np.float32) - imageA.astype(np.float32)
    mask = cv2.inRange(np.abs(subtract), (30, 30, 30), (255, 255, 255))
    th = 1
    imask = mask > th
    canvas = np.zeros_like(imageA, np.uint8)
    canvas[imask] = imageA[imask]
    return canvas
Thanks!
A difference can be negative or positive.
Some number types, such as uint8 (unsigned 8-bit int), have no sign and can't represent negative values; a negative result wraps around and the value no longer makes sense. Other types are signed (e.g. floats, signed ints), so a negative value can be represented correctly.
That's why cv.absdiff exists. It always gives you absolute differences, and those are fine to represent in an unsigned type.
Example with numbers: a = 4, b = 6. a-b should be -2, right?
That value, as an uint8, will wrap around to become 0xFE, or 254 in decimal. The 254 value has some relation to the true -2 difference, but it also incorporates the range of values of the data type (8 bits: 256 values), so it's really just "code".
cv.absdiff would give you the absolute of the difference (-2), which is 2.
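A minimal NumPy/OpenCV sketch of that wraparound, using the numbers above:

import cv2
import numpy as np

a = np.full((1, 1), 4, dtype=np.uint8)
b = np.full((1, 1), 6, dtype=np.uint8)

print(a - b)              # [[254]] -- wrapped around: 256 - 2
print(cv2.absdiff(a, b))  # [[2]]   -- the true absolute difference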

How are floating-point pixel values converted to integer values?

How does an image library (such as PIL, OpenCV, etc.) convert floating-point values to integer pixel values?
For example
import numpy as np
from PIL import Image
# Creates a random image and saves it to a file
def get_random_img(m=0, s=1, fname='temp.png'):
    im = m + s * np.random.randn(60, 60, 3)  # e.g. min: -3.8947058634971179, max: 3.6822041760496904
    print(im[0, 0])                   # e.g. array([ 0.36234732, 0.96987366, 0.08343])
    imp = Image.fromarray(im, 'RGB')  # (*)
    print(np.array(imp)[0, 0])        # [140, 74, 217]
    imp.save(fname)
    return im, imp
For the above method, an example is provided in the comments (randomly produced). My question is: how does (*) convert an ndarray (whose values can range from minus infinity to plus infinity) to pixel values between 0 and 255?
I tried to investigate the PIL.Image.fromarray method and eventually ended up at line #798, d.decode(data), within the PIL.Image.Image().frombytes method. I could not find the implementation of the decode method, and thus am unable to know what computation goes on behind the conversion.
My initial thought was that maybe the method uses the minimum (to 0) and maximum (to 255) values from the array and then maps all the other values accordingly between 0 and 255. But upon investigation, I found out that's not what happens. Moreover, how does it handle arrays whose values range between 0 and 1, or any other range of values?
Some libraries assume that floating-point pixel values are between 0 and 1, and will linearly map that range to 0 and 255 when casting to 8-bit unsigned integer. Some others will find the minimum and maximum values and map those to 0 and 255. You should always explicitly do this conversion if you want to be sure of what happened to your data.
In general, a pixel does not need to be 8-bit unsigned integer. A pixel can have any numerical type. Usually a pixel intensity represents an amount of light, or a density of some sort, but this is not always the case. Any physical quantity can be sampled in 2 or more dimensions. The range of meaningful values thus depends on what is imaged. Negative values are often also meaningful.
Many cameras have 8-bit precision when converting light intensity to a digital number. Likewise, displays typically have an 8-bit intensity range. This is the reason many image file formats store only 8-bit unsigned integer data. However, some cameras have 12 bits or more, and some processes derive pixel data with a higher precision that one does not want to quantize. Therefore, formats such as TIFF and ICS will allow you to save images in just about any numeric format you can think of.
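As a sketch of doing the conversion explicitly (here mapping the array's own min/max onto 0-255; if your data is meant to be in [0, 1], a plain multiply by 255 does the job instead; the constant-image edge case is ignored):

import numpy as np

def to_uint8(im):
    # Map [im.min(), im.max()] linearly onto [0, 255], then cast.
    im = im.astype(np.float64)
    im = (im - im.min()) / (im.max() - im.min())
    return (im * 255).round().astype(np.uint8)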
I'm afraid it has done nothing anywhere near as clever as you hoped! It has merely interpreted the first byte of the first float as a uint8, then the second byte as another uint8...
import numpy as np
from PIL import Image
# Generate repeatable random data, so other folks get the same results
np.random.seed(42)
# Make a single RGB pixel
im = np.random.randn(1, 1, 3)
# Print the floating point values - not that we are interested in them
print(im)
# OUTPUT: [[[ 0.49671415 -0.1382643 0.64768854]]]
# Save that pixel to a file so we can dump it
im.tofile('array.bin')
# Now make a PIL Image from it and print the uint8 RGB values
imp = Image.fromarray(im, 'RGB')
print(imp.getpixel((0,0)))
# OUTPUT: (124, 48, 169)
So, PIL has interpreted our data as RGB=124/48/169
Now look at the hex we dumped. It is 24 bytes long, i.e. 3 float64 (8-byte) values, one for red, one for green and one for blue for the 1 pixel in our image:
xxd array.bin
Output
00000000: 7c30 a928 2aca df3f 2a05 de05 a5b2 c1bf |0.(*..?*.......
00000010: 685e 2450 ddb9 e43f h^$P...?
And the first byte (7c) has become 124, the second byte (30) has become 48 and the third byte (a9) has become 169.
TLDR; PIL has merely taken the first byte of the first float as the Red uint8 channel of the first pixel, then the second byte of the first float as the Green uint8 channel of the first pixel and the third byte of the first float as the Blue uint8 channel of the first pixel.
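The takeaway is to scale and cast explicitly before handing the array to PIL. A sketch, assuming you decide your floats should live in [0, 1] (any other range choice works the same way):

import numpy as np
from PIL import Image

im = np.random.randn(1, 1, 3)                         # float64 data, as in the question
im8 = (np.clip(im, 0.0, 1.0) * 255).astype(np.uint8)  # explicit range choice: clip to [0, 1], scale
imp = Image.fromarray(im8, 'RGB')                     # now each channel really is one byte
print(imp.getpixel((0, 0)))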

MedianBlur(): calculating the maximum kernel size

I want to use the MedianBlur function with a very high ksize, like 301 or more. But if I pass a ksize that is too high, sometimes the function will crash. The error message is:
OpenCV Error: (k < 16) in cv::medianBlur_8u_O1, in file ../opencv\modules\imgproc\src\smooth.cpp
(I use opencv4nodejs, but I also tried the original OpenCV 3.4.6.)
I tried reducing the ksize in a try/catch loop, but that is not very effective, since I have to work with videos.
I checked out the OpenCV source code and did some research.
In OpenCV 3.4.6, the crash comes from line 241 of opencv\modules\imgproc\src\median_blur.simd.hpp:
for ( k = 0; k < 16 ; ++k )
{
    sum += H.coarse[k];
    if ( sum > t )
    {
        sum -= H.coarse[k];
        break;
    }
}
CV_Assert( k < 16 ); // Error here
t is calculated based on ksize, but the calculations of sum and the H.coarse array are quite complicated.
Doing further research, I found a scientific paper about the algorithm: https://www.researchgate.net/publication/321690537_Efficient_Scalable_Median_Filtering_Using_Histogram-Based_Operations
I am trying to read it but, honestly, I don't understand much of it.
How do I calculate the maximum ksize with a given image?
The maximum kernel size is determined from the bit depth of the image. As mentioned in the publication you cited:
"An 8-bit value is limited to a max value of 255. Our goal is to
support larger kernel sizes, including kernels that are greater in
size than 17 × 17, thus the larger 32-bit data type is used"
so for an image of data type CV_8U the maximum kernel size is 255.
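So a simple workaround is to clamp the requested size before calling. A sketch (safe_median_blur is a hypothetical helper; it assumes CV_8U input and accounts for medianBlur requiring an odd ksize):

import cv2

def safe_median_blur(img, ksize, max_ksize=255):
    k = min(ksize, max_ksize)  # 255 per the bit-depth limit for CV_8U quoted above
    if k % 2 == 0:             # medianBlur requires an odd kernel size
        k -= 1
    return cv2.medianBlur(img, k)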

Real to Complex FFT with CUFFT, using OpenCV as Data source

I'm having an issue trying to perform a two dimensional transform on an array of floats using cuFFT. I've had a look at the documentation, but some of the information is contradictory/not clear; so I have a few questions:
My data is 480 rows, with 640 columns (e.g. float data[480][640] but in a single dimension so float data[480*640])
If we say my input dimensions (of real data) are N1 = 480 and N2 = 640, are the dimensions (after a real-to-complex transform) N1 = 480, N2 = 321?
Can I cudaMemcpy the data directly into a cufftReal array of the same size? Or must it be a cufftComplex array?
If it must be a cufftComplex array, I am assuming the elements need to be in the place of the real components?
What is the correct structure of a call to cufftPlan2d, cufftExecR2C and cufftExecC2R given the above values?
I think that's all for now...
Many thanks in advance
EDIT: So, I've implemented the Forward and Inverse transforms as suggested by JackOLantern. However my results are not what I am expecting (an identical Result after FFT as Before it). I have an image gallery here showing two sets of examples. The first is from my room, the second from my University Project.
In the cuFFT documentation, there is ambiguity in the use of cufftPlan2d (hence why I asked). In the documentation, for a two-dimensional array, the data should be input as above (float data[480][640] == float data[NY][NX]), so NY represents the rows. However, in the function listing for cufftPlan2d, it states that nx (the parameter) is for the rows...
Swapping the values of NX and NY in the function call gives the result as in the project image (correct orientation, but split into three partially overlapping images at 1/4 the normal size); however, using the parameters as JackOLantern states in his answer gives a slanted/skewed result.
Am I doing something wrong here? Or does the cuFFT library have issues with this type of thing.
ALSO: I have undone a couple of the edits made by JackOLantern to this question as my issues MAY stem from the fact my data is coming from OpenCV.
EDIT: I've recently found out that I was the one who made a mistake in the way I used the function.
Originally I thought the function definition referred to the size of the data being passed into it.
However, it appears that the parameters actually refer directly to the size of the REAL part.
This means that the parameters refer to:
The size of the input data when using R2C (Real to Complex)
The size of the output data when using C2R (Complex to Real)
So it appears that the cuFFT documentation and the library itself do not correspond.
When performing an R2C followed by a C2R (real to complex, complex to real respectively), the documentation states that for a Real input of NX x NY dimensions, the Complex output is NX x (floor(NY/2) +1); and vice versa.
However the actual output is of dimensions NX x NY and the actual input is of dimensions NX x NY. This is (half) mentioned on the very first page as
C2R - Symmetric complex input to real output
Implying that the complex data must be Symmetric, i.e. must also have the redundant data in addition to the non-redundant data.
There are a number of other contradictions within the documentation as well which I won't go into.
Needless to say, the problem has been solved.
I have included a MWE below. Near the top are a couple of lines with #define NUM_C2 and appropriate comments. Changing this changes whether the documentation format is followed, or my "fix".
The output is
The Input Real data
The Intermediate Complex data
The output Real data
The ratio of the output data to the input data (there are minor FFT errors, ~1 indicates correct)
Feel free to change the parameters (NUM_R and NUM_C) and feel free to comment if you think I have made a mistake somewhere.
#include <iostream>
#include <math.h>
#include <cuda_runtime.h>
#include <cufft.h>

// e.g. float data[NUM_R][NUM_C]
#define NUM_R 12
#define NUM_C 16

// Documentation Version
//#define NUM_C2 (1+NUM_C/2)
// "Correct" Version
#define NUM_C2 NUM_C

using namespace std;

int main(int argc, char** argv)
{
    cufftReal *in_h, *out_h, *in_d, *out_d;
    cufftComplex *mid_d, *mid_h;
    cufftHandle pF, pI;
    int r, c;

    in_h  = (cufftReal*)   malloc(NUM_R * NUM_C * sizeof(cufftReal));
    out_h = (cufftReal*)   malloc(NUM_R * NUM_C * sizeof(cufftReal));
    mid_h = (cufftComplex*)malloc(NUM_C2 * NUM_R * sizeof(cufftComplex));

    cudaMalloc((void**)&in_d,  NUM_R  * NUM_C * sizeof(cufftReal));
    cudaMalloc((void**)&out_d, NUM_R  * NUM_C * sizeof(cufftReal));
    cudaMalloc((void**)&mid_d, NUM_C2 * NUM_R * sizeof(cufftComplex));

    cufftPlan2d(&pF, NUM_R, NUM_C,  CUFFT_R2C);
    cufftPlan2d(&pI, NUM_R, NUM_C2, CUFFT_C2R);

    // Fill the input with a 2D cosine and print it.
    cout << endl << "------" << endl;
    for (r = 0; r < NUM_R; r++)
    {
        for (c = 0; c < NUM_C; c++)
        {
            in_h[c + NUM_C * r]  = cos(2.0 * M_PI * (c * 7.0 / NUM_C + r * 3.0 / NUM_R));
            out_h[c + NUM_C * r] = 0.f;
            cout << in_h[c + NUM_C * r];
            if (c < (NUM_C - 1)) cout << ", ";
            else                 cout << endl;
        }
    }

    // Forward (real-to-complex) transform.
    cudaMemcpy(in_d, in_h, NUM_R * NUM_C * sizeof(cufftReal), cudaMemcpyHostToDevice);
    cufftExecR2C(pF, in_d, mid_d);
    cudaMemcpy(mid_h, mid_d, NUM_C2 * NUM_R * sizeof(cufftComplex), cudaMemcpyDeviceToHost);

    // Print the intermediate complex data as re|im.
    cout << endl << "------" << endl;
    for (r = 0; r < NUM_R; r++)
    {
        for (c = 0; c < NUM_C2; c++)
        {
            cout << mid_h[c + NUM_C2 * r].x << "|" << mid_h[c + NUM_C2 * r].y;
            if (c < (NUM_C2 - 1)) cout << ", ";
            else                  cout << endl;
        }
    }

    // Inverse (complex-to-real) transform.
    cufftExecC2R(pI, mid_d, out_d);
    cudaMemcpy(out_h, out_d, NUM_R * NUM_C * sizeof(cufftReal), cudaMemcpyDeviceToHost);

    // Print the output, normalized by NUM_R*NUM_C (cuFFT transforms are unnormalized).
    cout << endl << "------" << endl;
    for (r = 0; r < NUM_R; r++)
    {
        for (c = 0; c < NUM_C; c++)
        {
            cout << out_h[c + NUM_C * r] / (NUM_R * NUM_C);
            if (c < (NUM_C - 1)) cout << ", ";
            else                 cout << endl;
        }
    }

    // Print the ratio of normalized output to input; ~1 indicates a correct round trip.
    cout << endl << "------" << endl;
    for (r = 0; r < NUM_R; r++)
    {
        for (c = 0; c < NUM_C; c++)
        {
            cout << (out_h[c + NUM_C * r] / (NUM_R * NUM_C)) / in_h[c + NUM_C * r];
            if (c < (NUM_C - 1)) cout << ", ";
            else                 cout << endl;
        }
    }

    free(in_h);
    free(out_h);
    free(mid_h);
    cudaFree(in_d);
    cudaFree(out_d); // was cudaFree(out_h) in the original post: out_h is host memory, freed above
    cudaFree(mid_d);
    return 0;
}
1) If we say my input dimensions (of real data) are N1 = 480 and N2 = 640, are the dimensions (after a real-to-complex transform) N1 = 480, N2 = 321?
The output of cufftExecR2C is a NX*(NY/2+1) cufftComplex matrix. So in your case, you will have a 480x321 float2 matrix as output.
2) Can I cudaMemcpy the data directly into a cufftReal array of the same size? Or must it be a cufftComplex array?
If it must be a cufftComplex array, I am assuming the elements need to be in the place of the real components?
Yes, you can copy the data directly into a cufftReal array of size N1 x N2; the input of a real-to-complex transform is real data, so no cufftComplex staging is needed.
3) What is the correct structure of a call to cufftPlan2d, cufftExecR2C and cufftExecC2R given the above values?
cufftPlan2d(&plan, N1, N2, CUFFT_R2C);
cufftExecR2C(plan, (cufftReal*)idata, (cufftComplex*) odata);
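If it helps build intuition for the shapes (this is NumPy, not cuFFT, but its real FFT follows the same convention of keeping only the non-redundant half of the spectrum along the last axis):

import numpy as np

data = np.random.rand(480, 640).astype(np.float32)  # N1 = 480 rows, N2 = 640 columns
spec = np.fft.rfft2(data)                  # real-to-complex
print(spec.shape)                          # (480, 321) == (N1, N2/2 + 1)
back = np.fft.irfft2(spec, s=data.shape)   # complex-to-real round trip
print(np.allclose(back, data))             # True, up to rounding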

Converting RGB to grayscale/intensity

When converting from RGB to grayscale, it is said that specific weights to channels R, G, and B ought to be applied. These weights are: 0.2989, 0.5870, 0.1140.
It is said that the reason for this is different human perception/sensitivity towards these three colors. Sometimes it is also said that these are the values used to compute the NTSC signal.
However, I didn't find a good reference for this on the web. What is the source of these values?
See also these previous questions: here and here.
The specific numbers in the question are from CCIR 601 (see Wikipedia article).
If you convert RGB -> grayscale with slightly different numbers / different methods, you won't see much difference at all on a normal computer screen under normal lighting conditions -- try it.
Here are some more links on color in general:
Wikipedia Luma
Bruce Lindbloom's outstanding web site
chapter 4 on Color in the book by Colin Ware, "Information Visualization", ISBN 1-55860-819-2 (this long link to Ware in books.google.com may or may not work)
cambridgeincolor: excellent, well-written "tutorials on how to acquire, interpret and process digital photographs using a visually-oriented approach that emphasizes concept over procedure"
Should you run into "linear" vs "nonlinear" RGB, here's part of an old note to myself on this. To repeat: in practice you won't see much difference.
### RGB -> ^gamma -> Y -> L*
In color science, the common RGB values, as in html rgb( 10%, 20%, 30% ), are called "nonlinear" or gamma corrected. "Linear" values are defined as

Rlin = R^gamma, Glin = G^gamma, Blin = B^gamma

where gamma is 2.2 for many PCs. The usual R G B are sometimes written as R' G' B' (R' = Rlin^(1/gamma)) (purists tongue-click) but here I'll drop the '.
Brightness on a CRT display is proportional to RGBlin = RGB^gamma, so 50% gray on a CRT is quite dark: 0.5^2.2 = 22% of maximum brightness. (LCD displays are more complex; furthermore, some graphics cards compensate for gamma.)
To get the measure of lightness called L* from RGB, first divide R G B by 255, and compute

Y = 0.2126 * R^gamma + 0.7152 * G^gamma + 0.0722 * B^gamma

This is Y in XYZ color space; it is a measure of color "luminance". (The real formulas are not exactly x^gamma, but close; stick with x^gamma for a first pass.) Finally,

L* = 116 * Y^(1/3) - 16

"... aspires to perceptual uniformity [and] closely matches human perception of lightness." -- Wikipedia Lab color space
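A small Python sketch of that pipeline, using the simple x^gamma approximation from the note above (the exact sRGB and L* formulas are piecewise, but this is fine for a first pass):

def rgb_to_lstar(r, g, b, gamma=2.2):
    # r, g, b in 0..255 nonlinear RGB; "undo" gamma to get linear light
    rl, gl, bl = (r / 255) ** gamma, (g / 255) ** gamma, (b / 255) ** gamma
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl   # luminance Y
    return 116 * y ** (1 / 3) - 16                # lightness L*

print(rgb_to_lstar(128, 128, 128))  # 50% gray comes out around L* = 54, not 50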
I found this publication referenced in an answer to a previous similar question. It is very helpful, and the page has several sample images:
Perceptual Evaluation of Color-to-Grayscale Image Conversions by Martin Čadík, Computer Graphics Forum, Vol 27, 2008
The publication explores several other methods to generate grayscale images with different outcomes:
CIE Y
Color2Gray
Decolorize
Smith08
Rasche05
Bala04
Neumann07
Interestingly, it concludes that there is no universally best conversion method, as each performed better or worse than others depending on input.
Here's some code in C to convert RGB to grayscale.
The weighting commonly used for RGB-to-grayscale conversion is approximately 0.3R + 0.6G + 0.11B.
These weights aren't absolutely critical, so you can play with them.
I have made them 0.25R + 0.5G + 0.25B. It produces a slightly darker image.
NOTE: The following code assumes an xRGB 32-bit pixel format

unsigned int *pntrBWImage = (unsigned int*)..data pointer..;  // assumes 4*width*height bytes, i.e. 32 bits / 4 bytes per pixel
unsigned int fourBytes;
unsigned char r, g, b;

for (int index = 0; index < width*height; index++)
{
    fourBytes = pntrBWImage[index];  // caches 4 bytes at a time
    r = (fourBytes >> 16);
    g = (fourBytes >> 8);
    b = fourBytes;

    I_Out[index] = (r >> 2) + (g >> 1) + (b >> 2);   // This runs in 0.00065s on my pc and produces slightly darker results
    //I_Out[index] = ((unsigned int)(r + g + b))/3;  // This runs in 0.0011s on my pc and produces a pure average
}
Check out the Color FAQ for information on this. These values come from the standardization of RGB values that we use in our displays. Actually, according to the Color FAQ, the values you are using are outdated, as they are the values used for the original NTSC standard and not modern monitors.
What is the source of these values?
The "source" of the coefficients posted are the NTSC specifications which can be seen in Rec601 and Characteristics of Television.
The "ultimate source" are the CIE circa 1931 experiments on human color perception. The spectral response of human vision is not uniform. Experiments led to weighting of tristimulus values based on perception. Our L, M, and S cones1 are sensitive to the light wavelengths we identify as "Red", "Green", and "Blue" (respectively), which is where the tristimulus primary colors are derived.2
The linear light3 spectral weightings for sRGB (and Rec709) are:
Rlin * 0.2126 + Glin * 0.7152 + Blin * 0.0722 = Y
These are specific to the sRGB and Rec709 colorspaces, which are intended to represent computer monitors (sRGB) or HDTV monitors (Rec709), and are detailed in the ITU documents for Rec709 and also BT.2380-2 (10/2018)
FOOTNOTES
(1) Cones are the color detecting cells of the eye's retina.
(2) However, the chosen tristimulus wavelengths are NOT at the "peak" of each cone type - instead, tristimulus values are chosen such that they stimulate one particular cone type substantially more than another, i.e. separation of stimulus.
(3) You need to linearize your sRGB values before applying the coefficients. I discuss this in another answer here.
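A sketch of footnote 3's linearization plus the weighting above, using the standard piecewise sRGB transfer function:

def srgb_to_linear(c):
    # c is one nonlinear sRGB channel value in [0, 1]
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def luminance(r, g, b):
    # r, g, b in 0..255 nonlinear sRGB; returns relative luminance Y
    rl, gl, bl = (srgb_to_linear(v / 255) for v in (r, g, b))
    return 0.2126 * rl + 0.7152 * gl + 0.0722 * bl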
Starting a list to enumerate how different software packages do it. Here is a good CVPR paper to read as well.
FreeImage
#define LUMA_REC709(r, g, b) (0.2126F * r + 0.7152F * g + 0.0722F * b)
#define GREY(r, g, b) (BYTE)(LUMA_REC709(r, g, b) + 0.5F)
OpenCV
nVidia Performance Primitives
Intel Performance Primitives
Matlab
nGray = 0.299F * R + 0.587F * G + 0.114F * B;
These values vary from person to person, especially for people who are colorblind.
Is all this really necessary? Human perception and CRT vs. LCD will vary, but the R, G, B intensities do not. Why not L = (R + G + B)/3 and set the new RGB to L, L, L?
