Making an image appear only in grayscale - image-processing

I was writing a program that scrambles the colors of each pixel in an image so that the result is very noisy and hard to recognize, yet appears normal in grayscale.
As soon as I wrote the program, I noticed a problem. The algorithm I used assumed that the way to calculate grayscale was something like
Grayscale(Pixel p) {
average := (p.red + p.green + p.blue) / 3
p.red := p.green := p.blue := average;
}
I later realized that the numbers have weights attached to them:
Grayscale(Pixel p) {
average := (c1 * p.red^gamma) + (c2 * p.green^gamma) + (c3 * p.blue^gamma)
p.red := p.green := p.blue := average
}
and thus my program did not work.
I was specifically targeting the iOS grayscale filter, although the algorithm should be the same for any system.
(side note: I don't know the weights of the iOS values, so if anyone could give me those in the algorithm as well, that'd be pretty cool. I found this thread discussing a similar problem but it doesn't answer my question: https://www.reddit.com/r/compsci/comments/a1yrf1/what_algorithm_do_ios_devices_use_to_grayscale/ )
Assuming that the grayscale uses some sort of weights, what algorithm could I use to change the rgb values of the given pixel while keeping the grayscale values of the pixel the same?
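One way to approach this, as a minimal sketch: if the gray value is a weighted sum, pick two channels at random and solve for the third so the weighted sum is unchanged. The weights below are the Rec. 601 values and are only an assumption (the weights iOS uses are not known here), gamma is ignored, and `luma` and `scramble_pixel` are hypothetical names.

```python
import random

# Assumed Rec. 601 weights; replace with the target filter's weights if known.
WEIGHTS = (0.299, 0.587, 0.114)

def luma(pixel, weights=WEIGHTS):
    """Weighted sum of the (r, g, b) channels, each in [0, 1]."""
    return sum(w * c for w, c in zip(weights, pixel))

def scramble_pixel(pixel, weights=WEIGHTS, rng=random, max_tries=100):
    """Return a random colour with the same weighted gray value as `pixel`."""
    wr, wg, wb = weights
    target = luma(pixel, weights)
    for _ in range(max_tries):
        r = rng.random()
        b = rng.random()
        # Solve wr*r + wg*g + wb*b = target for g.
        g = (target - wr * r - wb * b) / wg
        if 0.0 <= g <= 1.0:
            return (r, g, b)
    return tuple(pixel)  # no valid colour found: keep the original
```

Rounding the result back to 8-bit integers will shift the gray value slightly, and a filter that applies gamma would need the same solve done on the gamma-adjusted channel values.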

Related

Difference between absdiff and normal subtraction in OpenCV

I am currently planning on training a binary image classification model. The images I want to train on are the difference between two original pictures. In other words, for each data entry, I start out with 2 pictures, take their difference, and then label that difference as a 0 or 1. My question is: what is the best way to find this difference? I know about cv2.absdiff and normal subtraction of images. Which is the most effective way to go about this?
About the data: The images I'm training on are screenshots that usually are the same but may have small differences. I found that normal subtraction seems to show the differences less than absdiff.
This is the code I use for absdiff:
diff = cv2.absdiff(img1, img2)
mask = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
th = 1
imask = mask>1
canvas = np.zeros_like(img2, np.uint8)
canvas[imask] = img2[imask]
And then this for normal subtraction:
def extract_diff(self, imageA, imageB, image_name, path):
    subtract = imageB.astype(np.float32) - imageA.astype(np.float32)
    mask = cv2.inRange(np.abs(subtract), (30, 30, 30), (255, 255, 255))
    th = 1
    imask = mask > 1
    canvas = np.zeros_like(imageA, np.uint8)
    canvas[imask] = imageA[imask]
Thanks!
A difference can be negative or positive.
For some number types, such as uint8 (unsigned 8-bit int), which can't be negative (have no sign), a negative value wraps around and the value would make no sense anymore. Other types can be signed (e.g. floats, signed ints), so a negative value can be represented correctly.
That's why cv.absdiff exists. It always gives you absolute differences, and those are okay to represent in an unsigned type.
Example with numbers: a = 4, b = 6. a-b should be -2, right?
That value, as an uint8, will wrap around to become 0xFE, or 254 in decimal. The 254 value has some relation to the true -2 difference, but it also incorporates the range of values of the data type (8 bits: 256 values), so it's really just "code".
cv.absdiff would give you the absolute of the difference (-2), which is 2.
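The wrap-around can be reproduced with plain integer arithmetic. This is a minimal sketch with hypothetical function names, where the modulo 256 stands in for what an 8-bit unsigned type does on overflow:

```python
def sub_uint8_wrap(a, b):
    """Subtract as an 8-bit unsigned type would: the result wraps modulo 256."""
    return (a - b) % 256

def absdiff_uint8(a, b):
    """Absolute difference, which always fits in 0..255 for 8-bit inputs."""
    return abs(a - b)

print(sub_uint8_wrap(4, 6))  # 254, the "code" for -2
print(absdiff_uint8(4, 6))   # 2
```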

Image Processing Pipelining in VHDL

I am currently trying to develop a Sobel filter in VHDL. I am using a 640x480 picture that is stored in a BRAM. The algorithm uses a 3x3 matrix of pixels of the image for processing each output pixel. My problem is that I currently only know of putting an image into a BRAM where each address of the BRAM holds one pixel value. This means I can only read one pixel per clock. My problem is that I am trying to pipeline the data so I would ideally need to be able to get three pixel values (one from each row of the picture) per clock so after my initial latency, I can load in three new pixel values per clock and get an output pixel on every clock. I am looking for a way to do this but cannot figure it out.
The only way I can think of to fix this is to have the image in 3 BRAMs. that way I can read in values from 3 rows per each clock cycle. However, there is not enough memory space to fit even one more RAM large enough to fit a 640x480 image let alone three. I could lower the picture size to do it this way, but I really want to do it with my current 640x480 image size.
Any help or guidance would be greatly appreciated.
A simple solution would be to split the image across 4 separate memories, each holding 1/4th of it. The first memory contains every 4th line, the second every 4th line starting from the second line, etc. I would use 4 even if you need only 3 lines, since 4 evenly divides 480 and every other standard resolution. Also, finding a binary number modulo 4 is trivial, which is needed to order the memories.
You can use the MSB of the line number to address your RAM, and the LSBs to figure out the relative order of each RAM output (code is only to demonstrate idea, it's not usable as is...):
address <= line(line'left downto 2) & col; -- Or something more efficient on packing
data0 <= ram0(address);
data1 <= ram1(address);
data2 <= ram2(address);
data3 <= ram3(address);
case line(1 downto 0) is
    when "00" =>
        line0 <= data0;
        line1 <= data1;
        line2 <= data2;
    when "01" =>
        line0 <= data1;
        line1 <= data2;
        line2 <= data3;
    when "10" =>
        line0 <= data2;
        line1 <= data3;
        line2 <= data0;
    when "11" =>
        line0 <= data3;
        line1 <= data0;
        line2 <= data1;
    when others => null;
end case;
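The bank selection can be modelled in software to check the indexing before writing HDL. This is a rough sketch, not a hardware description; `split_into_banks` and `read_three_lines` are hypothetical names. One assumption beyond the snippet above: each bank gets its own row address (line // 4), because when the three lines wrap past bank 3 the wrapped lines sit one row further down in their bank.

```python
def split_into_banks(image):
    """Distribute image lines round-robin over four banks (bank = line % 4)."""
    banks = [[], [], [], []]
    for line_no, line in enumerate(image):
        banks[line_no % 4].append(list(line))
    return banks

def read_three_lines(banks, line_no, col):
    """Return the pixels at `col` from lines line_no, line_no+1 and line_no+2."""
    out = []
    for k in range(3):
        n = line_no + k
        # Low two bits select the bank, the remaining bits select the row in it.
        out.append(banks[n % 4][n // 4][col])
    return out
```

Since the three lines always land in three different banks, all three reads can happen in the same cycle.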
I made a Sobel filter a few years ago. To do that, I wrote a pipeline that gives 9 pixels at each clock cycle:
architecture rtl of matrix_3x3_builder_8b is
    type fifo_t is array (0 to 2*IM_WIDTH + 2) of std_logic_vector(7 downto 0);
    signal fifo_int : fifo_t;
begin
    p0_build_5x5: process(rst_i,clk_i)
    begin
        if( rst_i = '1' )then
            fifo_int <= (others => (others => '0'));
        elsif( rising_edge(clk_i) )then
            if(data_valid_i = '1')then
                for i in 1 to 2*IM_WIDTH + 2 loop
                    fifo_int(i) <= fifo_int(i-1);
                end loop;
                fifo_int(0) <= data_i;
            end if;
        end if;
    end process p0_build_5x5;
    data_o1 <= fifo_int(0*IM_WIDTH + 0);
    data_o2 <= fifo_int(0*IM_WIDTH + 1);
    data_o3 <= fifo_int(0*IM_WIDTH + 2);
    data_o4 <= fifo_int(1*IM_WIDTH + 0);
    data_o5 <= fifo_int(1*IM_WIDTH + 1);
    data_o6 <= fifo_int(1*IM_WIDTH + 2);
    data_o7 <= fifo_int(2*IM_WIDTH + 0);
    data_o8 <= fifo_int(2*IM_WIDTH + 1);
    data_o9 <= fifo_int(2*IM_WIDTH + 2);
end rtl;
Here you read the image pixel by pixel to build your 3x3 matrix. The pipeline is longer to fill up but once completed, you have a new matrix each clock pulse.
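To see which pixels the nine taps hold at any moment, the shift register can be simulated in software. This is a sketch of the same idea in Python, not a translation of the VHDL; `windows_3x3` is a hypothetical name and the image is assumed to arrive as a flat stream of pixels in raster order.

```python
from collections import deque

def windows_3x3(pixels, width):
    """Yield the nine taps (data_o1..data_o9 order) after each pixel is shifted in."""
    depth = 2 * width + 3
    fifo = deque([0] * depth, maxlen=depth)
    # Tap positions: three consecutive entries at offsets 0, width and 2*width.
    taps = [row * width + col for row in range(3) for col in range(3)]
    for value in pixels:
        fifo.appendleft(value)  # newest pixel at index 0, oldest drops off the end
        yield [fifo[t] for t in taps]
```

For a 4-pixel-wide image numbered 0..15, the window after pixel 10 arrives holds pixels 10, 9, 8, 6, 5, 4, 2, 1, 0, i.e. the top-left 3x3 block with the newest pixel first. Windows produced before the buffer has filled, or that straddle a line boundary, are not valid neighbourhoods and would have to be masked out.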
If you want to continue storing the whole image, then I would do as Jonathan Drolet recommended and cycle between four rams while writing and read all 4 at once (muxing the three you care about into 3 registers).
This works because your rams will be deep enough that you will still be able to get full BRAM utilization at 1/4 the depth (77k deep still) and your reads can be predictably segmented.
For the specifics of this problem, Nicolas Roudel's method is much cheaper with BRAM, although you can't store the whole image at one time, so wherever you send your results can't backpressure you unless you can backpressure your data source. That may or may not matter for your application.
When you try to do something like this with extremely wide but fairly shallow (1k deep) rams, segmenting will use more block ram (or even start inferring distributed ram). When the reads do not follow a particular pattern (the pattern in your case is that they are all sequential and adjacent locations), the ram cannot be segmented. The best strategy to maintain efficient BRAM use is often to build quad port rams from the natively dual port block rams by clocking them with a 2x clock that is phase aligned with your normal clock, allowing you to do a write and 3 reads every 1x clock cycle.

OpenCV: Essential Matrix Decomposition

I am trying to extract Rotation matrix and Translation vector from the essential matrix.
<pre><code>
SVD svd(E,SVD::MODIFY_A);
Mat svd_u = svd.u;
Mat svd_vt = svd.vt;
Mat svd_w = svd.w;
Matx33d W(0, -1, 0,
          1,  0, 0,
          0,  0, 1);
Mat_<double> R = svd_u * Mat(W).t() * svd_vt; // or svd_u * Mat(W) * svd_vt;
Mat_<double> t = svd_u.col(2); // or -svd_u.col(2)
</code></pre>
However, when I use R and T (e.g. to obtain rectified images), the result does not seem to be right (black images or some obviously wrong outputs), even though I tried the different possible combinations of R and T.
I suspected E. According to the textbooks, my calculation is right if we have:
E = U*diag(1, 1, 0)*Vt
In my case svd.w, which is supposed to be diag(1, 1, 0) (at least up to scale), is not. Here is an example of my output:
svd.w = [21.47903827647813; 20.28555196246256; 5.167099204708699e-010]
Also, two of the eigenvalues of E should be equal and the third one should be zero. In the same case the result is:
eigenvalues of E = 0.0000 + 0.0000i, 0.3143 +20.8610i, 0.3143 -20.8610i
As you see, two of them are complex conjugates.
Now, the questions are:
Is the decomposition of E and the calculation of R and T done the right way?
If the calculation is right, why do the results not satisfy the internal constraints of the essential matrix?
If everything about E, R, and T is fine, why are the rectified images obtained from them not correct?
I get E from the fundamental matrix, which I believe to be right. I drew epipolar lines on both the left and right images and they all pass through the related points (for all 16 points used to calculate the fundamental matrix).
Any help would be appreciated.
Thanks!
I see two issues.
First, discounting the negligible value of the third diagonal term, your E is about 6% off the ideal one: err_percent = (21.48 - 20.29) / 20.29 * 100 . Sounds small, but translated in terms of pixel error it may be an altogether larger amount.
So I'd start by replacing E with the ideal one after SVD decomposition: Er = U * diag(1,1,0) * Vt.
Second, the textbook decomposition admits 4 solutions, only one of which is physically plausible (i.e. with 3D points in front of the camera). You may be hitting one of the non-physical ones. See http://en.wikipedia.org/wiki/Essential_matrix#Determining_R_and_t_from_E .
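Written out, the two steps look roughly like this. It is a summary of the standard textbook decomposition using the same U, Vt and W as in the question, with u_3 denoting the third column of U:

```latex
\begin{aligned}
E_r &= U \,\mathrm{diag}(1,1,0)\, V^\top, \qquad
W = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \\
R_1 &= U W V^\top, \qquad R_2 = U W^\top V^\top, \qquad t = \pm u_3, \\
(R, t) &\in \{(R_1, u_3),\ (R_1, -u_3),\ (R_2, u_3),\ (R_2, -u_3)\}.
\end{aligned}
```

If a candidate R comes out with determinant -1, negate it, since E is only defined up to sign. To pick among the four pairs, triangulate one matched point with each and keep the pair that gives it positive depth in both cameras.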

OpenCV DFT_INVERSE different from Matlab's ifft

I am trying to filter a signal using OpenCV's dft function. I start with the signal in the time domain:
x = [0.0201920000000000 -0.0514940000000000 0.0222140000000000 0.0142460000000000 -0.00313500000000000 0.00270600000000000 0.0111770000000000 0.0233470000000000 -0.00162700000000000 -0.0306280000000000 0.0239410000000000 -0.0225840000000000 0.0281410000000000 0.0265510000000000 -0.0272180000000000 0.0223850000000000 -0.0366850000000000 0.000515000000000000 0.0213440000000000 -0.0107180000000000 -0.0222150000000000 -0.0888300000000000 -0.178814000000000 -0.0279280000000000 -0.144982000000000 -0.199606000000000 -0.225617000000000 -0.188347000000000 0.00196200000000000 0.0830530000000000 0.0716730000000000 0.0723950000000000]
Convert it to the Fourier domain using:
cv::dft(x, x_fft, cv::DFT_COMPLEX_OUTPUT, 0);
Eliminate the unwanted frequencies:
for (int k = 0; k < 32; k++) {
    if (k == 0 || k > 6)
    {
        x_fft.ptr<float>(0)[2*k+0] = 0;
        x_fft.ptr<float>(0)[2*k+1] = 0;
    }
}
Convert it back to time domain:
cv::dft(x_fft, x_filt, cv::DFT_INVERSE, 0);
To check my results, I compared them to Matlab. I took the same signal x and converted it to the Fourier domain using x_mfft = fft(x); The results are similar to the ones I get from OpenCV, except that in OpenCV I only get the left side, while in Matlab I get the symmetric values too.
After this I set the values of x_mfft(1) and x_mfft(8:32) to 0 in Matlab, and now the signals look exactly the same, except that in Matlab they are in complex form, while in OpenCV they are separated: real part in one channel, imaginary part in the other.
The problem is that when I perform the inverse transform in Matlab using x_mfilt = ifft(x_mfft), the results are completely different from what I get using OpenCV.
Matlab:
0.0126024108604191 + 0.0100628178150509i 0.00278762121814893 - 0.00615997579216921i 0.0116716145588075 - 0.0150834711251450i 0.0204808089882897 - 0.00937680194210788i 0.0187164132302469 - 0.000843687942567208i 0.0132322795522116 - 0.000108642129381095i 0.0140282455278201 - 0.00325620843335947i 0.0190436542174946 - 0.000556561558544529i 0.0182379867325824 + 0.00764390022568001i 0.00964801276734883 + 0.0107158342431018i 0.00405220362962359 + 0.00339496875258604i 0.0108096973356501 - 0.00476499376334313i 0.0236507440224628 - 0.000415067678294738i 0.0266197220512826 + 0.0154626911663024i 0.0142805873081583 + 0.0267004219364679i 0.000314527358302778 + 0.0215255889620223i 0.00173512964620177 + 0.00865151513638104i 0.0169666351363477 + 0.00836162056544561i 0.0255915540012784 + 0.0277878383595920i 0.0118710562486680 + 0.0506446948330055i -0.0160165379892836 + 0.0553846122152651i -0.0354343989166415 + 0.0406080858067314i -0.0370261047451452 + 0.0261077990289579i -0.0365120038155127 + 0.0268311542287801i -0.0541841640123775 + 0.0312446266697320i -0.0854132555297956 + 0.0125342802025550i -0.0989182320365535 - 0.0377079727602073i -0.0686133217915410 - 0.0925138855355046i -0.00474198249025186 - 0.111728716441247i 0.0515933837210975 - 0.0814138940625859i 0.0663201317560107 - 0.0279433757588921i 0.0426055814586485 + 0.00821080477569232i
OpenCV after cv::dft(x_fft, x_filt, cv::DFT_INVERSE, 0);
Channel 1:
0.322008 -0.197121 -0.482671 -0.300055 -0.026996 -0.003475 -0.104199 -0.017810 0.244606 0.342909 0.108642 -0.152477 -0.013281 0.494806 0.854412 0.688818 0.276848 0.267571 0.889207 1.620622 1.772298 1.299452 0.835450 0.858602 0.999833 0.401098 -1.206658 -2.960446 -3.575316 -2.605239 -0.894184 0.262747
Channel 2:
0.403275 0.089205 0.373494 0.655387 0.598925 0.423432 0.448903 0.609397 0.583616 0.308737 0.129670 0.345907 0.756820 0.851827 0.456976 0.010063 0.055522 0.542928 0.818924 0.379870 -0.512527 -1.133893 -1.184826 -1.168379 -1.733893 -2.733226 -3.165383 -2.195622 -0.151738 1.650990 2.122242 1.363375
What am I missing? Shouldn't the results be similar? How can I check if the inverse transform in opencv is done correctly?
Later EDIT:
After struggling with the problems for a few hours now I've decided to plot the results from Matlab and OpenCV and to my surprise they were very much similar.
[Plot: imaginary parts]
[Plot: real parts]
So obviously it's something about a SCALE factor. After dividing them element by element apparently this factor is 32 - the length of the signal. Can someone explain why this happens?
The obvious solution is to use cv::dft(x_fft, x_filt, cv::DFT_INVERSE+cv::DFT_SCALE, 0); so I guess this topic is answered, but I'm still interested in why it is this way.
There is no standard for scale factor used by all FFT libraries. Some use none, some include a scale factor of 1/N, some 1/sqrt(N). You have to test or look in the documentation for each particular library.
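The factor of N is easy to reproduce with a direct DFT. The following is a plain-Python sketch for illustration only (it is not how OpenCV or Matlab implement their transforms, and the `dft` function with its `inverse` and `scale` flags is hypothetical): without the 1/N factor, a forward transform followed by an inverse one returns the input multiplied by its length.

```python
import cmath

def dft(x, inverse=False, scale=False):
    """Direct O(N^2) DFT; `scale` divides the result by N."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    if scale:
        out = [v / n for v in out]
    return out

x = [1.0, 2.0, 3.0, 4.0]
spectrum = dft(x)
unscaled = dft(spectrum, inverse=True)            # N times the input
scaled = dft(spectrum, inverse=True, scale=True)  # the input itself
```

With a 32-sample signal, the unscaled round trip is 32 times too large, which matches the ratio observed in the question.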

Converting RGB to grayscale/intensity

When converting from RGB to grayscale, it is said that specific weights to channels R, G, and B ought to be applied. These weights are: 0.2989, 0.5870, 0.1140.
It is said that the reason for this is the different human perception of, and sensitivity to, these three colors. Sometimes it is also said that these are the values used to compute the NTSC signal.
However, I didn't find a good reference for this on the web. What is the source of these values?
See also these previous questions: here and here.
The specific numbers in the question are from CCIR 601 (see Wikipedia article).
If you convert RGB -> grayscale with slightly different numbers / different methods,
you won't see much difference at all on a normal computer screen
under normal lighting conditions -- try it.
Here are some more links on color in general:
Wikipedia Luma
Bruce Lindbloom's outstanding web site
chapter 4 on Color in the book by Colin Ware, "Information Visualization", ISBN 1-55860-819-2; this long link to Ware in books.google.com may or may not work
cambridgeincolor: excellent, well-written "tutorials on how to acquire, interpret and process digital photographs using a visually-oriented approach that emphasizes concept over procedure"
Should you run into "linear" vs "nonlinear" RGB,
here's part of an old note to myself on this.
Repeat, in practice you won't see much difference.
### RGB -> ^gamma -> Y -> L*
In color science, the common RGB values, as in html rgb( 10%, 20%, 30% ),
are called "nonlinear" or
Gamma corrected.
"Linear" values are defined as
Rlin = R^gamma, Glin = G^gamma, Blin = B^gamma
where gamma is 2.2 for many PCs.
The usual R G B are sometimes written as R' G' B' (R' = Rlin ^ (1/gamma))
(purists tongue-click) but here I'll drop the '.
Brightness on a CRT display is proportional to RGBlin = RGB ^ gamma,
so 50% gray on a CRT is quite dark: .5 ^ 2.2 = 22% of maximum brightness.
(LCD displays are more complex;
furthermore, some graphics cards compensate for gamma.)
To get the measure of lightness called L* from RGB,
first divide R G B by 255, and compute
Y = .2126 * R^gamma + .7152 * G^gamma + .0722 * B^gamma
This is Y in XYZ color space; it is a measure of color "luminance".
(The real formulas are not exactly x^gamma, but close;
stick with x^gamma for a first pass.)
Finally,
L* = 116 * Y^(1/3) - 16
"... aspires to perceptual uniformity [and] closely matches human perception of lightness." --
Wikipedia Lab color space
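The chain RGB -> ^gamma -> Y -> L* from the note can be sketched as follows. It uses the x^gamma approximation with gamma = 2.2 as above; the function name is hypothetical. One addition not in the note: the CIE definition of L* switches to a linear segment for very dark values (Y at or below (6/29)^3), which keeps black at L* = 0 instead of -16.

```python
def lightness_from_rgb(r8, g8, b8, gamma=2.2):
    """Approximate CIE L* (0..100) from 8-bit RGB using a pure power-law gamma."""
    r, g, b = ((v / 255.0) ** gamma for v in (r8, g8, b8))
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b  # relative luminance, 0..1
    if y > (6 / 29) ** 3:
        return 116 * y ** (1 / 3) - 16
    return (29 / 3) ** 3 * y  # linear segment near black
```

A mid-gray input such as (128, 128, 128) has a luminance of only about 22% of maximum, yet lands a little above L* = 50, which is the point of the perceptual scale.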
I found this publication referenced in an answer to a previous similar question. It is very helpful, and the page has several sample images:
Perceptual Evaluation of Color-to-Grayscale Image Conversions by Martin Čadík, Computer Graphics Forum, Vol 27, 2008
The publication explores several other methods to generate grayscale images with different outcomes:
CIE Y
Color2Gray
Decolorize
Smith08
Rasche05
Bala04
Neumann07
Interestingly, it concludes that there is no universally best conversion method, as each performed better or worse than others depending on input.
Here's some code in C to convert RGB to grayscale.
The usual weighting for RGB to grayscale conversion is approximately 0.3R + 0.59G + 0.11B.
These weights aren't absolutely critical, so you can play with them.
I have made them 0.25R + 0.5G + 0.25B. It produces a slightly darker image.
NOTE: The following code assumes xRGB 32bit pixel format
unsigned int *pntrBWImage=(unsigned int*)..data pointer..; //assumes 4*width*height bytes with 32 bits i.e. 4 bytes per pixel
unsigned int fourBytes;
unsigned char r,g,b;
for (int index=0;index<width*height;index++)
{
    fourBytes=pntrBWImage[index];//caches 4 bytes at a time
    r=(fourBytes>>16);
    g=(fourBytes>>8);
    b=fourBytes;
    I_Out[index] = (r >>2)+ (g>>1) + (b>>2); //This runs in 0.00065s on my pc and produces slightly darker results
    //I_Out[index]=((unsigned int)(r+g+b))/3; //This runs in 0.0011s on my pc and produces a pure average
}
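The shift-based weighting is easy to check in isolation. This is a small Python sketch with hypothetical function names that mirrors only the arithmetic of the loop above, not its pixel unpacking. Each shift truncates separately, so pure white comes out as 253 rather than 255, which is probably one contributor to the slightly darker result:

```python
def gray_shift(r, g, b):
    """Approximate 0.25*R + 0.5*G + 0.25*B with integer shifts, as in the C loop."""
    return (r >> 2) + (g >> 1) + (b >> 2)

def gray_average(r, g, b):
    """Plain integer average of the three channels."""
    return (r + g + b) // 3

print(gray_shift(255, 255, 255))    # 253
print(gray_average(255, 255, 255))  # 255
```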
Check out the Color FAQ for information on this. These values come from the standardization of RGB values that we use in our displays. Actually, according to the Color FAQ, the values you are using are outdated, as they are the values used for the original NTSC standard and not modern monitors.
What is the source of these values?
The "source" of the coefficients posted are the NTSC specifications which can be seen in Rec601 and Characteristics of Television.
The "ultimate source" are the CIE circa 1931 experiments on human color perception. The spectral response of human vision is not uniform. Experiments led to weighting of tristimulus values based on perception. Our L, M, and S cones (1) are sensitive to the light wavelengths we identify as "Red", "Green", and "Blue" (respectively), which is where the tristimulus primary colors are derived. (2)
The linear light (3) spectral weightings for sRGB (and Rec709) are:
Rlin * 0.2126 + Glin * 0.7152 + Blin * 0.0722 = Y
These are specific to the sRGB and Rec709 colorspaces, which are intended to represent computer monitors (sRGB) or HDTV monitors (Rec709), and are detailed in the ITU documents for Rec709 and also BT.2380-2 (10/2018)
FOOTNOTES
(1) Cones are the color detecting cells of the eye's retina.
(2) However, the chosen tristimulus wavelengths are NOT at the "peak" of each cone type. Instead, tristimulus values are chosen such that they stimulate one particular cone type substantially more than another, i.e. separation of stimulus.
(3) You need to linearize your sRGB values before applying the coefficients. I discuss this in another answer here.
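Footnote (3) in code form, as a minimal sketch: the piecewise curve below is the standard sRGB decoding function, the coefficients are the ones quoted above, and the function names are hypothetical.

```python
def srgb_to_linear(c):
    """sRGB-encoded channel in [0, 1] -> linear-light value in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(r8, g8, b8):
    """Y for 8-bit sRGB input: linearize first, then apply the Rec709 weights."""
    r, g, b = (srgb_to_linear(v / 255.0) for v in (r8, g8, b8))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b
```

Applying the weights directly to the encoded 8-bit values gives a different quantity (a luma-style value), so the linearization step is what distinguishes Y from the simpler weighted sums elsewhere on this page.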
Starting a list to enumerate how different software packages do it. Here is a good CVPR paper to read as well.
FreeImage
#define LUMA_REC709(r, g, b) (0.2126F * r + 0.7152F * g + 0.0722F * b)
#define GREY(r, g, b) (BYTE)(LUMA_REC709(r, g, b) + 0.5F)
OpenCV
nVidia Performance Primitives
Intel Performance Primitives
Matlab
nGray = 0.299F * R + 0.587F * G + 0.114F * B;
These values vary from person to person, especially for people who are colorblind.
Is all this really necessary? Human perception and CRT vs. LCD behavior will vary, but the R, G, B intensities do not. Why not L = (R + G + B)/3 and set the new RGB to L, L, L?
