Implementing convolution from scratch in Julia - image-processing

I am trying to implement convolution by hand in Julia. I'm not too familiar with image processing or Julia, so maybe I'm biting off more than I can chew.
Anyway, when I apply this method with a 3*3 edge filter edge = [0 -1 0; -1 4 -1; 0 -1 0] as convolve(img, edge), I am getting an error saying that my values are exceeding the allowed values for the RGBA type.
Code
function convolve(img::Matrix{<:Any}, kernel)
    (half_kernel_w, half_kernel_h) = size(kernel) .÷ 2
    (width, height) = size(img)
    cpy_im = copy(img)
    for row ∈ 1+half_kernel_h:height-half_kernel_h
        for col ∈ 1+half_kernel_w:width-half_kernel_w
            from_row, to_row = row .+ (-half_kernel_h, half_kernel_h)
            from_col, to_col = col .+ (-half_kernel_h, half_kernel_h)
            cpy_im[row, col] = sum((kernel .* RGB.(img[from_row:to_row, from_col:to_col])))
        end
    end
    cpy_im
end
Error (original)
ArgumentError: element type FixedPointNumbers.N0f8 is an 8-bit type representing 256 values from 0.0 to 1.0, but the values (-0.0039215684f0, -0.007843137f0, -0.007843137f0, 1.0f0) do not lie within this range.
See the READMEs for FixedPointNumbers and ColorTypes for more information.
I am able to identify a simple case where such an error may occur (a white pixel surrounded by all black pixels or vice versa). I tried "fixing" this by following the advice from another Stack Overflow question, but I get more errors to the effect of "Math on colors is deliberately undefined in ColorTypes, but see the ColorVectorSpace package."
Code attempting to apply solution from the other SO question
function convolve(img::Matrix{<:Any}, kernel)
    (half_kernel_w, half_kernel_h) = size(kernel) .÷ 2
    (width, height) = size(img)
    cpy_im = copy(img)
    for row ∈ 1+half_kernel_h:height-half_kernel_h
        for col ∈ 1+half_kernel_w:width-half_kernel_w
            from_row, to_row = row .+ [-half_kernel_h, half_kernel_h]
            from_col, to_col = col .+ [-half_kernel_h, half_kernel_h]
            cpy_im[row, col] = sum((kernel .* RGB.(img[from_row:to_row, from_col:to_col] ./ 2 .+ 128)))
        end
    end
    cpy_im
end
Corresponding error
MethodError: no method matching +(::ColorTypes.RGBA{Float32}, ::Int64)
Math on colors is deliberately undefined in ColorTypes, but see the ColorVectorSpace package.
Closest candidates are:
+(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:591
+(!Matched::T, ::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8} at int.jl:87
+(!Matched::ChainRulesCore.AbstractThunk, ::Any) at ~/.julia/packages/ChainRulesCore/a4mIA/src/tangent_arithmetic.jl:122
Now, I can try using convert etc., but when I look at the big picture, I start to wonder what the idiomatic way of solving this problem in Julia is. And that is my question. If you had to implement convolution by hand from scratch, what would be a good way to do so?
EDIT:
Here is an implementation that works, though it may not be idiomatic
function convolve(img::Matrix{<:Any}, kernel)
    (half_kernel_h, half_kernel_w) = size(kernel) .÷ 2
    (height, width) = size(img)
    cpy_im = copy(img)
    # println(Dict("width" => width, "height" => height, "half_kernel_w" => half_kernel_w, "half_kernel_h" => half_kernel_h, "row range" => 1+half_kernel_h:(height-half_kernel_h), "col range" => 1+half_kernel_w:(width-half_kernel_w)))
    for row ∈ 1+half_kernel_h:(height-half_kernel_h)
        for col ∈ 1+half_kernel_w:(width-half_kernel_w)
            from_row, to_row = row .+ (-half_kernel_h, half_kernel_h)
            from_col, to_col = col .+ (-half_kernel_w, half_kernel_w)
            vals = Dict()
            for method ∈ [red, green, blue, alpha]
                x = sum((kernel .* method.(img[from_row:to_row, from_col:to_col])))
                if x > 1
                    x = 1
                elseif x < 0
                    x = 0
                end
                vals[method] = x
            end
            cpy_im[row, col] = RGBA(vals[red], vals[green], vals[blue], vals[alpha])
        end
    end
    cpy_im
end

First of all, the error
Math on colors is deliberately undefined in ColorTypes, but see the ColorVectorSpace package.
should direct you to read the docs of the ColorVectorSpace package, where you will learn that using ColorVectorSpace enables math on RGB types. (The absence of default support is deliberate, because the way the image-processing community treats RGB is colorimetrically wrong. But everyone has agreed not to care, hence the ColorVectorSpace package.)
Second,
ArgumentError: element type FixedPointNumbers.N0f8 is an 8-bit type representing 256 values from 0.0 to 1.0, but the values (-0.0039215684f0, -0.007843137f0, -0.007843137f0, 1.0f0) do not lie within this range.
indicates that you're trying to write negative entries with an element type, N0f8, that can't support such values. Instead of cpy_im = copy(img), consider something like cpy_im = [float(c) for c in img] which will guarantee a floating-point representation that can support negative values.
Third, I would recommend avoiding steps like RGB.(img...) when nothing about your function otherwise addresses whether images are numeric, grayscale, or color. Fundamentally the only operations you need are scalar multiplication and addition, and it's better to write your algorithm generically leveraging only those two properties.
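To make the second and third points concrete, here is a minimal sketch in plain Python rather than Julia (my choice, to keep it dependency-free). The names convolve and clamp01 are hypothetical, and the image is assumed to be a single channel stored as nested lists of numbers. Like the code in the question, it slides the kernel without flipping it, which gives the same result for a symmetric kernel such as the edge filter. The output is a float copy, so negative sums are stored as they are, and clamping is left to the moment the result is displayed.

```python
def convolve(img, kernel):
    """Sliding-window weighted sum over the interior; border pixels are kept."""
    kh, kw = len(kernel), len(kernel[0])
    hh, hw = kh // 2, kw // 2                       # half kernel height / width
    h, w = len(img), len(img[0])
    out = [[float(v) for v in row] for row in img]  # float copy: can hold negatives
    for r in range(hh, h - hh):
        for c in range(hw, w - hw):
            # Only scalar multiplication and addition are needed here.
            out[r][c] = sum(
                kernel[i][j] * img[r - hh + i][c - hw + j]
                for i in range(kh)
                for j in range(kw)
            )
    return out


def clamp01(x):
    """Clamp a value to [0, 1], e.g. just before converting back for display."""
    return min(1.0, max(0.0, x))


edge = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]
# A black pixel surrounded by white: the filtered centre value is negative.
img = [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
result = convolve(img, edge)
```

Here result[1][1] is -4.0, which is exactly the kind of value that an N0f8-backed pixel cannot store.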

Tim Holy's answer above is correct - keep things simple and avoid relying on third-party packages when you don't need to.
I might point out that another option you may not have considered is to use a different algorithm. What you are implementing is the naive method, whereas many convolution routines use different algorithms for different sizes, such as im2col and Winograd (you can look these two up; I have a website that covers the idea behind both here).
The im2col routine might be worth doing, as you can break it into several pieces:
Unroll every region of the image that the filter/kernel will be dot-multiplied with, and stack the unrolled regions together into a single matrix.
Do a matrix-multiply with the unrolled input and filter/kernel.
Roll the output back into the correct shape.
It might be more complicated overall, but each part is simpler, so you may find this easier to do. A matrix multiply routine is definitely quite easy to implement. For 1x1 (single-pixel) convolutions where the image and filter have the same ordering (i.e. NCHW images and FCHW filter) the first and last steps are trivial as essentially no rolling/unrolling is necessary.
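The three steps can be sketched roughly as follows, in plain Python with nested lists. This is only an illustration for a single-channel image with no padding and stride 1; the function names im2col and conv_im2col are my own and do not come from any particular library.

```python
def im2col(img, kh, kw):
    """Step 1: unroll every kh x kw region into one row of a matrix."""
    h, w = len(img), len(img[0])
    rows = []
    for r in range(h - kh + 1):
        for c in range(w - kw + 1):
            rows.append([img[r + i][c + j] for i in range(kh) for j in range(kw)])
    return rows


def conv_im2col(img, kernel):
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    patches = im2col(img, kh, kw)
    # Step 2: matrix multiply (here a matrix-vector product, one filter only).
    flat_out = [sum(a * b for a, b in zip(p, flat_kernel)) for p in patches]
    # Step 3: roll the flat result back into a 2D output.
    out_w = len(img[0]) - kw + 1
    return [flat_out[i:i + out_w] for i in range(0, len(flat_out), out_w)]


edge = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]
flat = [[1, 1, 1, 1] for _ in range(4)]  # constant image
print(conv_im2col(flat, edge))           # a 2 x 2 output of zeros
```

With several filters, each flattened filter becomes one column of a second matrix and step 2 turns into a full matrix-matrix product.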
A final word of advice: start simple and add the code to handle edge cases afterwards, because convolutions are definitely fiddly to work with.
Hope this helps!

Related

Difference between absdiff and normal subtraction in OpenCV

I am currently planning on training a binary image classification model. The images I want to train on are the difference between two original pictures. In other words, for each data entry, I start out with 2 pictures, take their difference, and then label that difference as a 0 or 1. My question is what is the best way to find this difference. I know about cv2.absdiff and about normal subtraction of images; what is the most effective way to go about this?
About the data: The images I'm training on are screenshots that usually are the same but may have small differences. I found that normal subtraction seems to show the differences less than absdiff.
This is the code I use for absdiff:
diff = cv2.absdiff(img1, img2)
mask = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
th = 1
imask = mask>1
canvas = np.zeros_like(img2, np.uint8)
canvas[imask] = img2[imask]
And then this for normal subtraction:
def extract_diff(self, imageA, imageB, image_name, path):
    subtract = imageB.astype(np.float32) - imageA.astype(np.float32)
    mask = cv2.inRange(np.abs(subtract), (30, 30, 30), (255, 255, 255))
    th = 1
    imask = mask > 1
    canvas = np.zeros_like(imageA, np.uint8)
    canvas[imask] = imageA[imask]
Thanks!
A difference can be negative or positive.
For some number types, such as uint8 (unsigned 8-bit int), which can't be negative (have no sign), a negative value wraps around and the value would make no sense anymore. Other types can be signed (e.g. floats, signed ints), so a negative value can be represented correctly.
That's why cv.absdiff exists. It always gives you absolute differences, and those are okay to represent in an unsigned type.
Example with numbers: a = 4, b = 6. a-b should be -2, right?
That value, as a uint8, will wrap around to become 0xFE, or 254 in decimal. The 254 value has some relation to the true -2 difference, but it also incorporates the range of values of the data type (8 bits: 256 values), so it's really just "code".
cv.absdiff would give you the absolute of the difference (-2), which is 2.
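The arithmetic in this example can be checked without any image library. The sketch below models 8-bit wraparound with plain Python integers; the helper names are made up for illustration. It mirrors what the `-` operator does on NumPy uint8 arrays. Note that OpenCV's own cv2.subtract saturates at 0 instead of wrapping, which also loses the sign.

```python
def wrapped_sub_uint8(a, b):
    """Subtraction modulo 2**8, as with the minus operator on uint8 arrays."""
    return (a - b) % 256


def absdiff_uint8(a, b):
    """Absolute difference; always fits in 0..255 for 8-bit inputs."""
    return abs(a - b)


a, b = 4, 6
print(hex(wrapped_sub_uint8(a, b)))  # 0xfe, i.e. 254
print(absdiff_uint8(a, b))           # 2
```

Converting to a signed or floating-point type before subtracting, as the second snippet in the question does with float32, keeps the true value of -2.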

Histogram calculation in julia-lang

According to the Julia documentation:
hist(v[, n]) → e, counts
Compute the histogram of v, optionally using approximately n bins. The return values are a range e, which correspond to the edges of the bins, and counts containing the number of elements of v in each bin. Note: Julia does not ignore NaN values in the computation.
I chose a sample range of data
testdata=0:1:10;
and then used the hist function to calculate histograms with 1 to 5 bins
hist(testdata,1) # => (-10.0:10.0:10.0,[1,10])
hist(testdata,2) # => (-5.0:5.0:10.0,[1,5,5])
hist(testdata,3) # => (-5.0:5.0:10.0,[1,5,5])
hist(testdata,4) # => (-5.0:5.0:10.0,[1,5,5])
hist(testdata,5) # => (-2.0:2.0:10.0,[1,2,2,2,2,2])
As you can see, when I ask for 1 bin it calculates 2 bins, and when I ask for 2 bins it calculates 3.
Why does this happen?
As the person who wrote the underlying function: the aim is to get bin widths that are "nice" in terms of a base-10 counting system (i.e. 10^k, 2×10^k, 5×10^k). If you want more control you can also specify the exact bin edges.
The key word in the doc is approximate. You can check what hist is actually doing for yourself in Julia's base module here.
When you do hist(testdata,3), you're actually calling
hist(v::AbstractVector, n::Integer) = hist(v,histrange(v,n))
That is, in a first step the n argument is converted into a FloatRange by the histrange function, the code of which can be found here. As you can see, the calculation of these steps is not entirely straightforward, so you should play around with this function a bit to figure out how it is constructing the range that forms the basis of the histogram.
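As a rough illustration of the "nice width" idea, the following Python sketch reproduces the bin edges quoted in the question for testdata = 0:1:10. It is not the Base source: the function name nice_edges and the exact rounding rules are assumptions, chosen only so that the output matches the five results shown above.

```python
import math


def nice_edges(lo, hi, n):
    """Bin edges whose step is 1, 2, 5 or 10 times a power of ten."""
    if hi == lo:
        step = 1
    else:
        raw_width = (hi - lo) / n
        power = 10 ** max(0, math.floor(math.log10(raw_width)))
        ratio = raw_width / power
        if ratio <= 1:
            step = power
        elif ratio <= 2:
            step = 2 * power
        elif ratio <= 5:
            step = 5 * power
        else:
            step = 10 * power
    # The first edge lies strictly below the minimum, hence the extra bin.
    start = step * (math.ceil(lo / step) - 1)
    nbins = math.ceil((hi - start) / step)
    return [start + k * step for k in range(nbins + 1)]


for n in range(1, 6):
    print(n, nice_edges(0, 10, n))
```

For n = 1 this gives the edges -10, 0, 10, so two bins. The counts [1,10] in the question are consistent with bins that include their right edge: the value 0 lands in the first bin and the other ten values in the second.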

OpenCV: Essential Matrix Decomposition

I am trying to extract the rotation matrix and translation vector from the essential matrix.
SVD svd(E,SVD::MODIFY_A);
Mat svd_u = svd.u;
Mat svd_vt = svd.vt;
Mat svd_w = svd.w;
Matx33d W(0,-1,0,
          1,0,0,
          0,0,1);
Mat_<double> R = svd_u * Mat(W).t() * svd_vt; //or svd_u * Mat(W) * svd_vt;
Mat_<double> t = svd_u.col(2); //or -svd_u.col(2)
However, when I use R and T (e.g. to obtain rectified images), the result does not seem to be right (black images or some obviously wrong outputs), even though I tried the different possible combinations of R and T.
I suspect E. According to the textbooks, my calculation is right if we have:
E = U*diag(1, 1, 0)*Vt
In my case svd.w, which is supposed to be diag(1, 1, 0) (at least up to scale), is not. Here is an example of my output:
svd.w = [21.47903827647813; 20.28555196246256; 5.167099204708699e-010]
Also, two of the eigenvalues of E should be equal and the third one should be zero. In the same case the result is:
eigenvalues of E = 0.0000 + 0.0000i, 0.3143 +20.8610i, 0.3143 -20.8610i
As you see, two of them are complex conjugates.
Now, the questions are:
Is the decomposition of E and the calculation of R and T done correctly?
If the calculation is right, why do the results not satisfy the internal constraints of the essential matrix?
If everything about E, R, and T is fine, why are the rectified images obtained from them not correct?
I get E from fundamental matrix, which I suppose to be right. I draw epipolar lines on both the left and right images and they all pass through the related points (for all the 16 points used to calculate the fundamental matrix).
Any help would be appreciated.
Thanks!
I see two issues.
First, discounting the negligible value of the third diagonal term, your E is about 6% off the ideal one: err_percent = (21.48 - 20.29) / 20.29 * 100 . Sounds small, but translated in terms of pixel error it may be an altogether larger amount.
So I'd start by replacing E with the ideal one after SVD decomposition: Er = U * diag(1,1,0) * Vt.
Second, the textbook decomposition admits 4 solutions, only one of which is physically plausible (i.e. with 3D points in front of the camera). You may be hitting one of the non-physical ones. See http://en.wikipedia.org/wiki/Essential_matrix#Determining_R_and_t_from_E .
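Written out, the two points look like this. This is the standard textbook form, using the same W as in the question and writing u_3 for the last column of U. The remark about the determinant is a common convention that I am adding, not something taken from the question.

```latex
\begin{aligned}
E   &= U\,\operatorname{diag}(\sigma_1,\sigma_2,\sigma_3)\,V^{\top},
\qquad
E_r = U\,\operatorname{diag}(1,1,0)\,V^{\top},\\[4pt]
W   &= \begin{pmatrix} 0 & -1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{pmatrix},\\[4pt]
R_1 &= U\,W\,V^{\top}, \qquad R_2 = U\,W^{\top}V^{\top}, \qquad t = \pm u_3,\\[4pt]
(R,t) &\in \bigl\{(R_1,\,u_3),\ (R_1,\,-u_3),\ (R_2,\,u_3),\ (R_2,\,-u_3)\bigr\}.
\end{aligned}
```

If det(R_i) comes out as -1, negate R_i so that it is a proper rotation. To pick among the four candidates, triangulate one matched point with each pair and keep the pair for which the point has positive depth in both cameras.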

Need a specific example of U-Matrix in Self Organizing Map

I'm trying to develop an application that uses a SOM to analyze data. However, after finishing training, I cannot find a way to visualize the result. I know that the U-Matrix is one of the methods, but I cannot understand it properly. Hence, I'm asking for a specific and detailed example of how to construct a U-Matrix.
I also read an answer at U-matrix and self organizing maps, but it only refers to a 1-row map. What about a 3x3 map? I know that for a 3x3 map:
m(1) m(2) m(3)
m(4) m(5) m(6)
m(7) m(8) m(9)
a 5x5 matrix must be created:
u(1) u(1,2) u(2) u(2,3) u(3)
u(1,4) u(1,2,4,5) u(2,5) u(2,3,5,6) u(3,6)
u(4) u(4,5) u(5) u(5,6) u(6)
u(4,7) u(4,5,7,8) u(5,8) u(5,6,8,9) u(6,9)
u(7) u(7,8) u(8) u(8,9) u(9)
but I don't know how to calculate the u-weights u(1,2,4,5), u(2,3,5,6), u(4,5,7,8) and u(5,6,8,9).
Finally, after constructing U-Matrix, is there any way to visualize it using color, e.g. heat map?
Thank you very much for your time.
Cheers
I don't know if you are still interested in this, but I found this link
http://www.uni-marburg.de/fb12/datenbionik/pdf/pubs/1990/UltschSiemon90
which explains very specifically how to calculate the U-matrix.
Hope it helps.
By the way, the site where I found the link has several resources on SOMs. I leave it here in case anyone is interested:
http://www.ifs.tuwien.ac.at/dm/somtoolbox/visualisations.html
The essential idea of a Kohonen map is that the data points are mapped to a
lattice, which is often a 2D rectangular grid.
In the simplest implementations, the lattice is initialized by creating a 3D
array with these dimensions:
width * height * number_features
This is the U-matrix.
Width and height are chosen by the user; number_features is just the number
of features (columns or fields) in your data.
Intuitively this is just creating a 2D grid of dimensions w * h
(e.g., if w = 10 and h = 10 then your lattice has 100 cells), then
into each cell, placing a random 1D array (sometimes called "reference tuples")
whose size and values are constrained by your data.
The reference tuples are also referred to as weights.
How is the U-matrix rendered?
In my example below, the data consists of RGB tuples, so the reference tuples
have a length of three (and each of the three values must lie between 0 and 255).
It is with this 3D array (the "lattice") that you begin the main iterative loop.
The algorithm iteratively positions each data point so that it is closest to others similar to it.
If you plot it over time (iteration number) then you can visualize cluster
formation.
The plotting tool I use for this is the brilliant Python library, Matplotlib,
which plots the lattice directly, just by passing it into the imshow function.
Below are eight snapshots of the progress of a SOM algorithm, from initialization to 700 iterations. The newly initialized (iteration_count = 0) lattice is rendered in the top left panel; the result from the final iteration, in the bottom right panel.
Alternatively, you can use a lower-level imaging library (in Python, e.g., PIL) and transfer the reference tuples onto the 2D grid, one at a time:
for y in range(h):
    for x in range(w):
        img.putpixel((x, y), (
            SOM.Umatrix[y, x, 0],
            SOM.Umatrix[y, x, 1],
            SOM.Umatrix[y, x, 2])
        )
Here img is an instance of PIL's Image class. The image is created by iterating over the grid one pixel at a time; for each pixel, putpixel is called once on img with a three-element tuple, whose entries are the three values of the RGB reference tuple stored in that cell.
From the matrix that you create:
u(1) u(1,2) u(2) u(2,3) u(3)
u(1,4) u(1,2,4,5) u(2,5) u(2,3,5,6) u(3,6)
u(4) u(4,5) u(5) u(5,6) u(6)
u(4,7) u(4,5,7,8) u(5,8) u(5,6,8,9) u(6,9)
u(7) u(7,8) u(8) u(8,9) u(9)
The elements with a single number, like u(1), u(2), ..., u(9), as well as the elements with more than two numbers, like u(1,2,4,5), u(2,3,5,6), ..., u(5,6,8,9), are calculated using something like the mean, median, min or max of the values in their neighborhood.
It is a good idea to calculate the elements with two numbers first. One possible implementation is:
for i in range(self.h_u_matrix):
    for j in range(self.w_u_matrix):
        nb = (0, 0)
        if not (i % 2) and (j % 2):
            nb = (0, 1)
        elif (i % 2) and not (j % 2):
            nb = (1, 0)
        self.u_matrix[(i, j)] = np.linalg.norm(
            self.weights[i // 2, j // 2] - self.weights[i // 2 + nb[0], j // 2 + nb[1]],
            axis=0
        )
In the code above, self.h_u_matrix = self.weights.shape[0]*2 - 1 and self.w_u_matrix = self.weights.shape[1]*2 - 1 are the dimensions of the U-Matrix. To calculate the other elements, it is necessary to collect a list of their neighbours and apply, for example, a mean. The following code implements that idea:
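for i in range(self.h_u_matrix):
    for j in range(self.w_u_matrix):
        if not (i % 2) and not (j % 2):
            # Node cell such as u(1): average the adjacent distance cells,
            # skipping neighbours that fall outside the U-Matrix.
            nodelist = []
            if i > 0:
                nodelist.append((i - 1, j))
            if i < self.h_u_matrix - 1:
                nodelist.append((i + 1, j))
            if j > 0:
                nodelist.append((i, j - 1))
            if j < self.w_u_matrix - 1:
                nodelist.append((i, j + 1))
            meanlist = [self.u_matrix[u_node] for u_node in nodelist]
            self.u_matrix[(i, j)] = np.mean(meanlist)
        elif (i % 2) and (j % 2):
            # Diagonal cell such as u(1,2,4,5): always has four neighbours.
            nodelist = [
                (i - 1, j),
                (i + 1, j),
                (i, j - 1),
                (i, j + 1)]
            # Average the U-Matrix values at those positions, not the indices.
            meanlist = [self.u_matrix[u_node] for u_node in nodelist]
            self.u_matrix[(i, j)] = np.mean(meanlist)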
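For a version that runs on its own, the two passes can be combined into one short function. This is only a sketch under stated assumptions: the weights are given as a nested list of shape h x w x features instead of a NumPy array, the function name u_matrix is made up, and the mean is used for every cell that is not a direct distance (the median, min or max would work the same way).

```python
import math


def u_matrix(weights):
    """Build the (2h-1) x (2w-1) U-Matrix for an h x w grid of weight vectors."""
    h, w = len(weights), len(weights[0])
    H, W = 2 * h - 1, 2 * w - 1
    u = [[0.0] * W for _ in range(H)]

    # Pass 1: cells between two adjacent nodes hold their Euclidean distance.
    for i in range(H):
        for j in range(W):
            if i % 2 == 0 and j % 2 == 1:    # e.g. u(1,2): horizontal pair
                u[i][j] = math.dist(weights[i // 2][j // 2],
                                    weights[i // 2][j // 2 + 1])
            elif i % 2 == 1 and j % 2 == 0:  # e.g. u(1,4): vertical pair
                u[i][j] = math.dist(weights[i // 2][j // 2],
                                    weights[i // 2 + 1][j // 2])

    # Pass 2: node cells such as u(1) and diagonal cells such as u(1,2,4,5)
    # take the mean of the adjacent distance cells that exist.
    for i in range(H):
        for j in range(W):
            if i % 2 == j % 2:
                around = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
                vals = [u[a][b] for a, b in around if 0 <= a < H and 0 <= b < W]
                u[i][j] = sum(vals) / len(vals)
    return u


# 3 x 3 map with one feature per node; all three rows are identical.
grid = [[[0.0], [1.0], [3.0]] for _ in range(3)]
um = u_matrix(grid)
```

The result is a plain 5 x 5 grid of scalars, so it can be handed to any image-plotting routine with a colormap (for example Matplotlib's imshow, mentioned in the other answer) to obtain a heat map.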

Understanding a passage in the Paper about VGGNet

I don't understand a passage in the article about the VGGNet. Maybe someone can help.
In my opinion, the number of weights in a convolutional layer is
p=w*h*d*n+n
where w is the width of the filters, h the height of the filters, d the depth of the filters and n the number of filters.
In the article the following is written:
assuming that both the input and the output of a three-layer 3 × 3 convolution stack has C channels, the stack is parametrised by 3*(3^2*C^2) = 27C^2
weights; at the same time, a single 7 × 7 conv. layer would require 7^2*C^2 = 49C^2 parameters.
I do not understand, what is meant by channels here, and why this formula is used.
Can someone explain this to me?
Thanks in advance.
Your intuition is correct; we just need to unpack their explanation a bit. For the first case:
w = 3 # filter width
h = 3 # filter height
d = C # filter depth (number of channels is same as number of input filters; eg RGB is C=3)
n = C # number of output filters/channels
This then makes w*h*d*n = 9C^2 parameters. They also say there are three of these stacked, so that's 27C^2.
For a single 7x7 layer the calculation is the same: 7*7*C*C*1 = 49C^2.
The final difference is that you add n once more at the end in your original post; that is the bias terms, which in VGG they skip (many people skip bias terms; their value is debatable in some settings).
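The two counts are easy to verify numerically. The snippet below is a small sketch using the formula from the question, with the bias term made optional. The function name conv_params and the choice C = 64 are only for illustration.

```python
def conv_params(w, h, d, n, bias=False):
    """Weights in one conv layer: w*h*d per filter, n filters, optional biases."""
    return w * h * d * n + (n if bias else 0)


C = 64  # example channel count; any positive value shows the same ratio

stack_of_three_3x3 = 3 * conv_params(3, 3, C, C)  # 27 * C**2
single_7x7 = conv_params(7, 7, C, C)              # 49 * C**2

print(stack_of_three_3x3, single_7x7)  # 110592 200704
```

So the stack of three 3x3 layers needs 27/49, or about 55%, of the weights of the single 7x7 layer.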
