Why is Local Binary Pattern affected by scaling? - opencv

I found that the local binary pattern in scikit-image is affected by re-scaling the image, but I was not expecting this. Since the LBP just involves greater/less than comparisons between nearby pixels, I thought a linear transformation would not affect things.
import numpy as np
import skimage
from skimage.feature import local_binary_pattern
im = skimage.data.cell()
out_im = local_binary_pattern(im, 16, 2, method='uniform')
out_im = out_im[20:-20, 20:-20] # remove edges
counts,bins = np.histogram(out_im)
print('counts=',counts)
print('bins=',bins)
This gives me
counts= [ 896 2743 9555 14928 108466 94041 28682 14703 8728 33458]
bins= [ 0. 1.7 3.4 5.1 6.8 8.5 10.2 11.9 13.6 15.3 17. ]
But if I normalize the image:
im = (im-im.min())/(im.max()-im.min())
Then I get:
counts= [ 937 2716 9504 16263 109302 92114 27753 14248 8667 34696]
bins= [ 0. 1.7 3.4 5.1 6.8 8.5 10.2 11.9 13.6 15.3 17. ]
Can someone explain why? The original image has values between 0 and 255.

I found that there is no difference when there are only 4 points in the kernel, which correspond to up/down/left/right. This suggests that the difference arises due to the floating point interpolation when the points are located at non-integer coordinates. I'm guessing that the floating point calculations occasionally mess up the greater/equal comparisons, leading to slightly different histograms.
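The 4-point observation can be checked without scikit-image. The following is a minimal sketch, not the library's implementation: lbp4 is a hypothetical helper that compares each interior pixel with its up/right/down/left neighbours only, so no interpolation is involved, and the random 32x32 test image is an assumption made for illustration.

```python
import random

def lbp4(img):
    """4-neighbour LBP codes for interior pixels (integer offsets, no interpolation)."""
    h, w = len(img), len(img[0])
    offsets = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left
    codes = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # same kind of greater-or-equal test that LBP uses
                if img[r + dr][c + dc] >= img[r][c]:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes

rng = random.Random(0)
raw = [[rng.randint(0, 255) for _ in range(32)] for _ in range(32)]

# min-max normalisation, as in the question
lo = min(min(row) for row in raw)
hi = max(max(row) for row in raw)
norm = [[(v - lo) / (hi - lo) for v in row] for row in raw]

same = lbp4(raw) == lbp4(norm)
print(same)
```

Because the normalisation is monotonic and the samples sit exactly on pixels, every comparison keeps its outcome. With 8 or 16 sample points some samples fall between pixels and are interpolated as weighted sums; those sums are rounded, so a tie or near-tie with the centre value may resolve differently after rescaling, which is consistent with the explanation above.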

Related

K-NN: training MSE with K=1 not equal to 0

In theory, the training MSE for k = 1 should be zero. However, the following script shows otherwise. I first generate some toy data: x represents sleeping hours and y represents happiness. Then I train the data and predict the outcome. Finally, I calculate the MSE for the training data via two methods. Can anyone tell me what goes wrong?
from sklearn.neighbors import KNeighborsRegressor
model = KNeighborsRegressor(n_neighbors=1)
import numpy as np
x = np.array([7,8,6,7,5.7,6.8,8.6,6.5,7.8,5.7,9.8,7.7,8.8,6.2,7.1,5.7]).reshape(16,1)
y = np.array([5,7,4,5,6,9,7,6.8,8,7.6,9.3,8.2,7,6.2,3.8,6]).reshape(16,1)
model = model.fit(x,y)
for hours_slept in range(1,11):
    happiness = model.predict([[hours_slept]])
    print("if you sleep %.0f hours, you will be %.1f happy!" %(hours_slept, happiness))
# calculate MSE
# fast method
def model_mse(model,x,y):
    predictions = model.predict(x)
    return np.mean(np.power(y-predictions,2))
print(model_mse(model,x,y))
The output:
if you sleep 1 hours, you will be 6.0 happy!
if you sleep 2 hours, you will be 6.0 happy!
if you sleep 3 hours, you will be 6.0 happy!
if you sleep 4 hours, you will be 6.0 happy!
if you sleep 5 hours, you will be 6.0 happy!
if you sleep 6 hours, you will be 4.0 happy!
if you sleep 7 hours, you will be 5.0 happy!
if you sleep 8 hours, you will be 7.0 happy!
if you sleep 9 hours, you will be 7.0 happy!
if you sleep 10 hours, you will be 9.3 happy!
0.15999999999999992 #strictly larger than 0!
In your data, the value x = 5.7 appears with more than one label in y: 6 and 7.6. After training, the algorithm returns the label 6 for x = 5.7, so during evaluation, when it encounters 5.7 for the second time (true label 7.6), it returns 6 rather than 7.6. The squared error of this pair is (7.6 - 6)**2 = 2.56, and the mean squared error, given that all other errors are 0, is 1/16 * 2.56 = 0.16, which is exactly your result.
In theory, the training MSE for k = 1 should be zero
An implicit assumption here is that there are no duplicate samples x or, to be precise, that identical features x have identical values y. Is that the case here? Let's see:
pred = model.predict(x)
np.where(pred!=y)[0]
# array([9])
So, there is a single value where y and pred are indeed different:
y[9]
# array([7.6])
pred[9]
# array([6.])
where
x[9]
# array([5.7])
How many samples x have a value of 5.7, and what are the corresponding y's?
ind = np.where(x==5.7)[0]
ind
# array([ 4, 9, 15])
y[ind]
# result:
array([[6. ],
[7.6],
[6. ]])
pred[ind]
# result
array([[6.],
[6.],
[6.]])
So, what is actually happening here is that for x=5.7 the algorithm unsurprisingly cannot decide unambiguously which exact sample is the single closest neighbor: the one with y=6 or the one with y=7.6. Here it has chosen one that does not coincide with the true y, leading to a non-zero MSE.
I guess that digging into the knn source code one would be able to justify exactly how such cases are handled internally, but I'm leaving this as an exercise.
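The effect can be reproduced without scikit-learn. Below is a minimal sketch of a 1-nearest-neighbour predictor in plain Python; predict_1nn is a hypothetical helper, and its tie-breaking rule (the first training sample at minimal distance wins) is an assumption made for illustration, not a statement about how scikit-learn resolves ties internally. It also lists every x value that carries conflicting labels.

```python
from collections import defaultdict

x = [7, 8, 6, 7, 5.7, 6.8, 8.6, 6.5, 7.8, 5.7, 9.8, 7.7, 8.8, 6.2, 7.1, 5.7]
y = [5, 7, 4, 5, 6, 9, 7, 6.8, 8, 7.6, 9.3, 8.2, 7, 6.2, 3.8, 6]

def predict_1nn(query):
    # min() returns the first index with the smallest distance,
    # so among exact duplicates the earliest sample wins (assumed rule)
    best = min(range(len(x)), key=lambda i: abs(x[i] - query))
    return y[best]

predictions = [predict_1nn(v) for v in x]
mse = sum((t - p) ** 2 for t, p in zip(y, predictions)) / len(y)

# group the labels by feature value to find duplicates that disagree
labels_by_x = defaultdict(set)
for xi, yi in zip(x, y):
    labels_by_x[xi].add(yi)
conflicts = {xi: sorted(ys) for xi, ys in labels_by_x.items() if len(ys) > 1}

print(mse)
print(conflicts)
```

Running a check like the conflicts dictionary before training is a quick way to tell whether a zero training error is achievable at all for k = 1.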

How is Spark reading my image using the image format?

It might be a silly question, but I can't figure out how Spark reads my image when I use the spark.read.format("image").load(....) call.
After importing my image, I get the following:
>>> image_df.select("image.height","image.width","image.nChannels", "image.mode", "image.data").show()
+------+-----+---------+----+--------------------+
|height|width|nChannels|mode| data|
+------+-----+---------+----+--------------------+
| 430| 470| 3| 16|[4D 55 4E 4C 54 4...|
+------+-----+---------+----+--------------------+
I conclude that:
my image is 430x470 pixels (height x width),
my image is in colour (RGB, since nChannels = 3), which is an OpenCV-compatible type,
my image mode is 16, which corresponds to a particular OpenCV byte order.
Does anyone know which website/documentation I could browse to learn more about it?
The data in the data column is of type Binary, but
when I run image_df.select("image.data").take(1), I get output that seems to be only one array (see below).
>>> image_df.select("image.data").take(1)
# 1/ Here are the last elements of the result
....<<One Eternity Later>>....x92\x89\x8a\x8d\x84\x86\x89\x80\x84\x87~'))]
# 2/ I got also several part of the result which looks like:
.....\x89\x80\x80\x83z|\x7fvz}tpsjqtkrulsvmsvmsvmrulrulrulqtkpsjnqhnqhmpgmpgmpgnqhnqhn
qhnqhnqhnqhnqhnqhmpgmpgmpgmpgmpgmpgmpgmpgnqhnqhnqhnqhnqhnqhnqhnqhknejmdilcilchkbh
kbilcilckneloflofmpgnqhorioripsjsvmsvmtwnvypx{ry|sz}t{~ux{ry|sy|sy|sy|sz}tz}tz}tz}
ty|sy|sy|sy|sz}t{~u|\x7fv|\x7fv}.....
The following questions relate to the results displayed above. They might be due to my lack of knowledge of OpenCV (or something else). Nonetheless:
1/ If I have an RGB image, I would expect 3 matrices, but the output ends with .......\x84\x87~'))]. I was expecting something more like [(...),(...),(...\x87~')].
2/ Does this part have a special meaning? For example, is it a separator between the matrices or something similar?
To be clearer about what I'm trying to achieve: I want to process images and compare pixels between images. Therefore, I want to know the pixel values at a given position in my image (I assume that for an RGB image I should have 3 values per position).
Example: let's say that I have a webcam pointing at the sky during the day only, and I want to know the values of a pixel at a position in the top-left part of the sky. Suppose I find that the concatenation of those values gives the colour Light Blue, which tells me the photo was taken on a sunny day. Let's say that Light Blue is the only colour a sunny day can produce.
Next, I want to compare that concatenation with the concatenation of pixel values at the exact same position in a picture taken the next day. If they are not equal, I conclude that the picture was taken on a cloudy/rainy day. If they are equal, it was a sunny day.
Any help would be highly appreciated. I have simplified my example to make it easier to understand, but my goal is much the same. I know that ML models exist for this kind of thing, but I would be happy to try this first. My first goal is to split this column into 3 columns, one per colour channel: a red matrix, a green matrix, and a blue matrix.
I think I have the logic. I used the keras.preprocessing.image.img_to_array() function to understand how the values are laid out (since I have an RGB image, I should have 3 matrices: one for each of R, G, and B). I'm posting this in case someone wonders how it works; I might be wrong, but I think I have something:
from keras.preprocessing import image
import numpy as np
from PIL import Image
# Using spark built-in data source
first_img = spark.read.format("image").schema(imageSchema).load(".....")
raw = first_img.select("image.data").take(1)[0][0]
np.shape(raw)
(606300,) # which is 470*430*3
# Using keras function
img = image.load_img(".../path/to/img")
yy = image.img_to_array(img)
>>> np.shape(yy)
(430, 470, 3) # the form is good but I have a problem of order since:
>>> raw[0], raw[1], raw[2]
(77, 85, 78)
>>> yy[0][0]
array([78., 85., 77.], dtype=float32)
# Therefore I used the numpy reshape function directly on raw
# to get 430 matrices (one per image row) of 470 rows and 3 columns:
array = np.reshape(raw, (430,470,3))
xx = image.img_to_array(array) # OPTIONAL and not used here
>>> array[0][0] == (raw[0],raw[1],raw[2])
array([ True, True, True])
>>> array[0][1] == (raw[3],raw[4],raw[5])
array([ True, True, True])
>>> array[0][2] == (raw[6],raw[7],raw[8])
array([ True, True, True])
>>> array[0][3] == (raw[9],raw[10],raw[11])
array([ True, True, True])
So if I understood correctly, Spark reads the image as one flat array, (606300,) here, in which the elements are ordered pixel by pixel, with the three channel values of each pixel stored consecutively. Note that the order within a pixel is B, G, R (the OpenCV convention): raw[0], raw[1], raw[2] is (77, 85, 78), the reverse of the RGB triple [78, 85, 77] that Keras returns.
After the reshape, I obtain 430 matrices of 470 rows x 3 columns. Since my image is 470x430 (width x height), each matrix corresponds to one row of the image (a height position), and inside each matrix there are 470 rows, one per width position, and 3 columns, one per colour channel.
Hope that helps someone :)!
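For the stated goal of reading the values at a given position, the flat buffer can also be indexed directly. This is a minimal sketch under the assumptions that the bytes are laid out row by row with interleaved channels in B, G, R order (as the comparison with Keras above suggests) and that the image has 3 channels; pixel_rgb and the tiny 2x2 buffer are hypothetical, made up for illustration.

```python
def pixel_rgb(data, width, n_channels, row, col):
    """Return (red, green, blue) at (row, col) from a flat row-major BGR buffer."""
    # each row holds width * n_channels bytes, each pixel n_channels bytes
    offset = (row * width + col) * n_channels
    blue, green, red = data[offset:offset + 3]
    return red, green, blue

# hypothetical 2x2 image; the first pixel uses the values seen in raw[0:3]
data = bytes([77, 85, 78,   1, 2, 3,
              4, 5, 6,      7, 8, 9])

top_left = pixel_rgb(data, width=2, n_channels=3, row=0, col=0)
bottom_right = pixel_rgb(data, width=2, n_channels=3, row=1, col=1)
print(top_left, bottom_right)
```

For the real image the call would use width=470 and n_channels=3 with the bytes taken from the image.data column. When comparing the same position across two photos, a tolerance on each channel is probably more robust than strict equality, since sensor noise rarely reproduces a colour exactly.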

calculate the spatial dimension of a graph

Given a graph (say fully-connected), and a list of distances between all the points, is there an available way to calculate the number of dimensions required to instantiate the graph?
E.g. by construction, say we have graph G with points A, B, C and distances AB=BC=CA=1. Starting from A (0 dimensions) we add B at distance 1 (1 dimension), now we find that a 2nd dimension is needed to add C and satisfy the constraints. Does code exist to do this and spit out (in this case) dim(G) = 2?
E.g. if the points are photos, and the distances between them calculated by the Gist algorithm (http://people.csail.mit.edu/torralba/code/spatialenvelope/), I would expect the derived dimension to match the number of image parameters considered by Gist.
Added: here is a 5-d python demo based on the suggestion - seemingly perfect!
'similarities' is the distance matrix.
import numpy as np
from sklearn import manifold
similarities = [[0., 1., 1., 1., 1., 1.],
[1., 0., 1., 1., 1., 1.],
[1., 1., 0., 1., 1., 1.],
[1., 1., 1., 0., 1., 1.],
[1., 1., 1., 1., 0., 1.],
[1., 1., 1., 1., 1., 0]]
seed = np.random.RandomState(seed=3)
for i in [1, 2, 3, 4, 5]:
    mds = manifold.MDS(n_components=i, max_iter=3000, eps=1e-9, random_state=seed,
                       dissimilarity="precomputed", n_jobs=1)
    print("%d %f" % (i, mds.fit(similarities).stress_))
Output:
1 3.333333
2 1.071797
3 0.343146
4 0.151531
5 0.000000
I find that when I apply this method to a subset of my data (distances between 329 pictures with '11' in the file name, using two different metrics), the stress doesn't decrease to 0 as linearly as I'd expect from the above; it levels off after about 5 dimensions. (On the SURF results I tried doubling max_iter, and varying eps by an order of magnitude each way, without changing results in the first four digits.)
It turns out the distances do not satisfy the triangle inequality in ~0.02% of the triangles, with the average violation roughly equal to 8% of the average distance, for one metric examined.
Overall I prefer the fractal dimension of the sorted distances since it doesn't require picking a cutoff. I'm marking the MDS response as an answer because it works for the consistent case. My results for the fractal dimension and the MDS case are below.
Another descriptive statistic turns out to be the triangle violations. Results for this further below. If anyone could generalize to higher dimensions, that would be very interesting (results and learning python :-).
MDS results, ignoring the triangle inequality issue:
N_dim stress_(SURF_match) stress_(GIST_match)
1 83859853704.027344 913512153794.477295
2 24402474549.902721 238300303503.782837
3 14335187473.611954 107098797170.304825
4 10714833228.199451 67612051749.697998
5 9451321873.828577 49802989323.714806
6 8984077614.154467 40987031663.725784
7 8748071137.806602 35715876839.391762
8 8623980894.453981 32780605791.135693
9 8580736361.368249 31323719065.684353
10 8558536956.142039 30372127335.209297
100 8544120093.395177 28786825401.178596
1000 8544192695.435946 28786840008.666389
Forging ahead with that to devise a metric to compare the dimensionality of the two results, an ad hoc choice is to set the criterion to
1.1 * stress_at_dim=100
resulting in the proposition that the SURF_match has a quasi-dimension in 5..6, while GIST_match has a quasi-dimension in 8..9. I'm curious if anyone thinks that means anything :-). Another question is whether there is any meaningful interpretation for the relative magnitudes of stress at any dimension for the two metrics. Here are some results to put it in perspective. Frac_d is the fractal dimension of the sorted distances, calculated according to Higuchi's method using code from IQM, Dim is the dimension as described above.
Method Frac_d Dim stress(100) stress(1)
Lab_CIE94 1.1458 3 2114107376961504.750000 33238672000252052.000000
Greyscale 1.0490 8 42238951082.465477 1454262245593.781250
HS_12x12 1.0889 19 33661589105.972816 3616806311396.510254
HS_24x24 1.1298 35 16070009781.315575 4349496176228.410645
HS_48x48 1.1854 64 7231079366.861403 4836919775090.241211
GIST 1.2312 9 28786830336.332951 997666139720.167114
HOG_250_words 1.3114 10 10120761644.659481 150327274044.045624
HOG_500_words 1.3543 13 4740814068.779779 70999988871.696045
HOG_1k_words 1.3805 15 2364984044.641845 38619752999.224922
SIFT_1k_words 1.5706 11 1930289338.112194 18095265606.237080
SURFFAST_200w 1.3829 8 2778256463.307569 40011821579.313110
SRFFAST_250_w 1.3754 8 2591204993.421285 35829689692.319153
SRFFAST_500_w 1.4551 10 1620830296.777577 21609765416.960484
SURFFAST_1k_w 1.5023 14 949543059.290031 13039001089.887533
SURFFAST_4k_w 1.5690 19 582893432.960562 5016304129.389058
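The ad hoc criterion described above (1.1 times the stress at 100 dimensions) can be sketched as follows. The function name quasi_dimension and its parameters are illustrative choices; the stress values are copied from the MDS table for SURF_match and GIST_match. The function returns the first dimension whose stress is at or below the threshold, i.e. the upper end of the 5..6 and 8..9 intervals mentioned in the text.

```python
def quasi_dimension(stress_by_dim, reference_dim=100, factor=1.1):
    """Smallest dimension whose stress is within `factor` of the reference stress."""
    threshold = factor * stress_by_dim[reference_dim]
    for dim in sorted(stress_by_dim):
        if stress_by_dim[dim] <= threshold:
            return dim
    return None

surf = {1: 83859853704.027344, 2: 24402474549.902721, 3: 14335187473.611954,
        4: 10714833228.199451, 5: 9451321873.828577, 6: 8984077614.154467,
        7: 8748071137.806602, 8: 8623980894.453981, 9: 8580736361.368249,
        10: 8558536956.142039, 100: 8544120093.395177}
gist = {1: 913512153794.477295, 2: 238300303503.782837, 3: 107098797170.304825,
        4: 67612051749.697998, 5: 49802989323.714806, 6: 40987031663.725784,
        7: 35715876839.391762, 8: 32780605791.135693, 9: 31323719065.684353,
        10: 30372127335.209297, 100: 28786825401.178596}

print(quasi_dimension(surf), quasi_dimension(gist))
```

Changing factor shows how sensitive the resulting dimension is to the cutoff, which is the weakness of this criterion noted earlier.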
Looking at the Pearson correlation between columns of the table:
Pearson correlation 2-tailed p-value
FracDim, Dim: (-0.23333296587402277, 0.40262625206429864)
Dim, Stress(100): (-0.24513480360257348, 0.37854224076180676)
Dim, Stress(1): (-0.24497740363489209, 0.37885820835053186)
Stress(100),S(1): ( 0.99999998200931084, 8.9357374620135412e-50)
FracDim, S(100): (-0.27516440489210137, 0.32091019789264791)
FracDim, S(1): (-0.27528621200454373, 0.32068731053608879)
I naively wonder how all correlations but one can be negative, and what conclusions can be drawn. Using this code:
import sys
import numpy as np
from scipy.stats import pearsonr
file = sys.argv[1]
col1 = int(sys.argv[2])
col2 = int(sys.argv[3])
arr1 = []
arr2 = []
# read the two requested columns from the whitespace-separated table
with open(file, "r") as ins:
    for line in ins:
        words = line.split()
        arr1.append(float(words[col1]))
        arr2.append(float(words[col2]))
narr1 = np.array(arr1)
narr2 = np.array(arr2)
# normalize
narr1 -= narr1.mean(0)
narr2 -= narr2.mean(0)
# standardize
narr1 /= narr1.std(0)
narr2 /= narr2.std(0)
print(pearsonr(narr1, narr2))
On to the number of violations of the triangle inequality by the various metrics, all for the 329 pics with '11' in their sequence:
(1) n_violations/triangles
(2) avg violation
(3) avg distance
(4) avg violation / avg distance
n_vio (1) (2) (3) (4)
lab 186402 0.031986 157120.407286 795782.437570 0.197441
grey 126902 0.021776 1323.551315 5036.899585 0.262771
600px 120566 0.020689 1339.299040 5106.055953 0.262296
Gist 69269 0.011886 1252.289855 4240.768117 0.295298
RGB
12^3 25323 0.004345 791.203886 7305.977862 0.108295
24^3 7398 0.001269 525.981752 8538.276549 0.061603
32^3 5404 0.000927 446.044597 8827.910112 0.050527
48^3 5026 0.000862 640.310784 9095.378790 0.070400
64^3 3994 0.000685 614.752879 9270.282684 0.066314
98^3 3451 0.000592 576.815995 9409.094095 0.061304
128^3 1923 0.000330 531.054082 9549.109033 0.055613
RGB/600px
12^3 25190 0.004323 790.258158 7313.379003 0.108057
24^3 7531 0.001292 526.027221 8560.853557 0.061446
32^3 5463 0.000937 449.759107 8847.079639 0.050837
48^3 5327 0.000914 645.766473 9106.240103 0.070915
64^3 4382 0.000752 634.000685 9272.151040 0.068377
128^3 2156 0.000370 544.644712 9515.696642 0.057236
HueSat
12x12 7882 0.001353 950.321873 7555.464323 0.125779
24x24 1740 0.000299 900.577586 8227.559169 0.109459
48x48 1137 0.000195 661.389622 8653.085004 0.076434
64x64 1134 0.000195 697.298942 8776.086144 0.079454
HueSat/600px
12x12 6898 0.001184 943.319078 7564.309456 0.124707
24x24 1790 0.000307 908.031844 8237.927256 0.110226
48x48 1267 0.000217 693.607735 8647.060308 0.080213
64x64 1289 0.000221 682.567106 8761.325172 0.077907
hog
250 53782 0.009229 675.056004 1968.357004 0.342954
500 18680 0.003205 559.354979 1431.803914 0.390665
1k 9330 0.001601 771.307074 970.307130 0.794910
4k 5587 0.000959 993.062824 650.037429 1.527701
sift
500 26466 0.004542 1267.833182 1073.692611 1.180816
1k 16489 0.002829 1598.830736 824.586293 1.938949
4k 10528 0.001807 1918.068294 533.492373 3.595306
surffast
250 38162 0.006549 630.098999 1006.401837 0.626091
500 19853 0.003407 901.724525 830.596690 1.085635
1k 10659 0.001829 1310.348063 648.191424 2.021545
4k 8988 0.001542 1488.200156 419.794008 3.545072
Anyone capable of generalizing to higher dimensions? Here is my first-timer code:
import sys
import time
import math
import numpy as np
from sortedcontainers import SortedSet
from sklearn import manifold
seed = np.random.RandomState(seed=3)
pairs = sys.argv[1]
ss = SortedSet()
print(time.strftime("%H:%M:%S"), "counting/indexing")
sys.stdout.flush()
# first pass: collect the distinct point labels
with open(pairs, "r") as ins:
    for line in ins:
        words = line.split()
        ss.add(words[0])
        ss.add(words[1])
N = len(ss)
print(time.strftime("%H:%M:%S"), "size ", N)
sys.stdout.flush()
sim = np.diag(np.zeros(N))
dtot = 0.0
# second pass: fill the symmetric distance matrix
with open(pairs, "r") as ins:
    for line in ins:
        words = line.split()
        i = ss.index(words[0])
        j = ss.index(words[1])
        #val = math.log(float(words[2]))
        #val = math.sqrt(float(words[2]))
        val = float(words[2])
        sim[i][j] = val
        sim[j][i] = val
        dtot += val
avgd = dtot / (N * (N-1))
ntri = 0
nvio = 0
vio = 0.0
# note: i starts at 1, so triangles that include index 0 are not visited,
# and only the single inequality d3 <= d1 + d2 is tested per triangle
for i in range(1, N):
    for j in range(i+1, N):
        d1 = sim[i][j]
        for k in range(j+1, N):
            ntri += 1
            d2 = sim[i][k]
            d3 = sim[j][k]
            dd = d1 + d2
            diff = d3 - dd
            if (diff > 0.0):
                nvio += 1
                vio += diff
avgvio = 0.0
if (nvio > 0):
    avgvio = vio / nvio
print("tot: %d %f %f %f %f" % (nvio, (float(nvio)/ntri), avgvio, avgd, (avgvio/avgd)))
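As a point of comparison, here is a sketch of a stricter count that visits every triple of points and tests the longest side against the sum of the other two, so a violation is found whichever side is too long. This is a variation for illustration, not the code that produced the table above; triangle_violations and the 3-point example matrix are hypothetical.

```python
from itertools import combinations

def triangle_violations(dist):
    """Count triangles whose longest side exceeds the sum of the other two."""
    n = len(dist)
    n_tri = 0
    n_vio = 0
    total_excess = 0.0
    for i, j, k in combinations(range(n), 3):
        n_tri += 1
        # sort the three sides so that c is the longest
        a, b, c = sorted((dist[i][j], dist[i][k], dist[j][k]))
        excess = c - (a + b)
        if excess > 0.0:
            n_vio += 1
            total_excess += excess
    avg_excess = total_excess / n_vio if n_vio else 0.0
    return n_tri, n_vio, avg_excess

# hypothetical 3-point example: side (0, 2) is longer than the other two combined
bad = [[0, 1, 3],
       [1, 0, 1],
       [3, 1, 0]]
equilateral = [[0, 1, 1],
               [1, 0, 1],
               [1, 1, 0]]
print(triangle_violations(bad), triangle_violations(equilateral))
```

On the real data this would be expected to report at least as many violations as the single-inequality loop, at the same cubic cost in the number of points.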
Here is how I tried sklearn's Isomap:
from sklearn.metrics import euclidean_distances
for i in [1, 2, 3, 4, 5]:
    # nbrs < points
    iso = manifold.Isomap(n_neighbors=nbrs, n_components=i,
                          eigen_solver="auto", tol=1e-9, max_iter=3000,
                          path_method="auto", neighbors_algorithm="auto")
    dis = euclidean_distances(iso.fit(sim).embedding_)
    stress = ((dis.ravel() - sim.ravel()) ** 2).sum() / 2
Given a graph (say fully-connected), and a list of distances between all the points, is there an available way to calculate the number of dimensions required to instantiate the graph?
Yes. The more general topic this problem would be part of, in terms of graph theory, is called "Graph Embedding".
E.g. by construction, say we have graph G with points A, B, C and distances AB=BC=CA=1. Starting from A (0 dimensions) we add B at distance 1 (1 dimension), now we find that a 2nd dimension is needed to add C and satisfy the constraints. Does code exist to do this and spit out (in this case) dim(G) = 2?
This is almost exactly the way that Multidimensional Scaling works.
Multidimensional scaling (MDS) would not exactly answer the question of "How many dimensions would I need to represent this point cloud / graph?" with a number but it returns enough information to approximate it.
Multidimensional Scaling methods will attempt to find a "good mapping" to reduce the number of dimensions, say from 120 (in the original space) down to 4 (in another space). So, in a way, you can iteratively try different embeddings for increasing number of dimensions and look at the "stress" (or error) of each embedding. The number of dimensions you are after is the first number for which there is an abrupt minimisation of the error.
Due to the way it works, Classical MDS can return a vector of eigenvalues for the new mapping. By examining this vector of eigenvalues you can determine how many of its entries you would need to retain to achieve a (good enough, or low error) representation of the original dataset.
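The eigenvalue check can be sketched with NumPy alone. This is a minimal illustration of classical MDS via double centering of the squared distances, assuming the distances are exactly Euclidean; the function names and the tolerance are illustrative choices, and for noisy or non-metric distances the eigenvalue spectrum would need to be inspected by eye rather than thresholded.

```python
import numpy as np

def classical_mds_eigenvalues(dist):
    """Eigenvalues (descending) of the double-centred squared-distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    return np.linalg.eigvalsh(gram)[::-1]

def embedding_dimension(dist, tol=1e-9):
    """Number of clearly positive eigenvalues, i.e. dimensions needed."""
    eig = classical_mds_eigenvalues(dist)
    return int(np.sum(eig > tol * max(eig.max(), 1.0)))

# the triangle from the question: AB = BC = CA = 1
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
# six mutually equidistant points, as in the 5-d demo
simplex6 = 1.0 - np.eye(6)

print(embedding_dimension(triangle), embedding_dimension(simplex6))
```

Negative eigenvalues of noticeable size are a sign that the distances cannot be realised exactly in any Euclidean space, which ties in with the triangle-inequality violations reported in the question.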
The key concept here is the "similarity" matrix which is a fancy name for a graph's distance matrix (which you already seem to have), irrespectively of its semantics.
Embedding algorithms, in general, are trying to find an embedding that may look different but at the end of the day, the point cloud in the new space will end up having a similar (depending on how much error we can afford) distance matrix.
In terms of code, I am sure that there is something available in all major scientific computing packages but off the top of my head I can point you towards Python and MATLAB code examples.
E.g. if the points are photos, and the distances between them calculated by the Gist algorithm (http://people.csail.mit.edu/torralba/code/spatialenvelope/), I would expect the derived dimension to match the number of image parameters considered by Gist
Not exactly. This is a very good use case though. In this case, what MDS would return, or what you would be probing with dimensionality reduction in general would be to check how many of these features seem to be required to represent your dataset. Therefore, depending on the scenes, or, depending on the dataset, you might realise that not all of these features are necessary for a good enough representation of the whole dataset. (In addition, you might want to have a look at this link as well).
Hope this helps.
First, you can assume that any dataset has a dimensionality of at most 4 or 5. To get more relevant dimensions, you would need one million elements (or something like that).
Apparently, you already computed a distance. Are you sure it is actually a relevant metric? Is it efficient for images that are quite distant? Perhaps you can try Isomap (geodesic distance, starting from only close neighbors) and see if your embedded space may not actually be Euclidean.

Way to parse Python pandas DataFrame to Matrix Market (MM) Format?

Is there a built-in way to write a Python pandas.DataFrame object (stored as a 2-D numpy.ndarray internally) to Matrix Market (MM) format? I have use cases for both sparse and dense matrices.
When I say "built-in" here I mean built into the pandas package. If not pandas, then is there something that can take a DataFrame or a 2-D numpy.ndarray and do this?
I'm pretty sure there's nothing built in to pandas, but if you have the full stack installed you can use scipy:
>>> import pandas as pd
>>> import scipy.io, scipy.sparse
>>> df = pd.DataFrame({"A": [1,2], "B": [3,0]})
>>> scipy.io.mmwrite("mmout", df)
>>> !cat mmout.mtx
%%MatrixMarket matrix array integer general
%
2 2
1
2
3
0
It'll also work for a sparse case:
>>> scipy.io.mmwrite("mmout", scipy.sparse.csr_matrix(df))
>>> !cat mmout.mtx
%%MatrixMarket matrix coordinate integer general
%
2 2 3
1 1 1
1 2 3
2 1 2
although you'd have to construct a copy.
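To make the two layouts in the output above easier to read, here is a small sketch that produces the same text by hand for an integer matrix given as a list of rows. It is not how scipy.io.mmwrite is implemented; dense_to_mm and sparse_to_mm are hypothetical helpers that only cover the integer "general" case shown here.

```python
def dense_to_mm(rows):
    """Dense 'array' layout: dimensions, then values column by column."""
    n_rows, n_cols = len(rows), len(rows[0])
    lines = ["%%MatrixMarket matrix array integer general", "%",
             f"{n_rows} {n_cols}"]
    for col in range(n_cols):
        for row in range(n_rows):
            lines.append(str(rows[row][col]))
    return "\n".join(lines) + "\n"

def sparse_to_mm(rows):
    """Sparse 'coordinate' layout: dimensions and count, then 1-based (row, col, value)."""
    n_rows, n_cols = len(rows), len(rows[0])
    entries = [(r + 1, c + 1, v)
               for r, row in enumerate(rows)
               for c, v in enumerate(row) if v != 0]
    lines = ["%%MatrixMarket matrix coordinate integer general", "%",
             f"{n_rows} {n_cols} {len(entries)}"]
    lines += [f"{r} {c} {v}" for r, c, v in entries]
    return "\n".join(lines) + "\n"

# same values as the DataFrame above: column A = [1, 2], column B = [3, 0]
matrix = [[1, 3],
          [2, 0]]
dense_text = dense_to_mm(matrix)
sparse_text = sparse_to_mm(matrix)
print(dense_text)
print(sparse_text)
```

Note the column-major order in the dense layout (1, 2, 3, 0) and the omission of the zero entry in the coordinate layout, both of which match the files shown above.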

Image filtering - wrong results?

I'm experimenting with convolving an image with a user-supplied mask, in this case
u = array([[-2,-2,-2],[-2,25,-2],[-2,-2,-2]])/9
using the commands
In[1]: import scipy.ndimage as ndi
In[2]: import skimage.io as io
In[3]: c = io.imread('cameraman.png')
In[4]: cu = ndi.convolve(c,u)
In[5]: io.imshow(cu)
I'm checking this against commands in GNU Octave:
Octave-3.8: 1> c = imread('cameraman.png');
Octave-3.8: 2> u = [-2 -2 -2;-2 25 -2;-2 -2 -2]/9
Octave-3.8: 3> cu = imfilter(c,u)
Octave-3.8: 4> imshow(cu)
But here's the thing: Octave seems to give the correct result, but Python doesn't, even though the commands convolve and imfilter are supposed to be implementing the same algorithm. (Well in fact imfilter performs a correlation, which in this case is the same as a convolution.)
The Octave output is:
[image: Octave result, not preserved]
and the Python output is:
[image: Python result, not preserved]
which as you can see is very different to the Octave result. Does anybody know what's going on here? Or is there a better way of convolving with a user-supplied linear filter than using convolve?
The problem may be that the convolution takes your image's luminance values out of bounds. I ran the example below in Matlab (~=Octave): for an image whose grey values are initially 0-255, i.e. in the normalised range [0,0.99], the result ends up with pixels in the range [-0.88,2.03].
>> img=double(imread('cameraman.tif'))./255;
>> K=[-2 -2 -2 ; -2 25 -2; -2 -2 -2]/9;
>> out=conv2(img,K,'same');
>> max(max(out))
ans =
2.0288
>> min(min(out))
ans =
-0.8776
It could be that Python has a problem visualising images with out-of-range grey values (<0 or >255), and that this causes values to be clamped, resulting in black/white halos in those areas. Perhaps Octave normalises the image prior to displaying it, resulting in fewer artifacts. If you normalise your image in Python prior to displaying it, do you still have this problem?
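How far out of range this kernel can push an 8-bit image is easy to check by hand. The sketch below evaluates the filter at a single pixel for three hypothetical 3x3 patches and then saturates the result to 0..255; filter_response and saturate are illustrative helpers, not part of SciPy or Octave. One thing worth checking in the Python session, assuming c is an 8-bit integer array (as io.imread usually returns for a greyscale PNG): scipy.ndimage.convolve by default returns an array with the same dtype as its input, so values outside 0..255 cannot be stored faithfully.

```python
KERNEL = [[-2, -2, -2],
          [-2, 25, -2],
          [-2, -2, -2]]  # divided by 9 below; symmetric, so correlation == convolution

def filter_response(patch):
    """Filter output at the centre of a 3x3 patch."""
    total = sum(k * p
                for k_row, p_row in zip(KERNEL, patch)
                for k, p in zip(k_row, p_row))
    return total / 9

def saturate(value):
    """Round and clip to the 8-bit range."""
    return int(min(max(round(value), 0), 255))

flat = [[100] * 3 for _ in range(3)]                 # uniform region
bright_spot = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]    # white pixel on black
dark_spot = [[255, 255, 255], [255, 0, 255], [255, 255, 255]]

for name, patch in [("flat", flat), ("bright", bright_spot), ("dark", dark_spot)]:
    value = filter_response(patch)
    print(name, value, saturate(value))
```

A uniform region is left unchanged because the kernel sums to 1, while isolated bright or dark pixels land far outside 0..255. Converting the image to floating point before convolving (for example with c.astype(float)) and clipping afterwards is one way to test whether the dtype is what makes the two results differ.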

Resources