Measure of how contiguous a block of pixels is [closed] - image-processing

I'm using Genetic Algorithms (GA) on an image processing problem (an image segmentation, to be more precise). In this case, an individual represents a block of pixels (i.e. a set of pixel coordinates). I need to encourage individuals with contiguous pixels.
To encourage contiguous blocks of pixels:
The "contiguousness" of an individual needs to be considered in the fitness function so that individuals with adjacent pixels are favored (best-fit). Hence, during the evolution, the contiguousness of a set of coordinates (i.e. an individual) will influence the fitness of that individual.
The problem I'm facing is how to measure this property (how contiguous the block is) for a set of pixel coordinates (x, y).
As shown in the image below, the individual (set of pixels in black) on the right is clearly more "contiguous" (and therefore fitter) than the individual on the left:

I think I understand what you are asking, and my suggestion would be to count the number of shared "walls" between your pixels:
I would argue that from left to right the individuals are decreasing in continuity.
Counting the number of walls is not difficult to code, but might be slow the way I've implemented it here.
import random

width = 5
height = 5
image = [[0 for x in range(width)] for y in range(height)]
num_pts_in_individual = 4

# Note: this may produce duplicate points
individual = [[int(random.uniform(0, height)), int(random.uniform(0, width))]
              for x in range(num_pts_in_individual)]

# Fill up the image
for point in individual:
    image[point[0]][point[1]] = 1

# Print out the image
for row in image:
    print(row)

def count_shared_walls(image):
    num_shared = 0
    height = len(image)
    width = len(image[0])
    for h in range(height):
        for w in range(width):
            if image[h][w] == 1:
                if h > 0 and image[h-1][w] == 1:
                    num_shared += 1
                if w > 0 and image[h][w-1] == 1:
                    num_shared += 1
                if h < height-1 and image[h+1][w] == 1:
                    num_shared += 1
                if w < width-1 and image[h][w+1] == 1:
                    num_shared += 1
    return num_shared

shared_walls = count_shared_walls(image)
print(shared_walls)
Different images and counts of shared walls:
[0, 0, 0, 0, 0]
[0, 1, 0, 0, 0]
[0, 0, 0, 0, 0]
[1, 0, 0, 1, 1]
[0, 0, 0, 0, 0]
2
[1, 0, 0, 0, 0]
[0, 0, 0, 1, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[1, 0, 1, 0, 0]
0
[0, 0, 0, 1, 1]
[0, 0, 0, 1, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[1, 0, 0, 0, 0]
4
One major problem with this is that if a change in pixel locations occurs that does not change the number of shared walls, it will not affect the score. Maybe a combination of the distance method you described and the shared-walls approach would be best, as in the sketch below.
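For what it's worth, here is a minimal sketch of one way to combine the two signals. It assumes the distance term is the mean pairwise Manhattan distance (the "distance method" referenced above isn't spelled out here, so that is an assumption), and the weights are arbitrary placeholders:
from itertools import combinations

def contiguity_score(points, wall_weight=1.0, dist_weight=1.0):
    """Combine the shared-wall count with a distance penalty.

    `points` is a list of (row, col) pixel coordinates. The distance
    term used here is the mean pairwise Manhattan distance, which is an
    assumption -- substitute whatever distance measure you prefer.
    """
    point_set = set(map(tuple, points))
    # Each shared wall is found twice (once from each pixel), matching
    # the convention of count_shared_walls() above.
    walls = sum((r + dr, c + dc) in point_set
                for r, c in point_set
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))
    pairs = list(combinations(point_set, 2))
    mean_dist = (sum(abs(r1 - r2) + abs(c1 - c2)
                     for (r1, c1), (r2, c2) in pairs) / len(pairs)) if pairs else 0.0
    # Higher is fitter: reward shared walls, penalize spread.
    return wall_weight * walls - dist_weight * mean_dist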

Related

Create embeddings using one hot encoded and numeric values together

I have created video embeddings such that they represent video features. These features include
Content of video [Video speech converted into text to get text embedding of size 3]
Language [5 possible languages, one hot encoded]
Title [Text embedding using Doc2Vec of size 3]
(these numbers are just for example)
So my video embedding structure looks like this:
-> video embedding = [ [content embedding of size 3], one hot encoded language, [title embedding of size 3] ]
-> video embedding = [ [0.004, 0.0032, 0.0064], 0, 0, 1, 0, 0, [0.03, 0.021, 0.001] ]
On flattening:
-> video embedding = [0.004, 0.0032, 0.0064, 0, 0, 1, 0, 0, 0.03, 0.021, 0.001]
However, when I apply a distance metric to find the similarity among these embeddings, the one-hot encoded features overpower the others.
E.g. cos([0.004, 0.0032, 0.0064, 0, 0, 1, 0, 0, 0.03, 0.021, 0.001], [0.004, 0.0032, 0.0064, 0, 1, 0, 0, 0, 0.03, 0.021, 0.001]) = 0.14 (only the language changed)
cos([0.05, 0.005, 0.0064, 0, 0, 1, 0, 0, 0.03, 0.021, 0.001], [0.004, 0.0032, 0.0064, 0, 0, 1, 0, 0, 0.03, 0.021, 0.001]) = 0.99 (content vector changed, language kept the same)
Is there any way to use one hot encoded vectors and numeric vectors together? Or is there a better way to calculate similarity between such vectors?
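One possible direction (just a sketch, not a definitive answer): compute a cosine similarity per block and combine the blocks with explicit weights, so the one-hot language block contributes only one of the terms. The block names and the equal default weights below are assumptions made for illustration; the embedding values are taken from the question's example:
import numpy as np

def cosine(a, b):
    # Plain cosine similarity; assumes neither vector is all zeros.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def blockwise_similarity(e1, e2, weights=(1.0, 1.0, 1.0)):
    """Compare the content, language and title blocks separately.

    e1, e2 are dicts with 'content', 'language' and 'title' arrays; these
    names and the equal default weights are assumptions for the sketch.
    """
    sims = [cosine(np.asarray(e1[k], dtype=float), np.asarray(e2[k], dtype=float))
            for k in ("content", "language", "title")]
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w, sims) / w.sum())

v_a = {"content": [0.004, 0.0032, 0.0064], "language": [0, 0, 1, 0, 0], "title": [0.03, 0.021, 0.001]}
v_b = {"content": [0.004, 0.0032, 0.0064], "language": [0, 1, 0, 0, 0], "title": [0.03, 0.021, 0.001]}
print(blockwise_similarity(v_a, v_b))  # only the language term drops when the language differs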

How to calculate the resolution of an undistorted image?

An undistorted image typically has a lower resolution than the original image due to the non-uniform distribution of pixels and (usually) the cropping of the black edges.
So, given the camera calibration parameters, e.g. in ROS format:
image_width: 1600
image_height: 1200
camera_name: camera1
camera_matrix:
rows: 3
cols: 3
data: [1384.355466887268, 0, 849.4355708515795, 0, 1398.17734010913, 604.5570699746268, 0, 0, 1]
distortion_model: plumb_bob
distortion_coefficients:
rows: 1
cols: 5
data: [0.0425049914802741, -0.1347528158561486, -0.0002287009852930437, 0.00641133892300999, 0]
rectification_matrix:
rows: 3
cols: 3
data: [1, 0, 0, 0, 1, 0, 0, 0, 1]
projection_matrix:
rows: 3
cols: 4
data: [1379.868041992188, 0, 860.3000889574832, 0, 0, 1405.926879882812, 604.3997819099422, 0, 0, 0, 1, 0]
How would one calculate the final resolution of the undistorted rectified image?
From Fruchtzwerg's comment, the following will give the effective ROI of the undistorted image:
import cv2
import numpy as np

mtx = np.array([
    [1384.355466887268, 0, 849.4355708515795],
    [0, 1398.17734010913, 604.5570699746268],
    [0, 0, 1]])
dist = np.array([0.0425049914802741, -0.1347528158561486, -0.0002287009852930437, 0.00641133892300999, 0])
cv2.getOptimalNewCameraMatrix(mtx, dist, (1600, 1200), 1)
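To actually read the effective resolution off that call, you can capture the returned valid-pixel ROI; a small extension of the snippet above (the print format is just illustrative):
# getOptimalNewCameraMatrix returns the new camera matrix and the
# valid-pixel ROI as (x, y, w, h); with alpha=1 all source pixels are kept
# and the ROI marks the undistorted region free of black borders.
new_mtx, (x, y, w, h) = cv2.getOptimalNewCameraMatrix(mtx, dist, (1600, 1200), 1)
print("effective undistorted resolution: %d x %d" % (w, h))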

OpenCV how do conversions of Matrix elements work

I am having trouble understanding the inner workings of OpenCV. Consider the following code:
Scalar getAverageColor(Mat img, vector<Rect>& rois) {
    int n = static_cast<int>(rois.size());
    Mat avgs(1, n, CV_8UC3);
    for (int i = 0; i < n; ++i) {
        // What is the correct way to assign the color elements in
        // the matrix?
        avgs.at<Scalar>(i) = mean(Mat(img, rois[i]));
        /*
        This seems to always work, but there has to be a better way.
        avgs.at<Vec3b>(i)[0] = mean(Mat(img, rois[i]))[0];
        avgs.at<Vec3b>(i)[1] = mean(Mat(img, rois[i]))[1];
        avgs.at<Vec3b>(i)[2] = mean(Mat(img, rois[i]))[2];
        */
    }
    // If I access the first element it seems to be set correctly.
    Scalar first = avgs.at<Scalar>(0);
    // However mean returns [0 0 0 0] if I did the assignment above using Scalar, why???
    Scalar avg = mean(avgs);
    return avg;
}
If I use avgs.at<Scalar>(i) = mean(Mat(img, rois[i])) for the assignment in the loop, the first element looks correct, but then the mean calculation always returns zero (even though the first element looks correct). If I assign all the color elements by hand using Vec3b it seems to work, but why?
Note: cv::Scalar is a typedef for cv::Scalar_<double>, which derives from cv::Vec<double, 4>, which derives from cv::Matx<double, 4, 1>.
Similarly, cv::Vec3b is cv::Vec<uint8_t, 3> which derives from cv::Matx<uint8_t, 3, 1> -- this means that we can use any of those 3 in cv::Mat::at and get identical (correct) behaviour.
It's important to be aware that cv::Mat::at is basically a reinterpret_cast on the underlying data array. You need to be extremely careful to use an appropriate data type for the template argument, one which corresponds to the type of elements (including channel count) of the cv::Mat you're invoking it on.
The documentation mentions the following:
Keep in mind that the size identifier used in the at operator cannot be chosen at random. It depends on the image from which you are trying to retrieve the data. The table below gives a better insight in this:
If matrix is of type CV_8U then use Mat.at<uchar>(y,x).
If matrix is of type CV_8S then use Mat.at<schar>(y,x).
If matrix is of type CV_16U then use Mat.at<ushort>(y,x).
If matrix is of type CV_16S then use Mat.at<short>(y,x).
If matrix is of type CV_32S then use Mat.at<int>(y,x).
If matrix is of type CV_32F then use Mat.at<float>(y,x).
If matrix is of type CV_64F then use Mat.at<double>(y,x).
It doesn't mention what to do in the case of multiple channels -- in that case you use cv::Vec<...> (or rather one of the typedefs provided). cv::Vec<...> is basically a wrapper around a fixed-size array of N values of a given type.
In your case, the matrix avgs is CV_8UC3 -- each element consists of 3 unsigned byte values (i.e. 3 bytes total). However, by using avgs.at<Scalar>(i), you interpret each element as 4 doubles (32 bytes in total). That means that:
The actual element you tried to write to (if interpreted correctly) will only hold the first 3 bytes of the (8-byte floating point) mean of the first channel -- i.e. complete garbage.
You actually overwrite the next 10 elements (the last one partially, 3rd channel escapes unscathed) with more garbage.
At some point, you are bound to overflow the buffer and potentially trash other data structures. This issue is rather serious.
We can demonstrate it using the following simple program.
Example:
#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat test_mat(cv::Mat::zeros(1, 12, CV_8UC3)); // 12 * 3 = 36 bytes of data
    std::cout << "Before: " << test_mat << "\n";

    cv::Scalar test_scalar(cv::Scalar::all(1234.5678));
    test_mat.at<cv::Scalar>(0, 0) = test_scalar;

    std::cout << "After: " << test_mat << "\n";
    return 0;
}
Output:
Before: [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
After: [173, 250, 92, 109, 69, 74, 147, 64, 173, 250, 92, 109, 69, 74, 147, 64, 173, 250, 92, 109, 69, 74, 147, 64, 173, 250, 92, 109, 69, 74, 147, 64, 0, 0, 0, 0]
This clearly shows we're writing way more than we should.
In Debug mode, the incorrect use of at also triggers an assertion:
OpenCV(3.4.3) Error: Assertion failed (((((sizeof(size_t)<<28)|0x8442211) >> ((traits::Depth<_Tp>::value) & ((1 << 3) - 1))*4) & 15) == elemSize1()) in cv::Mat::at, file D:\code\shit\so07\deps\include\opencv2/core/mat.inl.hpp, line 1102
To allow assignment of the result from cv::mean (which is a cv::Scalar) to our CV_8UC3 matrix, we need to do two things (not necessarily in this order):
Convert the values from double to uint8_t -- OpenCV will do a saturate_cast, but given that the mean won't go past the min/max of the input items, we'd be fine with a regular cast.
Get rid of the 4th element.
To remove the 4th element, we can use cv::Matx::get_minor (The documentation is a bit lacking, but a look at the implementation explains it fairly well). The result is a cv::Matx, so we have to use that instead of cv::Vec when using cv::Mat::at.
The two possible options then are:
Get rid of the 4th element, and then cast the result to convert the cv::Matx to uint8_t element type.
Cast the cv::Scalar to cv::Scalar_<uint8_t> first, and then get rid of the 4th element.
Example:
#include <opencv2/opencv.hpp>

typedef cv::Matx<uint8_t, 3, 1> Mat31b; // Convenience, OpenCV only has typedefs for double and float variants

int main()
{
    cv::Mat test_mat(1, 12, CV_8UC3); // 12 * 3 = 36 bytes of data
    test_mat = cv::Scalar(1, 1, 1); // Set all elements to 1
    std::cout << "Before: " << test_mat << "\n";

    cv::Scalar test_scalar{ 2, 3, 4, 0 };
    cv::Matx31d temp = test_scalar.get_minor<3, 1>(0, 0);
    test_mat.at<Mat31b>(0, 0) = static_cast<Mat31b>(temp);
    // or
    // cv::Scalar_<uint8_t> temp(static_cast<cv::Scalar_<uint8_t>>(test_scalar));
    // test_mat.at<Mat31b>(0, 0) = temp.get_minor<3, 1>(0, 0);

    std::cout << "After: " << test_mat << "\n";
    return 0;
}
NB: You can get rid of the explicit temporaries, they're here just for easier readability.
Output:
Both options produce the following output:
Before: [ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
After: [ 2, 3, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
As we can see, only the first 3 bytes were changed, so it behaves correctly.
Some thoughts about performance.
It's hard to guess which of the two approaches is better. Casting first means you allocate a smaller amount of memory for the temporary, but then you have to do 4 saturate_casts instead of 3. Some benchmarking would have to be done (exercise for the reader). The calculation of the mean will outweigh it significantly, so it's likely to be irrelevant.
Given that we don't really need the saturate_casts, perhaps the simple but more verbose approach (an optimized version of the one that worked for you) might perform better in a tight loop.
cv::Vec3b& current_element(avgs.at<cv::Vec3b>(i));
cv::Scalar current_mean(cv::mean(cv::Mat(img, rois[i])));
for (int n(0); n < 3; ++n) {
    current_element[n] = static_cast<uint8_t>(current_mean[n]);
}
Update:
One more idea that came up in discussion with @alkasm. The assignment operator of cv::Mat is vectorized when given a cv::Scalar (it assigns the same value to all elements), and it ignores the additional channel values the cv::Scalar may hold relative to the target cv::Mat type (e.g. for a 3-channel Mat it ignores the 4th value).
We could take a 1x1 ROI of the target Mat and assign it the mean Scalar. The necessary type conversions will happen, and the 4th channel will be discarded. Probably not optimal, but it's by far the least amount of code so far.
test_mat(cv::Rect(0, 0, 1, 1)) = test_scalar;
The result is the same as before.

Cosine similarity when one of vectors is all zeros

How do you express the cosine similarity (http://en.wikipedia.org/wiki/Cosine_similarity) when one of the vectors is all zeros?
v1 = [1, 1, 1, 1, 1]
v2 = [0, 0, 0, 0, 0]
When we calculate according to the classic formula we get division by zero:
Let d1 = 0 0 0 0 0 0
Let d2 = 1 1 1 1 1 1
Cosine Similarity (d1, d2) = dot(d1, d2) / (||d1|| * ||d2||)
dot(d1, d2) = (0)*(1) + (0)*(1) + (0)*(1) + (0)*(1) + (0)*(1) + (0)*(1) = 0
||d1|| = sqrt((0)^2 + (0)^2 + (0)^2 + (0)^2 + (0)^2 + (0)^2) = 0
||d2|| = sqrt((1)^2 + (1)^2 + (1)^2 + (1)^2 + (1)^2 + (1)^2) = 2.44948974278
Cosine Similarity (d1, d2) = 0 / ((0) * (2.44948974278)) = 0 / 0
I want to use this similarity measure in a clustering application.
And I often will need to compare such vectors.
Also [0, 0, 0, 0, 0] vs. [0, 0, 0, 0, 0]
Do you have any experience with this?
Since this is a similarity (not a distance) measure, should I use special cases such as
d([1, 1, 1, 1, 1]; [0, 0, 0, 0, 0]) = 0
d([0, 0, 0, 0, 0]; [0, 0, 0, 0, 0]) = 1
What about
d([1, 1, 1, 0, 0]; [0, 0, 0, 0, 0]) = ? etc.
If you have 0 vectors, cosine is the wrong similarity function for your application.
Cosine distance is essentially equivalent to squared Euclidean distance on L_2 normalized data. I.e. you normalize every vector to unit length 1, then compute squared Euclidean distance.
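To make that equivalence concrete: for L2-normalized vectors x and y, ||x - y||^2 = ||x||^2 + ||y||^2 - 2*dot(x, y) = 2 - 2*cos(x, y). A quick numeric check (the vectors are arbitrary random examples):
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)
x, y = x / np.linalg.norm(x), y / np.linalg.norm(y)  # L2-normalize

cos_sim = float(np.dot(x, y))
sq_euclid = float(np.sum((x - y) ** 2))
print(sq_euclid, 2 - 2 * cos_sim)  # the two values agree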
The other benefit of cosine is performance: computing it on very sparse, high-dimensional data is faster than Euclidean distance. It benefits from sparsity quadratically, not just linearly.
While you obviously can try to hack the similarity to be 0 when exactly one is zero, and maximal when they are identical, it won't really solve the underlying problems.
Don't choose the distance by what you can easily compute.
Instead, choose the distance such that the result has a meaning on your data. If the value is undefined, you don't have a meaning...
Sometimes, it may work to discard constant-0 data as meaningless data anyway (e.g. analyzing Twitter noise, and seeing a Tweet that is all numbers, no words). Sometimes it doesn't.
It is undefined.
Suppose you have a nonzero vector C in place of your zero vector. Multiply it by epsilon > 0 and let epsilon run to zero. The result will depend on C, so the function is not continuous when one of the vectors approaches zero.
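A quick numeric illustration of that limit argument (C and d2 are arbitrary example vectors): scaling C by any epsilon > 0 leaves the cosine unchanged, so the limit as epsilon goes to zero depends entirely on the direction of C rather than converging to one well-defined value.
import numpy as np

def cos_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

d2 = np.ones(5)
for C in (np.ones(5), np.array([1.0, 0, 0, 0, 0])):
    for eps in (1.0, 1e-3, 1e-9):
        print(C, eps, cos_sim(eps * C, d2))  # constant in eps, but differs per direction of C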

Kalman Filter : some doubts

I have several questions:
In the example given in openCV document:
/* generate measurement */
cvMatMulAdd( kalman->measurement_matrix, state, measurement, measurement );
Is this correct?
In the tutorial An Introduction to the Kalman Filter by Welch and Bishop, Equation 1.2 says measurement = H*state + measurement noise.
The two don't seem to be the same.
I was trying to implement bouncing ball tracking for a single ball.
I tried the following (please point out if I am doing it incorrectly):
For the measurement I am measuring two things: a) x and b) y of the centroid of the ball.
I am only mentioning the lines that differ from the example given in the OpenCV documentation.
CvKalman* kalman = cvCreateKalman( 5, 2, 0 );
const float A[] = { 1, 0, 1, 0, 0,
0, 1, 0, 1, 0,
0, 0, 1, 0, 0,
0, 0, 0, 1, 1,
0, 0, 0, 0, 1};
CvMat* state = cvCreateMat( 5, 1, CV_32FC1 );
CvMat* measurement = cvCreateMat( 2, 1, CV_32FC1 );
//initialize the state of kalman filter
state->data.fl[0] = mean_c;
state->data.fl[1] = mean_r;
state->data.fl[2] = mean_c - prev_mean_c;
state->data.fl[3] = mean_r - prev_mean_r;
state->data.fl[4] = 9.81;
After initialization, this is the line that crashes:
cvMatMulAdd( kalman->transition_matrix, state,
kalman->process_noise_cov, state );
In that line they just reuse the measurement variable to store the generated noise; see the previous line:
cvRandArr( &rng, measurement, CV_RAND_NORMAL, cvRealScalar(0),cvRealScalar(sqrt(kalman->measurement_noise_cov->data.fl[0])) );
You should change the dimensions of the H matrix as well. It must be 2 by 5 so that H*state + measurement noise can be computed. You probably get an error in the line
memcpy( cvkalman->measurement_matrix->data.fl, H, sizeof(H));
because in the initial example cvkalman->measurement_matrix and H are allocated as 4 by 4 matrices, and you only reduced cvkalman->measurement_matrix to 2 by 5 (4*4 elements is more than 2*5), so the memcpy overruns the smaller buffer.
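For reference, a minimal sketch of the same setup with the modern cv2.KalmanFilter Python API rather than the legacy C API used above; the noise covariances and the example measurement are arbitrary placeholders, but the matrix shapes (5x5 transition, 2x5 measurement) are the point:
import numpy as np
import cv2

# State: [x, y, vx, vy, g]; measurement: [x, y] of the ball centroid.
kalman = cv2.KalmanFilter(5, 2)
kalman.transitionMatrix = np.array([[1, 0, 1, 0, 0],
                                    [0, 1, 0, 1, 0],
                                    [0, 0, 1, 0, 0],
                                    [0, 0, 0, 1, 1],
                                    [0, 0, 0, 0, 1]], dtype=np.float32)
# 2x5 measurement matrix: only the x and y positions are observed.
kalman.measurementMatrix = np.array([[1, 0, 0, 0, 0],
                                     [0, 1, 0, 0, 0]], dtype=np.float32)
kalman.processNoiseCov = np.eye(5, dtype=np.float32) * 1e-4      # placeholder value
kalman.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1  # placeholder value

prediction = kalman.predict()                                    # predicted state
measurement = np.array([[120.0], [80.0]], dtype=np.float32)      # example centroid
corrected = kalman.correct(measurement)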
