Create embeddings using one hot encoded and numeric values together - vectorization

I have created video embeddings such that they represent video features. These features include
Content of video [Video speech converted into text to get text embedding of size 3]
Language [5 possible languages, one hot encoded]
Title [Text embedding using Doc2Vec of size 3]
(these numbers are just for example)
Such that my video embedding structure looks like this
-> video embedding = [ [content embedding of size 3], one hot encoded language, [title embedding of size 3] ]
-> video embedding = [ [0.004, 0.0032, 0.0064], 0, 0, 1, 0, 0, [0.03, 0.021, 0.001] ]
On flattening :
-> video embedding = [0.004, 0.0032, 0.0064, 0, 0, 1, 0, 0, 0.03, 0.021, 0.001]
However, when I apply a distance metric to find the similarity between these embeddings, the one-hot encoded features overpower the others.
E.g. cos([0.004, 0.0032, 0.0064, 0, 0, 1, 0, 0, 0.03, 0.021, 0.001], [0.004, 0.0032, 0.0064, 0, 1, 0, 0, 0, 0.03, 0.021, 0.001]) = 0.14 (after changing only the language)
cos([0.05, 0.005, 0.0064, 0, 0, 1, 0, 0, 0.03, 0.021, 0.001], [0.004, 0.0032, 0.0064, 0, 0, 1, 0, 0, 0.03, 0.021, 0.001]) = 0.99 (After changing content vector and keeping language same)
Is there any way to use one hot encoded vectors and numeric vectors together? Or is there a better way to calculate similarity between such vectors?
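One possible approach, offered here as a sketch rather than an established answer, is to compare each feature block separately and then combine the per-block cosine similarities with explicit weights, so that the one-hot language block contributes only as much as you allow. The block boundaries, the weights and the helper names (block_cosine, weighted_similarity) are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical layout of the flattened embedding: (start, stop) index per block.
BLOCKS = {"content": (0, 3), "language": (3, 8), "title": (8, 11)}
# Hypothetical weights; tune them for your own data.
WEIGHTS = {"content": 0.4, "language": 0.2, "title": 0.4}


def block_cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return float(np.dot(a, b) / (na * nb))


def weighted_similarity(u, v, blocks=BLOCKS, weights=WEIGHTS):
    """Weighted average of the per-block cosine similarities."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    total = sum(weights[name] for name in blocks)
    score = 0.0
    for name, (start, stop) in blocks.items():
        score += weights[name] * block_cosine(u[start:stop], v[start:stop])
    return score / total


base = [0.004, 0.0032, 0.0064, 0, 0, 1, 0, 0, 0.03, 0.021, 0.001]
other_language = [0.004, 0.0032, 0.0064, 0, 1, 0, 0, 0, 0.03, 0.021, 0.001]
other_content = [0.05, 0.005, 0.0064, 0, 0, 1, 0, 0, 0.03, 0.021, 0.001]

sim_language = weighted_similarity(base, other_language)
sim_content = weighted_similarity(base, other_content)
print(sim_language, sim_content)
```

With these weights, a language mismatch can lower the score by at most the language weight (0.2 here), so changing only the language gives a similarity of about 0.8 instead of a value close to 0.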


How to calculate the resolution of an undistorted image?

An undistorted image typically has a lower resolution than the original image, due to the non-uniform distribution of pixels and (usually) the cropping of the black edges. (See below for an example.)
So given the camera calibration parameters, e.g. in ROS format
image_width: 1600
image_height: 1200
camera_name: camera1
camera_matrix:
  rows: 3
  cols: 3
  data: [1384.355466887268, 0, 849.4355708515795, 0, 1398.17734010913, 604.5570699746268, 0, 0, 1]
distortion_model: plumb_bob
distortion_coefficients:
  rows: 1
  cols: 5
  data: [0.0425049914802741, -0.1347528158561486, -0.0002287009852930437, 0.00641133892300999, 0]
rectification_matrix:
  rows: 3
  cols: 3
  data: [1, 0, 0, 0, 1, 0, 0, 0, 1]
projection_matrix:
  rows: 3
  cols: 4
  data: [1379.868041992188, 0, 860.3000889574832, 0, 0, 1405.926879882812, 604.3997819099422, 0, 0, 0, 1, 0]
How would one calculate the final resolution of the undistorted rectified image?
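From Fruchtzwerg's comment, the following gives the effective ROI (the region of valid pixels) of the undistorted image:
import cv2
import numpy as np

mtx = np.array([
    [1384.355466887268, 0, 849.4355708515795],
    [0, 1398.17734010913, 604.5570699746268],
    [0, 0, 1]])
dist = np.array([0.0425049914802741, -0.1347528158561486, -0.0002287009852930437, 0.00641133892300999, 0])
# Returns the new camera matrix and the valid-pixel ROI as (x, y, width, height)
new_mtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (1600, 1200), 1)
x, y, w, h = roi
print(w, h)  # width and height of the region that contains only valid pixels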

OpenCV how do conversions of Matrix elements work

I am having trouble understanding the inner workings of OpenCV. Consider the following code:
Scalar getAverageColor(Mat img, vector<Rect>& rois) {
    int n = static_cast<int>(rois.size());
    Mat avgs(1, n, CV_8UC3);
    for (int i = 0; i < n; ++i) {
        // What is the correct way to assign the color elements in
        // the matrix?
        avgs.at<Scalar>(i) = mean(Mat(img, rois[i]));
        /*
        This seems to always work, but there has to be a better way.
        avgs.at<Vec3b>(i)[0] = mean(Mat(img, rois[i]))[0];
        avgs.at<Vec3b>(i)[1] = mean(Mat(img, rois[i]))[1];
        avgs.at<Vec3b>(i)[2] = mean(Mat(img, rois[i]))[2];
        */
    }
    // If I access the first element it seems to be set correctly.
    Scalar first = avgs.at<Scalar>(0);
    // However mean returns [0 0 0 0] if I did the assignment above using scalar, why???
    Scalar avg = mean(avgs);
    return avg;
}
If I use avgs.at<Scalar>(i) = mean(Mat(img, rois[i])) for the assignment in the loop, the first element looks correct, but then the mean calculation always returns zero (even though the first element looks correct). If I assign all the color elements by hand using Vec3b it seems to work, but why???
Note: cv::Scalar is a typedef for cv::Scalar_<double>, which derives from cv::Vec<double, 4>, which derives from cv::Matx<double, 4, 1>.
Similarly, cv::Vec3b is cv::Vec<uint8_t, 3> which derives from cv::Matx<uint8_t, 3, 1> -- this means that we can use any of those 3 in cv::Mat::at and get identical (correct) behaviour.
It's important to be aware that cv::Mat::at is basically a reinterpret_cast on the underlying data array. You need to be extremely careful to use an appropriate data type for the template argument, one which corresponds to the type of elements (including channel count) of the cv::Mat you're invoking it on.
The documentation mentions the following:
Keep in mind that the size identifier used in the at operator cannot be chosen at random. It depends on the image from which you are trying to retrieve the data. The table below gives a better insight in this:
If matrix is of type CV_8U then use Mat.at<uchar>(y,x).
If matrix is of type CV_8S then use Mat.at<schar>(y,x).
If matrix is of type CV_16U then use Mat.at<ushort>(y,x).
If matrix is of type CV_16S then use Mat.at<short>(y,x).
If matrix is of type CV_32S then use Mat.at<int>(y,x).
If matrix is of type CV_32F then use Mat.at<float>(y,x).
If matrix is of type CV_64F then use Mat.at<double>(y,x).
It doesn't seem to mention there what to do in case of multiple channels -- in that case you use cv::Vec<...> (or rather one of the typedefs provided). cv::Vec<...> is basically a wrapper around a fixed-size array of N values of the given type.
In your case, the matrix avgs is CV_8UC3 -- each element consists of 3 unsigned byte values (i.e. 3 bytes total). However, by using avgs.at<Scalar>(i), you interpret each element as 4 doubles (32 bytes in total). That means that:
The actual element you tried to write to (if interpreted correctly) will only hold 3 bytes of the (8-byte floating-point) mean of the first channel, namely the 3 least significant bytes on a little-endian machine -- i.e. complete garbage.
You actually overwrite the next 10 elements (the last one partially, 3rd channel escapes unscathed) with more garbage.
At some point, you are bound to overflow the buffer and potentially trash other data structures. This issue is rather serious.
We can demonstrate it using the following simple program.
Example:
#include <iostream>
#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat test_mat(cv::Mat::zeros(1, 12, CV_8UC3)); // 12 * 3 = 36 bytes of data
    std::cout << "Before: " << test_mat << "\n";
    cv::Scalar test_scalar(cv::Scalar::all(1234.5678));
    // Deliberately wrong element type: writes 32 bytes into a 3-byte element
    test_mat.at<cv::Scalar>(0, 0) = test_scalar;
    std::cout << "After: " << test_mat << "\n";
    return 0;
}
Output:
Before: [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
After: [173, 250, 92, 109, 69, 74, 147, 64, 173, 250, 92, 109, 69, 74, 147, 64, 173, 250, 92, 109, 69, 74, 147, 64, 173, 250, 92, 109, 69, 74, 147, 64, 0, 0, 0, 0]
This clearly shows we're writing way more than we should.
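The same byte arithmetic can be reproduced outside OpenCV. The following Python sketch (standard library only, and assuming little-endian IEEE 754 doubles, as in the output above) packs four doubles the way a cv::Scalar is laid out and copies them over the start of a 36-byte buffer that stands in for the 1x12 CV_8UC3 matrix. The variable names are made up for this illustration.

```python
import struct

# 12 elements * 3 channels = 36 bytes, standing in for the CV_8UC3 data array.
buf = bytearray(36)

# A cv::Scalar holds 4 doubles; pack them as little-endian 8-byte floats.
scalar_bytes = struct.pack("<4d", 1234.5678, 1234.5678, 1234.5678, 1234.5678)

# Writing the scalar "into element 0" really overwrites the first 32 bytes.
buf[0:len(scalar_bytes)] = scalar_bytes

overwritten = len(scalar_bytes)          # bytes touched by the write
elements_touched = -(-overwritten // 3)  # ceiling division by the 3-byte element size
print(list(buf))
print(overwritten, elements_touched)
```

Only 3 of the 32 written bytes belong to the element that was addressed; the other 29 spill over into the following elements.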
In Debug mode, the incorrect use of at also triggers an assertion:
OpenCV(3.4.3) Error: Assertion failed (((((sizeof(size_t)<<28)|0x8442211) >> ((traits::Depth<_Tp>::value) & ((1 << 3) - 1))*4) & 15) == elemSize1()) in cv::Mat::at, file D:\code\shit\so07\deps\include\opencv2/core/mat.inl.hpp, line 1102
To allow assignment of the result from cv::mean (which is a cv::Scalar) to our CV_8UC3 matrix, we need to do two things (not necessarily in this order):
Convert the values from double to uint8_t -- OpenCV will do a saturate_cast, but given that the mean won't go past the min/max of the input items, we'd be fine with a regular cast.
Get rid of the 4th element.
To remove the 4th element, we can use cv::Matx::get_minor (The documentation is a bit lacking, but a look at the implementation explains it fairly well). The result is a cv::Matx, so we have to use that instead of cv::Vec when using cv::Mat::at.
The two possible options then are:
Get rid of the 4th element and then cast the result to convert the cv::Matx to the uint8_t element type.
Cast the cv::Scalar to cv::Scalar_<uint8_t> first, and then get rid of the 4th element.
Example:
#include <opencv2/opencv.hpp>

typedef cv::Matx<uint8_t, 3, 1> Mat31b; // Convenience, OpenCV only has typedefs for double and float variants

int main()
{
    cv::Mat test_mat(1, 12, CV_8UC3); // 12 * 3 = 36 bytes of data
    test_mat = cv::Scalar(1, 1, 1); // Set all elements to 1
    std::cout << "Before: " << test_mat << "\n";
    cv::Scalar test_scalar{ 2,3,4,0 };
    cv::Matx31d temp = test_scalar.get_minor<3, 1>(0, 0);
    test_mat.at<Mat31b>(0, 0) = static_cast<Mat31b>(temp);
    // or
    // cv::Scalar_<uint8_t> temp(static_cast<cv::Scalar_<uint8_t>>(test_scalar));
    // test_mat.at<Mat31b>(0, 0) = temp.get_minor<3, 1>(0, 0);
    std::cout << "After: " << test_mat << "\n";
    return 0;
}
NB: You can get rid of the explicit temporaries, they're here just for easier readability.
Output:
Both options produce the following output:
Before: [ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
After: [ 2, 3, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
As we can see, only the first 3 bytes were changed, so it behaves correctly.
Some thoughts about performance.
It's hard to guess which of the two approaches is better. Casting first means you allocate a smaller amount of memory for the temporary, but then you have to do 4 saturate_casts instead of 3. Some benchmarking would have to be done (exercise for the reader). The calculation of the mean will outweigh it significantly, so it's likely to be irrelevant.
Given that we don't really need the saturate_casts, perhaps the simple, but more verbose approach (optimized version of the thing that worked for you) might perform better in a tight loop.
cv::Vec3b& current_element(avgs.at<cv::Vec3b>(i));
cv::Scalar current_mean(cv::mean(cv::Mat(img, rois[i])));
for (int n(0); n < 3; ++n) {
    current_element[n] = static_cast<uint8_t>(current_mean[n]);
}
Update:
One more idea that came up in discussion with @alkasm. The assignment operator for a cv::Mat is vectorized when given a cv::Scalar (it assigns the same value to all elements), and it ignores the additional channel values the cv::Scalar may hold relative to the target cv::Mat type (e.g. for a 3-channel Mat it ignores the 4th value).
We could take a 1x1 ROI of the target Mat and assign it the mean Scalar. The necessary type conversions will happen, and the 4th channel will be discarded. Probably not optimal, but it's by far the least amount of code so far.
test_mat(cv::Rect(0, 0, 1, 1)) = test_scalar;
The result is the same as before.
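For comparison, here is a rough Python/NumPy sketch of the same idea: compute a per-ROI mean, drop nothing but the three colour channels, and convert explicitly to the 8-bit element type before storing. It uses only NumPy (no OpenCV calls); the function name roi_means_as_uint8, the (x, y, w, h) ROI tuples and the round-then-clip conversion are assumptions chosen to mimic a saturating cast, not code from the answer.

```python
import numpy as np


def roi_means_as_uint8(img, rois):
    """Return a 1 x N x 3 uint8 array holding the mean colour of each ROI.

    img:  H x W x 3 uint8 array.
    rois: list of (x, y, w, h) tuples (hypothetical format for this sketch).
    """
    avgs = np.zeros((1, len(rois), 3), dtype=np.uint8)
    for i, (x, y, w, h) in enumerate(rois):
        # Per-channel mean as float64, shape (3,)
        mean = img[y:y + h, x:x + w].reshape(-1, 3).mean(axis=0)
        # Explicit conversion: round, clamp to [0, 255], then cast to uint8
        avgs[0, i] = np.clip(np.rint(mean), 0, 255).astype(np.uint8)
    return avgs


img = np.zeros((4, 4, 3), dtype=np.uint8)
img[0:2, 0:2] = (10, 20, 30)
img[2:4, 2:4] = (200, 100, 50)
avgs = roi_means_as_uint8(img, [(0, 0, 2, 2), (2, 2, 2, 2)])
print(avgs)
```

The explicit dtype conversion plays the role of the cast in the C++ versions: each stored element is exactly 3 bytes, so neighbouring elements are never touched.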

Measure of how much contiguous is a block of pixels [closed]

I'm using Genetic Algorithms (GA) on an image processing problem (image segmentation, to be more precise). In this case, an individual represents a block of pixels (i.e. a set of pixel coordinates). I need to encourage individuals with contiguous pixels.
To encourage contiguous blocks of pixels:
The "contiguousness" of an individual needs to be considered in the fitness function to encourage individuals having adjacent pixels (best fit). Hence, during the evolution, the contiguousness of a set of coordinates (i.e. an individual) will influence the fitness of this individual.
The problem I'm facing is how to measure this feature (how contiguous it is) on a set of pixel coordinates (x, y).
As can be seen in the image below, the individual (set of pixels in black) on the right is clearly more "contiguous" (and therefore fitter) than the individual on the left:
I think I understand what you are asking, and my suggestion would be to count the number of shared "walls" between your pixels:
I would argue that from left to right the individuals are decreasing in continuity.
Counting the number of walls is not difficult to code, but might be slow the way I've implemented it here.
import random

width = 5
height = 5
image = [[0 for x in range(width)] for y in range(height)]
num_pts_in_individual = 4

# I realize this may give replicate points
individual = [[random.randrange(height), random.randrange(width)]
              for _ in range(num_pts_in_individual)]

# Fill up the image
for point in individual:
    image[point[0]][point[1]] = 1

# Print out the image
for row in image:
    print(row)


def count_shared_walls(image):
    # Each shared wall is counted twice, once from each of the two pixels.
    num_shared = 0
    height = len(image)
    width = len(image[0])
    for h in range(height):
        for w in range(width):
            if image[h][w] == 1:
                if h > 0 and image[h-1][w] == 1:
                    num_shared += 1
                if w > 0 and image[h][w-1] == 1:
                    num_shared += 1
                if h < height-1 and image[h+1][w] == 1:
                    num_shared += 1
                if w < width-1 and image[h][w+1] == 1:
                    num_shared += 1
    return num_shared


shared_walls = count_shared_walls(image)
print(shared_walls)
Different images and counts of shared walls:
[0, 0, 0, 0, 0]
[0, 1, 0, 0, 0]
[0, 0, 0, 0, 0]
[1, 0, 0, 1, 1]
[0, 0, 0, 0, 0]
2
[1, 0, 0, 0, 0]
[0, 0, 0, 1, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[1, 0, 1, 0, 0]
0
[0, 0, 0, 1, 1]
[0, 0, 0, 1, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[1, 0, 0, 0, 0]
4
One major problem with this is that a change in pixel locations that does not change the number of shared walls will not affect the score. Maybe a combination of the distance method you described and the shared walls approach would be best.
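Since the answer notes that scanning the whole grid might be slow, here is a sketch of the same count computed directly from the individual's coordinate set, so the cost depends on the number of pixels rather than on the image size. The function name count_shared_walls_from_points is hypothetical. As in the grid version, each shared wall is counted twice, once from each of the two pixels, so the results match the outputs listed above.

```python
def count_shared_walls_from_points(points):
    """Count shared walls for a collection of (row, col) pixel coordinates.

    Each wall between two neighbouring pixels is counted twice,
    once from each side, like the grid-scanning version.
    """
    pts = set(points)  # duplicates collapse, lookups are O(1)
    num_shared = 0
    for row, col in pts:
        for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            if (row + d_row, col + d_col) in pts:
                num_shared += 1
    return num_shared


# The three example images above, written as coordinate sets
example_1 = [(1, 1), (3, 0), (3, 3), (3, 4)]
example_2 = [(0, 0), (1, 3), (4, 0), (4, 2)]
example_3 = [(0, 3), (0, 4), (1, 3), (4, 0)]

for example in (example_1, example_2, example_3):
    print(count_shared_walls_from_points(example))
```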

Opencv virtually camera rotating/translating for bird's eye view

I have a calibrated camera for which I know the intrinsic and extrinsic parameters exactly. The height of the camera is also known. Now I want to virtually rotate the camera to get a bird's eye view, such that I can build the homography matrix from the three rotation angles and the translation.
I know that points can be transformed from one image to another via a homography as
x = K * (R - t * n^T / d) * K^-1 * x'
There are a few things I'd like to know now:
If I want to bring the image coordinate back into the camera coordinate system (CCS), I have to multiply it by K^-1, right? As the image coordinate I use (x', y', 1)?
Then I need to build a rotation matrix for rotating the CCS... but which convention should I use? And how do I know how to set up my world coordinate system (WCS)?
The next thing is the normal and the distance. Is it right to just take three points lying on the ground and compute the normal from them? And is the distance then the camera height?
I'd also like to know how I can change the height of the virtual bird's-eye-view camera, such that I can say I want to see the ground plane from a height of 3 meters. How can I use the unit "meter" in the translation and the homography matrix?
That's all for now; it would be great if someone could enlighten and help me. And please don't suggest generating the bird's eye view with "getperspective"; I've already tried that, but this way is not suitable for me.
Senna
This is the code I would advise (it's one of mine); to my mind it answers a lot of your questions.
If you want the distance, note that it is in the translation matrix T (translation along Z), in row 3, column 4.
I hope it helps.
Mat source = imread("Whatyouwant.jpg");
// Trackbar positions; 90 corresponds to an angle of 0 degrees
int alpha_ = 90, beta_ = 90, gamma_ = 90;
int f_ = 500, dist_ = 500;
Mat destination;
// getFormatWindowName is a helper from the author's own code (not shown here)
string wndname1 = getFormatWindowName("Source: ");
string wndname2 = getFormatWindowName("WarpPerspective: ");
string tbarname1 = "Alpha";
string tbarname2 = "Beta";
string tbarname3 = "Gamma";
string tbarname4 = "f";
string tbarname5 = "Distance";
namedWindow(wndname1, 1);
namedWindow(wndname2, 1);
createTrackbar(tbarname1, wndname2, &alpha_, 180);
createTrackbar(tbarname2, wndname2, &beta_, 180);
createTrackbar(tbarname3, wndname2, &gamma_, 180);
createTrackbar(tbarname4, wndname2, &f_, 2000);
createTrackbar(tbarname5, wndname2, &dist_, 2000);
imshow(wndname1, source);
while (true) {
    double f, dist;
    double alpha, beta, gamma;
    // Convert the trackbar positions to angles in radians, in [-90, 90] degrees
    alpha = ((double)alpha_ - 90.) * CV_PI / 180;
    beta = ((double)beta_ - 90.) * CV_PI / 180;
    gamma = ((double)gamma_ - 90.) * CV_PI / 180;
    f = (double)f_;
    dist = (double)dist_;
    Size taille = source.size();
    double w = (double)taille.width, h = (double)taille.height;
    // Projection 2D -> 3D matrix (moves the origin to the image center)
    Mat A1 = (Mat_<double>(4, 3) <<
        1, 0, -w/2,
        0, 1, -h/2,
        0, 0, 0,
        0, 0, 1);
    // Rotation matrices around the X, Y, Z axes
    Mat RX = (Mat_<double>(4, 4) <<
        1, 0, 0, 0,
        0, cos(alpha), -sin(alpha), 0,
        0, sin(alpha), cos(alpha), 0,
        0, 0, 0, 1);
    Mat RY = (Mat_<double>(4, 4) <<
        cos(beta), 0, -sin(beta), 0,
        0, 1, 0, 0,
        sin(beta), 0, cos(beta), 0,
        0, 0, 0, 1);
    Mat RZ = (Mat_<double>(4, 4) <<
        cos(gamma), -sin(gamma), 0, 0,
        sin(gamma), cos(gamma), 0, 0,
        0, 0, 1, 0,
        0, 0, 0, 1);
    // Composed rotation matrix with (RX, RY, RZ)
    Mat R = RX * RY * RZ;
    // Translation matrix on the Z axis; changing dist changes the height
    Mat T = (Mat_<double>(4, 4) <<
        1, 0, 0, 0,
        0, 1, 0, 0,
        0, 0, 1, dist,
        0, 0, 0, 1);
    // Camera intrinsics matrix 3D -> 2D
    Mat A2 = (Mat_<double>(3, 4) <<
        f, 0, w/2, 0,
        0, f, h/2, 0,
        0, 0, 1, 0);
    // Final and overall transformation matrix
    Mat transfo = A2 * (T * (R * A1));
    // Apply matrix transformation
    warpPerspective(source, destination, transfo, taille, INTER_CUBIC | WARP_INVERSE_MAP);
    imshow(wndname2, destination);
    // Refresh every 30 ms; leave the loop when Esc is pressed
    if (waitKey(30) == 27) break;
}
This code works for me, but I don't know why the roll and pitch angles are exchanged. When I change "alpha", the image is warped in pitch, and when I change "beta", the image is warped in roll. So I changed my rotation matrices, as can be seen below.
Also, RY has a sign error. You can check Ry at: http://en.wikipedia.org/wiki/Rotation_matrix.
The rotation matrices I use:
Mat RX = (Mat_<double>(4, 4) <<
    1, 0, 0, 0,
    0, cos(beta), -sin(beta), 0,
    0, sin(beta), cos(beta), 0,
    0, 0, 0, 1);
Mat RY = (Mat_<double>(4, 4) <<
    cos(alpha), 0, sin(alpha), 0,
    0, 1, 0, 0,
    -sin(alpha), 0, cos(alpha), 0,
    0, 0, 0, 1);
Mat RZ = (Mat_<double>(4, 4) <<
    cos(gamma), -sin(gamma), 0, 0,
    sin(gamma), cos(gamma), 0, 0,
    0, 0, 1, 0,
    0, 0, 0, 1);
Regards
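To make the matrix chain easier to check by hand, here is a NumPy-only sketch of the same composition A2 * (T * (R * A1)). It is an illustration under assumptions: the function name build_homography is made up, alpha is applied about X as in the first answer, and RY uses the sign convention from the follow-up answer. A convenient sanity check is that with all angles at zero and f equal to dist, the result is a multiple of the identity, i.e. the image is left unchanged.

```python
import numpy as np


def build_homography(w, h, alpha, beta, gamma, f, dist):
    """Compose the 3x3 warp matrix from rotations (radians), focal length and distance."""
    # 2D -> 3D: move the origin to the image centre, put the image in the plane z = 0
    A1 = np.array([[1, 0, -w / 2],
                   [0, 1, -h / 2],
                   [0, 0, 0],
                   [0, 0, 1]], dtype=float)
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    RX = np.array([[1, 0, 0, 0], [0, ca, -sa, 0], [0, sa, ca, 0], [0, 0, 0, 1]], dtype=float)
    RY = np.array([[cb, 0, sb, 0], [0, 1, 0, 0], [-sb, 0, cb, 0], [0, 0, 0, 1]], dtype=float)
    RZ = np.array([[cg, -sg, 0, 0], [sg, cg, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
    R = RX @ RY @ RZ
    # Translation along Z: dist sits in row 3, column 4 (1-based)
    T = np.eye(4)
    T[2, 3] = dist
    # 3D -> 2D: pinhole projection with focal length f and principal point at the centre
    A2 = np.array([[f, 0, w / 2, 0],
                   [0, f, h / 2, 0],
                   [0, 0, 1, 0]], dtype=float)
    return A2 @ (T @ (R @ A1))


H = build_homography(640, 480, 0.0, 0.0, 0.0, f=500.0, dist=500.0)
H_normalized = H / H[2, 2]
print(H_normalized)
```

Because a homography is only defined up to scale, dividing by H[2, 2] before comparing matrices removes the arbitrary factor.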

Kalman Filter : some doubts

I have several questions:
In the example given in the OpenCV documentation:
/* generate measurement */
cvMatMulAdd( kalman->measurement_matrix, state, measurement, measurement );
Is this correct?
In the tutorial "An Introduction to the Kalman Filter" by Welch and Bishop, Equation 1.2 says measurement = H*state + measurement noise.
The two do not seem to be the same.
I was trying to implement bouncing-ball tracking for a single ball.
I tried the following (please point out if I am doing it incorrectly).
For the measurement I am measuring two things: a) x and b) y of the centroid of the ball.
I am only showing the lines that differ from the example given in the OpenCV documentation.
CvKalman* kalman = cvCreateKalman( 5, 2, 0 );
const float A[] = { 1, 0, 1, 0, 0,
                    0, 1, 0, 1, 0,
                    0, 0, 1, 0, 0,
                    0, 0, 0, 1, 1,
                    0, 0, 0, 0, 1 };
CvMat* state = cvCreateMat( 5, 1, CV_32FC1 );
CvMat* measurement = cvCreateMat( 2, 1, CV_32FC1 );
// initialize the state of the Kalman filter
state->data.fl[0] = mean_c;
state->data.fl[1] = mean_r;
state->data.fl[2] = mean_c - prev_mean_c;
state->data.fl[3] = mean_r - prev_mean_r;
state->data.fl[4] = 9.81;
After initialization, this is the line that crashes:
cvMatMulAdd( kalman->transition_matrix, state,
             kalman->process_noise_cov, state );
In this line they just use the variable measurement to store the noise. See the previous line:
cvRandArr( &rng, measurement, CV_RAND_NORMAL, cvRealScalar(0),cvRealScalar(sqrt(kalman->measurement_noise_cov->data.fl[0])) );
You should change the dimensions of the H matrix as well. It must be 2 by 5 (2 rows, 5 columns) to make it possible to calculate H*state + measurement noise. You probably get an error in the line
memcpy( cvkalman->measurement_matrix->data.fl, H, sizeof(H));
because in the initial example cvkalman->measurement_matrix and H are allocated as 4 by 4 matrices, and you decreased the dimensions of cvkalman->measurement_matrix only to 2 by 5 (4*4 is more than 2*5).
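The shape bookkeeping can be checked quickly with NumPy. This sketch mirrors the 5-element state and 2-element measurement from the question; the example state values, the noise levels and the choice of H (observing only x and y) are assumptions for illustration. It also suggests a likely cause of the crash in the question: the process noise added to A*state has to be a 5x1 vector, whereas process_noise_cov is a 5x5 covariance matrix.

```python
import numpy as np

# Transition matrix from the question: state = [x, y, vx, vy, g]
A = np.array([[1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 0, 0, 1]], dtype=np.float32)

# Measurement matrix: 2 rows (x, y measured) by 5 columns (state size)
H = np.array([[1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0]], dtype=np.float32)

# Hypothetical example state as a 5x1 column vector
state = np.array([[100.0], [50.0], [2.0], [-1.0], [9.81]], dtype=np.float32)

rng = np.random.default_rng(0)

# measurement = H * state + measurement noise (Welch and Bishop, Eq. 1.2)
measurement_noise = rng.normal(0.0, 0.1, size=(2, 1)).astype(np.float32)
measurement = H @ state + measurement_noise

# state update: the added noise must be 5x1, not the 5x5 covariance matrix
predicted = A @ state
process_noise = rng.normal(0.0, 1e-3, size=(5, 1)).astype(np.float32)
next_state = predicted + process_noise

print(measurement.shape, next_state.shape)
```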
