How to render multiple triangleStrips using Metal?

I already know how to render multiple triangles in Metal:
let vertexBuffer = device.makeBuffer(bytes: vertices_triangles,
                                     length: vertices_triangles.count * MemoryLayout<Vertex>.stride,
                                     options: [])
renderEncoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
renderEncoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: vertices_triangles.count)
renderEncoder.endEncoding()
commandBuffer.present(view.currentDrawable!)
commandBuffer.commit()
Here, vertices_triangles is an Array with elements of type Vertex. Every three adjacent vertices form a triangle to render.
However, I don't really know how to render multiple triangleStrips in Metal.
let vertexBuffer = device.makeBuffer(bytes: vertices_triangleStrips,
                                     length: vertices_triangleStrips.count * MemoryLayout<Vertex>.stride,
                                     options: [])
renderEncoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
renderEncoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: vertices_triangleStrips.count)
If I put adjacent vertices in vertices_triangleStrips and set the renderEncoder.drawPrimitives type to .triangleStrip, I get one triangle strip. But how can I render multiple triangleStrips? I tried using a for loop to create multiple vertex buffers and calling renderEncoder.drawPrimitives once per strip, but that does not seem like a good idea for performance reasons.

Referring to the documentation of drawIndexedPrimitives(type:indexCount:indexType:indexBuffer:indexBufferOffset:instanceCount:baseVertex:baseInstance:) in Metal:
Primitive restart functionality is enabled with the largest unsigned integer index value, relative to indexType (0xFFFF for MTLIndexTypeUInt16 or 0xFFFFFFFF for MTLIndexTypeUInt32). This feature finishes drawing the current primitive at the specified index and starts drawing a new one with the next index.
You can render multiple triangleStrips by defining an index buffer in which the strips are separated by 0xFFFF or 0xFFFFFFFF.
e.g. rendering triangleStrips over the vertices [0,1,2,3], [4,5,6,7], [8,9,10], and [11,12,13,14,15,16]:
let indexBytes: [UInt32] = [0, 1, 2, 3, 0xFFFFFFFF, 4, 5, 6, 7, 0xFFFFFFFF, 8, 9, 10, 0xFFFFFFFF, 11, 12, 13, 14, 15, 16, 0xFFFFFFFF]
let vertexBuffer = device.makeBuffer(bytes: vertices_triangleStrips,
                                     length: vertices_triangleStrips.count * MemoryLayout<MetalPosition2>.stride,
                                     options: [])!
let indexBuffer = device.makeBuffer(bytes: indexBytes,
                                    length: indexBytes.count * MemoryLayout<UInt32>.stride,
                                    options: [])!
renderEncoder.setVertexBuffer(vertexBuffer,
                              offset: 0,
                              index: 0)
renderEncoder.drawIndexedPrimitives(type: .triangleStrip,
                                    indexCount: indexBytes.count,
                                    indexType: .uint32,
                                    indexBuffer: indexBuffer,
                                    indexBufferOffset: 0) // only one instance

Related

How to perform Bilinear Interpolation to a masked image?

Suppose I have an image with a mask, where valid pixels are marked as 1 and the others as 0. How can I perform bilinear interpolation to fill in all the invalid pixels?
For example, image:
1, 0, 0, 4
mask:
1, 0, 0, 1
interpolation result should be:
1, 2, 3, 4
The valid pixels are not regularly arranged. A more complicated sample, image:
4, 0, 6, 0
0, 8, 5, 0
5, 3, 0, 0
mask:
1, 0, 1, 0
0, 1, 1, 0
1, 1, 0, 0
Interpolating with scipy.interpolate.interp2d gives a result with many holes and noise.

I cannot run any Metal compute shader on my phone

I am trying to run my Metal program on my iPhone SE.
I tried many values for the threadsPerThreadGroup and threadsPerGrid sizes, and all of them gave me this error: MTLValidateFeatureSupport:3539: failed assertion `Dispatch Threads with Non-Uniform Threadgroup Size is only supported on MTLGPUFamilyApple4 and later.'
Here is my code.
var threadsPerThreadGroup: MTLSize
var threadsPerGrid: MTLSize
computeCommandEncoder.setComputePipelineState(updateShader)
let w = updateShader.threadExecutionWidth
threadsPerThreadGroup = MTLSize(width: w, height: 1, depth: 1)
threadsPerGrid = MTLSize(width: Int(constants.bufferLength), height: 1, depth: 1)
if(frames % 2 == 0) {
    computeCommandEncoder.setBuffer(buffer1, offset: 0, index: 0)
    computeCommandEncoder.setBuffer(buffer2, offset: 0, index: 1)
} else {
    computeCommandEncoder.setBuffer(buffer2, offset: 0, index: 0)
    computeCommandEncoder.setBuffer(buffer1, offset: 0, index: 1)
}
computeCommandEncoder.setBytes(&constants, length: MemoryLayout<MyConstants>.stride, index: 2)
computeCommandEncoder.dispatchThreads(threadsPerGrid, threadsPerThreadgroup: threadsPerThreadGroup)
frames += 1
I am using iOS 13.4 and Xcode 11.4.
threadExecutionWidth evaluates to 32 and constants.bufferLength is 512.
The documentation says: "Use [dispatchThreads] only if the device supports non-uniform threadgroup sizes."
That is not worded as clearly as it could be. It means that dispatchThreads does not work on pre-A11 GPUs.
If you want a solution that works on all devices, you have to calculate how many threadgroups go into a grid yourself, and use dispatchThreadgroups.
If you want to have both methods in your code, you can detect the device's feature set at runtime.
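As a minimal sketch of that approach (reusing updateShader, constants, device, and computeCommandEncoder from the question, and assuming the compute kernel ignores thread positions past the end of the buffer; supportsFamily(_:) requires iOS 13 or later):
let w = updateShader.threadExecutionWidth
let threadsPerThreadgroup = MTLSize(width: w, height: 1, depth: 1)
let gridWidth = Int(constants.bufferLength)

if device.supportsFamily(.apple4) {
    // A11 and later: non-uniform threadgroup sizes are supported,
    // so the grid can be dispatched exactly.
    computeCommandEncoder.dispatchThreads(MTLSize(width: gridWidth, height: 1, depth: 1),
                                          threadsPerThreadgroup: threadsPerThreadgroup)
} else {
    // Older GPUs: round the threadgroup count up and let the kernel
    // skip thread positions beyond gridWidth.
    let groupCount = (gridWidth + w - 1) / w
    computeCommandEncoder.dispatchThreadgroups(MTLSize(width: groupCount, height: 1, depth: 1),
                                               threadsPerThreadgroup: threadsPerThreadgroup)
}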

WebGL TRIANGLE vs TRIANGLE_STRIP

I've got a single triangle rendering using gl.TRIANGLE_STRIP, but when I try to change it to gl.TRIANGLE, the faces do not render. The vertices appear to render as really tiny dots, but the faces are empty.
My understanding is that the vertex format for a TRIANGLE vs TRIANGLE_STRIP should be identical for a single triangle.
// vertex setup
const buffer = gl.createBuffer();
const vertices = new Float32Array([
  1, -1, -1,   1, 1.3, 1.5, 1,
  1, -1, 1,    1.3, 1, 1.5, 1,
  0, 1, 0,     1, 1, 1.75, 1
]);
gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
gl.bufferData(gl.ARRAY_BUFFER, vertices, gl.STATIC_DRAW);
const length = vertices.length / 7;
const mode = gl.TRIANGLE_STRIP;
return {buffer, length, mode};
That works as expected with the following render code:
// render frame
gl.bindBuffer(gl.ARRAY_BUFFER, shape.buffer);
gl.vertexAttribPointer(attribs.position, 3, gl.FLOAT, false, 28, 0);
gl.vertexAttribPointer(attribs.color, 4, gl.FLOAT, false, 28, 12);
gl.enableVertexAttribArray(attribs.position);
gl.enableVertexAttribArray(attribs.color);
gl.useProgram(programs.colored);
gl.uniformMatrix4fv(uniforms.projection, false, projection);
gl.uniformMatrix4fv(uniforms.modelView, false, modelView);
gl.drawArrays(shape.mode, 0, shape.length);
But if I change the mode to gl.TRIANGLE, no faces appear, with the vertices just barely visible as tiny dots.
What am I misunderstanding here?

OpenCV: how do conversions of matrix elements work?

I am having trouble understanding the inner workings of OpenCV. Consider the following code:
Scalar getAverageColor(Mat img, vector<Rect>& rois) {
    int n = static_cast<int>(rois.size());
    Mat avgs(1, n, CV_8UC3);
    for (int i = 0; i < n; ++i) {
        // What is the correct way to assign the color elements in
        // the matrix?
        avgs.at<Scalar>(i) = mean(Mat(img, rois[i]));
        /*
        This seems to always work, but there has to be a better way.
        avgs.at<Vec3b>(i)[0] = mean(Mat(img, rois[i]))[0];
        avgs.at<Vec3b>(i)[1] = mean(Mat(img, rois[i]))[1];
        avgs.at<Vec3b>(i)[2] = mean(Mat(img, rois[i]))[2];
        */
    }
    // If I access the first element it seems to be set correctly.
    Scalar first = avgs.at<Scalar>(0);
    // However mean returns [0 0 0 0] if I did the assignment above using Scalar, why???
    Scalar avg = mean(avgs);
    return avg;
}
If I use avgs.at<Scalar>(i) = mean(Mat(img, rois[i])) for the assignment in the loop, the first element looks correct, but the mean calculation always returns zero (even though the first element looks correct). If I assign all the color elements by hand using Vec3b it seems to work, but why?
Note: cv::Scalar is a typedef for cv::Scalar_<double>, which derives from cv::Vec<double, 4>, which derives from cv::Matx<double, 4, 1>.
Similarly, cv::Vec3b is cv::Vec<uint8_t, 3> which derives from cv::Matx<uint8_t, 3, 1> -- this means that we can use any of those 3 in cv::Mat::at and get identical (correct) behaviour.
It's important to be aware that cv::Mat::at is basically a reinterpret_cast on the underlying data array. You need to be extremely careful to use an appropriate data type for the template argument, one which corresponds to the type of elements (including channel count) of the cv::Mat you're invoking it on.
The documentation mentions the following:
Keep in mind that the size identifier used in the at operator cannot be chosen at random. It depends on the image from which you are trying to retrieve the data. The table below gives a better insight in this:
If matrix is of type CV_8U then use Mat.at<uchar>(y,x).
If matrix is of type CV_8S then use Mat.at<schar>(y,x).
If matrix is of type CV_16U then use Mat.at<ushort>(y,x).
If matrix is of type CV_16S then use Mat.at<short>(y,x).
If matrix is of type CV_32S then use Mat.at<int>(y,x).
If matrix is of type CV_32F then use Mat.at<float>(y,x).
If matrix is of type CV_64F then use Mat.at<double>(y,x).
It doesn't seem to mention what to do in the case of multiple channels -- in that case you use cv::Vec<...> (or rather one of the typedefs provided). cv::Vec<...> is basically a wrapper around a fixed-size array of N values of a given type.
In your case, the matrix avgs is CV_8UC3 -- each element consists of 3 unsigned byte values (i.e. 3 bytes total). However, by using avgs.at<Scalar>(i), you interpret each element as 4 doubles (32 bytes in total). That means that:
The actual element you tried to write to (if interpreted correctly) will only hold the 3 most significant bytes of the (8 byte floating point) mean of the first channel -- i.e. complete garbage.
You actually overwrite the next 10 elements (the last one partially, 3rd channel escapes unscathed) with more garbage.
At some point, you are bound to overflow the buffer and potentially trash other data structures. This issue is rather serious.
We can demonstrate it using the following simple program.
Example:
#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat test_mat(cv::Mat::zeros(1, 12, CV_8UC3)); // 12 * 3 = 36 bytes of data
    std::cout << "Before: " << test_mat << "\n";

    cv::Scalar test_scalar(cv::Scalar::all(1234.5678));
    test_mat.at<cv::Scalar>(0, 0) = test_scalar;

    std::cout << "After: " << test_mat << "\n";
    return 0;
}
Output:
Before: [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
After: [173, 250, 92, 109, 69, 74, 147, 64, 173, 250, 92, 109, 69, 74, 147, 64, 173, 250, 92, 109, 69, 74, 147, 64, 173, 250, 92, 109, 69, 74, 147, 64, 0, 0, 0, 0]
This clearly shows we're writing way more than we should.
In Debug mode, the incorrect use of at also triggers an assertion:
OpenCV(3.4.3) Error: Assertion failed (((((sizeof(size_t)<<28)|0x8442211) >> ((traits::Depth<_Tp>::value) & ((1 << 3) - 1))*4) & 15) == elemSize1()) in cv::Mat::at, file D:\code\shit\so07\deps\include\opencv2/core/mat.inl.hpp, line 1102
To allow assignment of the result from cv::mean (which is a cv::Scalar) to our CV_8UC3 matrix, we need to do two things (not necessarily in this order):
Convert the values from double to uint8_t -- OpenCV will do a saturate_cast, but given that the mean won't go past the min/max of the input items, we'd be fine with a regular cast.
Get rid of the 4th element.
To remove the 4th element, we can use cv::Matx::get_minor (The documentation is a bit lacking, but a look at the implementation explains it fairly well). The result is a cv::Matx, so we have to use that instead of cv::Vec when using cv::Mat::at.
The two possible options then are:
1. Get rid of the 4th element, and then cast the result to convert the cv::Matx to a uint8_t element type.
2. Cast the cv::Scalar to cv::Scalar_<uint8_t> first, and then get rid of the 4th element.
Example:
#include <opencv2/opencv.hpp>

typedef cv::Matx<uint8_t, 3, 1> Mat31b; // Convenience, OpenCV only has typedefs for double and float variants

int main()
{
    cv::Mat test_mat(1, 12, CV_8UC3); // 12 * 3 = 36 bytes of data
    test_mat = cv::Scalar(1, 1, 1); // Set all elements to 1
    std::cout << "Before: " << test_mat << "\n";

    cv::Scalar test_scalar{ 2, 3, 4, 0 };
    cv::Matx31d temp = test_scalar.get_minor<3, 1>(0, 0);
    test_mat.at<Mat31b>(0, 0) = static_cast<Mat31b>(temp);

    // or
    // cv::Scalar_<uint8_t> temp(static_cast<cv::Scalar_<uint8_t>>(test_scalar));
    // test_mat.at<Mat31b>(0, 0) = temp.get_minor<3, 1>(0, 0);

    std::cout << "After: " << test_mat << "\n";
    return 0;
}
NB: You can get rid of the explicit temporaries, they're here just for easier readability.
Output (both options produce the same):
Before: [ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
After: [ 2, 3, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
As we can see, only the first 3 bytes were changed, so it behaves correctly.
Some thoughts about performance.
It's hard to guess which of the two approaches is better. Casting first means you allocate a smaller amount of memory for the temporary, but then you have to do 4 saturate_casts instead of 3. Some benchmarking would have to be done (exercise for the reader). The calculation of the mean will outweigh it significantly, so it's likely to be irrelevant.
Given that we don't really need the saturate_casts, perhaps the simple but more verbose approach (an optimized version of the thing that worked for you) might perform better in a tight loop.
cv::Vec3b& current_element(avgs.at<cv::Vec3b>(i));
cv::Scalar current_mean(cv::mean(cv::Mat(img, rois[i])));
for (int n(0); n < 3; ++n) {
    current_element[n] = static_cast<uint8_t>(current_mean[n]);
}
Update:
One more idea that came up in discussion with @alkasm. The assignment operator of cv::Mat is vectorized when given a cv::Scalar (it assigns the same value to all elements), and it ignores any additional channel values the cv::Scalar may hold relative to the target cv::Mat type (e.g. for a 3-channel Mat it ignores the 4th value).
We could take a 1x1 ROI of the target Mat and assign it the mean Scalar. The necessary type conversions will happen, and the 4th channel will be discarded. Probably not optimal, but it's by far the least amount of code so far.
test_mat(cv::Rect(0, 0, 1, 1)) = test_scalar;
The result is the same as before.

OpenCV: subtract same BGR values from all pixels

I have some BGR image:
cv::Mat image;
I want to subtract from all the pixels in the image the vector:
[10, 103, 196]
Meaning that the blue channel for all the pixels will be reduced by 10, the green by 103 and the red by 196.
Is there a standard way to do that, or should I run for loops over all the channels and all the pixels?
Suppose we have an image in which all channels are filled with zeros and, for instance, its dimensions are 2x3:
cv::Mat image = cv::Mat::zeros(2,3,CV_32SC3);
The output will be:
[0, 0, 0, 0, 0, 0, 0, 0, 0;
0, 0, 0, 0, 0, 0, 0, 0, 0]
If we then want to add or subtract a per-channel constant, we can use cv::Scalar.
1- Suppose we want to add 3 to the blue channel:
image = image + Scalar(3,0,0); // the result will be the same as image = image + 3;
With the above code our matrix is now:
[3, 0, 0, 3, 0, 0, 3, 0, 0;
3, 0, 0, 3, 0, 0, 3, 0, 0]
2- If you want to add to another channel, you can use the second, third, or fourth argument of cv::Scalar, like below:
image = image + Scalar(3,2,-3);
The output will be:
[3, 2, -3, 3, 2, -3, 3, 2, -3;
3, 2, -3, 3, 2, -3, 3, 2, -3]
Using cv::subtract:
cv::Mat image = cv::Mat::zeros(2,3,CV_32SC3);
subtract(image, Scalar(2,3,1), image);
Output:
[-2, -3, -1, -2, -3, -1, -2, -3, -1;
-2, -3, -1, -2, -3, -1, -2, -3, -1]
