Realtime conversion of ARFrame.capturedImage CVPixelBufferRef to OpenCV Mat - iOS

ARKit runs at 60 frames/sec, which equates to 16.6ms per frame.
My current code to convert the CVPixelBufferRef (kCVPixelFormatType_420YpCbCr8BiPlanarFullRange format) to a cv::Mat (YCrCb) runs in 30ms, which causes ARKit to stall and everything to lag.
Does anyone have any ideas on how to do a quicker conversion, or do I need to drop the frame rate?
There is a suggestion by Apple to use Metal, but I'm not sure how to do that.
Alternatively, I could take just the grayscale (luma) plane, which is the first plane; that runs in under 1 ms, but ideally I need the colour information as well.

In order to process an image in a pixel buffer using Metal, you need to do the following:
1. Call CVMetalTextureCacheCreateTextureFromImage to create a CVMetalTexture object on top of the pixel buffer.
2. Call CVMetalTextureGetTexture to obtain a MTLTexture object, which Metal code (GPU) can read and write.
3. Write some Metal code to convert the color format.
I have an open source project (https://github.com/snakajima/vs-metal), which processes pixel buffers (from the camera, not ARKit) using Metal. Feel free to copy any code from this project.
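For illustration, here is a minimal sketch of steps 1 and 2, assuming textureCache was created once up front with CVMetalTextureCacheCreate and pixelBuffer is the biplanar YCbCr buffer:

CVMetalTextureRef cvTexture = NULL;
size_t width  = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0);
size_t height = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0);
// Step 1: wrap the Y plane (plane index 0) as a single-channel Metal texture.
CVReturn status = CVMetalTextureCacheCreateTextureFromImage(
    kCFAllocatorDefault, textureCache, pixelBuffer, NULL,
    MTLPixelFormatR8Unorm, width, height, 0, &cvTexture);
if (status == kCVReturnSuccess) {
    // Step 2: get the MTLTexture that the GPU can read and write.
    id<MTLTexture> texture = CVMetalTextureGetTexture(cvTexture);
    // Step 3 would encode a compute/render pass here that reads `texture`.
    CVBufferRelease(cvTexture); // keep alive until the GPU work is scheduled
}

The CbCr plane can be wrapped the same way with MTLPixelFormatRG8Unorm and plane index 1.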

I tried converting YCbCr to RGB, doing the image processing on the RGB mat, and converting it back to YCbCr; it worked very slowly. I suggest doing that only for a static image. For realtime processing, we should process directly in cv::Mat. ARFrame.capturedImage is a YCbCr buffer, so the solution is:
1. Separate the buffer into 2 cv::Mats (yPlane and cbcrPlane). Keep in mind that we do not clone the memory; we create 2 cv::Mats whose base addresses are the yPlane address and the cbcrPlane address.
2. Do the image processing on yPlane and cbcrPlane, where size(cbcrPlane) = size(yPlane) / 2.
You can check out my code here: https://gist.github.com/ttruongatl/bb6c69659c48bac67826be7368560216
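A minimal sketch of that idea (assuming pixelBuffer is ARFrame.capturedImage; the Mats alias the buffer's memory, so it must stay locked while they are in use):

CVPixelBufferLockBaseAddress(pixelBuffer, 0);
int width  = (int)CVPixelBufferGetWidthOfPlane(pixelBuffer, 0);
int height = (int)CVPixelBufferGetHeightOfPlane(pixelBuffer, 0);
// Wrap the planes without copying; pass the stride so row padding is respected.
cv::Mat yPlane(height, width, CV_8UC1,
               CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0),
               CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0));
cv::Mat cbcrPlane(height / 2, width / 2, CV_8UC2,
                  CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1),
                  CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 1));
// ... process yPlane and cbcrPlane in place ...
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);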

Related

Copy Metal frame buffer to MTLTexture with different Pixel Format

I need to grab the screen pixels into a texture to perform post processing.
Previously, I have been using a MTLBlitCommandEncoder to copy from texture to texture, the source texture being the MTLDrawable texture and the destination my own texture. They both have the same MTLPixelFormatBGRA8Unorm, so everything works just fine.
However, now I need to use a frame buffer color attachment texture of MTLPixelFormatRGBA16Float for HDR rendering. So, when I am grabbing the screen pixels, I am actually grabbing from this color attachment texture instead of the Drawable texture, and I am getting this error:
[MTLDebugBlitCommandEncoder internalValidateCopyFromTexture:sourceSlice:sourceLevel:sourceOrigin:sourceSize:toTexture:destinationSlice:destinationLevel:destinationOrigin:options:]:447: failed assertion [sourceTexture pixelFormat](MTLPixelFormatRGBA16Float) must equal [destinationTexture pixelFormat](MTLPixelFormatBGRA8Unorm)
I don't think I need to change my destination texture to the RGBA16Float format, because that would take up double the memory. One full-screen texture (color attachment) with that format should be enough for HDR to work, right?
Is there another method to successfully perform this kind of copy?
In OpenGL there is no error when copying with glCopyTexImage2D.
Metal automatically converts from source to destination format during rendering. So you could just do a no-op rendering pass to perform the conversion.
Alternatively, if you want to avoid boilerplate no-op rendering code, you can use the MPSImageConversion performance shader, which does basically the same thing.
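A hedged sketch of the MPSImageConversion route (device, commandBuffer, hdrTexture, and ldrTexture are placeholder names):

#import <MetalPerformanceShaders/MetalPerformanceShaders.h>

// Converts hdrTexture (MTLPixelFormatRGBA16Float) into ldrTexture
// (MTLPixelFormatBGRA8Unorm) without a hand-written render pass.
MPSImageConversion *conversion =
    [[MPSImageConversion alloc] initWithDevice:device
                                      srcAlpha:MPSAlphaTypeAlphaIsOne
                                     destAlpha:MPSAlphaTypeAlphaIsOne
                               backgroundColor:nil
                                conversionInfo:NULL]; // NULL: no color-space conversion
[conversion encodeToCommandBuffer:commandBuffer
                    sourceTexture:hdrTexture
               destinationTexture:ldrTexture];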

How to use shared memory between GPU and CPU on iOS with Metal? (ideally with Objective-C)

I created a MTLTexture like this:
descTex = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatRGBA8Unorm
                                                              width:texCamWidth
                                                             height:texCamHeight
                                                          mipmapped:NO];
descTex.usage = MTLTextureUsageShaderWrite | MTLTextureUsageShaderRead;
[descTex setStorageMode:MTLStorageModeShared];
texOutRGB = [_device newTextureWithDescriptor:descTex];
I used a compute shader to fill the texture and rendered it to the screen; the results are as expected.
Now I need a CPU hook to modify the texture data, which cannot be done with a shader. I expected that the MTLTexture.buffer contents would allow me to loop over the pixels, but it appears it does not work like that. I see people using getBytes and then replaceRegion to write the data back, but that does not look like shared memory, since a copy of the data is made.
How do I loop over the RGBA pixel data in the texture with the CPU?
If you create a simple 1D buffer instead, you can just access its contents member as a pointer. If you need an RGBA buffer:
1. Create a CVPixelBuffer that contains BGRA pixels.
2. Access the pixels by locking the buffer and then reading/writing through its base pointer (take care to respect the row widths).
3. Wrap the CVPixelBuffer as a Metal texture to avoid the memcpy().
The 2D processing is not trivial; it is a lot easier to just use a 1D buffer.
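A rough sketch of the CVPixelBuffer route, reusing texCamWidth and texCamHeight from the question (a sketch, not a drop-in implementation):

NSDictionary *attrs = @{ (id)kCVPixelBufferMetalCompatibilityKey : @YES };
CVPixelBufferRef pixelBuffer = NULL;
CVPixelBufferCreate(kCFAllocatorDefault, texCamWidth, texCamHeight,
                    kCVPixelFormatType_32BGRA,
                    (__bridge CFDictionaryRef)attrs, &pixelBuffer);

// CPU side: lock, loop over the pixels, unlock.
CVPixelBufferLockBaseAddress(pixelBuffer, 0);
uint8_t *base = (uint8_t *)CVPixelBufferGetBaseAddress(pixelBuffer);
size_t bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer); // may exceed width * 4
for (size_t y = 0; y < texCamHeight; y++) {
    uint8_t *row = base + y * bytesPerRow;
    // row[4 * x + 0..3] are the B, G, R, A bytes of pixel x
}
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);

// GPU side: wrap the same memory as an id<MTLTexture> through a
// CVMetalTextureCache (CVMetalTextureCacheCreateTextureFromImage),
// so no getBytes/replaceRegion copy is needed.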

Convert ARGB data (.tiff image) to 2D texture using D3D11

I have a sample ARGB image (.tiff file).
I want to pass it as a 2D texture. I am not sure how to do that, or whether it is even possible.
I think you can utilize the WIC texture loader, which is part of the official DirectX Tool Kit.
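A minimal sketch of that approach (d3dDevice and the file name are placeholders; WIC can decode TIFF):

#include <WICTextureLoader.h>

ID3D11ShaderResourceView* textureView = nullptr;
// Decodes the file through WIC, creates the 2D texture, and returns an SRV for it.
HRESULT hr = DirectX::CreateWICTextureFromFile(
    d3dDevice,       // your ID3D11Device*
    L"sample.tiff",  // placeholder path
    nullptr,         // the underlying ID3D11Resource*, if you need it
    &textureView);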

How to play Video using OpenGL in iOS?

I am trying to play a video using OpenGL ES 2.0 in iOS. I am not able to find sample code or a starting point for how to achieve this. Can anybody help me with this?
What you are looking for is getting a raw buffer for the video in real time. I believe you need to look into AVFoundation and somehow extract the CVPixelBufferRef. If I remember correctly you have a few ways: one is on demand at a specific time, another is for processing where you get a fast iteration of the frames in a block, and the one you probably need is to receive the frames in real time. With that you can extract a raw RGB buffer, which needs to be pushed to a texture and then drawn to the render buffer.
I suggest you create a texture once (per video) and try to make it as small as possible, but ensure that the video frame will fit. You might need POT (power of two) textures, so to get the texture dimension from the video width you need something like:
GLint textureWidth = 1;
while (textureWidth < videoWidth) textureWidth <<= 1; // multiplies by 2 until the frame fits
So the texture size is expected to be larger than the video. To push the data to the texture you then need to use glTexSubImage2D, which expects a pointer to your raw data and the rectangle parameters for where to store the data, which are then (0, 0, sampleWidth, sampleHeight). The texture coordinates must also be computed so they are not in the range [0, 1] but rather, for x, in [0, sampleWidth/textureWidth].
So then you just need to put it all together (a sketch of the upload step follows this list):
1. Have a system that keeps generating the raw video sample buffers.
2. Generate a texture to fit the video size.
3. On each new sample, update the texture using glTexSubImage2D (watch out for threads).
4. After the data is loaded into the texture, draw the texture as a full-screen rectangle (watch out for threads).
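A sketch of the upload step, assuming BGRA sample buffers and the Apple BGRA texture extension on ES 2.0 (pixelBuffer and videoTexture are placeholder names):

CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
GLsizei sampleWidth  = (GLsizei)CVPixelBufferGetWidth(pixelBuffer);
GLsizei sampleHeight = (GLsizei)CVPixelBufferGetHeight(pixelBuffer);
glBindTexture(GL_TEXTURE_2D, videoTexture); // allocated once with glTexImage2D
// Update only the video-sized region of the POT texture.
// Note: this assumes tightly packed rows; check CVPixelBufferGetBytesPerRow.
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, sampleWidth, sampleHeight,
                GL_BGRA_EXT, GL_UNSIGNED_BYTE,
                CVPixelBufferGetBaseAddress(pixelBuffer));
CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);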
You might need to watch out for video orientation and transformation, so if possible test your system with a few videos that have been recorded on the device in different orientations. I think there is now support for receiving the buffers already correctly oriented, but by default the samples at least used to be "wrong": a portrait-recorded video still had its samples in landscape, but a transformation matrix or orientation was given with the asset.

Losing data when converting from UIImage to Mat

I am trying to use an HSV threshold to find a red cup in an image. One of my implementations uses the CvVideoCamera delegate and gets a cv::Mat straight from the camera. My other implementation lets the user record a video, then we extract the frames using AVFoundation for processing.
When I threshold the image from AVFoundation I get nothing back besides a black image.
Here is my code:
inRange(gray, Scalar(114, 135, 135), Scalar(142, 255, 255), dst);
The first image is an example of an image that works properly; the second is an image from AVFoundation that does not threshold how I expect and produces an all-black image.
Does anyone have an idea why the second image produces different results when the color of the cup looks quite similar?
An image pulled from a video is reconstructed in a manner that depends on the video's encoding and compression codec. The short version is that unless you happen to pick out a keyframe (which you generally won't have an API to do, so trying to do this isn't viable), you're getting an image that is reconstructed from the video.
So an image taken from the video at the same time that you took a straight-up image from the camera (assuming you could do both at once) would be different. After any sort of processing you'll, of course, get different results.
Before applying your threshold, grab the raw images from both approaches. Look at them (or at a delta of them) and you'll see that they just aren't the same image. The second approach will likely have introduced artifacts from being encoded into video across multiple frames and then reconstructed into a single-frame image.
To overcome this issue, use OpenCV's native conversion functions:
#import <opencv2/imgcodecs/ios.h>
UIImage* MatToUIImage(const cv::Mat& image);
void UIImageToMat(const UIImage* image, cv::Mat& m, bool alphaExist = false);
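A short usage sketch of those converters applied to this threshold (variable names are illustrative):

cv::Mat rgba, rgb, hsv, dst;
UIImageToMat(image, rgba);                    // UIImage -> CV_8UC4 Mat (RGBA order)
cv::cvtColor(rgba, rgb, cv::COLOR_RGBA2RGB);  // drop alpha before the HSV conversion
cv::cvtColor(rgb, hsv, cv::COLOR_RGB2HSV);
cv::inRange(hsv, cv::Scalar(114, 135, 135), cv::Scalar(142, 255, 255), dst);
UIImage *result = MatToUIImage(dst);          // single-channel mask back to a UIImage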
