I need to implement offscreen rendering in Metal and copy the result to system memory, without drawing anything on screen.
This code works, but I'm not sure it's correct:
// rendering to offscreen texture
auto commandQueue = [device newCommandQueue];
auto commandBuffer = [commandQueue commandBuffer];
//[commandBuffer enqueue]; // Do I need this command?
id<MTLRenderCommandEncoder> renderEncoder = [commandBuffer renderCommandEncoderWithDescriptor:mtlDescriptor];
// perform encoding
[renderEncoder endEncoding];
[commandBuffer commit];
commandBuffer = [commandQueue commandBuffer]; // second command buffer for the blit
id<MTLBlitCommandEncoder> blitEncoder = [commandBuffer blitCommandEncoder];
// Copying offscreen texture to a new managed texture
[blitEncoder copyFromTexture:drawable.texture sourceSlice:0 sourceLevel:level sourceOrigin:region.origin sourceSize:region.size toTexture:_mtlTexture destinationSlice:0 destinationLevel:level destinationOrigin:{xOffset, yOffset, 0}];
[blitEncoder endEncoding];
[commandBuffer commit];
[commandBuffer waitUntilCompleted]; // I add waiting to get a fully completed texture for copying.
// Final stage - we copy a texture to our buffer in system memory
getBytes:bytesPerRow:fromRegion:mipmapLevel:
Do I need to call enqueue on the command buffer?
Also, if I remove the waitUntilCompleted call I can get only half of a frame. It seems that getBytes:bytesPerRow:fromRegion:mipmapLevel: doesn't check whether rendering has finished.
Or should I create the offscreen texture as "managed" instead of "private" and then copy it directly to my buffer:
// creating offscreen texture "managed"
// rendering to offscreen texture
auto commandQueue = [device newCommandQueue];
auto commandBuffer = [commandQueue commandBuffer];
//[commandBuffer enqueue]; // Do I need this command?
id<MTLRenderCommandEncoder> renderEncoder = [commandBuffer renderCommandEncoderWithDescriptor:mtlDescriptor];
// perform encoding
[renderEncoder endEncoding];
[commandBuffer commit];
[commandBuffer waitUntilCompleted];
// Copying "managed" offscreen texture to my buffer
getBytes:bytesPerRow:fromRegion:mipmapLevel:
1) You don't need to call enqueue on the command buffer. This is used in situations where you want to explicitly specify the order of command buffers in a multithreaded scenario, which is irrelevant here. Your command buffer will be implicitly enqueued upon being committed.
2) You do indeed need to wait until the command buffer has completed before copying its contents to system memory. Normally, it's essential for the GPU and CPU to be able to run asynchronously without waiting on one another, but in your use case, you want the opposite, and waiting is how you keep them in lockstep.
3) If you don't need a copy of the rendered image as a texture for further work on the GPU, you should be able to omit the full-on blit entirely, provided the texture you're rendering to is in the managed storage mode. You can call synchronizeResource: on the blit encoder instead, which will make the results of the rendering work visible to the copy of the texture in system memory, from which you can then copy directly.
If for some reason the render target can't be in managed storage (I noticed you're using drawables—are these actually MTLDrawables provided by a view or layer, and if so, why?), you will in fact need to blit to either a managed texture or a shared/managed buffer in order to copy the bits on the CPU side.
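For point 3, a minimal sketch of the managed-storage path on macOS (offscreenTexture, dstBuffer, and bytesPerRow are placeholder names, not from the question):
// Assumes offscreenTexture was created with MTLStorageModeManaged and is the
// color attachment of mtlDescriptor; dstBuffer is a plain CPU-side allocation.
id<MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];

id<MTLRenderCommandEncoder> renderEncoder =
    [commandBuffer renderCommandEncoderWithDescriptor:mtlDescriptor];
// ... encode draw calls ...
[renderEncoder endEncoding];

// Make the GPU's writes visible to the CPU-side copy of the managed texture.
id<MTLBlitCommandEncoder> blitEncoder = [commandBuffer blitCommandEncoder];
[blitEncoder synchronizeResource:offscreenTexture];
[blitEncoder endEncoding];

[commandBuffer commit];
[commandBuffer waitUntilCompleted]; // block until rendering and the sync are done

// Now read straight into system memory.
[offscreenTexture getBytes:dstBuffer
               bytesPerRow:bytesPerRow
                fromRegion:MTLRegionMake2D(0, 0, offscreenTexture.width, offscreenTexture.height)
               mipmapLevel:0];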
Related
I have a webgl project setup that uses 2 pass rendering to create effects on a texture.
Everything was working until recently chrome started throwing this error:
[.WebGL-0000020DB7FB7E40] GL_INVALID_OPERATION: Feedback loop formed between Framebuffer and active Texture.
This just started happening even though I didn't change my code, so I'm guessing a new update caused this.
I found this answer on SO, stating the error "happens any time you read from a texture which is currently attached to the framebuffer".
However I've combed through my code 100 times and I don't believe I am doing that. So here is how I have things set up.
Create a fragment shader with a uniform sampler.
uniform sampler2D sampler;
Create 2 textures
var texture0 = initTexture(); // This function does all the work to create a texture
var texture1 = initTexture(); // This function does all the work to create a texture
Create a Frame Buffer
var frameBuffer = gl.createFramebuffer();
Then I start the "2 pass processing" by uploading an HTML image to texture0 and binding texture0 to the sampler.
I then bind the frame buffer & call drawArrays:
gl.bindFramebuffer(gl.FRAMEBUFFER, frameBuffer);
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, texture1, 0);
gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
To clean up I unbind the frame buffer:
gl.bindFramebuffer(gl.FRAMEBUFFER, null);
Edit:
After adding break points to my code I found that the error is not actually thrown until I bind the null frame buffer. So the drawArrays call isn't causing the error, it's binding the null frame buffer afterwards that sets it off.
Since version 83, Chrome performs conservative checks for feedback loops between the framebuffer and active textures. These checks are likely too conservative and flag usage that should actually be allowed.
With the new checks, Chrome appears to disallow a render target being bound to any texture unit, even if that unit is not used by the program.
In your 2 pass rendering you likely have something like:
1) Initialize a render target: create a texture and attach it to a framebuffer.
2) Render to the target.
In step 1 you likely bind a texture with gl.bindTexture(gl.TEXTURE_2D, yourTexture). Before step 2, unbind it with gl.bindTexture(gl.TEXTURE_2D, null); otherwise Chrome will fail, because the render target is still bound to a texture unit, even though that texture is not sampled by the program.
I need to grab the texture of whatever has been drawn to the frame buffer so far; in other words, whatever appears below the effect that I'm about to draw.
The use case is to feed this texture to a shader that applies distortion to it.
What I've tried so far, using MTLBlitCommandEncoder:
auto commandQueue = [device newCommandQueue];
auto commandBuffer = [commandQueue commandBuffer];
[commandBuffer enqueue];
id<MTLRenderCommandEncoder> renderEncoder = [commandBuffer renderCommandEncoderWithDescriptor:mtlDescriptor];
// perform encoding
[renderEncoder endEncoding];
// now after several render passes i need to commit all the rendering for behind the effect,
// so that the texture i am grabbing will omit whatever is about to be drawn after this point
[commandBuffer commit];
[commandBuffer waitUntilCompleted];
Next, I have to create a new command buffer, because if I don't I get an error when calling addCompletedHandler after this commit. I suppose a command buffer cannot be committed more than once, right?
auto commandBuffer = [commandQueue commandBuffer];
id<MTLBlitCommandEncoder> blitEncoder = [commandBuffer blitCommandEncoder];
[commandBuffer enqueue];
[blitEncoder copyFromTexture:drawable.texture sourceSlice:0 sourceLevel:level sourceOrigin:region.origin sourceSize:region.size toTexture:_mtlTexture destinationSlice:0 destinationLevel:level destinationOrigin:{xOffset, yOffset, 0}];
[blitEncoder endEncoding];
// ... continue with more render encoding
This runs without any assertion errors. But the problem is that the depth test appears incorrect: the effect is drawn behind the models it should be above (when I use just one command buffer without performing the blit, it renders correctly).
I am using these settings, which I assumed would preserve whatever is written to the depth texture:
mtlDescriptor.depthAttachment.loadAction = MTLLoadActionLoad;
mtlDescriptor.depthAttachment.storeAction = MTLStoreActionStore;
Can anyone point out where I went wrong?
EDIT:
I have tried using two command buffers, one after another, without performing the blit, and the depth also appears wrong.
Does that mean the depth test just can't work across a new command buffer?
Or is there a more recommended implementation for what I'm trying to achieve? I can't seem to find any examples.
EDIT2:
After more testing, the behaviour appears very inconsistent, even when using just one command buffer.
Sometimes the effect renders below (incorrect), sometimes correctly (part of it should render above, as verified in OpenGL). When I comment out a line of code or add more lines, the result changes randomly. I am currently using the depth compare function MTLCompareFunctionLessEqual. If I change it to MTLCompareFunctionNotEqual, everything is always drawn on top, which is also wrong.
So I realise I have been under the wrong impression that I needed to commit first in order for what was drawn up to that point to be 'saved' to the texture.
Based on the info here: https://github.com/gfx-rs/gfx/issues/2232
Whatever render encoding is performed ends up in the texture anyway, so having two command buffers is not necessary at all.
As for the depth test issue, my mistake was not setting the viewport znear and zfar on the MTLRenderCommandEncoder to match the models'.
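For reference, a minimal sketch of pinning the encoder's viewport depth range (renderTargetWidth/renderTargetHeight are placeholders for the actual render-target size):
MTLViewport viewport = {
    .originX = 0.0,
    .originY = 0.0,
    .width   = (double)renderTargetWidth,   // placeholder: your drawable/texture size
    .height  = (double)renderTargetHeight,
    .znear   = 0.0,                          // must match the depth range the models were encoded with
    .zfar    = 1.0
};
[renderEncoder setViewport:viewport];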
I'm doing realtime video processing on iOS at 120 fps and want to first preprocess the image on the GPU (downsample, convert color, etc., which are not fast enough on the CPU) and later postprocess the frame on the CPU using OpenCV.
What's the fastest way to share camera feed between GPU and CPU using Metal?
In other words the pipe would look like:
CMSampleBufferRef -> MTLTexture or MTLBuffer -> OpenCV Mat
I'm converting CMSampleBufferRef -> MTLTexture the following way:
CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
// textureRGBA
{
size_t width = CVPixelBufferGetWidth(pixelBuffer);
size_t height = CVPixelBufferGetHeight(pixelBuffer);
MTLPixelFormat pixelFormat = MTLPixelFormatBGRA8Unorm;
CVMetalTextureRef texture = NULL;
CVReturn status = CVMetalTextureCacheCreateTextureFromImage(NULL, _textureCache, pixelBuffer, NULL, pixelFormat, width, height, 0, &texture);
if(status == kCVReturnSuccess) {
textureBGRA = CVMetalTextureGetTexture(texture);
CFRelease(texture);
}
}
After my Metal shader is finished I convert the MTLTexture to an OpenCV Mat:
cv::Mat image;
...
CGSize imageSize = CGSizeMake(drawable.texture.width, drawable.texture.height);
int imageByteCount = int(imageSize.width * imageSize.height * 4);
int mbytesPerRow = 4 * int(imageSize.width);
MTLRegion region = MTLRegionMake2D(0, 0, int(imageSize.width), int(imageSize.height));
CGSize resSize = CGSizeMake(drawable.texture.width, drawable.texture.height);
[drawable.texture getBytes:image.data bytesPerRow:mbytesPerRow fromRegion:region mipmapLevel:0];
Some observations:
1) Unfortunately MTLTexture.getBytes seems expensive (copying data from GPU to CPU?) and takes around 5 ms on my iPhone 5S, which is too much when processing at ~100 fps
2) I noticed some people use MTLBuffer instead of MTLTexture with the following method:
metalDevice.newBufferWithLength(byteCount, options: .StorageModeShared)
(see: Memory write performance - GPU CPU Shared Memory)
However, CMSampleBufferRef and the accompanying CVPixelBufferRef are managed by CoreVideo, I guess.
The fastest way to do this is to use an MTLTexture backed by an MTLBuffer; it is a special kind of MTLTexture that shares memory with an MTLBuffer. However, your CPU processing (OpenCV) will be running a frame or two behind. This is unavoidable, since you need to submit the commands to the GPU (encoding) and the GPU needs to render them; if you use waitUntilCompleted to make sure the GPU is finished, that just chews up the CPU and is wasteful.
So the process would be: first create the MTLBuffer, then use the MTLBuffer method newTextureWithDescriptor:offset:bytesPerRow: to create the special MTLTexture. Create the special MTLTexture beforehand (as an instance variable). Then set up a standard rendering pipeline (faster than using compute shaders) that takes the MTLTexture created from the CMSampleBufferRef and passes it into your special MTLTexture; in that pass you can downscale and do any colour conversion as necessary. Then you submit the command buffer to the GPU, and in a subsequent pass you can just call [theMTLbuffer contents] to grab the pointer to the bytes that back your special MTLTexture for use in OpenCV.
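A minimal sketch of creating that buffer-backed texture (the dimensions, pixel format, and usage flags are assumptions; the required bytesPerRow alignment and render-target support for buffer-backed textures depend on the device, so verify both in real code):
// Placeholder dimensions and format; query minimumLinearTextureAlignmentForPixelFormat:
// on the device for the real bytesPerRow constraint.
NSUInteger width = 1280, height = 720;
NSUInteger bytesPerRow = width * 4;

id<MTLBuffer> backingBuffer =
    [device newBufferWithLength:bytesPerRow * height
                        options:MTLResourceStorageModeShared];

MTLTextureDescriptor *desc =
    [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
                                                        width:width
                                                       height:height
                                                    mipmapped:NO];
desc.usage = MTLTextureUsageRenderTarget | MTLTextureUsageShaderRead;

// Special texture that shares memory with backingBuffer.
id<MTLTexture> sharedTexture =
    [backingBuffer newTextureWithDescriptor:desc offset:0 bytesPerRow:bytesPerRow];

// ... render the camera texture into sharedTexture, commit the command buffer ...

// Later (e.g. in the command buffer's completed handler), read without an extra copy:
void *pixels = backingBuffer.contents; // wrap in cv::Mat(height, width, CV_8UC4, pixels, bytesPerRow)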
Any technique that forces the CPU and GPU to halt for each other will never be efficient, since half the time will be spent waiting: the CPU waits for the GPU to finish, and the GPU also has to wait for the next encodings. (When the GPU is working you want the CPU to be encoding the next frame and doing any OpenCV work, rather than waiting for the GPU to finish.)
Also, when people refer to real-time processing they usually mean processing with real-time (visual) feedback. All modern iOS devices from the 4S up have a 60 Hz screen refresh rate, so any feedback presented faster than that is pointless; but if you need two frames (at 120 Hz) to make one (at 60 Hz), then you have to use a custom timer or modify CADisplayLink.
I'm writing an AR app that uses the camera feed to take pictures positioned at certain places in the world. Now I've come upon a problem that I'm not sure what to do about.
I'm using CVOpenGLESTextureCacheRef to create textures from CMSampleBufferRef. The camera feed is being shown and it works perfectly. The problem occurs when I capture 12 photos and create textures from them. The way it works is that once I detect match with the target I create a texture like this:
CVImageBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBufferCopy);
size_t frameWidth = CVPixelBufferGetWidth(pixelBuffer);
size_t frameHeight = CVPixelBufferGetHeight(pixelBuffer);
CVOpenGLESTextureRef texture = NULL;
CVReturn err = CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault,
cache,
pixelBuffer,
NULL,
GL_TEXTURE_2D,
GL_RGBA,
(GLsizei)frameWidth,
(GLsizei)frameHeight,
GL_BGRA,
GL_UNSIGNED_BYTE,
0,
&texture);
if (!texture || err) {
NSLog(@"CVOpenGLESTextureCacheCreateTextureFromImage failed (error: %d)", err);
return;
}
CVOpenGLESTextureCacheFlush(cache, 0);
The texture is then mapped to photo location in the world and is being rendered. I am not releasing texture here because I need it in the future. The texture used as the camera feed is obviously being released.
The issue appears when the 12th photo is taken. The captureOutput:didOutputSampleBuffer:fromConnection: callback is not called anymore. I understand this happens because the pool is full, as pointed out in the documentation:
If your application is causing samples to be dropped by retaining the provided CMSampleBufferRef objects for too long, but it needs access to the sample data for a long period of time, consider copying the data into a new buffer and then releasing the sample buffer (if it was previously retained) so that the memory it references can be reused.
However, I am not sure what to do. I tried using CMSampleBufferCreateCopy to create a copy of the buffer, but it did not work because, as the documentation says, it creates a shallow copy.
How do I handle this in the most efficient way?
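Following the quoted documentation's suggestion (copy the data into a new buffer, then release the sample buffer), here is a rough sketch of deep-copying a non-planar BGRA CVPixelBuffer, purely as an illustration and not the asker's code; for use with the texture cache the copy may also need IOSurface-backed attributes:
// Sketch only: deep-copy a non-planar BGRA pixel buffer so the original sample
// buffer can be released back to the capture pool. Planar formats need per-plane copies.
CVPixelBufferRef original = CMSampleBufferGetImageBuffer(sampleBuffer);

CVPixelBufferRef copy = NULL;
CVPixelBufferCreate(kCFAllocatorDefault,
                    CVPixelBufferGetWidth(original),
                    CVPixelBufferGetHeight(original),
                    CVPixelBufferGetPixelFormatType(original),
                    NULL, // pass kCVPixelBufferIOSurfacePropertiesKey here if the texture cache needs it
                    &copy);

CVPixelBufferLockBaseAddress(original, kCVPixelBufferLock_ReadOnly);
CVPixelBufferLockBaseAddress(copy, 0);

uint8_t *src = (uint8_t *)CVPixelBufferGetBaseAddress(original);
uint8_t *dst = (uint8_t *)CVPixelBufferGetBaseAddress(copy);
size_t srcStride = CVPixelBufferGetBytesPerRow(original);
size_t dstStride = CVPixelBufferGetBytesPerRow(copy);
size_t rows = CVPixelBufferGetHeight(original);
// Row padding can differ between the two buffers, so copy row by row.
for (size_t row = 0; row < rows; row++) {
    memcpy(dst + row * dstStride, src + row * srcStride, MIN(srcStride, dstStride));
}

CVPixelBufferUnlockBaseAddress(copy, 0);
CVPixelBufferUnlockBaseAddress(original, kCVPixelBufferLock_ReadOnly);
// Build the long-lived texture from `copy`; the capture sample buffer no longer needs to be retained.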
I'm a bit confused about FrameBuffers.
Currently, to draw on screen, I generate a framebuffer with a renderbuffer attached to GL_COLOR_ATTACHMENT0, using this code:
-(void)initializeBuffers{
//Build the main FrameBuffer
glGenFramebuffers(1, &frameBuffer);
glBindFramebuffer(GL_FRAMEBUFFER, frameBuffer);
//Build the color Buffer
glGenRenderbuffers(1, &colorBuffer);
glBindRenderbuffer(GL_RENDERBUFFER, colorBuffer);
//setup the color buffer with the EAGLLayer (it automatically defines width and height of the buffer)
[context renderbufferStorage:GL_RENDERBUFFER fromDrawable:EAGLLayer];
glGetRenderbufferParameteriv(GL_RENDERBUFFER, GL_RENDERBUFFER_WIDTH, &bufferWidth);
glGetRenderbufferParameteriv(GL_RENDERBUFFER, GL_RENDERBUFFER_HEIGHT, &bufferHeight);
//Attach the colorbuffer to the framebuffer
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, colorBuffer);
//Check the Framebuffer status
GLenum status = glCheckFramebufferStatus(GL_FRAMEBUFFER);
NSAssert(status == GL_FRAMEBUFFER_COMPLETE, ERROR_FRAMEBUFFER_FAIL);
}
And I show the buffer content using
[context presentRenderbuffer:GL_RENDERBUFFER];
Reading this question, I saw the comment of Arttu Peltonen who says:
Default framebuffer is where you render to by default, you don't have
to do anything to get that. Framebuffer objects are what you can
render to instead, and that's called "off-screen rendering" by some.
If you do that, you end up with your image in a texture instead of the
default framebuffer (that gets displayed on-screen). You can copy the
image from that texture to the default framebuffer (on-screen), that's
usually done with blitting (but it's only available in OpenGL ES 3.0).
But if you only wanted to show the image on-screen, you probably
wouldn't use a FBO in the first place.
So I wonder if my method is only to be used for off-screen rendering.
And in that case, what do I have to do to render to the default framebuffer?
(Note, I don't want to use a GLKView...)
The OpenGL ES spec provides for two kinds of framebuffers: window-system-provided and framebuffer objects. The default framebuffer would be the window-system-provided kind. But the spec doesn't require that window-system-provided framebuffers or a default framebuffer exist.
In iOS, there are no window-system-provided framebuffers, and no default framebuffer -- all drawing is done with framebuffer objects. To render to the screen, you create a renderbuffer whose storage comes from a CAEAGLLayer object (or you use one that's created on your behalf, as when using the GLKView class). That's exactly what your code is doing.
To do offscreen rendering, you create a renderbuffer and call glRenderbufferStorage to allocate storage for it. Said storage is not associated with a CAEAGLLayer, so that renderbuffer can't be (directly) presented on the screen. (It's not a texture either -- setting up a texture as a render target works differently -- it's just an offscreen buffer.)
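For contrast with the CAEAGLLayer-backed code in the question, a minimal sketch of the purely offscreen renderbuffer path (the size and color format are assumptions; GL_RGBA8_OES relies on the OES_rgb8_rgba8 extension, which iOS provides):
GLuint offscreenFBO, offscreenColorBuffer;
glGenFramebuffers(1, &offscreenFBO);
glBindFramebuffer(GL_FRAMEBUFFER, offscreenFBO);

glGenRenderbuffers(1, &offscreenColorBuffer);
glBindRenderbuffer(GL_RENDERBUFFER, offscreenColorBuffer);
// Storage comes from GL itself, not a CAEAGLLayer, so this buffer can't be presented.
glRenderbufferStorage(GL_RENDERBUFFER, GL_RGBA8_OES, 512, 512);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, offscreenColorBuffer);

NSAssert(glCheckFramebufferStatus(GL_FRAMEBUFFER) == GL_FRAMEBUFFER_COMPLETE, @"Offscreen FBO incomplete");
// Draw, then read the result back with glReadPixels if you need the pixels on the CPU.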
There's more information about all of this and example code for each approach in Apple's OpenGL ES Programming Guide for iOS.