I am writing a simple OOP wrapper around OpenGL ES. While writing the render- and framebuffer I have to bind the buffer in order to work with it:
- (void) setupSomething
{
…
glBindRenderbufferOES(GL_RENDERBUFFER_OES, myBufferID);
…
}
Now what if this setup code is called in a context where there’s already some other render buffer bound? My simple version mentioned above would have the nasty side effect of switching the current buffer, which sounds quite fragile. I figured I should write the code more defensively:
- (void) setupSomething
{
// Store current state
GLint previousRenderBuffer = 0;
glGetIntegerv(GL_RENDERBUFFER_BINDING_OES, &previousRenderBuffer);
// Do whatever I want to do
glBindRenderbufferOES(GL_RENDERBUFFER_OES, myBufferID);
…
// Restore previous state
glBindRenderbufferOES(GL_RENDERBUFFER_OES, previousRenderBuffer);
}
My questions are: is it really necessary/wise/customary to save the previous state like this, and if yes, is there some kind of glPushSomething that would do it for me?
Working with a graphics api like OpenGL it's usually a good idea to minimize the number of api calls. Some calls can be quite expensive - I'm not sure about glBindRenderBuffer though - it could be as cheap as just storing one int. But it could be some complex state switching operation. You'd better handle it youself - using your own 'stack' of buffers(there's not glPushAttrib or something like this for render buffers in OpenGL ES) or, better in my humble opinion, avoid those situations - always make sure you finish your work with render buffer you have bound before passing to another buffer.
Related
As the title says, I am having some trouble with AVAssetWriter and memory.
Some notes about my environment/requirements:
I am NOT using ARC, but if there is a way to simply use it and get it all working I'm all for it. My attempts have not made any difference though. And the environment I will be using this in requires memory to be minimised / released ASAP.
Objective-C is a requirement
Memory usage must be as low as possible, the 300mb it takes up now is unstable when testing on my device (iPhone X).
The code
This is the code used when taking the screenshots below https://gist.github.com/jontelang/8f01b895321d761cbb8cda9d7a5be3bd
The problem / items kept around in memory
Most of the things that seem to take up a lot of memory throughout the processing seems to be allocated in the beginning.
So at this point it doesn't seem to me that the issues are with my code. The code that I personally have control over seems to not be an issue, namely loading the images, creating the buffer, releasing it all seems to not be where the memory has a problem. For example if I mark in Instruments the majority of the time after the one above, the memory is stable and none of the memory is kept around.
The only reason for the persistent 5mb is that it is deallocated just after the marking period ends.
Now what?
I actually started writing this question with the focus being on wether my code was releasing things correctly or not, but now it seems like that is fine. So what are my options now?
Is there something I can configure within the current code to make the memory requirements smaller?
Is there simply something wrong with my setup of the writer/input?
Do I need to use a totally different way of making a video to be able to make this work?
A note on using CVPixelBufferPool
In the documentation of CVPixelBufferCreate Apple states:
If you need to create and release a number of pixel buffers, you should instead use a pixel buffer pool (see CVPixelBufferPool) for efficient reuse of pixel buffer memory.
I have tried with this as well, but I saw no changes in the memory usage. Changing the attributes for the pool didn't seem to have any effect as well, so there is a small possibility that I am not actually using it 100% properly, although from comparing to code online it seems like I am, at least. And the output file works.
The code for that, is here https://gist.github.com/jontelang/41a702d831afd9f9ceeb0f9f5365de03
And here is a slightly different version where I set up the pool in a slightly different way https://gist.github.com/jontelang/c0351337bd496a6c7e0c94293adf881f.
Update 1
So I looked a bit deeper into a trace, to figure out when/where the majority of the allocations are coming from. Here is an annotated image of that:
The takeaway is:
The space is not allocated "with" the AVAssetWriter
The 500mb that is held until the end is allocated within 500ms after the processing starts
It seems that it is done internally in AVAssetWriter
I have the .trace file uploaded here: https://www.dropbox.com/sh/f3tf0gw8gamu924/AAACrAbleYzbyeoCbC9FQLR6a?dl=0
When creating Dispatch Queue, ensure you create a queue with Autorlease Pool. Replace DISPATCH_QUEUE_SERIAL with DISPATCH_QUEUE_SERIAL_WITH_AUTORELEASE_POOL.
Wrap each iteration of for loop into autorelease pool as well
like this:
[assetWriterInput requestMediaDataWhenReadyOnQueue:recordingQueue usingBlock:^{
for (int i = 1; i < 200; ++i) {
#autoreleasepool {
while (![assetWriterInput isReadyForMoreMediaData]) {
[NSThread sleepForTimeInterval:0.01];
}
NSString *path = [NSString stringWithFormat:#"/Users/jontelang/Desktop/SnapperVideoDump/frames/frame_%i.jpg", i];
UIImage *image = [UIImage imageWithContentsOfFile:path];
CGImageRef ref = [image CGImage];
CVPixelBufferRef buffer = [self pixelBufferFromCGImage:ref pool:writerAdaptor.pixelBufferPool];
CMTime presentTime = CMTimeAdd(CMTimeMake(i, 60), CMTimeMake(1, 60));
[writerAdaptor appendPixelBuffer:buffer withPresentationTime:presentTime];
CVPixelBufferRelease(buffer);
}
}
[assetWriterInput markAsFinished];
[assetWriter finishWritingWithCompletionHandler:^{}];
}];
No, I see it is around 240 mb peaking in app. It's my first time using this allocation - interesting.
I'm using AssetWriter to write a video file by streaming cmSampleBuffer : CMSampleBuffer. It gets from AVCaptureVideoDataOutputSampleBufferDelegate by Camera CaptureOutput Realtime.
While I have not yet found the actual issue, the memory problem I described in this question was solved by simply doing it on the actual device instead of the simulator.
#Eugene_Dudnyk Answer is on spot, the auto release pool INSIDE the for or while loop is the key, here is how I got it working for Swift, also, please use AVAssetWriterInputPixelBufferAdaptor for pixel buffer pool:
videoInput.requestMediaDataWhenReady(on: videoInputQueue) { [weak self] in
while videoInput.isReadyForMoreMediaData {
autoreleasepool {
guard let sample = assetReaderVideoOutput.copyNextSampleBuffer(),
let buffer = CMSampleBufferGetImageBuffer(sample) else {
print("Error while processing video frames")
videoInput.markAsFinished()
DispatchQueue.main.async {
videoFinished = true
closeWriter()
}
return
}
// Process image and render back to buffer (in place operation, where ciProcessedImage is your processed new image)
self?.getCIContext().render(ciProcessedImage, to: buffer)
let timeStamp = CMSampleBufferGetPresentationTimeStamp(sample)
self?.adapter?.append(buffer, withPresentationTime: timeStamp)
}
}
}
My memory usage stopped rising.
What exactly should I expect to happen when using DiscardResource?
What's the difference between discard and destroying/deleting a resource.
When is a good time/use-case to discard a resource?
Unfortunately Microsoft doesn't seem to say much about it other than it "discards a resource".
TL;DR: Is a rarely used function that provides a driver hint related to handling clear compression structures. You are unlikely to use it except based on specific performance advice.
DiscardResource is the DirectX 12 version of the Direct3D 11.1 method. See Microsoft Docs
The primary use of these methods it to optimize the performance of tiled-based deferred rasterizer graphics parts by discarding the render target after present. This is a hint to the driver that the contents of the render target are no longer relevant to the operation of the program, so it can avoid some internal clearing operations on the next use.
For DirectX 11, this is in the DirectX 11 App template to use DiscardView because it makes use of DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL:
void DX::DeviceResources::Present()
{
// The first argument instructs DXGI to block until VSync, putting the application
// to sleep until the next VSync. This ensures we don't waste any cycles rendering
// frames that will never be displayed to the screen.
DXGI_PRESENT_PARAMETERS parameters = { 0 };
HRESULT hr = m_swapChain->Present1(1, 0, ¶meters);
// Discard the contents of the render target.
// This is a valid operation only when the existing contents will be entirely
// overwritten. If dirty or scroll rects are used, this call should be removed.
m_d3dContext->DiscardView1(m_d3dRenderTargetView.Get(), nullptr, 0);
// Discard the contents of the depth stencil.
m_d3dContext->DiscardView1(m_d3dDepthStencilView.Get(), nullptr, 0);
// If the device was removed either by a disconnection or a driver upgrade, we
// must recreate all device resources.
if (hr == DXGI_ERROR_DEVICE_REMOVED || hr == DXGI_ERROR_DEVICE_RESET)
{
HandleDeviceLost();
}
else
{
DX::ThrowIfFailed(hr);
}
}
The DirectX 12 App template doesn't need those explicit calls because it uses DXGI_SWAP_EFFECT_FLIP_DISCARD.
If you are wondering why the DirectX 11 app doesn't just use DXGI_SWAP_EFFECT_FLIP_DISCARD, it probably should. The DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL swap effect was the only one supported by Windows 8.x for Windows Store apps, which is when DiscardView was introduced. For Windows 10 / DirectX 12 / UWP, it's probably better to always use DXGI_SWAP_EFFECT_FLIP_DISCARD unless you specifically don't want the backbuffer discarded.
It is also useful for multi-GPU SLI / Crossfire configurations since the clearing operation can require synchronization between the GPUs. See this GDC 2015 talk
There are also other scenario-specific usages. For example, if doing deferred rendering for the G-buffer where you know every single pixel will be overwritten, you can use DiscardResource instead of doing ClearRenderTargetView / ClearDepthStencilView.
I've found that when I use textures above GL_TEXTURE18 on ios (tested on iOS 10), presentRenderbuffer triggers an EXC_BAD_ACCESS. Is there any reason for that? Can I not use textures up to GL_TEXTURE31
The GL_TEXTUREX are just some defined values, defined enumerations. In your case a GPU is the one that defines the actual number of supported textures and it is your responsibility to check what are these limitations.
You can get that by using glGet something like:
GLint max_combined_texture_image_units;
glGetIntegerv(GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS, &max_combined_texture_image_units);
Try this thread.
Do note that these defines/enumerations are here just to help you, it does not mean they are actually valid or supported. The openGL API is mostly designed by passing integer values typedef uint32_t GLenum; so as far as the API goes you may replace GL_TEXTURE0 with 1200 or any other value but you do need to ensure that the value is actually valid.
I'm currently tracking down some visual popping in my Metal app, and believe it is because I'm drawing directly to framebuffer, not a back-buffer
// this is when I've finished passing commands to the render buffer and issue the draw command. I believe this sends all the images directly to the framebuffer instead of using a backbuffer
[renderEncoder endEncoding];
[mtlCommandBuffer presentDrawable:frameDrawable];
[mtlCommandBuffer commit];
[mtlCommandBuffer release];
//[frameDrawable present]; // This line isn't needed (and I believe is performed by presentDrawable
Several googles later, I haven't found any documentation of back-buffers in metal. I know I could roll my own, but I can't believe metal doesn't support a back buffer.
Here is the code snippet of how I've setup my CAMetalLayer object.
+ (id)layerClass
{
return [CAMetalLayer class];
}
- (void)initCommon
{
self.opaque = YES;
self.backgroundColor = nil;
...
}
-(id <CAMetalDrawable>)getMetalLayer
{
id <CAMetalDrawable> frameDrawable;
while (!frameDrawable && !frameDrawable.texture)
{
frameDrawable = [self->_metalLayer nextDrawable];
}
return frameDrawable;
}
Can I enable a backbuffer on my CAMetalLayer object, or will I need to roll my own?
I assume by back-buffer, you mean a renderbuffer that is being rendered to, while the corresponding front-buffer is being displayed?
In Metal, the concept is provided by the drawables that you extract from CAMetalLayer. The CAMetalLayer instance maintains a small pool of drawables (generally 3), retrieves one of them from the pool each time you invoke nextDrawable, and returns it back to the pool after you've invoked presentDrawable and once rendering is complete (which may be some time later, since the GPU runs asynchronously from the CPU).
Effectively, on each frame loop, you grab a back-buffer by invoking nextDrawable, and make it eligible to become the front-buffer by invoking presentDrawable: and committing the MTLCommandBuffer.
Since there are only 3 drawables in the pool, the catch is that you have to manage this lifecycle yourself, by adding appropriate CPU resource synchronization at the time you invoke nextDrawable and in the callback you get once rendering is complete (as per the MTLCommandBuffer addCompletedHandler: callback set-up).
Typically you use a dispatch_semaphore_t for this:
_resource_semaphore = dispatch_semaphore_create(3);
then put the following just before you invoke nextDrawable:
dispatch_semaphore_wait(_resource_semaphore, DISPATCH_TIME_FOREVER);
and this in your addCompletedHandler: callback handler:
dispatch_semaphore_signal(_resource_semaphore);
Have a look at some of the simple Metal sample apps from Apple to see this in action. There is not a lot in terms of Apple documentation on this.
I want to do some work in my OpenGL ES project in concurrent GCD queues. Is it ok if to create EAGLContext for each thread? I'm going to do it with such way:
queue_ = dispatch_queue_create("test.queue", DISPATCH_QUEUE_CONCURRENT);
dispatch_async(queue_, ^{
NSMutableDictionary* threadDictionary = [[NSThread currentThread] threadDictionary];
EAGLContext* context = threadDictionary[#"context"];
if (!context) {
context = /* creating EAGLContext with sharegroup */;
threadDictionary[#"context"] = context;
}
if ([EAGLContext setCurrentContext:context]) {
// rendering
[EAGLContext setCurrentContext:nil];
}
});
If it is not correct what is the best practice to parallelize OpenGL rendering?
Not only is it okay, this is the only way you can share OpenGL resources between multiple threads. Note that shareable resources are typically limited to resources that allocate memory (e.g. buffer objects, textures, shaders). They do not include objects that merely store state (e.g. the global state machine, Framebuffer Objects or Vertex Array Objects). But if you are considering modifying data that you are using for rendering, I would strongly advise against this.
Whenever GL has a command in the pipeline that has not finished, any attempt to modify a resource used by that command will block until the command finishes. A better solution would be to double-buffer your resources, have a copy you use for rendering and a separate copy you use for updating. When you finish updating, the next time your drawing thread uses that resource, have it swap the buffers used for updating and drawing. This will reduce the amount of time the driver has to synchronize your worker threads with the drawing thread.
Now, if you are suggesting here that you want to draw from multiple threads, then you should re-think your strategy. OpenGL generally does not benefit from issuing draw commands from multiple threads, it just creates a synchronization nightmare. Multi-threading is useful mostly for controlling VSYNC on multiple windows (probably not something you will ever encounter in ES) or streaming resource data in the background.