I have a model with lots of high-quality textures, and I try hard to keep the overall memory usage down. One of the things I tried is to remove the mipmaps after they have been pushed to the GPU, in order to release the texture data from main RAM. When doing so, the model is still rendered with the previously uploaded mipmapped texture. So that's fine, but the memory doesn't drop at all.
material.mipmaps.length = 0;
So my question is:
Is there a reference to the mipmaps kept by Three.js, so that the garbage collector can't release the memory? Or is the texture referenced by WebGL itself, which seems kind of strange, as WebGL makes me think textures always live in dedicated memory and must therefore be copied. If WebGL keeps a reference to the original texture in RAM, would WebGL behave differently on a desktop with a dedicated graphics card than on a laptop with an onboard graphics card sharing main RAM?
I would be really glad if someone could explain what's going on inside Three.js/WebGL with regard to texture references.
That's a good question.
Let's go down there...
So normally you'd dispose() a texture when you want it to be kicked out of the VRAM.
Tracing what that does might bring us to an answer. So what does dispose do?
https://github.com/mrdoob/three.js/blob/2d59713328c421c3edfc3feda1b116af13140b94/src/textures/Texture.js#L103-L107
Alright, so it dispatches an event. Alright. Where's that handled?
https://github.com/mrdoob/three.js/blob/2d59713328c421c3edfc3feda1b116af13140b94/src/renderers/WebGLRenderer.js#L654-L665
Aha, so finally:
https://github.com/mrdoob/three.js/blob/2d59713328c421c3edfc3feda1b116af13140b94/src/renderers/WebGLRenderer.js#L834-L837
And that suggests that we're leaving THREE.js and entering the world of raw WebGL.
Digging a bit into the WebGL spec (sections 3.7.1 / 3.7.2) and a couple of tutorials on raw WebGL shows that WebGL keeps a reference in memory, but that reference isn't a public property of the THREE.js texture.
Now, why that goes into RAM and not the VRAM I don't know... did you test that on a machine with dedicated or shared GPU RAM?
I am receiving YUV 420 CMSampleBuffers of the screen in my System Broadcast Extension; however, when I attempt to access the underlying bytes, I get inconsistent results: artefacts that are a mixture of (it seems) past and future frames. I am accessing the bytes in order to rotate portrait frames a quarter turn to landscape, but the problem reduces to not being able to correctly copy the texture.
The pattern of artefacts can change quite a lot. They can be all over the place and seem to have a fundamental "brush shape" that is a square tile, sometimes small, sometimes large, which seems to depend on the failing workaround at hand. They can occur in both the luminance and chroma channels, which results in interesting effects. The "grain" of the artefacts sometimes appears to be horizontal, which I guess is vertical in the original frame.
I do have two functioning workarounds:
rotate the buffers using Metal
rotate the buffers using CoreImage (even a "software" CIContext works)
The reason that I can't yet ship these workarounds is that System Broadcast Extensions have a very low memory limit of 50MB, and memory usage can spike with these two solutions; there also seem to be interactions with other parts of the system (e.g. the AVAssetWriter or the daemon that dumps frames into my address space). I'm still working to understand memory usage here.
The artefacts seem like a synchronisation problem. However, I have a feeling that this is not so much a new frame being written into the buffer that I'm looking at, but rather some sort of stale cache. CPU or GPU? Do GPUs have caches? The tiled nature of the artefacts reminds me of iOS GPUs, but take that with a grain of salt (not a hardware person).
This brings me around to the question title. If this is a caching problem, and Metal / CoreImage has a consistent view of the pixels, maybe I can get Metal to flush the data I want for me, because a BGRA screen capture being converted to a YUV IOSurface has Metal shader written all over it.
So I took the incoming CMSampleBuffer's CVPixelBuffer's IOSurface and created an MTLTexture from it (with all sorts of cacheModes and storageModes, haven't tried hazardTrackingModes yet) and then copied the bytes out with MTLTexture.getBytes(bytesPerRow:from:mipmapLevel:).
Yet the problem persists. I would really like to make the CPU deep copy approach work, for memory reasons.
To head off some questions:
it's not a bytes-per-row issue; that would slant the images
in the CPU case I do lock the CVPixelBuffer's base address (see the sketch after this list)
I even lock the underlying IOSurface
I have tried discarding IOSurfaces whose lock seed changes under lock
I do discard frames when necessary
I have tried putting random memory fences and mutexes all over the place (not a hardware person)
I have not disassembled CoreImage yet
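For concreteness, the CPU copy path with the locking described above boils down to something like this (a C sketch; copy_plane is an illustrative helper, error handling is trimmed, and the real code rotates row by row rather than doing one big memcpy):

    #include <CoreVideo/CoreVideo.h>
    #include <IOSurface/IOSurfaceRef.h>
    #include <stdbool.h>
    #include <string.h>

    // Deep-copy one plane while both the CVPixelBuffer lock and the IOSurface
    // lock are held, and reject the frame if the IOSurface seed changed.
    static bool copy_plane(CVPixelBufferRef pb, size_t plane, uint8_t *dst, size_t dstSize) {
        if (CVPixelBufferLockBaseAddress(pb, kCVPixelBufferLock_ReadOnly) != kCVReturnSuccess)
            return false;

        IOSurfaceRef surf = CVPixelBufferGetIOSurface(pb);       // may be NULL
        uint32_t seedBefore = 0;
        if (surf)
            IOSurfaceLock(surf, kIOSurfaceLockReadOnly, &seedBefore);

        const uint8_t *src = CVPixelBufferGetBaseAddressOfPlane(pb, plane);
        size_t rowBytes = CVPixelBufferGetBytesPerRowOfPlane(pb, plane);
        size_t rows     = CVPixelBufferGetHeightOfPlane(pb, plane);
        bool ok = (src != NULL && rowBytes * rows <= dstSize);
        if (ok)
            memcpy(dst, src, rowBytes * rows);                   // rotate row by row in the real code

        if (surf) {
            uint32_t seedAfter = 0;
            IOSurfaceUnlock(surf, kIOSurfaceLockReadOnly, &seedAfter);
            ok = ok && (seedAfter == seedBefore);                // seed moved: drop this frame
        }
        CVPixelBufferUnlockBaseAddress(pb, kCVPixelBufferLock_ReadOnly);
        return ok;                                               // call again with plane 1 for chroma
    }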
This question is the continuation of one I posted on the Apple Developer Forums.
Art by https://twitter.com/artofzara
As far as I know, since Cocos2D 2.0 a 1025*1025 texture does NOT use 4 times more memory than a 1024*1024 texture, just proportionally more.
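To put numbers on that (assuming 32-bit RGBA):

    1025 x 1025 x 4 bytes ≈ 4.2 MB stored as-is (NPOT)
    2048 x 2048 x 4 bytes ≈ 16.8 MB if the same image had to be padded up to the next power of two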
If I put my textures into an atlas, there is some unused space almost all the time. This is wasted. (Not to mention the iOS 5 POT-texture memory bug, which makes POT texture atlases waste 33% more memory.) But if I just use my textures the way they are, then there is no memory wasted. The only advantage of texture atlases, in my opinion, is the ability to use a SpriteBatchNode.
But my app is heavily memory limited, and I only support devices which support NPOT textures. I know that NPOT texture handling is a bit slower, but saving memory is the most important thing for me.
I might be wrong, so please confirm this or show me why I am wrong. Thank you! :)
You should design for the worst case. Assume the bug always exists, and design your app's memory usage accordingly. There's no telling whether the bug will go away, reappear, or be joined by an even worse bug in a newer iOS version.
Riding on the brink of full memory usage is not a good idea; you always have to leave some headroom to allow for the occasional oddity. A new iOS version might introduce another bug or take more memory, the user might have apps running in the background that use up more memory, there may be a tiny memory leak adding up over time, and so on.
Also, CCSpriteBatchNode can be used with any texture, not just texture atlases.
I hear a lot that power-of-two textures are better for performance reasons, but I couldn't find enough solid information about whether this is a problem when using XNA. Most of my textures have random dimensions and I don't see much of a problem, but maybe the VS profiler just doesn't show it.
In general, power-of-two textures are better. Most graphics cards should allow non-power-of-two textures with a minimal loss of performance. However, if you target the XNA Reach profile, non-power-of-two textures come with restrictions (no wrap addressing, no mipmaps, no DXT compression), and some low-end graphics cards only support the Reach profile.
XNA is really a layer built on top of DirectX, so any performance guidelines that apply to DirectX will also apply to anything using XNA.
The VS profiler also won't really capture the graphics-specific work you are doing. That needs to be profiled separately with a tool that can show what the graphics card itself is doing. If the graphics card is struggling, it won't show up as high resource usage on your CPU, but rather as a slow rendering speed.
Our product contains a kind of software image decoder that essentially produces full-frame pixel data that needs to be rapidly copied to the screen (we're running on iOS).
Currently we're using CGBitmapContextCreate and we access the memory buffer directly, then for each frame we call CGBitmapContextCreateImage and draw that bitmap to the screen. This is WAY too slow for full-screen refreshes on the iPad's Retina display at a decent framerate (but it was okay for non-Retina devices).
We've tried all kinds of OpenGL ES-based approaches, including the use of glTexImage2D and glTexSubImage2D (essentially rendering to a texture), but CPU usage is still high and we can't get more than ~30 FPS for full-screen refreshes on the iPad 3. The problem is that at 30 FPS, CPU usage is nearly at 100% just for copying the pixels to the screen, which means we don't have much headroom for our own rendering on the CPU.
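The per-frame upload we keep coming back to boils down to something like this (a simplified sketch; names are illustrative, and the real code also draws a textured quad with the result):

    #include <OpenGLES/ES2/gl.h>

    static GLuint frameTex;

    // One-time setup: allocate storage for the full-screen texture.
    static void create_frame_texture(GLsizei w, GLsizei h) {
        glGenTextures(1, &frameTex);
        glBindTexture(GL_TEXTURE_2D, frameTex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    }

    // Per frame: respecify the contents. This is still a full CPU-to-GPU copy of
    // w * h * 4 bytes every frame, which is where the CPU time goes.
    static void upload_frame(const void *rgba, GLsizei w, GLsizei h) {
        glBindTexture(GL_TEXTURE_2D, frameTex);
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                        GL_RGBA, GL_UNSIGNED_BYTE, rgba);
    }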
We are open to using OpenGL or any iOS API that would give us maximum performance. The pixel data is formatted as 32-bit-per-pixel RGBA data, but we have some flexibility there...
Any suggestions?
So, the bad news is that you have run into a really hard problem. I have been doing quite a lot of research in this specific area, and currently the only way that you can actually blit a framebuffer that is the size of the full screen at 2x is to use the h.264 decoder. There are quite a few nice tricks that can be done with OpenGL once you have image data already decoded into actual memory (take a look at GPUImage).
But the big problem is not how to move the pixels from live memory onto the screen. The real issue is how to move the pixels from the encoded form on disk into live memory. One can use file-mapped memory to hold the pixels on disk, but the IO subsystem is not fast enough to page in enough data to make it possible to stream 2x full-screen-size images from mapped memory. This used to work great with 1x full-screen sizes, but the 2x screens are actually 4x the amount of memory and the hardware just cannot keep up.
You could also try to store frames on disk in a more compressed format, like PNG. But then decoding the compressed format changes the problem from IO bound to CPU bound, and you are still stuck. Please have a look at my blog post opengl_write_texture_cache for the full source code and timing results I found with that approach.
If you have a very specific format that you can limit the input image data to (like an 8-bit table), then you could use the GPU to blit 8-bit data as 32BPP pixels via a shader, as shown in the example Xcode project opengl_color_cycle. But my advice would be to look at how you could make use of the h.264 decoder, since it is actually able to decode that much data in hardware, and no other approach is likely to give you the kind of results you are looking for.
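To put the 2x jump in numbers (taking the iPad 3's 2048 x 1536 panel and the 32-bit RGBA frames from the question):

    2048 x 1536 x 4 bytes ≈ 12.6 MB per frame
    12.6 MB x 30 frames/s ≈ 380 MB/s that has to come off disk or out of RAM just for the blit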
After several years, and several different situations where I ran into this need, I've decided to implement a basic "pixel viewer" view for iOS. It supports highly optimized display of a pixel buffer in a wide variety of formats, including 32-bpp RGBA, 24-bpp RGB, and several YpCbCr formats.
It also supports all of the UIViewContentMode* for smart scaling, scale to fit/fill, etc.
The code is highly optimized (using OpenGL), and achieves excellent performance on even older iOS devices such as iPhone 5 or the original iPad Air. On those devices it achieves 60FPS on all pixel formats except for 24bpp formats, where it achieves around 30-50fps (I usually benchmark by showing a pixel buffer at the device's native resolution, so obviously an iPad has to push far more pixels than the iPhone 5).
Please check out EEPixelViewer.
CoreVideo is most likely the framework you should be looking at. With the OpenGL and CoreGraphics approaches, you're being hit hard by the cost of moving bitmap data from main memory onto GPU memory. This cost exists on desktops as well, but is especially painful on iPhones.
In this case, OpenGL won't net you much of a speed boost over CoreGraphics because the bottleneck is the texture data copy. OpenGL will get you a more efficient rendering pipeline, but the damage will have already been done by the texture copy.
So CoreVideo is the way to go. As I understand the framework, it exists to solve the very problem you're encountering.
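I haven't profiled your exact pipeline, but the usual shape of that approach on iOS is a CVOpenGLESTextureCache fed by IOSurface-backed CVPixelBuffers, roughly like this (a sketch; gl_texture_for_frame is an illustrative helper and error handling is omitted):

    #include <CoreVideo/CoreVideo.h>
    #include <OpenGLES/ES2/gl.h>
    #include <OpenGLES/ES2/glext.h>

    // 'cache' was created once with CVOpenGLESTextureCacheCreate against the
    // current EAGLContext; 'frame' is an IOSurface-backed
    // kCVPixelFormatType_32BGRA CVPixelBuffer that the decoder wrote into.
    GLuint gl_texture_for_frame(CVOpenGLESTextureCacheRef cache,
                                CVPixelBufferRef frame,
                                CVOpenGLESTextureRef *outTexture) {
        size_t w = CVPixelBufferGetWidth(frame);
        size_t h = CVPixelBufferGetHeight(frame);

        // Wraps the buffer's IOSurface as a GL texture instead of copying it
        // through glTexImage2D / glTexSubImage2D each frame.
        CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault, cache, frame,
                                                     NULL, GL_TEXTURE_2D, GL_RGBA,
                                                     (GLsizei)w, (GLsizei)h,
                                                     GL_BGRA_EXT, GL_UNSIGNED_BYTE,
                                                     0, outTexture);

        GLuint name = CVOpenGLESTextureGetName(*outTexture);
        glBindTexture(GL_TEXTURE_2D, name);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
        // After drawing: CFRelease(*outTexture) and CVOpenGLESTextureCacheFlush(cache, 0).
        return name;
    }

(On newer systems the same idea exists for Metal as CVMetalTextureCache.)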
The pbuffer or FBO can then be used as a texture map for further rendering by OpenGL ES. This is called render-to-texture (RTT), and it is much quicker; search for pbuffer or FBO in the EGL documentation.
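A minimal FBO render-to-texture setup looks roughly like this (OpenGL ES 2.0 in C; names are illustrative and completeness checks are trimmed):

    #include <GLES2/gl2.h>

    GLuint rttTexture, rttFramebuffer;

    void create_render_target(GLsizei w, GLsizei h) {
        // The texture that will receive the rendering.
        glGenTextures(1, &rttTexture);
        glBindTexture(GL_TEXTURE_2D, rttTexture);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, NULL);          // storage only, no data yet
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

        // The FBO that redirects drawing into that texture.
        glGenFramebuffers(1, &rttFramebuffer);
        glBindFramebuffer(GL_FRAMEBUFFER, rttFramebuffer);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                               GL_TEXTURE_2D, rttTexture, 0);

        if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
            // handle incomplete framebuffer
        }
        // Draw the offscreen pass while rttFramebuffer is bound, then bind the
        // default framebuffer again and sample rttTexture in a later pass.
    }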
I am learning OpenGL and recently discovered about glGenTextures.
Although several sites explain what it does, I feel forced to wonder how it behaves in terms of speed and, particularly, memory.
Exactly what should I consider when calling glGenTextures? Should I consider unloading and reloading textures for better speed? How many textures should a standard game need? What workarounds are there to get around any limitations memory and speed may bring?
According to the manual, glGenTextures only allocates texture "names" (i.e. ids) with no "dimensionality". So you are not actually allocating texture memory as such, and the overhead here is negligible compared to actual texture memory allocation.
glTexImage will actually control the amount of texture memory used per texture. Your application's best usage of texture memory will depend on many factors: including the maximum working set of textures used per frame, the available dedicated texture memory of the hardware, and the bandwidth of texture memory.
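To make that distinction concrete, here is a minimal sketch (OpenGL ES 2.0-style C; error checking omitted):

    #include <GLES2/gl2.h>   // or your platform's GL header

    GLuint tex;

    void allocate_texture(const void *pixels, GLsizei w, GLsizei h) {
        // Cheap: reserves only a texture "name" (an id); no storage yet.
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);

        // This is the call that actually allocates (and fills) roughly
        // w * h * 4 bytes of texture memory for mip level 0.
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    }

    void free_texture(void) {
        // Releasing the name also releases the storage created by glTexImage2D.
        glDeleteTextures(1, &tex);
    }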
As for your question about a typical game - what sort of game are you creating? Console games are starting to fill blu-ray disk capacity (I've worked on a PS3 title that was initially not projected to fit on blu-ray). A large portion of this space is textures. On the other hand, downloadable web games are much more constrained.
Essentially, you need to work with reasonable game design and come up with an estimate of:
1. The total textures used by your game.
2. The maximum textures used at any one time.
Then you need to look at your target hardware and decide how to make it all fit.
Here's a link to an old Game Developer article that should get you started:
http://number-none.com/blow/papers/implementing_a_texture_caching_system.pdf