Simple question that I can't find an answer to: in OpenGL there is glDeleteTextures(1, &t). Obviously the model is quite drastically different, but I'm wondering if Metal has the same need or requirement.
Are MTLTextures just released via reference counting, or is there another way to release the texture and let the GPU clear it out of its caches as well?
MTLTexture lifetimes are managed exactly like those of other objects: they are deallocated when their refcount reaches 0, whether that happens automatically with ARC/Swift or manually with MRR under Obj-C. Objects that use textures, such as command buffers, may retain them while they are in use, which affects their lifetime, but most applications can happily ignore this.
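For illustration, here is a minimal sketch using the metal-cpp C++ wrapper (the function name is my own, and a real program would also wrap the autoreleased descriptor in an NS::AutoreleasePool; in Obj-C under ARC or in Swift the release call is implicit):

```cpp
#include <Metal/Metal.hpp>  // metal-cpp

// Creating and releasing a texture with explicit refcounting.
void makeAndDropTexture(MTL::Device* device)
{
    MTL::TextureDescriptor* desc =
        MTL::TextureDescriptor::texture2DDescriptor(
            MTL::PixelFormatRGBA8Unorm, 256, 256, /*mipmapped=*/false);

    MTL::Texture* texture = device->newTexture(desc);  // refcount 1

    // ... encode work that samples or writes the texture ...

    texture->release();  // refcount 0: the driver reclaims the memory
                         // once any in-flight command buffers that
                         // retained the texture have completed
}
```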
So I am reading the Vulkan book now and have a question about push constants versus UBO updates.
After I set up all the pipeline and descriptor stuff, I basically just need to copy the data into the UBO buffer (e.g. with memcpy) and then I am done.
I can understand the issue that the whole pipeline needs to wait for this buffer to be ready before its contents can be changed, so it will be slow.
On the other hand, when I use push constants, there is no such problem, although they are small (say 256 bytes).
So far so good.
However, on second thought, I realized that if I am updating the UBO, I don't need to change or re-record the command buffer; I can submit the old CB since it's still the same.
Whereas if I want to update via push constants, I have to reset the CB, record it again, and then submit it.
So won't this be an issue? How can I tell which one is faster?
Thanks.
Lots of people get confused on this issue, because the Vulkan Tutorial pre-records commands while the Vulkan Guide re-records commands every frame.
When people say to use push constants for per-frame data like transform matrices and time values, there's the implicit assumption that you are recording command buffers every frame. Push constants essentially hitch a ride with the rest of your commands when they are submitted, which is also how they avoid the synchronization and cache flushing that buffer updates require.
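As a sketch of what that looks like (the PushData struct and function names are hypothetical, and the pipeline layout is assumed to have been created with a matching VkPushConstantRange):

```cpp
#include <vulkan/vulkan.h>

// Hypothetical per-frame data; at 68 bytes it fits comfortably in the
// 128-byte push constant space every implementation must provide.
struct PushData {
    float mvp[16];  // transform matrix
    float time;     // elapsed time
};

// Called while re-recording the frame's command buffer: the constants
// travel inside the command stream itself, so no buffer update or
// extra synchronization is needed.
void recordDraw(VkCommandBuffer cmd, VkPipelineLayout layout,
                const PushData& data)
{
    vkCmdPushConstants(cmd, layout, VK_SHADER_STAGE_VERTEX_BIT,
                       0, sizeof(PushData), &data);
    // ... bind pipeline/vertex buffers and issue the draw here ...
}
```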
Now, in a lot of scenarios, re-recording command buffers can be easier than re-use and not significantly more costly. Indeed, re-using command buffers when things change can be a real pain to manage, and command buffers are meant to be fast to record. Still, the Vulkan Tutorial went with pre-recording everything, which is also a valid approach, though potentially harder to maintain at scale.
At the time it was created, the Vulkan Tutorial was essentially one of the only resources available for learning Vulkan in a structured manner. Even though command buffers are quick to record, pre-recording them eliminates even more CPU overhead and exemplifies Vulkan's "never be draw-call limited again" mantra of eliminating CPU overhead in graphics applications.
As for the speed comparison, you'll have to benchmark, but I would not necessarily choose one or the other for "speed" reasons. If you pre-record, you don't want to re-orient your entire rendering architecture just to take advantage of push constants. If you don't pre-record, there's no reason not to use push constants; they are just straight up easier to deal with.
It seems like you are currently pre-recording, so I would not bother with push constants at all for this kind of data. I would also not focus on these kinds of issues until you get more familiar with Vulkan: it is very easy to get caught in the weeds with optimization in Vulkan, and optimization strategies are nowhere near as uniform as in the CPU space.
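For completeness, here is a rough sketch of the pre-recorded path being discussed, assuming one uniform buffer per frame in flight, created with HOST_VISIBLE | HOST_COHERENT memory and mapped once with vkMapMemory at startup (all names here are hypothetical):

```cpp
#include <vulkan/vulkan.h>
#include <cstring>

struct FrameUbo { float mvp[16]; };

// The pre-recorded command buffer keeps referencing the same VkBuffer;
// only its contents change each frame, so nothing is re-recorded.
void updateUbo(void* mappedUboForThisFrame, const FrameUbo& data)
{
    // A fence must already guarantee that the GPU has finished reading
    // this copy of the UBO (i.e. the frame that used it has completed)
    // before it is overwritten.
    std::memcpy(mappedUboForThisFrame, &data, sizeof(FrameUbo));
}
```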
Hi, I am using PIXI and destroying textures whenever necessary. In the Chrome task manager I can see that the image cache and other resources are consuming very little memory. My only problem is that as new screens and textures get loaded, the GPU memory increases (by about 5-10 MB on every activity). I am using texture.destroy(true) to destroy resources, and I am also destroying base textures. I am also logging PIXI.utils.TextureCache and PIXI.utils.BaseTextureCache, and I can see that the number of objects in them is the bare minimum required for my app.
I also make use of PIXI's AnimatedSprite. If it consumes any extra resources I should be worried about, please let me know.
I am not sure where the WebGL memory is increasing. I am using WebGL because I make extensive use of filters, which is not possible with the canvas renderer. Can anyone help me figure out how to debug the memory usage and delete unnecessary textures from WebGL? I am on a tight timeline, so any help is very much appreciated.
I'm trying to program with OpenCL.
There are two types of memory object: one is the buffer and the other is the image.
Some blogs, web sites, and white papers say that 'image objects are a little faster than buffers because of caching'.
I'm trying to use image objects, and the reason is 'clamp': it will make the kernel code simpler and (in my opinion) faster.
My question is: is it possible to use image objects together with local memory, and is that faster than using buffer objects with local memory?
Data -> image object -> copy to local memory -> operations -> write back to another image object.
As far as I understand, I cannot use the async_work_group_copy instruction with local memory in this case.
So I have to copy and synchronize manually for local memory, which will add a lot of overhead.
The only real answer to that is "it depends". Most implementations don't really gain anything from async_work_group_copy. Image reads may have slightly higher latency than buffer reads when there is a cache hit, but you may get better cache behaviour from them on some architectures. Clamping, address calculation and filtering are effectively free operations performed by dedicated hardware that you'd have to shift into shader code when using buffers, so they reduce your read latency and may increase throughput.
If you are going to get big caching benefits from images, local memory may just get in the way: the extra cost of writing to it, synchronizing, reading from it, calculating addresses and so on may outweigh the gains.
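To make that trade-off concrete, here is a rough kernel sketch of the manual staging pattern from the question (all names are made up; the __local tile is sized by the host via clSetKernelArg with a NULL pointer):

```c
__constant sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |
                           CLK_ADDRESS_CLAMP_TO_EDGE |
                           CLK_FILTER_NEAREST;

// Each work-item stages one pixel into local memory by hand
// (async_work_group_copy has no overload for images), then the whole
// work-group synchronizes before operating on the tile.
__kernel void copy_via_local(__read_only  image2d_t src,
                             __write_only image2d_t dst,
                             __local      float4*   tile)
{
    const int2 gid  = (int2)(get_global_id(0), get_global_id(1));
    const int  lidx = (int)(get_local_id(1) * get_local_size(0)
                          + get_local_id(0));

    // read_imagef applies the clamp addressing mode for free.
    tile[lidx] = read_imagef(src, smp, gid);
    barrier(CLK_LOCAL_MEM_FENCE);  // the manual synchronization point

    // ... operate on the tile here ...

    write_imagef(dst, gid, tile[lidx]);
}
```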
Sadly this is just one of those things you'll have to experiment with on your target architectures.
Hardware: iPad 2
Software: OpenGL ES 2.0, C++
glDrawElements seems to take up about 25% of the CPU, making the CPU time 18 ms and the GPU time 10 ms per frame.
When I don't use an index buffer and use glDrawArrays instead, it speeds up, and glDrawArrays barely shows up in the profiler. Everything else is the same; the glDrawArrays path has more vertices because I have to duplicate them in the VBO without an index buffer.
What I've checked so far:
virtually the same number of state changes between the two methods
vertex structure is two floats (8 bytes)
index buffer is 16-bit (tried 32-bit as well)
GL_STATIC_DRAW for both buffers
buffers don't change after load
the same VBO and index buffer are rendered multiple times per frame, with different offsets and sizes
no OpenGL errors
So it looks like it's doing a software fallback of some sort, but I can't figure out what would cause OpenGL to fall back.
There are a few things that immediately jump to mind that might affect speed the way you describe.
For one, many commands are issued passively to reduce the number of bus transfers: they are queued up and wait for the next batch transfer. State changes, texture changes, and similar commands all accumulate. It is possible that the draw commands are triggering a larger transfer in one case but not in the other, or that you are triggering more frequent transfers in one case than in the other.

For another, your specific models might be better organized for one or the other draw call. You need to look at how big they are, whether they reuse index values, and whether they are optimized or reordered for rendering. glDrawArrays may require more data to be transferred, but if your models are small the overhead may not be much of a concern.

Draw frequency also matters: you want to queue off calls frequently to keep the card busy and let your CPU do other work, rather than letting commands accumulate in the command buffer waiting to be sent, but this needs to be balanced since there is a cost to those transfers.

And to top it off, indexed values can benefit from cache effects when they are frequently reused, while linearly accessed arrays benefit from cache effects when they are accessed linearly, so you need to know your data; different types of data benefit from different methods.
Even Apple seems to be unsure which method to use.
Up until iOS 7, the OpenGL ES Programming Guide for iOS said:
For best performance, your models should be submitted as a single unindexed triangle strip using glDrawArrays with as few duplicated vertices as possible. If your models require many vertices to be duplicated (...), you may obtain better performance using a separate index buffer and calling glDrawElements instead. ... For best results, test your models using both indexed and unindexed triangle strips, and use the one that performs the fastest.
But their updated OpenGL ES Programming Guide for iOS, which applies to iOS 8, recommends the opposite:
For best performance, your models should be submitted as a single indexed triangle strip. To avoid specifying data for the same vertex multiple times in the vertex buffer, use a separate index buffer and draw the triangle strip using the glDrawElements function
It looks like in your case you have just tried both, and found that one method is better suited for your data.
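If you want to benchmark both paths as Apple suggests, a minimal sketch of the two submission routes might look like this (the buffer handles, counts, and attribute location 0 are hypothetical, assumed to be set up at load time as described in the question):

```cpp
#include <OpenGLES/ES2/gl.h>  // iOS; use your platform's GLES2 header

// The indexed path from the question: 16-bit indices, 8-byte vertices.
void drawIndexed(GLuint vbo, GLuint ibo, GLsizei indexCount)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 8, 0);
    glEnableVertexAttribArray(0);
    glDrawElements(GL_TRIANGLE_STRIP, indexCount, GL_UNSIGNED_SHORT, 0);
}

// The unindexed path: same data with vertices duplicated in the VBO.
void drawUnindexed(GLuint vboExpanded, GLsizei vertexCount)
{
    glBindBuffer(GL_ARRAY_BUFFER, vboExpanded);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 8, 0);
    glEnableVertexAttribArray(0);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, vertexCount);
}
```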
When developing an OpenGL program, is there a way to query the system to find out how many megabytes are available for storing textures, etc.?
Or is the standard approach these days to just allocate memory and forget about it?
Although the official stance remains "you don't need to know, you don't want to know, and it would not help you anyway", luckily at least two IHVs have shown a little more insight lately and offer extensions to query that information:
NVX_gpu_memory_info
ATI_meminfo
One nice thing about these extensions is that they have a least common denominator which is just what most people need, and you don't need to query extension support or do anything special: they both work via glGetIntegerv.
In the easiest case, you can just initialize an array of 4 integers to zero (or to some minimum default value that you'll assume in case the extensions don't work), then call glGetIntegerv twice (with GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX and TEXTURE_FREE_MEMORY_ATI, respectively), and finally call glGetError to clear the error state. glGetIntegerv does not modify the pointed-to memory if it fails, nor does it crash or do any other bad thing; it merely sets the error state to GL_INVALID_ENUM.
Both extensions return a value in the first array position; the ATI one fills the other three positions as well.
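A minimal sketch of that procedure might look like this (the enum values are taken from the two extension specs; both report sizes in KiB):

```cpp
#include <GL/gl.h>

// Enum values from the extension specs, in case the headers lack them:
#ifndef GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX
#define GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX 0x9049
#endif
#ifndef GL_TEXTURE_FREE_MEMORY_ATI
#define GL_TEXTURE_FREE_MEMORY_ATI 0x87FC
#endif

// Returns free video memory in KiB, or `fallback` if neither extension
// is present (a failed glGetIntegerv leaves the array untouched and
// only raises GL_INVALID_ENUM).
GLint freeVideoMemoryKiB(GLint fallback)
{
    GLint mem[4] = { fallback, fallback, fallback, fallback };
    glGetIntegerv(GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX, mem);
    glGetIntegerv(GL_TEXTURE_FREE_MEMORY_ATI, mem);
    glGetError();  // clear GL_INVALID_ENUM from whichever query failed
    return mem[0]; // both extensions put a usable value here
}
```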
n.b.: glAreTexturesResident has not been supported for almost a decade on mainstream hardware, in the same manner as texture priorities. The common mantra is that the driver writer knows much better than you anyway.
OpenGL doesn't give you this information, and frankly there's little benefit, simply because today we have multitasking operating systems. The OpenGL driver is responsible for swapping texture data to and from system memory when there is demand for it.
What OpenGL can do for you is tell you whether the textures you've uploaded are still resident in fast memory, via the function glAreTexturesResident. You can use this to gradually upload stuff to the GPU until you've filled up the GPU's memory. But keep in mind that you're not the only user of the GPU.
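For reference, a minimal sketch of such a residency check (keeping in mind the caveat above that modern drivers may simply report everything as resident):

```cpp
#include <GL/gl.h>
#include <vector>

// Returns true only if every texture in the list is resident.
bool allResident(const GLuint* textures, GLsizei count)
{
    std::vector<GLboolean> residences(count);
    return glAreTexturesResident(count, textures,
                                 residences.data()) == GL_TRUE;
}
```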