GPU memory usage with CreateBitmapFromWicBitmap()

When using CreateBitmapFromWicBitmap() I can see the GPU memory usage increase.
The code looks like:
hr = pRenderTarget->CreateBitmapFromWicBitmap(pImagePoolEntry->pConverter,
NULL, pImagePoolEntry->pPoolImage.GetAddressOf());
pPoolImage is defined as ComPtr<ID2D1Bitmap> pPoolImage;
The pPoolImage pointer stays around for as long as the image is in use. When we're done and pPoolImage is released, I do not see the GPU memory usage decrease.
So is there a way to have the GPU release the image data, or should I not be worried? The only time I see the usage drop is when the application is terminated.
Note: I'm using GPU-Z to monitor the GPU memory usage.
Thanks

DirectX defers the destruction of video memory, so released resources are not reclaimed immediately. You can explicitly call IDXGIDevice3::Trim, but in real-world cases it generally doesn't matter.
See MSDN for details.
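
A minimal C++ sketch of calling Trim, assuming the application still holds the DXGI device it used to create its Direct2D resources (the helper name below is hypothetical; IDXGIDevice3 requires Windows 8.1+ and dxgi1_3.h):

    #include <dxgi1_3.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    // Hypothetical helper: asks the driver to release temporarily unused video memory.
    void TrimVideoMemory(const ComPtr<IDXGIDevice>& dxgiDevice)
    {
        ComPtr<IDXGIDevice3> dxgiDevice3;
        if (SUCCEEDED(dxgiDevice.As(&dxgiDevice3)))
        {
            // Trim() is only a hint; the timing of the actual release is still
            // up to the driver, so GPU-Z may not show an immediate drop.
            dxgiDevice3->Trim();
        }
    }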

Related

iOS memory shared between CPU and GPU and what that means for reading

I have a MTLBuffer that is using memory that is allocated by the cpu and thus shared by both the cpu and the GPU.
Per Apple's suggestion I am using triple buffering to remove latency that might be caused by one processor waiting on the other to finish.
My vertex data changes every frame so every frame I am writing to one section of the array with the CPU and reading a different section with the GPU.
What I would like to do is read some of the values that the GPU is currently also reading as they save me some time doing calculations for the section of the buffer the CPU is writing to.
Essentially this is because the current frame's data is dependent on the previous frame's data.
Is this valid? Can the CPU and the GPU be reading from the same portion of memory at once since memory is shared on iOS?
I think that's valid and safe, for two reasons. First, CPUs actually often have to read in order to write. Things like caches and memory buses don't allow for access to RAM at the granularity we usually think of (byte or even register size). In order to write, it usually has to read a larger chunk from memory, modify just the part written, and then (eventually) write the larger chunk back to memory. So, even the approach where you don't explicitly read from parts of the buffer that the GPU is reading and you only write to parts that the GPU isn't accessing can, in theory, still be implicitly reading from parts of the buffer that the GPU is reading. Since we're not given the info we'd need to reliably avoid that, I'd say it isn't considered a problem.
Second, Apple's docs give no warning about what you describe. There's the "Maintaining Coherency Between CPU and GPU Memory" section in the article about resource objects, but that only discusses the case where either the CPU or the GPU is modifying shared data, not where both are merely reading.
Then there's the "Resource Storage Modes and Device Memory Models" section describing the new storage modes introduced with iOS 9 and macOS 10.11. And the docs for MTLResourceStorageModeShared itself. Again, there's mention of reading vs. writing, but none about reading vs. reading.
If there were a problem with simultaneous reading, I think Apple would have discussed it.
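
For reference, here is a minimal, framework-agnostic C++ sketch of the triple-buffering pattern the question describes. The names (kMaxFramesInFlight, FrameData, submitToGpu) are hypothetical stand-ins; in a real Metal app the wait/signal would be a dispatch_semaphore paired with a command-buffer completion handler rather than the inline release shown here:

    #include <cstdint>
    #include <semaphore>

    constexpr int kMaxFramesInFlight = 3;               // triple buffering

    // Keeps the CPU at most three frames ahead of the GPU.
    std::counting_semaphore<kMaxFramesInFlight> slotsFree(kMaxFramesInFlight);

    struct FrameData { float vertices[1024]; };         // hypothetical per-frame section
    FrameData ring[kMaxFramesInFlight];                 // one shared buffer, three sections

    int main()
    {
        for (std::uint64_t frame = 0; frame < 6; ++frame)   // a few frames for the sketch
        {
            slotsFree.acquire();                        // wait for a section the GPU is done with
            const int writeSlot = static_cast<int>(frame % kMaxFramesInFlight);
            const int prevSlot  = static_cast<int>((frame + kMaxFramesInFlight - 1) % kMaxFramesInFlight);

            // The CPU writes writeSlot and may also read prevSlot, which the GPU
            // could still be reading; per the answer above, two readers of the
            // same shared memory are safe.
            ring[writeSlot] = ring[prevSlot];           // stand-in for "derive this frame from the last"

            // submitToGpu(writeSlot);                  // in a real app, the GPU's completion
            slotsFree.release();                        // handler would call release() instead
        }
    }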

How does browser GPU memory usage work?

By pressing F12 and then Esc in Chrome, you can see a few options to tick. One of them is "Show FPS meter", which lets you see GPU memory usage in real time.
I have a few questions regarding this GPU memory usage:
Is this GPU memory the memory the webpage needs to store its code: variables, methods, images, cached videos, etc.? Is that correct?
Is there a reason why it has an upper bound of 512 MB? Is there a way to reduce or increase it?
How much GPU memory usage is enough to see considerable slowdown on browser navigation?
If I have an array with millions of elements (just hypothetically), and I splice all the elements in the array, will it free the memory that was in use? Or will it not "really" free the memory, requiring an additional step to actually wipe it out?
1. What is stored in GPU memory
Although there are no hard-set rules about what kind of data can be stored in GPU memory, the bulk of it generally holds single-frame resources like textures, multi-frame resources like vertex and index buffer data, and compiled programmable-shader code. So while in theory it is possible to store videos in GPU memory, along with all kinds of other bulk data, in practice only a handful of frames of any streamed video will ever be in GPU RAM.
The main reason for this soft selection of texture-like data sets is that a GPU is a parallel hardware architecture, and it expects the data to be compatible with that philosophy, meaning there are no interdependencies between sets of data (i.e. pixels). Decoding images from a video stream, by contrast, is more or less an exercise in resolving interdependencies between data blocks.
2. Is 512MB enough for everyone?
No. The limit you see is probably determined by your hardware.
3. When does GPU memory become slow?
Some parts of GPU memory are so fast you can't even begin to appreciate the speed; there is nothing wrong with the speed of the memory on a GPU card. What matters is the time it takes to get the data INTO that memory in the first place. That is bandwidth, and the transfers usually need to be synchronized: the driver locks the Northbridge bus so data can flow from main memory into GPU memory, and this locking plus the transfer takes quite some time.
So to answer the question: once the data is uploaded, the GUI will remain fast no matter how much more memory is used on the GPU card. The only things that can slow it down are changes to the GUI, and other GPU processes whose work interferes with the rendering operations.
4. Splicing ram memory frees it up?
I'm not quite sure what you mean by splicing. GPU memory is freed when the application releases it through the appropriate API calls. If you want to render your GPU memory blank, you'd have to grab the GPU handles of the resources first, upload 'clear' data into them, and then release the handles again, but (for normal single-threaded GPU applications) you can only do that in your own process context.
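
Tying back to the Direct2D question at the top of this page, a minimal C++ sketch of what "releasing via API calls" looks like is simply dropping the last reference to the resource (the helper name is hypothetical):

    #include <d2d1.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    // Hypothetical helper: Reset() calls Release() on the underlying ID2D1Bitmap.
    // Once the last reference is gone the driver may reclaim the video memory,
    // though, as noted in the first answer above, the reclamation can be deferred.
    void ReleasePoolImage(ComPtr<ID2D1Bitmap>& pPoolImage)
    {
        pPoolImage.Reset();
    }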

High virtual memory usage + low allocations on iOS

I have code that has a low amount of active allocations (about 5 MB according to Instruments), but a high amount of system memory usage (over 100 MB). I know the code is leak-free, and I'm not seeing any allocation spikes after some optimization, but I'm still crashing due to the high amount of memory usage.
I Googled around a lot and saw that I'm supposed to be using the VM Tracker instrument, which confirms the high memory usage, but I'm not sure how to address the situation. I'm using as little memory as possible, yet it's still too much on an iPad 1, and I don't have the knowledge or tools to figure out how to get the OS not to mark so much memory as dirty when I'm not actually using it. Where do I go from here?
Use the Profile tool and select the Allocations (memory) instrument. Click the VM Tracker and take a snapshot. This produces a list of resident, dirty, and virtual memory usage per object type, which will give you an indication of where to look.
I think the most common problem is having a lot of autoreleased objects sitting in the autorelease pool. The following link explains more about how to handle autorelease pools:
How does the NSAutoreleasePool autorelease pool work?

What is the maximum size of the texture memory on a modern GPU?

We believe that texture memory is part of global memory. Is this true? If so, how much can you allocate? (Indirectly: how much is there?)
And is it true that all multiprocessors can read from the texture memory at the same time?
Texture data is contained in CUDA arrays, and CUDA arrays are allocated out of global memory; so however much global memory is still free (you can call cuMemGetInfo() to see how much free memory is left) is available for allocation as textures.
It's impossible to know how much memory is consumed by a given CUDA array - obviously it has to be at least Width*Height*Depth*sizeof(Texel), but it may take more because the driver has to do an allocation that conforms to the hardware's alignment requirements.
Texture limits for different compute capabilities can be found in the CUDA Programming Guide, available at the NVIDIA CUDA website.
For a given device, one can query device capabilities including texture limits using the cudaGetDeviceProperties function.
Allocation depends on the amount of available global memory and on how that memory is segmented, so there is no easy way to tell whether a given allocation will be successful, especially when working with large textures.
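
A minimal sketch of both queries using the CUDA runtime API (cudaMemGetInfo is the runtime-API counterpart of the driver-API cuMemGetInfo mentioned above; device 0 and the reduced error handling are simplifications):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        // How much global memory is currently free (textures are allocated from this pool).
        size_t freeBytes = 0, totalBytes = 0;
        if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess) return 1;
        printf("Global memory: %zu MiB free of %zu MiB\n", freeBytes >> 20, totalBytes >> 20);

        // Per-device texture dimension limits.
        cudaDeviceProp prop{};
        if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;
        printf("Max 1D texture width: %d\n", prop.maxTexture1D);
        printf("Max 2D texture:       %d x %d\n", prop.maxTexture2D[0], prop.maxTexture2D[1]);
        printf("Max 3D texture:       %d x %d x %d\n",
               prop.maxTexture3D[0], prop.maxTexture3D[1], prop.maxTexture3D[2]);
        return 0;
    }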

How to use AQTime's memory allocation profiler in a program that uses a large amount of memory?

I'm finding AQTime hard to use because it interferes with the original program too much. If I have a program that uses, for example, 300MB of ram I can use AQTime's allocation profiler without a problem, and find out where most of the memory is being used. However I notice that running under AQTime, the original program uses more like 1GB while it's being profiled.
Right now I'm trying to reduce memory usage in a program which is using 1.4GB of memory. If I run it under AQTime, then the original program uses all of the 2GB address space and crashes. I can of course invent a smaller set of test data and estimate how the memory usage will scale with the full data set - but the reason I'm using a profiler in the first place is to try to avoid this sort of guesswork.
I already have AQTime set to 'Collect stack information - None' and all the check boxes to do with checking memory integrity are switched off, and I've tried restricting the area being profiled to just a few classes but this doesn't seem to improve anything. Is there a way to use AQTime that produces a smaller overhead? Or failing that, what other approaches are there to get a good idea of the memory being used?
The app is written in Delphi 2010 and I'm using AQTime 6.
NB: On top of the increased memory usage, running under AQTime slows the app down an awful lot, making the whole exercise not just impractical but well-nigh impossible :-P
AFAIK the allocation profiler tracks memory block allocations regardless of profiling areas; profiling areas are used to track class instantiation. Of course, memory-profiling an application that allocates a large amount of memory is an issue. You may try the LARGE_ADDRESS_AWARE flag and the /3GB boot switch, or use a 64-bit system (as long as you have at least 4GB of memory, or more). You can also take a snapshot of the application state before it crashes to see where the memory is allocated. Profiling takes time anyway; you may have to let it run for a while.
