Proper way to free GPU memory when using DirectX?

I've been working through a handful of DirectX tutorials. They all mention D3DXCreateTextureFromFileEx
for loading an image from a file into video memory. However, nobody (that I've seen) talks about how to free up that memory. Can I just call free() on the pointer that is returned?

Like almost all of the DirectX API, when you create a new object, it's returned as a pointer to a COM interface. In your case you get a pointer to an IDirect3DTexture9 interface.
When you want to release these resources, you use the normal COM method for disposing of reference-counted interfaces, by calling Release() on the pointer:
IDirect3DTexture9 *pTexture = NULL;
D3DXCreateTextureFromFileEx( /* ... */, &pTexture);

// ... use the texture ...

// release the interface when you are done with it
pTexture->Release();
pTexture = NULL;
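
If you'd rather not track Release() calls by hand, a common alternative is to hold the interface in a COM smart pointer. Here is a minimal sketch, assuming the WRL headers are available; ComPtr releases its reference automatically when it is destroyed or reassigned:

#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<IDirect3DTexture9> texture;
D3DXCreateTextureFromFileEx( /* ... */, &texture);  // ComPtr's operator& yields an IDirect3DTexture9**

// ... use texture.Get() wherever a raw IDirect3DTexture9* is needed ...

// no explicit Release(): the reference is dropped when 'texture' goes out of scope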

Related

Why does vkGetMemoryFdKHR return the same fd?

On Win32:
I'm sure that the same handle is returned no matter how many times getMemoryWin32HandleKHR is executed for the same memory object.
This is consistent with Vulkan's official explanation of how memory is shared.
It doesn't seem to work this way on Linux.
In my program, getMemoryWin32HandleKHR works normally: it returns a different handle for each different memory object, and the same memory object always returns the same handle.
But with getMemoryFdKHR, different memory objects can return the same fd, and calling getMemoryFdKHR twice on the same memory object can return two different fds.
This causes the device memory allocation to fail during the subsequent import.
I don't understand why this happens.
Thanks!
#ifdef WIN32
texGl.handle = device.getMemoryWin32HandleKHR({ info.memory, vk::ExternalMemoryHandleTypeFlagBits::eOpaqueWin32 });
#else
VkDeviceMemory memory = VkDeviceMemory(info.memory);
int file_descriptor = -1;
VkMemoryGetFdInfoKHR get_fd_info{
    VK_STRUCTURE_TYPE_MEMORY_GET_FD_INFO_KHR, nullptr, memory,
    VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT
};
VkResult result = vkGetMemoryFdKHR(device, &get_fd_info, &file_descriptor);
assert(result == VK_SUCCESS);
texGl.handle = file_descriptor;
// texGl.handle = device.getMemoryFdKHR({ info.memory, vk::ExternalMemoryHandleTypeFlagBits::eOpaqueFd });
#endif
Win32 works normally.
Linux does not: the subsequent import returns VK_ERROR_OUT_OF_DEVICE_MEMORY.
#ifdef _WIN32
VkImportMemoryWin32HandleInfoKHR import_allocate_info{
    VK_STRUCTURE_TYPE_IMPORT_MEMORY_WIN32_HANDLE_INFO_KHR, nullptr,
    VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_WIN32_BIT, sharedHandle, nullptr };
#elif __linux__
VkImportMemoryFdInfoKHR import_allocate_info{
    VK_STRUCTURE_TYPE_IMPORT_MEMORY_FD_INFO_KHR, nullptr,
    VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT,
    sharedHandle };
#endif

VkMemoryAllocateInfo allocate_info{
    VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO, // sType
    &import_allocate_info,                  // pNext
    aligned_data_size_,                     // allocationSize
    memory_index };                         // memoryTypeIndex
VkDeviceMemory device_memory = VK_NULL_HANDLE;
VkResult result = vkAllocateMemory(m_device, &allocate_info, nullptr, &device_memory);
NVVK_CHECK(result);
I think it has something to do with the fd. In one test, if I call vkGetMemoryFdKHR twice and use the second fd, vkAllocateMemory happens to work, but I think that is still wrong: every call returns a different fd, so I cannot tell which fd belongs to which memory object, and importing a later fd with vkAllocateMemory still fails eventually. So that workaround is not usable.
I still think it should behave the same way as the Win32 path: the first fd obtained should be importable correctly with vkAllocateMemory.
Thanks very much!
The Vulkan specifications for the Win32 handle and POSIX file descriptor interfaces explicitly state different things about their importing behavior.
For HANDLEs:
Importing memory object payloads from Windows handles does not transfer ownership of the handle to the Vulkan implementation. For handle types defined as NT handles, the application must release handle ownership using the CloseHandle system call when the handle is no longer needed.
For FDs:
Importing memory from a file descriptor transfers ownership of the file descriptor from the application to the Vulkan implementation. The application must not perform any operations on the file descriptor after a successful import.
So HANDLE importation leaves the HANDLE in a valid state, still referencing the memory object. File descriptor importation claims ownership of the FD, leaving it in a state where you cannot use it.
What this means is that the FD may have been released by the internal implementation. If that is the case, later calls to create a new FD may use the same FD index as a previous call.
The safest way to use both of these APIs is to have the Win32 version emulate the functionality of the FD version. Don't try to do any kinds of comparisons of handles. If you need some kind of comparison logic, then you'll have to implement it yourself. When you import a HANDLE, close it immediately afterwards.
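As a rough sketch of that policy, applied to the allocation snippet from the question (error handling omitted, names taken from the code above): import each exported handle exactly once, then treat it as consumed.

VkDeviceMemory device_memory = VK_NULL_HANDLE;
VkResult result = vkAllocateMemory(m_device, &allocate_info, nullptr, &device_memory);

#ifdef _WIN32
// The application still owns the HANDLE after a successful import,
// so close it here to get the same "export once, import once, forget" behavior.
if (result == VK_SUCCESS)
    CloseHandle(sharedHandle);
#else
// Ownership of the fd moved to the Vulkan implementation on success:
// do not close it, reuse it, or compare it against fds from later exports.
#endif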

What is ID3D12GraphicsCommandList::DiscardResource?

What exactly should I expect to happen when using DiscardResource?
What's the difference between discarding and destroying/deleting a resource?
When is a good time/use-case to discard a resource?
Unfortunately Microsoft doesn't seem to say much about it other than it "discards a resource".
TL;DR: It is a rarely used function that provides a driver hint related to handling clear compression structures. You are unlikely to use it except based on specific performance advice.
DiscardResource is the DirectX 12 version of the Direct3D 11.1 DiscardView / DiscardResource methods. See Microsoft Docs.
The primary use of these methods is to optimize the performance of tile-based deferred rasterizer graphics parts by discarding the render target after present. This is a hint to the driver that the contents of the render target are no longer relevant to the operation of the program, so it can avoid some internal clearing operations on the next use.
For DirectX 11, the DirectX 11 App template uses DiscardView because it makes use of DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL:
void DX::DeviceResources::Present()
{
    // The first argument instructs DXGI to block until VSync, putting the application
    // to sleep until the next VSync. This ensures we don't waste any cycles rendering
    // frames that will never be displayed to the screen.
    DXGI_PRESENT_PARAMETERS parameters = { 0 };
    HRESULT hr = m_swapChain->Present1(1, 0, &parameters);

    // Discard the contents of the render target.
    // This is a valid operation only when the existing contents will be entirely
    // overwritten. If dirty or scroll rects are used, this call should be removed.
    m_d3dContext->DiscardView1(m_d3dRenderTargetView.Get(), nullptr, 0);

    // Discard the contents of the depth stencil.
    m_d3dContext->DiscardView1(m_d3dDepthStencilView.Get(), nullptr, 0);

    // If the device was removed either by a disconnection or a driver upgrade, we
    // must recreate all device resources.
    if (hr == DXGI_ERROR_DEVICE_REMOVED || hr == DXGI_ERROR_DEVICE_RESET)
    {
        HandleDeviceLost();
    }
    else
    {
        DX::ThrowIfFailed(hr);
    }
}
The DirectX 12 App template doesn't need those explicit calls because it uses DXGI_SWAP_EFFECT_FLIP_DISCARD.
If you are wondering why the DirectX 11 app doesn't just use DXGI_SWAP_EFFECT_FLIP_DISCARD, it probably should. The DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL swap effect was the only one supported by Windows 8.x for Windows Store apps, which is when DiscardView was introduced. For Windows 10 / DirectX 12 / UWP, it's probably better to always use DXGI_SWAP_EFFECT_FLIP_DISCARD unless you specifically don't want the backbuffer discarded.
It is also useful for multi-GPU SLI / Crossfire configurations since the clearing operation can require synchronization between the GPUs. See this GDC 2015 talk
There are also other scenario-specific usages. For example, if doing deferred rendering for the G-buffer where you know every single pixel will be overwritten, you can use DiscardResource instead of doing ClearRenderTargetView / ClearDepthStencilView.
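For instance, a minimal sketch of that G-buffer case (the resource names here are placeholders, and the targets are assumed to already be in the render-target / depth-write states that DiscardResource expects):

// Instead of clearing targets whose every pixel will be overwritten anyway,
// hint to the driver that their previous contents can be thrown away.
commandList->DiscardResource(gbufferAlbedo.Get(), nullptr);   // nullptr region = whole resource
commandList->DiscardResource(gbufferNormals.Get(), nullptr);
commandList->DiscardResource(depthBuffer.Get(), nullptr);
// ... record the geometry pass that writes every pixel of these targets ...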

What is Semantical Memory Leak?

I understand the definition of a memory leak, but I couldn't find anything about a "semantical" (semantic) memory leak or how it differs from a regular memory leak.
Example of a memory leak:
#include <stdlib.h>

void function_which_allocates(void) {
    /* allocate an array of 45 floats */
    float *a = malloc(sizeof(float) * 45);

    /* additional code making use of 'a' */

    /* return to main, having forgotten to free the memory we malloc'd */
}

int main(void) {
    function_which_allocates();

    /* the pointer 'a' no longer exists, and therefore cannot be freed,
       but the memory is still allocated. a leak has occurred. */
}
Definition (Semantical garbage)
A variable which the program will never use again, but
still keeps a reference to it, is called semantic garbage.
In other words, imagine allocating an array in your main program, using it only in the first few lines, and never freeing it afterwards. The major difference between a semantic memory leak and a classic memory leak is that with a memory leak you no longer have any reference to the unfreed array, whereas with a semantic leak you still hold a reference to it even though you will never use it again.
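For instance, here is a small variant of the example above (a sketch, not taken from the course material) in which the allocation stays reachable for the program's whole lifetime but is never used again:

#include <stdlib.h>

static float *lookup_table; /* still reachable for the program's lifetime */

int main(void) {
    lookup_table = (float *)malloc(sizeof(float) * 45);
    lookup_table[0] = 1.0f;   /* used only here, in the first few lines */

    /* ... long-running work that never touches lookup_table again ... */

    /* The pointer is still live, so a garbage collector could not reclaim
       the array, yet the memory is wasted: this is semantic garbage. */
    return 0;
}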
The same idea in a garbage-collected language:
class Huge {
    Huge() { // Constructor:
        // Allocates lots of data and stores
        // it in the newly created object
    }
}

void f() {
    Huge semanticGarbage = new Huge();
    // 'semanticGarbage' is still referenced from the stack frame of f(),
    // but it is never used again below, so the GC cannot reclaim it.
    heavy.computation(new Indeed(100));
    System.exit(1);
}
All sophisticated GC algorithms contend in vain against semantic garbage.
Reference:
Technion CS 234319: Programming Languages Course
(Lecture) Chapter 5 Storage 5.5 Automatic memory management - Semantical memory leak

Force garbage collection of JavaScriptCore virtual machine on iOS

Is there any way to force iOS (or Mac OS) JavaScriptCore VM garbage collector to run? I need it only to test for memory leaks, so private API would be fine.
Use the following function from JSBase.h:
/*!
@function JSGarbageCollect
@abstract Performs a JavaScript garbage collection.
@param ctx The execution context to use.
@discussion JavaScript values that are on the machine stack, in a register,
 protected by JSValueProtect, set as the global object of an execution context,
 or reachable from any such value will not be collected.

 During JavaScript execution, you are not required to call this function; the
 JavaScript engine will garbage collect as needed. JavaScript values created
 within a context group are automatically destroyed when the last reference
 to the context group is released.
*/
JS_EXPORT void JSGarbageCollect(JSContextRef ctx);
You can obtain a JSContextRef from your JSContext using its JSGlobalContextRef read-only property.
Update
I've found the following change in WebKit (bugs.webkit.org/show_bug.cgi?id=84476):
JSGarbageCollect should not synchronously call collectAllGarbage().
Instead, it should notify the GCActivityCallback that it has abandoned
an object graph, which will set the timer to run at some point in the
future that JSC can decide according to its own policies.
This explains why the previous solution doesn't work as expected.
Digging deeper into the WebKit sources, I found another interesting approach. Please try the following code:
JS_EXPORT void JSSynchronousGarbageCollectForDebugging(JSContextRef ctx);

@implementation JSContext (GarbageCollection)
- (void)garbageCollect {
    JSSynchronousGarbageCollectForDebugging(self.JSGlobalContextRef);
}
@end
Simply call the garbageCollect method on your JSContext instance after that. I've tried it locally on iOS and it seems to work.

Is there a CUDA smart pointer?

If not, what is the standard way to free up cudaMalloced memory when an exception is thrown? (Note that I am unable to use Thrust.)
You can use the RAII idiom and put your cudaMalloc() and cudaFree() calls in the constructor and destructor of your object, respectively.
Once the exception is thrown, your destructor will be called, which will free the allocated memory.
If you wrap this object in a smart pointer (or make it behave like a pointer), you will get your CUDA smart pointer.
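For example, here is a minimal sketch of that idea using std::unique_ptr with a custom deleter (the helper names cuda_make_unique and CudaDeleter are made up for this illustration, not part of any CUDA library):

#include <cuda_runtime.h>
#include <memory>
#include <stdexcept>

struct CudaDeleter {
    void operator()(void* p) const { cudaFree(p); } // runs on scope exit, even during unwinding
};

template <typename T>
std::unique_ptr<T, CudaDeleter> cuda_make_unique(size_t n) {
    void* p = nullptr;
    if (cudaMalloc(&p, n * sizeof(T)) != cudaSuccess)
        throw std::runtime_error("cudaMalloc failed");
    return std::unique_ptr<T, CudaDeleter>(static_cast<T*>(p));
}

void example() {
    auto d_buf = cuda_make_unique<float>(1024); // device buffer
    // ... launch kernels using d_buf.get() ...
    // cudaFree is called automatically when d_buf goes out of scope,
    // including when an exception propagates out of this function.
}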
You can use this custom cuda::shared_ptr implementation. As mentioned above, this implementation uses std::shared_ptr as a wrapper for CUDA device memory.
Usage Example:
std::shared_ptr<T[]> data_host = std::shared_ptr<T[]>(new T[n]);
...
// In host code:
fun::cuda::shared_ptr<T> data_dev;
data_dev->upload(data_host.get(), n);
// In .cu file:
// data_dev.data() points to device memory which contains data_host;
The repository is in fact a single header file (cudasharedptr.h), so it is easy to adapt if necessary for your application.
