Does vkGetMemoryFdKHR return the same fd? - memory

On Win32:
I'm sure that different handles mean different memory, and that the same handle is returned no matter how many times getMemoryWin32HandleKHR is executed.
This is consistent with Vulkan's official explanation: Vulkan shares memory.
It doesn't seem to work that way on Linux.
In my program,
getMemoryWin32HandleKHR works normally and returns a different handle for each different memory object.
The same memory always returns the same handle.
But with getMemoryFdKHR, different memory objects return the same fd.
Or, calling getMemoryFdKHR twice on the same memory can return two different fds.
This causes the device memory allocation to fail during subsequent imports.
I don't understand why this is.
Thanks!
#ifdef WIN32
    texGl.handle = device.getMemoryWin32HandleKHR({ info.memory, vk::ExternalMemoryHandleTypeFlagBits::eOpaqueWin32 });
#else
    VkDeviceMemory memory = VkDeviceMemory(info.memory);
    int file_descriptor = -1;
    VkMemoryGetFdInfoKHR get_fd_info{
        VK_STRUCTURE_TYPE_MEMORY_GET_FD_INFO_KHR, nullptr, memory,
        VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT
    };
    VkResult result = vkGetMemoryFdKHR(device, &get_fd_info, &file_descriptor);
    assert(result == VK_SUCCESS);
    texGl.handle = file_descriptor;
    // texGl.handle = device.getMemoryFdKHR({ info.memory, vk::ExternalMemoryHandleTypeFlagBits::eOpaqueFd });
#endif
Win32 works normally.
Linux does not: the subsequent import fails,
returning VK_ERROR_OUT_OF_DEVICE_MEMORY.
#ifdef _WIN32
    VkImportMemoryWin32HandleInfoKHR import_allocate_info{
        VK_STRUCTURE_TYPE_IMPORT_MEMORY_WIN32_HANDLE_INFO_KHR, nullptr,
        VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_WIN32_BIT, sharedHandle, nullptr };
#elif __linux__
    VkImportMemoryFdInfoKHR import_allocate_info{
        VK_STRUCTURE_TYPE_IMPORT_MEMORY_FD_INFO_KHR, nullptr,
        VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT,
        sharedHandle };
#endif
    VkMemoryAllocateInfo allocate_info{
        VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO, // sType
        &import_allocate_info,                  // pNext
        aligned_data_size_,                     // allocationSize
        memory_index };                         // memoryTypeIndex
    VkDeviceMemory device_memory = VK_NULL_HANDLE;
    VkResult result = vkAllocateMemory(m_device, &allocate_info, nullptr, &device_memory);
    NVVK_CHECK(result);
I think it has something to do with the fd.
In one test, if I call vkGetMemoryFdKHR twice and import the second fd, vkAllocateMemory currently works... but I think this is wrong.
The fd obtained this way is different from the previous one,
because every call returns a different fd.
That makes the fds impossible to distinguish, and importing a later fd in vkAllocateMemory
still eventually fails with an error.
So this workaround cannot be used.
I still think it should behave the same as Win32: when the fd is obtained the first time, vkAllocateMemory should succeed.
thanks very much!

The Vulkan specifications for the Win32 handle and POSIX file descriptor interfaces explicitly state different things about their importing behavior.
For HANDLEs:
Importing memory object payloads from Windows handles does not transfer ownership of the handle to the Vulkan implementation. For handle types defined as NT handles, the application must release handle ownership using the CloseHandle system call when the handle is no longer needed.
For FDs:
Importing memory from a file descriptor transfers ownership of the file descriptor from the application to the Vulkan implementation. The application must not perform any operations on the file descriptor after a successful import.
So HANDLE importation leaves the HANDLE in a valid state, still referencing the memory object. File descriptor importation claims ownership of the FD, leaving it in a state where you cannot use it.
What this means is that the FD may have been released by the internal implementation. If that is the case, later calls to create a new FD may use the same FD index as a previous call.
The safest way to use both of these APIs is to have the Win32 version emulate the functionality of the FD version. Don't try to do any kinds of comparisons of handles. If you need some kind of comparison logic, then you'll have to implement it yourself. When you import a HANDLE, close it immediately afterwards.
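As a concrete sketch of that discipline (my illustration, not code from the question): export a fresh fd for each import, hand it to exactly one VkImportMemoryFdInfoKHR, and never compare, reuse, or close it afterwards. This assumes VK_KHR_external_memory_fd is enabled and vkGetMemoryFdKHR was loaded via vkGetDeviceProcAddr.

// Export a single-use fd for a memory object. After a successful import,
// the Vulkan implementation owns the fd; the application must not touch it.
int ExportMemoryFd(VkDevice device, VkDeviceMemory memory,
                   PFN_vkGetMemoryFdKHR pfnGetMemoryFdKHR)
{
    VkMemoryGetFdInfoKHR get_fd_info{
        VK_STRUCTURE_TYPE_MEMORY_GET_FD_INFO_KHR, nullptr, memory,
        VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT };
    int fd = -1;
    if (pfnGetMemoryFdKHR(device, &get_fd_info, &fd) != VK_SUCCESS)
        return -1;
    return fd; // single-use: do not cache or compare this value
}

If the application itself needs to keep a usable descriptor, dup() the fd and import the duplicate, so the implementation consumes its own copy.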

Related

How to pass native void pointers to a Dart Isolate - without copying?

I am working on exposing an audio library (a C library) to Dart. To start the audio engine, a few initialization steps are required (non-blocking for the UI); then audio processing is triggered with a perform function, which is blocking (audio processing is a heavy task). That is why I started reading about Dart isolates.
My first thought was that I only needed to call the perform method in the isolate, but that doesn't seem possible, since the perform function takes the engine state as its first argument - this engine state is an opaque pointer (Pointer in dart:ffi). When trying to pass the engine state to a new isolate with the compute function, the Dart VM returns an error - it cannot pass C pointers to an isolate.
I could not find a way to pass this data to the isolate; I assume this is due to the separate memory of the main isolate and the one I'm creating.
So, I should probably manage the entire engine state in the isolate, which means:
Create the engine state
Initialize it with some options (strings)
Trigger the perform function
Control audio at runtime
I couldn't find any example of how to perform these actions in the isolate while triggering them from the main thread/isolate, nor of how to manage isolate memory (keep the engine state, and use it).
Here is a non-isolated example of what I want to do:
Pointer<Void> engineState = createEngineState();
initEngine(engineState, parametersString);
startEngine(engineState);
perform(engineState);
And at runtime, triggered by UI actions (like slider value changed, or button clicked) :
setEngineControl(engineState, valueToSet);
double controleValue = getEngineControl(engineState);
The engine state could be encapsulated in a class; I don't think it really matters here.
Whether it is a class or an opaque datatype, I can't find how to manage and keep this state, while triggering the performs from the main thread (processed in the isolate). Any idea?
Thanks in advance.
PS: I notice, while writing, that my question/explanation may not be precise; I have to say I'm a bit lost here, since I have never used Dart isolates. Please tell me if some information is missing.
EDIT April 24th :
It seems to work when creating and managing the engine state inside the isolate. But the main problem isn't solved: because the perform method blocks until it completes, there is no way for the isolate to receive messages while it runs.
An option I thought of first was to use the performBlock method, which only performs one block of audio samples, like this:
while(performBlock(engineState)) {
// listen messages, and do something
}
But this doesn't seem to work; the process is still blocked until the audio performance finishes. Even if this loop is called in an async method in the isolate, it blocks, and no messages are read.
I am now thinking about passing the Pointer<Void> managed in the main isolate to another isolate, which would then be the worker (for the perform method only), so that I could still trigger control methods from the main isolate.
The isolate Dart package provides a registry sub-library to manage some shared memory. But it is still impossible to pass a void pointer between isolates.
[ERROR:flutter/lib/ui/ui_dart_state.cc(157)] Unhandled Exception: Invalid argument(s): Native objects (from dart:ffi) such as Pointers and Structs cannot be passed between isolates.
Has anyone already met this kind of situation ?
It is possible to get the address that this Pointer points to as a number and construct a new Pointer from that address (see Pointer.address and Pointer.fromAddress()). Since numbers can be freely passed between isolates, this can be used to pass native pointers between them.
In your case that could be done, for example, like this (I used Flutter's compute to make the example a bit simpler, but it would work with explicit Send/ReceivePorts as well):
// Callback to be used in a background isolate.
// Returns the address of the new engine. (Named differently from the
// FFI function initEngine that it calls.)
int initEngineInIsolate(String parameters) {
  Pointer<Void> engineState = createEngineState();
  initEngine(engineState, parameters);
  startEngine(engineState);
  return engineState.address;
}

// Callback to be used in a background isolate.
// Does whatever processing is needed using the given engine.
void processWithEngine(int engineStateAddress) {
  final engineState = Pointer<Void>.fromAddress(engineStateAddress);
  process(engineState);
}

void main() async {
  // Initialize the engine in a background isolate.
  final address = await compute(initEngineInIsolate, "parameters");
  final engineState = Pointer<Void>.fromAddress(address);

  // Do some heavy computation in a background isolate using the engine.
  await compute(processWithEngine, engineState.address);
}
I ended up doing the processing of callbacks inside the audio loop itself.

while (performAudio()) {
  tasks.forEach((String key, List<int> value) {
    double val = getCallback(key);
    value.forEach((int element) {
      callbackPort.send([element, val]);
    });
  });
}

Here 'val' is the value you want to send to the callback, and the List<int> 'value' is a list of callback indices.
Say your audio loop performs with a vector size of 512 samples: you can service your callbacks after every 512 samples are processed, i.e. 48000 / 512 = 93.75 times per second (assuming your sample rate is 48000). This method is not the best one, but it works; I still have to see whether it holds up in a very intensive processing context. It was designed here for realtime audio, but it would work the same for audio rendering.
You can see the full code here: https://framagit.org/johannphilippe/csounddart/-/blob/master/lib/csoundnative.dart

What is ID3D12GraphicsCommandList::DiscardResource?

What exactly should I expect to happen when using DiscardResource?
What's the difference between discard and destroying/deleting a resource.
When is a good time/use-case to discard a resource?
Unfortunately Microsoft doesn't seem to say much about it other than it "discards a resource".
TL;DR: It is a rarely used function that provides a driver hint related to handling clear/compression structures. You are unlikely to use it except based on specific performance advice.
DiscardResource is the DirectX 12 version of the Direct3D 11.1 DiscardResource/DiscardView methods. See Microsoft Docs.
The primary use of these methods is to optimize the performance of tile-based deferred rasterizer graphics parts by discarding the render target after present. This is a hint to the driver that the contents of the render target are no longer relevant to the operation of the program, so it can avoid some internal clearing operations on the next use.
For DirectX 11, the DirectX 11 App template uses DiscardView because it makes use of DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL:
void DX::DeviceResources::Present()
{
    // The first argument instructs DXGI to block until VSync, putting the application
    // to sleep until the next VSync. This ensures we don't waste any cycles rendering
    // frames that will never be displayed to the screen.
    DXGI_PRESENT_PARAMETERS parameters = { 0 };
    HRESULT hr = m_swapChain->Present1(1, 0, &parameters);

    // Discard the contents of the render target.
    // This is a valid operation only when the existing contents will be entirely
    // overwritten. If dirty or scroll rects are used, this call should be removed.
    m_d3dContext->DiscardView1(m_d3dRenderTargetView.Get(), nullptr, 0);

    // Discard the contents of the depth stencil.
    m_d3dContext->DiscardView1(m_d3dDepthStencilView.Get(), nullptr, 0);

    // If the device was removed either by a disconnection or a driver upgrade, we
    // must recreate all device resources.
    if (hr == DXGI_ERROR_DEVICE_REMOVED || hr == DXGI_ERROR_DEVICE_RESET)
    {
        HandleDeviceLost();
    }
    else
    {
        DX::ThrowIfFailed(hr);
    }
}
The DirectX 12 App template doesn't need those explicit calls because it uses DXGI_SWAP_EFFECT_FLIP_DISCARD.
If you are wondering why the DirectX 11 app doesn't just use DXGI_SWAP_EFFECT_FLIP_DISCARD, it probably should. The DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL swap effect was the only one supported by Windows 8.x for Windows Store apps, which is when DiscardView was introduced. For Windows 10 / DirectX 12 / UWP, it's probably better to always use DXGI_SWAP_EFFECT_FLIP_DISCARD unless you specifically don't want the backbuffer discarded.
It is also useful for multi-GPU SLI / Crossfire configurations since the clearing operation can require synchronization between the GPUs. See this GDC 2015 talk
There are also other scenario-specific usages. For example, if doing deferred rendering for the G-buffer where you know every single pixel will be overwritten, you can use DiscardResource instead of doing ClearRenderTargetView / ClearDepthStencilView.
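To make that last scenario concrete, here is a minimal sketch (my illustration, not template code); it assumes commandList is an open ID3D12GraphicsCommandList and gbufferAlbedo is a G-buffer render target, already in the RENDER_TARGET state, whose every pixel the pass will overwrite.

// Hint that the current contents are irrelevant rather than paying for a
// ClearRenderTargetView on a target that will be fully overwritten anyway.
D3D12_DISCARD_REGION region = {};
region.NumRects         = 0;       // no rects: cover the whole subresource range
region.pRects           = nullptr;
region.FirstSubresource = 0;
region.NumSubresources  = 1;
commandList->DiscardResource(gbufferAlbedo.Get(), &region);
// ...record the G-buffer pass, overwriting every pixel...

Passing nullptr instead of &region discards the entire resource.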

Can Core Foundation be used in PLCrashReporter signal callback?

I'm using PLCrashReporter in my iOS project and I'm curious whether it is possible to use Core Foundation code in my custom crash callback. The thing that handles my needs is CFPreferences. Here is part of the code that I wrote:
void LMCrashCallback(siginfo_t* info, ucontext_t* uap, void* context) {
    CFStringRef networkStatusOnCrash = (CFStringRef)CFPreferencesCopyAppValue(
        networkStatusKey, kCFPreferencesCurrentApplication);
    CFStringRef additionalInfo = CFStringCreateWithFormat(
        NULL, NULL, CFSTR("Additional Crash Properties: [Internet: %@]"),
        networkStatusOnCrash);
    CFPreferencesSetAppValue(additionalInfoKey, additionalInfo,
                             kCFPreferencesCurrentApplication);
    CFPreferencesAppSynchronize(kCFPreferencesCurrentApplication);
}
My goal is to collect some system information at the moment the app crashes, e.g. the Internet connection type.
I know it is not a good idea to write my own crash callback, because only async-safe functions may be called there, but this would help.
Also, as another option: is there a way to somehow extend the PLCrashReportSystemInfo class?
This is very dangerous. In particular, the call to CFStringCreateWithFormat allocates memory. Allocating memory in the middle of a crash handler can lead to battery-draining deadlock (yep; had that bug…). For example, if you were in the middle of free() (which is not an uncommon place to crash), you may already be holding a spinlock on the heap. When you call malloc to get some memory, you may try to take the heap spinlock again and deadlock in a tight loop. The heap needs to be locked so often and for such short periods of time that it doesn't use a blocking lock; it does the equivalent of while (locked) {}.
You seem to just be reading a preference and copying it to another preference. There's no reason to do that inside a crash handler. Just check hasPendingCrashReport during startup (which I assume you're doing already), and read the key then. It's not clear what networkStatusKey is, but it should still be there when you start up again.
If for any reason it's modified very early (before you call hasPendingCrashReport), you can grab it in main() before launching the app. Or you can grab it in a +load method, which is called even earlier.
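As a sketch of that startup-time approach (not PLCrashReporter's own sample code): the CFPreferences calls are the real C API, while HasPendingCrashReport() stands in for PLCrashReporter's Objective-C -hasPendingCrashReport check, and networkStatusKey/additionalInfoKey are the question's own names.

// At startup, outside any signal handler, allocation is safe again.
if (HasPendingCrashReport()) {   // placeholder for the Objective-C check
    CFStringRef status = (CFStringRef)CFPreferencesCopyAppValue(
        networkStatusKey, kCFPreferencesCurrentApplication);
    if (status) {
        // Attach the value recorded before the crash to the report metadata.
        CFPreferencesSetAppValue(additionalInfoKey, status,
                                 kCFPreferencesCurrentApplication);
        CFPreferencesAppSynchronize(kCFPreferencesCurrentApplication);
        CFRelease(status);
    }
}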

iOS Patch program instruction at runtime

How would one go about modifying individual assembly instructions in an application while it is running?
I have a Mobile Substrate tweak that I am writing for an existing application. In the tweak's constructor (MSInitialize), I need to be able to rewrite individual instructions in the app's code. What I mean is that there may be multiple places in the application's address space that I wish to modify, but in each instance only a single instruction needs to be modified. I have already disabled ASLR for the application and know the exact memory address of the instruction to be patched, and I have the hex bytes (as a char[], but this is unimportant and can be changed if necessary) of the new instruction. I just need to figure out how to perform the change.
I know that iOS uses Data Execution Prevention (DEP) to specify that executable memory pages cannot also be writeable and vice versa, but I know that it is possible to bypass this on a jailbroken device. I also know that the ARM processor used by iDevices has an instruction cache that needs to be updated to reflect the change. However, I do not even know where to begin to do this.
So, to answer the question that would surely otherwise be asked, I have not tried anything. This is not because I am lazy; rather, it is because I have absolutely no clue how this could be accomplished. Any help at all would be greatly appreciated.
Edit:
If it helps at all, my ultimate goal is to use this in a Mobile Substrate tweak that hooks an App Store application. Previously, in order to mod this application, one would have to first crack it to decrypt the app so the binary could be patched. I want to make it so people wouldn't have to crack the app, since that can lead to piracy which I am strongly against. I can't use Mobile Substrate normally because all of the work is done in C++, not Objective-C, and the application is stripped, leaving no symbols to use MSHookFunction on.
Completely forgot I asked this question, so I'll show what I ended up with now. The comments should explain how and why it works.
#include <stdio.h>
#include <stdbool.h>
#include <mach/mach.h>
#include <libkern/OSCacheControl.h>
#define kerncall(x) ({ \
    kern_return_t _kr = (x); \
    if(_kr != KERN_SUCCESS) \
        fprintf(stderr, "%s failed with error code: 0x%x\n", #x, _kr); \
    _kr; \
})

bool patch32(void* dst, uint32_t data) {
    mach_port_t task;
    vm_region_basic_info_data_t info;
    mach_msg_type_number_t info_count = VM_REGION_BASIC_INFO_COUNT;
    vm_region_flavor_t flavor = VM_REGION_BASIC_INFO;
    vm_address_t region = (vm_address_t)dst;
    vm_size_t region_size = 0;

    /* Get region boundaries */
    if(kerncall(vm_region(mach_task_self(), &region, &region_size, flavor,
                          (vm_region_info_t)&info, &info_count,
                          (mach_port_t*)&task))) return false;

    /* Change memory protections to rw- */
    if(kerncall(vm_protect(mach_task_self(), region, region_size, false,
                           VM_PROT_READ | VM_PROT_WRITE | VM_PROT_COPY))) return false;

    /* Actually perform the write */
    *(uint32_t*)dst = data;

    /* Flush the CPU data cache to push the write out to RAM */
    sys_dcache_flush(dst, sizeof(data));

    /* Invalidate the instruction cache so the CPU reads the patched instruction from RAM */
    sys_icache_invalidate(dst, sizeof(data));

    /* Change memory protections back to r-x */
    kerncall(vm_protect(mach_task_self(), region, region_size, false,
                        VM_PROT_EXECUTE | VM_PROT_READ));
    return true;
}
Use vm_protect to get around W^X, assuming you're jailbroken with a decent jailbreak (e.g. one where MobileSubstrate works).
Writing to instruction memory from processor registers is, as others say above, a bit tricky. Especially with iPhones, since Apple tries to keep the processor details secret.
Permissions on memory access are the first problem: executable memory is not normally writable. Once that is overcome, there is a little dance to go through to get data out of the processor registers and into the instruction pipeline. In general, there are synchronisation instructions, which force a specific order on the memory accesses before and after them, and cache commands, which force dirty write data out to memory and flush out clean and possibly stale read data. Both of these are highly dependent on the detailed implementation of the processor.
Arm has nice manuals on the web that explain these in detail for specific processors. Whether the processors inside iPhones do what the public Arm manuals say, however, I have no idea.
Here's a place to start understanding the Arm memory synchronisation model for one processor:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0092b/ch04s03s04.html
It goes on to explain how to flush the instruction cache by a write to a control register. It certainly is possible to write self-modifying code for Arm processors, because somewhere in that manual I found a statement saying that it is sometimes unavoidable and thus has to be supported.
(I'm not claiming this is an answer. But it wouldn't fit in a comment.)

how do the registers get saved when a process gets interrupted?

This has been bugging me all day. When a program sets itself up to call a function when it receives a certain interrupt, I know that the registers are pushed onto the stack when the program is interrupted, but what I can't figure out is: how do the registers get off the stack? The compiler doesn't know whether a function is an interrupt handler, and it can't know how many arguments the interrupt passed to the function. So how on earth do the registers get restored?
It depends on the compiler, the OS and the CPU.
For low level embedded stuff, where an ISR may be called directly in response to an interrupt, the compiler will typically have some extension to the language (usually C or C++) that flags a given routine as an ISR, and registers will be saved and restored at the beginning and end of such a routine. [1]
For common desktop/server OSs though there is normally a level of abstraction between interrupts and user code - interrupts are normally handled first by some kernel code before being passed to a user routine, in which case the kernel code takes care of saving and restoring registers, and there is nothing special about the user-supplied ISR.
[1] E.g. with the Keil 8051 C compiler:

void Some_ISR(void) interrupt 0 // this routine will get called in response to interrupt 0
{
    // compiler generates a preamble to save registers
    // ISR code goes here
    // compiler generates code to restore registers and
    // do any other special end-of-ISR stuff
}
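For comparison, a sketch of the same idea with GCC's interrupt attribute, which several embedded ports support in place of a keyword (the exact spelling and semantics vary by target, so treat this as illustrative):

// On ports that support it, the attribute makes the compiler emit a
// prologue/epilogue that saves and restores the registers the handler
// clobbers, and return with the interrupt-return sequence.
void __attribute__((interrupt)) Timer_ISR(void)
{
    /* handler body */
}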
