Unix: sharing already-mapped memory between processes - memory

I have a pre-built userspace library that has an API along the lines of
void getBuffer (void **ppBuf, unsigned long *pSize);
void bufferFilled (void *pBuf, unsigned long size);
The idea being that my code requests a buffer from the lib, fills it with stuff, then hands it back to the lib.
I want another process to be able to fill this buffer. I can do this by creating some new shared buffer via shm*/shm_* APIs, have the other process fill that, then copy it to the lib's buffer in the lib's local process, but this has the overhead of an extra (potentially large) copy.
Is there a way to share memory that has ALREADY been mapped for a process? eg something like:
[local lib process]
getBuffer (&myLocalBuf, &mySize);
shmName = shareThisMemory (myLocalBuf, mySize);
[other process]
myLocalBuf = openTheSharedMemory (shmName);
That way the other process could write directly into the lib's buffer.
(Synchronization between the processes is already taken care of so no problems there).

There are good reasons for not allowing this functionality, particularly from the security side of things. A "share this mem" API would subvert the access permissions system.
Just assume an application holds some sort of critical/sensitive information in memory; the app links (via e.g. using a shared library, a preload, a modified linker/loader) to whatever component outside, and said component for the sheer fun of it decides to "share out the address space". It'd be a free-for-all, a method to bypass any sort of data access permission/restriction. You'd tunnel your way into the app.
Not good for your usecase, admitted, but rather justified from the system / application integrity point of view. Try searching the web for /proc/pid/mem mmap vulnerability for some explanation why this sort of access isn't wanted (in general).
If the library you use is designed to allow such shared access, it must itself provide the hooks to either allocate such a shared buffer, or use an elsewhere-preallocated (and possibly shared) buffer.
Edit: To make this clear, the process boundary is explicitly about not sharing the address space (amongst other things).
If you require a shared address space, either use threads (then the entire address space is shared and there's never any need to "export" anything), or explicitly set up a shared memory region in the same way as you'd set up a shared file.
Look at it from the latter point of view, two processes not opening it O_EXCL would share access to a file. But if one process already has it open O_EXCL, then the only way to "make it shared" (open-able to another process) is to close() it first then open() it again without O_EXCL. There's no other way to "remove" exclusive access from a file that you've opened as such other than to close it first.
Just as there is no way to remove exclusive access to a memory region mapped as such other than to unmap it first - and for a process' memory, MAP_PRIVATE is the default, for good reasons.
More: a process-shared memory buffer really isn't much different than a process shared file; using SysV-IPC style semantics, you have:
| SysV IPC shared memory Files
==============+===================================================================
creation | id = shmget(key,..., IPC_CREAT); fd = open("name",...,O_CREAT);
lookup | id = shmget(key,...); fd = open("name",...);
access | addr = shmat(id,...); addr = mmap(...,fd,...);
|
global handle | IPC key filename
local handle | SHM ID number filedescriptor number
mem location | created by shmat() created by mmap()
I.e. the key is the "handle" you're looking for, pass that the same way you would pass a filename, and both sides of the IPC connection can then use that key to check whether the shared resource exists, as well at access (attach to the handle) the contents though that.

A more modern way to share memory among processes is to use the POSIX shm_open() API.
Essentially, it's a portable way of putting files on a ramdisk (tmpfs). So one process uses shm_open plus ftruncate plus mmap. The other uses shm_open (with the same name) plus mmap plus shm_unlink. (With more than two processes, the last one to mmap it can unlink it.)
This way the shared memory will get reclaimed automatically when the last process exits; no need to explicitly remove the shared segment (as with SysV shared memory).
You still need to modify your application to allocate shared memory in this way, though.

In theory at least, you can record the memory address of the buffer you got from your lib and have the other process mmap /proc/$PID_OF_FIRST_PROCCESS/mem file with the address as the offset.
I haven't tested it and I'm not sure /proc/PID/mem actually has an mmap file op implemented and there are a ton of security consideration but it might work. Best of luck :-)

Related

Does the order of hdf5 close matter?

I am implementing a HDF5 layer in an interpreted language with automatic reclamation facilities (garbage collect).
When a proxy to a HDF5 entity (H5File, H5Group, H5Dataset, H5Dataspace, H5Datatype, etc...) will be no longer referenced, it will be automatically reclaimed. With ephemeron like facility, I can arrange to be noticed and invoke the corresponding close function automagically (H5Fclose, H5Gclose, H5Dclose, etc...) in order to release the target resource.
By default, I have no control on the order of reclamation. However, if ever order of close counts, then I can arrange to keep a strong pointer on a parent proxy (for example the H5 File) from within any other entity. If order does not count, then I will avoid this useless complication.
So my questions:
Can I invoke H5Fclose(fid); before H5Gclose(gid); where previously gid=H5Gcreate(fid,'/foo',H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);?
Can I continue to operate on the group once I closed the containing file? For example, is it legal to call H5Fclose(fid); before gid2=H5Gcreate(gid,'bar',H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT); in above example? If not, are there other entities concerned, or is it just file?
Doh, case of blindness, the documentation tells that close is delayed until all objects have been closed, so 1. order does not count and 2. is legal.
https://support.hdfgroup.org/HDF5/doc1.6/RM_H5F.html#File-Close
However, it may not work in every circumstances, so it's not recommended.
H5Fclose terminates access to an HDF5 file by flushing all data to storage and terminating access to the file through file_id.
If this is the last file identifier open for the file and no other access identifier is open (e.g., a dataset identifier, group identifier, or shared datatype identifier), the file will be fully closed and access will end.
Delayed close:
Note the following deviation from the above-described behavior. If H5Fclose is called for a file but one or more objects within the file remain open, those objects will remain accessible until they are individually closed. Thus, if the dataset data_sample is open when H5Fclose is called for the file containing it, data_sample will remain open and accessible (including writable) until it is explicitely closed. The file will be automatically closed once all objects in the file have been closed.
Be warned, however, that there are circumstances where it is not possible to delay closing a file. For example, an MPI-IO file close is a collective call; all of the processes that opened the file must close it collectively. The file cannot be closed at some time in the future by each process in an independent fashion. Another example is that an application using an AFS token-based file access privilage may destroy its AFS token after H5Fclose has returned successfully. This would make any future access to the file, or any object within it, illegal.
In such situations, applications must close all open objects in a file before calling H5Fclose. It is generally recommended to do so in all cases.

Why does VkAccessFlagBits include both read bits and write bits?

In vulkan.h, every instance of VkAccessFlagBits appears in a pair that contains a srcAccessMask and a dstAccessMask:
VkAccessFlags srcAccessMask;
VkAccessFlags dstAccessMask;
In every case, according to my understanding, the purpose of these masks is to help designate two sets of operations, such that results of operations in the first set will be visible to operations in the second set. For instance, write operations occurring prior to a barrier should not get hung up in caches but should instead propagate all the way to locations from which they can be read after the barrier. Or something like that.
The access flags come in both READ and WRITE forms:
/* ... */
VK_ACCESS_SHADER_READ_BIT = 0x00000020,
VK_ACCESS_SHADER_WRITE_BIT = 0x00000040,
VK_ACCESS_COLOR_ATTACHMENT_READ_BIT = 0x00000080,
VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT = 0x00000100,
/* ... */
But it seems to me that srcAccessMask should probably always be some sort of VK_ACCESS_*_WRITE_BIT combination, while dstAccessMask should always be a combination of VK_ACCESS_*_READ_BIT values. If that is true, then the READ/WRITE distinction is identical to and implicit in the src/dst distinction, and so it should be good enough to just have VK_ACCESS_SHADER_BIT etc., without READ_ or WRITE_ variants.
Why are there READ_ and WRITE_ variants, then? Is it ever useful to specify that some read operations must fully complete before some other operations have begun? Note that all operations using VkAccessFlagBits produce (I think) execution dependencies as well as memory dependencies. It seems to me that the execution dependencies should be good enough to prevent earlier reads from receiving values written by later writes.
While writing this question I encountered a statement in the Vulkan specification that provides at least part of an answer:
Memory dependencies are used to solve data hazards, e.g. to ensure that write operations are visible to subsequent read operations (read-after-write hazard), as well as write-after-write hazards. Write-after-read and read-after-read hazards only require execution dependencies to synchronize.
This is from the section 6.4. Execution And Memory Dependencies. Also, from earlier in that section:
The application must use memory dependencies to make writes visible before subsequent reads can rely on them, and before subsequent writes can overwrite them. Failure to do so causes the result of the reads to be undefined, and the order of writes to be undefined.
From this I surmise that, yes, the execution dependencies produced by the Vulkan commands that involve these access flags probably do free you from ever having to put a VK_ACCESS_*_READ_BIT into a srcAccessMask field--but that you might in fact want to have READ_ flags, WRITE_ flags, or both in some of your dstAccessMask fields, because apparently it's possible to use an explicit dependency to prevent read-after-write hazards in such a way that write-after-write hazards are NOT prevented. (And maybe vice-versa?)
Like, maybe your Vulkan will sometimes decide that a write does not actually need to be propagated all the way through a particular cache to its final specified destination for the sake of a subsequent read operation, IF Vulkan happens to know that that read operation will simply read from that same cache, saving some time? But then a second write might happen, and write to a different cache, and there'll be two caches left in a race (with the choice of winner undefined) to send their two values to the same spot. Or something? Maybe my mental model of these caches is entirely wrong.
It is fairly solidly established, at least, that memory barriers are confusing.
Let's go over all the possibilities:
read–read — well yeah that one is pretty useless. Khronos seems to agree #131 it is pointless value in src (basically equivalent to 0).
read–write — execution dependency should be sufficient to synchronize without this. Khronos seems to agree #131 it is pointless value in src (basically equivalent to 0).
write–read — that's the obvious and most common one.
write–write — similar reason to write–read above. Without it the order of the writes would be undefined. It is a bit pointless for most situations to write something you haven't even read in between. But hey, now you have a way to synchronize it.
You can provide bitmask of more of these masks to both src and dst. In which case it makes sense to have both masks for driver to sort the dependencies out for you. (I don't expect performance overhead from this on API level, so it is allowed as convenience)
From API design perspective, it could mean adding different enum for srcAccess. But perhaps _READ variants could just be forbidden in srcAccess through "Valid Usage", making this argument weak. The src == READ variant might have been kept, because it is benign.

ChromeWorker to write a huge file

In my extension, I need to write a huge file (say around 20 gigs) to the disk. Currently I am doing it in the main thread, but file creation is very expensive operation. I was about to move the whole file creation process to a ChromeWorker, but based on https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Functions_and_classes_available_to_workers I cannot have access to the nsiFile from a ChromeWorker.
So my questions are:
1. Is it possible to access Cc, Ci, and Cu from within a ChromeWorker?
2. If not what would be the most efficient way to create and fill large files in Firefox. Note that I need to write the file based on segments and offsets (Ci.nsISeekableStream).
It's not possible to access nsIFile from ChromeWorker. But nsIFile is horrible synchronus option.
Go with OS.File: https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules/OSFile.jsm
On that page go to the link for usage on workers: https://developer.mozilla.org/docs/Mozilla/JavaScript_code_modules/OSFile.jsm/OS.File_for_workers
On the mainthread os.file returns promises.
In worker they are synchronus. Wrap your os.file functions in worker with a try-catch, as when an error occurs, (like os.file.remove with option of ignoreAbsent set to false) then the catch will hold the OS.File.Error object.
Great move to ChromeWorker btw! I'm a huge fan of ChromeWorkers. I wrote a simple example of jsm using chromeworker here: https://github.com/Noitidart/jpm-chromeworker
For segments, you'll have to OS.File.open and then on the return value do a .setPosition() then you can read certain number of bytes from that position, or write, or whatever. Its awesome stuff. OS.File is the new way and the recommended way to do file operations. Its been around awhile now though since like Firefox 29 or before that.

popen() implementation,fd leaks,and vfork()

In the glibc implementation of popen(), it specifies that
The popen() function shall ensure that any streams from previous popen() calls that remain open in the parent process are closed in the new child process.
Why? If the purpose is to avoid fd leaks, why not just close all open fds?
The glibc implementation of popen() uses fork(). Although there are dup2() and close() calls between fork() and exec(), is it possible to replace fork() by vfork() to improve performance?
Is the Linux implementation of popen() based on fork() rather than vfork()? Why (or why not)?
I'm going to write a bidirectional version of popen(), which returns two FILE*: one for read and one for write. How do I implement it correctly? It should be thread safe and no fd leaks. It is better if it is fast.
vfork(2) is obsolete (removed from POSIX2008), and fork(2) is quite efficient, since it uses copy-on-write techniques.
popen(3) cannot close all opened files, because it does not know them and cannot know which are relevant. Imagine a program which gets a socket and pass its file descriptor as an argument to the popen-ed command (or simply popen("time cat /etc/issue >&9; date","r")....). See also fcntl(2) with FD_CLOEXEC, open(2) with O_CLOEXEC, execve(2)
File descriptors are program-wide and process-wide scare resources and it is your responsability to manage them correctly. You should know which fd-s should be closed in your child process before execve. If you know what program is execve-d and what fds it needs, you can close all other fds (or most of them, perhaps with for (int i=STDERR_FILENO+1; i<64; i++) (void) close(i);) before execve.
If you are coding a reusable library, document its policy regarding file descriptors (and any other global process-wide resources) and probably use FD_CLOEXEC on any file descriptors it is obtaining itself (not as explicit argument or data), e.g. for internal use.
It looks like you are reinventing p2open (then you probably need to understand the implementation details of your FILE in your C standard library, or else use fdopen(3) with care and caution); you might find some implementation of it. Beware, the process using that probably need to have some event loop (e.g. above poll(2) ...) to avoid a potential deadlock (with both parent and child processes blocked on reading).
Did you consider using some existing event loop infrastructure (e.g. libevent, libev, glib from GTK, etc....)?
BTW Linux has several free software implementations for its C standard library. The GNU libc is quite common, but there is musl-libc and several others. Study the source code of your libc.

Mixing Custom External File Handler with Micro Focus Default EXTFH

I have written a custom external file handler (EXTFH), but there are some cases where I want to revert to the Micro Focus EXTFH. The cases are on a file by file basis (as opposed to a filetype by filetype basis).
My idea is that upon OPEN, I place a marker in the FCD that tells the subsequent operations (READ, WRITE, CLOSE) as to which EXTFH is in use.
My EXTFH has control and the logic can be very simple if there is a place in the FCD that is guaranteed to not be corrupted by MicroFocus.
Is there a place in the FCD (fcd2.h and fcd3.h) that I can mark an open file as being opened by my EXTFH?
My worst case is that I keep a list of the fcd->handle pointers that I have allocated and if I allocated it, then direct to my EXTFH. If not, direct to the MF EXTFH.
Here is the documentation from Micro Focus on EXTFH:
http://supportline.microfocus.com/documentation/books/sx20books/fhexfh.htm
That is older documentation, but is appears to be pretty much up-to-date.
[edit to clarify how we will use detect which to use: We will use the extension on the file name to determine which file handler to use. For instance, if the extension is: .xyz, then use our EXTFH, otherwise use MF EXTFH]. It appears we can check the filename on every fileio, but I think it would be cleaner if we just checked upon OPEN. For subsequent calls for that file, we would just check something in the FCD.

Resources