Does the order of hdf5 close matter? - hdf5

I am implementing an HDF5 layer in an interpreted language with automatic reclamation facilities (garbage collection).
When a proxy to an HDF5 entity (H5File, H5Group, H5Dataset, H5Dataspace, H5Datatype, etc.) is no longer referenced, it is automatically reclaimed. With an ephemeron-like facility, I can arrange to be notified and invoke the corresponding close function automagically (H5Fclose, H5Gclose, H5Dclose, etc.) in order to release the target resource.
By default, I have no control over the order of reclamation. However, if the order of closing matters, I can arrange to keep a strong pointer to the parent proxy (for example, the H5File) from within every other entity. If the order does not matter, then I will avoid this useless complication.
So my questions:
Can I invoke H5Fclose(fid); before H5Gclose(gid); where previously gid = H5Gcreate(fid, "/foo", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);?
Can I continue to operate on the group once I have closed the containing file? For example, is it legal to call H5Fclose(fid); before gid2 = H5Gcreate(gid, "bar", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT); in the above example? If not, are there other entities concerned, or is it just the file?

Doh, a case of blindness: the documentation says that the close is delayed until all objects have been closed, so 1. the order does not matter and 2. is legal.
https://support.hdfgroup.org/HDF5/doc1.6/RM_H5F.html#File-Close
However, it may not work in all circumstances, so it's not recommended.
H5Fclose terminates access to an HDF5 file by flushing all data to storage and terminating access to the file through file_id.
If this is the last file identifier open for the file and no other access identifier is open (e.g., a dataset identifier, group identifier, or shared datatype identifier), the file will be fully closed and access will end.
Delayed close:
Note the following deviation from the above-described behavior. If H5Fclose is called for a file but one or more objects within the file remain open, those objects will remain accessible until they are individually closed. Thus, if the dataset data_sample is open when H5Fclose is called for the file containing it, data_sample will remain open and accessible (including writable) until it is explicitly closed. The file will be automatically closed once all objects in the file have been closed.
Be warned, however, that there are circumstances where it is not possible to delay closing a file. For example, an MPI-IO file close is a collective call; all of the processes that opened the file must close it collectively. The file cannot be closed at some time in the future by each process in an independent fashion. Another example is that an application using an AFS token-based file access privilege may destroy its AFS token after H5Fclose has returned successfully. This would make any future access to the file, or any object within it, illegal.
In such situations, applications must close all open objects in a file before calling H5Fclose. It is generally recommended to do so in all cases.
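To make the delayed-close behavior concrete, here is a minimal C sketch mirroring the calls from the question (the file name is made up; error checking omitted):
/* Create a file and a group, then close the file first. */
#include "hdf5.h"
int main(void)
{
    hid_t fid = H5Fcreate("demo.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t gid = H5Gcreate(fid, "/foo", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Fclose(fid);  /* legal: the file stays open internally while gid is live */
    /* Also legal: the open group identifier keeps the file accessible. */
    hid_t gid2 = H5Gcreate(gid, "bar", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Gclose(gid2);
    H5Gclose(gid);  /* last open object: only now is the file really closed */
    return 0;
}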

Related

Remove disconnected structures of compounds

I am uploading 3 different chemical files to my application, one at a time. Each file contains the SMILES of a compound, but the tag name is different. I am creating an IAtomContainer stream by reading the file. I want to remove the disconnected structures from the stream. Is there any way to remove them instead of manually checking the SMILES? I am using CDK 1.5.13.
ConnectivityChecker.isConnected(IAtomContainer);
This works. It returns a boolean value.

ChromeWorker to write a huge file

In my extension, I need to write a huge file (say around 20 gigs) to the disk. Currently I am doing it in the main thread, but file creation is a very expensive operation. I was about to move the whole file creation process to a ChromeWorker, but based on https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Functions_and_classes_available_to_workers I cannot have access to nsIFile from a ChromeWorker.
So my questions are:
1. Is it possible to access Cc, Ci, and Cu from within a ChromeWorker?
2. If not, what would be the most efficient way to create and fill large files in Firefox? Note that I need to write the file based on segments and offsets (Ci.nsISeekableStream).
It's not possible to access nsIFile from a ChromeWorker. But nsIFile is a horribly synchronous option anyway.
Go with OS.File: https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules/OSFile.jsm
On that page go to the link for usage on workers: https://developer.mozilla.org/docs/Mozilla/JavaScript_code_modules/OSFile.jsm/OS.File_for_workers
On the main thread, OS.File returns promises.
In a worker they are synchronous. Wrap your OS.File functions in the worker with a try-catch: when an error occurs (like OS.File.remove with the ignoreAbsent option set to false), the catch will hold the OS.File.Error object.
Great move to ChromeWorker, btw! I'm a huge fan of ChromeWorkers. I wrote a simple example of a JSM using a ChromeWorker here: https://github.com/Noitidart/jpm-chromeworker
For segments, you'll have to OS.File.open and then call .setPosition() on the return value; then you can read a certain number of bytes from that position, or write, or whatever. It's awesome stuff. OS.File is the new and recommended way to do file operations. It's been around a while now, since about Firefox 29 or earlier.

Mixing Custom External File Handler with Micro Focus Default EXTFH

I have written a custom external file handler (EXTFH), but there are some cases where I want to revert to the Micro Focus EXTFH. The cases are on a file by file basis (as opposed to a filetype by filetype basis).
My idea is that upon OPEN, I place a marker in the FCD that tells the subsequent operations (READ, WRITE, CLOSE) as to which EXTFH is in use.
My EXTFH has control, and the logic can be very simple if there is a place in the FCD that is guaranteed not to be corrupted by Micro Focus.
Is there a place in the FCD (fcd2.h and fcd3.h) that I can mark an open file as being opened by my EXTFH?
My worst case is that I keep a list of the fcd->handle pointers that I have allocated: if I allocated it, direct to my EXTFH; if not, direct to the MF EXTFH (a sketch of this follows below).
Here is the documentation from Micro Focus on EXTFH:
http://supportline.microfocus.com/documentation/books/sx20books/fhexfh.htm
That is older documentation, but it appears to be pretty much up-to-date.
[Edit to clarify how we will detect which handler to use: we will use the extension of the file name. For instance, if the extension is .xyz, then use our EXTFH; otherwise use the MF EXTFH.] It appears we can check the filename on every file I/O, but I think it would be cleaner if we just checked upon OPEN. For subsequent calls for that file, we would just check something in the FCD.
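For what it's worth, here is a minimal C sketch of that worst-case approach; every name below is illustrative, not part of the Micro Focus API:
/* Remember which handles we allocated; route everything else to the
   Micro Focus EXTFH. A real implementation would need locking and a
   scalable lookup structure. */
#include <string.h>
#define MAX_TRACKED 1024
static void *our_handles[MAX_TRACKED];
static int   n_tracked;
static void track_handle(void *h)
{
    if (n_tracked < MAX_TRACKED)
        our_handles[n_tracked++] = h;
}
static int handle_is_ours(void *h)
{
    for (int i = 0; i < n_tracked; i++)
        if (our_handles[i] == h)
            return 1;
    return 0;
}
/* Decide at OPEN time from the file extension, as described above. */
static int name_is_ours(const char *filename)
{
    const char *dot = strrchr(filename, '.');
    return dot != NULL && strcmp(dot, ".xyz") == 0;
}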

Unix: sharing already-mapped memory between processes

I have a pre-built userspace library that has an API along the lines of
void getBuffer (void **ppBuf, unsigned long *pSize);
void bufferFilled (void *pBuf, unsigned long size);
The idea being that my code requests a buffer from the lib, fills it with stuff, then hands it back to the lib.
I want another process to be able to fill this buffer. I can do this by creating some new shared buffer via shm*/shm_* APIs, have the other process fill that, then copy it to the lib's buffer in the lib's local process, but this has the overhead of an extra (potentially large) copy.
Is there a way to share memory that has ALREADY been mapped by a process? E.g. something like:
[local lib process]
getBuffer (&myLocalBuf, &mySize);
shmName = shareThisMemory (myLocalBuf, mySize);
[other process]
myLocalBuf = openTheSharedMemory (shmName);
That way the other process could write directly into the lib's buffer.
(Synchronization between the processes is already taken care of so no problems there).
There are good reasons for not allowing this functionality, particularly from the security side of things. A "share this mem" API would subvert the access permissions system.
Just assume an application holds some sort of critical/sensitive information in memory; the app links (e.g. via a shared library, a preload, or a modified linker/loader) to some outside component, and said component, for the sheer fun of it, decides to "share out" the address space. It'd be a free-for-all, a method to bypass any sort of data access permission/restriction. You'd tunnel your way into the app.
Not good for your usecase, admitted, but rather justified from the system / application integrity point of view. Try searching the web for /proc/pid/mem mmap vulnerability for some explanation why this sort of access isn't wanted (in general).
If the library you use is designed to allow such shared access, it must itself provide the hooks to either allocate such a shared buffer, or use an elsewhere-preallocated (and possibly shared) buffer.
Edit: To make this clear, the process boundary is explicitly about not sharing the address space (amongst other things).
If you require a shared address space, either use threads (then the entire address space is shared and there's never any need to "export" anything), or explicitly set up a shared memory region in the same way as you'd set up a shared file.
Look at it from the latter point of view: two processes not opening it O_EXCL would share access to a file. But if one process already has it open O_EXCL, then the only way to "make it shared" (open-able by another process) is to close() it first and then open() it again without O_EXCL. There's no other way to "remove" exclusive access from a file that you've opened as such other than to close it first.
Just as there is no way to remove exclusive access to a memory region mapped as such other than to unmap it first - and for a process' memory, MAP_PRIVATE is the default, for good reasons.
More: a process-shared memory buffer really isn't much different from a process-shared file; using SysV-IPC style semantics, you have:
              | SysV IPC shared memory            | Files
==============+===================================+================================
creation      | id = shmget(key,..., IPC_CREAT);  | fd = open("name",...,O_CREAT);
lookup        | id = shmget(key,...);             | fd = open("name",...);
access        | addr = shmat(id,...);             | addr = mmap(...,fd,...);
              |                                   |
global handle | IPC key                           | filename
local handle  | SHM ID number                     | file descriptor number
mem location  | created by shmat()                | created by mmap()
I.e. the key is the "handle" you're looking for; pass it the same way you would pass a filename, and both sides of the IPC connection can then use that key to check whether the shared resource exists, as well as access (attach to) the contents through it.
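For example, both sides might run the following (the key value 0x1234 is arbitrary; error checking omitted):
/* Both processes run this; the key plays the role of a filename. */
#include <sys/ipc.h>
#include <sys/shm.h>
int main(void)
{
    key_t key  = 0x1234;
    int   id   = shmget(key, 4096, IPC_CREAT | 0600); /* create or look up */
    char *addr = shmat(id, NULL, 0);                  /* map into this process */
    addr[0] = 1;                                      /* visible to the peer */
    shmdt(addr);
    return 0;
}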
A more modern way to share memory among processes is to use the POSIX shm_open() API.
Essentially, it's a portable way of putting files on a ramdisk (tmpfs). So one process uses shm_open plus ftruncate plus mmap. The other uses shm_open (with the same name) plus mmap plus shm_unlink. (With more than two processes, the last one to mmap it can unlink it.)
This way the shared memory will get reclaimed automatically when the last process exits; no need to explicitly remove the shared segment (as with SysV shared memory).
You still need to modify your application to allocate shared memory in this way, though.
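A minimal sketch of the producer side under those assumptions (the name "/mybuffer" and the size are made up; error checking omitted):
/* Create, size, and map a named POSIX shared memory object. */
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#define SHM_NAME "/mybuffer"
#define SHM_SIZE 4096
int main(void)
{
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
    ftruncate(fd, SHM_SIZE);                /* size the object */
    char *p = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                              /* the mapping survives close() */
    p[0] = 42;                              /* fill the shared buffer */
    /* The consumer does shm_open(SHM_NAME, O_RDWR, 0) + mmap + shm_unlink. */
    munmap(p, SHM_SIZE);
    return 0;
}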
In theory at least, you can record the memory address of the buffer you got from your lib and have the other process mmap the /proc/$PID_OF_FIRST_PROCESS/mem file with that address as the offset.
I haven't tested it, and I'm not sure /proc/PID/mem actually has an mmap file op implemented, and there are a ton of security considerations, but it might work. Best of luck :-)

External stored procedure on IBM i

I am trying to create an external stored procedure on an IBM i (V5R4), but I'm getting an error when I try to run it.
All I want to do is call an RPG program, without passing any parameters or worrying about returning any data. Sorry, I'm not an RPG programmer or an expert on IBM i, so I could be missing something very simple.
The SQL to create the procedure:
CREATE PROCEDURE SOMELIB.SOMEPROC ( )
LANGUAGE RPGLE
NOT DETERMINISTIC
NO SQL
EXTERNAL NAME 'OTHERLIB/SOMERG'
PARAMETER STYLE GENERAL;
The error I get when executing CALL SOMELIB.SOMEPROC() is:
SQL State: 38501
Vendor Code: -443
Message: [CEE9901] Application error. RNX1216 unmonitored by BB1002RG at statement 2100000001, instruction X'0000'. Cause . . . . . : The application ended abnormally because an exception occurred and was not handled. The name of the program to which the unhandled exception is sent is SOMERG SOMERG . The program was stopped at the high-level language statement number(s) at the time the message was sent. If more than one statement number is shown, the program is an optimized ILE program. Optimization does not allow a single statement number to be determined. If *N is shown as a value, it means the real value was not available. Recovery . . . : See the low level messages previously listed to locate the cause of the exception. Correct any errors, and then try the request again.
Your procedure is calling the RPG program without the library list set. You can do one of two things:
1) Change the F-spec in the RPG program to qualify the library using the EXTFILE keyword.
2) Call a CL program from the stored procedure that adds the appropriate library to the library list making sure to allow for the fact that the library may already be there from a prior call. Then have the CL program call the RPG program.
(A slightly cruder solution:) Identify the user that starts the stored procedure, and change that user's job description to have the correct library list.
But in my experience the CL program is the most pragmatic solution too.
Assuming the file is in the same library as the program, add EXTFILE(variablename) and USROPN to the F-spec. Take the library name from the PSDS and construct the variablename value before you OPEN the file.
If the file and program are in different libraries, you might create a data area in the program library to hold the name of the data library. Retrieve the data area (locating it via the PSDS) instead of taking the file library from the PSDS. If program and file aren't kept together, it can be a good idea to keep the data library name in an external object that can be changed, rather than recompiling.
(Actually, I've rarely used data areas in the past ten years or so. Instead I create a user index. Each entry in the *USRIDX replaces a data area. The entries are keyed by a value that used to be a data area name. One object replaces many others and one procedure can manage all entries. One object to own and authorize reduces some system overhead.)
A suggestion to get rid of this trouble: make sure the user profile's JOBD contains all the libraries needed by the stored procedure.
