Using pthreads with MPICH

I am having trouble using pthreads in my MPI program. My program runs fine without pthreads, but I then decided to execute a time-consuming operation in parallel, so I create a pthread that performs the following sequence: MPI_Probe, MPI_Get_count, and MPI_Recv. My program fails at MPI_Probe and no error code is returned. This is how I initialize the MPI environment:
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided_threading_support);
The provided threading support is '3', which I assume is MPI_THREAD_SERIALIZED. Any ideas on how I can solve this problem?
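For reference, the worker thread does roughly the following (a reconstructed sketch; the MPI_BYTE datatype and the buffer handling are assumptions, not the original code):

#include <mpi.h>
#include <stdlib.h>

/* Reconstructed sketch of the worker thread described above. */
void *receiver_thread(void *arg)
{
    (void)arg;
    MPI_Status status;
    int count;
    MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status); /* fails here */
    MPI_Get_count(&status, MPI_BYTE, &count);
    char *buf = malloc(count);
    MPI_Recv(buf, count, MPI_BYTE, status.MPI_SOURCE, status.MPI_TAG,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* ... process buf ... */
    free(buf);
    return NULL;
}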

The provided threading support is '3' which I assume is MPI_THREAD_SERIALIZED.
The MPI standard defines thread support levels as named constants and only requires that their values are monotonic, i.e. MPI_THREAD_SINGLE < MPI_THREAD_FUNNELED < MPI_THREAD_SERIALIZED < MPI_THREAD_MULTIPLE. The actual numeric values are implementation-specific and should never be used or compared against.
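For example, a portable check compares provided against the named constant rather than against a literal (a minimal sketch):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    /* Compare against the named constant, never against a literal like 3. */
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "full thread support not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    /* ... it is now safe to call MPI from several threads concurrently ... */
    MPI_Finalize();
    return 0;
}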
MPI communication calls by default never return error codes other than MPI_SUCCESS. The reason is that MPI invokes the communicator's error handler before a call returns, and all communicators are initially created with MPI_ERRORS_ARE_FATAL installed as their error handler. That error handler terminates the program and usually prints some debugging information, e.g. the reason for the failure. Both MPICH (and its countless variants) and Open MPI produce quite elaborate reports on what led to the termination.
To enable user error handling on communicator comm, you should make the following call:
MPI_Comm_set_errhandler(comm, MPI_ERRORS_RETURN);
Watch out for the error codes returned - their numerical values are also implementation-specific.
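As a sketch, once MPI_ERRORS_RETURN is installed you can decode a failure with the standard MPI_Error_string call rather than interpreting the numeric value:

#include <mpi.h>
#include <stdio.h>

/* Sketch: with MPI_ERRORS_RETURN installed, inspect failures explicitly. */
MPI_Status status;
int rc = MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
if (rc != MPI_SUCCESS) {
    char msg[MPI_MAX_ERROR_STRING];
    int len;
    MPI_Error_string(rc, msg, &len);   /* portable way to decode the code */
    fprintf(stderr, "MPI_Probe failed: %s\n", msg);
}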

If your MPI implementation isn't willing to give you MPI_THREAD_MULTIPLE, there are three things you can do:
Get a new MPI implementation.
Protect MPI calls with a critical section (see the sketch at the end of this answer).
Cut it out with the threading thing.
I would suggest #3. The whole point of MPI is parallelism -- if you find yourself creating multiple threads for a single MPI subprocess, you should consider whether those threads should have been independent subprocesses to begin with.
Particularly with MPI_THREAD_MULTIPLE. I could maybe see a use for MPI_THREAD_SERIALIZED, if your threads are sub-subprocess workers for the main subprocess thread... but MULTIPLE implies that you're tossing data around all over the place. That loses you the primary convenience offered by MPI, namely synchronization. You'll find yourself essentially reimplementing MPI on top of MPI.
Okay, now that you've read all that, the punchline: 3 is MPI_THREAD_MULTIPLE. But seriously. Reconsider your architecture.
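For completeness, option #2 usually amounts to one global lock around every MPI call (a sketch assuming POSIX threads; the wrapper name is mine):

#include <mpi.h>
#include <pthread.h>

/* Serialize every MPI call behind one lock so no two threads are ever
   inside the library at the same time. */
static pthread_mutex_t mpi_lock = PTHREAD_MUTEX_INITIALIZER;

static int locked_probe(int src, int tag, MPI_Comm comm, MPI_Status *st)
{
    pthread_mutex_lock(&mpi_lock);
    /* Note: a blocking call like MPI_Probe holds the lock until a message
       arrives, stalling every other thread's MPI work - one more argument
       for option #3. */
    int rc = MPI_Probe(src, tag, comm, st);
    pthread_mutex_unlock(&mpi_lock);
    return rc;
}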

Related

Why does VkAccessFlagBits include both read bits and write bits?

In vulkan.h, every instance of VkAccessFlagBits appears in a pair that contains a srcAccessMask and a dstAccessMask:
VkAccessFlags srcAccessMask;
VkAccessFlags dstAccessMask;
In every case, according to my understanding, the purpose of these masks is to help designate two sets of operations, such that results of operations in the first set will be visible to operations in the second set. For instance, write operations occurring prior to a barrier should not get hung up in caches but should instead propagate all the way to locations from which they can be read after the barrier. Or something like that.
The access flags come in both READ and WRITE forms:
/* ... */
VK_ACCESS_SHADER_READ_BIT = 0x00000020,
VK_ACCESS_SHADER_WRITE_BIT = 0x00000040,
VK_ACCESS_COLOR_ATTACHMENT_READ_BIT = 0x00000080,
VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT = 0x00000100,
/* ... */
But it seems to me that srcAccessMask should probably always be some sort of VK_ACCESS_*_WRITE_BIT combination, while dstAccessMask should always be a combination of VK_ACCESS_*_READ_BIT values. If that is true, then the READ/WRITE distinction is identical to and implicit in the src/dst distinction, and so it should be good enough to just have VK_ACCESS_SHADER_BIT etc., without READ_ or WRITE_ variants.
Why are there READ_ and WRITE_ variants, then? Is it ever useful to specify that some read operations must fully complete before some other operations have begun? Note that all operations using VkAccessFlagBits produce (I think) execution dependencies as well as memory dependencies. It seems to me that the execution dependencies should be good enough to prevent earlier reads from receiving values written by later writes.
While writing this question I encountered a statement in the Vulkan specification that provides at least part of an answer:
Memory dependencies are used to solve data hazards, e.g. to ensure that write operations are visible to subsequent read operations (read-after-write hazard), as well as write-after-write hazards. Write-after-read and read-after-read hazards only require execution dependencies to synchronize.
This is from the section 6.4. Execution And Memory Dependencies. Also, from earlier in that section:
The application must use memory dependencies to make writes visible before subsequent reads can rely on them, and before subsequent writes can overwrite them. Failure to do so causes the result of the reads to be undefined, and the order of writes to be undefined.
From this I surmise that, yes, the execution dependencies produced by the Vulkan commands that involve these access flags probably do free you from ever having to put a VK_ACCESS_*_READ_BIT into a srcAccessMask field--but that you might in fact want to have READ_ flags, WRITE_ flags, or both in some of your dstAccessMask fields, because apparently it's possible to use an explicit dependency to prevent read-after-write hazards in such a way that write-after-write hazards are NOT prevented. (And maybe vice-versa?)
Like, maybe your Vulkan will sometimes decide that a write does not actually need to be propagated all the way through a particular cache to its final specified destination for the sake of a subsequent read operation, IF Vulkan happens to know that that read operation will simply read from that same cache, saving some time? But then a second write might happen, and write to a different cache, and there'll be two caches left in a race (with the choice of winner undefined) to send their two values to the same spot. Or something? Maybe my mental model of these caches is entirely wrong.
It is fairly solidly established, at least, that memory barriers are confusing.
Let's go over all the possibilities:
read–read — well, yeah, that one is pretty useless. Khronos seems to agree (issue #131): it is a pointless value in src (basically equivalent to 0).
read–write — an execution dependency should be sufficient to synchronize without this. Khronos seems to agree (issue #131): it is a pointless value in src (basically equivalent to 0).
write–read — that's the obvious and most common one.
write–write — similar reasoning to write–read above: without it, the order of the writes would be undefined. It is a bit pointless in most situations to write something you haven't even read in between, but hey, now you have a way to synchronize it.
You can provide a bitmask combining several of these flags in both src and dst, in which case it makes sense to have both kinds of masks and let the driver sort the dependencies out for you. (I don't expect performance overhead from this at the API level, so it is allowed as a convenience.)
From an API design perspective, it could have meant adding a different enum for srcAccess. But perhaps the _READ variants could simply have been forbidden in srcAccess through "Valid Usage" rules, which makes this argument weak. The READ variants in src might have been kept because they are benign.
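To illustrate, a barrier guarding both the write–read and write–write hazards after a compute-shader write might look like this (a sketch; the stage choice and the cmd parameter are assumptions, not spec-mandated usage):

#include <vulkan/vulkan.h>

/* Sketch: make prior compute-shader writes available to subsequent
   fragment-shader reads AND writes (covers write-read and write-write). */
static void barrier_after_compute_write(VkCommandBuffer cmd)
{
    VkMemoryBarrier barrier = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT,      /* writes to make available */
        .dstAccessMask = VK_ACCESS_SHADER_READ_BIT
                       | VK_ACCESS_SHADER_WRITE_BIT       /* visible to reads and writes */
    };
    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,   /* src stage */
                         VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,  /* dst stage */
                         0,            /* no dependency flags */
                         1, &barrier,  /* one global memory barrier */
                         0, NULL,      /* no buffer barriers */
                         0, NULL);     /* no image barriers */
}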

Capturing stdout in Objective-C

I am using C inside Objective-C and I want to capture stdout and display it in a UIView instead of the console.
Here is the line I'm talking about:
print(stdout, v=toplevel_eval(v));
Other than that you are writing in C, I have no idea how much you know about C, "Unix" and Cocoa I/O - so some of this you may already know.
Here is one possible solution; it looks more complicated than it is.
Reading:
You need to understand the system calls pipe, dup2 and read.
You need to understand the GCD function dispatch_async and how to obtain a GCD queue.
pipe and dup2 are often used in conjunction with fork and exec to launch a process and write/read to/from that process standard input/output. What you will be doing uses some of the same basic ideas so looking up examples of this common pattern will help you understand how these calls work. Here are some notes from a University: Pipe, Fork, Exec and Related Topics.
Outline:
Using dispatch_async, schedule a block to handle the reading and writing of the data. The block will:
Use pipe to create a pipe and dup2 to connect stdout - file descriptor 1 - to it.
Enter a loop which uses read to obtain the available data from the pipe. The data read will be in a byte array.
Within the loop, convert the read bytes into an NSString.
Within the loop, append that string to your view - you must do this on the main thread as it involves the UI, and you can do that using another dispatch_async specifying the main queue.
That is it. Your block will now execute concurrently in the background reading whatever your C code writes to the standard output and adding it to your view.
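A minimal sketch of that outline in plain C with GCD (append_to_view is a hypothetical hook standing in for your UIView update; the stderr stub just keeps the sketch runnable):

#include <dispatch/dispatch.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical hook: replace the body with code that appends the text to
   your UIView (it is always called on the main queue). */
static void append_to_view(const char *text)
{
    fprintf(stderr, "%s", text);   /* stand-in; stderr is not redirected */
}

static void capture_stdout(void)
{
    int fds[2];
    if (pipe(fds) == -1) return;

    setvbuf(stdout, NULL, _IONBF, 0);   /* unbuffered, so output arrives promptly */
    dup2(fds[1], STDOUT_FILENO);        /* stdout now writes into the pipe */
    close(fds[1]);

    int read_fd = fds[0];
    dispatch_async(dispatch_get_global_queue(QOS_CLASS_DEFAULT, 0), ^{
        char buf[1024];
        ssize_t n;
        while ((n = read(read_fd, buf, sizeof buf - 1)) > 0) {
            buf[n] = '\0';
            char *chunk = strdup(buf);   /* copy; buf is reused next iteration */
            dispatch_async(dispatch_get_main_queue(), ^{
                append_to_view(chunk);   /* UI work happens on the main thread */
                free(chunk);
            });
        }
        close(read_fd);
    });
}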
If you get stuck you can ask a new question showing the code you have written and describing what doesn't work.
HTH

popen() implementation, fd leaks, and vfork()

The glibc documentation for popen() specifies that
The popen() function shall ensure that any streams from previous popen() calls that remain open in the parent process are closed in the new child process.
Why? If the purpose is to avoid fd leaks, why not just close all open fds?
The glibc implementation of popen() uses fork(). Although there are dup2() and close() calls between fork() and exec(), is it possible to replace fork() with vfork() to improve performance?
Is the Linux implementation of popen() based on fork() rather than vfork()? Why (or why not)?
I'm going to write a bidirectional version of popen(), which returns two FILE* streams: one for reading and one for writing. How do I implement it correctly? It should be thread-safe and must not leak file descriptors. It is better if it is fast.
vfork(2) is obsolete (it was removed in POSIX.1-2008), and fork(2) is quite efficient, since it uses copy-on-write techniques.
popen(3) cannot close all opened files, because it does not know about them and cannot know which ones are relevant. Imagine a program which gets a socket and passes its file descriptor as an argument to the popen-ed command (or simply popen("time cat /etc/issue >&9; date","r")...). See also fcntl(2) with FD_CLOEXEC, open(2) with O_CLOEXEC, and execve(2).
File descriptors are program-wide and process-wide scarce resources, and it is your responsibility to manage them correctly. You should know which fds should be closed in your child process before execve. If you know which program is execve-d and which fds it needs, you can close all the other fds (or most of them, perhaps with for (int i=STDERR_FILENO+1; i<64; i++) (void) close(i);) before execve.
If you are coding a reusable library, document its policy regarding file descriptors (and any other global process-wide resources), and probably use FD_CLOEXEC on any file descriptor it obtains itself (i.e. not received as an explicit argument or data), e.g. for internal use.
It looks like you are reinventing p2open (in which case you probably need to understand the implementation details of FILE in your C standard library, or else use fdopen(3) with care and caution); you might find some existing implementation of it. Beware: a process using it probably needs some event loop (e.g. built above poll(2)) to avoid a potential deadlock, with both parent and child processes blocked on reading.
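For the bidirectional question, a minimal p2open-style sketch under POSIX assumptions (error handling abbreviated; popen2 and the matching pclose2 it implies are names I made up; a real version must remember the pid and waitpid() for it):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>

pid_t popen2(const char *cmd, FILE **to_child_f, FILE **from_child_f)
{
    int to_child[2], from_child[2];
    if (pipe(to_child) == -1 || pipe(from_child) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                          /* child */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]);  close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                          /* exec failed */
    }
    /* parent keeps the write end of to_child, the read end of from_child */
    close(to_child[0]);
    close(from_child[1]);
    /* keep these fds out of children of later popen2() calls */
    fcntl(to_child[1], F_SETFD, FD_CLOEXEC);
    fcntl(from_child[0], F_SETFD, FD_CLOEXEC);
    *to_child_f   = fdopen(to_child[1], "w");  /* parent writes child's stdin */
    *from_child_f = fdopen(from_child[0], "r"); /* parent reads child's stdout */
    return pid;
}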
Did you consider using some existing event loop infrastructure (e.g. libevent, libev, glib from GTK, etc.)?
BTW, Linux has several free-software implementations of its C standard library. The GNU libc is quite common, but there are musl libc and several others. Study the source code of your libc.

Delphi: force unload injected module

I use this code to determine whether a specific module has been injected into my application's process (I use it to defend against some packet sniffer software):
var
  H: Cardinal;
begin
  H := GetModuleHandle('WSock32.dll');
  if H > 0 then
    FreeLibrary(H);
end;
The problem is that when I call FreeLibrary it does nothing! I don't want to show a message and then terminate the application; I just want to unload the injected module silently.
Thanks in advance.
Well, first of all I'll attempt to answer the question as asked. And then, I'll try to argue that you are asking the wrong question.
Modules are reference counted. It's possible that there are multiple references to this module. So, keep calling FreeLibrary:
procedure ForceRemove(const ModuleName: string);
var
  hMod: HMODULE;
begin
  hMod := GetModuleHandle(PChar(ModuleName));
  if hMod = 0 then
    Exit;
  { Decrement the module's reference count until it is unloaded }
  repeat
  until not FreeLibrary(hMod);
end;
If you were paranoid you might choose to add an alternative termination of the loop to avoid looping indefinitely.
I don't really know that this will work in your scenario. For instance, it's quite plausible that your process links statically to WSock32, in which case no amount of calling FreeLibrary will kick it out. And even if you could kick it out, the fact that your process statically linked to it probably means it's going to fail pretty hard.
Even if you can kick it out, it seems likely that other code in your process will hold references to functions in the module. And so you'll just fail somewhere else. I can think of very few scenarios where it makes sense to kick a module out of your process with complete disregard for the other users of that module.
Now, let's step back and look at what you are doing. You are trying to remove a standard system DLL from your process because you believe that it is only present because your process is having its packets sniffed. That seems unlikely to be true.
You state that your process is subject to a packet sniffing attack. That means the process is communicating over TCP/IP, which means it probably uses system modules to carry out that communication, one of which is WSock32. So you very likely link statically to WSock32. How is your process going to work if you kill one of the modules that supplies its functionality?
Are you quite sure that the presence of WSock32 in your process indicates that your process is under attack? If a packet sniffer was going to inject a DLL into your process, why would it inject the WSock32 system DLL? Did you check whether or not your process, or one of its dependencies, statically links to WSock32?
I rather suspect that you've just mis-diagnosed what is happening.
Some other points:
GetModuleHandle returns, and FreeLibrary accepts, an HMODULE. For 32 bit that is compatible with Cardinal, but not for 64 bit. Use HMODULE.
The not-found condition for GetModuleHandle is that the return value is 0. Nowhere in the documentation is it stated that a value greater than 0 indicates success. I realise that Cardinal and HMODULE are unsigned, and so <>0 is the same as >0, but it really makes no sense to test >0. It leaves the reader thinking, "what is so special about <0?"

Erlang tracing (collect data from processes only in my modules)

While tracing my modules using dbg, I ran into the problem of how to collect messages such as spawn, exit, register, unregister, link, unlink, getting_linked, and getting_unlinked, which are allowed by erlang:trace, but only for those processes that were spawned directly from my modules.
As an example, I don't need to know which processes the io module creates when I call io:format in some module function. Does anybody know how to solve this problem?
Short answer:
one way is to look at call messages followed by spawn messages.
Long answer:
I'm not an expert on dbg. The reason is that I've been using an (imho much better, safer and even handier) alternative: pan, from https://gist.github.com/gebi/jungerl/tree/master/lib/pan
The API is summarized in the html doc.
With pan:start you can trace specifying a callback module that receives all the trace messages. Then your callback module can process them, e.g. keep track of processes in ETS or a state data that is passed into every call.
The format of the trace messages is specified under pan:scan.
For examples of callback modules, you may look at src/cb_*.erl.
Now to your question:
With pan you can trace process handling and calls in your favourite module like this:
pan:start({ip, CallbackModule}, Node, all, [procs,call], {Module}).
where Module is the name of your module (in this case: sptest)
Then the callback module (in this case: cb_write) can look at the spawn messages that follow a call message within the same process, e.g.:
32 - {call,{<6761.194.0>,{'fun',{shell,<node>}}},{sptest,run,[[97,97,97]]},{1332,247999,200771}}
33 - {spawn,{<6761.194.0>,{'fun',{shell,<node>}}},{{<6761.197.0>,{io,fwrite,2}},{io,fwrite,[[77,101,115,115,97,103,101,58,32,126,115,126,110],[[97,97,97]]]}},{1332,247999,200805}}
As pan uses the same tracing back end as dbg, the trace messages (and the information) can be collected using the Erlang trace BIFs as well, but pan is much more secure.
