Synchronizing forked processes with pthread_mutex in C - pthreads

Is it possible to use a mutex from pthread.h to synchronize processes created with fork() from unistd.h? AFAIK, fork() and pthread_create() both end up using the clone() system call.
I am asking in the context of a shared memory segment (from ipc.h, shm.h) holding critical data that must be protected against concurrent writes from different processes. Semaphores can be defined in that memory and later used by different processes. Why couldn't mutexes be used instead of semaphores?
Why am I asking?
First, I was told that it won't work, without any explanation. I could not find an answer on the Internet, so I decided to ask here.
Second, a forked process is safer than a thread created with pthread_create: if a forked process crashes, the rest of the program keeps working, whereas if a thread crashes, the whole program exits.
Third, mutexes seem easier to manage than semaphores.

Related

Why is there only limited usage of thread pools in TensorFlow-Federated?

TFF's threading libraries start a new thread from ThreadRun by default, and the only usage (as of TFF 0.42.0) of the optional ThreadPool parameter is in the implementation of a single executor. Why is this the case?
After conferring with some people who were close to the implementation, the understanding we came to was:
The general problem with unrestricted use of thread pools in TFF is that, if they are used incorrectly, we may be courting deadlock. We need FIFO scheduling in the thread pool itself, and FIFO-compatible usage in the runtime (if you need the result of a computation, you need to know it will be started before you start).
When implementing the first usages of thread pools in the TF executor, we convinced ourselves that the following statement is true: at the leaf executors (that is, as long as an executor doesn't have any children), this FIFO-compatible programming is guaranteed by the stateful executor interface. That is, if you need a value, you know it has already been created (otherwise the executor wouldn't be able to resolve it), so as long as the thread pool is FIFO, it will be ready before you execute. Either the creating function already pushed a function onto this FIFO queue, or it created the value directly, so you can push yourself onto the FIFO queue without trouble.
Because this is difficult, we haven't tried very hard to reason about how, or whether, we could make similar statements about executors that have children (these children may be pushing work onto the queue; AFAIK we don't currently make any guarantees about how we do this, but I could imagine reasoning about a similar invariant step by step 'up the stack'). Thus, so far we have only considered it safe to inject thread pool usage at leaf executors. The fact that we don't have this in the XLAExecutor yet is simply due to lack of use.

Why does POSIX standardize semaphores as system calls but leave mutexes and condition variables to Pthreads (user level)?

I came up with a strange question that has been haunting me. Why does POSIX standardize semaphores as syscalls but leave condition variables and mutexes to the pthread library?
What's the division of responsibility here? Why aren't semaphores standardized in the Pthreads package? Why is the synchronization syscall that POSIX standardizes the semaphore, and not the mutex or the condition variable?
I don't know. My guess is that performance is the reason for not implementing mutexes as syscalls. (Atomic hardware instructions are unprivileged, so implementing them at user level is possible. Even though Linux provides futex, it is really an optimization that turns a spin lock into a two-phase lock that eventually sleeps.) And is the reason for semaphores that a semaphore can be manipulated by different processes, whereas a mutex can only be unlocked by the process that holds it? A semaphore's V operation unblocks a process waiting on it. So the semaphore is kept by the kernel, and its id is like a file descriptor, a capability handed out by the kernel, which makes it a syscall rather than a purely user-level package.
But what about condition variables? Is there any reason to specify them in Pthreads rather than at the syscall level? Is it because a condition variable is stateless and originates from the monitor, a purely stateless programming construct, so it can be implemented using a mutex?
Thanks!
Short answer: semaphores and pthreads have separate histories.
Yes: semaphores can be used between processes, whereas pthreads objects are (generally) used within the current process, or between processes that share memory.
From a performance perspective: a quick poke (on my x86_64) tells me that sem_wait() and sem_post() use straightforward lock cmpxchg instructions, making a syscall only to suspend or wake up a thread. That is essentially the same as a pthread_mutex_t, when the semaphore is used as a mutex.
Obviously, a semaphore can do things that a mutex and a condition variable cannot, and you can use unnamed semaphores within a process: sem_init() with pshared=0.
I guess the pthread developers decided that specifying a pthread_sema_t would be unnecessary duplication. Sadly, it does leave room for doubt that the (more general) semaphore might have performance issues even when used only within a process :-( Or, indeed, some doubt that semaphore and pthread stuff always play nicely together :-(

Is Javonet a thread-safe library, and more importantly, does it allow all threads to be used?

Is Javonet thread-safe? I couldn't find any documentation one way or the other. Even if it is thread-safe, is there some sort of "mutex" that prevents full usage of all threads?
When I tried to run Javonet in parallel, it did work, but the CPU usage did not increase significantly above the sequential load (i.e., on a 10-CPU system, CPU usage hovered around 20% under parallel load, which was only double the 10% sequential load). However, if I ran 10 instances of the exact same sequential code (using Javonet), I reached 100% CPU usage, so it "feels" like Javonet must have some built-in mutexes that prevent full parallel usage.
Javonet is thread-safe. You just need to follow standard practices for writing multi-threaded applications, and Javonet will take care of executing your code properly.
Javonet creates a corresponding .NET thread for each calling Java thread. The same happens in the other direction: for callbacks, events and delegates invoked from another thread, Javonet creates the corresponding thread on the Java side. Once the calling thread completes, Javonet closes the thread on the other side.
If the corresponding thread already exists, Javonet reattaches to it.
Javonet does use internal mutexes and read-write locks when accessing object instances, some caching collections and types, which, depending on your Java code, might limit how well it parallelizes.

How to convert synchronous blocking shared memory model code to asynchronous coroutines running on thread pool?

While there are lots of solutions that partially match my question, I'd like to know whether a complete match exists. It's hard to find one because the partial solutions crowd the search results. This should be a runtime framework plus (optionally) a transformation of the source code, required when the language doesn't support coroutines.
There are libraries like lthread with an lthread_cond_wait() API, but every lthread is bound to a single pthread. I'd like lightweight threads to be able to run on several pthreads, picked arbitrarily by a thread pool. Neither single-threaded schedulers nor global-lock schedulers fit. I think we can do better.
lthread is also not an option because it neither involves source code transformation nor avoids it the way protothreads do.
Several green-threading runtimes (Erlang, Limbo) don't fit because they are limited to the CSP (communicating sequential processes) model, but I'd also like shared-memory synchronization primitives: mutexes, condition variables, rwlocks.
Transformation involves:
Transforming stack contexts into objects in heap
Transforming mutex calls into disabling and activating jobs on the thread pool, plus publish-subscribe
Condition variables should also be transformed into publish-subscribe relationships
It would be nice to have Ada-style rendezvous
I failed to build a straightforward runtime implementation because of potential deadlocks in the publish-subscribe mechanism when using neither a global lock nor a single scheduler thread, but I still think it is possible.
Disclaimer: lthread author.
You can launch several pthreads and run an lthread scheduler in each one (this is done automagically by calling lthread_run() in the pthread function). This way each pthread will run a bunch of lthreads.

stack management in CLR

I understand the basic concept of the stack and the heap, but I'd be grateful if anyone could clear up the following confusions:
Is there a single stack for the entire application process, or is a new stack created for each thread started in a project?
Is there a single heap for the entire application process, or is a new heap created for each thread started in a project?
If a stack is created for each thread, how does the process manage the sequential flow of threads (and hence stacks)?
There is a separate stack for every thread. This is true not only for CLR, and not only for Windows, but pretty much for every OS or platform out there.
There is a single heap for every application domain. A single process may run several app domains at once. A single app domain may run several threads.
To be more precise, there are usually two heaps per domain: one regular and one for really large objects (like, say, a 64K array).
I don't understand what you mean by "sequential flow of threads".
One stack for each thread, all threads share the same heaps.
There is no 'sequential flow' of threads. A thread is an operating system object that stores a copy of the processor state. The processor state includes the register values. One of them is ESP, the stack pointer. Another really important one is EIP, the instruction pointer. When the operating system switches between threads, it stores the processor state in the current thread object and reloads the state from the thread object for the thread that was selected to run next. The processor now simply continues executing where it left off previously.
Getting a thread started is perhaps now easy to understand as well. The operating system allocates a megabyte of memory for the stack, initializes the ESP register to point to that memory, and sets the EIP register to the address of the method where the thread should start executing: the value of the ThreadStart delegate in C#.
Each thread must have its own stack; that's where local variables and parameters are held, along with the return addresses of the calling functions.
