pthread mutex without mmap possible? - memory

On linux with copy-on-write forking. When creating pthread interprocess mutex in a parent, will it be effective in the child or it will work so only if I mmap it into shared memory?

When calling fork() the whole memory space is duplicated, including mutexes, so to answer your question, the new mutex will be a copy of the parent's mutex, so you have to mmap it.
Note:
You will need to set the PTHREAD_PROCESS_SHARED flag on the mutex attribute using pthread_mutexattr_getpshared().

Related

Why POSIX standardize semaphore as system call but leave mutex and condition variable to Pthread (user level)

I came up with this strange question which haunted me. Why POSIX standardize support for semaphore as syscall but leave condition variable and mutex to pthread library?
What's the division of responsibility here? Why semaphore is not standardized in Pthread package? Why the syscall for synchronization that POSIX standardize is semaphore but not mutex, condition variable?
Don't know. Guess performance is the concern for not implementing mutex as syscall. (Atomic hardware instructions are unprivileged so implementing them at user level is possible. Even though Linux provide futex, it is actually trying to optimize spin lock into two phase lock, towards sleep lock). And the reason for semaphore is that semaphore can be manipulated by different process, compared to the fact that mutex can only be unlocked by the process that hold it? Semaphore's V operation allows process waiting for it unblocked. So semaphore is kept by kernel, and semaphore's id is like the file descriptor, a capability given out by kernel, which makes it a syscall but not purely user level package.
But what about condition variable? Any reason to specify it in Pthread but not syscall level? Because it is stateless and originates from monitor, which is purely stateless programming construct, so it can be implemented using mutex?
Thanks!
Short answer: semaphores and pthreads have separate histories.
Yes: semaphores can be used between processes where pthreads stuff is (generally) all within the current process, or between processes which share memory.
From a performance perspective: a quick poke (on my x86_64) tells me that sem_wait() and sem_post() use straightforward lock cmpxchg instructions, doing syscall only to suspend/wake-up a thread. That is essentially the same as a pthread_mutex_t -- when the semaphore is used as a mutex.
Obviously a semaphore can do things that a mutex and a condition variable do not do, and you can use unnamed semaphores within a process -- sem_init() with pshared=0.
I guess the pthread developers decided that specifying a pthread_sema_t would be unnecessary duplication. Sadly, it does leave room for doubt that the (more general) semaphore might have performance issues even when used only within a process :-( Or, indeed, some doubt that semaphore and pthread stuff always play nicely together :-(

Synchronizing forked processes with pthread_mutex in C

Is it possible to use mutex from pthread.h to synchronize processes created with fork() from unistd.h? Afaik, both in the end are using system call clone().
I am asking it in the scope of shared memory segment (from ipc.h, shm.h) with critical data, which should be protected against concurrent writes from different processes. In that memory then semaphores can be defined and later used in different processes. Why couldn't mutexes be used instead of semaphores?
Why am I asking?
First of all I was told that it won't work, without receiving any explanation for that. On the Internet I was not able to find any answer so I decided to ask here.
Second, forked process is safer than thread created with pthread_create - if forked process crashes, the rest of the program continues to work and if thread crashes then whole program exits.
Third, mutexes seem to be more human-friendly than semaphores in managing.

using mutexes in external modules

If I have a module that has a mutex and I write the value of an int variable using lock/unlock on the mutex, how is the same mutex locked/unlocked in another module that is being run in a thread?
The external module also needs to write the value of the variable and this is being done in a threaded loop function. How does one lock this using the same mutex or should another mutex be used?
What happens if 2 different mutexes lock the same memory segments(not necessarily simultaneously)?
In order to share a mutex using pthreads, you are going to have to somehow share the address of the mutex through a pointer or make the mutex a globally accessible variable.
Mutexes or semaphores themselves do not lock a given memory or critical code section ... instead they lock access to a specific "flag" or location in memory (i.e., like an unsigned long or some other POD type) that is then used to determine access to a critical section or other globally accessible memory location. In other words once a thread "owns" a given mutex, it gets to have access to a segment of code or memory section that any other thread trying to obtain ownership of that same mutex is blocked from for the duration of the owning thread's lock.
If you use two different mutexes to block access to a given location, that does not provide mutually exclusive access to the actual memory segment ... each thread that does not share a mutex will have equal access to the memory segment even though they may each have an ownership lock on their respective mutexes. Therefore the memory segment is not really protected ... two threads could access the memory segment at the same time, creating concurrency issues.
So again, if you have different modules or threads, and you want to have exclusive access to a given section of code or memory, you're going to have to share the same mutex between those elements. There are many ways this could be done, either though something like named semaphores if you need to-do this for multiple separate processes, or through a shared memory segment (i.e., shmget(), shmat(), shmdt, etc.), etc. if you can't somehow share a globally accessible mutex or pointer because of some separation of address space between your modules.

Why does it take longer to call a fork() than it does to call pthread_create()?

I was wondering this, is it because they only need a stack and storage for registers so they are cheap to create ?
Thanks a lot :)
fork() has to clone the entire process and all its associated kernel data structures, including file handles, memory, and so forth. Though this might be done lazily by setting appropriate copy-on-write flags, it is a lot more work than creating a new thread, which just shares the same file handles and memory.

Multithreaded Heap Management

In C/C++ I can allocate memory in one thread and delete it in another thread. Yet whenever one requests memory from the heap, the heap allocator needs to walk the heap to find a suitably sized free area. How can two threads access the same heap efficiently without corrupting the heap? (Is this done by locking the heap?)
In general, you do not need to worry about the thread-safety of your memory allocator. All standard memory allocators -- that is, those shipped with MacOS, Windows, Linux, etc. -- are thread-safe. Locks are a standard way of providing thread-safety, though it is possible to write a memory allocator that only uses atomic operations rather than locks.
Now it is an entirely different question whether those memory allocators scale; that is, is their performance independent of the number of threads performing memory operations? In most cases, the answer is no; they either slow down or can consume a lot more memory. The first scalable allocator in both dimensions (speed and space) is Hoard (which I wrote); the Mac OS X allocator is inspired by it -- and cites it in the documentation -- but Hoard is faster. There are others, including Google's tcmalloc.
Yes an "ordinary" heap implementation supporting multithreaded code will necessarily include some sort of locking to ensure correct operation. Under fairly extreme conditions (a lot of heap activity) this can become a bottleneck; more specialized heaps (generally providing some sort of thread-local heap) are available which can help in this situation. I've used Intel TBB's "scalable allocator" to good effect. tcmalloc and jemalloc are other examples of mallocs implemented with multithreaded scaling in mind.
Some timing comparisons comparisons between single threaded and multithread-aware mallocs here.
This is an Operating Systems question, so the answer is going to depend on the OS.
On Windows, each process gets its own heap. That means multiple threads in the same process are (by default) sharing a heap. Thus the OS has to thread-synchronize its allocation and deallocation calls to prevent heap corruption. If you don't like the idea of the possible contention that may ensue, you can get around it by using the Heap* routines. You can even overload malloc (in C) and new (in C++) to call them.
I found this link.
Basically, the heap can be divided into arenas. When requesting memory, each arena is checked in turn to see whether it is locked. This means that different threads can access different parts of the heap at the same time safely. Frees are a bit more complicated because each free must be freed from the arena that it was allocated from. I imagine a good implementation will get different threads to default to different arenas to try to minimize contention.
Yes, normally access to the heap has to be locked. Any time you have a shared resource, that resource needs to be protected; memory is a resource.
This will depend heavily on your platform/OS, but I believe this is generally OK on major sytems. C/C++ do not define threads, so by default I believe the answer is "heap is not protected", that you must have some sort of multithreaded protection for your heap access.
However, at least with linux and gcc, I believe that enabling -pthread will give you this protection automatically...
Additionally, here is another related question:
C++ new operator thread safety in linux and gcc 4

Resources