Does the semaphore have its own stack? - stack

When I create a semaphore from a process, does the semaphore have its own stack, or does it share the stack of the process?
I know that threads have their own stacks, but I wonder about semaphores.

Related

GCD vs @synchronized vs NSLock

Can someone give a rundown of the benefits and drawbacks of these 3 systems in how they relate to thread safety?
From watching more recent WWDC videos, I get the feeling that Apple is pushing the use of GCD to create performant reader-writer code that is thread safe.
What's the idea behind this? Is it the cost of entering the kernel to take a lock that leads to this GCD push and the shying away from @synchronized and NSLock?
Are @synchronized and NSLock being pushed out of what would be considered best practice, or is there still a place for them?
There are many many details that could be discussed at great length in regards to this. But, at the core:
These always require a lock to be taken somewhere or somehow:
@synchronized(...) { ... }
[lock lock];
Locks are very expensive for the reasons you mention; they necessarily consume kernel resources. (The @synchronized() case may actually avoid kernel locks these days, but it is a hash-based exclusion mechanism and that, in itself, is expensive.)
And these do not always require a lock (but sometimes maybe do):
dispatch_sync(...concurrent q...., ^{ ... });
dispatch_async(...queue of any kind...., ^{ ... });
There is a fast path through the dispatch functions that is effectively lockless (though it will use test-and-set atomic primitives that can cause performance issues under load).
The end result is that a synchronous dispatch to a concurrent queue can effectively be treated as "execute this on this thread right now". A synchronous dispatch to a serial queue can do the atomic test-and-set to test if the queue is processing, mark it as busy, and, if it wasn't busy, execute the block on the calling thread immediately.
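The test-and-set fast path described above can be sketched in portable C11. This is only a toy illustration of the idea, not how libdispatch is actually implemented; `toy_serial_queue`, `toy_sync_fast_path`, and `increment` are hypothetical names invented for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy queue: a single flag records whether work is running. */
typedef struct {
    atomic_flag busy;
} toy_serial_queue;

#define TOY_SERIAL_QUEUE_INIT { ATOMIC_FLAG_INIT }

/*
 * Lock-free fast path: if the queue is idle, mark it busy with one atomic
 * test-and-set and run the work on the calling thread. Returns false when
 * the queue was already busy; a real implementation would then fall back
 * to a slower path that enqueues the work and waits.
 */
static bool toy_sync_fast_path(toy_serial_queue *q,
                               void (*work)(void *), void *ctx)
{
    if (atomic_flag_test_and_set_explicit(&q->busy, memory_order_acquire))
        return false;              /* someone else is running on this queue */
    work(ctx);                     /* runs right here, no thread hand-off */
    atomic_flag_clear_explicit(&q->busy, memory_order_release);
    return true;
}

/* Example work item: increments the int that ctx points to. */
static void increment(void *ctx)
{
    ++*(int *)ctx;
}
```

Calling `toy_sync_fast_path(&q, increment, &counter)` on an idle queue runs `increment` immediately on the caller's thread; no kernel lock is involved, only one atomic read-modify-write on entry and one atomic store on exit.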
Asynchronous dispatches can be similarly fast, though asynchronous dispatch requires copying the block (which can be very cheap, but is something to think about).
In general, GCD can do anything a lock can do, can do it at least as efficiently (if not more so), and you can use the GCD APIs to go way beyond simple locking (using a semaphore as a computation throttle, for example).
BTW: If your tasks are relatively coarse grained, have a look at NSOperationQueue and NSOperation.

Dispatch semaphores and memory visibility

If I signal a dispatch semaphore in one thread and wait for it in another thread, is the waiting thread guaranteed to see all changes made by the signalling thread up to a point? If so, is it synchronized at the signalling point or the waiting point?

Thread pools and context switching (tasks)?

This is quite a general computer science question and not specific to any OS or framework.
So I am a little confused by the overhead associated with switching tasks on a thread pool. In many cases it doesn't make sense to give every job its own specific thread (we don't want to create too many hardware threads), so instead we put these jobs into tasks which can be scheduled to run on a thread. We set up a pool of threads and then dynamically allocate the tasks to run on a thread taken from the thread pool.
I am just a little confused (I can't find an in-depth answer) about the overhead associated with switching tasks on a specific thread (in the thread pool). A DrDobbs article (sourced below) states that there is overhead, but I need a more in-depth answer about what is actually happening (a citable source would be fantastic :)).
By definition, SomeWork must be queued up in the pool and then run on
a different thread than the original thread. This means we necessarily
incur queuing overhead plus a context switch just to move the work to
the pool. If we need to communicate an answer back to the original
thread, such as through a message or Future or similar, we will incur
another context switch for that.
Source: http://www.drdobbs.com/parallel/use-thread-pools-correctly-keep-tasks-sh/216500409?pgno=1
What components of the thread are actually switching? The thread itself isn't actually switching, just the data that is specific to the thread. What is the overhead associated with this (more, less or the same)?
Let's first clarify five key concepts and then discuss how they relate in a thread pool context:
thread:
Briefly, a thread can be described as a program execution context, given by the code being run, the data in the CPU registers, and the stack. When a thread is created, it is assigned the code that should be executed in that thread's context. At each CPU cycle the thread has an instruction to execute, and the data in the CPU registers and the stack are in a given state.
task:
Represents a unit of work. It's the code that is assigned to a thread to be executed.
context switch (from wikipedia):
Is the process of storing and restoring the state (context) of a thread so that execution can be resumed from the same point at a later time. This enables multiple processes to share a single CPU and is an essential feature of a multitasking operating system. As explained above, the context consists of the code being executed, the CPU registers, and the stack.
What is context switched is the thread. A task represents only a piece of work that can be assigned to a thread to be executed. At a given moment a thread can be executing a task.
Thread Pool (from wikipedia):
In computer programming, the thread pool is where a number of threads are created to perform a number of tasks, which are usually organized in a queue.
Thread Pool Queue:
Where tasks are placed to be executed by threads in the pool. This data structure is a shared piece of memory that threads may compete to enqueue to and dequeue from, which may lead to contention in high-load scenarios.
Illustrating a thread pool usage scenario:
In your program (possibly running in the main thread), you create a task and schedule it to be executed in the thread pool.
The task is queued in the thread pool queue.
When a thread from the pool runs, it dequeues a task from the queue and starts to execute it.
If there are no free CPUs to run the thread from the pool, the operating system will at some point (depending on the thread scheduler policy and thread priorities) stop a running thread and context switch to another thread.
The operating system can stop the execution of a thread at any time, context switch to another thread, and return later to continue where it stopped.
The overhead of context switching increases as the number of active threads competing for CPUs grows. Thus, ideally, a thread pool tries to use the minimum number of threads needed to occupy all available CPUs in a machine.
If your tasks contain no code that blocks, context switching is minimized because no more threads are used than there are CPUs available on the machine.
Of course, if you have only one core, your main thread and the thread pool will compete for the same CPU.
The article probably talks about the case in which work is posted to the pool and the result of it is being waited for. Running a task on the thread-pool in general does not incur any context switching overhead.
Imagine queueing 1000 work items. A thread-pool thread will execute them one after the other, all without a single context switch in between.
Switching happens due to waiting/blocking.
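The scenario described in this thread (tasks placed in a shared queue, pool threads dequeuing and running them back to back) can be sketched with POSIX threads. This is a minimal illustration under stated assumptions, not a production pool: `pool_t`, `pool_start`, `pool_submit`, `pool_stop`, `add_one`, and `run_demo` are hypothetical names, and the fixed queue capacity and worker count are arbitrary.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define QUEUE_CAP   1024   /* arbitrary fixed capacity for the sketch */
#define MAX_WORKERS 8

typedef struct { void (*fn)(void *); void *arg; } task_t;

typedef struct {
    pthread_mutex_t lock;        /* protects everything below */
    pthread_cond_t  not_empty;   /* signalled when a task is queued */
    task_t queue[QUEUE_CAP];     /* ring buffer of pending tasks */
    size_t head, count;
    int shutting_down;
    pthread_t workers[MAX_WORKERS];
    size_t nworkers;
} pool_t;

/* Worker loop: block only when the queue is empty, otherwise run tasks
 * one after another on the same thread. */
static void *worker_main(void *p)
{
    pool_t *pool = p;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (pool->count == 0 && !pool->shutting_down)
            pthread_cond_wait(&pool->not_empty, &pool->lock);
        if (pool->count == 0) {            /* shutting down and drained */
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        task_t t = pool->queue[pool->head];
        pool->head = (pool->head + 1) % QUEUE_CAP;
        pool->count--;
        pthread_mutex_unlock(&pool->lock);
        t.fn(t.arg);                       /* run the task outside the lock */
    }
}

static int pool_start(pool_t *pool, size_t nworkers)
{
    if (nworkers > MAX_WORKERS) nworkers = MAX_WORKERS;
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->not_empty, NULL);
    pool->head = pool->count = 0;
    pool->shutting_down = 0;
    pool->nworkers = 0;
    for (size_t i = 0; i < nworkers; i++) {
        if (pthread_create(&pool->workers[i], NULL, worker_main, pool) != 0)
            break;
        pool->nworkers++;
    }
    return pool->nworkers > 0 ? 0 : -1;
}

/* Returns 0 on success, -1 if the queue is full or the pool is stopping. */
static int pool_submit(pool_t *pool, void (*fn)(void *), void *arg)
{
    int rc = -1;
    pthread_mutex_lock(&pool->lock);
    if (pool->count < QUEUE_CAP && !pool->shutting_down) {
        size_t tail = (pool->head + pool->count) % QUEUE_CAP;
        pool->queue[tail] = (task_t){ fn, arg };
        pool->count++;
        pthread_cond_signal(&pool->not_empty);
        rc = 0;
    }
    pthread_mutex_unlock(&pool->lock);
    return rc;
}

/* Let the workers drain the queue, then join them. */
static void pool_stop(pool_t *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->shutting_down = 1;
    pthread_cond_broadcast(&pool->not_empty);
    pthread_mutex_unlock(&pool->lock);
    for (size_t i = 0; i < pool->nworkers; i++)
        pthread_join(pool->workers[i], NULL);
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->not_empty);
}

static void add_one(void *arg) { atomic_fetch_add((atomic_int *)arg, 1); }

/* Queue 1000 small tasks on two workers and return how many ran. */
static int run_demo(void)
{
    static pool_t pool;
    atomic_int done = 0;
    if (pool_start(&pool, 2) != 0) return -1;
    for (int i = 0; i < 1000; i++)
        pool_submit(&pool, add_one, &done);
    pool_stop(&pool);
    return atomic_load(&done);
}
```

In this sketch a worker blocks (and so may be switched out) only in `pthread_cond_wait` when the queue is empty or when it contends for the queue mutex; moving from one task to the next is an ordinary function call on the same thread.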

Why is `pthread_mutex_lock` needed when `pthread_mutex_trylock` is there?

pthread_mutex_trylock detects deadlocks, doesn't block, then why would you even "need" pthread_mutex_lock?
Perhaps when you deliberately want the thread to block? But in that case it may result in a deadlock?
pthread_mutex_trylock does not detect deadlocks.
You can use it to avoid deadlocks but you have to do that by wrapping your own code around it, effectively multiple calls to pthread_mutex_trylock in a loop with a time-out, after which your thread releases all its resources.
In any case, you can avoid deadlocks even with pthread_mutex_lock if you just follow the simple rule that all threads allocate resources in the same order.
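One common way to apply the same-order rule is to pick a global order for the mutexes and always lock in that order. The sketch below orders by address, which is an assumption chosen for illustration (any fixed ranking works); `account`, `lock_pair`, `unlock_pair`, `transfer`, and `run_transfers` are hypothetical names.

```c
#include <pthread.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;
    long balance;
} account;

/* Lock two mutexes in one global order (lowest address first), so two
 * threads can never each hold one while waiting for the other. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a; a = b; b = tmp;
    }
    pthread_mutex_lock(a);
    if (a != b)
        pthread_mutex_lock(b);
}

static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    if (a != b)
        pthread_mutex_unlock(b);
}

/* The caller names the accounts in any order; lock_pair() fixes the order. */
static void transfer(account *from, account *to, long amount)
{
    lock_pair(&from->lock, &to->lock);
    from->balance -= amount;
    to->balance   += amount;
    unlock_pair(&from->lock, &to->lock);
}

static account acct_a = { PTHREAD_MUTEX_INITIALIZER, 100 };
static account acct_b = { PTHREAD_MUTEX_INITIALIZER, 100 };

static void *a_to_b(void *unused)
{
    (void)unused;
    for (int i = 0; i < 10000; i++) transfer(&acct_a, &acct_b, 1);
    return NULL;
}

static void *b_to_a(void *unused)
{
    (void)unused;
    for (int i = 0; i < 10000; i++) transfer(&acct_b, &acct_a, 1);
    return NULL;
}

/* Two threads transfer in opposite directions; returns the combined balance. */
static long run_transfers(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, a_to_b, NULL);
    pthread_create(&t2, NULL, b_to_a, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return acct_a.balance + acct_b.balance;
}
```

If `transfer` instead locked `from` first and `to` second, the two threads could each take one mutex and wait forever for the other; with a fixed order both threads always contend for the same mutex first.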
You use pthread_mutex_lock if you just want to efficiently wait until the resource is available, without having to spin on the mutex, something which is often very inefficient. Properly designed multi-threaded applications have no need for the pthread_mutex_trylock variant.
Locks should only be held for the absolute minimum time to do the work and, if that's too long, you can generally redesign things so the lock time is less (such as by using the mutex to only copy data to a thread's local data areas, and having the long-running bit work on that after the mutex is released).
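The copy-then-work idea can be sketched as follows. The shared array, its size, and `slow_sum_of_squares` (a stand-in for the long-running part) are all hypothetical and chosen only to keep the example small.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define N_SAMPLES 8

static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_samples[N_SAMPLES] = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* Stand-in for the long-running work; it touches only its own input. */
static long slow_sum_of_squares(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += (long)v[i] * v[i];
    return total;
}

/* Hold the mutex just long enough to snapshot the shared data, then do
 * the expensive part on the private copy with the mutex released. */
static long process_snapshot(void)
{
    int local[N_SAMPLES];

    pthread_mutex_lock(&shared_lock);
    memcpy(local, shared_samples, sizeof local);   /* short critical section */
    pthread_mutex_unlock(&shared_lock);

    return slow_sum_of_squares(local, N_SAMPLES);  /* lock is not held here */
}
```

Other threads that need `shared_lock` wait only for the duration of the `memcpy`, not for the whole computation. The trade-off is that the result reflects the data as it was when the snapshot was taken.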
The pseudo-code:
while not pthread_mutex_trylock:
    yield
will keep your thread running while it waits for the lock to become available, especially since there is no pthread_yield() in POSIX threads (though it's sometimes provided as a non-portable extension; the portable POSIX call is sched_yield()).
That means, at worst (when the loop cannot or does not yield the CPU), the code segment above will chew up the rest of its quantum every time through the scheduler cycle.
And at best, it will still activate the thread once per scheduler cycle just to see if the mutex can be obtained.
Whereas:
pthread_mutex_lock
will most likely totally pause your thread until the lock is made available, since it will move it to a waiting queue until the current lock holder releases the mutex.
That's probably the major reason why you should prefer pthread_mutex_lock to pthread_mutex_trylock.
Perhaps when you deliberately want the thread to block?
Yup, exactly in this case. But you can mimic pthread_mutex_lock() behavior with something like this:
while (pthread_mutex_trylock(&mtx) != 0)
    sched_yield();  /* portable POSIX call; pthread_yield() is a non-portable extension */

Does pthread_exit kill a thread.. I mean free the stack allocated to it?

I want to create a lot of threads for writing into a thread, and after writing I call exit... But when I call exit, do I free up the stack or do I still consume it?
In order to avoid resource leaks, you have to do one of these 2:
Make sure some other thread calls pthread_join() on the thread
Create the thread as 'detached', which can be done either by passing the proper pthread attribute to pthread_create() or by calling the pthread_detach() function.
Failure to do so will often result in the entire stack "leaking" in many implementations.
The system allocates underlying storage for each thread (thread ID, thread return value, stack), and this remains in the process space (and is not recycled) until the thread has terminated and has been joined by another thread.
If you have a thread whose termination you don't care about, a detached thread is a good choice.
For detached threads, the system recycles its underlying resources automatically after the thread terminates.
source article: http://www.ibm.com/developerworks/library/l-memory-leaks/
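Both options can be sketched with standard pthread calls. The worker functions and the completion flag below are hypothetical and exist only so the example can observe that the detached thread ran; real code would have its own way of tracking work.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: bumps the int it is given and returns the pointer. */
static void *bump(void *arg)
{
    ++*(int *)arg;
    return arg;
}

/* Option 1: a joinable thread. pthread_join() collects the return value
 * and lets the system reclaim the thread's stack and bookkeeping. */
static int run_joined(int *value)
{
    pthread_t tid;
    void *ret = NULL;

    if (pthread_create(&tid, NULL, bump, value) != 0)
        return -1;
    if (pthread_join(tid, &ret) != 0)
        return -1;
    return ret == value ? 0 : -1;
}

/* Completion flag so the caller can tell when the detached thread is done
 * (a detached thread cannot be joined). */
static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  done_cond = PTHREAD_COND_INITIALIZER;
static int done_flag = 0;

static void *announce_done(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&done_lock);
    done_flag = 1;
    pthread_cond_signal(&done_cond);
    pthread_mutex_unlock(&done_lock);
    return NULL;   /* resources are released automatically on termination */
}

/* Option 2: create the thread detached through a thread attribute. */
static int run_detached(void)
{
    pthread_attr_t attr;
    pthread_t tid;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    int rc = pthread_create(&tid, &attr, announce_done, NULL);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;

    pthread_mutex_lock(&done_lock);
    while (!done_flag)
        pthread_cond_wait(&done_cond, &done_lock);
    pthread_mutex_unlock(&done_lock);
    return 0;
}
```

A thread that was created joinable can also be switched afterwards with `pthread_detach(tid)`, or can detach itself with `pthread_detach(pthread_self())`. Either way, each thread should be joined or detached exactly once.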

Resources