NumberOfConcurrentThreads parameter in CreateIoCompletionPort - network-programming

I am still confused about the NumberOfConcurrentThreads parameter within CreateIoCompletionPort(). I have read and re-read the MSDN docs, but the quote
This value limits the number of runnable threads associated with the
completion port.
still puzzles me.
Question
Let's assume that I specify this value as 4. In this case, does this mean that:
1) a thread can call GetQueuedCompletionStatus() (at which point I can allow a further 3 threads to make this call), and as soon as that call returns (i.e. we have a completion packet) 4 threads may once again call this function,
or
2) a thread can call GetQueuedCompletionStatus() (at which point I can allow a further 3 threads to make this call); as soon as that call returns (i.e. we have a completion packet) I go on to process the packet, and only when I have finished processing it do I call GetQueuedCompletionStatus() again, at which point 4 threads may once again call this function.
See my confusion? It's the use of the phrase 'runnable threads'.
I think it might be the latter, because the link above also quotes
If your transaction required a lengthy computation, a larger
concurrency value will allow more threads to run. Each completion
packet may take longer to finish, but more completion packets will be
processed at the same time.
This will ultimately affect how we design servers. Consider a server that receives data from clients, then echoes that data to logging servers. Here is what our thread routine could look like:
DWORD WINAPI ServerWorkerThread(HANDLE hCompletionPort)
{
    DWORD BytesTransferred;
    CPerHandleData* PerHandleData = nullptr;
    CPerOperationData* PerIoData = nullptr;
    while (TRUE)
    {
        if (GetQueuedCompletionStatus(hCompletionPort, &BytesTransferred,
            (PULONG_PTR)&PerHandleData, (LPOVERLAPPED*)&PerIoData, INFINITE))
        {
            // OK, we have 'BytesTransferred' bytes of data in 'PerIoData'; process it:
            // send the data on to our logging servers, then loop back around
            send(...);
        }
    }
    return 0;
}
Now assume I have a four-core machine; if I leave NumberOfConcurrentThreads as zero in my call to CreateIoCompletionPort(), I will have four threads running ServerWorkerThread(). Fine.
My concern is that the send() call may take a long time due to network traffic; hence I could be receiving a load of data from clients that cannot be dequeued because all four threads are taking a long time sending the data on?!
Have I missed the point here?
Update 07.03.2018 (This has now been resolved: see this comment.)
I have 8 threads running on my machine, each one running ServerWorkerThread():
DWORD WINAPI ServerWorkerThread(HANDLE hCompletionPort)
{
    DWORD BytesTransferred;
    CPerHandleData* PerHandleData = nullptr;
    CPerOperationData* PerIoData = nullptr;
    while (TRUE)
    {
        if (GetQueuedCompletionStatus(hCompletionPort, &BytesTransferred,
            (PULONG_PTR)&PerHandleData, (LPOVERLAPPED*)&PerIoData, INFINITE))
        {
            switch (PerIoData->Operation)
            {
                case CPerOperationData::ACCEPT_COMPLETED:
                {
                    // This case is fired when a new connection is made
                    while (1) {}
                }
            }
        }
    }
    return 0;
}
I only have one outstanding AcceptEx() call; when that gets filled by a new connection I post another one. I don't wait for data to be received in AcceptEx().
I create my completion port as follows:
CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 4)
Now, because I only allow 4 threads in the completion port, I thought that because I keep the threads busy (i.e. they do not enter a wait state), when I try to make a fifth connection the completion packet would not be dequeued and the connection would hang! However, this is not the case: I can make 5 or even 6 connections to my server. This shows that I can still dequeue packets even though my maximum allowed number of threads (4) are already running. This is why I am confused!

The completion port is really a KQUEUE object, and NumberOfConcurrentThreads corresponds to the KQUEUE's MaximumCount:
Maximum number of concurrent threads the queue can satisfy waits for.
From I/O Completion Ports:
When the total number of runnable threads associated with the
completion port reaches the concurrency value, the system blocks the
execution of any subsequent threads associated with that completion
port until the number of runnable threads drops below the concurrency
value.
This is badly and imprecisely worded. When a thread calls KeRemoveQueue (which GetQueuedCompletionStatus calls internally), the system returns a packet to the thread only if Queue->CurrentCount < Queue->MaximumCount, even if packets exist in the queue. The system does not, of course, block any threads. From the other side, look at KiInsertQueue: even if some threads are waiting for packets, a waiter is activated only when Queue->CurrentCount < Queue->MaximumCount.
Also look at how and when Queue->CurrentCount is changed: see KiActivateWaiterQueue (called when the current thread is about to enter a wait state) and KiUnlinkThread. In general, when a thread begins waiting on any object (or another queue), the system calls KiActivateWaiterQueue, which decrements CurrentCount and possibly (if packets exist in the queue, Queue->CurrentCount has dropped below Queue->MaximumCount, and threads are waiting for packets) hands a packet to a waiting thread. From the other side, when a thread stops waiting, KiUnlinkThread is called and increments CurrentCount.
Both of your variants are wrong. Any number of threads can call GetQueuedCompletionStatus(), and the system of course does not block the execution of subsequent threads. For example: with a queue whose MaximumCount is 4, you can post 10 packets to the queue and call GetQueuedCompletionStatus() from 7 threads concurrently, but only 4 of them get packets; the others wait, despite 6 packets still sitting in the queue. If one of the threads that removed a packet then begins to wait on something, the system simply unwaits one of the threads blocked on the queue and hands it a packet. And if a thread that previously removed a packet from this queue (Thread->Queue == Queue, i.e. an active thread) calls KeRemoveQueue again, Queue->CurrentCount is first decremented.
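A minimal sketch of this behaviour (assuming a plain Win32 C build; the packet keys are arbitrary): the port is created with a concurrency value of 2, four threads wait on it, and four packets are posted by hand. The first two threads to dequeue stay runnable by spinning, so the other two packets should stay queued:

#include <windows.h>
#include <stdio.h>

static HANDLE g_port;

static DWORD WINAPI Worker(LPVOID arg)
{
    DWORD bytes;
    ULONG_PTR key;
    LPOVERLAPPED ov;
    while (GetQueuedCompletionStatus(g_port, &bytes, &key, &ov, INFINITE))
    {
        printf("thread %lu dequeued packet %llu\n",
               GetCurrentThreadId(), (unsigned long long)key);
        for (;;) {}                       /* stay runnable: never enter a wait state */
    }
    return 0;
}

int main(void)
{
    /* NumberOfConcurrentThreads (MaximumCount) = 2 */
    g_port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 2);
    for (int i = 0; i < 4; ++i)
        CreateThread(NULL, 0, Worker, NULL, 0, NULL);
    for (ULONG_PTR key = 1; key <= 4; ++key)
        PostQueuedCompletionStatus(g_port, 0, key, NULL);
    Sleep(2000);                          /* expect exactly two "dequeued" lines */
    return 0;
}

Replacing the spin loop with a Sleep() (a true wait state) should let the remaining packets be handed to the other threads, which is exactly the CurrentCount hand-off described above.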

Related

Difference between pthread_exit(PTHREAD_CANCELED) and pthread_cancel(pthread_self())

When pthread_exit(PTHREAD_CANCELED) is called I get the expected behavior (stack unwinding, destructor calls), but the call to pthread_cancel(pthread_self()) just terminates the thread.
Why do pthread_exit(PTHREAD_CANCELED) and pthread_cancel(pthread_self()) differ so significantly, and why is the thread's memory not released in the latter case?
The background is as follows:
The calls are made from a signal handler, and the reasoning behind this strange approach is to cancel a thread that is waiting for the external library's semop() call to complete (looping around on EINTR, I suppose).
I have noticed that calling pthread_cancel from another thread does not work (as if semop were not a cancellation point), but signalling the thread and then calling pthread_exit works, although it runs the destructors from within a signal handler.
pthread_cancel could postpone the action to the next cancellation point.
In terms of thread specific clean-up behaviour there should be no difference between cancelling a thread via pthread_cancel() and exiting a thread via pthread_exit().
POSIX says:
[...] When the cancellation is acted on, the cancellation clean-up handlers for thread shall be called. When the last cancellation clean-up handler returns, the thread-specific data destructor functions shall be called for thread. When the last destructor function returns, thread shall be terminated.
From Linux's man pthread_cancel:
When a cancellation request is acted on, the following steps occur for the thread (in this order):
Cancellation clean-up handlers are popped (in the reverse of the order in which they were pushed) and called. (See pthread_cleanup_push(3).)
Thread-specific data destructors are called, in an unspecified order. (See pthread_key_create(3).)
The thread is terminated. (See pthread_exit(3).)
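A minimal sketch of that sequence (illustrative names, not code from the question): a cleanup handler and a thread-specific-data destructor fire in the documented order when the thread is cancelled at a cancellation point:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_key_t key;

static void key_dtor(void *v)  { printf("2: TSD destructor (%s)\n", (char *)v); }
static void cleanup(void *arg) { printf("1: cleanup handler\n"); }

static void *worker(void *arg)
{
    pthread_setspecific(key, "worker value");
    pthread_cleanup_push(cleanup, NULL);
    pause();                     /* a cancellation point; the cancel lands here */
    pthread_cleanup_pop(0);      /* pairs with push; skipped on cancellation */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_key_create(&key, key_dtor);
    pthread_create(&t, NULL, worker, NULL);
    sleep(1);
    pthread_cancel(t);
    pthread_join(t, NULL);       /* prints "1: ..." then "2: ..." */
    return 0;
}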
Regarding the strategy of introducing a cancellation point by signalling a thread, I have my doubts that this is the cleanest way.
Since many system calls return on receiving a signal and set errno to EINTR, it would be easy to catch this case and simply let the thread end itself cleanly under this condition via pthread_exit().
Some pseudo code:
while (some condition)
{
    if (-1 == semop(...))
    {   /* getting here on error or signal reception */
        if (EINTR == errno)
        {   /* getting here on signal reception */
            pthread_exit(...);
        }
    }
}
It turned out that there is no difference.
However, some interesting side effects took place.
Operations on std::iostream, especially cerr/cout, include cancellation points. When the underlying operation is cancelled, the stream is marked as not good. So you will get no output from any other thread once just one of them has discovered cancellation on an attempt to print.
So play with pthread_setcancelstate() and pthread_testcancel(), or just call cerr.clear() when needed.
This applies to C++ streams only; stderr and stdin seem not to be affected.
First of all, there are two attributes associated with a thread that determine what happens when you call pthread_cancel():
1. pthread_setcancelstate
2. pthread_setcanceltype
The first function sets whether that particular thread can be cancelled at all, and the second sets when and how the thread should be cancelled: for example, whether it should be terminated as soon as you send the cancellation request, or whether it must wait until it reaches some milestone (a cancellation point) before being terminated.
When you call pthread_cancel(), the thread is not terminated directly; the two checks above are performed first, i.e. whether the thread can be cancelled at all, and if so, when.
If you disable the cancel state, then pthread_cancel() cannot terminate the thread, but the cancellation request stays queued, waiting for the thread to become cancellable; if at some later point you re-enable the cancel state, the pending request will then terminate the thread.
With pthread_exit(), on the other hand, the thread is terminated regardless of its cancel state and cancel type.
This is one of the differences between pthread_exit() and pthread_cancel(); there may be a few more.
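A minimal sketch of that queueing behaviour (illustrative timing): the cancel request arrives while cancellation is disabled, stays pending, and is acted on only once the state is re-enabled and a cancellation point is reached:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *worker(void *arg)
{
    int old;
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
    sleep(2);                    /* the cancel request arrives here and stays queued */
    printf("still alive despite pthread_cancel()\n");
    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old);
    pthread_testcancel();        /* explicit cancellation point: acted on here */
    printf("never reached\n");
    return NULL;
}

int main(void)
{
    pthread_t t;
    void *res;
    pthread_create(&t, NULL, worker, NULL);
    sleep(1);
    pthread_cancel(t);           /* queued: the thread is not cancellable yet */
    pthread_join(t, &res);
    printf("joined: %s\n", res == PTHREAD_CANCELED ? "cancelled" : "exited");
    return 0;
}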

Handing off a piece of work to a thread and waiting for it to accept

My application works as follows:
the worker-threads initialize and begin waiting in pthread_cond_wait()
the main thread connects to DB and starts handing over one row at a time to the proper worker
Because of the DB-driver internals, the next row can not be read until the current one is extracted, so the main thread has to wait for the worker to "accept" the row.
I achieve this by calling pthread_cond_wait() in the main thread, waiting for a pthread_cond_signal() from the worker. This works cleanly on both Linux and FreeBSD, but it usually takes much longer on Linux: whereas I consistently process the entire 1.6M rows in about 27 seconds on FreeBSD, on Linux it usually takes over 2 minutes. Except that sometimes the Linux box shows the same time...
The code is compiled from the same source and the program talks to the same DB-server. If anything, the Linux box is located on the same LAN as the DB, whereas the FreeBSD machine connects via VPN (so it should be a bit slower). But it is the wide inconsistency of the Linux results that bothers me, and I suspect the thread-coordination...
Here is what I have now:
MAIN THREAD                                 WORKER
--------------------------------------------------------------------------
get new row
figure out which worker it belongs to       lock my mutex
lock the worker's mutex                     go into pthread_cond_wait
signal the worker                           extract the row's data
unlock the worker's mutex                   signal the main thread
go into pthread_cond_wait                   unlock the mutex
go on back to getting the next row          go on to process the row's data
Is there a better way? Thanks!
If reading the next row must be serial anyway, why are you delegating this to the worker? As the main thread has to wait anyway, have the main thread do the extraction and have the hand-off occur as soon as the row has been sufficiently extracted that the master can proceed to the next row.
Other than that, you will need to provide code, as your description is incomplete, as would be any question of this nature submitted without code.
It looks like your problem is that you are calling pthread_cond_wait() without the mutex locked in the main thread. This means that there's a race condition: if the worker thread wakes up, extracts the data, and signals the condition before the parent executes pthread_cond_wait(), the wakeup will be lost.
What you should have is some shared state paired with the condition variable, like this:
Main Thread:
get_new_row();
worker = decide_worker();

pthread_mutex_lock(&mutex);

/* Signal worker that data is available */
flag[worker] = 1;
pthread_cond_signal(&cond);

/* Wait for worker to extract it */
while (flag[worker] == 1)
    pthread_cond_wait(&cond, &mutex);

pthread_mutex_unlock(&mutex);
Worker Thread:
pthread_mutex_lock(&mutex);

/* Wait for data to be available */
while (flag[worker] == 0)
    pthread_cond_wait(&cond, &mutex);

extract_row_data();

/* Signal main thread that extraction is complete */
flag[worker] = 0;
pthread_cond_signal(&cond);

pthread_mutex_unlock(&mutex);
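For completeness, a self-contained sketch of the same pattern, collapsed to a single worker and a single flag (hypothetical names; the per-worker flag[] array above generalises this to many workers):

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond  = PTHREAD_COND_INITIALIZER;
static int flag;            /* 1 = a row is waiting to be extracted */
static int current_row;

static void *worker(void *arg)
{
    for (int i = 0; i < 5; ++i) {
        pthread_mutex_lock(&mutex);
        while (flag == 0)                 /* wait for data to be available */
            pthread_cond_wait(&cond, &mutex);
        int row = current_row;            /* "extract" the row */
        flag = 0;                         /* tell main that extraction is done */
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&mutex);
        printf("worker processing row %d\n", row);
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    for (int row = 1; row <= 5; ++row) {
        pthread_mutex_lock(&mutex);
        current_row = row;                /* hand the row over */
        flag = 1;
        pthread_cond_signal(&cond);
        while (flag == 1)                 /* wait until the worker has extracted it */
            pthread_cond_wait(&cond, &mutex);
        pthread_mutex_unlock(&mutex);
    }
    pthread_join(t, NULL);
    return 0;
}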

pthreads wait and signal doubts linux

Before pthread_cond_wait we lock a mutex so that other code cannot change the shared state behind the condition variable. The wait then unlocks the mutex and waits for the signal.
Say that in some other thread I lock the same mutex, then call 'signal', and then unlock the mutex.
When the signal is delivered, the waiting thread wakes up and acquires the mutex again.
Thread1                Thread2
{                      {
    lock(mutex);           lock(mutex);
    wait(mutex);           signal(mutex);
    unlock(mutex);         unlock(mutex);
}                      }
Say the three Thread1 statements are enclosed in a while(1) loop. Then assume that Thread2 locks the mutex, signals it, unlocks the mutex, and then doesn't end but goes to sleep.
So will the value of the condition variable be changed permanently? If the three statements of Thread1 run in an infinite loop, will it never wait and simply find that the signal has been given? When the wait call returns, does it set the value of the condition variable back to its initial value?
If yes, can I use the create, destroy or initialize functions on the variable to set the value back? If yes, how? What exactly do these functions do?
Thanks,
pthread_cond_signal() will always wake at least one thread that is currently waiting on that condition variable in pthread_cond_wait(). If the same thread or a different thread then calls pthread_cond_wait() again, it will block and wait for another signal.
This means that pthread condition variables must always be paired with some kind of shared data, protected by the mutex that is held when calling pthread_cond_wait(). Before calling pthread_cond_wait(), the thread must check the shared data to see if the condition it wants to wait for has occurred - if not, it shouldn't wait.
The simplest example of such shared data might be a global flag. In your example:
int flag = 0;

Thread 1 {
    pthread_mutex_lock(&mutex);
    while (!flag)
        pthread_cond_wait(&cond, &mutex);
    pthread_mutex_unlock(&mutex);
}

Thread 2 {
    pthread_mutex_lock(&mutex);
    flag = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mutex);
}
You can see here that exactly when the condition is "reset" is entirely under your control - for example, you could have Thread 1 set flag = 0; before it calls pthread_mutex_unlock().
The shared state is often more complex than a simple flag - for example, you might have a producer thread call pthread_cond_wait() while there is no room in a shared buffer.
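A sketch of that bounded-buffer case (illustrative sizes and names): the producer waits on one condition while the buffer is full, the consumer waits on another while it is empty, and both re-check the shared state in a loop:

#include <pthread.h>

#define CAP 8

static int buf[CAP], count, head, tail;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

void put(int v)
{
    pthread_mutex_lock(&m);
    while (count == CAP)              /* shared state checked under the mutex */
        pthread_cond_wait(&not_full, &m);
    buf[tail] = v;
    tail = (tail + 1) % CAP;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&m);
}

int get(void)
{
    pthread_mutex_lock(&m);
    while (count == 0)
        pthread_cond_wait(&not_empty, &m);
    int v = buf[head];
    head = (head + 1) % CAP;
    count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&m);
    return v;
}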

Resetting comm event mask

I have been doing overlapped serial port communication in Delphi lately and there is one problem I'm not sure how to solve.
I communicate with a modem. I write a request frame (an AT command) to the modem's COM port and then wait for the modem to respond. The event mask of the port is set to EV_RXCHAR, so after writing a request I call WaitCommEvent() and start waiting for data to appear in the input queue. When the overlapped wait for the event finishes, I immediately start reading data from the queue and read everything that the device sends at once:
1) write a request
2) call WaitCommEvent() and wait until waiting finishes
3) read all the data that the device sends (not only the data being in the input queue at that moment)
4) do something and then goto 1
Waiting for the event finishes as soon as the first byte appears in the input queue. During my read operation, however, more bytes appear in the queue and each of them causes the internal event flag to be set again. This means that when I read all the data from the queue and then call WaitCommEvent() a second time, it returns immediately with the EV_RXCHAR mask, even though there is no data to be read.
How should I handle reading and waiting for the event so that the event mask returned by WaitCommEvent() is always valid? Is it possible to reset the flags of the serial port so that when I read all the data from the queue and call WaitCommEvent() afterwards, it does not return immediately with a mask that was valid before I read the data?
The only solution that comes to my mind is this:
1) write a request
2) call WaitCommEvent() and wait until waiting finishes
3) read all the data that the device sends (not only the data being in the input queue at that moment)
4) call WaitCommEvent() which should return true immediately at the same time resetting the event flag set internally
5) do something and goto 1
Is it a good idea or is it stupid? Of course I know that the modem almost always finishes its answers with CRLF characters so I could set the comm mask to EV_RXFLAG and wait for the #10 character to appear, but there are many other devices with which I communicate and they do not always send frame end characters.
Your help will be appreciated. Thanks in advance!
Mariusz.
Your solution does sound workable. I just use a state machine to handle the transitions.
(pseudocode)
ioState := ioIdle;
while (ioState <> ioFinished) and (not aborted) do
  case ioState of
    ioIdle     : if there is data to read then set state to ioMidFrame
    ioMidFrame : if data to read then read; if end of frame set to ioEndFrame
    ioEndFrame : process the data and set to ioFinished
    ioFinished : ; // don't do anything, for doc purposes only.
  end;
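In C-style Win32 terms, the "read all the data" step might look like the sketch below (DrainPort is a hypothetical helper, assuming synchronous reads on the port): it drains until ClearCommError() reports an empty input queue, after which the extra WaitCommEvent() from step 4 of the question can clear the stale EV_RXCHAR flag:

#include <windows.h>

/* Read everything currently queued on the port into buf (up to cap bytes).
   Returns the number of bytes read. */
DWORD DrainPort(HANDLE hPort, BYTE *buf, DWORD cap)
{
    DWORD total = 0, got, errs;
    COMSTAT st;
    while (total < cap)
    {
        if (!ClearCommError(hPort, &errs, &st))
            break;
        if (st.cbInQue == 0)
            break;               /* input queue empty: safe to re-arm WaitCommEvent() */
        DWORD want = min(st.cbInQue, cap - total);
        if (!ReadFile(hPort, buf + total, want, &got, NULL) || got == 0)
            break;
        total += got;
    }
    return total;
}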

pthread conditional variable

I'm implementing a thread with a queue of tasks. As soon as as the first task is added to the queue the thread starts running it.
Should I use pthread condition variable to wake up the thread or there is more appropriate mechanism?
If I call pthread_cond_signal() when the other thread is not blocked by pthread_cond_wait() but rather doing something, what happens? Will the signal be lost?
Semaphores are good if and only if your queue is already thread-safe. Also, some semaphore implementations may be limited by a maximum counter value, though it is unlikely you would ever overrun it.
The simplest and correct way to do this is the following:

pthread_mutex_t queue_lock;
pthread_cond_t not_empty;
queue_t queue;

push()
{
    pthread_mutex_lock(&queue_lock);
    queue.insert(new_job);
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&queue_lock);
}

pop()
{
    pthread_mutex_lock(&queue_lock);
    if (queue.empty())
        pthread_cond_wait(&not_empty, &queue_lock);
    job = queue.pop();
    pthread_mutex_unlock(&queue_lock);
}
From the pthread_cond_signal Manual:
The pthread_cond_broadcast() and pthread_cond_signal() functions shall have no effect if there are no threads currently blocked on cond.
I suggest you use Semaphores. Basically, each time a task is inserted in the queue, you "up" the semaphore. The worker thread blocks on the semaphore by "down"'ing it. Since it will be "up"'ed one time for each task, the worker thread will go on as long as there are tasks in the queue. When the queue is empty the semaphore is at 0, and the worker thread blocks until a new task arrives. Semaphores also easily handle the case when more than 1 task arrived while the worker was busy. Notice that you still have to lock access to the queue to keep inserts/removes atomic.
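A sketch of that semaphore scheme (POSIX semaphores; the ring buffer of ints here is a stand-in for a real job queue, and overflow checking is omitted for brevity):

#include <pthread.h>
#include <semaphore.h>

#define QCAP 64

static int q[QCAP];
static int head, tail;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t tasks;                   /* sem_init(&tasks, 0, 0) at startup */

void push(int job)
{
    pthread_mutex_lock(&queue_lock);  /* keep the insert atomic */
    q[tail] = job;
    tail = (tail + 1) % QCAP;
    pthread_mutex_unlock(&queue_lock);
    sem_post(&tasks);                 /* one "up" per task: a post is never lost */
}

int pop(void)
{
    sem_wait(&tasks);                 /* blocks while there are no tasks */
    pthread_mutex_lock(&queue_lock);  /* keep the remove atomic */
    int job = q[head];
    head = (head + 1) % QCAP;
    pthread_mutex_unlock(&queue_lock);
    return job;
}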
The signal will be lost, but you want the signal to be lost in that case. If there is no thread to wakeup, the signal serves no purpose. (If nobody is waiting for something, nobody needs to be notified when it happens, right?)
With condition variables, lost signals cannot cause a thread to "sleep through a fire". Unless you actually code a thread to go to sleep when there's already a fire, there is no need to "save a signal". When the fire starts, your broadcast will wake up any sleeping threads. And you would have to be pretty daft to code a thread to go to sleep when there's already a fire.
As already suggested, semaphores should be the best choice. If you need a fixed-size queue, just use 2 semaphores (as in the classical producer-consumer problem).
In artyom's code, it would be better to replace the "if" with a "while" in the pop() function, to handle spurious wakeups.
No effect.
If you check how pthread_cond_signal is implemented, you'll see that the cond uses several counters to check whether there are any waiting threads to wake up, e.g. in glibc-nptl:

/* Are there any waiters to be woken? */
if (cond->__data.__total_seq > cond->__data.__wakeup_seq) {
    ...
}
