Does a lock-free "multiple producer, single consumer" queue exist for Delphi? - delphi

I've found several implementations for single producer, single consumer, but none for multiple producers, single consumer.
Does a lock-free "multiple producer, single consumer" queue exist for Delphi?

The lock-free queue from the OmniThreadLibrary supports multiple producers. You can use it separately from the threading library (i.e. you can use the OtlContainers unit in any other framework).
As Daniele pointed out below, there are two queues in the OmniThreadLibrary. The one in OtlContainers supports multiple producers and multiple consumers, while the "smarter" version in OtlComm (which is just a wrapper for the simpler one) is single producer/single consumer only.
Documentation is still a big problem of the OmniThreadLibrary project :(. Some information on the queue can be found here.
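For reference, a hedged usage sketch of the OTL queue, assuming the TOmniQueue API from OtlContainers (Enqueue/TryDequeue working on TOmniValue from OtlCommon; the exact API has varied between OTL releases):
uses
  OtlCommon, OtlContainers;

var
  Queue: TOmniQueue;
  Value: TOmniValue;
begin
  Queue := TOmniQueue.Create;
  try
    // Any number of producer threads may call Enqueue concurrently.
    Queue.Enqueue('some work item');
    // A consumer polls with the non-blocking TryDequeue.
    if Queue.TryDequeue(Value) then
      Writeln(Value.AsString);
  finally
    Queue.Free;
  end;
end;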

Maybe this could be helpful: the Interlocked SList functions.

http://svn.berlios.de/svnroot/repos/dzchart/utilities/dzLib/trunk/lockfree/
@Daniele Teti:
The reader must wait for all writers that still have access to the old queue to exit the Enqueue method. Since the first thing the reader does in the Dequeue method is provide a new queue for writers entering Enqueue, it should not take long for all writers holding a reference to the old queue to exit Enqueue. But you are right: it is lock-free only for the writers and might still require the reader thread to wait for some writers to exit Enqueue.

For a multiple-producer / single-consumer queue/FIFO, you can easily make one lock-free using an SLIST or a trivial lock-free LIFO stack. What you do is keep a second "private" stack for the consumer (which can also be an SLIST for simplicity, or any other stack model you choose). The consumer pops items off the private stack. Whenever the private LIFO is exhausted, you do a Flush rather than a Pop off the shared concurrent SLIST (grabbing the entire SLIST chain) and then walk the flushed list in order, pushing items onto the private stack (see the sketch below).
That works for single-producer / single-consumer and for multiple-producer / single-consumer.
However, it does not work for multiple-producer / multiple-consumer cases.
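Here is a minimal Delphi sketch of that flush-based approach, using a CAS-based lock-free stack in place of the raw Win32 SList API. TInterlocked is from System.SyncObjs (XE2 and later); TMpscQueue and the raw Pointer payload are illustrative only:
uses
  System.SyncObjs;

type
  PNode = ^TNode;
  TNode = record
    Next: PNode;
    Value: Pointer;
  end;

  TMpscQueue = class
  private
    FShared: PNode;  // lock-free LIFO; any number of producers push here
    FPrivate: PNode; // consumer-only stack; needs no synchronization
  public
    procedure Enqueue(AValue: Pointer);             // safe from many threads
    function Dequeue(out AValue: Pointer): Boolean; // single consumer only
  end;

procedure TMpscQueue.Enqueue(AValue: Pointer);
var
  Node: PNode;
  Old: Pointer;
begin
  New(Node);
  Node^.Value := AValue;
  repeat
    Old := FShared;
    Node^.Next := PNode(Old);
    // CAS: publish the node only if the head has not moved in the meantime.
  until TInterlocked.CompareExchange(Pointer(FShared), Pointer(Node), Old) = Old;
end;

function TMpscQueue.Dequeue(out AValue: Pointer): Boolean;
var
  Node, Chain: PNode;
begin
  if FPrivate = nil then
  begin
    // "Flush": atomically grab the entire shared chain (newest item first)...
    Chain := PNode(TInterlocked.Exchange(Pointer(FShared), nil));
    // ...and reverse it onto the private stack, which restores FIFO order.
    while Chain <> nil do
    begin
      Node := Chain;
      Chain := Chain^.Next;
      Node^.Next := FPrivate;
      FPrivate := Node;
    end;
  end;
  Result := FPrivate <> nil;
  if Result then
  begin
    Node := FPrivate;
    FPrivate := Node^.Next;
    AValue := Node^.Value;
    Dispose(Node);
  end;
end;
Because producers never dereference the head pointer they compare against, and the consumer takes the whole chain in a single Exchange, the ABA problems that plague lock-free pop operations do not arise in this particular design.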

Related

Does let!/do! always run the async object in a new thread?

In the wikibook on F# there is a small section that says:
What does let! do?
let! runs an async<'a> object on its own thread, then it immediately releases the current thread back to the threadpool. When let! returns, execution of the workflow will continue on the new thread, which may or may not be the same thread that the workflow started out on.
I have not found anywhere else in books or on the web where this fact (highlighted in bold) is stated.
Is this true for all let!/do! regardless of what the async object contains (e.g. Thread.Sleep()) and how it is started (e.g. Async.Start)?
Looking in the F# source code on github, I wasn't able to find the place where a call to bind executes on a new (TP) thread. Where in the code is the magic happening?
Which part of that statement do you find surprising? That parts of a single async can execute on different threadpool threads, or that a threadpool thread is necessarily being released and obtained on each bind?
If it's the latter, then I agree - it sounds wrong. Looking at the code, there are only a few places where a new work item is being queued on the threadpool (namely, the few Async module functions that use queueAsync internally), and Async.SwitchToNewThread spawns a non-threadpool thread and runs the continuation there. A bind alone doesn't seem to be enough to switch threads.
The spirit of the statement however seems to be about the former - no guarantees are made that parts of an async block will run on the same thread. The exact thread that you run on should be treated as an implementation detail, and when you yield control and await some result, you can be pretty sure that you'll land on a different thread at least some of the time.
No. An async operation might execute synchronously on the current thread, or it might wind up completing on a different thread. It depends entirely on how the async API in question is implemented.
See Do the new C# 5.0 'async' and 'await' keywords use multiple cores? for a decent explanation. The implementation details of F# and C# async are different, but the overall principles are the same.
The builder that implements the F# async computation expression is here.

Is it better for an API to dispatch itself to a queue and invoke a callback, or for the API caller to do the dispatching?

Examples:
Asynchronous method with its own dispatching:
// Library
func asyncAPI(callback: Result -> Void) {
    dispatch_async(self.queue) {
        ...
        callback(result)
    }
}
// Caller
asyncAPI() { result in
    ...
}
Synchronous method with exposed dispatch queue:
// Library
func syncAPI() -> Result {
    assert(isRunningOnCorrectQueue())
    ...
    return result
}
// Caller
dispatch_async(api.queue) {
    let result = api.syncAPI()
    ...
}
These two examples behave the same, but I am looking to learn whether one of them ends up complicating a larger codebase more than the other, especially when there is a lot of asynchrony.
I would argue against both of the patterns you propose.
For the first pattern (where the API manages its own backgrounding) I see little or no benefit to doing it this way, as opposed to leaving it to the caller. If you want to use a private, serial queue to protect data (or any other sort of critical section) internal to your API, that's fine, but that queue should be private, and it should specifically not target any public, non-global-concurrent queue (note: it should especially not target the main queue). Ideally, the primary implementation of your API would also take a second parameter, so callers can specify on which queue to invoke the callback. (People can work around the lack of such a parameter by passing a callback block that re-dispatches to their desired queue, but I think that's clunkier than having an extra, optional parameter.) This puts the API consumer in complete control of the concurrency, while preserving your freedom to use queues internally to protect state.
As to the second approach, it's my opinion that we all should avoid creating new synchronous, blocking API. When you provide a synchronous, blocking API and don't provide a callback-based version, that means that you have denied consumers of your API any opportunity to avoid blocking. When you only provide synchronous, blocking API, then if someone wants to call your API in the background, at least one thread (in addition to any additional threads that your API consumes behind the scenes) will be consumed from the finite number of threads available to each process. (In the worst case this can lead to starvation conditions that are effectively deadlocks.)
Another red flag with this second example is that it vends a queue; any time an API vends a queue, something is amiss. As mentioned, if you want to use a private serial queue to protect state or other critical sections internal to your API, go for it, but don't expose that queue to the outside world. If nothing else, it unnecessarily exposes details of your implementation. In looking at the system framework headers, I couldn't find a single case where a dispatch_queue_t was vended where it wasn't immediately obvious that the intent was for the API consumer to pass a queue in, not to read one out.
It's also worth mentioning that these patterns are problematic regardless of whether your workload is CPU-bound or IO-bound. If it's CPU-bound, then not managing your own dispatch gives consumers of the API explicit control over how this CPU work is executed. If your workload is IO-bound, then you should use the OS- and libdispatch-provided asynchronous IO mechanisms (dispatch_io, dispatch_sources, kevent, etc) to avoid consuming a thread (or more than one) for the duration of your work.
Another answer here implied that forcing consumers to manage their own concurrency leads to "boilerplate" code. If you feel that the burden of API consumers potentially having to wrap calls to your API with dispatch_async is too great, then feel free to provide a convenience overload that dispatches to the default global concurrent queue, but please always leave the version that allows API consumers the ability to explicitly manage their own concurrency.
If, on the other hand, all this is internal to the implementation, and not part of the public API, then do whatever is most expedient, knowing that you can refactor the implementation behind the public API any time in the future.
As you said, the two generally accomplish the same thing, but the first is preferable in most scenarios. There are several benefits to using the first method.
The API is simpler. You simply call the method and provide code for the callback block.
Less boilerplate code: no typing dispatch_async every time you want to call it, as it is included in the method itself.
Less room for bugs/errors. By wrapping the asynchronous logic inside the method itself, you ensure that it is called on the right queue internally, without the caller having to worry about any of that.
Touching on the last point, you also have finer control over the queue itself. Let's say you are trying to perform certain tasks on a particular queue. It is far simpler to wrap the code in a GCD call on that queue a single time than to remember to reuse that same queue every time you want to call the method.

Wrapper class for thread-safe objects

I have recently played around with an open-source demo project covering the basic functionality of the INDY10 TCP/IP server, and stumbled upon the problem of INDY's internal multitasking implementation and its interaction with VCL components. Since there are many different topics on SO on the subject, I decided to make a simple client-server application and test some of the suggested solutions and approaches, at least the ones that I understood correctly. Below I would like to summarize and review an approach that was previously suggested on SO, and if possible listen to your expert opinion on the subject.
Problem: Encapsulating the VCL for thread-safe usage inside an INDY10-based client/server application.
Description of the Development Env.:
Delphi Version: Delphi® XE2 Version 16.0
INDY Version 10.5.8.0
O.S. Windows 7 (32Bit)
As mentioned in the article [Is the VCL Thread-safe?] (sorry, I do not have enough reputation to post the link), special care should be taken when one wishes to use any kind of VCL component inside a multithreaded (multitasking) application. The VCL is not thread-safe, but can be used in a thread-safe way!
The how and the why usually depend on the application at hand, but one can attempt to generalize a bit and suggest some kind of general approach to this problem. First of all, as in the case of INDY10, one does not need to be explicitly parallelizing the code, i.e. creating and executing multiple threads, in order to expose the VCL to deadlocks and data interdependencies.
In every client-server application, the server has to be able to handle multiple requests simultaneously, so naturally INDY10 implements this functionality internally. This means that the INDY10 set of classes is responsible for managing the program's thread creation, execution and destruction procedures internally.
The most obvious place where our code is exposed to the inner workings of INDY10 and hence possible thread conflicts, is the IdTCPServerExecute (TIdTCPServer onExecute event) method.
Naturally, INDY10 provides classes (wrappers) that ensure thread-safe program flow, but since I did not manage to find enough explanation of their application and usage, I prefer a custom-made approach.
Below I summarize a method (the suggested technique is based on a previous comment I found on SO: How to use TIdThreadSafe class from Indy10) that attempts to deal with this problem, and presumably succeeds:
The question I tackle below is: how to make a specific class "MyClass" thread-safe?
The main idea is to create a kind of wrapper class that encapsulates "MyClass" and queues the threads that try to access it on a first-in-first-out basis. The underlying objects used for synchronization are Windows critical section objects.
In the context of a client-server application, "MyClass" will contain all the thread-unsafe functionality of our server, so we will try to ensure that those procedures and functions are not executed by more than one worker thread simultaneously. This naturally means some loss of parallelism in our code, but since the approach is simple, in some cases it may be a useful one.
Wrapper class Implementation:
// Declaration reconstructed for completeness (the original post only showed
// the method bodies): a generic wrapper around the protected instance plus
// a critical section; TCriticalSection comes from the SyncObjs unit.
type
  TThreadSafeObject<T: class> = class
  private
    tsObject: T;
    tsCriticalSection: TCriticalSection;
  public
    constructor Create(originalObject: T);
    destructor Destroy; override;
    function Lock: T;
    procedure Unlock;
    procedure FreeOwnership;
  end;

constructor TThreadSafeObject<T>.Create(originalObject: T);
begin
  inherited Create;
  tsObject := originalObject; // takes an already instantiated instance of MyClass
  tsCriticalSection := TCriticalSection.Create; // critical section object
end;

destructor TThreadSafeObject<T>.Destroy;
begin
  FreeAndNil(tsObject);
  FreeAndNil(tsCriticalSection);
  inherited Destroy;
end;

function TThreadSafeObject<T>.Lock: T;
begin
  tsCriticalSection.Enter; // blocks until this thread is granted ownership
  Result := tsObject;
end;

procedure TThreadSafeObject<T>.Unlock;
begin
  tsCriticalSection.Leave;
end;

procedure TThreadSafeObject<T>.FreeOwnership;
begin
  FreeAndNil(tsObject);
  FreeAndNil(tsCriticalSection);
end;
MyClass Definition:
MyClass = class
public
  procedure drawRandomBitmap(aBitmap: TBitmap); // draw random lines on a TCanvas
  function decToBin(i: LongInt): String; // convert a decimal number to binary
  procedure addLineToMemo(aLine: String; MemoFld: TMemo); // output a message to a TMemo
  function randomColor(): TColor;
end;
Usage:
Since threads execute in order and wait for the thread that currently owns the critical section to finish (tsCriticalSection.Enter and tsCriticalSection.Leave), it is logical that if you want to manage that ownership relay, you need one unique instance of TThreadSafeObject (you can consider using the singleton pattern). So include:
tsMyclass := TThreadSafeObject<MyClass>.Create(MyClass.Create);
in Form.Create and
tsMyclass.Free;
in Form.Close. Here tsMyclass is a global variable of type TThreadSafeObject<MyClass>.
Regarding the usage of MyClass, try the following:
with tsMyclass.Lock do
try
  addLineToMemo('MemoLine1', Memo1);
  addLineToMemo('MemoLine2', Memo1);
  addLineToMemo('MemoLine3', Memo1);
finally
  // release ownership
  tsMyclass.Unlock;
end;
where Memo1 is an instance of a TMemo component on the form.
With this, we are supposed to ensure that anything that happens while tsMyclass is locked will be executed by only one thread at a time. An obvious drawback of this approach, however, is that since I have only one instance of tsMyclass, even if one thread is trying, e.g., to draw on the Canvas while another is writing to the Memo, the first thread will have to wait for the second to finish, and only then will it be able to carry out its job.
My questions here are:
1. Is the suggested method above correct? Am I still free of race conditions, or are there "loopholes" in the code through which data conflicts could occur?
2. How can one, in general, test an application for thread unsafety?
I would like to stress that the above approach is in no way my own doing. It is basically a summary of the solution found in 2. Nevertheless, I have decided to post it again in an attempt to get some kind of closure on the topic, or a kind of proof of validity for the suggested solution. Besides, repetition is the mother of all knowledge, as they say.
With this, we are supposed to ensure that anything that happens when tsMyClass is locked will be executed by only one thread at a time. An obvious drawback of this approach, however, is that since I have only one instance of tsMyclass, even if one thread is trying to draw for e.g. on the Canvas, while another is writing on the Memo, the first thread will have to wait for the second to finish and only then it will be able to carry out its job.
I see one big problem here: the VCL (forms, drawing, etc...) lives on the main thread. Even if you block concurrent thread access, the updates need to be done in the context of the main thread. This is the part where you need to use Synchronize(); the big difference from a lock (critical section) is that synchronized code is run in the context of the main thread. The end result is basically the same: your threaded code is serialized, and you lose the advantage of using threads in the first place.
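For illustration, a hedged sketch of handing the VCL work itself over to the main thread from an Indy OnExecute handler. TThread.Queue is the asynchronous variant (use Synchronize if the worker must wait); Form1 and Memo1 are hypothetical names, and the IdContext and System.Classes units are assumed to be in the uses clause:
procedure TMyForm.IdTCPServerExecute(AContext: TIdContext);
var
  Line: string;
begin
  // Runs in an Indy worker thread; no VCL access is allowed here.
  Line := AContext.Connection.IOHandler.ReadLn;
  // Hand the UI update to the main thread; the worker continues immediately.
  TThread.Queue(nil,
    procedure
    begin
      Form1.Memo1.Lines.Add(Line);
    end);
end;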
Locking on the whole object can be much too coarse.
Imagine cases where some properties or methods are independent of others. If the lock works on a "global" level, many operations will be blocked needlessly.
From Reduce lock granularity – Concurrency optimization
So, how can we reduce lock granularity? The short answer: by asking for locks as little as possible. The basic idea is to use separate locks to guard multiple independent state variables of a class, instead of having only one lock in class scope.
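As a hedged illustration of that advice in Delphi, here are two independent critical sections guarding unrelated state, so a thread bumping the counter never blocks a thread writing the log (the class and its fields are invented for the example):
uses
  System.SyncObjs, System.Classes;

type
  TStats = class
  private
    FCounterLock: TCriticalSection;
    FLogLock: TCriticalSection;
    FCounter: Integer;
    FLog: TStringList;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Increment;
    procedure AddLog(const S: string);
  end;

constructor TStats.Create;
begin
  inherited Create;
  FCounterLock := TCriticalSection.Create;
  FLogLock := TCriticalSection.Create;
  FLog := TStringList.Create;
end;

destructor TStats.Destroy;
begin
  FLog.Free;
  FLogLock.Free;
  FCounterLock.Free;
  inherited;
end;

procedure TStats.Increment;
begin
  FCounterLock.Enter; // only callers of Increment contend on this lock
  try
    Inc(FCounter);
  finally
    FCounterLock.Leave;
  end;
end;

procedure TStats.AddLog(const S: string);
begin
  FLogLock.Enter; // independent lock: logging never blocks counting
  try
    FLog.Add(S);
  finally
    FLogLock.Leave;
  end;
end;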
First things first: You don't need to implement a LOCK for each of your objects, Delphi's done that for you with the TMonitor class:
TMonitor.Enter(WhateverObject);
try
  // Your code goes here.
finally
  TMonitor.Leave(WhateverObject);
end;
just make sure you free the WhateverObject when your application shuts down, or else you'll run into a bug that I've opened on QC: http://qc.embarcadero.com/wc/qcmain.aspx?d=111795
Secondly, making an application multi-threaded is a bit more involved. You can't just wrap each call between Enter/Leave calls: your locking needs to take into account what the object does and what the access pattern is. Wrapping calls within Enter/Leave simply makes sure that only one thread runs that method at any time, but race conditions are much more complex and might arise from successive calls to your locked methods. Even though each method is locked, and only one thread ever calls those methods at any given time, the state of the locked object might change in between as a consequence of other threads' activity.
This kind of code would be just fine in a single-threaded application, but locking at method level is not enough when switching to multi-threaded:
if List.IndexOf(Something) = -1 then
  List.Add(Something);
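A hedged sketch of the multi-threaded fix: the test and the insertion must form one atomic step under a single lock, otherwise another thread can add Something between the two calls (List is the same shared object as above):
TMonitor.Enter(List);
try
  // Check-then-act as one atomic step: no other thread can insert
  // Something between the IndexOf test and the Add.
  if List.IndexOf(Something) = -1 then
    List.Add(Something);
finally
  TMonitor.Leave(List);
end;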

Difference between the WaitFor function for TMutex in Delphi and the equivalent in the Win32 API

The Delphi documentation says that the WaitFor function for TMutex and other synchronization objects waits until the object's handle is signaled. But does this function also guarantee ownership of the object for the caller?
Yes, after a successful wait the calling thread owns the mutex; the class is just a wrapper for the OS mutex object. See for yourself by inspecting SyncObjs.pas.
The same is not true for other synchronization objects, such as TCriticalSection. Any thread may call the Release method on such an object, not just the thread that called Acquire.
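A minimal usage sketch of those ownership semantics, assuming the SyncObjs TMutex constructor that takes security attributes, initial ownership and a name (the name and timeout here are illustrative):
uses
  System.SyncObjs;

var
  Mutex: TMutex;
begin
  Mutex := TMutex.Create(nil, False, 'Global\MyAppMutex');
  try
    // WaitFor blocks until the OS grants this thread ownership (or times out).
    if Mutex.WaitFor(5000) = wrSignaled then
    try
      // This thread now owns the mutex; no other thread or process can acquire it.
    finally
      Mutex.Release; // ownership must be released by the owning thread
    end;
  finally
    Mutex.Free;
  end;
end;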
TMutex.Acquire is a wrapper around THandleObject.WaitFor, which will call either WaitForSingleObject or CoWaitForMultipleHandles, depending on the UseCOMWait constructor argument.
This may be very important if you use STA COM objects in your application (you may do so without knowing; dbGO/ADO is COM, for instance) and you don't want to deadlock.
It's still a dangerous idea to enter a long/infinite wait on the main thread, because the only method which correctly handles calls made via TThread.Synchronize is TThread.WaitFor, and you may stall (or deadlock) your worker threads if you use the SyncObjs objects or the WinAPI wait functions.
In commercial projects, I use a custom wait method built upon the ideas from both THandleObject.WaitFor and TThread.WaitFor, with optional alertable waiting (good for asynchronous IO, but irreplaceable for the possibility of aborting long waits).
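For the curious, a rough sketch of that idea on Windows. It assumes the XE2-era RTL, where System.Classes exposes SyncEvent as an event handle that is signaled whenever a worker thread queues a Synchronize call; MainThreadWait is an invented name, and timeout/abort handling is omitted:
uses
  Winapi.Windows, System.Classes, System.SyncObjs;

// Wait on a handle from the main thread without starving TThread.Synchronize.
function MainThreadWait(Handle: THandle): TWaitResult;
var
  Handles: array[0..1] of THandle;
begin
  Handles[0] := Handle;
  Handles[1] := SyncEvent; // signaled when Synchronize work is pending
  repeat
    case WaitForMultipleObjectsEx(2, @Handles, False, INFINITE, True) of
      WAIT_OBJECT_0:      Exit(wrSignaled);
      WAIT_OBJECT_0 + 1:  CheckSynchronize; // run the pending Synchronize calls
      WAIT_IO_COMPLETION: ;                 // an APC ran; the alertable wait resumes
    else
      Exit(wrError);
    end;
  until False;
end;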
Edit: further clarification regarding COM/OLE:
COM/OLE model (e.g. ADO) can use different threading models: STA (single-threaded) and MTA (multi or free-threaded).
By definition, the main GUI thread is initialized as STA, which means the COM objects can use window messages for their asynchronous messaging (particularly when invoked from other threads, to synchronize safely). AFAIK, they may also use APC procedures.
There is a good reason for the CoWaitForMultipleHandles function to exist - see its use in SyncObjs.pas THandleObject.WaitFor - depending on the threading model, it can process internal COM messages while blocking on the wait handle.

Best practices to parallelize using async workflow

Let's say I wanted to scrape a webpage and extract some data. I'd most likely write something like this:
let getAllHyperlinks(url:string) =
    async { let req = WebRequest.Create(url)
            let! rsp = req.GetResponseAsync()
            use stream = rsp.GetResponseStream()            // depends on rsp
            use reader = new System.IO.StreamReader(stream) // depends on stream
            let! data = reader.AsyncReadToEnd()             // depends on reader
            return extractAllUrls(data) }                   // depends on data
The let! tells F# to execute the code on another thread, then bind the result to a variable and continue processing. The sample above uses two let! bindings: one to get the response, and one to read all the data, so it spawns at least two threads (please correct me if I'm wrong).
Although the workflow above spawns several threads, the order of execution is serial because each item in the workflow depends on the previous one. It's not really possible to evaluate any items further down the workflow until the earlier operations return.
Is there any benefit to having more than one let! in the code above?
If not, how would this code need to change to take advantage of multiple let! statements?
The key is that we are not spawning any new threads. During the whole course of the workflow, there are 1 or 0 active threads being consumed from the ThreadPool. (One exception: up until the first '!', the code runs on the user thread that did an Async.Run.) "let!" lets go of a thread while the async operation is at sea, and then picks up a thread from the ThreadPool when the operation returns. The (performance) advantage is less pressure against the ThreadPool (and of course the major user advantage is the simple programming model - a million times better than all that BeginFoo/EndFoo/callback stuff you would otherwise write).
See also http://cs.hubfs.net/forums/thread/8262.aspx
I was writing an answer but Brian beat me to it. I fully agree with him.
I'd like to add that if you want to parallelize synchronous code, the right tool is PLINQ, not async workflows, as Don Syme explains.
