How can I parallelize check spelling using Delphi? - delphi

I've got a sort of spell checker written in Delphi. It analyzes the text sentence by sentence.
It colors incorrect items according to some rules after parsing each sentence. The user is able to interrupt this process, which is important.
How can I parallelize this process in general using some 3rd party Delphi libraries?
Currently, each sentence is colored on the fly right after it is checked, so the user sees the progress.

The algorithm would be as such:
Create multiple workers.
Create a spell-checker in each worker.
Grab the text and split it into work units (words or sentences). Each work unit must be accompanied by its location in the original text.
Send work units to workers. A good approach is to put the work units into a common queue from which the workers take them. The queue must either support multiple readers or you must use locking to access it.
Each worker takes a work unit, runs a spell-check and returns the result (together with the location in the original text) to the owner.
The simplest way to return a result is to send a message to the main thread.
Alternatively, you can write results into a result queue (which must either use locking or support multiple writers) and the application can then poll those results (either from a timer or from the OnIdle handler).
How the multiple spell-checkers will access the dictionary is another problem. You can load a copy of the dictionary in each worker or you can protect access to the dictionary with a lock (but that would slow things down). If you are lucky, the dictionary is thread-safe for reading and you can do simultaneous queries without locking.
An appropriate OmniThreadLibrary abstraction for the problem would be either a ParallelTask or a BackgroundWorker.
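The steps above can be sketched with plain RTL classes instead of OmniThreadLibrary. This is only a minimal illustration, assuming Delphi XE2 or later (unit scope names, generics); TWorkUnit, TSpellWorker and the 'teh' test are hypothetical stand-ins, not part of any real spell-checker. It uses one critical section to guard both the work queue and the result list, and the main thread collects results after the workers finish rather than polling.

```pascal
program ParallelSpellCheckSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs, System.Generics.Collections;

type
  // A work unit carries its text and its location in the original text.
  TWorkUnit = record
    Index: Integer;      // sentence number, used here as the location
    Text: string;
    Misspelled: Boolean; // filled in by the worker
  end;

  TSpellWorker = class(TThread)
  private
    FWork: TQueue<TWorkUnit>;
    FResults: TList<TWorkUnit>;
    FLock: TCriticalSection;
  protected
    procedure Execute; override;
  public
    constructor Create(AWork: TQueue<TWorkUnit>; AResults: TList<TWorkUnit>;
      ALock: TCriticalSection);
  end;

constructor TSpellWorker.Create(AWork: TQueue<TWorkUnit>;
  AResults: TList<TWorkUnit>; ALock: TCriticalSection);
begin
  inherited Create(False); // the thread starts once construction has finished
  FWork := AWork;
  FResults := AResults;
  FLock := ALock;
end;

procedure TSpellWorker.Execute;
var
  U: TWorkUnit;
  HaveWork: Boolean;
begin
  // Checking Terminated lets the owner interrupt the run between work units.
  while not Terminated do
  begin
    FLock.Enter; // the plain TQueue is not thread-safe, so guard it
    try
      HaveWork := FWork.Count > 0;
      if HaveWork then
        U := FWork.Dequeue;
    finally
      FLock.Leave;
    end;
    if not HaveWork then
      Break;
    // Hypothetical stand-in for a real spell-check; runs outside the lock.
    U.Misspelled := Pos('teh', LowerCase(U.Text)) > 0;
    FLock.Enter;
    try
      FResults.Add(U);
    finally
      FLock.Leave;
    end;
  end;
end;

const
  Sentences: array[0..2] of string = ('This is fine.', 'Teh cat sat.', 'All good.');
var
  Work: TQueue<TWorkUnit>;
  Results: TList<TWorkUnit>;
  Lock: TCriticalSection;
  Workers: array[0..1] of TSpellWorker;
  U: TWorkUnit;
  I: Integer;
begin
  Work := TQueue<TWorkUnit>.Create;
  Results := TList<TWorkUnit>.Create;
  Lock := TCriticalSection.Create;
  try
    for I := Low(Sentences) to High(Sentences) do
    begin
      U.Index := I;
      U.Text := Sentences[I];
      Work.Enqueue(U);
    end;
    for I := Low(Workers) to High(Workers) do
      Workers[I] := TSpellWorker.Create(Work, Results, Lock);
    for I := Low(Workers) to High(Workers) do
    begin
      Workers[I].WaitFor;
      Workers[I].Free;
    end;
    for U in Results do
      if U.Misspelled then
        Writeln('Misspelling in sentence ', U.Index, ': ', U.Text);
  finally
    Lock.Free;
    Results.Free;
    Work.Free;
  end;
end.
```

In a GUI application the main thread would not block in WaitFor; it would poll the result list from a timer or receive messages, and call Terminate on each worker when the user interrupts the check.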

To parallelize, create a new class descended from TThread, create an object from it, give part of the job to the new thread, start it, and collect the results in the main thread.
Like this:
type
  TMySpellChecker = class(TThread)
  protected
    FText: String;
    FResult: String;
    procedure Execute; override;
  public
    property Text: String read FText write FText;
    property Result: String read FResult write FResult;
  end;
procedure TMySpellChecker.Execute;
begin
  // Analyze the text, and compute the result
end;
In the main thread:
NewThread := TMySpellChecker.Create(True); // Create suspended
NewThread.Text := TextSegment;
NewThread.Start; // Resume in Delphi versions before 2010
Do not call Execute yourself: that would run the analysis in the calling thread. Once started, the thread object does the analyzing in the background, while the main thread continues to run.
To handle the results, you need to assign a handler to the OnTerminate event of the thread object:
NewThread.OnTerminate := HandleMySpellCheckerTerminate;
This must be done before you start the thread.
To allow for interruptions, one possibility is to break the main text up into segments, place the segments in a list in the main thread, and then analyze the segments one by one using the thread object. You can then allow for interruptions between each run.

Related

TIdCmdTCPServer and data synchronization with main thread [anomaly?]

Situation looks like this. External application client.exe sends a command MONITOR_ENCODING every ~250ms to the server. In server application I use IdCmdTCPServer1BeforeCommandHandler to read sent command from client.
I copy received command (AData string) to global variable command and then show it in memo1. Meanwhile I have constantly running TSupervisorThread which copies global variable command to local Copiedcommand variable and then assigns new text to command variable. Is this an expected behaviour that from time to time instead of MONITOR_ENCODING text I get RESET?
procedure TForm1.IdCmdTCPServer1BeforeCommandHandler(ASender: TIdCmdTCPServer; var AData: String; AContext: TIdContext);
begin
command:=AData; //command is a global variable
form1.Memo1.Lines.Add(IntToStr(form1.Memo1.Lines.Count+1)+'|'+IntToStr(GetTickCount)+'|'+command);
end;
procedure TSupervisorThread.CopyGlobalVariables;
begin
CopiedCommand:=command; //Copiedcommand declared in TSupervisorThread
command:='RESET';
end;
procedure TSupervisorThread.Execute;
begin
while Terminated=false do
begin
Synchronize(CopyGlobalVariables);
sleep(250);
end;
end;
FYI, the OnBeforeCommandHandler event is not the correct place to read commands with TIdCmdTCPServer. You are supposed to add an entry for each command to its CommandHandlers collection and then assign an OnCommand handler to each entry. The OnBeforeCommandHandler event is triggered before TIdCmdTCPServer parses a received command. That is OK for logging purposes, just make sure you don't use it to drive your processing logic. Leave that to the individual OnCommand events.
But, either way, command reading is done in a worker thread. You are not synchronizing with the main UI thread when adding received commands to your UI. You MUST synchronize. There are many ways to do that - TThread.Synchronize(), TThread.Queue(), TIdSync, TIdNotify, (Send|Post)Message(), etc, just to name a few.
More importantly, you have 2 threads (or more, depending on how many clients are connected at the same time) fighting over the same global variable without syncing access to it at all. You need a lock around the variable, such as a TCriticalSection or TMutex, or use Indy's TIdThreadSafeString class.
But, that won't solve the race condition that your code has. Between the time that your OnBeforeCommandHandler assigns a new value to command and the time that it reads command back to add it to the UI, your TSupervisorThread is free to modify command. That is exactly what you are seeing happen. It is not an anomaly in Indy, it is a timing issue in your code.
The easiest solution to that race condition is to simply add AData to your UI instead of command. That way, it won't matter if TSupervisorThread modifies command, your UI won't see it.
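As a rough sketch of that suggestion combined with the earlier point about synchronizing UI access, the handler from the question could log AData through TThread.Queue. This is a fragment, not a complete program: it assumes Delphi 2009 or later (anonymous methods), the form and component names from the question, and Classes in the uses clause. Received and Tick are hypothetical local names. The global command variable is deliberately left out, since it would still need its own lock.

```pascal
procedure TForm1.IdCmdTCPServer1BeforeCommandHandler(ASender: TIdCmdTCPServer;
  var AData: String; AContext: TIdContext);
var
  Received: string;
  Tick: Cardinal;
begin
  // A var parameter cannot be captured by an anonymous method,
  // so copy what the UI needs into locals first.
  Received := AData;
  Tick := GetTickCount;
  // Queue schedules the update to run in the main thread without blocking
  // this Indy worker thread.
  TThread.Queue(nil,
    procedure
    begin
      Memo1.Lines.Add(IntToStr(Memo1.Lines.Count + 1) + '|' +
        IntToStr(Tick) + '|' + Received);
    end);
end;
```

Because the memo line is built from the captured copy, nothing TSupervisorThread does to command can change what is displayed.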
But, why are you using a global command variable at all? What is its real purpose? Are you trying to make an outside thread modify the commands that TIdCmdTCPServer parses? You are not controlling which clients get to parse real commands and which get fake commands. Why are you doing this?
Besides that, having TSupervisorThread perform 99% of its work in the main UI thread is a poor use of a worker thread, you may as well just use a TTimer in the UI instead. Otherwise, you need to coordinate your threads better, such as by using TEvent objects to signal when command is assigned to, and when it is reset.
I think you need to rethink your design.

Wrapper class for thread-safe objects

I have recently played around with one demo opensource project for the basic functionality of the INDY10 TCP/IP server and stumbled upon the problem of internal multitasking implementation of INDY and its interaction with VCL components. Since there are many different topics in SO on the subject, I decided to make a simple client-server application and test some of the solutions and approaches suggested, at least the ones that I understood correctly. Below I would like to summarize and review an approach that was previously suggested on SO, and if possible listen to your expert opinion on the subject.
Problem: Encapsulating the VCL for thread-safe usage inside an Indy10-based client/server application.
Description of the Development Env.:
Delphi Version: Delphi® XE2 Version 16.0
INDY Version 10.5.8.0
O.S. Windows 7 (32Bit)
As mentioned in the article "Is the VCL Thread-safe?" (sorry, I do not have enough reputation to post the link), special care should be taken when one wishes to use any kind of VCL component inside a multithreaded (multitasking) application. The VCL is not thread-safe, but it can be used in a thread-safe way!
The how and the why usually depend on the application at hand, but one can attempt to generalize a bit and suggest some kind of general approach to this problem. First of all, as in the case of INDY10, one does not need to explicitly parallelize one's code, i.e. create and execute multiple threads, in order to expose the VCL to deadlocks and data interdependencies.
In every client-server application, the server has to be able to handle multiple requests simultaneously, so naturally, INDY10 implements this functionality internally. This means that the INDY10 classes are responsible for managing the program's thread creation, execution and destruction internally.
The most obvious place where our code is exposed to the inner workings of INDY10 and hence possible thread conflicts, is the IdTCPServerExecute (TIdTCPServer onExecute event) method.
Naturally, INDY10 provides classes (wrappers) that ensure thread-safe program flow, but since I did not manage to get enough explanation on their application and usage, I prefer a custom made approach.
Below I summarize a method (the suggested technique is based on a previous comment I found on SO, "How to use TIdThreadSafe class from Indy10") that attempts (and presumably succeeds) to deal with this problem:
The question I tackle below is: How to make a specific class "MyClass" thread-safe?
The main idea is to create a kind of wrapper class that encapsulates "MyClass" and queues the threads that try to access it on a first-in-first-out basis. The underlying objects that are used for synchronization are Windows critical section objects.
In the context of a client-server application, "MyClass" will contain all thread-unsafe functionality of our server, so we will try to ensure that those procedures and functions are not executed by more than one worker thread simultaneously. This naturally means a loss of parallelism in our code, but since the approach is simple and seems to work, in some cases it may be useful.
Wrapper class Implementation:
constructor TThreadSafeObject<T>.Create(originalObject: T);
begin
tsObject := originalObject; // pass it already instantiated instance of MyClass
tsCriticalSection:= TCriticalSection.Create; // Critical section Object
end;
destructor TThreadSafeObject<T>.Destroy();
begin
FreeAndNil(tsObject);
FreeAndNil(tsCriticalSection);
inherited Destroy;
end;
function TThreadSafeObject<T>.Lock(): T;
begin
tsCriticalSection.Enter;
result:=tsObject;
end;
procedure TThreadSafeObject<T>.Unlock();
begin
tsCriticalSection.Leave;
end;
procedure TThreadSafeObject<T>.FreeOwnership();
begin
FreeAndNil(tsObject);
FreeAndNil(tsCriticalSection);
end;
MyClass Definition:
MyClass = class
public
procedure drawRandomBitmap(abitmap: TBitmap); //Draw Random Lines on TCanvas
function decToBin(i: LongInt): String; //convert decimal number to Bin.
procedure addLineToMemo(aLine: String; MemoFld: TMemo); // output message to TMemo
function randomColor(): TColor;
end;
Usage:
Since threads execute in order and wait for the thread that currently owns the critical section to finish (tsCriticalSection.Enter; and tsCriticalSection.Leave;), it is logical that if you want to manage that ownership relay, you need one unique instance of TThreadSafeObject (you can consider using the singleton pattern). So include:
tsMyclass:= TThreadSafeObject<MyClass>.Create(MyClass.Create);
in Form.Create and
tsMyclass.Free;
in Form.Close. Here tsMyclass is a global variable of type TThreadSafeObject<MyClass>.
Usage:
Regarding the usage of MyClass try the following:
with tsMyclass.Lock do
try
addLineToMemo('MemoLine1', Memo1);
addLineToMemo('MemoLine2', Memo1);
addLineToMemo('MemoLine3', Memo1);
finally
// release ownership
tsMyclass.unlock;
end;
where Memo1 is an instance of a TMemo component on the form.
With this, we are supposed to ensure that anything that happens when tsMyClass is locked will be executed by only one thread at a time. An obvious drawback of this approach, however, is that since I have only one instance of tsMyclass, even if one thread is trying to draw for e.g. on the Canvas, while another is writing on the Memo, the first thread will have to wait for the second to finish and only then it will be able to carry out its job.
My questions here are:
Is the above suggested method correct? Am I still free of race conditions or do I have some "loopholes" in the code, from where data conflicts could occur?
How can one, in general, test for thread unsafety of his/her application?
I would like to stress that the above approach is in no way my own doing. It is basically a summary of the solution found in 2. Nevertheless, I have decided to post again in an attempt to get some kind of closure on the topic or a kind of proof of validity for the suggested solution. Besides, repetition is mother of all knowledge, as they say.
With this, we are supposed to ensure that anything that happens when
tsMyClass is locked will be executed by only one thread at a time. An
obvious drawback of this approach, however, is that since I have only
one instance of tsMyclass, even if one thread is trying to draw for
e.g. on the Canvas, while another is writing on the Memo, the first
thread will have to wait for the second to finish and only then it
will be able to carry out its job.
I see one big problem here: the VCL (forms, drawing, etc.) lives on the main thread. Even if you block concurrent thread access, the updates need to be done in the context of the main thread. This is where you need to use Synchronize(); the big difference from a lock (critical section) is that synchronized code is run in the context of the main thread. The end result is basically the same: your threaded code is serialized and you lose the advantage of using threads in the first place.
Locking on the whole object can be much too coarse.
Imagine cases where some properties or methods are independent of others. If the lock works on a "global" level, many operations will be blocked needlessly.
From Reduce lock granularity – Concurrency optimization
So, how can we reduce lock granularity? With a short answer, by asking
for locks as less as possible. The basic idea is to use separate locks
to guard multiple independent state variables of a class, instead of
having only one lock in class scope.
First things first: you don't need to implement a lock for each of your objects; Delphi has done that for you with the TMonitor class:
TMonitor.Enter(WhateverObject);
try
  // Your code goes here.
finally
  TMonitor.Exit(WhateverObject); // TMonitor releases with Exit, not Leave
end;
Just make sure you free the WhateverObject when your application shuts down, or else you'll run into a bug that I've opened on QC: http://qc.embarcadero.com/wc/qcmain.aspx?d=111795
Secondly, making an application multi-threaded is a bit more involved. You can't just wrap each call between Enter/Leave calls: your "locking" needs to take into account what the object does and what the access pattern is. Wrapping calls within Enter/Leave simply makes sure that only one thread runs that method at any time, but race conditions are much more complex, and might arise from successive calls to your locked methods. Even though each method is locked, and only one thread ever calls those methods at any given time, the state of the locked object might change between calls as a consequence of another thread's activity.
This kind of code would be just fine in a single-threaded application, but locking at method level is not enough when switching to multi-threaded:
if List.IndexOf(Something) = -1 then
List.Add(Something);
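One way to make that check-then-add sequence safe is to hold a single lock across both calls, so the list cannot change between IndexOf and Add. The following is a minimal sketch, assuming Delphi 2009 or later for TMonitor; AddIfMissing is a hypothetical helper name. It is exercised here from the main thread only, but worker threads would call the same helper.

```pascal
program CheckThenAddSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper: the check and the insert happen under one lock,
// so no other thread can add the same item in between.
procedure AddIfMissing(List: TStringList; const Item: string);
begin
  TMonitor.Enter(List);
  try
    if List.IndexOf(Item) = -1 then
      List.Add(Item);
  finally
    TMonitor.Exit(List);
  end;
end;

var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    AddIfMissing(List, 'alpha');
    AddIfMissing(List, 'alpha'); // finds the existing item and skips Add
    Writeln(List.Count);         // prints 1
  finally
    List.Free;
  end;
end.
```

This only helps if every piece of code that touches the list takes the same lock; a thread that calls List.Add directly bypasses it.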

Do I need to wrap accesses to Int64's with a critical section?

I have code that logs execution times of routines by accessing QueryPerformanceCounter. Roughly:
var
FStart, FStop : Int64 ;
...
QueryPerformanceCounter (FStart) ;
... <code to be measured>
QueryPerformanceCounter (FStop) ;
<calculate FStop - FStart, update minimum and maximum execution times, etc>
Some of this logging code is inside threads, but on the other hand, there is a display UI that accesses the derived results. I figure the possibility exists of the VCL thread accessing the same variables that the logging code is also accessing. The VCL will only ever read the data (and a mangled read would not be too serious) but the logging code will read and write the data, sometimes from another thread.
I assume QueryPerformanceCounter itself is thread-safe.
The code has run happily without any sign of a problem, but I'm wondering if I need to wrap my accesses to the Int64 counters in a critical section?
I'm also wondering what the speed penalty of the critical section access is?
Any time you access multi-byte non-atomic data across threads when both reads and writes are involved, you need to serialize the access. Whether you use a critical section, mutex, semaphore, SRW lock, etc. is up to you.
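A minimal sketch of the critical-section option, assuming Delphi XE2 or later for the unit scope names: TTimingStats is a hypothetical class that keeps the minimum and maximum elapsed counts behind one lock, so the UI thread always reads a consistent pair. The elapsed values are passed in directly here instead of coming from QueryPerformanceCounter.

```pascal
program GuardedCounterSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.SyncObjs;

type
  TTimingStats = class
  private
    FLock: TCriticalSection;
    FMin, FMax: Int64;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Update(Elapsed: Int64);          // called by logging threads
    procedure Snapshot(out AMin, AMax: Int64); // called by the UI thread
  end;

constructor TTimingStats.Create;
begin
  inherited Create;
  FLock := TCriticalSection.Create;
  FMin := High(Int64);
  FMax := 0;
end;

destructor TTimingStats.Destroy;
begin
  FLock.Free;
  inherited;
end;

procedure TTimingStats.Update(Elapsed: Int64);
begin
  FLock.Enter;
  try
    if Elapsed < FMin then
      FMin := Elapsed;
    if Elapsed > FMax then
      FMax := Elapsed;
  finally
    FLock.Leave;
  end;
end;

procedure TTimingStats.Snapshot(out AMin, AMax: Int64);
begin
  FLock.Enter;
  try
    AMin := FMin;
    AMax := FMax;
  finally
    FLock.Leave;
  end;
end;

var
  Stats: TTimingStats;
  MinTicks, MaxTicks: Int64;
begin
  Stats := TTimingStats.Create;
  try
    Stats.Update(120);
    Stats.Update(45);
    Stats.Snapshot(MinTicks, MaxTicks);
    Writeln(MinTicks, ' ', MaxTicks); // prints 45 120
  finally
    Stats.Free;
  end;
end.
```

The lock is held only for a few comparisons and assignments, and the code being measured runs outside it.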

implementing a stack of commands for obtaining a transactional behaviour- Delphi

I need to create a tool for performing complex scripts against a database.
For several reasons I cannot rely on the DB's transactional behaviour, so I need to implement my own transactional system.
The approach I am trying is with the help of the command pattern (my case is more complex, here I put a simplified version for discussion):
type
IMyCommand = interface(IInterface)
procedure Execute();
procedure Undo();
end;
type
TSQLCommand = class (TInterfacedObject, IMyCommand)
private
FDBConnection: TDBConnection;
FDBQuery: TDBQuery;
FExecuteSQL: string;
FUndoSQL: string;
FExecuted: boolean; // set to True as the command has been executed
public
procedure Execute;
procedure Undo;
procedure Prepare(aExecuteSQL, aUndoSQL: string);
constructor Create(aDBConnection: TDBConnection);
destructor Destroy; override;
end;
I create a set of actions, for every action I will pass a "Execute" and "Undo" sql statement, examples:
A call to Prepare could be:
Prepare('INSERT INTO TESTTABLE (ID, DATA) VALUES (15, ''Hello'')', // aExecuteSQL
'DELETE FROM TESTTABLE WHERE ID = 15'); // aUndoSQL
so somehow I am making very small changes (like inserting a single simple row, updating a single row, ...), for every change the "undo" is very obvious.
I will prepare a stack of command objects (using probably the TObjectStack collection), and call the Execute method one command at a time and as one is executed I will set FExecuted to True, and save the component to disk.
So what I want to do is to run all the scripts, but I want to handle the cases in which something goes wrong.
If something goes wrong, I would like to execute all the commands from last to first, calling the Undo method. Of course, before doing this I need to be able to restore the components from disk (in case the failure is a hardware failure; if the failure has another cause, I already have the stack in memory and I can easily call Undo one command at a time).
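The in-memory part of that plan (execute in order, undo from last to first on failure) might look like the sketch below. It assumes Delphi XE2 or later for the unit scope names; TPrintCommand is a hypothetical command that only prints instead of running SQL, and persistence to disk is not covered.

```pascal
program UndoStackSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

type
  IMyCommand = interface
    procedure Execute;
    procedure Undo;
  end;

  // Hypothetical command: prints instead of running SQL, can be told to fail.
  TPrintCommand = class(TInterfacedObject, IMyCommand)
  private
    FName: string;
    FFail: Boolean;
  public
    constructor Create(const AName: string; AFail: Boolean = False);
    procedure Execute;
    procedure Undo;
  end;

constructor TPrintCommand.Create(const AName: string; AFail: Boolean);
begin
  inherited Create;
  FName := AName;
  FFail := AFail;
end;

procedure TPrintCommand.Execute;
begin
  if FFail then
    raise Exception.Create(FName + ' failed');
  Writeln('execute ', FName);
end;

procedure TPrintCommand.Undo;
begin
  Writeln('undo ', FName);
end;

// Runs the commands in order; on the first failure, undoes the ones that
// already ran, last to first.
procedure RunAll(Commands: TList<IMyCommand>);
var
  Done: TStack<IMyCommand>;
  Cmd: IMyCommand;
begin
  Done := TStack<IMyCommand>.Create;
  try
    try
      for Cmd in Commands do
      begin
        Cmd.Execute;
        Done.Push(Cmd); // only successfully executed commands are recorded
      end;
    except
      on E: Exception do
      begin
        Writeln('error: ', E.Message);
        while Done.Count > 0 do
          Done.Pop.Undo;
      end;
    end;
  finally
    Done.Free;
  end;
end;

var
  Commands: TList<IMyCommand>;
begin
  Commands := TList<IMyCommand>.Create;
  try
    Commands.Add(TPrintCommand.Create('A'));
    Commands.Add(TPrintCommand.Create('B'));
    Commands.Add(TPrintCommand.Create('C', True)); // simulated failure
    RunAll(Commands);
  finally
    Commands.Free;
  end;
end.
```

Here A and B execute, C raises, and the undo loop then runs B followed by A. Note that an Undo call can itself fail, which this sketch does not handle.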
Note: The main reason why I cannot rely on the DB transactional behaviour is that I need to insert also big blobs, and every blob is downloaded from internet and then inserted, so I cannot leave a transaction open for ever because I want to commit every small change to the db. What I do with blobs is download one, insert it, download next, insert it, ...
So my question is: could you suggest a way to persist my objects to disk? I have Delphi 2009, so one option is to derive from TInterfacedPersistent and save the component to a stream and then to a file. However, this way I would have many files, with extra complications, while I would prefer a single file. Could you suggest something?
Edit: I realized TObjectStack is buggy in Delphi 2009 (Pop doesn't return a correct type), so the same can be done with TObjectStack.
I can't see a better approach than using a transaction. As Andrei K. mentioned, your implementation is NOT safe, so using StartTransaction, Commit and Rollback is a MUST!

Can I nest critical sections? Is TCriticalSection nestable?

I want to have two procedures which can call each other, or be called from whatever threads are running, but only have one running at a time. How can I do this? Will this work correctly?
var
cs: TCriticalSection;
procedure a;
begin
cs.Acquire;
try
// Execute single threaded here.
finally
cs.Release;
end;
end;
procedure b;
begin
cs.Acquire;
try
// Execute single threaded here. Maybe with calls to procedure a.
finally
cs.Release;
end;
end;
Yes, that will work. Procedure A can call B and vice versa within the same thread and while Thread A is using procedure A or B, Thread B has to wait when it wants to use those procedures.
See the MSDN documentation about Critical Sections: http://msdn.microsoft.com/en-us/library/ms682530%28VS.85%29.aspx
Critical sections can be nested, but for every call to Acquire you must have a call to Release. Because you have your Release call in a try .. finally clause you ensure that this happens, so your code is fine.
While it is possible on Windows to acquire a critical section multiple times, it is not possible on all platforms; some of them will block on the attempt to re-acquire a synchronization object.
There is not really a need to allow for "nesting" here. If you design your classes properly, in a way that the public interface acquires and releases the critical section, and the implementation methods don't, and if you make sure that implementation methods never call interface methods, then you don't need that particular feature.
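The design described above can be sketched as follows, assuming Delphi XE2 or later for the unit scope names. TGuarded, DoA and DoB are hypothetical names: the public methods take the lock, the private implementation methods never do, so the lock is never acquired recursively even though DoB calls DoA.

```pascal
program LockAtInterfaceSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.SyncObjs;

type
  TGuarded = class
  private
    FLock: TCriticalSection;
    FCalls: Integer;
    // Implementation methods: assume the lock is already held, never lock,
    // and never call the public methods.
    procedure DoA;
    procedure DoB;
  public
    constructor Create;
    destructor Destroy; override;
    // Public interface: the only place where the lock is taken.
    procedure A;
    procedure B;
    property Calls: Integer read FCalls;
  end;

constructor TGuarded.Create;
begin
  inherited Create;
  FLock := TCriticalSection.Create;
end;

destructor TGuarded.Destroy;
begin
  FLock.Free;
  inherited;
end;

procedure TGuarded.DoA;
begin
  Inc(FCalls);
end;

procedure TGuarded.DoB;
begin
  Inc(FCalls);
  DoA; // calls the implementation method, not the locking A
end;

procedure TGuarded.A;
begin
  FLock.Acquire;
  try
    DoA;
  finally
    FLock.Release;
  end;
end;

procedure TGuarded.B;
begin
  FLock.Acquire;
  try
    DoB;
  finally
    FLock.Release;
  end;
end;

var
  G: TGuarded;
begin
  G := TGuarded.Create;
  try
    G.A;
    G.B;
    Writeln(G.Calls); // prints 3: one from A, two from B
  finally
    G.Free;
  end;
end.
```

The split costs one extra method per operation, but it works the same way on platforms whose locks are not re-entrant.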
See also the Stack Overflow question "Recursive Lock (Mutex) vs Non-Recursive Lock (Mutex)" for some details on the bad sides of recursive mutex / critical section acquisition.
