How to properly wait for completion of NtCreateFile/etc? - driver

I am using the native NT API in my application to access files (NtCreateFile, etc.). To avoid dealing with STATUS_PENDING I use the FILE_SYNCHRONOUS_IO_NONALERT flag when opening the related file. So opening a file looks like this:
UNICODE_STRING fname = toNtUnicode(ntpath);
OBJECT_ATTRIBUTES oa;
InitializeObjectAttributes(&oa, &fname, 0, at.handle(), NULL);
HANDLE h;
IO_STATUS_BLOCK io_status;
NTSTATUS r = NtOpenFile(&h, GENERIC_READ | SYNCHRONIZE, &oa, &io_status,
                        FILE_SHARE_READ,
                        FILE_SYNCHRONOUS_IO_NONALERT | FILE_DIRECTORY_FILE);
if (r != STATUS_SUCCESS)
    ...; // error handling
Unfortunately, this causes the kernel to serialize all operations on the given handle, i.e. if I try to execute multiple reads in parallel (using multiple threads), only one request is processed at any point in time.
I could get rid of the serialization:
HANDLE h;
IO_STATUS_BLOCK io_status;
NTSTATUS r = NtOpenFile(&h, GENERIC_READ | SYNCHRONIZE, &oa, &io_status,
                        FILE_SHARE_READ, FILE_DIRECTORY_FILE);
if (r == STATUS_PENDING)
    ...; // what to do here???
but how exactly should I wait for completion -- WaitForSingleObject() on the file handle? As far as I know the handle can become signaled for many reasons -- is there any way to tell that my open file (or directory) operation has completed?
Similarly, if I submit multiple reads (from multiple threads) -- how can I tell which one (if any) has finished?

NtOpenFile is a synchronous API; it never returns STATUS_PENDING to you. Even if a driver returns STATUS_PENDING for IRP_MJ_CREATE, the I/O subsystem waits for the IRP to complete:
https://github.com/Zer0Mem0ry/ntoskrnl/blob/master/Io/iomgr/parse.c#L1404
So you never need to check for STATUS_PENDING after NtOpenFile, and you never need to wait. In principle you could not wait here anyway: you do not have a file handle yet, so you cannot wait on it or bind it to, say, an IOCP, and NtOpenFile accepts no event or other callback mechanism.
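Reads are a different story: on a handle opened for asynchronous I/O (no FILE_SYNCHRONOUS_IO_* flags), NtReadFile can return STATUS_PENDING, and the usual way to know which particular request finished is to give every operation its own event (or use an APC or an IOCP). A rough sketch of the per-operation-event pattern follows; the helper name and the dynamic prototype lookup are just for illustration, error handling omitted:

#include <windows.h>
#include <winternl.h>   // IO_STATUS_BLOCK, NTSTATUS

#ifndef STATUS_PENDING
#define STATUS_PENDING ((NTSTATUS)0x00000103L)
#endif

typedef NTSTATUS (NTAPI *NtReadFile_t)(
    HANDLE FileHandle, HANDLE Event, PVOID ApcRoutine, PVOID ApcContext,
    PIO_STATUS_BLOCK IoStatusBlock, PVOID Buffer, ULONG Length,
    PLARGE_INTEGER ByteOffset, PULONG Key);

// One dedicated event per read: it becomes signaled only when *this* request
// completes, so each thread can wait for exactly its own operation.
bool ReadAt(HANDLE hFile, void* buf, ULONG len, LONGLONG offset, ULONG_PTR& got)
{
    static NtReadFile_t pNtReadFile = (NtReadFile_t)GetProcAddress(
        GetModuleHandleW(L"ntdll.dll"), "NtReadFile");

    HANDLE hEvent = CreateEventW(NULL, TRUE, FALSE, NULL);
    IO_STATUS_BLOCK iosb = {};
    LARGE_INTEGER pos;
    pos.QuadPart = offset;

    NTSTATUS st = pNtReadFile(hFile, hEvent, NULL, NULL, &iosb,
                              buf, len, &pos, NULL);
    if (st == STATUS_PENDING)
    {
        WaitForSingleObject(hEvent, INFINITE);  // wait on the event, not the file handle
        st = iosb.Status;                       // final status stored by the I/O manager
    }
    CloseHandle(hEvent);

    got = iosb.Information;                     // bytes actually transferred
    return st >= 0;                             // NT_SUCCESS
}

If you use an IOCP instead, the IO_STATUS_BLOCK/OVERLAPPED pointer returned by GetQueuedCompletionStatus identifies which read finished.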


How to pass native void pointers to a Dart Isolate - without copying?

I am working on exposing an audio library (a C library) to Dart. To start the audio engine, a few initialization steps are required (non-blocking for the UI); then audio processing is triggered with a perform function, which is blocking (audio processing is a heavy task). That is why I started reading about Dart isolates.
My first thought was that I only needed to call the perform method in the isolate, but that doesn't seem possible, since the perform function takes the engine state as its first argument - this engine state is an opaque pointer (Pointer<Void> in dart:ffi). When trying to pass the engine state to a new isolate with the compute function, the Dart VM returns an error - it cannot pass C pointers to an isolate.
I could not find a way to pass this data to the isolate; I assume this is due to the separate memory of the main isolate and the one I'm creating.
So I should probably manage the entire engine state in the isolate, which means:
Create the engine state
Initialize it with some options (strings)
Trigger the perform function
Control audio at runtime
I couldn't find any example of how to perform these actions in the isolate while triggering them from the main thread/isolate, nor of how to manage isolate memory (keep the engine state and use it).
Here is a non-isolated example of what I want to do:
Pointer<Void> engineState = createEngineState();
initEngine(engineState, parametersString);
startEngine(engineState);
perform(engineState);
And at runtime, triggered by UI actions (like slider value changed, or button clicked) :
setEngineControl(engineState, valueToSet);
double controlValue = getEngineControl(engineState);
The engine state could be encapsulated in a class; I don't think it really matters here.
Whether it is a class or an opaque datatype, I can't figure out how to manage and keep this state while triggering operations on it from the main thread (with the processing done in the isolate). Any idea?
Thanks in advance.
PS: I notice, while writing, that my question/explanation may not be precise; I have to say I'm a bit lost here, since I have never used Dart isolates. Please tell me if some information is missing.
EDIT April 24th :
Creating and managing the object state inside the isolate seems to work, but the main problem isn't solved: because the perform method blocks until it has completed, there is no way for the isolate to keep receiving messages.
An option I thought of first was to use the performBlock method, which only performs one block of audio samples, like this:
while (performBlock(engineState)) {
  // listen for messages, and do something
}
But this doesn't seem to work; the process is still blocked until the audio performance finishes. Even if this loop is called in an async method in the isolate, it blocks and no messages are read.
I am now thinking about passing the Pointer<Void> managed in the main isolate to another isolate, which would then be the worker (for the perform method only), so that I could still trigger some control methods from the main isolate.
The isolate Dart package provides a registry sub-library to manage some shared memory, but it is still impossible to pass a void pointer between isolates:
[ERROR:flutter/lib/ui/ui_dart_state.cc(157)] Unhandled Exception: Invalid argument(s): Native objects (from dart:ffi) such as Pointers and Structs cannot be passed between isolates.
Has anyone already met this kind of situation ?
It is possible to get the address that a Pointer points to as a number and to construct a new Pointer from that address (see Pointer.address and Pointer.fromAddress()). Since numbers can be passed freely between isolates, this can be used to pass native pointers between them.
In your case that could be done, for example, like this (I used Flutter's compute to keep the example a bit simpler, but it would work with explicit Send/ReceivePorts as well):
// Callback to be used in a background isolate.
// Returns the address of the new engine.
// (Named initEngineIsolate so it does not shadow the FFI initEngine it calls.)
int initEngineIsolate(String parameters) {
  Pointer<Void> engineState = createEngineState();
  initEngine(engineState, parameters);
  startEngine(engineState);
  return engineState.address;
}

// Callback to be used in a background isolate.
// Does whichever processing is needed using the given engine
// (process() stands for your blocking perform() call).
void processWithEngine(int engineStateAddress) {
  final engineState = Pointer<Void>.fromAddress(engineStateAddress);
  process(engineState);
}

void main() async {
  // Initialize the engine in a background isolate.
  // compute() returns a Future, so it has to be awaited.
  final address = await compute(initEngineIsolate, "parameters");
  final engineState = Pointer<Void>.fromAddress(address);

  // Do some heavy computation in a background isolate using the engine.
  await compute(processWithEngine, engineState.address);
}
I ended up doing the processing of callbacks inside the audio loop itself.
while (performAudio()) {
  tasks.forEach((String key, List<int> value) {
    double val = getCallback(key);
    value.forEach((int element) {
      callbackPort.send([element, val]);
    });
  });
}
Here 'val' is the value you want to send to the callback, and the List<int> 'value' holds the callback indices.
Let's say your audio loop performs with a vector size of 512 samples: you will be able to dispatch your callbacks after every 512 processed samples, i.e. 48000 / 512 = 93.75 times per second (assuming your sample rate is 48000 Hz). This method is not the best one, but it works; I still have to see whether it holds up in a very intensive processing context. It was designed here for realtime audio, but it should work the same for audio rendering.
You can see the full code here : https://framagit.org/johannphilippe/csounddart/-/blob/master/lib/csoundnative.dart

AcceptEx() synchronous completion?

I am using IO completion ports and AcceptEx() whilst learning about servers, and am studying Len Holgate's free server framework to do this. He has the following code:
// Basically calls AcceptEx() via a previously obtained function pointer
if (!CMSWinSock::AcceptEx(
      m_listeningSocket,
      pSocket->m_socket,
      reinterpret_cast<void*>(const_cast<BYTE*>(pBuffer->GetBuffer())),
      bufferSize,
      sizeOfAddress,
      sizeOfAddress,
      &bytesReceived,
      pBuffer))
{
   const DWORD lastError = ::WSAGetLastError();

   if (ERROR_IO_PENDING != lastError)
   {
      Output(_T("CSocketServerEx::Accept() - AcceptEx: ") + GetLastErrorMessage(lastError));

      pSocket->Release();
      pBuffer->Release();
   }
}
else
{
   // Accept completed synchronously. We need to marshal the data received over to the
   // worker thread ourselves...
   m_iocp.PostStatus((ULONG_PTR)m_listeningSocket, bytesReceived, pBuffer);
}
I am confused about the "Accept completed synchronously" else-case. I have tried many times to get this code path to be hit (by pausing the code before I issue the AcceptEx, connecting, then resuming the code), but whenever I try, the call always fails with ERROR_IO_PENDING and I get my notification packet. Furthermore, I have read this MS knowledgebase article (which I may have misinterpreted), which states
Additionally, if a Winsock2 I/O call returns SUCCESS or IO_PENDING, it
is guaranteed that a completion packet will be queued to the IOCP when
the I/O completes
However, I am thinking this doesn't apply to AcceptEx(), because the docs for AcceptEx() state of the lpdwBytesReceived parameter:
This parameter is set only if the operation completes synchronously.
So it seems it can complete synchronously... can someone tell me how AcceptEx() can complete synchronously (i.e. how I can replicate it in my server)?
Additionally, if a Winsock2 I/O call returns SUCCESS or ERROR_IO_PENDING, it
is guaranteed that a completion packet will be queued to the IOCP when
the I/O completes
This applies to any I/O request if a completion port is associated with the file. But beginning with Windows Vista it also depends on the notification mode set for the file handle.
We first need to look at this from the native side.
By default, if FILE_SKIP_COMPLETION_PORT_ON_SUCCESS is not set, there are three cases depending on the returned NTSTATUS:
NT_SUCCESS(status), i.e. status >= 0 - a completion will be queued
NT_ERROR(status), i.e. status >= 0xC0000000 - no completion will be queued
NT_WARNING(status), i.e. 0x80000000 <= status < 0xC0000000 - unclear: if the error comes from the I/O manager (say STATUS_DATATYPE_MISALIGNMENT), there will be no completion; if it comes from the driver (say STATUS_NO_MORE_FILES), there will be a completion.
The Win32 layer usually checks for STATUS_PENDING separately and returns ERROR_IO_PENDING in that case (there are exceptions, like ReadDirectoryChangesW). Otherwise, if NT_ERROR(status), the API fails and sets the error code; otherwise it returns success. Note that the NT_WARNING(status) case is treated as success, yet if that warning came from the I/O manager there will be no completion. The I/O manager usually returns errors in the NT_ERROR(status) range when parameters are incorrect. The only case I know of (for an asynchronous API) is STATUS_DATATYPE_MISALIGNMENT, which can be returned for wrongly aligned buffers when the I/O manager has special knowledge about the required alignment - in NtNotifyChangeDirectoryFile (ReadDirectoryChangesW in Win32) or NtQueryDirectoryFile (no corresponding Win32 API). So the only case I know of where Win32 reports success but no completion is queued is calling ReadDirectoryChangesW with an unaligned lpBuffer (it must be DWORD-aligned): the I/O manager returns STATUS_DATATYPE_MISALIGNMENT, the Win32 layer interprets this as a success code and returns TRUE, but no completion packet ever arrives. This is a rare case, though - you would practically have to use misaligned structures on purpose. So in general, yes:
by default, if an I/O call returns SUCCESS or ERROR_IO_PENDING, a completion entry will be queued to the port (with the special exception case described above).
If we set FILE_SKIP_COMPLETION_PORT_ON_SUCCESS on the file object (note that this is per file object, not per file handle - the documentation is not exact here), everything becomes much simpler and more efficient: a completion entry is queued to the port if and only if the I/O request returns STATUS_PENDING (ERROR_IO_PENDING from the Win32 point of view), except for ReadDirectoryChangesW (and maybe some other APIs?) where the Win32 layer simply loses the return-code information.
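For example, a minimal sketch of how this mode is typically used with a socket on an IOCP (my own illustration, not code from the framework in question; error handling trimmed):

#include <winsock2.h>
#include <windows.h>

// Opt in once per socket (per file object), after associating it with the IOCP.
void EnableSkipOnSuccess(SOCKET s)
{
    SetFileCompletionNotificationModes((HANDLE)s,
        FILE_SKIP_COMPLETION_PORT_ON_SUCCESS | FILE_SKIP_SET_EVENT_ON_HANDLE);
}

void PostRecv(SOCKET s, WSABUF* buf, OVERLAPPED* ov)
{
    DWORD bytes = 0, flags = 0;
    if (WSARecv(s, buf, 1, &bytes, &flags, ov, NULL) == 0)
    {
        // Completed synchronously: with the skip flag set, no completion
        // packet will be queued, so 'bytes' must be consumed right here.
    }
    else if (WSAGetLastError() == WSA_IO_PENDING)
    {
        // Went pending: the result arrives via GetQueuedCompletionStatus().
    }
    // else: hard failure - release the buffer/operation state here.
}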
However, I am thinking this doesn't apply to AcceptEx()
You are mistaken; as I said, this applies to any I/O request. "This parameter is set only if the operation completes synchronously." - and so what?
If you look at the code snippet, it is clearly visible that the code assumes that when AcceptEx completes synchronously with no error, there will be no I/O completion. So either SetFileCompletionNotificationModes(m_listeningSocket, FILE_SKIP_COMPLETION_PORT_ON_SUCCESS) has been called, or the code is wrong - without that flag there will be an I/O completion in this case, m_iocp.PostStatus is not needed, and posting anyway would be a fatal error. However, I doubt this code uses FILE_SKIP_COMPLETION_PORT_ON_SUCCESS, so it is wrong. The error is simply never hit, because the driver-side implementation of AcceptEx (the underlying ioctl) never returns STATUS_SUCCESS: it checks the parameters, and if they are wrong it returns some error; otherwise it always returns STATUS_PENDING. As a result, for asynchronous sockets AcceptEx never returns TRUE and the code never reaches the else branch. But the code is still wrong. I also think the design is not the best: when we determine that no completion will be queued, it is better to call the completion routine directly with the returned error code instead of Release() (that will be done in the completion routine) or PostStatus - why post at all? Call it directly.
how AcceptEx() can complete synchronously
Very easily - if m_listeningSocket is a handle to a synchronous file object. However, in that case you cannot bind an IOCP to the file (it can only be bound to an asynchronous file object).
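For illustration only - such a synchronous listening socket would have to be created without the overlapped attribute (socket() always sets it, so you would have to use WSASocket explicitly):

// socket() always creates an overlapped socket; to get a synchronous file
// object the WSA_FLAG_OVERLAPPED flag has to be omitted explicitly.
SOCKET s = WSASocketW(AF_INET, SOCK_STREAM, IPPROTO_TCP,
                      NULL, 0, 0 /* no WSA_FLAG_OVERLAPPED */);
// On such a handle AcceptEx() can return TRUE (synchronous completion),
// but the handle cannot be associated with an IOCP.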
About the lpdwBytesReceived parameter: the system copies the Information member of the IO_STATUS_BLOCK (or, if you prefer, OVERLAPPED.InternalHigh) only when the operation has already completed. If pending is returned, that data is simply not ready and not filled in; you get the actual number of bytes transferred by the I/O in the completion.
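So on the worker-thread side the transferred byte count for a pended operation arrives with the completion packet, roughly like this (hIocp is a placeholder for your port handle):

DWORD bytes = 0;
ULONG_PTR key = 0;
OVERLAPPED* pov = NULL;

// 'bytes' receives the Information field of the IO_STATUS_BLOCK for the
// finished request; 'pov' tells you which outstanding operation it was.
if (GetQueuedCompletionStatus(hIocp, &bytes, &key, &pov, INFINITE))
{
    // dispatch on 'key' / 'pov' and use 'bytes' as the transferred count
}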

Multithreaded TIdHTTP file download with filestream results in corrupted file

I've followed quite a few examples of creating a multithreaded file download using the TIdHTTP component, but I'm stumped by the following problem.
But first, a simplified version of my code.
This part calculates the size of the file that needs to be downloaded:
TIdHTTP* tcpClient = new TIdHTTP(NULL);
tcpClient->ProtocolVersion = pv1_1;
tcpClient->Head(URL);
__int64 LSize = tcpClient->Response->ContentLength;
System::Classes::TFileStream *STFile = new System::Classes::TFileStream(FFileName, fmCreate);
try
{
    STFile->Size = LSize;
}
__finally
{
    delete STFile;
};
delete tcpClient;
Next is a part of the Execute method of a thread that gets called multiple times by my MainForm. FStartPos is the start position in the file for that thread (in other words, the first thread starts at position 0), and FEndPos is the end of the block that needs to be retrieved:
TFileStream *LStream = new TFileStream(FFileName, fmOpenWrite | fmShareDenyNone);
LHttpClient = new TIdHTTP(NULL);
LHttpClient->ProtocolVersion = pv1_1;
LHttpClient->Request->BasicAuthentication = true;
LHttpClient->Request->Username = FUsername;
LHttpClient->Request->Password = FPassword;
try
{
    LHttpClient->OnWork = ReceiveDataEvent;
    try
    {
        try
        {
            LStream->Seek(FStartPos, TSeekOrigin::soBeginning);
            LHttpClient->Request->Range = "bytes="+UnicodeString(FStartPos)+"-"+UnicodeString(FEndPos);
            LHttpClient->Get(URL, LStream);
            IsFin = true;
        }
        catch(Exception &e)
        {
            // log the error
        }
    }
    __finally
    {
        LHttpClient->Disconnect();
        delete LHttpClient;
    }
}
__finally
{
    delete LStream;
}
When I try to download e.g. an 87MB file, I create 5 download threads, so each thread should be downloading 17MB odd. What I see happening is that the file gets created (and reports 87MB in size). Usually thread number 3 finishes first and the file size jumps to 52MB, then thread 1 finishes and the file size jumps to 17MB, and lastly thread 4 finishes with the file now at 69MB (instead of the 87MB it started as).
I have a feeling I'm either using TFileStream incorrectly, or using it in a way that it wasn't designed to be used in.
My question is, is my code wrong? Or is there a better / more appropriate way to write from multiple threads to a single file, but each in its own block?
(I'm running C++ Builder 10.1 with the built in Indy 10 I think)
Thanks in advance for any advice.
-G-
The root of your problem is that your worker threads are using the fmOpenWrite flag when creating the TFileStream:
Open the file for writing only. Writing to the file completely replaces the current contents.
This means that each thread is wiping out whatever is already in the file and resetting its size to 0!
For what you are attempting, you need to use fmOpenReadWrite instead:
Open the file to modify the current contents rather than replace them.
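So the stream construction in the worker thread should look more like this (only the open mode changes; the share mode stays as in your code):

// fmOpenReadWrite keeps the existing (preallocated) contents and size intact.
TFileStream *LStream = new TFileStream(FFileName, fmOpenReadWrite | fmShareDenyNone);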
That being said, there are some other things to consider:
After Head() exits, you should make sure that the Response->AcceptRange property is set to "bytes". If it is not, DO NOT spawn multiple threads to download the file, as the server will ignore whatever you assign to the Request->Range property, and each request will download the entire file in full.
When you do spawn multiple threads for downloading ranges: after Get() has received the response headers, you should check the Response->ContentRange... properties to make sure you are getting what you expected, and to handle the case where the server sends you a smaller range than you requested. For instance, you can do that checking in the OnHeadersAvailable event, and if something unexpected happens you can cancel the download (by raising an exception) before anything gets written to the TFileStream. This is especially important for range handling, in case you get a 200 response (receiving the whole file) instead of a 206 response (receiving only a range of the file).
the Request->Range property is deprecated, you should be using the Request->Ranges property instead:
//LHttpClient->Request->Range = "bytes="+UnicodeString(FStartPos)+"-"+UnicodeString(FEndPos);
TIdEntityRange *range = LHttpClient->Request->Ranges->Add();
range->StartPos = FStartPos;
range->EndPos = FEndPos;

One thread showing interest in another thread (consumer / producer)

I would like to have the possibility for one thread (a consumer) to express interest in when another thread (the producer) produces something - but not all the time.
Basically I want to make a one-shot consumer. Ideally the producer thread would go merrily about its business until one (or many) consumers signal that they want something, in which case the producer would push some data into a variable and signal that it has done so. The consumer will wait until the variable has been filled.
It must also be possible for the one-shot consumer to decide that it has waited too long and abandon the wait (a la pthread_cond_timedwait).
I've been reading many articles and SO questions about different ways to synchronize threads. Currently I'm leaning towards a condition variable approach.
I would like to know if this is a good way to go about it (being a novice at thread programming I probably have quite a few bugs in there), or if it perhaps would be better to (ab)use semaphores for this situation? Or something else entirely? Just an atomic assign to a pointer variable if available? I currently don't see how these would work safely, probably because I'm trying to stay on the safe side, this application is supposed to run for months, without locking up. Can I do without the mutexes in the producer? i.e.: just signal a condition variable?
My current code looks like this:
consumer {
    pthread_mutex_lock(m);
    pred = true; /* signal interest */
    while (pred) {
        /* wait a bit and hopefully get an answer before timing out */
        pthread_cond_timedwait(c, m, t);
        /* it is possible that the producer never produces anything, in which
           case pred will stay true; we must "de-signal" interest here.
           Unfortunately that also means a spurious wake could make us miss
           a good answer, no? How to combat this? */
        pred = false;
    }
    /* if we got here, either an answer is available or we timed out */
    // ... (do things with the answer if not timed out, otherwise assign a default answer)
    pthread_mutex_unlock(m);
}

/* this thread is always producing, but it doesn't always have listeners */
producer {
    pthread_mutex_lock(m);
    /* if we have a listener */
    if (pred) {
        buffer = "work!";
        pred = false;
        pthread_cond_signal(c);
    }
    pthread_mutex_unlock(m);
}
NOTE: I'm on a modern linux and can make use of platform-specific functionality if necessary
NOTE2: I used the seemingly global variables m, c, and t. But these would be different for every consumer.
High-level recap
I want a thread to be able to register for an event, wait for it for a specified time and then carry on. Ideally it should be possible for more than one thread to register at the same time and all threads should get the same events (all events that came in the timespan).
What you want is something similar to a std::future in C++ (doc). A consumer requests a task to be performed by a producer using a specific function. That function creates a struct called a future (or promise) holding a mutex, a condition variable associated with the task, and a void pointer for the result, and returns it to the caller. It also puts that struct, the task id, and the parameters (if any) in a work queue handled by the producer.
struct future_s {
    pthread_mutex_t m;
    pthread_cond_t c;
    int flag;
    void *result;
};

// basic task outline
struct task_s {
    struct future_s result;
    int taskid;
};

// specific "mytask" task
struct mytask_s {
    struct future_s result;
    int taskid;
    int p1;
    float p2;
};

struct future_s *do_mytask(int p1, float p2) {
    // allocate task data
    struct mytask_s *t = alloc_task(sizeof(struct mytask_s));
    t->p1 = p1;
    t->p2 = p2;
    t->taskid = MYTASK_ID;
    task_queue_add(t);
    return (struct future_s *)t;
}
Then the producer pulls the task out of the queue, processes it and, once finished, puts the result in the future and signals the condition variable.
The consumer may wait for the future or do something else.
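As a rough sketch (my own helper names, building on the future_s struct above; the timeout handling mirrors the pthread_cond_timedwait usage from your code, error handling omitted):

#include <errno.h>
#include <pthread.h>
#include <time.h>

// Consumer side: wait until the producer delivers, or until 'deadline' passes.
// Returns the result, or NULL on timeout.
void *future_wait(struct future_s *f, const struct timespec *deadline) {
    void *res = NULL;
    pthread_mutex_lock(&f->m);
    while (!f->flag) {
        if (pthread_cond_timedwait(&f->c, &f->m, deadline) == ETIMEDOUT)
            break;                      /* give up; f->flag is still 0 */
    }
    if (f->flag)
        res = f->result;                /* delivered */
    pthread_mutex_unlock(&f->m);
    return res;
}

// Producer side: publish the result and wake the waiting consumer (if any).
void future_complete(struct future_s *f, void *result) {
    pthread_mutex_lock(&f->m);
    f->result = result;
    f->flag = 1;
    pthread_cond_signal(&f->c);         /* signaling while holding the mutex is fine */
    pthread_mutex_unlock(&f->m);
}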
For cancellable futures, include a flag in the struct to indicate that the task is cancelled. The future is then either:
delivered, the consumer is the owner and must deallocate it
cancelled, the producer remains the owner and disposes of it.
The producer must therefore check that the future has not been cancelled before triggering the condition variable.
For a "shared" future, the flag turns into a number of subscribers. If the number is above zero, the order must be delivered. The consumer owning the result is left to be decided between all consumers (First come first served? Is the result passed along to all consumers?).
Any access to the future struct must be mutexed (which works well with the condition variable).
Regarding the queues, they may be implemented using a linked list or an array (for versions with limited capacity). Since the functions creating the futures may be called concurrently, they have to be protected with a lock, which is usually implemented with a mutex.
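For completeness, a small sketch of such a lock-protected queue; task_queue_take and the node type are illustrative names, and the add function takes a void pointer so do_mytask above can pass its mytask_s directly:

#include <pthread.h>
#include <stdlib.h>

// One node per queued task; the payload is whatever struct the caller built
// (here, something starting with struct future_s, e.g. struct mytask_s).
struct task_node {
    void *task;
    struct task_node *next;
};

static struct task_node *q_head = NULL;
static struct task_node **q_tail = &q_head;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_nonempty = PTHREAD_COND_INITIALIZER;

void task_queue_add(void *task) {
    struct task_node *n = (struct task_node *)malloc(sizeof(*n));
    n->task = task;
    n->next = NULL;
    pthread_mutex_lock(&q_lock);
    *q_tail = n;                        /* append at the tail */
    q_tail = &n->next;
    pthread_cond_signal(&q_nonempty);   /* wake the producer thread */
    pthread_mutex_unlock(&q_lock);
}

void *task_queue_take(void) {           /* producer: blocks until work arrives */
    pthread_mutex_lock(&q_lock);
    while (q_head == NULL)
        pthread_cond_wait(&q_nonempty, &q_lock);
    struct task_node *n = q_head;
    q_head = n->next;
    if (q_head == NULL)
        q_tail = &q_head;
    pthread_mutex_unlock(&q_lock);
    void *task = n->task;
    free(n);
    return task;
}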

Resetting comm event mask

I have been doing overlapped serial port communication in Delphi lately and there is one problem I'm not sure how to solve.
I communicate with a modem. I write a request frame (an AT command) to the modem's COM port and then wait for the modem to respond. The event mask of the port is set to EV_RXCHAR, so when I write a request, I call WaitCommEvent() and start waiting for data to appear in the input queue. When the overlapped wait for the event finishes, I immediately start reading data from the queue and read everything the device sends at once:
1) write a request
2) call WaitCommEvent() and wait until waiting finishes
3) read all the data that the device sends (not only the data being in the input queue at that moment)
4) do something and then goto 1
Waiting for the event finishes after the first byte appears in the input queue. During my read operation, however, more bytes appear in the queue and each of them causes the internal event flag to be set. This means that when I have read all the data from the queue and then call WaitCommEvent() a second time, it returns immediately with the EV_RXCHAR mask set, even though there is no data to be read.
How should I handle reading and waiting for the event so that the event mask returned by WaitCommEvent() is always valid? Is it possible to reset the flags of the serial port so that when I have read all the data from the queue and call WaitCommEvent() afterwards, it does not return immediately with a mask that was only valid before I read the data?
The only solution that comes to my mind is this:
1) write a request
2) call WaitCommEvent() and wait until waiting finishes
3) read all the data that the device sends (not only the data being in the input queue at that moment)
4) call WaitCommEvent() which should return true immediately at the same time resetting the event flag set internally
5) do something and goto 1
Is it a good idea or is it stupid? Of course I know that the modem almost always finishes its answers with CRLF characters so I could set the comm mask to EV_RXFLAG and wait for the #10 character to appear, but there are many other devices with which I communicate and they do not always send frame end characters.
Your help will be appreciated. Thanks in advance!
Mariusz.
Your solution does sound workable. I just use a state machine to handle the transitions.
(pseudocode)
ioState := ioIdle;
while (ioState <> ioFinished) and (not aborted) do
  case ioState of
    ioIdle     : if there is data to read then set state to ioMidFrame
    ioMidFrame : if data to read then read; if end of frame then set to ioEndFrame
    ioEndFrame : process the data and set to ioFinished
    ioFinished : ; // don't do anything, for doc purposes only
  end;
