I am using IO completion ports and AcceptEx() whilst learning about servers, and am studying Len Holgate's free server framework to do this. He has the following code:
// Basically calls AcceptEx() via a previously obtained function pointer
if (!CMSWinSock::AcceptEx(
m_listeningSocket,
pSocket->m_socket,
reinterpret_cast<void*>(const_cast<BYTE*>(pBuffer->GetBuffer())),
bufferSize,
sizeOfAddress,
sizeOfAddress,
&bytesReceived,
pBuffer))
{
const DWORD lastError = ::WSAGetLastError();
if (ERROR_IO_PENDING != lastError)
{
Output(_T("CSocketServerEx::Accept() - AcceptEx: ") + GetLastErrorMessage(lastError));
pSocket->Release();
pBuffer->Release();
}
}
else
{
// Accept completed synchronously. We need to marshal the data recieved over to the
// worker thread ourselves...
m_iocp.PostStatus((ULONG_PTR)m_listeningSocket, bytesReceived, pBuffer);
}
I am confused about the "Accept completed synchronously" else-case. I have tried many times to get this code path to be hit (by pausing the code before I issue the AcceptEx, connecting, then resuming the code), but whenever I try the call always fails with ERROR_IO_PENDING and I get my notification packet. Furthermore, I have read this MS knowledgebase article (which I may have misinterpreted) which states
Additionally, if a Winsock2 I/O call returns SUCCESS or IO_PENDING, it
is guaranteed that a completion packet will be queued to the IOCP when
the I/O completes
However, I am thinking this doesn't apply to AcceptEx() because the dox for AcceptEx() states of the parameter lpdwBytesReceived
This parameter is set only if the operation completes synchronously.
So it seems it can complete synchronously...can someone tell me how AcceptEx() can complete synchronously (i.e. how I can replicate it in my server?)
Additionally, if a Winsock2 I/O call returns SUCCESS or ERROR_IO_PENDING, it
is guaranteed that a completion packet will be queued to the IOCP when
the I/O completes
this is apply for any I/O request if completion port is associated with the file. but begin from windows vista this also depend from notification mode set for a file handle.
but need first begin look from native view.
by default, if FILE_SKIP_COMPLETION_PORT_ON_SUCCESS not set, exist 3 case by returned NTSTATUS status :
NT_SUCCESS(status) or status >= 0 - will be completion
NT_ERROR(status) or status >= 0xc0000000 - will be no completion
NT_WARNING(status) or status < 0xc0000000 - unclear - if this
error from I/O manager (say - STATUS_DATATYPE_MISALIGNMENT - will
be no completion). if this error from driver (say
STATUS_NO_MORE_FILES - will be completion).
the win32 layer usually separate check for STATUS_PENDING and return ERROR_IO_PENDING in this case (but exist and exceptions, like ReadDirectoryChangesW). otherwise in case NT_ERROR(status) api return fail and set error code. otherwise return success. visible that case NT_WARNING(status) considered as success, but in this case, if error from I/O manager, will be no completion. I/O usually return errors from NT_ERROR(status) range, if parameters is incorrect. only case which i know (for asynchronous api) - STATUS_DATATYPE_MISALIGNMENT can be returned in case wrong aligned buffers, when I/O manager have special knowledge about buffer align. in NtNotifyChangeDirectoryFile
(ReadDirectoryChangesW for win32) or NtQueryDirectoryFile (no corresponded win32 api). so only case which i know when will be no completion, when win32 return success - call ReadDirectoryChangesW with unaligned lpBuffer (it must be DWORD-aligned ) - in this case I/O manager just return STATUS_DATATYPE_MISALIGNMENT but win32 layer interpret this as success code and return true. but will be no completion in this case. however this is rarely case and you probably need use wrong align structures for this. so in general yes:
by default if I/O call returns SUCCESS or ERROR_IO_PENDING will be queued a completion entry to the port. (with special exception case which i try describe)
if we set FILE_SKIP_COMPLETION_PORT_ON_SUCCESS on file object (note this is per file object but not per file handle - documentation not exactly here) all become much more simply and efficient - completion entry will be queue to the port - when and only when I/O request return STATUS_PENDING. ERROR_IO_PENDING from win32 view (except ReadDirectoryChangesW (maybe some else api ?) where win32 layer simply lost return code information)
However, I am thinking this doesn't apply to AcceptEx()
you mistake. this, how i say, apply to any io request. "This parameter is set only if the operation completes synchronously." - and so what ?
if look to code snippet, clear visible that code assume - in case AcceptEx completed synchronous and no error occurs - will be no io completion. or SetFileCompletionNotificationModes(m_listeningSocket, FILE_SKIP_COMPLETION_PORT_ON_SUCCESS) called or code is wrong - will be io completion in this case and not need m_iocp.PostStatus - this is fatal error will be. however i doubt that code used FILE_SKIP_COMPLETION_PORT_ON_SUCCESS - so it wrong. but error never raised because driver side implementation of AcceptEx (underlining ioctl) never return STATUS_SUCCESS: it check parameters - if it wrong - just return some error, otherwise always return STATUS_PENDING. as result, for asynchronous sockets AcceptEx never return true and code never jump to error else case. but anyway code is wrong. also i think design not the best - in case we determinate will be no completion - better just direct call completion routine with returned error code instead Release() (this will be done in completion routine) or PostStatus - for what post ?! - call direct.
how AcceptEx() can complete synchronously
very easy - if m_listeningSocket is handle to synchronous file object. however in this case you can not bind IOCP to file (it can be bind only in case asynchronous file object).
about lpdwBytesReceived parameter - system copy Information member of IO_STATUS_BLOCK or if want OVERLAPPED.InternalHigh, in case operation is completed just. in case pending returned - this data simply not ready and not filled. you got actual number of bytes returned by io in completion
Related
I am working on exposing an audio library (C library) for Dart. To trigger the audio engine, it requires a few initializations steps (non blocking for UI), then audio processing is triggered with a perform function, which is blocking (audio processing is a heavy task). That is why I came to read about Dart isolates.
My first thought was that I only needed to call the performance method in the isolate, but it doesn't seem possible, since the perform function takes the engine state as first argument - this engine state is an opaque pointer ( Pointer in dart:ffi ). When trying to pass engine state to a new isolate with compute function, Dart VM returns an error - it cannot pass C pointers to an isolate.
I could not find a way to pass this data to the isolate, I assume this is due to the separate memory of main isolate and the one I'm creating.
So, I should probably manage the entire engine state in the isolate which means :
Create the engine state
Initialize it with some options (strings)
trigger the perform function
control audio at runtime
I couldn't find any example on how to perform this actions in the isolate, but triggered from main thread/isolate. Neither on how to manage isolate memory (keep the engine state, and use it). Of course I could do
Here is a non-isolated example of what I want to do :
Pointer<Void> engineState = createEngineState();
initEngine(engineState, parametersString);
startEngine(engineState);
perform(engineState);
And at runtime, triggered by UI actions (like slider value changed, or button clicked) :
setEngineControl(engineState, valueToSet);
double controleValue = getEngineControl(engineState);
The engine state could be encapsulated in a class, I don't think it really matters here.
Whether it is a class or an opaque datatype, I can't find how to manage and keep this state, and perform triggers from main thread (processed in isolate). Any idea ?
In advance, thanks.
PS: I notice, while writing, that my question/explaination may not be precise, I have to say I'm a bit lost here, since I never used Dart Isolates. Please tell me if some information is missing.
EDIT April 24th :
It seems to be working with creating and managing object state inside the Isolate. But the main problem isn't solved. Because the perform method is actually blocking while it is not completed, there is no way to still receive messages in the isolate.
An option I thought first was to use the performBlock method, which only performs a block of audio samples. Like this :
while(performBlock(engineState)) {
// listen messages, and do something
}
But this doesn't seem to work, process is still blocked until audio performance finishes. Even if this loop is called in an async method in the isolate, it blocks, and no message are read.
I now think about the possibility to pass the Pointer<Void> managed in main isolate to another, that would then be the worker (for perform method only), and then be able to trigger some control methods from main isolate.
The isolate Dart package provides a registry sub library to manage some shared memory. But it is still impossible to pass void pointer between isolates.
[ERROR:flutter/lib/ui/ui_dart_state.cc(157)] Unhandled Exception: Invalid argument(s): Native objects (from dart:ffi) such as Pointers and Structs cannot be passed between isolates.
Has anyone already met this kind of situation ?
It is possible to get an address which this Pointer points to as a number and construct a new Pointer from this address (see Pointer.address and Pointer.fromAddress()). Since numbers can freely be passed between isolates, this can be used to pass native pointers between them.
In your case that could be done, for example, like this (I used Flutter's compute to make the example a bit simpler but that would apparently work with explicitly using Send/ReceivePorts as well)
// Callback to be used in a backround isolate.
// Returns address of the new engine.
int initEngine(String parameters) {
Pointer<Void> engineState = createEngineState();
initEngine(engineState, parameters);
startEngine(engineState);
return engineState.address;
}
// Callback to be used in a backround isolate.
// Does whichever processing is needed using the given engine.
void processWithEngine(int engineStateAddress) {
final engineState = Pointer<Void>.fromAddress(engineStateAddress);
process(engineState);
}
void main() {
// Initialize the engine in a background isolate.
final address = compute(initEngine, "parameters");
final engineState = Pointer<Void>.fromAddress(address);
// Do some heavy computation in a background isolate using the engine.
compute(processWithEngine, engineState.address);
}
I ended up doing the processing of callbacks inside the audio loop itself.
while(performAudio())
{
tasks.forEach((String key, List<int> value) {
double val = getCallback(key);
value.forEach((int element) {
callbackPort.send([element, val]);
});
});
}
Where the 'val' is the thing you want to send to callback. The list of int 'value' is a list of callback index.
Let's say you audio loop performs with vector size of 512 samples, you will be able to pass your callbacks after every 512 audio samples are processed, which means 48000 / 512 times per second (assuming you sample rate is 48000). This method is not the best one but it works, I still have to see if it works in very intensive processing context though. Here, it has been thought for realtime audio, but it could work the same for audio rendering.
You can see the full code here : https://framagit.org/johannphilippe/csounddart/-/blob/master/lib/csoundnative.dart
I am using native NT API in my application to access files (NtCreateFile/etc). In order to avoid dealing with STATUS_PENDING I am using FILE_SYNCHRONOUS_IO_NONALERT flag when opening related file. So, opening file looks like this:
UNICODE_STRING fname = toNtUnicode(ntpath);
OBJECT_ATTRIBUTES oa;
InitializeObjectAttributes(&oa, &fname, 0, at.handle(), NULL);
HANDLE h;
IO_STATUS_BLOCK io_status;
NTSTATUS r = NtOpenFile(&h, GENERIC_READ|SYNCHRONIZE, &oa, &io_status,
FILE_SHARE_READ, FILE_SYNCHRONOUS_IO_NONALERT|FILE_DIRECTORY_FILE);
if (r != STATUS_SUCCESS)
...; // error handling
Unfortunately, it causes kernel to serialize all operations on given handle. I.e. if I try to execute multiple reads in parallel (using multiple threads) -- only one request will be processed at any point in time.
I could get rid of serialization:
HANDLE h;
IO_STATUS_BLOCK io_status;
NTSTATUS r = NtOpenFile(&h, GENERIC_READ|SYNCHRONIZE, &oa, &io_status,
FILE_SHARE_READ, FILE_DIRECTORY_FILE);
if (r == STATUS_PENDING)
...; // what to do here???
but how exactly should I wait for completion -- WaitForSingleObject() on file handle? As far as I know it can change to signaled state due to many reasons -- is there any way to tell that my open file (or dir) operation completed?
Similarly, if I submit multiple reads (from multiple threads) -- how can I tell which one (if any) has finished?
NtOpenFile is synchronous api. it never return STATUS_PENDING to you. even if driver return STATUS_PENDING for IRP_MJ_CREATE i/o sub-system will be wait for IRP complete
https://github.com/Zer0Mem0ry/ntoskrnl/blob/master/Io/iomgr/parse.c#L1404
so you never need check for STATUS_PENDING after NtOpenFile and never need wait (and in principle we can not wait here - we yet not have file handle -so can not wait on it or bind it to say IOCP. we not pass any event or another callback mechanism for NtOpenFile)
A 3rd party data loss prevention driver when enabled driver verifier on it causes driver verifier bugcheck based on IrqlZwPassive Rule
The crash includes the following information:
ZwOpenKey should only be called at IRQL = PASSIVE_LEVEL.
What are some of the potential impacts to a Windows system if ZwOpenKey is used outside of IRQL=PASSIVE_LEVEL?
Is this always a serious problem that we should raise with a vendor, or only in certain scenarios.
all Zw api in kernel must be called only on PASSIVE_LEVEL. this is by design. if call it on APC_LEVEL this already will be UB some times this can work, some times produce hang or crash. say in case ZwOpenKey - registry manager can read key data from disk, if it still not in memory. so pass IRP to filesystem and wait for it complete. but Irp for completion can insert special APC (IopCompleteRequest) in calling thread. if thread on APC level - APC will not be executed, until IRQL of thread not lower to passive. but it never done - he wait on IRP complete..
another point - on exit from Zw service, system check - are UserApcPending in Thread and if yes, raise IRQL to APC_LEVEL, initiate user apc, and lower it back to PASSIVE_LEVEL (system assume that Zw called on PASSIVE_LEVEL) - so you can enter to Zw api at APC_LEVEL and exit on PASSIVE_LEVEL. can ask - why thread at some time have APC_LEVEL ? simply, because nothing to do IRQL raised ? or exist some requirements why at some point must be APC_LEVEL ? if yes, what is be if situation require stay on APC_LEVEL but thread ahead of time lower IRQL to PASSIVE_LEVEL ? really UB. in most case can be nothing. but in some case can be very nasty bug which very hard catch and research.
TWaitResult.wrIOCompletion is undocumented. Does anyone know when and how it's used?
It is used only on Windows, by THandleObject (and its descendants TEvent, TSimpleEvent, TMutex, and TSemaphore) in the following methods:
THandleObject.WaitFor(). When the object is created with UseCOMWait set to True, the wait is handled by the Win32 API CoWaitForMultipleHandles() function, where wrIOCompletion is returned if RPC_S_CALLPENDING is reported. When UseCOMWait is False instead, the wait is handled by the Win32 API WaitForMultipleObjectsEx() function 1.
THandleObject.WaitForMultiple(). wrIOCompletion is returned if the UseCOMWait parameter is True and CoWaitForMultipleHandles() reports RPC_S_CALLPENDING, or when UseCOMWait is False and WaitForMultipleObjectsEx() reports WAIT_IO_COMPLETION.
Either way, the meaning is the same 2:
The wait was ended by one or more user-mode asynchronous procedure calls (APC) queued to the thread
Refer to MSDN for more details about APC queues:
Asynchronous Procedure Calls
Alertable I/O
In a nutshell, an Alertable I/O or APC operation allows a user-defined operation/function to be queued in a thread so it will be called by the thread when in a safe state to make such a call. wrIOCompletion indicates that the thread that is calling WaitFor/Multiple() had to stop waiting before the timeout elapsed so it could execute one or more queued Alertable/APC functions. The thread will have to call WaitFor/Multiple() again to finish waiting on its desired object(s) 3.
1: THandleObject.WaitFor() does not currently handle WAIT_IO_COMPLETION (bug?) when UseCOMWait is false. It will return wrError instead (and the value of the THandleObject.LastError property will not be assigned!)
2: the CoWaitForMultipleHandles() documentation describes RPC_S_CALLPENDING as "The timeout period elapsed before the required handle or handles were signaled", but that would be a more appropriate description for RPC_E_TIMEOUT instead. (documentation error?)
3: in practice wrIOCompletion should never happen, because CoWaitForMultipleHandles() is never called with the COWAIT_ALERTABLE flag:
If the COWAIT_ALERTABLE flag is set in dwFlags, a value of WAIT_IO_COMPLETION indicates the wait was ended by one or more user-mode asynchronous procedure calls (APC) queued to the thread.
And WaitForMultipleObjectsEx() is never called with its bAlertable parameter set to True:
bAlertable [in]
If this parameter is TRUE and the thread is in the waiting state, the function returns when the system queues an I/O completion routine or APC, and the thread runs the routine or function. Otherwise, the function does not return and the completion routine or APC function is not executed.
These conditions are needed to trigger the wrIOCompletion result. I have opened a bug report for this in Quality Portal:
RSP-14047 THandleObject never returns wrIOCompletion.
I have been doing overlapped serial port communication in Delphi lately and there is one problem I'm not sure how to solve.
I communicate with a modem. I write a request frame (an AT command) to the modem's COM port and then wait for the modem to respond. The event mask of the port is set to EV_RXCHAR, so when I write a request, I call WaitCommEvent() and start waiting for data to appear in the input queue. When overlapped waiting for event finishes, I immediately start reading data from the queue and read all that the device sends at once:
1) write a request
2) call WaitCommEvent() and wait until waiting finishes
3) read all the data that the device sends (not only the data being in the input queue at that moment)
4) do something and then goto 1
Waiting for event finishes after first byte appears in the input queue. During my read operation, however, more bytes appear in the queue and each of them causes an internal event flag to be set. This means that when I read all the data from the queue and then call WaitCommEvent() for the second time, it will immediately return with EV_RXCHAR mask, even though there is no data to be read.
How should I handle reading and waiting for event to be sure that the event mask returned by WaitCommEvent() is always valid? Is it possible to reset the flags of the serial port so that when I read all data from the queue and call WaitCommEvent() after then, it will not return immediately with a mask that was valid before I read the data?
The only solution that comes to my mind is this:
1) write a request
2) call WaitCommEvent() and wait until waiting finishes
3) read all the data that the device sends (not only the data being in the input queue at that moment)
4) call WaitCommEvent() which should return true immediately at the same time resetting the event flag set internally
5) do something and goto 1
Is it a good idea or is it stupid? Of course I know that the modem almost always finishes its answers with CRLF characters so I could set the comm mask to EV_RXFLAG and wait for the #10 character to appear, but there are many other devices with which I communicate and they do not always send frame end characters.
Your help will be appreciated. Thanks in advance!
Mariusz.
Your solution does sound workable. I just use a state machine to handle the transitions.
(psuedocode)
ioState := ioIdle;
while (ioState <> ioFinished) and (not aborted) do
Case ioState of
ioIdle : if there is data to read then set state to ioMidFrame
ioMidframe : if data to read then read, if end of frame set to ioEndFrame
ioEndFrame : process the data and set to ioFinished
ioFinished : // don't do anything, for doc purposes only.
end;