Unordered socket read & close notification using IOCP

Most server frameworks and examples that use sockets with I/O completion ports deliver notifications in a way whose purpose I can't completely figure out.
When reads complete, the packets are processed. Usually they are reordered first, because thread scheduling can cause completions to be processed out of order even though the IOCP itself is a FIFO queue.
The problem arises when a socket is closed, either gracefully or because of an error. In both situations I have seen that, again because of the OS thread scheduler, the close notification may be delivered to the application (e.g. an HTTP server using the framework) before the notification for data that was read earlier.
I think the close notification should be queued so that the application receives it after all previous reads.
Is there an intended purpose behind the behavior in most of the code I saw, or could my approach be correct depending on the situation?

What you suggest makes sense and I would imagine that any code that handles graceful close (a read returning 0 bytes) would do so by processing it after any preceding successful read. Errors coming out of GetQueuedCompletionStatus(), such as connection reset errors, are harder to integrate into the receive flow because they occur out of band as far as the received data is concerned. Your question's a bit vague and depends very much on the code you're using and how you (or the people who wrote that code) want to handle these things. There is no single correct way, IMHO.
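One way to get that ordering is to tag each read with a sequence number when it is posted and to release completions to the application strictly in that order, so that a zero-byte read (graceful close) is held back until every earlier read has been delivered. The following is a minimal, portable sketch under those assumptions; `reorder_t`, `reorder_complete` and `trace_deliver` are hypothetical names, not part of any IOCP framework, and a real server would guard each connection's state with a lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-connection reorder buffer (illustrative names only).
   Every read gets a sequence number when it is POSTED; worker threads may
   hand us the completions in any order. bytes == 0 means graceful close. */
#define WINDOW 8   /* assumed maximum number of outstanding reads */

typedef struct {
    int  present;  /* slot holds a completion that is waiting for its turn */
    long bytes;    /* bytes read; 0 means the peer closed the connection */
} slot_t;

typedef struct {
    unsigned next_seq;       /* next sequence number to deliver */
    slot_t   slots[WINDOW];  /* pending completions, indexed by seq % WINDOW */
    int      closed;         /* set once the close has been delivered */
} reorder_t;

typedef void (*deliver_fn)(unsigned seq, long bytes, void *user);

void reorder_init(reorder_t *r)
{
    *r = (reorder_t){0};
}

/* Record one completion, then deliver everything that is now in order.
   Returns the number of completions handed to the application.
   NOTE: not thread-safe; the caller must hold the connection's lock. */
int reorder_complete(reorder_t *r, unsigned seq, long bytes,
                     deliver_fn deliver, void *user)
{
    assert(seq - r->next_seq < WINDOW);  /* must be inside the window */

    slot_t *s = &r->slots[seq % WINDOW];
    s->present = 1;
    s->bytes = bytes;

    int delivered = 0;
    while (!r->closed && r->slots[r->next_seq % WINDOW].present) {
        slot_t *cur = &r->slots[r->next_seq % WINDOW];
        cur->present = 0;
        deliver(r->next_seq, cur->bytes, user);
        if (cur->bytes == 0)
            r->closed = 1;               /* nothing is delivered after close */
        r->next_seq++;
        delivered++;
    }
    return delivered;
}

/* Hypothetical callback that records the order of delivery. */
typedef struct {
    int  count;
    long bytes[WINDOW];
} trace_t;

void trace_deliver(unsigned seq, long bytes, void *user)
{
    trace_t *t = user;
    (void)seq;
    if (t->count < WINDOW)
        t->bytes[t->count++] = bytes;
}
```

Feeding it the completions in the order 2 (close), 1, 0 delivers nothing until sequence 0 arrives; the application then sees 0, 1 and the close, in that order. Whether out-of-band errors should be given a sequence number in the same way depends on the framework.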

Related

Time to send SDO

I am working on a CANopen architecture and have three questions:
1- When the 'synchronous window' is closed until the next SYNC message, should we send the SDO message? Or must we not send any message during this period?
2- Is it possible not to send the PDO message during the synchronous window?
3- What is the answer that the slaves give in the SYNC message?
Disclaimer: I don't have exact answers but I just wanted to share my assumptions & thoughts.
CiA 301 doesn't mention the relation between synchronous window and SDOs. In normal operation after the initial configuration, one may assume that SDOs aren't present on the system, or at least they are rare compared to PDOs. Although not strictly necessary, SDOs are generally initiated by a device which has the master role, and that device also produces the SYNC messages (again, it's not strictly necessary but it's the usual/common implementation). So, the master device may adjust the timing of SDO requests according to the synchronous window.
Here is a quote from CiA 301:
If the synchronous window length expires all synchronous TPDOs may be
discarded and an EMCY message may be transmitted; all synchronous
RPDOs may be discarded until the next SYNC message is received.
Synchronous RPDO processing is resumed with the next SYNC message.
CiA 301 uses the word "may" (see the quote above). So I'm not sure if it's mandatory or not. In my opinion, it makes sense to follow the advice and abort synchronous TPDO transmissions after the synchronous window and send an EMCY message. Event-driven (non-synchronous) TPDOs can be sent within or after the synchronous window.
There is no direct response to SYNC messages. On SYNC reception, SYNC consumers (slaves) sample their inputs, drive their outputs according to the previous RPDOs, and start transmitting their TPDOs containing the previous samples (or the current ones? I'm not sure about this).
Synchronous windows are for specific PDO synchronization only. For hard real-time systems, data might be required to arrive within certain fixed time intervals - not too early, not too late. That is, the window acts as a real-time deadline. If such features are enabled, you need to take that into consideration when doing the specific CANopen bus implementation.
For example, if some SDO communication occupied the bus so that the PDO couldn't meet its time window, that would be a problem. But this is easily solved by giving the PDO a lower COB-ID than the SDO, which should already be the case in most default device profile setups like "DS401 GPIO module". Other than that, you would have to make sure that the bus load is not excessive and that nodes don't hang or get busy doing other things.
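To illustrate the priority argument: on CAN, when several nodes start transmitting at the same time, the frame with the numerically lowest identifier wins arbitration. The sketch below uses the function-code bases of the CiA 301 pre-defined connection set; the helper names are hypothetical, and it models only arbitration between frames that are queued at the same moment, since a frame that is already on the bus is not interrupted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Function-code bases from the CiA 301 pre-defined connection set.
   Except for SYNC, the node ID (1..127) is added to the base. */
enum {
    COB_SYNC   = 0x080,
    COB_TPDO1  = 0x180,
    COB_RPDO1  = 0x200,
    COB_SDO_TX = 0x580,  /* SDO response, server -> client */
    COB_SDO_RX = 0x600   /* SDO request,  client -> server */
};

/* Hypothetical helper: build the default COB-ID for a node. */
uint16_t cob_id(uint16_t base, uint8_t node_id)
{
    assert(node_id >= 1 && node_id <= 127);
    return (uint16_t)(base + node_id);
}

/* Hypothetical model of CAN arbitration: among frames contending at the
   same time, the lowest identifier wins. Returns the winner's index. */
size_t arbitration_winner(const uint16_t *ids, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (ids[i] < ids[best])
            best = i;
    }
    return best;
}
```

With the default identifiers for node 5, a pending TPDO1 (0x185) wins against both the SDO request (0x605) and the SDO response (0x585), so queued SDO traffic can delay a PDO by at most the frame that is currently being transmitted.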
In systems with hard real-time requirements you probably don't want to allow any SDO communication during operational mode to begin with.
What is the answer that the slaves give in the SYNC message?
That question doesn't make any sense. You need to study what the SYNC message does and what it is for.

How does error handling work in SCTP Sockets API Extensions?

I have been trying to implement a wrapper library for the Linux interface to SCTP sockets, and I am not sure how to integrate the asynchronous style of errors (where they are delivered via events). All example code I have seen, if it deals with the errors at all, simply prints out the information related to the error when it is received, but inserting error-handling code there seems like it would be ineffective, because by that point all of the context related to the original message which was sent has been lost and only a 32-bit integer sinfo_context remains. It also seems that there is no way to directly tell when a given message has been acknowledged successfully by the remote peer, which would make it impossible to implement an approach which listens for errors after sending a message, because the context information for successfully-delivered messages could never be freed.
Is there a way to handle the errors related to a given sending operation as part of the call to a send function, or is there a different way to approach error handling for SCTP which does not lose the context of the error?
One solution which I have considered is using the SCTP_SENDER_DRY notification to tell when packets have been sent, however this requires sending only one packet at a time. Another idea is to use the peer's receiver window size together with the sinfo_cumtsn field of sctp_sndrcvinfo to calculate how much data has been acknowledged as fully received using the cumulative TSN, however there are a couple of disadvantages to this: first, it requires bookkeeping overhead to calculate a number of bytes received by the peer based on the cumulative TSN (especially if the peer's window size may change); second, it requires waiting until all earlier packets were received before reporting success, which seems to defeat the purpose of SCTP's multistreaming; and third, it seems like it would not work for unordered packets.

Erlang dead letter queue

Let's say my Erlang application receives an important message from the outside (through an exposed API endpoint, for example). Due to a bug in the application or an incorrectly formatted message the process handling the message crashes.
What happens to the message? How can I influence what happens to the message? And what happens to the other messages waiting in the process mailbox? Do I have to introduce a hierarchy of processes just to make sure that no messages are lost?
Is there something like Akka's dead letter queue in Erlang? Let's say I want to handle the message later - either by fixing the message or fixing the bug in the application itself, and then rerunning the message processing.
I am surprised how little information about this topic is available.
There is no information because there is no dead letter queue. If your application crashed while processing your message, the message had already been received, so why would it go to a dead letter queue (if one existed)?
Such a queue would be a major scalability issue without much use (you would get arbitrary messages which couldn't be delivered and would be totally out of context).
If you need to make sure a message is processed, you usually use a mechanism that returns a reply once the message has been processed, such as a gen_server call.
And if your messages are so important that losing them would be a catastrophe, you should probably persist them in an external DB, because otherwise, if your computer crashes, what would happen to all the messages in transit?

Erlang/OTP framework's error_logger hangs under fairly high load

My application is basically a content based router which will route MMS events.
The logger I am using is the one that comes with the OTP framework in SASL mode, "error_logger".
The issue is:
I am using a client to generate MMS events with default values. This client (written in Java) can send a high load of events from multiple threads.
I am sending 100 events from 10 threads (each thread sending 10 MMS events) to my router, which is written in Erlang/OTP.
The problem is that when my router receives such a high load, my logger hangs, i.e. it stops updating my log file. The router is still able to route the events, though.
The possible causes I have come up with are:
A scheduling problem in Erlang when such a high load of events is received (there is a separate process for each event).
A very unlikely deadlock state.
Sending events from multiple threads rather than sequentially. But I guess a router will be connected to multiple service provider boxes, so I thought of sending events in threads.
Can anybody help me demystify the problem?
You already have a good answer, but I'll add to the discussion.
By default, the error_logger uses cached write operations to disk. So one possibility is that you don't really notice this under low load, but under high load your writes get stuck in the cache for a while.
On a side note: there should be no problem having multiple threads doing calls to Erlang.
Another way of testing this is to add your own logger to error_logger, and see what happens. Possibly printing to the shell or something else that is "fast".
Which version of Erlang are you using? Prior to R14A (R13B4 maybe?), there was a performance penalty when you invoked a selective receive when the message queue contained a lot of messages. This behaviour meant that in a process that receives lots of messages (error_logger being the canonical example), if it was barely keeping up with the load then a small spike in load could cause the cost of processing to spike up and stay there as the new processing cost was higher than the process could bear. This problem has been solved in R14A.
Secondly - why are you sending a high volume of events/calls/logs to a text logger? Formatting strings for output to a human readable log file is a lot more expensive than using a binary disk_log for instance. Reducing the cost of logging will help, but reducing the volume of logs will help even more. Maybe investigate exactly why you need to log these things and see if you can't record them another (less expensive) way.
Problems with error_logger are often symptoms of some other overload problem. Try looking at the message queue sizes for all your processes when this problem occurs and see if something else is backed up too. The following Erlang shell expression might help:
%% Matching on the result skips processes that exit during the scan.
[ {P, Len}
  || P <- erlang:processes(),
     {message_queue_len, Len} <- [process_info(P, message_queue_len)] ].

WSASend() with more than one buffer - could complete incomplete?

Say I post the following WSASend call (Windows I/O completion ports without callback functions):
void send_data()
{
    WSABUF wsaBuff[2];
    wsaBuff[0].len = 20;
    wsaBuff[1].len = 25;
    WSASend(sock, &wsaBuff[0], 2, ......);
}
When I get the "write_done" notification from the completion port, is it possible that wsaBuff[1] will be sent completely (25 bytes) yet wsaBuff[0] will be only partially sent (say 7 bytes)?
As I've said before, in a reply to one of your other very similar questions, the only time this is likely to fail is in low resource situations (non-paged pool or locked pages limit issues most likely) where you might get a partial completion and an error return of WSAENOBUFS. And again, as I've said before, in 10 years of IOCP development work I've never seen this as a problem in production, only in situations where we have been stress testing a system to death (quite literally sometimes, as non-paged pool exhaustion can cause badly behaved drivers to blue screen the box).
I would suggest that you simply add some code to log the failure and close the socket, and that's it: you've dealt with the possibility of the failure and can move on. I'd be surprised if your failure handling code is ever executed. But you can be confident that you'll know if it is, and once you can reproduce the issue you can spend more time thinking about whether you really need to handle it any better.
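A minimal sketch of that "log it, close the socket, move on" approach: compare the byte count reported with the completion against the sum of the buffer lengths and treat any mismatch or error code as a failure. `buf_t` is a portable stand-in for WSABUF so that the sketch compiles outside Windows, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Portable stand-in for WSABUF (assumption: same two fields). On Windows,
   use the real WSABUF from <winsock2.h> instead. */
typedef struct {
    unsigned long len;
    char         *buf;
} buf_t;

typedef enum { SEND_OK, SEND_FAILED } send_result_t;

/* Total number of bytes that one send call was asked to transmit. */
unsigned long total_len(const buf_t *bufs, size_t count)
{
    unsigned long total = 0;
    for (size_t i = 0; i < count; i++)
        total += bufs[i].len;
    return total;
}

/* Hypothetical check to run when the send completion is dequeued.
   `transferred` is the byte count reported with the completion and
   `error` is the associated error code (0 means success). On
   SEND_FAILED the caller would log the event and close the socket. */
send_result_t check_send_completion(const buf_t *bufs, size_t count,
                                    unsigned long transferred, int error)
{
    if (error != 0 || transferred != total_len(bufs, count))
        return SEND_FAILED;
    return SEND_OK;
}
```

For the two buffers in the question (20 and 25 bytes), a completion that reports 45 bytes and no error passes the check; anything else is logged and the connection is dropped, without trying to work out which buffer was cut short.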
As WSASend is the preferred way of doing overlapped socket IO, it would not make any sense if it completed while incomplete - the completion notification/routine/event is the only way for the application to clean up or recycle the structures it used.
Also: NO, it's not possible, a single WSASend call is still a single IO call, regardless of the buffers used.

Resources