Should you wait for EPOLLOUT?

Is it wrong to wait for EPOLLIN, read all data from the socket, and then immediately send the response?
Is it better to wait for EPOLLOUT before sending the response? If so - why? If not - what exactly is the purpose of EPOLLOUT?
I've seen some epoll examples that wait for EPOLLOUT and some that don't.

If you wait for EPOLLOUT, you are guaranteed that the next send will not block. That means it will accept at least one byte (admittedly a rather weak guarantee, but unfortunately that is all you get: send is never guaranteed to accept more than one byte).
You can do perfectly well without waiting for EPOLLOUT if either blocking is no issue or the socket is nonblocking (in which case send would fail with EWOULDBLOCK instead of blocking). It certainly results in much less complicated code.
It's not wrong to do either.
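To make the trade-off concrete, here is a minimal sketch of the common pattern on a nonblocking socket: send optimistically, and only register interest in EPOLLOUT when the kernel refuses the data. The epoll instance epfd and the caller's offset tracking are assumptions for illustration, not a definitive implementation.

#include <errno.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>

/* Try to send; on EAGAIN/EWOULDBLOCK, ask epoll for EPOLLOUT on this fd.
   Returns 0 if the caller should keep going, -1 on a fatal error. */
int try_send(int epfd, int fd, const char *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, 0);
    if (n >= 0)
        return 0;                        /* sent n bytes; caller tracks the offset */
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        struct epoll_event ev;
        memset(&ev, 0, sizeof ev);
        ev.events = EPOLLIN | EPOLLOUT;  /* wake us when writable again */
        ev.data.fd = fd;
        return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
    }
    return -1;                           /* real error: close the connection */
}

Once the EPOLLOUT event fires and the pending data has been flushed, you would typically switch the registration back to EPOLLIN only, so the socket does not report writability on every wait.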

Related

Zero byte receives: purpose clarification

I am learning server development with IO Completion Ports. My book, "Network Programming for Microsoft Windows - Second Edition", states the following:
With every overlapped send or receive operation, it is probable that the data buffers submitted will be locked. When memory is locked, it cannot be paged out of physical memory. The operating system imposes a limit on the amount of memory that may be locked. When this limit is reached, overlapped operations will fail with the WSAENOBUFS error. If a server posts many overlapped receives on each connection, this limit will be reached as the number of connections grow. If a server anticipates handling a very high number of concurrent clients, the server can post a single zero byte receive on each connection. Because there is no buffer associated with the receive operation, no memory needs to be locked. With this approach, the per-socket receive buffer should be left intact because once the zero-byte receive operation completes, the server can simply perform a non-blocking receive to retrieve all the data buffered in the socket's receive buffer. There is no more data pending when the non-blocking receive fails with WSAEWOULDBLOCK.
Now, I'm trying to understand this paragraph; I think I've got it but want to make sure please.
I understand about memory being locked if I post multiple WSARecv() calls with large buffers. But I am not entirely sure how a zero byte buffer prevents this.
I am thinking it is this (and would like confirmation please):
If I have n connections, and I post 50 WSARecv() calls with a 1KB buffer on each connection, that is n * 50KB of total memory locked. All of that memory is locked, regardless of whether or not it is actually being used (i.e. whether or not anything is being copied into it from the TCP buffers). Hence if I keep adding connections, I will keep locking more memory that may or may not ever be used. Thus I can run out, hitting the WSAENOBUFS error.
If I however post a zero byte receive on each connection, a completion packet will be generated on that connection only when there is data available for reading. (That is my first assumption, is that correct?)
Now, when I know there is some data, I can then post a WSARecv() with a buffer of 1KB (or however much) - or indeed loop, repeatedly reading it all as suggested in my book - knowing that it will be filled immediately and hence will not remain unused and locked (second assumption, is that correct?)
Question 1
Thus, if my two assumptions are correct, then I have understood my book :) This means then that my server could, in theory, post a zero byte receive when a new connection is established, then when a completion packet is generated, read all of the data until there is no more, then post another zero byte receive - is that correct?
Question 2
However, isn't there still a risk that if I receive completion packets for lots of my zero-byte receive posts at once, and I then go on to make multiple WSARecv() calls, some of those will still fail with WSAENOBUFS?
Hopefully someone can clarify these two assumptions and two questions for me.
OK I've done research into this along with experimentation and have found the following:
Assumptions:
1) Assumption 1 is correct.
2) I believe assumption 2 is correct.
Questions
1) I have tested this and this seems to work.
2) This, I guess, remains a possibility, but it is much less likely than if I posted receives with non-zero buffers.
Note that we can still hit the WSAENOBUFS error when sending too fast; more details here.
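As a rough illustration of the pattern the book describes, here is a hedged Winsock sketch. The overlapped context passed in by the caller, the socket having been made non-blocking elsewhere (e.g. with ioctlsocket(s, FIONBIO, ...)), and the error handling are all assumptions, not a definitive implementation.

#include <winsock2.h>

/* Post a zero-byte receive: no buffer is supplied, so no pages get locked. */
int post_zero_byte_read(SOCKET s, WSAOVERLAPPED *ov)
{
    WSABUF zero = { 0, NULL };   /* len = 0, buf = NULL */
    DWORD flags = 0;
    if (WSARecv(s, &zero, 1, NULL, &flags, ov, NULL) == SOCKET_ERROR
            && WSAGetLastError() != WSA_IO_PENDING)
        return -1;
    return 0;
}

/* When the zero-byte read completes, drain the socket's own receive buffer
   with non-blocking recv() calls, then repost the zero-byte read. */
void on_zero_byte_completion(SOCKET s, WSAOVERLAPPED *ov)
{
    char buf[4096];
    for (;;) {
        int n = recv(s, buf, sizeof buf, 0);
        if (n > 0)
            continue;            /* process n bytes of data here */
        if (n == 0)
            return;              /* peer closed gracefully */
        if (WSAGetLastError() == WSAEWOULDBLOCK)
            break;               /* buffer drained: time to repost */
        return;                  /* real error: close the connection */
    }
    post_zero_byte_read(s, ov);
}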

Unordered socket read & close notification using IOCP

Most server frameworks/examples using sockets and I/O completion ports deliver notifications in a way whose purpose I couldn't completely figure out.
Upon read, packets are processed; usually they are reordered to work around the thread-scheduling issue of packets being processed out of order, even though IOCP guarantees a FIFO completion queue.
The problem is when a socket is closed, either gracefully or by an error. I have seen that in both situations, again because of the OS thread scheduler, the close notification may be delivered to the application (e.g. an HTTP server using the framework) "before" the notification for data read earlier.
I think that the close notification should be queued in such a way that the application receives it after the previous reads.
Is there an intended purpose behind the behavior in most of the code I've seen, or could my expected behavior be correct depending on the situation?
What you suggest makes sense and I would imagine that any code that handles graceful close (a read returning 0 bytes) would do so by processing it after any preceding successful read. Errors coming out of GetQueuedCompletionStatus(), such as connection reset errors, etc., are harder to integrate into the receive flow as they occur out of band as far as the receive data is concerned. Your question's a bit vague and depends very much on the code you're using and how you (or the people who wrote that code) want to handle these things. There is no single correct way, IMHO.
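For what it's worth, here is a minimal sketch of how those three outcomes look at the GetQueuedCompletionStatus() boundary. The conn_t context type is hypothetical, and the sequencing policy (the interesting part) is deliberately left to comments, since it is application-specific.

#include <winsock2.h>
#include <windows.h>

/* Hypothetical per-connection context; the OVERLAPPED is its first member so
   the pointer returned by GetQueuedCompletionStatus() can be cast back. */
typedef struct conn {
    OVERLAPPED ov;
    SOCKET s;
} conn_t;

void completion_loop(HANDLE iocp)
{
    for (;;) {
        DWORD bytes = 0;
        ULONG_PTR key = 0;
        OVERLAPPED *ov = NULL;
        BOOL ok = GetQueuedCompletionStatus(iocp, &bytes, &key, &ov, INFINITE);
        if (ov == NULL)
            continue;            /* the call itself failed; no completion */
        conn_t *c = (conn_t *)ov;
        if (!ok) {
            /* Out-of-band error (connection reset, abort): it may be dequeued
               before earlier read completions have been processed, so any
               ordering has to be imposed here, by the application. */
            continue;
        }
        if (bytes == 0) {
            /* A read completing with 0 bytes is the graceful close; defer it
               until the reads queued before it have been handled. */
            continue;
        }
        /* Normal completion: bytes of data arrived for connection c. */
        (void)c; (void)key;
    }
}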

Read unknown number of incoming bytes

My app communicates with a server over TCP, using AsyncSocket. There are two situations in which communication takes place:
The app sends the server something, the server responds. The app needs to read this response and do something with the information in it. This response is always the same length, e.g., a response is always 6 bytes.
The app is "idling" and the server initiates communication at some time (unknown to the app). The app needs to read whatever the server is sending (could be any number of bytes, but the first byte will indicate how many bytes are following so I know when to stop reading) and process this information.
The first situation is working fine. readDataToLength:timeout:tag: returns what I need and I can do with it what I want. It's the second situation that I'm unsure how to implement. I can't use readDataToLength:timeout:tag:, since I don't know the length beforehand.
I'm thinking I could do something with readDataWithTimeout:tag:, setting the timeout to -1. That makes the socket constantly listen for anything coming in, I believe. However, that will probably interfere with data coming in as a response to something I sent out (situation 1). The app can then no longer distinguish incoming data of situation 1 from situation 2.
Can anybody here help me solve this?
Your error is in the network protocol design.
Unless your protocol carries this information, there's no way to distinguish the response from the server-initiated communication, and network latency prevents the obvious time-based approach you've described from working reliably.
One simple way to fix the protocol in your case (if the server-initiated messages are always less than 255 bytes): add a seventh byte with the value 0xFF to the beginning of the response.
This way you can readDataWithTimeout:tag: for 1 byte.
On timeout, you retry until there is data.
If the received value is 0xFF, you read 6 more bytes with readDataToLength:timeout:tag:, and interpret them as the response to the request you sent earlier.
If it's some other value, you read the message with readDataToLength:theValue timeout:tag:, and process the server-initiated message.
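The same framing rule can be sketched outside of AsyncSocket. Here it is in plain C (the original answer targets AsyncSocket's Objective-C API); read_exactly() is a helper defined here just for the sketch.

#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Helper for the sketch: read exactly n bytes from fd, 0 on success. */
int read_exactly(int fd, uint8_t *buf, size_t n)
{
    size_t off = 0;
    while (off < n) {
        ssize_t r = read(fd, buf + off, n - off);
        if (r <= 0)
            return -1;           /* error or EOF */
        off += (size_t)r;
    }
    return 0;
}

void handle_incoming(int fd)
{
    uint8_t first;
    if (read_exactly(fd, &first, 1) != 0)
        return;
    if (first == 0xFF) {
        /* 0xFF marks a response to our own request: 6 more bytes follow. */
        uint8_t resp[6];
        if (read_exactly(fd, resp, sizeof resp) != 0)
            return;
        /* handle the 6-byte response */
    } else {
        /* Anything else is a server-initiated message; the first byte is the
           payload length, which is why it must stay below 255 (0xFF). */
        uint8_t msg[255];
        if (read_exactly(fd, msg, first) != 0)
            return;
        /* handle the first bytes of the server-initiated message */
    }
}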

EAGAIN Error: Using Berkeley Socket API

Sometimes when I try to send some packets continuously (I am using the send() API), I receive this error, and I am not sure what I should do then. I have these questions:
1) Can I re-send again? If yes, after how much time should I try again? Is there any particular strategy to follow?
2) Is the send buffer exceeding its limit the only reason for this?
3) Can someone please give me a better idea/code for how to handle such a scenario?
From the send() man page: "EAGAIN -- The socket is marked non-blocking and the requested operation would block." And also: "When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in non-blocking I/O mode. In non-blocking mode it would return EAGAIN in this case. The select(2) call may be used to determine when it is possible to send more data."
This thread has a simple example of using select() to deal with EAGAIN, and is followed by significant discussion about what sorts of surprises lurk beneath the surface.
EAGAIN is usually returned when there is no outbound buffer space left. How long to wait depends on the speed of the underlying connection. The normal way is to wait until select() or poll() tells you that the socket is available for writing. If on Linux, take a look at the select_tut(2) manpage, and of course the send(2) manpage.
You could change to blocking operation (which is the default) if you want the call to wait until there is space available. Or you could call select(2) to wait until the socket is writeable and then try again.
There is one other important consideration. If you are sending UDP packets, then keep in mind that there is no guarantee of congestion control, and if you're sending packets over the Internet you will almost certainly get packet loss if you just try sending UDP packets as fast as possible (this doesn't necessarily apply to other datagram sockets such as Unix sockets).
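Putting the select() advice into code, here is a hedged sketch of a send loop on a non-blocking TCP socket that waits for writability whenever the kernel reports EAGAIN/EWOULDBLOCK; real code would likely want a timeout on the select() call.

#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>

/* Send the whole buffer, blocking in select() while the kernel's send
   buffer is full. Returns 0 on success, -1 on a fatal error. */
int send_all(int fd, const char *buf, size_t len)
{
    size_t off = 0;
    while (off < len) {
        ssize_t n = send(fd, buf + off, len - off, 0);
        if (n > 0) {
            off += (size_t)n;
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            fd_set wfds;
            FD_ZERO(&wfds);
            FD_SET(fd, &wfds);
            /* Wait until there is outbound buffer space again. */
            if (select(fd + 1, NULL, &wfds, NULL, NULL) < 0 && errno != EINTR)
                return -1;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;            /* interrupted by a signal: just retry */
        return -1;               /* any other error is fatal for this socket */
    }
    return 0;
}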

WSASend() with more than one buffer - could complete incomplete?

Say I post the following WSASend call (Windows I/O completion ports without callback functions):
void send_data()
{
    WSABUF wsaBuff[2];
    /* .buf pointers and the remaining WSASend arguments are elided in the
       question; only the buffer lengths matter here. */
    wsaBuff[0].len = 20;
    wsaBuff[1].len = 25;
    WSASend(sock, &wsaBuff[0], 2, ......);
}
When I get the "write_done" notification from the completion port, is it possible that wsaBuff[1] will be sent completely (25 bytes) yet wsaBuff[0] will be only partially sent (say 7 bytes)?
As I've said before, in a reply to one of your other very similar questions, the only time this is likely to fail is in low resource situations (non-paged pool or locked pages limit issues most likely) where you might get a partial completion and an error return of ENOBUFS. And again, as I've said before, in 10 years of IOCP development work I've never seen this as a problem in production, only in situations where we have been stress testing a system to death (quite literally sometimes as non-paged pool exhaustion can sometimes cause badly behaved drivers to blue screen the box).
I would suggest that you simply add some code to log the failure, close the socket and that's it, you've dealt with the possibility of the failure and can move on. I'd be surprised if your failure handling code is ever executed. But you can be confident that you'll know if it is and once you can reproduce the issue you can spend more time thinking about if you really need to handle it any better.
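A sketch of that "log it, close the socket, move on" handling might look like the following. The send_ctx_t type and its expected field are hypothetical bookkeeping for the sketch, not part of the Winsock API; the caller is assumed to fill expected with the sum of the WSABUF lengths when posting the send.

#include <winsock2.h>
#include <stdio.h>

/* Hypothetical per-send context: the OVERLAPPED plus the socket and the
   total number of bytes the WSASend was supposed to transfer. */
typedef struct send_ctx {
    OVERLAPPED ov;
    SOCKET s;
    DWORD expected;
} send_ctx_t;

void on_send_complete(send_ctx_t *ctx, BOOL ok, DWORD transferred)
{
    if (!ok || transferred != ctx->expected) {
        /* Partial or failed send: almost certainly a low-resource condition.
           Don't try to resume mid-stream; log it and drop the connection. */
        fprintf(stderr, "send completed with %lu of %lu bytes, error %d\n",
                (unsigned long)transferred, (unsigned long)ctx->expected,
                WSAGetLastError());
        closesocket(ctx->s);
        return;
    }
    /* Full completion: recycle ctx and carry on. */
}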
As WSASend is the preferred way of doing overlapped socket I/O, it would not make sense for it to signal completion while the operation was still incomplete - the completion notification/routine/event is the only way for the application to clean up/recycle the structures it used.
Also: NO, it's not possible; a single WSASend call is still a single I/O call, regardless of the buffers used.
