Zero byte receives: purpose clarification

I am learning server development with IO Completion Ports. My book, "Network Programming for Microsoft Windows - Second Edition", states the following:
With every overlapped send or receive operation, it is probable that
the data buffers submitted will be locked. When memory is locked, it
cannot be paged out of physical memory. The operating system imposes a
limit on the amount of memory that may be locked. When this limit is
reached, overlapped operations will fail with the WSAENOBUFS error. If
a server posts many overlapped receives on each connection, this limit
will be reached as the number of connections grow. If a server
anticipates handling a very high number of concurrent clients, the
server can post a single zero byte receive on each connection. Because
there is no buffer associated with the receive operation, no memory
needs to be locked. With this approach, the per-socket receive buffer
should be left intact because once the zero-byte receive operation
completes, the server can simply perform a non-blocking receive to
retrieve all the data buffered in the socket's receive buffer. There
is no more data pending when the non-blocking receive fails with
WSAEWOULDBLOCK.
Now, I'm trying to understand this paragraph; I think I've got it but want to make sure please.
I understand why memory is locked if I make multiple WSARecv() calls with large buffers. But I am not entirely sure how a zero byte buffer prevents this.
I am thinking it is this (and would like confirmation please):
If I have n connections, and I post 50 WSARecv() calls with a 1KB buffer on each connection, that is n * 50KB of total memory locked. All of that memory is locked, regardless of whether or not it is actually being used (i.e. whether or not anything is being copied into it from the TCP buffers). Hence if I keep adding more connections, I will keep locking more memory that may or may not ever be used. Thus I can run out, with the WSAENOBUFS error.
If I however post a zero byte receive on each connection, a completion packet will be generated on that connection only when there is data available for reading. (That is my first assumption, is that correct?)
Now, when I know there is some data, I can then post a WSARecv() with a buffer of 1KB (or however much) - or indeed loop repeatedly reading it all as suggested in my book - knowing that it will be filled immediately and hence will not sit unused and locked (second assumption, is that correct?)
Question 1
Thus, if my two assumptions are correct, then I have understood my book :) This means then that my server could, in theory, post a zero byte receive when a new connection is established, then when a completion packet is generated, read all of the data until there is no more, then post another zero byte receive - is that correct?
Question 2
However, isn't there still a risk that if I receive completion packets for lots of my zero byte receive posts at once, and I then go on to make multiple WSARecv() calls, I will still end up with some failing with WSAENOBUFS?
Hopefully someone can clarify these two assumptions and two questions for me.

OK, I've done some research into this, along with experimentation, and have found the following:
Assumptions:
1) Assumption 1 is correct.
2) I believe assumption 2 is correct.
Questions:
1) I have tested this and this seems to work.
2) This I guess remains a possibility, but it is much less likely than if I posted receives with a non-zero buffer.
Note that we can still raise the WSAENOBUFS error when sending too fast; more details here.
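To make the pattern concrete, here is a minimal sketch of the zero-byte-receive approach in C. It assumes the socket is already associated with a completion port and has been set non-blocking with ioctlsocket(FIONBIO); PerConn, post_zero_byte_recv and drain_socket are illustrative names, not part of any API:

    #include <winsock2.h>
    #include <string.h>

    typedef struct {
        SOCKET        sock;
        WSAOVERLAPPED ov;
    } PerConn;  /* hypothetical per-connection context */

    /* Post a zero-byte receive: no data buffer is supplied, so there is
       nothing for the kernel to lock. */
    static int post_zero_byte_recv(PerConn *conn)
    {
        WSABUF wbuf;
        DWORD  flags = 0;

        wbuf.buf = NULL;  /* no buffer...            */
        wbuf.len = 0;     /* ...and a length of zero */

        memset(&conn->ov, 0, sizeof(conn->ov));
        if (WSARecv(conn->sock, &wbuf, 1, NULL, &flags, &conn->ov, NULL) == SOCKET_ERROR
            && WSAGetLastError() != WSA_IO_PENDING)
            return -1;  /* a real failure, not just "pending" */
        return 0;
    }

    /* When the zero-byte receive completes, data is waiting in the
       socket's receive buffer: drain it with non-blocking recv() calls
       until WSAEWOULDBLOCK says there is no more, then re-arm. The 1KB
       buffer here is only locked for the duration of each synchronous
       recv() call. */
    static void drain_socket(PerConn *conn)
    {
        char buf[1024];
        for (;;) {
            int n = recv(conn->sock, buf, sizeof(buf), 0);
            if (n > 0) {
                /* hand n bytes to the application here */
            } else if (n == 0) {
                return;  /* peer closed the connection gracefully */
            } else if (WSAGetLastError() == WSAEWOULDBLOCK) {
                break;   /* receive buffer empty: no more data pending */
            } else {
                return;  /* hard error: tear down the connection */
            }
        }
        post_zero_byte_recv(conn);  /* wait for the next burst of data */
    }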

Related

How does error handling work in SCTP Sockets API Extensions?

I have been trying to implement a wrapper library for the Linux interface to SCTP sockets, and I am not sure how to integrate the asynchronous style of errors (where they are delivered via events). All example code I have seen, if it deals with the errors at all, simply prints out the information related to the error when it is received, but inserting error-handling code there seems like it would be ineffective, because by that point all of the context related to the original message which was sent has been lost and only a 32-bit integer sinfo_context remains. It also seems that there is no way to directly tell when a given message has been acknowledged successfully by the remote peer, which would make it impossible to implement an approach which listens for errors after sending a message, because the context information for successfully-delivered messages could never be freed.
Is there a way to handle the errors related to a given sending operation as part of the call to a send function, or is there a different way to approach error handling for SCTP which does not lose the context of the error?
One solution which I have considered is using the SCTP_SENDER_DRY notification to tell when packets have been sent, however this requires sending only one packet at a time. Another idea is to use the peer's receiver window size together with the sinfo_cumtsn field of sctp_sndrcvinfo to calculate how much data has been acknowledged as fully received using the cumulative TSN, however there are a couple of disadvantages to this: first, it requires bookkeeping overhead to calculate a number of bytes received by the peer based on the cumulative TSN (especially if the peer's window size may change); second, it requires waiting until all earlier packets were received before reporting success, which seems to defeat the purpose of SCTP's multistreaming; and third, it seems like it would not work for unordered packets.
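For reference, the notification path under discussion looks roughly like this with the lksctp API: subscribe to send-failure events, then watch for MSG_NOTIFICATION in the receive loop and recover sinfo_context from the notification. This is only a sketch; the table mapping context values back to in-flight messages is an application-side assumption, not part of the API:

    #include <stdint.h>
    #include <string.h>
    #include <netinet/sctp.h>  /* link with -lsctp */

    /* Subscribe to send-failure events, once, after creating the socket. */
    static int subscribe_send_failures(int sock)
    {
        struct sctp_event_subscribe events;
        memset(&events, 0, sizeof(events));
        events.sctp_data_io_event      = 1;
        events.sctp_send_failure_event = 1;
        return setsockopt(sock, IPPROTO_SCTP, SCTP_EVENTS,
                          &events, sizeof(events));
    }

    /* In the receive loop: notifications arrive in-band, flagged with
       MSG_NOTIFICATION, and carry the sinfo_context of the failed send. */
    static void handle_message(int sock)
    {
        char buf[8192];
        struct sctp_sndrcvinfo sinfo;
        int flags = 0;
        int n = sctp_recvmsg(sock, buf, sizeof(buf), NULL, NULL,
                             &sinfo, &flags);
        if (n <= 0)
            return;
        if (flags & MSG_NOTIFICATION) {
            union sctp_notification *sn = (union sctp_notification *)buf;
            if (sn->sn_header.sn_type == SCTP_SEND_FAILED) {
                uint32_t ctx = sn->sn_send_failed.ssf_info.sinfo_context;
                /* look up ctx in an application-side table that maps
                   contexts back to the original messages */
                (void)ctx;
            }
        } else {
            /* ordinary data message of n bytes */
        }
    }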

Linux recv returns data not seen in Wireshark capture

I am receiving data through a TCP socket and although this code has been working for years, I came across a very odd behaviour trying to integrate a new device (that acts as a server) into my system:
Before receiving the HTTP response body, the recv() function gives me strange characters like '283' or '7b'.
I am actually debugging with gdb and I can see that the variables hold these values right after recv() was called (so it is not just what printf shows me)
I always read byte-after-byte (one at a time) with the recv() function and the returned value is always positive.
This first line of the received HTTP Body cannot be seen in Wireshark (!) and is also not expected to be there. In Wireshark I see what I would expect to receive.
I changed the device that sends me the data and I still receive the exact same values
I performed a clean debug build and also tried a release version of my program and still get the exact same values, so I assume these are not random values that happened to be in memory.
I am running Linux kernel 3.2.58 without the option to upgrade/update.
I am not sure what other information I should provide and I have no idea what else to try.
Found it. The problem is that I did not take the Transfer-Encoding into consideration, which is chunked. I was lucky, because older versions of Wireshark also showed these bytes in the payload, so other people had posted similar problems on the Wireshark forum.
Those "strange" bytes are the chunk size: the length, in hexadecimal, of the payload you are about to receive. When you are done reading that many bytes, you will receive another number that tells you whether you should continue reading (and, again, how many bytes will follow). As far as I understood, this is useful when you have data that changes dynamically and you might want to continuously get its current value.

Is it guaranteed that mnesia event listeners will get each state of a record, if it changes fast?

Let's say I have some record like {my_table, Id, Value}.
I constantly overwrite the value so that it holds consecutive integers like 1, 2, 3, 4, 5 etc.
In a distributed environment, is it guaranteed that my event listeners will receive all of the values? (I don't care about ordering)
I haven't verified this by reading that part of the source yet, but it appears that sending a message out is part of the update process, so messages should always come out, even on very fast changes. (The alternative would be for Mnesia to either queue messages or queue changes and run them in batches. I'm almost positive this is not what happens -- it would be too hard to predict the variability of advantageous moments to start batching jobs or queueing messages. Sending messages is generally much cheaper than making a change in the db.)
Since Erlang guarantees delivery of messages to a live destination, this is as close to a promise that every Mnesia change will eventually be seen as you're likely to get. The order of messages couldn't be guaranteed on the receiving end (as it appears you expect), and of course a network failure could make a set of messages get missed (rendering the destination something other than live from the perspective of the sender).

Strange rare out-of-order data received using Indy

We're having a bizarre problem with Indy10 where two large strings (a few hundred characters each) that we send out one after the other using TCP are appearing at the other end intertwined oddly. This happens extremely infrequently.
Each string is a complete XML message terminated with a LF and in general the READ process reads an entire XML message, returning when it sees the LF.
The call to actually send the message is protected by a critical section around the call to the IOHandler's writeln method, so it is not possible for two threads to send at the same time. (We're certain the critical section is implemented/working properly.)

This problem happens very rarely. The symptoms are odd: when we send string A followed by string B, what we receive at the other end (on the rare occasions where we have a failure) is the trailing section of string A by itself (i.e., there's a LF at the end of it), followed by the leading section of string A, and then the entire string B followed by a single LF.

We've verified that the "timed out" property is not true after the partial read - we log that property after every read that returns content. Also, we know there are no embedded LF characters in the string, as we explicitly replace all non-alphanumeric characters in the string with spaces before appending the LF and sending it.
We have log mechanisms inside the critical sections on both the transmission and receiving ends and so we can see this behavior at the "wire".
We're completely baffled and wondering (although always the lowest possibility) whether there could be some low-level Indy issues that might cause this issue, e.g., buffers being sent in the wrong order....very hard to believe this could be the issue but we're grasping at straws.
Does anyone have any bright ideas?
You could try Wireshark to find out how the data is transferred. This way you can find out whether the problem is in the server or in the client. Also remember to use TCP to get "guaranteed" valid data in the right order.
Are you using TCP or UDP? If you are using UDP, it is possible (and expected) that the UDP packets can be received in a different order than they were transmitted due to the routing across the network. If this is the case, you'll need to add some sort of packet ID to each UDP packet so that the receiver can properly order the packets.
Do you have multiple threads reading from the same socket at the same time on the receiving end? Even just to query the Connected() status causes a read to occur. That could cause your multiple threads to read the inbound data and store it into the IOHandler.InputBuffer in random order if you are not careful.
Have you checked the Nagle settings of the IOHandler? We had a similar problem that we fixed by setting UseNagle to false. In our case sending and receiving large amounts of data in bursts was slow due to Nagle coalescing, so it's not quite the same as your situation.
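For readers outside Delphi: Indy's UseNagle property controls the standard TCP_NODELAY socket option (setting UseNagle to false enables TCP_NODELAY), so the equivalent experiment at the BSD-socket level is roughly this sketch in C:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Disable Nagle coalescing so small writes go out immediately
       instead of being batched into larger segments. */
    static int disable_nagle(int sock)
    {
        int flag = 1;
        return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
                          &flag, sizeof(flag));
    }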

sequence ID for handling reliability

I'm trying to figure out a simple way to handle reliability for UDP messages. I figured I would just send each one with a sequencing ID and by comparing the ID to the one previously received, a loss can be detected. I would normally just use integers however the idea that it would just keep incrementing indefinitely did not sit right with me.
I could use base64, but that would only make it a bit more readable and doesn't really solve anything.
I also considered prefixing a date stamp, however that would be kind of sloppy when dealing with messages received around midnight.
I feel like there must be a better solution that someone could suggest, even if that is to just stick with integers.
My preference for this particular job is to use an incrementing (at least 64-bit) integer sequence seeded with a high-resolution timestamp. In this way, even if there is a loss of state at the sending end, when the sequence is re-seeded from the time, it will in all likelihood simply jump forward. Any Year 10K bugs that this might introduce are left as an exercise for Lazarus Long. :-)
Keep in mind that sequence-gap detection is essentially an optimization. The transmitting end needs to retransmit until an ack is received, with a nack (for a gap or corrupted datagram) simply eliciting earlier retransmission. (ZMODEM is a rare exception to this rule, with its default mode of operation being the use of a single ack at the end of the stream and all other retransmissions governed by nacks; as a file transfer protocol, however, it is essentially one giant multipart datagram.)
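A sketch of that seeding idea in C; the names are illustrative, and CLOCK_REALTIME in nanoseconds supplies the high-resolution seed so that a restarted sender almost certainly jumps forward rather than reusing recent IDs:

    #include <stdint.h>
    #include <time.h>

    static uint64_t next_seq;

    /* Seed the 64-bit sequence once at startup from a nanosecond clock. */
    static void seq_init(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);
        next_seq = (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
    }

    /* Stamp each outgoing datagram with the next ID. */
    static uint64_t seq_next(void)
    {
        return next_seq++;
    }

    /* Receiver side: a gap between consecutive IDs implies loss (or
       reordering), which can trigger a nack / earlier retransmission. */
    static int gap_detected(uint64_t last_seen, uint64_t received)
    {
        return received > last_seen + 1;
    }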
Use TCP? This is why TCP is different from UDP.
I don't mean to be sarcastic, but this is why TCP is there.

Resources