IOCP server: sending a buffer with a single WSASend

I am working on an IOCP server on Windows, and I have to send a buffer to all connected sockets.
The buffer is small, up to 10 bytes. When I get the completion notification for each WSASend from GetQueuedCompletionStatus, is there a guarantee that the buffer was sent in one piece by a single WSASend? Or should I add code that checks whether all 10 bytes were sent and posts another WSASend if necessary?

There is no guarantee but it's highly unlikely that a send that is less than a single operating system page size would partially fail.
Failures are more likely if you're sending a buffer that is longer than a single operating system page and you're not actively managing how many outstanding operations you have and how many your system can support before running out of "non-paged pool" or hitting the "I/O page lock limit".
It's only possible to recover from a partial failure if you never have any other sends pending on that connection.
I tend to check that the value is as expected in the completion handler and abort the connection with an RST if it's not. I've never had this code execute in production and I've been building lots of different kinds of IOCP based client and server systems for well over 10 years now.
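The check described above can be sketched in a language-neutral way. This is only an illustration in Python, not Winsock code: `on_send_complete`, `Action`, and the `other_sends_pending` flag are hypothetical names, and a real server would make this decision in its C/C++ completion handler after GetQueuedCompletionStatus returns.

```python
from enum import Enum


class Action(Enum):
    DONE = "done"                # the whole buffer went out
    RESEND_REST = "resend_rest"  # safe to post a send for the remaining bytes
    ABORT = "abort"              # reset the connection


def on_send_complete(expected: int, transferred: int, other_sends_pending: bool) -> Action:
    """Decide what to do when a send completion is dequeued."""
    if transferred == expected:
        return Action.DONE
    # A short send can only be repaired if nothing else is queued behind it;
    # otherwise the remainder would arrive after later data on the stream.
    if not other_sends_pending:
        return Action.RESEND_REST
    return Action.ABORT
```

Always returning `Action.ABORT` on a mismatch, as the answer does, is the simpler policy and avoids tracking pending sends at all.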


Do any MQTT clients actually reuse packet identifiers?

The packet identifier is required for certain MQTT control packets. It's defined by the standard as a 16-bit integer and is generated by each client. The identifiers are reusable by the client after the acknowledgement packet is received, so the standard allows up to 64k in-flight messages. In practice, the clients I've looked at seem to just increment a counter, and so allow a total of 64k messages to be sent by a client. Both of the Rust MQTT client libraries panic when that counter overflows. (UPDATED 2016-09-07: if the Rust clients are compiled in release mode, they don't panic; the value of the packet identifier becomes 0. In normal circumstances this will work, but...)
Does anyone know of an MQTT client that allows more than 64k messages/client (i.e. re-uses packet identifiers)? I'm wondering if this is a limitation that I need to be aware of in general, or if it's just a few clients. I've taken a quick look at compliance tests and haven't yet seen much to indicate that this is checked -- I'll keep looking.
Edit: It could be that some clients achieve this as a side effect of limiting the number of in-flight messages. UPDATE 2016-09-07: the Rust clients do it by assuming they can wrap on overflow and never catch up to lagging messages (maybe a good bet, but not assured, and with an ugly outcome if it happens).
As you have pointed out, the packet identifier is intended as a temporary value that must persist until the published packet is received and acknowledged.
Once acknowledged, you can reuse the identifier (or not).
Most clients run on embedded systems and don't track more than a single packet (so only a single identifier is being handled), since they wait for the PUBACK or PUBREC/PUBCOMP before publishing anything else.
So for these clients, even a single identifier would be enough.
Please note that for QoS 1, remembering the identifier is futile, since it's valid to resend the packet if the next packet is not an ACK (so you have the identifier to reply with in the packet you are receiving).
For the rare clients that do support interleaved publish packets, they only need to support 2 valid identifiers at any time (that is, in the case they have received a QoS 2 packet, answered with PUBREC and then receive another QoS 1 or 2 packet).
Once they receive a PUBREL packet they can reply with a PUBCOMP without needing to remember the identifier (it's in the PUBREL header), so the only time they do need to remember identifier is between the PUBLISH and the PUBREC packet. Provided they allow interleaved publish packets, the only case where a second identifier is required is when they are publishing while receiving a published packet at the same time.
Now, from the point of view of the broker, most implementations use a 16-bit counter per client, so they could support, in theory, up to 65535 in-transit packets.
In reality, since the minimum size of a publish packet is 8 bytes (usually more), that means having to store at least 9 bytes per potential packet (the additional byte is for remembering the current state in case of QoS 2) so that's half a MB of memory in the minimal case, but likely much more in real life, since you never have an empty publish payload and topic name.
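The "half a MB" figure can be checked with a quick calculation. This is just the arithmetic from the paragraph above, using its 9-bytes-per-packet lower bound; real topic names and payloads would push the total much higher.

```python
MAX_IDENTIFIERS = 65535   # non-zero 16-bit packet identifiers
BYTES_PER_PACKET = 9      # figure from the text: 8-byte packet + 1 state byte

total_bytes = MAX_IDENTIFIERS * BYTES_PER_PACKET
print(total_bytes)                       # 589815
print(round(total_bytes / 1024**2, 2))   # 0.56 (MiB)
```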
As you can see, it's almost impossible for an embedded system to meet such a storage requirement, so shortcuts are taken.
In most scenarios, either the server does not allow so many unacknowledged packets (by simply replying to the client in order to release the identifier) or it shares the identifier pool between different clients.
So typically, again, the worst case for the broker can only happen if the client does not acknowledge the published packets. If the broker does not get any answer from the client it can:
close the connection
refuse to send new published information or
ignore the answer and republish
All of these strategies need to be implemented anyway, since you could have the same issue with a slow client and a fast publisher and your 65535 identifiers.
Since you have these strategies, there is no need to waste a MB of memory per client; you can instead cut your losses much earlier (while keeping a reasonable working condition).
In the end, packet identifiers are a tool for identifying recent packets, not a tool for indexing every packet received. A counter is good enough for this, and wrapping around should not pose any issue once you account for the memory and bandwidth requirements.
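A wrapping counter that is safe even when old messages are still unacknowledged might look like the sketch below. `PacketIdAllocator` is a hypothetical helper, not taken from any particular MQTT library; it assumes only what the MQTT specification requires, namely that identifiers are non-zero 16-bit values and that an identifier must not be reused while it is still in flight.

```python
class PacketIdAllocator:
    """Hands out packet identifiers in 1..65535, skipping ones still in flight."""

    MAX_ID = 0xFFFF

    def __init__(self):
        self._next = 1
        self._in_flight = set()

    def acquire(self) -> int:
        """Return the next free identifier and mark it as in flight."""
        if len(self._in_flight) >= self.MAX_ID:
            raise RuntimeError("all 65535 packet identifiers are in flight")
        while self._next in self._in_flight:
            self._next = self._next % self.MAX_ID + 1   # wraps 65535 -> 1, never 0
        pid = self._next
        self._in_flight.add(pid)
        self._next = pid % self.MAX_ID + 1
        return pid

    def release(self, pid: int) -> None:
        """Call when the acknowledgement for this identifier has arrived."""
        self._in_flight.discard(pid)
```

A client that caps its in-flight window at a small number gets the same effect with a much smaller set, which matches the observation that most clients only ever track a handful of identifiers.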

Queue and flow control with Delphi using Indy

I have a client/server application (TCP) built with Indy in Delphi.
I want to have a queue and flow control in my server-side application.
My server should not lose any client data when the server's bandwidth is saturated.
For example, I want to set the maximum bandwidth for the server to 10 Mbps; if that 10 Mbps is fully used, the other clients should wait in a queue until bandwidth becomes free.
How can I design this with Delphi?
Best regards
The client should not send the message directly to the server. Put the message in a local store (for instance, a SQLite database), and in a thread read the first message from the local store and try to send it to the server.
If the message was delivered to the server (no exception raised) delete the message from the local store and process the next "first" message in the local store.
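The store-and-forward idea can be sketched as follows. This is an illustration in Python rather than Delphi; the `outbox` table and the `open_store`, `enqueue`, and `flush` helpers are hypothetical names, and `send` stands for whatever call actually transmits a message and raises on failure.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the local message store."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, body BLOB NOT NULL)"
    )
    db.commit()
    return db


def enqueue(db, body: bytes) -> None:
    """Persist a message instead of sending it directly."""
    db.execute("INSERT INTO outbox (body) VALUES (?)", (body,))
    db.commit()


def flush(db, send) -> int:
    """Send queued messages oldest first; stop at the first failure.

    Returns the number of messages delivered and deleted.
    """
    delivered = 0
    while True:
        row = db.execute("SELECT id, body FROM outbox ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return delivered
        msg_id, body = row
        try:
            send(body)
        except OSError:
            return delivered          # keep the message; retry on the next flush
        db.execute("DELETE FROM outbox WHERE id = ?", (msg_id,))
        db.commit()
        delivered += 1
```

Because a message is deleted only after `send` returns, a crash between sending and deleting can deliver it twice, so the server side may need to tolerate duplicates.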
Within the TIdTCPServer.OnExecute method which receives the client data, it is possible to 'delay' processing of the incoming request with a simple Sleep command. The client data will stay in the TCP socket until the Sleep command finishes.
If your server keeps track of the current 'global' bandwidth usage for all clients, it is possible to set the Sleep time dynamically. You could even set different priorities for different clients.
So you would need a simple but thread safe bandwidth usage monitor, an algorithm which calculates sensible Sleep time values, and a way to assign this Sleep time to the individual client connection contexts.
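One possible shape for such a monitor is sketched below in Python (the same logic would port to a Delphi class guarded by a critical section). `BandwidthLimiter` and `reserve` are hypothetical names, and the pacing rule used here, where each read pushes a shared "next free" time forward by `nbytes / rate`, is just one simple way to derive the Sleep value.

```python
import threading
import time


class BandwidthLimiter:
    """Thread-safe pacing: tells each caller how long to sleep before reading more."""

    def __init__(self, bytes_per_second: float, clock=time.monotonic):
        self._rate = float(bytes_per_second)
        self._clock = clock
        self._lock = threading.Lock()
        self._next_free = clock()   # time at which the shared budget is free again

    def reserve(self, nbytes: int) -> float:
        """Account for nbytes and return the delay in seconds the caller should sleep."""
        with self._lock:
            now = self._clock()
            start = max(now, self._next_free)
            self._next_free = start + nbytes / self._rate
            return start - now
```

A connection handler would call something like `time.sleep(limiter.reserve(len(data)))` after each read; for the 10 Mbps cap in the question the rate would be 1,250,000 bytes per second. Per-client priorities could be approximated by scaling the returned delay.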
See also: for an example implementation in PHP

Akka Streams ActorSubscriber does not work with remote actors

The documentation says:
"ActorPublisher and ActorSubscriber cannot be used with remote actors, because if signals of the Reactive Streams protocol (e.g. request) are lost the stream may deadlock."
Does this mean akka stream is not location transparent? How do I use akka stream to design a backpressure-aware client-server system where client and server are on different machines?
I must have misunderstood something. Thanks for any clarification.
They are strictly a local facility at this time.
You can connect it to a TCP sink/source and it will apply back-pressure using TCP as well though (that's what Akka HTTP does).
How do I use akka stream to design a backpressure-aware client-server system where client and server are on different machines?
Check out streams in Artery (Dec. 2016, so 18 months later):
The new remoting implementation for actor messages was released in Akka 2.4.11 two months ago.
Artery is the code name for it. It’s a drop-in replacement to the old remoting in many cases, but the implementation is completely new and it comes with many important improvements.
(Remoting enables Actor systems on different hosts or JVMs to communicate with each other)
Regarding back-pressure, this is not a complete solution, but it can help:
What about back-pressure? Akka Streams is all about back-pressure but actor messaging is fire-and-forget without any back-pressure. How is that handled in this design?
We can’t magically add back-pressure to actor messaging. That must still be handled on the application level using techniques for message flow control, such as acknowledgments, work-pulling, throttling.
When a message is sent to a remote destination it’s added to a queue that the first stage, called SendQueue, is processing. This queue is bounded and if it overflows the messages will be dropped, which is in line with the actor messaging at-most-once delivery nature. Large amount of messages should not be sent without application level flow control. For example, if serialization of messages is slow and can’t keep up with the send rate this queue will overflow.
Aeron will propagate back-pressure from the receiving node to the sending node, i.e. the AeronSink in the outbound stream will not progress if the AeronSource at the other end is slower and the buffers have been filled up.
If messages are sent at a higher rate than what can be consumed by the receiving node the SendQueue will overflow and messages will be dropped. Aeron itself has large buffers to be able to handle bursts of messages.
The same thing will happen in the case of a network partition. When the Aeron buffers are full messages will be dropped by the SendQueue.
In the inbound stream the messages are in the end dispatched to the recipient actor. That is an ordinary actor tell that will enqueue the message in the actor’s mailbox. That is where the back-pressure ends on the receiving side. If the actor is slower than the incoming message rate the mailbox will fill up as usual.
Bottom line, flow control for actor messages must be implemented at the application level. Artery does not change that fact.
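As an illustration of what application-level flow control with acknowledgments can mean, here is a minimal sketch in Python rather than Akka code. `WindowedSender`, `offer`, and `on_ack` are hypothetical names; `transmit` stands for the fire-and-forget send (an actor tell in the Akka case), and the receiver is assumed to reply with one acknowledgment per message.

```python
from collections import deque


class WindowedSender:
    """Keeps at most `window` unacknowledged messages on the wire."""

    def __init__(self, transmit, window: int):
        self._transmit = transmit
        self._window = window
        self._unacked = 0
        self._pending = deque()   # messages waiting for a free slot

    def offer(self, msg) -> None:
        """Queue a message; it is sent as soon as the window allows."""
        self._pending.append(msg)
        self._pump()

    def on_ack(self) -> None:
        """Call when the receiver acknowledges one message."""
        if self._unacked > 0:
            self._unacked -= 1
        self._pump()

    def _pump(self) -> None:
        while self._pending and self._unacked < self._window:
            self._transmit(self._pending.popleft())
            self._unacked += 1
```

The local `_pending` queue is unbounded in this sketch; a real system would also bound it or push back on the producer, otherwise the overflow has only moved from the remoting layer into the application.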

Max of Indy clients?

How many clients can connect to TIdTCPServer at the same time? I am using Indy 10 and Delphi XE2, and the target OS is Windows Server 2003.
Is there a better option than Indy for Delphi?
By default, MaxConnections is set to 0, so the number of active threads isn't checked by the Indy server before accepting another connection, but it mostly depends on what the clients are doing on the server. For example, if your server accepts a client connection and then calculates pi to a trillion digits within that client thread context, you'll get significantly fewer connections handled properly than if you are handing off the work to another process. Basically, your result will vary based directly on the tasks performed.
For a generic answer... if you override the default stack size allocated to each thread, you could have up to a few thousand connections in a 32-bit server application, but likely not much more than that. See: What's the maximum number of threads in Windows Server 2003? and
Also check the ListenQueue property, set to 15 by default. Apparently the OS can increase it further on its own... I don't know the current Windows Server default listen queue, but I typically bump up the default amount quite a bit.
Bottom line - get to a thousand active threads/connections and you are likely going to hit a wall sooner rather than later.
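The "few thousand" estimate follows from address-space arithmetic. The numbers below are assumptions, not taken from the answer: a 32-bit Windows process with the usual 2 GiB of user address space, and a 1 MiB default stack reservation per thread. The result is an upper bound only, since code, heap, and buffers share the same address space.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024


def thread_ceiling(user_address_space: int, stack_reserve: int) -> int:
    """Upper bound on thread count if stacks were the only thing in memory."""
    return user_address_space // stack_reserve


print(thread_ceiling(2 * GIB, 1 * MIB))     # 2048 with the assumed default stack
print(thread_ceiling(2 * GIB, 256 * KIB))   # 8192 with a smaller stack reservation
```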
However many clients the OS can handle with available resources. Keep in mind that each connected client uses its own thread, so you have to factor in the process's default thread stack size.

What do I benefit from changing from blocking to non-blocking sockets?

We have an application server developed with Delphi 2010 and Indy 10. This server receives more than 50 requests per second and it works well. But in some cases, Indy seems very obscure to me. Its components are good, but sometimes I find myself digging into the source code just to understand a simple thing. Indy lacks good documentation and good support.
The last thing I came across was a big problem for me: I must detect when a client disconnects non-gracefully (for instance, when the client crashes or shuts down without telling the server that it will disconnect), and Indy was not able to do that. If I want that, I will have to develop an algorithm like a heartbeat, polling, or TCP keep-alive. I do not want to spend more time doing what is, at least in my view, the component's job. After some study, I found out that this is not Indy's fault, but an issue of all blocking socket components.
Now I am really thinking of changing the core of the Server to another good suite. I must admit I am tending to use a non-blocking socket. Based on that, I have some questions:
What do I benefit from changing from blocking to non-blocking sockets?
Will I be able to detect client disconnects (non-gracefully)?
What component suite has the best product? By best product I mean: fast, good support, good tools and easy to implement.
I know this must be a subjective question, but I really want to hear that from you. My first question is the one I care most about. I do not care if I have to pay 100, 500, 1000, or 10000 dollars, but I want a complete solution. For now, I am thinking about IP*Works.
I think some people do not understand what I want. I don't want to create my own socket. I have been working with sockets for a long time and I am getting tired of it. Really.
And non-blocking sockets CAN detect client disconnects. That is a fact, and it is well documented all over the internet. A non-blocking socket checks the socket state for new incoming data all the time, which makes it possible to detect that the socket is no longer valid. This is not a heartbeat algorithm. A heartbeat algorithm is used on the client side and periodically sends packets (aka keep-alives) to the server to tell it that the client is still alive.
I did not make myself clear, maybe because English is not my main language. I am not saying that it is possible to detect a dropped connection without trying to send data to or receive data from a socket. What I am saying is that every non-blocking socket is able to do that because it constantly tries to read from the socket for new incoming data. Why is that so hard to understand? If you download and run the IP*Works demos, in particular the echoserver and echoclient ones (both use TCP), you can test it yourselves. I already tested it, and it works as I expected. Even if you use the old TCPSocketServer and TCPSocketClient in non-blocking mode, you will see what I mean.
"What do I benefit from changing from blocking to non-blocking sockets? Will I be able to detect client disconnects (non-gracefully)?"
Just my two cents to get the ball rolling on this question - I'm not a socket EXPERT, but I do have a good deal of experience with them. If I'm mistaken, I'm sure someone will correct me... :-)
I assume that since you're running a server using blocking sockets with 50 connections per second, you have a threading mechanism in place to handle client requests. If so, you don't really stand to gain anything from non-blocking sockets. On the contrary, you will have to change your server logic to be event-driven, based on events fired in your main thread from the non-blocking sockets, or use constant polling to know what your sockets are up to.
Non-blocking sockets can't detect clients disconnecting without notification any more than blocking sockets can - they don't have telepathic powers... The nature of the TCP/IP 'conversation' between client and server is the same - blocking and non-blocking is only with respect to your application's interaction with the socket connection conducting the 'conversation'.
If you need to purge dead connections, you need to implement a heartbeat or timeout mechanism on your socket (I've never seen a modern socket implementation that didn't support timeouts).
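A timeout mechanism of this kind can be as small as a table of last-seen timestamps. The sketch below is in Python rather than Delphi; `IdleTracker` and its methods are hypothetical names, and the policy (any received data counts as a sign of life) is an assumption.

```python
import time


class IdleTracker:
    """Remembers when each connection was last heard from."""

    def __init__(self, timeout_seconds: float, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock
        self._last_seen = {}

    def touch(self, conn_id) -> None:
        """Call whenever any data (including a heartbeat) arrives."""
        self._last_seen[conn_id] = self._clock()

    def forget(self, conn_id) -> None:
        """Call when a connection is closed normally."""
        self._last_seen.pop(conn_id, None)

    def expired(self) -> list:
        """Return the connections that have been silent for longer than the timeout."""
        now = self._clock()
        return [c for c, seen in self._last_seen.items() if now - seen > self._timeout]
```

A housekeeping thread would call `expired()` periodically and close those connections. The sketch is not thread-safe as written; in a thread-per-connection server the dictionary would need a lock.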
What do I benefit from changing from blocking to non-blocking sockets?
Increased speed, availability, and throughput (from my experience). I had an IndySockets client that was getting about 15 requests per second and when I went directly to asynchronous sockets the throughput increased to about 90 requests per second (on the same machine). In a separate benchmark test on a server at a data-center with a 30 Mbit connection I was able to get more than 300 requests per second.
Will I be able to detect client disconnects (non gracefully)?
That's one thing I haven't had to try yet, since all of my code has been on the client side.
What component suite has the best product? By best product I mean: fast, good support, good tools and easy to implement.
You can build your own socket client in a couple of days and it can be very robust and fast... much faster than most of the stuff I've seen "off the shelf". Feel free to take a look at my asynchronous socket client:
(Per Mikey's comments)
I'm asking you for a generic, technical explanation of how NBS increase throughput as opposed to a properly designed BS server.
Let's take a high-load server as an example: say your server is supposed to handle 1000 connections at any given time. With blocking sockets you would have to create 1000 threads, and even if they're mostly idle, the CPU will still spend a lot of time context switching. As the number of clients increases, you will have to increase the number of threads in order to keep up, and the CPU will inevitably spend more time context switching. For every connection you establish with a blocking socket, you will incur the overhead of spawning a new thread, and eventually you will incur the overhead of cleaning up after the thread. Of course, the first thing that comes to mind is: why not use the ThreadPool? You can reuse the threads and reduce the overhead of creating and cleaning up threads.
Here is how this is handled on Windows (hence the .NET connection): sure you could, but the first thing you'll notice with the .NET ThreadPool is that it has two types of threads, and it's not a coincidence: user threads and I/O completion port threads. Asynchronous sockets use the I/O completion ports, which "allows a single thread to perform simultaneous I/O operations on different handles, or even simultaneous read and write operations on the same handle." (1) The I/O completion port threads are specifically designed to handle I/O in a much more efficient way than you would ever be able to achieve if you used the user threads in ThreadPool, unless you wrote your own kernel-mode driver.
"The completion port uses some special voodoo to make sure only a specific number of threads can run at once - if one thread blocks in kernel-mode, it will automatically start up another one." (2)
There are other advantages also: "in addition to the nonblocking advantage of the overlapped socket I/O, the other advantage is better performance because you save a buffer copy between the TCP stack buffer and the user buffer for each I/O call." (3)
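To make the "one thread, many connections" idea concrete, here is a small sketch using Python's standard `selectors` module. This is not IOCP: `selectors` is readiness-based (the OS reports which sockets can be read), whereas completion ports report operations that have already finished. It only illustrates that a single thread can service many sockets without a thread per connection; `serve_ready` is a hypothetical helper name.

```python
import selectors
import socket


def serve_ready(sel: selectors.BaseSelector) -> int:
    """Service every socket that is readable right now; return messages handled."""
    handled = 0
    for key, _events in sel.select(timeout=1.0):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            conn.sendall(data.upper())   # trivial "request handling"
            handled += 1
        else:
            # An empty read means the peer closed the connection.
            sel.unregister(conn)
            conn.close()
    return handled
```

Each accepted socket would be set non-blocking and registered once with `sel.register(conn, selectors.EVENT_READ)`; a loop calling `serve_ready` then replaces the per-connection threads.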
I have been using the Indy and Synapse TCP libraries with good results for some years now and have not found any showstoppers in them. I use the libraries in threads, on both the client and server side, and stability and performance have not been a problem. (Six thousand request and response messages per second and more, with the server running on the same system, are typical.)
Blocking sockets are very useful if the protocol is more advanced than a simple 'send a string / receive a string'. Non-blocking sockets cause a higher coupling of message protocol handlers with the socket read / write logic, so I quickly moved away from non-blocking code.
No library can overcome the limitations of the TCP/IP protocol regarding detection of connection loss. Only trying to read or send data can tell whether the connection is still present.
In Windows, there is a third option, which is overlapped I/O. Non-blocking sockets are essentially a model using Windows messages, developed to keep single-threaded GUI apps from becoming "blocked" while waiting for data. A modern application would, IMHO, be better designed using threads and overlapped I/O.
See for example
Aahhrrgghh - the myth of being able to always detect "dropped" connections. If you pull the power on a machine with a client connection, then the server cannot tell, without sending data, that the connection is "dead". This is by design of the TCP protocol. Don't take my word for it - read this article (Detection of Half-Open (Dropped) TCP/IP Socket Connections).
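Since detection requires sending something, one low-effort option is to let the operating system send the probes via TCP keep-alive, which the question itself mentions. A minimal sketch in Python follows; `enable_keepalive` is a hypothetical helper, while `SO_KEEPALIVE` is the standard socket option. The probe timing is controlled by platform-specific settings not shown here, and the defaults are often on the order of hours, so an application-level heartbeat may still be needed for faster detection.

```python
import socket


def enable_keepalive(sock: socket.socket) -> None:
    """Ask the OS to probe an idle connection and report it as broken if the peer is gone."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
```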
This article explains the main differences between blocking and non-blocking:
Introduction to Indy, by Chad Z. Hower
Pros of Blocking
Easy to program - Blocking is very easy to program. All user code can exist in one place, and in a sequential order.
Easy to port to Unix - Since Unix uses blocking sockets, portable code can be written easily. Indy uses this fact to achieve its single source
Work well in threads - Since blocking sockets are sequential they are inherently encapsulated and therefore very easily used in threads.
Cons of Blocking
User Interface "Freeze" with clients - Blocking socket calls do not return until they have accomplished their task. When such calls are made in the main thread of an application, the application cannot process the user interface messages. This causes the User Interface to "freeze" because the update, repaint and other messages cannot be processed until the blocking socket calls return control to the application's message processing loop.
He also wrote:
Blocking is NOT Evil
Blocking sockets have been repeatedly attacked without warrant. Contrary to popular belief, blocking sockets are not evil.
It is not an issue of all blocking socket components that they are unable to detect a client disconnect. There is no technical advantage on the side of non-blocking components in this area.
