Fastest inter-service communication stack in Service Fabric?

Our application consists of a stateless web API making concurrent calls to a stateful service using service remoting. Through load testing, we noticed a delay in inter-service communication (between the caller invoking the service and the service itself starting to process the request). This occurs when we increase the load to approximately 1000 concurrent calls per second.
We tried this solution but our timing didn't improve. We then tried replacing our communication stack with WCF using a TCP binding, but that didn't improve it either.
For inter-service communication in Service Fabric, which communication stack is the most efficient? And any idea what's causing the bottleneck?

Try Kestrel, it's the fastest that I've found so far.


What are possible Scalability options for an application supporting ONLY Single TCP Socket Connection?

There is a legacy implementation (to an extent company-proprietary, in Pascal and C with some Java macros) which processes TCP socket requests from TCP client applications. It supports multiple client applications (around 5K) connecting over TCP sockets; however, it supports only a single socket connection to the backend (database). There are two instances of the server, so in total it supports 10K client applications over two TCP socket connections to the database. All database-related communication happens synchronously over the single socket connection. There are massive issues in this application, especially high RTT (round-trip time) and occasional outages due to back-pressure. We have an ops team for such issues; they mostly resolve them by restarting the server. Hardly anyone on our team knows the coding details of this application, and there is not much documentation. As this is a critical application, we cannot afford to mess with it, and we don't want to touch the code, at least for now. This becomes even more critical due to a shift in business priorities: there is a need to add another 30K client applications from another business to this setup.
The task before us is to integrate it with another application, which is based on a microservice architecture with RabbitMQ as middleware. This is a customer-facing application with strict QoS requirements; we cannot afford outages or downtime in it. As part of this integration, request messages coming from the legacy application over the TCP socket must be processed before they are passed to the database. In other words, we want to introduce a component that processes the legacy application's requests before handing them over to the database. This additional processing is part of our client's request. Some of the processing is very intensive and resource-hungry in terms of CPU cycles, memory and socket I/O. As a result, there is a chance that such processing may lead to server downtime and higher RTT. This layer of ours is very flexible: we can easily add more servers or replace faulty ones. But this doesn't sound very efficient in this integration, as we are limited by the legacy application's single socket connection. So in total we can have at most 2 servers (+ 6 for the new 30K client applications). This is our cause for concern.
I want to know what options are available to address the high-availability, scalability and latency issues of such an integration. Especially given the limitation of a single TCP socket connection, how can we make this integration efficient, so that it can handle back-pressure, offer better application uptime, etc.?
We were thinking of leveraging RabbitMQ, a Layer 4 load balancer (like HAProxy or NGINX), IPVS, NAT, etc. But all of these lead toward making some changes in the legacy code (or are not very efficient techniques), which we don't want.

What's the best way to 'ping' thousands of servers every minute?

I run a server monitoring site for a video game. It monitors thousands of servers (currently 15,000 or so).
My current setup is a bit janky, and I want to improve it. Currently I use cron to submit every server to a resque job queue. I refill the queue just as soon as it's empty, essentially creating a constantly working queue. The job will then simply try and open a socket connection to the server ip and port in question, and mark it down if it fails to connect.
I have 20 workers, and it gets the job done in about 5 minutes. I feel that this should be able to go MUCH faster.
Is there a better, quicker way of doing this?
So I assume what you are currently doing is opening a TCP socket connection to ping your game server. The problem with using TCP is obviously that it is a lot slower than UDP.
What I would advise instead is creating a UDP socket that just queries the game server port.
Here's a nice quote from another question:
> UDP is really faster than TCP, and the simple reason is because
> it's non-existent acknowledge packet (ACK) that permits a continuous
> packet stream, instead of TCP that acknowledges each packet.
Read this question here: UDP vs TCP, how much faster is it?
From my experience with game servers, the majority if not 100% of all modern game servers allow you to query them on a UDP socket. This will then respond with details on the game server. (I used to host a lot of servers myself too).
So basically, make sure that you are using UDP rather than TCP...
Example Query
I'm just searching for this information now and will update my question...when I find some source.. what game is it that you are trying to get information for?
Use typical solutions for typical tasks. This case is about availability detection every n seconds, one of a sysadmin's daily tasks. It should not be done over ICMP; use SNMP over UDP. One complete solution is Nagios/Cacti/Zabbix, which have built-in functionality to combine everything about your servers: LA, HDD, RAM, IO, NET as well as availability detection.
You don't mention how you are making the socket connections, but you might want to try using ruby curl bindings: curb instead of net/http.
This will typically be much faster.
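For reference, 15,000 servers in about 5 minutes across 20 workers works out to roughly 0.4 seconds per check per worker, so most of the time is probably spent waiting on the network rather than working. A minimal sketch of the same "connect, mark down on failure" check with many connections in flight at once is below. It is in Python rather than Ruby, and `check_server`, `check_all`, the 3 second timeout and the limit of 500 concurrent attempts are all illustrative assumptions, not values from the question.

```python
import asyncio


async def check_server(host, port, timeout, limiter):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    async with limiter:  # cap the number of simultaneous connection attempts
        try:
            _reader, writer = await asyncio.wait_for(
                asyncio.open_connection(host, port), timeout
            )
        except (OSError, asyncio.TimeoutError):
            return False  # refused, unreachable or timed out: treat as down
        writer.close()
        try:
            await writer.wait_closed()
        except OSError:
            pass  # the connect already succeeded; ignore errors while closing
        return True


async def check_all(servers, timeout=3.0, max_in_flight=500):
    """Check every (host, port) pair concurrently; return {(host, port): bool}."""
    limiter = asyncio.Semaphore(max_in_flight)
    results = await asyncio.gather(
        *(check_server(host, port, timeout, limiter) for host, port in servers)
    )
    return dict(zip(servers, results))
```

It would be called as `asyncio.run(check_all(list_of_host_port_pairs))`. With this shape the total run time is bounded mostly by the timeout and the concurrency limit rather than by the number of workers.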

What do I benefit from changing from blocking to non-blocking sockets?

We have an application server developed with Delphi 2010 and Indy 10. This server receives more than 50 requests per second and it works well. But in some cases, Indy seems very obscure to me. Its components are good, but sometimes I find myself digging into the source code just to understand a simple thing. Indy lacks good documentation and good support.
The last thing I came across was a big problem for me: I must detect when a client disconnects non-gracefully (when the client crashes or shuts down, for instance, without telling the server that it will disconnect), and Indy was not able to do that. If I want that, I will have to develop an algorithm like heartbeat, polling or TCP keep-alive. I do not want to spend more time doing what is, at least I think, the component's job. After some study, I found out that this is not Indy's fault, but an issue of all blocking socket components.
Now I am really thinking of changing the core of the server to another good suite. I must admit I am leaning toward using non-blocking sockets. Based on that, I have some questions:
What do I benefit from changing from blocking to non-blocking sockets?
Will I be able to detect client disconnects (non gracefully)?
What component suite has the best product? By best product I mean: fast, good support, good tools and easy to implement.
I know this must be a subjective question, but I really want to hear that from you. My first question is the one I care about most. I do not care if I have to pay 100, 500, 1000 or 10000 dollars, but I want a complete solution. For now, I am thinking about IP*Works.
I think some of you do not understand what I want. I don't want to create my own socket. I have been working with sockets for a long time and I am getting tired of it. Really.
And non-blocking sockets CAN detect client disconnects. That is a fact and it is well documented all over the internet. A non-blocking socket checks the socket state for new incoming data all the time, and that makes it possible to detect that the socket is not valid. This is not a heartbeat algorithm. A heartbeat algorithm is used on the client side and periodically sends packets (aka keep-alive) to the server to tell it that the client is still alive.
I am not making myself clear, maybe because English is not my main language. I am not saying that it is possible to detect a dropped connection without trying to send data to or receive data from a socket. What I am saying is that every non-blocking socket is able to do that because it constantly tries to read from the socket for new incoming data. Why is that so hard to understand? If you download and run the IP*Works demos, in particular the echoserver and echoclient ones (both use TCP), you can test it yourselves. I already tested it, and it works as I expected. Even if you use the old TCPSocketServer and TCPSocketClient in non-blocking mode you will see what I mean.
"What do I benefit from changing from blocking to non-blocking sockets? Will I be able to detect client disconnects (non gracefully)?"
Just my two cents to get the ball rolling on this question - I'm not a socket EXPERT, but I do have a good deal of experience with them. If I'm mistaken, I'm sure someone will correct me... :-)
I assume that since you're running a server using blocking sockets with 50 connections per second, you have a threading mechanism in place to handle client requests. If so, you don't really stand to gain anything from non-blocking sockets. On the contrary, you will have to change your server logic to be event-driven, based on events fired in your main thread from the non-blocking sockets, or use constant polling to know what your sockets are up to.
Non-blocking sockets can't detect clients disconnecting without notification any more than blocking sockets can - they don't have telepathic powers... The nature of the TCP/IP 'conversation' between client and server is the same - blocking and non-blocking is only with respect to your application's interaction with the socket connection conducting the 'conversation'.
If you need to purge dead connections, you need to implement a heartbeat or timeout mechanism on your socket (I've never seen a modern socket implementation that didn't support timeouts).
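A minimal sketch of such a timeout mechanism, shown in Python for brevity (the server in the question is Delphi/Indy). The class and method names (`IdleTracker`, `touch`, `forget`, `expired`) are hypothetical; the idea is only to record when each connection last delivered data and treat any connection that has been silent for longer than a limit as dead.

```python
import time


class IdleTracker:
    """Remembers when each connection last sent data and reports silent ones."""

    def __init__(self, idle_limit, clock=time.monotonic):
        self.idle_limit = idle_limit  # seconds of silence tolerated
        self.clock = clock            # injectable so the logic can be tested
        self.last_seen = {}

    def touch(self, conn_id):
        """Call whenever data (or a heartbeat message) arrives on a connection."""
        self.last_seen[conn_id] = self.clock()

    def forget(self, conn_id):
        """Call when a connection is closed normally."""
        self.last_seen.pop(conn_id, None)

    def expired(self):
        """Return the ids of connections silent for longer than idle_limit."""
        now = self.clock()
        return [cid for cid, seen in self.last_seen.items()
                if now - seen > self.idle_limit]
```

A server thread would call `expired()` periodically and close the sockets it returns. This only works if clients are expected to send something (real traffic or a heartbeat) at least once per `idle_limit`.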
What do I benefit from changing from blocking to non-blocking sockets?
Increased speed, availability, and throughput (from my experience). I had an IndySockets client that was getting about 15 requests per second and when I went directly to asynchronous sockets the throughput increased to about 90 requests per second (on the same machine). In a separate benchmark test on a server at a data-center with a 30 Mbit connection I was able to get more than 300 requests per second.
Will I be able to detect client disconnects (non gracefully)?
That's one thing I haven't had to try yet, since all of my code has been on the client side.
What component suite has the best product? By best product I mean: fast, good support, good tools and easy to implement.
You can build your own socket client in a couple of days and it can be very robust and fast... much faster than most of the stuff I've seen "off the shelf". Feel free to take a look at my asynchronous socket client:
(Per Mikey's comments)
I'm asking you for a generic, technical explanation of how NBS increase throughput as opposed to a properly designed BS server.
Let's take a high-load server as an example: say your server is supposed to handle 1000 connections at any given time. With blocking sockets you would have to create 1000 threads, and even if they're mostly idle, the CPU will still spend a lot of time context switching. As the number of clients increases, you will have to increase the number of threads in order to keep up, and the CPU will inevitably do more context switching. For every connection you establish with a blocking socket, you will incur the overhead of spawning a new thread, and eventually you will incur the overhead of cleaning up after the thread. Of course, the first thing that comes to mind is: why not use the ThreadPool? You can reuse the threads and reduce the overhead of creating and cleaning up threads.
Here is how this is handled on Windows (hence the .NET connection): sure you could, but the first thing you'll notice with the .NET ThreadPool is that it has two types of threads, and it's not a coincidence: user threads and I/O completion port threads. Asynchronous sockets use I/O completion ports, which "allows a single thread to perform simultaneous I/O operations on different handles, or even simultaneous read and write operations on the same handle."(1) The I/O completion port threads are specifically designed to handle I/O in a much more efficient way than you would ever be able to achieve if you used the user threads in the ThreadPool, unless you wrote your own kernel-mode driver.
"The completion port uses some special voodoo to make sure only a specific number of threads can run at once: if one thread blocks in kernel-mode, it will automatically start up another one."(2)
There are other advantages also: "in addition to the nonblocking advantage of the overlapped socket I/O, the other advantage is better performance because you save a buffer copy between the TCP stack buffer and the user buffer for each I/O call." (3)
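To illustrate the "one thread, many sockets" idea in general terms, here is a rough Python sketch using the standard `selectors` module. Note the assumptions: this uses readiness notification (select/epoll/kqueue), not Windows I/O completion ports, so it is not the mechanism described above; it only shows a single thread servicing every connection without one thread per client. `run_echo_loop` and `should_stop` are hypothetical names.

```python
import selectors
import socket


def run_echo_loop(listener, should_stop, poll_interval=0.1):
    """Echo data back on every connection of `listener`, using only this thread."""
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    try:
        while not should_stop():
            # Wait until at least one socket is readable (or the interval ends).
            for key, _events in sel.select(timeout=poll_interval):
                sock = key.fileobj
                if sock is listener:
                    conn, _addr = listener.accept()
                    conn.setblocking(False)
                    sel.register(conn, selectors.EVENT_READ)
                    continue
                try:
                    data = sock.recv(4096)
                except (BlockingIOError, InterruptedError):
                    continue
                except ConnectionError:
                    data = b""  # reset by peer: handle like a close
                if data:
                    # Simplification: assumes small payloads that fit in the
                    # send buffer; a real server would queue unsent bytes.
                    sock.sendall(data)
                else:
                    sel.unregister(sock)
                    sock.close()
    finally:
        # Close client sockets that are still registered; the caller owns `listener`.
        for key in list(sel.get_map().values()):
            if key.fileobj is not listener:
                key.fileobj.close()
        sel.close()
```

The trade-off mentioned elsewhere in this thread still applies: the protocol handling has to be written as reactions to events rather than as sequential blocking code.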
I am using Indy and Synapse TCP libraries with good results for some years now, and did not find any showstoppers in them. I use the libraries in threads - client and server side, stability and performance was not a problem. (Six thousand request and response messages per second and more with the server running on the same system are typical.)
Blocking sockets are very useful if the protocol is more advanced than a simple 'send a string / receive a string'. Non-blocking sockets cause a higher coupling of message protocol handlers with the socket read / write logic, so I quickly moved away from non-blocking code.
No library can overcome the limitations of the TCP/IP protocol regarding detection of connection loss. Only trying to read or send data can tell whether the connection is still present.
In Windows, there is a third option, which is overlapped I/O. Non-blocking sockets are essentially a model using Windows messages, developed to keep single-threaded GUI apps from becoming "blocked" while waiting for data. A modern application IMHO would be better designed using threads and overlapped I/O.
See for example
Aahhrrgghh - the myth of being able to always detect "dropped" connections. If you pull the power on a machine with a client connection, then the server cannot tell, without sending data, that the connection is "dead". This is by design of the TCP protocol. Don't take my word for it - read this article (Detection of Half-Open (Dropped) TCP/IP Socket Connections).
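One way to get that "sending data" done without writing a heartbeat is TCP keep-alive, where the operating system sends the probes on an idle connection. A small Python sketch, with assumptions: `enable_keepalive` is a hypothetical helper, the timing values are arbitrary, and the three tuning options exist under these names only on some platforms (Linux in particular), so they are applied only where the `socket` module exposes them.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Ask the OS to probe an idle TCP connection so a dead peer is noticed."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is None:
            continue  # not available on this platform; keep OS defaults
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            pass  # option exists but was rejected; keep OS defaults
```

Once the probes go unanswered, the next read or write on the socket fails with an error, which is the point at which the server learns the connection is gone. This is the same for blocking and non-blocking sockets.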
This article explains the main differences between blocking and non-blocking:
Introduction to Indy, by Chad Z. Hower
Pros of Blocking
Easy to program - Blocking is very easy to program. All user code can exist in one place, and in a sequential order.
Easy to port to Unix - Since Unix uses blocking sockets, portable code can be written easily. Indy uses this fact to achieve its single source
Work well in threads - Since blocking sockets are sequential they are inherently encapsulated and therefore very easily used in threads.
Cons of Blocking
User Interface "Freeze" with clients - Blocking socket calls do not return until they have accomplished their task. When such calls are made in the main thread of an application, the application cannot process the user interface messages. This causes the User Interface to "freeze" because the update, repaint and other messages cannot be processed until the blocking socket calls return control to the applications message processing loop.
He also wrote:
Blocking is NOT Evil
Blocking sockets have been repeatedly attacked without warrant. Contrary to popular belief, blocking sockets are not evil.
The inability to detect a client disconnect is not an issue specific to blocking socket components. There is no technical advantage on the side of non-blocking components in this area.

Erlang and Thrift

I want to make a Windows Service using Erlang and Thrift.
The service will have a single thread listening on a port (socket communication) and sending requests to worker threads. The Windows Service has to respond quickly (milliseconds), and throughput (requests per second) is mandatory.
The worker threads will communicate with each other. I am thinking of Erlang to solve this.
So I think Erlang + Thrift will work well. Am I right? Any suggestions?
Your solution is reasonable. To bring you up to speed, I would suggest reading up on gen_server, supervisor and application.
Thrift will generate stub files which, when compiled, will yield you a transport/acceptor. It's up to you to provide both the Thrift API and the handler for this API.
Moreover, be advised not to synchronize a lot between processes if you need fast response times (i.e. don't design your solution around synchronizing calls).

Getting the number of existing connections in WCF

Is it possible to check programmatically the number of existing available connections a WCF service has?
I want to see if connections to the web service were closed properly in the ASP.NET code.
You could check out something like Windows Server AppFabric for that purpose.
In WCF, most of the time, the "connections" are only open very briefly anyway - for as long as a service call lasts. So you can't really go check if there are any connections around - they'll be gone when the call terminates.
You can also check into the WCF performance counters that are available on the server side to keep an eye on the number of concurrent sessions. You can definitely query performance counters from .NET code. The Service Performance Counters offer e.g. a number of instances (of your service class) that are in memory at any given time - that's the number of requests being handled at any given time (which is probably what you could call a "connection" to a WCF service).
