Max of Indy clients? - Delphi

How many clients can connect to TIdTCPServer at the same time? I'm using Indy 10, Delphi XE2, and the target OS is Windows Server 2003.
Is there a better option than Indy for Delphi?

By default, MaxConnections is set to 0, so the Indy server does not check the number of active threads before accepting another connection. How many clients you can actually handle mostly depends on what the clients are doing on the server. For example, if your server accepts a client connection and then calculates pi to a trillion digits within that client's thread context, you'll handle significantly fewer connections properly than if you hand the work off to another process. Basically, your result will vary directly with the tasks performed.
For a generic answer: if you override the default stack size allocated to each thread, you can have up to a few thousand connections in a 32-bit server application, but likely not much more than that. See: What's the maximum number of threads in Windows Server 2003? and http://www.deltics.co.nz/blog/?p=1330
Also check the ListenQueue property, which is set to 15 by default. Apparently the OS can increase it further on its own. I don't know the current Windows Server default listen queue, but I typically bump the default up quite a bit.
Bottom line: get to a thousand active threads/connections and you are likely to hit a wall sooner rather than later.
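For illustration, a minimal Indy 10 sketch that applies these settings and swaps in a pooled scheduler so threads are reused; the pool size, connection limit and port are assumed values, not recommendations:
uses
  IdTCPServer, IdContext, IdSchedulerOfThreadPool;

procedure ConfigureServer(Server: TIdTCPServer);
var
  Pool: TIdSchedulerOfThreadPool;
begin
  Pool := TIdSchedulerOfThreadPool.Create(Server);
  Pool.PoolSize := 50;             // pre-created worker threads (assumed value)
  Server.Scheduler := Pool;        // reuse pooled threads instead of one new thread per client
  Server.MaxConnections := 1000;   // 0 (the default) means no limit is enforced
  Server.ListenQueue := 50;        // accept backlog; the Indy default is 15
  Server.DefaultPort := 6000;      // arbitrary example port
  Server.Active := True;
end;
Keep the OnExecute handler short or hand heavy work to another process, as the answer suggests, since each connection still occupies one pooled thread while it runs.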

However many clients the OS can handle with available resources. Keep in mind that each connected client uses its own thread, so you have to factor in each thread's default stack size.

Related

IOCP server and sending data with a single WSASend

I am working on an IOCP server on Windows, and I have to send a buffer to all connected sockets.
The buffer size is small - up to 10 bytes. When I get the notification for each WSASend in GetQueuedCompletionStatus, is there a guarantee that the buffer was sent in one piece by a single WSASend? Or should I add extra code that checks whether all 10 bytes were sent, and post another WSASend if necessary?
There is no guarantee, but it's highly unlikely that a send smaller than a single operating-system page would partially fail.
Failures are more likely if you're sending a buffer longer than a single operating-system page, and if you're not actively managing how many overlapped operations you have outstanding and how many your system can support before running out of non-paged pool or hitting the I/O page-lock limit.
It's only possible to recover from a partial failure if you never have any other sends pending on that connection.
I tend to check in the completion handler that the value is as expected and abort the connection with an RST if it's not. I've never had this code execute in production, and I've been building many kinds of IOCP-based client and server systems for well over 10 years now.
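A hedged Delphi/Winsock2 sketch of that pattern (the per-operation record, names and loop structure are my assumptions, not the answerer's code): compare the transferred byte count against what was posted, and abort with an RST by zeroing SO_LINGER before closesocket:
uses
  Winapi.Windows, Winapi.Winsock2;

type
  PSendOp = ^TSendOp;
  TSendOp = record
    Overlapped: TOverlapped;  // must be the first field so we can cast back from POverlapped
    Socket: TSocket;
    Expected: DWORD;          // bytes handed to WSASend for this operation
  end;

procedure AbortWithRst(S: TSocket);
var
  L: TLinger;
begin
  L.l_onoff := 1;
  L.l_linger := 0;  // hard close: the peer sees an RST instead of a graceful FIN
  setsockopt(S, SOL_SOCKET, SO_LINGER, PAnsiChar(@L), SizeOf(L));
  closesocket(S);
end;

procedure CompletionLoop(Port: THandle);
var
  Bytes: DWORD;
  Key: ULONG_PTR;
  Ov: POverlapped;
  Op: PSendOp;
begin
  while GetQueuedCompletionStatus(Port, Bytes, Key, Ov, INFINITE) do
  begin
    Op := PSendOp(Ov);
    if Bytes <> Op.Expected then
      AbortWithRst(Op.Socket);  // partial send: abort rather than try to resync
    Dispose(Op);
  end;
end;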

Is DataSnap Optimized for responding to more than 1k users at the same time?

We want to build a big multi-tier application. The server-side application must respond to more than 1000 users at the same time. We want to create the server application with the 64-bit compiler and the client side with the 32-bit one. We don't know whether DataSnap can serve all those clients without problems.
The server computer is very powerful (multi-processor, more than 16 GB of RAM), and the database management system is Firebird 2.5.
You need a way to perform realistic load tests.
For the Firebird database, you can simulate concurrent users with the free Apache JMeter tool. It can run SQL statements and record their execution-time statistics (average, min/max, etc.). So you could, for example, create a thread group with twenty different SQL queries and then run twenty threads, each performing these queries sequentially.
JMeter lets you define time limits on the SQL query and treats exceeding the limit as an error. You can then try to find the maximum client count at which the overall error rate is still below (for example) five percent.
But you also need to know how high the expected database load will be, and you need a test database of realistic size, not just a couple of records. Some database operations, such as reports, can cause higher load; these should be included in the simulation too, as they can affect overall performance. In JMeter, you can create a second thread group, running in parallel with the first one, for these long-running statements, with different settings (fewer simulated clients).
Testing the database will show whether there is a bottleneck in this area already. For example, the result could be that the database can serve twenty clients with a total average rate of 20 TPS (transactions per second), i.e. one transaction per second per client. This TPS value will decrease as the user count grows.
Related question: Firebird usage in big projects, which also links to http://www.firebirdsql.org/en/case-studies-catalog/
Regarding DataSnap client load simulation: this can be done with a scripted client which runs a predefined set of statements/commands over the connection (a sketch follows below).
To run a high number of load-test clients simultaneously, you could use a service like Amazon Elastic Compute Cloud (EC2) to launch clones of your test machine image, saving on hardware costs. But of course I would start with a small client machine which simply runs ten or twenty scripted clients.
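A rough sketch of such a scripted client; it uses a plain TCP connection via Indy's TIdTCPClient as a stand-in, since the real DataSnap calls depend on your server interface, and the host, port and command list are placeholders:
uses
  SysUtils, DateUtils, IdTCPClient;

procedure RunScriptedClient(const Host: string; Port: Word);
const
  // placeholder script; replace with your real requests
  Script: array[0..2] of string = ('LOGIN test', 'QUERY customers', 'LOGOUT');
var
  Client: TIdTCPClient;
  Cmd: string;
  T0: TDateTime;
begin
  Client := TIdTCPClient.Create(nil);
  try
    Client.Host := Host;
    Client.Port := Port;
    Client.Connect;
    for Cmd in Script do
    begin
      T0 := Now;
      Client.IOHandler.WriteLn(Cmd);  // send one scripted command
      Client.IOHandler.ReadLn;        // wait for the server's reply line
      Writeln(Format('%s: %d ms', [Cmd, MilliSecondsBetween(Now, T0)]));
    end;
    Client.Disconnect;
  finally
    Client.Free;
  end;
end;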
As far as I know, DataSnap is based on Indy, and Indy's connection handling model (one thread per connection) is not very scalable - it is very resource-consuming. Even using Indy's thread pools is not really an option, I think. Also, on 32-bit Windows there is a practical limit on the number of threads you can create (around 2000, IIRC). In any case, using many threads is bad for server performance (for reference: the Windows Internals book, the Windows Performance Team blog, etc.).
A scalable, robust and professional application server would use I/O completion ports (IOCP) for data processing. But I don't know whether DataSnap can take advantage of them.
UPDATE:
At CodeRage 7 I asked similar scalability questions. Here are the answers:
Q: Recently there was a question on Stack Overflow about DataSnap's scalability/performance. Can DataSnap handle, for example, 2000 or more concurrent user requests at the network and application level?
A: The scalability is based on the scalability of TCP/HTTP/HTTPS and the number of connections allowed by your server operating system, as well as on the memory and hardware you employ. There is no specific limit in DataSnap.
My comment: While this is true, Indy's connection handling model (one thread per connection) introduces a bottleneck, especially on 32-bit Windows (about 2000 threads max). On Win64 it should be less of a problem, but this way of handling data flow still leads to performance degradation.
Q: Does DataSnap support some kind of load balancing?
A: Not directly. You can do this in code in your DataSnap server(s).
My comment: I've found a very good paper on implementing failover/load balancing in DataSnap on Andreano Lanusse's blog.
Q: Does DataSnap support I/O completion ports for better scalability?
A: This question of mine was left unanswered.
Hope this helps!
UPDATE2:
I found a very interesting post on DataSnap performance: DataSnap analysis based on Speed & Stability tests
UPDATE3:
DataSnap, Deployment, Performance, and More (Marco Cantu)
Monitoring and control of connections in DataSnap XE2 - translated into English
Monitoring and control of connections in DataSnap XE2 - original
When the specifications for a system are drawn up, you need to be very precise about multiple users.
For example: you create a website, and the client expects 15,000 unique users.
Then the client usually comes up with a requirement that the system must support 15,000 simultaneous users, which is very naive.
You'll need a more detailed specification than that.
Usually it's more sensible to say something like: for 99% of the requests, 99% of the users get a response within 5 seconds on average.
In normal usage, you'll never see all users send a request within the same second. If at some point they all arrive within the same minute (also very unlikely), you'll have far fewer concurrent users.
Even for websites with tens of thousands of users, most of whom connect daily, the web server is idle most of the time, occasionally jumping to 5% load, or 20% in extreme cases. If we really had to serve all of these users at once we'd be in trouble, but that never happens, and it isn't realistic to size a server for such loads.
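A rough back-of-the-envelope illustration (all numbers assumed): 15,000 users per day, each active for a ten-minute session and sending one request every 30 seconds, gives 15,000 x 20 = 300,000 requests per day. Spread over a twelve-hour window that is about 7 requests per second on average; even a peak factor of 10 is only around 70 requests per second, nowhere near 15,000 simultaneous requests.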

Scalable Delphi TCP server implementation

Any suggestions for components to use as a base for a scalable TCP server? I currently have an implementation that uses Indy, which works well for, say, 100 relatively active connections or 1,000 relatively inactive connections, but the one-thread-per-connection model limits the number of concurrent active connections that can be handled.
Let's say my goal might be 1,000 connections each processing 10 messages per second, or 10,000 connections each processing 1 message per second, on a good server (8-16 cores). Is this realistic? I'd really like to hear about any real-world implementations, because I have found that what might work in theory does not necessarily work in practice, and I do not want to chase a proposed solution that will not work.
Edit: IOCP would be good, but I only want to use commercial-grade classes/components, so they would need to be as "professional" as Indy or IP*Works before I would consider using them. Furthermore, I have no intention of "rolling my own" solution - it would take too much time to make it commercial-grade. Lastly, I am looking for a significant improvement on what I already have. I am sure I can squeeze at least 20-50% more out of what I have (based on Indy), but I am never going to be able to handle 10,000 concurrent clients, or 10,000 messages per second, no matter how hard I try. Whether there is something out there that meets these conditions is another matter.
I have decided to accept the answer referring to the IOCP classes, even though I have not used them, because they look like the best path for investigation at this stage.
There is a project at http://voipobjects.com/ which is based on the former iopcclasses project.
It claims to handle thousands of simultaneous connections:
The IOCP engine is a set of classes, components and routines for the rapid creation of highly scalable, high-performance TCP/UDP applications. An application created using the IOCP classes can handle thousands of simultaneous connections.
The library is written in Delphi; Delphi 7 through 2010 are supported.
The library uses the I/O completion ports technology, the most powerful technology in the Win32 world for creating highly scalable, high-performance TCP/UDP applications. It is supported by all desktop Windows versions except the old Win9x/WinME line.
The library is licensed under MPL 1.1. It also includes some files from the JEDI project (the Winsock2 header translation).
https://bitbucket.org/voipobjects/iocpengine
My favorite Delphi network layer is ICS by Francois Piette. It's fantastically easy to understand, very scalable, and ultra-high performance. Free, and open source. Will probably scale to 1000 clients for most people, without significant effort, and without the complexity that gives me trouble when I use Indy.
I got about a 20% scalability/performance boost from switching all my stuff from Indy to ICS.
You should look at the RealThinClient SDK: http://www.realthinclient.com/about.htm
A well-proven solution with good support. Test results for different server solutions are on the home page.
The real deciding factor is what you plan to do in each of those transactions.
I use Indy with Network Load Balanced (NLB) Windows servers. One of these Delphi applications is serviced by 3 physical servers listening on one public IP address, and we have received millions of requests since yesterday with zero errors. Load overnight is pretty much idle, so the actual rate is around 350 requests/second/server during the day, and there's plenty of room for growth.
If there's not a lot of CPU/memory needed per transaction, you might get away with one box using Indy. It all depends on the load - you likely can't write to 1,000 different files every second.
There are other items to worry about too, such as the OS supporting this amount of activity. You may need to tweak some registry settings (see this Stack Overflow question).
IOCP is the way to go for ultra-capacity servers. I have used Indy, for its ease of implementation and debugging, for a very long time. I have my own IOCP implementation that I wrote years ago, but I never rolled it out into production because we simply haven't needed it.
My simple advice: I'd highly suggest rolling it out with Indy, using NLB as your crutch for load, and after that, if you still want the utmost speed, write your own IOCP implementation so you can tailor it to your specific requirements. Note that this advice is based on knowing nothing of your actual implementation requirements.
I've tried multiple Delphi networking solutions and found that many, if not all, add complexity and code that impact performance, footprint, or both. So I started searching for the lightest wrapper around the Winsock API, and I (re)discovered Delphi's own TTcpClient and TTcpServer components. Using them in blocking mode, subclassing TCustomTcpServer and overriding its DoAccept method, I've had the best results so far.
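A minimal sketch of that approach, assuming the Sockets unit's TCustomTcpServer exposes a virtual DoAccept taking a TCustomIpClient and that the server runs in thread-blocking mode (verify the signatures against your Delphi version):
uses
  Sockets;  // Delphi's thin Winsock wrapper: TTcpClient, TTcpServer, TCustomTcpServer

type
  TEchoServer = class(TCustomTcpServer)
  protected
    // invoked for each accepted client; with ServerBlockMode = bmThreadBlocking
    // this runs in its own worker thread
    procedure DoAccept(ClientSocket: TCustomIpClient); override;
  end;

procedure TEchoServer.DoAccept(ClientSocket: TCustomIpClient);
var
  Line: string;
begin
  Line := ClientSocket.Receiveln;  // blocking read of one CRLF-terminated line
  ClientSocket.Sendln(Line);       // echo it back
end;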
If you expect a really high number of incoming connections with (small) responses to (small) requests, it's highly advisable to implement I/O completion ports, as they handle incoming requests better.
I have been using ICS for the last 12 years. It is non-blocking. I support up to 2,000 concurrent connections, each receiving at least 5,000 bytes per second, sent as 1,000 bytes every 200 ms. I have never faced any problems, and the CPU usage of the app is very small.
Good support in the forum, though I have hardly ever needed it.
Shekar

What would I gain by changing from blocking to non-blocking sockets?

We have an application server developed with Delphi 2010 and Indy 10. This server receives more than 50 requests per second and works well. But in some cases Indy seems very obscure to me. Its components are good, but sometimes I find myself digging into the source code just to understand a simple thing. Indy lacks good documentation and good support.
The last thing I came across was a big problem for me: I must detect when a client disconnects non-gracefully (when the client crashes or shuts down, for instance, without telling the server it will disconnect), and Indy was not able to do that. If I want that, I will have to implement something like a heartbeat, polling, or TCP keep-alive. I do not want to spend more time doing what is, at least I think, the component's job. After some study, I found out that this is not Indy's fault; it is an issue with all blocking-socket components.
Now I am really thinking of changing the core of the server to another good suite, and I must admit I am leaning towards non-blocking sockets. Based on that, I have some questions:
What would I gain by changing from blocking to non-blocking sockets?
Will I be able to detect client disconnects (non-graceful ones)?
Which component suite has the best product? By best I mean: fast, good support, good tools, and easy to implement.
I know this must be a subjective question, but I really want to hear your views. The first question is the one I care about most. I do not care if I have to pay 100, 500, 1,000 or 10,000 dollars, but I want a complete solution. For now, I am thinking about IP*Works.
EDIT
I think some of you are not understanding what I want. I don't want to create my own socket component. I have been working with sockets for a long time and I am getting tired of it. Really.
And non-blocking sockets CAN detect client disconnects. That is a fact, and it is well documented all over the internet. A non-blocking socket checks the socket state for new incoming data all the time, which makes it possible to detect that the socket is no longer valid. This is not a heartbeat algorithm; heartbeats are used on the client side, periodically sending packets (aka keep-alives) to the server to tell it the client is still alive.
EDIT
I am not making myself clear, maybe because English is not my first language. I am not saying that it is possible to detect a dropped connection without trying to send or receive data on the socket. What I am saying is that every non-blocking socket can do that, because it constantly tries to read from the socket for new incoming data. Why is that so hard to understand? If you download and run the IP*Works demos, in particular the echo server and echo client ones (both use TCP), you can test it yourselves. I have already tested it, and it works the way I expected. Even if you use the old TCPSocketServer and TCPSocketClient in non-blocking mode, you will see what I mean.
"What do a benefit from changing from blocking to non-blocking sockets? Will I be able to detect client disconnects (non gracefully)?"
Just my two cents to get the ball rolling on this question - I'm not a socket EXPERT, but I do have a good deal of experience with them. If I'm mistaken, I'm sure someone will correct me... :-)
I assume that since you're running a server using blocking sockets and handling 50 requests per second, you have a threading mechanism in place to handle client requests. If so, you don't really stand to gain anything from non-blocking sockets. On the contrary, you will have to change your server logic to be event-driven, based on events fired in your main thread by the non-blocking sockets, or use constant polling to know what your sockets are up to.
Non-blocking sockets can't detect clients disconnecting without notification any more than blocking sockets can; they don't have telepathic powers. The nature of the TCP/IP 'conversation' between client and server is the same; blocking versus non-blocking only concerns how your application interacts with the socket connection conducting the 'conversation'.
If you need to purge dead connections, you need to implement a heartbeat or timeout mechanism on your socket (I've never seen a modern socket implementation that didn't support timeouts).
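For instance, with Indy 10 a crude idle-timeout purge can be written in the server's OnExecute handler; the 30-second limit, the line-based protocol and the class/handler names are assumptions:
uses
  IdContext, IdExceptionCore;

// OnExecute handler of a TIdTCPServer: drop clients that stay silent too long
procedure TMyForm.ServerExecute(AContext: TIdContext);
var
  Line: string;
begin
  AContext.Connection.IOHandler.ReadTimeout := 30000;  // 30 s idle limit
  try
    Line := AContext.Connection.IOHandler.ReadLn;  // raises EIdReadTimeout on silence
    // ... process Line and write a reply here ...
  except
    on EIdReadTimeout do
      AContext.Connection.Disconnect;  // treat prolonged silence as a dead client
  end;
end;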
What would I gain by changing from blocking to non-blocking sockets?
Increased speed, availability, and throughput (in my experience). I had an Indy sockets client that was getting about 15 requests per second, and when I switched to asynchronous sockets the throughput increased to about 90 requests per second (on the same machine). In a separate benchmark on a data-center server with a 30 Mbit connection, I was able to get more than 300 requests per second.
Will I be able to detect client disconnects (non-graceful ones)?
That's one thing I haven't had to try yet, since all of my code has been on the client side.
Which component suite has the best product? By best I mean: fast, good support, good tools, and easy to implement.
You can build your own socket client in a couple of days, and it can be very robust and fast - much faster than most of the stuff I've seen "off the shelf". Feel free to take a look at my asynchronous socket client: http://codesprout.blogspot.com/2011/04/asynchronous-http-client.html
Update:
(Per Mikey's comments)
I'm asking you for a generic, technical explanation of how non-blocking sockets increase throughput as opposed to a properly designed blocking-socket server.
Let's take a high-load server as an example: say your server is supposed to handle 1,000 connections at any given time. With blocking sockets you would have to create 1,000 threads, and even if they're mostly idle, the CPU will still spend a lot of time context switching. As the number of clients increases, you will have to increase the number of threads to keep up, and the CPU will inevitably do more context switching. For every connection you establish with a blocking socket, you incur the overhead of spawning a new thread, and eventually the overhead of cleaning up after it. Of course, the first thing that comes to mind is: why not use a thread pool? You can reuse threads and reduce the overhead of creating and cleaning up threads.
Here is how this is handled on Windows (hence the .NET connection): sure you could, but the first thing you'll notice about the .NET ThreadPool is that it has two types of threads, and that's no coincidence: user threads and I/O completion port threads. Asynchronous sockets use I/O completion ports, which "allow a single thread to perform simultaneous I/O operations on different handles, or even simultaneous read and write operations on the same handle."(1) The I/O completion port threads are specifically designed to handle I/O much more efficiently than you could ever achieve with the user threads in the ThreadPool, unless you wrote your own kernel-mode driver.
"The com­ple­tion port uses some spe­cial voodoo to make sure only a spe­cif­ic num­ber of threads can run at once — if one thread blocks in ker­nel-​mode, it will au­to­mat­i­cal­ly start up an­oth­er one."(2)
There are other advantages as well: "in addition to the nonblocking advantage of the overlapped socket I/O, the other advantage is better performance, because you save a buffer copy between the TCP stack buffer and the user buffer for each I/O call."(3)
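The Win32 pattern behind those quotes, sketched in Delphi (simplified: dispatch and shutdown are omitted): one completion port shared by a small fixed pool of workers sized near the CPU count, regardless of how many sockets are in flight:
uses
  Winapi.Windows, System.Classes;

var
  IOCP: THandle;

procedure StartWorkers;
var
  i: Integer;
begin
  // one port for all handles; the last argument is the concurrency hint
  IOCP := CreateIoCompletionPort(INVALID_HANDLE_VALUE, 0, 0, 0);
  for i := 1 to CPUCount * 2 do  // a handful of threads, not one per client
    TThread.CreateAnonymousThread(
      procedure
      var
        Bytes: DWORD;
        Key: ULONG_PTR;
        Ov: POverlapped;
      begin
        while GetQueuedCompletionStatus(IOCP, Bytes, Key, Ov, INFINITE) do
          ; // look up the connection for (Key, Ov) and dispatch the completed I/O
      end).Start;
end;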
I have been using the Indy and Synapse TCP libraries, with good results, for some years now, and have not found any showstoppers in them. I use the libraries in threads, on both the client and server side; stability and performance have not been a problem. (Six thousand request/response messages per second and more, with the server running on the same system, are typical.)
Blocking sockets are very useful if the protocol is more advanced than a simple 'send a string / receive a string'. Non-blocking sockets cause a higher coupling of message protocol handlers with the socket read/write logic, so I quickly moved away from non-blocking code.
No library can overcome the limitations of the TCP/IP protocol regarding detection of connection loss. Only trying to read or send data can tell whether the connection is still present.
In Windows, there is a third option: overlapped I/O. Non-blocking sockets are essentially a Windows-message-based model developed to keep single-threaded GUI apps from "blocking" while waiting for data. A modern application, IMHO, would be better designed using threads and overlapped I/O.
See for example http://support.microsoft.com/kb/181611
Aahhrrgghh - the myth of always being able to detect "dropped" connections. If you pull the power on a machine with a client connection, the server cannot tell, without sending data, that the connection is "dead". This is by design of the TCP protocol. Don't take my word for it - read this article (Detection of Half-Open (Dropped) TCP/IP Socket Connections).
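If you do want the stack itself to probe for dead peers, per-socket TCP keep-alive is the closest thing. A hedged Delphi sketch via WSAIoctl follows; SIO_KEEPALIVE_VALS and the record layout match the documented Winsock values, but verify them against your header translation, and the timings are assumptions:
uses
  Winapi.Windows, Winapi.Winsock2;

const
  SIO_KEEPALIVE_VALS = DWORD($98000004);  // IOC_IN or IOC_VENDOR or 4

type
  TTcpKeepAlive = packed record
    OnOff: u_long;
    KeepAliveTime: u_long;      // idle ms before the first probe
    KeepAliveInterval: u_long;  // ms between unanswered probes
  end;

procedure EnableKeepAlive(S: TSocket);
var
  KA: TTcpKeepAlive;
  Returned: DWORD;
begin
  KA.OnOff := 1;
  KA.KeepAliveTime := 30000;     // probe after 30 s of silence (assumed value)
  KA.KeepAliveInterval := 1000;  // then every second until the stack gives up
  if WSAIoctl(S, SIO_KEEPALIVE_VALS, @KA, SizeOf(KA), nil, 0,
     @Returned, nil, nil) <> 0 then
    RaiseLastOSError;
end;
A pending read on such a socket will fail once the probes go unanswered, which is how the "dead" connection finally becomes visible to the application.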
This article explains the main differences between blocking and non-blocking:
Introduction to Indy, by Chad Z. Hower
Pros of Blocking
Easy to program - blocking is very easy to program. All user code can exist in one place, and in a sequential order.
Easy to port to Unix - since Unix uses blocking sockets, portable code can be written easily. Indy uses this fact to achieve its single-source solution.
Work well in threads - since blocking sockets are sequential, they are inherently encapsulated and therefore very easily used in threads.
Cons of Blocking
User interface "freeze" with clients - blocking socket calls do not return until they have accomplished their task. When such calls are made in the main thread of an application, the application cannot process user interface messages. This causes the user interface to "freeze", because the update, repaint and other messages cannot be processed until the blocking socket calls return control to the application's message-processing loop.
He also wrote:
Blocking is NOT Evil
Blocking sockets have been repeatedly attacked without warrant. Contrary to popular belief, blocking sockets are not evil.
It is not an issue of all blocking-socket components that they are unable to detect a client disconnect. Non-blocking components have no technical advantage in this area.

Delphi Server Socket component

We have a C/S application written entirely in Delphi (client and server - or middleware, if you prefer).
For the client part we use Indy.
For the server we use DXSock.
Since DXSock has been dead for a while, we are investigating alternatives for the server part.
I want to hear some comments about the best server socket alternative for Delphi.
The current system usually has tens of permanent connections, each working in its own thread, but there could be hundreds in the future (this should be improved to a thread pool if possible).
If you want the best possible performance, you have to use sockets in non-blocking mode or use completion ports. IPWorks is implemented like that, as is iocp. As far as I can tell, Indy and Synapse don't implement them (at least officially).
We used completion ports and a thread pool in our open source SynCrtSock unit, used in our Synopse SQLite3 framework.
Here are some benchmarks of this solution, which works from Delphi 6 up to Delphi XE. I won't claim it is the "best component", but it is a working and speedy one (every request returns about 4 KB of JSON data):
Http client keep alive (i.e. one HTTP/1.1 client connection kept alive during requests):
first in 7.87ms, done in 153.37ms i.e. 6520/s, average 153us
Http client multi connect (i.e. one new HTTP/1.0 client connection created for each request - this one uses completion ports and a thread pool):
first in 151us, done in 305.98ms i.e. 3268/s, average 305us
For speed comparison, here are other communication protocols available in our framework:
Named pipe access:
first in 78.67ms, done in 187.15ms i.e. 5343/s, average 187us
Local window messages:
first in 148us, done in 112.90ms i.e. 8857/s, average 112us
Direct in process access:
first in 44us, done in 41.69ms i.e. 23981/s, average 41us
We use the HTTP/1.1 protocol over TCP/IP because it adds very little overhead to plain TCP/IP, it is well handled by firewalls and the like, and it allows our framework to be used from an AJAX application, even though its main purpose is to serve Delphi clients.
IMHO there is no "best server socket alternative component for Delphi"; it depends on the purpose of your server application. The main bottleneck will be in the Windows kernel itself. Perhaps direct access to the HTTP kernel-mode driver (http.sys) of Windows could help.
Consider using a dedicated optimized server instead of a Delphi one, such as lighttpd or Cherokee, using FastCGI to hand the requests to a Free Pascal (or CrossKylix) application under Linux. I guess this would give the best performance possible.
I use Indy components for commercial server-side work, and the component set is pretty solid (version 9 or 10). My servers handle millions of connections per day with no issues.
I used DXSock many moons ago. The author was always optimizing but never seemed to finish it. He does seem to have another version out.
If you want commercial support, then I'd recommend IPWorks from nSoftware.
Actually, DXSock is not dead; v6.1 was just released. The web hosting company we used in Tennessee lost the domain, so only customers who have kept their subscription renewed annually have received DXSock 5.0, 6.0 and 6.1.
Indy CANNOT support more than 2,000 concurrent connections on 32-bit Windows, as Chad and crew use TThread, which implies the de facto 1 MB of stack per thread/socket connection; 2,000 x 1 MB is about 2 GB of address space, which a 32-bit process cannot spare. DXSock implements a 0-bytes-per-connection model (unless you define otherwise) and can handle over 50,000 concurrent connections on Windows, Linux, Mac, Pi, etc.
Ozz Nixon - ozznixon#bpdx.com if you want more details on 6.1
Author of DXSock
Co-author of Winshoes, which became Indy.
