MPI: How many sockets? - scalability

I am working on an MPI application which uses threaded MPI calls between processes. Threads are added and removed as the load requires. Now, I have a question to which I could not find an answer in the Open MPI forum.
If a set of MPI processes ("ranks") already has a connection, i.e., they are already making send/receive calls, and then a new thread comes up (in either process) which also makes send/receive calls between the same MPI peers, would MPI open up a new set of sockets?
I know that the details are implementation-dependent, so there may not be a general answer. But is there a way to find out?
There are questions about the scalability of this technique, which was chosen for other reasons, so it would be great to get some stats on the number of new sockets per connection.
Does anyone know how to do this? For instance, how to query which socket a particular MPI_Send call is writing to?
I already tried adding --mca btl self,sm,tcp --mca btl_base_verbose 30 -v -report-pid -display-map -report-bindings -leave-session-attached to the mpirun command line.
Thanks a lot.

To answer my own question, here is what I learnt from the brilliant folks at Open MPI:
On Jan 24, 2012, at 5:34 PM, devendra rai wrote:
I am trying to find out how many separate connections are opened by MPI as messages are sent. Basically, I have threaded-MPI calls to a bunch of different MPI processes (who, in turn have threaded MPI calls).
The point is, with every thread added, are new ports opened (even if the sender-receiver pairs already have a connection between them)?
In Open MPI: no. The underlying connections are independent of how many threads you have.
Is there any way to find out? I went through MPI APIs, and the closest thing I found was related to cartographic information. This is not sufficient, since this only tells me the logical connections (or does it)?
MPI does not have a user-level concept of a connection. You send a message, a miracle occurs, and the message is received on the other side. MPI doesn't say anything about how it got there (e.g., it may have even been routed through some other process).
Reading Open MPI FAQ, I thought adding "--mca btl self,sm,tcp --mca btl_base_verbose 30 -display-map" to mpirun would help. But I am not getting what I need. Basically, I want to know how many ports each process is accessing (reading as well as writing).
For Open MPI's TCP implementation, it's basically one TCP socket per peer (plus a few other utility fd's). But TCP sockets are only opened lazily, meaning that we won't open the socket until you actually send to a peer.
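Since MPI itself exposes no connection concept, one practical way to check this empirically is to observe each rank's open socket descriptors from the outside, before and after the extra threads start communicating. A minimal sketch, assuming Linux and the /proc filesystem (this script is only an observation aid, not part of Open MPI or its API):

import os
import sys

def count_sockets(pid):
    # Count the file descriptors of the given process that are sockets.
    fd_dir = "/proc/%d/fd" % pid
    count = 0
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # fd was closed while we were looking
        if target.startswith("socket:"):
            count += 1
    return count

if __name__ == "__main__":
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print("pid %d has %d open sockets" % (pid, count_sockets(pid)))

Running it against a rank's PID (for example, the one printed by -report-pid) while more threads join should show whether the per-peer socket count actually grows.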
--
Jeff Squyres
Credits to Jeff Squyres.

Related

How do I increase the priority of a TCP packet in Delphi?

I have a server application that receives a special TCP packet from a client and needs to react to it as soon as possible by sending a high-level ACK to the client (the TCP ACK won't suit my needs).
However, this server is really network-intensive and sometimes the packet takes too long to be sent (around 200 ms on a local network, where a simple server application can send it in less than 1 ms).
Is there a way to mark this packet with a high-priority tag or something like that in Delphi? Or maybe with the Win32 API?
Thanks in advance.
EDIT
Thanks for all the answers so far. I'll add some details. My product has the following setup: there are several devices mounted on vehicles with Wi-Fi connectivity. When they arrive at the garage, those devices connect to my server and start to transmit data.
Because of hardware limitations, I implemented a high-level ACK to make the device aware that the last packet arrived successfully (please don't argue about this - the data may be broken even if I got a correct TCP ACK). However, if I use my server software, which communicates with a remote database, to issue this ACK, I get very long delays (>200 ms). If I use a dedicated program for this task, I get small latencies (<1 ms). So I was wondering whether I could just tell Windows to send those special packets first, as it seems to me that this packet is being delayed so that the database-related ones can be delivered.
That's the motivation behind my question.
EDIT 2
As requested: this is legacy software and I'm using the legacy dclsockets140.bpl package and Delphi 2010 (14.0.3593.25826).
IMO it is very difficult to achieve this; there is a lot of equipment and software involved. First of all, if you communicate between two different OSes you already have latency. Second, software and hardware firewalls, antivirus tools - everything is filtering and delaying your packet.
You can also try to 'hack' the system (this requires some very good knowledge of how frames/segments are packed and sent, flow control, congestion, etc.), either by altering it from code or by using tools like http://half-open.com/ or others.
In short, passing the MSG_OOB flag to the send function marks the data as "urgent". A detailed discussion of OOB in the context of the Windows Sockets implementation specifics is available here.
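As a rough illustration of what that flag does at the Berkeley-sockets level (shown here in Python for brevity rather than Delphi; the host, port and payload are made up), with the usual caveat that TCP urgent data is effectively limited to a single byte and the receiver must explicitly look for it:

import socket

HOST, PORT = "192.168.0.10", 5000      # hypothetical device address

sock = socket.create_connection((HOST, PORT))
sock.sendall(b"normal payload")        # ordinary in-band data
sock.send(b"!", socket.MSG_OOB)        # one byte sent as TCP urgent (out-of-band) data
sock.close()

Because only that one byte travels "urgently", MSG_OOB works better as a signal to the peer than as a priority lane for whole application packets.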

WebSockets in Relation with TCP/IP Sockets on Misultin Erlang HTTP Library

I must say that I am impressed by Misultin's support for WebSockets (some examples here). My JavaScript is firing requests and getting responses down the wire with "negligible" delay or lag. Great!
Looking at what the data handler loop for WebSockets looks like, it resembles that of normal TCP/IP sockets, at least in the basic Erlang form:
% callback on received websockets data
handle_websocket(Ws) ->
    receive
        {browser, Data} ->
            Ws:send(["received '", Data, "'"]),
            handle_websocket(Ws);
        _Ignore ->
            handle_websocket(Ws)
    after 5000 ->
        Ws:send("pushing!"),
        handle_websocket(Ws)
    end.
This piece of code is executed in a process spawned by Misultin, from a function you hand to it when starting your server, like this:
start(Port) ->
    HTTPHandler = fun(Req) -> handle_http(Req, Port) end,
    WebSocketHandler = fun(Ws) -> handle_websocket(Ws) end,
    Options = [{port, Port}, {loop, HTTPHandler}, {ws_loop, WebSocketHandler}],
    misultin:start_link(Options).
For more code on this, check out the example page. I have several questions.
Question 1: Can I change the controlling process of a WebSocket as we normally do with TCP/IP sockets in Erlang? (We normally use gen_tcp:controlling_process(Socket, NewProcessId).)
Question 2: Is Misultin the only Erlang/OTP HTTP library which supports WebSockets? Where are the rest?
EDIT:
Now, here is why I need to be able to transfer WebSocket control away from Misultin.
Think of a gen_server that will control a pool of WebSockets, say a game server. In the current Misultin example, every WebSocket connection has its own controlling process; in other words, for every WebSocket there is a spawned process. Now, I know Erlang is a hero with processes, but I do not want this; I want these initial processes to die as soon as they hand control of the WebSocket over to my gen_server.
I want this gen_server to switch data among these WebSockets. In the current implementation, I need to keep track of the Pid of Misultin's handle_websocket process like this:
%% Here is Misultin's control process.
%% I get its Pid and save it somewhere,
%% and link it to my_gen_server so that
%% if it exits I know it is gone.
handle_websocket(Ws) ->
    process_flag(trap_exit, true),
    Pid = self(),
    link(whereis(my_gen_server)),   %% link/1 needs a pid, so resolve the registered name
    save_connection(Pid),
    wait_msgs(Ws).

wait_msgs(Ws) ->
    receive
        {browser, Data} ->
            FromPid = self(),
            send_to_gen_server(Data, FromPid),
            wait_msgs(Ws);
        {broadcast, Message} ->
            %% I can broadcast to all connected WebSockets
            Ws:send(Message),
            wait_msgs(Ws);
        _Ignore ->
            wait_msgs(Ws)
    end.
The idea above works very well: I save all the controlling processes into a Mnesia RAM table and look one up against a given criterion whenever the application wants to send a message to that particular user. However, with what I want to achieve, I realise that in the real world the processes may become so numerous that my server could crash. I would rather have one gen_server control thousands of WebSockets than have a process for each WebSocket; that way, I could somehow conserve memory.
Suggestion: Misultin's author could add a WebSocket groups implementation in his next release, whereby a group of WebSockets can be controlled by the same process. This would be similar to Nitrogen's comet groups, in which comet connections are grouped together under the same control. If this isn't possible, we will need the control ourselves: provide an API where we can take over control of these WebSockets.
What do you engineers think about this? What are your suggestions and/or comments? Misultin's author could say something about this. Thanks to all.
(one) Cowboy developer here.
I wouldn't recommend using any type of central server responsible for controlling a set of WebSocket connections. The main reason is that this is a premature optimization; you are only speculating about the memory usage.
A test done earlier last year for half a million WebSocket connections on a single server resulted in Misultin using 20 GB of memory and Cowboy using 16.2 GB or 14.3 GB, depending on whether the WebSocket processes were hibernating or not. You can assume that all Erlang implementations of WebSockets are very close to these numbers.
The difference between Cowboy without hibernation and Misultin should be pretty close to the memory overhead of using an extra process per connection. (Feel free to correct me on this, ostinelli.)
I am willing to bet that it is much cheaper to take this into account when buying the servers than it is to design and resolve issues in an application where you don't have a 1:1 mapping between tasks/resources and processes.
https://twitter.com/#!/nivertech/status/114460039674212352
Misultin's author here.
I strongly discourage you from changing the controlling process, because that will break all Misultin's internals. Just as Steve suggested, YAWS and Cowboy support WebSockets, and there are implementations done over Mochiweb but I'm not aware of any being actively maintained.
You are discussing memory concerns, but I think you are mixing concepts. I cannot understand why you need to control everything 'centrally' from a gen_server: your assumption that 'many processes will crash your VM' is actually wrong. Erlang is built upon the actor model, and this has many advantages:
performance due to multicore usage, which you do not get if you use a single gen_server
being able to use the 'let it crash' philosophy: currently it looks like your gen_server crashing would bring down all available games
...
Erlang is able to handle hundreds of thousands of processes on a single VM, and you'll run out of available file descriptors for your open sockets way before that happens.
So, I'd suggest you consider having your game logic within the individual WebSocket processes, and using message passing to make them interact. You may consider spawning 'game processes' which hold the information on a single game's participants and status, for instance, and possibly a gen_server that keeps track of the available games - and does only that (perhaps by owning an ETS table). That's the way I'd probably want to go, all with the appropriate supervision structure.
Obviously, I'm not sure what you are trying to achieve, so I'm just assuming here. But if your concern is memory - well, as TRIAL AND ERROR EXP said below: don't prematurely optimize something, especially when you are considering using Erlang in a way that looks like it might actually keep it from doing what it is capable of.
My $0.02.
Not sure about question 1, but regarding question 2, Yaws and Cowboy also support WebSockets.

What's the best way to 'ping' thousands of servers every minute?

I run a server monitoring site for a video game. It monitors thousands of servers (currently 15,000 or so).
My current setup is a bit janky, and I want to improve it. Currently I use cron to submit every server to a Resque job queue. I refill the queue as soon as it's empty, essentially creating a constantly working queue. Each job then simply tries to open a socket connection to the server IP and port in question, and marks the server down if it fails to connect.
I have 20 workers, and it gets the job done in about 5 minutes. I feel that this should be able to go MUCH faster.
Is there a better, quicker way of doing this?
So, what you are doing currently, I assume, is opening a TCP socket connection as a way of pinging your game server. The problem with using TCP is that it is a lot slower than UDP.
What I would advise instead is creating a UDP socket that just queries the game server port.
Here's a nice quote from another question:
> UDP is really faster than TCP, and the simple reason is because
> it's non-existent acknowledge packet (ACK) that permits a continuous
> packet stream, instead of TCP that acknowledges each packet.
Read this question here: UDP vs TCP, how much faster is it?
From my experience with game servers, the majority, if not all, of modern game servers allow you to query them on a UDP socket. The server will then respond with details about itself. (I used to host a lot of servers myself too.)
So basically, make sure that you are using UDP rather than TCP...
Example Query
I'm just searching for this information now and will update this answer when I find a source. What game is it that you are trying to get information for?
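As a generic illustration of the idea in the meantime (a hedged sketch only: the query payload, host and port below are placeholders, since the real query bytes depend on the game's own protocol), a UDP availability check with a short timeout looks roughly like this:

import socket

def udp_check(host, port, payload=b"status", timeout=1.0):
    # Return True if the server answers the datagram within the timeout.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(payload, (host, port))
        sock.recvfrom(4096)
        return True
    except socket.timeout:
        return False
    finally:
        sock.close()

# Example: udp_check("203.0.113.5", 27015)

Because there is no connection handshake, thousands of these checks can be fired off and awaited concurrently within the testing interval.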
Use typical solutions for typical tasks. This case is about availability detection every n seconds - a daily sysadmin task. It should not be done over ICMP; use SNMP over the UDP protocol. Complete solutions include Nagios, Cacti and Zabbix, which have built-in functionality to combine everything about your servers (load average, disk, RAM, I/O, network) as well as availability detection.
You don't mention how you are making the socket connections, but you might want to try using the Ruby curl bindings (curb) instead of net/http.
This will typically be much faster.

What do I gain from changing from blocking to non-blocking sockets?

We have an application server developed with Delphi 2010 and Indy 10. This server receives more than 50 requests per second and it works well. But in some cases, it seems to me that Indy is very obscure. Its components are good, but sometimes I found myself digging into the source code only to understand a simple thing. Indy lacks good documentation and good support.
The last thing I came across was a big problem for me: I must detect when a client disconnects non-gracefully (when the client crashes or shuts down, for instance, without telling the server that it will disconnect), and Indy was not able to do that. If I want that, I will have to implement an algorithm like heartbeats, polling or TCP keep-alive. I do not want to spend more time doing what is, at least I think, the component's job. After some study, I found out that this is not Indy's fault, but an issue with all blocking-socket components.
Now I am really thinking of changing the core of the server to another good suite. I must admit I am leaning towards non-blocking sockets. Based on that, I have some questions:
What do I gain from changing from blocking to non-blocking sockets?
Will I be able to detect client disconnects (non-graceful ones)?
What component suite has the best product? By best product I mean: fast, good support, good tools and easy to implement.
I know this must be a subjective question, but I really want to hear it from you. The first question is the one I care about most. I do not care if I have to pay 100, 500, 1000 or 10000 dollars, but I want a complete solution. For now, I am thinking about IP*Works.
EDIT
I think some of you have not understood what I want. I don't want to create my own socket component. I have been working with sockets for a long time and I am getting tired of it. Really.
And non-blocking sockets CAN detect client disconnects. That is a fact, and it is well documented all over the internet. A non-blocking socket checks the socket state for new incoming data all the time, which makes it possible to detect that the socket is no longer valid. This is not a heartbeat algorithm. A heartbeat algorithm is used on the client side and periodically sends packets (aka keep-alives) to the server to tell it the client is still alive.
EDIT
I am not making myself clear, maybe because English is not my first language. I am not saying that it is possible to detect a dropped connection without trying to send or receive data on a socket. What I am saying is that every non-blocking socket is able to do that because it constantly tries to read from the socket for new incoming data. Why is that so hard to understand? If you download and run the IP*Works demos, in particular the echo server and echo client ones (both use TCP), you can test it yourselves. I already tested it, and it works as I expected. Even if you use the old TCPSocketServer and TCPSocketClient in non-blocking mode you will see what I mean.
"What do a benefit from changing from blocking to non-blocking sockets? Will I be able to detect client disconnects (non gracefully)?"
Just my two cents to get the ball rolling on this question - I'm not a socket EXPERT, but I do have a good deal of experience with them. If I'm mistaken, I'm sure someone will correct me... :-)
I assume that since you're running a server using blocking sockets with 50 connections per second, you have a threading mechanism in place to handle client requests. If so, you don't really stand to gain anything from non-blocking sockets. On the contrary - you will have to change your server logic to be event-driven, based on events fired in your main thread by the non-blocking sockets, or use constant polling to know what your sockets are up to.
Non-blocking sockets can't detect clients disconnecting without notification any more than blocking sockets can - they don't have telepathic powers... The nature of the TCP/IP 'conversation' between client and server is the same - blocking and non-blocking is only with respect to your application's interaction with the socket connection conducting the 'conversation'.
If you need to purge dead connections, you need to implement a heartbeat or timeout mechanism on your socket (I've never seen a modern socket implementation that didn't support timeouts).
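If the timeout route is the OS-level TCP keep-alive, a minimal sketch of enabling it looks like this (Python for brevity; the three tuning options are the Linux names and the values are illustrative - on Windows the equivalent tuning goes through the SIO_KEEPALIVE_VALS ioctl):

import socket

def enable_keepalive(sock, idle=30, interval=10, probes=3):
    # Ask the OS to probe the peer after `idle` seconds of silence,
    # every `interval` seconds, giving up after `probes` failed probes.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):        # Linux-specific tuning knobs
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)

Once the probes fail, subsequent reads and writes on that socket return an error, which is how the dead peer eventually surfaces to the application.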
What do I gain from changing from blocking to non-blocking sockets?
Increased speed, availability, and throughput (from my experience). I had an IndySockets client that was getting about 15 requests per second and when I went directly to asynchronous sockets the throughput increased to about 90 requests per second (on the same machine). In a separate benchmark test on a server at a data-center with a 30 Mbit connection I was able to get more than 300 requests per second.
Will I be able to detect client disconnects (non-graceful ones)?
That's one thing I haven't had to try yet, since all of my code has been on the client side.
What component suite has the best product? By best product I mean: fast, good support, good tools and easy to implement.
You can build your own socket client in a couple of days and it can be very robust and fast... much faster than most of the stuff I've seen "off the shelf". Feel free to take a look at my asynchronous socket client: http://codesprout.blogspot.com/2011/04/asynchronous-http-client.html
Update:
(Per Mikey's comments)
I'm asking you for a generic, technical explanation of how NBS increase throughput as opposed to a properly designed BS server.
Let's take a high-load server as an example: say your server is supposed to handle 1000 connections at any given time. With blocking sockets you would have to create 1000 threads, and even if they're mostly idle, the CPU will still spend a lot of time context switching. As the number of clients increases you will have to increase the number of threads in order to keep up, and the CPU will inevitably do more context switching. For every connection you establish with a blocking socket, you incur the overhead of spawning a new thread, and eventually the overhead of cleaning up after it. Of course, the first thing that comes to mind is: why not use the ThreadPool? You can reuse the threads and reduce the overhead of creating and cleaning up threads.
Here is how this is handled on Windows (hence the .NET connection): sure you could, but the first thing you'll notice with the .NET ThreadPool is that it has two types of threads and it's not a coincidence: user threads and I/O completion port threads. Asynchronous sockets use the IO completion ports which "allows a single thread to perform simultaneous I/O operations on different handles, or even simultaneous read and write operations on the same handle."(1) The I/O completion port threads are specifically designed to handle I/O in a much more efficient way than you would ever be able to achieve if you used the user threads in ThreadPool, unless you wrote your own kernel-mode driver.
"The com­ple­tion port uses some spe­cial voodoo to make sure only a spe­cif­ic num­ber of threads can run at once — if one thread blocks in ker­nel-​mode, it will au­to­mat­i­cal­ly start up an­oth­er one."(2)
There are other advantages also: "in addition to the nonblocking advantage of the overlapped socket I/O, the other advantage is better performance because you save a buffer copy between the TCP stack buffer and the user buffer for each I/O call." (3)
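This is not IOCP itself, but the underlying idea - one thread multiplexing many non-blocking sockets through an OS readiness API instead of parking one blocked thread per connection - can be sketched in a few lines (Python's selectors module here; the port and buffer size are arbitrary):

import selectors
import socket

sel = selectors.DefaultSelector()

def accept(server_sock):
    conn, _addr = server_sock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle)

def handle(conn):
    data = conn.recv(4096)
    if data:
        conn.send(data)             # echo back (a real server would buffer unsent bytes)
    else:                           # peer closed the connection
        sel.unregister(conn)
        conn.close()

server = socket.socket()
server.bind(("0.0.0.0", 9000))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

while True:                         # one thread services every connection
    for key, _mask in sel.select():
        key.data(key.fileobj)       # dispatch to accept() or handle()

The same single loop can carry hundreds of connections; how efficiently the readiness notification itself scales is where mechanisms like epoll, kqueue and IOCP differ.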
I have been using the Indy and Synapse TCP libraries with good results for some years now and have not found any showstoppers in them. I use the libraries in threads - client and server side - and stability and performance have not been a problem. (Six thousand request and response messages per second and more, with the server running on the same system, are typical.)
Blocking sockets are very useful if the protocol is more advanced than a simple 'send a string / receive a string'. Non-blocking sockets cause a higher coupling of message protocol handlers with the socket read / write logic, so I quickly moved away from non-blocking code.
No library can overcome the limitations of the TCP/IP protocol regarding detection of connection loss. Only trying to read or send data can tell whether the connection is still present.
In Windows, there is a third option, which is overlapped I/O. Non-blocking sockets are essentially a model using Windows messages, developed to keep single-threaded GUI apps from becoming "blocked" while waiting for data. A modern application IMHO would be better designed using threads and overlapped I/O.
See for example http://support.microsoft.com/kb/181611
Aahhrrgghh - the myth of always being able to detect "dropped" connections. If you pull the power on a machine with a client connection, then the server cannot tell, without sending data, that the connection is "dead". This is by design of the TCP protocol. Don't take my word for it - read this article (Detection of Half-Open (Dropped) TCP/IP Socket Connections).
This article explains the main differences between blocking and non-blocking:
Introduction to Indy, by Chad Z. Hower
Pros of Blocking
Easy to program - Blocking is very easy to program. All user code can exist in one place, and in a sequential order.
Easy to port to Unix - Since Unix uses blocking sockets, portable code can be written easily. Indy uses this fact to achieve its single-source solution.
Work well in threads - Since blocking sockets are sequential, they are inherently encapsulated and therefore very easily used in threads.
Cons of Blocking
User Interface "Freeze" with clients - Blocking socket calls do not return until they have accomplished their task. When such calls are made in the main thread of an application, the application cannot process the user interface messages. This causes the User Interface to "freeze" because the update, repaint and other messages cannot be processed until the blocking socket calls return control to the application's message processing loop.
He also wrote:
Blocking is NOT Evil
Blocking sockets have been repeatedly attacked without warrant. Contrary to popular belief, blocking sockets are not evil.
It is not an issue of all blocking-socket components that they are unable to detect a client disconnect. There is no technical advantage on the side of non-blocking components in this area.

Easy to write a script to test whether the network is ever down for the next 24 hours?

Is it easy to write a script to test whether the network is ever down for the next 24 or 48 hours? I can use ssh to connect to a shell and come back 48 hours later to see if it is still connected, to tell whether the network has ever been down, but can I do it programmatically and easily?
The Internet (and your ethernet) is a packet-switched network, which makes the definition of 'down' difficult.
Some things are obvious; for example, if your ethernet card doesn't report a link, then it's down (unless you have redundant connections). But having a link doesn't mean it's up.
The basic, 100 mile view of how the Internet works is that your computer splits the data it wants to send into ~1500-byte segments called packets. It then, pretty much blindly, sends them on their way, however your routing table says to. Then that machine repeats the process. Eventually, through many repetitions, it reaches the remote host.
So, you may be tempted to define up as the packet reached its destination. But, what happens if the packet gets corrupted, e.g., due to faulty hardware or interference? The next router will just discard it. Ok, that's fine, you may well want to consider that down. What if a router on the path is too busy, or the link it needs to be sent on is full? The packet will be dropped. You probably don't want to count that as down.
Packets routinely get lost; higher-level protocols (e.g., TCP) deal with it and retransmit the packet. In fact, TCP uses the packet drop on link full behavior to discover the link bandwidth.
You can monitor packet loss with a tool like ping, as the other answer states.
If your need is more administrative in nature, and using existing software is an option, you could try something like monit:
http://mmonit.com/monit/
Wikipedia has a list of similar software:
http://en.wikipedia.org/wiki/Comparison_of_network_monitoring_systems
You should consider also whether very short outages need to be detected. Obviously, a periodic reachability test cannot guarantee detecting outages shorter than the testing interval.
If you only care about whether there was an outage, not how many there were or how long they lasted, you should be able to automate your existing ssh technique using expect pretty easily.
http://expect.nist.gov/
Most platforms support a ping command that can be used to find out if a network path exists to an IP address somewhere "else". Where, exactly, to check depends on what you are really trying to answer.
If the goal is to discover the reliability of your first hop to your ISP's network, then pinging one of their routers or their DNS server regularly might be sufficient.
If your concern is really the connection to a single node (your mention of leaving an ssh session open implies this) then just pinging that node is probably the best idea. The ping command will usually fail in a way that a script can test if the connection times out.
Regardless, it is probably a good idea to check at a rate no faster than once a minute, and slower than that is probably sufficient.
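For completeness, a sketch of that kind of once-a-minute check, shelling out to the system ping command and logging any failures (the -c/-W flags shown are the Linux ones; Windows ping uses -n/-w instead):

import subprocess
import time
from datetime import datetime

HOST = "example.com"   # the node or router you care about

while True:
    # One echo request, two-second reply timeout; a non-zero exit code means failure.
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", HOST],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode != 0:
        print("%s unreachable at %s" % (HOST, datetime.now().isoformat()))
    time.sleep(60)

Leaving this running for 24 or 48 hours and checking the log afterwards answers the "was it ever down?" question, within the limits of the one-minute sampling interval noted above.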
