What is the performance overhead of Apache ActiveMQ vs. raw sockets?

We're looking to implement ActiveMQ to handle messaging between two of our servers, over a geographically diverse environment (Australia to the UK and back, via the internet).
I've been looking for some vague indicators of performance around the net but so far have had no luck.
My question: compared to a DIY TCP/SSL implementation of basic messaging, how would ActiveMQ perform? Similar systems of our own can send and receive messages across Australia in 100-150ms, over an SSL layer with an already-established connection.
Also, does ActiveMQ keep its TLS/SSL connections open, thus saving the substantial amount of time that would otherwise be spent on connection creation/teardown?
What I am hoping is that it will at least perform better than HTTPS, at a per-request level.
I am aware that performance can vary remarkably, depending on hardware, networks, code and so on. I'm just after something to start with.
I know the above is a little fuzzy - if you need any clarification please let me know and I will be only too happy to oblige.
Thank you.

What Tim means is that this is not an apples-to-apples comparison. If you are solely concerned with the performance of a single point-to-point connection for transferring data, a direct link will give you a good result (although DIY is still a dubious design decision). If you are building a system that requires the transfer of data and you have more complex functional requirements, then a broker-based messaging platform like ActiveMQ comes into play.
You should consider broker-based messaging if you want:
a post-office style system where a producer sends a message, and knows that it will be consumed at some point, even if there is no consumer there at that time
to not care where the consumer of a message is, or how many of them there are
a guarantee that a message will be consumed, even if the consumer that first handles it dies midway through processing (transactions, redelivery)
many consumers, with a guarantee that a message will only be consumed once - queues
many consumers that will each react to a single message - topics
These patterns are pretty standard, and apply to all off-the-shelf messaging products. As a general rule, DIY in this domain is a bad idea, as messaging is complex (see http://www.ohloh.net/p/activemq/estimated_cost for an estimate of how long it would take you to do the same); there are many existing implementations of various flavours (some without a broker) that are all well used, commercially supported and don't require you to maintain them. I would think very hard before going down to the TCP level for any sort of data transfer, as there is so much prior art.
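On the connection question: JMS clients such as ActiveMQ's hold a single long-lived connection to the broker, so the TCP/SSL handshake is paid once at startup rather than per message. Here is a minimal sketch of the queue and topic semantics described above; the broker URL and destination names are placeholders, not anything from the original question.

```java
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class ProducerSketch {
    public static void main(String[] args) throws JMSException {
        // One long-lived connection: the SSL handshake happens once, at connect time.
        ConnectionFactory factory =
                new ActiveMQConnectionFactory("ssl://broker.example.com:61617");
        Connection connection = factory.createConnection();
        connection.start();

        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Destination queue = session.createQueue("ORDERS");           // each message consumed once
        // Destination topic = session.createTopic("PRICE.UPDATES"); // every subscriber gets a copy

        MessageProducer producer = session.createProducer(queue);
        producer.setDeliveryMode(DeliveryMode.PERSISTENT); // survives a broker restart

        // Reuse the same session/producer for many messages over the open connection.
        for (int i = 0; i < 10; i++) {
            producer.send(session.createTextMessage("message " + i));
        }
        connection.close();
    }
}
```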

Related

Why and when do I need an MQTT broker for an IoT/M2M application?

This may be a silly question, but I hope someone can answer it.
I'm a bit confused about MQTT brokers. There are already so many things used for data storage, transfer and processing (like Flume, HDInsight, Spark, etc.), so when and why do I need an MQTT broker?
If I want to use a Windows 10 IoT application with HiveMQ, where can I find the details of how to use it? What benefit do I get from an MQTT broker? Can't I send data from my IoT application directly to Azure or HDFS? How does an MQTT broker fit in, and what does it help me achieve?
I'm new to all of this and have tried to find tutorials, but haven't found anything solid. Could you explain in more detail or point me to some tutorials?
MQTT is a client-server protocol for pub-sub based transport with comparatively small overhead, which makes it applicable to mobile and IoT applications (unlike Flume, etc.). The MQTT broker is basically a server that handles messaging to/from MQTT clients and among them. The functionality pretty much stops at the transport layer, even though various MQTT add-ons exist.
If you are looking to implement a solution that would reliably transfer data from your IoT devices to the back-end system for processing, I would suggest you take a look into Kaa open-source IoT platform. It goes much further than MQTT by providing not only the transport layer, suitable for low-power IoT devices, but also a solid chunk of the application level logic (including the object bindings for your application-level data structures, temporary data persistence, etc.).
Here is a link to a webinar that explains how to build a scalable IoT analytics system with Kaa and Spark in less than an hour.
This is an architectural choice. IoT applications are possible without MQTT but there are some advantages when using MQTT. If you are completely new to MQTT, take a look at this in-depth MQTT series: http://forkbomb-blog.de/2015/all-you-need-to-know-about-mqtt
Basically the main architectural advantage is publish / subscribe designed for low-latency, high throughput (mobile) communication with minimal protocol overhead (which is important if bandwidth is at a premium). You can completely decouple consumers and producers.
HDFS is the (distributed) Hadoop file system and is the foundation for Map/Reduce processing. It is not comparable to an MQTT broker. The MQTT broker could write to HDFS, though (in the case of HiveMQ, via a custom plugin).
Basically MQTT is a protocol while the products you are mentioning are, well, products which solve completely different problems:
Flume is basically used for log aggregation at scale. You won't use MQTT for that; there is not much advantage, because log aggregation is typically done in backend applications.
Spark and Hadoop shine at Big Data crunching. They are frameworks, not ready-to-use solutions, and they are not really comparable to MQTT. Often MQTT brokers like HiveMQ are used in conjunction with them: Spark/Hadoop for data processing and HiveMQ for communication.
I hope this helps you get started. The best approach would be to read about typical use cases of all these technologies; this is a bit too broad for a single SO answer.
MQTT is a data transport, so the usual thing I have to compare it with is HTTP. HTTP has two important characteristics, a) It goes from one point to another, b) It is request/response, so only one end can start a data transfer. MQTT connects many end points to many end points, and either end can start a data transfer. So, if you have just one device and only one service or person that will ever access it, and only by polling, then HTTP is great. MQTT means many devices can post data to many services or people, AND the other way around. Your question assumes that your data is always going to land up in some sort of data store, but many interactions are about events and responding to them immediately, like ringing a doorbell, or lowering the landing gear. In these cases you will often want to both record the data, and have an immediate action occur, like your phone making a doorbell noise.
Finally, you send data to MQTT semantically, rather than by IP address.
This means that your service subscribes to /mikeshouse/doorbell rather than polling 192.168.22.4, which is a huge gain once you have a number of devices.
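To make the semantic addressing concrete, here is a minimal sketch using the Eclipse Paho Java client. The broker URL and client id are placeholders; the topic is the doorbell example from the answer above.

```java
import org.eclipse.paho.client.mqttv3.*;

public class DoorbellSubscriber {
    public static void main(String[] args) throws MqttException {
        MqttClient client = new MqttClient("tcp://broker.example.com:1883", "phone-1");
        client.connect();

        // Subscribe by topic name, not by device IP: any publisher to this
        // topic reaches us, wherever it lives on the network.
        client.subscribe("/mikeshouse/doorbell", (topic, message) ->
                System.out.println("Ring! payload: " + new String(message.getPayload())));

        // The doorbell device, elsewhere, publishes to the same topic:
        // client.publish("/mikeshouse/doorbell", new MqttMessage("pressed".getBytes()));
    }
}
```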

Websocket scalability, broadcasting concerns

If you have a complex requirement set with many users (and servers), how will your WebSocket infrastructure (server or servers) scale, especially with broadcasting?
Of course, broadcasting is not part of any WebSocket spec, but it's there even in basic chat examples (a.k.a. hello world for WebSockets).
A client-side solution (clients asking for new data) still seems more scalable than a server-side (broadcasting) solution, given WebSockets' low latency and relatively cheap (HTTP-header-less) nature.
Edit:
OK, suppose you want to replace all your AJAX code with WebSocket implementations, which may mean a great many connections across many different contexts. This adds enormous complexity to your system if you want to keep track of every possible scenario for broadcasting.
Low-level (network/thread etc.) implementation suggestions are also part of the problem, not the solution, because they mean you have to code a special server, unlike general-purpose HTTP servers.
Moreover, broadcasting brings a stateful nature to the table, which can't easily scale. Think about adding more servers and load balancing.
Scaling realtime web solutions can be a complex problem, but it is one that services like Pusher (who I work for) have solved, and there are most definitely solutions defined for self-hosted realtime web stacks too. The PubSub paradigm is well understood and has been implemented many times, and in order to solve the problem there needs to be some state (who is subscribing to what). This paradigm is used for broadcasting in the types of scenarios that you are talking about.
Realtime web technologies have been built with large numbers of simultaneous connections in mind, many from the ground up. If you wanted to create a scalable solution, you would most likely use an existing realtime web server that supports WebSockets: in the same way that it's highly unlikely you would implement your own HTTP server, you are unlikely to want to implement your own WebSocket server from scratch.
Dedicated realtime web servers also let you separate your application logic from your realtime communication mechanism (separation of concerns). Your application might need to maintain some state, but the realtime technology deals with managing subscriptions and connections. How communication between the application and the realtime web technology is achieved is up to you, but frequently message queues are used, and Redis specifically is very popular in this space.
HTTP polling may conceptually be easier to understand - you can maintain statelessness, and with each HTTP poll request you specify exactly what you are looking for. But it most definitely means that you will need to start scaling much sooner (adding more resources to handle the load).
WebSocket polling is something I've not considered before, and I don't think I've seen it suggested anywhere either; the idea that the client should say "I'm ready for my next set of data and here's what I want" is an interesting one. WebSockets have generally taken a leap away from the request/response paradigm, but there may be scenarios where the increased efficiency of WebSockets combined with request/response has some benefits. The SocketStream application framework might be worth a look: after the initial application load, all communication is performed over WebSockets, which means that even basic request/response functionality uses them.
However, since we are talking about broadcasting data we need to go back to the PubSub paradigm where it makes much more sense to have active subscriptions and when new data is available that new data is distributed to those active subscriptions (pushed). All your application needs to know is if there are any active subscriptions or not in order to decide whether to publish the data or not. That problem has been solved.
The idea of websockets is that you keep a persistent connection with each client. When there is new data that you want to send to every client, you already know who all the clients are so you should just send it.
It sounds like you want each client to constantly be sending requests to the server for new data. Why? It seems like that would waste everyone's bandwidth, and I don't know why you think it would be more scalable. Maybe you could add more detail to your question, like what kind of information you are broadcasting, how often, how many bytes, how many clients, etc.
Why not just consider an open websocket connection to be like a standing request from the client for more data?
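For reference, here is a minimal broadcast endpoint sketch using the standard Java WebSocket API (JSR 356); the endpoint path and payloads are placeholders. The only state kept is the set of open sessions, which is exactly the "who is subscribing to what" state discussed above.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import javax.websocket.*;
import javax.websocket.server.ServerEndpoint;

@ServerEndpoint("/updates")
public class BroadcastEndpoint {
    // The only state the endpoint keeps: who is currently connected.
    private static final Set<Session> sessions = ConcurrentHashMap.newKeySet();

    @OnOpen
    public void onOpen(Session session) {
        sessions.add(session);
    }

    @OnClose
    public void onClose(Session session) {
        sessions.remove(session);
    }

    // Called by the application when new data is available: push to everyone.
    public static void broadcast(String payload) {
        for (Session s : sessions) {
            if (s.isOpen()) {
                try {
                    s.getBasicRemote().sendText(payload);
                } catch (IOException e) {
                    sessions.remove(s); // drop dead connections
                }
            }
        }
    }
}
```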

Scalable Pub/Sub engine for realtime

I'm looking for a pub/sub engine, with the following requirements:
Very low latency < 0.5 sec
Scalable
Shardable (based on geo localisation)
I'd like to be able to have multiple pub/sub servers and be able to publish or subscribe to channels from any server, no matter on which server the channel is declared.
For example:
If user A is connected to server SRV1 and user B is connected to server SRV2, and user B subscribes to "MyChannel" while user A publishes something on channel "MyChannel", user B will get the message even though he's not connected to the same server.
I don't know if Redis is able to do that; I couldn't find anything on the subject.
We've been using ZeroMQ and its pub/sub features for a while now and we're very happy with what we're seeing.
It's also worth looking at what's coming up in the next version (reducing network bandwidth by pushing subscription requests upstream)
I suggest you look at Data Distribution Service for Real Time Systems (DDS) standard. It's specifically designed to be a scalable pub/sub middleware both for real-time and non-real-time systems.
It has a few mature implementations, each of which has its own strengths, but generally the implementations are scalable and low-latency.
These are the implementations I would suggest you look at (if you need them to work in a WAN environment, I believe the first two have good support for that):
RTI DDS
OpenSplice DDS
OpenDDS
CoreDX DDS
It seems you are looking for some sort of messaging. Try RabbitMQ, with the Shovel or Federation plugins.
nanomsg is a successor to ZeroMQ, written by the same author, and with many language bindings.
It is written in C and uses zero-copy mechanisms. If you're looking for exceptional latency, and are willing to get your hands dirty (which you should if you aim for something extreme), I would recommend one of those two.
If you're looking for exceptional throughput, go with Kafka.
Note that none of those solutions implements geolocation out of the box; Redis 3.2 will have something: http://antirez.com/news/89
If you are looking for an easy, "good enough" solution, I'd go with Redis (be sure to read this blog post from aphyr first).
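As a feel for the Redis option: pub/sub channels are global to the Redis deployment rather than to whichever app server a user is connected to, so a single Redis instance (or a replicated set) can route between app servers. A minimal sketch with the Jedis client follows; the Redis host is a placeholder, and SRV1/SRV2 refer to the example in the question.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

public class CrossServerPubSub {
    public static void main(String[] args) throws InterruptedException {
        // Subscriber thread: user B's server (SRV2) listens on MyChannel.
        new Thread(() -> {
            try (Jedis subscriber = new Jedis("redis.example.com", 6379)) {
                subscriber.subscribe(new JedisPubSub() {
                    @Override
                    public void onMessage(String channel, String message) {
                        System.out.println("B received on " + channel + ": " + message);
                    }
                }, "MyChannel"); // blocks for as long as the subscription lives
            }
        }).start();

        Thread.sleep(500); // crude: give the subscription time to register

        // Publisher: user A's server (SRV1) publishes through the same Redis.
        try (Jedis publisher = new Jedis("redis.example.com", 6379)) {
            publisher.publish("MyChannel", "hello from A");
        }
    }
}
```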

How to communicate within this system?

We intend to design a system with three "tiers".
HQ, with a single server
lots of "nodes" on a regional basis
users, with iPads.
HQ communicates 2-way with the nodes, which communicate 2-way with the users. Users never communicate with HQ, nor vice-versa.
The powers that be decree a Windows app for HQ (using Delphi) and a native app for the users' iPads. They have no opinion on the nodes.
If there are compelling technical arguments, I might be able to beat them down from "decree" to "prefer" on the Windows program (and, for instance, make it browser-based). The nodes have no GUI; they just sit there playing middle-man.
What's the best way for these things to communicate (SOAP/HTTP/AJAX/jQuery/home-brewed-protocol-on-top-of-TCP/something-else?) Is it best to use the same protocol end to end, or different protocols for hq<-->node and node<-->iPad?
Both ends of each of those two interfaces might wish to initiate a transaction (which I can easily do if I roll my own protocol), so should I use push/pull/long-poll or what?
I hope that this description makes sense. Please ask questions if it does not. Thanks.
Update:
File size is typically below 1MB, with nothing likely to be above 10MB or even 5MB. No second file will be sent before the first is acknowledged.
Files flow "downhill" from HQ to node to iPad. Files will never flow "uphill", but there will be some small packets of data (in addition to acks) which are initiated by user action on the iPad. These will go to the local node and then to the HQ. We are probably talking <128 bytes.
I suppose there will also be general control & maintenance traffic at a low rate, in all directions.
For push / pull (publish / subscribe or peer to peer communication), cross-platform message brokers could be used. I am not sure if there are (iOS) client libraries for Microsoft Message Queue (MSMQ), but I would also evaluate open source solutions like HornetQ, Apache ActiveMQ, Apollo, OpenMQ, Apache QPid or RabbitMQ.
All these solutions provide a reliable foundation for distributed messaging - failover, clustering, persistence - with high performance and many clients attached. On this infrastructure, messages with any content type (JSON, binary, plain text) can be exchanged, and on top of that, messages can carry routing and priority information. They also support transacted messaging.
There are Delphi and Free Pascal client libraries available for many enterprise-quality open source messaging products. (I am the author of some of them, supporting ActiveMQ, Apollo, HornetQ, OpenMQ and RabbitMQ.)
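As an illustration of the broker approach for the downhill file flow, here is a hedged sketch that sends a file as a JMS BytesMessage to an ActiveMQ queue; the broker URL, queue name and file name are placeholders, and it is shown in Java for brevity (the Delphi side would use one of the client libraries mentioned above). Since files in this system stay under 10MB, one message per file is reasonable, and PERSISTENT delivery means the broker holds the file until the downstream side consumes it.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class FileSender {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://node.example.com:61616");
        Connection connection = factory.createConnection();
        try {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer =
                    session.createProducer(session.createQueue("HQ.TO.NODE.FILES"));
            producer.setDeliveryMode(DeliveryMode.PERSISTENT); // broker keeps it until consumed

            // Files in this system are under 10 MB, so one BytesMessage per file is fine.
            byte[] data = Files.readAllBytes(Paths.get("payload.bin"));
            BytesMessage msg = session.createBytesMessage();
            msg.setStringProperty("filename", "payload.bin"); // consumer uses this to name the file
            msg.writeBytes(data);
            producer.send(msg);
        } finally {
            connection.close();
        }
    }
}
```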
Check out MessagePack: http://msgpack.org/
Also, here's more RPC discussion on SO:
RPC frameworks available?
MessagePack: fast cross-platform serializer and RPC - please share experience
ICE might be of interest to you: http://zeroc.com/index.html
They have an iOS layer: http://zeroc.com/icetouch/index.html
IMHO there are too few requirements here to decide what technology to use. What data are exchanged, how often, and at what size? Are there request/response time constraints? And so on. Never start selecting a technology before you understand your needs deeply.

Are there some general Network programming best practices?

I am implementing some networking stuff in our project. It has been decided that the communication is very important and we want to do it synchronously: the client sends something and the server acknowledges it.
Are there some general best practices for the interaction between the client and the server. For instance if there isn't an answer from the server should the client automatically retry? Should there be a timeout period before it retries? What happens if the acknowledgement fails? At what point do we break the connection and reconnect? Is there some material? I have done searches but nothing is really coming up.
I am looking for best practices in general. I am implementing this in c# (probably with sockets) so if there is anything .Net specific then please let me know too.
First rule of networking - you are sending messages, you are not calling functions.
If you approach networking that way, and don't pretend that you can call functions remotely or have "remote objects", you'll be fine. You never have an actual "thing" on the other side of the network connection - what you have is basically a picture of that thing.
Everything you get from the network is old data. You are never up to date. Because of this, you need to make sure that your messages carry the correct semantics - for instance, you may increment or decrement something by a value, but you should not set it to the current value plus or minus another, as the current value may have changed by the time your message gets there.
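To make that concrete, here is a purely illustrative sketch; the send helper and the payload strings are hypothetical, not part of any real protocol:

```java
public class StockMessages {
    // Purely illustrative transport stub: imagine this writes to a socket or queue.
    static void send(String payload) {
        System.out.println("sending: " + payload);
    }

    public static void main(String[] args) {
        // Fragile: bakes a stale snapshot into the message. We read 42 earlier
        // and subtracted 1, but the stock may have changed since then.
        send("SET stock.widget = 41");

        // Robust: carries the intent, so the receiver applies it to whatever
        // the current value is when the message arrives.
        send("DECREMENT stock.widget BY 1");
    }
}
```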
If both the client and the server are written in .NET/C#, I would recommend WCF instead of raw sockets, as it saves you from a lot of plumbing code for serialization and deserialization, synchronization of messages, etc.
That maybe doesn't really answer your question about best practices though ;-)
The first thing to do is to characterize your specific network in terms of speed, probability of lost messages, nominal and peak traffic, bottlenecks, client and server MTBF, ...
Then, and only then, decide what you need from your protocol. In many cases you don't need sophisticated error-handling mechanisms and can reliably implement a service with plain UDP.
In a few cases, you will need to build something much more robust in order to maintain a consistent global state among several machines connected through a network that you cannot trust.
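To ground the retry/timeout questions from the original post, here is a minimal sketch of an acknowledge loop over a plain TCP socket (shown in Java rather than C#, but the pattern translates directly). The host, port, message format and retry counts are all illustrative assumptions, not established best-practice values:

```java
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

public class AckClient {
    private static final int MAX_RETRIES = 3;
    private static final int ACK_TIMEOUT_MS = 2000;

    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("server.example.com", 9000)) {
            socket.setSoTimeout(ACK_TIMEOUT_MS); // fail the read if no ack arrives in time
            BufferedWriter out = new BufferedWriter(
                    new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8));
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));

            String message = "msg-42:hello";
            boolean acked = false;
            for (int attempt = 1; attempt <= MAX_RETRIES && !acked; attempt++) {
                out.write(message);
                out.newLine();
                out.flush();
                try {
                    String reply = in.readLine(); // blocks for up to ACK_TIMEOUT_MS
                    acked = "ACK:msg-42".equals(reply);
                } catch (SocketTimeoutException e) {
                    // No ack in time: retry. The message id lets the server
                    // discard duplicates if the original arrived late.
                }
            }
            if (!acked) {
                // After MAX_RETRIES the connection is suspect: drop it and reconnect.
                throw new IOException("no acknowledgement after " + MAX_RETRIES + " attempts");
            }
        }
    }
}
```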
The most important thing I found is that messages should always be stateless (read up on REST if this means nothing to you).
For example if your application monitors the number of shipments over a network do not send incremental updates (+x) but always the new total.
Thinking about network programming in general, I think you should learn about:
1. Sockets (of course).
2. Forking and threading.
3. Locking (using mutexes, semaphores, or others).
Hope this helps.
