I was planning on using Mosca or Mosquitto brokers (due they are open source) in order to achieve a scalable architecture with messages queue replication to avoid losing a message not yet delivered by the broker in an eventual failure of the broker.
As I read, mosquitto is a mature and very stable solution with the capacity of horizontal scalability using bridges. But I couldn't find any plugin to write messages into a database (common to all brokers), so I think that this is a limitation since if we have i.e. two brokers load balanced and one of them die, then all the messages of this broker cannot be delivered until the broker recover.
Mosca in the other hand allows us to scale using Redis, and if the broker 1 die, then broker2 still can deliver messages because they are stored in a common database. And in that way I can use master-slave configuration of redis to avoid single point of failure.
So my questions are:
1) Is mosca a good choice for production?
2) Is it possible to use redis to allocate messages queues with mosquitto?
Horizontal scalability is incredibly hard to add as an feature to MQTT brokers, since it requires engineering for that scalability from the start. Also, just replicating queues for undelivered messages won't help for resilience or fault-tolerance.
Even if it would be easy to add, I would NOT go with redis, since it essentially loses messages: https://aphyr.com/posts/283-jepsen-redis
If you want horizontal scalability, I'd recommend to check out a broker that has clustering, horizontal (or better: linear) scalability built-in and does allow network splits.
Here is a series about MQTT and clustering: http://www.hivemq.com/blog/clustering-mqtt-introduction-benefits/
Related
MQTT is a publish/subscribe protocol. Whenever a publisher publishes to a topic, all the subscribers that have subscribed to that topic will get the message via an MQTT broker. I would like to know the maximum number of clients an MQTT broker can handle. Is there any upper limit for that?
The only way to work this out is to test depending on your specific workload.
It will be entirely dependent on the following:
The size of the machine you run the broker on.
The size of the messages you send.
The rate of messages.
The number of clients (both subscribers and publishers).
The performance characteristics you need to meet.
Which broker you are using.
And possibly many more factors.
How many clients an MQTT broker can serve depends on the MQTT broker software you're using. Most MQTT brokers will likely only be limited by the amount of memory available (each socket uses a chunk of memory) and it therefore becomes a question of which broker software utilizes the memory (and other resources) in the most efficient manner. Of course some brokers might have other limitations.
In practice you'd also have to look at what you can do with the connected clients - some brokers may behave differently (performance wise) depending on how many clients are connected etc.
Just asking one silly question, hope someone can answer this.
I'm bit confused regarding MQTT broker. Basically, the confusion is, there are so many things being used for data storing, transfer and processing (like Flume, HDInsight, Spark etc). So, when and why I need to use one MQTT broker?
If I would like to use Windows 10 IoT application with HiveMQ, from where can I get the details? how to use it? How I get benefit out of this MQTT broker? Can I not send data from my IoT application directly using Azure or HDFS? So, how MQTT broker fits into it or helping me to achieve something?
I'm new to all these and tried to find some tutorials, however, I'm not getting anything proper. Please explain it in more details or give some tutorials for this?
MQTT is a client-server protocol for pub-sub based transport that has a comparatively small overhead, and thus applicable to mobile and IoT applications (unlike Flume, etc.). The MQTT broker is basically a server that handles messaging to/from MQTT clients and among them. The functionality pretty much stops at the transport layer, even though various MQTT add-ons exist.
If you are looking to implement a solution that would reliably transfer data from your IoT devices to the back-end system for processing, I would suggest you take a look into Kaa open-source IoT platform. It goes much further than MQTT by providing not only the transport layer, suitable for low-power IoT devices, but also a solid chunk of the application level logic (including the object bindings for your application-level data structures, temporary data persistence, etc.).
Here is a link to a webinar that explains how to build a scalable IoT analytics system with Kaa and Spark in less than an hour.
This is an architectural choice. IoT applications are possible without MQTT but there are some advantages when using MQTT. If you are completely new to MQTT, take a look at this in-depth MQTT series: http://forkbomb-blog.de/2015/all-you-need-to-know-about-mqtt
Basically the main architectural advantage is publish / subscribe designed for low-latency, high throughput (mobile) communication with minimal protocol overhead (which is important if bandwidth is at a premium). You can completely decouple consumers and producers.
HDFS is the (distributed) Hadoop file system and is the foundation for Map / Reduce processing. It is not comparable to a MQTT broker. The MQTT broker could write to the HDFS, though (in case of HiveMQ with a custom plugin).
Basically MQTT is a protocol while the products you are mentioning are, well, products which solve completely different problems:
Flume is basically used for log aggregation at scale. You won't use MQTT for that, at least there is not too much advantage because this is typically done in backend applications.
Spark and Hadoop shine at Big Data crunching. They are a framework and not a ready to use solution. They are not really comparable to MQTT. Often MQTT brokers like HiveMQ are used in conjunction with these, Spark / Hadoop for data processing and HiveMQ for communication.
I hope this helps you getting started. Best would be to read about typical use cases of all these technologies, this is a bit too broad for a single SO answer.
MQTT is a data transport, so the usual thing I have to compare it with is HTTP. HTTP has two important characteristics, a) It goes from one point to another, b) It is request/response, so only one end can start a data transfer. MQTT connects many end points to many end points, and either end can start a data transfer. So, if you have just one device and only one service or person that will ever access it, and only by polling, then HTTP is great. MQTT means many devices can post data to many services or people, AND the other way around. Your question assumes that your data is always going to land up in some sort of data store, but many interactions are about events and responding to them immediately, like ringing a doorbell, or lowering the landing gear. In these cases you will often want to both record the data, and have an immediate action occur, like your phone making a doorbell noise.
Finally, you send data to MQTT semantically, rather than by IP address.
This means that your services subscribes to /mikeshouse/doorbell rather than polling 192.168.22.4, which is a huge gain once you have a number of devices.
We operate two dual-node brokers, each broker having quite different queues and workloads. Each box has 24 cores (H/T) worth of Xeon E5645 # 2.4GHz with 48GB RAM, connected by Gigabit LAN with ~150μs latency, running RHEL 5.6, RabbitMQ 3.1, Erlang R16B with HiPE off. We've tried with HiPE on but it made no noticeable performance impact, and was very crashy.
We appear to have hit a ceiling for our message rates of between 1,000/s and 1,400/s both in and out. This is broker-wide, not per-queue. Adding more consumers doesn't improve throughput overall, just gives that particular queue a bigger slice of this apparent "pool" of resource.
Every queue is mirrored across the two nodes that make up the broker. Our publishers and consumers connect equally to both nodes in a persistant way. We notice an ADSL-like asymmetry in the rates too; if we manage to publish a high rate of messages the deliver rate drops to high double digits. Testing with an un-mirrored queue has much higher throughput, as expected. Queues and Exchanges are durable, messages are not persistent.
We'd like to know what we can do to improve the situation. The CPU on the box is fine, beam takes a core and a half for 1 process, then another 80% each of two cores for another couple of processes. The rest of the box is essentially idle. We are using ~20GB of RAM in userland with system cache filling the rest. IO rates are fine. Network is fine.
Is there any Erlang/OTP tuning we can do? delegate_count is the default 16, could someone explain what this does in a bit more detail please?
This is difficult to answer without knowing more about how your producers and consumers are configured, which client library you're using and so on. As discussed on irc (http://dev.rabbitmq.com/irclog/index.php?date=2013-05-22) a minute ago, I'd suggest you attempt to reproduce the topology using the MulticastMain java load test tool that ships with the RabbitMQ java client. You can configure multiple producers/consumers, message sizes and so on. I can certainly get 5Khz out of a two-node cluster with HA on my desktop, so this may be a client (or application code) related issue.
I read in forum that while implementing any application using AMQP it is necessary to use fewer queues. So would I be completely wrong to assume that if I were cloning twitter I would have a unique and durable queue for each user signing up? It just seems the most natural approach and if not assign a unique queue for each user how would one design something like that.
What is the most used approach for web messaging. I see RabbitHUb and Rabbit WebHooks but Webhooks doesn't seem to be a scalable solution. i am working with Rails and my AMQP server as running as a Daemon.
In RabbitMQ, queues are quite cheap. They're effectively lightweight Erlang processes, and you can run tens to hundreds of thousands of queues on a single commodity machine (i.e. my laptop). Of course, each will consume a bit of RAM, but unused-recently queues will hibernate, so they'll consume as little memory as possible. In addition, if Rabbit runs low on memory for messages, it will page old messages to disk.
The above only applies to a single machine. RabbitMQ supports a form of lightweight clustering. When you join several Rabbit nodes into a cluster, each can see the queues and exchanges on the other nodes but each runs only its own queues. So, you'll be able to have even more queues! (to the limit of Erlang clusters, which is usually a few hundred nodes) So, a cluster forms a logical broker distributed over several machines; clients connect to it and use it transparently through any of the nodes.
That said, having a single durable queue for each user seems a bit strange: in AMQP, you cannot browse messages while they're on the queue; you may only get/consume messages which takes them off the queue and publish which adds the to the end of the queue. So, you can use AMQP as a message router, but you can't use it as a sort of message database.
Here is a thread that just talks about that: http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2009-February/003041.html
I want to know whether RabbitMQ is more scalable than other brokers or not?
If yes what are the specific reasons? If not how can we scale it up?
I am using rabbitmq for the first time with Spring framework.
Even a single RabbitMQ broker is ridiculously fast. A stock desktop machine can handle tens to hundreds of thousand of messages per second.
If one rabbit turns out to not be enough, RabbitMQ supports a form of light-weight clustering that's designed specifically to improve scalability. Basically, it allows you to create "logical" brokers that are made up of many physical brokers.