In the context on the MQTT protocol, is there a way to make a client not send publish messages when there are no subscribers to that topic?
In other words, is there a standard way to perform subscriber-aware publishing, reducing network traffic from publishing clients to the broker?
This important in applications where we have many sensors capable of producing huge amounts of data, but most of the time nobody will be interested in all of that data but on a small subset, and we want to save battery or avoid network congestion.
In the upcoming MQTT v5 specification the broker can indicate to a client that there are no subscribers for a topic when the client publishes to that topic. This is only possible for QoS 1 or QoS 2 publishes because a QoS 0 message does not result in a reply.
No, the publisher has absolutely no idea how many subscribers to a given topic there are, there could be zero or thousands.
This is a key point to pub/sub messaging, the near total decoupling of the information producer and consumer.
Presumably you can design your devices and applications so that the device as well as publishing data to a 'data topic', it also subscribes to another device-specific 'command topic' which controls the device data publishing. If the application is interested in data from a specific device it must know which device that is to know which data topic to subscribe to, so it can publish the 'please publish data now' command to the corresponding command topic.
I suppose there might be a somewhere-in-between solution where devices publish data less frequently when no apps are interested, and faster when at least one app is asking for data to be published.
Seems to me that one thing about MQTT is that you should ideally design the devices and applications as a system, not in isolation.
Related
I have a system that relies on a message bus and broker to spread messages and tasks from producers to workers.
It benefits both from being able to do true pub/sub-type communications for the messages.
However, it also needs to communicate tasks. These should be done by a worker and reported back to the broker when/if the worker is finished with the task.
Can MQTT be used to publish this task by a producer, so that it is picked up by a single worker?
In my mind the producer would publish the task with a topic "TASK_FOR_USER_A" and there are X amount of workers subscribed to that topic.
The MQTT broker would then determine that it is a task and send it selectively to one of the workers.
Can this be done or is it outside the scope of MQTT brokers such as Mosquitto?
MQTT v5 has an optional extension called Shared Subscriptions which will deliver messages to a group of subscribers in a round robin approach. So each message will only be delivered to one of the group.
Mosquitto v1.6.x has implemented MQTT v5 and the shared subscription capability.
It's not clear what you mean by 1 message at a time. Messages will be delivered as they arrive and the broker will not wait for one subscriber to finish working on a message before delivering the next message to the next subscriber in the group.
If you have low enough control over the client then you can prevent the high QOS responses to prevent the client from acknowledging the message and force the broker to only allow 1 message to be in flight at a time which would effectively throttle message delivery, but you should only do this if message processing is very quick to prevent the broker from deciding delivery has failed and attempting to deliver the message to another client in the shared group.
Normally the broker will not do any routing above and beyond that based on the topic. The as mentioned in a comment on this answer the Flespi has implemented "sticky sessions" so that messages from a specific publisher will be delivered to the same client in the shared subscription pool, but this is a custom add on and not part of the spec.
What you're looking for is a message broker for a producer/consumer scenario. MQTT is a lightweight messaging protocol which is based on pub/sub model. If you start using any MQTT broker for this, you might face issues depending upon your use case. A few issues to list:
You need ordering of the messages (consumer must get the messages in the same order the producer published those). While QoS 2 guarantees message order without having shared subscriptions, having shared subscriptions doesn't provide ordered topic guarantees.
Consumer gets the message but fails before processing it and the MQTT broker has already acknowledged the message delivery. In this case, the consumer needs to specifically handle the reprocessing of failed messages.
If you go with a single topic with multiple subscribers, you must have idempotency in your consumer.
I would suggest to go for a message broker suitable for this purpose, e.g. Kafka, RabbitMQ to name a few.
As far as I know, MQTT is not meant for this purpose. It doesn't have any internal working to distribute the tasks on workers (consumers). On the Otherhand, AMQP can be used here. One hack would be to conditionalize the workers to accept only a particular type of tasks, but that needs producers to send task type as well. In this case, you won't be able to scale as well.
It's better if you explore other protocols for this type of usecase.
In MQTT, if you publish to a topic where there is no subscriber for, the message gets dropped.
While this is fine for classic pub/sub messaging, it is not so great for shared subscriptions (which have been introduced in MQTT 5), since this pattern is typically used for some kind of job queue, and you usually don't want to drop jobs just because there is no worker there right now (maybe it just crashed and is restarting).
Is it possible to tell MQTT servers not to drop messages, at least for shared subscriptions, even if there are no subscribers right now? If so, how?
PS: This is not just a persistent session, since I do not want to keep the subscriptions per client. It's more like a "persistent session" that spans multiple clients.
I don't know if any of the brokers supporting MQTT v5 shared subscriptions support this, but I can foresee a way it could work in a way that is line with the spec and spirit of pub/sub messaging.
A MQTT broker will queue messages for topics subscribed to at QOS 1 or 2 for a client that is currently offline, with a persistent session. So I see no reason why shared subscriptions should be any different. I can see it might be a little bit more technically complicated to implement but should be possible (You would need to treat the shared group as a single session).
That said I think the main focus for shared subscriptions is load balancing, followed by HA. So unless you are running all your shared subscribers on the same machine it should be unlikely that they all fail at the same time.
I implemented a MQTT message broker using mosquitto on my network. I have one web app publishing things to the broker and several servers that subscribed the same topic. So i have a redundancy scenario.
My question is, using mosquitto alone, is there any way to configure it to publish data only on the first subscriber? Otherwise, all of them will do the same thing.
I don't think that is possible.
But you can do this.
Have the first subscriber program respond with an ack on the channel as soon as it gets the message, and have the redundancy program look for the ack for a small time after the initial message.
IF the ack is received the redundancy should not do anything.
So if the first subscriber gets and uses the message, the others wont do anything even if they get the message.
No this is not possible with mosquitto at the moment (without communication between the 2 subscribers as described in the other answer).
For the new release of the MQTT spec (v5)* there is a new mode called "Shared Subscriptions". This allow s multiple clients to subscribe to a single topic and messages will be delivered by round robin to each client. This is more for load balancing rather than master/slave fail over.
*There are some brokers (HiveMQ, IBM MessageSight) that already support some version of Shared Subscriptions at MQTT v3.1.1, but they implement it in slightly different ways (different topic prefixes) so they are not cross compatible.
What is the basic difference between stream processing and traditional message processing? As people say that kafka is good choice for stream processing but essentially kafka is a messaging framework similar to ActivMQ, RabbitMQ etc.
Why do we generally not say that ActiveMQ is good for stream processing as well.
Is it the speed at which messages are consumed by the consumer determines if it is a stream?
In traditional message processing, you apply simple computations on the messages -- in most cases individually per message.
In stream processing, you apply complex operations on multiple input streams and multiple records (ie, messages) at the same time (like aggregations and joins).
Furthermore, traditional messaging systems cannot go "back in time" -- ie, they automatically delete messages after they got delivered to all subscribed consumers. In contrast, Kafka keeps the messages as it uses a pull-based model (ie, consumers pull data out of Kafka) for a configurable amount of time. This allows consumers to "rewind" and consume messages multiple times -- or if you add a new consumer, it can read the complete history. This makes stream processing possible, because it allows for more complex applications. Furthermore, stream processing is not necessarily about real-time processing -- it's about processing infinite input streams (in contrast to batch processing, which is applied to finite inputs).
And Kafka offers Kafka Connect and Streams API -- so it is a stream-processing platform and not just a messaging/pub-sub system (even if it uses this in its core).
If you like splitting hairs:
Messaging is communication between two or more processes or components whereas streaming is the passing of event log as they occur. Messages carry raw data whereas events contain information about the occurrence of and activity such as an order.
So Kafka does both, messaging and streaming. A topic in Kafka can be raw messages or and event log that is normally retained for hours or days. Events can further be aggregated to more complex events.
Although Rabbit supports streaming, it was actually not built for it(see Rabbit´s web site)
Rabbit is a Message broker and Kafka is a event streaming platform.
Kafka can handle a huge number of 'messages' towards Rabbit.
Kafka is a log while Rabbit is a queue which means that if once consumed, Rabbit´s messages are not there anymore in case you need it.
However Rabbit can specify message priorities but Kafka doesn´t.
It depends on your needs.
Message Processing implies operations on and/or using individual messages. Stream Processing encompasses operations on and/or using individual messages as well as operations on collection of messages as they flow into the system. For e.g., let's say transactions are coming in for a payment instrument - stream processing can be used to continuously compute hourly average spend. In this case - a sliding window can be imposed on the stream which picks up messages within the hour and computes average on the amount. Such figures can then be used as inputs to fraud detection systems
Apologies for long answer but I think short answer will not be justice to question.
Consider queue system. like MQ, for:
Exactly once delivery, and to participate into two phase commit transaction
Asynchronous request / reply communication: the semantic of the communication is for one component to ask a second command to do something on its data. This is a command pattern with delay on the response.
Recall messages in queue are kept until consumer(s) got them.
Consider streaming system, like Kafka, as pub/sub and persistence system for:
Publish events as immutable facts of what happened in an application
Get continuous visibility of the data Streams
Keep data once consumed, for future consumers, for replay-ability
Scale horizontally the message consumption
What are Events and Messages
There is a long history of messaging in IT systems. You can easily see an event-driven solution and events in the context of messaging systems and messages. However, there are different characteristics that are worth considering:
Messaging: Messages transport a payload and messages are persisted until consumed. Message consumers are typically directly targeted and related to the producer who cares that the message has been delivered and processed.
Events: Events are persisted as a replayable stream history. Event consumers are not tied to the producer. An event is a record of something that has happened and so can't be changed. (You can't change history.)
Now Messaging versus event streaming
Messaging are to support:
Transient Data: data is only stored until a consumer has processed the message, or it expires.
Request / reply most of the time.
Targeted reliable delivery: targeted to the entity that will process the request or receive the response. Reliable with transaction support.
Time Coupled producers and consumers: consumers can subscribe to queue, but message can be remove after a certain time or when all subscribers got message. The coupling is still loose at the data model level and interface definition level.
Events are to support:
Stream History: consumers are interested in historic events, not just the most recent.
Scalable Consumption: A single event is consumed by many consumers with limited impact as the number of consumers grow.
Immutable Data
Loosely coupled / decoupled producers and consumers: strong time decoupling as consumer may come at anytime. Some coupling at the message definition level, but schema management best practices and schema registry reduce frictions.
Hope this answer help!
Basically Kafka is messaging framework similar to ActiveMQ or RabbitMQ. There are some effort to take Kafka towards streaming:
https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/
Then why Kafka comes into picture when talking about Stream processing?
Stream processing framework differs with input of data.In Batch processing,you have some files stored in file system and you want to continuously process that and store in some database. While in stream processing frameworks like Spark, Storm, etc will get continuous input from some sensor devices, api feed and kafka is used there to feed the streaming engine.
Recently, I have come across a very good document that describe the usage of "stream processing" and "message processing"
https://developer.ibm.com/articles/difference-between-events-and-messages/
Taking the asynchronous processing in context -
Messaging:
Consider it when there is a "request for processing" i.e. client makes a request for server to process.
Event streaming:
Consider it when "accessing enterprise data" i.e. components within the enterprise can emit data that describe their current state. This data does not normally contain a direct instruction for another system to complete an action. Instead, components allow other systems to gain insight into their data and status.
To facilitate this evaluation, consider these key selection criteria to consider when selecting the right technology for your solution:
Event history - Kafka
Fine-grained subscriptions - MQ
Scalable consumption - Kafka
Transactional behavior - MQ
I'm using mosquitto (http://mosquitto.org/) as an MQTT broker and am looking for advice on load balancing subscribers (to the same topic). How is this achieved? Everything I've read about the protocol states that all subscribers to the same topic will be given a published message.
This seems inefficient, and so I'm looking for a way for a published message to be given to one of the connected subscribers in a round-robin approach that would ensure a load balanced state.
If this is not possible with MQTT, how does a subscriber avoid being overwhelmed with messages?
Typically you design MQTT applications in a way that you don't have overwhelmed subscribers. You can achieve this by spreading load to different topics.
If you really can't do that, take a look at the shared subscription approach sophisticated MQTT brokers like MessageSight and HiveMQ have. This is exactly the feature you're looking for but is broker dependant and is not part of the official MQTT spec.
MQTT v5 has the support of shared subscriptions and mosquitto version 1.6 added support of MQTT v5.
Check release notes
Good article on shared subscriptions here
MQTT is a Pub/Sub protocol, the basis is for a 1 to many distribution of messages not a 1 to 1 (of many) you describe. What you describe would be more like a Message Queuing system which is distinctly different from Pub/Sub.
Mosquitto as a pure implementation of this protocol does not support the delivery as you describe it. One solution is to us a local queue with in the subscriber which incoming messages are added to and then consumed by a thread pool.
I do believe that the IBM Message Sight appliance may offer the type of message delivery you are looking for as an extension to the protocol called Shared Subscriptions, but with this enabled it is deviating from the pure MQTT spec.