Memory usage on MQTT subscription - mqtt

How much of memory used whenever we use from # (wildcard) to subscription into many topics? for example if we have over 10M topics, it's possible to use # to subscribe into all of them, or it caused to memory leaks?

This problem is strictly related to the MQTT broker and client implementation.
Of course, the MQTT standard specification doesn't provide any information on the features related to such implementation.
Paolo.

Extending on ppatierno's answer.
For most well designed brokers the number or scope (for wild card) subscriptions shouldn't really change the amount of memory used under normal circumstances . At most the storage should equate to the topic string that the client subscribes to, this will be matched against a incoming message to see if it should be delivered.
Where this may not hold true is with persistent subscriptions (where the clean session value is not set to true). In this case if a client disconnects then messages may be queued until it reconnects. The amount of memory consumed here will be a function of the number of messages and their size (plus what discard policy the broker may have) and not directly a function of the number of subscribed topics.
To answer the second part of your question, subscribing to 10,000,000 topics using the wildcard is not likely to cause a memory leak, but it may very well flood the client depending on how often messages are published on those topics.

Related

what is the better practice when designing MQTT topics for thousands of devices of multiple customers?

I am developing a MQTT service and I am confused when I design the topics. I think there are two ways to do this:
Topic data/customer-a
Payload {"sensor-id":"23aafff2d23659cc97c557909de12f16","type":"demo","value":"hello world"}
Topic data/customer-a/23aafff2d23659cc97c557909de12f16
Payload {"type":"demo","value":"hello world"}
I wonder which is better. I think the first one may result in complexity when I need to subscribe message from a single sensor. Meanwhile, the topic data/customer-a is always under heavy load because of thousands of messages per second. And the second one has drawbacks too, obviously it makes the mqtt broker maintain a huge topic tree, I don't know whether it would cause significant performance degradation?
Which is better for thousands of devices of multiple customers?
There is no real overhead to a broker "maintaining a massive topic tree" because they don't.
The only thing the broker tracks is the list of subscription patterns that each client has active, it then checks that list every time a message is published.

Any implications of using UUIDs in MQTT topic names?

I am doing a request/response flow using a MQTT broker and I wondered if brokers like VerneMQ or Mosquitto deal well with huge amount of topics. Basically every time I want to do a request/response, I publish to a topic that looks like rpc/{UUID} meaning every request creates a new topic and then unsubscribe from it when the response is received. Will this come and bite me later ?
Topics are effectively ephemeral already.
Usually the only overhead to a topic is in the list of subscribed topic patterns (because they can be wildcards) held for each client. The topic is read from an incoming messages and checked against this list.
Using UUIDs in topics should not cause any problems.

mqtt timestamp in the topic name: anti-pattern?

Would the mqtt community consider placing message information in the topic name an anti-pattern?
I have a client that has a vast library written around rabbitmq, and I'm trying to tweak their client and server code to allow them to configure their services for mosquitto instead. One central requirement for them is TTL, the clients can sometimes sit for hours publishing data before the server comes back online and they do not want messages to show up that are beyond their TTL.
Their message envelope system is an elaborate json and 1) it would be painful to wrap or alter this json 2) I do not want to incur the expense of unmarshalling json to retrieve a timestamp.
The easiest thing to do is place the timestamp at the end of the topic and consume with wildcards: mytopic/mysubtopic/{timestamp} consumed by mytopic/mysubtopic/#
Are there any unintended consequences for this, and would this be considered an anti-pattern?
Whether this is an anti-pattern is a matter of opinion; the spec defines the topic as "The label attached to an Application Message..." so does not preclude your usage. I can think of a few potential "unintended consequences" to your approach (which may, or may not, apply to your specific situation):
Retain flag: As per your comment you will not be able to set the Retain flag to 1 (because all messages would be retained).
Latest Message only when comms re-established: A subscriber may only want the latest message when communications are re-established. This can be achieved by publishing messages with the retain flag set to 1 which results in your subscriber receiving the latest message (and only the latest message; subject to QOS/CleanSession) on each topic it subscribes to (docs). As per the above this will not work with your topic structure.
Order of delivery: the spec requires that "A Server MUST by default treat each Topic as an 'Ordered Topic'" but there is no such guarantee across topics. Note that ordered delivery is dependent upon settings (see the "Non normative comment" in the spec) so this may not be an issue.
Topic Alias: MQTT V5 introduces Topic Alias which can be used to reduce the amount of data transmitted. This will not provide a benefit with your structure.

How to tell MQTT to keep messages even if there is no subscriber right now?

In MQTT, if you publish to a topic where there is no subscriber for, the message gets dropped.
While this is fine for classic pub/sub messaging, it is not so great for shared subscriptions (which have been introduced in MQTT 5), since this pattern is typically used for some kind of job queue, and you usually don't want to drop jobs just because there is no worker there right now (maybe it just crashed and is restarting).
Is it possible to tell MQTT servers not to drop messages, at least for shared subscriptions, even if there are no subscribers right now? If so, how?
PS: This is not just a persistent session, since I do not want to keep the subscriptions per client. It's more like a "persistent session" that spans multiple clients.
I don't know if any of the brokers supporting MQTT v5 shared subscriptions support this, but I can foresee a way it could work in a way that is line with the spec and spirit of pub/sub messaging.
A MQTT broker will queue messages for topics subscribed to at QOS 1 or 2 for a client that is currently offline, with a persistent session. So I see no reason why shared subscriptions should be any different. I can see it might be a little bit more technically complicated to implement but should be possible (You would need to treat the shared group as a single session).
That said I think the main focus for shared subscriptions is load balancing, followed by HA. So unless you are running all your shared subscribers on the same machine it should be unlikely that they all fail at the same time.

Reduce MQTT traffic when there are no subscribers

In the context on the MQTT protocol, is there a way to make a client not send publish messages when there are no subscribers to that topic?
In other words, is there a standard way to perform subscriber-aware publishing, reducing network traffic from publishing clients to the broker?
This important in applications where we have many sensors capable of producing huge amounts of data, but most of the time nobody will be interested in all of that data but on a small subset, and we want to save battery or avoid network congestion.
In the upcoming MQTT v5 specification the broker can indicate to a client that there are no subscribers for a topic when the client publishes to that topic. This is only possible for QoS 1 or QoS 2 publishes because a QoS 0 message does not result in a reply.
No, the publisher has absolutely no idea how many subscribers to a given topic there are, there could be zero or thousands.
This is a key point to pub/sub messaging, the near total decoupling of the information producer and consumer.
Presumably you can design your devices and applications so that the device as well as publishing data to a 'data topic', it also subscribes to another device-specific 'command topic' which controls the device data publishing. If the application is interested in data from a specific device it must know which device that is to know which data topic to subscribe to, so it can publish the 'please publish data now' command to the corresponding command topic.
I suppose there might be a somewhere-in-between solution where devices publish data less frequently when no apps are interested, and faster when at least one app is asking for data to be published.
Seems to me that one thing about MQTT is that you should ideally design the devices and applications as a system, not in isolation.

Resources