CometD long polling - Does it scale nicely to high traffic?

If I use CometD long polling:
Suppose there are 1000 messages in a second to be sent to subscribers, does CometD allow them to be auto-batched so that each client doesn't have to re-connect for each single message?
Do "lazy channels" (as described here: http://docs.cometd.org/3/reference/#_java_server_lazy_messages) auto-batch queued messages sent to clients upon timeout?
If on the other hand I don't use lazy channels, and suppose I "batch-publish" messages on channels 1, 2 and 3:
cometd.batch(function() {
    cometd.publish('/channel1', { product: 'foo' });
    cometd.publish('/channel2', { notificationType: 'all' });
    cometd.publish('/channel3', { update: false });
});
(http://docs.cometd.org/3/reference/#_javascript_batch)
does a client subscribed to all three channels receive them in a batch too? Or does the server send them all separately, forcing the client to re-connect after each message (slow)?

CometD gives application developers full control of its batching features, allowing maximum flexibility, performance and scalability.
When using the HTTP long-polling transports, there are two places where batching may happen.
From client to server, batching is done via the CometD API with explicit batching (like your snippet above).
Batching at this level is typically under the application's control, although CometD also performs internal batching to avoid exhausting the connections to the server.
From server to client there are more variations.
For broadcast non-lazy channels there is no automation. What normally happens is that the first message for a client (that is not the publisher) triggers the flush of that client's message queue; while it is being sent, other messages queue up on the server side for that client, and on the next /meta/connect the whole queue is flushed. For 10 messages the scheme could be 1-flush-9-flush: enqueue 1, flush the queue, enqueue the other 9 while waiting for the /meta/connect to come back, then flush those 9.
For broadcast lazy channels there is automation: CometD waits before sending those messages, following the rules of lazy messages. A typical scheme could be: 10-flush.
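For illustration, a minimal sketch of marking a server channel as lazy, assuming the CometD 3 Java server API; the channel name and timeout value are placeholders:

// Configure a lazy channel: CometD coalesces queued messages and
// flushes them together when the lazy timeout expires.
BayeuxServer bayeux = ...; // obtained from the servlet context
bayeux.createChannelIfAbsent("/stock/updates", channel -> {
    channel.setLazy(true);
    channel.setLazyTimeout(500); // flush at most every 500 ms
});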
For service channels, everything is back in control of the application.
The client can send batched messages to the application via a service channel (whose messages are not broadcast automatically by CometD). The application on the server can receive the first message and know that 9 more will follow, so it can wait to respond until the last one has arrived. When the last arrives, it can use the batching API to batch the responses to clients together, something like:
List<ServerSession> subscribers = ...;
for (ServerSession subscriber : subscribers) {
    // Batch the deliveries so they are flushed to the client together,
    // typically in a single /meta/connect response.
    subscriber.batch(() -> {
        subscriber.deliver(sender, "/response", response1);
        subscriber.deliver(sender, "/response", response2);
        subscriber.deliver(sender, "/response", response3);
    });
}
Of course responses may be different from the messages received, both in content and number.
The scheme here can be almost anything the application wants, but it's common to have it as a 10-flush, which is the most efficient.
A note regarding the batching of messages to be sent back to the publisher: this is a special case and it is automated by default. While processing the incoming messages from a publisher, CometD starts an internal batch for that particular publisher, so that any message to be delivered back to the publisher is batched and flushed at the end of the processing of the incoming messages.
The bottom line is that CometD is already well tuned to give maximum performance and scalability in the common cases, yet leaves the application room to customize the behaviour and achieve maximum efficiency using application-specific knowledge of its message patterns.
I encourage you to look at the CometD documentation, tutorials and javadocs.

Related

Avoiding congestion using Solace if producer sending rate higher than subscriber handling rate

I have the following usage pattern in an application:
The publisher sends messages to a topic at a rate of one message every 5 microseconds.
The consumer subscribes to the topic and handles messages at a rate of one message every 10 microseconds (i.e. it takes 10 micros to complete the onReceive callback in the Java API).
The consumer is interested only in the last message published to the topic, so all intermediate unhandled messages can be dropped.
Is it possible to avoid the congestion in the queue of unprocessed messages on the consumer side?
I tried to use eliding with delay=0 (documentation link); however, it doesn't help if the message has already been put into the internal queue in Solace on the consumer side (I refer to com.solacesystems.jcsmp.impl.XMLMessageQueue).
Setting the delay to some specific value works fine, but it doesn't scale well, because this number is dynamic and depends on the number of publishers and on consumer performance.
A possible solution would be to create an LVQ (last value queue) that subscribes to the topic. You create an LVQ by setting the queue quota to 0 MB.
Then have your subscribing application consume messages from the LVQ.
On the appliance you should see nearly the same performance as when sending direct messages, since the messages will never hit the spool.
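A hedged sketch of provisioning such an LVQ and attracting the topic's messages into it, assuming the Solace JCSMP Java API; the queue and topic names are placeholders:

import com.solacesystems.jcsmp.*;

// Provision a last value queue: a 0 MB quota makes the queue keep
// only the most recently published message.
JCSMPSession session = ...; // an already-connected JCSMP session
Queue lvq = JCSMPFactory.onlyInstance().createQueue("lvq/sensor");
EndpointProperties props = new EndpointProperties();
props.setQuota(0); // 0 MB quota turns the queue into an LVQ
props.setPermission(EndpointProperties.PERMISSION_CONSUME);
session.provision(lvq, props, JCSMPSession.FLAG_IGNORE_ALREADY_EXISTS);

// Map the topic to the queue, then consume from the queue.
Topic topic = JCSMPFactory.onlyInstance().createTopic("sensor/last-value");
session.addSubscription(lvq, topic, JCSMPSession.WAIT_FOR_CONFIRM);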

MQTT catch-up missed messages, looking for feedback on design/assumptions

I would like some feedback on this problem and my proposed solution for catching up after missed MQTT messages, please:
[Update 1] Simplified problem diagram and added solution diagram. Added mention of QoS
Scenario:
Client A publishes messages that we wish Client B to receive, even if connections are temporarily dropped then restored.
Config
Client A: connect with clean=false. Publish stateful messages with retain = true, non-stateful messages published with retain = false
Client B: connect with clean=false
What will happen
Each time Client A publishes to topic "foo", the previous message is replaced on the broker. E.g. Client A publishes 111, 222, 333, and Client B connects after the messages are published. Client B will receive only 333; messages 111 and 222 were missed because each message replaced the previous one on that same topic (different topics do not replace each other).
Proposed solution
I envision two types of messages: stateful and non-stateful. Stateful messages would be things like voltage, temperature, GPS location, pressure. Non-stateful messages would be things like chat messages, where history is more likely to be important for context. Missed stateful messages are more likely to be tolerable, while missed non-stateful messages might not be.
All messages are published with QoS 1 in my case.
For the stateful messages I am thinking Client A will publish with retain = true.
For the non-stateful messages, I am thinking Client A will publish with retain = false (because what good is the last message if we don't have the full historical context of the previous ones). When Client B connects/reconnects, it will publish a catch-up (arbitrary name) message containing the ids of all the messages it has received; when Client A receives it, it will respond by publishing the whole history of messages minus those in the id list (ids maintained in Client A's db). This might work for me if the total aggregate message history isn't too big.
The alternative might be for Client B to send read receipts for each message received.
For me, these two solutions will require a database of messages and some custom logic; a minimal sketch of the catch-up request follows.
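A hedged sketch of the catch-up request from Client B, assuming the Paho Java client; the topic name, payload format and broker URL are placeholders invented for illustration:

import org.eclipse.paho.client.mqttv3.*;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;

// On (re)connect, Client B publishes the ids it already has so that
// Client A can replay only the missing history.
MqttClient clientB = new MqttClient("tcp://broker:1883", "clientB", new MemoryPersistence());
MqttConnectOptions opts = new MqttConnectOptions();
opts.setCleanSession(false); // resume the persistent session (clean=false)
clientB.connect(opts);
String catchUp = "{\"receivedIds\":[\"111\",\"222\"]}";
clientB.publish("foo/catchup", catchUp.getBytes(), 1, false); // QoS 1, not retained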
This is a follow-up question to this one which I tried answering but was asked to instead form it as an independent, follow-up question.

Synchronous MQTT communication using Paho client

I have a scenario where a mobile app calls a REST API hosted by my application. Within this process, I need to send a message to a downstream system over MQTT and wait until I get the response to that message, and then reply back to the mobile app.
The challenge is that messaging over MQTT is asynchronous, so the response I receive will arrive on a different thread (some listener class, listening on messageArrived()). How do I get back to the calling HTTP thread?
Does the Paho library support synchronous communication? Something like: send a message, open some topic and wait on it until a message is received or a timeout elapses?
MQTT is by its very nature asynchronous, as are all pub/sub implementations. There is no concept of a reply to a message at the protocol level; you have no way of knowing if you will EVER get a response (or you may get many) to a published message, as you can't even know whether there is a subscriber to the topic you publish on.
It is possible to build a system that works this way, but you need to maintain a state machine of all in-flight requests, implement a sensible timeout policy, and work out what to do if you get more than one response.
You have not mentioned which of the different Paho libraries you are using; I'm guessing Java from the method names. But without knowing which HTTP framework you are using, and a host of other factors, I'm not going to suggest a full solution, especially as it would involve a lot of polling and synchronisation.
Is there any reason why the mobile application can't publish and subscribe to MQTT topics directly? That would remove the need for all of this.
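That said, here is a minimal hedged sketch of blocking one HTTP thread until a single MQTT response arrives or a timeout elapses, assuming the Paho Java client; the topic names and correlation scheme are invented for illustration:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import org.eclipse.paho.client.mqttv3.MqttClient;

// Publish a request and wait (with timeout) for one message on a
// per-request response topic.
MqttClient client = new MqttClient("tcp://broker:1883", "api-server");
client.connect();

CountDownLatch latch = new CountDownLatch(1);
AtomicReference<byte[]> response = new AtomicReference<>();
client.subscribe("response/req-42", (topic, msg) -> { // IMqttMessageListener
    response.set(msg.getPayload());
    latch.countDown();
});

client.publish("request/downstream", "payload".getBytes(), 1, false);
if (!latch.await(5, TimeUnit.SECONDS)) {
    // timeout: no response arrived; fail the HTTP call sensibly
}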

akka stream ActorSubscriber does not work with remote actors

http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-M2/scala/stream-integrations.html says:
"ActorPublisher and ActorSubscriber cannot be used with remote actors, because if signals of the Reactive Streams protocol (e.g. request) are lost the the stream may deadlock."
Does this mean akka stream is not location transparent? How do I use akka stream to design a backpressure-aware client-server system where client and server are on different machines?
I must have misunderstood something. Thanks for any clarification.
They are strictly a local facility at this time.
You can connect it to a TCP sink/source, though, and it will apply back-pressure over TCP as well (that's what Akka Http does).
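For example, a hedged sketch of back-pressure across machines via the Akka Streams TCP support, using the Java DSL; the host, port and data are placeholders:

import java.util.concurrent.CompletionStage;
import akka.actor.ActorSystem;
import akka.stream.ActorMaterializer;
import akka.stream.Materializer;
import akka.stream.javadsl.*;
import akka.util.ByteString;

// The TCP connection stage propagates demand: if the server reads
// slowly, TCP flow control slows this source down.
ActorSystem system = ActorSystem.create("client");
Materializer mat = ActorMaterializer.create(system);
Flow<ByteString, ByteString, CompletionStage<Tcp.OutgoingConnection>> connection =
    Tcp.get(system).outgoingConnection("server.example.com", 8080);
Source.range(1, 1000)
    .map(i -> ByteString.fromString(i + "\n"))
    .via(connection)
    .runWith(Sink.ignore(), mat);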
How do I use akka stream to design a backpressure-aware client-server system where client and server are on different machines?
Check out streams in Artery (Dec. 2016, so 18 months later):
The new remoting implementation for actor messages was released in Akka 2.4.11 two months ago.
Artery is the code name for it. It’s a drop-in replacement to the old remoting in many cases, but the implementation is completely new and it comes with many important improvements.
(Remoting enables Actor systems on different hosts or JVMs to communicate with each other)
Regarding back-pressure, this is not a complete solution, but it can help:
What about back-pressure? Akka Streams is all about back-pressure but actor messaging is fire-and-forget without any back-pressure. How is that handled in this design?
We can’t magically add back-pressure to actor messaging. That must still be handled on the application level using techniques for message flow control, such as acknowledgments, work-pulling, throttling.
When a message is sent to a remote destination it’s added to a queue that the first stage, called SendQueue, is processing. This queue is bounded and if it overflows the messages will be dropped, which is in line with the actor messaging at-most-once delivery nature. Large amount of messages should not be sent without application level flow control. For example, if serialization of messages is slow and can’t keep up with the send rate this queue will overflow.
Aeron will propagate back-pressure from the receiving node to the sending node, i.e. the AeronSink in the outbound stream will not progress if the AeronSource at the other end is slower and the buffers have been filled up.
If messages are sent at a higher rate than what can be consumed by the receiving node the SendQueue will overflow and messages will be dropped. Aeron itself has large buffers to be able to handle bursts of messages.
The same thing will happen in the case of a network partition. When the Aeron buffers are full messages will be dropped by the SendQueue.
In the inbound stream the messages are in the end dispatched to the recipient actor. That is an ordinary actor tell that will enqueue the message in the actor’s mailbox. That is where the back-pressure ends on the receiving side. If the actor is slower than the incoming message rate the mailbox will fill up as usual.
Bottom line, flow control for actor messages must be implemented at the application level. Artery does not change that fact.

Message Broker with synchronous delivery

We are implementing (or rather, reimplementing) a distributed software system. What we have are different processes (possibly running on different computers) that should communicate with each other (let's call these clients). We don't want them to communicate with each other directly, but instead use some kind of message broker.
Since we would like to avoid implementing the message broker ourselves, we want to use an existing implementation. But we haven't found a protocol or system that fully fulfils our requirements.
MQTT with its publish-subscribe mechanism seems nice and could even be used for point-to-point communication (where certain topics are subscribed to only by specific clients).
But it is (like JMS, STOMP, etc.) asynchronous. The sender sends a message to the broker and doesn't know whether it is ever delivered to its recipient. We want the sender to be informed about a successful delivery or an elapsed timeout (when no one receives the message).
Is there some protocol/implementation available that provides such synchronous messaging functionality?
(It would be nice, however, if asynchronous delivery were also possible.)
Messaging is by default (usually) asynchronous.
You can consider RabbitMQ; it provides the following features:
Publisher confirms (asynchronous):
http://www.rabbitmq.com/blog/2011/02/10/introducing-publisher-confirms/
Transaction Commit:
https://www.rabbitmq.com/semantics.html
Message TTL (to handle timeouts):
https://www.rabbitmq.com/ttl.html
With these features you can handle time-out situations and confirm successful delivery; a sketch of publisher confirms follows.
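A minimal hedged sketch of publisher confirms with the RabbitMQ Java client; the queue name and timeout are placeholders:

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

ConnectionFactory factory = new ConnectionFactory();
factory.setHost("localhost");
try (Connection conn = factory.newConnection();
     Channel channel = conn.createChannel()) {
    channel.confirmSelect(); // enable publisher confirms on this channel
    channel.basicPublish("", "work-queue", null, "hello".getBytes());
    // Blocks until the broker acks/nacks, or throws TimeoutException.
    if (!channel.waitForConfirms(5000)) {
        // the broker nacked the message: treat as failed delivery
    }
}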
If this is not enough, you can use RPC:
https://www.rabbitmq.com/tutorials/tutorial-six-java.html
Let me know if you need more information.
