http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-M2/scala/stream-integrations.html says:
"ActorPublisher and ActorSubscriber cannot be used with remote actors, because if signals of the Reactive Streams protocol (e.g. request) are lost the the stream may deadlock."
Does this mean akka stream is not location transparent? How do I use akka stream to design a backpressure-aware client-server system where client and server are on different machines?
I must have misunderstood something. Thanks for any clarification.
They are strictly a local facility at this time.
You can connect it to a TCP sink/source, though, and it will apply back-pressure over TCP as well (that's what Akka Http does).
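As an illustration, here is a minimal sketch using the Akka Streams Java DSL (API roughly as of Akka 2.4/2.5, not the 1.0-M2 milestone linked in the question; host, port and newline framing are assumptions): an echo-style server whose processing rate back-pressures connected clients over TCP, plus the client-side connection Flow you would attach your own source and sink to.

import akka.actor.ActorSystem;
import akka.stream.ActorMaterializer;
import akka.stream.Materializer;
import akka.stream.javadsl.Flow;
import akka.stream.javadsl.Framing;
import akka.stream.javadsl.FramingTruncation;
import akka.stream.javadsl.Tcp;
import akka.util.ByteString;

public class TcpBackpressureSketch {
    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("tcp-demo");
        Materializer mat = ActorMaterializer.create(system);

        // Server: frame incoming bytes by newline and echo each line back.
        // The rate at which this flow consumes data is what back-pressures clients over TCP.
        Tcp.get(system).bind("127.0.0.1", 8888).runForeach(connection ->
            connection.handleWith(
                Flow.of(ByteString.class)
                    .via(Framing.delimiter(ByteString.fromString("\n"), 1024, FramingTruncation.ALLOW))
                    .map(line -> ByteString.fromString("echo: " + line.utf8String() + "\n")),
                mat),
            mat);

        // Client: a Flow<ByteString, ByteString, ?> representing one outgoing connection;
        // attach any source and sink to it and TCP flow control does the rest.
        Flow<ByteString, ByteString, ?> clientFlow =
            Tcp.get(system).outgoingConnection("127.0.0.1", 8888);
    }
}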
How do I use akka stream to design a backpressure-aware client-server system where client and server are on different machines?
Check out streams in Artery (Dec. 2016, so 18 months later):
The new remoting implementation for actor messages was released in Akka 2.4.11 two months ago.
Artery is the code name for it. It’s a drop-in replacement to the old remoting in many cases, but the implementation is completely new and it comes with many important improvements.
(Remoting enables Actor systems on different hosts or JVMs to communicate with each other)
Regarding back-pressure, this is not a complete solution, but it can help:
What about back-pressure? Akka Streams is all about back-pressure but actor messaging is fire-and-forget without any back-pressure. How is that handled in this design?
We can’t magically add back-pressure to actor messaging. That must still be handled on the application level using techniques for message flow control, such as acknowledgments, work-pulling, throttling.
When a message is sent to a remote destination it's added to a queue that the first stage, called SendQueue, is processing. This queue is bounded and if it overflows the messages will be dropped, which is in line with the at-most-once delivery nature of actor messaging. Large amounts of messages should not be sent without application-level flow control. For example, if serialization of messages is slow and can't keep up with the send rate, this queue will overflow.
Aeron will propagate back-pressure from the receiving node to the sending node, i.e. the AeronSink in the outbound stream will not progress if the AeronSource at the other end is slower and the buffers have been filled up.
If messages are sent at a higher rate than what can be consumed by the receiving node the SendQueue will overflow and messages will be dropped. Aeron itself has large buffers to be able to handle bursts of messages.
The same thing will happen in the case of a network partition. When the Aeron buffers are full messages will be dropped by the SendQueue.
In the inbound stream the messages are in the end dispatched to the recipient actor. That is an ordinary actor tell that will enqueue the message in the actor’s mailbox. That is where the back-pressure ends on the receiving side. If the actor is slower than the incoming message rate the mailbox will fill up as usual.
Bottom line, flow control for actor messages must be implemented at the application level. Artery does not change that fact.
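As an illustration of the acknowledgment technique mentioned above, here is a minimal sketch using classic Akka actors (Java API, Akka 2.5+ style). The Msg/Ack types and the one-message-in-flight window are illustrative assumptions, not an Akka-provided facility: the producer only sends the next message to the (possibly remote) consumer after the previous one has been acknowledged.

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import java.util.ArrayDeque;
import java.util.Queue;

public class AckFlowControlSketch {
    public static final class Msg { public final int seq; public Msg(int seq) { this.seq = seq; } }
    public static final class Ack { public final int seq; public Ack(int seq) { this.seq = seq; } }

    // Consumer: does the (possibly slow) work, then explicitly acknowledges the message.
    public static class Consumer extends AbstractActor {
        @Override
        public Receive createReceive() {
            return receiveBuilder()
                .match(Msg.class, msg -> {
                    // ... process the message here ...
                    getSender().tell(new Ack(msg.seq), getSelf());
                })
                .build();
        }
    }

    // Producer: buffers outgoing messages and keeps at most one unacknowledged message in flight.
    public static class Producer extends AbstractActor {
        private final ActorRef consumer;
        private final Queue<Msg> pending = new ArrayDeque<>();
        private boolean inFlight = false;

        public Producer(ActorRef consumer) { this.consumer = consumer; }

        @Override
        public Receive createReceive() {
            return receiveBuilder()
                .match(Msg.class, msg -> { pending.add(msg); maybeSend(); })
                .match(Ack.class, ack -> { inFlight = false; maybeSend(); })
                .build();
        }

        private void maybeSend() {
            if (!inFlight && !pending.isEmpty()) {
                consumer.tell(pending.poll(), getSelf());
                inFlight = true;
            }
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("flow-control");
        ActorRef consumer = system.actorOf(Props.create(Consumer.class), "consumer");
        ActorRef producer = system.actorOf(Props.create(Producer.class, consumer), "producer");
        for (int i = 0; i < 100; i++) producer.tell(new Msg(i), ActorRef.noSender());
    }
}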
I implemented an MQTT message broker using mosquitto on my network. I have one web app publishing things to the broker and several servers subscribed to the same topic, so I have a redundancy scenario.
My question is: using mosquitto alone, is there any way to configure it to publish data only to the first subscriber? Otherwise, all of them will do the same thing.
I don't think that is possible.
But you can do this.
Have the first subscriber program respond with an ack on the channel as soon as it gets the message, and have the redundancy program look for the ack for a short time after the initial message.
If the ack is received, the redundancy program should not do anything.
So if the first subscriber gets and uses the message, the others won't do anything even if they get the message.
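A minimal sketch of that scheme for the standby subscriber, using the Eclipse Paho Java client (MQTT 3.1.1). The topic names, the use of the payload as a job id and the two-second grace period are illustrative assumptions, not mosquitto features; the primary would publish the same payload to jobs/ack as soon as it takes the job.

import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.eclipse.paho.client.mqttv3.IMqttDeliveryToken;
import org.eclipse.paho.client.mqttv3.MqttCallback;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttMessage;

public class StandbySubscriber {
    public static void main(String[] args) throws Exception {
        Set<String> acked = ConcurrentHashMap.newKeySet();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        MqttClient client = new MqttClient("tcp://broker.local:1883", "standby-1");
        client.setCallback(new MqttCallback() {
            @Override
            public void messageArrived(String topic, MqttMessage message) {
                String payload = new String(message.getPayload(), StandardCharsets.UTF_8);
                if (topic.equals("jobs/ack")) {
                    // The primary claimed this job; remember it so we skip it.
                    acked.add(payload);
                } else if (topic.equals("jobs/incoming")) {
                    // Give the primary a grace period; only act if no ack showed up.
                    scheduler.schedule(() -> {
                        if (!acked.contains(payload)) {
                            System.out.println("Primary silent, standby handling job " + payload);
                        }
                    }, 2, TimeUnit.SECONDS);
                }
            }

            @Override
            public void connectionLost(Throwable cause) { }

            @Override
            public void deliveryComplete(IMqttDeliveryToken token) { }
        });

        client.connect();
        client.subscribe(new String[] { "jobs/incoming", "jobs/ack" });
    }
}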
No this is not possible with mosquitto at the moment (without communication between the 2 subscribers as described in the other answer).
For the new release of the MQTT spec (v5)* there is a new mode called "Shared Subscriptions". This allows multiple clients to subscribe to a single topic, and messages will be delivered by round robin to each client. This is more for load balancing than for master/slave failover.
*There are some brokers (HiveMQ, IBM MessageSight) that already support some version of Shared Subscriptions on MQTT v3.1.1, but they implement it in slightly different ways (different topic prefixes), so they are not cross-compatible.
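For completeness, subscribing to a shared subscription is just a matter of the topic filter. A minimal sketch, assuming a broker that supports the MQTT v5 $share/<group>/<topic> form (as noted above, the exact prefix differs per broker on v3.1.1); the group and topic names are illustrative:

import org.eclipse.paho.client.mqttv3.MqttClient;

public class SharedSubscriber {
    public static void main(String[] args) throws Exception {
        MqttClient client = new MqttClient("tcp://broker.local:1883", "worker-1");
        client.connect();
        // All clients subscribing with the same "$share/workers/..." filter form one group;
        // the broker delivers each publication on jobs/incoming to only one member of the group.
        client.subscribe("$share/workers/jobs/incoming");
    }
}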
What is the basic difference between stream processing and traditional message processing? People say that Kafka is a good choice for stream processing, but essentially Kafka is a messaging framework similar to ActiveMQ, RabbitMQ, etc.
Why do we generally not say that ActiveMQ is good for stream processing as well?
Is it the speed at which messages are consumed by the consumer that determines whether it is a stream?
In traditional message processing, you apply simple computations on the messages -- in most cases individually per message.
In stream processing, you apply complex operations on multiple input streams and multiple records (i.e., messages) at the same time (like aggregations and joins).
Furthermore, traditional messaging systems cannot go "back in time" -- i.e., they automatically delete messages after they have been delivered to all subscribed consumers. In contrast, Kafka keeps messages for a configurable amount of time, as it uses a pull-based model (i.e., consumers pull data out of Kafka). This allows consumers to "rewind" and consume messages multiple times -- or, if you add a new consumer, it can read the complete history. This makes stream processing possible, because it allows for more complex applications. Furthermore, stream processing is not necessarily about real-time processing -- it's about processing infinite input streams (in contrast to batch processing, which is applied to finite inputs).
And Kafka offers Kafka Connect and Streams API -- so it is a stream-processing platform and not just a messaging/pub-sub system (even if it uses this in its core).
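As a small illustration of that Streams API, here is a sketch of a topology that reads a topic as a stream, counts records per key, and writes the continuously updated counts to another topic (topic names, application id and broker address are assumptions):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class CountByKeyApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "count-by-key");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("orders");

        // Continuously maintained count per key, emitted as a changelog stream.
        input.groupByKey()
             .count()
             .toStream()
             .to("order-counts-by-key", Produced.with(Serdes.String(), Serdes.Long()));

        new KafkaStreams(builder.build(), props).start();
    }
}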
If you like splitting hairs:
Messaging is communication between two or more processes or components, whereas streaming is the passing of events as they occur. Messages carry raw data, whereas events contain information about the occurrence of an activity, such as an order.
So Kafka does both, messaging and streaming. A topic in Kafka can be raw messages or an event log that is normally retained for hours or days. Events can further be aggregated into more complex events.
Although Rabbit supports streaming, it was actually not built for it (see Rabbit's web site).
Rabbit is a message broker and Kafka is an event streaming platform.
Kafka can handle a huge number of 'messages' compared to Rabbit.
Kafka is a log while Rabbit is a queue, which means that once consumed, Rabbit's messages are no longer there in case you need them again.
However, Rabbit can specify message priorities, but Kafka doesn't.
It depends on your needs.
Message Processing implies operations on and/or using individual messages. Stream Processing encompasses operations on and/or using individual messages as well as operations on collections of messages as they flow into the system. For example, let's say transactions are coming in for a payment instrument - stream processing can be used to continuously compute the hourly average spend. In this case, a sliding window can be imposed on the stream which picks up messages within the hour and computes the average of the amounts. Such figures can then be used as inputs to fraud detection systems.
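Here is a sketch of that sliding-window idea, expressed with Kafka Streams as one possible engine (the description above is framework-agnostic). Transactions keyed by payment instrument are summed over one-hour windows that advance every five minutes; topic names, serdes and window parameters are assumptions, and a full average would additionally track a per-window count (which needs a custom serde, omitted here for brevity).

import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.TimeWindows;

public class HourlySpendPerInstrument {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "hourly-spend");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Double().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Key: payment instrument id, value: transaction amount.
        KStream<String, Double> transactions = builder.stream("transactions");

        transactions
            .groupByKey()
            .windowedBy(TimeWindows.of(Duration.ofHours(1)).advanceBy(Duration.ofMinutes(5)))
            .reduce(Double::sum)   // hourly total spend per instrument within each window
            .toStream()
            .foreach((window, total) ->
                System.out.printf("%s hourly spend: %.2f%n", window.key(), total));

        new KafkaStreams(builder.build(), props).start();
    }
}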
Apologies for the long answer, but I think a short answer would not do justice to the question.
Consider a queue system, like MQ, for:
Exactly-once delivery, and participation in two-phase commit transactions
Asynchronous request/reply communication: the semantics of the communication are for one component to ask a second component to do something with its data. This is a command pattern with a delay on the response.
Recall that messages in a queue are kept until the consumer(s) get them.
Consider a streaming system, like Kafka, as a pub/sub and persistence system for:
Publishing events as immutable facts of what happened in an application
Getting continuous visibility of the data streams
Keeping data once consumed, for future consumers, for replayability (see the consumer sketch below)
Scaling message consumption horizontally
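As a small illustration of the replay point above, here is a plain Kafka consumer that joins with a fresh group id and starts from the earliest retained offset, re-reading history that earlier consumers have already processed; topic, group id and broker address are assumptions:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReplayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "new-consumer-" + System.currentTimeMillis());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // start from the beginning
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}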
What are Events and Messages
There is a long history of messaging in IT systems, and it is easy to look at an event-driven solution and events through the lens of messaging systems and messages. However, there are different characteristics that are worth considering:
Messaging: Messages transport a payload and messages are persisted until consumed. Message consumers are typically directly targeted and related to the producer who cares that the message has been delivered and processed.
Events: Events are persisted as a replayable stream history. Event consumers are not tied to the producer. An event is a record of something that has happened and so can't be changed. (You can't change history.)
Now Messaging versus event streaming
Messaging is meant to support:
Transient Data: data is only stored until a consumer has processed the message, or it expires.
Request / reply most of the time.
Targeted reliable delivery: targeted to the entity that will process the request or receive the response. Reliable with transaction support.
Time-coupled producers and consumers: consumers can subscribe to a queue, but messages may be removed after a certain time or once all subscribers have received them. The coupling is still loose at the data model level and the interface definition level.
Events are meant to support:
Stream History: consumers are interested in historic events, not just the most recent.
Scalable Consumption: A single event is consumed by many consumers with limited impact as the number of consumers grows.
Immutable Data
Loosely coupled / decoupled producers and consumers: strong time decoupling, as consumers may come at any time. There is some coupling at the message definition level, but schema management best practices and a schema registry reduce friction.
Hope this answer helps!
Basically, Kafka is a messaging framework similar to ActiveMQ or RabbitMQ. There have been some efforts to take Kafka towards streaming:
https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/
Then why Kafka comes into picture when talking about Stream processing?
Stream processing frameworks differ in how their input arrives. In batch processing, you have files stored in a file system that you process and store the results in some database. In stream processing, frameworks like Spark, Storm, etc. get continuous input from sources such as sensor devices and API feeds, and Kafka is used there to feed the streaming engine.
Recently, I came across a very good document that describes the usage of "stream processing" and "message processing":
https://developer.ibm.com/articles/difference-between-events-and-messages/
Taking the asynchronous processing in context -
Messaging:
Consider it when there is a "request for processing", i.e. a client makes a request for the server to process.
Event streaming:
Consider it when "accessing enterprise data" i.e. components within the enterprise can emit data that describe their current state. This data does not normally contain a direct instruction for another system to complete an action. Instead, components allow other systems to gain insight into their data and status.
To facilitate this evaluation, here are some key selection criteria to consider when selecting the right technology for your solution:
Event history - Kafka
Fine-grained subscriptions - MQ
Scalable consumption - Kafka
Transactional behavior - MQ
We are implementing (or rather reimplementing) a distributed software system. What we have are different processes (possibly running on different computers) that should communicate with each other (let's call these clients). We don't want them to communicate with each other directly, but instead to use some kind of message broker.
Since we would like to avoid implementing the message broker ourselves, we would like to use an existing implementation. But we can't find a protocol or system that fully fulfills our requirements.
MQTT with its publish-subscribe-mechanism seems nice and could even be used for point-to-point communication (where some specific topics are only subscribed by certain clients).
But it is (like JMS, STOMP, etc.) asynchronous. The sender sends a message to the broker and doesn't know whether it is ever delivered to its recipient. We want the sender to be informed about a successful delivery or an elapsed timeout (when no one is receiving the message).
Is there some protocol/implementation available that provides such synchronous messaging functionality?
(It would be nice, however, if asynchronous delivery were possible too.)
Messaging is by default (usually) asynchronous.
You can consider RabbitMQ; it provides the following features:
Publisher confirms (in an asynchronous way):
http://www.rabbitmq.com/blog/2011/02/10/introducing-publisher-confirms/
Transaction Commit:
https://www.rabbitmq.com/semantics.html
Message TTL (to handle timeouts):
https://www.rabbitmq.com/ttl.html
With these features you can handle timeout situations and confirm successful delivery.
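A minimal sketch of publisher confirms with a timeout, using the RabbitMQ Java client (5.x API); queue name, message and timeout are assumptions. waitForConfirmsOrDie blocks until the broker confirms the publish or the timeout elapses, which gives the sender the delivered-or-timed-out signal asked about (it confirms delivery to the broker, not processing by a consumer; for the latter, see the RPC approach mentioned next).

import java.nio.charset.StandardCharsets;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class ConfirmedPublisher {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {
            channel.confirmSelect();                      // enable publisher confirms on this channel
            channel.queueDeclare("tasks", true, false, false, null);
            channel.basicPublish("", "tasks", null, "hello".getBytes(StandardCharsets.UTF_8));
            channel.waitForConfirmsOrDie(5_000);          // throws if not confirmed within 5 seconds
            System.out.println("Broker confirmed the publish");
        }
    }
}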
If this is not enough, you can use the RPC pattern:
https://www.rabbitmq.com/tutorials/tutorial-six-java.html
Let me know if you need more information.
If I use CometD long polling:
Suppose there are 1000 messages in a second to be sent to subscribers, does CometD allow them to be auto-batched so that each client doesn't have to re-connect for each single message?
Do "lazy channels" (as described here: http://docs.cometd.org/3/reference/#_java_server_lazy_messages) auto-batch queued messages sent to clients upon timeout?
If on the other hand I don't use lazy channels, and suppose I "batch-publish" messages on channels 1, 2 and 3:
cometd.batch(function()
{
    cometd.publish('/channel1', { product: 'foo' });
    cometd.publish('/channel2', { notificationType: 'all' });
    cometd.publish('/channel3', { update: false });
});
(http://docs.cometd.org/3/reference/#_javascript_batch)
does a client subscribed to all 3 channels receive them in a batch too? Or does it send them all separately, forcing the client to re-connect after each message (slow)?
CometD offers application developers full control of batching features, allowing maximum flexibility, performance and scalability.
When using the HTTP long-polling transports, there are 2 places where batching may happen.
From client to server is solved using the CometD API and explicit batching (like your snippet above).
Batching at this level is typically in control of the application, although CometD does an internal batching to avoid exhausting the connections to the server.
From server to client there are more variations.
For broadcast non-lazy channels there is no automation, and what normally happens is that the first message to a client (that is not the publisher) will trigger the flush of the message queue; while this is being sent, other messages will queue up on the server side for that client and on the next /meta/connect the whole queue will be flushed. For 10 messages the scheme could be something like: 1-flush-9-flush (enqueue 1, flush the queue, enqueue the other 9 while waiting for the /meta/connect to come back, flush the other 9).
For broadcast lazy channels there is automation, so CometD will wait before sending those messages following the rules of lazy messages. A typical scheme could be: 10-flush.
For service channels, everything is back in control of the application.
The client can send batched messages to the application via a service channel (whose messages are not broadcast automatically by CometD). The application on server can receive the first message and know that other 9 will come, so it can wait to send them until the last has arrived. When the last arrives, it can just use the batching API to batch together the responses to clients, something like:
List<ServerSession> subscribers = ...;
for (ServerSession subscriber : subscribers) {
    subscriber.batch(() -> {
        subscriber.deliver(sender, "/response", response1);
        subscriber.deliver(sender, "/response", response2);
        subscriber.deliver(sender, "/response", response3);
    });
}
Of course responses may be different from the messages received, both in content and number.
The scheme here can be almost anything the application wants, but it's common to have it as a 10-flush, which is the most efficient.
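Going back to the lazy-channel case above: marking a broadcast channel as lazy is a one-time server-side configuration. A minimal sketch using the CometD 3 server API (channel name and timeout are assumptions; the BayeuxServer instance comes from your CometD setup):

import org.cometd.bayeux.server.BayeuxServer;
import org.cometd.bayeux.server.ServerChannel;

public class LazyChannelSetup {
    public static void configure(BayeuxServer bayeuxServer) {
        ServerChannel channel = bayeuxServer.createChannelIfAbsent("/stock/updates").getReference();
        channel.setLazy(true);          // messages on this channel may be delayed and batched
        channel.setLazyTimeout(2000);   // flush queued lazy messages after at most 2 seconds
    }
}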
A note regarding the batching of messages to be sent back to the publisher. This is a special case and it's by default automated: while processing the incoming messages from that publisher, CometD starts an internal batch for that particular publisher, so that any message that is to be delivered back to the publisher is batched and will be flushed at the end of the processing of the incoming messages.
The bottom line is that CometD is already pretty well tuned to give the maximum of performance and scalability in common cases, but yet leaves the application room for customizing the behaviour to achieve maximum efficiency using application specific knowledge of message patterns.
I encourage you to look at the CometD documentation, tutorials and javadocs.
I am using TIdCmdTCPClient and TIdCmdTCPServer. Suddenly I find that I might like to have bi-directional communication.
What would be best? Should I possibly use some other components? If so, which? Or should I kludge it and have the 'client' poll the 'server' to ask if it wishes to communicate anything?
This is a very small system. Two clients and ten servers, with a burst of one transaction every 30 to 60 seconds for a few minutes once a day, so the overhead of polling is inconsequential.
I just wonder if there is a 'correct' way.
Update: this really is an incredibly simple system. Very little traffic and all of it simple. All transmissions are an indication of an event type and an optional single parameter.
<event type> [ <parameter>] e.g. "HERE_IS_SOME_DATA 42"
This can be sent in both directions; however, there is no "reply" as such. Just fire off a message (and hope that it got there)? Receive an ack with no data? Does the absence of an exception indicate that the message was successfully sent?
Would it be possible (or would it be overkill) to use two TIdCmdTCPServers?
Both TIdCmdTCPClient and TIdCmdTCPServer continuously poll their socket endpoints for inbound data during the lifetime of the connection. You do not have to do anything extra for that. So, as soon as a TIdCmdTCPClient connects to the TIdCmdTCPServer, both components will initially be in a reading state until one of them sends a command to the other.
Now, there is a problem with doing that - as soon as either component sends that first command, the receiving component will interpret it as a command and send back a reply, which the other component will interpret as a command and send back a reply, which will be interpreted as a command and send back a reply, and so on, causing an endless cycle of replies back and forth. For that reason, it is not wise to use TIdCmdTCPClient and TIdCmdTCPServer together. You should either use TIdTCPClient with TIdCmdTCPServer, or use TIdCmdTCPClient with TIdTCPServer. Depending on what exactly your protocol looks like, you may have to forgo using TIdCmdTCPClient and TIdCmdTCPServer altogether and just use TIdTCPClient with TIdTCPServer so you have more control over reading and writing on both ends. It is hard to answer with actual code without first knowing what the communication protocol should look like.
A single TCP socket connection can be used in two directions. The server can send data asynchronously to the client at any time. It is up to the client however to read the socket, for asynchronous processing this is done in a listener thread which reads from the socket and synchronizes incoming data operations with the main worker thread.
An example use case in the Indy components is the Telnet client component (TIdTelnet) which has a receive thread listening for server messages.
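Indy is a Delphi library, but the listener-thread pattern described above is language-neutral. As a rough illustration only (host, port and handler names are assumptions; the line format follows the "<event type> [<parameter>]" protocol above), here is a minimal Java sketch: a dedicated thread blocks on the socket reading server-initiated lines and hands them to the main worker via a queue, so either side can send at any time.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BidirectionalClient {
    public static void main(String[] args) throws Exception {
        Socket socket = new Socket("server.local", 9000);
        PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
        BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
        BlockingQueue<String> incoming = new LinkedBlockingQueue<>();

        // Listener thread: blocks on the socket and queues every line the server pushes.
        Thread listener = new Thread(() -> {
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    incoming.put(line);
                }
            } catch (Exception e) {
                // connection closed or interrupted; let the thread end
            }
        });
        listener.setDaemon(true);
        listener.start();

        // Main worker: free to send events at any time and to drain incoming events.
        out.println("HERE_IS_SOME_DATA 42");
        String event = incoming.take();   // blocks until the server sends something
        System.out.println("received: " + event);
    }
}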
But you also asked about the 'correct' way - and then the answer depends on other factors such as network stability, guaranteed delivery and how to handle temporary server outages. In enterprise environments, one central messaging hub is preferred in many use cases, so that all parties connect only to this central server which is only responsible for reliable message delivery, and keeps messages until the recipient is available.
You can download the INDY 10 TCP server demo sample code here.