How to send image data via different microservices with Redis

I wanted to ask what options make sense with Redis, as I am unsure about Redis Pub/Sub in particular. Suppose I have a service A (Java client) that processes images. Unfortunately it can't process all kinds of images (because the language/framework doesn't support it yet). This is where service B comes into play (Node.js).
Service A streams the image bytes to Redis. Service B should read these bytes from Redis and encode them into the correct format, then stream the result back to Redis, and Service A is somehow notified to read the result from Redis.
There are two strategies I am considering for this:
1. Using the Pub/Sub feature of Redis. Service A streams the chunks to Redis (e.g. via a write stream) and then, as publisher, publishes certain metadata to Service B (and its replicas) as subscribers. Service B then reads the stream (locking it for the other replicas), processes it, and streams the result back to Redis. It then sends a message, as publisher, to Service A that the result can be fetched from Redis.
2. I put everything (metadata and bytes) directly into the pub/sub message and then proceed as in 1). But how do I then lock the message against the other replicas of B? I want to avoid them all processing the same image.
So my question is:
Does the pub/sub feature of Redis allow strategy no. 2 in terms of performance, or is it intended exclusively for "lightweight" messages such as log data, metadata, and IDs?
And if Redis in general is not a good fit for this approach, what would be? Async REST endpoints?
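For reference, here is roughly how I picture strategy 1 with redis-py (all key and channel names, the TTLs, and the encode() step are placeholders I made up; the SET NX call is the "lock" between replicas):

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # Service A side: park the bytes under a key, publish only a lightweight ID.
    def submit_image(image_id, data):
        r.set(f"img:in:{image_id}", data, ex=300)   # bytes stay out of pub/sub
        r.publish("img:requests", image_id)         # the message is just the ID

    # Service B side: whichever replica wins the SET NX "lock" does the work.
    def handle_request(image_id):
        if not r.set(f"img:lock:{image_id}", "1", nx=True, ex=60):
            return                                  # another replica claimed it
        data = r.get(f"img:in:{image_id}")
        result = encode(data)                       # encode() is hypothetical
        r.set(f"img:out:{image_id}", result, ex=300)
        r.publish("img:results", image_id)          # tell Service A to fetch it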

Related

Share data between two Docker containers sharing the same network

I have a requirement to build two applications (in Golang). The first application just receives data via UART and sends it to the second application for processing; the second application should receive the data and process it.
I have already completed receiving data via UART in the first application; now I'm looking for a better way to get data from the first module to the second. They both run as Docker containers and share the same Docker network.
I was thinking of creating a REST API in the second application so the first application can simply send data with an HTTP call, but is there a better way to do this? Any other option that can take advantage of the Docker network?
In general, yes, sockets are what you need: plain TCP/UDP, an HTTP server (RESTful API or not), gRPC, etc.
Or you start another container with a message queue (NATS, Kafka, RabbitMQ, etc.) and write pub/sub logic. Or you can use a database.
Or you can mount a shared Docker volume in both containers and communicate via files.
None of these is unique to Golang; they will work with any language.
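A minimal sketch of the HTTP option (in Python for brevity; the same pattern ports directly to Go). The service name "processor" is an assumption, resolvable via Docker's embedded DNS on a shared user-defined network:

    # Receiver (second application): a tiny HTTP endpoint for the UART data.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            frame = self.rfile.read(length)   # raw bytes forwarded from UART
            process(frame)                    # process() is hypothetical
            self.send_response(204)
            self.end_headers()

    HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()

    # Sender (first application, a separate container): "processor" is the
    # receiver's service name, resolved by Docker's embedded DNS.
    import urllib.request

    def forward(frame):
        req = urllib.request.Request("http://processor:8080/data",
                                     data=frame, method="POST")
        urllib.request.urlopen(req, timeout=5)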

Using a load balancer to dispatch messages from Redis pub sub

I have several Python applications that all connect to a Redis server and consume messages using the pub/sub mechanism. I have containerized the applications with Docker and I would like to scale each application by replicating the number of container instances. The challenge is that I don't want each container to act as an independent subscriber to Redis; I would essentially like to load balance the network traffic so that, when a message is published, only one container receives it per service.
Let’s take the simple example of two services, Service A and Service B. Both services need to be subscribed to the same topic so that each is notified upon a message published to that topic. Each service will process the message differently; in other words the same message will trigger two different outcomes, one executed by Service A and one by Service B. Now, I am trying to imagine an architecture in which these services consist of replicated containers, let’s call them workers. Say Service A consists of two workers A1 and A2, and Service B consists of three workers B1, B2, and B3 (maybe it requires more processing power per message than Service A, so it requires more workers for the same message load). So my use case requires that both Service A and Service B need to subscribe to the same topic so that they both receive updates as they come in, but I only want one worker to handle the message per service. Imagine that a message comes in and worker A1 handles it for Service A while B3 handles it for Service B.
Overall this feels like it should be pretty straightforward: I essentially have multiple applications, each of which needs to scale horizontally and should handle network traffic as if it were sitting behind a load balancer.
I am intending to deploy these applications with something like Amazon ECS, where each application is essentially a service with task replication and all services connect to a centralized Redis cache acting as a message broker. In a situation like this, from the limited research I’ve done, it would be nice to just put a network load balancer up in front of each service so that published messages would be directed to what looks like a single subscriber, but behind the scenes is a collection of workers acting like they’re pulling off a task queue.
I haven’t had much luck finding examples of this kind of architecture, or for that matter any examples of tasks that use something like Redis in the way I’m imagining. This is an architecture I’ve more or less dreamed up, so I could just be thinking about this all wrong, but at the same time it doesn’t seem like a crazy use case to me. I’m looking for any advice about how this could be accomplished and/or if what I’m talking about just sounds insane and there’s a better way.
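One sketch of the setup described above uses Redis Streams consumer groups instead of plain pub/sub: with XREADGROUP, every group receives a copy of each entry, but within a group each entry is delivered to exactly one consumer, which matches the per-service load balancing described. Stream, group, and worker names below are invented for illustration:

    import redis

    r = redis.Redis()
    STREAM = "events"   # assumed stream name

    # One consumer group per service: every group sees every entry, but
    # within a group each entry is delivered to exactly one worker.
    for group in ("service-a", "service-b"):
        try:
            r.xgroup_create(STREAM, group, id="$", mkstream=True)
        except redis.ResponseError:
            pass   # group already exists

    def worker_loop(group, worker):
        # Run in each container, e.g. worker_loop("service-a", "A1").
        while True:
            entries = r.xreadgroup(group, worker, {STREAM: ">"},
                                   count=1, block=5000)
            for _stream, messages in entries:
                for msg_id, fields in messages:
                    handle(fields)                  # handle() is hypothetical
                    r.xack(STREAM, group, msg_id)   # mark done for this group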

Django Channels: can you check the number of sockets inside a room/channel_layer

Same as the question itself: how can you check the number of live sockets inside a room or a channel_layer if you are using Django Channels?
You can't do this directly from the generic channel layer API. If you're using Redis, you could look into the Redis API and check how many subscriptions are open.
This could be done using this API:
https://redis.io/commands/client-list
(This might be quite slow and costly if you have lots of open connections to your Redis cluster.)
You will need to convert the group name into the group key in the same way the Redis layer does; see here:
https://github.com/django/channels_redis/blob/master/channels_redis/core.py#L582
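A rough sketch of that lookup, assuming channels_redis's default key layout (a sorted set per group under "<prefix>:group:<name>", with the prefix defaulting to "asgi"; verify against the core.py linked above for your version, and note that channels_redis may shard keys across hosts):

    import redis

    r = redis.Redis()

    def group_size(group_name, prefix="asgi"):
        # channels_redis stores each group as a sorted set; one member per
        # channel currently in the group, roughly one per live socket.
        return r.zcard(f"{prefix}:group:{group_name}")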

Difference between stream processing and message processing

What is the basic difference between stream processing and traditional message processing? People say that Kafka is a good choice for stream processing, but essentially Kafka is a messaging framework similar to ActiveMQ, RabbitMQ, etc.
Why do we generally not say that ActiveMQ is good for stream processing as well?
Is it the speed at which messages are consumed that determines whether it is a stream?
In traditional message processing, you apply simple computations on the messages, in most cases individually per message.
In stream processing, you apply complex operations on multiple input streams and multiple records (i.e., messages) at the same time (like aggregations and joins).
Furthermore, traditional messaging systems cannot go "back in time": they automatically delete messages after they have been delivered to all subscribed consumers. In contrast, Kafka keeps messages for a configurable amount of time, as it uses a pull-based model (i.e., consumers pull data out of Kafka). This allows consumers to "rewind" and consume messages multiple times, and if you add a new consumer, it can read the complete history. This makes stream processing possible, because it allows for more complex applications. Furthermore, stream processing is not necessarily about real-time processing; it's about processing infinite input streams (in contrast to batch processing, which is applied to finite inputs).
And Kafka offers the Kafka Connect and Streams APIs, so it is a stream-processing platform and not just a messaging/pub-sub system (even if it uses messaging at its core).
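The "go back in time" part looks roughly like this with the kafka-python client (topic name, partition, and server address are placeholders):

    from kafka import KafkaConsumer, TopicPartition

    consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
    partition = TopicPartition("payments", 0)   # assumed topic/partition
    consumer.assign([partition])
    consumer.seek_to_beginning(partition)       # "rewind": the log is retained

    for record in consumer:
        print(record.offset, record.value)      # re-consume the full history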
If you like splitting hairs:
Messaging is communication between two or more processes or components, whereas streaming is the passing of events to an event log as they occur. Messages carry raw data, whereas events contain information about the occurrence of an activity, such as an order.
So Kafka does both, messaging and streaming. A topic in Kafka can hold raw messages or an event log that is normally retained for hours or days. Events can further be aggregated into more complex events.
Although Rabbit supports streaming, it was actually not built for it (see Rabbit's web site).
Rabbit is a message broker and Kafka is an event streaming platform.
Kafka can handle a huge number of 'messages' compared with Rabbit.
Kafka is a log while Rabbit is a queue, which means that, once consumed, Rabbit's messages are no longer there in case you need them again.
However, Rabbit can specify message priorities while Kafka doesn't.
It depends on your needs.
Message processing implies operations on and/or using individual messages. Stream processing encompasses operations on and/or using individual messages as well as operations on collections of messages as they flow into the system. For example, say transactions are coming in for a payment instrument: stream processing can be used to continuously compute the hourly average spend. In this case, a sliding window can be imposed on the stream which picks up messages within the hour and computes the average of the amounts. Such figures can then be used as inputs to fraud detection systems.
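A toy version of that hourly-average example in plain Python, assuming events arrive as (timestamp, amount) pairs:

    from collections import deque

    WINDOW = 3600          # one hour, in seconds
    window = deque()       # (timestamp, amount) pairs inside the window
    total = 0.0

    def on_event(ts, amount):
        # Feed one transaction; returns the current hourly average spend.
        global total
        window.append((ts, amount))
        total += amount
        # Evict transactions older than one hour relative to the newest event.
        while window[0][0] <= ts - WINDOW:
            _, old_amount = window.popleft()
            total -= old_amount
        return total / len(window)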
Apologies for the long answer, but I think a short answer would not do justice to the question.
Consider a queue system, like MQ, for:
Exactly-once delivery, and participation in two-phase commit transactions
Asynchronous request/reply communication: the semantics of the communication are for one component to ask a second component to do something on its data. This is a command pattern with a delay on the response.
Recall that messages in a queue are kept until the consumer(s) have received them.
Consider a streaming system, like Kafka, as a pub/sub and persistence system, for:
Publishing events as immutable facts of what happened in an application
Getting continuous visibility of the data streams
Keeping data once consumed, for future consumers and replay-ability
Scaling message consumption horizontally (see the sketch below)
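On that last point, horizontal scaling is what Kafka consumer groups give you: run the snippet below in several processes with the same group_id and Kafka splits the topic's partitions among them (topic and group names are assumptions):

    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "orders",                          # assumed topic name
        bootstrap_servers="localhost:9092",
        group_id="order-processors",       # same group => partitions shared
        auto_offset_reset="earliest",
    )

    for record in consumer:
        print(record.partition, record.offset, record.value)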
What are Events and Messages
There is a long history of messaging in IT systems. You can easily see an event-driven solution and events in the context of messaging systems and messages. However, there are different characteristics that are worth considering:
Messaging: Messages transport a payload and messages are persisted until consumed. Message consumers are typically directly targeted and related to the producer who cares that the message has been delivered and processed.
Events: Events are persisted as a replayable stream history. Event consumers are not tied to the producer. An event is a record of something that has happened and so can't be changed. (You can't change history.)
Now, messaging versus event streaming.
Messaging is there to support:
Transient Data: data is only stored until a consumer has processed the message, or it expires.
Request / reply most of the time.
Targeted reliable delivery: targeted to the entity that will process the request or receive the response. Reliable with transaction support.
Time-coupled producers and consumers: consumers can subscribe to the queue, but messages can be removed after a certain time or once all subscribers have received them. The coupling is still loose at the data model level and interface definition level.
Events are to support:
Stream History: consumers are interested in historic events, not just the most recent.
Scalable consumption: a single event is consumed by many consumers with limited impact as the number of consumers grows.
Immutable Data
Loosely coupled / decoupled producers and consumers: strong time decoupling, as consumers may come at any time. There is some coupling at the message definition level, but schema management best practices and a schema registry reduce friction.
Hope this answer helps!
Basically, Kafka is a messaging framework similar to ActiveMQ or RabbitMQ. There have been some efforts to take Kafka towards streaming:
https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/
Then why does Kafka come into the picture when talking about stream processing?
Stream processing frameworks differ in their input of data. In batch processing, you have files stored in a file system that you want to continuously process and store in some database. In stream processing, frameworks like Spark, Storm, etc. get continuous input from sensor devices, API feeds, and so on, and Kafka is used there to feed the streaming engine.
Recently, I came across a very good document that describes the usage of "stream processing" and "message processing":
https://developer.ibm.com/articles/difference-between-events-and-messages/
Taking asynchronous processing in context:
Messaging:
Consider it when there is a "request for processing", i.e. a client makes a request for the server to process.
Event streaming:
Consider it when "accessing enterprise data" i.e. components within the enterprise can emit data that describe their current state. This data does not normally contain a direct instruction for another system to complete an action. Instead, components allow other systems to gain insight into their data and status.
To facilitate this evaluation, here are the key selection criteria to consider when selecting the right technology for your solution:
Event history - Kafka
Fine-grained subscriptions - MQ
Scalable consumption - Kafka
Transactional behavior - MQ

How do I retrieve data from statsd?

I'm glossing over their documentation here:
http://www.rubydoc.info/github/github/statsd-ruby/Statsd
And there are methods for recording data, but I can't seem to find anything about retrieving recorded data. I'm adopting a project with an existing statsd addition. Its host is likely a defunct URL. Perhaps that host is where the stats are recorded?
The statsd server implementations that Mircea links to just take care of receiving and aggregating metrics and publishing them to a backend service. From Etsy's statsd definition:
A network daemon that runs on the Node.js platform and listens for statistics, like counters and timers, sent over UDP or TCP and sends aggregates to one or more pluggable backend services (e.g., Graphite).
To retrieve the recorded data you have to query the backend. Check the list of available backends. The most common one is Graphite.
See also this question: How does StatsD store its data?
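Querying Graphite for a statsd-fed metric looks roughly like this (the host and metric path are assumptions; statsd's default namespacing puts counters under stats/stats_counts, but yours may differ):

    import json
    import urllib.request

    url = ("http://graphite.example.com/render"
           "?target=stats_counts.my_app.logins&from=-1h&format=json")

    with urllib.request.urlopen(url, timeout=10) as resp:
        series = json.load(resp)

    for s in series:
        # datapoints are [value, unix_timestamp] pairs; value is None in gaps
        print(s["target"], s["datapoints"][:5])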
There are 2 parts to statsd: a client and a server.
What you're looking at is the client part. You will not see functionality related to retrieving the data because it's not there; it normally lives on the server side.
Here is a list of statsd server implementations:
http://www.joemiller.me/2011/09/21/list-of-statsd-server-implementations/
Research and pick one that fits your needs.
Statsd originally started at etsy: https://github.com/etsy/statsd/wiki
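For completeness, the client half is tiny: statsd's wire format is plain text over UDP, e.g. "name:value|c" for a counter. A minimal sketch (host and port are the usual defaults, but check your setup):

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def incr(name, value=1):
        # Fire-and-forget datagram in statsd's "name:value|type" format
        # (|c = counter, |ms = timer, |g = gauge). 8125 is the default port.
        sock.sendto(f"{name}:{value}|c".encode(), ("localhost", 8125))

    incr("my_app.logins")   # the server aggregates and flushes to the backend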
