Confluent Kafka consumer stuck after broker restarts - Docker

I am using:
kafka_image=wurstmeister/kafka
zookeeper_version=3.4.14
kafka_version=2.12-2.4.0
C# client: Confluent.Kafka v1.2.0
We are using 3 brokers and a 1-node ZooKeeper cluster. As part of deployment we stop all the brokers, ZooKeeper, producers and consumers, delete the Kafka log files, then start the consumers first and the brokers and ZooKeeper afterwards. During this process the consumer sometimes gets stuck: it does not pick up any messages even though it is still alive. If I restart the consumer, it starts picking up messages again.

A rebalance can be the reason for such behaviour. When a rebalance starts in a consumer group, the partition assignments of all consumers in the group are revoked, and during the rebalance the consumers cannot commit offsets or poll data until the rebalance finishes and partitions are assigned to the consumers again.
Some important notes to consider:
The rebalance timeout is equal to max.poll.interval.ms. So if your max.poll.interval.ms is very high because of long-running processing, a rebalance can take a long time.
Reasons to rebalance:
Joining new consumer to consumer group
Clean shutdown of a consumer
Adding new partition(s) to a topic which is subscribed by the consumer group
When a consumer is considered dead by the group coordinator, i.e.:
session.timeout.ms expires without a heartbeat being sent
poll is not called within max.poll.interval.ms
A likely reason to face rebalances after a restart is the JoinGroup requests that consumers send to the group coordinator when they call poll. Each request can potentially trigger a rebalance, so you end up with many rebalances in a row. To overcome this problem, you can increase group.initial.rebalance.delay.ms (a broker-side setting, 3 seconds by default).
group.initial.rebalance.delay.ms: The amount of time the group
coordinator will wait for more consumers to join a new group before
performing the first rebalance. A longer delay means potentially fewer
rebalances, but increases the time until processing begins.
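Since the brokers here run on the wurstmeister/kafka image, one way to raise that broker-side setting is through an environment variable, which the image writes into server.properties. A minimal sketch, assuming a docker-compose style deployment (service name and value are only illustrative):

  kafka:
    image: wurstmeister/kafka:2.12-2.4.0
    environment:
      # wait 10 s for consumers to join before the first rebalance (default is 3000 ms)
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 10000

On the consumer side, max.poll.interval.ms and session.timeout.ms map to the MaxPollIntervalMs and SessionTimeoutMs properties of the Confluent.Kafka ConsumerConfig.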

Related

Delay on requests from Google API Gateway to Cloud Run

I'm currently seeing delays of 2-3 seconds on my first requests coming into our APIs.
We've set the min instances to 1 to prevent cold starts, but this delay is still occurring.
If I check the metrics I don't see any startup latencies in the specified timeframe, so I have no insight into what is causing these delays. Tracing gives the following:
The only thing I can change, is switching to "CPU is always allocated" but this isn't helping in any way.
Can somebody give more information on this?
As mentioned in the answer:
As per the documentation:
Idle instances
As traffic fluctuates, Cloud Run attempts to reduce the chance of cold starts by keeping some idle instances around to handle spikes in traffic. For example, when a container instance has finished handling requests, it might remain idle for a period of time in case another request needs to be handled.
But, Cloud Run will terminate unused containers after some time if no requests need to be handled. This means a cold start can still occur. Container instances are scaled as needed, and it will initialize the execution environment completely. While you can keep idle instances permanently available using the min-instance setting, this incurs cost even when the service is not actively serving requests.
So, let’s say you want to minimize both cost and response time latency
during a possible cold start. You don’t want to set a minimum number
of idle instances, but you also know any additional computation needed
upon container startup before it can start listening to requests means
longer load times and latency.
Cloud Run container startup
There are a few tricks you can do to optimize your service for container startup times. The goal here is to minimize the latency that delays a container instance from serving requests. But first, let's review the Cloud Run container startup routine:
1. Starting the service
  a. Starting the container
  b. Running the entrypoint command to start your server
  c. Checking for the open service port
You want to tune your service to minimize the time needed for step 1a (starting the container).
Let’s walk through 3 ways to optimize your service for Cloud Run
response times.
1. Create a leaner service
2. Use a leaner base image
3. Use global variables
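As a rough illustration of the third tip, a lazily initialized global lets a container instance pay an expensive setup cost only once and reuse the result for every later request. This is only a sketch, assuming a Flask service; the two-second sleep stands in for whatever slow initialization your service actually does:

  # Sketch of lazy global initialization for Cloud Run (framework and names are illustrative).
  import time
  from flask import Flask

  app = Flask(__name__)
  _heavy_client = None  # created once per container instance, reused across requests

  def get_client():
      # Initialize the expensive dependency on first use and cache it globally.
      global _heavy_client
      if _heavy_client is None:
          time.sleep(2)  # stand-in for slow setup (DB connection, model load, ...)
          _heavy_client = object()
      return _heavy_client

  @app.route("/")
  def handle():
      get_client()  # only the first request on a new instance pays the setup cost
      return "ok"

The point is that the work in get_client happens at most once per container instance instead of on every request.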
As mentioned in the documentation:
Background activity is anything that happens after your HTTP response
has been delivered. To determine whether there is background activity
in your service that is not readily apparent, check your logs for
anything that is logged after the entry for the HTTP request.
Avoid background activities if CPU is allocated only during request processing
If you need to set your service to allocate CPU only during request
processing, when the Cloud Run service finishes handling a
request, the container instance's access to CPU will be disabled or
severely limited. You should not start background threads or routines
that run outside the scope of the request handlers if you use this
type of CPU allocation. Review your code to make sure all asynchronous
operations finish before you deliver your response.
Running background threads with this kind of CPU allocation can create
unpredictable behavior because any subsequent request to the same
container instance resumes any suspended background activity.
As mentioned in the thread, the reason could be that all the operations you performed happen after the response is sent.
According to the docs, the CPU is allocated only during request processing by default, so the only thing you have to change is to enable CPU allocation for background activities ("CPU is always allocated").
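If you prefer the CLI over the console, a sketch of the equivalent command (service name and region are placeholders):

  gcloud run services update SERVICE_NAME --region=REGION --no-cpu-throttling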
You can refer to the documentation for more information related to the steps to optimize Cloud Run response times.
You can also have a look on the blog related to use of Google API Gateway with Cloud Run.

RabbitMQ in ECS Cluster with Autoscaling

I have the following situation:
Two times a day, for about an hour, we receive a huge inflow of messages which currently run through RabbitMQ. The current Rabbit cluster with 3 nodes can't handle the spikes, but otherwise runs smoothly. It's currently set up on plain EC2 instances. The instance type is currently t3.medium, which is very low, but fine for the other 22 hours per day, where we receive ~5 msg/s. The current setup also has ha-mode=all.
After a rather lengthy and revealing read of the RabbitMQ docs, I decided to just try to set up an ECS EC2 cluster and scale out when CPU load rises. So: create a service on it and add that service to service discovery, for example discovery.rabbitmq. If there are three instances, all of them would register under the same name, but it would resolve to all three IPs. Joining the cluster would work based on this:
That would be the rabbitmq.conf part:
cluster_formation.peer_discovery_backend = dns
# the backend can also be specified using its module name
# cluster_formation.peer_discovery_backend = rabbit_peer_discovery_dns
cluster_formation.dns.hostname = discovery.rabbitmq
I use a policy with ha-mode=exact and 2 replicas.
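For reference, such a policy can be applied along these lines (the policy name and queue pattern are only examples):

  rabbitmqctl set_policy ha-two ".*" '{"ha-mode":"exact","ha-params":2,"ha-sync-mode":"automatic"}' --apply-to queues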
Our exchanges and queues are created manually upfront for reasons I cannot discuss any further, but that's a given. They can't be removed and they won't be re-created on the fly. We have 3 exchanges with each 4 queues.
So, the idea: during times of high load - add more instances, during times of no load, run with three instances (or even less).
The setup with scale-out/in works fine, until I started using the benchmarking tool and discovered that queues are always created on a single node, which becomes the queue master. That is fine, considering the benchmarking tool is connected to a single node. The problem is that, after a scale-in/out, our manually created queues are also not moved to other nodes. This is also in line with what I read on the RabbitMQ 3.8 release page:
One of the pain points of performing a rolling upgrade to the servers of a RabbitMQ cluster was that queue masters would end up concentrated on one or two servers. The new rebalance command will automatically rebalance masters across the cluster.
Here are the problems I ran into; I'm seeking some advice:
If I interpret the docs correctly, scaling out wouldn't help at all, because those nodes would sit there idling until someone would manually call rabbitmq-queues rebalance all.
What would be the preferred way of scaling out?

Restart KSQL-Server when some queries are running

I tried to find some documentation about what happens when some queries are running and the KSQL server restarts, but couldn't. What will happen?
Does it behave similarly to Kafka Streams, where the consumer offset is not committed and at-least-once is guaranteed?
I can observe that the queries are stored in the command topic, and that the queries are executed again when the KSQL server restarts.
I tried to find some documentation about what happens when some queries are running and the KSQL server restarts. What will happen?
If you only have a single KSQL server, then stopping that server will of course stop all the queries. Once the server is running again, all queries will continue from the points they stopped processing. No data is lost.
If you have multiple KSQL servers running, then stopping one (or some) of them will cause the remaining servers to take over any query processing tasks from the stopped servers. Once the stopped servers have been restarted the query processing workload will be shared again across all servers.
Does it behave similarly to Kafka Streams, where the consumer offset is not committed and at-least-once is guaranteed?
Yes.
But (even better): Whether the processing guarantees are at-least-once or exactly-once depends solely on the KSQL server's configuration. It does of course not depend on whether or when the server is being restarted, crashes, etc.
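For illustration, the processing guarantee is set in the KSQL server's properties file by passing the underlying Kafka Streams setting through, roughly like this (a sketch; the exact prefix may vary by version):

  # ksql-server.properties
  ksql.streams.processing.guarantee=exactly_once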

What's the main advantage of using replicas in Docker Swarm Mode?

I'm struggling to understand the idea of replica instances in Docker Swarm Mode. I've read that it's a feature that helps with high availability.
However, Docker automatically starts up a new task on a different node if one node goes down even with 1 replica defined for the service, which also provides high availability.
So what's the advantage of having 3 replica instances rather than 1 for an arbitrary service? My assumption was that with more replicas, Docker spends less time creating a new instance on another node in the event of failure, which aids performance. Is this correct?
What Makes a System Highly Available?
One of the goals of high availability is to eliminate single points of
failure in your infrastructure. A single point of failure is a
component of your technology stack that would cause a service
interruption if it became unavailable.
Let's take your example of a service with a single replica. Now let's suppose there is a failure. Docker Swarm will notice that the service failed and restart it. The service restarts, but a restart isn't instant. Let's say the restart takes 5 seconds. For those 5 seconds your service is unavailable. That is a single point of failure.
What if you had a service with 3 replicas? Now when one of them fails (no service is perfect), Docker Swarm will notice that one of the instances is unavailable and create a new one. During that time you still have 2 healthy instances serving requests. To a user of your service, it appears as if there was no downtime. This component is no longer a single point of failure.
ROMANARMY's answer is very good and I just wanted to mention that the replicas can be on different nodes, so if one of your servers goes down (becomes unavailable), the container (replica) on another server can keep running without a problem.
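As a minimal sketch (service name and image are arbitrary), the replica count is just a flag on the service, and Swarm spreads the tasks across the available nodes:

  docker service create --name web --replicas 3 -p 80:80 nginx
  docker service scale web=5    # adjust the replica count later if needed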

Second and Third Distributed Kafka Connector workers failing to work correctly

With a Kafka cluster of 3 and a ZooKeeper cluster of the same size, I brought up one distributed connector node. This node ran successfully with a single task. I then brought up a second connector; this seemed to run, as some of the code in the task definitely ran. However, it then didn't seem to stay alive (though with no errors thrown; the not staying alive was observed by a lack of expected activity, while the first connector continued to function correctly). When I call the URL http://localhost:8083/connectors/mqtt/tasks on each connector node, it tells me the connector has one task. I would expect this to be two tasks, one for each node/worker. (Currently the worker configuration says tasks.max=1, but I've also tried setting it to 3.)
When I try and bring up a third connector, I get the error:
"POST /connectors HTTP/1.1" 500 90 5
(org.apache.kafka.connect.runtime.rest.RestServer:60)
ERROR IO error forwarding REST request:
(org.apache.kafka.connect.runtime.rest.RestServer:241)
java.net.ConnectException: Connection refused
Trying to call the connector POST method again from the shell returns the error:
{"error_code":500,"message":"IO Error trying to forward REST request:
Connection refused"}
I also tried upgrading to Apache Kafka 0.10.1.1, which was released today. I'm still seeing the problems. The connectors are each running in isolated Docker containers defined by a single image. They should be identical.
The problem could be that I'm trying to run the POST request to http://localhost:8083/connectors on each worker, when I only need to run it once on a single worker and then the tasks for that connector will automatically distribute to the other workers. If this is the case, how do I get the tasks to distribute? I currently have the max set to three, but only one appears to be running on a single worker.
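For reference, in distributed mode a connector is normally submitted once, through the REST API of any one worker, roughly like this (the connector class and settings below are placeholders, not the actual configuration used here):

  curl -X POST http://localhost:8083/connectors \
    -H "Content-Type: application/json" \
    -d '{"name": "mqtt", "config": {"connector.class": "com.example.MqttSourceConnector", "tasks.max": "3", "topic": "my-topic"}}'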
Update
I ultimately got things running using essentially the same approach that Yuri suggested. I gave each worker a unique group ID, then gave each connector task the same name. This allowed the three connectors and their single tasks to share a single offset, so that in the case of sink connectors the messages they consumed from Kafka were not duplicated. They are basically running as standalone connectors since the workers have different group ids and thus won't communicate with each other.
If the connector workers have the same group ID, you can't add more than one connector with the same name. If you give the connectors different names, they will have different offsets and consume duplicate messages. If you have three workers in the same group, one connector and three tasks, you would theoretically have an ideal situation where the tasks share an offset and the workers make sure the tasks are always running and well distributed (with each task consuming a unique set of partitions). In practice the connector framework doesn't create more than one task, even with tasks.max set to 3 and even when the topic the tasks consume from has 25 partitions.
If anyone knows why I'm seeing this behaviour, please let me know.
I've encountered a similar issue in the same situation as yours.
tasks.max is configured per connector, and the distributed workers automatically decide which nodes handle the topic. So, if you have 3 workers in a cluster and your configuration says tasks.max=2, then only 2 of the 3 workers will process the topic. In theory, if one of the workers fails, the 3rd should pick up the workload. But...
The distributed connector turned out to be very unreliable: once you add/remove some nodes, the cluster broke down and all the workers did nothing but try to elect a leader, and fail. The only way to fix it was to restart the whole cluster, preferably all workers simultaneously.
I chose another way: I used a standalone worker and it works like a charm for me, because the distribution of load is implemented at the Kafka client level, and once some worker drops, the cluster rebalances automatically and the clients connect to the unoccupied topics.
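If you want to go the same route, a standalone worker is started with its own properties file plus one properties file per connector, along these lines (file names are illustrative):

  bin/connect-standalone.sh worker.properties mqtt-source.properties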
PS. Maybe this will be useful for you too: the Confluent connector is not tolerant of an invalid payload that does not match the topic's schema. Once the connector gets an invalid message, it silently dies. The only way to find out is to analyze the metrics.
I'm posting an answer to an old question, since Kafka Connect has moved on a lot in three years.
In the latest version (2.3.1) there is incremental rebalancing which massively improves the behaviour of Kafka Connect.
It's also worth noting that when configuring Kafka Connect, rest.advertised.host.name must be set correctly; if it's not, you will see errors including the one quoted:
{"error_code":500,"message":"IO Error trying to forward REST request: Connection refused"}
See this post for more details.
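A minimal sketch of the relevant part of the distributed worker configuration (the host name is a placeholder):

  # connect-distributed.properties
  group.id=connect-cluster
  rest.port=8083
  # must be reachable from the other workers, i.e. not localhost
  rest.advertised.host.name=connect-worker-1.internal
  rest.advertised.port=8083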
