Speeding up Confluent Docker load time

I have a docker-compose file that spins up a single-node Kafka, Zookeeper & Schema Registry stack for testing my application. Currently, it takes a couple of minutes for the stack to become available; are there any settings to speed up the launch time?
The config I'm using (apart from SSL) is as follows:
kafka:
  image: confluentinc/cp-kafka:3.3.1
  depends_on:
    - zookeeper
  hostname: kafka
  ports:
    - 9092:9092
  environment:
    KAFKA_BROKER_ID: 1
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    KAFKA_ADVERTISED_LISTENERS: SSL://kafka:9092
    KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    KAFKA_AUTO_CREATE_TOPICS_ENABLE: "false"
    KAFKA_LOG4J_ROOT_LOGLEVEL: WARN
    KAFKA_JMX_PORT: 9585
    KAFKA_JMX_HOSTNAME: kafka
    KAFKA_JMX_OPTS: "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.rmi.port=9585"
  volumes:
    - ../../txs-data/kafka-data:/var/lib/kafka/data

Other than using a newer container version, there isn't much you can do to make the startup itself faster.
Zookeeper starts fairly quickly, but Kafka depends on Zookeeper and has to coordinate extra startup tasks: electing a leader, loading other metadata, and so on. (Update: Zookeeper is no longer needed; see the KRaft example below.)
If you're adding the Schema Registry on top of that, it has to wait for Kafka to start and then create its _schemas topic, which requires a round-trip to Zookeeper.
All in all, there are a lot of pre-initialization steps, all of which are required and none of which can be skipped to reduce the start time.
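What you can do is stop padding your test setup with a fixed sleep and instead poll until the broker actually answers. Here is a minimal sketch using kafka-python (an assumption, if your tests are in Python; the plaintext localhost address is a placeholder, so adjust it for your SSL listener):

import time

from kafka import KafkaAdminClient
from kafka.errors import NoBrokersAvailable


def wait_for_kafka(bootstrap="localhost:9092", timeout_s=120):
    """Poll until the broker accepts a metadata connection, or raise."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            # Constructing the admin client bootstraps a connection; it
            # raises NoBrokersAvailable while Kafka is still starting up.
            KafkaAdminClient(bootstrap_servers=bootstrap).close()
            return
        except NoBrokersAvailable:
            time.sleep(1)
    raise TimeoutError(f"Kafka not reachable after {timeout_s}s")


wait_for_kafka()

This doesn't make the stack start any faster, but your tests begin the moment it is ready rather than after a worst-case sleep.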
Assuming you're running this as part of a JVM testing framework, the "faster" way would be to use embedded versions of each of the services.

You can now start Kafka without Zookeeper, in KRaft mode:
git clone https://github.com/confluentinc/cp-all-in-one
cd cp-all-in-one/cp-all-in-one-kraft
docker compose up broker

Related

Kafka Docker CPU 100% after a restart

I am running Kafka and Zookeeper (Bitnami images) locally on my M1 MacBook in Docker, using Docker Compose. If I delete the data volumes for Kafka and Zookeeper and restart, it comes up perfectly and runs at low CPU utilisation.
I have about 6 different consumer groups and 2 producers, all running locally on the Mac as separate Node.js scripts. Everything is great until I stop the consumers, then shut down and restart the Docker containers (Ctrl-C to stop, followed by docker compose up).
When I do that, everything appears to restart fine, but the producers and consumers get Connection Refused errors (they sometimes connect, but don't stay connected). There are no errors or warnings in the Kafka or Zookeeper logs, yet Docker reports the Kafka container running at 100% CPU utilisation. I can leave it for a day and it will stay at that utilisation level.
I can always resolve this by stopping the containers and deleting the volumes associated with Kafka and Zookeeper. But why does this happen? Why does Kafka get stuck at 100% CPU every time I stop and restart it?
These are the relevant portions of my docker-compose.yml file:
version: '3.8'
services:
  zookeeper:
    container_name: zookeeper
    image: 'bitnami/zookeeper'
    pull_policy: always
    environment:
      ALLOW_ANONYMOUS_LOGIN: 'yes'
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - '2181:2181'
    volumes:
      - /Users/localuser/Documents/MintMonsterData/zookeeper:/bitnami/zookeeper
  kafka:
    container_name: kafka
    pull_policy: always
    image: 'bitnami/kafka'
    restart: always
    ports:
      - '29092:29092'
    environment:
      KAFKA_CFG_ADVERTISED_HOST_NAME: kafka
      KAFKA_BROKER_ID: 1
      KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CLIENT:PLAINTEXT,EXTERNAL:PLAINTEXT
      KAFKA_CFG_LISTENERS: CLIENT://:9092,EXTERNAL://:29092
      KAFKA_CFG_ADVERTISED_LISTENERS: CLIENT://kafka:9092,EXTERNAL://localhost:29092
      KAFKA_CFG_INTER_BROKER_LISTENER_NAME: CLIENT
      KAFKA_CFG_LOG_RETENTION_HOURS: 24
      ALLOW_PLAINTEXT_LISTENER: 'yes'
      KAFKA_CFG_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    volumes:
      - /Users/localuser/Documents/MintMonsterData/kafka:/bitnami/kafka
    depends_on:
      - zookeeper
I believe it might be caused by the fact that Bitnami's Docker images do not support the arm64 architecture. See the related issue on Bitnami's GitHub.
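If you want to verify that mismatch, one option is the Docker SDK for Python (a sketch, assuming the docker package is installed and the image has already been pulled):

import platform

import docker  # pip install docker

client = docker.from_env()
image = client.images.get("bitnami/kafka")

# 'Architecture' is the image's target platform (e.g. 'amd64'); an amd64
# image on an arm64 Mac runs under QEMU emulation, which can peg a CPU core.
print("image architecture:", image.attrs["Architecture"])
print("host architecture:", platform.machine())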

Increasing number of brokers in kafka using KAFKA_ADVERTISED_HOST_NAME

I'm new to Kafka and I'm trying to run a Kafka service on my local machine and use it to transfer some data from one .NET project to another.
I'm using a docker-compose.yml file to create two Docker containers, for Zookeeper and Kafka, from wurstmeister images.
In the Kafka service's environment variables there is KAFKA_ADVERTISED_HOST_NAME, which I set to 127.0.0.1.
The Docker Hub page for wurstmeister/kafka says, and I quote:
"modify the KAFKA_ADVERTISED_HOST_NAME in docker-compose.yml to match your docker host IP (Note: Do not use localhost or 127.0.0.1 as the host IP if you want to run multiple brokers.)".
When I set my topic to have more than 1 replica and more than 1 partition, I get this message:
"Error while executing topic command : Replication factor: 2 larger than available brokers: 1."
What is the right IP address to define in KAFKA_ADVERTISED_HOST_NAME which will allow me to get more than 1 broker?
version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: 127.0.0.1
      KAFKA_CREATE_TOPICS: "simpletalk_topic:2:2"
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
Note - Running multiple brokers on one machine does not provide true fault tolerance.
use it to transfer some data from one .NET project to another
You only need one broker for this.
First, I suggest reading https://github.com/wurstmeister/kafka-docker/wiki/Connectivity
KAFKA_ADVERTISED_HOST_NAME is deprecated. Don't use it.
It has been replaced by KAFKA_ADVERTISED_LISTENERS, which can contain ${DOCKER_HOST_IP:-127.0.0.1} (or your host IP from ifconfig), since this is the address your clients on the host will use.
Since brokers are clients of themselves, you also need to advertise the container names. And if you containerize your client, those addresses are what your app would use.
Beyond the Kafka networking configs, KAFKA_BROKER_ID needs to be different for each broker, and your error suggests you also need to override the other replication-factor broker configs (offsets topic, transaction state log).
All in all:
version: '3'
services:
  zookeeper:
    image: wurstmeister/zookeeper
  kafka-1:
    image: wurstmeister/kafka
    depends_on: [zookeeper]
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://${DOCKER_HOST_IP:-127.0.0.1}:9092,PLAINTEXT_INTERNAL://kafka-1:29092
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,PLAINTEXT_INTERNAL://:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT_INTERNAL:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_DEFAULT_REPLICATION_FACTOR: 2
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 2
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 2
  kafka-2:
    image: wurstmeister/kafka
    ports:
      - "9093:9093"
    environment:
      KAFKA_BROKER_ID: 2  # unique
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://${DOCKER_HOST_IP:-127.0.0.1}:9093,PLAINTEXT_INTERNAL://kafka-2:29093
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9093,PLAINTEXT_INTERNAL://:29093
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT_INTERNAL:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_DEFAULT_REPLICATION_FACTOR: 2
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 2
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 2
      KAFKA_CREATE_TOPICS: "simpletalk_topic:2:2"
    depends_on:  # ensure this joins the other
      - zookeeper
      - kafka-1
I'm trying to run a Kafka service on my local machine
Set bootstrap.servers="localhost:9092,localhost:9093"
Further reading - Connect to Kafka running in Docker
Example usage
Note that topic creation doesn't seem to work automatically. Ideally, your application should create and check topic existence itself; see the sketch after the kcat session below.
$ kcat -b localhost:9093 -L
Metadata for all topics (from broker -1: localhost:9093/bootstrap):
 2 brokers:
  broker 2 at 127.0.0.1:9093
  broker 1 at 127.0.0.1:9092 (controller)
 0 topics:

Producing some data... and consuming it:

$ kcat -b localhost:9093 -t test -o beginning
% Auto-selecting Consumer mode (use -P or -C to override)
hello
world
sample
data
of
no
importance
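As noted above, rather than relying on KAFKA_CREATE_TOPICS you can create or verify the topic from application code. A sketch with kafka-python (the library is an assumption, since the question uses .NET; topic name and counts match the compose file above):

from kafka.admin import KafkaAdminClient, NewTopic

# List both host-mapped ports so bootstrap still works if one broker is down.
admin = KafkaAdminClient(bootstrap_servers=["localhost:9092", "localhost:9093"])

if "simpletalk_topic" not in admin.list_topics():
    # 2 partitions, replication factor 2 -- matches "simpletalk_topic:2:2"
    admin.create_topics([NewTopic("simpletalk_topic",
                                  num_partitions=2,
                                  replication_factor=2)])

print(admin.list_topics())
admin.close()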
It should be the IP address of your machine.
On Linux, run the ifconfig command; your IP appears as inet <IP_ADDRESS>.
On Windows, you can get it with ipconfig.

accessing kafka running in docker-compose from other machines

I want to run Kafka as a single node, single broker, on one of the computers on our network, and be able to access it from other machines. For example, with docker-compose running on 192.168.0.36, I want to access Kafka from 192.168.0.19.
Since we can't use any Linux distribution, I have to run Kafka as a Docker container on Windows.
I know there are already a ton of questions and documents on this topic, including this question, this example, and also this blog post, but unfortunately none of them worked out for me.
This is the compose file I'm using right now:
version: '3.7'
services:
  zookeeper:
    image: wurstmeister/zookeeper:3.4.6
    ports:
      - "2181:2181"
    expose:
      - "2181"
    volumes:
      - type: bind
        source: "G:\\path\\to\\zookeeper"
        target: /opt/zookeeper-3.4.6/data
  kafka:
    image: wurstmeister/kafka
    ports:
      - "9092:9092"
    expose:
      - "9092"
    environment:
      KAFKA_ADVERTISED_LISTENERS: INSIDE://:9093, OUTSIDE://192.168.0.36:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT, OUTSIDE:PLAINTEXT
      KAFKA_LISTENERS: INSIDE://:9093,OUTSIDE://:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_BROKER_ID: 1
      KAFKA_LOG_DIRS: "/kafka"
    volumes:
      - type: bind
        source: "G:\\path\\to\\logs"
        target: /kafka/
Things I tried while debugging the issue:
- I already tried all the different configurations in the mentioned questions and blog posts.
- I can access Kafka from 192.168.0.36, the machine running docker-compose, but not from 192.168.0.19 (NoBrokersAvailable error in kafka-python).
- To rule out a general networking problem, I tried a similar docker-compose file running a Falcon API with gunicorn, and I can call that API from 192.168.0.19.
- I also used the Windows telnet tool to check whether port 9092 is accessible from different machines: it's accessible from 0.36 but not from 0.19.
- I tried using a custom network like this one.
I'm testing the connection with Python's kafka-python package. We have a multi-broker Kafka running on our on-premise Kubernetes cluster and it works fine, so I don't think my testing scripts are the issue.
UPDATE
As OneCricketeer suggested, I tried this solution with different configurations, like 0.0.0.0:9092 => 127.0.0.1:9092 and 192.168.0.36:9092 => 127.0.0.1:9092. I also disabled the firewall. I'm still getting NoBrokersAvailable, but at least I can now reach 0.36:9092 from the other machine's telnet.
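One way to narrow this down further from 192.168.0.19 is to separate raw TCP reachability from the Kafka bootstrap handshake; a sketch with kafka-python (host and port taken from the question):

import socket

from kafka import KafkaConsumer
from kafka.errors import NoBrokersAvailable

HOST, PORT = "192.168.0.36", 9092

# Step 1: plain TCP. If this fails, it's firewalls or port publishing,
# not Kafka configuration.
with socket.create_connection((HOST, PORT), timeout=5):
    print("TCP connect ok")

# Step 2: Kafka metadata. This can fail even when TCP works, if the
# broker advertises an address this machine cannot resolve or reach.
try:
    consumer = KafkaConsumer(bootstrap_servers=f"{HOST}:{PORT}")
    print("topics:", consumer.topics())
except NoBrokersAvailable:
    print("bootstrap failed: check KAFKA_ADVERTISED_LISTENERS")

If step 1 succeeds (as your telnet test now does) but step 2 fails, the advertised listener is the remaining suspect.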

Cannot connect to kafka docker container from outside client (wurstmeister images)

There are so many answers to this question that I ended up totally confused about how to connect to a Kafka Docker container from an outside client.
I have created two docker machines, a manager and a worker with these commands:
docker-machine create manager
docker-machine create worker1
I have added these two nodes to a docker swarm.
docker#manager:~$ docker node ls
ID                          HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
6bmovp3hr0j2w5irmexvvjgzq * manager    Ready    Active         Leader           19.03.5
mtgbd9bg8d6q0lk9ycw10bxos   worker1    Ready    Active                          19.03.5
docker-compose.yml:
version: '3.2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka:latest
    ports:
      - target: 9094
        published: 9094
        protocol: tcp
        mode: host
    environment:
      HOSTNAME_COMMAND: "hostname | awk -F'-' '{print $$2}'"
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: INSIDE://:9092,OUTSIDE://_{HOSTNAME_COMMAND}:9094
      KAFKA_LISTENERS: INSIDE://:9092,OUTSIDE://:9094
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
From inside Docker, everything works fine. I can create topics and then produce/consume messages.
I created a Python script to consume messages from outside Docker. The simple code is presented below:
from kafka import KafkaConsumer
import json

try:
    print('Welcome to parse engine')
    consumer = KafkaConsumer('streams-plaintext-input', bootstrap_servers='manager:9094')
    for message in consumer:
        print(message)
except Exception as e:
    print(e)
    # Logs the error appropriately.
    pass
But the code is stuck forever; the connection never comes up correctly. Can anyone provide any help on how to set up the connection?
Since you are using docker-machine, you have to either:
- run your code in a container as well (using kafka:9092),
- run your code within the VM OS (using vm-host-name:9094), or
- add PLAINTEXT://localhost:9096 to the advertised listeners, expose 9096 from the VM to your host, then use localhost:9096 in your code (note: 9096 is just some random port).
The gist is that clients must be able to connect both to the bootstrap address and to the advertised address that the broker returns. If they cannot connect to the second, the code will time out.
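You can see exactly which addresses the broker advertises by asking for cluster metadata; a sketch with the confluent-kafka Python client (the library choice is an assumption; kcat -b manager:9094 -L shows the same information):

from confluent_kafka.admin import AdminClient

admin = AdminClient({"bootstrap.servers": "manager:9094"})
metadata = admin.list_topics(timeout=10)

# These host:port pairs are what the broker advertises; every client must
# be able to resolve and reach them, or requests will time out even though
# the initial bootstrap connection succeeded.
for broker in metadata.brokers.values():
    print(f"broker {broker.id} advertises {broker.host}:{broker.port}")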

Why could Kafka warn "partitions have leader brokers without a matching listener"?

I'm trying to get Kafka to work on docker-compose for the first time. The application runs fine without Docker, but on Docker I get the error described below. Any reason why Kafka would throw this error?
The error:
email-service_1 | 2018-12-01 14:32:02.448 WARN 1 --- [ntainer#0-0-C-1] o.a.k.c.NetworkClient : [Consumer clientId=consumer-2, groupId=kafka] 1 partitions have leader brokers without a matching listener, including [email-token-0]
My docker-compose config:
version: '3.3'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    command: [start-kafka.sh]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_HOST_NAME: 192.168.23.134
      KAFKA_CREATE_TOPICS: "email-token:1:1"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    ports:
      - "9092:9092"
    depends_on:
      - zookeeper
  email-service:
    build: ./email-service
    environment:
      SPRING_KAFKA_BOOTSTRAPSERVERS: kafka:9092
    ports:
      - "8081:8081"
    depends_on:
      - kafka
As stated in the comments to your question, the problem seems to be with the advertised name for the Kafka broker. According to your docker-compose file you should be using 192.168.23.134, but your email-service is using kafka:9092. You can try this docker-compose file instead. I replaced the wurstmeister services with the latest Zookeeper and Kafka images provided by confluentinc and added your email-service.
---
version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  kafka:
    image: confluentinc/cp-kafka:latest
    depends_on:
      - zookeeper
    ports:
      - 9092:9092
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  email-service:
    build: ./email-service
    environment:
      SPRING_KAFKA_BOOTSTRAPSERVERS: kafka:29092
    ports:
      - "8081:8081"
    depends_on:
      - kafka
advertised.listeners: Listeners to publish to ZooKeeper for clients to use, if different than the listeners config property. In IaaS environments, this may need to be different from the interface to which the broker binds. If this is not set, the value for listeners will be used. Unlike listeners it is not valid to advertise the 0.0.0.0 meta-address.
Please note that KAFKA_ADVERTISED_HOST_NAME has been deprecated and it's recommended to use KAFKA_ADVERTISED_LISTENERS instead. For more information about KAFKA_ADVERTISED_LISTENERS check here.
This is Apache Kafka 2.4.0.
I'm sharing the low-level, code-based findings to shed more light on when this WARN message can be printed out and why. It certainly indicates a misconfiguration of the Kafka cluster. Read on and comment if there's something missing. Thanks!
The WARN message is printed out when the DefaultMetadataUpdater (of NetworkClient) is requested to handle a completed metadata response.
[count] partitions have leader brokers without a matching listener, including [partitions]
It is a warning that corresponds to Errors.LISTENER_NOT_FOUND, which has the following default exception text:
There is no listener on the leader broker that matches the listener on which metadata request was processed.
That's on the client side.
Digging deeper, you can find that Errors.LISTENER_NOT_FOUND is used on a Kafka broker when MetadataCache is requested to find partition metadata. Just before the error is returned, this DEBUG message is logged:
Error while fetching metadata for [topicPartition]: listener [listenerName] not found on leader [leaderBrokerId]
Simply turn on DEBUG logging for the kafka.server.MetadataCache logger (e.g. log4j.logger.kafka.server.MetadataCache=DEBUG in the broker's log4j.properties) and you should see the message in the controller broker's logs.
In this particular case, the MetadataCache is used by a broker (via KafkaApis) to handle a TopicMetadata request, where the code comments say:
// In versions 5 and below, we returned LEADER_NOT_AVAILABLE if a matching listener was not found on the leader.
// From version 6 onwards, we return LISTENER_NOT_FOUND to enable diagnosis of configuration errors.
And at that moment, it's clear that the WARN message in question is for a connection on the listenerName.
In my case, when I was debugging the issue, it turned out that I used SSL://:9093 to connect to a Kafka broker while the partition leader was neither available nor configured to listen on that listener (via the listeners configuration property).
I used kafka-topics to review the partition configuration and then reviewed the state of the partitions in ZooKeeper.
get /brokers/topics/ssl/partitions/0/state
{"controller_epoch":1,"leader":0,"version":1,"leader_epoch":0,"isr":[0]}
I had -1 for the leader, and the isr showed only a broker that was simply misconfigured. That's why people reported fixing the issue by restarting their clusters (to get all the brokers up and running) or by fixing the broker ID to the one that worked previously.
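If you'd rather script that ZooKeeper check than use the shell, here is an equivalent sketch with the kazoo Python client (the library is an assumption; the znode path matches the session above):

import json

from kazoo.client import KazooClient

zk = KazooClient(hosts="localhost:2181")
zk.start()

data, _stat = zk.get("/brokers/topics/ssl/partitions/0/state")
state = json.loads(data)

# leader == -1 means no live broker with a matching listener is leading
# this partition; cross-check the isr list against /brokers/ids.
print("leader:", state["leader"], "isr:", state["isr"])

zk.stop()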
[context]
I am trying to run a Docker Compose stack with a Kafka client, using the images from https://github.com/wurstmeister/kafka-docker
I am trying to run a very simple Kafka cluster with a single broker and 3 topics, each with 1 partition and a replication factor of 1.
This great link explains connectivity for a Kafka cluster with one broker, a cluster with several brokers, and the notion of listeners, all using Docker; please have a look: https://github.com/wurstmeister/kafka-docker/wiki/Connectivity
[result]
The first time I run docker-compose up --force-recreate --build, everything runs just fine!
The topics are created automatically using KAFKA_CREATE_TOPICS, and I can use the Kafka producer and consumer just fine.
List topics: bin/kafka-topics.sh --list --bootstrap-server localhost:9092
Producer: bin/kafka-console-producer.sh --broker-list localhost:9092 --topic productadvisor_sales_dev
Consumer: bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic productadvisor_sales_dev --from-beginning
After that, every time I run docker-compose stop and relaunch with docker-compose up --force-recreate --build and try to produce data, I get the following error message...
Error Message :
[2019-09-23 19:41:33,037] WARN [Producer clientId=console-producer] 1 partitions have leader brokers without a matching listener, including [productadvisor_purchase_dev-0] (org.apache.kafka.clients.NetworkClient)
[Answer]
It appears you need to specify the value of KAFKA_BROKER_ID (=1, for instance) so that Zookeeper doesn't try to create a new broker, which can't have a matching listener because the listener is bound to the old one.
[Code]
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_HOST: localhost
      KAFKA_PORT: 9092
      KAFKA_ADVERTISED_HOST_NAME: localhost
      KAFKA_ADVERTISED_PORT: 9092
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_CREATE_TOPICS: "productadvisor_sales_dev:1:1,productadvisor_stock_dev:1:1,productadvisor_purchase_dev:1:1"
    depends_on:
      - zookeeper
    command: [start-kafka.sh]
[Some documentation]
https://rmoff.net/2018/08/02/kafka-listeners-explained/
https://www.tutorialspoint.com/apache_kafka/apache_kafka_fundamentals.htm
http://www.michael-noll.com/blog/2013/03/13/running-a-multi-broker-apache-kafka-cluster-on-a-single-node/
https://blog.k2datascience.com/running-kafka-using-docker-332207aec73c
NB
If anyone has more information about the inner workings of Kafka, Zookeeper, and the broker, why we need to specify the broker ID, and why this information is kept even after a --force-recreate --build, please do not hesitate. I am new to Kafka and this is one of my first complete posts on Stack Overflow :)
Cheers!
In our case, one of the brokers in the cluster shut down due to a disk problem, and we got this error in the other replicas. When we fixed the disk problem, the issue was resolved.
