I am running Kafka and Zookeeper (bitnami) locally on my M1 MacBook in Docker using Docker Compose. If I delete the data volumes for Kafka and Zookeeper and restart, it restarts perfectly and runs at low CPU utilisation.
I have about 6 different consumer groups and 2 producers, all running locally on the Mac as separate Node.js scripts. Everything is great until I stop the consumers, then shut down and restart the Docker containers (Ctrl-C to stop, followed by docker compose up).
When I do that, everything appears to restart fine, but the producers and consumers get Connection Refused errors (sometimes they connect, but don't stay connected). There are no errors or warnings in the Kafka or Zookeeper logs, but when I look at the Kafka container, Docker reports it is running at 100% CPU utilisation. I can leave it for a day and it will stay at that utilisation level.
I can always resolve this by stopping the containers and deleting the volumes associated with Kafka and Zookeeper. But, why is it doing this? Why is it every time I stop and restart Kafka it gets stuck at 100% CPU?
These are the relevant portions of my docker-compose.yml file:
version: '3.8'
services:
  zookeeper:
    container_name: zookeeper
    image: 'bitnami/zookeeper'
    pull_policy: always
    environment:
      ALLOW_ANONYMOUS_LOGIN: yes
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - '2181:2181'
    volumes:
      - /Users/localuser/Documents/MintMonsterData/zookeeper:/bitnami/zookeeper
  kafka:
    container_name: kafka
    pull_policy: always
    image: 'bitnami/kafka'
    restart: always
    ports:
      - '29092:29092'
    environment:
      KAFKA_CFG_ADVERTISED_HOST_NAME: kafka
      KAFKA_BROKER_ID: 1
      KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CLIENT:PLAINTEXT,EXTERNAL:PLAINTEXT
      KAFKA_CFG_LISTENERS: CLIENT://:9092,EXTERNAL://:29092
      KAFKA_CFG_ADVERTISED_LISTENERS: CLIENT://kafka:9092,EXTERNAL://localhost:29092
      KAFKA_CFG_INTER_BROKER_LISTENER_NAME: CLIENT
      KAFKA_CFG_LOG_RETENTION_HOURS: 24
      ALLOW_PLAINTEXT_LISTENER: yes
      KAFKA_CFG_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    volumes:
      - /Users/localuser/Documents/MintMonsterData/kafka:/bitnami/kafka
    depends_on:
      - zookeeper
I believe this might be caused by the fact that Bitnami's Docker images do not support the arm64 architecture. See the following issue on Bitnami's GitHub.
Related
I am using a Kafka cluster of three nodes, running against a Zookeeper cluster that is already up and running.
Docker Compose is used with Kafka Docker image version 2.2.0 and Zookeeper Docker image version 3.5.
Now my plan is to upgrade to Kafka 3.x, where it is claimed that Zookeeper is not required (or at least not explicitly required). I was under the impression that Zookeeper will be started automatically. But I saw that Zookeeper is embedded in Kafka, so I needed to start it explicitly as well.
For that, I have to repackage Kafka 3.x into two separate Docker images.
The first image, ZOOKEEPER_IMAGE, will call Zookeeper's zoo-start.sh through the Dockerfile's CMD instruction and set the Zookeeper-specific parameters.
The second image, KAFKA_IMAGE, will instead call Kafka's start.sh through the Dockerfile's CMD instruction and set the Kafka-specific parameters.
CMD ["/start.sh"]
Also, I can see that many Kafka parameters have changed, and at present I am getting this exception:
KafkaException: Unable to parse PLAINTEXT://127.0.0.1: to a broker endpoint -
Please tell me whether I am following the correct approach and what the solution to this exception is.
Below is the code with one Zookeeper and one Kafka node. The original Docker Compose file is similar, but contains 3 Kafka and 3 Zookeeper nodes.
zookeeper:
  hostname: ${HOST_IP}
  image: ${ZOOKEEPER_IMAGE}
  container_name: zookeeper
  command: /bin/bash -c "/start.sh"
  volumes:
    - ${VOLUMES_FOLDER}/zk/data:/data
    - ${VOLUMES_FOLDER}/zk/conf:/conf
  network_mode: "host"
  ports:
    - 0.0.0.0:${ZOOKEEPER_EXPOSED_PORT}:${ZOOKEEPER_EXPOSED_PORT}
kafka1:
  hostname: ${HOST_IP}
  image: ${KAFKA_IMAGE}
  depends_on:
    - zookeeper
  container_name: kafka1
  command: /bin/bash -c "/start.sh"
  volumes:
    - ${VOLUMES_FOLDER}/kf/1/logs:/logs
    - ${VOLUMES_FOLDER}/kf/1/data:/data
  environment:
    - KAFKA_ADVERTISED_HOST_NAME=${HOST_IP}
    - KAFKA_DELETE_TOPIC_ENABLE=true
    - KAFKA_BROKER_ID=10
    - port = ${KAFKA1_EXPOSED_PORT}
    - ZOOKEEPER_IP=${HOST_IP}
    - JMX_PORT=7208
  network_mode: "host"
  ports:
    - 0.0.0.0:${KAFKA1_EXPOSED_PORT}:${KAFKA1_EXPOSED_PORT}
under impression that zookeeper will be started automatically
It won't. Kafka's KRaft mode uses the Raft protocol. Zookeeper is not required at all, and therefore is not started.
Zookeeper is embedded in kafka
It is not. Zookeeper server scripts are available, but they are not used in any embedded way, or automatically.
repackage kafka 3.x into two separate docker images
The bitnami/kafka images, for example, already offer KRaft mode. It's unclear what images you're currently using.
Your error is likely because advertised.host.name and port were both removed as Kafka properties. You should be using listeners and advertised.listeners.
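As a rough illustration of that last point, here is a minimal single-node sketch using bitnami/kafka in KRaft mode, with listeners and advertised.listeners in place of the removed advertised.host.name and port. It assumes the image's KAFKA_CFG_ prefix convention for server.properties keys; the service name, node ID, and port numbers are placeholders, not values taken from your setup.

```yaml
# Sketch only: one combined broker/controller node, no Zookeeper service.
# Service name, node id and ports are illustrative assumptions.
services:
  kafka:
    image: bitnami/kafka
    ports:
      - "9092:9092"
    environment:
      # KRaft settings take the place of zookeeper.connect
      KAFKA_CFG_NODE_ID: "0"
      KAFKA_CFG_PROCESS_ROLES: controller,broker
      KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 0@kafka:9093
      KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER
      # listeners / advertised.listeners take the place of advertised.host.name and port
      KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_CFG_INTER_BROKER_LISTENER_NAME: PLAINTEXT
```

Note that every advertised listener carries both a host and a port. The PLAINTEXT://127.0.0.1: in your exception has an empty port, which is consistent with the old port setting no longer being picked up. In a multi-host setup you would replace localhost with whatever address clients actually use to reach the broker.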
I am absolutely new to Docker.
I am trying to pull a Kafka image from Docker Hub and failing to do so. I keep getting the error below (screenshot attached):
Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Here is what I did:
Downloaded the Docker for Windows MSI and installed it
Installed WSL 2 and Linux (Ubuntu)
Opened CMD in admin mode and ran the command "docker compose up", which keeps giving the error
Troubleshooting: things I tried in the hope of being able to pull/up the .yml, none of which worked
Enabled Virtualization
Changed DNS to 1.1.1.1 and then to 8.8.8.8
Restarted docker several times
Reinstalled docker
Switched debug mode on
Logged in with Docker credentials in CMD
I have a Wi-Fi internet connection with a 30 Mbps download speed.
I have no idea what to do. I have been trying to fix this for the last 6 hours.
YML file content
---
version: '2'
services:
  zookeeper:
    image: wurstmeister/cp-zookeeper:7.1.1
    hostname: zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: 127.0.0.1
      KAFKA_CREATE_TOPICS: "simpletalk_topic:1:1"
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
I am trying to create an analytics dashboard based on mobile events. I want to dockerize all the components into containers, deploy them on localhost, and create an analytical dashboard.
Sunbird telemetry https://github.com/project-sunbird/sunbird-telemetry-service
Kafka https://github.com/wurstmeister/kafka-docker
Druid https://github.com/apache/incubator-druid/tree/master/distribution/docker
Superset https://github.com/apache/incubator-superset
What I did
Druid
I executed the command docker build -t apache/incubator-druid:tag -f distribution/docker/Dockerfile .
I executed the command docker-compose -f distribution/docker/docker-compose.yml up
After everything had started, I opened http://localhost:4008/ and saw Druid running
It took 3.5 hours to complete both the build and the run
Kafka
Navigated to the kafka folder
Executed the command docker-compose up -d
Issue
When we start Druid, a Zookeeper starts running, and when we start Kafka, the docker file starts another Zookeeper, and I cannot establish a connection between Kafka and Zookeeper.
After I start Sunbird telemetry and try to create a topic and connect to Kafka from Sunbird, it does not connect.
I don't understand what I am doing wrong.
Can we tell Kafka to share the Zookeeper started by Druid? I am completely new to this environment and these stacks.
I am still learning these stacks. Am I doing something wrong? Can anybody point out how to properly connect Kafka and Druid over Docker?
Note: I am running all this on my Mac.
My kafka compose file
version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    build: .
    ports:
      - "9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: **localhost ip**
      KAFKA_ZOOKEEPER_CONNECT: **localhost ip**:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
Can we tell kafka to share the zookeeper started by DRUID
You would put all services in the same compose file.
Druid's Kafka connection is listed here:
https://github.com/apache/incubator-druid/blob/master/distribution/docker/environment#L31
You can set KAFKA_ZOOKEEPER_CONNECT to the same address, yes
For example, downloading the file above and adding Kafka to the Druid Compose file...
version: "2.2"
volumes:
  metadata_data: {}
  middle_var: {}
  historical_var: {}
  broker_var: {}
  coordinator_var: {}
  overlord_var: {}
  router_var: {}
services:
  # TODO: Add sunbird
  postgres:
    container_name: postgres
    image: postgres:latest
    volumes:
      - metadata_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=FoolishPassword
      - POSTGRES_USER=druid
      - POSTGRES_DB=druid
  # Need 3.5 or later for container nodes
  zookeeper:
    container_name: zookeeper
    image: zookeeper:3.5
    environment:
      - ZOO_MY_ID=1
  druid-coordinator:
    image: apache/incubator-druid
    container_name: druid-coordinator
    volumes:
      - coordinator_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
    ports:
      - "3001:8081"
    command:
      - coordinator
    env_file:
      - environment
  # renamed to druid-broker
  druid-broker:
    image: apache/incubator-druid
    container_name: druid-broker
    volumes:
      - broker_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
      - druid-coordinator
    ports:
      - "3002:8082"
    command:
      - broker
    env_file:
      - environment
  # TODO: add other Druid services
  kafka:
    image: wurstmeister/kafka
    ports:
      - "9092"
    depends_on:
      - zookeeper
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181/kafka # This is the same service that Druid is using
Can we tell kafka to share the zookeeper started by DRUID
Yes, as there's a zookeeper.connect setting for the Kafka broker that specifies the Zookeeper address to which Kafka will try to connect. How to set it depends entirely on the Docker image you're using. For example, one of the popular images, wurstmeister/kafka-docker, maps all environment variables starting with KAFKA_ to broker settings and adds them to server.properties, so that KAFKA_ZOOKEEPER_CONNECT becomes the zookeeper.connect setting. I suggest taking a look at the official documentation to see what else you can configure.
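To make the naming convention concrete, here is a small fragment (not a complete service definition) showing how a few variables would translate for wurstmeister/kafka. The retention and topic-creation values are arbitrary examples chosen for illustration, not recommendations.

```yaml
# Sketch: wurstmeister/kafka turns KAFKA_* variables into server.properties keys
# (prefix dropped, lower-cased, underscores become dots).
services:
  kafka:
    image: wurstmeister/kafka
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181    # -> zookeeper.connect=zookeeper:2181
      KAFKA_LOG_RETENTION_HOURS: "24"            # -> log.retention.hours=24 (example value)
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "false"   # -> auto.create.topics.enable=false (example value)
```

Other images use a different prefix or their own variable names, so check the README of whichever image you end up with.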
and when we start kafka the docker file starts another zookeeper
This is your issue. It's the docker-compose file that starts Zookeeper and Kafka, and configures Kafka to use the bundled Zookeeper. You need to modify it by removing the bundled Zookeeper and configuring Kafka to use a different one. Ideally, you should have a single docker-compose file that starts the whole setup.
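If you would rather keep two compose files for now, one possible shape for the Kafka file is sketched below: the zookeeper service is removed and Kafka joins the network created by the Druid compose project. The network name druid_default is a hypothetical placeholder (check docker network ls for the real one), and the sketch assumes the Druid file names its Zookeeper service or container zookeeper.

```yaml
# Sketch of the Kafka compose file with its own zookeeper service removed.
# "druid_default" is a hypothetical network name; substitute the real one.
version: '3.8'
services:
  kafka:
    image: wurstmeister/kafka
    ports:
      - "9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka
      # "zookeeper" should resolve to the container started by the Druid compose file
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
networks:
  default:
    external: true
    name: druid_default
```

The Druid stack has to be started first so that the external network already exists when this file is brought up.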
I have the following setup:
Dockerized environments (in docker-compose.yml):
zookeeper image: 'bitnami/zookeeper:3'
kafka image: kafka:dev
pipeline (own image)
1 Kafka broker and a 1-node Zookeeper ensemble
When I start my docker-compose, the following configuration is applied (please see the attached code).
After Zookeeper and Kafka are up and running (and listening to each other), the pipeline should start streaming the data.
For some unknown reason, streaming starts only after a 10-minute delay. This problem occurs every time and is reproduced only in Docker.
sample of the log
Does anyone know which config parameters in Kafka and/or Zookeeper influence the time taken to start streaming? Or is there any way to debug what exactly is going on with Kafka and Zookeeper during these ten minutes?
I apologize in advance if my question is too general. I'm a beginner with Kafka.
I have tried changing the different timeout parameters, but it doesn't fix the problem.
Sample of the config parameters which I have changed:
zookeeper tickTime = 2000
kafka zookeeper.connection.timeout.ms = 6000
(basically the default config)
version: '3'
services:
  zookeeper:
    image: 'bitnami/zookeeper:3'
    ports:
      - '2182:2182'
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
  kafka:
    image: kafka:dev
    ports:
      - "9092:9092"
    links:
      - zookeeper
    environment:
      - KAFKA_ADVERTISED_HOST_NAME=kafka
      - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
      - KAFKA_ADVERTISED_PORT=9092
  pipeline:
    image: project-pipeline:dev
    links:
      - "kafka"
    ports:
      - "3000:3000"
Once the Kafka and Zookeeper containers are up and running, the pipeline container should start streaming to Kafka without any delay.
I have a docker-compose file that spins up a single-node Kafka, Zookeeper & schema registry stack for testing my application. Currently, it takes a couple of minutes for the stack to become available; are there any settings to speed up the launch time?
The config I'm using (apart from SSL) is as follows:
kafka:
  image: confluentinc/cp-kafka:3.3.1
  depends_on:
    - zookeeper
  hostname: kafka
  ports:
    - 9092:9092
  environment:
    KAFKA_BROKER_ID: 1
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    KAFKA_ADVERTISED_LISTENERS: SSL://kafka:9092
    KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    KAFKA_AUTO_CREATE_TOPICS_ENABLE: "false"
    KAFKA_LOG4J_ROOT_LOGLEVEL: WARN
    KAFKA_JMX_PORT: 9585
    KAFKA_JMX_HOSTNAME: kafka
    KAFKA_JMX_OPTS: "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.rmi.port=9585"
  volumes:
    - ../../txs-data/kafka-data:/var/lib/kafka/data
Other than using a newer container version, there isn't a way to make it faster.
Zookeeper starts fairly quickly, but Kafka relies on Zookeeper and needs to coordinate extra tasks to elect a leader, load some other metadata, etc. (Update: No longer needed)
If you're adding the schema registry on top of that, it requires Kafka to start and then create its _schemas topic, which requires a round-trip to Zookeeper.
All in all, there are a lot of pre-initialization steps happening, all of which are required and cannot be skipped to reduce the start time.
Assuming you're running this as part of a JVM testing framework, the "faster" way would be to use embedded versions of each of the services.
You can now start Kafka without Zookeeper:
git clone https://github.com/confluentinc/cp-all-in-one
cd cp-all-in-one/cp-all-in-one-kraft
docker compose up broker