I need to make a Docker container for a project involving streaming data using Kafka and Zookeeper. Looking around I found this docker image from Spotify, including Kafka and Zookeeper.
How should I include it in my project? Should I include in the Dockerfile the suggested commands, listed below?
docker run -p 2181:2181 -p 9092:9092 --env ADVERTISED_HOST=`docker-machine ip \`docker-machine active\`` --env ADVERTISED_PORT=9092 spotify/kafka
export KAFKA=`docker-machine ip \`docker-machine active\``:9092
kafka-console-producer.sh --broker-list $KAFKA --topic test
export ZOOKEEPER=`docker-machine ip \`docker-machine active\``:2181
kafka-console-consumer.sh --zookeeper $ZOOKEEPER --topic test
How about using a docker-compose file?
In your *.yaml you can set up the services to pull the Kafka and Zookeeper image from Spotify's Docker Hub repository, map ports (e.g. "2181:2181" and "9092:9092" for ZK and Kafka, respectively), set ENV variables, and persist data to a volume so you don't lose your topics and offsets.
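As a rough sketch of what that file could contain, here is a small Python script that prints a Compose definition as JSON (Compose accepts JSON because YAML is a superset of it, e.g. via docker-compose -f docker-compose.json up). The function name compose_config and the service name kafka are my own choices. The image name, ports and ADVERTISED_* variables come from the question. I left the volume out because the data path inside the image isn't given here, and older docker-compose releases may also need a top-level "version" key.

```python
import json


def compose_config(advertised_host: str) -> dict:
    """Build a Compose definition for the single spotify/kafka container.

    The spotify/kafka image bundles Kafka and Zookeeper, so one service
    publishes both ports. The service name "kafka" is an arbitrary choice.
    """
    return {
        "services": {
            "kafka": {
                "image": "spotify/kafka",
                "ports": ["2181:2181", "9092:9092"],
                "environment": {
                    # Address that clients will be told to connect to
                    "ADVERTISED_HOST": advertised_host,
                    "ADVERTISED_PORT": "9092",
                },
            }
        }
    }


if __name__ == "__main__":
    # 192.168.1.11 is only a placeholder for your Docker host's address
    print(json.dumps(compose_config("192.168.1.11"), indent=2))
```

Redirect the output to a file and pass that file to Compose with -f.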
Related
I am running the MongoDB image with the following command:
docker run -d -p 27017:27017 -e MONGO_INITDB_ROOT_USERNAME=test -e MONGO_INITDB_ROOT_PASSWORD=password --name=testdb mongo
This created the container, and I'm able to connect to it from Robo 3T.
Now I ran the mongo-express image with the following command, trying to connect to the above DB:
docker run -d -p 8081:8081 -e ME_CONFIG_MONGODB_ADMINUSERNAME=test -e ME_CONFIG_MONGODB_ADMINPASSWORD=password -e ME_CONFIG_MONGODB_SERVER=testdb --name=mongo-ex mongo-express
But I'm getting the following error:
UnhandledPromiseRejectionWarning: MongoNetworkError: failed to connect to server [testb:27017] on first connect [Error: getaddrinfo ENOTFOUND testb
If I create a custom bridge network and run these two containers in that network, it works.
My question is: since the default network is a bridge network, and these containers are created in the default bridge network, why are they not able to communicate? Why does it work with a custom bridge network?
There are two kinds of "bridge network"; if you don't have a docker run --net option then you get the "default" bridge network which is pretty limited. You almost always want to docker network create a "user-defined" bridge network, which has the standard Docker networking features.
# Use modern Docker networking
docker network create myapp
docker run -d --net myapp ... --name testdb mongo
docker run -d --net myapp ... -e ME_CONFIG_MONGODB_SERVER=testdb mongo-express
# Because both containers are on the same --net, the first
# container's --name is usable as a host name from the second
On the "default" bridge network that you get without --net, Docker does not resolve container names: containers there can reach each other only by IP address, and you need a special --link option to connect by name. This is considered obsolete, and the Docker documentation page describing links notes that links "may eventually be removed".
# Use obsolete Docker networking; may stop working at some point
docker run -d ... --name testdb mongo
docker run -d ... -e ME_CONFIG_MONGODB_SERVER=testdb --link testdb mongo-express
# Containers can only connect to each other by name if they use --link
On modern Docker setups you really shouldn't use --link or the equivalent Compose links: option. Prefer to use the more modern docker network create form. If you're using Compose, note that Compose creates a network named default but this is a "user-defined bridge"; in most cases you don't need any networks: options at all to get reasonable inter-container networking.
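If you want to check which situation a container is in, you can test name resolution from inside it. This is a minimal sketch, assuming Python is available in that container. The helper name can_resolve is made up for this example. It performs the same kind of lookup whose failure Node reports as "getaddrinfo ENOTFOUND".

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the system resolver can map hostname to an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # Name lookup failed (the Python counterpart of ENOTFOUND)
        return False


if __name__ == "__main__":
    # "testdb" is the --name of the mongo container from the question
    print("testdb resolves:", can_resolve("testdb"))
```

Run inside a container on the user-defined network, I'd expect this to report True for testdb. On the default bridge without --link I'd expect False.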
I'm trying to connect to the Kafka using a KafkaTool. I got an error:
Error connecting to the cluster. failed create new KafkaAdminClient
Kafka and Zookeeper are hosted in Docker. I ran the following commands:
docker network create kafka
docker run --network=kafka -d --name zookeeper -e ZOOKEEPER_CLIENT_PORT=2181 confluentinc/cp-zookeeper:latest
docker run --network=kafka -d -p 9092:9092 --name kafka -e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092 -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 confluentinc/cp-kafka:latest
Settings for KafkaTool
Why does KafkaTool not connect to the Kafka instance hosted in Docker?
I'm assuming this GUI is not coming from a Docker container. Therefore, your host machine doesn't know what zookeeper or kafka are, only the Docker network does.
In the GUI, you will want to use localhost for both. Then, in your Kafka run command, leave all the other variables alone but change the advertised listener to -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092
The Zookeeper run command is fine, but add -p 2181:2181 to publish the port to the host so that the GUI can connect.
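To confirm from the host that both published ports are reachable before blaming the GUI, a small TCP check can help. This is only a sketch: the helper name port_open is mine, and the ports are the ones from the commands above. An open port shows only that the published port is reachable. A Kafka client still reconnects to whatever address the broker advertises, so that address must be usable from the host as well.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False


if __name__ == "__main__":
    for service, port in (("zookeeper", 2181), ("kafka", 9092)):
        print(f"{service} on localhost:{port} reachable:", port_open("localhost", port))
```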
The docker daemon is running on an Ubuntu machine. I'm trying to start up a zookeeper ensemble in a swarm. The zookeeper nodes themselves can talk to each other. However, from the host machine, I don't seem to be able to access the published ports.
If I start the container with -
docker run \
-p 2181:2181 \
--env ZOO_MY_ID=1 \
--env ZOO_SERVERS="server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888" \
zookeeper
It works like a charm. On my host machine I can say echo conf | nc localhost 2181 and zookeeper says something back.
However if I do,
docker service create \
-p 2181:2181 \
--env ZOO_MY_ID=1 \
--env ZOO_SERVERS="server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888" \
zookeeper
and run the same command echo conf | nc localhost 2181,
it just gets stuck. I don't even get a new prompt on my terminal.
This works just as expected on the Docker Playground on the official Zookeeper Docker Hub page. So I expect it should for me too.
But... If I docker exec -it $container sh and then try the command in there, it works again.
Aren't published ports supposed to be accessible even by the host machine for a service?
Is there some trick I'm missing about working with overlay networks?
Try to use docker service create --publish 2181:2181 instead.
I believe the container backing the service is not directly exposed and has to go through the Swarm networking.
Otherwise, inspect your service to check which ports are published: docker service inspect <service_name>
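The inspect output is a JSON array, so the published ports can be pulled out with a few lines of Python. This sketch assumes the field names Endpoint.Ports, PublishedPort, TargetPort and PublishMode as I remember them from the inspect output, so check them against your own. The sample string is illustrative, not real output, and the function name published_ports is made up.

```python
import json


def published_ports(inspect_json: str) -> list:
    """Return (published, target, publish_mode) tuples from inspect output."""
    ports = []
    for service in json.loads(inspect_json):
        # Services without published ports may have no "Ports" entry at all
        for entry in service.get("Endpoint", {}).get("Ports", []):
            ports.append(
                (entry.get("PublishedPort"), entry.get("TargetPort"), entry.get("PublishMode"))
            )
    return ports


# Illustrative sample only; a real inspect document has many more fields
sample = (
    '[{"Endpoint": {"Ports": [{"Protocol": "tcp", "TargetPort": 2181, '
    '"PublishedPort": 2181, "PublishMode": "ingress"}]}}]'
)
print(published_ports(sample))  # prints [(2181, 2181, 'ingress')]
```

An empty list would suggest that the port was never published on the service.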
Source: documentation
Hi, I'm using spotify/kafka and am running it with
docker run --name ka -p 9092:9092 -p 2181:2181 --env ADVERTISED_HOST=localhost --env ADVERTISED_PORT=2181 --net mynet spotify/kafka
I make sure I run my second container on the same network, and I can ping the Kafka container using ka.mynet.
Also, in this second container I downloaded Kafka and its shell scripts, and I'm able to run
./kafka-topics.sh --zookeeper ka.mynet --list and see the "Test" topic.
Now any attempt to produce or consume spits out errors. Producer complains about something to do with not finding a leader.
Other Googling has led me to believe it has something to do with the advertised host.
Ok it seems like the only way to get this to work is to assign my machine's current IP address as the ADVERTISED_HOST env variable.
So if my machine's IP is 192.168.1.11 then:
docker run --name ka -p 9092:9092 -p 2181:2181 --env ADVERTISED_HOST=192.168.1.11 --env ADVERTISED_PORT=9092 --net mynet spotify/kafka
I found this docker image for Kafka
https://hub.docker.com/r/spotify/kafka/
and I can easily create a Docker container using the command documented in the link:
docker run -p 2181:2181 -p 9092:9092 --env ADVERTISED_HOST=`boot2docker ip` --env ADVERTISED_PORT=9092 spotify/kafka
This is good. But I want to configure a "multiple" node Kafka cluster running on a docker swarm.
How can I do that?
Edit 28/11/2017:
Kafka added listener.security.protocol.map to their config. This allows you to set different listener addresses and protocols depending on whether you are inside or outside the cluster, and stops Kafka getting confused by any load balancing or ip translation which occurs in docker. Wurstmeister has a working docker image and example compose file here. I tried this a while back with a few docker machine nodes set up as a swarm and it seems to work.
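To make the inside/outside idea concrete, here is a sketch that builds the relevant broker properties. The property keys are real Kafka settings. The listener names INSIDE and OUTSIDE, the external port 9094, the example host values and the function name listener_properties are assumptions for illustration, not something taken from the Wurstmeister example.

```python
def listener_properties(internal_host: str, external_host: str,
                        internal_port: int = 9092, external_port: int = 9094) -> dict:
    """Build broker properties with separate inside and outside listeners."""
    return {
        # Bind both listeners on all interfaces, on different ports
        "listeners": f"INSIDE://:{internal_port},OUTSIDE://:{external_port}",
        # Addresses handed to clients, depending on which listener they used
        "advertised.listeners": (
            f"INSIDE://{internal_host}:{internal_port},"
            f"OUTSIDE://{external_host}:{external_port}"
        ),
        # Listener names are arbitrary labels mapped to a security protocol
        "listener.security.protocol.map": "INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT",
        # Brokers talk to each other over the internal listener
        "inter.broker.listener.name": "INSIDE",
    }


if __name__ == "__main__":
    # "kafka" and 192.168.99.100 are placeholder values
    for key, value in listener_properties("kafka", "192.168.99.100").items():
        print(f"{key}={value}")
```

Clients on the overlay network would bootstrap against the internal port and be sent back to the internal name, while clients outside use the external port and get the externally reachable address.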
tbh though, I now just attach a Kafka image to the overlay network and run the Kafka console commands whenever I want to interact with it.
Hope that helps
Old Stuff Below
I have been trying this with docker 1.12 using docker swarm mode
create nodes
docker-machine create -d virtualbox master
docker-machine create -d virtualbox worker
master_config=$(docker-machine config master | tr -d '\"')
worker_config=$(docker-machine config worker | tr -d '\"')
master_ip=$(docker-machine ip master)
docker $master_config swarm init --advertise-addr $master_ip --listen-addr $master_ip:2377
worker_token=$(docker $master_config swarm join-token worker -q)
docker $worker_config swarm join --token $worker_token $master_ip:2377
eval $(docker-machine env master)
create the zookeeper service
docker service create --name zookeeper \
--constraint 'node.role == manager' \
-p 2181:2181 \
wurstmeister/zookeeper
create the kafka service
docker service create --name kafka \
--mode global \
-e 'KAFKA_PORT=9092' \
-e 'KAFKA_ADVERTISED_PORT=9092' \
-e 'KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092' \
-e 'KAFKA_ZOOKEEPER_CONNECT=tasks.zookeeper:2181' \
-e "HOSTNAME_COMMAND=ip r | awk '{ ip[\$3] = \$NF } END { print ( ip[\"eth0\"] ) }'" \
--publish '9092:9092' \
wurstmeister/kafka
Though for some reason this will only work from within the ingress or user defined overlay network and the connection will break to Kafka if you try and connect to it through one of the guest machines.
Changing the advertised IP doesn't make things any better...
docker service create --name kafka \
--mode global \
-e 'KAFKA_PORT=9092' \
-e 'KAFKA_ADVERTISED_PORT=9092' \
-e 'KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092' \
-e 'KAFKA_ZOOKEEPER_CONNECT=tasks.zookeeper:2181' \
-e 'KAFKA_LOG_DIRS=/kafka/kafka-logs' \
-e "HOSTNAME_COMMAND=curl 192.168.99.1:5000" \
--publish '9092:9092' \
wurstmeister/kafka
I think the new mesh networking and load balancing in Docker might be interfering with the Kafka connection somehow.
To get the container's address as seen from the host, I have a Flask app running locally which I curl:
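from flask import Flask, request

app = Flask(__name__)

@app.route('/')
def hello_world():
    # Return the caller's IP address as seen by the Flask host
    return request.remote_addr

if __name__ == '__main__':
    # Bind to all interfaces so containers can reach port 5000 on the host
    app.run(host='0.0.0.0', port=5000)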
The previous approach raises some questions:
How to specify the IDs for the zookeeper nodes?
How to specify the id of the kafka nodes, and the zookeeper nodes?
#kafka configs
echo "broker.id=${ID}
advertised.host.name=${NAME}
zookeeper.connect=${ZOOKEEPERS}" >> /opt/kafka/config/server.properties
Everything should be resolvable in the overlay network.
Moreover, in the issue "Cannot create a Kafka service and publish ports due to rout mesh network" there is a comment advising against using the ingress network.
I think the best option is to specify your service by using a docker compose with swarm. I'll edit the answer with an example.
There are 2 concerns to consider: networking and storage.
Since Kafka is a stateful service, until cloud native storage is figured out, it is advisable to use global deployment mode. That is, each swarm node satisfying the constraints will have one Kafka container.
Another recommendation is to use host mode for the published port.
It's also important to properly set advertised listeners option so that each kafka broker knows which host it's running on. Use swarm service templates to provide real hostname automatically.
Also make sure that published port is different from target port.
kafka:
  image: debezium/kafka:0.8
  volumes:
    - ./kafka:/kafka/data
  environment:
    - ZOOKEEPER_CONNECT=zookeeper:2181
    - KAFKA_AUTO_CREATE_TOPICS_ENABLE=true
    - KAFKA_MAX_MESSAGE_BYTES=20000000
    - KAFKA_MESSAGE_MAX_BYTES=20000000
    - KAFKA_CLEANUP_POLICY=compact
    - LISTENERS=PLAINTEXT://:9092
    - BROKER_ID=-1
    - ADVERTISED_LISTENERS=PLAINTEXT://{{.Node.Hostname}}:11092
  depends_on:
    - zookeeper
  deploy:
    mode: global
  ports:
    - target: 9092
      published: 11092
      protocol: tcp
      mode: host
  networks:
    - kafka
I can't explain all the options right now, but it's the configuration that works.
Set broker.id=-1 in server.properties to let Kafka auto-generate the broker ID. This is helpful in Swarm mode.