There are so many answers for this question that I ended up being totally confused about how I can connect to Kafka docker container from an outside client.
I have created two docker machines, a manager and a worker with these commands:
docker-machine create manager
docker-machine create worker1
I have added these two nodes to a Docker swarm.
docker@manager:~$ docker node ls
ID                            HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
6bmovp3hr0j2w5irmexvvjgzq *   manager    Ready    Active         Leader           19.03.5
mtgbd9bg8d6q0lk9ycw10bxos     worker1    Ready    Active                          19.03.5
docker-compose.yml:
version: '3.2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka:latest
    ports:
      - target: 9094
        published: 9094
        protocol: tcp
        mode: host
    environment:
      HOSTNAME_COMMAND: "hostname | awk -F'-' '{print $$2}'"
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: INSIDE://:9092,OUTSIDE://_{HOSTNAME_COMMAND}:9094
      KAFKA_LISTENERS: INSIDE://:9092,OUTSIDE://:9094
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
From inside docker, everything works fine. I can create topics and then produce/consume messages.
I created a python script in order to consume messages from outside docker. The simple code is presented below:
from kafka import KafkaConsumer
import json

try:
    print('Welcome to parse engine')
    consumer = KafkaConsumer('streams-plaintext-input', bootstrap_servers='manager:9094')
    for message in consumer:
        print(message)
except Exception as e:
    print(e)
    # Logs the error appropriately.
    pass
But the code hangs forever. The connection is not set up correctly. Can anyone help with how to set up the connection?
Since you are using docker-machine, you have to do one of the following:
Run your code in a container as well (using kafka:9092).
Run your code within the VM OS (using vm-host-name:9094).
Add PLAINTEXT://localhost:9096 to the advertised listeners, expose 9096 from the VM to your host, then use localhost:9096 in your code (note: 9096 is an arbitrary port).
The gist is that clients must be able to connect to both the bootstrap address and the advertised address that the broker returns. If the client cannot connect to the second, it will time out.
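To make the two-step connection concrete, here is a minimal sketch using only the Python standard library. It assumes you copy the advertised listener string by hand from your compose file; parse_listeners, can_connect and unreachable_listeners are hypothetical helper names, not part of kafka-python. A successful TCP connect only shows that the port is reachable from where the script runs, not that the broker is healthy.

```python
import socket


def parse_listeners(value):
    """Split 'NAME://host:port,...' into a dict of name -> (host, port)."""
    result = {}
    for entry in value.split(","):
        name, _, address = entry.strip().partition("://")
        host, _, port = address.rpartition(":")
        result[name] = (host, int(port))
    return result


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_listeners(advertised, check=can_connect):
    """Names of advertised listeners that cannot be reached from this machine."""
    return [
        name
        for name, (host, port) in parse_listeners(advertised).items()
        if not check(host, port)
    ]
```

Run it from the machine where your consumer runs, for example unreachable_listeners("INSIDE://kafka:9092,OUTSIDE://manager:9094") (an assumed value, use whatever your broker really advertises). Any listener name it returns is one your client cannot use after the bootstrap step.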
Related
I want to use Kafka with the ELK stack, where Filebeat reads logs from a file and sends data to Kafka, which then sends it to Logstash. I am running Kafka using a docker-compose file.
I managed to run it using port 9092, which is the default for Kafka, but the problem is that my organization is already running another service on port 9092. So I have used 9095 in my docker-compose for the host and 9092 inside the container.
When I try to use Filebeat and Logstash with Kafka, data does not reach Logstash. I can see that Logstash is subscribed to the topic I created, but it does not get any data, even when I send a message from console-producer. Using console-producer and console-consumer together works fine; it only fails with Logstash and Filebeat, which are running outside the container.
Here is my docker-compose file for kafka:
version: "3"
services:
  zookeeper:
    image: wurstmeister/zookeeper
    container_name: zookeeper
    ports:
      - 2181:2181
  kafka:
    image: wurstmeister/kafka
    container_name: kafka
    ports:
      - 9095:9092
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_LISTENERS: INSIDE://:9092,OUTSIDE://:9095
      KAFKA_ADVERTISED_LISTENERS: INSIDE://172.18.0.3:9092,OUTSIDE://192.168.135.57:9095
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
Here 192.168.135.57 is my host address (the public IP of my machine) and 172.18.0.3 is the address of the Docker container running Kafka.
and this is my logstash config file:
input {
  kafka {
    bootstrap_servers => "192.168.135.57:9095"
    topics => "topicname"
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "topic-test"
  }
}
This is what I used in filebeat to send data to kafka:
output.kafka:
  hosts: ["192.168.135.57:9095"]
  topic: "topicname"
I can't figure out what is wrong with this pipeline. I would like to run this docker-compose on port 9095 (or any other port except 9092) on the host machine. Help would be appreciated.
You're still mapping port 9095 on the host to container port 9092, which is the advertised INSIDE listener.
You should use 9095:9095 to map to the OUTSIDE listener if that is what other clients will connect on.
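If it helps to see why the mapping matters, here is a small sketch that works out which listener a published port lands on. It is only an illustration: listener_for_mapping is a hypothetical helper, and it assumes the short HOST:CONTAINER form of a compose port mapping (no IP prefix, no protocol suffix).

```python
def listener_for_mapping(mapping, listeners):
    """Return the name of the listener bound to the container side of a
    'HOST:CONTAINER' port mapping, or None if no listener uses that port."""
    _host_port, _, container_port = mapping.partition(":")
    for entry in listeners.split(","):
        name, _, address = entry.strip().partition("://")
        if address.rpartition(":")[2] == container_port:
            return name
    return None


# Values taken from the compose file in the question.
listeners = "INSIDE://:9092,OUTSIDE://:9095"
print(listener_for_mapping("9095:9092", listeners))  # INSIDE
print(listener_for_mapping("9095:9095", listeners))  # OUTSIDE
```

With 9095:9092, an external client ends up on the INSIDE listener and is then handed the INSIDE advertised address (172.18.0.3:9092 in the question) for all further traffic, which other machines usually cannot reach.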
I have two docker machines and I want to create a kafka cluster inside docker swarm. My docker-compose.yml looks like this:
version: '3.2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka:latest
    ports:
      - "9092:9092"
      - "29092:29092"
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_LISTENERS: PLAINTEXT://:9092,PLAINTEXT_HOST://:29092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092
I followed this question: Unable to connect to Kafka run in container from Spring Boot app run outside container and I am trying to access kafka from outside using localhost:29092.
I have already created the topic mytesttopic inside Kafka. The Python code below:
from kafka import KafkaConsumer, SimpleProducer, TopicPartition, KafkaClient

def consume_from_topic():
    try:
        consumer = KafkaConsumer('mytesttopic',
                                 group_id=None,
                                 bootstrap_servers=['localhost:29092'],
                                 auto_offset_reset='earliest')
        for message in consumer:
            #consumer.commit()
            print("%s:%d:%d: key=%s value=%s" % (message.topic, message.partition,
                                                 message.offset, message.key,
                                                 message.value))
    except Exception as e:
        print(e)
        pass

if __name__ == '__main__':
    consume_from_topic()
returns:
NoBrokersAvailable
Does anyone know what I am missing here?
Since you are running Docker swarm on two other machines, you won't be able to connect on localhost:29092, because Kafka is exposed on port 29092 on the swarm nodes, not on your local machine. Try connecting to Kafka using the hostname of one of your nodes plus port 29092. You should be able to connect this way.
Please note that this only works if you are running Docker swarm with the routing mesh. The routing mesh makes sure that each node accepts incoming requests on a published port for any service, whether or not the service is running on that node, and forwards the traffic to the host where your container is actually running.
If you have not yet set up the routing mesh, try connecting to the actual hostname where a Kafka container is running (not recommended, but it works for testing purposes).
I hope this helps you!
Your listeners are the exact same.
You need to set PLAINTEXT_HOST://0.0.0.0:29092 to bind the listener to all interfaces
On my linux server, I am running 3 images -
A) Kafka and Zookeeper with this docker-compose file -
version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper:3.4.6
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka:2.11-2.0.0
    ports:
      - "9092:9092"
    expose:
      - "9093"
    environment:
      KAFKA_ADVERTISED_LISTENERS: INSIDE://kafka:9093,OUTSIDE://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_LISTENERS: INSIDE://0.0.0.0:9093,OUTSIDE://0.0.0.0:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
This will open up the kafka broker to the host machine.
B) JupyterHub
docker run -v /notebooks:/notebooks -p 8000:8000 jupyterhub
C) Confluent Schema Registry (I have not tried it yet, but in my final setup I will have a schema registry container as well)
docker run confluentinc/cp-schema-registry
Both start up without any issues. But how do I open up the JupyterHub container to the Kafka container and schema registry ports so that my Python scripts can access the brokers?
I'm assuming you want to run your Jupyter notebook container on demand, whereas your Zookeeper and Kafka containers will always be running separately? You can create a Docker network and join all the containers to this network. Then your containers will be able to resolve each other by their names.
Create a network
Specify this network in compose file
When starting your other containers with docker run, use --network option.
If you run docker network ls then you can find the name of the network that Compose creates for you; it will be named something like directoryname_default. You can then launch the containers connected to that network,
docker run --net directoryname_default confluentinc/cp-schema-registry
If you can include these files in the same docker-compose.yml file then you won’t need to do anything special. In particular this probably makes sense for the Confluent schema registry, which you can consider a core part of the Kafka stack if you’re using Avro messages.
You can use the Docker Compose service name kafka as a host name here, but since you need to connect to the “inside” listener you’ll need to configure a non-default port 9093. (The Docker Compose expose: directive doesn’t do much and you can safely delete it.)
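On the Python side, one way to keep a notebook script working both inside and outside the Compose network is to read the broker address from the environment. This is only a sketch: KAFKA_BOOTSTRAP is a made-up variable name, and the two addresses are the INSIDE and OUTSIDE advertised listeners from the compose file in the question.

```python
import os


def bootstrap_server(environ=os.environ):
    """Pick the broker address for this process.

    KAFKA_BOOTSTRAP is a hypothetical variable you would set yourself,
    e.g. to 'kafka:9093' for containers on the Compose network.
    Without it, fall back to the OUTSIDE listener published on the host.
    """
    return environ.get("KAFKA_BOOTSTRAP", "localhost:9092")


print(bootstrap_server({}))                                # localhost:9092
print(bootstrap_server({"KAFKA_BOOTSTRAP": "kafka:9093"}))  # kafka:9093
```

You would then start the JupyterHub container with the network option shown above plus -e KAFKA_BOOTSTRAP=kafka:9093, and pass the returned value as bootstrap_servers to your consumer or producer.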
I'm new to docker. I'm trying to run a spark streaming application using docker.
I have kafka and spark streaming application running separately in 2 containers.
My kafka service is up and running fine. I tested with $KAFKA_HOME/bin/kafka-console-producer.sh and $KAFKA_HOME/bin/kafka-console-consumer.sh. I'm able to receive messages.
But when I'm running my spark streaming application, it's showing:
[Consumer clientId=consumer-1, groupId=consumer-spark] Connection to node -1 could not be established. Broker may not be available.
So, I'm not able to consume messages.
kafka : docker-compose.yml
version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    build: .
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_PORT: 9092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_LISTENERS: PLAINTEXT://:9092
    depends_on:
      - zookeeper
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
Spark Streaming code:
val sparkConf = new SparkConf().setAppName("Twitter Ingest Data")
sparkConf.setIfMissing("spark.master", "local[2]")
val ssc = new StreamingContext(sparkConf, Seconds(2))

val kafkaTopics = "sentiment"
val kafkaBroker = "kafka:9092"
val topics: Set[String] = kafkaTopics.split(",").map(_.trim).toSet
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> kafkaBroker,
  "group.id" -> "consumer-spark",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer]
)

logger.info("Connecting to broker...")
logger.info(s"kafkaParams: $kafkaParams")

val tweetStream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams))
I'm not sure if I'm missing anything.
Any help would be highly appreciated!!
If you're new to Docker, I wouldn't recommend making Kafka or Spark the first things you try it with. Besides, it seems like you just copied the wurstmeister example without reading the README about configuring it (which I can tell because you don't need the build: . property, since that image already exists on DockerHub).
Basically, Kafka is only available within your Docker network via this configuration
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
You will need to edit this for port forwarding to work properly from outside of Docker Compose's default network, or you must run your Spark code within a container as well.
If the Spark code is not in a container, then pointing it at kafka:9092 won't work at all.
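A quick way to see this for yourself is to check whether the name kafka even resolves from where the application runs. A minimal sketch with the standard library; resolves is a hypothetical helper, and the resolver argument exists only so the function can be exercised without real DNS.

```python
import socket


def resolves(hostname, resolver=socket.gethostbyname):
    """Return True if hostname can be resolved from this machine."""
    try:
        resolver(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
```

Calling resolves("kafka") on the host will typically return False unless you added the name yourself (for example in /etc/hosts), while inside a container on the Compose network it should return True, because Compose service names only resolve on that network.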
Ref. Kafka listeners explained
And lots of previous questions with similar problems (the issue is not just Spark related)
I'm trying to get Kafka to work on docker-compose for the first time. The application runs fine without docker. But on docker, I get the error as described below. Any reason why Kafka would throw this error?
The error:
email-service_1 | 2018-12-01 14:32:02.448 WARN 1 --- [ntainer#0-0-C-1] o.a.k.c.NetworkClient : [Consumer clientId=consumer-2, groupId=kafka] 1 partitions have leader brokers without a matching listener, including [email-token-0]
My docker-compose config:
version: '3.3'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    command: [start-kafka.sh]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_HOST_NAME: 192.168.23.134
      KAFKA_CREATE_TOPICS: "email-token:1:1"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    ports:
      - "9092:9092"
    depends_on:
      - zookeeper
  email-service:
    build: ./email-service
    environment:
      SPRING_KAFKA_BOOTSTRAPSERVERS: kafka:9092
    ports:
      - "8081:8081"
    depends_on:
      - kafka
As stated in the comments to your question the problem seems to be with the advertised name for the Kafka broker. According to your docker-compose you should be using 192.168.23.134 but your email-service is using kafka:9092. You can try with this docker-compose. I replaced the wurstmeister services with the latest Zookeeper and Kafka provided by confluentinc and added your email-service.
---
version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  kafka:
    image: confluentinc/cp-kafka:latest
    depends_on:
      - zookeeper
    ports:
      - 9092:9092
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  email-service:
    build: ./email-service
    environment:
      SPRING_KAFKA_BOOTSTRAPSERVERS: kafka:29092
    ports:
      - "8081:8081"
    depends_on:
      - kafka
advertised.listeners: Listeners to publish to ZooKeeper for clients to use, if different than the listeners config property. In IaaS environments, this may need to be different from the interface to which the broker binds. If this is not set, the value for listeners will be used. Unlike listeners it is not valid to advertise the 0.0.0.0 meta-address.
Please note that KAFKA_ADVERTISED_HOST_NAME has been deprecated and it's recommended to use KAFKA_ADVERTISED_LISTENERS instead. For more information about KAFKA_ADVERTISED_LISTENERS check here.
This is Apache Kafka 2.4.0.
I'm sharing my low-level, code-based findings to shed more light on when this WARN message is printed and why. That's certainly a misconfiguration of a Kafka cluster. Read on and comment if there's something missing. Thanks!
The WARN message is printed out when the DefaultMetadataUpdater (of NetworkClient) is requested to handle a completed metadata response.
[count] partitions have leader brokers without a matching listener, including [partitions]
It is a warning that corresponds to Errors.LISTENER_NOT_FOUND that has the following default exception text:
There is no listener on the leader broker that matches the listener on which metadata request was processed.
That's on the client side.
Digging deeper, you can find that Errors.LISTENER_NOT_FOUND is used on a Kafka broker when MetadataCache is requested to find partition metadata. Just before it is returned, the broker logs this DEBUG message:
Error while fetching metadata for [topicPartition]: listener [listenerName] not found on leader [leaderBrokerId]
Simply enable the DEBUG logging level for the kafka.server.MetadataCache logger and you should see it in the controller broker's logs.
In this particular case, this MetadataCache is used by a broker (via KafkaApis) to handle TopicMetadata request where they say:
// In versions 5 and below, we returned LEADER_NOT_AVAILABLE if a matching listener was not found on the leader.
// From version 6 onwards, we return LISTENER_NOT_FOUND to enable diagnosis of configuration errors.
And at that moment, it's clear that the WARN message in question is for a connection on the listenerName.
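The version rule in those two comments boils down to the following. This is a Python paraphrase for illustration only, not Kafka's actual code (which is Scala); missing_listener_error is a made-up name.

```python
def missing_listener_error(metadata_request_version):
    """Error name a broker reports when the partition leader has no listener
    matching the one the metadata request arrived on, per the quoted comments."""
    if metadata_request_version >= 6:
        return "LISTENER_NOT_FOUND"
    return "LEADER_NOT_AVAILABLE"


print(missing_listener_error(5))  # LEADER_NOT_AVAILABLE
print(missing_listener_error(6))  # LISTENER_NOT_FOUND
```

So older clients hitting the same misconfiguration would see the less specific LEADER_NOT_AVAILABLE instead of this WARN.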
In my case, when I was debugging the issue, it turned out that I used SSL://:9093 to connect to a Kafka broker while the partition leader was neither available nor configured to listen to the listeners configuration property.
I used kafka-topics to review the partition configuration and then reviewed the state of partitions in ZooKeeper.
get /brokers/topics/ssl/partitions/0/state
{"controller_epoch":1,"leader":0,"version":1,"leader_epoch":0,"isr":[0]}
I had -1 for the leader, but the isr showed a broker that was simply misconfigured. That's why people reported they fixed the issue by restarting their clusters (to get all the brokers up and running) or fixing the broker ID to the one that worked previously.
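If you want to script that check, the state znode is plain JSON, so something like the sketch below works. It assumes you have already fetched the JSON text (as with the get command above) and the list of registered broker IDs (for example from ls /brokers/ids in the ZooKeeper shell); leader_problem is a hypothetical helper.

```python
import json


def leader_problem(state_json, live_broker_ids):
    """Describe what is wrong with a partition's leader, or return None."""
    state = json.loads(state_json)
    leader = state["leader"]
    if leader == -1:
        return "no leader elected"
    if leader not in live_broker_ids:
        return f"leader {leader} is not a live broker"
    return None


state = '{"controller_epoch":1,"leader":0,"version":1,"leader_epoch":0,"isr":[0]}'
print(leader_problem(state, live_broker_ids={0}))  # None
print(leader_problem(state, live_broker_ids={1}))  # leader 0 is not a live broker
```

A non-None result points at the same situation the WARN describes: the client is told about a leader it has no usable address for.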
[context]
I am trying to run a docker compose with a kafka client using the registry of https://github.com/wurstmeister/kafka-docker
I am trying to run a very simple kafka cluster with a single broker and 3 topics with each 1 partition and a replication factor of 1.
this great link explains connectivity for a kafka cluster with one broker, a kafka cluster with several brokers and also notions about the listeners, all using docker, please have a look : https://github.com/wurstmeister/kafka-docker/wiki/Connectivity
[result]
The first time I run docker-compose up --force-recreate --build, everything runs just fine!
The topics are created automatically using KAFKA_CREATE_TOPICS and I can use kafka producer and consumer just fine.
list topics : bin/kafka-topics.sh --list --bootstrap-server localhost:9092
producer : bin/kafka-console-producer.sh --broker-list localhost:9092 --topic productadvisor_sales_dev
consumer : bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic productadvisor_sales_dev --from-beginning
After that, every time I run docker-compose stop, relaunch using docker-compose up --force-recreate --build and try to produce data, I get the following error message:
Error Message :
[2019-09-23 19:41:33,037] WARN [Producer clientId=console-producer] 1 partitions have leader brokers without a matching listener, including [productadvisor_purchase_dev-0] (org.apache.kafka.clients.NetworkClient)
[Answer]
It appears you need to specify the value of KAFKA_BROKER_ID (=1, for instance) so that ZooKeeper doesn't try to create a new broker, which can't have a listener because it is bound to the old one.
[Code]
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_HOST: localhost
      KAFKA_PORT: 9092
      KAFKA_ADVERTISED_HOST_NAME: localhost
      KAFKA_ADVERTISED_PORT: 9092
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_CREATE_TOPICS: "productadvisor_sales_dev:1:1,productadvisor_stock_dev:1:1,productadvisor_purchase_dev:1:1"
    depends_on:
      - zookeeper
    command: [start-kafka.sh]
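My understanding of why pinning the ID helps, as a toy model in Python. This is a deliberate simplification and partly an assumption on my side: it supposes that without a pinned ID the recreated container registers under a freshly generated broker ID, while the partition leaders recorded in ZooKeeper still point at the old one. The IDs and both function names are made up for the illustration.

```python
def broker_id_after_recreate(previous_id, fixed_id=None):
    """Toy model: a pinned ID is reused, otherwise a new ID is generated."""
    return fixed_id if fixed_id is not None else previous_id + 1


def partitions_without_leader(recorded_leaders, live_broker_id):
    """Partitions whose recorded leader is not the broker running now."""
    return sorted(p for p, leader in recorded_leaders.items() if leader != live_broker_id)


# Leaders as recorded before the restart (hypothetical auto-generated ID 1001).
recorded = {"productadvisor_sales_dev-0": 1001, "productadvisor_purchase_dev-0": 1001}
auto_id = broker_id_after_recreate(previous_id=1001)
print(partitions_without_leader(recorded, auto_id))
# ['productadvisor_purchase_dev-0', 'productadvisor_sales_dev-0']

# Same topics with KAFKA_BROKER_ID pinned to 1 before and after the restart.
pinned = {p: 1 for p in recorded}
pinned_id = broker_id_after_recreate(previous_id=1, fixed_id=1)
print(partitions_without_leader(pinned, pinned_id))  # []
```

In the first case every partition still names a leader that no longer exists, which matches the "leader brokers without a matching listener" warning; in the second the restarted broker picks up its old partitions.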
[Some documentation]
https://rmoff.net/2018/08/02/kafka-listeners-explained/
https://www.tutorialspoint.com/apache_kafka/apache_kafka_fundamentals.htm
http://www.michael-noll.com/blog/2013/03/13/running-a-multi-broker-apache-kafka-cluster-on-a-single-node/
https://blog.k2datascience.com/running-kafka-using-docker-332207aec73c
NB
if anyone has more information about the inner working of kafka, the zookeeper and the broker and why we need to specify it, why the information is kept even if I do a --force-recreate --build ... please do not hesitate. I am new to kafka and this is one of my first complete post on stackoverflow :)
Cheers !
In our case, one of the brokers in the cluster shut down due to a disk problem, and we got this error on the other replicas. Once we fixed the disk problem, the error went away.