Docker : kafka.errors.NoBrokersAvailable: NoBrokersAvailable - docker

I am using docker-compose to create two containers:
Kafka
Pyspark
I am performing a POC with a Python producer running inside the Pyspark container and sending to the Kafka container, but I am getting the error kafka.errors.NoBrokersAvailable: NoBrokersAvailable.
version: '3'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    container_name: zookeeper
    ports:
      - "2182:2181"
  kafka:
    image: wurstmeister/kafka
    container_name: kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: localhost
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
  pyspark:
    image: jupyter/pyspark-notebook:latest
    container_name: pyspark
    ports:
      - "8888:8888"
      - "4040:4040"
      - "4141:4141"
      - "4242:4242"
    volumes:
      - ./spark:/home/jovyan/work
    environment:
      JUPYTER_ENABLE_LAB: "yes"
    depends_on:
      - kafka
# data_generator.py
import random
import string

user_ids = list(range(1, 101))
recipient_ids = list(range(1, 101))

def generate_message() -> dict:
    random_user_id = random.choice(user_ids)
    # Copy the recipients array
    recipient_ids_copy = recipient_ids.copy()
    # User can't send a message to himself
    recipient_ids_copy.remove(random_user_id)
    random_recipient_id = random.choice(recipient_ids_copy)
    # Generate a random message
    message = ''.join(random.choice(string.ascii_letters) for i in range(32))
    return {
        'user_id': random_user_id,
        'recipient_id': random_recipient_id,
        'message': message
    }

# Testing
# if __name__ == '__main__':
#     print(generate_message())
# producer.py
import time
import json
import random
from datetime import datetime
from data_generator import generate_message
from kafka import KafkaProducer

# Messages will be serialized as JSON
def serializer(message):
    return json.dumps(message).encode('utf-8')

# Kafka producer
producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    value_serializer=serializer
)

if __name__ == '__main__':
    # Infinite loop - runs until you kill the program
    while True:
        # Generate a message
        dummy_message = generate_message()
        # Send it to our 'messages' topic
        print(f'Producing message # {datetime.now()} | Message = {str(dummy_message)}')
        ack = producer.send('messages', dummy_message)
        metadata = ack.get()
        # print(metadata)
        # Sleep for a random number of seconds
        time_to_sleep = random.randint(1, 11)
        time.sleep(time_to_sleep)
I have tried multiple approaches:
bootstrap_servers = ['localhost:9092']
bootstrap_servers = ['kafka:9092']
bootstrap_servers = ['ip:9092'], using an IP retrieved from docker inspect for the Kafka container, which results in a timeout.
telnet from the Pyspark container results in "connection refused".
Adding depends_on: kafka to the compose file.
The Kafka producer works absolutely fine from the local machine to the Kafka container. Please find the code snippets above for docker-compose and producer.py; a sketch of an advertised-listener configuration, based on the answers further down this page, follows.
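Side note, drawing on the advertised-listener answers in the related questions below: a hedged sketch of how the kafka service above could expose one listener for other containers and one for the host. The INSIDE/OUTSIDE listener names and port 29092 are illustrative assumptions, not a verified fix:
  kafka:
    image: wurstmeister/kafka
    container_name: kafka
    ports:
      - "29092:29092"    # host-facing port; other containers use kafka:9092 directly
    environment:
      KAFKA_LISTENERS: INSIDE://0.0.0.0:9092,OUTSIDE://0.0.0.0:29092
      KAFKA_ADVERTISED_LISTENERS: INSIDE://kafka:9092,OUTSIDE://localhost:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
With something like this, the producer inside the pyspark container would use bootstrap_servers=['kafka:9092'], and a producer on the host would use ['localhost:29092'].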
Thanks

Related

Failed to find data source: kafka (Docker environment)

We are facing this issue at the moment and none of the displayed "Similar questions" helped to solve our problem. We are new to Docker and also to Spark.
We used the following Docker Compose file to set up our containers:
networks:
  spark_net:
volumes:
  shared-workspace:
    name: "hadoop-distributed-file-system"
    driver: local
services:
  jupyterlab:
    image: jupyterlab
    container_name: jupyterlab
    ports:
      - 8888:8888
    volumes:
      - shared-workspace:/opt/workspace
  spark-master:
    image: spark-master
    networks:
      - spark_net
    container_name: spark-master
    ports:
      - 8080:8080
      - 7077:7077
    volumes:
      - shared-workspace:/opt/workspace
  spark-worker-1:
    image: spark-worker
    networks:
      - spark_net
    container_name: spark-worker-1
    environment:
      - SPARK_WORKER_CORES=1
      - SPARK_WORKER_MEMORY=512m
    ports:
      - 8081:8081
    volumes:
      - shared-workspace:/opt/workspace
    depends_on:
      - spark-master
  spark-worker-2:
    image: spark-worker
    networks:
      - spark_net
    container_name: spark-worker-2
    environment:
      - SPARK_WORKER_CORES=1
      - SPARK_WORKER_MEMORY=512m
    ports:
      - 8082:8081
    volumes:
      - shared-workspace:/opt/workspace
    depends_on:
      - spark-master
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    ports:
      - "7575"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: 127.0.0.1
      KAFKA_ADVERTISED_LISTENERS: INSIDE://kafka:9093,OUTSIDE://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_LISTENERS: INSIDE://0.0.0.0:9093,OUTSIDE://0.0.0.0:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    volumes:
      - ./var/run/docker.sock
We also created two Python files to test if Kafka streaming works:
Producer:
import json
import time
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=['twitter-streaming_kafka_1:9093'],
                         api_version=(0, 11, 5),
                         value_serializer=lambda x: json.dumps(x).encode('utf-8'))

for e in range(1000):
    data = {'number': e}
    producer.send('corona', value=data)
    time.sleep(0.5)
Consumer:
import time
from kafka import KafkaConsumer, KafkaProducer
from datetime import datetime
import json

print('starting consumer')
consumer = KafkaConsumer(
    'corona',
    bootstrap_servers=['twitter-streaming_kafka_1:9093'],
    auto_offset_reset='earliest',
    enable_auto_commit=True,
    group_id='my-group',
    value_deserializer=lambda x: json.loads(x.decode('utf-8')))
print('printing messages')
for message in consumer:
    message = message.value
    print(message)
When we executed both scripts in separate CLIs inside our jupyterlab container, it worked. But when we try to connect to our producer stream via pyspark with the following code, we get the mentioned error.
import random
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('KafkaStreaming').getOrCreate()
df = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "kafka:9093").option("subscribe", "corona").load()
We also executed the following command in the spark-master CLI:
./bin/spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1 ...
stacktrace
---------------------------------------------------------------------------
AnalysisException Traceback (most recent call last)
<ipython-input-2-4dba09a73304> in <module>
6
7 spark = SparkSession.builder.appName('KafkaStreaming').getOrCreate()
----> 8 df = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "twitter-streaming_kafka_1:9093").option("subscribe", "corona").load()
/usr/local/lib/python3.7/dist-packages/pyspark/sql/streaming.py in load(self, path, format, schema, **options)
418 return self._df(self._jreader.load(path))
419 else:
--> 420 return self._df(self._jreader.load())
421
422 #since(2.0)
/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py in __call__(self, *args)
1303 answer = self.gateway_client.send_command(command)
1304 return_value = get_return_value(
-> 1305 answer, self.gateway_client, self.target_id, self.name)
1306
1307 for temp_arg in temp_args:
/usr/local/lib/python3.7/dist-packages/pyspark/sql/utils.py in deco(*a, **kw)
132 # Hide where the exception came from that shows a non-Pythonic
133 # JVM exception message.
--> 134 raise_from(converted)
135 else:
136 raise
/usr/local/lib/python3.7/dist-packages/pyspark/sql/utils.py in raise_from(e)
AnalysisException: Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".;
Your Kafka container needs to be placed on the spark_net network in order for the Spark containers to resolve it by name.
Same with Jupyter, if you want it to be able to launch jobs on the Spark cluster.
Also, you need to add the Kafka package to Spark; a sketch of both changes is shown below.
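A minimal sketch of the networking change, reusing the spark_net network already defined in the compose file above (an illustration only, not a verified configuration):
  kafka:
    image: wurstmeister/kafka
    networks:
      - spark_net        # same network as spark-master and the workers
    ...
  jupyterlab:
    image: jupyterlab
    networks:
      - spark_net        # so the notebook can reach spark-master and kafka by name
    ...
The Kafka package is the spark-sql-kafka artifact passed at launch, e.g. the --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1 option shown above; the Scala and Spark versions in the artifact name must match your cluster.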

Kafka-Node not detecting kafka broker

I'm currently using Kafka-node in my application but I can't manage to connect it to the Kafka broker I have brought up beforehand.
Firstly, I bring up the Kafka broker with the wurstmeister/kafka Docker image. I also bring Zookeeper up with the jplock/zookeeper image.
I then automatically create a topic with an environment variable of the wurstmeister/kafka image, like so:
zookeeper:
  image: jplock/zookeeper
  ports:
    - "2181:2181"
  networks:
    - bitmex_backend
kafka:
  image: wurstmeister/kafka:latest
  ports:
    - "9092:9092"
  depends_on:
    - zookeeper
  environment:
    KAFKA_ADVERTISED_HOST_NAME: localhost
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    KAFKA_LISTENERS: "PLAINTEXT://:9092"
    KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
    KAFKA_CREATE_TOPICS: "Topic1:1:1"
  networks:
    - bitmex_backend
I verify the container is up by listing all the topics from Kafka, which returns the correct number and names of topics.
I then want to bring up a producer and verify it's up when I call an endpoint, so I do:
// Import the web framework for routing
const Koa = require('koa')
const route = require('koa-route')
var kafka = require('kafka-node');

const client = new kafka.Client();
const producer = new kafka.Producer(client);

// TODO : Generate a position/history endpoint
module.exports = async () => {
    const app = new Koa()

    // Retrieve all the open positions from all the bots in the system
    app.use(route.get('/open', async (ctx) => {
        producer.on('ready', function () {
            console.log('Producer is ready');
        });
        producer.on('error', function (err) {
            console.log('Producer is in error state');
            console.log(err);
        })

        // Response
        ctx.status = 200
        ctx.body = {
            data: "success",
        }
    }))

    return app
}
This code runs without errors, but there is no output either. When I check the logs, the endpoint is called correctly, but there is no console.log output to prove that the producer has loaded.
Any ideas or pointers are very welcome, as I've been stuck with this for a while now.

kafka producer docker unable to connect to broker docker - AWS

Here is the yml file which is used to bring up docker containers in an AWS instance for kafka and zookeeper:
version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    build: .
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_HOSTNAME: <machines private ip>
      KAFKA_LISTENERS: PLAINTEXT://:9092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://<machines private ip>:9092
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
When I run the docker-compose command with the above file, it leads to the creation of a Docker network called kafka-docker with a Kafka container and a Zookeeper container.
Now, in the default bridge Docker network, I have another container with the following piece of Node.js code:
const Producer = kafka.Producer;
const client = new kafka.Client("<machines private ip>:2181");
const producer = new Producer(client);
const kafka_topic = 'hello-topic';

event = ...
event_payload = ...

let payloads = [{topic: kafka_topic, messages: JSON.stringify(event_payload), partition: 0}]
let push_status = producer.send(payloads, (err, data) => {
    if (err) {
        console.log(err);
    } else {
        console.log('[kafka-producer -> ' + kafka_topic + ']: broker update success');
    }
});
The console.log(err) gives me the error 'Broker not available'. Can someone please tell me what is wrong with my setup?
Notice the line:
const client = new kafka.Client("<machines private ip>:2181");
This is not the port that Kafka is listening on. Kafka is listening for connections on port 9092:
const client = new kafka.Client("<machines private ip>:9092");
It should work after this alteration.

How do I send messages to Docker (bitnami) Apache Kafka from host machine?

I can get my Apache Kafka producer to send messages when it is running inside a container. However, when my producer is running outside the container on the host machine, it doesn't work. I suspect it is a Docker networking issue in my Docker Compose file, but I can't figure it out.
I tried the solutions posted online for problems similar to mine, but they don't work for me. Help!
Docker-compose file
version: '3'
services:
  zookeeper:
    image: 'bitnami/zookeeper:latest'
    ports:
      - '2181:2181'
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
  kafka:
    image: 'bitnami/kafka:latest'
    ports:
      - '9092:9092'
    environment:
      - KAFKA_BROKER_ID=1
      - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
      - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092
      - ALLOW_PLAINTEXT_LISTENER=yes
Host producer
//import util.properties packages
import java.util.Properties;
//import simple producer packages
import org.apache.kafka.clients.producer.Producer;
//import KafkaProducer packages
import org.apache.kafka.clients.producer.KafkaProducer;
//import ProducerRecord packages
import org.apache.kafka.clients.producer.ProducerRecord;

//Create java class named “SimpleProducer”
public class SimpleProducer {

    public static void main(String[] args) throws Exception {

        // Check arguments length value
        if (args.length == 0) {
            System.out.println("Enter topic name");
            return;
        }

        //Assign topicName to string variable
        String topicName = args[0].toString();

        // create instance for properties to access producer configs
        Properties props = new Properties();

        //Assign localhost id
        props.put("bootstrap.servers", "localhost:9092");

        //Set acknowledgements for producer requests.
        props.put("acks", "all");

        //If the request fails, the producer can automatically retry
        props.put("retries", 0);

        //Specify buffer size in config
        props.put("batch.size", 16384);

        //Reduce the no of requests less than 0
        props.put("linger.ms", 1);

        //The buffer.memory controls the total amount of memory available to the producer for buffering.
        props.put("buffer.memory", 33554432);

        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<String, String>(props);

        for (int i = 0; i < 10; i++)
            producer.send(new ProducerRecord<String, String>(topicName, Integer.toString(i), Integer.toString(i)));

        System.out.println("Message sent successfully");
        producer.close();
    }
}
The host producer should post messages to the Dockerized Apache Kafka, but it doesn't. It creates the topic, but the messages are never received. What am I doing wrong? This is a bitnami image, not a Confluent image.
From my previous answer here:
What I needed to do was to declare the LISTENERS as both binding to the docker host, and then advertise them differently - one to the docker network, one to the host.
services:
  zookeeper:
    image: confluentinc/cp-zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_SYNC_LIMIT: 2
  kafka:
    image: confluentinc/cp-kafka
    ports:
      - 9094:9094
    depends_on:
      - zookeeper
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENERS: INTERNAL://kafka:9092,OUTSIDE://kafka:9094
      KAFKA_ADVERTISED_LISTENERS: INTERNAL://kafka:9092,OUTSIDE://localhost:9094
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
Now you have the Kafka available on your localhost at :9094 (as per the OUTSIDE listener and the ports entry in the docker-compose file), and inside the Docker network at :9092.
This solution is for the bitnami Docker image of Apache Kafka. Thanks to #cricket_007 and #daniu for the solution. I updated several lines in my Docker-compose file in the Kafka environment section.
Here's the complete, updated Docker-compose file:
version: '3'
services:
  zookeeper:
    image: 'bitnami/zookeeper:latest'
    ports:
      - '2181:2181'
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
  kafka:
    image: 'bitnami/kafka:latest'
    ports:
      - '9092:9092'
      - '29092:29092'
    environment:
      - KAFKA_BROKER_ID=1
      - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
      - KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      - KAFKA_LISTENERS=PLAINTEXT://:9092,PLAINTEXT_HOST://:29092
      - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092
      - ALLOW_PLAINTEXT_LISTENER=yes
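With this configuration, clients inside the Compose network keep using kafka:9092, while the host producer above would presumably need to point at the PLAINTEXT_HOST listener instead, i.e. props.put("bootstrap.servers", "localhost:29092") rather than localhost:9092.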

Kafka Client Timeout of 60000ms expired before the position for partition could be determined

I'm trying to connect Flink to a Kafka consumer
I'm using Docker Compose to build 4 containers: zookeeper, kafka, Flink JobManager and Flink TaskManager.
For zookeeper and Kafka I'm using wurstmeister images, and for Flink I'm using the official image.
docker-compose.yml
version: '3.1'
services:
  zookeeper:
    image: wurstmeister/zookeeper:3.4.6
    hostname: zookeeper
    expose:
      - "2181"
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka:2.11-2.0.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    hostname: kafka
    links:
      - zookeeper
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_PORT: 9092
      KAFKA_CREATE_TOPICS: 'pipeline:1:1:compact'
  jobmanager:
    build: ./flink_pipeline
    depends_on:
      - kafka
    links:
      - zookeeper
      - kafka
    expose:
      - "6123"
    ports:
      - "8081:8081"
    command: jobmanager
    environment:
      JOB_MANAGER_RPC_ADDRESS: jobmanager
      BOOTSTRAP_SERVER: kafka:9092
      ZOOKEEPER: zookeeper:2181
  taskmanager:
    image: flink
    expose:
      - "6121"
      - "6122"
    links:
      - jobmanager
      - zookeeper
      - kafka
    depends_on:
      - jobmanager
    command: taskmanager
    # links:
    #   - "jobmanager:jobmanager"
    environment:
      JOB_MANAGER_RPC_ADDRESS: jobmanager
And when I submit a simple job to the Dispatcher, the job fails with the following error:
org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before the position for partition pipeline-0 could be determined
My Job code is:
// imports (assuming a recent Flink release with the universal Kafka connector)
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class Main {
    public static void main(String[] args) throws Exception {
        // get the execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // get input data by connecting to the socket
        Properties properties = new Properties();
        String bootstrapServer = System.getenv("BOOTSTRAP_SERVER");
        String zookeeperServer = System.getenv("ZOOKEEPER");

        if (bootstrapServer == null) {
            System.exit(1);
        }

        properties.setProperty("zookeeper", zookeeperServer);
        properties.setProperty("bootstrap.servers", bootstrapServer);
        properties.setProperty("group.id", "pipeline-analysis");

        FlinkKafkaConsumer<String> kafkaConsumer = new FlinkKafkaConsumer<String>("pipeline", new SimpleStringSchema(), properties);
        // kafkaConsumer.setStartFromGroupOffsets();
        kafkaConsumer.setStartFromLatest();

        DataStream<String> stream = env.addSource(kafkaConsumer);

        // Defining Pipeline here

        // Printing Outputs
        stream.print();

        env.execute("Stream Pipeline");
    }
}
I know I'm late to the party, but I had the exact same error. In my case, I was not setting up the TopicPartitions correctly. My topic had 2 partitions and my producer was producing messages just fine, but it was the Spark streaming application, as my consumer, that wasn't really starting; it gave up after 60 seconds complaining of the same error.
Wrong code that I had:
List<TopicPartition> topicPartitionList = Arrays.asList(new TopicPartition(topicName, Integer.parseInt(numPartition)));
Correct code:
List<TopicPartition> topicPartitionList = new ArrayList<TopicPartition>();
for (int i = 0; i < Integer.parseInt(numPartitions); i++) {
    topicPartitionList.add(new TopicPartition(topicName, i));
}
I had an error that looks the same.
17:34:37.668 [org.springframework.kafka.KafkaListenerEndpointContainer#1-0-C-1] ERROR o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-3, groupId=api.dev] User provided listener org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer$ListenerConsumerRebalanceListener failed on partition assignment
org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before the position for partition aaa-1 could be determined
Turns out my hosts file had been changed, so the broker address was wrong.
Try this logger setting to debug in more detail:
<logger name="org.apache.kafka.clients.consumer.internals.Fetcher" level="info" />
I was having issues with this error in a vSphere Integrated Containers environment. For me, the problem was that I had advertised the hostname and not the IP. I had to set the hostname and container name in my compose file.
Here are my settings that finally worked:
kafka:
  depends_on:
    - zookeeper
  image: wurstmeister/kafka
  ports:
    - "9092:9092"
  mem_limit: 10g
  container_name: kafka
  hostname: kafka
  environment:
    KAFKA_ADVERTISED_LISTENERS: OUTSIDE://kafka:9092
    KAFKA_LISTENERS: OUTSIDE://0.0.0.0:9092
    KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: OUTSIDE:PLAINTEXT
    KAFKA_INTER_BROKER_LISTENER_NAME: OUTSIDE
    KAFKA_BROKER_ID: 1
    KAFKA_ZOOKEEPER_CONNECT: <REPLACE_WITH_IP>:2181
I had the same problem; the issue was a wrong host entry for the Kafka node in my /etc/hosts file!
