Setting up sunbird-telemetry, Kafka, Druid and Superset with Docker

I am trying to create an analytics dashboard based on mobile events. I want to run every component in its own Docker container, deploy them all on localhost, and build the dashboard on top of that.
Sunbird telemetry https://github.com/project-sunbird/sunbird-telemetry-service
Kafka https://github.com/wurstmeister/kafka-docker
Druid https://github.com/apache/incubator-druid/tree/master/distribution/docker
Superset https://github.com/apache/incubator-superset
What I did
Druid
I ran docker build -t apache/incubator-druid:tag -f distribution/docker/Dockerfile .
I ran docker-compose -f distribution/docker/docker-compose.yml up
Once everything had started, I opened http://localhost:4008/ and saw Druid running.
The build and run together took 3.5 hours.
Kafka
I navigated to the kafka folder.
I ran docker-compose up -d
Issue
When Druid starts, it brings up a Zookeeper. When I start Kafka, its docker-compose file starts another Zookeeper, and I cannot establish a connection between Kafka and Zookeeper.
When I then start sunbird telemetry and try to create a topic and connect to Kafka from sunbird, the connection fails.
I don't understand what I am doing wrong.
Can we tell Kafka to share the Zookeeper started by Druid? I am completely new to this environment and these stacks.
I am still learning these stacks. Am I doing something wrong? Can anybody point out how to properly connect Kafka and Druid over Docker?
Note: I am running all of this on my Mac.
My Kafka compose file:
version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    build: .
    ports:
      - "9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: **localhost ip**
      KAFKA_ZOOKEEPER_CONNECT: **localhost ip**:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

Can we tell kafka to share the zookeeper started by DRUID
You would put all services in the same compose file.
Druid's Kafka connection is listed here:
https://github.com/apache/incubator-druid/blob/master/distribution/docker/environment#L31
Yes, you can set KAFKA_ZOOKEEPER_CONNECT to the same address.
For example, after downloading the file above, you can add Kafka to the Druid Compose file:
version: "2.2"
volumes:
  metadata_data: {}
  middle_var: {}
  historical_var: {}
  broker_var: {}
  coordinator_var: {}
  overlord_var: {}
  router_var: {}
services:
  # TODO: Add sunbird
  postgres:
    container_name: postgres
    image: postgres:latest
    volumes:
      - metadata_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=FoolishPassword
      - POSTGRES_USER=druid
      - POSTGRES_DB=druid
  # Need 3.5 or later for container nodes
  zookeeper:
    container_name: zookeeper
    image: zookeeper:3.5
    environment:
      - ZOO_MY_ID=1
  druid-coordinator:
    image: apache/incubator-druid
    container_name: druid-coordinator
    volumes:
      - coordinator_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
    ports:
      - "3001:8081"
    command:
      - coordinator
    env_file:
      - environment
  # renamed to druid-broker
  druid-broker:
    image: apache/incubator-druid
    container_name: druid-broker
    volumes:
      - broker_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
      - druid-coordinator
    ports:
      - "3002:8082"
    command:
      - broker
    env_file:
      - environment
  # TODO: add other Druid services
  kafka:
    image: wurstmeister/kafka
    ports:
      - "9092"
    depends_on:
      - zookeeper
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181/kafka # This is the same service that Druid is using

Can we tell kafka to share the zookeeper started by DRUID
Yes. The Kafka broker has a zookeeper.connect setting that specifies the Zookeeper address Kafka will try to connect to. How you set it depends entirely on the Docker image you're using. For example, one of the popular images, wurstmeister/kafka-docker, maps all environment variables starting with KAFKA_ to broker settings and adds them to server.properties, so that KAFKA_ZOOKEEPER_CONNECT becomes the zookeeper.connect setting. I suggest taking a look at the official documentation to see what else you can configure.
and when we start kafka the docker file starts another zookeeper
This is your issue. It's the docker-compose file that starts Zookeeper, Kafka, and configures Kafka to use the bundled Zookeeper. You need to modify it, by removing the bundled Zookeeper and configuring Kafka to use a different one. Ideally, you should have a single docker-compose file that starts the whole setup.
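To make the naming convention described above concrete, here is a minimal Python sketch of the mapping. It is a simplified model and not the image's actual start-up script: the function name env_to_broker_properties is hypothetical, and the rule (strip the KAFKA_ prefix, lower-case, turn underscores into dots) is an assumption based on the description above. A real image may treat some variables specially.

```python
def env_to_broker_properties(env):
    """Translate KAFKA_* environment variables into broker property names.

    Simplified model: strip the KAFKA_ prefix, lower-case the rest and
    replace underscores with dots. Special cases that a real image's
    start-up script might handle are deliberately ignored here.
    """
    properties = {}
    for name, value in env.items():
        if not name.startswith("KAFKA_"):
            continue  # variables without the prefix are left alone
        key = name[len("KAFKA_"):].lower().replace("_", ".")
        properties[key] = value
    return properties


# Values taken from the compose file above; PATH is only there to show
# that unrelated variables are skipped.
example_env = {
    "KAFKA_ZOOKEEPER_CONNECT": "zookeeper:2181/kafka",
    "KAFKA_ADVERTISED_HOST_NAME": "kafka",
    "PATH": "/usr/bin",
}
props = env_to_broker_properties(example_env)
for key in sorted(props):
    print(f"{key}={props[key]}")
```

Under this model, pointing KAFKA_ZOOKEEPER_CONNECT at the Zookeeper service that Druid already uses is all it takes for the broker to share it.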

Related

Docker inter-container communication

I'm facing a relatively simple problem here, but I'm starting to wonder why it doesn't work.
I want to start two Docker containers with Docker Compose: InfluxDB and Chronograf.
Unfortunately, Chronograf cannot reach InfluxDB under the given hostname: "Unable to connect to InfluxDB Influx 1: Error contacting source"
What could be the reason for this?
Here is my docker-compose.yml:
version: "3.8"
services:
  influxdb:
    image: influxdb
    restart: unless-stopped
    ports:
      - 8086:8086
    volumes:
      - influxdb-volume:/var/lib/influxdb
    networks:
      - test
  chronograf:
    image: chronograf
    restart: unless-stopped
    ports:
      - 8888:8888
    volumes:
      - chronograf-volume:/var/lib/chronograf
    depends_on:
      - influxdb
    networks:
      - test
volumes:
  influxdb-volume:
  chronograf-volume:
networks:
  test:
    driver: bridge
I have also tried to start a shell inside the two containers and then ping the containers to each other or use wget to get the HTTP-API of the other container. Even this communication between the containers does not work. On both attempts with wget and ping I get timeouts.
It must be said that I use a Banana Pi BPI-M1 here. Could the Linux running on it somehow be the reason that container-to-container communication does not work?
If not configured, chronograf will try to access influxdb on localhost:8086. To be able to reach the correct influxdb instance, you need to specify the url accordingly using either the --influxdb-url command line flag or (personal preference) an environment variable INFLUXDB_URL. Those should be set to the value of http://influxdb:8086 which is the docker DNS name derived from the service name of your compose file (the keys one level below services).
This should do the trick (snippet):
  chronograf:
    image: chronograf
    restart: unless-stopped
    ports:
      - 8888:8888
    volumes:
      - chronograf-volume:/var/lib/chronograf
    environment:
      - INFLUXDB_URL=http://influxdb:8086
    depends_on:
      - influxdb
    networks:
      - test
Please check the chronograf readme (section Using the container with InfluxDB) for details on configuring the image and check the docker compose networking docs on some more info about networks and dns naming.
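To separate a DNS or network problem from an application configuration problem, a plain TCP check run from inside one of the containers can help. The following is a minimal sketch using only Python's standard library. The helper name can_connect is hypothetical, and it assumes a container on the same Compose network with Python available, which the chronograf image itself may not provide.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and opens the socket; both a
        # failed DNS lookup and a refused or timed-out connection raise OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

From a container attached to the test network, can_connect("influxdb", 8086) should return True once InfluxDB is listening. If it returns False there as well, the problem lies below the application level, for example in name resolution or packet filtering.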
The Docker service creates some iptables entries in the tables filter and nat. My OpenVPN Gateway script executed the following commands at startup:
iptables --flush -t filter
iptables --flush -t nat
This will delete the entries from Docker and communication between the containers and the Internet will no longer be possible.
I have rewritten the script and now everything works again.

Kafka broker not accessible in docker compose

I have created a docker compose file where my application wants to use kafka.
docker-compose.yaml is:
version: '3.7'
services:
  api:
    depends_on:
      - kafka
    restart: on-failure
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - 8080:8080
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    ports:
      - "9092:9092"
    depends_on:
      - zookeeper
    environment:
      KAFKA_ADVERTISED_HOST_NAME: 192.168.1.7
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_CREATE_TOPICS: "mytopic:1:1"
192.168.1.7 is my IP address, which I got from ifconfig.
In my service I give the broker address as 192.168.1.7:9092.
When I run docker ps and exec into my Kafka container, I am not able to access the 192.168.1.0
What am I doing wrong here? The strange thing is that my application logs show that the topic is created.
When I try to create the topic:
You don't need IP addresses other than 127.0.0.1
192.168.1.7 seems to be your host IP, not the Docker IP, and yet you are not using network_mode: host, so the network does not allow you to connect to the broker.
I recommend finding existing, functional Docker Compose files such as the ones in this answer
As posted above by @oneCricketeer, you don't have to hardcode any of your host IP addresses.
You can connect to the broker from inside your api service using the "broker" name itself, and the same name can be set as the advertised host name.

How to change target of the Spring Cloud Stream Kafka binder?

Using Spring cloud Stream 2.1.4 with Spring Boot 2.1.10, I'm trying to target a local instance of Kafka.
This is an extract of my project configuration so far:
spring.kafka.bootstrap-servers=PLAINTEXT://localhost:9092
spring.kafka.streams.bootstrap-servers=PLAINTEXT://localhost:9092
spring.cloud.stream.kafka.binder.brokers=PLAINTEXT://localhost:9092
spring.cloud.stream.kafka.binder.zkNodes=localhost:2181
spring.cloud.stream.kafka.streams.binder.brokers=PLAINTEXT://localhost:9092
spring.cloud.stream.kafka.streams.binder.zkNodes=localhost:2181
But the binder keeps calling the wrong target:
java.io.IOException: Can't resolve address: kafka.example.com:9092
How can I specify the target if those properties won't do the trick?
Also, I deploy the Kafka instance through a Bitnami Docker image and I'd prefer not to use an SSL configuration (hence the PLAINTEXT protocol), but I can't find properties for a basic credentials login. Does anyone know if this is hopeless?
This is my docker-compose.yml:
version: '3'
services:
  zookeeper:
    image: bitnami/zookeeper:latest
    container_name: zookeeper
    environment:
      - ZOO_ENABLE_AUTH=yes
      - ZOO_SERVER_USERS=kafka
      - ZOO_SERVER_PASSWORDS=kafka_password
    networks:
      - kafka-net
  kafka:
    image: bitnami/kafka:latest
    container_name: kafka
    hostname: kafka.example.com
    depends_on:
      - zookeeper
    ports:
      - 9092:9092
    environment:
      - ALLOW_PLAINTEXT_LISTENER=yes
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://:9092
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - KAFKA_ZOOKEEPER_USER=kafka
      - KAFKA_ZOOKEEPER_PASSWORD=kafka_password
    networks:
      - kafka-net
networks:
  kafka-net:
    driver: bridge
Thanks in advance
The hostname isn't the issue; rather, it is the advertised listeners protocol://:port mapping that causes the hostname to be advertised by default. You should change that, rather than the hostname.
  kafka:
    image: bitnami/kafka:latest
    container_name: kafka
    hostname: kafka.example.com # <--- Here's what you are getting in the request
    ...
    environment:
      - ALLOW_PLAINTEXT_LISTENER=yes
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://:9092 # <--- This returns the hostname to the clients
If you plan on running your code outside of another container, you should advertise localhost in addition to, or instead of the container hostname.
One year later, my comment has still not been merged into the bitnami README. I was able to get it working with the following variables (changed to match your deployment):
KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
KAFKA_CFG_LISTENERS=PLAINTEXT://:29092,PLAINTEXT_HOST://:9092
KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka.example.com:29092,PLAINTEXT_HOST://localhost:9092
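To see which address each kind of client ends up being told to use, the advertised listeners value can be broken apart. Below is a minimal Python sketch. The helper name parse_advertised_listeners is hypothetical and it only handles the simple name://host:port form shown above. An empty host is the case where, as described earlier, the broker's hostname gets advertised.

```python
def parse_advertised_listeners(value):
    """Map each listener name to the (host, port) pair advertised to clients."""
    listeners = {}
    for entry in value.split(","):
        # "PLAINTEXT_HOST://localhost:9092" -> name, then "localhost:9092"
        name, _, address = entry.strip().partition("://")
        # Split on the last colon so the port is always the final component.
        host, _, port = address.rpartition(":")
        listeners[name] = (host, int(port))
    return listeners


advertised = "PLAINTEXT://kafka.example.com:29092,PLAINTEXT_HOST://localhost:9092"
for name, (host, port) in parse_advertised_listeners(advertised).items():
    shown_host = host or "<broker hostname>"
    print(f"{name}: clients are told to connect to {shown_host}:{port}")
```

With the values above, a client on the host machine that bootstraps against localhost:9092 is sent back to localhost:9092, while containers on the Compose network use kafka.example.com:29092.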
All right: I got this to work by taking a second look at the "dockerfile" (thanks to cricket_007):
  kafka:
    ...
    hostname: localhost
For the record: I could get rid of all properties above, default being for Kafka localhost:9092

How to configure the time it takes for Kafka to start streaming?

I have the following setup:
Dockerized environments (in docker-compose.yml):
zookeeper image: 'bitnami/zookeeper:3'
kafka image: kafka:dev
pipeline (own image)
1 Kafka broker and a 1-node Zookeeper ensemble
When I start docker-compose, the following configuration is applied (please see the attached code).
After Zookeeper and Kafka are up and running (and listening to each other), the pipeline should start streaming the data.
For some unknown reason, streaming only starts after a 10-minute delay. This problem occurs every time and is reproduced only in Docker.
sample of the log
Does anyone know what config parameters in Kafka and/or zookeeper influence the time taken for start streaming? Or any way to debug what's exactly going on with Kafka and zookeeper during these ten minutes?
I apologize in advance if my question is too general. I'm a beginner with Kafka.
I have tried to change the different time out parameters but it doesn't help to fix this problem.
Sample of the configs parameters which I have changed:
zookeeper tickTime = 2000
kafka zookeeper.connection.timeout.ms = 6000
(basically the default config)
version: '3'
services:
  zookeeper:
    image: 'bitnami/zookeeper:3'
    ports:
      - '2182:2182'
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
  kafka:
    image: kafka:dev
    ports:
      - "9092:9092"
    links:
      - zookeeper
    environment:
      - KAFKA_ADVERTISED_HOST_NAME=kafka
      - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
      - KAFKA_ADVERTISED_PORT=9092
  pipeline:
    image: project-pipeline:dev
    links:
      - "kafka"
    ports:
      - "3000:3000"
After Kafka and zookeeper containers up and running pipeline container should start streaming to Kafka without any delays.

Can not test apache kafka official site commands using docker: no such file or directory

I am running Apache Kafka and Prometheus using Docker. The docker-compose file and the other configuration are attached at the bottom of this post.
Introduction: First, I should explain that every Kafka metric shows up correctly in Prometheus, so there is no problem with building or running the images.
Problem: The only problem arises when I want to test the stream (producer, broker and consumer) following the tutorial on the official Apache Kafka site. Whenever I execute the commands from the site, they fail because I don't know where exactly the files are. For example, whenever I execute the bin/zookeeper-server-start.sh config/zookeeper.properties command I get the following error:
no such file or directory: bin/zookeeper-server-start.sh
Attachments:
docker-compose.yml:
version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    build: .
    links:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka
      KAFKA_ADVERTISED_PORT: 9092
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_OPTS: -javaagent:/usr/app/jmx_prometheus_javaagent.jar=7071:/usr/app/prom-jmx-agent-config.yml
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
  prometheus:
    image: prom/prometheus
    ports:
      - 9090:9090/tcp
    volumes:
      - ./mount/prometheus:/etc/prometheus
    links:
      - kafka
Dockerfile:
FROM wurstmeister/kafka
ADD prom-jmx-agent-config.yml /usr/app/prom-jmx-agent-config.yml
ADD jmx_prometheus_javaagent-0.10.jar /usr/app/jmx_prometheus_javaagent.jar
Question: Is there a way to find where the original files are located in the created container and execute them?
The quickstart on the Apache site never references Docker. Those scripts need to be downloaded (as part of Kafka), or you need to docker exec into the container to run them.
However, Docker already starts Kafka and Zookeeper, so you don't need to run those commands. You can therefore skip to writing your own producers/consumers without using any of the provided scripts.

Resources