Landoop: how to change Kafka Connect worker properties in Docker

I'm using landoop/fast-data-dev:2.6.
I want to change the default batch size by setting 'producer.override.batch.size=65536' when creating a new connector.
For that to work, an override policy has to be enabled on the worker side:
connector.client.config.override.policy=All
Otherwise the connector is rejected with this exception:
"producer.override.batch.size" : The 'None' policy does not allow
'batch.size' to be overridden in the connector configuration.
It's not clear to me:
how to change the default worker properties
where the properties file is expected to be placed
which name it should have
so that Landoop picks them up.
I start Landoop with the following docker-compose file:
version: '2'
services:
  kafka-cluster:
    image: landoop/fast-data-dev:2.6
    environment:
      ADV_HOST: 127.0.0.1
      RUNTESTS: 0
    ports:
      - 2181:2181 # Zookeeper
      - 3030:3030 # Landoop UI
      - 8081-8083:8081-8083 # REST Proxy, Schema Registry, Kafka Connect ports
      - 9581-9585:9581-9585 # JMX Ports
      - 9092:9092 # Kafka Broker
    volumes:
      - ./connectors/news/crypto-panic-connector-1.0.jar:/connectors/crypto-panic-connector-1.0.jar
The worker properties file generated by Landoop at /connect/connect-avro-distributed.properties:
offset.storage.partitions=5
key.converter.schema.registry.url=http://127.0.0.1:8081
value.converter.schema.registry.url=http://127.0.0.1:8081
config.storage.replication.factor=1
offset.storage.topic=connect-offsets
status.storage.partitions=3
offset.storage.replication.factor=1
key.converter=io.confluent.connect.avro.AvroConverter
config.storage.topic=connect-configs
config.storage.partitions=1
group.id=connect-fast-data
rest.advertised.host.name=127.0.0.1
port=8083
value.converter=io.confluent.connect.avro.AvroConverter
rest.port=8083
status.storage.replication.factor=1
status.storage.topic=connect-statuses
access.control.allow.origin=*
access.control.allow.methods=GET,POST,PUT,DELETE,OPTIONS
jmx.port=9584
plugin.path=/var/run/connect/connectors/stream-reactor,/var/run/connect/connectors/third-party,/connectors
bootstrap.servers=PLAINTEXT://127.0.0.1:9092
Directory structure of the crypto-panic-connector-1.0 connector:
/config:
> worker.properties
/src:
> ...
UPDATE
Adding these to the environment section:
CONNECT_CONNECT_CLIENT_CONFIG_OVERRIDE_POLICY: 'All'
CONNECT_PRODUCER_OVERRIDE_BATCH_SIZE: 65536
doesn't work for landoop/fast-data-dev:2.6.
The logs still show
'connector.client.config.override.policy = None'
along with this warning:
WARN The configuration 'connect.client.config.override.policy' was supplied but isn't a known config.
Changing the variables to
CONNECTOR_CONNECTOR_CLIENT_CONFIG_OVERRIDE_POLICY: 'All'
CONNECT_PRODUCER_OVERRIDE_BATCH_SIZE: 65536
removes the warning, but the override policy is still 'None', so it's not possible to override client properties when creating a connector.
Changing them to
CONNECTOR_CLIENT_CONFIG_OVERRIDE_POLICY: 'All'
CONNECT_PRODUCER_OVERRIDE_BATCH_SIZE: 65536
has the same effect: the policy stays 'None'.
The batch size override is not applied either, so I assume these override features are not supported in Landoop:
WARN The configuration 'producer.override.batch.size' was supplied but isn't a known config.
As far as I can tell, 'confluentinc/cp-kafka-connect' doesn't have a built-in UI, and for learning purposes it seems better to have one, so I'd prefer to do this in Landoop. But thanks for the recommendation to use 'confluentinc/cp-kafka-connect'; I will try the config override there as well.

For starters, that image is very old and no longer maintained. I'd recommend you use confluentinc/cp-kafka-connect
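In any case, for both images, worker properties are set through CONNECT_-prefixed environment variables (property name upper-cased, dots replaced with underscores), so the policy becomes
environment:
  CONNECT_CONNECTOR_CLIENT_CONFIG_OVERRIDE_POLICY: 'All'
'producer.override.batch.size' is not a worker property (hence the "isn't a known config" warning); once the policy is 'All', set it in the connector's own configuration.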
It's not clear, how exactly ... change the default worker properties
Look at the source code
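As a rough illustration of the naming convention, here is a minimal Python sketch that converts between worker property names and CONNECT_-prefixed environment variables. It assumes the simple rule only (upper-case, dots become underscores); the function names are hypothetical, and any extra escaping rules an image may apply for dashes or underscores are not handled. It also shows why CONNECT_CONNECT_CLIENT_CONFIG_OVERRIDE_POLICY ends up as the unknown key from the warning.

```python
def worker_env_var(prop: str, prefix: str = "CONNECT_") -> str:
    """Map a worker property name to the env var that would set it.

    Assumes the simple convention: upper-case, dots replaced by underscores.
    """
    return prefix + prop.upper().replace(".", "_")


def worker_property(env_var: str, prefix: str = "CONNECT_") -> str:
    """Map a prefixed env var back to the property name it produces."""
    if not env_var.startswith(prefix):
        raise ValueError(f"{env_var!r} does not start with {prefix!r}")
    return env_var[len(prefix):].lower().replace("_", ".")


if __name__ == "__main__":
    # The property the worker actually knows about:
    print(worker_env_var("connector.client.config.override.policy"))
    # The variable from the question maps to a key the worker does not know:
    print(worker_property("CONNECT_CONNECT_CLIENT_CONFIG_OVERRIDE_POLICY"))
```

Comparing the second result with the property name in the worker log is a quick way to spot a mistyped variable.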

Related

Monitor ksqlDB with JMX

I have followed the instructions on this page: https://docs.ksqldb.io/en/latest/operate-and-deploy/monitoring/
So this is my ksqldb-server part of docker-compose:
  ksqldb-server:
    image: confluentinc/ksqldb-server:0.15.0
    hostname: ksqldb-server
    container_name: ksqldb-server
    depends_on:
      - kafka
      - schema-registry
      - kafka-connect
    ports:
      - "8088:8088"
      - "1099:1099"
    environment:
      KSQL_LISTENERS: http://0.0.0.0:8088
      KSQL_BOOTSTRAP_SERVERS: kafka:29092
      KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: "true"
      KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: "true"
      KSQL_KSQL_SCHEMA_REGISTRY_URL: http://schema-registry:8081
      KSQL_KSQL_CONNECT_URL: http://kafka-connect:8083
      KSQL_KSQL_QUERY_PULL_METRICS_ENABLED: "true"
      KSQL_JMX_OPTS: >
        -Djava.rmi.server.hostname=localhost
        -Dcom.sun.management.jmxremote
        -Dcom.sun.management.jmxremote.port=1099
        -Dcom.sun.management.jmxremote.authenticate=false
        -Dcom.sun.management.jmxremote.ssl=false
        -Dcom.sun.management.jmxremote.rmi.port=1099
I have set up Prometheus in the same docker-compose file, and when I visit {prometheus-url}/targets, I see the error Get "http://ksqldb-server:1099/metrics": EOF
I have already tried plenty of configurations during my research, including changing -Djava.rmi.server.hostname either to the host's IP address or to the ksqldb-server container's IP address, but none of them worked. Does anyone have a solution?
Well, six months later, after dealing with this topic once again, I managed to set this up. This follows the approach suggested by swist in the GitHub issue I opened at the same time as this question.
You need JMX Exporter. Download it here
You need a YAML file, telling the JMX exporter which metrics to export. You can get it here. If you are only interested in the ksqlDB metrics, remove all other patterns, e.g. the kafka patterns.
Place the JMX Exporter and the YAML file on every node on which you want to monitor a ksqlDB instance
Before starting ksqlDB, create the environment variable KSQL_JMX_OPTS as follows:
export KSQL_JMX_OPTS="-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Djava.util.logging.config.file=logging.properties \
-javaagent:[BLUB]/jmx_prometheus_javaagent.jar=7010:ksqldb.yml"
You need to either create this variable every time you start a new session or set it permanently. [BLUB] is the absolute path to the directory containing your JMX Exporter JAR.
Now you can run ksqlDB and the metrics become available at port 7010 (you can specify any other free port). If you want to have a good dashboard, go with this one.
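The KSQL_JMX_OPTS value above can also be assembled programmatically. The following Python sketch only builds the string; it assumes the agent argument format `-javaagent:<jar>=<port>:<config>` used in the export command above, leaves out the logging flag for brevity, and uses a hypothetical function name and a hypothetical /opt/jmx directory in the usage example.

```python
from pathlib import PurePosixPath


def jmx_exporter_opts(agent_jar: str, port: int, config_file: str) -> str:
    """Build a KSQL_JMX_OPTS value that attaches the JMX Exporter agent.

    agent_jar must be an absolute path; port is where metrics are served.
    """
    if not PurePosixPath(agent_jar).is_absolute():
        raise ValueError("agent_jar must be an absolute path")
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    flags = [
        "-Dcom.sun.management.jmxremote",
        "-Dcom.sun.management.jmxremote.authenticate=false",
        "-Dcom.sun.management.jmxremote.ssl=false",
        # Agent syntax: -javaagent:<jar>=<port>:<config file>
        f"-javaagent:{agent_jar}={port}:{config_file}",
    ]
    return " ".join(flags)


if __name__ == "__main__":
    # /opt/jmx is a placeholder directory, not something from the answer.
    print(jmx_exporter_opts("/opt/jmx/jmx_prometheus_javaagent.jar", 7010, "ksqldb.yml"))
```

Prometheus would then scrape port 7010 (or whichever port is passed in), not the jmxremote port.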
The jmxremote.port value is also not a proper Prometheus target; it's for JConsole, VisualVM, or other JMX monitoring tools, as the documentation you've linked to says.
If you want to use Prometheus, you need to download the JMX Exporter agent JAR, mount it into the container, and modify the JVM arguments to include the agent, the scraper port, and the MBeans config file.
You could also switch to using minikube and apply the Confluent ksqlDB Helm Chart, which does this for you.

Graylog in Docker persistent

I'm trying to make a Graylog Docker container persistent,
meaning that after a restart (docker-compose down; docker-compose up) the logs are still there along with the configuration.
I followed the documentation at https://docs.graylog.org/en/3.1/pages/installation/docker.html and created a yml file with the content under the topic "Persisting data".
I only edited the line "GRAYLOG_HTTP_EXTERNAL_URI=http://127.0.0.1:9000/" to use the machine's external IP instead of localhost.
Docker works: I can create an input and collect log files. What does not work is data persistence. Also, every time I restart, the node ID changes, so I have to reconfigure the input. Running docker volume ls lists five volumes, three of which are the ones created in the yml file.
I don't understand why the data is not persistent. Can anybody help?
I had the same problem and I'd been struggling for a while before I found a solution. I'm on 3.2 and also had issues with node persistence. The documentation doesn't seem to directly state that there is one more configuration folder you need to persist, which is:
/usr/share/graylog/data/config
They actually mention it in the Custom configuration files section and when I took a look via CLI in that directory, it turns out that it's where the graylog.conf and node-id (the file Graylog uses to store information about its nodes) are stored as well!
Here's my docker-compose.override.yml section with the necessary changes (marked with '# ADDED' comments)
services:
  graylog:
    environment:
      # CHANGE ME (must be at least 16 characters)!
      - GRAYLOG_PASSWORD_SECRET=somepasswordpepper
      # Password: admin
      - GRAYLOG_ROOT_PASSWORD_SHA2=8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918
      - GRAYLOG_HTTP_EXTERNAL_URI=http://127.0.0.1:9000/
      - GRAYLOG_IS_MASTER=true
      #- GRAYLOG_NODE_ID_FILE=/usr/share/graylog/data/config/node-id
    ports:
      # Graylog web interface and REST API
      - 9000:9000
      # Syslog TCP
      - 1514:1514
      # Syslog UDP
      - 1514:1514/udp
      # GELF TCP
      - 12201:12201
      # GELF UDP
      - 12201:12201/udp
    volumes:
      - "graylogjournal:/usr/share/graylog/data/journal"
      - "graylogconfig:/usr/share/graylog/data/config" # ADDED
volumes:
  graylogjournal:
    driver: local
  graylogconfig: # ADDED
    driver: local # ADDED
Hope this helps
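A side note on the GRAYLOG_ROOT_PASSWORD_SHA2 value in the compose file above: it is the SHA-256 hex digest of the root password (here "admin", as the comment says). If you change the password, a small Python sketch like the following can produce the new value; the function name is hypothetical.

```python
import hashlib


def root_password_sha2(password: str) -> str:
    """Return the SHA-256 hex digest of the password, UTF-8 encoded."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Prints the digest to paste into GRAYLOG_ROOT_PASSWORD_SHA2.
    print(root_password_sha2("admin"))
```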
You can add these lines to the daemon.json file:
{
  "log-driver": "gelf",
  "log-opts": {
    "gelf-address": "udp://1.2.3.4:12201"
  }
}
https://docs.docker.com/config/containers/logging/gelf/

Traefik v2 [how to route to specific port]

I'm starting to migrate my backends to be compatible with Traefik v2.0.
The old configuration was:
labels:
  - traefik.port=8500
  - traefik.docker.network=proxy
  - traefik.frontend.rule=Host:consul.{DOMAIN}
I assume the network label is no longer necessary, so the new Traefik configuration would become:
  - traefik.http.routers.consul-server-bootstrap.rule=Host('consul.scoob.thrust.com.br')
But how do I specify that this should forward to my backend on port 8500, and not port 80 where the Traefik entrypoint was reached?
My goal is to accomplish something like this:
https://docs.traefik.io/user-guide/cluster-docker-consul/#migrate-configuration-to-consul
Is it still possible?
I saw that there is no --consul or storeconfig command in v2.0.
You need traefik.http.services.{SERVICE}.loadbalancer.server.port:
labels:
  - "traefik.http.services.{SERVICE}.loadbalancer.server.port=8500"
  - "traefik.docker.network=proxy"
  - "traefik.http.routers.{SERVICE}.rule=Host(`{DOMAIN}`)"
Replace {SERVICE} with the name of your service.
Replace {DOMAIN} with your domain name.
If you want to remove the proxy network you'll need to look at https://docs.traefik.io/v2.0/providers/docker/#usebindportip
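To make the placeholder substitution concrete, here is a minimal Python sketch that fills in {SERVICE} and {DOMAIN} and produces the label strings from the answer. The function name and the example service name are hypothetical; the label keys are the ones shown above. Note that the Host rule is wrapped in backticks.

```python
from typing import List, Optional


def traefik_labels(service: str, domain: str, port: int,
                   network: Optional[str] = None) -> List[str]:
    """Build Traefik v2 Docker labels routing `domain` to `port` of a service."""
    labels = [
        # Port the container listens on (replaces v1's traefik.port).
        f"traefik.http.services.{service}.loadbalancer.server.port={port}",
        # Router rule; Traefik v2 expects backticks around the host name.
        f"traefik.http.routers.{service}.rule=Host(`{domain}`)",
    ]
    if network is not None:
        labels.append(f"traefik.docker.network={network}")
    return labels


if __name__ == "__main__":
    for label in traefik_labels("consul", "consul.example.com", 8500, "proxy"):
        print(f'- "{label}"')
```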

Containerized Kafka client errors when producing messages to the host Kafka server

There are a number of similar types of queries on stackoverflow, but none quite match the problem that I am seeing.
I have a ZooKeeper/Kafka setup on my server which works perfectly. One can produce
bin/kafka-console-producer.sh --broker-list 192.168.2.80:9092 --topic test
and consume
bin/kafka-console-consumer.sh --bootstrap-server 192.168.2.80:9092 --topic test --from-beginning
locally on the Linux Ubuntu 16.04 server.
From a Docker container - also running Ubuntu 16.04 - I want to produce and consume. The container's Kafka code was copied from that on the server.
Firstly I can create a new topic
bin/kafka-topics.sh --create --zookeeper 192.168.2.80:2181 --replication-factor 1 --partitions 1 --topic test2
from the container and then list it again
bin/kafka-topics.sh --list --zookeeper 192.168.2.80:2181
However when I try to produce new messages, using the above (kafka-console-producer.sh) command it fails with the following message:
[2017-06-05 13:59:05,317] ERROR Error when sending message to topic test2 with key: null, value: 2 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for test2-0: 1526 ms has passed since batch creation plus linger time
immediately after entering the text of the message and pressing enter.
It may seem strange running a Docker container on the same host, but once this works I will move the container to a separate host for production.
My kafka server.properties file:
listeners=PLAINTEXT://0.0.0.0:9092
Kafka version:
2.12-0.10.2.1
Docker version:
Docker version 1.12.6, build 78d1802
The problem is (slightly simplified) caused by how Kafka's protocol works. Given a list of "bootstrap servers" (e.g. localhost:9092), a Kafka client will contact those bootstrap servers, but then use the hostnames of the actual Kafka brokers as returned by the bootstrap servers (the broker's advertised.listeners config, depending on your Kafka/Docker setup, might be set to e.g. kafka:9092). So here, the client would talk to localhost:9092 for bootstrapping (which will work), but then switch to kafka:9092 (which will not work, "thanks" to the networking setup).
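The two-step behaviour described above can be modelled with a toy Python sketch. This is not Kafka client code: it only simulates the rule that the client bootstraps against one address and then sends records to the address the broker advertises. The function name and the set of "reachable" addresses are assumptions made for illustration.

```python
from typing import Set


def resolve_produce_target(bootstrap: str, advertised: str,
                           reachable: Set[str]) -> str:
    """Return the address records are sent to, or raise ConnectionError.

    bootstrap:  address given to the client (e.g. --broker-list)
    advertised: address the broker returns in its metadata response
    reachable:  addresses the client can actually connect to
    """
    if bootstrap not in reachable:
        raise ConnectionError(f"cannot reach bootstrap server {bootstrap}")
    # After bootstrapping, the client ignores the bootstrap address and
    # uses whatever the broker advertised.
    if advertised not in reachable:
        raise ConnectionError(f"bootstrapped, but cannot reach {advertised}")
    return advertised


if __name__ == "__main__":
    reachable = {"localhost:9092"}
    print(resolve_produce_target("localhost:9092", "localhost:9092", reachable))
    try:
        resolve_produce_target("localhost:9092", "kafka:9092", reachable)
    except ConnectionError as err:
        print("produce fails:", err)
```

In the second call bootstrapping succeeds, yet producing fails, which matches the symptom of topic operations working while sends time out.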
Fortunately there is a way to configure Kafka + Docker so that it "just works", without shenanigans such as fiddling with your host's /etc/hosts file. As part of this you need to set a few (new) Kafka settings, which were added in Kafka's KIP-103: Separation of Internal and External traffic.
Here's a snippet for Docker Compose (docker-compose.yml) that demonstrates how to do this:
---
version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:3.2.1
    hostname: zookeeper
    ports:
      - '32181:32181'
    environment:
      ZOOKEEPER_CLIENT_PORT: 32181
  kafka:
    image: confluentinc/cp-kafka:3.2.1
    hostname: kafka
    ports:
      - '9092:9092'
      - '29092:29092'
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:32181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      # Following line is needed for Kafka versions 0.11+
      # in case you run less than 3 Kafka brokers in your
      # cluster because the broker config
      # `offsets.topic.replication.factor` (default: 3)
      # is now enforced upon topic creation
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
Here, the key settings are:
listener.security.protocol.map (which is being set via KAFKA_LISTENER_SECURITY_PROTOCOL_MAP)
inter.broker.listener.name
advertised.listeners
In the setup above, the containerized Kafka broker listens on localhost:9092 for access from your host machine (e.g. your Mac laptop) and on kafka:29092 for access from other containers.
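To see how the advertised.listeners value breaks down into one address per listener name, here is a small Python sketch that parses the string used in the compose file above. The function name is hypothetical and the parsing is deliberately simple (comma-separated `NAME://host:port` entries).

```python
from typing import Dict


def parse_listeners(value: str) -> Dict[str, str]:
    """Split 'NAME://host:port,NAME2://host2:port2' into {name: 'host:port'}."""
    listeners: Dict[str, str] = {}
    for entry in value.split(","):
        name, sep, address = entry.strip().partition("://")
        if not sep or not name or not address:
            raise ValueError(f"malformed listener entry: {entry!r}")
        listeners[name] = address
    return listeners


if __name__ == "__main__":
    advertised = "PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092"
    for name, address in parse_listeners(advertised).items():
        print(f"{name:15} -> {address}")
```

Clients on the host machine use the PLAINTEXT_HOST address, while other containers use the PLAINTEXT one.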
A full end-to-end example is available at:
https://github.com/confluentinc/cp-docker-images/blob/v3.2.1/examples/kafka-streams-examples/docker-compose.yml (documentation at http://docs.confluent.io/3.2.1/cp-docker-images/docs/tutorials/kafka-streams-examples.html).
Your producer (in the container) can't resolve the host name of your Linux guest OS, which is returned in response to the Kafka producer's initial metadata request to the bootstrap server. You can add it manually to the /etc/hosts file inside the container, or add the "--add-host" parameter to the docker run command that launches the image running your producer.
Aha!
After further reading and the answers given above the solution came. As is often the case it is an easy one.
A simple edit of the kafka server.properties file:
advertised.listeners=PLAINTEXT://192.168.2.80:9092
Also note, the parameter 'listeners' is not set in this file.

Docker-compose : understanding linking environment variables

I'm now using docker-compose for all of my projects. Very convenient. Much more comfortable than manual linking through several docker commands.
There is something that is not clear to me yet though: the logic behind the linking environment variables.
Eg. with this docker-compose.yml:
mongodb:
  image: mongo
  command: "--smallfiles --logpath=/dev/null"
web:
  build: .
  command: npm start
  volumes:
    - .:/myapp
  ports:
    - "3001:3000"
  links:
    - mongodb
  environment:
    PORT: 3000
    NODE_ENV: 'development'
In the node app, I need to retrieve the mongodb url. And if I console.log(process.env), I get so many things that it feels very random (just kept the docker-compose-related ones):
MONGODB_PORT_27017_TCP: 'tcp://172.17.0.2:27017',
MYAPP_MONGODB_1_PORT_27017_TCP_PORT: '27017',
MYAPP_MONGODB_1_PORT_27017_TCP_PROTO: 'tcp',
MONGODB_ENV_MONGO_VERSION: '3.2.6',
MONGODB_1_ENV_GOSU_VERSION: '1.7',
'MYAPP_MONGODB_1_ENV_affinity:container': '=d5c9ebd7766dc954c412accec5ae334bfbe836c0ad0f430929c28d4cda1bcc0e',
MYAPP_MONGODB_1_ENV_GPG_KEYS: 'DFFA3DCF326E302C4787673A01C4E7FAAAB2461C \t42F3E95A2C4F08279C4960ADD68FA50FEA312927',
MYAPP_MONGODB_1_PORT_27017_TCP: 'tcp://172.17.0.2:27017',
MONGODB_1_PORT: 'tcp://172.17.0.2:27017',
MYAPP_MONGODB_1_ENV_MONGO_VERSION: '3.2.6',
MONGODB_1_ENV_MONGO_MAJOR: '3.2',
MONGODB_ENV_GOSU_VERSION: '1.7',
MONGODB_1_PORT_27017_TCP_ADDR: '172.17.0.2',
MONGODB_1_NAME: '/myapp_web_1/mongodb_1',
MONGODB_1_PORT_27017_TCP_PORT: '27017',
MONGODB_1_PORT_27017_TCP_PROTO: 'tcp',
'MONGODB_1_ENV_affinity:container': '=d5c9ebd7766dc954c412accec5ae334bfbe836c0ad0f430929c28d4cda1bcc0e',
MONGODB_PORT: 'tcp://172.17.0.2:27017',
MONGODB_1_ENV_GPG_KEYS: 'DFFA3DCF326E302C4787673A01C4E7FAAAB2461C \t42F3E95A2C4F08279C4960ADD68FA50FEA312927',
MYAPP_MONGODB_1_ENV_GOSU_VERSION: '1.7',
MONGODB_ENV_MONGO_MAJOR: '3.2',
MONGODB_PORT_27017_TCP_ADDR: '172.17.0.2',
MONGODB_NAME: '/myapp_web_1/mongodb',
MONGODB_1_PORT_27017_TCP: 'tcp://172.17.0.2:27017',
MONGODB_PORT_27017_TCP_PORT: '27017',
MONGODB_1_ENV_MONGO_VERSION: '3.2.6',
MONGODB_PORT_27017_TCP_PROTO: 'tcp',
MYAPP_MONGODB_1_PORT: 'tcp://172.17.0.2:27017',
'MONGODB_ENV_affinity:container': '=d5c9ebd7766dc954c412accec5ae334bfbe836c0ad0f430929c28d4cda1bcc0e',
MYAPP_MONGODB_1_ENV_MONGO_MAJOR: '3.2',
MONGODB_ENV_GPG_KEYS: 'DFFA3DCF326E302C4787673A01C4E7FAAAB2461C \t42F3E95A2C4F08279C4960ADD68FA50FEA312927',
MYAPP_MONGODB_1_PORT_27017_TCP_ADDR: '172.17.0.2',
MYAPP_MONGODB_1_NAME: '/myapp_web_1/novatube_mongodb_1',
I don't know which one to pick, and why are there so many entries? Is it better to use the general ones or the MYAPP-prefixed ones? Where does the MYAPP name come from? The folder name?
Could someone clarify this?
Wouldn't it be easier to let users define the ones they need in the docker-compose.yml file with a custom mapping? Like:
links:
  - mongodb:
    - MONGOIP: IP
    - MONGOPORT: PORT
What I'm saying might not make sense, though. :-)
Environment variables are a legacy way of defining links between containers. If you are using a newer version of Compose, you don't need the links declaration at all. Connecting to mongodb from your app container works fine by just using the name of the service (mongodb) as a hostname, without any links defined in the compose file. This relies on Docker's built-in DNS resolution; check /etc/hosts and you'll find nothing in there either!
In answer to your question about why the prefix with MYAPP, you're right. Compose prefixes the service name with the name of the folder (or 'project', in compose nomenclature). It does the same thing when creating custom networks and volumes.
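As an illustration of the recommendation above, here is a minimal sketch (in Python rather than Node, purely for brevity) that builds the MongoDB URL from the service name and only falls back to the legacy link variables when they are present. The function name and the database name "myapp" are hypothetical; the variable names are the un-prefixed ones from the process.env dump in the question.

```python
import os
from typing import Mapping


def mongo_url(env: Mapping[str, str], service: str = "mongodb",
              database: str = "myapp") -> str:
    """Build a MongoDB URL, preferring legacy link variables if they exist.

    Without link variables, the Compose service name is used as hostname
    and 27017 as the port.
    """
    host = env.get("MONGODB_PORT_27017_TCP_ADDR", service)
    port = env.get("MONGODB_PORT_27017_TCP_PORT", "27017")
    return f"mongodb://{host}:{port}/{database}"


if __name__ == "__main__":
    print(mongo_url(os.environ))
```

The Node equivalent would read the same keys from process.env.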

Resources