Spark - Zeppelin - Docker - Cannot access MySQL with JDBC

I am trying to connect to MySQL from Spark and Zeppelin running in Docker.
Here is the relevant part of my docker-compose file:
zeppelin:
  image: apache/zeppelin:0.9.0
  container_name: zeppelin
  volumes:
    - ${PWD}/notebook:/notebook
    - ${PWD}/logs:/logs
    - ${PWD}/data:/learn
    - ${PWD}/spark/conf:/spark/conf
    - ${PWD}/spark/jars:/spark/user_jars
    - ${PWD}/spark/sql/:/spark/sql/
  environment:
    - SPARK_SUBMIT_OPTIONS=--packages=org.mariadb.jdbc:mariadb-java-client:2.7.2 --jars=/spark/user_jars/mysql-connector-java-8.0.23.jar
    - ZEPPELIN_LOG_DIR=/logs
    - ZEPPELIN_NOTEBOOK_DIR=/notebook
    - ZEPPELIN_ADDR=0.0.0.0
    - ZEPPELIN_SPARK_MAXRESULT=10000
    - ZEPPELIN_INTERPRETER_OUTPUT_LIMIT=204800
    - ZEPPELIN_NOTEBOOK_COLLABORATIVE_MODE_ENABLE=false
And this is the output of env when I exec into the container:
LOG_TAG=[ZEPPELIN_0.9.0]:
Z_VERSION=0.9.0
HOSTNAME=zeppelin
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
ZEPPELIN_LOG_DIR=/logs
PWD=/opt/zeppelin
SPARK_SUBMIT_OPTIONS=--packages=org.mariadb.jdbc:mariadb-java-client:2.7.2 --jars=/spark/user_jars/mysql-connector-java-8.0.23.jar
ZEPPELIN_NOTEBOOK_DIR=/notebook
HOME=/opt/zeppelin
LANG=en_US.UTF-8
TERM=xterm
ZEPPELIN_NOTEBOOK_COLLABORATIVE_MODE_ENABLE=false
SHLVL=1
ZEPPELIN_ADDR=0.0.0.0
ZEPPELIN_INTERPRETER_OUTPUT_LIMIT=204800
LC_ALL=en_US.UTF-8
Z_HOME=/opt/zeppelin
ZEPPELIN_SPARK_MAXRESULT=10000
PATH=/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
_=/usr/bin/env
But when I try to access MySQL with the Scala code below:
val jdbcDriver = spark.conf.get("spark.jdbc.driver.class", "org.mariadb.jdbc.Driver")
val dbHost = spark.conf.get("spark.jdbc.host","mysql")
val dbPort = spark.conf.get("spark.jdbc.port", "3306")
val defaultDb = spark.conf.get("spark.jdbc.default.db", "default")
val dbTable = spark.conf.get("spark.jdbc.table", "customers")
val dbUser = spark.conf.get("spark.jdbc.user", "root")
val dbPass = spark.conf.get("spark.jdbc.password", "dataengineering")
val connectionUrl = s"jdbc:mysql://$dbHost:$dbPort/$defaultDb"
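For context, the read call itself is not shown above; a minimal sketch of how these values would typically be used (connectionProperties and customersDf are illustrative names, not from the question):
import java.util.Properties

val connectionProperties = new Properties()
connectionProperties.put("user", dbUser)
connectionProperties.put("password", dbPass)
connectionProperties.put("driver", jdbcDriver) // Spark loads this class before opening the JDBC connection

// Read the table over JDBC into a DataFrame
val customersDf = spark.read.jdbc(connectionUrl, dbTable, connectionProperties)
customersDf.show()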
I get an error about org.mariadb.jdbc.Driver:
java.lang.ClassNotFoundException: org.mariadb.jdbc.Driver
at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
I don't know what is wrong with the setup and code above. Many thanks!

Related

Ansible List or Dict doesn't work with community.docker.docker_container module

From this:
cat /etc/hosts
172.26.42.112 test.foobar
172.26.42.112 elastic.foobar
vars:
  docker_add_end: ['elastic', 'test']
To this:
- ansible.builtin.shell: |
    IP=$(cut -d ' ' -f 1 <<< "$(grep {{ item }} /etc/hosts)")
    FQDNHOST=$(cut -d ' ' -f 2 <<< "$(grep {{ item }} /etc/hosts)")
    echo -e "${FQDNHOST} $(cut -d '.' -f 1 <<< ${FQDNHOST})%${IP}"
  register: end_find
  with_items: "{{ docker_add_end }}"
Then I split the found result and add it to a list (or a dict?):
- ansible.builtin.set_fact:
    end_add: "{{ end_add|default([]) + [ item.stdout.split('%')[0] + ':' + item.stdout.split('%')[1] ] }}"
  with_items: "{{ end_find.results }}"
Which gives the following result:
ok: [integration] => {
    "msg": [
        "elastic.foobar elastic:172.26.42.112",
        "test.foobar test:172.26.42.112"
    ]
}
But when I pass it to docker:
- community.docker.docker_container:
    etc_hosts: "{{ end_add }}"
I got:
FAILED! => {"changed": false, "msg": "argument 'etc_hosts' is of type <class 'list'> and we were unable to convert to dict: <class 'list'> cannot be converted to a dict"}
So I tried another approach:
- ansible.builtin.set_fact:
    end_add: "{{ end_add|default({}) | combine( { item.stdout.split('%')[0] : item.stdout.split('%')[1] } ) }}"
  with_items: "{{ end_find.results }}"
ok: [integration] => {
    "msg": {
        "elastic.foobar elastic": "172.26.42.112",
        "test.foobar test": "172.26.42.112"
    }
}
And it ended with the same error:
fatal: [integration]: FAILED! => {"changed": false, "msg": "argument 'etc_hosts' is of type <class 'list'> and we were unable to convert to dict: <class 'list'> cannot be converted to a dict"}
From the community.docker.docker_container documentation, etc_hosts is a dictionary: "Dict of host-to-IP mappings, where each host name is a key in the dictionary. Each host name will be added to the container’s /etc/hosts file."
Oh man, I made a mistake in a variable name when I wrote the playbook.
The question above shows the right way to do it in the second example:
cat /etc/hosts
172.26.42.112 test.foobar
172.26.42.112 elastic.foobar
vars:
  docker_add_end: ['elastic', 'test']
- ansible.builtin.shell: |
    IP=$(cut -d ' ' -f 1 <<< "$(grep {{ item }} /etc/hosts)")
    FQDNHOST=$(cut -d ' ' -f 2 <<< "$(grep {{ item }} /etc/hosts)")
    echo -e "${FQDNHOST} $(cut -d '.' -f 1 <<< ${FQDNHOST})%${IP}"
  register: end_find
  with_items: "{{ docker_add_end }}"
- ansible.builtin.set_fact:
    end_add: "{{ end_add|default({}) | combine( { item.stdout.split('%')[0] : item.stdout.split('%')[1] } ) }}"
  with_items: "{{ end_find.results }}"
- community.docker.docker_container:
    etc_hosts: "{{ end_add }}"
When I ran the tests, I had renamed the var "end_add" to "end_add_someshit" and that screwed it all up.

What does "Error: Unable to records bytes produced to topic as the node is not recognized" indicate?

I have a Kafka Streams application that works fine locally, but when I run it in Docker containers not all data is processed, and I get a lot of repeated errors in the logs about "Unable to records bytes produced to topic":
18:58:32.647 [kafka-producer-network-thread | my-app-events-processor.splitPackets-cf462b02-f1e3-4ed5-a1e7-acc1f040495b-StreamThread-1-producer] ERROR o.a.k.s.p.i.RecordCollectorImpl - stream-thread [my-app-events-processor.splitPackets-cf462b02-f1e3-4ed5-a1e7-acc1f040495b-StreamThread-1] task [0_0] Unable to records bytes produced to topic my-app.packet.surface by sink node split-server-log as the node is not recognized.
Known sink nodes are [].
18:58:49.216 [kafka-producer-network-thread | my-app-events-processor.splitPackets-cf462b02-f1e3-4ed5-a1e7-acc1f040495b-StreamThread-1-producer] ERROR o.a.k.s.p.i.RecordCollectorImpl - stream-thread [my-app-events-processor.splitPackets-cf462b02-f1e3-4ed5-a1e7-acc1f040495b-StreamThread-1] task [0_0] Unable to records bytes produced to topic my-app.packet.surface by sink node split-server-log as the node is not recognized.
Known sink nodes are [].
18:59:05.981 [kafka-producer-network-thread | my-app-events-processor.splitPackets-cf462b02-f1e3-4ed5-a1e7-acc1f040495b-StreamThread-1-producer] ERROR o.a.k.s.p.i.RecordCollectorImpl - stream-thread [my-app-events-processor.splitPackets-cf462b02-f1e3-4ed5-a1e7-acc1f040495b-StreamThread-1] task [0_0] Unable to records bytes produced to topic my-app.packet.surface by sink node split-server-log as the node is not recognized.
Known sink nodes are [].
19:00:28.484 [my-app-events-processor.splitPackets-cf462b02-f1e3-4ed5-a1e7-acc1f040495b-StreamThread-1] INFO o.a.k.s.p.internals.StreamThread - stream-thread [my-app-events-processor.splitPackets-cf462b02-f1e3-4ed5-a1e7-acc1f040495b-StreamThread-1] Processed 3 total records, ran 0 punctuators, and committed 3 total tasks since the last update
When I run the application, only some of the data is processed. Some KafkaStreams instances produce data, while others only seem to consume it. I expect each instance to consume JSON data and produce images (to be used in a Leaflet web map), but only some of the KafkaStreams instances do this.
I don't get this error when I run locally. What does it mean? How can I fix it?
Application setup
I have a single application, events-processors, written in Kotlin, that uses Kafka Streams. The application uses a Kafka Admin instance to create the topics, then launches 4 separate KafkaStreams instances using independent Kotlin coroutines. events-processors runs in a Docker container.
The Kafka instance is running in KRaft mode, in another Docker container on the same Docker network.
I am using
Kafka 3.3.1
Kotlin 1.7.20
docker-compose version 1.29.2
Docker version 20.10.19
Debian GNU/Linux 11 (bullseye)
Kernel: Linux 5.10.0-18-amd64
Architecture: x86-64
Kafka config
Here is the config of one of the KafkaStreams instances:
18:38:25.138 [DefaultDispatcher-worker-5 #my-app-events-processor.splitPackets#5] INFO o.a.k.s.p.internals.StreamThread - stream-thread [my-app-events-processor.splitPackets-d7b897b3-3a10-48d6-95c7-e291cb1839d8-StreamThread-1] Creating restore consumer client
18:38:25.142 [DefaultDispatcher-worker-5 #my-app-events-processor.splitPackets#5] INFO o.a.k.c.consumer.ConsumerConfig - ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = none
bootstrap.servers = [http://kafka:29092]
check.crcs = true
client.dns.lookup = use_all_dns_ips
client.id = my-app-events-processor.splitPackets-d7b897b3-3a10-48d6-95c7-e291cb1839d8-StreamThread-1-restore-consumer
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = null
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = false
internal.throw.on.fetch.stable.offset.unsupported = true
isolation.level = read_committed
key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 1000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor, class org.apache.kafka.clients.consumer.CooperativeStickyAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.connect.timeout.ms = null
sasl.login.read.timeout.ms = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.login.retry.backoff.max.ms = 10000
sasl.login.retry.backoff.ms = 100
sasl.mechanism = GSSAPI
sasl.oauthbearer.clock.skew.seconds = 30
sasl.oauthbearer.expected.audience = null
sasl.oauthbearer.expected.issuer = null
sasl.oauthbearer.jwks.endpoint.refresh.ms = 3600000
sasl.oauthbearer.jwks.endpoint.retry.backoff.max.ms = 10000
sasl.oauthbearer.jwks.endpoint.retry.backoff.ms = 100
sasl.oauthbearer.jwks.endpoint.url = null
sasl.oauthbearer.scope.claim.name = scope
sasl.oauthbearer.sub.claim.name = sub
sasl.oauthbearer.token.endpoint.url = null
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 45000
socket.connection.setup.timeout.max.ms = 30000
socket.connection.setup.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.3
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
The server config is the Kafka Kraft config, https://github.com/apache/kafka/blob/215d4f93bd16efc8e9b7ccaa9fc99a1433a9bcfa/config/kraft/server.properties, although I have changed the advertised listeners.
advertised.listeners=PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
Docker config
The Docker config is defined in a docker-compose file.
version: "3.9"
services:
  events-processors:
    image: events-processors
    container_name: events-processors
    restart: unless-stopped
    environment:
      KAFKA_BOOTSTRAP_SERVERS: "http://kafka:29092"
    networks:
      - my-app-infra-nw
    depends_on:
      - infra-kafka
    secrets:
      - source: my-app_config
        target: /.secret.config.yml
  infra-kafka:
    image: kafka-kraft
    container_name: infra-kafka
    restart: unless-stopped
    networks:
      my-app-infra-nw:
        aliases: [ kafka ]
    volumes:
      - "./config/kafka-server.properties:/kafka/server.properties"
    ports:
      # note: other Docker containers should use 29092
      - "9092:9092"
      - "9093:9093"

Spark Streaming integration with Flume

I followed the guide for Spark Streaming + Flume integration, but I can't get any events in the end.
(https://spark.apache.org/docs/latest/streaming-flume-integration.html)
Can anyone help me analyze it?
In Flume, I created the file "avro_flume.conf" as follows:
# Describe/configure the source
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 123.57.54.113
a1.sources.r1.port = 4141

# Describe the sink
a1.sinks = k1
a1.sinks.k1.type = avro

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 123.57.54.113
a1.sinks.k1.port = 6666

a1.sources = r1
a1.sinks = spark
a1.channels = c1
In the file, 123.57.54.113 is the IP of localhost.
I start the programs as follows:
1. Start the agent:
flume-ng agent -c . -f conf/avro_spark.conf -n a1
2. Start the Spark Streaming example:
bin/run-example org.apache.spark.examples.streaming.FlumeEventCount 123.57.54.113 6666
3. Then start the avro-client:
flume-ng avro-client -c . -H 123.57.54.113 -p 4141 -F test/log.01
test/log.01 is a file created by echo and contains some strings.
In the end, there are no events at all.
What's the problem?
Thanks!
I see "a1.sinks = spark" under the heading "Bind the source and sink to the channel", but a sink with the name "spark" is not defined anywhere else in your configuration.
Are you trying approach 1 or approach 2 from "https://spark.apache.org/docs/latest/streaming-flume-integration.html"?
Try removing the line "a1.sinks = spark" if you are trying approach 1.
For approach 2 use the following template:
agent.sinks = spark
agent.sinks.spark.type = org.apache.spark.streaming.flume.sink.SparkSink
agent.sinks.spark.hostname = <hostname of the local machine>
agent.sinks.spark.port = <port to listen on for connection from Spark>
agent.sinks.spark.channel = memoryChannel
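On the Spark side, approach 2 is consumed with the pull-based receiver from spark-streaming-flume rather than with the FlumeEventCount example (which uses approach 1). A minimal Scala sketch, assuming spark-streaming-flume is on the classpath and that the host and port match the SparkSink configured above (the values here are placeholders taken from the question):
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumePollingCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FlumePollingCount")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Pull events from the Flume SparkSink (agent.sinks.spark above)
    val flumeStream = FlumeUtils.createPollingStream(ssc, "123.57.54.113", 6666)
    flumeStream.count().map(cnt => s"Received $cnt flume events.").print()

    ssc.start()
    ssc.awaitTermination()
  }
}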

NoSuchMethod error in flume with kafka

Kafka version is 0.7.2, Flume version is 1.5.0, and the Flume + Kafka plugin is https://github.com/baniuyao/flume-kafka.
Error info:
2014-08-20 18:55:51,755 (conf-file-poller-0) [ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:149)] Unhandled error
java.lang.NoSuchMethodError: scala.math.LowPriorityOrderingImplicits.ordered()Lscala/math/Ordering;
at kafka.producer.ZKBrokerPartitionInfo$$anonfun$kafka$producer$ZKBrokerPartitionInfo$$getZKTopicPartitionInfo$1.apply(ZKBrokerPartitionInfo.scala:172)
Flume configuration:
agent_log.sources = r1
agent_log.sinks = kafka
agent_log.channels = c1
agent_log.sources.r1.type = exec
agent_log.sources.r1.channels = c1
agent_log.sources.r1.command = tail -f /var/log/test.log
agent_log.channels.c1.type = memory
agent_log.channels.c1.capacity = 1000
agent_log.channels.c1.trasactionCapacity = 100
agent_log.sinks.kafka.type = com.vipshop.flume.sink.kafka.KafkaSink
agent_log.sinks.kafka.channel = c1
agent_log.sinks.kafka.zk.connect = XXXX:2181
agent_log.sinks.kafka.topic = my-replicated-topic
agent_log.sinks.kafka.batchsize = 200
agent_log.sinks.kafka.producer.type = async
agent_log.sinks.kafka.serializer.class = kafka.serializer.StringEncoder
What could be the error? Thanks!
scala.math.LowPriorityOrderingImplicits.ordered()
Perhaps you need to import the Scala standard library and have it in your Flume lib directory.

phpMyAdmin logs out (times out) on Ubuntu Maverick even though "config.inc.php" and "php.ini (cgi)" were changed!

phpMyAdmin 3.3.7deb2build0.10.10.1
php5-cgi 5.3.3-1ubuntu9.1
/etc/phpmyadmin/config.inc.php edited:
$cfg['blowfish_secret'] = 'phpmyadmin_logs_out';
$cfg['LoginCookieValidity'] = 86400; // 24 h
ini_set('session.gc_maxlifetime', $cfg['LoginCookieValidity']);
$cfg['Servers'][$i]['auth_type'] = 'cookie';
/etc/php5/cgi/php.ini edited (cli too):
session.gc_maxlifetime = 86400
Full phpMyAdmin config: http://shorttext.com/i2gfzmyor0r
Can anyone help me fix this?
Try changing the php.ini of the virtual host:
/var/www/vhosts/system/%host_name%/etc/php.ini
You can check the value with phpinfo(): in the session section, look at the local value for session.gc_maxlifetime.
