I am trying to connect to MySQL with Spark and Zeppelin using Docker.
Here is the relevant part of my docker-compose file setup:
zeppelin:
  image: apache/zeppelin:0.9.0
  container_name: zeppelin
  volumes:
    - ${PWD}/notebook:/notebook
    - ${PWD}/logs:/logs
    - ${PWD}/data:/learn
    - ${PWD}/spark/conf:/spark/conf
    - ${PWD}/spark/jars:/spark/user_jars
    - ${PWD}/spark/sql/:/spark/sql/
  environment:
    - SPARK_SUBMIT_OPTIONS=--packages=org.mariadb.jdbc:mariadb-java-client:2.7.2 --jars=/spark/user_jars/mysql-connector-java-8.0.23.jar
    - ZEPPELIN_LOG_DIR=/logs
    - ZEPPELIN_NOTEBOOK_DIR=/notebook
    - ZEPPELIN_ADDR=0.0.0.0
    - ZEPPELIN_SPARK_MAXRESULT=10000
    - ZEPPELIN_INTERPRETER_OUTPUT_LIMIT=204800
    - ZEPPELIN_NOTEBOOK_COLLABORATIVE_MODE_ENABLE=false
And this is the output of env when I exec into the container:
LOG_TAG=[ZEPPELIN_0.9.0]:
Z_VERSION=0.9.0
HOSTNAME=zeppelin
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
ZEPPELIN_LOG_DIR=/logs
PWD=/opt/zeppelin
SPARK_SUBMIT_OPTIONS=--packages=org.mariadb.jdbc:mariadb-java-client:2.7.2 --jars=/spark/user_jars/mysql-connector-java-8.0.23.jar
ZEPPELIN_NOTEBOOK_DIR=/notebook
HOME=/opt/zeppelin
LANG=en_US.UTF-8
TERM=xterm
ZEPPELIN_NOTEBOOK_COLLABORATIVE_MODE_ENABLE=false
SHLVL=1
ZEPPELIN_ADDR=0.0.0.0
ZEPPELIN_INTERPRETER_OUTPUT_LIMIT=204800
LC_ALL=en_US.UTF-8
Z_HOME=/opt/zeppelin
ZEPPELIN_SPARK_MAXRESULT=10000
PATH=/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
_=/usr/bin/env
But when I try to access MySQL with the Scala code below:
val jdbcDriver = spark.conf.get("spark.jdbc.driver.class", "org.mariadb.jdbc.Driver")
val dbHost = spark.conf.get("spark.jdbc.host","mysql")
val dbPort = spark.conf.get("spark.jdbc.port", "3306")
val defaultDb = spark.conf.get("spark.jdbc.default.db", "default")
val dbTable = spark.conf.get("spark.jdbc.table", "customers")
val dbUser = spark.conf.get("spark.jdbc.user", "root")
val dbPass = spark.conf.get("spark.jdbc.password", "dataengineering")
val connectionUrl = s"jdbc:mysql://$dbHost:$dbPort/$defaultDb"
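The actual read is not shown above; presumably it looks something like the following sketch (the call itself is my assumption, using the standard Spark JDBC options, not the original code), and that is the point where the driver class gets loaded:

// Hypothetical read; the ClassNotFoundException below would surface here,
// when Spark tries to load the configured JDBC driver class.
val customers = spark.read
  .format("jdbc")
  .option("url", connectionUrl)
  .option("driver", jdbcDriver)
  .option("dbtable", dbTable)
  .option("user", dbUser)
  .option("password", dbPass)
  .load()
customers.show()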
I get an error about org.mariadb.jdbc.Driver:
java.lang.ClassNotFoundException: org.mariadb.jdbc.Driver
at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
I don't know what is wrong with this setup and the code above. Many thanks!
I am trying to build the etc_hosts parameter of community.docker.docker_container from entries in /etc/hosts. From this:
cat /etc/hosts
172.26.42.112 test.foobar
172.26.42.112 elastic.foobar
vars:
  docker_add_end: ['elastic', 'test']
To this:
- ansible.builtin.shell: |
    IP=$(cut -d ' ' -f 1 <<< "$(grep {{ item }} /etc/hosts)")
    FQDNHOST=$(cut -d ' ' -f 2 <<< "$(grep {{ item }} /etc/hosts)")
    echo -e "${FQDNHOST} $(cut -d '.' -f 1 <<< ${FQDNHOST})%${IP}"
  register: end_find
  with_items: "{{ docker_add_end }}"
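For a single item, e.g. test, that task prints a line like this (derived from the /etc/hosts content above, with % used as the separator for the later split):
test.foobar test%172.26.42.112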
Then I split the result and add it to a list (or should it be a dict?):
- ansible.builtin.set_fact:
    end_add: "{{ end_add|default([]) + [ item.stdout.split('%')[0] + ':' + item.stdout.split('%')[1] ] }}"
  with_items: "{{ end_find.results }}"
With the following result:
ok: [integration] => {
"msg": [
"elastic.foobar elastic:172.26.42.112",
"test.foobar test:172.26.42.112"
]
}
But when I pass it to Docker:
- community.docker.docker_container:
    etc_hosts: "{{ end_add }}"
I got:
FAILED! => {"changed": false, "msg": "argument 'etc_hosts' is of type <class 'list'> and we were unable to convert to dict: <class 'list'> cannot be converted to a dict"}
So I tried another approach:
- ansible.builtin.set_fact:
    end_add: "{{ end_add|default({}) | combine( { item.stdout.split('%')[0] : item.stdout.split('%')[1] } ) }}"
  with_items: "{{ end_find.results }}"
ok: [integration] => {
"msg": {
"elastic.foobar elastic": "172.26.42.112",
"test.foobar test": "172.26.42.112"
}
}
And I ended up with the same error:
fatal: [integration]: FAILED! => {"changed": false, "msg": "argument 'etc_hosts' is of type <class 'list'> and we were unable to convert to dict: <class 'list'> cannot be converted to a dict"}
From the community.docker.docker_container documentation, the etc_hosts parameter is a dictionary:
"Dict of host-to-IP mappings, where each host name is a key in the dictionary. Each host name will be added to the container's /etc/hosts file."
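For reference, a value the module accepts looks like this minimal sketch (the container name and image are made up; only the host/IP pairs come from the /etc/hosts above):

- community.docker.docker_container:
    name: some-container   # hypothetical name
    image: alpine:3        # hypothetical image
    etc_hosts:
      test.foobar: 172.26.42.112
      elastic.foobar: 172.26.42.112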
Oh man, I made a mistake in a variable name when I wrote the playbook.
The question above shows the right way to do it in the second example.
cat /etc/hosts
172.26.42.112 test.foobar
172.26.42.112 elastic.foobar
vars:
  docker_add_end: ['elastic', 'test']
- ansible.builtin.shell: |
    IP=$(cut -d ' ' -f 1 <<< "$(grep {{ item }} /etc/hosts)")
    FQDNHOST=$(cut -d ' ' -f 2 <<< "$(grep {{ item }} /etc/hosts)")
    echo -e "${FQDNHOST} $(cut -d '.' -f 1 <<< ${FQDNHOST})%${IP}"
  register: end_find
  with_items: "{{ docker_add_end }}"
- ansible.builtin.set_fact:
    end_add: "{{ end_add|default({}) | combine( { item.stdout.split('%')[0] : item.stdout.split('%')[1] } ) }}"
  with_items: "{{ end_find.results }}"
- community.docker.docker_container:
    etc_hosts: "{{ end_add }}"
When I ran the tests, I had renamed the var "end_add" to "end_add_someshit", and that broke everything.
I have a Kafka Streams application that works fine locally, but when I run it in Docker containers not all data is processed, and I get a lot of repeated errors in the logs about "Unable to records bytes produced to topic":
18:58:32.647 [kafka-producer-network-thread | my-app-events-processor.splitPackets-cf462b02-f1e3-4ed5-a1e7-acc1f040495b-StreamThread-1-producer] ERROR o.a.k.s.p.i.RecordCollectorImpl - stream-thread [my-app-events-processor.splitPackets-cf462b02-f1e3-4ed5-a1e7-acc1f040495b-StreamThread-1] task [0_0] Unable to records bytes produced to topic my-app.packet.surface by sink node split-server-log as the node is not recognized.
Known sink nodes are [].
18:58:49.216 [kafka-producer-network-thread | my-app-events-processor.splitPackets-cf462b02-f1e3-4ed5-a1e7-acc1f040495b-StreamThread-1-producer] ERROR o.a.k.s.p.i.RecordCollectorImpl - stream-thread [my-app-events-processor.splitPackets-cf462b02-f1e3-4ed5-a1e7-acc1f040495b-StreamThread-1] task [0_0] Unable to records bytes produced to topic my-app.packet.surface by sink node split-server-log as the node is not recognized.
Known sink nodes are [].
18:59:05.981 [kafka-producer-network-thread | my-app-events-processor.splitPackets-cf462b02-f1e3-4ed5-a1e7-acc1f040495b-StreamThread-1-producer] ERROR o.a.k.s.p.i.RecordCollectorImpl - stream-thread [my-app-events-processor.splitPackets-cf462b02-f1e3-4ed5-a1e7-acc1f040495b-StreamThread-1] task [0_0] Unable to records bytes produced to topic my-app.packet.surface by sink node split-server-log as the node is not recognized.
Known sink nodes are [].
19:00:28.484 [my-app-events-processor.splitPackets-cf462b02-f1e3-4ed5-a1e7-acc1f040495b-StreamThread-1] INFO o.a.k.s.p.internals.StreamThread - stream-thread [my-app-events-processor.splitPackets-cf462b02-f1e3-4ed5-a1e7-acc1f040495b-StreamThread-1] Processed 3 total records, ran 0 punctuators, and committed 3 total tasks since the last update
When I run the application, not all data is processed, only some. Some KafkaStreams instances produce data, while others only seem to consume it. I expect it to consume JSON data and produce images (to be used in a Leaflet web map). However, it only does this for some of the KafkaStreams instances.
I don't get this error when I run locally. What does it mean? How can I fix it?
Application setup
I have a single application, events-processors, written in Kotlin, that uses Kafka Streams. The application uses a Kafka Admin instance to create the topics, then launches 4 separate KafkaStreams instances in independent Kotlin coroutines. events-processors runs in a Docker container.
The Kafka instance runs in KRaft mode, in another Docker container on the same Docker network.
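The application code itself is not shown here (and is written in Kotlin); purely to illustrate the pattern just described, one KafkaStreams instance per topology, each started independently, a minimal Scala sketch could look like the following. The input topic name is made up; the application id and the output topic my-app.packet.surface are taken from the logs below.

// Illustrative sketch only, not the poster's Kotlin code.
import java.util.Properties
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig, Topology}
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.serialization.Serdes._
import org.apache.kafka.streams.scala.StreamsBuilder

def streamsProps(appId: String): Properties = {
  val p = new Properties()
  p.put(StreamsConfig.APPLICATION_ID_CONFIG, appId)
  p.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:29092")
  p
}

// One topology per stream: "my-app.packet.raw" is a made-up input topic,
// "my-app.packet.surface" is the sink topic that appears in the error messages.
def splitPacketsTopology(): Topology = {
  val builder = new StreamsBuilder
  builder.stream[String, Array[Byte]]("my-app.packet.raw").to("my-app.packet.surface")
  builder.build()
}

// The real app starts four such instances, each in its own coroutine;
// here a single one is started for brevity.
val allStreams = Seq(
  new KafkaStreams(splitPacketsTopology(), streamsProps("my-app-events-processor.splitPackets"))
)
allStreams.foreach(_.start())
sys.addShutdownHook(allStreams.foreach(_.close()))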
I am using
Kafka 3.3.1
Kotlin 1.7.20
docker-compose version 1.29.2
Docker version 20.10.19
Debian GNU/Linux 11 (bullseye)
Kernel: Linux 5.10.0-18-amd64
Architecture: x86-64
Kafka config
Here is the config of one of the KafkaStreams instances:
18:38:25.138 [DefaultDispatcher-worker-5 #my-app-events-processor.splitPackets#5] INFO o.a.k.s.p.internals.StreamThread - stream-thread [my-app-events-processor.splitPackets-d7b897b3-3a10-48d6-95c7-e291cb1839d8-StreamThread-1] Creating restore consumer client
18:38:25.142 [DefaultDispatcher-worker-5 #my-app-events-processor.splitPackets#5] INFO o.a.k.c.consumer.ConsumerConfig - ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = none
bootstrap.servers = [http://kafka:29092]
check.crcs = true
client.dns.lookup = use_all_dns_ips
client.id = my-app-events-processor.splitPackets-d7b897b3-3a10-48d6-95c7-e291cb1839d8-StreamThread-1-restore-consumer
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = null
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = false
internal.throw.on.fetch.stable.offset.unsupported = true
isolation.level = read_committed
key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 1000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor, class org.apache.kafka.clients.consumer.CooperativeStickyAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.connect.timeout.ms = null
sasl.login.read.timeout.ms = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.login.retry.backoff.max.ms = 10000
sasl.login.retry.backoff.ms = 100
sasl.mechanism = GSSAPI
sasl.oauthbearer.clock.skew.seconds = 30
sasl.oauthbearer.expected.audience = null
sasl.oauthbearer.expected.issuer = null
sasl.oauthbearer.jwks.endpoint.refresh.ms = 3600000
sasl.oauthbearer.jwks.endpoint.retry.backoff.max.ms = 10000
sasl.oauthbearer.jwks.endpoint.retry.backoff.ms = 100
sasl.oauthbearer.jwks.endpoint.url = null
sasl.oauthbearer.scope.claim.name = scope
sasl.oauthbearer.sub.claim.name = sub
sasl.oauthbearer.token.endpoint.url = null
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 45000
socket.connection.setup.timeout.max.ms = 30000
socket.connection.setup.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.3
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
The server config is the Kafka KRaft config, https://github.com/apache/kafka/blob/215d4f93bd16efc8e9b7ccaa9fc99a1433a9bcfa/config/kraft/server.properties, although I have changed the advertised listeners:
advertised.listeners=PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
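For context, the full listener block of a single-broker KRaft setup behind Docker typically looks something like the sketch below; apart from the advertised.listeners line quoted above, the listener names and controller settings here are assumptions on my part, not taken from the poster's server.properties.

# Sketch of a typical Docker-oriented listener setup (assumed, not the poster's file)
listeners=PLAINTEXT://:29092,PLAINTEXT_HOST://:9092,CONTROLLER://:9093
advertised.listeners=PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
inter.broker.listener.name=PLAINTEXT
controller.listener.names=CONTROLLER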
Docker config
The Docker config is defined in a docker-compose file.
version: "3.9"
services:
  events-processors:
    image: events-processors
    container_name: events-processors
    restart: unless-stopped
    environment:
      KAFKA_BOOTSTRAP_SERVERS: "http://kafka:29092"
    networks:
      - my-app-infra-nw
    depends_on:
      - infra-kafka
    secrets:
      - source: my-app_config
        target: /.secret.config.yml
  infra-kafka:
    image: kafka-kraft
    container_name: infra-kafka
    restart: unless-stopped
    networks:
      my-app-infra-nw:
        aliases: [ kafka ]
    volumes:
      - "./config/kafka-server.properties:/kafka/server.properties"
    ports:
      # note: other Docker containers should use 29092
      - "9092:9092"
      - "9093:9093"
I followed the guide for Spark Streaming + Flume integration, but I can't get any events in the end.
(https://spark.apache.org/docs/latest/streaming-flume-integration.html)
Can anyone help me analyze it?
In Flume, I created the file "avro_flume.conf" as follows:
# Describe/configure the source
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 123.57.54.113
a1.sources.r1.port = 4141

# Describe the sink
a1.sinks = k1
a1.sinks.k1.type = avro

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 123.57.54.113
a1.sinks.k1.port = 6666

a1.sources = r1
a1.sinks = spark
a1.channels = c1
In this file, 123.57.54.113 is the IP of the local host.
I start the programs as follows:
1. Start the agent:
flume-ng agent -c . -f conf/avro_spark.conf -n a1
2. Start the spark-streaming example:
bin/run-example org.apache.spark.examples.streaming.FlumeEventCount 123.57.54.113 6666
3. Then I start the avro-client:
flume-ng avro-client -c . -H 123.57.54.113 -p 4141 -F test/log.01
4. test/log.01 is a file created with echo which contains some strings.
In the end, there are no events at all.
What's the problem?
Thanks!
I see "a1.sinks = spark" under the heading "Bind the source and sink to the channel", but a sink named "spark" is not defined anywhere else in your configuration.
Are you trying approach 1 or approach 2 from "https://spark.apache.org/docs/latest/streaming-flume-integration.html"?
Try removing the line "a1.sinks = spark" if you are trying approach 1.
For approach 2 use the following template:
agent.sinks = spark
agent.sinks.spark.type = org.apache.spark.streaming.flume.sink.SparkSink
agent.sinks.spark.hostname = <hostname of the local machine>
agent.sinks.spark.port = <port to listen on for connection from Spark>
agent.sinks.spark.channel = memoryChannel
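For completeness, on the Spark side approach 2 then polls that sink, along the lines of this sketch (not the poster's code; the host and port are reused from the question and have to match the agent.sinks.spark.hostname/port values above):

// Pull-based (approach 2) receiver: Spark connects to the Flume SparkSink and polls it.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

val sinkHost = "123.57.54.113"  // host where the SparkSink runs
val sinkPort = 6666             // port the SparkSink listens on

val sparkConf = new SparkConf().setAppName("FlumePollingEventCount")
val ssc = new StreamingContext(sparkConf, Seconds(10))
val flumeStream = FlumeUtils.createPollingStream(ssc, sinkHost, sinkPort)
flumeStream.count().map(cnt => s"Received $cnt flume events.").print()
ssc.start()
ssc.awaitTermination()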
Kafka version is 0.7.2, Flume version is 1.5.0, and the Flume + Kafka plugin is https://github.com/baniuyao/flume-kafka.
Error info:
2014-08-20 18:55:51,755 (conf-file-poller-0) [ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:149)] Unhandled error
java.lang.NoSuchMethodError: scala.math.LowPriorityOrderingImplicits.ordered()Lscala/math/Ordering;
at kafka.producer.ZKBrokerPartitionInfo$$anonfun$kafka$producer$ZKBrokerPartitionInfo$$getZKTopicPartitionInfo$1.apply(ZKBrokerPartitionInfo.scala:172)
flume configuration:
agent_log.sources = r1
agent_log.sinks = kafka
agent_log.channels = c1
agent_log.sources.r1.type = exec
agent_log.sources.r1.channels = c1
agent_log.sources.r1.command = tail -f /var/log/test.log
agent_log.channels.c1.type = memory
agent_log.channels.c1.capacity = 1000
agent_log.channels.c1.trasactionCapacity = 100
agent_log.sinks.kafka.type = com.vipshop.flume.sink.kafka.KafkaSink
agent_log.sinks.kafka.channel = c1
agent_log.sinks.kafka.zk.connect = XXXX:2181
agent_log.sinks.kafka.topic = my-replicated-topic
agent_log.sinks.kafka.batchsize = 200
agent_log.sinks.kafka.producer.type = async
agent_log.sinks.kafka.serializer.class = kafka.serializer.StringEncoder
What could be the error? Thanks!
scala.math.LowPriorityOrderingImplicits.ordered() is part of the Scala standard library. Perhaps you need to add the Scala standard library (scala-library.jar) to your Flume lib directory.
phpMyAdmin 3.3.7deb2build0.10.10.1
php5-cgi 5.3.3-1ubuntu9.1
/etc/phpmyadmin/config.inc.php edited:
$cfg['blowfish_secret'] = 'phpmyadmin_logs_out';
$cfg['LoginCookieValidity'] = 86400; // 24 h
ini_set('session.gc_maxlifetime', $cfg['LoginCookieValidity']);
$cfg['Servers'][$i]['auth_type'] = 'cookie';
/etc/php5/cgi/php.ini edited (cli too):
session.gc_maxlifetime = 86400
Full PMA conf: http://shorttext.com/i2gfzmyor0r
phpMyAdmin still logs me out sooner than the 24 hours I configured. Can anyone help me fix this?
Try changing the php.ini of the virtual host:
/var/www/vhosts/system/%host_name%/etc/php.ini
You can check the effective value with phpinfo(): look in the session section for the local value of session.gc_maxlifetime.
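If that vhost php.ini is the one actually in effect, the line to set there mirrors the global one (value taken from the question):
session.gc_maxlifetime = 86400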