Python KafkaConsumer not connecting - docker

Setup:
I have 3 Docker containers:
1) For Kafka
2) For Zookeeper
3) For JupyterLab
I set up networking between these containers, and I can see that the Kafka producer is able to run and produce data.
KafkaProducer.ipynb
KAFKA_BROKER = ['172.20.0.2:9093']
from kafka import KafkaProducer
from kafka.errors import KafkaError
producer = KafkaProducer(bootstrap_servers=KAFKA_BROKER)
for _ in range(100):
    print("sending")
    producer.send('my-topic', key=b'foo', value=b'bar')
    print("success")
Here send() sends the message 100 times.
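One thing worth double-checking on the producer side: in kafka-python, send() is asynchronous, so a notebook cell can finish before anything is actually delivered to the broker. A small sketch (reusing the producer, topic and payload from above) that forces delivery and confirms the first record landed:
futures = [producer.send('my-topic', key=b'foo', value=b'bar') for _ in range(100)]
producer.flush()                       # block until all buffered records are sent
metadata = futures[0].get(timeout=10)  # raises a KafkaError if delivery failed
print(metadata.topic, metadata.partition, metadata.offset)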
KafkaConsumer.ipynb
KAFKA_BROKER = ['172.20.0.2:9093']
from kafka import KafkaConsumer
consumer = KafkaConsumer('my-topic', group_id='my-group', bootstrap_servers=KAFKA_BROKER)
print("Comm success")
for message in consumer:
    # message value and key are raw bytes -- decode if necessary!
    # e.g., for unicode: `message.value.decode('utf-8')`
    print("%s:%d:%d: key=%s value=%s" % (message.topic, message.partition,
                                         message.offset, message.key,
                                         message.value))
In the above consumer code, the line print("Comm success") never gets executed. Based on the producer run, the network is open and Jupyter is able to talk to the Kafka broker, but the consumer is not able to connect to the same broker for data consumption. How can I start debugging this?

By default the auto.offset.reset value is latest, so set it to earliest and use a new group.id:
consumer = KafkaConsumer('my-topic', group_id='new-group', auto_offset_reset='earliest', bootstrap_servers=KAFKA_BROKER)
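To start debugging the connection itself, a minimal sketch along these lines can help. It assumes kafka-python: the DEBUG logging shows connection and metadata activity, consumer_timeout_ms stops the loop instead of blocking forever, and bootstrap_connected() is available in recent kafka-python releases; the broker address is the one from the question.
import logging
from kafka import KafkaConsumer

# kafka-python logs broker connections and metadata fetches at DEBUG level
logging.basicConfig(level=logging.DEBUG)

consumer = KafkaConsumer(
    'my-topic',
    group_id='new-group',
    auto_offset_reset='earliest',
    bootstrap_servers=['172.20.0.2:9093'],
    consumer_timeout_ms=10000,  # give up iterating if nothing arrives for 10 s
)
print("bootstrap connected:", consumer.bootstrap_connected())
for message in consumer:
    print(message.topic, message.partition, message.offset, message.key, message.value)
print("no messages received within the timeout")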

Related

MQTT5 User Properties with Mosquitto Bridge

I am running a local Mosquitto (MQTT) Broker that connects to a remote Mosquitto Broker using the built-in MQTT Bridge functionality of Mosquitto. My mosquitto.conf looks like this:
# =================================================================
# Listeners
# =================================================================
listener 1883
# =================================================================
# Security
# =================================================================
allow_anonymous true
# =================================================================
# Bridges
# =================================================================
connection myConnectionName
address <<Remote Broker IP>>:1883
remote_username <<Remote Broker Username>>
remote_password <<Remote Broker Password>>
topic mytopic/# out 1 "" B2/
bridge_protocol_version mqttv50
cleansession false
bridge_attempt_unsubscribe true
upgrade_outgoing_qos true
max_queued_messages 5000
For testing I run a MqttPublisher using a C# console application which uses the MQTTnet library (version 3) and a MqttSubscriber (also a C# console application with MQTTnet).
Now I want the Publisher to publish MQTT messages with User Properties (introduced by MQTT 5).
I build the message like this:
using System;
using MQTTnet;
using MQTTnet.Client;
using MQTTnet.Client.Options;

class Program
{
    static void Main()
    {
        // Create a new MQTT client instance
        var factory = new MqttFactory();
        var mqttClient = factory.CreateMqttClient();

        // Set up the options for the MQTT client
        var options = new MqttClientOptionsBuilder()
            .WithClientId("MqttPublisher")
            .WithTcpServer("localhost", 1883)
            .WithProtocolVersion(MQTTnet.Formatter.MqttProtocolVersion.V500)
            .WithCleanSession()
            .Build();

        mqttClient.ConnectAsync(options).Wait();

        var i = 0;
        while (true)
        {
            Console.WriteLine("Client connected: " + mqttClient.IsConnected);

            var message = new MqttApplicationMessageBuilder()
                .WithTopic("mytopic/test")
                .WithUserProperty("UPTest", "Hi UP")
                .WithPayload("Hello World: " + i)
                .Build();

            mqttClient.PublishAsync(message).Wait();
            Console.WriteLine("Published message with payload: " + System.Text.Encoding.UTF8.GetString(message.Payload));

            i++;
            System.Threading.Thread.Sleep(1000);
        }

        // Unreachable while the loop above runs forever, but kept from the original
        mqttClient.DisconnectAsync().Wait();
    }
}
With the subscriber (also using WithProtocolVersion(MQTTnet.Formatter.MqttProtocolVersion.V500)), if I subscribe to the topic I get all the messages and I can read the MQTTnet.MqttApplicationMessage as shown in the following screenshot:
The messages are also published to the remote MQTT Broker due to the MQTT Bridge that is configured. However, if I subscribe to the remote Broker with my MqttSubscriber, I am not getting the User Properties anymore:
Is there any way to configure the Mosquitto Bridge so that the user properties are sent as well? I can't find a way, and any help and comments are appreciated.
Thanks
Joshua
Using mosquitto 2.0.15 I have verified that MQTT v5 message properties are passed over the bridge.
Broker one.conf
listener 1883
allow_anonymous true
Broker two.conf
listener 1884
allow_anonymous true
connection one
address 127.0.0.1:1883
topic foo/# both 0
bridge_protocol_version mqttv50
Subscriber connected to Broker two
$ mosquitto_sub -t 'foo/#' -V mqttv5 -p 1884 -F "%t %P %p"
foo/bar foo:bar ben
Publisher connected to Broker one
$ mosquitto_pub -D PUBLISH user-property foo bar -t 'foo/bar' -m ben -p 1883
As you can see, the %P in the output format for the subscriber is outputting the user property foo with a value of bar when subscribed over the bridge.
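If you want to check the same thing from code rather than mosquitto_sub, below is a rough Python equivalent (a sketch assuming paho-mqtt 1.6.x and the two-broker setup from this answer; it connects to broker two on port 1884 and prints any user properties it receives):
import paho.mqtt.client as mqtt

def on_connect(client, userdata, flags, reason_code, properties=None):
    client.subscribe("foo/#", qos=0)

def on_message(client, userdata, msg):
    # For MQTT v5 messages, user properties arrive as a list of (key, value) tuples
    props = getattr(msg, "properties", None)
    user_props = getattr(props, "UserProperty", []) if props else []
    print(msg.topic, msg.payload.decode(), user_props)

client = mqtt.Client(client_id="bridge-check", protocol=mqtt.MQTTv5)
client.on_connect = on_connect
client.on_message = on_message
client.connect("127.0.0.1", 1884)
client.loop_forever()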

Sending and receiving events successfully with socket.io, but nothing is happening

I'm trying to get my webapp to send messages and I can't figure out why it isn't working. There are no errors that I can see; it's just that the actions in my events.py function aren't happening. I am running a gunicorn server with eventlet workers serving a Flask app.
Here's the command that starts the gunicorn server through docker:
CMD [ "gunicorn", "--reload", "-b", "0.0.0.0:5000", "--worker-class", "eventlet", "-w", "1", "app:app"]
here's the relevant code on notes.html:
// Imports socketio
<script src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/4.0.1/socket.io.js" integrity="sha512-q/dWJ3kcmjBLU4Qc47E4A9kTB4m3wuTY7vkFJDTZKjTs8jhyGQnaUrxa0Ytd0ssMZhbNua9hE+E7Qv1j+DyZwA==" crossorigin="anonymous"></script>
<script type="text/javascript" charset="utf-8">
// sets domain to talk to. (empty sets it to localhost)
const socket = io()
// send message to server on trigger from form.
socket.emit('send_new_session', new_session_form_id.value, new_session_form_number.value, new_session_form_title.value, new_session_form_synopsis.value)
console.log('send_new_session')
// console logs the message here, so I know it's getting this far. The problem seems to be that the server isn't getting the message for some reason.
events.py:
from . import db, socketio
from .classes import *
from flask_socketio import emit
@socketio.on('send_new_session')
def send_new_session(id, number, title, synopsis=None):
    print("arrived!!!!!!!!!!!")
    # more code that adds the new session to the database
    ...
I have the logging correctly set up to stdout but I never see the "arrived" message, so I know it's never hitting the server.
Here are the server logs from when I send the message:
rest-server | Bpt-ydbpGYLF-HGKAAAC: Sending packet OPEN data {'sid': 'Bpt-ydbpGYLF-HGKAAAC', 'upgrades': ['websocket'], 'pingTimeout': 20000, 'pingInterval': 25000}
rest-server | Bpt-ydbpGYLF-HGKAAAC: Received packet MESSAGE data 0
rest-server | Bpt-ydbpGYLF-HGKAAAC: Sending packet MESSAGE data 0{"sid":"MRAxFiGYyLB3C6MBAAAD"}
rest-server | Bpt-ydbpGYLF-HGKAAAC: Received request to upgrade to websocket
rest-server | Bpt-ydbpGYLF-HGKAAAC: Upgrade to websocket successful
rest-server | Bpt-ydbpGYLF-HGKAAAC: Received packet MESSAGE data 2["send_new_session","1","2","foo","bar"]
rest-server | received event "send_new_session" from MRAxFiGYyLB3C6MBAAAD [/]
rest-server | Bpt-ydbpGYLF-HGKAAAC: Sending packet PING data None
rest-server | Bpt-ydbpGYLF-HGKAAAC: Received packet PONG data
You can see in the log that the message is in fact being sent and received, but for some reason the actions in the event aren't happening. I've been trying everything I can think of for a couple of days now. Any help would be greatly appreciated!!
Everything below here is probably not relevant, but if it helps, this is how I set up the app:
file set up:
/app
--app.py
--requirements.txt
--Dockerfile
--docker-compose.yml
--.flaskenv
--/project
----/static
----/templates
----__init__.py
----settings.py
----events.py
----BONapp.py
----auth.py
etc...
settings.py
import os
from flask import Flask
app = Flask(__name__)
db_password = os.environ.get('DB_PASS')
app.config['SQLALCHEMY_DATABASE_URI'] = 'mysql+pymysql://root:' + db_password + '@bonmysqldb:3306/BON'
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
app.config['SECRET_KEY'] = db_password
__init__.py
from flask_login import LoginManager
from flask_migrate import Migrate
from flask_sqlalchemy import SQLAlchemy
from flask_socketio import SocketIO
from .settings import app
db = SQLAlchemy(app)
socketio = SocketIO(app, logger=True, engineio_logger=True)
def create_app():
    migrate = Migrate(app, db)
    from .classes import Users
    db.init_app(app)
    socketio.init_app(app)

    login_manager = LoginManager()
    login_manager.login_view = 'auth.login'
    login_manager.init_app(app)

    # provide login_manager with a unicode user ID
    @login_manager.user_loader
    def load_user(user_id):
        return Users.query.get(int(user_id))

    # blueprint for auth routes of app
    from .auth import auth as auth_blueprint
    app.register_blueprint(auth_blueprint)

    # blueprint for non-auth parts of app
    from .BONapp import main as main_blueprint
    app.register_blueprint(main_blueprint)

    return app
app.py
from project.__init__ import create_app
app = create_app()
I figured it out while reading over my question again...
It was in events.py; I changed:
from . import db, socketio
to:
from .__init__ import db, socketio
I'm not exactly sure why that mattered but it fixed it.
facepalm

Flink consumer job is unable to connect to Kafka from Docker

I have a simple streaming Flink Scala job which connects to a Kafka topic and maps its org.apache.avro.generic.GenericRecord messages into JSON strings.
When it runs in IntelliJ it ingests the topic fine and prints out the JSON.
When I run it in docker-compose I get the following exception:
com.esotericsoftware.kryo.KryoException: Error constructing instance of class: org.apache.avro.Schema$LockableArrayList
Serialization trace:
types (org.apache.avro.Schema$UnionSchema)
schema (org.apache.avro.Schema$Field)
fieldMap (org.apache.avro.Schema$RecordSchema)
schema (org.apache.avro.generic.GenericData$Record)
at com.twitter.chill.Instantiators$$anon$1.newInstance(KryoBase.scala:136)
at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1061)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.create(CollectionSerializer.java:89)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:93)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:22)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:143)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:657)
at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.copy(KryoSerializer.java:262)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.copy(TupleSerializer.java:111)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.copy(TupleSerializer.java:37)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:635)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:612)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:592)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:727)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:705)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.emitRecord(KafkaFetcher.java:185)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:150)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:715)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:208)
Caused by: java.lang.IllegalAccessException: Class com.twitter.chill.Instantiators$ can not access a member of class org.apache.avro.Schema$LockableArrayList with modifiers "public"
at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:102)
at java.lang.reflect.AccessibleObject.slowCheckMemberAccess(AccessibleObject.java:296)
at java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:288)
at java.lang.reflect.Constructor.newInstance(Constructor.java:413)
at com.twitter.chill.Instantiators$.$anonfun$normalJava$1(KryoBase.scala:170)
at com.twitter.chill.Instantiators$$anon$1.newInstance(KryoBase.scala:133)
... 37 more
I tried forcing Avro serialization with:
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
env.getConfig.disableForceKryo()
env.getConfig.enableForceAvro()
but got the same error.
Based on [this][1] I'm marking all Flink-related dependencies as "provided", with no good result.
What could be the difference between running the job in the IDE and in Docker?
How can I fix the job so it is able to read the Kafka topic from Docker?
How should I set up Docker for this?
How can I handle the Kryo/Avro serialization issue?
SS
[1]: http://www.alternatestack.com/development/com-esotericsoftware-kryo-kryoexception-unusual-solution-upgrading-flink/

How to retrieve worker logs for a Dask-YARN job?

I have a simple Dask-YARN script that does only one task: load a file from HDFS, as shown below. However, I'm running into a bug in the code, so I added a print statement in the function, but I don't see that statement being executed in the worker logs which I obtain using yarn logs -applicationId {application_id}. I even tried the method Client.get_worker_logs(), but that doesn't display the stdout either; it just shows some INFO about the worker(s). How does one obtain worker logs after the execution of the code has completed?
import sys
import numpy as np
import scipy.signal
import json
import dask
from dask.distributed import Client
from dask_yarn import YarnCluster
@dask.delayed
def load(input_file):
    print("In call of Load...")
    with open(input_file, "r") as fo:
        data = json.load(fo)
    return data
# Process input args
(_, filename) = sys.argv
dag_1 = {
    'load-1': (load, filename)
}
print("Building tasks...")
tasks = dask.get(dag_1, 'load-1')
print("Creating YARN cluster now...")
cluster = YarnCluster()
print("Scaling YARN cluster now...")
cluster.scale(1)
print("Creating Client now...")
client = Client(cluster)
print("Getting logs..1")
print(client.get_worker_logs())
print("Doing Dask computations now...")
dask.compute(tasks)
print("Getting logs..2")
print(client.get_worker_logs())
print("Shutting down cluster now...")
cluster.shutdown()
I'm not sure what's going on here; print statements should (and usually do) end up in the log files stored by YARN.
If you want your debug statements to appear in the worker logs from get_worker_logs, you can use the worker logger directly:
from distributed.worker import logger
logger.info("This will show up in the worker logs")
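Applied to the load function from the question, that might look like this sketch (the @dask.delayed decorator, file handling and message are taken from the question; only the print is swapped for the worker logger):
import json
import dask
from distributed.worker import logger

@dask.delayed
def load(input_file):
    logger.info("In call of Load...")  # shows up in client.get_worker_logs()
    with open(input_file, "r") as fo:
        return json.load(fo)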

Source data from syslog into Flume

I am trying to set up a Flume agent to source data from a syslog server.
Basically, I have set up a syslog server on a server (call it server1) to receive syslog events, which then forwards all messages to a different server (server2) where the Flume agent is installed; finally, all data is sunk to a Kafka cluster.
The Flume configuration is below.
# For each one of the sources, the type is defined
agent.sources.syslogSrc.type = syslogudp
agent.sources.syslogSrc.port = 9090
agent.sources.syslogSrc.host = server2
# The channel can be defined as follows.
agent.sources.syslogSrc.channels = memoryChannel
# Each channel's type is defined.
agent.channels.memoryChannel.type = memory
# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 100
# config for kafka sink
agent.sinks.kafkaSink.channel = memoryChannel
agent.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkaSink.kafka.topic = flume
agent.sinks.kafkaSink.kafka.bootstrap.servers = <kafka.broker.list>:9092
agent.sinks.kafkaSink.kafka.flumeBatchSize = 20
agent.sinks.kafkaSink.kafka.producer.acks = 1
agent.sinks.kafkaSink.kafka.producer.linger.ms = 1
agent.sinks.kafkaSink.kafka.producer.compression.type = snappy
But somehow the syslog data is not getting ingested into the Flume agent.
I would appreciate your advice.
I have set up a syslog server on a server (call it server1)
The syslogudp source must bind to the server1 host:
agent.sources.syslogSrc.host = server1
then forwards all messages to a different server (server2)
The "different server" refers to the sink:
agent.sinks.kafkaSink.kafka.bootstrap.servers = server2:9092
The Flume agent is only a process that hosts these components (source, channel, sink) to facilitate the flow of events.
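Independent of the host binding, one more thing worth verifying: a Flume agent only starts components whose names are declared on the agent itself, and those declaration lines are missing from the configuration as posted. A minimal sketch of the missing lines, assuming the agent is launched with the name agent (e.g. flume-ng agent -n agent ...):
agent.sources = syslogSrc
agent.channels = memoryChannel
agent.sinks = kafkaSink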
