Apache Flume: How to detect errors in the Flume process

I have set up Flume agents and the process is working fine. However, I am not sure how to implement error detection for a production environment. Any direction/guidelines regarding error handling/notification for a Flume implementation are appreciated.
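As an illustration of the kind of built-in hook Flume exposes for this, here is a minimal sketch of enabling the HTTP/JSON metrics reporting that an external monitor could poll and alert on (the agent name, config file, and port below are placeholders):

# hedged sketch: start the agent with Flume's built-in HTTP/JSON metrics reporting enabled
flume-ng agent --conf conf --conf-file flume.conf --name a1 \
  -Dflume.monitoring.type=http \
  -Dflume.monitoring.port=34545

# counters such as ChannelFillPercentage, or the gap between EventPutAttemptCount
# and EventPutSuccessCount, can then be polled at http://<agent-host>:34545/metrics
# by whatever alerting tool you already use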

Related

Apache Beam logs messages with the wrong tags

Error logs don't log in the GCP console.
Warning logs do log as info (so I've been using them to log info messages). E.g.,
import logging

test = "hello debug world"
logging.warning("%s", test)  # will log as an info message in the GCP Dataflow console
Info logs don't log in the console either.
I'm using Apache Beam Python 3.7 SDK 2.23.0, but this seems to be an old issue.
This also happens with the Apache Beam SDK itself, which at times silently logs errors as info.
Any idea what's causing this? It seems to be a bug on the Apache Beam side of things more than a scripting error.
You will have to change the drop-down value from Info to some higher log level to see Error or Warning messages. In the screenshot the log level is set to Info and you are searching for the string "error" in the log entries, so Stackdriver is filtering based on that.

Kafka Connect in distributed mode is not generating logs specified via log4j properties

I have been using Kafka Connect in my work setup for a while now and it works perfectly fine.
Recently I thought of dabbling with a few connectors of my own in my Docker-based Kafka cluster with just one broker (ubuntu:18.04 with Kafka installed) and a separate node acting as a client for deploying connector apps.
Here is the problem:
Once my broker is up and running, I log in to the client node (with no broker running, just the vanilla Kafka installation) and set up the classpath to point to my connector libraries, along with the KAFKA_LOG4J_OPTS environment variable to point to the location of the log file to generate, with debug mode enabled.
So every time I start the Kafka Connect worker using the command:
nohup /opt//bin/connect-distributed /opt//config/connect-distributed.properties > /dev/null 2>&1 &
the connector starts running, but I don't see the log file getting generated.
I have tried several changes but nothing works out.
QUESTIONS:
Does this mean that connect-distributed.sh doesn't generate the log file after reading the variable KAFKA_LOG4J_OPTS? And if it does, could someone explain how?
NOTE:
(I have already debugged the connect-distributed.sh script and tried it both with and without daemon mode. By default, if KAFKA_LOG4J_OPTS is not provided, it uses the connect-log4j.properties file in the config directory, but even then no log file is generated.)
OBSERVATION:
Only when I start ZooKeeper/the broker on the client node is the provided KAFKA_LOG4J_OPTS value picked up and logs start getting generated, but nothing related to the Kafka connector. I have already verified the connectivity between the client and the broker using kafkacat.
The interesting part is:
I follow the same process at my workplace and logs are generated every time the worker (connect-distributed.sh) is started, but I haven't been able to replicate that behavior in my own setup. I have no clue what I am missing here.
Could someone provide some reasoning? This is really driving me mad.
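For reference, the pattern described in the question usually boils down to something like the following, assuming the stock log4j 1.x setup that ships with Kafka (the paths, appender name, and log location are hypothetical):

export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:/opt/kafka/config/connect-log4j.properties"

# connect-log4j.properties: route worker and connector logging to a file in DEBUG mode
log4j.rootLogger=DEBUG, connectFile
log4j.appender.connectFile=org.apache.log4j.DailyRollingFileAppender
log4j.appender.connectFile.DatePattern='.'yyyy-MM-dd
log4j.appender.connectFile.File=/var/log/kafka/connect-worker.log
log4j.appender.connectFile.layout=org.apache.log4j.PatternLayout
log4j.appender.connectFile.layout.ConversionPattern=[%d] %p %m (%c)%n

Whether the exported value is actually honoured depends on connect-distributed.sh passing it through to kafka-run-class.sh, which is where KAFKA_LOG4J_OPTS is ultimately consumed.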

Spring Cloud Data Flow Local Server + Skipper Server: Error after undeploying streams

I am trying to manage my streams on Spring Cloud Data Flow with the Skipper server.
I followed the instruction here:
https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started-deploying-spring-cloud-dataflow
The app registration and stream definition/deployment go quite well, but after I undeploy the deployed stream, I can't see any stream on the dashboard any more.
The dashboard shows an error instead:
Could not parse Skipper Platform Status JSON:null
I have to restart the scdf server and skipper server in order to see my stream definition again.
The version of the components are:
scdf local server: 1.6.0.RELEASE
skipper server: 1.0.8.RELEASE
metrics collector: kafka-10-1.0.0.RELEASE
Some operation details:
I registered my app using the SCDF shell in Skipper mode.
I defined and deployed my stream on the SCDF dashboard. I undeployed the stream via the stop button on the dashboard, too.
How should I solve this problem?
We have recently observed this on our side, too, and it has been fixed! [see spring-cloud/spring-cloud-dataflow#2361]
We are preparing for a 1.6.1 GA release, but in the meantime, please feel free to pull the 1.6.1.BUILD-SNAPSHOT from the Spring repo and give it a go.
If you see any other anomaly, it'd be great to have a bug report on the matter.

Flume agent: How a Flume agent gets data from a web server located on a different physical server

I am trying to understand Flume and am referring to the official documentation at flume.apache.org.
In particular, referring to this section, I am a bit confused.
Do we need to run the Flume agent on the actual web server, or can we run Flume agents on a different physical server and acquire data from the web server?
If the latter is possible, then how does the Flume agent get the data from the web server logs? How can the web server make its data available to the Flume agent?
Can anyone help me understand this?
The Flume agent pulls data from a source and publishes it to a channel, which then writes to a sink.
You can install the Flume agent in either a local or a remote configuration, but keep in mind that having it remote will add some network latency to your event processing, if you are concerned about that. You can also "multiplex" Flume agents to have one remote aggregation agent plus individual local agents on each web server.
Assuming a Flume agent is installed locally with a spooldir or exec source, it will essentially tail a file or run that command locally. This is how it would get data from the logs.
If the Flume agent is set up with a syslog or TCP source (see the data ingestion section on network sources), then it can be on a remote machine, and you must open a network socket in your logging application to publish messages to the other server. This is a similar pattern to Apache Kafka.
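To make the remote pattern concrete, here is a minimal sketch of an aggregation agent with a network (syslog TCP) source, a memory channel, and an HDFS sink; the agent name, port, and HDFS path are placeholders:

a1.sources = r1
a1.channels = c1
a1.sinks = k1

# network source: web servers on other hosts forward their logs to this port
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1

# buffer events between the source and the sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# write the collected events out to HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/weblogs
a1.sinks.k1.channel = c1

For the local pattern, the source block would instead be an exec source (for example, command = tail -F /var/log/httpd/access_log) or a spooldir source running on the web server itself.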

Can a Flume source act as a JMS consumer

I have just started looking into Flume for writing messages to HDFS using the HDFS sink. I am wondering if a Flume source can act as a JMS consumer for my message broker.
Does Flume provide integration with a message broker?
Or do I need to write a custom JMS client that will push messages to a Flume source?
Flume 1.3 does not provide a JMS source out of the box. However, it seems that such a component will ship with the next version, Flume 1.4: https://issues.apache.org/jira/browse/FLUME-924.
Meanwhile you can get the source code of this new component here and build it. AFAIK, Flume 1.4 does not break its interfaces, so the component will probably work with 1.3 as well.
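For reference, the configuration for that component ends up looking roughly like the following, shown against a hypothetical ActiveMQ broker and queue (property names follow the FLUME-924 JMS source):

a1.sources = jms1
a1.channels = c1
a1.sinks = k1

# JMS source: the agent subscribes to the broker as an ordinary JMS consumer
a1.sources.jms1.type = jms
a1.sources.jms1.initialContextFactory = org.apache.activemq.jndi.ActiveMQInitialContextFactory
a1.sources.jms1.providerURL = tcp://mq-broker:61616
a1.sources.jms1.destinationType = QUEUE
a1.sources.jms1.destinationName = BUSINESS_DATA
a1.sources.jms1.channels = c1

a1.channels.c1.type = memory

# deliver the consumed messages to HDFS, as in the original question
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/jms-events
a1.sinks.k1.channel = c1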
