The run result of Flume and how to test Flume

[screenshots of the Flume agent's console output]
My Flume configuration file is as follows:
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.channels = c1
a1.sources.r1.command = tail -F /home/hadoop/flume-1.5.0-bin/log_exec_tail
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
And I start my Flume agent with the following script:
bin/flume-ng agent -n a1 -c conf -f conf/flume_log.conf -Dflume.root.logger=INFO,console
Question 1: The run result is shown in the screenshots above, but I don't know whether it ran successfully or not.
Question 2: There are also the following instructions, and I don't understand what they mean about testing Flume:
NOTE: To test that the Flume agent is running properly, open a new terminal window and change directories to /home/horton/solutions/:
horton@ip:~$ cd /home/horton/solutions/
Run the following script, which writes log entries to nodemanager.log:
$ ./test_flume_log.sh
If successful, you should see new files in the /user/horton/flume_sink directory in HDFS
Stop the logagent Flume agent

As per your Flume configuration, whenever new lines are appended to the file /home/hadoop/flume-1.5.0-bin/log_exec_tail, the exec source tails them and the logger sink prints them to the console.
So, to test that it is working correctly:
1. Run the command bin/flume-ng agent -n a1 -c conf -f conf/flume_log.conf -Dflume.root.logger=INFO,console
2. Open a terminal and add a few lines to the file /home/hadoop/flume-1.5.0-bin/log_exec_tail (see the example below)
3. Save it
4. Now check the terminal where you started the Flume agent
5. You should see the newly added lines displayed
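For example, step 2 can be as simple as appending a couple of lines from another terminal (a quick sketch; the path is the one from the configuration above):

echo "test line 1" >> /home/hadoop/flume-1.5.0-bin/log_exec_tail
echo "test line 2" >> /home/hadoop/flume-1.5.0-bin/log_exec_tail

Each appended line should then appear in the agent's console as a logged event, which confirms the exec source, memory channel, and logger sink are all working.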

Related

Getting error - "Could not configure source r1 due to: Failed to configure component!"

I was learning Flume and encountered an error. The message was "Could not configure source r1 due to: Failed to configure component!"
My requirements are as follows:
Flume1 monitors a port and sends the monitored data to Flume2, Flume3, and Flume4, which print it to the console.
Data containing "ATguigu" is sent to Flume2, data containing "Shangguigu" is sent to Flume3, and other data is sent to Flume4.
My four configuration files are:
flume1:
a1.sources=r1
a1.channels=c1 c2 c3
a1.sinks=k1 k2 k3
a1.sources.r1.type=netcat
a1.sources.r1.bind=localhost
a1.sources.r1.port=5555
# Interceptor
a1.sources.r1.interceptors=i1
a1.sources.r1.interceptors.i1.type=com.cmcz.flume.interceptor.EventHeaderInterceptor$MyBuilder
# channel selector:multiplexing channel selector
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = title
a1.sources.r1.selector.mapping.at = c1
a1.sources.r1.selector.mapping.st = c2
a1.sources.r1.selector.default = c3
a1.channels.c1.type=memory
a1.channels.c1.capacity=10000
a1.channels.c1.transactionCapacity=100
a1.channels.c2.type=memory
a1.channels.c2.capacity=10000
a1.channels.c2.transactionCapacity=100
a1.channels.c3.type=memory
a1.channels.c3.capacity=10000
a1.channels.c3.transactionCapacity=100
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=localhost
a1.sinks.k1.port=6666
a1.sinks.k2.type=avro
a1.sinks.k2.hostname=localhost
a1.sinks.k2.port=7777
a1.sinks.k3.type=avro
a1.sinks.k3.hostname=localhost
a1.sinks.k3.port=8888
a1.sources.r1.channnels=c1 c2 c3
a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c2
a1.sinks.k3.channel=c3
flume2:
a2.sources=r1
a2.channels=c1
a2.sinks=k1
a2.sources.r1.type=avro
a2.sources.r1.bind=localhost
a2.sources.r1.port=6666
a2.channels.c1.type=memory
a2.channels.c1.capacity=10000
a2.channels.c1.transactionCapacity=100
a2.sinks.k1.type=logger
a2.sources.r1.channels=c1
a2.sinks.k1.channel=c1
flume3:
a3.sources=r1
a3.channels=c1
a3.sinks=k1
a3.sources.r1.type=avro
a3.sources.r1.bind=localhost
a3.sources.r1.port=7777
a3.channels.c1.type=memory
a3.channels.c1.capacity=10000
a3.channels.c1.transactionCapacity=100
a3.sinks.k1.type=logger
a3.sources.r1.channels=c1
a3.sinks.k1.channel=c1
flume4:
a4.sources=r1
a4.channels=c1
a4.sinks=r1
a4.sources.r1.type=avro
a4.sources.r1.bind=localhost
a4.sources.r1.port=8888
a4.channels.c1.type=memory
a4.channels.c1.capacity=10000
a4.channels.c1.transactionCapacity=100
a4.sinks.k1.type=logger
a4.sources.r1.channels=c1
a4.sinks.k1.channel=c1
And then I start them:
flume-ng agent -c $FLUME_HOME/conf -f $FLUME_HOME/jobs/multiplexing/flume4.conf -n a4 -Dflume.root.logger=INTO,console
flume-ng agent -c $FLUME_HOME/conf -f $FLUME_HOME/jobs/multiplexing/flume3.conf -n a3 -Dflume.root.logger=INTO,console
flume-ng agent -c $FLUME_HOME/conf -f $FLUME_HOME/jobs/multiplexing/flume2.conf -n a2 -Dflume.root.logger=INTO,console
flume-ng agent -c $FLUME_HOME/conf -f $FLUME_HOME/jobs/multiplexing/flume1.conf -n a1 -Dflume.root.logger=INTO,console
An error occurred during startup of flume1:
Source r1 failed to start.
What should I do?

How to monitor Apache Flume agents' status?

I know the enterprise way (Cloudera, for example): using CM (via a browser) or the Cloudera REST API, one can access monitoring and configuration facilities.
But how do you manage (run and rerun) the Flume agents' lifecycle and monitor their running/failure status without CM? Are there such facilities in the Flume distribution?
Flume's JSON Reporting API can be used to monitor health and performance.
I tried adding flume.monitoring.type/port to flume-ng on start, and it completely fits my needs.
Let's create a simple agent a1 as an example, which listens on localhost:44444 and logs to the console via a logger sink:
# flume.conf
a1.sources = s1
a1.channels = c1
a1.sinks = d1
a1.sources.s1.channels = c1
a1.sources.s1.type = netcat
a1.sources.s1.bind = localhost
a1.sources.s1.port = 44444
a1.sinks.d1.channel = c1
a1.sinks.d1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100
a1.channels.c1.transactionCapacity = 10
Run it with the additional parameters flume.monitoring.type and flume.monitoring.port:
flume-ng agent -n a1 -c conf -f flume.conf -Dflume.root.logger=INFO,console -Dflume.monitoring.type=http -Dflume.monitoring.port=44123
Then monitor the output in a browser at localhost:44123/metrics:
{"CHANNEL.c1":{"ChannelCapacity":"100","ChannelFillPercentage":"0.0","Type":"CHANNEL","EventTakeSuccessCount":"570448","ChannelSize":"0","EventTakeAttemptCount":"570573","StartTime":"1567002601836","EventPutAttemptCount":"570449","EventPutSuccessCount":"570448","StopTime":"0"}}
Just try some load:
dd if=/dev/urandom count=1024 bs=1024 | base64 | nc localhost 44444
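If you would rather poll these metrics from a script than from a browser (for example as a cron job or a Zabbix external check), a minimal sketch is simply:

curl -s http://localhost:44123/metrics

A failed connection or empty response is a reasonable signal that the agent is down, and fields in the JSON shown above, such as ChannelFillPercentage, can be parsed for performance alerts.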

FLUME EXCEPTION

I am trying to configure flume and am following this link. The following command works for me:
flume-ng agent -n TwitterAgent -c conf -f /usr/lib/apache-flume-1.7.0-bin/conf/flume.conf
The result I got, including the error, is:
17/01/31 12:04:08 INFO source.DefaultSourceFactory: Creating instance of source Twitter, type com.cloudera.flume.source.TwitterSource
17/01/31 12:04:08 ERROR node.PollingPropertiesFileConfigurationProvider: Failed to load configuration data.
Exception follows. org.apache.flume.FlumeException:
Unable to load source type:
com.cloudera.flume.source.TwitterSource, class:
com.cloudera.flume.source.TwitterSource.
(This is only part of the output; I just copied the error portion.)
Can anyone help me solve this error, please? I need to fix it so I can go on to step 24, which is the last step.
Please find the CDH 5.12 Flume Twitter setup below:
1. Here is the file /usr/lib/flume-ng/conf/flume.conf:
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type= com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken = xxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.keywords = Hadoop,BigData
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://quickstart.cloudera:8020/user/cloudera/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
2. Copy the flume-env.sh.template file to flume-env.sh:
~]$ sudo cp /usr/lib/flume-ng/conf/flume-env.sh.template /usr/lib/flume-ng/conf/flume-env.sh
3. Set JAVA_HOME and FLUME_CLASSPATH in flume-env.sh file as:
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
FLUME_CLASSPATH="/usr/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar"
4. If you don't find /usr/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar on your system, then download apache-flume-1.6.0-bin and replace the current lib folder with the one from the download.
Link: https://www.apache.org/dist/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz
4.1. Rename the old lib folder.
4.2. Download the archive from the link above to your Cloudera desktop and do the following:
~]$ sudo mv /usr/lib/flume-ng/lib /usr/lib/flume-ng/lib_cloudera
~]$ sudo mv /home/cloudera/Desktop/apache-flume-1.6.0-bin/lib /usr/lib/flume-ng/lib
5. Now run the Flume agent command:
~]$ flume-ng agent --conf-file /usr/lib/flume-ng/conf/flume.conf --name TwitterAgent -Dflume.root.logger=INFO,console -n TwitterAgent
This should run successfully.
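To verify that events are actually reaching HDFS, you can list the sink directory from the configuration above (a quick check, assuming the same path):

hdfs dfs -ls /user/cloudera/flume/tweets/

New files (FlumeData.* by default) should keep appearing there while the agent is collecting tweets.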
All the Best.

Pass a value to the grep command in Python

I am obtaining CPU and RAM statistics for the openvpn process by running the following command in a Python script on a Linux Debian 7 box.
ps aux | grep openvpn
The output is parsed and sent to a zabbix monitoring server.
I currently use the following Python script called psperf.py.
If I want CPU% stats I run: psperf 2
#!/usr/bin/env python

import subprocess, sys, shlex

psval=sys.argv[1] #ps aux val to extract such as CPU etc #2 = %CPU, 3 = %MEM, 4 = VSZ, 5 = RSS

#https://stackoverflow.com/questions/6780035/python-how-to-run-ps-cax-grep-something-in-python
proc1 = subprocess.Popen(shlex.split('ps aux'),stdout=subprocess.PIPE)
proc2 = subprocess.Popen(shlex.split('grep openvpn'),stdin=proc1.stdout,stdout=subprocess.PIPE,stderr=subprocess.PIPE)

proc1.stdout.close() # Allow proc1 to receive a SIGPIPE if proc2 exits.
out,err=proc2.communicate()

#string stdout?
output = (format(out))

#create output list
output = output.split()

#make ps val an integer to enable list location
psval = int(psval)

#extract value to send to zabbix from output list
val = output[psval]

#OUTPUT
print val
This script works fine for obtaining the data relating to openvpn. However, I now want to reuse the script by passing in the process name to extract data from, without having to keep a separate script for each individual process. For example, I might want CPU and RAM statistics for the zabbix process.
I have tried various solutions, including the following, but get an "index out of range" error.
For example I run: psperf 2 apache
#!/usr/bin/env python

import subprocess, sys, shlex

psval=sys.argv[1] #ps aux val to extract such as CPU etc #2 = %CPU, 3 = %MEM, 4 = VSZ, 5 = RSS
psname=sys.argv[2] #process details/name

#https://stackoverflow.com/questions/6780035/python-how-to-run-ps-cax-grep-something-in-python
proc1 = subprocess.Popen(shlex.split('ps aux'),stdout=subprocess.PIPE)
proc2 = subprocess.Popen(shlex.split('grep', psname),stdin=proc1.stdout,stdout=subprocess.PIPE,stderr=subprocess.PIPE)

proc1.stdout.close() # Allow proc1 to receive a SIGPIPE if proc2 exits.
out,err=proc2.communicate()

#string stdout?
output = (format(out))

#create output list
output = output.split()

#make ps val an integer to enable list location
psval = int(psval)

#extract value to send to zabbix from output list
val = output[psval]

#OUTPUT
print val
Error:
root@Deb764opVPN:~# python /usr/share/zabbix/externalscripts/psperf.py 4 openvpn
Traceback (most recent call last):
  File "/usr/share/zabbix/externalscripts/psperf.py", line 25, in <module>
    val = output[psval]
IndexError: list index out of range
I haven't used the shlex module before; it is new to me. It was necessary in order to pipe the ps aux command to grep securely, avoiding shell=True, which is a security hazard (http://docs.python.org/2/library/subprocess.html).
I adapted the script from: How to run "ps cax | grep something" in Python?
I believe it's to do with how shlex handles my request, but I'm not too sure how to go forward.
Can you help? That is, how can I successfully pass a value to the grep command?
I can see this being beneficial to many others who pipe commands, etc.
Regards
Aidan
I carried on researching and solved it using the following:
#!/usr/bin/env python
import subprocess, sys # , shlex
psval=sys.argv[1] #ps aux val to extract such as CPU etc #2 = %CPU, 3 = %MEM, 4 = VSZ, 5 = RSS
psname=sys.argv[2] #process details/name
#http://www.cyberciti.biz/tips/grepping-ps-output-without-getting-grep.html
proc1 = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE)
proc2 = subprocess.Popen(['grep', psname], stdin=proc1.stdout,stdout=subprocess.PIPE)
proc1.stdout.close() # Allow proc1 to receive a SIGPIPE if proc2 exits.
stripres = proc2.stdout.read()
#TEST RESULT
print stripres
#create output list
output = stripres.split()
#make ps val an integer to enable list location
psval = int(psval)
#extract value to send to zabbix from output list
val = output[psval]
#OUTPUT
print val
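One classic caveat when grepping ps output (the reason for the link in the comments above): grep can match its own process line, which shifts the column positions. A common workaround (just a sketch) is the bracket trick, so grep no longer matches itself:

ps aux | grep '[o]penvpn'

In the script, that would mean passing a pattern like '[o]penvpn' as psname, or alternatively filtering the ps output in Python without spawning grep at all.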
Regards
Aidan

Not able to get output in HDFS directory using HDFS as sink in Flume

I am trying to give a normal text file to Flume as the source, with HDFS as the sink. The source, channel, and sink all show as registered and started, but nothing is coming into the HDFS output directory. I'm new to Flume; can anyone help me through this?
The contents of my Flume .conf file are:
agent12.sources = source1
agent12.channels = channel1
agent12.sinks = HDFS
agent12.sources.source1.type = exec
agent12.sources.source1.command = tail -F /usr/sap/sample.txt
agent12.sources.source1.channels = channel1
agent12.sinks.HDFS.channels = channel1
agent12.sinks.HDFS.type = hdfs
agent12.sinks.HDFS.hdfs.path= hdfs://172.18.36.248:50070:user/root/xz
agent12.channels.channel1.type =memory
agent12.channels.channel1.capacity = 1000
The agent is started using:
/usr/bin/flume-ng agent -n agent12 -c usr/lib//flume-ng/conf/sample.conf -f /usr/lib/flume-ng/conf/flume-conf.properties.template
