Not able to get output in HDFS directory using HDFS as sink in Flume

I am trying to give a normal text file to Flume as the source, with HDFS as the sink. The source, channel, and sink all show as registered and started, but nothing is coming into the HDFS output directory. I'm new to Flume; can anyone help me through this?

The configuration in my flume .conf file is:
agent12.sources = source1
agent12.channels = channel1
agent12.sinks = HDFS
agent12.sources.source1.type = exec
agent12.sources.source1.command = tail -F /usr/sap/sample.txt
agent12.sources.source1.channels = channel1
agent12.sinks.HDFS.channels = channel1
agent12.sinks.HDFS.type = hdfs
agent12.sinks.HDFS.hdfs.path= hdfs://172.18.36.248:50070:user/root/xz
agent12.channels.channel1.type = memory
agent12.channels.channel1.capacity = 1000
The agent is started using:
/usr/bin/flume-ng agent -n agent12 -c usr/lib//flume-ng/conf/sample.conf -f /usr/lib/flume-ng/conf/flume-conf.properties.template
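A few likely culprits, offered as guesses since I can't see your cluster. First, a sink takes the singular property channel, not channels:
agent12.sinks.HDFS.channel = channel1
Second, the hdfs.path uses a colon instead of a slash before the path, and 50070 is the NameNode web UI port, not the HDFS RPC port; a sketch of the fix, assuming the common default RPC port 8020:
agent12.sinks.HDFS.hdfs.path = hdfs://172.18.36.248:8020/user/root/xz
Finally, flume-ng's -c flag expects a configuration directory while -f points at the agent's own config file, but the command above passes a file to -c and the unedited template to -f; something like this instead:
/usr/bin/flume-ng agent -n agent12 -c /usr/lib/flume-ng/conf -f /usr/lib/flume-ng/conf/sample.conf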

Related

How to monitor Apache Flume agents status?

I know the enterprise way (Cloudera, for example): using CM (via browser) or the Cloudera REST API, one can access monitoring and configuration facilities.
But how do you schedule (run and rerun) a Flume agent's lifecycle, and monitor its running/failure status, without CM? Are there such facilities in the Flume distribution?
Flume's JSON Reporting API can be used to monitor health and performance.
I tried adding the flume.monitoring.type/port properties when starting flume-ng, and they completely fit my needs.
Let's create a simple example agent a1, which listens on localhost:44444 and logs to the console via a logger sink:
# flume.conf
a1.sources = s1
a1.channels = c1
a1.sinks = d1
a1.sources.s1.channels = c1
a1.sources.s1.type = netcat
a1.sources.s1.bind = localhost
a1.sources.s1.port = 44444
a1.sinks.d1.channel = c1
a1.sinks.d1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100
a1.channels.c1.transactionCapacity = 10
Run it with the additional flume.monitoring.type/port parameters:
flume-ng agent -n a1 -c conf -f flume.conf -Dflume.root.logger=INFO,console -Dflume.monitoring.type=http -Dflume.monitoring.port=44123
Then monitor the output in a browser at localhost:44123/metrics:
{"CHANNEL.c1":{"ChannelCapacity":"100","ChannelFillPercentage":"0.0","Type":"CHANNEL","EventTakeSuccessCount":"570448","ChannelSize":"0","EventTakeAttemptCount":"570573","StartTime":"1567002601836","EventPutAttemptCount":"570449","EventPutSuccessCount":"570448","StopTime":"0"}}
Just try some load:
dd if=/dev/urandom count=1024 bs=1024 | base64 | nc localhost 44444
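To keep an eye on the counters from a shell instead of a browser, you can poll the same endpoint; a small sketch using standard curl and watch (assuming both are installed):
watch -n 5 'curl -s http://localhost:44123/metrics'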

FLUME EXCEPTION

I am trying to configure Flume and am following this link. The following command works for me:
flume-ng agent -n TwitterAgent -c conf -f /usr/lib/apache-flume-1.7.0-bin/conf/flume.conf
The result I got, including the error, is:
17/01/31 12:04:08 INFO source.DefaultSourceFactory: Creating instance of source Twitter, type com.cloudera.flume.source.TwitterSource
17/01/31 12:04:08 ERROR node.PollingPropertiesFileConfigurationProvider: Failed to load configuration data.
Exception follows. org.apache.flume.FlumeException:
Unable to load source type:
com.cloudera.flume.source.TwitterSource, class:
com.cloudera.flume.source.TwitterSource.
(This is only part of the output; I copied just the error portion.)
Can anyone help me solve this error, please? I need to fix it to get to step 24, which is the last step.
Please find the CDH 5.12 Flume Twitter setup below:
1. Here is file /usr/lib/flume-ng/conf/flume.conf:
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken = xxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.keywords = Hadoop,BigData
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://quickstart.cloudera:8020/user/cloudera/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
2. Copy the flume-env.sh.template file to flume-env.sh:
~]$ sudo cp /usr/lib/flume-ng/conf/flume-env.sh.template /usr/lib/flume-ng/conf/flume-env.sh
3. Set JAVA_HOME and FLUME_CLASSPATH in flume-env.sh file as:
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
FLUME_CLASSPATH="/usr/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar"
4. If you don't find "/usr/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar" on your system, download apache-flume-1.6.0-bin and replace the current lib folder with its lib folder.
Link: https://www.apache.org/dist/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz
4.1. Rename old lib folder
4.2. Download the tarball from the link above to your Cloudera desktop and do the following:
~]$ sudo mv /usr/lib/flume-ng/lib /usr/lib/flume-ng/lib_cloudera
~]$ sudo mv /home/cloudera/Desktop/apache-flume-1.6.0-bin/lib /usr/lib/flume-ng/lib
5. Now run Flume Agent Command:
~]$ flume-ng agent --conf-file /usr/lib/flume-ng/conf/flume.conf --name TwitterAgent -Dflume.root.logger=INFO,console
This should run successfully.
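To confirm that tweets are landing, list the sink directory from the flume.conf above (a standard HDFS shell command, assuming the hdfs client is on your PATH):
~]$ hdfs dfs -ls /user/cloudera/flume/tweets/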
All the Best.

The run result of flume and test flume

[screenshots of the flume run output]
My flume configuration file is as follows:
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/flume-1.5.0-bin/log_exec_tail
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
And I start my flume agent with the following script:
bin/flume-ng agent -n a1 -c conf -f conf/flume_log.conf -Dflume.root.logger=INFO,console
Question 1: The run result is as shown above; however, I don't know whether it ran successfully or not!
Question 2: The instructions contain the following sentences, and I don't understand what is meant by this Flume test:
NOTE: To test that the Flume agent is running properly, open a new terminal window and change directories to /home/horton/solutions/:
horton#ip:~$ cd /home/horton/solutions/
Run the following script, which writes log entries to nodemanager.log:
$ ./test_flume_log.sh
If successful, you should see new files in the /user/horton/flume_sink directory in HDFS
Stop the logagent Flume agent
As per your flume configuration, whenever the file /home/hadoop/flume-1.5.0-bin/log_exec_tail changes, tail will pick up the new content and the logger sink will print it to the console.
So to test that it is working correctly:
1. Run the command bin/flume-ng agent -n a1 -c conf -f conf/flume_log.conf -Dflume.root.logger=INFO,console
2. Open a terminal and add a few lines to the file /home/hadoop/flume-1.5.0-bin/log_exec_tail (see the example after this list)
3. Save it
4. Now check the terminal where you launched the flume command
5. You should see the newly added lines displayed
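For step 2, a one-line append from the shell is enough to trigger the tail source; this is plain shell, using the file path from your configuration:
echo "new test line $(date)" >> /home/hadoop/flume-1.5.0-bin/log_exec_tail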

Can Apache Avro write to the network?

I am trying to write a huge amount of logs to HDFS. For that I am using Flume with HDFS as the sink and Avro as the source. What I need to do is serialize my logs using Avro and send them over the network to Flume. The Flume source is configured as:
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
Use Flume's RpcClient:
RpcClient client = RpcClientFactory.getDefaultInstance(host, 4141);
client.append(EventBuilder.withBody(message, StandardCharsets.UTF_8)); // withBody needs a Charset for a String body
client.close();
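A self-contained sketch of a sender, assuming the Flume client SDK (flume-ng-sdk) is on the classpath and the agent above is reachable on localhost; the class name and message text are made up for illustration:
import java.nio.charset.StandardCharsets;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FlumeLogSender {
    public static void main(String[] args) throws EventDeliveryException {
        // Connect to the avro source configured above (bind 0.0.0.0, port 4141)
        RpcClient client = RpcClientFactory.getDefaultInstance("localhost", 4141);
        try {
            // Each append() sends one event; the call blocks until the source acks it
            Event event = EventBuilder.withBody("sample log line", StandardCharsets.UTF_8);
            client.append(event);
        } finally {
            client.close(); // release the underlying connection
        }
    }
}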

loading file into hdfs using flume

I want to load a text file from my system into HDFS.
This is my conf file:
agent.sources = seqGenSrc
agent.sinks = loggerSink
agent.channels = memoryChannel
agent.sources.seqGenSrc.type = exec
agent.sources.seqGenSrc.command = tail -F my.system.IP/D:/salespeople.txt
agent.sinks.loggerSink.type = hdfs
agent.sinks.loggerSink.hdfs.path = hdfs://IP.address:port:user/flume
agent.sinks.loggerSink.hdfs.filePrefix = events-
agent.sinks.loggerSink.hdfs.round = true
agent.sinks.loggerSink.hdfs.roundValue = 10
agent.sinks.loggerSink.hdfs.roundUnit = minute
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 1000
agent.channels.memoryChannel.transactionCapacity = 100
agent.sources.seqGenSrc.channels = memoryChannel
agent.sinks.loggerSink.channel = memoryChannel
When I run it, I get the following output, and then it gets stuck:
13/07/23 16:30:44 INFO nodemanager.DefaultLogicalNodeManager: Starting Channel memoryChannel
13/07/23 16:30:44 INFO nodemanager.DefaultLogicalNodeManager: Waiting for channel: memoryChannel to start. Sleeping for 500 ms
13/07/23 16:30:44 INFO nodemanager.DefaultLogicalNodeManager: Starting Sink loggerSink
13/07/23 16:30:44 INFO nodemanager.DefaultLogicalNodeManager: Starting Source seqGenSrc
13/07/23 16:30:44 INFO source.ExecSource: Exec source starting with command:tail -F 10.48.226.27/D:/salespeople.txt
Where am I going wrong, or what could be the error?
I assume you want to write your file to /user/flume, so your path should be:
agent.sinks.loggerSink.hdfs.path = hdfs://IP.address:port/user/flume
As your agent uses tail -F, there is no message that tells you it is finished (because it never is ^^). If you want to know whether your file has been created, you have to look at the /user/flume folder.
I'm using a configuration like yours and it works perfectly. You could try using -Dflume.root.logger=INFO,console to get more information.
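For example, to check whether files are showing up in the sink directory, a standard HDFS shell command (assuming the hdfs client is on your PATH):
hdfs dfs -ls /user/flume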
