Flume Multiplexing not working - flume

I have configured my flume agent like below. Somehow, the flume agent doesn't run properly. It keeps hanging without any errors. Is there any problem with the below configuration.
FYI: I have a file named "country" with hard-coded header as state
#Define sources, sink and channels
foo.sources = s1
foo.channels = chn-az chn-oth
foo.sinks = sink-az sink-oth
#
### # # Define a source on agent and connect to channel memory-channel.
foo.sources.s1.type = exec
foo.sources.s1.command = cat /home/hadoop/flume/country.txt
foo.sources.s1.batchSize = 1
foo.sources.s1.channels = chn-ca chn-oth
#selector configuration
foo.sources.s1.selector.type = multiplexing
foo.sources.s1.selector.header = state
foo.sources.s1.selector.mapping.AZ = chn-az
foo.sources.s1.selector.default = chn-oth
#
#
### Define a memory channel on agent called memory-channel.
foo.channels.chn-az.type = memory
foo.channels.chn-oth.type = memory
#
#
##Define sinks that outputs to hdfs.
foo.sinks.sink-az.channel = chn-az
foo.sinks.sink-az.type = hdfs
foo.sinks.sink-az.hdfs.path = hdfs://master:9099/user/hadoop/flume
foo.sinks.sink-az.hdfs.filePrefix = statefilter
foo.sinks.sink-az.hdfs.fileType = DataStream
foo.sinks.sink-az.hdfs.writeFormat = Text
foo.sinks.sink-az.batchSize = 1
foo.sinks.sink-az.rollInterval = 0
#
foo.sinks.sink-oth.channel = chn-oth
foo.sinks.sink-oth.type = hdfs
foo.sinks.sink-oth.hdfs.path = hdfs://master:9099/user/hadoop/flume
foo.sinks.sink-oth.hdfs.filePrefix = statefilter
foo.sinks.sink-oth.hdfs.fileType = DataStream
foo.sinks.sink-oth.batchSize = 1
foo.sinks.sink-oth.rollInterval = 0
Thanks,
Vinoth

Regarding the channels list configured at the source:
foo.sources.s1.channels = chn-ca chn-oth
I think chn-ca should be chn-az.
Nevertheless, I think such a configuration will never work since the "state" header used by the selector is not created by any Flume component. You must introduce an interceptor for that, typically the Regex Extractor Interceptor.

Related

How to use multiple appenders of same type with one logger using properties file?

I have what should be a simple question but I can't figure it out. What is the correct syntax to use multiple appenders of the same type (RollingFile) with a single logger in Log4j2 properties file format?
For background, I am using Karaf 4.2.7 which uses pax logging. My logging config file is in the properties format.
log4j2.appender.fileapp1.type = RollingRandomAccessFile
log4j2.appender.fileapp1.name = FileApp1
...
log4j2.appender.fileapp2.type = RollingRandomAccessFile
log4j2.appender.fileapp2.name = FileApp2
...
log4j2.logger.myloggername.name = com.acme
log4j2.logger.myloggername.appenderRef.RollingFile.ref = FileApp1, FileApp2
Putting both appenders on that last line separated by a comma does not work. It works if I have only one appender or the other. I also tried
log4j2.logger.myloggername.appenderRef.RollingFile.ref = [FileApp1, FileApp2]
log4j2.logger.myloggername.appenderRef.RollingFile.ref = {FileApp1, FileApp2}
log4j2.logger.myloggername.appenderRef.RollingFile.ref = [{FileApp1}, {FileApp2}]
None of those works. I can't seem to find any examples online of how to do this.
I refer to two web page(thanks).
log4j 2 log4j2.properties(Configuration option)
Log4J 2 Configuration: Using the Properties File
Add and define "~s".
appenders, appenderRefs,
This is notice for define what will be on next.
name=PropertiesConfig
property.filename_fileapp1 = ./logs/fileapp1.log
property.filename_fileapp2 = ./logs/fileapp2.log
appenders = console, fileapp1, fileapp2
appender.console.type = Console
appender.console.name = STDOUT
...
appender.fileapp1.type = RollingRandomAccessFile
appender.fileapp1.name = fileapp1_AppenderName
appender.fileapp1.fileName = ${filename_fileapp1}
appender.fileapp1.filePattern = ${filename_fileapp1}.%d{yyyy-MM-dd}.log
...
appender.fileapp2.type = RollingRandomAccessFile
appender.fileapp2.name = fileapp2_AppenderName
appender.fileapp2.fileName = ${filename_fileapp2}
appender.fileapp2.filePattern = ${filename_fileapp2}.%d{yyyy-MM-dd}.log
...
loggers = mylogger1
logger.mylogger1.name = com.jornathan.sample.log4j2PropertyTest
logger.mylogger1.level = info
#keep this value for testing.
logger.mylogger1.additivity = true
#Here is what you need.
logger.mylogger1.appenderRefs = fileapp1Appender, fileapp2Appender
logger.mylogger1.appenderRef.fileapp1Appender.ref = fileapp1_AppenderName
logger.mylogger1.appenderRef.fileapp2Appender.ref = fileapp2_AppenderName

Apache karaf4.2.3 - separate log file for each bundle

How to create a separate log file for each bundle deployed in karaf-4.2.3 using pax logging, which has log4j2 native style config?
I've tried with routing appender, but no results.
I am excepted to write each bundle logs in a separate log file for easy debugging.
I don't know anyway doing this automatically. But what you could do is to create for each module a separate configuration based on the root package name
log4j2.logger.xy.name = com.company.module.xy
log4j2.logger.xy.level = INFO
log4j2.logger.xy.additivity = false
log4j2.logger.xy.appenderRef.inovel.ref = XyFile
log4j2.logger.zz.name = com.company.module.zz
log4j2.logger.zz.level = INFO
log4j2.logger.zz.additivity = false
log4j2.logger.zz.appenderRef.inovel.ref = ZzFile
log4j2.logger.keycloak.name = org.keycloak
log4j2.logger.keycloak.level = INFO
log4j2.logger.keycloak.additivity = false
log4j2.logger.keycloak.appenderRef.keycloak.ref = KeycloakFile
And a ref could look like
# keyclok file appender
log4j2.appender.keycloak.type = RollingRandomAccessFile
log4j2.appender.keycloak.name = KeycloakFile
log4j2.appender.keycloak.fileName = ${karaf.data}/log/keycloak.log
log4j2.appender.keycloak.filePattern = ${karaf.data}/log/keycloak.log.%i
log4j2.appender.keycloak.append = true
log4j2.appender.keycloak.layout.type = PatternLayout
log4j2.appender.keycloak.layout.pattern = %d{ISO8601}
log4j2.appender.keycloak.policies.type = Policies
log4j2.appender.keycloak.policies.size.type = SizeBasedTriggeringPolicy
log4j2.appender.keycloak.policies.size.size = 8MB
log4j2.appender.keycloak.strategy.type = DefaultRolloverStrategy
log4j2.appender.keycloak.strategy.max = 10
This is a lot of manual work. So maybe someone come up with an automatic configuration
Sincerely
Just have a look at the official Log4j 2.x configuration coming with every Karaf distribution and have a look at the commented "Routing" section.
E.g. I've used the following in one of my projects:
# Root logger
log4j2.rootLogger.level = INFO
log4j2.rootLogger.appenderRef.RollingFile.ref = RollingFile
log4j2.rootLogger.appenderRef.RollingFile.filter.threshold.type = ThresholdFilter
log4j2.rootLogger.appenderRef.RollingFile.filter.threshold.level = WARN
log4j2.rootLogger.appenderRef.PaxOsgi.ref = PaxOsgi
log4j2.rootLogger.appenderRef.Console.ref = Console
log4j2.rootLogger.appenderRef.Console.filter.threshold.type = ThresholdFilter
log4j2.rootLogger.appenderRef.Console.filter.threshold.level = ${karaf.log.console:-OFF}
# Enable log routing...
log4j2.rootLogger.appenderRef.Routing.ref = Routing
# Loggers configuration
...
# Configure the routing (pay close attention to the escapes)...
log4j2.appender.routing.type = Routing
log4j2.appender.routing.name = Routing
log4j2.appender.routing.routes.type = Routes
log4j2.appender.routing.routes.pattern = \$\$\\\{ctx:bundle.name\}
log4j2.appender.routing.routes.bundle.type = Route
log4j2.appender.routing.routes.bundle.appender.type = RollingRandomAccessFile
log4j2.appender.routing.routes.bundle.appender.name = Bundle-\$\\\{ctx:bundle.name\}
log4j2.appender.routing.routes.bundle.appender.fileName = ${karaf.data}/log/bundle-\$\\\{ctx:bundle.name\}.log
log4j2.appender.routing.routes.bundle.appender.filePattern = ${karaf.data}/log/bundle-\$\\\{ctx:bundle.name\}.log.%d{yyyy-MM-dd}
log4j2.appender.routing.routes.bundle.appender.append = true
log4j2.appender.routing.routes.bundle.appender.layout.type = PatternLayout
log4j2.appender.routing.routes.bundle.appender.layout.pattern = ${log4j2.pattern}
log4j2.appender.routing.routes.bundle.appender.policies.type = Policies
log4j2.appender.routing.routes.bundle.appender.policies.time.type = TimeBasedTriggeringPolicy
log4j2.appender.routing.routes.bundle.appender.strategy.type = DefaultRolloverStrategy
log4j2.appender.routing.routes.bundle.appender.strategy.max = 31
That clearly worked for me. I wouldn't even think about a static configuration in OSGi! ;-)
log4j Configuration commented section on below link
https://github.com/apache/karaf/blob/master/assemblies/features/base/src/main/resources/resources/etc/org.ops4j.pax.logging.cfg
will log messages for each bundle to a separate file but By default karaf comes with multiple bundles this will result one log file for each bundle. So many logs file will be generated.
How it can be done for specific bundles which user have deployed on deploy folder

Flume configuration not working showing exception

I have the followinf flume configuration. I am trying to transfer a file of size 9GB to hdfs using flume from spool directory. I have the following flume configuration.
#initialize agent's source, channel and sink
wagent.sources = wavetronix
wagent.channels = memoryChannel2
wagent.sinks = flumeHDFS
# Setting the source to spool directory where the file exists
wagent.sources.wavetronix.type = spooldir
wagent.sources.wavetronix.spoolDir = /johir/WAVETRONIX/output/Yesterday
wagent.sources.wavetronix.fileHeader = false
wagent.sources.wavetronix.basenameHeader = true
#agent.sources.wavetronix.fileSuffix = .COMPLETED
# Setting the channel to memory
wagent.channels.memoryChannel2.type = memory
# Max number of events stored in the memory channel
wagent.channels.memoryChannel2.capacity = 50000
agent.channels.memoryChannel2.batchSize = 1000
wagent.channels.memoryChannel2.transactioncapacity = 1000
# Setting the sink to HDFS
wagent.sinks.flumeHDFS.type = hdfs
#agent.sinks.flumeHDFS.useLocalTimeStamp = true
wagent.sinks.flumeHDFS.hdfs.path =/user/root/WAVETRONIXFLUME/%Y-%m-%d/
wagent.sinks.flumeHDFS.hdfs.useLocalTimeStamp = true
wagent.sinks.flumeHDFS.hdfs.filePrefix= %{basename}
wagent.sinks.flumeHDFS.hdfs.fileType = DataStream
# Write format can be text or writable
wagent.sinks.flumeHDFS.hdfs.writeFormat = Text
# use a single csv file at a time
wagent.sinks.flumeHDFS.hdfs.maxOpenFiles = 1
wagent.sinks.flumeHDFS.hdfs.rollCount=0
wagent.sinks.flumeHDFS.hdfs.rollInterval=0
wagent.sinks.flumeHDFS.hdfs.rollSize = 6400000
wagent.sinks.flumeHDFS.hdfs.batchSize =1000
# never rollover based on the number of events
wagent.sinks.flumeHDFS.hdfs.rollCount = 0
# rollover file based on max time of 1 min
#agent.sinks.flumeHDFS.hdfs.rollInterval = 0
# agent.sinks.flumeHDFS.hdfs.idleTimeout = 600
# Connect source and sink with channel
wagent.sources.wavetronix.channels = memoryChannel2
wagent.sinks.flumeHDFS.channel = memoryChannel2
But I am getting the following exception.
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor"
java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1043)
at java.util.concurrent.ConcurrentHashMap.putIfAbsent(ConcurrentHashMap.java:1535)
at java.lang.ClassLoader.getClassLoadingLock(ClassLoader.java:463)
at java.lang.ClassLoader.loadClass(ClassLoader.java:404)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.log4j.spi.LoggingEvent.(LoggingEvent.java:165)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at org.slf4j.impl.Log4jLoggerAdapter.warn(Log4jLoggerAdapter.java:479)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:461)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:745)
Can anyone help me to solve this problem?
Please edit the file ${FLUME_HOME}/conf/flume-env.sh, then add following code:
export JAVA_OPTS="-Xms1000m -Xmx12000m -Dcom.sun.management.jmxremote"
You can adjust the options "Xmx" and "Xms".

Graphite carbon-relay not working

I have two graphite setup and I am trying to relay the traffic between the two, but somehow the carbon-relay is not working.
My cache runs on 2003/2004 and relay on 2013/2014
Following are the configurations done :
#carbon file
[cache:b]
LINE_RECEIVER_PORT = 2003
PICKLE_RECEIVER_PORT = 2004
CACHE_QUERY_PORT = 7012
[relay]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2013
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2014
RELAY_METHOD = rules
REPLICATION_FACTOR = 1
DESTINATIONS = 127.0.0.1:2003:a, aa.bb.cc.dd:2003:b
#relay-rules file
[default]
default = true
destinations = 127.0.0.1:2003:a, aa.bb.cc.dd:2003:b
Any pointers will be helpful
As part of the recent project at work, I've figured out that carbon demons uses PICKLE protocol when sending data to the destinations.
So the destination of carbon-relay should be carbon-cache's pickle receiver port instead.
#carbon.conf
....
[relay]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2013
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2014
RELAY_METHOD = rules
REPLICATION_FACTOR = 1
DESTINATIONS = 127.0.0.1:2004:a, aa.bb.cc.dd:2004:b
Also modify the relay-rules.conf with the same destinations specified in carbon.conf
relay-rules.conf
.....
[default]
default = true
destinations = 127.0.0.1:2004:a, aa.bb.cc.dd:2004:b

how to improve apache flume performance to write data in hbase

I m using apache-flume1.4.0 with hbase0.94.10 and hadoop1.1.2.
flume agent have spool directory as source and hbase as sink and file channel.It is running successfully but very slow.what should I do for improving hbase write performance.
Flume agent conf is as below:
agent1.sources = spool
agent1.channels = fileChannel
agent1.sinks = sink
agent1.sources.spool.type = spooldir
agent1.sources.spool.spoolDir = /opt/spoolTest/
agent1.sources.spool.fileSuffix = .completed
agent1.sources.spool.channels = fileChannel
#agent1.sources.spool.deletePolicy = immediate
agent1.sinks.sink.type = org.apache.flume.sink.hbase.HBaseSink
agent1.sinks.sink.channel = fileChannel
agent1.sinks.sink.table = test
agent1.sinks.sink.columnFamily = log
agent1.sinks.sink.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent1.sinks.sink.serializer.regex = (.*)^C(.*)^C(.*)^C(.*)^C(.*)^C(.*)^C(.*)^C(.*)^C(.*)^C(.*)^C(.*)^C(.*)^C(.*)^C(.*)
agent1.sinks.sink.serializer.colNames = id,no_fill_reason,adInfo,locationInfo,handsetInfo,siteInfo,reportDate,ipaddress,headerContent,userParaContent,reqParaContent,otherPara,others,others1
agent1.sinks.sink1.batchSize = 100
agent1.channels.fileChannel.type = file
agent1.channels.fileChannel.checkpointDir = /usr/flumeFileChannel/chkpointFlume
agent1.channels.fileChannel.dataDirs = /usr/flumeFileChannel/dataFlume
agent1.channels.fileChannel.capacity = 10000000
agent1.channels.fileChannel.transactionCapacity = 100000
What should be capacity,transaction capacity of file channel and batch size of sink.
Please help me.
Thanks in advance.

Resources