Flume configuration not working, showing exception - flume

I am trying to transfer a 9 GB file to HDFS using Flume from a spool directory. I have the following Flume configuration.
#initialize agent's source, channel and sink
wagent.sources = wavetronix
wagent.channels = memoryChannel2
wagent.sinks = flumeHDFS
# Setting the source to spool directory where the file exists
wagent.sources.wavetronix.type = spooldir
wagent.sources.wavetronix.spoolDir = /johir/WAVETRONIX/output/Yesterday
wagent.sources.wavetronix.fileHeader = false
wagent.sources.wavetronix.basenameHeader = true
#agent.sources.wavetronix.fileSuffix = .COMPLETED
# Setting the channel to memory
wagent.channels.memoryChannel2.type = memory
# Max number of events stored in the memory channel
wagent.channels.memoryChannel2.capacity = 50000
# Max number of events per transaction (note: batchSize is a source/sink setting, not a channel setting)
wagent.channels.memoryChannel2.transactionCapacity = 1000
# Setting the sink to HDFS
wagent.sinks.flumeHDFS.type = hdfs
#agent.sinks.flumeHDFS.useLocalTimeStamp = true
wagent.sinks.flumeHDFS.hdfs.path = /user/root/WAVETRONIXFLUME/%Y-%m-%d/
wagent.sinks.flumeHDFS.hdfs.useLocalTimeStamp = true
wagent.sinks.flumeHDFS.hdfs.filePrefix = %{basename}
wagent.sinks.flumeHDFS.hdfs.fileType = DataStream
# Write format can be text or writable
wagent.sinks.flumeHDFS.hdfs.writeFormat = Text
# use a single csv file at a time
wagent.sinks.flumeHDFS.hdfs.maxOpenFiles = 1
# never roll over based on the number of events or on elapsed time
wagent.sinks.flumeHDFS.hdfs.rollCount = 0
wagent.sinks.flumeHDFS.hdfs.rollInterval = 0
# roll over based on file size only
wagent.sinks.flumeHDFS.hdfs.rollSize = 6400000
wagent.sinks.flumeHDFS.hdfs.batchSize = 1000
# agent.sinks.flumeHDFS.hdfs.idleTimeout = 600
# Connect source and sink with channel
wagent.sources.wavetronix.channels = memoryChannel2
wagent.sinks.flumeHDFS.channel = memoryChannel2
But I am getting the following exception.
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor"
java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1043)
at java.util.concurrent.ConcurrentHashMap.putIfAbsent(ConcurrentHashMap.java:1535)
at java.lang.ClassLoader.getClassLoadingLock(ClassLoader.java:463)
at java.lang.ClassLoader.loadClass(ClassLoader.java:404)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.log4j.spi.LoggingEvent.<init>(LoggingEvent.java:165)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at org.slf4j.impl.Log4jLoggerAdapter.warn(Log4jLoggerAdapter.java:479)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:461)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:745)
Can anyone help me to solve this problem?

Please edit the file ${FLUME_HOME}/conf/flume-env.sh and add the following line:
export JAVA_OPTS="-Xms1000m -Xmx12000m -Dcom.sun.management.jmxremote"
You can adjust the "-Xms" and "-Xmx" options to fit the memory available on your machine.
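After updating flume-env.sh, restart the agent so the new heap settings take effect. Assuming the configuration above is saved under a hypothetical file name such as wagent.conf, the agent would be restarted with something like:
flume-ng agent --conf ${FLUME_HOME}/conf \
  --conf-file ${FLUME_HOME}/conf/wagent.conf \
  --name wagent -Dflume.root.logger=INFO,console
Note that the --name argument must match the agent prefix used in the configuration file (wagent here).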

Related

Gatling to InfluxDB connection in windows

I am using Gatling and InfluxDB on Windows 10. I am trying to send some results from Gatling to InfluxDB, but the results are not being pushed to InfluxDB. Can someone help me?
My graphite config file:
data {
  #writers = [console, file, graphite]
  console {
    #light = false
    #writePeriod = 5
  }
  file {
    bufferSize = 8192 # FileDataWriter's internal data buffer size, in bytes
  }
  leak {
    #noActivityTimeout = 30 # Period, in seconds, for which Gatling may have no activity before considering a leak may be happening
  }
  graphite {
    #light = false # only send the all* stats
    host = "localhost" # The host where the Carbon server is located
    port = 2003 # The port to which the Carbon server listens to (2003 is default for plaintext, 2004 is default for pickle)
    protocol = "tcp" # The protocol used to send data to Carbon (currently supported : "tcp", "udp")
    rootPathPrefix = "gatling" # The common prefix of all metrics sent to Graphite
    bufferSize = 8192 # Internal data buffer size, in bytes
    writePeriod = 1 # Write period, in seconds
  }
}
My InfluxDB config file is:
[[graphite]]
  enabled = true
  database = "gatlingdb"
  retention-policy = ""
  bind-address = ":2003"
  protocol = "tcp"
  consistency-level = "one"
  batch-size = 5000
  batch-pending = 10
  batch-timeout = "1s"
  udp-read-buffer = 0
  separator = "."
  templates = [
    "gatling.*.*.*.count measurement.simulation.request.status.field",
    "gatling.*.*.*.min measurement.simulation.request.status.field",
    "gatling.*.*.*.max measurement.simulation.request.status.field",
    "gatling.*.*.*.percentiles50 measurement.simulation.request.status.field",
    "gatling.*.*.*.percentiles75 measurement.simulation.request.status.field",
    "gatling.*.*.*.percentiles95 measurement.simulation.request.status.field",
    "gatling.*.*.*.percentiles99 measurement.simulation.request.status.field"
  ]
Not sure why it is not working.
Uncomment the line #writers = [console, file, graphite] in the data block. The graphite data writer is not in Gatling's default writers list, so until that line is active no metrics are sent to the Carbon listener.
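That is, the first line inside the data block becomes:
data {
  writers = [console, file, graphite]
  ...
}
Once metrics flow, you can verify they are arriving on the InfluxDB side with the influx CLI, e.g.:
influx -database gatlingdb -execute "SHOW MEASUREMENTS"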

Apache Karaf 4.2.3 - separate log file for each bundle

How do I create a separate log file for each bundle deployed in Karaf 4.2.3 using Pax Logging, which has a log4j2 native-style config?
I've tried the routing appender, but with no results.
I expect each bundle's logs to be written to a separate log file for easier debugging.
I don't know of any way to do this automatically, but what you could do is create a separate logger configuration for each module based on its root package name:
log4j2.logger.xy.name = com.company.module.xy
log4j2.logger.xy.level = INFO
log4j2.logger.xy.additivity = false
log4j2.logger.xy.appenderRef.inovel.ref = XyFile

log4j2.logger.zz.name = com.company.module.zz
log4j2.logger.zz.level = INFO
log4j2.logger.zz.additivity = false
log4j2.logger.zz.appenderRef.inovel.ref = ZzFile

log4j2.logger.keycloak.name = org.keycloak
log4j2.logger.keycloak.level = INFO
log4j2.logger.keycloak.additivity = false
log4j2.logger.keycloak.appenderRef.keycloak.ref = KeycloakFile
And the referenced appender could look like this:
# keycloak file appender
log4j2.appender.keycloak.type = RollingRandomAccessFile
log4j2.appender.keycloak.name = KeycloakFile
log4j2.appender.keycloak.fileName = ${karaf.data}/log/keycloak.log
log4j2.appender.keycloak.filePattern = ${karaf.data}/log/keycloak.log.%i
log4j2.appender.keycloak.append = true
log4j2.appender.keycloak.layout.type = PatternLayout
log4j2.appender.keycloak.layout.pattern = %d{ISO8601}
log4j2.appender.keycloak.policies.type = Policies
log4j2.appender.keycloak.policies.size.type = SizeBasedTriggeringPolicy
log4j2.appender.keycloak.policies.size.size = 8MB
log4j2.appender.keycloak.strategy.type = DefaultRolloverStrategy
log4j2.appender.keycloak.strategy.max = 10
This is a lot of manual work, so maybe someone will come up with an automatic configuration.
Sincerely
Just have a look at the official Log4j 2.x configuration that comes with every Karaf distribution, in particular the commented-out "Routing" section.
E.g. I've used the following in one of my projects:
# Root logger
log4j2.rootLogger.level = INFO
log4j2.rootLogger.appenderRef.RollingFile.ref = RollingFile
log4j2.rootLogger.appenderRef.RollingFile.filter.threshold.type = ThresholdFilter
log4j2.rootLogger.appenderRef.RollingFile.filter.threshold.level = WARN
log4j2.rootLogger.appenderRef.PaxOsgi.ref = PaxOsgi
log4j2.rootLogger.appenderRef.Console.ref = Console
log4j2.rootLogger.appenderRef.Console.filter.threshold.type = ThresholdFilter
log4j2.rootLogger.appenderRef.Console.filter.threshold.level = ${karaf.log.console:-OFF}
# Enable log routing...
log4j2.rootLogger.appenderRef.Routing.ref = Routing
# Loggers configuration
...
# Configure the routing (pay close attention to the escapes)...
log4j2.appender.routing.type = Routing
log4j2.appender.routing.name = Routing
log4j2.appender.routing.routes.type = Routes
log4j2.appender.routing.routes.pattern = \$\$\\\{ctx:bundle.name\}
log4j2.appender.routing.routes.bundle.type = Route
log4j2.appender.routing.routes.bundle.appender.type = RollingRandomAccessFile
log4j2.appender.routing.routes.bundle.appender.name = Bundle-\$\\\{ctx:bundle.name\}
log4j2.appender.routing.routes.bundle.appender.fileName = ${karaf.data}/log/bundle-\$\\\{ctx:bundle.name\}.log
log4j2.appender.routing.routes.bundle.appender.filePattern = ${karaf.data}/log/bundle-\$\\\{ctx:bundle.name\}.log.%d{yyyy-MM-dd}
log4j2.appender.routing.routes.bundle.appender.append = true
log4j2.appender.routing.routes.bundle.appender.layout.type = PatternLayout
log4j2.appender.routing.routes.bundle.appender.layout.pattern = ${log4j2.pattern}
log4j2.appender.routing.routes.bundle.appender.policies.type = Policies
log4j2.appender.routing.routes.bundle.appender.policies.time.type = TimeBasedTriggeringPolicy
log4j2.appender.routing.routes.bundle.appender.strategy.type = DefaultRolloverStrategy
log4j2.appender.routing.routes.bundle.appender.strategy.max = 31
That clearly worked for me. I wouldn't even think about a static configuration in OSGi! ;-)
The commented Routing section of the log4j configuration at the link below
https://github.com/apache/karaf/blob/master/assemblies/features/base/src/main/resources/resources/etc/org.ops4j.pax.logging.cfg
will log the messages of each bundle to a separate file. But Karaf ships with many bundles by default, so this will result in one log file per bundle and a large number of log files will be generated.
How can it be done only for the specific bundles the user has deployed in the deploy folder?

How to fetch all the records every minute from a SQL table using Apache Flume

I am trying to get all the data from a SQL table every minute using Flume.
Can someone please suggest what config changes need to be made?
Configs:
agent.channels = ch1
agent.sinks = kafkaSink
agent.sources = sql-source
agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 1000000
agent.sources.sql-source.channels = ch1
agent.sources.sql-source.type = org.keedio.flume.source.SQLSource
# URL to connect to database
agent.sources.sql-source.connection.url = jdbc:sybase:Tds:abcServer:4500
# Database connection properties
agent.sources.sql-source.user = user
agent.sources.sql-source.password = XXXXXXX
agent.sources.sql-source.table = person
agent.sources.sql-source.columns.to.select = *
# Increment column properties
agent.sources.sql-source.incremental.column.name = person_id
# Incremental value from which you want to start taking data from the table (0 will import the entire table)
agent.sources.sql-source.incremental.value = 0
# Query delay, each configured milisecond the query will be sent
agent.sources.sql-source.run.query.delay=1000
# Status file is used to save the last row read
agent.sources.sql-source.status.file.path = /dump/apache-flume-1.6.0-bin
agent.sources.sql-source.status.file.name = sql-source.status
Change the value of agent.sources.sql-source.run.query.delay to 60000, i.e. 60,000 milliseconds, so the query runs once per minute.
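The relevant line then becomes:
# run the query once every 60000 ms (one minute)
agent.sources.sql-source.run.query.delay = 60000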

Flume Multiplexing not working

I have configured my Flume agent as below. Somehow, the agent doesn't run properly; it keeps hanging without any errors. Is there a problem with the configuration below?
FYI: I have a file named "country" with a hard-coded header field named state.
#Define sources, sink and channels
foo.sources = s1
foo.channels = chn-az chn-oth
foo.sinks = sink-az sink-oth
#
### # # Define a source on agent and connect to channel memory-channel.
foo.sources.s1.type = exec
foo.sources.s1.command = cat /home/hadoop/flume/country.txt
foo.sources.s1.batchSize = 1
foo.sources.s1.channels = chn-ca chn-oth
#selector configuration
foo.sources.s1.selector.type = multiplexing
foo.sources.s1.selector.header = state
foo.sources.s1.selector.mapping.AZ = chn-az
foo.sources.s1.selector.default = chn-oth
#
#
### Define a memory channel on agent called memory-channel.
foo.channels.chn-az.type = memory
foo.channels.chn-oth.type = memory
#
#
##Define sinks that outputs to hdfs.
foo.sinks.sink-az.channel = chn-az
foo.sinks.sink-az.type = hdfs
foo.sinks.sink-az.hdfs.path = hdfs://master:9099/user/hadoop/flume
foo.sinks.sink-az.hdfs.filePrefix = statefilter
foo.sinks.sink-az.hdfs.fileType = DataStream
foo.sinks.sink-az.hdfs.writeFormat = Text
foo.sinks.sink-az.batchSize = 1
foo.sinks.sink-az.rollInterval = 0
#
foo.sinks.sink-oth.channel = chn-oth
foo.sinks.sink-oth.type = hdfs
foo.sinks.sink-oth.hdfs.path = hdfs://master:9099/user/hadoop/flume
foo.sinks.sink-oth.hdfs.filePrefix = statefilter
foo.sinks.sink-oth.hdfs.fileType = DataStream
foo.sinks.sink-oth.batchSize = 1
foo.sinks.sink-oth.rollInterval = 0
Thanks,
Vinoth
Regarding the channels list configured at the source:
foo.sources.s1.channels = chn-ca chn-oth
I think chn-ca should be chn-az.
Nevertheless, I think such a configuration will never work since the "state" header used by the selector is not created by any Flume component. You must introduce an interceptor for that, typically the Regex Extractor Interceptor.
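For instance, assuming each line of country.txt starts with a two-letter state code (an assumption about your data format), a minimal sketch of such an interceptor could be:
# hypothetical regex: copy the leading two-letter state code into the "state" header
foo.sources.s1.interceptors = i1
foo.sources.s1.interceptors.i1.type = regex_extractor
foo.sources.s1.interceptors.i1.regex = ^([A-Z]{2})
foo.sources.s1.interceptors.i1.serializers = t1
foo.sources.s1.interceptors.i1.serializers.t1.name = state
With the header populated, the multiplexing selector can route AZ events to chn-az and everything else to chn-oth.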

How to improve Apache Flume performance when writing data to HBase

I am using apache-flume 1.4.0 with hbase 0.94.10 and hadoop 1.1.2.
The Flume agent has a spool directory as source, HBase as sink, and a file channel. It runs successfully but very slowly. What should I do to improve the HBase write performance?
The Flume agent conf is as below:
agent1.sources = spool
agent1.channels = fileChannel
agent1.sinks = sink
agent1.sources.spool.type = spooldir
agent1.sources.spool.spoolDir = /opt/spoolTest/
agent1.sources.spool.fileSuffix = .completed
agent1.sources.spool.channels = fileChannel
#agent1.sources.spool.deletePolicy = immediate
agent1.sinks.sink.type = org.apache.flume.sink.hbase.HBaseSink
agent1.sinks.sink.channel = fileChannel
agent1.sinks.sink.table = test
agent1.sinks.sink.columnFamily = log
agent1.sinks.sink.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent1.sinks.sink.serializer.regex = (.*)^C(.*)^C(.*)^C(.*)^C(.*)^C(.*)^C(.*)^C(.*)^C(.*)^C(.*)^C(.*)^C(.*)^C(.*)^C(.*)
agent1.sinks.sink.serializer.colNames = id,no_fill_reason,adInfo,locationInfo,handsetInfo,siteInfo,reportDate,ipaddress,headerContent,userParaContent,reqParaContent,otherPara,others,others1
agent1.sinks.sink.batchSize = 100
agent1.channels.fileChannel.type = file
agent1.channels.fileChannel.checkpointDir = /usr/flumeFileChannel/chkpointFlume
agent1.channels.fileChannel.dataDirs = /usr/flumeFileChannel/dataFlume
agent1.channels.fileChannel.capacity = 10000000
agent1.channels.fileChannel.transactionCapacity = 100000
What should the capacity and transactionCapacity of the file channel and the batchSize of the sink be?
Please help me.
Thanks in advance.
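As a rough, hedged starting point (the numbers below are illustrative assumptions, not values validated for this workload): raise the sink batch size well above the default of 100 and keep the channel's transactionCapacity at or above the sink batch size, then tune against your own throughput tests.
# illustrative values only
agent1.sinks.sink.batchSize = 1000
agent1.channels.fileChannel.transactionCapacity = 10000
agent1.channels.fileChannel.capacity = 10000000
A larger sink batch size means fewer, bigger batches of Puts sent to HBase per file-channel transaction, which is usually the first lever for HBase sink throughput.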
