How to fetch all the records every minute from a SQL table using Apache Flume

I am trying to get all the data from a SQL table every minute using Flume.
Can someone please suggest what config changes need to be made?
Config:
agent.channels = ch1
agent.sinks = kafkaSink
agent.sources = sql-source
agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 1000000
agent.sources.sql-source.channels = ch1
agent.sources.sql-source.type = org.keedio.flume.source.SQLSource
# URL to connect to database
agent.sources.sql-source.connection.url = jdbc:sybase:Tds:abcServer:4500
# Database connection properties
agent.sources.sql-source.user = user
agent.sources.sql-source.password = XXXXXXX
agent.sources.sql-source.table = person
agent.sources.sql-source.columns.to.select = *
# Increment column properties
agent.sources.sql-source.incremental.column.name = person_id
# Incremental value from which to start taking data from the table (0 will import the entire table)
agent.sources.sql-source.incremental.value = 0
# Query delay: the query will be run every configured number of milliseconds
agent.sources.sql-source.run.query.delay=1000
# Status file is used to save the last read row
agent.sources.sql-source.status.file.path = /dump/apache-flume-1.6.0-bin
agent.sources.sql-source.status.file.name = sql-source.status

Change the value of agent.sources.sql-source.run.query.delay to 60000 (milliseconds), so the query runs once per minute.
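In the configuration above, that single property becomes:
agent.sources.sql-source.run.query.delay = 60000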

Related

Limit of 1024 stream entries in the handler in DolphinDB subscription?

n=1000000
colNames = `time`sym`vwap
colTypes = [MINUTE,SYMBOL,DOUBLE]
tmpTrades = table(n:0, colNames, colTypes)
lastMinute = [00:00:00.000]
enableTableShareAndPersistence(table=streamTable(n:0, colNames, colTypes), tableName="vwap_stream")
go
def calcVwap(mutable vwap, mutable tmpTrades, mutable lastMinute, msg){
    tmpTrades.append!(msg)
    curMinute = time(msg.time.last().minute()*60000l)
    t = select wavg(price, qty) as vwap from tmpTrades where time < curMinute, time >= lastMinute[0] group by time.minute(), sym
    if(t.size() == 0) return
    vwap.append!(t)
    t = select * from tmpTrades where time >= curMinute
    tmpTrades.clear!()
    lastMinute[0] = curMinute
    if(t.size() > 0) tmpTrades.append!(t)
}
subscribeTable(tableName="trades_stream", actionName="vwap", offset=-1, handler=calcVwap{vwap_stream, tmpTrades, lastMinute}, msgAsTable=true)
This is what I wrote to subscribe to the stream. Even though the data is ingested into the publishing table in batches of 5000 records, the handler receives at most 1024 records at a time. Is there a limit on the subscription?
You can modify the configuration parameter maxMsgNumPerBlock to specify the maximum number of records in a message block; the default value is 1024.
In standalone mode the configuration file is dolphindb.cfg; in cluster mode it is cluster.cfg.
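For example, to let each message block carry the full 5000-record batches mentioned above, you could add a line like the following to dolphindb.cfg (or cluster.cfg) and restart the node; the value 5000 here is only an illustration:
maxMsgNumPerBlock=5000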

How to parse the config file shown to create the desired Lua table?

I want to parse a config file which has information like:
[MY_WINDOW_0]
Address = 0xA0B0C0D0
Size = 0x100
Type = cpu0
[MY_WINDOW_1]
Address = 0xB0C0D0A0
Size = 0x200
Type = cpu0
[MY_WINDOW_2]
Address = 0xC0D0A0B0
Size = 0x100
Type = cpu1
into a Lua table as follows
CPU_TRACE_WINDOWS =
{
["cpu0"] = {{address = 0xA0B0C0D0, size = 0x100},{address = 0xB0C0D0A0, size = 0x200},}
["cpu1"] = {{address = 0xC0D0A0B0, size = 0x100},...}
}
I tried my best with some basic Lua string manipulation functions but couldn't get the output I'm looking for because the same keys ('Address', 'Size', 'Type', etc.) repeat in every section. Also, my actual config file is huge, with 20 such sections.
This is as far as I got; it is basically one section of the code, and the rest would just repeat the logic.
OriginalConfigFile = "test.cfg"
os.execute("cls")
CPU_TRACE_WINDOWS = {}
local bus
for line in io.lines(OriginalConfigFile) do
    if string.find(line, "Type") ~= nil then
        bus = string.gsub(line, "%a=%a", "")
        k,v = string.match(bus, "(%w+) = (%w+)")
        table.insert(CPU_TRACE_WINDOWS, v)
    end
end
Basically I'm having trouble coming up with the FINAL TABLE STRUCTURE that I need. v here holds the different values found to the right of "Type". I'm having issues arranging it in the table. I'm currently working on a solution, but I thought I could ask for help in the meantime.
This should work for you. Just change the filename to wherever you have your config file stored.
f, Address, Size, Type = io.input("configfile"), "", "", ""
CPU_TRACE_WINDOWS = {}
for line in f:lines() do
    if line:find("MY_WINDOW") then
        -- new section: reset the captured values
        Type = ""
        Address = ""
        Size = ""
    elseif line:find("=") then
        -- e.g. "Address = 0xA0B0C0D0" sets the global Address to "0xA0B0C0D0"
        _G[line:match("^%a+")] = line:match("[%d%a]+$")
        if line:match("Type") then
            if not CPU_TRACE_WINDOWS[Type] then
                CPU_TRACE_WINDOWS[Type] = {}
            end
            table.insert(CPU_TRACE_WINDOWS[Type], {address = Address, size = Size})
        end
    end
end
It searches for the MY_WINDOW phrase and resets the variables. If the subtable already exists within CPU_TRACE_WINDOWS, it just appends a new entry; otherwise it creates the subtable first. Note that this depends on Type always being the last entry in each section; if the order changes anywhere, it will not have all the required information. There may be a cleaner way to do it, but this works (tested on my end).
Edit: Whoops, forgot to change the variables in the middle there if MY_WINDOW matched. That needed to be corrected.
Edit 2: Cleaned up the redundancy with table.insert. Only need it once, just need to make sure the table is created first.
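If you want address and size stored as numbers (as in the desired table shown in the question) rather than strings, Lua's tonumber understands the 0x prefix, so the insert line can convert on the fly. This is just an optional variation of the table.insert line above:
table.insert(CPU_TRACE_WINDOWS[Type], {address = tonumber(Address), size = tonumber(Size)})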

Flume configuration not working, showing an exception

I am trying to transfer a file of size 9 GB to HDFS using Flume from a spool directory. I have the following Flume configuration.
#initialize agent's source, channel and sink
wagent.sources = wavetronix
wagent.channels = memoryChannel2
wagent.sinks = flumeHDFS
# Setting the source to spool directory where the file exists
wagent.sources.wavetronix.type = spooldir
wagent.sources.wavetronix.spoolDir = /johir/WAVETRONIX/output/Yesterday
wagent.sources.wavetronix.fileHeader = false
wagent.sources.wavetronix.basenameHeader = true
#agent.sources.wavetronix.fileSuffix = .COMPLETED
# Setting the channel to memory
wagent.channels.memoryChannel2.type = memory
# Max number of events stored in the memory channel
wagent.channels.memoryChannel2.capacity = 50000
agent.channels.memoryChannel2.batchSize = 1000
wagent.channels.memoryChannel2.transactioncapacity = 1000
# Setting the sink to HDFS
wagent.sinks.flumeHDFS.type = hdfs
#agent.sinks.flumeHDFS.useLocalTimeStamp = true
wagent.sinks.flumeHDFS.hdfs.path =/user/root/WAVETRONIXFLUME/%Y-%m-%d/
wagent.sinks.flumeHDFS.hdfs.useLocalTimeStamp = true
wagent.sinks.flumeHDFS.hdfs.filePrefix= %{basename}
wagent.sinks.flumeHDFS.hdfs.fileType = DataStream
# Write format can be text or writable
wagent.sinks.flumeHDFS.hdfs.writeFormat = Text
# use a single csv file at a time
wagent.sinks.flumeHDFS.hdfs.maxOpenFiles = 1
wagent.sinks.flumeHDFS.hdfs.rollCount=0
wagent.sinks.flumeHDFS.hdfs.rollInterval=0
wagent.sinks.flumeHDFS.hdfs.rollSize = 6400000
wagent.sinks.flumeHDFS.hdfs.batchSize =1000
# never rollover based on the number of events
wagent.sinks.flumeHDFS.hdfs.rollCount = 0
# rollover file based on max time of 1 min
#agent.sinks.flumeHDFS.hdfs.rollInterval = 0
# agent.sinks.flumeHDFS.hdfs.idleTimeout = 600
# Connect source and sink with channel
wagent.sources.wavetronix.channels = memoryChannel2
wagent.sinks.flumeHDFS.channel = memoryChannel2
But I am getting the following exception.
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor"
java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1043)
at java.util.concurrent.ConcurrentHashMap.putIfAbsent(ConcurrentHashMap.java:1535)
at java.lang.ClassLoader.getClassLoadingLock(ClassLoader.java:463)
at java.lang.ClassLoader.loadClass(ClassLoader.java:404)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.log4j.spi.LoggingEvent.<init>(LoggingEvent.java:165)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at org.slf4j.impl.Log4jLoggerAdapter.warn(Log4jLoggerAdapter.java:479)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:461)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:745)
Can anyone help me to solve this problem?
Please edit the file ${FLUME_HOME}/conf/flume-env.sh and add the following line:
export JAVA_OPTS="-Xms1000m -Xmx12000m -Dcom.sun.management.jmxremote"
You can adjust the "-Xms" and "-Xmx" values to fit the memory available on your machine.

Flume Multiplexing not working

I have configured my Flume agent as below. Somehow, the agent doesn't run properly; it keeps hanging without any errors. Is there any problem with the configuration below?
FYI: I have a file named "country" with a hard-coded header called state.
#Define sources, sink and channels
foo.sources = s1
foo.channels = chn-az chn-oth
foo.sinks = sink-az sink-oth
#
### # # Define a source on agent and connect to channel memory-channel.
foo.sources.s1.type = exec
foo.sources.s1.command = cat /home/hadoop/flume/country.txt
foo.sources.s1.batchSize = 1
foo.sources.s1.channels = chn-ca chn-oth
#selector configuration
foo.sources.s1.selector.type = multiplexing
foo.sources.s1.selector.header = state
foo.sources.s1.selector.mapping.AZ = chn-az
foo.sources.s1.selector.default = chn-oth
#
#
### Define a memory channel on agent called memory-channel.
foo.channels.chn-az.type = memory
foo.channels.chn-oth.type = memory
#
#
##Define sinks that outputs to hdfs.
foo.sinks.sink-az.channel = chn-az
foo.sinks.sink-az.type = hdfs
foo.sinks.sink-az.hdfs.path = hdfs://master:9099/user/hadoop/flume
foo.sinks.sink-az.hdfs.filePrefix = statefilter
foo.sinks.sink-az.hdfs.fileType = DataStream
foo.sinks.sink-az.hdfs.writeFormat = Text
foo.sinks.sink-az.batchSize = 1
foo.sinks.sink-az.rollInterval = 0
#
foo.sinks.sink-oth.channel = chn-oth
foo.sinks.sink-oth.type = hdfs
foo.sinks.sink-oth.hdfs.path = hdfs://master:9099/user/hadoop/flume
foo.sinks.sink-oth.hdfs.filePrefix = statefilter
foo.sinks.sink-oth.hdfs.fileType = DataStream
foo.sinks.sink-oth.batchSize = 1
foo.sinks.sink-oth.rollInterval = 0
Thanks,
Vinoth
Regarding the channels list configured at the source:
foo.sources.s1.channels = chn-ca chn-oth
I think chn-ca should be chn-az.
Nevertheless, I think such a configuration will never work since the "state" header used by the selector is not created by any Flume component. You must introduce an interceptor for that, typically the Regex Extractor Interceptor.
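For example, a Regex Extractor Interceptor on the source can create the "state" header before the selector runs. The property names below come from Flume's regex_extractor interceptor; the regex itself is only a placeholder, since it depends on the actual line format of country.txt (here it is assumed the state code is the last comma-separated field on each line):
foo.sources.s1.interceptors = i1
foo.sources.s1.interceptors.i1.type = regex_extractor
foo.sources.s1.interceptors.i1.regex = ,(\\w+)$
foo.sources.s1.interceptors.i1.serializers = ser1
foo.sources.s1.interceptors.i1.serializers.ser1.name = state
With the header populated (and chn-ca corrected to chn-az), events whose state is AZ go to chn-az and everything else falls through to chn-oth.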

Scapy DNS sniff with additional records

I have a Python/Scapy sniffer for DNS.
I am able to sniff DNS messages and get the IP/UDP source and destination addresses and ports as well as the DNS fields, but I have problems parsing and getting the additional answers and additional records when there is more than one.
From ls(DNS), ls(DNSQR) and ls(DNSRR) I can see which DNS data is available, but I don't know how to get the additional records.
I would appreciate some help or a solution to work this out.
My Python/Scapy script is below.
#!/usr/bin/env python
from scapy.all import *
from datetime import datetime
import time
import datetime
import sys
############# MODIFY THIS PART IF NECESSARY ###############
interface = 'eth0'
filter_bpf = 'udp and port 53'
# ------ SELECT/FILTER MSGS
def select_DNS(pkt):
    pkt_time = pkt.sprintf('%sent.time%')
    # ------ SELECT/FILTER DNS MSGS
    try:
        if DNSQR in pkt and pkt.dport == 53:
            # queries
            print '[**] Detected DNS Message at: ' + pkt_time
            p_id = pkt[DNS].id
            cli_ip = pkt[IP].src
            cli_port = pkt.sport
            srv_ip = pkt[IP].dst
            srv_port = pkt.dport
            query = pkt[DNSQR].qname
            q_class = pkt[DNSQR].qclass
            qr_class = pkt[DNSQR].sprintf('%qclass%')
            type = pkt[DNSQR].sprintf('%qtype%')
        elif DNSRR in pkt and pkt.sport == 53:
            # responses
            pkt_time = pkt.sprintf('%sent.time%')
            p_id = pkt[DNS].id
            srv_ip = pkt[IP].src
            srv_port = pkt.sport
            cli_ip = pkt[IP].dst
            cli_port = pkt.dport
            response = pkt[DNSRR].rdata
            r_class = pkt[DNSRR].rclass
            rr_class = pkt[DNSRR].sprintf("%rclass%")
            type = pkt[DNSRR].sprintf("%type%")
            ttl = pkt[DNSRR].ttl
            len = pkt[DNSRR].rdlen
            print response
    except:
        pass
# ------ START SNIFFER
sniff(iface=interface, filter=filter_bpf, store=0, prn=select_DNS)
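One way to get at every answer/authority/additional record (an untested sketch rather than part of the original script; print_all_records is just a hypothetical helper name) is to walk the DNSRR layers by index with getlayer(), because Scapy chains each resource record as a further DNSRR layer inside the DNS layer:
def print_all_records(pkt):
    # record counts come straight from the DNS header
    dns = pkt[DNS]
    total = dns.ancount + dns.nscount + dns.arcount
    for i in range(total):
        rr = pkt.getlayer(DNSRR, i + 1)  # the nb argument of getlayer() is 1-based
        if rr is None:
            break  # some records (e.g. EDNS OPT) may parse as a different class
        print rr.sprintf('%rrname% %type% %rdata%')
Calling print_all_records(pkt) inside the elif DNSRR branch above would list each record, whereas pkt[DNSRR].rdata always refers to the first one only.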
