Source data from syslog into Flume

I tried to set up a Flume agent to source data from a syslog server.
Basically, I have set up a syslog server on a server (call it server1) to receive syslog events; it then forwards all messages to a different server (server2) where the Flume agent is installed, and finally all data is sunk to a Kafka cluster.
The Flume configuration is below.
# For each one of the sources, the type is defined
agent.sources.syslogSrc.type = syslogudp
agent.sources.syslogSrc.port = 9090
agent.sources.syslogSrc.host = server2
# The channel can be defined as follows.
agent.sources.syslogSrc.channels = memoryChannel
# Each channel's type is defined.
agent.channels.memoryChannel.type = memory
# Other config values specific to each type of channel (sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 100
# config for kafka sink
agent.sinks.kafkaSink.channel = memoryChannel
agent.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkaSink.kafka.topic = flume
agent.sinks.kafkaSink.kafka.bootstrap.servers = <kafka.broker.list>:9092
agent.sinks.kafkaSink.kafka.flumeBatchSize = 20
agent.sinks.kafkaSink.kafka.producer.acks = 1
agent.sinks.kafkaSink.kafka.producer.linger.ms = 1
agent.sinks.kafkaSink.kafka.producer.compression.type = snappy
But somehow the syslog data is not getting ingested into the Flume agent.
I'd appreciate your advice.

"I have set up a syslog server on a server (server1)"
The syslogudp source must bind to the server1 host:
agent.sources.syslogSrc.host = server1
"then forward all messages to a different server (server2)"
That different server refers to the sink:
agent.sinks.kafkaSink.kafka.bootstrap.servers = server2:9092
A Flume agent is only a process that hosts these components (source, channel, sink) to facilitate the flow of events.
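Putting those two changes together, a minimal sketch of the revised agent could look like the following (the component declarations at the top are included in case they were simply omitted from the snippet in the question, and the hostnames are placeholders for the actual environment):
agent.sources = syslogSrc
agent.channels = memoryChannel
agent.sinks = kafkaSink

# syslogudp source bound to the host where the syslog messages arrive
agent.sources.syslogSrc.type = syslogudp
agent.sources.syslogSrc.host = server1
agent.sources.syslogSrc.port = 9090
agent.sources.syslogSrc.channels = memoryChannel

agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 100

# Kafka sink pointing at the broker list, here assumed to be on server2
agent.sinks.kafkaSink.channel = memoryChannel
agent.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkaSink.kafka.topic = flume
agent.sinks.kafkaSink.kafka.bootstrap.servers = server2:9092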

Related

Can't determine why inputs.mqtt_consumer is disconnecting and not sending data to influxdb via telegraf

I'm using MQTT, Telegraf and InfluxDB.
Telegraf and InfluxDB are installed and work fine.
I can execute the command mosquitto_sub ... and I get my data,
but when I try to use inputs.mqtt_consumer in the Telegraf config, I get nothing and it disconnects very quickly from the server.
Here is my Telegraf config.
Telegraf Configuration
# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs, and sent to the declared outputs.
#
# Plugins must be declared in here to be active.
# To deactivate a plugin, comment out the name and any variables.
#
# Use 'telegraf -config telegraf.conf -test' to see what metrics a config
# file would generate.
#
# Environment variables can be used anywhere in this config file, simply surround
# them with ${}. For strings the variable must be within quotes (ie, "${STR_VAR}"),
# for numbers and booleans they should be plain (ie, ${INT_VAR}, ${BOOL_VAR})
# Global tags can be specified here in key="value" format.
[global_tags]
# dc = "us-east-1" # will tag all metrics with dc=us-east-1
# rack = "1a"
## Environment variables can be used as tags, and throughout the config file
# user = "$USER"
# Configuration for telegraf agent
[agent]
## Default data collection interval for all inputs
interval = "20s"
## Rounds collection interval to 'interval'
## ie, if interval="10s" then always collect on :00, :10, :20, etc.
round_interval = true
## Telegraf will send metrics to outputs in batches of at most
## metric_batch_size metrics.
## This controls the size of writes that Telegraf sends to output plugins.
metric_batch_size = 1000
## Maximum number of unwritten metrics per output. Increasing this value
## allows for longer periods of output downtime without dropping metrics at the
## cost of higher maximum memory usage.
metric_buffer_limit = 10000
## Collection jitter is used to jitter the collection by a random amount.
## Each plugin will sleep for a random time within jitter before collecting.
## This can be used to avoid many plugins querying things like sysfs at the
## same time, which can have a measurable effect on the system.
collection_jitter = "0s"
## Default flushing interval for all outputs. Maximum flush_interval will be
## flush_interval + flush_jitter
flush_interval = "10s"
## Jitter the flush interval by a random amount. This is primarily to avoid
## large write spikes for users running a large number of telegraf instances.
## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
flush_jitter = "0s"
## By default or when set to "0s", precision will be set to the same
## timestamp order as the collection interval, with the maximum being 1s.
## ie, when interval = "10s", precision will be "1s"
## when interval = "250ms", precision will be "1ms"
## Precision will NOT be used for service inputs. It is up to each individual
## service input to set the timestamp at the appropriate precision.
## Valid time units are "ns", "us" (or "µs"), "ms", "s".
precision = ""
## Log at debug level.
debug = true
## Log only error level messages.
# quiet = false
## Log target controls the destination for logs and can be one of "file",
## "stderr" or, on Windows, "eventlog". When set to "file", the output file
## is determined by the "logfile" setting.
# logtarget = "file"
## Name of the file to be logged to when using the "file" logtarget. If set to
## the empty string then logs are written to stderr.
logfile = "log"
## The logfile will be rotated after the time interval specified. When set
## to 0 no time based rotation is performed. Logs are rotated only when
## written to, if there is no log activity rotation may be delayed.
# logfile_rotation_interval = "0d"
## The logfile will be rotated when it becomes larger than the specified
## size. When set to 0 no size based rotation is performed.
# logfile_rotation_max_size = "0MB"
## Maximum number of rotated archives to keep, any older logs are deleted.
## If set to -1, no archives are removed.
# logfile_rotation_max_archives = 5
## Override default hostname, if empty use os.Hostname()
hostname = ""
## If set to true, do not set the "host" tag in the telegraf agent.
omit_hostname = false
###############################################################################
# OUTPUT PLUGINS #
###############################################################################
# Configuration for sending metrics to InfluxDB
[[outputs.influxdb]]
## The full HTTP or UDP URL for your InfluxDB instance.
##
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
# urls = ["unix:///var/run/influxdb.sock"]
# urls = ["udp://127.0.0.1:8089"]
urls = ["http://127.0.0.1:8086"]
## The target database for metrics; will be created as needed.
## For UDP url endpoint database needs to be configured on server side.
database = "telegraf"
## The value of this tag will be used to determine the database. If this
## tag is not set the 'database' option is used as the default.
database_tag = "telegraf"
## If true, the 'database_tag' will not be included in the written metric.
# exclude_database_tag = false
## If true, no CREATE DATABASE queries will be sent. Set to true when using
## Telegraf with a user without permissions to create databases or when the
## database already exists.
# skip_database_creation = false
## Name of existing retention policy to write to. Empty string writes to
## the default retention policy. Only takes effect when using HTTP.
# retention_policy = ""
## The value of this tag will be used to determine the retention policy. If this
## tag is not set the 'retention_policy' option is used as the default.
retention_policy_tag = ""
## If true, the 'retention_policy_tag' will not be included in the written metric.
# exclude_retention_policy_tag = false
## Write consistency (clusters only), can be: "any", "one", "quorum", "all".
## Only takes effect when using HTTP.
# write_consistency = "any"
## Timeout for HTTP messages.
timeout = "60s"
## HTTP Basic Auth
username = "admin1"
password = "admin"
## HTTP User-Agent
user_agent = "telegraf"
## UDP payload size is the maximum packet size to send.
# udp_payload = "512B"
## Optional TLS Config for use on HTTP connections.
# tls_ca = "/etc/telegraf/ca.pem"
# tls_cert = "/etc/telegraf/cert.pem"
# tls_key = "/etc/telegraf/key.pem"
## Use TLS but skip chain & host verification
# insecure_skip_verify = false
## HTTP Proxy override, if unset values the standard proxy environment
## variables are consulted to determine which proxy, if any, should be used.
# http_proxy = "http://corporate.proxy:3128"
## Additional HTTP headers
# http_headers = {"X-Special-Header" = "Special-Value"}
## HTTP Content-Encoding for write request body, can be set to "gzip" to
## compress body or "identity" to apply no encoding.
# content_encoding = "identity"
## When true, Telegraf will output unsigned integers as unsigned values,
## i.e.: "42u". You will need a version of InfluxDB supporting unsigned
## integer values. Enabling this option will result in field type errors if
## existing data has been written.
# influx_uint_support = false
# Read metrics from MQTT topic(s)
[[inputs.mqtt_consumer]]
servers = ["tcp://127.0.0.1:1883"]
name_override = "mqtt_consumer_floats"
topics = ["/emonpi/current1"]
connection_timeout = "120s"
username = "emonpi"
password = "******"
data_format = "value"
data_type = "float"
And here is the log from Telegraf:
2020-06-16T13:30:19Z I! Loaded inputs: mqtt_consumer win_perf_counters
2020-06-16T13:30:19Z I! Loaded aggregators:
2020-06-16T13:30:19Z I! Loaded processors:
2020-06-16T13:30:19Z I! Loaded outputs: influxdb
2020-06-16T13:30:19Z I! Tags enabled: host=win7-PC
2020-06-16T13:30:19Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"win7-PC", Flush Interval:10s
2020-06-16T13:30:19Z D! [agent] Initializing plugins
2020-06-16T13:30:19Z D! [agent] Connecting outputs
2020-06-16T13:30:19Z D! [agent] Attempting connection to [outputs.influxdb]
2020-06-16T13:30:19Z D! [agent] Successfully connected to outputs.influxdb
2020-06-16T13:30:19Z D! [agent] Starting service inputs
2020-06-16T13:30:19Z I! [inputs.mqtt_consumer] Connected [tcp://127.0.0.1:1883]
2020-06-16T13:30:30Z D! [outputs.influxdb] Wrote batch of 103 metrics in 230.0132ms
2020-06-16T13:30:30Z D! [outputs.influxdb] Buffer fullness: 103 / 10000 metrics
2020-06-16T13:37:43Z D! [agent] Stopping service inputs
2020-06-16T13:37:43Z D! [inputs.mqtt_consumer] Disconnecting [tcp://127.0.0.1:1883]
2020-06-16T13:37:43Z D! [inputs.mqtt_consumer] Disconnected [tcp://127.0.0.1:1883]
2020-06-16T15:28:59Z D! [agent] Initializing plugins
2020-06-16T15:28:59Z D! [agent] Starting service inputs
2020-06-16T15:28:59Z I! [inputs.mqtt_consumer] Connected [tcp://127.0.0.1:1883]
2020-06-16T15:29:01Z D! [agent] Waiting for service inputs
2020-06-16T15:29:01Z D! [agent] Stopping service inputs
2020-06-16T15:29:01Z D! [inputs.mqtt_consumer] Disconnecting [tcp://127.0.0.1:1883]
2020-06-16T15:29:01Z D! [inputs.mqtt_consumer] Disconnected [tcp://127.0.0.1:1883]
And I'm not getting any database created in InfluxDB.
Please help if there's something I'm missing.
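One way to narrow this down is to run Telegraf with only the MQTT input and a file output writing to stdout, so the broker subscription can be checked independently of InfluxDB. A sketch using the broker, topic and credentials from the config above:
[[inputs.mqtt_consumer]]
  servers = ["tcp://127.0.0.1:1883"]
  topics = ["/emonpi/current1"]
  username = "emonpi"
  password = "******"
  data_format = "value"
  data_type = "float"

## Write everything that is received straight to stdout.
[[outputs.file]]
  files = ["stdout"]
If values show up on stdout with this minimal config, the problem is on the InfluxDB output side; if nothing arrives, the topic name, credentials or broker ACLs are the first things to check.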

Python KafkaConsumer not connecting

Setup:
I have 3 Docker containers:
1) For Kafka
2) For Zookeeper
3) For JupyterLab
I set up networking between these containers, and I see that the Kafka producer is able to run and produce data.
KafkaProducer.ipynb
KAFKA_BROKER = ['172.20.0.2:9093']

from kafka import KafkaProducer
from kafka.errors import KafkaError

producer = KafkaProducer(bootstrap_servers=KAFKA_BROKER)

for _ in range(100):
    print("sending")
    producer.send('my-topic', key=b'foo', value=b'bar')
    print("success")
Here, send() sends the message 100 times.
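Note that send() is asynchronous in kafka-python, so the "success" print only means the record was queued locally. A short sketch (same broker list and topic as above) that blocks until the queued messages are actually delivered:
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=['172.20.0.2:9093'])

for _ in range(100):
    # send() returns a future; at this point the record is only queued
    producer.send('my-topic', key=b'foo', value=b'bar')

# block until every queued record has been sent to the broker
producer.flush()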
KafkaConsumer.ipynb
KAFKA_BROKER = ['172.20.0.2:9093']

from kafka import KafkaConsumer

consumer = KafkaConsumer('my-topic', group_id='my-group', bootstrap_servers=KAFKA_BROKER)
print("Comm success")

for message in consumer:
    # message value and key are raw bytes -- decode if necessary!
    # e.g., for unicode: `message.value.decode('utf-8')`
    print("%s:%d:%d: key=%s value=%s" % (message.topic, message.partition,
                                         message.offset, message.key,
                                         message.value))
In the above consumer code, the line print("Comm success") never gets executed. Based on the producer code execution, the network is open and Jupyter is able to talk to the Kafka broker. But the client is not able to connect to the same broker for data consumption. How can I start debugging this?
By default, the auto.offset.reset value is latest, so set it to earliest with a new group.id:
consumer = KafkaConsumer('my-topic', group_id='new-group', auto_offset_reset='earliest', bootstrap_servers=KAFKA_BROKER)
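For a self-contained check, the full consumer with that setting might look like this (consumer_timeout_ms is added here only so the loop exits once the topic is drained; broker and topic are the same as above):
from kafka import KafkaConsumer

KAFKA_BROKER = ['172.20.0.2:9093']

consumer = KafkaConsumer(
    'my-topic',
    group_id='new-group',
    auto_offset_reset='earliest',   # read from the beginning for a new group
    consumer_timeout_ms=10000,      # stop iterating after 10s with no messages
    bootstrap_servers=KAFKA_BROKER,
)

for message in consumer:
    print("%s:%d:%d: key=%s value=%s" % (message.topic, message.partition,
                                         message.offset, message.key,
                                         message.value))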

Flume can't access s3 to write the file java.lang.IllegalArgumentException: Invalid hostname in URI s3://ACCESSKEY:SECRETKEY/#bucket

Flume is installed on Amazon EC2 (Amazon Linux AMI 2018.03.0.20190514 x86_64 HVM gp2), Flume version 1.9.
When I use a local path as the sink, the copy works perfectly. But when I use S3 as a sink, I hit the invalid hostname in URI problem.
I double-checked my access key and secret key; they are both correct.
I tried to use s3n:// and it did not work.
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = localhost:9092
a1.sources.r1.kafka.topics = testflume
a1.sources.r1.kafka.consumer.group.id = flumeconsumer
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = s3://AWSACCESSKEY:AWSSECRETKEY#bucket/path
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.filePrefix = event
a1.sinks.k1.hdfs.rollInterval = 10
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
The error
[ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:459)] process failed
java.lang.IllegalArgumentException: Invalid hostname in URI s3://AWSACCESSKEY:AWSSECRETKEY#bucket/path/event.1558997927667.tmp
I expect Flume to authenticate successfully with S3 and write the files.
Can you try using s3a://?
It is good practice to assign a role to the EC2 instance and give that role permission on S3, instead of providing AWS access and secret keys. Once you set that up, you can set the path as s3a://bucket_name/path/../
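A sketch of the sink section with that change (the bucket name is a placeholder, and it assumes the hadoop-aws libraries are on Flume's classpath with credentials supplied by the instance role or the fs.s3a.* Hadoop properties):
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = s3a://my-bucket/path
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.filePrefix = event
a1.sinks.k1.hdfs.rollInterval = 10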

Neo4J browser with Bolt protocol does not work on my server

Neo4J browser with Bolt protocol does not work on my server.
Here is the error I get in the browser (Chrome):
VM608:35 WebSocket connection to 'ws://<server_name>:7687/' failed: Error during WebSocket handshake: net::ERR_CONNECTION_RESET
WrappedWebSocket # VM608:35
l # 8eea4b31.components.js:61
c # 8eea4b31.components.js:61
value # 8eea4b31.components.js:60
value # 8eea4b31.components.js:61
value # 8eea4b31.components.js:60
value # 8eea4b31.components.js:60
testConnection # 248a7ab3.scripts.js:10
makeRequest # 248a7ab3.scripts.js:11
AuthService.makeRequest # 248a7ab3.scripts.js:5
AuthService.authenticate # 248a7ab3.scripts.js:5
(anonymous) # 248a7ab3.scripts.js:9
$scope.authenticate # 248a7ab3.scripts.js:5
(anonymous) # 8eea4b31.components.js:13
callback # 8eea4b31.components.js:13
$eval # 8eea4b31.components.js:11
$apply # 8eea4b31.components.js:11
(anonymous) # 8eea4b31.components.js:13
dispatch # 8eea4b31.components.js:3
elemData.handle # 8eea4b31.components.js:3
And here is the error in Neo4J logs on the server:
2017-05-23 16:43:11.130+0000 WARN [io.netty.channel.DefaultChannelPipeline] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception. Connection reset by peer
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1100)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:366)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:118)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:651)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:574)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:488)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:450)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873)
at java.lang.Thread.run(Thread.java:745)
I tried to search for this error but didn't find a solution.
Below is the network part of the Neo4J configuration:
#*****************************************************************
# Network connector configuration
#*****************************************************************
# With default configuration Neo4j only accepts local connections.
# To accept non-local connections, uncomment this line:
dbms.connectors.default_listen_address=0.0.0.0
# You can also choose a specific network interface, and configure a non-default
# port for each connector, by setting their individual listen_address.
# The address at which this server can be reached by its clients. This may be the server's IP address or DNS name, or
# it may be the address of a reverse proxy which sits in front of the server. This setting may be overridden for
# individual connectors below.
#dbms.connectors.default_advertised_address=localhost
# You can also choose a specific advertised hostname or IP address, and
# configure an advertised port for each connector, by setting their
# individual advertised_address.
# Bolt connector
dbms.connector.bolt.enabled=true
#dbms.connector.bolt.tls_level=OPTIONAL
dbms.connector.bolt.listen_address=:7687
# HTTP Connector. There must be exactly one HTTP connector.
dbms.connector.http.enabled=true
#dbms.connector.http.listen_address=:7474
# HTTPS Connector. There can be zero or one HTTPS connectors.
dbms.connector.https.enabled=true
#dbms.connector.https.listen_address=:7473
# Number of Neo4j worker threads.
#dbms.threads.worker_count=
I made Neo4J accessible from outside by telling it to listen on 0.0.0.0.
I used telnet <server_name> 7687 to check that the port was open, and it is.
If I disable Bolt, it works fine.
Does someone have an idea of what is going wrong?
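One way to tell a browser/WebSocket problem apart from a Bolt problem is to open a plain Bolt connection from another machine, for example with the official Neo4j Python driver (a diagnostic sketch; the hostname and credentials are placeholders):
from neo4j import GraphDatabase

# placeholder address and credentials -- replace with your own
driver = GraphDatabase.driver("bolt://<server_name>:7687",
                              auth=("neo4j", "password"))

with driver.session() as session:
    # trivial query just to confirm the Bolt handshake and authentication work
    print(session.run("RETURN 1 AS ok").single()["ok"])

driver.close()
If this succeeds, port 7687 and the Bolt connector are fine and the reset is happening between the browser and the server (for example a proxy or firewall interfering with the WebSocket upgrade); if it fails, the Bolt connector or its TLS settings are the place to look.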

Can't connect java client to Marklogic database

I've just installed a MarkLogic NoSQL database out of the box on a Windows machine.
I wrote a simple Java client to put data into the database, but I get this error:
org.apache.http.conn.HttpHostConnectException: Connection to http://my.caci.local:8003 refused
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
The MarkLogic database is started. This is the code:
DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8003, "admin", "admin", Authentication.DIGEST);

XMLDocumentManager docMgr = client.newXMLDocumentManager();
BinaryDocumentManager binMgr = client.newBinaryDocumentManager();

DOMHandle handle = new DOMHandle();
for (int i = 0; i < AANT_PERSONEN; i++) {
    Document document = createDocument(i);
    String docId = "/zaak/" + 20;
    handle.set(document);
    docMgr.write(docId, handle);
}
....
The MarkLogic console reports the following ports to be active on my.caci.local:
Default :: Admin : 8001 [HTTP]
Default :: App-Services : 8000 [HTTP]
Default :: HealthCheck : 7997 [HTTP]
Default :: Manage : 8002 [HTTP]
I'm new to MarkLogic, and this is my question:
- What port should I use to connect from my Java client?
In agreement with MystyxMac, I notice the console does not report a REST server on 8003.
Here's the documentation for setting up a REST server:
http://docs.marklogic.com/guide/rest-dev/intro#id_97899
You should also add users for the rest-reader, rest-writer, and rest-admin roles.
Hoping that helps,
Erik Hennum
For testing purposes you can simply switch the port you are using to 8000.
From the documentation:
When you install MarkLogic Server, a pre-configured REST API instance
is available on port 8000. This instance uses the Documents database
as the content database and the Modules database as the modules
database.
The instance on port 8000 is convenient for getting started, but you
will usually create a dedicated instance for production purposes.
http://docs.marklogic.com/guide/rest-dev/service#id_15309
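To confirm that the pre-configured instance on port 8000 is reachable before touching the Java client, a quick check from any machine could look like this (a sketch using Python's requests library; the credentials match the question and the document URI is just an example):
import requests
from requests.auth import HTTPDigestAuth

# App-Services on port 8000 uses digest authentication by default
resp = requests.put(
    "http://localhost:8000/v1/documents",
    params={"uri": "/test/hello.xml"},
    data="<hello>world</hello>",
    headers={"Content-Type": "application/xml"},
    auth=HTTPDigestAuth("admin", "admin"),
)

# 201 means the document was created, 204 means it was updated
print(resp.status_code)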
