Expected timestamp in the Flume event headers, but it was null - flume

I am using below configuration details to push Twitter feeds into HDFS using Flume, but getting Expected timestamp in the Flume event headers, but it was null
twitter.conf
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.keywords = bigdata, hadoop, hive, hbase
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = /user/farooque/bigdata/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
Running with command
$ flume-ng agent --conf-file twitter.conf --name TwitterAgent
where twitter.conf is my config file name
But getting Error as:
java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:200)
at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:396)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:388)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:745)
15/06/04 18:26:01 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
Looking for further help??

In twitter.conf added one more config property as
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
and issue got resolved.
For more details Refer Hadoop tutorial.info

With the option "TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true", it will use the timestamp of the destination (i.e HDFS sink). Instead if you want to use the actual event's timestamp, then we have to use interceptors. Use below line in the configuration or properties file.
TwitterAgent.sources.Twitter.interceptors = interceptor1
TwitterAgent.sources.Twitter.interceptors.interceptor1.type = timestamp

You are using org.apache.flume.source.twitter.TwitterSource which is Apache provided Twitter Source. It does not come with built in timestamp in the Flume Event. So you have 2 options here:
1) Either use com.cloudera.flume.source.TwitterSource in your config file.
2) Or you can add TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true property in your config file.
Note that you are facing this issue because you have specified timestamp parameters in your HDFS path /user/farooque/bigdata/tweets/%Y/%m/%d/%H/. If you don't specify these, then both Apache and Cloudera provided sources will work without any problem.

Related

ActiveMQ Artemis / Influx Telegraf MQTT Listener - all messages (100K) delivered, but 4K messages remain in queue

I have conducted a test sending 100K persistent MQTT messages (QoS 2) to ActiveMQ Artemis. The topic has two Telegraf listeners, one on VM 85 and the other on VM 86. These listeners write data to the InfluxDB on their respective servers.
The main goal of the test is to ensure all messages delivered to VM 85 are also delivered to VM 86 even if VM 86 is down. Before executing the test both listeners connect to the broker each with a unique client ID and with clean-session = false and subscribe to the topic using QoS 2. This ensures the subscription for each is present when the messages are sent whether or not the listeners are actually active. Neither listener is connected when the test starts. The order of operations is:
Start listener on VM 85.
Send data.
Ensure messages are delivered to listener on VM 85.
Start listener on VM 86.
Ensure messages are delivered to listener on VM 86.
The good news is that all messages are delivered to the Influx DB on both VMs. However, the relevant queue for VM 86 still shows about 4.3 K messages remaining, as shown below:
If I then restart the listener on VM 86, it shows it's writing more data, as shown below:
However, the total messages in the InfluxDB correctly remains at 100K. If InfluxDB receives a duplicate record, it will overwrite it. However, the client is incrementing by one and setting the date at each increment, so this shouldn't occur, at least from the client.
I'm not clear on why this would be. Why does the the listener on VM 86 need to be restarted to completely empty the queue?
There is one parameter I haven't tried in the Telegraf plugin:
## Maximum messages to read from the broker that have not been written by an
## output. For best throughput set based on the number of metrics within
## each message and the size of the output's metric_batch_size.
##
## For example, if each message from the queue contains 10 metrics and the
## output metric_batch_size is 1000, setting this to 100 will ensure that a
## full batch is collected and the write is triggered immediately without
## waiting until the next flush_interval.
# max_undelivered_messages = 1000
It seems the batch size defaults to 1000, based on the output messages. But the maximum messages to read before output seems to be something greater, since 4.3K are output when restarted. Except that they have already been output. That's the confusing part.
Client Code:
package abc;
import java.time.Instant;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.eclipse.paho.client.mqttv3.MqttSecurityException;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;
import com.influxdb.client.domain.WritePrecision;
import com.influxdb.client.write.Point;
public class MqttPublishSample {
public static void main(String[] args) throws MqttSecurityException, MqttException, InterruptedException {
String broker = "tcp://localhost:1883";
String clientId = "JavaSample";
MemoryPersistence persistence = new MemoryPersistence();
int qos = 2;
int start = Integer.parseInt(args[0]);
int end = Integer.parseInt(args[1]);
String topic = args[2];
if (topic == null) {
topic = "testtopic/999";
}
System.out.println("start: " + start + ", end: " + end + ", topic: " + topic + " qos: " + qos);
MqttClient sampleClient = new MqttClient(broker, clientId, persistence);
MqttConnectOptions connOpts = new MqttConnectOptions();
connOpts.setCleanSession(false);
connOpts.setUserName("admin");
connOpts.setPassword("xxxxxxx".toCharArray());
System.out.println("Connecting to broker: " + broker);
sampleClient.connect(connOpts);
System.out.println("Connected");
for (int i = start; i <= end; i++) {
// print out every 1000
if (i%100 == 0) {
System.out.println("i: " + i);
}
try {
Point point = Point.measurement("temperature").addTag("machine", "unit43").addField("external", i)
.time(Instant.now(), WritePrecision.NS);
content = point.toLineProtocol();
MqttMessage message = new MqttMessage(content.getBytes());
message.setQos(qos);
sampleClient.publish(topic, message);
Thread.sleep(10);
} catch (MqttException me) {
System.out.println("reason " + me.getReasonCode());
System.out.println("msg " + me.getMessage());
System.out.println("loc " + me.getLocalizedMessage());
System.out.println("cause " + me.getCause());
System.out.println("excep " + me);
me.printStackTrace();
}
}
sampleClient.disconnect();
System.out.println("Disconnected");
}
}
Telegraph Plugin on 85:
###############################################################################
# INPUT PLUGINS #
###############################################################################
[[inputs.mqtt_consumer]]
servers = ["tcp://127.0.0.1:1883"]
## Topics that will be subscribed to.
topics = [
"testtopic/#",
]
## The message topic will be stored in a tag specified by this value. If set
## to the empty string no topic tag will be created.
# topic_tag = "topic"
## When using a QoS of 1 or 2, you should enable persistent_session to allow
## resuming unacknowledged messages.
qos = 2
persistent_session = true
## If unset, a random client ID will be generated.
client_id = "InfluxData_on_86_listen_local"
## Username and password to connect MQTT server.
username = "admin"
password = "xxxxxx"
data_format = "influx"
[[inputs.mqtt_consumer]]
servers = ["tcp://10.102.11.86:1883"]
## Topics that will be subscribed to.
topics = [
"testtopic/#",
]
## The message topic will be stored in a tag specified by this value. If set
## to the empty string no topic tag will be created.
# topic_tag = "topic"
## When using a QoS of 1 or 2, you should enable persistent_session to allow
## resuming unacknowledged messages.
qos = 2
persistent_session = true
## If unset, a random client ID will be generated.
client_id = "InfluxData_on_86_listen_85"
## Username and password to connect MQTT server.
username = "admin"
password = "xxxx"
data_format = "influx"
###############################################################################
# OUTPUT PLUGINS #
###############################################################################
[[outputs.influxdb_v2]]
## The URLs of the InfluxDB cluster nodes.
##
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
urls = ["http://127.0.0.1:8086"]
## Token for authentication.
token = "xxxx"
## Organization is the name of the organization you wish to write to.
organization = "xxxx"
# ## Destination bucket to write into.
bucket = "events"
I wasn't able to replicate this issue even initially at lower volumes, although I had it twice at 100K messages.
When i added the following parameters to the Telegraf Listener:
max_undelivered_messages = 100
It seemed to slow things down, as batches were limited to 100 according to the telegraph output.
However, when I removed it, it seemed batches where still limited to 100.
Finally, I changed the same parameter to 1000:
max_undelivered_messages = 1000
After this, message batch sizes improved to well beyond 100, as they were initially.
Furthermore, at least on the third try of 100K messages, there are no longer any messages remaining in the queue after the sequence described in the question is completed.
I'm not really sure if this change did anything, but in any case the correct amount of messages were always being received.
So, I'm marking this as answered.

Unable to send metrics to influx using graphite protocol in Gatling

I followed this URL -- https://www.linkedin.com/pulse/integrating-gatling-influx-graphite-grafana-live-tests-phani-bushan/
And it works well with the gatling-demo project provided by them but it is not working with my Gatling project
pom Gatling versions used :
<gatling.version>3.7.4</gatling.version>
<gatling-plugin.version>4.1.1</gatling-plugin.version>
<maven-jar-plugin.version>3.2.0</maven-jar-plugin.version>
<scala-maven-plugin.version>4.5.6</scala-maven-plugin.version>
Gatling Config setup:
data {
writers = [console, file, graphite] # The list of DataWriters to which Gatling write simulation data (currently supported : console, file, graphite)
console {
#light = false # When set to true, displays a light version without detailed request stats
#writePeriod = 5 # Write interval, in seconds
}
file {
#bufferSize = 8192 # FileDataWriter's internal data buffer size, in bytes
}
leak {
#noActivityTimeout = 30 # Period, in seconds, for which Gatling may have no activity before considering a leak may be happening
}
graphite {
light = false # only send the all* stats
host = "localhost" # The host where the Carbon server is located
port = 2003 # The port to which the Carbon server listens to (2003 is default for plaintext, 2004 is default for pickle)
protocol = "tcp" # The protocol used to send data to Carbon (currently supported : "tcp", "udp")
rootPathPrefix = "gatling" # The common prefix of all metrics sent to Graphite
bufferSize = 8192 # Internal data buffer size, in bytes
writePeriod = 1 # Write period, in seconds
}
}
}

Gatling does not send metrics to InfluxDB using graphite protocol

I followed the BlazeMeter article to monitor Gatling tests with Grafana and InfluxDB but no data is sent to InfluxDB and not any database created with the name "graphite".
InfluxDB is up and listen to port :2003. This is the log from InfluxDB:
2018-06-24T09:48:17Z Listening on TCP: [::]:2003 service=graphite addr=:2003
And I set gatling.conf fields to these:
data {
#writers = [console, file] # The list of DataWriters to which Gatling write simulation data (currently supported : console, file, graphite, jdbc)
console {
#light = false # When set to true, displays a light version without detailed request stats
}
file {
#bufferSize = 8192 # FileDataWriter's internal data buffer size, in bytes
}
leak {
#noActivityTimeout = 30 # Period, in seconds, for which Gatling may have no activity before considering a leak may be happening
}
graphite {
light = false # only send the all* stats
host = "localhost" # The host where the Carbon server is located
port = 2003 # The port to which the Carbon server listens to (2003 is default for plaintext, 2004 is default for pickle)
protocol = "tcp" # The protocol used to send data to Carbon (currently supported : "tcp", "udp")
rootPathPrefix = "gatling" # The common prefix of all metrics sent to Graphite
bufferSize = 8192 # GraphiteDataWriter's internal data buffer size, in bytes
writeInterval = 1 # GraphiteDataWriter's write interval, in seconds
}
}
gatling.conf is in src/test/resources folder and I ensured that this config file is loaded by Gatling by debugging it.
What I have missed?
You have invalid data writers configuration. Set it to:
writers = [console, file, graphite]

How can I write in InfluxDB from Gatling?

My question was already asked but I didn't succeed to solve my issue.
I don't succeed to send my data from Gatling in real time to InfluxDB.
I'm on Windows 10.
Gatling Version: 2.3.0 (the last one).
InfluxDB version: 1.3.5 (the last is 1.3.6).
My gatling.conf:
data {
writers = [console, file, graphite] # The list of DataWriters to which Gatling write simulation data (currently supported : console, file, graphite, jdbc)
console {
#light = false # When set to true, displays a light version without detailed request stats
}
file {
#bufferSize = 8192 # FileDataWriter's internal data buffer size, in bytes
}
leak {
#noActivityTimeout = 30 # Period, in seconds, for which Gatling may have no activity before considering a leak may be happening
}
graphite {
#light = false # only send the all* stats
host = "127.0.0.1" # The host where the Carbon server is located
port = "2003" # The port to which the Carbon server listens to (2003 is default for plaintext, 2004 is default for pickle)
protocol = "tcp" # The protocol used to send data to Carbon (currently supported : "tcp", "udp")
rootPathPrefix = "gatling" # The common prefix of all metrics sent to Graphite
#bufferSize = 8192 # GraphiteDataWriter's internal data buffer size, in bytes
#writeInterval = 1 # GraphiteDataWriter's write interval, in seconds
}
}
My influxdb.conf:
[http]
# Determines whether HTTP endpoint is enabled.
enabled = true
# The bind address used by the HTTP service.
bind-address = "127.0.0.1:8086"
###
### [[graphite]]
###
### Controls one or many listeners for Graphite data.
###
[[graphite]]
# Determines whether the graphite endpoint is enabled.
enabled = true
database = "gatlingdb"
# retention-policy = ""
bind-address = ":2003"
protocol = "tcp"
# consistency-level = "one"
templates = [
"gatling.*.*.*.*.measurement.simulation.request.status.field"
]
My gatlingdb database is created on InfluxDB, it stays empty.
When I try:
C:\InfluxDB-1.3.5-1>influx -host 127.0.0.1
I'm connected to InfluxDB
>USE gatlingdb
I'm connected to my database. Then:
>SHOW SERIES
and
>SELECT * FROM gatling
Don't return anything. It's empty.
Note: I put "FROM gatling" because I put that in my gatling.conf: rootPathPrefix = "gatling"
I didn't download Graphite but I saw that InfluxDB accept the graphite protocol. I assume I can send data from Gatling to InfluxDB. I certainly missed something.
I succeeded in connecting InfluxDB to Grafana and I display data from other databases. I just missed the connection between Gatling and InfluxDB.
Thanks in advance for your help, I definitely need it!
Anthony
I'm almost finished the article which shows all the steps required to create the whole monitoring infrastructure using the Gatling, Grafana and InfluxDB (btw, without Graphite installed separately) which worked very well for me.
I think I'll publish it in my blog on the blazemeter.com just in few days! So stay tuned there!
http://blazemeter.com/blog
There you will even find the ready solution to spin up everything inside the Docker.
But until this (if it is urgent for you), can share my InfluxDB config section:
[[graphite]]
enabled = true
bind-address = ":2003"
database = "graphite"
retention-policy = ""
protocol = "tcp"
batch-size = 5000
batch-pending = 10
batch-timeout = "1s"
consistency-level = "one"
separator = "."
udp-read-buffer = 0
gatling.conf:
graphite {
light = false # only send the all* stats
host = "localhost" # The host where the Carbon server is located
port = 2003 # The port to which the Carbon server listens to (2003 is default for plaintext, 2004 is default for pickle)
protocol = "tcp" # The protocol used to send data to Carbon (currently supported : "tcp", "udp")
rootPathPrefix = "gatling" # The common prefix of all metrics sent to Graphite
bufferSize = 8192 # GraphiteDataWriter's internal data buffer size, in bytes
writeInterval = 1 # GraphiteDataWriter's write interval, in seconds
}
The first thing you need to check is that InfluxDB actually accepts incoming metrics via graphite protocol. For example, during InfluxDB startup logs you should find this line:
influxdb_1 | [I] 2018-01-26T13:40:37Z Listening on TCP: [::]:2003 service=graphite addr=:2003

Stuck with luasec Lua secure socket

This example code fails:
require("socket")
require("ssl")
-- TLS/SSL server parameters
local params = {
mode = "server",
protocol = "sslv23",
key = "./keys/server.key",
certificate = "./keys/server.crt",
cafile = "./keys/server.key",
password = "123456",
verify = {"peer", "fail_if_no_peer_cert"},
options = {"all", "no_sslv2"},
ciphers = "ALL:!ADH:#STRENGTH",
}
local socket = require("socket")
local server = socket.bind("*", 8888)
local client = server:accept()
client:settimeout(10)
-- TLS/SSL initialization
local conn,emsg = ssl.wrap(client, params)
print(emsg)
conn:dohandshake()
--
conn:send("one line\n")
conn:close()
request
https://localhost:8888/
output
error loading CA locations ((null))
lua: a.lua:25: attempt to index local 'conn' (a nil value)
stack traceback:
a.lua:25: in main chunk
[C]: ?
Not very much info. Any idea how to trace down to the problem ?
Update
Got this now: the cafile parameter is not necessary for server mode:
local params = {
mode = "server",
protocol = "sslv23",
key = "./keys/server.key",
certificate = "./keys/server.crt",
password = "123456",
options = {"all", "no_sslv2"},
ciphers = "ALL:!ADH:#STRENGTH",
}
LuaSec is a binding for OpenSSL, so the error you are getting (error loading CA locations) means that the OpenSSL library cannot read your CA files. Are you sure they are in the current directory and with proper permissions?
EDIT: According to LuaSec sources, it currently uses only the PEM format for private key. Ensure that the private key is stored as PEM, not DER.
CAFile contains the set of certificates (.crt) that your server or client trust. You put the key (.key).

Resources