Flume-ng agent not starting and giving NullPointerException - flume

I am writing a custom source, sink, and channel, and my config file looks like this:
agent.sources = source
agent.sinks = sink
agent.channels = channel
agent.sources.source.type = com.flume.FlumeSource
agent.sources.source.channels = channel
agent.channels.channel.type = com.flume.FlumeChannel$Builder
agent.channels.channel.type = file
agent.sinks.sink.type = com.flume.FlumeSink
agent.sinks.sink.hdfs.path = <hdfs path>
agent.sources.source.channels = channel
agent.sinks.sink.channel = channel
I am trying to start the agent by adding the jar to the Flume classpath using the command:
bin/flume-ng agent --conf-file flume.config --classpath /usr/lib/flume-ng/agent.jar --name nab-agent -Dflume.root.logger=DEBUG,console
It says the HDFS path is not present even though it exists. It throws a NullPointerException saying the hostname does not exist, and the main class is not found in org.apache.flume.node.Application.
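For reference, a minimal pollable custom source along these lines might look like the sketch below. The class name is taken from the config above; the body, the optional "prefix" property, and the assumption of Flume 1.7+ are illustrative, not the actual implementation:
package com.flume;

import java.nio.charset.StandardCharsets;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.AbstractSource;

public class FlumeSource extends AbstractSource implements Configurable, PollableSource {

    private String prefix;

    @Override
    public void configure(Context context) {
        // Hypothetical optional setting read from the agent configuration
        prefix = context.getString("prefix", "event");
    }

    @Override
    public Status process() throws EventDeliveryException {
        try {
            // Build a single event and hand it to the configured channel(s)
            Event event = EventBuilder.withBody((prefix + " payload").getBytes(StandardCharsets.UTF_8));
            getChannelProcessor().processEvent(event);
            return Status.READY;
        } catch (Exception e) {
            return Status.BACKOFF;
        }
    }

    // Required by PollableSource since Flume 1.7
    @Override
    public long getBackOffSleepIncrement() {
        return 1000L;
    }

    @Override
    public long getMaxBackOffSleepInterval() {
        return 5000L;
    }
}
Note that the config above sets agent.channels.channel.type twice (once to the custom Builder class and once to file); a custom component's type is normally just the fully qualified class name, which may be part of the problem.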

Related

Unable to define SSH key when using Terraform to create Linux VM

I'm trying to use Terraform to create a Linux VM. What I see online is pretty straightforward:
resource "tls_private_key" "this" {
for_each = local.worker_env_map
algorithm = "RSA"
rsa_bits = 4096
}
resource "azurerm_linux_virtual_machine" "example" {
name = "worker-machine"
resource_group_name = "rogertest"
location = "australiaeast"
size = "Standard_D2_v4"
admin_username = data.azurerm_key_vault_secret.kafkausername.value
network_interface_ids = [
azurerm_network_interface.example.id,
]
admin_ssh_key {
username = "adminuser"
public_key = tls_private_key.this["env1"].public_key_openssh
}
os_disk {
caching = "ReadWrite"
storage_account_type = "Standard_LRS"
}
source_image_reference {
publisher = "Canonical"
offer = "UbuntuServer"
sku = "18_04-lts-gen2"
version = "latest"
}
}
But I keep getting this error:
Code="InvalidParameter" Message="Destination path for SSH public keys is currently limited to its default value /home/kafkaadmin/.ssh/authorized_keys due to a known issue in Linux provisioning agent."
Target="linuxConfiguration.ssh.publicKeys.path"
But I'm following exactly what is outlined on this page:
https://learn.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-terraform
I tried to reproduce the same issue in my environment and got the results below.
The error says the destination path for SSH public keys is currently limited to its default value. The destination path is the location on the VM for the SSH keys; if the file already exists, the specified keys are appended to it.
If we need a non-default location for the public keys, then at the moment the only way is to create our own custom solution.
I have used the command below to create my own path for the keys:
az vm create --resource-group rg_name --name myVM --image UbuntuLTS --admin-username user_name --generate-ssh-keys --ssh-dest-key-path './'
I have written the Linux VM Terraform code using this document.
I have followed the steps below to execute the file:
terraform init
This command initializes the Terraform working directory.
terraform plan
This creates an execution plan and previews the changes that Terraform plans to make to the infrastructure.
terraform apply
This creates or updates the infrastructure, depending on the configuration.
I am able to see the created Linux virtual machine.
NOTE: For creating a Linux VM, we can also use this Terraform document for reference.

Flume can't access S3 to write the file: java.lang.IllegalArgumentException: Invalid hostname in URI s3://ACCESSKEY:SECRETKEY/#bucket

Flume is installed on Amazon EC2 (Amazon Linux AMI 2018.03.0.20190514 x86_64 HVM gp2). Flume version: 1.9.
When I use a local path as the sink, the copy works perfectly. But when I use S3 as the sink, I hit the invalid hostname in URI problem.
I double-checked my access key and secret key; they are both correct.
I tried to use s3n:// and it did not work.
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = localhost:9092
a1.sources.r1.kafka.topics = testflume
a1.sources.r1.kafka.consumer.group.id = flumeconsumer
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = s3://AWSACCESSKEY:AWSSECRETKEY#bucket/path
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.filePrefix = event
a1.sinks.k1.hdfs.rollInterval = 10
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
The error
[ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:459)] process failed
java.lang.IllegalArgumentException: Invalid hostname in URI s3://AWSACCESSKEY:AWSSECRETKEY#bucket/path/event.1558997927667.tmp
I expect Flume to authenticate successfully to S3 and write the files.
Can you try using s3a://?
It is good practice, though, to assign a role to the EC2 instance and give that role permission to S3, instead of providing AWS access and secret keys. Once you set that up, you can specify the path as s3a://bucket_name/path/../
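For example (bucket and path are placeholders), with an instance role in place the sink path would simply name the bucket; if keys must be used instead, they would normally go into Hadoop's core-site.xml as fs.s3a.access.key and fs.s3a.secret.key rather than into the URI:
# credentials come from the instance role or core-site.xml, not the URI
a1.sinks.k1.hdfs.path = s3a://bucket_name/path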

Custom logger not being used in Jenkins pipeline

The methods from the Groovy class below are invoked by some other pipeline script classes that I don't know about. All the println statements have been replaced by logger.info.
class ConfigurationPluginInitBase implements Plugin<Project> {
    private static final Logger logger = LoggerFactory.getLogger(ConfigurationPluginInitBase.class)
    .
    .
    .
    protected void configureDependenciesResolution(Project project) {
        .
        .
        .
        logger.info("Configuring Dependencies Resolution")
        logger.info('Does the buildInfo.json exist? {}', file.exists())
        logger.info('The list of dependencies should be rewritten: {}', rewriteDependency)
        /* Added this as there was no other way to see what happened to the logger instance */
        println 'Is the logger instance created at all???' + logger
        .
        .
        .
        logger.info('List: {}', listToUpdate)
    }
}
log4j2-test.properties
status = error
name = PropertiesConfig
filters = threshold
filter.threshold.type = ThresholdFilter
filter.threshold.level = debug
appenders = console
appender.console.type = Console
appender.console.name = STDOUT
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss} %-5p %c:- %m%n
loggers = console
logger.console.name = ConsoleLog
logger.console.level = debug
logger.console.additivity = false
logger.console.appenderRef.console.ref = STDOUT
rootLogger.level = info
rootLogger.appenderRef.stdout.ref = STDOUT
The output (only the relevant part is shown below) on the Jenkins job console:
.
.
.
.
Download http://artifactory.net:8081/artifactory/Migration_R148_VR/tools.gradle.plugin/BuildPublishReleasePlugin/v4.0.0.37af2ff/ivy-v4.0.0.37af2ff.xml
Download http://artifactory.net:8081/artifactory/Migration_R148_VR/tools.gradle.plugin/BuildPublishReleasePlugin/v4.0.0.37af2ff/BuildPublishReleasePlugin-v4.0.0.37af2ff.jar
// Printed way before the actual logger statements, when the above artifact is downloaded from Artifactory for further testing in the pipeline
Is the logger instance created at all???org.gradle.internal.logging.slf4j.OutputEventListenerBackedLogger#efbec93c
apache-commons:commons-collections:null
apache-commons:commons-lang:null
DAP_Framework:DAP_FrameworkExt:null
esapi:esapi:null
opensaml:opensaml:null
openws:openws:null
slf4j:slf4j:null
spring-framework:spring-framework:null
TDE_Ark_Framework:TDE_Ark_Framework:null
TDE_Ark_Infrastructure:TDE_Ark_Infrastructure_CLI:null
velocity:velocity:null
wurfl:wurfl:null
xmlsec:xmlsec:null
xmltooling:xmltooling:null.
.
.
.
[Ripple AlfaClient] Configuring Dependencies Resolution
[Ripple AlfaClient] Does the buildInfo.json exist? true
[Ripple AlfaClient] The list of dependencies should be rewritten: DAP_Framework:DAP_Framework_CLI:1.2.2-integration.adcb14d
[Ripple AlfaClient] List: [DAP_Framework:DAP_Framework_CLI:1.2.2-integration.adcb14d]
.
.
.
The logger that I have configured is probably not being invoked.
The run-time instance is of OutputEventListenerBackedLogger.
Even if I make changes to the logger statements, they are not reflected in the output, but the new println that I added is. This is confusing, i.e. some changes get reflected while others don't!
I referred to the Gradle logging page and threads like this and this, but I am unclear about the root cause.
Note: I am new to Jenkins pipeline, Gradle and Groovy :)
I assume you would like to use Gradle’s logging system for your log output from a Gradle plugin?
In that case I would suggest creating/getting the logger instance differently. Either use project.logger.info(…) or create a new Logger like so:
private static final Logger logger = Logging.getLogger(ConfigurationPluginInitBase.class)
Having said that, the reason why your log messages might not show up currently could be that Gradle’s default log level is LIFECYCLE, but you seem to only be logging at INFO. You can try running Gradle with the --info option to see your messages.
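A minimal sketch of that change in the plugin class (only the logger-related lines are shown; the rest of the class stays as it is):
import org.gradle.api.Plugin
import org.gradle.api.Project
import org.gradle.api.logging.Logger
import org.gradle.api.logging.Logging

class ConfigurationPluginInitBase implements Plugin<Project> {
    // Gradle's logger factory instead of SLF4J's LoggerFactory
    private static final Logger logger = Logging.getLogger(ConfigurationPluginInitBase.class)

    @Override
    void apply(Project project) {
        // Alternatively: project.logger.info('...')
        logger.info('Configuring Dependencies Resolution')
    }
}
With that logger and a run using --info, the messages should show up in the Jenkins console the same way the println output does.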

Source data from syslog into Flume

I tried to set up a Flume agent to source data from a syslog server.
Basically, I have set up a syslog server on a server (call it server1) to receive syslog events, which then forwards all messages to a different server (server2) where the Flume agent is installed; finally, all data will be sunk to a Kafka cluster.
The Flume configuration is as below:
# For each one of the sources, the type is defined
agent.sources.syslogSrc.type = syslogudp
agent.sources.syslogSrc.port = 9090
agent.sources.syslogSrc.host = server2
# The channel can be defined as follows.
agent.sources.syslogSrc.channels = memoryChannel
# Each channel's type is defined.
agent.channels.memoryChannel.type = memory
# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 100
# config for kafka sink
agent.sinks.kafkaSink.channel = memoryChannel
agent.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkaSink.kafka.topic = flume
agent.sinks.kafkaSink.kafka.bootstrap.servers = <kafka.broker.list>:9092
agent.sinks.kafkaSink.kafka.flumeBatchSize = 20
agent.sinks.kafkaSink.kafka.producer.acks = 1
agent.sinks.kafkaSink.kafka.producer.linger.ms = 1
agent.sinks.kafkaSink.kafka.producer.compression.type = snappy
But somehow the syslog data is not getting ingested into the Flume agent.
I would appreciate your advice.
I have set up a syslog server on a server (call it server1)
The syslogudp Source must bind to server1 host
agent.sources.syslogSrc.host = server1
then forward all messages to a different server (server2)
The different server refers to the Sink:
agent.sinks.kafkaSink.kafka.bootstrap.servers = server2:9092
The Flume agent is only a process that hosts these components (Source, Sink, Channel) to facilitate the flow of events.
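As a side note, the configuration shown above never declares the component names; assuming it is the complete file and the agent is started with the name agent, it would also need the declaration lines:
agent.sources = syslogSrc
agent.channels = memoryChannel
agent.sinks = kafkaSink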

Custom source in Flume

I have created a custom source for Flume and copied the jar files to the following locations:
mkdir -p /usr/lib/flume-ng/plugins.d/MyFlumeSource/lib/MyFlumeSource.jar
chown -R flume:flume /var/lib/flume-ng/
Also, in /etc/flume-ng/conf/flume-env.sh:
FLUME_CLASSPATH="/usr/lib/flume-ng/plugins.d/MyFlumeSource/lib/MyFlumeSource.jar"
Updated the Flume configuration file as
# Name the components on this agent
tail1.sources = seq-source
tail1.channels = mem-channel
tail1.sinks = hdfs-sink
# Describe/configure Source
tail1.sources.seq-source.type = org.custom.flume.source.MySource
# Describe the sink
tail1.sinks.hdfs-sink.type = hdfs
tail1.sinks.hdfs-sink.hdfs.path = /user/flume
tail1.sinks.hdfs-sink.hdfs.filePrefix = log
tail1.sinks.hdfs-sink.hdfs.rollInterval = 0
tail1.sinks.hdfs-sink.hdfs.rollCount = 10000
tail1.sinks.hdfs-sink.hdfs.fileType = DataStream
# Use a channel which buffers events in file
tail1.channels.mem-channel.type = memory
tail1.channels.mem-channel.capacity = 1000
tail1.channels.mem-channel.transactionCapacity = 100
# Bind the source and sink to the channel
tail1.sources.seq-source.channels = mem-channel
tail1.sinks.hdfs-sink.channel = mem-channel
Trying to run the Flume agent as:
flume-ng agent --conf /var/lib/flume-ng/plugins.d/MyFlumeSource/lib/MyFlumeSource.jar --conf-file /etc/flume-ng/conf/flume-conf.properties --name tail1
flume-ng agent --conf-file /etc/flume-ng/conf/flume-conf.properties --name tail1
In both cases I am getting the following error:
ERROR node.PollingPropertiesFileConfigurationProvider: Failed to load configuration data. Exception follows.
org.apache.flume.FlumeException: Unable to create source: seq-source, type: org.custom.flume.source.MySource, class: org.custom.flume.source.MySource
at org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:48)
at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:322)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:97)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InstantiationException
at sun.reflect.InstantiationExceptionConstructorAccessorImpl.newInstance(InstantiationExceptionConstructorAccessorImpl.java:48)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at java.lang.Class.newInstance(Class.java:379)
at org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:44)
... 10 more
If anyone is aware of it, please help me.
You installed the plugin under /usr/lib/flume-ng but you are trying to run it from /var/lib/flume-ng and /etc/flume-ng.
In addition, it must be said that the --conf option should point to the entire configuration folder, not to a jar file.
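For example, keeping the FLUME_CLASSPATH setting from flume-env.sh, the agent would be started along these lines (paths taken from the question):
flume-ng agent --conf /etc/flume-ng/conf --conf-file /etc/flume-ng/conf/flume-conf.properties --name tail1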
Are you looking for a Flume interceptor? You can add your class as an interceptor, which processes messages.
If yes, you can do it in 2 simple steps:
1) Add your jar file to /lib
2) Add config in flume-conf.properties mentioning your Builder class name.
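For illustration, a minimal interceptor with its Builder might look like the sketch below (the package, class name, and header key are made up for the example; only the agent and source names are taken from the question):
package org.custom.flume.interceptor;

import java.util.List;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class MyInterceptor implements Interceptor {

    @Override
    public void initialize() {
        // No setup needed for this sketch
    }

    @Override
    public Event intercept(Event event) {
        // Example processing: tag every event with a header
        event.getHeaders().put("processed", "true");
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        for (Event event : events) {
            intercept(event);
        }
        return events;
    }

    @Override
    public void close() {
        // Nothing to clean up
    }

    // Flume instantiates interceptors through this nested Builder
    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new MyInterceptor();
        }

        @Override
        public void configure(Context context) {
            // Read interceptor settings here if needed
        }
    }
}
And the matching lines in flume-conf.properties:
tail1.sources.seq-source.interceptors = i1
tail1.sources.seq-source.interceptors.i1.type = org.custom.flume.interceptor.MyInterceptor$Builder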

Resources