Flume can't access S3 to write the file: java.lang.IllegalArgumentException: Invalid hostname in URI s3://ACCESSKEY:SECRETKEY/#bucket

Flume is installed on Amazon EC2 (Amazon Linux AMI 2018.03.0.20190514 x86_64 HVM gp2). Flume version: 1.9
When I use a local directory as the sink, the copy works perfectly. But when I use S3 as the sink, I hit the invalid hostname in URI problem.
I double-checked my access key and secret key; they are both correct.
I also tried using s3n:// and it did not work.
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = localhost:9092
a1.sources.r1.kafka.topics = testflume
a1.sources.r1.kafka.consumer.group.id = flumeconsumer
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = s3://AWSACCESSKEY:AWSSECRETKEY#bucket/path
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.filePrefix = event
a1.sinks.k1.hdfs.rollInterval = 10
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
The error
[ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:459)] process failed
java.lang.IllegalArgumentException: Invalid hostname in URI s3://AWSACCESSKEY:AWSSECRETKEY#bucket/path/event.1558997927667.tmp
I expect Flume to authenticate successfully with S3 and write the files.

Can you try using s3a://?
Also, it is good practice to assign an IAM role to the EC2 instance and grant that role permission on S3, instead of providing the AWS access key and secret key. Once you set that up, you can use the path as s3a://bucket_name/path/...
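For illustration, a minimal sketch of the sink section once an instance role is attached (my assumptions: the hadoop-aws S3A connector and AWS SDK jars are on Flume's classpath, and the bucket/path names are placeholders):
# no keys in the URI; the S3A connector picks up the instance role credentials
a1.sinks.k1.hdfs.path = s3a://my-bucket/flume/events
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.filePrefix = event
a1.sinks.k1.hdfs.rollInterval = 10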

Related

unable to define ssh key when using terraform to create linux vm

I'm trying to use Terraform to create a Linux VM. What I see online is pretty straightforward:
resource "tls_private_key" "this" {
for_each = local.worker_env_map
algorithm = "RSA"
rsa_bits = 4096
}
resource "azurerm_linux_virtual_machine" "example" {
name = "worker-machine"
resource_group_name = "rogertest"
location = "australiaeast"
size = "Standard_D2_v4"
admin_username = data.azurerm_key_vault_secret.kafkausername.value
network_interface_ids = [
azurerm_network_interface.example.id,
]
admin_ssh_key {
username = "adminuser"
public_key = tls_private_key.this["env1"].public_key_openssh
}
os_disk {
caching = "ReadWrite"
storage_account_type = "Standard_LRS"
}
source_image_reference {
publisher = "Canonical"
offer = "UbuntuServer"
sku = "18_04-lts-gen2"
version = "latest"
}
}
but I keep getting this error:
Code="InvalidParameter" Message="Destination path for SSH public keys is currently limited to its default value /home/kafkaadmin/.ssh/authorized_keys due to a known issue in Linux provisioning agent."
Target="linuxConfiguration.ssh.publicKeys.path"
but I'm following exactly what's outlined on this page:
https://learn.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-terraform
I tried to reproduce the same issue in my environment and got the results below.
I got the same error: the destination path for SSH public keys is currently limited to its default value. The destination path on the VM is where the SSH keys are placed; if the file already exists, the specified keys are appended to it.
If we need a non-default location for the public keys, then at the moment the only way is to create our own custom solution.
I used the command below to specify my own path for the keys:
az vm create --resource-group rg_name --name myVM --image UbuntuLTS --admin-username user_name --generate-ssh-keys --ssh-dest-key-path './'
I created the Linux VM Terraform code using this document.
I followed the steps below to execute the file:
terraform init
This command initializes the working directory.
terraform plan
This creates an execution plan and previews the changes that Terraform plans to make to the infrastructure.
terraform apply
This creates or updates the infrastructure according to the configuration.
I am able to see the created Linux virtual machine.
NOTE: For creating a Linux VM, this Terraform document can also be used for reference.
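As a minimal sketch (my own illustration, not from the linked documents): if the goal is simply to keep the keys in the default path, the admin_ssh_key username can be set to the same value as admin_username, e.g.:
resource "azurerm_linux_virtual_machine" "example" {
  name                = "worker-machine"
  resource_group_name = "rogertest"
  location            = "australiaeast"
  size                = "Standard_D2_v4"
  admin_username      = data.azurerm_key_vault_secret.kafkausername.value
  network_interface_ids = [
    azurerm_network_interface.example.id,
  ]
  admin_ssh_key {
    # same value as admin_username, so the key is written to the default
    # /home/<admin_username>/.ssh/authorized_keys path
    username   = data.azurerm_key_vault_secret.kafkausername.value
    public_key = tls_private_key.this["env1"].public_key_openssh
  }
  # os_disk and source_image_reference blocks unchanged from the question
}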

Erlang :ssh authentication error. How to connect to ssh using identity file

I'm getting an authentication error when trying to connect to an SSH host.
The goal is to connect to the host using local forwarding. The command below is an example using the Dropbear SSH client to connect to the host with local forwarding:
dbclient -N -i /opt/private-key-rsa.dropbear -L 2002:1.2.3.4:2006 -p 2002 -l test_user 11.22.33.44
I have this code so far, which returns an empty connection:
ip = "11.22.33.44"
user = "test_user"
port = 2002
ssh_config = [
user_interaction: false,
silently_accept_hosts: true,
user: String.to_charlist(user),
user_dir: String.to_charlist("/opt/")
]
# returns aunthentication error
{:ok, conn} = :ssh.connect(String.to_charlist(ip), port, ssh_config)
This is the error I'm seeing:
Server: 'SSH-2.0-OpenSSH_5.2'
Disconnects with code = 14 [RFC4253 11.1]: Unable to connect using the available authentication methods
State = {userauth,client}
Module = ssh_connection_handler, Line = 893.
Details:
User auth failed for: "test_user"
I'm a newbie to Elixir and have been reading the Erlang ssh documentation for two days. I did not find any examples in the documentation, which makes it difficult to understand.
You are using a non-default key name, private-key-rsa.dropbear. Erlang by default looks for this set of names:
From ssh module docs:
Optional: one or more User's private key(s) in case of publickey authorization. The default files are
id_dsa and id_dsa.pub
id_rsa and id_rsa.pub
id_ecdsa and id_ecdsa.pub
To verify this is the reason, try renaming private-key-rsa.dropbear to id_rsa. If this works, the next step would be to add a key_cb callback to the ssh_config, which should return the correct key file name.
One example implementation of a similar feature is labzero/ssh_client_key_api.
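A rough sketch of what such a key_cb module could look like (my own illustration, not taken from that library; CustomKeyCb and the key_path option are made-up names, and it assumes an unencrypted PEM-encoded RSA key, so a Dropbear-format key would still need converting first):
defmodule CustomKeyCb do
  @behaviour :ssh_client_key_api

  # Reuse the default host-key handling from :ssh_file.
  defdelegate add_host_key(host, key, opts), to: :ssh_file
  defdelegate is_host_key(key, host, alg, opts), to: :ssh_file

  def user_key(_algorithm, opts) do
    # Options passed as {key_cb, {CustomKeyCb, opts}} arrive under :key_cb_private.
    path =
      opts
      |> Keyword.get(:key_cb_private, [])
      |> Keyword.get(:key_path)

    [entry | _] = :public_key.pem_decode(File.read!(path))
    {:ok, :public_key.pem_entry_decode(entry)}
  end
end
It would then be wired in through the connect options, e.g. key_cb: {CustomKeyCb, key_path: "/opt/id_rsa"}.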
The solution was to convert the Dropbear key to an OpenSSH key. I used this link as a reference.
Here is the command to convert the Dropbear key to an OpenSSH key:
/usr/lib/dropbear/dropbearconvert dropbear openssh /opt/private-key-rsa.dropbear /opt/id_rsa

source data from syslog into flume

I tried to set up a Flume agent to source data from a syslog server.
Basically, I have set up a syslog server on a server (server1) to receive syslog events, then forward all messages to a different server (server2) where the Flume agent is installed; finally, all data will be sunk to a Kafka cluster.
The Flume configuration is below.
# For each one of the sources, the type is defined
agent.sources.syslogSrc.type = syslogudp
agent.sources.syslogSrc.port = 9090
agent.sources.syslogSrc.host = server2
# The channel can be defined as follows.
agent.sources.syslogSrc.channels = memoryChannel
# Each channel's type is defined.
agent.channels.memoryChannel.type = memory
# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 100
# config for kafka sink
agent.sinks.kafkaSink.channel = memoryChannel
agent.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkaSink.kafka.topic = flume
agent.sinks.kafkaSink.kafka.bootstrap.servers = <kafka.broker.list>:9092
agent.sinks.kafkaSink.kafka.flumeBatchSize = 20
agent.sinks.kafkaSink.kafka.producer.acks = 1
agent.sinks.kafkaSink.kafka.producer.linger.ms = 1
agent.sinks.kafkaSink.kafka.producer.compression.type = snappy
But somehow the syslog data is not getting ingested into the Flume agent.
I'd appreciate your advice.
I have setup a syslog server on an server so-called (server1)
The syslogudp Source must bind to the server1 host:
agent.sources.syslogSrc.host = server1
then forward all messages to different server (server2)
The "different server" refers to the Sink:
agent.sinks.kafkaSink.kafka.bootstrap.servers = server2:9092
A Flume agent is only a process that hosts these components (Source, Sink, Channel) to facilitate the flow of events.
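One quick way to check each hop (a sketch; the port and topic come from the config above, while <flume-agent-host> and <kafka.broker.list> are placeholders) is to send a test syslog message over UDP and watch the Kafka topic:
# send a test UDP syslog message to the port the Flume source listens on
logger --udp --server <flume-agent-host> --port 9090 "flume syslog test"
# confirm the event reaches Kafka
kafka-console-consumer.sh --bootstrap-server <kafka.broker.list>:9092 --topic flume --from-beginning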

OAuth2 Cygnus and Cosmos Sink doesn't seem to work

Since the last update, I haven't been able to upload my data to Cosmos using Cygnus. I am aware that we now need to use an OAuth2 token to do it, so I made the request for the token:
curl -k -X POST "https://cosmos.lab.fiware.org:13000/cosmos-auth/v1/token" -H "Content-Type: application/x-www-form-urlencoded" -d "grant_type=password&username=guillaume.jourdain#4planet.eu&password=XXXXX"
I get a token, but then I try to check the token:
curl -X GET "http://cosmos.lab.fiware.org:14000/webhdfs/v1/guillaume.jourdain/hostabee?op=liststatus&user.name=guillaume.jourdain#4planet.eu" -H "X-Auth-Token: TheToken"
and even this:
curl -X GET "http://cosmos.lab.fiware.org:14000/webhdfs/v1/guillaume.jourdain/hostabee?op=liststatus&user.name=guillaume.jourdain" -H "X-Auth-Token: TheToken"
And every time, for each of these commands and for every token I tried, I get this:
User token not authorized
Next I tried to put the OAuth parameter in my Cygnus conf file, and this occurred every time:
2015-07-17 16:17:17,797 (lifecycleSupervisor-1-1) [INFO - es.tid.fiware.orionconnectors.cosmosinjector.hdfs.HttpFSBackend.createDir(HttpFSBackend.java:71)] HttpFS response: HTTP/1.1 401 Unauthorized
2015-07-17 16:17:17,798 (lifecycleSupervisor-1-1) [ERROR - es.tid.fiware.orionconnectors.cosmosinjector.OrionHDFSSink.start(OrionHDFSSink.java:108)] The directory could not be created in HDFS. HttpFS response: 401 Unauthorized
So yeah, for the moment I'm kind of stuck. Do you have any information to help me resolve this problem?
EDIT :
Here's my Cygnus configuration file; maybe the problem is located here:
APACHE_FLUME_HOME/conf/cygnus.conf
orionagent.sources = http-source
orionagent.sinks = hdfs-sink
orionagent.channels = notifications
# Flume source, must not be changed
orionagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
# channel name where to write the notification events
orionagent.sources.http-source.channels = notifications
# listening port the Flume source will use for receiving incoming notifications
orionagent.sources.http-source.port = 5050
# Flume handler that will parse the notifications, must not be changed
orionagent.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.OrionRestHandler
# regular expression for the orion version the notifications will have in their headers
orionagent.sources.http-source.handler.orion_version = 0\.23\.*
# URL target
orionagent.sources.http-source.handler.notification_target = /notify
# channel name from where to read notification events
orionagent.sinks.hdfs-sink.channel = notifications
# Flume sink that will process and persist in HDFS the notification events, must not be changed
orionagent.sinks.hdfs-sink.type = com.telefonica.iot.cygnus.sinks.OrionHDFSSink
# IP address of the Cosmos deployment where the notification events will be persisted
orionagent.sinks.hdfs-sink.cosmos_host = 130.206.80.46
# port of the Cosmos service listening for persistence operations; 14000 for httpfs, 50070 for webhdfs and free choice for inifinty
orionagent.sinks.hdfs-sink.cosmos_port = 14000
# username allowed to write in HDFS (/user/myusername)
orionagent.sinks.hdfs-sink.cosmos_username = guillaume.jourdain
# dataset where to persist the data (/user/myusername/mydataset)
orionagent.sinks.hdfs-sink.cosmos_password = XXXXX
orionagent.sinks.hdfs-sink.cosmos_dataset = hostABee
orionagent.sinks.hdfs-sink.attr_persistence = column
orionagent.sinks.hdfs-sink.hive_host = 130.206.80.46
orionagent.sinks.hdfs-sink.hive_port = 10000
orionagent.sinks.hdfs-sink.oauth2_token = TheTOKEN
# HDFS backend type (webhdfs, httpfs or infinity)
orionagent.sinks.hdfs-sink.hdfs_api = webhdfs
# channel name
orionagent.channels.notifications.type = memory
# capacity of the channel
orionagent.channels.notifications.capacity = 1000
# amount of bytes that can be sent per transaction
orionagent.channels.notifications.transactionCapacity = 100
Now I get this error (and others). The sink and the handler don't seem to be found:
2015-07-27 14:27:10,562 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:40)] Creating instance of sink: hdfs-sink, type: com.telefonica.iot.cygnus.sinks.OrionHDFSSink
2015-07-27 14:27:10,562 (conf-file-poller-0) [ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:142)] Failed to load configuration data. Exception follows.
org.apache.flume.FlumeException: Unable to load sink type: com.telefonica.iot.cygnus.sinks.OrionHDFSSink, class: com.telefonica.iot.cygnus.sinks.OrionHDFSSink
at org.apache.flume.sink.DefaultSinkFactory.getClass(DefaultSinkFactory.java:69)
at org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:41)
at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:415)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:103)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)
Caused by: java.lang.ClassNotFoundException: com.telefonica.iot.cygnus.sinks.OrionHDFSSink
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.flume.sink.DefaultSinkFactory.getClass(DefaultSinkFactory.java:67)
... 12 more
Thank you for reading.
Regarding the WebHDFS command for listing an HDFS folder:
curl -X GET "http://cosmos.lab.fiware.org:14000/webhdfs/v1/guillaume.jourdain/hostabee?op=liststatus&user.name=guillaume.jourdain#4planet.eu" -H "X-Auth-Token: TheToken"
The user.name should be user.name=guillaume.jourdain (without the #4planet.eu part).
Regarding Cygnus, have you upgraded to 0.8.2? It is the only Cygnus version supporting OAuth2. I guess you did not upgrade, because of the es.tid.fiware.orionconnectors.cosmosinjector.OrionHDFSSink logs (those packages are prior to 0.8.0). You have all the details for upgrading here.
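One way to double-check which Cygnus build is actually on Flume's classpath (a sketch; the jar location is an assumption and depends on how Cygnus was installed) is to inspect the jar for the sink class:
# list the classes in the Cygnus jar and look for the OrionHDFSSink package
jar tf /path/to/cygnus.jar | grep OrionHDFSSink
If only es.tid.fiware.orionconnectors classes show up, the agent is still running a pre-0.8.0 build, which would also explain the ClassNotFoundException for com.telefonica.iot.cygnus.sinks.OrionHDFSSink.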

Flume-ng agent not starting and giving NullPointerException

I am writing a custom source, sink, and channel, and my config file is like this:
agent.sources = source
agent.sinks = sink
agent.channels = channel
agent.sources.source.type = com.flume.FlumeSource
agent.sources.source.channels = channel
agent.channels.channel.type = com.flume.FlumeChannel$Builder
agent.channels.channel.type = file
agent.sinks.sink.type = com.flume.FlumeSink
agent.sinks.sink.hdfs.path = <hdfs path>
agent.sources.source.channels = channel
agent.sinks.sink.channel = channel
I am trying to start the agent by adding the jar to the Flume classpath, using this command:
bin/flume-ng agent --conf-file flume.config --classpath /usr/lib/flume-ng/agent.jar --name nab-agent -Dflume.root.logger=DEBUG,console
It says the HDFS path is not present, whereas it exists. It gives a NullPointerException saying the hostname does not exist, and the main class is not found in org.apache.flume.node.Application.
