How to configure an Apache Flume agent with a web server on Windows - flume

I am able to configure an agent for Windows, but I have some confusion regarding the connectivity between web server logs and the agent.
1: How do I connect the web server with the agent?
2: While starting the flume.bat file, it generates a flume.log file in which I get the exception below:
org.apache.flume.conf.ConfigurationException: No channel configured for sink: hdfssink
at org.apache.flume.conf.sink.SinkConfiguration.configure(SinkConfiguration.java:51)
at org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSinks(FlumeConfiguration.java:661)

1. The data flow is as below:
your application (or web server) --> source --> channel --> sink
Now, the data can flow from your web server to the source by either a "pull" mechanism or a "push" mechanism. In your case, you can either tail the web server logs or use a spooling directory source.
2. This looks like a misconfiguration issue: the error says that no channel is bound to the sink named hdfssink. You would need to post your config file to pin down the exact problem, but see the sketch below for the shape of a valid config.
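A minimal sketch of an agent config, assuming an agent named agent1 and a spooling directory source (every name except hdfssink is illustrative):

agent1.sources = weblog-source
agent1.channels = mem-channel
agent1.sinks = hdfssink

# spooling directory source: Flume ingests files dropped into this folder
agent1.sources.weblog-source.type = spooldir
agent1.sources.weblog-source.spoolDir = C:/webserver/logs
agent1.sources.weblog-source.channels = mem-channel

# in-memory buffer between the source and the sink
agent1.channels.mem-channel.type = memory

# HDFS sink; the last line is the channel binding the error complains about
agent1.sinks.hdfssink.type = hdfs
agent1.sinks.hdfssink.hdfs.path = hdfs://namenode:8020/flume/weblogs
agent1.sinks.hdfssink.channel = mem-channel

Note that a source is bound with the plural "channels" property while a sink uses the singular "channel"; omitting the sink's "channel" line produces exactly the "No channel configured for sink" error shown above.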

Related

How to set credentials in bindings file for JMSInput Node

I am using the JMSInput node in an IIB flow to connect with RabbitMQ. Locally it works fine with a bindings file, but how/where do I set login credentials for a remote RabbitMQ server?
You will have to use the JMS nodes and create your own JMSProviders in configurable services.
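As a sketch, the credentials themselves are typically stored against the integration node with mqsisetdbparms using a jms:: resource name (the node name, identity name, and credential values below are assumptions):

mqsisetdbparms MYNODE -n jms::myRabbitIdentity -u rabbitUser -p rabbitPassword

The JMSProviders configurable service itself can be created with mqsicreateconfigurableservice and pointed at your bindings file location, and the flow can then reference that security identity.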

kafka connect in distributed mode is not generating logs specified via log4j properties

I have been using Kafka Connect in my work setup for a while now, and it works perfectly fine.
Recently I thought of dabbling with a few connectors of my own in my Docker-based Kafka cluster with just one broker (ubuntu:18.04 with Kafka installed) and a separate node acting as a client for deploying connector apps.
Here is the problem:
Once my broker is up and running, I log in to the client node (with no broker running, just the vanilla Kafka installation) and set the classpath to point to my connector libraries. I also set the KAFKA_LOG4J_OPTS environment variable to point to the location of the log file to generate, with debug mode enabled.
So every time I start the Kafka worker using the command:
nohup /opt//bin/connect-distributed /opt//config/connect-distributed.properties > /dev/null 2>&1 &
the connector starts running, but I don't see the log file getting generated.
I have tried several changes, but nothing works.
QUESTIONS:
Does this mean that connect-distributed.sh doesn't generate the log file after reading the KAFKA_LOG4J_OPTS variable? And if it does, could someone explain how?
NOTE:
(I have already debugged the connect-distributed.sh script and tried the options where daemon mode is included and not included; by default, if KAFKA_LOG4J_OPTS is not provided, it uses the connect-log4j.properties file in the config directory, but even then no log file is getting generated. A sketch of my setup follows.)
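For reference, this is roughly how I set the variable and what the Log4j file looks like (the paths and file names here are examples, not my exact values):

export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:/opt/kafka/config/connect-log4j.properties"

# connect-log4j.properties: root logger at DEBUG, writing to a file appender
log4j.rootLogger=DEBUG, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/var/log/kafka/connect.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=[%d] %p %m (%c)%n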
OBSERVATION:
Only when I start ZooKeeper/the broker on the client node is the provided KAFKA_LOG4J_OPTS value picked up, and logs start getting generated, but nothing related to the Kafka connector. I have already verified the connectivity between the client and the broker using kafkacat.
The interesting part is:
I follow the same process in my workplace, and logs start getting generated every time the worker (connect-distributed.sh) is started, but I haven't been able to replicate that behavior in my own setup, and I have no clue what I am missing here.
Could someone provide some reasoning? This is really driving me mad.

Spring Cloud Data Flow Local Server + Skipper Server: Error after undeploying streams

I am trying to manage my streams on Spring Cloud Data Flow with the Skipper server.
I followed the instructions here:
https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started-deploying-spring-cloud-dataflow
The app registration and stream definition/deployment go quite well, but after I undeploy the deployed stream, I can't see any streams on the dashboard any more.
The dashboard shows an error instead:
Could not parse Skipper Platform Status JSON:null
I have to restart the scdf server and skipper server in order to see my stream definition again.
The version of the components are:
scdf local server: 1.6.0.RELEASE
skipper server: 1.0.8.RELEASE
metrics collector: kafka-10-1.0.0.RELEASE
Some operation details:
I registered my apps using the scdf shell in skipper mode.
I defined and deployed my stream on the scdf dashboard, and I also undeployed the stream via the stop button on the dashboard. The operations involved looked roughly like the sketch below.
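For reference, these operations map to roughly the following scdf shell commands (the app name, stream name, and URI below are placeholders, not my exact values):

app register --name http --type source --uri maven://org.springframework.cloud.stream.app:http-source-kafka-10:1.3.1.RELEASE
stream create --name mystream --definition "http | log"
stream deploy --name mystream
stream undeploy --name mystream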
How should I solve this problem?
We have recently observed this on our side, too, and it has been fixed! [see spring-cloud/spring-cloud-dataflow#2361]
We are preparing for a 1.6.1 GA release, but in the meantime, please feel free to pull the 1.6.1.BUILD-SNAPSHOT from Spring repo and give it a go.
If you see any other anomaly, it'd be great to have a bug report on the matter.

Flume agent: how a Flume agent gets data from a web server located on a different physical server

I am trying to understand Flume, referring to the official page at flume.apache.org.
In particular, referring to this section, I am a bit confused.
Do we need to run the Flume agent on the actual web server, or can we run Flume agents on a different physical server and acquire data from the web server?
If the latter is possible, how does the Flume agent get the data from the web server logs? How can the web server make its data available to the Flume agent?
Can anyone help me understand this?
The Flume agent pulls data from a source and publishes it to a channel, which the sink then drains and writes out.
You can install the Flume agent in either a local or a remote configuration. But keep in mind that having it remote will add some network latency to your event processing, if you are concerned about that. You can also "multiplex" Flume agents to have one remote aggregation agent plus individual local agents on each web server.
Assuming a Flume agent is locally installed using a spooldir or exec source, it will essentially tail a file or run that command locally. This is how it would get data from logs; see the sketch just below.
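A minimal sketch of a local agent's source section using exec to tail an access log (the agent name, channel name, and file path are assumptions):

agent.sources = tail-src
agent.channels = mem
# run tail -F locally and ingest each new line as a Flume event
agent.sources.tail-src.type = exec
agent.sources.tail-src.command = tail -F /var/log/httpd/access_log
agent.sources.tail-src.channels = mem
agent.channels.mem.type = memory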
If the Flume agent is set up with a syslog or TCP source (see the data ingestion section on network sources), then it can be on a remote machine, and you must establish a network socket in your logging application to publish messages to the other server. This is a similar pattern to Apache Kafka. A sketch of such a remote collector follows.
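For the remote case, a collection agent listening for syslog traffic over TCP might look like this (all names and the port are illustrative; the web server would be configured to forward its logs to this host and port):

collector.sources = syslog-in
collector.channels = mem
collector.sinks = hdfs-out

# listen for syslog messages on TCP port 5140
collector.sources.syslog-in.type = syslogtcp
collector.sources.syslog-in.host = 0.0.0.0
collector.sources.syslog-in.port = 5140
collector.sources.syslog-in.channels = mem

collector.channels.mem.type = memory

collector.sinks.hdfs-out.type = hdfs
collector.sinks.hdfs-out.hdfs.path = hdfs://namenode:8020/flume/weblogs
collector.sinks.hdfs-out.channel = mem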

What does "Jenkins URL" mean in configuration settings?

On the Jenkins configuration page, in the "Jenkins URL" section, I've set this option to "http://name_of_my_machine.jenkins:8080/".
Usually I open Jenkins via "http://localhost:8080/".
But this new option did not work for me - Jenkins does not open. So what does it mean?
Jenkins can't determine its URL on its own, so when it needs to create full links, that is where the URL is taken from. In general, even if you specify the wrong URL, it should not affect the way Jenkins works in any significant way. It certainly has no effect on the URL that you enter in your browser to connect to the Jenkins server. You can specify either http://localhost:8080 (when connecting from your machine, assuming that you started Jenkins on port 8080) or http://<machine_hostname>:8080 when connecting from anywhere.
So no matter what you specify, it has no effect on connecting to Jenkins. Therefore http://name_of_my_machine.jenkins:8080/ won't work, as .jenkins is not part of the name (e.g. ping name_of_my_machine.jenkins won't find the host).
Whenever Jenkins needs to create a URL that points to itself, Jenkins picks it up from the "Jenkins URL" setting in the global configuration.
Jenkins could try to guess the URL by, e.g., getting the hostname and combining it with the port it is running on. But sometimes the hostname is not the same as the DNS name. And what if you have placed a front end or proxy before Jenkins that, for example, terminates SSL connections, and you would really like people to use Jenkins at https://company.com/jenkins/? Jenkins, running on port 8080, cannot know about the front end. The only reliable way for Jenkins to get its own URL is for an administrator to set it in the Jenkins configuration.
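As a sketch, such a front end might be an nginx reverse proxy like the following (the host, path, and port are assumptions); Jenkins has no way to discover the https://company.com/jenkins/ address from behind it, so it has to be told:

# forward /jenkins/ on the front end to the Jenkins instance on port 8080
location /jenkins/ {
    proxy_pass http://127.0.0.1:8080/jenkins/;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-Proto https;
}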
Jenkins needs to know its own URL when it is creating links that point back to itself. It does this, e.g., when it sends out emails containing direct links to build results. Also, if you have a JNLP-type slave, the slave initiates the connection to the master, and the master returns a message which contains a link back to Jenkins for downloading the slave agent software.
Do you mean the option in the e-mail configuration section? That one is only used to generate the links in the emails Jenkins sends (see the help for the option -- click the symbol with the question mark). If you cannot access your server anymore after changing it, the cause must be something else.
