Nebula Graph fails on CentOS 6.5 - centos6.5

Nebula Graph fails on CentOS 6.5, the error message is as follows:
# storage log
Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
# meta log
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0415 22:32:38.944437 15532 AsyncServerSocket.cpp:762] failed to set SO_REUSEPORT on async server socket Protocol not available
E0415 22:32:38.945001 15510 ThriftServer.cpp:440] Got an exception while setting up the server: 92failed to bind to async server socket: [::]:0: Protocol not available
E0415 22:32:38.945057 15510 RaftexService.cpp:90] Setup the Raftex Service failed, error: 92failed to bind to async server socket: [::]:0: Protocol not available
E0415 22:32:38.949586 15463 NebulaStore.cpp:47] Start the raft service failed
E0415 22:32:38.949597 15463 MetaDaemon.cpp:88] Nebula store init failed
E0415 22:32:38.949796 15463 MetaDaemon.cpp:215] Init kv failed!
Nebula service status is as follows:
[root#redhat6 scripts]# ./nebula.service status all
[WARN] The maximum files allowed to open might be too few: 1024
[INFO] nebula-metad: Exited
[INFO] nebula-graphd: Exited
[INFO] nebula-storaged: Running as 15547, Listening on 44500

Reason for error: CentOS 6.5 system kernel version is 2.6.32, which is less than 3.9. However, SO_REUSEPORT only supports Linux 3.9 and above.
Upgrading the system to CentOS 7.5 can solve the problem by itself.

Related

IoTEdge on K8S, Could not initialize module runtime

I'm running iotedge on kubernetes.
The K8S cluster is a local cluster setup largely using the "Kubernetes the hard way" method, with some modifications.
I did manage to get things working on one installation. However, I'm now getting this on another installation. The initial installation works fine, but after shutting down a machine to simulate a hardware failure, the pod gets recreated, but starts to show this error again. This error happens EVEN if the node shutdown is NOT the one iotedged is running on.
Environment
3 Nodes running Ubuntu 20.04 LTS
Two networks on each node, one for the internet, one for an internal network. K8S is setup using the internal, static IP address
HAProxy/Keepalived for HA without a load balancer, running on a Virtual IP address
Multus CNI for attaching pods to additional networks
CoreDNS
Troubleshooting
Confirmed that CoreDNS seems to be functioning fine, and is able to resolve internal and external addresses
Remaining nodes are able to ping pods on other nodes
Deleting the iotedged pod and allowing k8s to recreate it works, but then edgeAgent an edgeHub have errors until I delete/recreate them as well
Re-run the entire k8s installation. Initial installation works fine, but simulating machine failure continues to be problematic.
Kubernetes Versions:
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:25:06Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
edgeiotd error:
<6>2021-07-09T22:00:02Z [INFO] - Starting Azure IoT Edge Security Daemon - Kubernetes mode
<6>2021-07-09T22:00:02Z [INFO] - Version - 1.1.3
<6>2021-07-09T22:00:02Z [INFO] - Using config file: /etc/iotedged/config.yaml
<6>2021-07-09T22:00:02Z [INFO] - Configuring /var/lib/iotedge as the home directory.
<6>2021-07-09T22:00:02Z [INFO] - Configuring certificates...
<6>2021-07-09T22:00:02Z [INFO] - Transparent gateway certificates not found, operating in quick start mode...
<6>2021-07-09T22:00:02Z [INFO] - Finished configuring provisioning environment variables and certificates.
<6>2021-07-09T22:00:02Z [INFO] - Initializing hsm...
<6>2021-07-09T22:00:02Z [INFO] - Finished initializing hsm.
<6>2021-07-09T22:00:02Z [INFO] - Provisioning edge device...
<6>2021-07-09T22:00:02Z [INFO] - Starting provisioning edge device via manual mode using a device connection string...
<6>2021-07-09T22:00:02Z [INFO] - Manually provisioning device "********" in hub "********.azure-devices.net"
<6>2021-07-09T22:00:02Z [INFO] - Finished provisioning edge device.
<6>2021-07-09T22:00:02Z [INFO] - Initializing the module runtime...
<6>2021-07-09T22:00:02Z [INFO] - Attempting to use config from /home/edgeletuser/.kube/config file.
<6>2021-07-09T22:00:02Z [INFO] - Using in-cluster config
<3>2021-07-09T22:00:34Z [ERR!] - The daemon could not start up successfully: Could not initialize module runtime
<3>2021-07-09T22:00:34Z [ERR!] - caused by: Could not initialize kubernetes module runtime
<3>2021-07-09T22:00:34Z [ERR!] - caused by: HTTP response error: SelfSubjectAccessReviewCreate
<3>2021-07-09T22:00:34Z [ERR!] - caused by: Hyper HTTP error
<3>2021-07-09T22:00:34Z [ERR!] - caused by: error trying to connect: Connection timed out (os error 110)
<6>2021-07-09T22:00:02Z [INFO] (/project/hsm-sys/azure-iot-hsm-c/src/hsm_log.c:log_init:41) Initialized logging
edgeHub Logs after recreating iotedged:
2021-08-18 19:05:40 Starting Edge Hub
2021-08-18 19:05:40.481 +00:00 Edge Hub Main()
<7> 2021-08-18 19:05:40.609 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Making a Http call to http://localhost:35001/ to CreateServerCertificateAsync
<7> 2021-08-18 19:05:40.912 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Error when getting an Http response from http://localhost:35001/ for CreateServerCertificateAsync
HTTP Response:
{"message":"Module not found"}
Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.GeneratedCode.IoTEdgedException`1[Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.GeneratedCode.ErrorResponse]: Not Found
at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.GeneratedCode.HttpWorkloadClient.CreateServerCertificateAsync(String api_version, String name, String genid, ServerCertificateRequest request, CancellationToken cancellationToken) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/version_2019_01_30/generatedCode/HttpWorkloadClient.cs:line 624
at Microsoft.Azure.Devices.Edge.Util.TaskEx.TimeoutAfter[T](Task`1 task, TimeSpan timeout) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/TaskEx.cs:line 126
at Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClientVersioned.Execute[T](Func`1 func, String operation) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/WorkloadClientVersioned.cs:line 59
Unhandled exception. System.AggregateException: One or more errors occurred. (Error calling CreateServerCertificateAsync: Module not found)
---> Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadCommunicationException- Message:Error calling CreateServerCertificateAsync: Module not found, StatusCode:404, at: at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.WorkloadClient.HandleException(Exception ex, String operation) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/version_2019_01_30/WorkloadClient.cs:line 109
at Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClientVersioned.Execute[T](Func`1 func, String operation) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/WorkloadClientVersioned.cs:line 77
at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.WorkloadClient.CreateServerCertificateAsync(String hostname, DateTime expiration) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/version_2019_01_30/WorkloadClient.cs:line 35
at Microsoft.Azure.Devices.Edge.Util.CertificateHelper.GetServerCertificatesFromEdgelet(Uri workloadUri, String workloadApiVersion, String workloadClientApiVersion, String moduleId, String moduleGenerationId, String edgeHubHostname, DateTime expiration) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/CertificateHelper.cs:line 260
at Microsoft.Azure.Devices.Edge.Hub.Service.EdgeHubCertificates.LoadAsync(IConfigurationRoot configuration, ILogger logger) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/EdgeHubCertificates.cs:line 54
at Microsoft.Azure.Devices.Edge.Hub.Service.Program.MainAsync(IConfigurationRoot configuration) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 54
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
at System.Threading.Tasks.Task`1.get_Result()
at Microsoft.Azure.Devices.Edge.Hub.Service.Program.Main() in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 33
Are you still blocked? what troubleshooting steps you have tried so far? Did you check the Common issues and resolutions for Azure IoT Edge? As per the error messages Transparent gateway certificates not found, operating in quick start mode and The daemon could not start up successfully: Could not initialize module runtime looks like the setup is not configured properly. Try restarting the server and check the transparent gateway setup. Please refer the transparent gateway setup and check if you have missed anything.

HyperLedger Fabric and Docker Swarm: Handshake failed with fatal error SSL_ERROR_SSL

We are trying to establish a grpcs (TLS) connection between a docker container running API server (based on Node.js) and another docker container running peer0 from Fabric network.
All containers are orchestated by docker swarm, and both containers happen to be running on the same Linux host.
The error log thrown by API container is the following:
2021-01-07T18:27:38.110Z - error: [Remote.js]: Error: Failed to
connect before the deadline URL:grpcs://10.0.1.2:9051 Query has
completed, checking results error from query = { Error: Failed to
connect before the deadline URL:grpcs://10.0.1.2:9051
at checkState (/usr/src/app/node_modules/grpc/src/client.js:833:16) connectFailed:
true } sampleEvent ERROR : Error: 14 UNAVAILABLE: Connect Failed E0107
18:27:53.602719124 16 ssl_transport_security.cc:1229] Handshake
failed with fatal error SSL_ERROR_SSL: error:14090086:SSL
routines:ssl3_get_server_certificate:certificate verify failed.
And the error log thrown from peer0 is:
2021-01-07 18:50:22.224 UTC [core.comm] ServerHandshake -> ERRO 043 TLS handshake failed with error EOF server=PeerServer remoteaddress=10.0.1.4:46212
IP addresses layout
IP address for API container is 10.0.1.94
IP address for peer0 container is 10.0.1.3
virtual IP address for docker service peer0 is 10.0.1.2
IP address for docker swarm load balancer endpoint is 10.0.1.4
Any suggestion of where to further troubleshoot? At this point is not clear if the problem is with the docker swarm internal networking, or an issue with ssl certificates in either side of the network.
UPDATE Feb 2 2021
The original TLS handshake error was fixed by upgrading the javascript used in NodeSDK. Among other things we started using the addToWallet.js script contained in the commercial-paper example
After being able to stablish TLS succesfully between Node.js API and peer0, we get a new access denied error when making a simple query to chaincode_example02
Facts:
We are running the query with 2 Admin users
One Admin is first-network original Admin#org1.example.com, with credentials generated by cryptogen tool
The other Admin is Admin#buyer.dlt.com whose credentials were created with openssl and a self signed in-company CA
From CLI, both Admin are good and are allowed to run peer commands interchangeably
From Node.js app, only Admin#org1.example.com is allowed to run queries. The message printed to console.log is:
Transaction has been evaluated, result is: 100
When running queries with Admin#buyer.dlt.com we get the following error logs:
Error logs from peer0#buyer.dlt.com
2021-02-02T04:08:45.291086617Z ^[[36m2021-02-02 04:08:45.290 UTC [protoutils] checkSignatureFromCreator -> DEBU 6e637^[[0m creator is &{BuyerMSP 8b7cc2ee996be4f7e5dbb1a4f64db67afd2ff8a2f41276c9bd7f33a2447dd9df}
2021-02-02T04:08:45.291094817Z ^[[36m2021-02-02 04:08:45.290 UTC [protoutils] checkSignatureFromCreator -> DEBU 6e638^[[0m creator is valid
2021-02-02T04:08:45.291100418Z ^[[36m2021-02-02 04:08:45.290 UTC [msp.identity] 2021-02-02T04:08:45.303821799Z ^[[33m2021-02-02 04:08:45.303 UTC [protoutils] ValidateProposalMessage -> WARN 6e63b^[[0m channel [mychannel]: creator's signature over the proposal is not valid: The signature is invalid
2021-02-02T04:08:45.303891604Z ^[[36m2021-02-02 04:08:45.303 UTC [endorser] func1 -> DEBU 6e63c^[[0m Exit: request from 10.0.1.84:52696
2021-02-02T04:08:45.303902005Z ^[[34m2021-02-02 04:08:45.303 UTC [comm.grpc.server] 1 -> INFO 6e63d^[[0m unary call completed grpc.service=protos.Endorser grpc.method=ProcessProposal grpc.peer_address=10.0.1.84:52696 error="access denied: channel [mychannel] creator org [BuyerMSP]" grpc.code=Unknown grpc.call_duration=13.783655ms
Error log on console.log from script query.js:
2021-02-02T04:08:45.305Z - error: [Channel.js]: Error: 2 UNKNOWN: access denied: channel [mychannel] creator org [BuyerMSP]
2021-02-02T04:08:45.307Z - error: [Network]: _initializeInternalChannel: Unable to initialize channel. Attempted to contact 1 Peers. Last error was Error: 2 UNKNOWN: access denied: channel [mychannel] creator org [BuyerMSP]
Failed to evaluate transaction: Error: Unable to initialize channel. Attempted to contact 1 Peers. Last error was Error: 2 UNKNOWN: access denied: channel [mychannel] creator org [BuyerMSP]
In the end, this issue turned out to be two issues, in a 'russian doll like' style.
1. First issue: TLS Handshake error
This was fixed by upgrading the SDK library to the latest release
2. Second issue: Node SDK query triggers error "The signature is invalid".
The reason turned out to be that the CLI (written on Go) is using the Go crypto support which allows it to generate a signature from a hash without any knowledge of the curve used for the key. Instead, the SDK libraries used by the Node implementation require a specific curve to be specified by the code generating the signature, separately from the private key itself.
Bottom line, private keys used within Node SDK should be P-256.
As an alternative, as suggested by hyperledger dev team:
If you really must use a curve other than P-256 then you might be able
to use one of the following approaches:
-Use the off-line signing approach included in the documentation but specify an alternative curve instead of 'p256'. The supported curves
for the elliptic package documented here:
https://github.com/indutny/elliptic
-Set your own CryptoSuite implementation on the Client that underpins the Gateway object, with your own CryptoSuite.sign() implementation:
https://hyperledger.github.io/fabric-sdk-node/release-2.2/CryptoSuite.html#sign

Random failure of creating a New Cassandra Cluster using OpsCenter

OpsCenter version: 5.1.0 and
DSE Version: 4.6.0
Creating a brand new cluster by using OpsCenter directly, gives us the following error. It randomly works with the same settings but 95% of the times it fails with the same error. Opscenter is running on its own box but sharing the same Security groups as the cluster instances. For good measure, I have opened up all TCP ports to all IPs. The following is the stack trace of the error from the opscenterd.log:
*2015-03-19 10:06:12+0000 [] INFO: Starting provisioning process
2015-03-19 10:06:12+0000 [] INFO: Starting installation phase of cluster provisioning
2015-03-19 10:06:13+0000 [] WARN: HTTP request http://10.x.x.x:61621/alive? failed: Connection was refused by other side: 111: Connection refused.
2015-03-19 10:06:13+0000 [] INFO: Beginning install of OpsCenter agent to 54.x.x.x
2015-03-19 10:06:26+0000 [] WARN: HTTP request http://10.x.x.x:61621/alive? failed: Connection was refused by other side: 111: Connection refused.
2015-03-19 10:06:31+0000 [] INFO: Agent for ip 10.x.x.x is version None
2015-03-19 10:06:31+0000 [] INFO: Agent for ip 10.x.x.x is version u'5.1.0'
2015-03-19 10:07:23+0000 [] INFO: Successfully installed agent and dse on node 10.x.x.x
2015-03-19 10:07:23+0000 [] INFO: Beginning "stop" phase of cluster provisioning
2015-03-19 10:07:25+0000 [] WARN: Marking request '10.x.x.x: /ops/stop' (f6708fa2-b45f-42b4-b992-90a82b460ac7) as failed: /usr/sbin/service dse stop failed
exit status: 1
stdout:
log_daemon_msg is a shell function
Cassandra 2.0 and later require Java 7 or later.
2015-03-19 10:07:25+0000 [] ERROR: Failed to stop node 10.x.x.x: /usr/sbin/service dse stop failed
exit status: 1
stdout:
log_daemon_msg is a shell function
Cassandra 2.0 and later require Java 7 or later.
2015-03-19 10:07:25+0000 [] WARN: Marking request 'stop stage' (0b6fcb6b-96ba-404e-a484-b4b6b167b309) as failed: Failed to stop node 10.x.x.x: /usr/sbin/service dse stop failed
exit status: 1
stdout:
log_daemon_msg is a shell function
Cassandra 2.0 and later require Java 7 or later.
2015-03-19 10:07:25+0000 [] ERROR: Stop stage failed: Failed to stop node 10.x.x.x: /usr/sbin/service dse stop failed
exit status: 1
stdout:
log_daemon_msg is a shell function
Cassandra 2.0 and later require Java 7 or later.
2015-03-19 10:07:25+0000 [] WARN: Marking request 'provision' (daf1c15d-92e3-40b0-83ca-34d548ea835b) as failed: Stop stage failed: Failed to stop node 10.x.x.x: /usr/sbin/service dse stop failed
exit status: 1
stdout:
log_daemon_msg is a shell function
Cassandra 2.0 and later require Java 7 or later.
2015-03-19 10:07:25+0000 [] ERROR:
2015-03-19 10:07:25+0000 [] ERROR: Cluster provisioning failed: Exception: Stop stage failed: Failed to stop node 10.x.x.x: /usr/sbin/service dse stop failed
exit status: 1
stdout:
log_daemon_msg is a shell function
Cassandra 2.0 and later require Java 7 or later.
2015-03-19 10:07:25+0000 [] ERROR: Failed to provision cluster: Cluster provisioning failed: Exception: Stop stage failed: Failed to stop node 10.x.x.x: /usr/sbin/service dse stop failed
exit status: 1
stdout:
log_daemon_msg is a shell function
Cassandra 2.0 and later require Java 7 or later.
2015-03-19 10:07:25+0000 [] WARN: Marking request 28c021fd-d21a-4fed-bb5c-a4fe17d362e0 as failed: Cluster provisioning failed: Exception: Stop stage failed: Failed to stop node 10.x.x.x: /usr/sbin/service dse stop failed
exit status: 1
stdout:
log_daemon_msg is a shell function
Cassandra 2.0 and later require Java 7 or later.
2015-03-19 10:07:41+0000 [] WARN: Unable to find a matching cluster for node with IP [u'fe80:0:0:0:2000:aff:feeb:31c7%2', u'10.x.x.x', u'0:0:0:0:0:0:0:1%1', u'127.0.0.1']; the message was [u'5.1.0', u'/1947480708/conf']. This usually indicates that an OpsCenter agent is still running on an old node that was decommissioned or is part of a cluster that OpsCenter is no longer monitoring.
Appreciate any help!
Thanks in advance
Harsha
OpCenter developer here. I make the OpsCenter provisioning features go zoom (or splat occasionally as you've seen). It is with sadness and shame that I must tell you that you're hitting a bug.
The Datastax AMI version 2.4 used by OpsCenter provisioning (https://github.com/riptano/ComboAMI/tree/2.4) does quite a bit of work at boot time via startup scripts. One of those tasks is to set up some gpg repository keys used to validate packages. Intermittently that process can fail, breaking package installs and leading to the series of errors that you're seeing. This failure is intermittent and has greatly increased in frequency recently. If you check /home/ubuntu/datastax-ami/ami.log you should see the gpg key failures that begin the rest of the failure chain.
Unfortunately, this error is pretty far down the technology stack and is difficult to manually work around. If you just need to provision a single cluster you can retry until you get a good run. Otherwise your best best is to manually launch the instances and use local provisioning to deploy dse/dsc to their private ip addresses:
Launch instances using ami-ada2b6c4 (assuming you're in us-east-1)
Make sure to add the instances to the OpsCenterSecurity group.
Make sure you have the private half of the keypair you use (you'll need it during local provisioning)
On the instance data page, hit the advanced pulldown and add the following userdata as text "--raidonly --java7"
Do a local-provisioning run against the private-ip's
Not a super-simple workaround. I wish your experience with OpsCenter this time around was more awesome. The good news is I'm on this bug and it will be fixed in an upcoming point release.
Edit: No longer necessary to manually remove /etc/security/limits.d/cassandra.conf
if its just complaining about java then install the java 7 preferably datastax wants oracle jdk and jre. you might already have java 7 and another version on your nodes but java 7 is not the default version. to change this do:
sudo update-java-alternatives -s java-7-oracle
which is a command you can script to run with ssh so you dont have to log in to each node

Failed to establish connection SQLSTATE: HY000[DataStax][Hardy] (22) Error from ThriftHiveClient: connect() failed: errno = 10061

I am using Datastax enterprise edition and in cluster one is Hadoop/Hive .I am trying to connect to hive with datastax hive odbc connector.I am getting error like :
Connector Version: V1.0.0.1007
Running connectivity tests...
Attempting connection
Failed to establish connection
SQLSTATE: HY000[DataStax][Hardy] (22) Error from ThriftHiveClient: connect() failed: errno = 10061
TESTS COMPLETED WITH ERROR
The error 10061 means connection refused
Seems like you have not started the hive service on your Analytics node therefore nothing is listening on TCP 10000
Please login into one of your DSE Analytics node and execute:
dse hive --service hiveserver
Then try again your test from your Windows system
Source: http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/ana/anaHivStrtSvr.html

Flume agent throws java.net.ConnectException: Connection refused

I have been using flume for a while now, I have got agent and collector running on same machine.
Configuration
agent: exec("/usr/bin/tail -n +0 -F /path/to/file") | agentE2ESink("hostname", 35855)
collector: collectorSource(35855) | collector(10000) { collectorSink("/hdfs/path/to/sink","name") }
Facing issues in the agent node:
2012-06-04 19:13:33,625 [naive file wal consumer-27] INFO debug.InsistentOpenDecorator: open attempt 0 failed, backoff (1000ms): Failed to open thrift event sink to hostname:35855 : java.net.ConnectException: Connection refused
2012-06-04 19:13:34,625 [logicalNode hostname-19] ERROR connector.DirectDriver: Expected ACTIVE but timed out in state OPENING
2012-06-04 19:13:34,632 [naive file wal consumer-27] INFO debug.InsistentOpenDecorator: open attempt 1 failed, backoff (2000ms): Failed to open thrift event sink to hostname:35855 : java.net.ConnectException: Connection refused
2012-06-04 19:13:36,635 [naive file wal consumer-27] INFO debug.InsistentOpenDecorator: open attempt 2 failed, backoff (4000ms): Failed to open thrift event sink to hostname:35855 : java.net.ConnectException: Connection refused
and then empty ACKs will be sent continuously
2012-06-04 19:19:56,960 [Roll-TriggerThread-0] INFO endtoend.AckListener$Empty: Empty Ack Listener began 20120604-191956958+0530.881565921235084.00000026
2012-06-04 19:20:07,043 [Roll-TriggerThread-0] INFO hdfs.SeqfileEventSink: closed /tmp/flume-user1/agent/hostname/writing/20120604-191956958+0530.881565921235084.00000026
I dont understand why the connection is refused. Are there any system level changes that needs to be done ?
Note: the collector is listening to the port but agent is unable to send data through the 35855 port.
Can anyone help me with this problem.
Thanks
If you are running both the agent and the collector on the same box, you should be using localhost as the address.
agentE2ESink("localhost", 35855)

Resources