Milo OPC UA - unable to connect to server from client when server restarts; server is set to 'USER_TOKEN_POLICY_USERNAME'

I have a Milo OPC UA server with USER_TOKEN_POLICY_USERNAME enabled and a UsernameIdentityValidator configured with the username and password.
On the Milo client side, I pass a UsernameProvider to setIdentityProvider.
When I run this setup, everything works fine.
But when I restart the OPC UA server, the Milo client won't reconnect and I get the exception below:
[milo-shared-thread-pool-2] Skipping validation for certificate: C=DE, ST=" ", L=Locality, OU=OrganizationUnit, O=Organization, CN=AggrServer#7aaf488fd8d6
29.01.2021 09:25:48.282+0000 INFO [m.o.serv.KafkaConsumer(1bc715b8)] [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] Sent record successfully to topic : NSCH_TEST_Data.
29.01.2021 09:26:55.681+0000 WARN [o.e.m.opcua.sdk.client.SessionFsm] [milo-shared-thread-pool-3] [2] Keep Alive failureCount=4 exceeds failuresAllowed=3
29.01.2021 09:26:55.681+0000 WARN [o.e.m.opcua.sdk.client.SessionFsm] [milo-shared-thread-pool-3] [2] Keep Alive failureCount=5 exceeds failuresAllowed=3
29.01.2021 09:26:55.682+0000 INFO [m.o.MiloConnectorRemote(7b76b59d)] [milo-shared-thread-pool-6] opc.tcp://192.168.56.101:4840: onSessionInactive: OpcUaSession{sessionId=NodeId{ns=1, id=Session:fc6fdb4f-0e8a-441d-ba25-45d067d434e7}, sessionName=OpcUa#0b8bc292754c}
29.01.2021 09:26:55.682+0000 INFO [m.o.MiloConnectorRemote(7b76b59d)] [milo-shared-thread-pool-6] opc.tcp://192.168.56.101:4840: sessionInactive: OpcUaSession{sessionId=NodeId{ns=1, id=Session:fc6fdb4f-0e8a-441d-ba25-45d067d434e7}, sessionName=OpcUa#0b8bc292754c}
29.01.2021 09:26:55.682+0000 INFO [m.o.MiloConnectorRemote(7b76b59d)] [milo-shared-thread-pool-6] opc.tcp://192.168.56.101:4840: notify Observer-opc.tcp://192.168.56.101:4840 about ConnectionEvent(state=Connecting, prevState=Connected, label=opc.tcp://192.168.56.101:4840)
29.01.2021 09:26:55.683+0000 INFO [m.opcua.OpcUaObserverImpl(754d0f4a)] [milo-shared-thread-pool-6] Observer-opc.tcp://192.168.56.101:4840: handle the event ConnectionEvent(state=Connecting, prevState=Connected, label=opc.tcp://192.168.56.101:4840)
29.01.2021 09:26:55.683+0000 INFO [m.o.OpcUaObserverImpl$ModelReadyChangeChecker(3dd6dea0)] [milo-shared-thread-pool-6] OpcUaObserverImpl-opc.tcp://192.168.56.101:4840: stop
29.01.2021 09:26:55.683+0000 INFO [m.opcua.OpcUaObserverImpl(754d0f4a)] [milo-shared-thread-pool-6] Observer-opc.tcp://192.168.56.101:4840: notify 2 listeners about ModelUnavailableEvent#1791022155[uri=opc.tcp://192.168.56.101:4840,nodesCount=0,label=Observer-opc.tcp://192.168.56.101:4840]
29.01.2021 09:26:55.683+0000 INFO [m.opcua.OpcUaObserverImpl(754d0f4a)] [DefaultDispatcher-worker-1] Observer-opc.tcp://192.168.56.101:4840: notify Subscriber-opc.tcp://192.168.56.101:4840 about ModelUnavailableEvent#1791022155[uri=opc.tcp://192.168.56.101:4840,nodesCount=0,label=Observer-opc.tcp://192.168.56.101:4840]
29.01.2021 09:26:55.683+0000 INFO [opcua.MiloSubscriber(364cd1b9)] [DefaultDispatcher-worker-1] Subscriber-opc.tcp://192.168.56.101:4840: unsubscribe 1 subscriptions
29.01.2021 09:26:55.683+0000 INFO [m.opcua.OpcUaObserverImpl(754d0f4a)] [DefaultDispatcher-worker-2] Observer-opc.tcp://192.168.56.101:4840: notify SyncProcessor-opc.tcp://192.168.56.101:4840 about ModelUnavailableEvent#1791022155[uri=opc.tcp://192.168.56.101:4840,nodesCount=0,label=Observer-opc.tcp://192.168.56.101:4840]
29.01.2021 09:26:55.683+0000 INFO [m.opcua.serv.SyncProcessor(2474528)] [DefaultDispatcher-worker-2] SyncProcessor: ignore the event ModelUnavailableEvent#1791022155[uri=opc.tcp://192.168.56.101:4840,nodesCount=0,label=Observer-opc.tcp://192.168.56.101:4840]
29.01.2021 09:26:55.686+0000 INFO [opcua.MiloSubscriber(364cd1b9)] [DefaultDispatcher-worker-1] SyncExecutor-Subscriber(364cd1b9)-opc.tcp://192.168.56.101:4840: SyncExecutor-Subscriber(364cd1b9)-opc.tcp://192.168.56.101:4840: unsubscribe, subscriptionId=1
29.01.2021 09:26:55.686+0000 INFO [opcua.MiloSubscriber(364cd1b9)] [DefaultDispatcher-worker-1] Subscriber-opc.tcp://192.168.56.101:4840: delete subscription SyncExecutor-Subscriber(364cd1b9)-opc.tcp://192.168.56.101:4840(SyncExecutor-Subscriber(364cd1b9)-opc.tcp://192.168.56.101:4840)
29.01.2021 09:27:11.685+0000 WARN [opcua.MiloSubscriber(364cd1b9)] [DefaultDispatcher-worker-1] [Subscriber-opc.tcp://192.168.56.101:4840: deleteSubscription(1) of SyncExecutor-Subscriber(364cd1b9)-opc.tcp://192.168.56.101:4840] return null, because of UaException: status=Bad_ConnectionRejected, message=io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /192.168.56.101:4840
29.01.2021 09:27:27.703+0000 WARN [o.e.m.o.s.c.s.ClientCertificateValidator$InsecureValidator] [milo-shared-thread-pool-5] Skipping validation for certificate: C=DE, ST=" ", L=Locality, OU=OrganizationUnit, O=Organization, CN=AggrServer#7aaf488fd8d6
29.01.2021 09:27:31.782+0000 WARN [o.e.m.o.s.c.s.ClientCertificateValidator$InsecureValidator] [milo-shared-thread-pool-2] Skipping validation for certificate: C=DE, ST=" ", L=Locality, OU=OrganizationUnit, O=Organization, CN=AggrServer#7aaf488fd8d6
29.01.2021 09:27:39.806+0000 WARN [o.e.m.o.s.c.s.ClientCertificateValidator$InsecureValidator] [milo-shared-thread-pool-6] Skipping validation for certificate: C=DE, ST=" ", L=Locality, OU=OrganizationUnit, O=Organization, CN=AggrServer#7aaf488fd8d6
29.01.2021 09:27:55.830+0000 WARN [o.e.m.o.s.c.s.ClientCertificateValidator$InsecureValidator] [milo-shared-thread-pool-3] Skipping validation for certificate: C=DE, ST=" ", L=Locality, OU=OrganizationUnit, O=Organization, CN=AggrServer#7aaf488fd8d6
UPDATE - new logs:
02.02.2021 18:32:55.541+0000 WARN [opcua.MiloSubscriber(3c5d9688)] [DefaultDispatcher-worker-3] [Subscriber-opc.tcp://192.168.56.101:4840: deleteSubscription(1) of SyncExecutor-Subscriber(3c5d9688)-opc.tcp://192.168.56.101:4840] return null, because of UaException: status=Bad_ConnectionRejected, message=io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /192.168.56.101:4840
02.02.2021 18:32:55.542+0000 INFO [opcua.MiloBrowser(1d141b2d)] [DefaultDispatcher-worker-2] idNameTypeSet.nodes.size
02.02.2021 18:32:55.542+0000 INFO [m.o.OpcUaObserverImpl$ModelReadyChangeChecker(3c8bf12c)] [DefaultDispatcher-worker-2] OpcUaObserverImpl-opc.tcp://192.168.56.101:4840: exit model checking, because stopped externally
02.02.2021 18:33:59.790+0000 INFO [m.o.MiloConnectorRemote(74c9951c)] [milo-shared-thread-pool-3] opc.tcp://192.168.56.101:4840: onSessionActive: OpcUaSession{sessionId=NodeId{ns=1, id=Session:d27e7db7-4401-4f08-8c17-7bfaf9075fe4}, sessionName=OpcUa#154c9f72aa09}
02.02.2021 18:33:59.790+0000 INFO [m.o.MiloConnectorRemote(74c9951c)] [milo-shared-thread-pool-3] opc.tcp://192.168.56.101:4840: notify Observer-opc.tcp://192.168.56.101:4840 about ConnectionEvent(state=Connected, prevState=Connecting, label=opc.tcp://192.168.56.101:4840)
02.02.2021 18:33:59.790+0000 INFO [m.opcua.OpcUaObserverImpl(ff09afd)] [milo-shared-thread-pool-3] Observer-opc.tcp://192.168.56.101:4840: handle the event ConnectionEvent(state=Connected, prevState=Connecting, label=opc.tcp://192.168.56.101:4840)
02.02.2021 18:33:59.790+0000 INFO [m.o.OpcUaObserverImpl$ModelReadyChangeChecker(3c8bf12c)] [milo-shared-thread-pool-3] OpcUaObserverImpl-opc.tcp://192.168.56.101:4840: start
02.02.2021 18:33:59.790+0000 INFO [m.o.OpcUaObserverImpl$ModelReadyChangeChecker(3c8bf12c)] [milo-shared-thread-pool-3] OpcUaObserverImpl-opc.tcp://192.168.56.101:4840: modelReadyChecking=MinMaxInterval(min=10, max=30, timeUnit=SECONDS, current=10, step=3), modelChangeChecking=MinMaxInterval(min=60, max=1800, timeUnit=SECONDS, current=60, step=180), modelReadyMinNodesCount=0
02.02.2021 18:33:59.804+0000 INFO [m.o.OpcUaObserverImpl$ModelReadyChangeChecker(3c8bf12c)] [DefaultDispatcher-worker-2] OpcUaObserverImpl-opc.tcp://192.168.56.101:4840: -> check(modelReadyMinNodesCount=0,modelChangeCheckingRunning=false)
02.02.2021 18:33:59.804+0000 INFO [opcua.MiloBrowser(1d141b2d)] [DefaultDispatcher-worker-2] In nodesCount method
02.02.2021 18:33:59.817+0000 INFO [opcua.MiloBrowser(1d141b2d)] [DefaultDispatcher-worker-2] nodesCount=3605

It looks like an issue with client/server certificate validation.
OPC UA PKI, X.509 certificates and the rest of the security machinery are complex, hard to understand and even harder to configure properly, so this can't be answered in a few words. If you are just starting with OPC UA, consider skipping security policies and user identification until you have learned about them.
Server and client both need certificates in order to encrypt and decrypt the user authentication token.
But do some checks first:
Check whether the client has the server certificate in its trusted certificate store.
Check whether the server certificate has changed. The server should not regenerate its self-signed certificate on every start, only during installation or by explicit administration.
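For the first check, here is a minimal sketch of pointing the Milo client at a directory-based trust list instead of the InsecureValidator that shows up in your logs. It assumes Milo 0.6.x; the PKI directory location is hypothetical:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.eclipse.milo.opcua.sdk.client.api.config.OpcUaClientConfigBuilder;
import org.eclipse.milo.opcua.stack.client.security.DefaultClientCertificateValidator;
import org.eclipse.milo.opcua.stack.core.security.DefaultTrustListManager;

public class TrustListSetup {
    public static OpcUaClientConfigBuilder configBuilderWithTrustList() throws Exception {
        // Hypothetical location for the client's PKI directory.
        Path pkiDir = Paths.get(System.getProperty("user.home"), ".milo-client", "pki");
        Files.createDirectories(pkiDir);

        // The trust list manager maintains trusted/, issuers/ and rejected/ subdirectories;
        // copy the server certificate into trusted/certs (or move it there from rejected/).
        DefaultTrustListManager trustListManager = new DefaultTrustListManager(pkiDir.toFile());
        DefaultClientCertificateValidator certificateValidator =
            new DefaultClientCertificateValidator(trustListManager);

        // Use this validator in the client config instead of skipping validation entirely.
        OpcUaClientConfigBuilder builder = new OpcUaClientConfigBuilder();
        builder.setCertificateValidator(certificateValidator);
        return builder;
    }
}

With this in place, a server that regenerates its certificate on restart shows up in rejected/ until the new certificate is explicitly trusted, so the failure is visible instead of silent.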
Workarounds:
Disable the client and/or server security checks, if possible.
Use another security policy, e.g. http://opcfoundation.org/UA/SecurityPolicy#None; note that over an unsecured channel the username/password token is sent unprotected unless the user token policy specifies its own security policy for encrypting it.
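As a minimal sketch of the second workaround (assumes Milo 0.6.x; the endpoint URL is taken from your logs, the username and password are placeholders, and the server must actually offer a None endpoint with a username token policy):

import java.util.List;

import org.eclipse.milo.opcua.sdk.client.OpcUaClient;
import org.eclipse.milo.opcua.sdk.client.api.config.OpcUaClientConfig;
import org.eclipse.milo.opcua.sdk.client.api.identity.UsernameProvider;
import org.eclipse.milo.opcua.stack.client.DiscoveryClient;
import org.eclipse.milo.opcua.stack.core.security.SecurityPolicy;
import org.eclipse.milo.opcua.stack.core.types.structured.EndpointDescription;

public class NoneSecurityClient {
    public static void main(String[] args) throws Exception {
        // Ask the server for its endpoints and pick the one using SecurityPolicy None.
        List<EndpointDescription> endpoints =
            DiscoveryClient.getEndpoints("opc.tcp://192.168.56.101:4840").get();

        EndpointDescription endpoint = endpoints.stream()
            .filter(e -> SecurityPolicy.None.getUri().equals(e.getSecurityPolicyUri()))
            .findFirst()
            .orElseThrow(() -> new IllegalStateException("server offers no None endpoint"));

        OpcUaClientConfig config = OpcUaClientConfig.builder()
            .setEndpoint(endpoint)
            // Placeholder credentials for whatever the server's UsernameIdentityValidator expects.
            .setIdentityProvider(new UsernameProvider("user", "password"))
            .build();

        OpcUaClient client = OpcUaClient.create(config);
        client.connect().get();
        // ... browse, read, subscribe ...
        client.disconnect().get();
    }
}

If the client reconnects cleanly after a server restart with this setup, the original problem is almost certainly certificate trust rather than the username token.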

I think the meaningful Exception to extract from your new logs is this:
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /192.168.56.101:4840
That is a plain networking error: the server isn't reachable, isn't running yet, a firewall is in the way, etc.
It's not anything you're doing wrong in the client code at this point.

Related

Neo4j.conf restart issue: ExitOnOutOfMemoryError

When using neo4j-admin memrec there is a recommendation to activate the following in the neo4j.conf file:
dbms.jvm.additional=-XX:+ExitOnOutOfMemoryError
However, if I add the line, I am unable to restart the neo4j service at all. If I check the debug log, the service does not even begin to start up. Is there an error in that particular suggested config line that prevents the service from starting before it has even checked anything else?
:~$ sudo service neo4j restart
:~$ tail -f /var/log/neo4j/debug.log
2022-01-24 14:49:26.937+0000 INFO [o.n.d.d.DefaultDatabaseManager] Stopped 'DatabaseId{21dafb04[neo4j]}' successfully.
2022-01-24 14:49:26.937+0000 INFO [o.n.d.d.DefaultDatabaseManager] Stopping 'DatabaseId{00000000[system]}'.
2022-01-24 14:49:26.938+0000 INFO [o.n.k.a.DatabaseAvailabilityGuard] [system/00000000] Requirement `Database unavailable` makes database system unavailable.
2022-01-24 14:49:26.938+0000 INFO [o.n.k.a.DatabaseAvailabilityGuard] [system/00000000] DatabaseId{00000000[system]} is unavailable.
2022-01-24 14:49:26.939+0000 INFO [o.n.k.d.Database] [system/00000000] Waiting for closing transactions.
2022-01-24 14:49:26.940+0000 INFO [o.n.k.d.Database] [system/00000000] All transactions are closed.
2022-01-24 14:49:26.940+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] [system/00000000] Checkpoint triggered by "Database shutdown" # txId: 71 checkpoint started. ..
2022-01-24 14:49:26.956+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] [system/00000000] Checkpoint triggered by "Database shutdown" # txId: 71 checkpoint completed in 15ms
2022-01-24 14:49:26.956+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] [system/00000000] No log version pruned. The strategy used was '1 days'.
2022-01-24 14:49:26.974+0000 INFO [o.n.d.d.DefaultDatabaseManager] Stopped 'DatabaseId{00000000[system]}' successfully.
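For what it's worth, -XX:+ExitOnOutOfMemoryError is a standard HotSpot flag that needs Java 8u92 or newer; an unrecognized -XX option makes the JVM exit before Neo4j can write anything to debug.log, which would match the symptom above. A quick, hedged check (assuming the service uses the default java on the PATH and the standard Debian unit name):
java -XX:+ExitOnOutOfMemoryError -version     # fails with "Unrecognized VM option" on JVMs older than 8u92
journalctl -u neo4j -n 50                     # systemd captures JVM startup errors that never reach debug.log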

Unable to start neo4j with systemctl: 'Failed to load from plugin jar'

I've been trying to restart neo4j after adding new data on an EC2 instance. I stopped the neo4j instance, then I called systemctl start neo4j, but when I call cypher-shell it says Connection refused, and connection to the browser port doesn't work anymore.
In the beginning I assumed it was a heap space problem, since looking at the debug.log it said there was a memory issue. I adjusted the heap space and cache settings in neo4j.conf as recommended by neo4j-admin memrec, but still neo4j won't start.
Then I assumed it was because my APOC package was outdated. My neo4j version is 3.5.6, but APOC was 3.5.0.3. I downloaded the latest 3.5.0.4 version, but still neo4j won't start.
At last I tried chmod 777 on every file in the data/database and plugin directories and on the directories themselves, but still neo4j won't start.
What's strange is that when I try neo4j console for all of these attempts, both cypher-shell and the neo4j browser port work just fine. However, I would obviously prefer to be able to launch neo4j with systemctl.
Right now the only hint of error I can find in debug.log is the following:
2019-06-19 21:19:55.508+0000 INFO [o.n.i.d.DiagnosticsManager] Storage summary:
2019-06-19 21:19:55.508+0000 INFO [o.n.i.d.DiagnosticsManager] Total size of store: 3.07 GB
2019-06-19 21:19:55.509+0000 INFO [o.n.i.d.DiagnosticsManager] Total size of mapped files: 3.07 GB
2019-06-19 21:19:55.509+0000 INFO [o.n.i.d.DiagnosticsManager] --- STARTED diagnostics for KernelDiagnostics:StoreFiles END ---
2019-06-19 21:19:55.509+0000 INFO [o.n.k.a.DatabaseAvailabilityGuard] Fulfilling of requirement 'Database available' makes database available.
2019-06-19 21:19:55.509+0000 INFO [o.n.k.a.DatabaseAvailabilityGuard] Database is ready.
2019-06-19 21:19:55.568+0000 INFO [o.n.k.i.DatabaseHealth] Database health set to OK
2019-06-19 21:19:56.198+0000 WARN [o.n.k.i.p.Procedures] Failed to load `apoc.util.s3.S3URLConnection` from plugin jar `/var/lib/neo4j/plugins/apoc-3.5.0.4-all.jar`: com/amazonaws/ClientConfiguration
2019-06-19 21:19:56.199+0000 WARN [o.n.k.i.p.Procedures] Failed to load `apoc.util.s3.S3Aws` from plugin jar `/var/lib/neo4j/plugins/apoc-3.5.0.4-all.jar`: com/amazonaws/auth/AWSCredentials
2019-06-19 21:19:56.200+0000 WARN [o.n.k.i.p.Procedures] Failed to load `apoc.util.s3.S3Aws$1` from plugin jar `/var/lib/neo4j/plugins/apoc-3.5.0.4-all.jar`: com/amazonaws/services/s3/model/S3ObjectInputStream
2019-06-19 21:19:56.207+0000 WARN [o.n.k.i.p.Procedures] Failed to load `apoc.util.hdfs.HDFSUtils$1` from plugin jar `/var/lib/neo4j/plugins/apoc-3.5.0.4-all.jar`: org/apache/hadoop/fs/FSDataInputStream
2019-06-19 21:19:56.208+0000 WARN [o.n.k.i.p.Procedures] Failed to load `apoc.util.hdfs.HDFSUtils` from plugin jar `/var/lib/neo4j/plugins/apoc-3.5.0.4-all.jar`: org/apache/hadoop/fs/FSDataOutputStream
...
...
...
2019-06-19 21:20:00.678+0000 INFO [o.n.g.f.GraphDatabaseFacadeFactory] Shutting down database.
2019-06-19 21:20:00.679+0000 INFO [o.n.g.f.GraphDatabaseFacadeFactory] Shutdown started
2019-06-19 21:20:00.679+0000 INFO [o.n.k.a.DatabaseAvailabilityGuard] Database is unavailable.
2019-06-19 21:20:00.684+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Database shutdown" # txId: 1 checkpoint started...
2019-06-19 21:20:00.704+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Database shutdown" # txId: 1 checkpoint completed in 20ms
2019-06-19 21:20:00.705+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] No log version pruned, last checkpoint was made in version 0
2019-06-19 21:20:00.725+0000 INFO [o.n.i.d.DiagnosticsManager] --- STOPPING diagnostics START ---
2019-06-19 21:20:00.725+0000 INFO [o.n.i.d.DiagnosticsManager] --- STOPPING diagnostics END ---
2019-06-19 21:20:00.725+0000 INFO [o.n.g.f.GraphDatabaseFacadeFactory] Shutdown started
2019-06-19 21:20:05.875+0000 INFO [o.n.g.f.m.e.CommunityEditionModule] No locking implementation specified, defaulting to 'community'
2019-06-19 21:20:06.080+0000 INFO [o.n.g.f.GraphDatabaseFacadeFactory] Creating database.
2019-06-19 21:20:06.154+0000 INFO [o.n.k.a.DatabaseAvailabilityGuard] Requirement `Database available` makes database unavailable.
2019-06-19 21:20:06.156+0000 INFO [o.n.k.a.DatabaseAvailabilityGuard] Database is unavailable.
2019-06-19 21:20:06.183+0000 INFO [o.n.i.d.DiagnosticsManager] --- INITIALIZED diagnostics START ---
I think the warning isn't an issue, since it's just a warning and not an error or exception. Also it seems that the database just shuts down automatically, and then restarts, creating an infinite loop. This loop does not happen when I call neo4j console (all the warnings still exist in the logs). All my ports are default.
Any clue why this is happening? I've never encountered this error when I previously launched neo4j on this instance.
If it works with neo4j console but not with systemctl, you should check the permissions of the Neo4j directories.
I'm pretty sure you have a permissions problem there, and that systemctl doesn't run Neo4j as the same user you use on the console.
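A quick way to compare the two situations, assuming the Debian package defaults where the service runs as the neo4j user (adjust the paths if your layout differs):
systemctl show -p User neo4j                                                         # user the systemd service runs as
ls -ld /var/lib/neo4j/data /var/lib/neo4j/plugins /var/log/neo4j                     # ownership of the directories Neo4j must write to
sudo chown -R neo4j:neo4j /var/lib/neo4j/data /var/lib/neo4j/plugins /var/log/neo4j  # hand back files created by another user (e.g. root via `neo4j console`)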

Cannot enable AlwaysOn SQL in DSE

I get this error when starting AlwaysOn SQL; I have tried many ways, but the result is still the same. Any ideas why?
I'm using 1 cluster with 1 analytics + search datacenter and 2 Ubuntu 16.04 nodes.
INFO [ALWAYSON-SQL] 2019-02-14 11:36:01,348 ALWAYSON-SQL AlwaysOnSqlRunner.scala:304 - Shutting down AlwaysOn SQL.
INFO [ALWAYSON-SQL] 2019-02-14 11:36:01,617 ALWAYSON-SQL AlwaysOnSqlRunner.scala:328 - Set status to stopped
INFO [ALWAYSON-SQL] 2019-02-14 11:36:01,620 ALWAYSON-SQL AlwaysOnSqlRunner.scala:382 - Reserve port for AlwaysOn SQL
INFO [ALWAYSON-SQL] 2019-02-14 11:36:04,621 ALWAYSON-SQL AlwaysOnSqlRunner.scala:375 - Release reserved port
INFO [ALWAYSON-SQL] 2019-02-14 11:36:04,622 ALWAYSON-SQL AlwaysOnSqlRunner.scala:805 - Set InCluster token to DseFs client
INFO [ForkJoinPool-1-worker-1] 2019-02-14 11:36:04,650 AlwaysOnSqlRunner.scala:740 - dsefs server heartbeat response: pong
INFO [ForkJoinPool-1-worker-3] 2019-02-14 11:36:04,757 AlwaysOnSqlRunner.scala:704 - Create DseFs directory /var/log/spark/alwayson_sql
INFO [ForkJoinPool-1-worker-3] 2019-02-14 11:36:04,758 AlwaysOnSqlRunner.scala:805 - Set InCluster token to DseFs client
ERROR [ForkJoinPool-1-worker-3] 2019-02-14 11:36:04,788 AlwaysOnSqlRunner.scala:722 - Failed to check dsefs directory alwayson_sql
com.datastax.bdp.fs.model.AccessDeniedException: Insufficient permissions to path /
at com.datastax.bdp.fs.model.DseFsJsonProtocol$ThrowableReader$.read(DseFsJsonProtocol.scala:258)
at com.datastax.bdp.fs.model.DseFsJsonProtocol$ThrowableReader$.read(DseFsJsonProtocol.scala:232)
at spray.json.JsValue.convertTo(JsValue.scala:31)
at com.datastax.bdp.fs.rest.RestResponse$stateMachine$macro$331$1.apply(RestResponse.scala:48)
at com.datastax.bdp.fs.rest.RestResponse$stateMachine$macro$331$1.apply(RestResponse.scala:44)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:465)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
at java.lang.Thread.run(Thread.java:748)
INFO [ALWAYSON-SQL] 2019-02-14 11:36:04,788 ALWAYSON-SQL AlwaysOnSqlRunner.scala:247 - ALWAYSON-SQL caused an exception in state RUNNING : com.datastax.bdp.fs.model.AccessDeniedException: Insufficient permissions to path /
com.datastax.bdp.fs.model.AccessDeniedException: Insufficient permissions to path /
at com.datastax.bdp.fs.model.DseFsJsonProtocol$ThrowableReader$.read(DseFsJsonProtocol.scala:258)
at com.datastax.bdp.fs.model.DseFsJsonProtocol$ThrowableReader$.read(DseFsJsonProtocol.scala:232)
at spray.json.JsValue.convertTo(JsValue.scala:31)
at com.datastax.bdp.fs.rest.RestResponse$stateMachine$macro$331$1.apply(RestResponse.scala:48)
at com.datastax.bdp.fs.rest.RestResponse$stateMachine$macro$331$1.apply(RestResponse.scala:44)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:465)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
at java.lang.Thread.run(Thread.java:748)
I have seen this problem too! It was a permissions problem in DSEFS! To fix it, log in with the root Cassandra user and change the permissions of your AlwaysOn SQL log directory so the AlwaysOn SQL user can write to it.
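As a rough sketch of that fix (the superuser role, password and exact commands are assumptions; check the dsefs shell help on your DSE version for the permission commands it actually offers):
dse -u cassandra -p cassandra fs              # open the DSEFS shell as a superuser role
mkdir /var/log/spark/alwayson_sql             # create the AlwaysOn SQL work directory if it is missing
chmod a+rwx /var/log/spark/alwayson_sql       # let the AlwaysOn SQL service user write to it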

Setting up Causal Cluster Fails

I am trying to set up a Neo4j Causal Cluster with 3 cores (core only). I have three Debian servers, all Debian 8.5. I have installed Java 8 and Neo4j Enterprise 3.4.0 (package source deb https://debian.neo4j.org/repo stable/) on each server.
My hosts are 192.168.20.163, 192.168.20.164 and 192.168.20.165. The config is the same on each host with the obvious change for IP address. The following is for the .163 host
dbms.connectors.default_listen_address=0.0.0.0
dbms.connectors.default_advertised_address=192.168.20.163
dbms.mode=CORE
causal_clustering.expected_core_cluster_size=3
causal_clustering.minimum_core_cluster_size_at_formation=3
causal_clustering.minimum_core_cluster_size_at_runtime=3
causal_clustering.initial_discovery_members=192.168.20.163:5000,192.168.20.164:5000,192.168.20.165:5000
causal_clustering.discovery_type=LIST
causal_clustering.discovery_listen_address=192.168.20.163:5000
causal_clustering.transaction_listen_address=192.168.20.163:6000
causal_clustering.raft_listen_address=192.168.20.163:7000
The servers go through the election process but the LEADER continues to switch back to FOLLOWER and trigger a new election.
The non-leader servers or 'members' each get the following error:
ERROR [o.n.c.c.s.s.CoreStateDownloader] Store copy failed due to store ID mismatch
The server that was started first becomes a LEADER but as indicated switches back to FOLLOWER:
2018-05-30 14:58:22.808+0000 INFO [o.n.c.c.c.RaftMachine] Moving to CANDIDATE state after successfully starting election
2018-05-30 14:58:22.825+0000 INFO [o.n.c.m.SenderService] Creating channel to: [192.168.20.165:7000]
2018-05-30 14:58:22.827+0000 INFO [o.n.c.m.SenderService] Creating channel to: [192.168.20.164:7000]
2018-05-30 14:58:22.838+0000 INFO [o.n.c.p.h.HandshakeClientInitializer] Scheduling handshake (and timeout) local null remote null
2018-05-30 14:58:22.848+0000 INFO [o.n.c.p.h.HandshakeClientInitializer] Scheduling handshake (and timeout) local null remote null
2018-05-30 14:58:22.861+0000 INFO [o.n.c.m.SenderService] Connected: [id: 0x2ee2e930, L:/192.168.20.163:50169 - R:/192.168.20.165:7000]
2018-05-30 14:58:22.862+0000 INFO [o.n.c.p.h.HandshakeClientInitializer] Initiating handshake local /192.168.20.163:50169 remote /192.168.20.165:7000
2018-05-30 14:58:22.863+0000 INFO [o.n.c.m.SenderService] Connected: [id: 0x3d670ef3, L:/192.168.20.163:38239 - R:/192.168.20.164:7000]
2018-05-30 14:58:22.863+0000 INFO [o.n.c.p.h.HandshakeClientInitializer] Initiating handshake local /192.168.20.163:38239 remote /192.168.20.164:7000
2018-05-30 14:58:22.928+0000 INFO [o.n.c.p.h.HandshakeClientInitializer] Installing: ProtocolStack{applicationProtocol=RAFT_1, modifierProtocols=[]}
2018-05-30 14:58:22.929+0000 INFO [o.n.c.p.h.HandshakeClientInitializer] Installing: ProtocolStack{applicationProtocol=RAFT_1, modifierProtocols=[]}
2018-05-30 14:58:22.965+0000 INFO [o.n.c.p.h.HandshakeServerInitializer] Installing handshake server local /192.168.20.163:7000 remote /192.168.20.164:41725
2018-05-30 14:58:23.036+0000 INFO [o.n.c.c.c.RaftMachine] Moving to LEADER state at term 111 (I am MemberId{fbdff840}), voted for by [MemberId{4fe121e0}]
2018-05-30 14:58:23.036+0000 INFO [o.n.c.c.c.s.RaftState] First leader elected: MemberId{fbdff840}
2018-05-30 14:58:23.044+0000 INFO [o.n.c.c.c.s.RaftLogShipper] Starting log shipper: MemberId{f202d023}[matchIndex: -1, lastSentIndex: 0, localAppendIndex: 3, mode: MISMATCH]
2018-05-30 14:58:23.045+0000 INFO [o.n.c.c.c.s.RaftLogShipper] Starting log shipper: MemberId{4fe121e0}[matchIndex: -1, lastSentIndex: 0, localAppendIndex: 3, mode: MISMATCH]
2018-05-30 14:58:23.045+0000 INFO [o.n.c.c.c.m.RaftMembershipChanger] Idle{}
2018-05-30 14:58:23.046+0000 INFO [c.n.c.d.SslHazelcastCoreTopologyService] Leader MemberId{fbdff840} updating leader info for database default and term 111
2018-05-30 14:58:24.105+0000 INFO [o.n.c.p.h.HandshakeServerInitializer] Installing handshake server local /192.168.20.163:6000 remote /192.168.20.164:58041
2018-05-30 14:58:26.841+0000 INFO [o.n.c.p.h.HandshakeServerInitializer] Installing handshake server local /192.168.20.163:7000 remote /192.168.20.165:48317
2018-05-30 14:58:30.881+0000 INFO [o.n.c.p.h.HandshakeServerInitializer] Installing handshake server local /192.168.20.163:6000 remote /192.168.20.165:47015
2018-05-30 14:58:38.462+0000 INFO [o.n.c.c.c.m.MembershipWaiter] Leader commit unknown
2018-05-30 14:58:40.411+0000 INFO [o.n.c.c.c.RaftMachine] Moving to FOLLOWER state after not receiving heartbeat responses in this election timeout period. Heartbeats received: []
2018-05-30 14:58:40.411+0000 INFO [o.n.c.c.c.s.RaftState] Leader changed from MemberId{fbdff840} to null
2018-05-30 14:58:40.412+0000 INFO [o.n.c.c.c.s.RaftLogShipper] Stopping log shipper MemberId{f202d023}[matchIndex: -1, lastSentIndex: 3, localAppendIndex: 3, mode: MISMATCH]
2018-05-30 14:58:40.413+0000 INFO [o.n.c.c.c.s.RaftLogShipper] Stopping log shipper MemberId{4fe121e0}[matchIndex: -1, lastSentIndex: 3, localAppendIndex: 3, mode: MISMATCH]
2018-05-30 14:58:40.413+0000 INFO [o.n.c.c.c.m.RaftMembershipChanger] Inactive{}
2018-05-30 14:58:40.413+0000 INFO [c.n.c.d.SslHazelcastCoreTopologyService] Step down event detected. This topology member, with MemberId MemberId{fbdff840}, was leader in term 111, now moving to follower.
2018-05-30 14:58:48.342+0000 INFO [o.n.c.c.c.RaftMachine] Election timeout triggered
Eventually servers fail with:
ERROR [o.n.c.c.c.m.MembershipWaiterLifecycle] Server failed to join cluster within catchup time limit [600000 ms]
Based on the messages you have, I assume you are trying to seed the cluster with a backup from somewhere? Here's what you should do:
1. Check whether the cluster forms correctly with no seeding (so with an empty database). That way you verify that all your settings are correct.
2. When seeding the cluster with a backup, you need to neo4j-admin unbind the database on each of the instances before starting (a sketch follows this list). Check https://neo4j.com/docs/operations-manual/current/clustering/causal-clustering/seed-cluster/ for the specific instructions for your case. The store ID mismatch is what you get if you don't unbind.
3. If 1. and 2. don't solve your problem, check with Neo4j support (since you are using EE, I assume you do have support).
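A rough sketch of point 2, assuming the Debian package layout and default service name (run on every core member while it is stopped):
sudo systemctl stop neo4j
sudo -u neo4j neo4j-admin unbind              # removes the instance's cluster state so the seeded store can bind cleanly
sudo systemctl start neo4j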
Hope this helps.
Regards,
Tom

DataStax OpsCenter 6.0 not able to connect to DSE 4.8 cluster

I am trying to connect OpsCenter to a DSE cluster. I tried and verified the same configuration in the other environments (DEV, STAGING), but when I try the same on Prod I get these errors in the agent.log file:
INFO [async-dispatch-64] 2017-12-14 18:29:24,728 Starting system.
INFO [async-dispatch-64] 2017-12-14 18:29:24,730 Starting JMXComponent
WARN [async-dispatch-64] 2017-12-14 18:29:24,732 Exception while processing JMX data: java.lang.SecurityException: Authentication failed! Credentials required
ERROR [async-dispatch-64] 2017-12-14 18:29:24,732 Security failure connecting to JMX: Authentication failed! Credentials required
INFO [async-dispatch-64] 2017-12-14 18:29:24,733 Starting StompComponent
INFO [async-dispatch-64] 2017-12-14 18:29:24,733 SSL communication is disabled
INFO [async-dispatch-64] 2017-12-14 18:29:24,733 Creating stomp connection to x.y.z.x:61620
ERROR [async-dispatch-64] 2017-12-14 18:29:24,736 Dec 14, 2017 6:29:24 PM org.jgroups.client.StompConnection connect
INFO: Connected to x.y.z.x:1234
WARN [async-dispatch-64] 2017-12-14 18:29:29,738 Attempted to ping opscenterd on stomp but did not receive a reply in time, will retry again later.
ERROR [StompConnection receiver] 2017-12-14 18:29:29,740 Dec 14, 2017 6:29:29 PM org.jgroups.client.StompConnection run
SEVERE: JGRP000112: Connection closed unexpectedly:
java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.net.SocketInputStream.read(SocketInputStream.java:224)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.jgroups.util.Util.readLine(Util.java:2825)
at org.jgroups.protocols.STOMP.readFrame(STOMP.java:240)
at org.jgroups.client.StompConnection.run(StompConnection.java:274)
at java.lang.Thread.run(Thread.java:745)
INFO [async-dispatch-64] 2017-12-14 18:29:29,742 Starting JMXComponent
WARN [async-dispatch-64] 2017-12-14 18:29:29,744 Exception while processing JMX data: java.lang.SecurityException: Authentication failed! Credentials required
ERROR [async-dispatch-64] 2017-12-14 18:29:29,744 Security failure connecting to JMX: Authentication failed! Credentials required
INFO [async-dispatch-64] 2017-12-14 18:29:29,744 Starting JMXComponent
WARN [async-dispatch-64] 2017-12-14 18:29:29,746 Exception while processing JMX data: java.lang.SecurityException: Authentication failed! Credentials required
ERROR [async-dispatch-64] 2017-12-14 18:29:29,746 Security failure connecting to JMX: Authentication failed! Credentials required
Also, opscenterd.log shows no errors for the cluster.
The configuration that I used is below.
address.yaml:
stomp_interface: x.x.x.x
use_ssl: 0
metrics_enabled: 1
hosts: [x.x.x.x, x.x.x.x]
storage_keyspace: opscenter_abc
cluster_name.conf
[jmx]
port = 7199
password = abc
username = abc
[cassandra]
seed_hosts = x.x.x.x , x.x.x.x , x.x.x.x
api_port = 9160
cql_port = 9042
password = abc
username = fabe
[storage_cassandra]
seed_hosts = x.x.x.x, x.x.x.x
api_port = 9160
cql_port = 9042
keyspace = opscenter_abc
[cassandra_metrics]
5min_ttl = 2419200
[cluster_display_options]
display_name = badkfj
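Given the repeated "Authentication failed! Credentials required" lines in agent.log, one thing worth comparing with the working environments is whether the agent itself is given JMX credentials. As a hedged sketch, the DataStax agent's address.yaml can carry them (values are placeholders):
jmx_user: abc
jmx_pass: abc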
