Why can quorum change when adding new nodes to Cassandra? - datastax-enterprise

I doubled the size of our cluster from 3 nodes to 6 nodes. We have RF=3 and run DataStax Enterprise 4.8.0: cqlsh 5.0.1 | Cassandra 2.1.9.791 | DSE 4.8.0 | CQL spec 3.2.0 | Native protocol v3
I'm rebalancing the cluster, and we're now seeing some odd errors from our application:
ERROR: core.RequestProcessor.logRequest() Could not log request 4d96f06a-b465-4608-904f-2014a271d507 for brand 5954: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency QUORUM (3 replica were required but only 2 acknowledged the write)
How can 3 replicas be required when RF=3? Shouldn't quorum be 2? Also, disk usage has ballooned on the nodes. Is this expected?
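A plausible reading of the error, offered as an assumption rather than something stated in the question: while new nodes are still joining, Cassandra adds the pending (joining) replicas for a token range to the number of acknowledgements a write must collect, so a QUORUM write at RF=3 can report 3 required instead of 2. The sketch below only reproduces that arithmetic. The class and method names are hypothetical, and this is not Cassandra's code.

```java
// Sketch only: models how the "required" count of a QUORUM write can exceed
// the plain quorum while nodes are joining. Not Cassandra's actual source.
final class QuorumSketch {

    /** Plain quorum for a replication factor: floor(RF / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /**
     * Acknowledgements a QUORUM write waits for while a range is moving.
     * pendingReplicas (assumed name) counts joining nodes that will own the
     * range once their bootstrap completes.
     */
    static int requiredAcks(int replicationFactor, int pendingReplicas) {
        return quorum(replicationFactor) + pendingReplicas;
    }

    public static void main(String[] args) {
        System.out.println(quorum(3));          // steady state with RF=3: prints 2
        System.out.println(requiredAcks(3, 1)); // one joining replica: prints 3
    }
}
```

Under this assumption the required count should drop back to 2 once the new nodes have finished joining.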

Related

Neo4J ETL Tool mapping and data transfer takes a lot of time

I'm testing the Neo4j ETL Tool with several MySQL and MSSQL databases. So far, mapping and data transfer take so long that for some databases I can't complete the mapping, and for none of them have I been able to finish the data transfer. Details below:
Target Database: Neo4j Graph database version 5.1.0
Processor: Intel(R) Xeon(R) Gold 5217 CPU @ 3.00GHz 2.99 GHz (2 processors)
Ram: 64GB
Windows 10
For all tests, I see the Zulu Platform x64 Architecture process running with significant CPU and memory usage.
Tests 1 & 2: MySQL and MSSQL databases stuck on data transfer
After mapping and discarding some tables, I tested both online import modes with the default configuration (Unwind Row Size: 1000, Transaction Batch Size: 10000). The process is stuck at
Creating nodes with label DspIexGmp
for more than 24 hours. The table itself is 100 MB in size, and the whole database is 400 MB. The MSSQL database is on the same machine (localhost). The MySQL database is on another machine in the network.
Test 3: MSSQL database stuck on mapping
For a different MSSQL database, the mapping has been stuck for 10 hours after the following lines:
Crawling routines
Retrieved 0 routines
Not retrieving synonyms, since this was not requested
Not retrieving sequences, since this was not requested
The database server is hosted on a machine in the local network.
The Neo4j log shows the same lines over and over again during both mapping and data transfer:
[2022-10-28 13:15:56.009] [info] Online check request: https://dist.neo4j.org/neo4j-desktop/win/latest.yml
[2022-10-28 13:15:56.073] [info] Online check response: 200 version: 1.5.2 files
And sometimes
[2022-10-28 12:57:10.949] [info] Graph App[neo4j-browser-id]: Checking 5.0.0 for suitable desktop API version
[2022-10-28 12:57:10.952] [info] Graph App[neo4j-browser-id]: Version 5.0.0 satisfied. { desktopApiVersion: '1.4.0', packageDesktopApiVersion: '^1.4.0' }
[2022-10-28 12:57:10.956] [info] Graph App[f72f3631-b844-48c0-81ec-c2591b9c0852]: Checking 1.5.1 for suitable desktop API version
[2022-10-28 12:57:10.958] [info] Graph App[f72f3631-b844-48c0-81ec-c2591b9c0852]: Version 1.5.1 satisfied. { desktopApiVersion: '1.4.0', packageDesktopApiVersion: '^1.4.0' }
[2022-10-28 12:57:10.960] [info] Graph App[neo4j-bloom-id]: Checking 2.5.1 for suitable desktop API version
[2022-10-28 12:57:10.962] [info] Graph App[neo4j-bloom-id]: Version 2.5.1 satisfied. { desktopApiVersion: '1.4.0', packageDesktopApiVersion: '^1.4.0' }
[2022-10-28 12:57:11.069] [info] Graph App[neo4j-bloom-id]: Checking 2.5.1 for suitable desktop API version
[2022-10-28 12:57:11.072] [info] Graph App[neo4j-bloom-id]: Version 2.5.1 satisfied. { desktopApiVersion: '1.4.0', packageDesktopApiVersion: '^1.4.0' }
[2022-10-28 12:57:11.078] [warn] Failed to parse manifest file for Graph-App[neo4j-bloom]. Error: Graph-App[neo4j-bloom] does not contain manifest.json
at getManifest (C:\Users\XXX\AppData\Local\Programs\Neo4j Desktop\resources\app.asar\dist\main.prod.js:3828:15)
at async LocalProcessor.resolveConfig (C:\Users\XXX\AppData\Local\Programs\Neo4j Desktop\resources\app.asar\dist\main.prod.js:4268:30)
at async LocalProcessor.downloadUpdate (C:\Users\XXX\AppData\Local\Programs\Neo4j Desktop\resources\app.asar\dist\main.prod.js:4496:45)
[2022-10-28 12:57:11.080] [warn] update version: 2.5.1 already downloaded for neo4j-bloom
[2022-10-28 12:57:38.066] [info] Online check request: https://dist.neo4j.org/neo4j-desktop/win/latest.yml
[2022-10-28 12:57:38.135] [info] Online check response: 200 version: 1.5.2
Does anyone have any ideas about what could help? I am open to alternative ways of importing an existing database into Neo4j, but I have very limited knowledge of how to do so.
Thanks in advance!
If more information is required, feel free to ask. I can also send the full logs from both the Tool and the Desktop application.

Kafka Neo4j Connect - using bolt+routing protocol

I'm trying to change the protocol from bolt to bolt+routing in the Kafka Neo4j Connector configuration. However, I'm getting the following error:
"trace":"java.lang.IllegalArgumentException: Invalid address format bolt+routing\n\tat org.neo4j.driver.internal.Scheme.validateScheme(Scheme.java:46)\n\tat org.neo4j.driver.internal.SecuritySettings.createSecurityPlan(SecuritySettings.java:64)\n\tat org.neo4j.driver.GraphDatabase.driver(GraphDatabase.java:138)\n\tat streams.kafka.connect.sink.Neo4jService.<init>(Neo4jService.kt:83)\n\tat streams.kafka.connect.sink.Neo4jSinkTask.start(Neo4jSinkTask.kt:29)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:308)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:199)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:185)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:235)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.-bash-4.3$ ps aux | grep java^C hread.run(Thread.java:748)\n"}
After looking into this, I'm confused about the Neo4j drivers. There are currently two: the Neo4j JDBC Driver and the Neo4j Java Driver. As far as I can see, only the Neo4j JDBC Driver supports bolt+routing, while the Neo4j Java Driver doesn't. The Sink Connector uses the Neo4j Java Driver, so I'm wondering how we can use bolt+routing in the Sink Connector.
TIA
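The stack trace shows the driver's own scheme validation rejecting bolt+routing. Assuming the connector bundles a 4.x Neo4j Java Driver, routing is requested with the neo4j:// scheme, which replaced bolt+routing in that driver line. Below is a minimal sketch of rewriting a legacy URI before it goes into the connector configuration. The helper name and the host are hypothetical.

```java
// Sketch only: rewrites a legacy routing URI to the neo4j:// scheme.
// Assumes a 4.x Neo4j Java Driver; the class and host names are made up.
final class RoutingUriSketch {

    private static final String LEGACY_PREFIX = "bolt+routing://";
    private static final String ROUTING_PREFIX = "neo4j://";

    /** Returns the URI with bolt+routing:// replaced by neo4j://, else unchanged. */
    static String toRoutingUri(String uri) {
        if (uri.startsWith(LEGACY_PREFIX)) {
            return ROUTING_PREFIX + uri.substring(LEGACY_PREFIX.length());
        }
        return uri;
    }

    public static void main(String[] args) {
        // Hypothetical host; prints neo4j://neo4j-core:7687
        System.out.println(toRoutingUri("bolt+routing://neo4j-core:7687"));
    }
}
```

Plain bolt:// URIs pass through untouched, so the same helper can be applied to every configured address.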

How to apply query timeout in Community Neo4j JAVA Bolt Driver Api

How do I apply a transaction timeout with the Neo4j Java Bolt driver API on Community Edition?
I am trying to apply a TransactionConfig timeout, but it does not seem to take effect:
session.beginTransaction(TransactionConfig.builder()
        .withTimeout(Duration.ofSeconds(3))
        .build());
Is there a way to have queries time out automatically after x seconds, on a per-query basis?
Neo4j Community Edition honours these settings:
dbms.transaction.timeout
dbms.lock.acquisition.timeout
But is there a way to set them per query or per transaction?
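Independently of what the server enforces, a per-call time limit can be imposed on the client side with plain java.util.concurrent. The sketch below is a fallback under stated assumptions, not the driver's own mechanism: runWithTimeout is a hypothetical helper, the Callable stands in for whatever code runs the query, and timing out here stops the caller from waiting but does not by itself guarantee that the server aborts the transaction.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch only: client-side time limit around an arbitrary blocking call.
final class QueryTimeoutSketch {

    /** Runs the task on a worker thread and gives up after the given timeout. */
    static <T> T runWithTimeout(Callable<T> query, Duration timeout)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(query);
            try {
                return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the worker thread
                throw e;
            }
        } finally {
            executor.shutdownNow(); // never leave the worker thread behind
        }
    }

    public static void main(String[] args) throws Exception {
        // A fast task returns its value normally.
        System.out.println(runWithTimeout(() -> "fast result", Duration.ofSeconds(3)));
        // A slow task (simulated with sleep) is abandoned after 200 ms.
        try {
            runWithTimeout(() -> { Thread.sleep(5_000); return "slow result"; },
                    Duration.ofMillis(200));
        } catch (TimeoutException e) {
            System.out.println("query timed out");
        }
    }
}
```

A new executor per call keeps the sketch short. A real application would more likely share one executor across calls.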

How to specify my connection to neo4j as a leader role in K8s

I deployed a 3-core (the default) Neo4j cluster in K8s via this helm chart. I'm quite new to Neo4j.
I'm using neo4jrb in a Ruby on Rails project.
When I connect to the Neo4j service to write data, I often (but not always) get this error:
Neo4j::Core::CypherSession::CypherError: Cypher error:
Neo.ClientError.Cluster.NotALeader: No write operations are allowed directly on this database. Writes must pass through the leader. The role of this server is: FOLLOWER
I read the article Querying Neo4j Clusters and realized that the helm chart creates one leader and two follower cores. In cypher-shell, when I run
CALL dbms.cluster.overview() YIELD id, role RETURN id, role
I got
+-----------------------------------------------------+
| id                                     | role       |
+-----------------------------------------------------+
| "acce2b2c-53ae-498c-a49b-84f42897445e" | "FOLLOWER" |
| "03cabb09-de1a-40cc-b8b0-bb02981cf551" | "FOLLOWER" |
| "1aa96add-f5cd-43a1-9fc6-2a5360668bb7" | "LEADER"   |
+-----------------------------------------------------+
So I should connect to the LEADER when I write data. I also know that a core does not stay leader permanently: if the current leader goes down, a follower becomes the new leader.
I thought bolt+routing to the causal cluster might be an easy way to fix my issue, but when I went back to the Ruby client, I found that it doesn't support bolt+routing for now.
What should I do now? I can't configure a LoadBalancer, but I can write an Ingress config.
I'm not sure that neo4jrb supports bolt+routing.
You could try to use the java driver from graalvm's truffleruby, see:
https://github.com/michael-simons/neo4j-graalvm-polyglot-examples

Neo4j 2.3.2 cluster fails to start because of Unknown replication strategy

After upgrading to 2.3.2, the Neo4j cluster fails to start with the following error:
2016-01-22 00:54:42.499+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase#483013b3' was successfully initialized, but failed to start. Please see attached cause exception. Starting Neo4j failed: Component
Caused by: java.lang.RuntimeException: Unknown replication strategy
at org.neo4j.kernel.ha.transaction.TransactionPropagator$1.getReplicationStrategy(TransactionPropagator.java:115)
at org.neo4j.kernel.ha.transaction.TransactionPropagator.start(TransactionPropagator.java:175)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 14 more
This issue seems to be related to the changes made to the ha.tx_push_strategy setting in conf/neo4j.properties. With ha.tx_push_strategy=fixed, the error occurs. With a more specific strategy, e.g. ha.tx_push_strategy=fixed_ascending, the error goes away and the cluster forms correctly.
The push strategy acts as a tie-breaker: when tx ids are the same, it determines which server id is pushed to next. The new strategies are fixed_descending and fixed_ascending. Although fixed_descending is the default in this version, fixed_ascending is the better choice, because the election strategy uses ascending order when determining which instance is elected as the next master. Using fixed_ascending therefore reduces the chance of branched data in certain situations.
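To make the difference between the two strategies concrete, here is a small illustration of ordering push targets by server id in ascending versus descending order. It only models the ordering described above. The class, method, and sample ids are hypothetical, and this is not Neo4j's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch only: orders candidate server ids the way the two strategy names suggest.
final class PushOrderSketch {

    /** Returns a sorted copy of the ids: ascending or descending by server id. */
    static List<Integer> pushOrder(List<Integer> serverIds, boolean ascending) {
        Comparator<Integer> order = ascending
                ? Comparator.<Integer>naturalOrder()
                : Comparator.<Integer>reverseOrder();
        List<Integer> ordered = new ArrayList<>(serverIds);
        ordered.sort(order);
        return ordered;
    }

    public static void main(String[] args) {
        List<Integer> slaves = Arrays.asList(3, 1, 2); // made-up server ids
        System.out.println(pushOrder(slaves, true));   // prints [1, 2, 3]
        System.out.println(pushOrder(slaves, false));  // prints [3, 2, 1]
    }
}
```

With ascending order, the lowest server id receives pushed transactions first. That matches the ascending order the text attributes to master election.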
